Milliseconds vs Microseconds

I recently ran into an issue that I didn’t see coming. I was creating records, and then querying for them from the database.

I knew they were created very close together in time, but since they were created in two separate steps, I expected the timestamps to be different.

Yet when I queried for them and compared the timestamps in my tests, I was getting the same timestamp back on both records.

After some digging I realized that the records had different timestamps in the database, where things are measured in microseconds. JavaScript, however, only tracks time down to the millisecond, so under certain circumstances one record's timestamp would round up and the other's would round down, leaving the two of them with the exact same timestamp (to the millisecond) on my in-memory JavaScript objects.
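
Here's a minimal sketch of the effect, using made-up microsecond timestamps (whether a given driver rounds or truncates is an assumption that varies by library):

const microsA = 1622548800123125; //microseconds since the epoch: ...00.123125
const microsB = 1622548800123987; //a different instant: ...00.123987

//A JavaScript Date can only hold whole milliseconds, so both collapse:
const dateA = new Date(Math.floor(microsA / 1000));
const dateB = new Date(Math.floor(microsB / 1000));

console.log(dateA.getTime() === dateB.getTime()); //true, both are ...123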

I never would have expected to be in a situation where I needed or cared about more precision than 1/1000th of a second, but there you have it.

JavaScript Milliseconds vs Postgres Microseconds

I recently ran into something that I never would have even considered thinking about before now.

I was trying to query Postgres for all records with a specific value in a timestamp field.

I got zero results, which was a real head scratcher given that I could see the records in the database.

As it turns out, by default, Postgres saves timestamps that track 6 digits to the right of the decimal (microseconds). On the other hand, when you convert a JavaScript Date object to its string equivalent, you only get 3 digits after the decimal (milliseconds).

So, the query string I passed in was something like YYYY-MM-DD HH:mm:ss.333. Postgres would then look at that and say, 'I have YYYY-MM-DD HH:mm:ss.333125, but that is slightly after what you're asking for, so I have no results to return to you.'
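
You can see the millisecond limit directly, because a Date always serializes with exactly three fractional digits:

const d = new Date('2021-06-01T12:00:00.333Z'); //a made-up timestamp
console.log(d.toISOString()); //"2021-06-01T12:00:00.333Z", milliseconds only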

You can override the default precision for a timestamp in Postgres at the time you create the table/field by defining it as 'timestamp(3)', which stores only 3 digits past the decimal.
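
Here's a hedged sketch of what that looks like from Node using the node-postgres ('pg') client; the table and column names are hypothetical:

const { Pool } = require('pg');
const pool = new Pool(); //connection settings come from the environment

async function setup() {
    //timestamp(3) keeps only millisecond precision, so values
    //round-trip cleanly through a JavaScript Date
    await pool.query(
        'CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, created_at timestamp(3))'
    );
}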

Queue Creation Options

I should be sleeping right now, but I really wanted to figure out what the best option is for creating a queue in Javascript.

It's not directly relevant, since push/pop is a stack rather than a queue, but for reference purposes, I ran a loop for 100,000,000 (100 million) iterations, pushing the values 0-99,999,999 into an array and then popping the values back out. On my system that took ~4.5 seconds.
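
For anyone who wants to reproduce that baseline, here's a rough sketch (exact timings will vary by machine and Node version):

const arr = [];
console.time('push/pop');
for (let i = 0; i < 100000000; i++) {
    arr.push(i);
}
while (arr.length) {
    arr.pop();
}
console.timeEnd('push/pop');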

Trying to place those same values into a linked list that had both a head and a tail attribute resulted in the process crashing after 32 seconds. Dropping the items being stored down by an order of magnitude to 10 million resulted in the process taking ~2 seconds.

I tried storing just the value 1 in each of the nodes, hoping that would stop the process from crashing, but that didn't make any kind of difference–we still crashed at around 32 seconds.

Also for comparison purposes, I pushed a million records (0-999,999) into an array and then used shift to pull them all back out. That's an order of magnitude less than the 10 million records that I put into my linked list and pulled back out in ~2 seconds, and yet I let the algorithm run for more than 200 seconds before canceling it.

That indicates that using push and shift is at least three orders of magnitude slower than a linked list at those kinds of sizes.

If you push an item and then immediately shift it back off of an array 100,000,000 times, the process takes less than 2 seconds, so the issue is with shift having to run through the existing items in the array.

If your queue is always going to be fairly small, then it's entirely possible that push and shift is the way to go. However, if your queue is going to get very large, then it appears that a linked list would be a better option, as long as it doesn't get so large that you run into other problems.
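
Here's a minimal sketch of the kind of linked-list queue I'm describing, where both enqueue (at the tail) and dequeue (from the head) are O(1):

class QueueNode {
    constructor(value) {
        this.value = value;
        this.next = null;
    }
}

class LinkedListQueue {
    constructor() {
        this.head = null;
        this.tail = null;
    }

    enqueue(value) { //O(1): attach at the tail
        const node = new QueueNode(value);
        if (this.tail) {
            this.tail.next = node;
        } else {
            this.head = node; //queue was empty
        }
        this.tail = node;
    }

    dequeue() { //O(1): detach from the head, no re-indexing
        if (!this.head) return undefined;
        const value = this.head.value;
        this.head = this.head.next;
        if (!this.head) this.tail = null;
        return value;
    }
}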

It might be worth trying out a Map. A Map can be iterated through, and it remembers the order in which keys were inserted, but my best guess is that tracking insertion order will work against it and we'll end up with the same kind of issue that we have with shift and unshift on arrays.
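
If anyone wants to test that guess, here's a sketch of a Map-based queue. I'm making no performance claims about it; measure before trusting it:

class MapQueue {
    constructor() {
        this.map = new Map();
        this.nextId = 0;
    }

    enqueue(value) {
        this.map.set(this.nextId++, value); //Maps preserve insertion order
    }

    dequeue() {
        const first = this.map.entries().next(); //first entry the iterator yields
        if (first.done) return undefined;
        const [key, value] = first.value;
        this.map.delete(key);
        return value;
    }
}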

Array.shift Speed

I got to thinking tonight that, before blindly assuming that unshift in JavaScript is slow but shift is fast, I should double-check that.

As it turns out, shift is roughly as slow as unshift on arrays, which means I need to rethink how I would implement a queue in JavaScript. Just using array.push() and array.shift() works, but it's unnecessarily slow.
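
Here's the kind of quick-and-dirty comparison I'm talking about, for anyone who wants to measure it themselves (results will vary by engine and array size):

const makeArray = (n) => Array.from({ length: n }, (_, i) => i);

let a = makeArray(1000000);
console.time('pop');
while (a.length) a.pop(); //drains from the end
console.timeEnd('pop');

a = makeArray(1000000);
console.time('shift');
while (a.length) a.shift(); //drains from the front
console.timeEnd('shift');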

My initial thought is to use a linked list that has both a head and a tail pointer. I’ll need to test out the speed of that kind of construct against the speed of array.push() and array.shift(). It’s possible that there are some other factors that would slow a linked list down in comparison for use as a queue.

More on that once I’m done testing.

Unshift in JavaScript

For a very long time now, I had been under the impression that since JavaScript arrays are just a kind of object, array.unshift() wasn't any more costly than array.push().

Today I finally did some testing and found out that I was very much wrong. Now you know and won’t experience the same performance issues that I ran into.

Merge Sort

As discussed in my last post, I’ve been spending time learning new algorithms, and reminding myself how to use ones that I haven’t used in years.

Here is the evolution of my Merge Sort algorithm:

I did some searching online looking for a high-level discussion of what a merge sort is, and its pros and cons. tl;dr: it trades the use of more space for better worst-case performance than Quick Sort.

Then I found this example by Tim Han.

I scanned through that quickly, and then worked on other stuff for several days. When I came back to take my first shot at writing my merge sort (crazily enough, this isn't one that I encountered during my year and a half of computer science classes many, many years ago), all I remembered was that you divide the array and then return a merge call on the recursively divided halves. Oh, and something about there being a single leftover data point that needed to be tacked onto something else at the end of the merge.

Here is my first compilable attempt:

//Mergesort. Works by splitting array into half until getting arrays of size 1
//Then merges the arrays back together, sorting as they are merged.

const mergeSort = (array) => {
    const length = array.length;

    if(length <= 1) { //<= so an empty input array can't recurse forever
        return array;
    }

    const midpoint = Math.floor(length / 2);

    let left = array.slice(0,midpoint);
    let right = array.slice(midpoint);

    return merge(mergeSort(left), mergeSort(right));
}

const merge = (array1, array2) => {
    if(!array2.length) {
        return array1;
    }

    let mergedArray = [];

    while(array1.length && array2.length) {
        if(array1[0] <= array2[0]) {
            mergedArray.push(array1.shift());
        } else {
            mergedArray.push(array2.shift());
        }
    }

    if(!array1.length) {
        while(array2.length) {
            mergedArray.push(array2.shift());
        }
    }

    if(!array2.length) {
        while(array1.length) {
            mergedArray.push(array1.shift());
        }
    }

    return mergedArray;
}

const rawArray = [9,1,2,3,4,5,7,88,77,5,5,66,3,8,7,49,59,62,73,24,81,67,32,5,3,2];

let sorted = mergeSort(rawArray);
console.log(sorted);

As a side note, the reason that the version I wrote on paper didn't work was two-fold.
1. I just merged left and right inside of "mergeSort" rather than merging what was returned from calling mergeSort on each of those two array halves.
2. I used a Math.ceil rather than a Math.floor to generate the midpoint, which resulted in an array of length two not being properly split.

While writing my initial algorithm, the merge is the part that gave me pause, primarily because I couldn't see how we could consistently guarantee that once we were done with the while loop there would be just one item left in one of the arrays. I could envision a scenario (for instance, an already-sorted array) where array1 gets emptied before any items are taken from array2.

After a few minutes, I just went with my gut, and created something that would deal with emptying an unknown number of items out of whichever of the arrays still had items in it.

Once I'd confirmed that my code was compiling and properly sorting the array, I went back to Tim Han's algorithm and compared it to what I'd come up with.

The biggest thing that stuck out to me was the fact that he was using pointers rather than shifting items off of the two arrays passed into the merge function.

The more experienced of you are already nodding your heads and don't need me to tell you that his approach is superior, but in my defense, I'm used to thinking in terms of JavaScript arrays, which are different from how most languages define an array.

In Java, an array is created with a pre-determined size and you can't grow or shrink that array after it is created. Instead, you have to create a new array of the desired size and copy the desired items from the first array into the new array.

The fact that Java arrays are all a single contiguous block of memory likely makes them faster than JavaScript arrays, which as I understand it are just objects that can be added to and removed from without creating a new object and destroying the old object.

Still, even running this in JavaScript, there isn't any reason that we have to shorten array1 and array2 while growing the mergedArray, unless maybe memory was becoming a constraint and we needed to free up the space occupied by array1 and array2 at the same time as we grew the mergedArray.

Still, though, that seems unlikely. Generally, you'd just run things on a machine with more memory, or use a streaming type approach to the algorithm.

The next thing that I noticed was that I didn't need my if(!array2.length) statement. Ultimately, if array2 (or array1, for that matter) were empty, execution would just skip the while loop and tack the other array onto the mergedArray.

Lastly, the use of the concat function rather than a while loop was nice and slick. It's another reminder that I need to spend some time just reading through JavaScript's built-in array methods. I know a bunch of them by memory, but there are still tons of them that I don't use because I don't realize they are out there.

Here's my updated version:

//Mergesort. Works by splitting array into half until getting arrays of size 1
//Then merges the arrays back together, sorting as they are merged.
const mergeSort = (array) => {
    const length = array.length;

    if(length <= 1) { //<= also handles an empty input array
        return array; //split down as far as we can go. Time to return so we can merge.
    }

    const midpoint = Math.floor(length / 2);

    let left = array.slice(0,midpoint);
    let right = array.slice(midpoint);

    return merge(mergeSort(left), mergeSort(right)); //split left and right again and then merge what comes back
}

const merge = (left, right) => { //merge the two arrays together into a sorted array
    let mergedArray = [];
    let leftIndex = 0;
    let rightIndex = 0;

    while(leftIndex < left.length && rightIndex < right.length) {
        //compare the two and put the smallest into mergedArray until we get to the end of one of them
        if(left[leftIndex] <= right[rightIndex]) {
            mergedArray.push(left[leftIndex]);
            leftIndex++;
        } else {
            mergedArray.push(right[rightIndex]);
            rightIndex++;
        }
    }

    //either left or right will still have values that can be tacked onto the end of
    //our sorted array
    let leftRemainder = left.slice(leftIndex);
    let rightRemainder = right.slice(rightIndex);

    return mergedArray.concat(leftRemainder).concat(rightRemainder);
}

const rawArray = [9,1,2,3,4,5,7,88,77,5,5,66,3,8,7,49,59,62,73,24,81,67,32,5,3,2];

let sorted = mergeSort(rawArray);
console.log(sorted);

Performance-wise, my algorithm is now the same as what Tim did (which makes sense given that I read his algorithm several days before sitting down to write my own). I explicitly broke out the leftRemainder and rightRemainder because I think it's easier to follow what's going on that way.

I did confirm that while Tim's algorithm is great, he was wrong about there being just a single item left over after the while loop in the merge function. If you log leftRemainder and rightRemainder, you'll see that they can contain more than one item.
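
A quick way to check that claim, using the merge function above:

console.log(merge([1, 2, 3], [10, 20, 30])); //[1, 2, 3, 10, 20, 30]
//The while loop drains the left array completely before touching the
//right one, so rightRemainder ends up holding all three items.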

A big thanks to Tim for putting such a clear, well-written algorithm out there!

Count Triplets Problem

Every so often, I grab a HackerRank problem in an attempt to learn something new. I recently tried the Count Triplets challenge. As is normal for me with HackerRank problems, the biggest issue was figuring out what the author was after.

Here was my initial code:

function countTriplets(arr, r) {
    let numbers = new Map();

    for(let i=0; i<arr.length; i++) {
        addToHash(numbers, arr, i);  
    }

    let triplets = 0;
    for(let i=0; i<arr.length; i++) {
        let number = arr[i];
        let secondNumber = (number * r);
        let thirdNumber = (number * r * r);

        let secondNumberLocations = numbers.get(secondNumber);
        let thirdNumberLocations = numbers.get(thirdNumber);

        if(secondNumberLocations && thirdNumberLocations) {
            triplets += secondNumberLocations.length * thirdNumberLocations.length;
        }
    }

    return triplets;
}

function addToHash(numbers, arr, index) {
    let locations = numbers.get(arr[index]);
    if(!locations) {
        numbers.set(arr[index], [index]);
    } else {
        locations.push(index);
        numbers.set(arr[index], locations);
    }
}

It did exactly what I wanted it to, and returned the number of unique trios of indices whose values went up by a multiple of 'r'. Unfortunately, it only passed 8/13 test cases.

As it turned out, they only wanted a match when the three indices were in increasing order.

This was my next attempt, which worked, but was too slow to complete one of the tests in the allotted time:

function countTriplets(arr, r) {
    let numbers = new Map();

    let triplets = 0;

    for(let i=0; i<arr.length; i++) {
        const number = arr[i];
        let numberArray = numbers.get(number.toString()); //Number in Map already?

        //Need to get all of the next smallest in sequence already in the Map
        //Lets us count matches and update Map with correct info for number
        let numberFraction = number / r; /*Next smallest in sequence*/
        let numberFractionArray = numbers.get(numberFraction.toString());

        //Do the counting
        if(numberFractionArray) { //potential exists for triplet
            for(let j=0; j<numberFractionArray.length; j++) {
                triplets += numberFractionArray[j].fractionCount;
            }
        }

        //Generate a numberArray object for addition to Map
        let numberArrayObject;
        if(numberFractionArray) {
            numberArrayObject = {
                index: i,
                fractionCount: numberFractionArray.length
            }
        } else {
            numberArrayObject = {
                index: i,
                fractionCount: 0
            }
        }

        //Add current number info to Map so that it can be used by a higher multiple
        if(!numberArray) { //Add for first time
            numbers.set(number.toString(), [numberArrayObject]);
        } else { //Add the new item to the existing array
            numberArray.push(numberArrayObject)
            numbers.set(number.toString(), numberArray);
        }
    }

    return triplets;
}

This second version only goes through the initial array one time, but when it finds a number that works as the largest or middle value of a triplet, execution has to walk through the entire array stored under the next number down in the sequence.

This was more or less instinctive for me. I have a tendency to grab hold of any data that comes through under the theory that I’ll probably need it later, but strictly speaking, I don’t need it to comply with the requirements of this particular exercise.

My third version of the code doesn't try to store any kind of detailed data, instead grabbing summary totals and putting them into the map at each step.

function countTriplets(arr, r) {
    let numbers = new Map();

    let triplets = 0;

    for(let i=0; i<arr.length; i++) {
        const number = arr[i];
        let numberObject = numbers.get(number.toString()); //Number in Map already?

        /* {
            count: x,
            subcount: y
        } */

        //Need to get the value of the next smallest in sequence already in the Map
        //Lets us count matches and update Map with correct info for number
        let numberFraction = number / r; /*Next smallest in sequence*/
        let numberFractionObject = numbers.get(numberFraction.toString());

        //Subcount semi problematic. Need to increment numberObject
        //But only if numberFractionObject exists
        let additionalSubcount = 0; //Assume 0. Change if numberFractionObject exists

        //Increase Triplets where appropriate
        if(numberFractionObject) {
            triplets += numberFractionObject.subcount;
            additionalSubcount = numberFractionObject.count;
        }

        //Add current number info to Map so that it can be used by a higher multiple
        if(!numberObject) { //Add for first time
            numbers.set(number.toString(), {
                count: 1,
                subcount: additionalSubcount
            });
        } else { //Update the object already in the Map
            numberObject.count++;
            numberObject.subcount += additionalSubcount;
            numbers.set(number.toString(), numberObject);
        }
    }

    return triplets;
}
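
As a quick sanity check, here are the sample cases as I remember them from the problem page:

console.log(countTriplets([1, 4, 16, 64], 4)); //2: (1,4,16) and (4,16,64)
console.log(countTriplets([1, 3, 9, 9, 27, 81], 3)); //6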

My solution is O(n), and the space required is likely to come in at around 2n. If there are no duplicates and no triplets, then it would just be a hash table of size n. If there were no duplicates and everything came back as a triplet, then the size would be 3n, because you'd store each item and keep two additional data points next to it.

If you have a bunch of duplicates that all come down to the same triplet, then you'd only need space for 9 data points (3 map entries with 3 data points each).

In looking at the editorial on HackerRank and the solution there, their solution requires space of 3n, and they claim a complexity of O(n log n). It looks to me like they are going through the list twice, though, which would in theory be O(2n), reducing down to O(n), but my algorithm should still be roughly twice as fast.

If Statements Have Their Own Scope

I just ran into a bug that was the result of thinking that the block inside the body of an if statement has the same scope as the code outside of the if statement.

To clarify, the '{}' of an if statement has its own block scope (for variables declared with let or const), meaning that code inside it can see the variables from the code outside of the '{}', but that exterior code is unable to see into the scope of the if statement.
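
A minimal illustration (note that var behaves differently, since it ignores block scope):

let outside = 'visible everywhere';

if (true) {
    let inside = 'only visible in this block';
    console.log(outside); //works: inner code sees the enclosing scope
}

console.log(typeof inside); //'undefined': the block's variable isn't visible here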

Return Statements & Adding Event Listeners

I've been working with TCP connections lately, which means that upon connection from the client, my server needs to create a bunch of event listeners for things like .on('data') and .on('timeout').

I also need to go through some logic to determine what I should do after connection, but before data or timeout. In my first pass, I created the event listeners towards the bottom of the .on('connection') handler.

That worked great until I started using return statements to change the flow inside of the .on('connection') handler. Once I did that, I started running into issues, because an early return meant that the event listeners were never registered.
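
Here's a hedged sketch of the pattern I ended up preferring, using Node's built-in net module (isAllowed is a hypothetical placeholder for whatever gating logic applies):

const net = require('net');

const server = net.createServer((socket) => {
    //Register the listeners before any branching logic:
    socket.on('data', (chunk) => {
        console.log('received', chunk.length, 'bytes');
    });
    socket.on('timeout', () => socket.end());
    socket.setTimeout(30000);

    //Early returns are now safe; the listeners above are already in place
    if (!isAllowed(socket.remoteAddress)) {
        socket.end();
        return;
    }
});

function isAllowed(address) { //hypothetical placeholder
    return true;
}

server.listen(9000);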

The lesson there for me is that I'm nearly always going to be best off setting up an event listener earlier rather than later. A naive concern might be that the 'data' listener will fire off before you get to the rest of your connection handler, but all the 'data' listener does is allow events to be placed on the event table and then eventually moved down to the event queue.

Nothing will be pulled off of the event queue, though, until the current function has finished running, so you're generally going to be just fine. The exception is when your current function is an async/await function. In that case, it's possible that you'll set up the event listeners, the function will then be suspended at an await, and the 'data' listeners will fire off and get pulled onto the call stack before the original function resolves its asynchronous call and is placed back on the call stack.

Layers inside of layers, so my revised rule of thumb is to place any event listeners right after my await call. I hope that is helpful!

Async/Await Performance Implications – Part 2

In my last post, I discussed the fact that Async/Await pushes the async function onto the event table when the executing thread hits an await call. On the face of it, that means that a program using async/await doesn’t ‘freeze up’ when an asynchronous operation occurs.

I also pointed out that while the whole program doesn’t freeze, the async function is paused while it waits on the await line.

The next question is what that last bit means from a performance standpoint. Here is an example piece of code using callbacks and setTimeout to simulate an API call or other asynchronous event:

const timeoutFunction = () => {
    setTimeout(() => {
        console.log("Done");
    }, 5000);

    setTimeout(() => {
        console.log("Done");
    }, 5000);

    setTimeout(() => {
        console.log("Done");
    }, 5000);
}
timeoutFunction();

As far as output goes, there is a single 5-second pause, and then "Done" is logged three times, one right after the other.

Contrast that with this code:

const asyncTimeout = () => {
    return new Promise(function(resolve, reject) {
        setTimeout(resolve, 5000);
    }).then(() => {
        console.log("Done");
    })
}

const testAsync = async () => {
    await asyncTimeout();
    await asyncTimeout();
    await asyncTimeout();
}

testAsync();

This second block of code results in a 5-second pause, the logging of “Done”, another 5-second pause, the logging of “Done”, another 5-second pause, and then finally the third logging of “Done”.

Much like the second block of code in my last blog post, we can see that inside of the async function, execution is paused at each await statement. That means you have three separate 5-second pauses, as compared to the first code block using callbacks, in which the three 5-second pauses overlap almost completely.

In summary:
As most of you have no doubt already deduced, using async/await for a single asynchronous call carries no performance penalty compared to doing that same asynchronous operation with a callback.

When doing multiple asynchronous operations in a single function, if the asynchronous calls build on each other (meaning that data from the first call is needed to make the second call), then you lose nothing performance-wise by using async/await, and you likely dramatically simplify how you construct your code.

However, if you need to make multiple unrelated asynchronous calls, then you're better off not awaiting them one after another in the same async function, because doing so eliminates any overlap in the waiting.
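
One middle ground the examples above don't show (and that I haven't benchmarked here) is Promise.all, which lets you keep the async/await style while starting all of the unrelated calls up front so their waits overlap:

const testParallel = async () => {
    await Promise.all([asyncTimeout(), asyncTimeout(), asyncTimeout()]);
};

testParallel(); //one ~5-second pause, then "Done" logged three times together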