I ran into this problem on HackerRank:
Starting with a 1-indexed array of zeros and a list of operations, for each operation add a value to each of the array elements between two given indices, inclusive. Once all operations have been performed, return the maximum value in your array.
My first go at this works, but isn’t fast enough:
// Build the array of zeros.
let myArray = [];
for (let i = 0; i < n; i++) {
  myArray.push(0);
}
// Brute force: walk every element of every operation's range and add the value.
for (let i = 0; i < queries.length; i++) {
  let operationStart = queries[i][0] - 1;
  let operationEnd = queries[i][1];
  let action = queries[i][2];
  for (let j = operationStart; j < operationEnd; j++) {
    myArray[j] += action;
  }
}
return Math.max(...myArray);
As I thought more about the problem, I realized that only the end points of each operation mattered. I tried a few different approaches, but my algorithms were still taking too long to execute on several of the tests.
I finally threw in the towel and read the discussion, which pointed out that you could treat it as a signal processing problem and only record the changes: in essence, keep an array with a plus at the start point of the range being summed by an operation and a minus one spot after the end of the summation.
For example:
If the operation is 2, 4, 100 (meaning add 100 to the 2nd, 3rd, and 4th spots in the array), then
[0, 100, 100, 100, 0] could instead be treated as:
[0, 100, 0, 0, -100]
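Just to convince myself of that example, here's a quick snippet (my own illustration, not from the discussion) that rebuilds the full array from the difference form with a running total:
const diff = [0, 100, 0, 0, -100];
let running = 0;
const rebuilt = diff.map((delta) => {
  running += delta;
  return running;
});
console.log(rebuilt); // [0, 100, 100, 100, 0]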
The approach being advocated in the comments essentially required n operations to create the array of zeros, then a set of operations to populate the changes, and then n more operations to run back through the array keeping a running total in order to figure out what the largest number is.
That made sense to me, but I wondered if there was a way to combine the two approaches and come up with something that required fewer operations.
My thought was that all you really needed was to record the points where the signal changed, and the magnitude of the change.
// Collect the unique points where the signal changes.
let endPoints = new Set();
for (let i = 0; i < queries.length; i++) {
  endPoints.add(queries[i][0]);
  endPoints.add(queries[i][1] + 1);
}
let sortedEndPoints = Array.from(endPoints);
sortedEndPoints.sort((a, b) => a - b);
// One slot per change point to hold the magnitude of the change there.
let values = [];
for (let i = 0; i < sortedEndPoints.length; i++) {
  values.push(0);
}
for (let i = 0; i < queries.length; i++) {
  let leftIndex = sortedEndPoints.findIndex((element) => {
    return element === queries[i][0];
  });
  let rightIndex = sortedEndPoints.findIndex((element) => {
    return element === queries[i][1] + 1;
  });
  values[leftIndex] += queries[i][2];
  values[rightIndex] -= queries[i][2];
}
// Sweep across the change points, keeping a running total.
let maximum = 0;
let runningTotal = 0;
for (let i = 0; i < values.length; i++) {
  runningTotal += values[i];
  if (runningTotal > maximum) {
    maximum = runningTotal;
  }
}
return maximum;
The solution above came back as a fail on a bunch of tests (due to a time out) on HackerRank.
That really surprised me, and it continued to not make sense to me until I went ahead and unlocked some of the test cases that had more data.
I had been envisioning problems that scaled up to something like a billion data points in the array and ten thousand add operations.
The test cases scaled up to 4k points in the array and 30k addition ranges or 10 million points in the array and 100k addition ranges.
With that type of data set, the overhead from sorting the array of edges gets expensive really quickly compared to the overhead of traversing the full array, which in their tests was much smaller than the arrays I’d been envisioning.
In the interest of proving my theory, I used their test data to create some test data that fit the profile I’d been envisioning.
The test data was as follows:
Array of 5 million places with 3 addition ranges.
Array of 4 million places with 3 addition ranges.
Array of 4 million places with 30 addition ranges.
Array of 4 million places with 30 addition ranges.
Array of 4 million places with 4,894 addition ranges.
Array of 4 million places with 4,994 addition ranges.
I then duplicated the test data 27 times and ran a comparison with a stopwatch.
On average, the method suggested by the users at HackerRank took ~8 seconds to run through that data on my machine and my algorithm took ~2.5 seconds to run through the same test set. The margin of error on something like that is probably half a second, so it’s not super accurate, but it does tend to support the idea that depending on the data you’re dealing with, the overhead of sorting the array of edge points can still end up being much less than the overhead of traversing the full array.
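The stopwatch is obviously crude; if I rerun the comparison, something like the following harness would tighten up the measurement (the function names and the shape of the test cases here are placeholders for the two implementations, not actual HackerRank files):
// Each approach is assumed to be wrapped in a function that takes (n, queries)
// and returns the maximum, e.g. edgesOnlyArrayMax and fullTraversalArrayMax.
function timeApproach(label, arrayMax, testCases) {
  const start = Date.now();
  for (const { n, queries } of testCases) {
    arrayMax(n, queries);
  }
  console.log(`${label}: ${Date.now() - start} ms`);
}
// testCases would be an array of { n, queries } objects built from the test data above.
// timeApproach('edges only', edgesOnlyArrayMax, testCases);
// timeApproach('full traversal', fullTraversalArrayMax, testCases);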
Here is the version that the guys and gals in the discussion for the problem on HackerRank suggested:
let edgesArray = Array(n + 1).fill(0); // n + 1 slots so the marker one past the end of a range has a place to land
queries.forEach(([a, b, k]) => {
  edgesArray[a - 1] += k;
  edgesArray[b] -= k;
});
let maximum = 0;
let tempAccumulator = 0;
// Run a prefix sum across the whole array, tracking the largest value seen.
edgesArray.reduce((acc, cur) => {
  tempAccumulator = acc + cur;
  if (tempAccumulator > maximum) {
    maximum = tempAccumulator;
  }
  return tempAccumulator;
}, 0); // start the accumulator at 0 so the first element is included in the running total
return maximum;
At some point, I would like to spend some more time trying to tweak my ‘edges only’ solution to figure out a way to reduce the overhead involved in the sort. I’m thinking that putting the edge points into some sort of tree structure might reduce the sorting overhead and allow my solution to be more competitive across a broader set of test cases.
A better route still would be to figure out how to sort or partially sort the edge points as I put them into the set or array, but so far nothing is jumping out at me as to how I could make that happen.
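One smaller tweak I could try in the meantime (just a sketch here, not something I’ve run against the HackerRank tests) is to accumulate the change magnitudes in a Map keyed by endpoint as the queries are read, so the endpoints still get sorted once but there’s no need to search for each endpoint’s index afterwards:
let deltas = new Map();
queries.forEach(([a, b, k]) => {
  // Record the size of the signal change at each endpoint as we go.
  deltas.set(a, (deltas.get(a) || 0) + k);
  deltas.set(b + 1, (deltas.get(b + 1) || 0) - k);
});
// Sort the change points once, then sweep across them with a running total.
let maximum = 0;
let runningTotal = 0;
Array.from(deltas.keys())
  .sort((x, y) => x - y)
  .forEach((point) => {
    runningTotal += deltas.get(point);
    if (runningTotal > maximum) {
      maximum = runningTotal;
    }
  });
return maximum;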
As far as optimizing the algorithm from the people at HackerRank goes, I considered adding a ‘leftMost’ index and a ‘rightMost’ index that could be used to trim the untouched leftmost and rightmost parts of the array so that the final traversal can be accomplished more quickly (a rough sketch of what I mean is at the end of this post), but that ends up introducing extra overhead on a problem set like the one they are using. If you tended to have operations that were clustered around one part of the array, it could be helpful on average. I can think of a few real-world situations where that might be the case: maybe calibrating some kind of sensor or machinery where you know that, once it’s mostly aligned, most of the data is going to hit the same part of the sensor, but on the first few runs you don’t know which parts of the sensor are going to be hit, so you have to watch the entire sensor.
It’s definitely an unlikely set of edge cases, but something that’s kind of fun to think about.
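For completeness, here is roughly what that leftMost/rightMost trimming would look like on top of the HackerRank version (untested, and only likely to pay off when the operations really do cluster in one part of the array):
let edgesArray = Array(n + 1).fill(0);
let leftMost = n;
let rightMost = 0;
queries.forEach(([a, b, k]) => {
  edgesArray[a - 1] += k;
  edgesArray[b] -= k;
  // Track the smallest and largest indices that any operation touches.
  if (a - 1 < leftMost) leftMost = a - 1;
  if (b > rightMost) rightMost = b;
});
// Only sweep the slice of the array that the operations actually touched.
let maximum = 0;
let runningTotal = 0;
for (let i = leftMost; i <= rightMost; i++) {
  runningTotal += edgesArray[i];
  if (runningTotal > maximum) {
    maximum = runningTotal;
  }
}
return maximum;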