Endpoint/Route Design

I just found myself in a position where I need to re-write an endpoint to deal with what in hindsight feels like an obvious mistake.

If you are ever writing an endpoint and find yourself returning one type of object for one set of requests and a highly related, but slightly different, object for another set of requests, that is a sign that you probably need to split your one endpoint into two.

In my case, both use cases are asking the endpoint for a list of the same kind of underlying widget, but they are asking for slightly different ‘snapshots’ of that underlying object.

I’ve partially de-normalized my database when it comes to this foundational (to my application) widget as a way of reducing joins, which means that depending on which view of the widget one is after, a different table is hit.

I started out thinking ‘this route gives me a list of widgets’, but now I’m realizing that I need to proceed under ‘this route gives me this view of the widgets’ and ‘that route gives me a different view of the widgets’. That way, each route always returns exactly one type of object, and that type never changes for a given route.
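
As a rough sketch of the split (the Express-style routes, widget shapes, and query helpers here are all hypothetical, just to illustrate the idea of one consistent return type per route):

import { Router } from 'express';

// Two different 'views' of the same underlying widget
type WidgetSummary = { id: number; name: string };
type WidgetDetail = { id: number; name: string; description: string; updatedAt: string };

// Hypothetical query helpers, each reading from its own de-normalized table
declare function getWidgetSummaries(): Promise<WidgetSummary[]>;
declare function getWidgetDetails(): Promise<WidgetDetail[]>;

const router = Router();

// Each route always returns exactly one shape
router.get('/widgets/summaries', async (_req, res) => {
  res.json(await getWidgetSummaries());
});

router.get('/widgets/details', async (_req, res) => {
  res.json(await getWidgetDetails());
});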

As painful as it is to have to re-work an endpoint (and all of the associated tests), I suspect this is the better option in the long run, and will keep things from getting unnecessarily complicated down the road.

A Trap For New Engineers

There is a common tendency when learning to program for people to move from language to language, or from tiny project to tiny project.

I think this is dangerous because oftentimes a particular concept in programming, say for instance orthogonality, doesn’t really sink in until after you’ve taken a project where you made a mistake around that concept far enough to see the problems surface naturally as a result of increased scale or complexity.

I mention orthogonality specifically because I sat in two separate programming interviews and explained what orthogonality meant in programming and why it was important. I did a good enough job in the interviews that I was offered a job at that company (which I ultimately didn’t accept).

I then turned around a few weeks later in my day job and violated that principle by thinking that I could leverage an existing option being passed into a function to do something more than what it had been originally envisioned to do.

By the time the problem I’d created surfaced in my code a few weeks after that, I’d forgotten some of the context around what I’d done, and had a couple of days of tricky debugging before I realized my mistake.

Needless to say, I’m much less likely to make that mistake in the future, which illustrates one of the reasons that senior software engineers are paid so much more than junior software engineers.

There is real value in either having learned why not to do certain things, or having made the mistakes and learned from them. It is hard to overstate how valuable it is not to make a mistake that sends an entire team of highly-paid engineers down a bad path that burns up weeks or months of effort.

It’s not just the salaries behind that wasted effort that are costly. In some circumstances, a mistake like that can result in a competitor being first to market with a feature or product, with a knock-on loss of opportunities worth tens of millions of dollars.

All of which brings me to the point of this post. When working on a side project, I think that it’s important to pick something with enough complexity and scale that some of your design mistakes have a chance to surface. It’s better to learn the principles and never make the mistakes, but the more of those mistakes you surface and recognize, the more value you’ll ultimately be able to contribute to whatever organization you belong to.

Milliseconds vs Microseconds

I recently ran into an issue that I didn’t see coming. I was creating records, and then querying for them from the database.

I knew that they were created very close together in time, but they were created in two separate steps, so I knew that the timestamps would be different.

Except that when I queried for them and compared the timestamps in my tests, I was getting the same timestamp back on both records.

After some digging I realized that the records did have different timestamps in the database, where things are measured in microseconds. JavaScript, however, only tracks time down to the millisecond, so under certain circumstances one record would round up and the other would round down, leaving the two of them with the exact same timestamp (to the millisecond) on my in-memory JavaScript objects.
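
Here’s a quick illustration of the precision loss (the timestamps are made up; the point is that anything past the third fractional digit disappears once the value becomes a JavaScript Date):

// Two timestamps that differ only at the microsecond level in the database...
const a = new Date('2024-01-15T10:30:00.123111Z');
const b = new Date('2024-01-15T10:30:00.123222Z');

// ...are indistinguishable once JavaScript drops everything past the millisecond
console.log(a.getTime() === b.getTime()); // true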

I never would have expected to be in a situation where I needed or cared about more precision than 1/1000th of a second, but there you have it.

Postgres Create & Return Order

I recently wrote some tests that ended up being flaky. After some investigation, I realized that the objects I was getting back from my Postgres database weren’t always in the order that I was expecting them to be.

I was using a single update or insert statement to create several records at the same time and then querying that same table. My initial experiments had indicated that the objects were always being returned in the same order that I passed them into the insert/update statement, but as I grew the number of tests using that assumption and ran them dozens of times, it became apparent that this assumption is flawed.

It appears that Postgres generally stores and returns the rows in the order they were listed in the insert/update statement, but without an ORDER BY clause that order simply isn’t guaranteed, and every once in a while it comes back different.

I haven’t pinned down whether the breakdown is in the saving, the retrieving, or both. My money is on the saving side of things, but it doesn’t actually matter for my purposes. For now, I’ve been updating these tests to request the rows in an explicit order, ordering by something other than the created_at timestamp.
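
For example, something along these lines (the widgets table and batch_id column are hypothetical) takes the ordering question out of Postgres’s hands entirely:

import { Pool } from 'pg';

const pool = new Pool(); // connection settings omitted

// Ask for an explicit, deterministic order instead of relying on insertion
// order or on created_at (which may not be unique)
const getWidgetsForBatch = async (batchId: number) => {
  const { rows } = await pool.query(
    'SELECT * FROM widgets WHERE batch_id = $1 ORDER BY id ASC',
    [batchId],
  );
  return rows;
};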

A good learning for me there about not leaving anything to chance when it comes to a test. Testing…one of the few places where being more explicit is better.

The Hard Parts

I remember talking to a friend who worked at a company that did contract programming for people. The gist of the conversation was that they got a significant amount of work from people who had some kind of application that had been programmed by someone’s sibling/cousin/best friend. The app worked, and was starting to bring in money, but as the business continued to grow, the app wasn’t scaling the way that it needed to.

The takeaway was that companies were engaging his firm because the writer of the original app had hit the limit of what they could do, either because they had other commitments or because they had hit the limits of their expertise as a software engineer.

At the time, I was lucky if I could just get the right thing onto my screen when I was programming.

In the years since, that story has come to represent what I think of as the ‘hard’ parts of programming.

The initial roll-out of a project is hard. You’re making dozens, hundreds even, of decisions about how to structure things. If you have deep subject matter expertise in the area that you’re trying to automate, then your decisions are more likely to be correct than if you don’t, but even then, some of your assumptions may turn out to be drastically wrong once the app is in the hands of the users.

Those wrong assumptions can then result in needing to put in a lot of work to fundamentally re-structure the application.

Additionally, when first creating any kind of software application, there are a whole host of things that you need to get working together which aren’t a factor for someone coming along later in the process. It’s one thing to access the database via code in an app that’s been running for a year or two using methods that work hundreds of times over in other spots in the code base. It’s a whole different kettle of fish setting that database up and writing those methods which will allow those who come along later to access the database.

So, setting things up is hard. You can get around that to some extent by using open source tools that package different components together, but I still have mad respect for those people who get the initial version of a project up and running.

The other thing that I’m realizing is really, really hard is extending an existing project in ways that minimize the complexity, so that you don’t cut your own throat by making future changes harder and harder to bring about.

Props to all of the engineers out there who are doing that or who are removing unnecessary complexity from an existing system.

I suspect it’s not the end-all-be-all, but in this area, I got a lot out of Code Simplicity by Max Kanat-Alexander (affiliate link).

Javascript Milliseconds vs Postgres Microseconds

I recently ran into something that I never would have even considered thinking about before now.

I was trying to query Postgres looking for all records with a specific result in a field that was a timestamp.

I got zero results, which was a real head scratcher given that I could see the records in the database.

As it turns out, by default Postgres saves timestamps that track 6 digits to the right of the decimal (microseconds). On the other hand, when you convert a JavaScript Date object to its string equivalent, you only get 3 digits after the decimal (milliseconds).

So, the query string I passed in was something like YYYY-MM-DD HH:mm:ss.333. Postgres would then look at that and say ‘I have YYYY-MM-DD HH:mm:ss.333125, but that is slightly after what you’re asking for, so I have no results to return to you’.

You can override the default precision of a timestamp in Postgres, limiting it to 3 digits past the decimal, by defining the column as ‘timestamp(3)’ when you create the table.
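
For example, a (hypothetical) table created along these lines will only ever store millisecond precision, so its timestamps survive the round trip through a JavaScript Date without losing anything:

import { Pool } from 'pg';

const pool = new Pool(); // connection settings omitted

// timestamp(3) keeps 3 fractional digits (milliseconds) instead of the
// default 6 (microseconds)
await pool.query(`
  CREATE TABLE events (
    id serial PRIMARY KEY,
    created_at timestamp(3) NOT NULL DEFAULT now()
  )
`);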

On Linting

I still remember the first time I heard about the concept of linting. My initial response was unfavorable to the concept.

That was partially because the person speaking up in favor of linting didn’t have the best reputation at the organization where I worked, partially because my mentor at the time seemed to be anti-linting, and partially because the concept of something telling me that I was doing a ‘bad’ job writing code sounded about as pleasant as going in for unnecessary dental work.

Fast forward several years, and now I’ve been working for a few months in a codebase that has linting enabled.

My takeaway now? Why wouldn’t you want a tool that automatically highlights the stuff you forgot to address while you were working?

Sure, in any big organization there are likely to be some linting rules that you think take things too far, but the value of having something flag problems in your code before they ever have a chance to break anything is incredible.

I guess, for me, it’s a lot like TypeScript. There’s a reason that big organizations tend to use linting. It’s not because they are dinosaurs who can’t change and who like to move slowly. It’s because they’ve realized that it’s a great way to increase developer efficiency by reducing the number of bugs that get out the door.

SQL Injection & Parameterization

After my last post I realized that not everyone knows what SQL injection is. In short, it involves providing SQL commands via user-input fields which results in the database running those commands as part of what is supposed to just be an insert or update. This injection of SQL into fields that aren’t expecting SQL can cause serious problems in your database (like dropping entire tables).

Here’s a brief overview: https://www.w3schools.com/sql/sql_injection.asp

Postgres (and the PG npm package) will allow you to prevent SQL injection through a process called parameterization (https://www.postgresql.org/docs/9.2/sql-prepare.html). In short, with a parameterized query you are telling your database which pieces of the query are meant to be plain values (which won’t run SQL commands regardless of what the user might try to inject), and which pieces are meant to include actual database-changing commands.
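
As a quick (hypothetical) illustration with the PG package (client here is a connected client, set up the same way as in the examples below), the classic ‘drop the table’ input does no damage when it arrives as a parameter, because it is treated as a plain string value rather than as SQL:

// A classic injection attempt typed into a first-name field
const maliciousInput = "Robert'; DROP TABLE users; --";

// Passed via $1, this is stored as an ordinary string; the DROP TABLE never runs
await client.query(
  'INSERT INTO users (first_name) VALUES ($1)',
  [maliciousInput],
);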

PG Parameterized Database Update Command

In my last post, I talked about using the update command and offered up this simple example:

UPDATE users
SET instances = instances - 1
WHERE id = 102
RETURNING *

Switching this to TypeScript code that would work with PG (and having it update the first_name rather than the instances) would look like this:

import { Pool } from 'pg';

const pool = new Pool({
  host: 'localhost',
  user: 'database-user',
  max: 20,
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

const updateFirstName = async (
  firstName: string,
  id: number,
): Promise<void> => {
  const client = await pool.connect(); // create a connection

  try {
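    // NOTE: interpolating firstName and id directly into the SQL string below
    // is exactly what leaves this query open to SQL injection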

    const query = `
      UPDATE users
      SET first_name = '${firstName}'
      WHERE id = ${id}
      RETURNING *
    `;

    await client.query(query); // use the connection
  } catch (error) {
    // Deal with your catch error
  } finally {
    client.release(); // release the connection
  }
};

The problem with this, obviously, is that there isn’t any parameterization, which means that it’s vulnerable to SQL injection attacks.

I looked around trying to find an example of parameterization with the PG npm package as it related to an UPDATE query/command. I found plenty of examples for the INSERT INTO query/command, but nothing with the UPDATE.

So, I did some experimenting and determined that this works:

import { Pool } from 'pg';

const pool = new Pool({ // Customize all of this...
  host: 'localhost',
  user: 'database-user',
  max: 20,
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

const updateName = async (
  firstName: string,
  lastName: string,
  id: number,
): Promise<void> => {
  const client = await pool.connect(); // create a connection

  try {
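    // id is interpolated directly into the SQL string below (rather than being
    // parameterized), so guard against anything that isn't a plain number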

    if (typeof id !== 'number') {
      throw new Error('non-number id passed into updateName');
    }
    const query = `
      UPDATE users
      SET first_name = $1, last_name = $2
      WHERE id = ${id}
      RETURNING *
    `;

    const values = [firstName, lastName];

    await client.query(query, values); // use the connection
  } catch (error) {
    // Deal with your catch error
  } finally {
    client.release(); // release the connection
  }
};
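
Calling it then looks something like this (the values are obviously made up):

await updateName('Jane', 'Doe', 102);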

You’ll obviously have to customize that some to match your use case, but that should give you the correct approach.