Database Denormalization

I’ve been working on a side project for quite a while now. It’s been a great learning experience, and I hope to eventually have something that I can monetize.

One of the initial decisions resolved around what to use for my database solution.

I ended up in a normalized, SQL database solution for various reasons, but I flirted quite heavily with the idea of a No-SQL database solution.

As I was building out my SQL tables, though I came to the conclusion that a fully normalized approach was the wrong answer.

I have certain tables which will see record creates take place, and some edits to those records across certain fields, but there are some fields that will never be edited.

Under a very common use-case, I need data from what would be two different tables under a fully-normalized approach. That would mean that every time a request was made to that endpoint, the database would have to make a join on those two tables.

I’m still getting my feet under me as it relates to the correct way to scale a system, but my sense is that your database generally ends up being the constraint that is the hardest to scale, so I’ve tried to build with an aim towards minimizing database load whenever possible.

A de-normalized scheme generally seems to trade additional space consumption for reduced load as a result of fewer joins, so that’s exactly what I ended up doing.

I took a few fields from one table and am also storing them on the table that my endpoint is primarily reading from. The fields in question aren’t being updated after the initial write, so I’m not introducing load to keep the two tables in sync, and the elimination of those joins should drastically push back the point where I have to worry about database load being a problem.

Leave a Reply