
On Software Performance

By Radosław Miernik


Intro

If you have ever talked with your manager or a client about performance, you’ve most probably heard the dreaded “Is it fast?” question. There’s also the imperative variant, “It has to be fast.”, although far less common. In both cases, I usually follow up with a serious face, forcing the other person to stay with me and actually define what it means for it to be fast.

As you slowly get angry (because nobody knows the code better than you), you start to notice that the client is not really talking about the performance itself, but rather the feeling of it – the perceived performance. That opens a whole new world of quirks of human perception. And hacks, of course.

There are also other cases when something could be improved, but there are no plans or resources for that, as it’s already fast enough. How can one strongly suggest actually putting work into it? Well, you have to know how to sell it.

Words, words, words

Performance itself has no clear definition. Not because it’s hard to give one, but because it really means several things. First, we have to know the difference between latency and throughput. A good enough simplification would be that the former says how quickly we’ll see a single result; the latter, how many results we’ll see within a given time frame.
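To make the difference concrete, here’s a minimal sketch (the `fetchResult` operation is made up): running several calls concurrently multiplies the throughput, while the latency of each call stays exactly the same.

```js
// Latency vs throughput, assuming a hypothetical 100ms async operation.
const fetchResult = () => new Promise((resolve) => setTimeout(resolve, 100));

async function measure() {
  // Latency: how long a single call takes.
  const start = Date.now();
  await fetchResult();
  console.log(`Latency: ${Date.now() - start}ms`); // ~100ms.

  // Throughput: how many calls complete within one second. With ten
  // concurrent "workers", it's ~100/s, ten times more than with one.
  let completed = 0;
  const deadline = Date.now() + 1000;
  await Promise.all(Array.from({ length: 10 }, async () => {
    while (Date.now() < deadline) {
      await fetchResult();
      completed += 1;
    }
  }));
  console.log(`Throughput: ${completed} results/second`);
}

measure();
```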

But there are other words: effectiveness, efficiency, speed. While all of them are used in the context of software, all three originate in physics. And yes, there are cases where you actually have to care about the amount of electricity used; most programmers don’t. We do, however, care about the resources needed. About the capacity of the machine and the environment running the software.

Remind yourself that the software has to be created as well. And while we can define the “programmer’s performance”, I’d strongly advise not to. It leads to a situation where people focus only on these numbers. Yes, micromanagement. It’s not necessarily bad – in my honest opinion, a decent level of transparency and observability is a must-have for any kind of performance. Even mine.

Performance for the masses

People do care whether the software they use is fast; that’s a fact1. But do they care about exact numbers? Not directly, of course, as it doesn’t matter whether the page took 990 or 1010 milliseconds to load. But if it’s one second versus two, everyone will notice it. Including you, your client, and the users.

What if we ignore the individuals and care only about the average times? In the end, someone has to be in the long tail, and maybe we’re giving them a little too much attention (and resources)? But which metric should you care about? The average? 90th percentile? 95th? 99th? It depends on the case, but let’s say the 90th is a good start. If your monitoring tool doesn’t provide percentiles, just stick to the average – it’s far better than random sampling.
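To see why the average alone can be misleading, here’s a minimal sketch with made-up samples:

```js
// Computing latency metrics from raw samples. `samples` would come from
// your monitoring tool; here, the numbers are made up.
const samples = [120, 95, 110, 980, 105, 130, 1500, 100, 115, 90];

const percentile = (values, p) => {
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, index)];
};

const average = samples.reduce((a, b) => a + b, 0) / samples.length;
console.log(`avg=${average.toFixed(0)}ms`);          // 335ms.
console.log(`p90=${percentile(samples, 90)}ms`);     // 980ms.
console.log(`p99=${percentile(samples, 99)}ms`);     // 1500ms.
// The average hides the fact that some users waited a full 1.5 seconds.
```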

If you haven’t set up any APM (application performance monitoring) yet, I advise doing that. In the beginning, you’ll spend hours fiddling with all kinds of charts and metrics. (Or maybe it’s just me.) At the end of the day, you’ll end up with a new dashboard and data to work with. If there’s no go-to APM for your technology, consider logging. It won’t provide as much insight but will be enough for API metrics.
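Such logging can be as simple as a timing middleware. Below is a minimal sketch assuming an Express app; the framework is just an example, and the `/health` route is made up:

```js
// Log one line per request; aggregate them later (e.g., per route).
const express = require('express');
const app = express();

app.use((req, res, next) => {
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`${req.method} ${req.path} ${res.statusCode} ${ms.toFixed(1)}ms`);
  });
  next();
});

app.get('/health', (req, res) => res.send('ok'));
app.listen(3000);
```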

This leads us to a point where it’s more than obvious that we have to combine these two metrics. The more we optimize, the better, but we have to know the number of impacted users. And as people are bad at large numbers, please, use a calculator. Or at least this xkcd comic.
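For example (all numbers below are made up), shaving 100 milliseconds off a request that happens a million times a day adds up surprisingly fast:

```js
// Back-of-the-envelope math; every number here is made up.
const requestsPerDay = 1_000_000;
const savedMsPerRequest = 100;

const savedHoursPerDay = (requestsPerDay * savedMsPerRequest) / 1000 / 3600;
console.log(savedHoursPerDay.toFixed(1)); // ~27.8 hours of waiting saved, every day.
```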

Making the code faster (not really)

Can we make the code faster by making it slower? Of course not, but we can make it faster for some by making it slower for others. Such tradeoffs happen on both theoretical and practical levels, more often than you think.

An excellent example of a theoretical tradeoff is the ordering of loops. Let’s say that we have already analyzed our possibilities and now have to choose between O(n log m) and O(m log n). Based on our domain knowledge and a few quick data checks (e.g., knowing that m is usually much larger than n), we can be pretty confident that one is strictly better.
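As an illustration, consider intersecting two sorted arrays of lengths n and m (the function names below are mine, not from any library). Iterating one array while binary searching the other gives exactly this choice:

```js
// Binary search in a sorted array: O(log k) per lookup.
const binarySearchIncludes = (sorted, value) => {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] === value) return true;
    if (sorted[mid] < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
};

// O(n log m): iterate the n-element array, search in the m-element one.
// If m is much larger than n, this variant wins.
const intersectNLogM = (ns, ms) => ns.filter((x) => binarySearchIncludes(ms, x));

// O(m log n): iterate the m-element array, search in the n-element one.
// If n is much larger than m, this one does.
const intersectMLogN = (ns, ms) => ms.filter((x) => binarySearchIncludes(ns, x));
```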

As a practical example, I’ll give you a great resource optimization tip. Sorry, no code this time. The example, however, is quite universal. Let’s assume that there’s a typical (in terms of user actions), CPU-heavy operation that normally takes between 10 and 50 milliseconds to complete (e.g., a database operation). Now, what will happen if we make it always take no less than a hundred?

We made it slower. Like, a lot slower. But does it really matter? In most cases, it does not, not at all. What changed is that within this time, other users were able to interact with the system. And the users who waited a little longer? They’ll be fine. At least as long as it’s less than 200ms.
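One way to implement such a minimum duration is to pad the operation with a timer, so it never resolves earlier – a minimal sketch, assuming the operation is asynchronous (`expensiveDatabaseOperation` is a made-up stand-in):

```js
// Make `operation` take at least `minMs`, leaving the rest of the time
// slot for other users' requests.
const atLeast = (minMs, operation) => async (...args) => {
  const [result] = await Promise.all([
    operation(...args),
    new Promise((resolve) => setTimeout(resolve, minMs)),
  ]);
  return result;
};

// A made-up stand-in for the actual 10-50ms operation.
const expensiveDatabaseOperation = () =>
  new Promise((resolve) => setTimeout(() => resolve('result'), 10 + Math.random() * 40));

const slowedDown = atLeast(100, expensiveDatabaseOperation);
slowedDown().then(console.log); // Resolves after ~100ms, not 10-50ms.
```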

A similar “unclogging” approach fits nicely into the JavaScript world. Whenever you perform a long blocking operation (i.e., a CPU-heavy function), check whether sprinkling a setTimeout/setImmediate/await here and there helps. It won’t make the code run faster but will make the overall performance degrade slower with more concurrent executions2.
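A minimal sketch of the idea in Node.js (`processOne` is a stand-in for the actual work):

```js
// Process the items in chunks, yielding to the event loop in between.
// It won't finish sooner, but concurrent requests stay responsive.
const processOne = (item) => item ** 2; // A made-up stand-in.

async function processAll(items, chunkSize = 1000) {
  const results = [];
  for (let index = 0; index < items.length; index += chunkSize) {
    for (const item of items.slice(index, index + chunkSize)) {
      results.push(processOne(item));
    }

    // Let other callbacks (e.g., incoming requests) run.
    await new Promise((resolve) => setImmediate(resolve));
  }

  return results;
}
```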

Making the code faster (now really)

Where does the “I made the code faster” kind of performance lie? It often means that the code finishes sooner. It’s not necessarily because you’ve made it use less CPU; maybe you made it utilize multiple cores. Or maybe you reduced the memory usage, and the CPU usage dropped as a result. (The last one is even more important in the world of garbage-collected languages.)
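For example, here’s a minimal multi-core sketch using the Node.js worker_threads module (`heavyComputation` is made up):

```js
// Finish sooner by splitting the work across two cores.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

const heavyComputation = (numbers) => numbers.reduce((a, b) => a + b ** 2, 0);

if (isMainThread) {
  const input = Array.from({ length: 1e6 }, (_, i) => i);
  const half = input.length / 2;
  const spawn = (data) =>
    new Promise((resolve, reject) => {
      // This file spawns itself; the worker takes the `else` branch below.
      const worker = new Worker(__filename, { workerData: data });
      worker.on('message', resolve);
      worker.on('error', reject);
    });

  Promise.all([spawn(input.slice(0, half)), spawn(input.slice(half))])
    .then(([a, b]) => console.log(a + b));
} else {
  parentPort.postMessage(heavyComputation(workerData));
}
```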

Benchmarking is an art in itself. You have to understand not only your code and the entire environment (e.g., browser) but also the language characteristics, standard library, or even the CPU architecture. To make the most out of every run, use a full-blown library, like the excellent criterion.rs or Benchmark.js.
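A minimal Benchmark.js suite looks like this (the compared snippets are just placeholders):

```js
// Compare two ways of building the same string.
const Benchmark = require('benchmark');

new Benchmark.Suite()
  .add('Array#join', () => [1, 2, 3].join('-'))
  .add('Template literal', () => `${1}-${2}-${3}`)
  .on('cycle', (event) => console.log(String(event.target)))
  .on('complete', function () {
    console.log(`Fastest is ${this.filter('fastest').map('name')}`);
  })
  .run();
```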

If you already know how to benchmark the code, it’d be good to know where to focus. To find such a hot spot, both APM and logging will come in handy. But the ultimate weapon is the profiler. People are not fond of it, but my wild guess would be that it’s the same as with the debugger: it’s easier to “mark” the code here and there with some logs than to actually debug the code.

I profiled several technologies in my career, and I have to say that JavaScript is the easiest. Yes, you have to remember that (probably) there’s a JIT compiler involved, but the simplicity of tools is just crazy. In Python, using the profile module is enough. In Lua, you have to choose one of many solutions and often alter it in some way. In C/C++, the tools are there, but it’s hard to understand the results (ah, perf and valgrind). Rust is slightly better on this front but still requires more work than JS.

Now, if the code finishes sooner, it’ll usually mean that more operations could be handled using the same resources. There are some hard limits, driven by the maximum number of open connections, the network bandwidth, the amount of RAM, etc. The framework or even the programming language itself can be a barrier too; however, these can be replaced as well3.

Making the making faster

How much time do you put into making the development faster? It’s not only about coding – once the code is written, you have to test whether it works, then create a pull request, have someone else review it, and then deploy it… There are also less common tasks, like building a release-ready version, adding translations, or updating the documentation and dependencies.

All of these can add up to a couple of hours each week. In extreme cases, it may be half of the entire time needed to actually deliver anything. Now multiply it by the number of developers. We are talking hundreds or even thousands of dollars each week! Really, go and check it in your project.
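A quick back-of-the-envelope calculation (again, with made-up numbers) makes the scale obvious:

```js
// Every number here is made up; plug in your own.
const developers = 5;
const manualHoursPerWeek = 4; // Reviews, deploys, releases, docs...
const hourlyRate = 50;        // USD.

console.log(developers * manualHoursPerWeek * hourlyRate); // $1000 every week.
```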

Hopefully, some of you may have noticed that one or two of these tasks are already automated in their projects. The code review is faster with CI in place, the API docs can be generated automatically4, and deployment or building a release version can be a single-click action, or even happen automatically on merge.

All these automations take time to set up, that’s true. But it’s an investment that really pays off, even in the short run. And it’s often the case that the client is willing to pay a couple of dollars each month to have more productive hours.

Closing thoughts

Many factors make performance good or bad. Nobody can tell whether it’s good enough just by looking at the code – you have to have the data. Luckily, people like data (and charts). They can also hate it, especially when it involves them and their work.

Be prepared for the next discussion about the budget for performance. People really care and are okay with paying a little extra to save a couple of hours each month. They care even more if you shave off some costs.

Performance pays off. Performance matters. Performance sells.

1

There are some great write-ups on how performance led to some insane gains for mobile apps and websites. I really like the ones about Furniture Village and Pinterest, as both are very informative and can actually help you. The rest are either hard to reproduce (AutoAnything.com) or too specific (Netflix).

2

Data exports often involve a lot of serial computation. In that case, you could split the work into chunks, waiting a couple of milliseconds between the batches. It will increase the export time by a couple percent but make the system stay responsive at all times. There’s a good Snyk.io blog post on that.

3

Most frameworks are fine, really. Do focus on delivering the value first – the scaling comes later. And if you are at the point where you do need to scale, you most probably already have the means to do so, even if it requires a switch.

4

People noticed a long time ago that maintaining useful API documentation takes time, so there are plenty of options. On one end, there are language-agnostic tools like Swagger. On the other, some languages are one step ahead and have an official way of creating documentation, like rustdoc for Rust, ex_doc for Elixir, or inline documentation in GraphQL.