I often hear that the hardest problems in writing software are cache invalidation, naming things, and off-by-one errors.
However, I think there’s something else that’s not only harder than all of those but also pervades every aspect of writing software, and yet is often forgotten: the problem of dealing effectively with abstraction.
Abstractions are ubiquitous in how we deal with the world. They let us take complicated ideas and summarise them by choosing which details to reveal and, crucially, which to hide. This blog post looks at several areas of writing software, pointing out some abstractions that occur and how tricky they are to get right.
In a Google AdWords ad you’ve got about 95 characters of text to play with. That’s not very much. In fact, I’ve gone over that limit already in this paragraph alone. You need a really good abstraction: text that’s both really short and interesting enough to get people to click it.
User experience design
When a new user starts using your software for the first time, chances are that they won’t know how to use it. But to make software ingeniously simple, the fact that they’ve never seen it before must not matter. And they’re not going to read the manual — they need to instantly ‘get it’.
Essentially there needs to be a good abstraction. You need to give the user a mental model, and the challenge is to give them one that’s good enough to use the product, and to give it to them quickly, before they get bored and give up. But, more than that, the challenge is to constantly adjust the abstraction you give over time.
Beginners will want to feel like they’re getting value out of the tool quickly and easily without being overwhelmed, whereas experts will want something more sophisticated. They already know what your core concepts are and how to use the product to help them do their day job, but that keyboard shortcut that’ll save them 10 seconds each time they do that operation? That’s what they love.
I recently encountered commits that failed code review because of poor abstraction, so I’ll use that as an example. The commits tried to add a new feature that involved getting a string from another library, parsing it, and then doing something with the parsed result. See what I did there? I gave you an abstraction so that I didn’t bore you with the details of the new feature :) These commits are a perfect example of abstraction going wrong.
First up, the problem with the commits is that they didn’t handle several things that the library could throw at us. When writing a parser, you need to know precisely all the forms that its input can take. If you don’t go and peel away every abstraction and do that literature review, but instead rely on only a few examples, you literally don’t know what could happen, because you don’t know what you don’t know. So you shouldn’t be surprised that there are several bugs in your code.
Second up, the library suffers from primitive obsession and is broken. An object would be a vastly better abstraction than a string because an object has properties and methods, so that we, and lots of other consumers of that library, wouldn’t need to write a parser to pull apart the string. Overall, there’d be less repetition of code if the library just did what lots of its consumers wanted.
So, choosing the abstraction is hard. If you get it right, you’ll make it faster overall for consumers of your library to write code, and you’ll reduce their bug rate. If you get it wrong, people will find your library annoying to work with — they’ll spend their time writing boilerplate code, and fixing bugs in code they should never have had to write in the first place.
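To make the string-versus-object point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the version-string format, the `Version` class, and the function names are made up for illustration and are not the library or feature from the code review. It shows a consumer-side parser written from "a few examples" that breaks on a form it never saw, next to an object-returning API that needs no parser at all.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical "stringly typed" API: every consumer has to parse this itself.
def get_version_string() -> str:
    return "2.10.3-beta"


# Hypothetical object-based alternative: the library does the work once.
@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    patch: int
    label: Optional[str] = None

    def is_prerelease(self) -> bool:
        return self.label is not None


def get_version() -> Version:
    return Version(major=2, minor=10, patch=3, label="beta")


# A consumer's parser, written after seeing only examples like "1.2.3".
# It has no idea that a "-beta" suffix can appear.
def naive_parse(text: str) -> Version:
    major, minor, patch = text.split(".")
    return Version(int(major), int(minor), int(patch))


if __name__ == "__main__":
    print(get_version().is_prerelease())
    try:
        naive_parse(get_version_string())
    except ValueError as error:
        print("naive parser failed:", error)
```

With the object, the pre-release question is a method call. With the string, each consumer writes (and debugs) its own parser.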
But abstraction is about more than just the behaviour of an API. It’s also about its names: an article on the importance of naming things concludes that the point of a name is “telling the user exactly what they need to know and no more”. In other words, you first need to decide what abstraction you want — what does the user need to know — and only then can you come up with a name that does precisely that.
Naming things goes beyond just an API though; for example, consider the terms “false positive” and “false negative”. They are catch-all terms that cover all the distinct ways in which a system might fail. They are a way of talking about failures without going into the specific details. They are an abstraction, and I don’t think they’re a very good one.
Firstly, I can never actually remember which way round positive and negative are. For example, if a user buys something on the internet and the purchase gets wrongly rejected, is that a false positive or a false negative? Choosing different terms could make things easier to understand.
Secondly, we’ve just annoyed one of our users. Whenever we’re discussing possible features or bug fixes to the internet purchasing system, I want to keep the user at the forefront of our mind, and one way to help do that is to choose an evocative word. That’s why I prefer the terms “insult rate” and “fraud rate”, instead of “false positive” and “false negative”.
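As a rough sketch of how those two terms could be measured, the snippet below assumes that the insult rate is the share of legitimate purchases that were wrongly rejected, and that the fraud rate is the share of fraudulent purchases that were wrongly accepted. Those definitions, and the `Purchase` record, are my assumptions for illustration rather than anything standard.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Purchase:
    # Hypothetical record: what the purchase really was, and what we decided.
    is_fraudulent: bool
    was_accepted: bool


def insult_rate(purchases: Sequence[Purchase]) -> float:
    """Share of legitimate purchases that were wrongly rejected."""
    legitimate = [p for p in purchases if not p.is_fraudulent]
    if not legitimate:
        return 0.0
    return sum(not p.was_accepted for p in legitimate) / len(legitimate)


def fraud_rate(purchases: Sequence[Purchase]) -> float:
    """Share of fraudulent purchases that were wrongly accepted."""
    fraudulent = [p for p in purchases if p.is_fraudulent]
    if not fraudulent:
        return 0.0
    return sum(p.was_accepted for p in fraudulent) / len(fraudulent)


if __name__ == "__main__":
    history = [
        Purchase(is_fraudulent=False, was_accepted=True),
        Purchase(is_fraudulent=False, was_accepted=True),
        Purchase(is_fraudulent=False, was_accepted=True),
        Purchase(is_fraudulent=False, was_accepted=False),  # an insulted user
        Purchase(is_fraudulent=True, was_accepted=False),
        Purchase(is_fraudulent=True, was_accepted=True),    # fraud got through
    ]
    print(insult_rate(history), fraud_rate(history))
```

The names in the code tell you who got hurt by each kind of mistake, which is the whole point of preferring them.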
When writing any test you need to trade off the insult rate against the fraud rate. If you’re not careful, unit tests become too tightly coupled to the product code under test. Any small change to the application causes tonnes of tests to fail erroneously (a high insult rate). The tests are “brittle”, and a failure to handle abstraction is often to blame.
When writing a unit test you need to decide which behaviour the asserts should actually test and which behaviour is merely an implementation detail. In other words, you need to choose which details of the method should be hidden from the test, and then not write asserts for them.
This makes the test less coupled to the product, less brittle, and less likely to break when the product code gets changed slightly.
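Here is a small Python sketch of that distinction. The function and both of its implementations are hypothetical, and I am assuming a contract that says "return each tag once" without promising any particular order. The brittle check asserts on the order anyway, so it fails as soon as the implementation changes, even though the new implementation is still correct.

```python
from typing import Callable, List

TagFunction = Callable[[List[str]], List[str]]


def unique_tags_v1(tags: List[str]) -> List[str]:
    # Original implementation: happens to keep first-seen order.
    return list(dict.fromkeys(tags))


def unique_tags_v2(tags: List[str]) -> List[str]:
    # Later refactoring: still returns each tag once, but sorted.
    return sorted(set(tags))


def brittle_check(unique_tags: TagFunction) -> bool:
    # Asserts on the order, which is an implementation detail.
    return unique_tags(["b", "a", "b"]) == ["b", "a"]


def robust_check(unique_tags: TagFunction) -> bool:
    # Asserts only on the contract: each tag appears exactly once.
    result = unique_tags(["b", "a", "b"])
    return len(result) == 2 and set(result) == {"a", "b"}


if __name__ == "__main__":
    for implementation in (unique_tags_v1, unique_tags_v2):
        print(
            implementation.__name__,
            "brittle:", brittle_check(implementation),
            "robust:", robust_check(implementation),
        )
```

The brittle check "insults" the second implementation, while the robust check accepts both and would still catch a version that returned duplicates or dropped a tag.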
We’ve seen how abstraction (i.e. choosing which details to reveal and, crucially, which to hide) cuts across many different areas of writing software: marketing, user experience design, writing code, designing the behaviour of an API, naming stuff, and testing. It gets everywhere.
And not only is it really hard, but often it’s subtle to spot.
And that’s why I think it’s a harder and more important problem than cache invalidation, naming things, and off-by-one errors.
And mastering it is crucial in upping your game as a member of a software team.
Next time, we’ll look at HTTPS and how the abstraction that it provides is not only the wrong one, but also dangerous. By default, HTTPS is not as secure as it can be, but by peeling back the abstraction, and looking within, we get access to a whole host of options that we can tweak to make our website safer.