
Why Publish().RefCount() doesn’t work well with Retry()

When you’re designing a library with a large number of methods, one of the hard problems is making sure that all of the methods work together in a consistent way. Get it right, and your library is a joy to use. Get it wrong, and your users will spend a frustrating number of hours debugging what’s going on…

And that experience of spending a frustrating number of hours debugging what’s going on is exactly what happened to me recently when I was using Reactive Extensions.

It all started with a performance problem.

We quickly identified what was causing the performance problem, and so we added a performance optimisation — Publish().RefCount() — in order to fix it. And that’s when our code broke.

We were lucky — our tests caught the fact that the performance optimisation had broken our code — and so we set about trying to get the best of both worlds: working code that was also fast.

This blog post goes into the detail of precisely what happened and how we fixed it.

A failing test

Let’s look at a test that demonstrates this. The test below creates an observable, uses Retry() to add error handling, and then asserts that the observable produces 0 followed by 1.

[Test]
public void Test([Values(false, true)] bool withPublishAndRefCount)
{
    IObservable<int> original = CreateObservable(withPublishAndRefCount);

    var withRetry = original.Retry(2);

    Assert.That(withRetry.ToList().Wait(), Is.EqualTo(new[] { 0, 1 }));
}

The test is executed twice. The difference between the two executions is the value of the parameter withPublishAndRefCount.

Creating the observable

The method below shows how that parameter is used. When it’s set to false, Publish().RefCount() isn’t used, and the test passes. However, when it’s set to true, CreateObservable wraps the observable using Publish().RefCount(), and, just as it did for us, this causes the test to fail.

private IObservable<int> CreateObservable(bool withPublishAndRefCount)
{
    var observable = Observable.Defer(ObservableFactory);
    if (withPublishAndRefCount)
    {
        return observable.Publish().RefCount();
    }
    else
    {
        return observable;
    }
}

Note that the return type of CreateObservable is IObservable<int>. At this point, it’s probably worth mentioning the Liskov Substitution Principle, which essentially says that swapping one implementation of IObservable<int> for another shouldn’t cause the test to fail. But that’s precisely what’s happening: when we use Publish().RefCount() we get a different implementation of IObservable<int>, and the test fails.

The observable factory

The method below shows the observable factory. The first subscription results in OnNext(0) followed by OnError(e). When Retry() sees this error, it hides the error from everything downstream and re-subscribes. The second subscription results in OnNext(1) followed by OnCompleted().

private int nextSubscription = 0;
private IObservable<int> ObservableFactory()
{
    var thisSubscription = nextSubscription;
    nextSubscription++;
    switch (thisSubscription)
    {
        case 0:
            var e = new ApplicationException();
            return Observable.Return(thisSubscription)
                             .Concat(Observable.Throw<int>(e));
        default:
            return Observable.Return(thisSubscription);
    }
}

The root cause

So why does wrapping the observable using Publish().RefCount() cause the test to fail?

The test fails because Retry() isn’t able to recover from the error. When it sees the error and re-subscribes, it sees precisely the same error again, and again, and again… If we hadn’t specified a maximum number of retries, Retry() would spin forever (well, until the test timed out)! Essentially, the error has been cached by Publish().RefCount(), so subscriptions are no longer independent.

So what’s caching this error?

Well, Publish().RefCount() is shorthand for Multicast(new Subject<T>()).RefCount().

And it is Subject<T> that caches the error.
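To see that caching in isolation, here’s a minimal sketch (separate from the test above) showing that a plain Subject<T>, once it has errored, immediately replays that error to any subscriber that turns up later:

// Requires: using System; using System.Reactive.Subjects;
var subject = new Subject<int>();
subject.OnNext(0);
subject.OnError(new ApplicationException());

// This subscriber arrives after the error. It never receives OnNext(0);
// the terminated subject immediately replays the cached OnError instead.
subject.Subscribe(
    value => Console.WriteLine("OnNext({0})", value),
    error => Console.WriteLine("OnError({0})", error.GetType().Name));
// Output: OnError(ApplicationException)

Retry()’s re-subscription is exactly that kind of late subscriber, which is why it keeps seeing the same error no matter how many times it retries.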

The fix

Now that we know precisely what’s happening, we can work around it.

What we need to do is write a replacement for Subject<T> that keeps subscriptions independent, and then simply replace every instance of Publish().RefCount() with Multicast(new IndependentSubscriptionsSubject<T>()).RefCount().

And here’s the code for IndependentSubscriptionsSubject<T>:

public class IndependentSubscriptionsSubject<T> : ISubject<T>
{
    private ISubject<T> _innerSubject = new Subject<T>();

    public IDisposable Subscribe(IObserver<T> observer)
    {
        return _innerSubject.Subscribe(observer);
    }

    public void OnNext(T value)
    {
        _innerSubject.OnNext(value);
    }

    public void OnCompleted()
    {
        _innerSubject.OnCompleted();
    }

    public void OnError(Exception error)
    {
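        // Swap in a fresh inner subject *before* delivering the error, so that
        // anything that subscribes after the error (such as a Retry() re-subscription)
        // attaches to a subject that hasn't been terminated.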
        var erroringInnerSubject = _innerSubject;
        _innerSubject = new Subject<T>();
        erroringInnerSubject.OnError(error);
    }
}
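With that in place, the only change needed in CreateObservable is to swap Publish() for Multicast() with the new subject. Here’s a sketch, based on the version of CreateObservable shown earlier:

private IObservable<int> CreateObservable(bool withPublishAndRefCount)
{
    var observable = Observable.Defer(ObservableFactory);
    if (withPublishAndRefCount)
    {
        // Same multicasting and ref-counting as Publish().RefCount(), but
        // errors are no longer cached across subscriptions.
        return observable.Multicast(new IndependentSubscriptionsSubject<int>()).RefCount();
    }
    else
    {
        return observable;
    }
}

With this change, both variants of the test should now pass.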

Conclusion

We’ve seen how Publish().RefCount() caches errors, which breaks Retry(), and we’ve also seen how to work around it by writing a replacement for Subject<T> that doesn’t cache errors.

Buying a house is like flying a rocket into space… but in reverse

Flying a rocket into space takes a lot of fuel, a huge amount in fact. The reason for this is that every kilogram on board needs to be lifted up against the Earth’s gravity, and that takes energy. Rockets and their payloads tend to be quite heavy, and the Earth’s gravity is quite strong; these two factors combine to mean that it takes a lot of energy, a huge amount in fact.

When the rocket is part of the way up, some of the fuel will have been burnt, and so one of the many fuel tanks will be empty. Given that the rocket has no further use for an empty fuel tank, and that it is literally weighing the rocket down, in a multistage rocket it will be jettisoned.

Thinking about it another way, the rocket’s objective is to put the payload into space, but that takes fuel, and you can’t carry fuel without fuel tanks; so we need fuel tanks, but they add weight, which means more fuel. The fuel tanks are essentially a necessary evil, but we can mitigate that by jettisoning them as soon as they become empty.

If you haven’t followed either of those explanations, there’s more detail of the maths behind this here.

But what does all of this have to do with buying a house? Well, the property ladder is just like a multistage rocket but in reverse!

Mortgage costs

Let’s look at the marginal cost of adding an extra £ 1 to a 25-year mortgage with 3% APR compound interest.

Because we’re analysing the marginal cost, we assume that the mortgage is paid off in precisely the same way that it would have been, and so 25 years later everything has been paid off apart from the extra £ 1 that we added at the beginning. However, the interest on that extra £ 1 has been accumulating so that the debt is now:

£ 1 * 1.03 ^ 25 = £ 2.09

However, we’re currently assuming that the interest rate will remain at 3% for 25 years, which is perhaps a little over-optimistic :)  Nobody knows precisely what’ll happen to it, but let’s pick some numbers out of thin air. Let’s assume that it’s 3% for 5 years, then 6% for 5 years, then 9% for 5 years, and then 3% again for 5 years, followed by 6% for 5 years:

£ 1 * 1.03 ^ 10 * 1.06 ^ 10 * 1.09 ^ 5 = £ 3.70

Now this seems high, but bear in mind that it’s an amount of money in 2039. That’s way in the future — at that point we’ll know how Y2038 compared to the Y2K bug… — and a lot of inflation will have occurred. Expressing it in today’s money, assuming an inflation rate of 2.5% per year:

£ 1 * 1.03 ^ 10 * 1.06 ^ 10 * 1.09 ^ 5 / 1.025 ^ 25 = £ 2.00
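If you’d rather check those numbers in code than on a calculator, here’s a small sketch that reproduces the three figures above (same rates, same time periods):

// Requires: using System;
// Marginal cost of an extra £1 of mortgage debt, under the assumptions above.
double flatRate      = Math.Pow(1.03, 25);                               // ≈ 2.09
double varyingRates  = Math.Pow(1.03, 10) * Math.Pow(1.06, 10)
                     * Math.Pow(1.09, 5);                                // ≈ 3.70
double inTodaysMoney = varyingRates / Math.Pow(1.025, 25);               // ≈ 2.00 (deflated by 2.5% inflation)

Console.WriteLine("{0:F2} {1:F2} {2:F2}", flatRate, varyingRates, inTodaysMoney);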

Property ladder

Consider two strategies:

Firstly, buying the house that we really want straight away. With this strategy, we accumulate interest on the whole amount from day 1.

Secondly, buying a cheaper house, and living there for 10 years, before upgrading to the house that we originally really wanted. With this strategy, for the first 10 years, we’re only accumulating interest on the lower debt of the cheaper house, then, later on, we take on the debt associated with the more expensive house, and at that point, start to accumulate interest on it too.

The second strategy is just like the multistage rocket in reverse. Just as we only carried the fuel tanks for the minimum amount of time that they were needed, we’ve used the same trick with mortgage debt, and only carried it for the minimum amount of time that it’s needed.

In the real world…

However, in the real world it’s not as simple as all that. To start with, house prices tend to go up, and we haven’t taken account of that; next up, moving from our starter home to the place we really wanted is not zero cost: there are estate agent fees, legal fees, stamp duty, etc…

With the first strategy, we pay interest to the bank on the whole amount.

If we assume that all house prices go up at 3.5% per year (regardless of their current value), then with the second strategy we still pay interest on the smaller amount to the bank; but for the difference between the two amounts, instead of paying interest on it, we effectively pay the previous owner, through the higher purchase price when we eventually come to buy the more expensive house.

Again, let’s think of what happens to our marginal £ 1 for the first 10 years where the strategies differ (again, expressed in today’s money):

First strategy: £ 1 * 1.03 ^ 5 * 1.06 ^ 5 / 1.025 ^ 10 = £ 1.21

Second strategy: £ 1 * 1.035 ^ 10 / 1.025 ^ 10 = £ 1.10

So far, the second strategy is winning, by 11p per marginal £ 1.

To make it more concrete, let’s assume that there are 80,000 of these marginal £ 1s, i.e. the difference between the two houses that we buy under the second strategy is £ 80,000. Therefore, so far, the second strategy is winning by £ 8,800.

However, there are several costs with the second strategy when we move house: estate agent fees, legal fees, stamp duty, etc… which might be around £ 10,000. This not only wipes out its lead, but firmly tips the balance in the favour of the first strategy.
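Putting the whole comparison into code, with the same assumed figures (£ 80,000 of marginal £ 1s, £ 10,000 of moving costs):

// Cost of each marginal £1 over the first 10 years, expressed in today's money.
double strategy1 = Math.Pow(1.03, 5) * Math.Pow(1.06, 5) / Math.Pow(1.025, 10); // ≈ 1.21
double strategy2 = Math.Pow(1.035, 10) / Math.Pow(1.025, 10);                    // ≈ 1.10

double difference  = 80000;                                 // the marginal £1s
double lead        = (strategy1 - strategy2) * difference;  // ≈ £8,800 in favour of the second strategy
double movingCosts = 10000;
double overall     = lead - movingCosts;                    // ≈ -£1,200: the second strategy now loses

Console.WriteLine("Lead before moving: £{0:F0}; after moving costs: £{1:F0}", lead, overall);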

Conclusion

I picked assumptions out of the air to write this blog post: that we lived in our first house for 10 years before moving, an inflation rate of 2.5% per year, the housing market increasing at 3.5% per year, interest rates rising from 3% to 6%, a difference of £ 80,000 between our first and second houses, and moving costs of £ 10,000.

If instead house prices rise not by 3.5% per year but by 3% per year, then instead of losing by £ 1,200 the second strategy actually wins by £ 3,000.

Separately, if interest rates end up not at 6% but at 7%, then instead of losing by £ 1,200 the second strategy actually wins by nearly £ 3,500.

Basically, you can prove anything with statistics :)  And, as always, to get the answer that you’re looking for, simply tweak the assumptions… Or ask Captain Hindsight!

At the end of the day, it’s not clear which strategy to choose, and in any case, the bank may well force the decision on you, by limiting the amount of money that they’re prepared to lend you, which will make the second strategy the only viable one.

Dealing effectively with abstraction

I often hear that the hardest problems in writing software are cache invalidation, naming things, and off-by-one errors.

However, I think that there’s something else, something that’s not only harder than all of those, but that pervades all aspects of writing software and is often forgotten: the problem of dealing effectively with abstraction.

Abstractions are ubiquitous in how we deal with the world. They let us take complicated ideas and summarise them by choosing which details to reveal and, crucially, which to hide. This blog post looks at several areas of writing software, pointing out some abstractions that occur and how tricky they are to get right.

Marketing

In a Google AdWords ad you’ve got about 95 characters of text to play with. That’s not very much. In fact, I’ve gone over that limit already in this paragraph alone. You need a really good abstraction: text that’s really short but still manages to get people interested enough to click it.

User experience design

When a new user starts using your software for the first time, chances are that they won’t know how to use it. But to make software ingeniously simple, the fact that they’ve never seen it before must not matter. And they’re not going to read the manual — they need to instantly ‘get it’.

Essentially there needs to be a good abstraction. You need to give the user a mental model, and the challenge is to give them one that’s good enough to use the product, and to give it to them quickly, before they get bored and give up. But, more than that, the challenge is to constantly adjust the abstraction you give over time.

Beginners will want to feel like they’re getting value out of the tool quickly and easily without being overwhelmed, whereas experts will want something more sophisticated. They already know what your core concepts are and how to use the product to help them do their day job, but that keyboard shortcut that’ll save them 10 seconds each time they do that operation? That’s what they love.

Writing code

I recently encountered commits that failed code review because of poor abstraction, so I’ll use them as an example. The commits tried to add a new feature that involved getting a string from another library, parsing it, and then doing something with the parsed result. See what I did there? I gave you an abstraction so that I didn’t bore you with the details of the new feature :) Anyway, these commits are a perfect example of abstraction going wrong.

First up, the problem with the commits is that they didn’t handle several of the things that the library could throw at us. When writing a parser, you need to know precisely all the forms that the input can take. If you don’t peel away every abstraction and do that literature review, but instead rely on only a few examples, you literally don’t know what could happen, because you don’t know what you don’t know. So you shouldn’t be surprised when there are several bugs in your code.

Second up, the library suffers from primitive obsession and is broken. An object would be a vastly better abstraction than a string because an object has properties and methods, so that we, and lots of other consumers of that library, wouldn’t need to write a parser to pull apart the string. Overall, there’d be less repetition of code if the library just did what lots of its consumers wanted.

So, choosing the abstraction is hard. If you get it right, you’ll make it faster overall for consumers of your library to write code, and you’ll reduce their bug rate. If you get it wrong, people will find your library annoying to work with — they’ll spend their time writing boilerplate code, and fixing bugs in code they should never have had to write in the first place.

Naming

But abstraction is about more than just the behaviour of an API. It’s also about its names: an article on the importance of naming things concludes that the point of a name is “telling the user exactly what they need to know and no more”. In other words, you first need to decide what abstraction you want — what does the user need to know — and only then can you come up with a name that does precisely that.

Naming things goes beyond just an API though; for example, consider the terms “false positive” and “false negative”. They are catch-all terms that cover all the distinct ways in which a system might fail. They are a way of talking about failures without going into the specific details. They are an abstraction, and I don’t think they’re a very good one.

Firstly, I can never actually remember which way round positive and negative are. For example, if a user buys something on the internet and the purchase gets wrongly rejected, is that a false positive or a false negative? Choosing different terms could make things easier to understand.

Secondly, we’ve just annoyed one of our users. Whenever we’re discussing possible features or bug fixes to the internet purchasing system, I want to keep the user at the forefront of our minds, and one way to help do that is to choose an evocative word. That’s why I prefer the terms “insult rate” and “fraud rate” instead of “false positive” and “false negative”.

Testing

When writing any test you need to trade off between the insult rate (tests that fail when nothing is actually wrong) and the fraud rate (tests that pass when the product is actually broken). If you’re not careful, unit tests become too tightly coupled to the product code under test. Any small change to the application causes tonnes of tests to fail erroneously (a high insult rate). The tests are “brittle”, and a failure to handle abstraction is often at play.

When writing a unit test you need to decide what the asserts should actually test and, on the other hand, what is implementation-defined behaviour. In other words, you need to choose which parts are implementation details of the method that should be hidden from the test, and not write asserts for that behaviour.

This makes the test less coupled to the product, less brittle, and less likely to break when the product code gets changed slightly.
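As a hypothetical illustration (the Basket type and its members are made up purely for this example), compare an assert that pins down an implementation detail with one that only pins down the behaviour we actually care about:

// Hypothetical Basket class, invented for illustration only.
[Test]
public void Brittle_test()
{
    var basket = new Basket();
    basket.Add("book", 10.00m);

    // Asserts an implementation detail (the exact wording of a log message),
    // so any harmless rewording breaks the test: a high insult rate.
    Assert.That(basket.LastLogMessage, Is.EqualTo("Added 'book' (£10.00) to basket"));
}

[Test]
public void Behaviour_focused_test()
{
    var basket = new Basket();
    basket.Add("book", 10.00m);

    // Asserts only the behaviour that callers actually rely on.
    Assert.That(basket.Total, Is.EqualTo(10.00m));
}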

Abstraction

We’ve seen how abstraction (i.e. choosing which details to reveal and, crucially, which to hide) cuts across many different areas when writing software, from marketing to user experience design to writing code to designing the behaviour of an API to naming stuff to testing: it gets everywhere.

And not only is it really hard, but often it’s subtle to spot.

And that’s why I think it’s more important than cache invalidation, naming things, and off-by-one errors.

And mastering it is crucial in upping your game as a member of a software team.

Next time, we’ll look at HTTPS and how the abstraction that it provides is not only the wrong one, but also dangerous. By default, HTTPS is not as secure as it can be, but by peeling back the abstraction, and looking within, we get access to a whole host of options that we can tweak to make our website safer.