Prescriptive Software Practices: Code Re-use Edition

Individual software practices don’t exist in a vacuum, and need to be viewed collectively.

Today I saw this tweet, that I initially violently agreed with, before realising the answer is really more “it depends”.

Now I fully agree that demanding that people write the abstraction layer before they’ve even written the first component to use the underlying tool, is a folly that leads to bad libraries. You don’t know how to best use the underlying API, and you don’t know how you want to use it, and which of the methods you want to wrap or enhance.

The requirement to wrap every ‘method’ is the main reason I dislike intermediate libraries, one time I asked “are we using this new AWS feature that’s perfect for our use case?” The answer: “No, we can’t because Terraform doesn’t support it yet.”

Any time you put something in-between you and the underlying service you’re introducing a potential roadblock. I’ll explain later how I think you can minimise this.

The main reason I think code-reuse/libraries are hard to get right is a conflict at the core of them:

  • A trivial library can be simple to use, but if the functionality is simple, what is it really adding?
  • A feature-filled library is usually (but not always) harder to make use of, and if most people only use a fraction of it, what makes it worth they overhead?

Things don’t exist in isolation…

Warning, inbound analogy: Very often “we” like to look to other counties and cite how wonderfully they do a thing. An example from the UK is that we’re often told that “people shouldn’t mind renting flats because Germany people tend to buy later.”

Which sounds great, but when you point out that Germany has a bunch of related things – longer leases, more freedom to decorate/change properties, and that they consistently build houses to maintain far more modest house-price rises – people tend to go quiet.

Returning to software, everything is similarly related and supported by other practices. If you don’t fully understand a problem, you can’t cleanly decompose it in a sensible collection of services, and only when you’ve done that will sensible opportunities for code re-use/libraries emerge. (At this point you’re welcome to argue that if you’ve decomposed your system properly then you should need to reimplement functions).

XP/Agile/Clean Code/BDD/TDD/… can become quasi-religious in how much you must adhere to all of their tenets. I suspect very few people are fully compliant with any one tribe, and to be effective as teams you need to view things are recommendations or possibilities, and not commandments that thou shall obey.

How to do code re-use right…

This is just my experience, but a few questions to ask or points that I’ve found have worked for the people I’ve worked with in the past:

  • Avoid needing them in a first place: if your transaction volume is low enough just have a dedicated service that does the particular thing… A single point of truth is the easiest way, but that isn’t always possible due to latency or cost concerns
  • Consider Security/Auth/Data-protection first: These are things that you need to create decent libraries/patterns for, because if the easiest thing is the right thing, you’re going to be making fewer critical mistakes, and it can make patching easier if you’re exposing a consistent interface but have to update an underlying library with breaking changes
  • Judge the demand: While many times people can be “wow, I didn’t realise I needed x until it appeared” unless it’s really obvious that lots of people have the exact problem, do you really need to write a library?
  • Understand it before you abstract it: Don’t write them first. My ideal preference is that when you have a few teams working in the domain, let them create distinct implementations. Afterwards, regroup and use that learning as the basis for a library. This is more work, but the result will be much better
  • Keep the library fresh: Is it one person’s job? Is it a collective whole-team effort? A library needs to be a living thing as the systems it interacts with will change. Developers will rightly shy away from using a clunky piece of abandoned code
  • Layer up in blocks: a client has a back-end system with specific authentication requirements and has been building out client libraries. There are 3 distinct libraries: connection management, request building and result parsing. You didn’t have to use all of these, and can just use the connection library if you want more direct access
  • Make your library defensive but permissive: TypeScript has got me back into typing, but previous experience makes me nervous. In micro-services environments a library update can require many unrelated deployments, when only be two components are functionally involved. Errors because enums aren’t valid can be useful, but can you expose the error when that property is accessed rather than parsed?

In summary…

Teams need to find their own path, and find where on the line between “Don’t Repeat Yourself” and “Just Copy-Paste/Do Your Own Thing” they lie. It is highly unlikely to be at either extreme.

“It Depends” isn’t a particularly enlightening answer, but like so many things about building decent products, it is what it is.

On THAT Excel Issue

What can we actually learn from the government’s Excel related issues?

There have been many comments posted in the last week about “excelgate” or whatever we want to call a life-threating data exchange problem. This post is not about absolving the government of blame for this, or the countless failings they’ve made across the Test & Trace programme. Between the app that everyone who understood iOS Bluetooth told them wouldn’t work, giving the bulk of Contract Tracing to private companies not to local health teams… I’m really not excusing them.

But, I do think there are more naunced lessons that can be learned beyond “LOL WOT THESE N00B5 USING M$ EXCEL. Y U NO PYTHON?” which is an exageration, but not by much, of some of what I’ve seen online.

I’m writing this based on the following assumptions/guesses: Data had to get from one system, to another – and .xls not .xlsx was used, this hit a row limit. (This really should have been an automated feed, but that’s not what I want to explore here, I want to explore how organisations can prevent people doing ‘good’ things)

So, we’re using an inappropriate data transfer format, with a hard limit of how many rows it can contain… This sets up a few different scenarios:

  • Nobody foresaw this problem
  • The problem was known, but the decision was taken not to fix it
  • It was known, people wanted to fix it, but couldn’t

If we explore these, I think there’s some learning we can take away for organisations we work for or with, about how some of our anti-patterns might lead to scenarios that put us into them.

Nobody Foresaw This

This would be the most damning of the outcomes. It was a risk that nobody had realised that they were living with, and crucially that the software doing the export didn’t warn you about.

Tips to avoid it:

  • To borrow from the WHO: Testers. Testers. Testers. Hire decent testers, the one who infuriates you with “What if this series of 3 highly improbable events happens?”
  • As we’ll come onto in a second, listen to them when they say these things.

It was known about, but decisions were taken not to fix.

These aren’t fun, especially as someone who predicted a particularly nasty auto-scaling bug one time, tried to warn people, but it wasn’t accepted that it needed to be fixed until it occurred, it can always leave you feeling “if only you’d argued the case better”.

But it’s legacy…

Matt Hancock, the UK Health & Social Care Secretary, described the system as (paraphrased) “Legacy and being replaced”.

We’ve been here, a system that is old, being replaced, is considered frozen because “it’s going away”. However, I know of systems that were due for replacement in the next 6 months, but 3+ years later development hasn’t started. This was used as a reason not to do relatively trivial UX changes, that could have been a great improvement to the operators.

Tips to avoid:

  • Until you unplug the server, turn off the instance or stop new data flowing into it, no system is “legacy”

“It’s very unlikely… we can live with it”

Nobody, apart from epidemologists and software billionaires, predicted a future epidemic on this scale – so I guess that maybe the problem was known, the decision could have been taken to live with it. Going back to the first recommendation and hiring a tester, sometimes so many scenarios are found, it’s easy to tune out because like Cassandra, the tester is always talking about problems.

Tips to avoid:

  • It’s ok not to fix everything, but if you’re living with a risk, make sure it’s known, and doesn’t fail silently.
  • Keep it in your risk log, and actually re-read that once a quarter and assess if they’re now more of a problem.
  • Try to be a little less agile, at least in methodological purity, and go beyond “what we’re building next” and look a few steps ahead.

We wanted to fix it…

This is when we get into some of the most depressing collection of scenarios:

“You can’t just make a change, this needs a PROJECT”

Changes need to be properly developed, tested and deployed, but sometimes this doesn’t need a full project structure created. When all improvements are painful to implement, people just accept and build workarounds, some of which you may not be aware are in place.

Tips to avoid:

  • Have a lightweight process for “short-order” requests that are small.
  • Find ways to bundle these into bigger releases alongside the “im-por-tant” work.

“It’s too expensive”

If you have a bad contract with your supplier, it could just cost too much to viably fix.

Tips to avoid:

  • Only buy software/services where the API is included, and is nice to develop against (I’m looking at you SOAP)
  • Have clear boundaries in your systems/components, own the integrations yourself, so you can swap components or combine as required

“The person who develops it is too busy/gone away”

You could imagine that if this system was modifiable, that right now the people with IT skills are maybe elsewhere working on the other plethora of systems that have have to be spun-up to cope with the current situation.

Worse though, is when software has gone-stale and while you maybe have developers who could work on the problem, nobody really understands how to build/deploy it anymore, it’s effectively stuck.

I’ve worked with clients who had problems with code going stale, and instituted very strict “if you modify a component you must fully adapt it to be inline with our current standards” to fix this. However, this just introduced a disincentive to make minor changes to improve things, because the developers knew that alongside 5 lines of functional code changes, they had to make 500 of dependency related changes.

Tips to avoid this:

  • Avoid one product/system/component being solely one persons ‘thing’.
  • Find ways to allow people to deploy minor changes as a BAU process, gradually updating components into modern ways of working without dogmatically requiring every component to be fully updated.

In conclusion

We’ve all used excel files or CSVs in email, or a google sheet as an interim solution. The problem is that these interims become permanent and eventually they stop working. I’m lucky in that mine were about keeping TV or VOD on-air, and not about life or death statistical reporting processes.

But still, let’s tone down the sneering “BUT WHY WASN’T IT AUTOMATED” talk, yes, it clearly should have been, but none of us know the decisions being made, or the available software hooks that the operators/developers had access to.

Always monitor your systems, spot where things can be better and make the incremental improvements because they add up over time. Never invest all your hope in the new system/rewrite because they’re always years away, and usually come with their own new ‘quirks’.