“What will the performance be like?”

Despite the documentation, blog posts and experiences, you’re just going to have to test it…

AWS offers a dazzling array of services for similar things. Dazzling is a great synonym for “confusing”. Feels like “Speed of Innovation”, “Range of Services” and “Coherence of Offerings” is a “pick 2 of 3” situation…

An often asked question on the AWS communities I hang out on is “can we use a Lambda for that?”

People asking this are often primed with some of the problems of serverless compute, and jump to the advanced features to mitigate them:

  • “I’m worried about cold starts, do I need to pay for provisioned concurrency?”
  • “I’m accessing a database, do I need to add an RDS Proxy?”
  • “I worry about response time, do I need to enable API Gateway caching?”
  • “I worry about out-of-order”, do I need First-In First-Out queues1

These are valid questions, with valid solutions, but paraphrasing a global sportswear brand: Just Try It.

Deploy your code, and see how it performs/responds. Then apply the tricks to make it better.

Starting with the dirtiest hack of all: Make it bigger.

Lambda CPU is only controllable by memory, and I’ve been in multiple situations where there was a step-change in performance jumping from 64 to 128 megabytes. There’s tooling that can help you find the right size for your application.

It’s lazy, brute force, but takes literally seconds to change the memory and re-test.

I’m a big proponent of ‘innovation as Business as Usual’: you shouldn’t need the excuse of an innovation day to try the new things. That only works if you start simple.

But I’m an even bigger proponent of not stressing yourself/teams unnecessarily: If you don’t have time to test in your current project, do what you know works. Hit your shipping dates, have an easier sprint.

Hopefully you can have a play in the future and add something else list of what “works”, it’s nice to expand that list, but only when you’ve enough headroom to do so.

In conclusion, while sometimes I can tell you that something is unlikely to work, most of the time you’re just going to have to try it.

Sorry.

  1. I understand FIFO messaging has its place, but I work in content, not transactions. Including a ‘reliable’ timestamp in the event (e.g. the time an article was published, not the time the message was sent) is cheaper and allows consumers to identify if they should process an event, allowing easier recovery if databases have to be re-crawled. ↩︎

Prescriptive Software Practices: Code Re-use Edition

Individual software practices don’t exist in a vacuum, and need to be viewed collectively.

Today I saw this tweet, that I initially violently agreed with, before realising the answer is really more “it depends”.

Now I fully agree that demanding that people write the abstraction layer before they’ve even written the first component to use the underlying tool, is a folly that leads to bad libraries. You don’t know how to best use the underlying API, and you don’t know how you want to use it, and which of the methods you want to wrap or enhance.

The requirement to wrap every ‘method’ is the main reason I dislike intermediate libraries, one time I asked “are we using this new AWS feature that’s perfect for our use case?” The answer: “No, we can’t because Terraform doesn’t support it yet.”

Any time you put something in-between you and the underlying service you’re introducing a potential roadblock. I’ll explain later how I think you can minimise this.

The main reason I think code-reuse/libraries are hard to get right is a conflict at the core of them:

  • A trivial library can be simple to use, but if the functionality is simple, what is it really adding?
  • A feature-filled library is usually (but not always) harder to make use of, and if most people only use a fraction of it, what makes it worth they overhead?

Things don’t exist in isolation…

Warning, inbound analogy: Very often “we” like to look to other counties and cite how wonderfully they do a thing. An example from the UK is that we’re often told that “people shouldn’t mind renting flats because Germany people tend to buy later.”

Which sounds great, but when you point out that Germany has a bunch of related things – longer leases, more freedom to decorate/change properties, and that they consistently build houses to maintain far more modest house-price rises – people tend to go quiet.

Returning to software, everything is similarly related and supported by other practices. If you don’t fully understand a problem, you can’t cleanly decompose it in a sensible collection of services, and only when you’ve done that will sensible opportunities for code re-use/libraries emerge. (At this point you’re welcome to argue that if you’ve decomposed your system properly then you should need to reimplement functions).

XP/Agile/Clean Code/BDD/TDD/… can become quasi-religious in how much you must adhere to all of their tenets. I suspect very few people are fully compliant with any one tribe, and to be effective as teams you need to view things are recommendations or possibilities, and not commandments that thou shall obey.

How to do code re-use right…

This is just my experience, but a few questions to ask or points that I’ve found have worked for the people I’ve worked with in the past:

  • Avoid needing them in a first place: if your transaction volume is low enough just have a dedicated service that does the particular thing… A single point of truth is the easiest way, but that isn’t always possible due to latency or cost concerns
  • Consider Security/Auth/Data-protection first: These are things that you need to create decent libraries/patterns for, because if the easiest thing is the right thing, you’re going to be making fewer critical mistakes, and it can make patching easier if you’re exposing a consistent interface but have to update an underlying library with breaking changes
  • Judge the demand: While many times people can be “wow, I didn’t realise I needed x until it appeared” unless it’s really obvious that lots of people have the exact problem, do you really need to write a library?
  • Understand it before you abstract it: Don’t write them first. My ideal preference is that when you have a few teams working in the domain, let them create distinct implementations. Afterwards, regroup and use that learning as the basis for a library. This is more work, but the result will be much better
  • Keep the library fresh: Is it one person’s job? Is it a collective whole-team effort? A library needs to be a living thing as the systems it interacts with will change. Developers will rightly shy away from using a clunky piece of abandoned code
  • Layer up in blocks: a client has a back-end system with specific authentication requirements and has been building out client libraries. There are 3 distinct libraries: connection management, request building and result parsing. You didn’t have to use all of these, and can just use the connection library if you want more direct access
  • Make your library defensive but permissive: TypeScript has got me back into typing, but previous experience makes me nervous. In micro-services environments a library update can require many unrelated deployments, when only be two components are functionally involved. Errors because enums aren’t valid can be useful, but can you expose the error when that property is accessed rather than parsed?

In summary…

Teams need to find their own path, and find where on the line between “Don’t Repeat Yourself” and “Just Copy-Paste/Do Your Own Thing” they lie. It is highly unlikely to be at either extreme.

“It Depends” isn’t a particularly enlightening answer, but like so many things about building decent products, it is what it is.

Recipe for success

What does a team need to deliver a successful software project, starting to think about what I’ll want in my next engagement.

There’s plenty left to do, but as I approach the end of my current main assignment as a Technical Architect, I’m starting to think what my future engagements should have.

This is my starter for ten five:

  1. Anything but waterfall
  2. Genuine Public Cloud, with a hint of lock-in
  3. Internal users matter just as much
  4. Partnership with your Product Owner
  5. Embedded QA, seen as a benefit, not a drag

Anything but waterfall

Scrum? Kanban? Scrumban? I don’t really care exactly what it is, more that it works for the project, everyone understands and supports it.

I hate designing things entirely upfront, it just seems so conceited that you can genuinely design an entire system without trying to make any of it. While I know this doesn’t apply when you’re building a rocket1 or CERN, you’re not doing that, are you?

Yes, you absolutely need a sense of roughly where you’re heading, and ideally an end goal that you’re heading towards – but you also need the pragmatism to know if you try to build that from the start, you’re going to burn lots of rubber on the road, while making very little progress.

Show your dev teams that you can and do go back to make things better. Build the sense of trust that when you say “Just build the slightly-hacky ‘tactical’ thing, we will fix it later” that you do go back and fix it.

You’ll free everyone up from the performance anxiety of “Must get it right first time, because I can’t go back and fix it”.

Genuine Public Cloud, with a hint of lock-in

I would like to think that cloud is a given, but I still face people who say things like “It’s just someone else’s computer” – yes, but in general they have better capacity planning than you, or the “I could do x for cheaper” – which I’m sure you could, but you’re usually not factoring in the hidden costs.

The main system we built does have an on-premise element, but it’s controlled by the cloud, and deployed in a similar way.

We host the core of the system in the cloud, and that gives us an agility in scale and deployment we don’t have on-premise. Now, could we get that in time, I’m sure we could, but then we lose the benefits of the AWS value-add services…

“we use Amazon, but we only use EC2 and we don’t use any of their special services, so we’re not locked-in”

Speaking of which, when I hear that particular line, I want to congratulate the person on ensuring they’ve deployed their software in a way that will either cost them more, or be less reliable, or both.

At some level, to get the best value out of a cloud provider, you do need to be using their value-add services, meaning you can run bits of your application server-less or other bits as more scalable state-less systems.

Yes, if you write a Lambda, you can’t instantly port that to Google Cloud Functions, but given they both run Node, provided you put the thing that does the work in a scoped module, migrating should mean you write the Google invoking code.

I’m not saying use every service, but to start with the position that you’re just going to use Infrastructure as a Service, is too dogmatic.

Internal users matter just as much

Yes it’s an internal system. Yes it’s not public facing.

Yes it should still be as performant and usable as your public properties.

Facebook probably does more than your system. Facebook is generally fast to use, and yet nobody gets training in how to use it. If your system requires lots of training, are you doing things as well as you could?

Consumer technology and services are good. Very good. Your users expect your system to match that, and when you give people tools that work well, they’re freed from hating the system they are using, and allowed to actually focus on the tasks they’re doing.

Focussing on my current engagement, a partnership with our core users meant they took up some extra manual working, while we ran the extended migration. They only agreed to those once we had earned their trust, and they realised that “could you do this for 3 months” was just that. (granted it was more like 4 months).

Partnership with your Product Owner

Product Management is still a relatively new discipline, so there is no one-true-way, and I hope there doesn’t become one, because not all products are the same.

Regardless, partnership with your Product Owner is crucial, and if they’re technical you want to work hand-in-hand with them on key design decisions. If they’re less so, you need their trust and for them to delegate responsibility.

Embedded QA, seen as a benefit, not a drag

The embedded tester in the team is a key resource. They should ask questions, spot the things we didn’t, and invariably are a first call for “do we know what happens in situation x?”.

For all the frustration that Test Driven Development can cause when doing genuine micro-services, the testing framework that provides means that we never ship the same bug twice. Sometimes when we’ve suspected bugs, modifying an existing test have helped us check our hypotheses quickly.

Easy regression testing make you far more able to build and iterate quickly.

In conclusion

You can’t make a project be a success, but there are things you can do that increase the chances…

 

  1. And talking of rockets, look at what SpaceX have done, which looks pretty like rapid evolution of a rocket platform adding more capabilities…

Re-use more than code?

“You can just re-use the code from x, can’t you..?” is a common call in organisations, but does it always make sense?

I’ve been working on a project recently, and when it started, we were “just going to use the components from <another project>”.

You’ve written many lines before, so why wouldn’t you re-use them? In the abstract it seems a pretty sensible thing, but it rarely works so much in practice.

It’s unlikely your company is writing something as fundamental as a security library where the domain is fixed or as universal the company Active Directory, where you only need one.

What you likely have done is a series of tactical solutions that meet the needs of each silo, which isn’t a bad thing because they’re probably bits of code that were delivered. How often have we waited for the ‘generic’ solution that didn’t really work for anyone.

Now I’m not saying that where it’s genuinely re-usable, you shouldn’t avoid code re-use. If the domain is simple and generic enough, converge on one library. But code isn’t the only thing you can re-use.

Going back to the specific example, I spoke to the architects from the project that we were just going to lift-and-shift; and we discussed how the new things that AWS launched made much of it moot, or far more heavyweight than you’d build if you were starting from today. “You could re-use this, but why don’t you look at doing that” was the outcome.

Instead the value came from, speaking about the things that they couldn’t (feasibly) change now, but would want to, “We have too much data in this account, and we can’t ever move it”. We used those as a basis so we didn’t end up in the same situation.

Experiences and things learned along the way, are just as valuable as avoiding writing some code.