Just because a number is long, doesn’t mean that it’s secure…
For many years, despite repeated requests, a South African bank has been sending me bank statements.
The thing is that I don’t live in South Africa, and have never visited or banked there. But I do have a particularly “good” email address that gets a lot of misdirected emails… I usually don’t read them, but this week I did, as I replied with my latest attempt at “please stop”.
The email included a PDF statement, password-protected with the South African ID number of the customer. I suspect that protection is why the service centre seems so unbothered by the repeated requests to stop sending this information.
A few years back, I got very embedded in PII/GDPR. We were designing a data warehouse setup that allowed analysis of user activity while protecting privacy, and enabling easier compliance with GDPR deletion requests. There was discussion about the feasibility of SHA2 hash reversal… and we took a lot of time to communicate the infeasibility to the legal team.
So this week, I started to wonder: If I were a bad actor (which I am not), could I feasibly crack this ID with some Python?
What’s the challenge?
There are three sets to consider in this:
The Absolute Keyspace: without any knowledge of the identifier, how many combinations are there?
The Available Keyspace: given the ID’s constraints, how many valid combinations are there?
The Effective Keyspace: if we know anything more, how many combinations remain applicable?
A good security identifier should make discerning the differences between these difficult: it should involve a reasonable amount of calculation before it becomes obvious that an ID is valid and correct.
What do we know?
South African ID Numbers are 13 decimal digits.
A single decimal digit can be 0-9, ten in total, and so each digit has a cardinality of 10.
(Cardinality being the maths word for ‘number of choices’ in a set)
We can use this information to calculate the absolute keyspace:
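A minimal calculation of that absolute keyspace, using the two numbers above:

```python
# Absolute keyspace: 13 digits, each with a cardinality of 10
digits = 13
cardinality = 10
absolute_keyspace = cardinality ** digits
print(f"{absolute_keyspace:,}")  # 10,000,000,000,000
```

Ten trillion keys sounds like a lot, until structure starts carving it down.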
Graphing this is really hard: the differences in scale make it really difficult to communicate.
Since I’m not a data-vis genius, this uses a log-scale.
Python to generate the graph

import pandas as pd
import matplotlib.pyplot as plt

# Illustrative keyspace sizes; Available/Effective are rough estimates
df = pd.DataFrame({'Count': [10**13, 10**9, 4 * 10**7]},
                  index=['Absolute', 'Available', 'Effective'])
fig, ax = plt.subplots()
data = df['Count'][::-1]
data.plot(kind='barh', logx=True,
          title="Keyspace Reduction\nnote the log-scale", ax=ax)
ax.set_xlabel("Keys in Keyspace")
This analysis shows that, in common with recent attacks on the Tetra encryption system, if you’re not using all of the absolute keyspace, your protection is far weaker than a big number may suggest.
These national/structured IDs do not make good secrets: the structure inherently reduces the size of the effective keyspace, and makes it very easy to exclude ranges of people (by age or gender).
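As a sketch of how much that structure bites: South African IDs are commonly described as YYMMDDSSSSCAZ (date of birth, a gender-linked sequence, a citizenship flag, and a Luhn check digit). Assuming that layout, here’s roughly what an attacker who can guess age and gender is left with:

```python
from datetime import date

def luhn_valid(number: str) -> bool:
    """Standard Luhn check: rightmost digit is the check digit."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Suppose the attacker guesses the holder is a man born in the 1970s:
birth_dates = (date(1980, 1, 1) - date(1970, 1, 1)).days  # 3,652 dates
gender_sequence = 5000   # assumed male half of the SSSS range
citizenship = 2          # citizen or permanent-resident flag
effective = birth_dates * gender_sequence * citizenship
print(f"{effective:,}")  # 36,520,000 candidates, versus 10^13
```

The check digit contributes nothing to the keyspace: it’s fully determined by the other digits, and it also lets an attacker discard nine out of ten random guesses for free.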
While phishing is a problem, and emails need to be/appear authentic – we need to use mechanisms to achieve this at the email level: SPF, DKIM, DMARC, BIMI. While imperfect, these are far better than including information directly related to the ID/information being protected.
In this scenario, even with a naive implementation, it would be entirely feasible to brute-force this particular email/PDF combination, which would expose customer information.
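Putting a rough number on that feasibility, with both the keyspace and the guess rate below being assumptions, not measurements:

```python
# Back-of-envelope timing; both numbers here are assumptions
keyspace = 36_520_000        # assumed effective keyspace (tens of millions)
guesses_per_second = 10_000  # assumed offline PDF password attempt rate
hours = keyspace / guesses_per_second / 3600
print(f"{hours:.1f} hours")  # roughly an hour at this assumed rate
```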
Now I don’t know how valuable the information in the statement is, but I wonder if it could be used as part of a social engineering attack?
A plea to companies: if I message asking you to stop sending me statements, maybe stop?
Getting past uncertainty or unwillingness to use IPv6 is actively costing you money.
I (mostly) have IPv6 deployed in my home network; alas, a bug on my router currently prevents it working completely. But for years it’s been enabled, mostly as a novelty to feel like I was ready for the grand future, without my really noticing any differences.
Last week however, I had a chat that made me realise that using IPv6 is now a cost-saving measure… and it’s maybe time to get over our resistance to doing it by default.
The Joy of NAT
Typically you deploy an AWS VPC using internal IPv4 addresses, and (if you’re as allergic to avoidable self-managed services as I am) a managed NAT gateway.
As with nearly everything AWS, there’s an hourly cost, and also charges for the volume of data transferred.
In an AWS group I’m a member of, someone asked “We’re paying a lot for NAT, what can we do?” – and I said “I mean, you could try the IPv6 Egress Gateway, but I dunno if the APIs you’re using support IPv6”.
The egress-only IPv6 gateway charges only ‘standard’ egress rates, and has no rental cost. I’ve been aware of its existence for years, but have never deployed it or had reason to.
I was expecting to be told “Only one of my APIs supports IPv6” but the person reported back “Actually nearly all my APIs are on IPv6, but I can’t deploy the IPv6 easily because of <reasons>”.
This was not what I was expecting, despite the fact that for the last few years, wherever I could deploy a dual-stack endpoint, I have been…
So why haven’t we?
There are a few reasons why we haven’t been deploying IPv6 more routinely, and we should try to change these:
IPv6 didn’t have any advantages. In most cases the IPv4 setup works OK… so what’s the point of adding it to a working setup?
The deployment tooling doesn’t support it – this was the case here: the person was using the lovely AWS CDK to deploy their VPC, and the current VPC ‘construct’ doesn’t support IPv6 easily.
We like NAT’s security by default: with NAT, your compute resources aren’t exposed to the internet, and all that endpoints see are one or two shared IPs… There’s no way to accidentally open inbound connections, and that security by default is pleasing. Even though the IPv6 egress-only gateway doesn’t allow inbound connections, you do feel more exposed, and so much of security is an annoying, vibes-type thing.
Some steps to take
It’s sometimes easy to forget that we have run out of IPv4 addresses, and that we’re muddling through with Carrier Grade NAT and other annoying technologies that work, but make everyone’s lives just that little bit worse… But we can improve this incrementally, and anyone who’s worked with me knows I love making things better gradually.
Enable IPv6 on any managed services you use by default. If you’re running CDN hosts on CloudFront, or an API on API Gateway, those can all run IPv6, and run it outside of your VPC. There is no security risk to you, and by enabling it, you give other people the benefit.
Enable IPv6 in your VPC, but just on your public subnets. If adding IPv6 makes you nervous, start with ‘just’ doing it on your public subnets, and use it to give your load balancers IPv6 addresses. This makes life better for other people, and provides learning for the next step.
Enable IPv6 with an egress-only gateway for any subnets that have NAT: this is where you start saving money, as hopefully you’re NAT’ing less traffic and saving on those charges.
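As a rough sketch of where the saving comes from (the rates below are illustrative placeholders, not current AWS pricing):

```python
# Monthly NAT-specific charges that an egress-only IPv6 gateway avoids.
# Standard data-transfer-out rates apply on both paths, so they cancel out.
hours_per_month = 730
nat_hourly = 0.045   # assumed NAT gateway hourly rate, USD
nat_per_gb = 0.045   # assumed NAT data-processing rate, USD per GB
egress_gb = 2000     # example workload: 2 TB a month leaving the VPC

avoidable = hours_per_month * nat_hourly + egress_gb * nat_per_gb
print(f"${avoidable:.2f}/month")  # $122.85 at these assumed rates
```

The hourly rent is fixed, but the data-processing line scales with traffic, which is why the saving only becomes interesting at volume.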
Not every use-case will save money, I haven’t generally had the patterns of bandwidth usage that would benefit from this… but if you’re using a lot of NAT bandwidth, maybe it’s time to look at IPv6 as a potential cost-saving.
When you’re solving a problem, you look at what managed services you have available, considering factors like:
Your team’s experience with the service
The service’s limitations, and what it was intended for, versus what you’re doing
What quotas may apply that you might hit
How the pricing model works
While pricing for managed-services is generally based on usage, sometimes specific costs will apply more to your workload, e.g. if you’re serving small images, you’ll be more impacted by the per-request cost than the bandwidth charges.
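A toy example of that small-images case, with made-up prices standing in for real CDN rates:

```python
# When objects are tiny, the per-request charge dominates the bandwidth charge
per_request = 0.0000012   # assumed cost per request, USD
per_gb = 0.09             # assumed bandwidth cost, USD per GB
object_kib = 5            # a small thumbnail, in KiB
requests = 10_000_000     # requests per month

request_cost = requests * per_request
bandwidth_cost = requests * object_kib / (1024 * 1024) * per_gb
print(round(request_cost, 2), round(bandwidth_cost, 2))
# With these assumed numbers, requests cost roughly 3x the bandwidth
```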
I would be surprised if an experienced architect hasn’t faced a situation where “Service X would be perfect for this, if only it didn’t have this restriction Y, or wasn’t priced for Z”.
We’d built out a system that was performing 3 distinct processing steps on large files.
The system had been built out incrementally, and we had the 3 steps on three different auto-scaling groups, fed by queues.
While some of the requests could be processed from S3 as a stream, one task required downloading the file to a filesystem, and that download took time.
The users wanted to reduce the end-to-end processing time. Some of the tasks were predicated on passing prior steps, and so we didn’t want to make the steps parallel.
Attempt 1: “EFS looks good”
We used the ‘relatively’ new Elastic File System service from AWS… The first component downloaded the file; subsequent operations used this download.
This also had the advantage that, since the ‘smallest’ service was first, you paid for that download on the cheapest instance, and the more expensive instances didn’t have to download it.
We developed, deployed, and for the first morning it was really quick… until we discovered that we were using burst quota, and spent the afternoon rolling back.
Filesystem throughput was allocated based on the amount stored on the filesystem, but as this was a transient process, we didn’t replenish it quickly enough, and didn’t like the idea of just making large random files to earn it.
Now you can just pay for provisioned throughput, perhaps in some small part because of a conversation we had with the account managers.
Attempt 2: “Sod it, just combine them”
The processes varied in complexity: there was effectively a trivial, a medium, and a high-complexity task… So the second solution we approached was combining all the tasks onto a single service… the computing power sized for the highest-complexity task would zoom through the other two, and so we combined them into what I jokingly called “the microlith”.
We didn’t touch the other parts of the pipeline, or the database; they remained in other services. But combining the 3 steps worked.
What did we gain?
The system was faster, and even more usefully to operators, more predictable.
Once processing had started you could tell, based on the file size, when the items would be ready…
Much like “lower p90 but higher maximum” feels better for user experience, this consistency was great.
What did we lose?
Two of the three components had external dependencies, and this did mean this component was one of the less ‘safe’ to deploy. While tests were built up to defend against that… the impact of a failed deploy was larger than you’d want.
There are always times when breaking your patterns makes sense, the key is knowing what you’re both gaining and losing, and taking decisions for the right reasons at the right times.
Prime Video refining an architecture to better meet scaling and cost models, making it less “pure”, isn’t the gotcha against these services that some people would have you believe.
Here’s the one wild secret they don’t want you to know.
Please forgive me the horrible SEO title and URL…
I recently ported a number out of Skype UK to a much cheaper SIP VoIP provider. Skype served me well for a number I really just needed for compliance reasons… and it seemed obvious that I should transfer it to a supplier that would cost me about 12 GBP per year, rather than the 57 EUR I was paying.
Number porting in the UK is a magical mystery tour: for all the times it “just works”, other times you’ll feel like you’re playing the worst possible game of D&D, with a horrible enemy and an uninterested Game Master.
Skype initially said “you don’t need to do anything special, just ask the new provider to port in”, but it turns out there was this one secret trick they didn’t think I’d want to know…
Skype expect your ‘surname’, as provided by your new supplier, to actually be your Skype username.
I discovered this after two failed porting attempts (for which my new super cheap provider charges me, which I understand, since the ongoing rental is so cheap).
I had asked Skype multiple times what to do: before the first port, after the first failure, after the second failure.
Most of these interactions were pretty infuriating: “Porting is done by your new provider… it’s nothing to do with us” – despite the fact that my new provider asks Skype to port, and Skype then says yes/no… They are pretty involved in the situation.
Skype wouldn’t tell the new provider what was wrong, beyond “the surname didn’t match”. This is entirely correct: since Skype UK don’t seem to implement any porting-code/verification scheme, for security they can’t and shouldn’t tell the port destination…
However, they wouldn’t then tell me, via an authenticated channel, what they were expecting. Merely “you did it wrong, try again… porting is nothing to do with us.”
It was only when I went on another chat, talking about going to the ombudsman and generally being a pain, that the porting team authorised the port and revealed to the provider “yeah, the surname needs to be the username”.
This is not mentioned in their documentation, nor provided in live chat, seemingly because it’s a rare occurrence.
So if you’ve come to this post, having had problems porting out, try again telling your new provider that your ‘surname’ is in fact your username.
As someone thoroughly locked into the Apple ecosystem, until recently I’ve not been able to easily ask for BBC Radio services through my HomePods. I had to follow some Reddit posts to install shortcuts (“hey Siri, play BBC 6 Music”), and suddenly I find myself listening to more BBC Radio as a consequence.
Previously I think you had to listen to stations ‘enough’ for Siri to recognise the activity, then it could be a suggested shortcut. It was a bit ugly, and down to how intents originally worked.
The Siri APIs have become more developed over time, and now it is possible to do this in a way that doesn’t require upfront declaration.
It’s funny then, that given the choice of “adding a play with Siri intent to BBC Sounds” or “delaying podcast release to open platforms”, that Auntie chose the latter…
This has the strange result:
The BBC’s first-party actions have me listening to less BBC podcast audio – why listen to a topical podcast 4 weeks later? And having to remember to check BBC Sounds doesn’t match my workflow.
Actions by a third party have me listening to more BBC live audio.
I know there are always backlogs, but Siri intents have been around for a while now…
Having installed some invasive ‘online proctoring’ software, I tried to ask the company how to uninstall it.
I was, finally, getting around to doing AWS certifications, and one of the ways you can do that is via the Pearson OnVue proctoring system. You run software that limits what you can use your machine for, and stops you using multiple windows to look up answers. Unsurprisingly, it asks for a fair bit of access when it runs.
I installed it, ran the test to see if it worked, and shortly afterwards was having problems with my computer and the clipboard – and wondered if that was a ‘feature’ of this software.
I have since resolved these separately, but I wanted to uninstall OnVue, and there is NO detail of how to do this on the website.
I have asked all their contact channels “how do I uninstall it?”, and I get back a variety of canned responses:
“you can’t cancel an exam via email”
“if you run a test exam you can see if the software works on your system”
Now, given it doesn’t look like it was an ‘installed’ app, just an application that ran from a zip file, uninstalling it could be as simple as never running it again and deleting the download.
But I don’t know whether it has installed a few system extensions or similar. I’ve had a quick look and I don’t think it has, but is it too much to expect a webpage owned by Pearson that says that? Right now the search results for “remove onvue macintosh” are swamped by advertising pages for Mac removal software.
A document online that states what to do, and an answer in the knowledge base so agents can respond, is really the bare minimum.
If you make me run/install it, give me a clear way to remove it.
(This post inspired by some posts I saw on LinkedIn, and some client experiences from many many years ago)
Resilience is a useful property.
We want it in many places: In our infrastructure from floods or traffic spikes, in our organisations from attacks by bad faith actors, and within ourselves from unexpected or exceptional events in our lives.
Now, in the last few years *waves hand* we’ve had quite a bit going on, and many of us have had to call on that resilience reserve more than usual.
Maybe as a result of that, or a general awareness of mental health in the workplace, it’s now something that’s being taught to employees.
While I think that is a good thing, I think it has the potential to be weaponised.
A resilient worker, or more likely a team, has the skills/headroom/reserve to cope with a “once in an N” event, every “N”. So a “once in a week” exception every week, a once in a month exception every month, etc.
I worry that bad managers and teams will weaponise that resilience, and expect teams to be resilient against all events, even if they’re facing a ‘once a year’ event every month.
Exceptions aren’t avoidable. Things do change, go wrong, go better, go worse.
You can’t avoid all exceptions, and those are the ones that will draw on the “resilience reserve”, but if your team is constantly facing exceptions that are caused by poor coordination or planning – you’re wasting that valuable resource you should be keeping for elsewhere.
Giving your team the tools to be resilient is great, but you’re not giving them invincibility.
Software that isn’t “safety critical” can have real-world impacts.
If you’ve been working in IT for as long as I have been, you’ll maybe remember this wonderful example of legalese:
NOTE ON JAVA SUPPORT: THE SOFTWARE PRODUCT MAY CONTAIN SUPPORT FOR PROGRAMS WRITTEN IN JAVA. JAVA TECHNOLOGY IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED, OR INTENDED FOR USE OR RESALE AS ONLINE CONTROL EQUIPMENT IN HAZARDOUS ENVIRONMENTS REQUIRING FAIL-SAFE PERFORMANCE, SUCH AS IN THE OPERATION OF NUCLEAR FACILITIES, AIRCRAFT NAVIGATION OR COMMUNICATION SYSTEMS, AIR TRAFFIC CONTROL, DIRECT LIFE SUPPORT MACHINES, OR WEAPONS SYSTEMS, IN WHICH THE FAILURE OF JAVA TECHNOLOGY COULD LEAD DIRECTLY TO DEATH, PERSONAL INJURY, OR SEVERE PHYSICAL OR ENVIRONMENTAL DAMAGE.
Windows NT4 License agreement
It’s a pretty good example of where our minds tend to go when you think of “safety critical” systems. I tend to also think about things like complex train automation systems or the Therac-25 radiation therapy machine.
All things that are complicated, but are generally grounded in physical interactions with machinery, machinery that has high energy, or that interacts with other humans.
This came to mind because once again the Post Office Horizon Scandal, one of the biggest miscarriages of justice ever in the UK, is in the news.
If you weren’t aware, the system was buggy and could cause a branch to show massive shortfalls, giving postmasters three options:
Make up the loss themselves, and hope the problem didn’t happen again
Report that their accounts balance, which was an act of fraud
Try to report to the Post Office, which would be unhelpful at best, or begin an investigation at worst
The results of this were bleak:
People were wrongly convicted of fraud/stealing from the Post Office
People were wrongly imprisoned
Some people ended their lives in the immense shame of being someone who stole from their local community
In hindsight, that looks pretty safety critical… lives were materially changed, damaged, or extinguished.
What’s worse is that people from the software vendor and the Post Office claimed that the system was robust, and that remote access wasn’t possible – at the same time as planning remote access to resolve issues caused by known bugs.
The latest BBC Radio 4 programme on this (after an amazing series) had an instance where a postmaster lost his branch due to these bugs; a new owner bought the shop, only to then experience the same bugs. The helpline gave the same line, “Nobody else is reporting these problems”, which sounds highly unlikely to be true.
Sure, some senior people from the time have stepped down from their non-executive directorships.
In my view this is negligence: they should have done the due diligence to ascertain that the system was genuinely robust.
Software is everywhere.
Ovens have WiFi, cars have highly complex computer vision, human bodies have attachments controlling insulin flows. People have had artificial eye implants to help them see that the manufacturer no longer supports.
Whistleblowing is a painful and sacrificial act for the person who does it. But if you see people from your company testifying in courts of law that “there are no problems with the software” (an impossible situation in all but the smallest of programs), we need to provide better ways to help this information surface.
Maybe if defence teams were better briefed, a statement like that could be countered with “No problems? Cool, we’ll verify that with an extract from your Jira instance” or “a third-party code review wouldn’t be a problem”?
I don’t know the solution.
I’m not a lawyer, I’m not an ethicist, I’m not someone who typically works in these kinds of environments – but I do know that lives were lost due to an accounting system being buggy.