Articles like this are why people think the cloud is oversold

The cloud can solve many problems, and is rightly seen as one of the easiest ways to launch web services. But it isn’t magical, and articles like this are why people think the cloud is being oversold: The cloud is not the solution to finding missing Malaysian flight 370:

But if MH370 had been fitted with technology that made use of the cloud it may never have been lost in the first place. The cloud is a cluster of computers that provides reliable computing and storage as a service to large numbers of requests from computers with limited capabilities, such as those on board a plane or inside a mobile phone.

What the author is really saying is “planes should dial back to a server with their telemetry”.

This may be true, but as a comment on the article points out: that doesn’t need the cloud.

It needs a server in a data-centre. Now you may choose to deploy that server as a virtualised box in the cloud, but this is not an application where you need the main virtues of cloud type platforms.

Over and above machine virtualisation, I tend to think about ‘cloud’ meaning some combination of these things:

  1. You scale your resources when you need them, not ahead of time. The best example of this is storage: you don’t have to pre-size your storage allocation in Amazon or Azure. [1]
  2. Your application is making use of the two main scaling patterns, incoming load balancers [2] and asynchronous message passing [3], to dynamically change the amount of processing capacity that you have.
  3. Not a technical thing, but your costs should be scaling in line with your usage. Having the incentive to save money by doing as little as possible when you’re idle will encourage you to properly scale.
  4. You start treating your servers as livestock and not pets. If virtualisation separates your instances from the physical hardware, cloud deployment should separate your application from the instances.
  5. Your deployment should be cheap. It should take minutes, and be painless, and shouldn’t make your ops team bite their nails in fear. It needs to be a routine, accepted, automated process. This also requires you to have your config held in more durable places than a file on an instance, which could disappear at any moment.
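The queue-driven half of the second point can be sketched in a few lines. This is only an illustration of the idea, not any provider's autoscaling API: the function name, the `per_worker` backlog target and the worker limits are all assumptions I've made up for the example.

```python
import math


def desired_workers(queue_depth: int, per_worker: int = 100,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Return how many worker instances to run for the current backlog.

    All the numbers here are hypothetical defaults: `per_worker` is the
    backlog one instance is assumed to clear comfortably, and the min/max
    values stop us scaling to zero or running up an unbounded bill.
    """
    if per_worker <= 0:
        raise ValueError("per_worker must be positive")
    # Round up: 101 queued messages at 100 per worker needs 2 workers.
    needed = math.ceil(queue_depth / per_worker)
    # Clamp the answer between the floor and the ceiling.
    return max(min_workers, min(max_workers, needed))
```

Something like this would be called on a timer with the current queue length, and the result handed to whatever actually starts and stops instances. The clamp is the part that ties back to the third point: the floor is what you pay when idle.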

The dial-back type solution that could help us find missing planes doesn’t really need many of these characteristics. The data formats would be relatively static, and the load wouldn’t peak to such levels that you needed to place it all behind a massive load balancer. You’d care about reliability, but I don’t see masses of room for flexing things here.

Yes, dial-back is a good opportunity to improve visibility (the ACARS data from Air France flight 447 provided a trace of the accident), but what really could have helped us in the case of Malaysian 370 would have been something that continued to report back position information after ACARS was disabled.

We don’t know if ACARS was disabled manually by the flight crew, or as a result of electrical systems being de-powered due to a fire. We do know the Inmarsat satellite modem was still functioning for some time, and responded to a network level ping. This only gave us a confirmation that the modem was still in range of a satellite beam, and unfortunately it was a large satellite beam which covers a wide area.

Had the plane been fitted with a newer Inmarsat system, it would have been connecting to satellite beams with smaller footprints, which could have narrowed the search area.

What might have helped would have been another GPS receiver integrated with the satellite modem, so that even without the main ACARS system, at least position could still be reported.

That isn’t in the cloud however, that’s on the plane.

Better reporting back could have helped the investigators here, but no, that is not another solution in search of “the cloud”.

  1. Much to the cheers of capacity planners
  2. When more people hit your website, you launch more servers
  3. Instead of doing an operation when a request comes in, you put it on a queue. When the queues start growing too big, you start additional instances

Good Luck Microsoft

Microsoft have appointed Satya Nadella as their new CEO. He’s an internal hire, but from the services bit which includes Azure. Although everybody is playing catch up to Amazon Web Services, Azure has a number of features that are interesting, and it gets that cloud computing isn’t just about easy access to disposable servers.

Microsoft today is like the uncle who was great when you were a kid, got you interested in stuff, and has now fallen on hard times.

Maybe I’m just biased because I like Office (which makes me a minority I know), but I don’t want a world where there isn’t Microsoft. Google Docs is great for sharing or collaborating on documents, Apple’s iWork is great for simple documents, and well I’m sure OpenOffice is good for something.

Microsoft Research produces so many good ideas, or clever ideas, or just the plain “hey we had a random idea” ideas. They don’t manage to use that many of these, as so many of them are impractical with current tech. But the ideas are there; at some level the company still tries to innovate.

That innovation doesn’t come easily however, as Windows 8 and the attempts at a converged desktop/mobile/tablet interface have shown. The company doesn’t have that Apple confidence of “this is the way we scroll now”. Appeasing the fans of the legacy will not help them move on. Perhaps when the company has a better idea of what the “new” Microsoft is, selling those ideas will be a bit easier.

I may well be a Mac and iOS user now, but I think if I was going to switch phones, it would be for a Windows Phone. A bit like the Palm Pre, or Blackberry’s ultimately doomed Blackberry 10 operating system, Windows Phone didn’t feel like it started off with the requirement “be like iOS”. Android and iOS are really converging in many ways, features hopping from one to the other.

For that reason alone, I would like Microsoft to do well in the future: much like the Shuttle’s fifth computer, I think we need a strong third platform in the mobile market.

 

Management Tips from Astronauts

Being on the International Space Station (ISS) for a few months is a pretty unique experience. I’m pretty sure that nobody reading this will get to do it. Chris Hadfield, the first Canadian to command the ISS, did spend some time there, and I’m sure you remember his cover of Bowie’s Space Oddity.

Anyway, he has a book out (Amusingly at Christmas I bought it for my Dad, while he bought it for me). The book manages to make space travel both more alluring, and yet in many parts tediously mundane. It’s seemingly a lot of study, luck, and waiting for your mission. Also sounds hazardous to your marriage if you’ve anything less than the most understanding of spouses.

Alongside this he highlights a few management things, from his employers, or himself, that are worth remembering here on earth.

Confessing to Near-Misses

NASA, like every safety-critical organisation (or at least like they should), places great emphasis on being able to speak about near-misses: the times that something nearly went wrong, so that changes can be made before it actually happens. (I’m not aware of how much of this was in place before the Challenger Launch Decision was made.)

I don’t work in safety critical systems, I work in computers and websites. Although much less severe, we do face similar challenges. Do you have that random configuration utility which, if you feed it incomplete or invalid configuration details, will honour them and wipe out an environment?

You shouldn’t.

In an ideal culture you should be able to say “I was messing about on stage, and noticed that I could break the system with the config tool” and that the reaction should be “Oh, great, let’s figure out if we can easily fix that, and if we can pop it in a sprint” and not the sometimes standard reaction from developers, inwardly judging the operator for using the tool wrongly, while outwardly declaring “Well then you should be more careful with that tool”.

These kinds of things matter: you’re not always the ‘you’ in the office, the 11am at the ideal caffeination level ‘you’. At 3am, roughly extracted from sleep by PagerDuty, you’re a lesser ‘you’.

At those points, you’re flying on instincts and adrenaline.

Systems need to be idiot proof because we can all be idiots. (And thanks to our neanderthal leftovers, I think that sometimes the smartest people can be the best idiots.)
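For the config tool example, the fix is usually dull: check everything before touching anything. Here is a minimal sketch of that shape. The key names, `ConfigError` and `apply_config` are all hypothetical, invented for the example; your own tool will have its own idea of what "complete" means.

```python
# Hypothetical set of keys this imaginary tool insists on.
REQUIRED_KEYS = {"environment", "database_url", "instance_count"}


class ConfigError(ValueError):
    """Raised when a config is incomplete or invalid."""


def validate_config(config: dict) -> None:
    """Raise ConfigError unless every required key is present and sane."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ConfigError(f"missing keys: {sorted(missing)}")
    empty = sorted(k for k in REQUIRED_KEYS if config[k] in ("", None))
    if empty:
        raise ConfigError(f"empty values for: {empty}")
    count = config["instance_count"]
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(count, bool) or not isinstance(count, int) or count < 1:
        raise ConfigError("instance_count must be a positive integer")


def apply_config(config: dict, environment_state: dict) -> dict:
    """Return the new environment state, refusing to act on bad input.

    Validation happens before anything changes, and the existing state is
    never modified in place, so a bad config at 3am leaves things as they were.
    """
    validate_config(config)
    return {**environment_state, **config}
```

The design choice is simply that the failure mode for bad input is "nothing happens, loudly", not "the tool does its best with what it was given".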

What’s the BOLDFACE for this?

Documentation and procedures are another ongoing theme. Unsurprisingly every procedure and task in space is heavily documented, because you don’t want things to go wrong when you encounter problems. To paraphrase: “you should always know what the next most likely thing is that can kill you, and how to go about stopping it”.

The BOLDFACE bits are the critical bits of documentation that keep you alive. Again, IT is not life or death, but your run-books and documentation should have this similar level of priority.

Probably no operator needs to know everything about every system, but they should know the procedures which, if done incorrectly (or even when done correctly), cause data-loss or system outages.

Some years ago, I was personally stung by changes between software versions: the version before didn’t, the version after wouldn’t, but the current version had some horrible behaviour, and I managed to cause a significant outage.

So on top of your documentation, when the operations become more or less dangerous than they were, make sure that people know about the changes.

Being a Zero

This is perhaps the best way I’ve ever heard anyone talk about the problems of being the new person on an existing team. Being a zero basically means “do no harm, make nothing worse”.

Mr Hadfield correctly states that everyone wants to be a plus-one. We want to do good, think we’re doing good and be seen to be doing good. At the start you’re eager, but that comes with impulsiveness which causes problems.

He talks about some times when, in that eagerness, he ended up being a minus-one, someone who made things worse. That isn’t a good first impression on earth, let alone on the ISS where you’re about to be stuck with those people for 3 months.

His philosophy is that aiming to be a plus-one will only turn you into a minus-one, so aim to be a zero and wait until you’re more certain before you start trying to add something.

Having seen people launch themselves into teams only to fail, this is one I entirely intend to live by.

2014: Kaizen and continuous improvement

I’ve always been a bit too fond of grand new year resolutions, which basically mean I’m setting myself up with concrete targets I fail to meet exactly, and promptly abandon entirely.

This year, inspired by working around too many people doing agile, I’ve gone for a simpler take and I’m aiming for Kaizen – “continuous improvement”. Now I’ll admit that I’m bastardising the word horribly, but my resolution is just to do things that make stuff better, even if they seem small.

Right now at every gym in the country, a horde of marauding resolutioners are desperately striving to get to the gym 3 times a week. But they’re going at peak times, at the peak-time of the year, and they’ll fail because they’re going to a horrible room at its worst. It’s way less exciting to say “I’m going to go once or twice a week”, getting into the habit, and then working up; but that doesn’t give you the immediate achievement hit that misguided over-ambition does.

The hard truth is you really need to start in March when it’s a bit calmer and you’re not frustrated that you can’t get any weights or machines. Come next year you’ll be so versed in knowing when and how to ask with non-verbal communication “how many sets left?” and “can we swap?” that you’ll be able to cope with the January hordes.

Team GB’s cycling squad were aiming for marginal gains, 1% here, 1% there – combining to something material, that material being Gold. Sure go for big wins too; but start the ball rolling with the small changes, and that 5% of improvement will put you in the place to tackle the 20% that you know will really take commitment.

The other important thing is that Kaizen addresses entire systems, not just individual items. If a car-plant repeatedly fails assembly because components are too variable – then the components are sourced with better reliability. You can’t fix it on the production line if it arrived broken.

You’re not a car plant, but you’re part of a system. Your friends that you choose to spend time with, your activities you do and where you do them. They’re all things that can be tuned a little, not just the things you directly do.

Happy 2014 everyone, and I hope it’s one filled with many, small, incremental improvements.

Google makes VM Immortal – but how useful?

Google lets you migrate machines between data-centres while they are still running.

It’s a nice feature, and something that VMware has been able to do for a while, but I can’t help feeling it’s an anti-pattern in cloud infrastructures. Yes, there are some applications that you can’t easily design as message consuming stateless data-beasts, but in general to take advantage of scaling (for capacity or to save money), you need to design your applications so that they can survive machine failure, be it from Chaos Monkey or otherwise.
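To make “survive machine failure” a bit more concrete, here is a toy sketch of the usual trick: only acknowledge a message once the work is done, so anything a dead instance was holding goes back on the queue. `InMemoryQueue` is a made-up stand-in for a real message queue, not any actual service's API, and `requeue_unacked` fakes what a real queue would do with a timeout.

```python
from collections import deque


class InMemoryQueue:
    """Toy stand-in for a real message queue (hypothetical, for illustration)."""

    def __init__(self, messages):
        self._ready = deque(messages)
        self._in_flight = {}  # receipt -> message handed out but not yet acked
        self._next_receipt = 0

    def receive(self):
        """Hand out the next message with a receipt, or None if empty."""
        if not self._ready:
            return None
        body = self._ready.popleft()
        receipt = self._next_receipt
        self._next_receipt += 1
        self._in_flight[receipt] = body
        return receipt, body

    def ack(self, receipt):
        """Mark a message as done so it is never redelivered."""
        del self._in_flight[receipt]

    def requeue_unacked(self):
        """Simulate a timeout: unacked messages become available again."""
        self._ready.extend(self._in_flight.values())
        self._in_flight.clear()


def run_worker(queue, handled, crash_on=None):
    """Process messages until the queue is empty, acking after the work.

    `handled` is a set, so handling the same message twice is harmless.
    `crash_on` lets us pretend the instance died part way through a task.
    """
    while (item := queue.receive()) is not None:
        receipt, body = item
        if body == crash_on:
            raise RuntimeError("instance died mid-task")
        handled.add(body)
        queue.ack(receipt)
```

Kill the first worker on message "b", requeue, start a second worker, and all three of "a", "b" and "c" still get handled. No live migration required.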

Goodbye TVC


Disclosure: I’ve worked on-and-off for various bits of BBC for many years.

The closure of TVC is one of those left-brain/right-brain things: the logical “left-brain” spreadsheet lover in me looks at the old building, the amount of asbestos, the legacy cables and the fact that News, Sport & Children’s have all moved out and thinks that it’s the right thing to do.

Meanwhile the right-brain in me is screaming loudly “BUT IT’S TELEVISION CENTRE”, it’s a place of dreams, of wonder, where I was in a small room in the basement while they filmed Jools Holland above. A place I got lost deliberately at lunchtime just so that I wouldn’t get lost that one time I was running for a meeting.

The truth probably lies somewhere in-between: News being in West1, unified with the World Service, is a great thing for the output. BBC Worldwide and BBC Studios will be moving back in a few years. Studios 1-3 will survive.

So on this, the last day the building is still in general operation, I will think fondly of the past, and hope that post re-fit the building emerges leaner and all set for the 21st century.

My friend wrote about his recreation of a famous moment: On Tap-Dancing at TVC.

A Tale of Two CEOs

When I read about the demise of HMV, there was a quote from here that rang a bell:

The relevant chart went up and I said, “The three greatest threats to HMV are, online retailers, downloadable music and supermarkets discounting loss leader product”. Suddenly I realised the MD had stopped the meeting and was visibly angry. “I have never heard such rubbish”, he said, “I accept that supermarkets are a thorn in our side but not for the serious music, games or film buyer and as for the other two, I don’t ever see them being a real threat, downloadable music is just a fad and people will always want the atmosphere and experience of a music store rather than online shopping”.

Sounded eerily familiar, and then I managed to find it:

I outlined to the Fairfax board what I described as a ‘catastrophe scenario’, which involved losing a decent chunk of their classified advertising, and they chose to totally ignore that. Roger Corbett, who was then a board member and is now the chairman of the company, he stood up at the front of the board table and he picked up a quite fat edition of the Saturday Sydney Morning Herald that was sitting there. And he held it up in front of the board members and he said to them, ‘I don’t want anyone ever coming into this boardroom again telling us that people will buy cars or houses or look for jobs without this.’ And he thumped the big fat Saturday Sydney Morning Herald on the board table.

Two companies, major problems, the same root-cause. You can’t always ignore problems in the hope that they go away or don’t materialise.

Avoiding the Barclays Cycle Hire Price Hike

With prices for the annual subscription doubling in January 2012, and no way to renew your membership early to add on to the end of it, here are your two approaches:

You have a different credit/debit card and a second email address?

  1. Register a new key with an annual access period, using a different email and card number, and don’t activate it.
  2. Put this away in a drawer.
  3. Turn off auto-renew on your existing key.
  4. When access stops working, use the other key.

You don’t have an extra card lying around?

This was given to me by the call-centre

  1. Cancel your existing access period, forfeiting what is left on it
  2. Get another annual period for 45 quid