Is Software Getting Worse? (Part I)

Okay, folks. Buckle up - this one’s longer than usual, but I think it’s about something we all have feelings about. Even stronger feelings than I have about how portrait-orientation monitors freak me out.

One of the things that you absolutely cannot tell about me by seeing me is that I spend a little over an hour exercising every single day. I usually start my day shortly after getting up by banging out some cardio on the elliptical or exercise bike, and I’ve been doing this for years. In fact, as I write this, I am on a 725-day streak of no skipped workouts. (Don’t ask me what my goal is. I stopped having a streak goal a year ago – and then got locked in my house for a year.) Of course, if we’ve met, you know that instead of looking like Dwayne Johnson, I more closely resemble Patton Oswalt. Such is the joy of genetics.

Regardless, I think exercise is important and so I do it despite absolutely hating it. Part of why I exercise every day and push it hard is because I need the victory. I need to win the battle with myself every morning as to whether or not I’m going to get on that machine because I don’t want to. I never want to. Some days the battle is harder than others, but I do it anyways because it sets the tone for the day. If I win that battle with myself and get into gear, then I can win the next one when the urge to be lazy crops up the next time.

Anyways, while I’m working out I find I need to distract myself; otherwise I just spend the hour inventing new profanity. So to distract myself, I listen to podcasts or watch documentaries or movies – usually action movies. Recently I was watching the Spielberg classic Jurassic Park. Between all the dinosaurs and running around, I was getting a kick out of the “futuristic” technology they were using to run the park, and I caught myself longing for the days of more reliable software. I mean, sure the park went offline, but that’s because Dennis Nedry put some malware into the system. It didn’t crash. In fact, it did every single thing it was supposed to do without a hitch, and it was running a theme park that “spared no expense.” Of course I don’t think software quality was Spielberg’s top plot point, but it’s what I took away from the film. That, and velociraptors are jerks.

Look at this thing. Total bastard, amirite?

After some reflection about my fixation on the tech, I came to realize that I’m getting more pessimistic about software quality and reliability in 2021 than I was in 1995 – and then I thought about how crazy that is, and yet it’s true. In 1995 I got hold of a copy of Windows 95 (if you don’t remember that then show this blog to your parents and ask them how operating systems used to come on CDs) and I installed it without hesitation. And you know what – it worked. I didn’t have any problems. The same thing was true with Service Packs that came out; I’d just toss those things on the second I could get ahold of them – and they worked. Even when I got into IT professionally, I rarely found bugs except in really esoteric situations where I could understand why they weren’t caught in testing.

I just don’t think that’s the case now; from my vantage point, software seems to be buggier than it used to be. I’m not exempting cloud or SaaS solutions from this, either. Far from it – I think they may actually be even less reliable from a customer perspective than IT-installed applications because at least when apps are installed by an IT team someone likely is testing out your actual use case before it’s deployed to production or pushed out to laptops across the company.

In fact, just last week Microsoft suffered a major outage that effectively shut their 365 cloud down. And then later in the week, reports started coming up of a possibly related bug in OneDrive and SharePoint that was just deleting files, which strikes me as a big deal given that SharePoint’s primary draw is that it, you know... holds files. These things are bad enough in a vacuum, but if you’ve talked to anyone who manages a large (over 20,000 users) M365 environment you’ll hear pretty often that “weird stuff” happens that no one can explain.

Sometimes stuff just goes wrong and I guess we’re okay with that now?

Look, I’m not throwing shade at Microsoft. As much as I am a huge proponent of VMware, they are not immune either. Very recently, their anxiously awaited update to their core virtualization platform vSphere was released. Almost as soon as ESXi 7.0 Update 2 was released, they had to pull it and recommend customers not upgrade due to a fatal error. And you can almost set your watch to complaints about a new version of iOS for a few hours after it releases. When 14.4 was released, some people just got dumped off WiFi – no big deal for a mobile device, right?

All three of these major software companies are giants in the industry, and yet fell victim to major, show-stopping issues in their most important and highest-profile offerings. These aren’t tiny little widgets that are being offered out of the back door of the company – they’re massive multi-billion-dollar applications and product lines. And yet, here we are with major issues immediately after release. You simply couldn’t run Jurassic Park like this; the dinosaurs would be eating the tourists even without Nedry’s malfeasance.

By the way, it turns out that I’m not alone in this observation. I launched two extraordinarily unscientific polls recently - one on Twitter and one on LinkedIn - and both came back with overwhelming sentiment that software is getting less reliable. In fact, they both came back close to 3:1 saying that people feel software is in fact less stable than it used to be. Just look at the results, here and below. I don’t think that sentiment is just misguided frustration; I think there’s real-life experience going into it, and I think we need to examine it a bit for the health of our industry and then see what we can take away for us - even if we don’t run a software company.

Let’s start with trying to figure out some of the reasons it seems like software is less reliable than it used to be. In fact, I may go so far as to drop the “seems like” and just rest on the notion that software is genuinely less reliable than it used to be. So why is it like that and how did we get here? I think there are three distinct causes of increasing displeasure with software stability: complexity, time pressure, and perspective.

I’m not alone in this sentiment. Software QA doesn’t seem to be as trustworthy as it used to be.

Let’s start with the easiest to digest for most people, which is complexity. Simply put – applications are just doing a heck of a lot more than they used to. Back in the day, you’d write one application that would work on one platform (say Windows 95) and that was pretty much it. Today, more diverse platforms mean more to compensate for and code around. If you want to have feature parity between the Android and iOS versions of an application and have the website work the same way and maybe drop a fat client on Mac OS and Windows - you’re just covering more surface area. This is compounded when you talk about enterprise technology; the variety of servers and their various components, storage arrays, and network topologies means an effectively infinite array of possibilities that developers have to account for – and that’s a tall order.

Put simply, more complex applications mean a larger codebase. There’s no question that even ubiquitous applications like Outlook are just more complex than they used to be. These more complex packages require larger development teams to write them all. That of course means more collaboration is required among more people that have to stay coordinated and introduces the possibility of confusion and miscommunication.

That’s all simple to understand, but the question that just keeps nagging at me is: shouldn’t we be getting better at developing software than we used to be? I mean, it’s 2021 – shouldn’t we be masters of writing excellent code because we’ve had thirty years of practice since the early 90s? Well, apparently not. As it turns out we may have made things worse because we have decided to chase this ideal of continuous delivery of code. In order to get changes shipped out to production as quickly as possible, most software companies have invested heavily in automated testing and I don’t think it’s as deep or thorough as it needs to be to catch all of the issues with new code – even really big, really visible issues.

Ultimately, I think that software management has become a race as to who can ship code faster instead of who can ship the best code. It seems that this is especially true in the VC space, and I’m starting to wonder if there is an echo chamber happening in software management that faster is better at all costs. Maybe faster feature rollouts really should be the goal if we are talking about a web app that helps you organize your shopping list; if that breaks, your life may not be impacted, but if it happens more than once - will you look for a different shopping list application? Maybe. Isn’t that ultimately bad for the developer? Come to think of it, I’ve never seen a review on the app store that says “so glad they release new versions so fast,” but I have seen a ton that say “this thing is totally unstable.” I think that the entire industry has over-rotated away from stability and toward faster and faster continuous delivery. Simply put, there isn’t enough emphasis on quality control before the code ships. That takes time to do, and time is the one thing that software companies seem to not want to spend on development as much as they used to.

How could code possibly be less reliable when it’s this easy to read and debug?

Now, I have an idea that some of you are thinking that we don’t have a choice anymore to go slower. Today, we need to react near-instantly when a security issue is found because bad actors will exploit it on day zero, so we can’t spend a week testing. Then you may want to say to me that the patches of yesteryear came out annually or even less frequently and that would be suicide today - and you’re right. I get it, and I have been a vocal proponent of rapid response to security patches both in my day job and as a technology evangelist. It doesn’t change the fact that I get nervous when I install them these days because I know they were rushed out the door, even if out of necessity. It doesn’t mean that the feature rollouts should be held to a completely different standard. Realistically, I do not need to put emojis in my work email so urgently that I’m willing to suffer calendar instability to get it. And before you ask: yes, that was a real issue.

Finally, before I run out of space to write in (since I promised to keep each entry brief in my very first post), I think the third reason that software feels less reliable is that we all have a different perspective on software quality than we used to have. Software used to be an addition to our life, not the central component of it. If our work computer crashed, we could get other stuff done while the computer was patched or whatever. Now, there is no other work. If we don’t have access to the software that we rely on day in and day out, our whole world is shattered. Look at the pandemonium we get when Twitter or Facebook is down – and these applications are at best superfluous. In fact, I think many people have made the case that in some cases they erode the quality of your life, but we freak out all the same when they go down.

Nowhere is this sentiment truer than for those of us who work in technology. I run the client services teams at the university I work for, and when bad software makes it out to my internal customers we hear about it – and have to work the front lines of getting people back to stable ground even when we didn’t push the update, patch, or upgrade in question. Our customers have an expectation that their systems are going to work without a hiccup, and when they do hiccup, I can tell you with absolute certainty that they do not “take it in stride” because they are utterly reliant on those systems to get their work done.

Marc Andreessen said that “software is eating the world,” and looking back from our vantage point at the end of Q1 in 2021 he’s absolutely right. Especially during the pandemic, software and the systems it controls have been our primary connection to the world. When they become unstable, we are well and truly disconnected from the people we love. We have become dependent upon it, and yet at the time when we seem most dependent upon it being stable - it seems to be less and less able to rise to the challenge.

Stay tuned for Part II next week, where I’ll post some thoughts on what we in IT can do about this, and how we may be guilty of some of it ourselves.

 

For now, some questions for reflection:

  • Do you agree that software is generally less reliable than it used to be?

  • When is the last time a software bug ruined your day? Could you work around it?

  • Do your non-tech friends take outages and bugs in stride?
