Is Software Getting Worse? (Part II)
Last week, I spent a lot of time talking about why I think software quality seems to be getting worse. If you haven’t read that post, I’d encourage you to do so, because this one presupposes that you have either read it or are generally familiar with software development and the challenges in that space.
To continue the conversation: does any of this matter to you if you are not employed by a software development organization? Is this a problem you have a stake in? I think it does and is, and I think we need to be honest and vocal about it for two reasons. First, it impacts you as a user. If you’re an enterprise IT pro, this impacts you every single day. In fact, if you work in any capacity for a company that uses computers for any task (which is effectively every company), this likely impacts you routinely. Second, I’ve already seen that some of the pressures we talked about last week are having a similar impact on my infrastructure team. I want to unpack both of these things in the paragraphs ahead, and then I’d love to hear what you think. Leave a comment here, or on whichever social media platform you found this post, with your perspective.
Let’s first deal with my perspective as a user and customer of a great many software products. Chances are you have a similar perspective, since you’re reading this on some sort of digital platform. Where the rubber hits the road for me on this topic is the user experience. Ultimately, if we or others rush changes to our apps out the door before they are fully baked, users get shortchanged with buggy software that they quickly learn they cannot depend on. Just recently, I was picking up some honey at a very small local honey business and overheard the owner complaining that the iPad-based point-of-sale system they were using was crashing again “ever since the update yesterday.” Sure, it’s not a global outage to Azure, but for that honey farmer it was a problem: he couldn’t accept any payment other than cash until it was fixed, and that cost him real revenue. I watched a customer who didn’t have any cash walk out.
This is true internally as well, right? Even if you aren’t selling your software on the market for real money, if you’re writing and providing something, you’re doing it to serve, support, or satisfy a need of your organization. For instance, if you work in a warehouse and someone in your company built a small, homegrown application on some no-code platform that handles the inventory for your team, they may not be directly selling that solution, but they certainly have internal customers (like you), and I hope their goal is to keep folks like you satisfied with it. If they pushed an update or patch that ended up randomly removing items from your inventory, you’d let that happen maybe once or twice before you realized you couldn’t rely on the solution anymore.
My overarching argument here is that if the creators of apps and systems care about customers’ loyalty and continued support, whether measured in cash revenue or high customer satisfaction scores, they absolutely should not want us to learn that we cannot trust what they’re putting out. And yet, it doesn’t seem like many of the larger organizations care much about reliability, because if they did, it would be getting better rather than worse. I’m a big believer that if a company or leader really cares about something, the output will ultimately reflect that. If you own a restaurant and you care about cleanliness, it will be a clean restaurant. So, why don’t they care? I think it’s because we’re buying their stuff anyway. We keep paying them and using their products, so there’s just no incentive to get better.
Now, let’s be real for a second – this may not impact a company with absolute dominance in its space, where substitutes are difficult to come by or undesirable. For instance, let’s say that Office 365 has enough issues that you know you cannot trust it. Most enterprises are still very unlikely to switch to Google Apps for productivity software, and even less likely to adopt something like OpenOffice. Look, not to get all Porter’s Five Forces on you, but Microsoft, AWS, and several other companies have a massively defensible position, and that’s by design. In some cases, even if the software or service you’re using is terrible, you still don’t have a ton of other viable options. This is true even for the home user. If you don’t believe me, go read reviews of Quicken online. People hate that app, but they use it (and keep paying for it) anyway, because the next best thing is a web app that does half as much as Quicken.
So, what do we do? Well, for one, we need to be less hesitant to switch, and we need to drive harder bargains at the time of purchase. One of my favorite models for contracts is the one often used for public works or large capital projects like new buildings, where the contractor is on the hook for financial penalties if they miss a deadline. I don’t know why we can’t start working towards models like this in software contracts. If a vendor causes an interruption to my business because of their unstable software, I think that vendor should be liable for it. Why is it always exclusively my problem when I’m the paying customer? If a plumber came in and didn’t solder my copper lines properly and they leaked right after the plumber left, they’d be liable for that under their insurance. That’s why you should always find licensed and insured contractors. Why do we accept less from software vendors whose failures can literally stop our companies from functioning, when all we can do is call support and beg for help? Don’t get me started on the current state of tech support, by the way (that’s a whole future topic).
The second issue is directly related to the increased frequency of updates and feature releases, which many people have identified as the primary reason less QA goes into software. As we talked about last week, there is constant pressure to release updates and patches faster and faster to meet some unseen market force, as if people would give up on your company forever if you don’t update your app nightly. While I don’t buy into that premise, the reality is that this force is pushing deeper and deeper into organizations – and not just software development groups.
One of the projects that I’m proudest of at my university is our rollout of a virtual workspace for our students, which leverages VMware Horizon. It has been a great resource for our students to get their work done when they are not on campus or near the labs. While this isn’t exactly software development per se, we do run sprints as we modify the virtual workspace for our students. And even in this small example, we’ve seen the pace of changes increase substantially, which means we have less time to QA our workspaces and images before they get rolled out. The same is true for those of us who manage larger infrastructure. It seems that as soon as you get one critical patch rolled out, two more pop up right behind it, and we just don’t have time to test everything fully before we roll it out. So, what on earth do we do to maintain stability?
This is one area where a strong change management process works. I’m generally not one to advocate for additional process structure for its own sake, but in this case I think it’s absolutely beneficial. We have found groups of users willing to join our early adopters program. These folks are willing volunteers and know that they are going to get a rougher or “bumpier” experience than our general user population. However, we incentivize them by offering a priority support experience, so they can effectively “cut the line” when they have issues. They appreciate the higher service level, and we appreciate the more timely notice of issues, since they may be running something slated for a full campus rollout in the coming hours or days.

The net effect is a phased rollout to the entire organization. The process is actually pretty smooth and clean. First, we roll out to our “red” users, folks willing to live on the bleeding edge who usually have enough comfort with technology not to panic if something goes wrong. Then we release to our “yellow” users, who want some distance between themselves and the bleeding edge, and finally to our “green” users, who like to play it safe but still want early access. Each ring is larger than the last, so we ramp up instead of smacking everyone with an update at the same time. Once the green ring has spent some time with it, we release it to the entire campus.
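The ring process above is simple enough to sketch in a few lines. This is a minimal illustration, not our actual tooling: the ring names match the post, but the `deploy` and `healthy` callbacks are hypothetical stand-ins for whatever deployment and issue-tracking systems an organization already has.

```python
from dataclasses import dataclass

@dataclass
class Ring:
    name: str
    users: list  # user IDs in this ring; each ring is larger than the last

def phased_rollout(update_id, rings, deploy, healthy):
    """Push an update through rings in order, halting the ramp on trouble.

    deploy(update_id, users): pushes the update to one ring's users.
    healthy(update_id, ring_name): True if no blocking issues were reported.
    Both are assumed callbacks supplied by the caller.
    """
    completed = []
    for ring in rings:
        deploy(update_id, ring.users)
        if not healthy(update_id, ring.name):
            # Stop here: later, larger rings never see the bad update.
            return {"status": "halted", "failed_ring": ring.name,
                    "completed": completed}
        completed.append(ring.name)
    # Every ring soaked the update without incident; release broadly.
    return {"status": "released", "completed": completed}
```

The key design point is that a failure in a small early ring blocks the larger rings entirely, which is exactly the "timely notice of issues" benefit the early adopters provide.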
We’re proud of our phased rollout process. However, we also manually test larger updates before they even reach the red ring. Yes, this is less efficient than automated testing. Yes, it does introduce a delay into the rollout, and yes... it is worth the delay to have more stability. We catch a lot of interoperability issues by having a human being put an image or patch through its paces before we put it in front of customers. I would strongly encourage you to adopt something similar in your organization, because it has made our overall environment more reliable.
Also, if your company, team, or organization is under considerable pressure to roll out software (or anything else) in an ever-accelerating manner with inadequate quality control, you may need to speak up. Not all problems can be solved by a technical solution; sometimes it’s about advocating for a philosophical adjustment to what is truly important. Speak up to leadership and let them know that you are seeing quality decline, and that declining quality can lead to lower customer satisfaction, lower retention, and lost revenue.
So where does this leave us? Candidly, if we’re not fed up with the status quo – it may not mean anything in terms of action. If we’re okay with the discomfort of unreliable systems and software, why would the vendors want to do anything about it? We’re still happy customers! If, however, we have reached a turning point where we don’t want to just live with it anymore, we need to be vocal with vendors, hold them accountable for outages and/or problems, and make them feel the discomfort when they drop the ball – especially when it was not due to a security issue that required immediate remediation.
Years ago, I watched a documentary about farming where one of the farmers was being interviewed about growing using conventional methods and why he didn’t switch to organic farming. His response has stuck with me for over a decade: “The average shopper is telling us that they want cheap, plentiful food. If the market was demanding more organic food at higher prices, trust me – farmers are resourceful people – we’d make it happen.” I don’t think it’s that much different with the tech industry.
At the end of the day, we’re all in this together as an ecosystem of vendors, partners, and consumers of the solutions. I have confidence in the ability of software companies to make more reliable software – we just need to make it clear to them that we’d prefer that to a faster release of the next feature update. They’re resourceful people. They’d make it happen.
Questions for reflection:
Can you think of a software vendor with whom you have enough leverage to negotiate terms that encourage reliability?
Have you felt pressure yourself to deliver faster results possibly at the expense of accuracy or reliability?
Are you willing to speak up to leadership that quality shouldn’t be an afterthought? How can you start that conversation?