Software Architecture Evolution: From Mainframe to SOA and Beyond

    To become an expert in modern continuous testing practices, you’ve got to understand a little about the systems the software is built to run on. This post looks at how we’ve moved from the mainframe to SOA, and what that means for continuous testing.

    Brief History of Software and Hardware

    We learn from everything we build, and software architecture is no exception, so there’s a good chance no engineer has ever built something exactly the same way twice. However, enterprise systems, once built, tend to last a long time. So despite advances in hardware and growing business complexity, many large systems run a mix of architectural approaches, each reflecting the best thinking at the time it was built. That doesn’t make the job of testing any easier. But with a little background, it’s easy to at least understand how things came to be the way they are.

    Monolithic architecture started with “the mainframe”: a monumental box that literally weighed a ton (or more) and lived in a special room filled with whirling fans, reels, and platters. Mainframes cost a fortune, had limited inputs, and businesses didn’t have many of them; at first, typically just the one. It would manage your bank’s ledger or other mission-critical tasks that were fairly simple and infinitely complicated at the same time. It was your source of truth, and it was very good at that one thing it did… so naturally, more and more of the business wanted to use it. While you may have started with only the ability to process your bank’s transactions, over time the machine became a tool to manage employee payroll or serve up sales data to your marketing teams.

    Companies kept putting more and more features onto their mainframes, and the only downside was that all of the technological eggs were in one basket. When you wanted to make a change to the system, you had to bring your entire business to a halt to do it. As the machines grew in complexity and importance, and as companies built mission-critical workflows and processes around them, they became more and more difficult to update and maintain. Fixing a bug or releasing a new feature meant downtime for unrelated systems. Businesses wanted the system to do more, faster, but it had grown too large and heavy to respond quickly to their demands. Catch-22 if ever there was one.

    As technology costs fell and data storage and retrieval capabilities grew, the solution was to just buy more mainframes. They were a little smaller now, so none of them were quite as “main” as they used to be, and we just started calling them “servers.” These servers weren’t designed to replace the mainframe, often just to augment it: we’d spin off the less-than-mission-critical systems onto their own servers. And so, by throwing more and more servers at the issue, businesses solved the problem once and for all! That’s how solutions work, right? Falling costs certainly helped mask the problem, and soon every business had a handful, a roomful, or a building full of servers. This let companies segment servers by function and by division, and keep backup systems that could be rolled over to in the event of a failure. It wasn’t a total cure, but it meant that tripping up on one task wouldn’t ruin all the others. It did carry the small disadvantage that the overhead of operating many servers was, in fact, many times greater than the overhead of running just one.

    A quick note on hardware: the names we give computers are largely irrelevant. Fundamentally, a mainframe or server isn’t any different from a workstation, desktop, laptop, or phone: CPU, RAM, and storage, all connected to a motherboard. Really, the only difference between an Apple iPhone and an IBM z15 mainframe is that one is built to fit in your pocket. (Fun to see what young technologists do with old technology!)

    The Old Way Became Too Expensive

    The upkeep on all those servers skyrocketed. And these systems, each built to stand by itself, didn’t tend to play well with others. So when you wanted to link payroll to time sheets (something you could probably do back when everything was on one big mainframe), you couldn’t, because everything was now split across a bunch of highly disjointed systems. Each new server was designed to reflect the best thinking at the moment it was built, but with each advancement you’d fix one thing and make it harder to connect another. This only got worse once you factored in mergers and acquisitions.

    Let’s say you were a healthcare company, and you bought one of your regional competitors. Suddenly you had two databases with two dramatically different patient tables. (We’ll keep the examples simple.) One had name, city, state, and zip; the other had first name, last name, and hometown. Each system had its own user interface and its own business logic for what had to happen when someone updated a record, and each integrated with countless other systems for billing, staffing, and whatever else, with staff at every level of the organization already trained on those tools. Simply mandating that everyone switch from one to the other would be hugely expensive. What do you do?


    Well, you start off by writing some translators. You write a little piece of code that runs on yet another server and acts as a bridge between the first two. It painstakingly hooks into both databases, cleans up the data, and keeps it all in sync while triggering each system’s business-logic validation as it goes. When someone gets married and updates their name, a little script merges the first and last name into the single name field, then fires off the appropriate update actions, notifying doctors, hospitals, or billing departments. And that works. Except… once again you’ve just inflated your maintenance budget.
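    To make the translator idea concrete, here is a minimal Python sketch of just the name-change path. It assumes two hypothetical record shapes held in memory rather than in real databases: `RecordA` with a single `name` field and `RecordB` with `first_name` and `last_name`. The function names and the notifier callbacks standing in for doctors, hospitals, and billing are illustrative only, not taken from any real integration product.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical patient-record shapes for the two merged systems (illustrative only).
@dataclass
class RecordA:
    patient_id: str
    name: str  # System A stores one combined name field
    city: str
    state: str
    zip_code: str


@dataclass
class RecordB:
    patient_id: str
    first_name: str  # System B splits the name in two
    last_name: str
    hometown: str


# A notifier receives (patient_id, new_name); stands in for doctors, hospitals, billing.
Notifier = Callable[[str, str], None]


def merge_name(rec_b: RecordB) -> str:
    """Translate B's first/last name into A's single name field.

    Assumes "First Last" order; middle names and suffixes are already a guess.
    """
    return f"{rec_b.first_name} {rec_b.last_name}".strip()


def sync_name_change(rec_b: RecordB, rec_a: RecordA, notifiers: List[Notifier]) -> bool:
    """Push a name change from System B into System A, then notify downstream systems.

    Returns True if System A's record was changed.
    """
    new_name = merge_name(rec_b)
    if new_name == rec_a.name:
        return False  # nothing changed; don't re-fire notifications
    rec_a.name = new_name
    for notify in notifiers:
        notify(rec_a.patient_id, new_name)
    return True


# Example: a patient updates their last name in System B.
events = []
a = RecordA("p-001", "Michelle Smith", "Springfield", "IL", "62701")
b = RecordB("p-001", "Michelle", "Jones", "Springfield")
sync_name_change(b, a, [lambda pid, name: events.append((pid, name))])
print(a.name)  # Michelle Jones
```

    Even this toy version has to guess at name order, ignores middle names, suffixes, and the reverse direction (A to B), and must be revisited whenever either schema changes. Multiply that by every field and every integrated system and you get the maintenance burden a translator brings with it.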

    Worse than the rise in cost is the rise in complexity. Instead of one source of truth, you’ve potentially got three (System A, System B, and the translator), held together by the code equivalent of duct tape. People still need to make updates in both applications, but now every time you change either system, you also have to update the translator. And translators are notoriously buggy, since they’re almost always written by third-party system integration consulting firms who love to charge by the hour. So you’ve got a huge, complex tangle that’s expensive to maintain and only sort of works. Until it doesn’t, and when something breaks, good luck sorting it all out. Tangled systems like these are what kept firms like Accenture, Deloitte, EY, KPMG, and PwC in business.

    And we’ve all seen the results of systems like this in action. Think of the time someone entered your name or address wrong when you started a new job or opened a bank account. The system had a script that took your data and shared it with other systems to mimic “integration” so you wouldn’t have to re-enter your details, but now, no matter how many times you fix the mistake, there’s always one more place to fix it. Worse, every so often the sync tool runs, decides the mistake is the source of truth, and re-breaks everything, until it’s probably easier for everyone if you just change your name from Michelle to Mitchelle, or get to know the people who moved into your old apartment so you can still pick up your payroll tax forms.

    Simple is Beautiful, Simple is Easier, Simple is Cheaper Too

    Thomas Paine gave some advice that helped incite the American Revolution, and it’s still good advice for guiding business and software architecture evolution today: “[T]he more simple any thing is, the less liable it is to be disordered, and the easier repaired when disordered…” Enter service-oriented architecture (SOA) and the application programming interface (API). Instead of one server, or a pile of servers wrapped in duct tape, you now set up multiple small “microservices,” each of which can be called to do one little thing. Need to update your name? It’s as easy as calling the API for that function from your web interface, your mobile app, or any other interface you choose to give access to. Your database remains the source of truth, and you can easily tie other systems to it.

    None of these services are large or complicated, and that’s the beautiful part. Modern software architecture strives to keep each piece as small and reusable as possible. And since each piece is small, it’s almost inherently simpler, with less business logic packed into each chunk of code. That makes it easier to learn, maintain, repair, and extend. Developers don’t have to memorize the workings of a huge machine with lots of moving parts; they only have to focus on one little cog at a time. APIs can be reused everywhere their logic is needed, which makes systems easier to extend, scale, and integrate with, and easier to update without system-wide downtime, improving both dependability and release speed.

    Another hands-on example: let’s say you’re working with a large national telco. You offer a lot of services to your customers: cell phone plans, internet, streaming video… all the way through to network infrastructure, video conferencing, and consulting services for your business clients. You probably have a hundred different independent systems, each with tens to hundreds of events that trigger emails to your customers: a bill is ready, a bill is paid, a bill is late, an account password was changed, a new promotion is available, and so on. Then someone in your marketing department decides to change your logo or brand design (that’ll happen about every two to three years on average). Do you want to manually change every place that generates an email? Keep in mind that each system probably uses different markup languages and layout frameworks, and that even identifying all the places that need changing is an endless testing nightmare.

    Or do you want each of those places to make a RESTful API call telling a standardized email generator service which template, addressee, subject, and body copy to use? Hint: you don’t want the old way. The old way is really painful. With an API-based system, you can store all of your email templates in one place, and you only have to make each change once per template. You can also easily generate preview emails for each template, so you don’t have to run through every possible situation that would trigger an email just to see what it would look like. And maybe your marketing department also wants to change how emails are sent, but doesn’t tell you until after you’ve finished the template migration. That’s fine. Perhaps they want to send through a different service provider, ensure that no marketing email goes out over the weekend, or use a different analytics platform. Cool. With the service in one place, it’s no big deal: just update the mailer service as needed. One spot. The effort goes from several tedious, boring months to mere minutes (well, days, at any rate). That’s the benefit of SOA.
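    As a rough illustration of the centralized approach, the following Python sketch models the email generator service in-process. In a real deployment, `send` would sit behind an HTTP endpoint that each system calls; that layer is omitted here. The template names, the `EmailService` class, the `transport` callback (standing in for whichever delivery provider you use), and the weekend rule are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from html import escape
from string import Template
from typing import Callable, Dict, List, Optional


@dataclass
class Email:
    to: str
    subject: str
    body_html: str


# Hypothetical central template store: a rebrand means editing these strings once.
TEMPLATES: Dict[str, Template] = {
    "bill_ready": Template("<header>$logo</header><h1>$subject</h1><p>$body</p>"),
    "promotion": Template("<header>$logo</header><h2>$subject</h2><p>$body</p>"),
}
MARKETING_TEMPLATES = {"promotion"}  # hypothetical: which templates count as marketing


@dataclass
class EmailService:
    transport: Callable[[Email], None]  # switching providers means swapping this one function
    logo: str = "ACME Telco"  # placeholder brand name
    held: List[Email] = field(default_factory=list)  # marketing mail held over the weekend

    def render(self, template: str, to: str, subject: str, body: str) -> Email:
        """Build an email without sending it; doubles as a preview for testing."""
        html = TEMPLATES[template].substitute(
            logo=escape(self.logo), subject=escape(subject), body=escape(body)
        )
        return Email(to=to, subject=subject, body_html=html)

    def send(self, template: str, to: str, subject: str, body: str,
             now: Optional[datetime] = None) -> bool:
        """Render and send; returns False if the email was held instead of sent."""
        email = self.render(template, to, subject, body)
        now = now or datetime.now()
        # Sending policy lives in one place: no marketing email on Saturday or Sunday.
        if template in MARKETING_TEMPLATES and now.weekday() >= 5:
            self.held.append(email)
            return False
        self.transport(email)
        return True


# Every calling system makes the same single call; here the "provider" is a list.
sent = []
mailer = EmailService(transport=sent.append)
mailer.send("bill_ready", "pat@example.com", "Your bill is ready", "Amount due: $42.00")
```

    Because the templates, the provider, and the weekend policy each live in one place, a rebrand or a provider switch touches only this service; the hundred calling systems keep making the same call.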

    From a testing perspective, imagine the savings: test one place, or re-test every corner of your app to generate every possible email… every time you deploy. This is why manual testing takes so long, and why so much software gets released without being adequately tested. The hefty cost of manually testing these old systems will slow any team to a crawl, and it ends up being a lot easier to just never make any changes and hope and pray everything still works. That’s terrifying, though, since you need your code to be stable and dependable but also adaptable to your current needs. Even with automated testing, you likely have code going out where nobody really knows the true impact, because nobody has had time to run all the tests needed to cover everything. Can you upgrade a dependency with a known security issue? Can you fix that bit of CSS? Or are you afraid to?
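    Here is what “test one place” might look like: a small pytest-style test, assuming a hypothetical central template store like the one a mailer service would hold, that renders a preview of every template with sample data and checks that each one carries the current logo. The template names, logo markup, and sample data are made up for illustration.

```python
from string import Template

# Hypothetical template store and sample data, for illustration only.
TEMPLATES = {
    "bill_ready": Template("<header>$logo</header><p>Hi $name, your bill is ready.</p>"),
    "bill_late": Template("<header>$logo</header><p>Hi $name, your bill is overdue.</p>"),
    "password_changed": Template("<header>$logo</header><p>Hi $name, your password was changed.</p>"),
}
CURRENT_LOGO = '<img src="logo-new.svg" alt="ACME Telco">'
SAMPLE_DATA = {"logo": CURRENT_LOGO, "name": "Test Customer"}


def preview(template_name: str) -> str:
    """Render one template with sample data; no real event has to fire."""
    # substitute() raises KeyError if the template needs a value SAMPLE_DATA lacks.
    return TEMPLATES[template_name].substitute(SAMPLE_DATA)


def test_every_template_uses_current_branding():
    # One loop covers every email the system can send, instead of
    # clicking through every workflow that might trigger one.
    for name in TEMPLATES:
        html = preview(name)
        assert CURRENT_LOGO in html, f"{name} is missing the current logo"
        assert SAMPLE_DATA["name"] in html, f"{name} did not render the addressee"
```

    pytest would collect the test function automatically; calling it directly works too. Adding a new template automatically adds it to the test, so coverage grows with the template store rather than with the number of systems that send email.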

    It’s not hard to see why people are afraid. To put it another way: “Do you want to do 1x the effort, or 100x the effort? Do you even know how much effort it would take, or what the risks are?” Do you have to connect to your old monolithic mainframe, and are all the services and features there documented in a way you understand? If your code touches integrated systems, will you have to let other teams or partners know about these tests? Is there a cost to running a test against code that uses a third-party service or API? Do you have to hit production to run the test (you know, the same server your real customers are using), and will your testing affect their experience? And keep in mind, these questions apply not just to the build but to maintenance too. Continuous testing can help with all of these things, but setting it up well generally takes some thought and planning. Be wary of people trying to push in too much, too fast; it’s just like paying for things with a credit card. That’s OK once in a while, but every few months you have to stop eating out in order to pay down your bills. Software development works the same way: we can rush for a while, but every fourth or fifth sprint should be dedicated entirely to paying down technical debt, or the interest and penalties will start to kick in.

    Even though everyone is on board with SOA and knows the benefits, just about everyone still has legacy systems in play too. We rarely get the chance to redo everything from scratch, and even when we do, those projects take time. So in the interim, while we wait patiently until we can “go Office Space” on some dinosaur accounting tool we’re forced to integrate with, it’s important to remember that the system has managed to limp along and get us where we are today. (Those systems aren’t going anywhere, and learning COBOL is still a promising career path.) It often makes a person wonder, though: did whoever architected that old system really think we’d still be using it so many years later? Or were they just in a rush, worried about a single quarter’s budget? There’s a good lesson there about total cost of ownership. When you architect any system, sure, remember you’re on a budget, but also try to be kind to your future self: build it in a way that keeps technical debt down and that you’ll want to maintain. As any engineer will say, “There’s nothing more expensive than a cheap fix.”