Wednesday, February 27, 2013

Fractured Product Syndrome

How do you sell your software product? Do you fork the source code of your previous customer’s system, modify it a little, and deploy it as its own instance with its own database? Maybe each customer even has their own server? Their own set of servers?

You are suffering from the Fractured Product Syndrome.

Let me tell you a story.

Colin and Alex had a little web development consultancy called Coaltech (get it?). They were pretty handy with ASP.NET and SQL Server and already had a respectable list of satisfied clients when a local car hire company, Easy Wheels, asked them to build a car booking system. It would keep a list of Easy Wheels’ cars and allow customers to view and book them online.

Colin and Alex signed a contract with Easy Wheels and got busy. Their “just do it” attitude to getting features done meant that the code wasn’t particularly beautiful, but it did the job, and after a couple of months’ work the system went live. Easy Wheels were very happy with their new system, so happy in fact that they started to tell other independent car rental companies about it. A few months later Colin and Alex got a call from Hire My Ride. They loved what Coaltech had done for Easy Wheels and wanted a similar system. Of course it would have to be redesigned to fit in with Hire My Ride’s branding, but the basic functionality was pretty much identical. Colin and Alex did the obvious thing: they took a copy of the Easy Wheels code and tweaked it for Hire My Ride. Naturally, they charged the same bespoke-software price for Hire My Ride’s system.

Soon a third and fourth car hire company had asked Coaltech for their ‘system’. Each time, Colin and Alex took a copy of the last customer’s code – it made sense to take the most recent code because they inevitably added a few refinements with each customer – and then altered it to match the new customer’s requirements. Before long they had arranged a deal with another web agency to take over all their non-car-hire work and decided to concentrate full time on the ‘system’. It needed a name too; they couldn’t carry on calling it ‘Easy Wheels’, and they certainly couldn’t market it like that. Soon it became ‘Coaltech Cars’, with a new marketing website. They also found that they couldn’t keep up with demand with just the two of them doing customer implementations. Each new customer took around six weeks of development work, with the inevitable to-and-fro of slightly different requirements and design changes. To help meet demand they started to hire developers: first Jez, then Dave, then Neville. They all modified the software in slightly different ways, but it didn’t seem to matter at the time.

Fast forward five years. Coaltech now has 50 employees and around 50 customers. It’s fair to say that they are the leading vendor of independent car rental management systems. The majority of their staff are developers, although they also have a small sales, HR and customer relationship management team. You would have thought that Colin and Alex would be happy to have turned their little web development company into such a success, but instead life just seemed to be one long headache. Although the company had grown, it always seemed hard to turn a profit, and customers often balked at the price of the system. The same bug would turn up again and again for different customers, and there never seemed to be enough time to fix them all. Even when they had delivered a new feature for one client, it always seemed to take just as long to implement it for another, sometimes longer, depending on which codebase that client had been forked from. The small team that looked after the servers was in constant firefighting mode and got very upset when anyone suggested ‘upgrading’ any of the clients – it always meant bugs, downtime and screaming customers. And then the government changed the Value Added Tax rate. Colin had to cancel his holiday, and they lost two of their best developers after several weeks of late nights and no weekends while they updated and redeployed 50 separate web applications.

The end for Coaltech Cars came slowly and painfully. Alex had the first hint of it when he was made aware of a little company called RentBoy. They had a software-as-a-service product for small car hire companies: to use it you entered your credit card number, a logo and a few other details, and you were good to go. They weren’t any immediate competition for Coaltech Cars, having only a small subset of the features, but they soon captured the low end of the market: the one- or two-man-band car hire companies that had never been able to afford Coaltech’s sign-up or licensing fees.

But then the features started coming thick and fast, and soon Coaltech found they were losing out to RentBoy when bidding for clients. Colin found an article on the High Scalability blog about RentBoy’s architecture. They had a single scalable application that served all their customers: one place to fix bugs, one point of scalability and maintenance. They practised continuous deployment, allowing them to quickly build and release new features for all their customers. The company had four employees and already more customers than Coaltech. They charged about a tenth of Coaltech’s fees.

Coaltech’s new client revenues soon dried up; they’d always made a certain amount of money from sign-up fees. Too late, they realised that they had to start shedding staff, always a painful thing for a small, closely knit company. The debts mounted until the bank would no longer support them, and before long they had to declare bankruptcy. Luckily, Colin and Alex managed to get jobs as project managers in enterprise development teams.

The moral of the story? Avoid Fractured Product Syndrome if you possibly can. Forking the source code for each new customer appears by far the easiest thing to do at the start, but it simply doesn’t scale. Start thinking about how to build multi-tenanted software-as-a-service long before you get to where Coaltech got to. Learn to say ‘no’ to customers if you have to. It’s far better to have a large number of low-value customers on a single platform than a smaller number of high-value ones on a fractured one. It’s also much easier for a low-value volume software provider to move into the high-value space than for a high-value provider to move down.
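
To make the contrast concrete, here’s a minimal sketch of what multi-tenanted data access can look like on the ASP.NET/SQL Server stack the story assumes. All the names (the Booking table, the TenantId column, the classes) are hypothetical; the point is that one codebase and one database serve every customer, with a tenant ID scoping each query:

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

// Hypothetical multi-tenant data access: one schema, one deployment,
// every row tagged with the tenant (customer) it belongs to.
public class BookingRepository
{
    private readonly string connectionString;

    public BookingRepository(string connectionString)
    {
        this.connectionString = connectionString;
    }

    public IEnumerable<Booking> GetBookingsFor(Guid tenantId)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Id, CarRegistration, CustomerName " +
            "FROM Booking WHERE TenantId = @tenantId",
            connection))
        {
            command.Parameters.AddWithValue("@tenantId", tenantId);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    yield return new Booking
                    {
                        Id = reader.GetGuid(0),
                        CarRegistration = reader.GetString(1),
                        CustomerName = reader.GetString(2)
                    };
                }
            }
        }
    }
}

public class Booking
{
    public Guid Id { get; set; }
    public string CarRegistration { get; set; }
    public string CustomerName { get; set; }
}
```

With this shape, branding, feature toggles and tax rates become per-tenant data rather than per-customer forks; when the VAT rate changes you update one deployment, not fifty.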

Learn to recognise Fractured Product Syndrome and address it before it gets serious!

Thursday, February 21, 2013

EasyNetQ on .NET Rocks!

Last week I had the pleasure of being interviewed by Carl Franklin and Richard Campbell for a .NET Rocks episode on RabbitMQ and EasyNetQ. It was terrific fun and a real honour to be invited on the show. I’ve been listening to .NET Rocks since it started in 2002, so you can imagine how excited I was. Carl and Richard are seasoned pros when it comes to podcasting, and the awesome ninja editing skills they possess turned my rather hesitant and rambling answers into something that almost sounded coherent.
You can listen to the show on the link below:
http://www.dotnetrocks.com/default.aspx?ShowNum=848
Now Richard, about that Tesla …

Wednesday, February 06, 2013

EasyNetQ in Africa

Anthony Moloney got in touch with me recently. He’s using EasyNetQ, my simple .NET API for RabbitMQ, with his team in Kenya. Here’s an email he sent me:
Hi Mike,
Further to the brief Twitter exchange today about using EasyNetQ on our Kenyan project: we started using EasyNetQ back in early November and I kept meaning to drop you a line to thank you for all your good work.
Virtual City are based in Nairobi and supply mobile solutions to the supply chain and agribusiness industry in Africa. African solutions for African problems. I got involved with them about 2 years ago to help them improve the quality of their products and I have been working on and off with them since then. It’s been a bit of a journey but we are getting there.
We have a number of client applications, including Android and WPF, working in an online/offline mode over mobile networks. We need to process large numbers of incoming commands from these applications. These commands are also routed via the server to other client apps.
The application had originally used MVC and SQL Server to synchronously process and store the commands, but we were running into severe performance problems. We looked at various MQ solutions and decided to use RabbitMQ, WebApi & Mongo to improve processing throughput. While researching a .NET API for RabbitMQ I noticed that you had created the EasyNetQ API.
EasyNetQ greatly simplifies interacting with RabbitMQ, and provided your needs are not too complicated you really don't need to know too much about the guts of RabbitMQ. We replaced the existing server setup in about a week. The use of RabbitMQ has greatly increased the scalability of the product and allows us to either scale up or scale out.
We are also using the EasyNetQ management API for monitoring queue activity on our customer services dashboard.
Kind Regards
Anthony Moloney
One of the great rewards of running an open source project is hearing about the fascinating ways that it’s used around the world. I really like that it’s an ‘African solution for African problems’ and built by a Kenyan development team. It’s also interesting that they’ve used OSS projects like RabbitMQ and Mongo alongside .NET. It reminds me of the Stack Overflow architecture, a .NET core surrounded by OSS infrastructure.
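
For anyone who hasn’t come across EasyNetQ, this is roughly the flavour of the API Anthony’s team is working with. It’s a minimal sketch: the TextMessage class and connection string are invented for illustration, and depending on the EasyNetQ version publishing may go through an explicit publish channel, but connect, subscribe and publish are the heart of it:

```csharp
using System;
using EasyNetQ;

// A hypothetical message type; any serialisable .NET class will do.
public class TextMessage
{
    public string Text { get; set; }
}

public class Program
{
    public static void Main()
    {
        // One connection string; no exchange, queue or binding plumbing.
        using (var bus = RabbitHutch.CreateBus("host=localhost"))
        {
            // Subscribe: EasyNetQ creates the queue and binding by convention.
            bus.Subscribe<TextMessage>("my_subscription_id",
                message => Console.WriteLine("Got: {0}", message.Text));

            // Publish: delivered to every interested subscriber.
            bus.Publish(new TextMessage { Text = "Hello from Nairobi!" });

            Console.ReadLine();
        }
    }
}
```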

Friday, February 01, 2013

How to Write Scalable Services

I’ve spent the last five years implementing and thinking about service oriented architectures. One of the core benefits of a service oriented approach is the promise of greatly enhanced scalability and redundancy. But to realise these benefits we have to write our services to be ‘scalable’. What does this mean?

There are two fundamental ways we can scale software: vertically or horizontally.

  • Vertical Scaling addresses the scalability of a single instance of the service. A simple way to scale most software is to run it on a more powerful machine: one with a faster processor or more memory. We can also look for performance improvements in the way we write the code itself. An excellent example of a company using this approach is LMAX. However, there are many drawbacks to the vertical scaling approach. Firstly, the costs are rarely linear: ever more powerful hardware tends to be exponentially more expensive, and the costs (and constraints) of building sophisticated performance-optimised software are also considerable. Indeed, premature performance optimisation often leads to overly complex software that's hard to reason about and therefore more prone to defects and high maintenance costs. Most importantly, vertical scaling does not address redundancy; vertically scaling an application just turns a small single point of failure into a large single point of failure.

  • Horizontal Scaling. Here we run multiple instances of the application rather than focussing on the performance of a single instance. This has the advantage of being linearly scalable: rather than buying a bigger, more expensive box, we just buy more copies of the same cheap box. With the right architectural design, this approach can scale massively. Indeed, it's the approach taken by almost all of the largest internet-scale companies: Facebook, Google, Twitter etc. Horizontal scaling also introduces redundancy: the loss of a single node need not impact the system as a whole. For these reasons, horizontal scaling is the preferred approach to building scalable, redundant systems.

So, the fundamental approach to building scalable systems is to compose them of horizontally scaled services. In order to do this we need to follow a few basic principles:

  • Stateless. Any service that stores state across an interaction with another service is hard to scale. For example, a web service that stores in-memory session state between requests requires a sophisticated session-aware load balancer. A stateless service, by contrast, only requires simple round-robin load balancing. For a web application (or service) you should avoid using session state or any static or application-level variables.

  • Coarse Grained API. To be stateless, a service should expose operations as single interactions. A chatty API, where one sets up some data, asks for some transition, and then reads off some results, implies statefulness by its design: the service would need to identify a session and then maintain information about that session between successive calls. Instead, a single call, or message, to the service should encapsulate all the information that the service requires to complete the operation (the first sketch after this list shows the contrast).

  • Idempotent. Much scalable infrastructure is a trade-off between competing constraints. Delivery guarantees are one of these: for various reasons it is far simpler to guarantee 'at least once' delivery than 'exactly once'. If you can make your software tolerant of multiple deliveries of the same message it will be easier to scale (the second sketch after this list shows one way to do this).

  • Embrace Failure. Arrays of services are only redundant if the system as a whole can survive the loss of a single node. You should design your services and infrastructure to expect and survive failure. Consider implementing a Chaos Monkey that randomly kills processes (the third sketch after this list is a minimal example). If you start by expecting your services to fail, you'll be prepared when they inevitably do.

  • Avoid instance specific configuration. A scalable service should be designed in such a way that it doesn't need to know about other instances of itself, or have to identify itself as a specific instance. I shouldn't have to configure one instance any differently from another. This rules out communication mechanisms that require messages to be addressed to a specific instance of the service, and any non-convention-based way for the service to identify itself. Instead we should rely on infrastructure (load-balancers, pub-sub messaging etc.) to manage the communication between arrays of services.

  • Simple automated deployment. Having a service that can scale is no advantage if we can't deploy new instances when we are close to capacity. A scalable system must have automated processes to deploy new instances of services as the need arises.

  • Monitoring. We need to know when services are close to capacity so that we can add additional service instances. Monitoring is usually an infrastructure concern: we should be monitoring CPU, network and memory usage, and have alerts in place to warn us when these pass certain trigger points. Sometimes it's worth introducing application-specific alerts when some internal trigger is reached, such as the number of items in an in-memory queue (the last sketch after this list gives an example).

  • KISS - Keep It Small and Simple. This is good advice for any software project, but it's especially pertinent to building scalable, resilient systems. Large monolithic codebases are hard to reason about, hard to monitor, and hard to scale. Building your system out of many small pieces makes it easy to address those pieces independently. Design your system so that each service has only one purpose and is decoupled from the operations of other services. Have your services communicate using non-proprietary open standards to avoid vendor lock-in and allow for a heterogeneous platform. JSON over HTTP, for example, is an excellent choice for inter-service communication: every platform has HTTP and JSON libraries, and there is abundant off-the-shelf infrastructure (proxies, load-balancers, caches) that can be used to help your system scale.
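
A few sketches to make these principles concrete. All the names and types below are invented for illustration, not taken from any particular system. First, statelessness and a coarse-grained API: rather than a chatty conversation that forces the service to remember a session, the client sends one self-contained message.

```csharp
using System;

// Chatty, stateful design (avoid): the service has to remember the quote
// between calls, so every request must reach the same instance.
//
//   client.StartQuote();
//   client.SetPickupDate(pickupDate);
//   client.SetCarClass("compact");
//   var price = client.GetPrice();

// Coarse-grained, stateless design: one request message carries everything
// the service needs, so any instance behind a round-robin load balancer
// can answer it.
public class GetQuoteRequest
{
    public DateTime PickupDate { get; set; }
    public DateTime ReturnDate { get; set; }
    public string CarClass { get; set; }
}

public class GetQuoteResponse
{
    public decimal Price { get; set; }
}

public interface IQuoteService
{
    GetQuoteResponse GetQuote(GetQuoteRequest request);
}
```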
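
Second, idempotency. One common approach (assumed here, not the only one) is to give every message a unique ID and record the IDs you've already processed, so a redelivered message is acknowledged but not reapplied:

```csharp
using System;
using System.Collections.Concurrent;

// A hypothetical message carrying a unique ID assigned by the sender.
public class PaymentTakenMessage
{
    public Guid MessageId { get; set; }
    public decimal Amount { get; set; }
}

public class PaymentHandler
{
    // In production this set would live in shared storage (for example a
    // database table with a unique constraint on MessageId); an in-memory
    // set keeps the sketch simple.
    private readonly ConcurrentDictionary<Guid, bool> processedMessageIds =
        new ConcurrentDictionary<Guid, bool>();

    public void Handle(PaymentTakenMessage message)
    {
        // TryAdd returns false if we've already seen this ID, so a message
        // redelivered under 'at least once' semantics is simply ignored.
        if (!processedMessageIds.TryAdd(message.MessageId, true))
        {
            return;
        }

        ApplyPayment(message.Amount);
    }

    private void ApplyPayment(decimal amount)
    {
        // The actual state change happens exactly once per logical message.
    }
}
```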
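
Third, embracing failure. A chaos monkey needn't be sophisticated; in the spirit of Netflix's tool, this sketch randomly kills instances of a (hypothetical) service process. Only run something like this against a system designed to survive it:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public class ChaosMonkey
{
    public static void Main()
    {
        var random = new Random();
        while (true)
        {
            // Find running instances of our (hypothetical) service process.
            var victims = Process.GetProcessesByName("MyCompany.QuoteService");
            if (victims.Length > 0)
            {
                // Kill one at random; a resilient system should barely notice.
                var victim = victims[random.Next(victims.Length)];
                Console.WriteLine("Chaos monkey killing process {0}", victim.Id);
                victim.Kill();
            }
            // Wait a random interval before striking again.
            Thread.Sleep(TimeSpan.FromMinutes(random.Next(1, 60)));
        }
    }
}
```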
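
Finally, an application-specific alert of the kind the monitoring point mentions: a background timer that warns when an in-memory work queue passes a trigger depth. In a real system the alert would go to your monitoring infrastructure rather than the console:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Warns when an in-memory work queue grows past a trigger point,
// a hint that this instance is close to capacity.
public class QueueDepthMonitor
{
    private const int TriggerDepth = 1000;
    private readonly ConcurrentQueue<string> workQueue;
    private Timer timer;

    public QueueDepthMonitor(ConcurrentQueue<string> workQueue)
    {
        this.workQueue = workQueue;
    }

    public void Start()
    {
        // Check the queue depth every 30 seconds.
        timer = new Timer(CheckDepth, null,
            TimeSpan.Zero, TimeSpan.FromSeconds(30));
    }

    private void CheckDepth(object state)
    {
        var depth = workQueue.Count;
        if (depth > TriggerDepth)
        {
            // In production: raise an alert, page someone, feed a dashboard.
            Console.WriteLine("ALERT: queue depth {0} exceeds {1}",
                depth, TriggerDepth);
        }
    }

    public void Stop()
    {
        timer.Dispose();
    }
}
```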

This post just gives a few pointers to building scalable systems. For far more detailed examples and case studies, I can't recommend the High Scalability blog enough. The Dodgy Coder blog has a very nice summary of some of the High Scalability case studies here.