AWS Cloud Computing System Design

Architecture Qualifications

I was recently asked about Software Architecture qualifications by a reader of the blog, and I’m keen to share my advice a bit more widely.

They are an experienced developer, looking to take a step away from being an individual contributor, to become a design and system implementation influencer. They wanted to know my thoughts about TOGAF, and how useful the qualification had been in my own progression.

My advice assumes that you’ve already decided that a certification or formal course of study is the right way for you to go on your next learning step. If you are a proponent of the 70/20/10 model, then this is very much covering what you should do with your 10% time. So, without further ado, let’s get into the nitty-gritty of the possible options.

TOGAF was an interesting course of study, but I really feel it has a very narrow range of applications. I believe it’s only relevant to a very small number of extremely large organisations. Its main focus is the creation and maintenance of a large architecture practice in an enterprise, which is well away from the day-to-day work of designing and developing systems.

If you are looking at a progression path from individual contributor to technical leader, then I’d strongly recommend a certification in your favourite flavour of cloud. I use AWS at the moment, and there’s a Solutions Architect track that’s really good. There are similar Microsoft paths for Azure, and Google ones for their cloud.

ITIL is possibly a useful direction, but that does tend more to hardware and processes, so might be less useful if you are aiming to design and create new systems, as opposed to running existing systems stably and efficiently.

If you are thinking of modern companies, strong agile approaches and staying close to the day-to-day implementation of your designs, then the Cloud route is my number 1 suggestion, and there are a lot of great supporting courses out there to aid your studies!

System Design

The Right Way

There is no right or wrong way to develop software. This is a controversial but key concept to grasp.

What matters is that everyone working on developing a single piece of software agrees to use the same methods. They must pull together rather than pulling apart.

No methodology or practice works best across all products, teams or systems. You need to find what works for you and yours, and then push to maintain that successful process over time.

Many people have written and talked about what worked for them, whether it is Scrum, XP or waterfall for delivering projects, or smaller and more syntactic decisions such as OO programming, functional programming, or even a particular coding style.

Don’t fall into the trap of believing in one true way. What is true for one person in one situation may not hold true for your team in your situation.

Once you find a method you can make use of, then you must by all means strive to implement the best practices and recommendations of others. There is little point in choosing to do something badly when you can choose to do it well.

Find something that fits, do it as best you possibly can, and good quality software will follow without fail.

Performance Basics System Design

Connections Per Hostname

There are lots of ways to improve the performance of your website. Some are easier to implement than others, and some will have a greater effect than others.

One potentially easy win is to limit the requests made from a single hostname. Most modern browsers will open up to six concurrent connections to a single host over HTTP/1.1. If you are serving all of your resources from the same domain, e.g. then the browser will request the first six and queue the rest, starting each queued request as an earlier one completes.

If you split your resources across multiple domains then this queuing will not occur. As a simple improvement, load your static content from (or similar). This means that your dynamic pages will load, and start pulling your static content very quickly, rather than waiting for all the requests to the dynamic pages to complete.
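The queuing effect can be estimated with a small simulation. This is only a sketch under simplifying assumptions: every download time is a made-up figure, bandwidth is unlimited, and the cost of the extra DNS lookup and connection setup for a second hostname is ignored. The function names are hypothetical.

```python
import heapq


def host_finish_time(durations_ms, max_connections=6):
    """Return when the last request to one host finishes, in milliseconds."""
    # Each entry is the time at which one connection becomes free.
    free_at = [0] * max_connections
    heapq.heapify(free_at)
    finish = 0
    for duration in durations_ms:
        start = heapq.heappop(free_at)  # earliest available connection
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


def page_finish_time(durations_ms, host_count, max_connections=6):
    """Spread requests round-robin over host_count hostnames and return the total time."""
    per_host = [durations_ms[i::host_count] for i in range(host_count)]
    return max(
        (host_finish_time(batch, max_connections) for batch in per_host),
        default=0,
    )


# Twelve hypothetical resources that each take 100 ms to download.
resources = [100] * 12
print(page_finish_time(resources, host_count=1))  # 200
print(page_finish_time(resources, host_count=2))  # 100
```

With one hostname the second batch of six has to wait for a free connection. With two hostnames all twelve requests start immediately.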

This splitting out also leads to other possible benefits. You can put your static resources onto a CDN, so the performance of your will be drastically better than if served from your own servers. It also allows for further tweaks, this domain can be configured to be cookieless (you’d use in this case). That would save sending cookies for static requests, which you should never need.

This is a quick improvement that should be simple to set up. The major effort is in configuring the domains correctly, and planning your system to allow for these split domains. Once you’ve done that, you’ll increase performance for everyone using your site, without having to improve the code that drives the system.



System Design

Breaking Changes

Making a breaking change is just about the most destructive thing you can do in a software system. Doing something that makes everything that went before obsolete will severely limit your options for the future.

A breaking change is one where the system has changed in such a way that you cannot go back to the old version.

If you can’t go back to the old version, then you need to be certain that your new version will work. That may sound pretty simple, but it’s always harder in practice.

You should design your system to reduce the number of breaking changes, and to allow for the simplest methods of coping with a breaking change.

When a little extra work would give you a smooth transition, give enough weight to the prospect of your change failing, so that you estimate the value of that work correctly. Don’t just assume that everything will work and you can always roll forwards.

If you can isolate your breaking change to a single layer then you can split your deployment into several pools. You need a way to send users to the new or old code in a reliable way, otherwise this will not work.
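One way to send users to a pool reliably is to derive the choice from something stable, such as a user identifier. The sketch below is an assumption for illustration, not a description of any particular load balancer: `choose_pool` and its percentage parameter are hypothetical names, and hashing the user ID is just one possible routing rule.

```python
import hashlib


def choose_pool(user_id: str, new_pool_percent: int) -> str:
    """Return "new" or "old" for a user, always the same answer for the same inputs."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Turn the hash into a stable bucket in the range 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "new" if bucket < new_pool_percent else "old"


# Hypothetical rollout: roughly 10% of users go to the pool running the new code.
for user in ["alice", "bob", "carol"]:
    print(user, choose_pool(user, new_pool_percent=10))
```

Because the bucket depends only on the user ID, a user never flips between pools from one request to the next, and raising the percentage only ever moves users from the old pool to the new one.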

If you design ahead of time to take account of breaking changes, then when you finally need to make one it won’t be as painful as it might have been.

System Design

Performance Tradeoffs

Performance is often a key concern in designing systems. Every time we consider performance we are making some kind of trade-off in the wider system, and this needs to be understood or the system will fail.

Generally a performance requirement is phrased in a vague manner, indicating the system should be fast, responsive or otherwise quick. A requirement like this is very hard to design for.

A good requirement will let us start making the tradeoffs we need. It will request that a key page is loaded in less than a second, or that calls to external services complete in less than 100 milliseconds. The requirement here is measurable, so we know if we have achieved it or not.
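A measurable requirement can be checked mechanically. The sketch below assumes you already have a list of measured page load times. Judging the requirement at a percentile rather than on every single request is my own assumption, as are the function names and the sample figures.

```python
def percentile(samples_ms, percent):
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    # Ceiling of percent * n / 100, using integer arithmetic only.
    rank = max(1, -(-percent * len(ordered) // 100))
    return ordered[rank - 1]


def meets_requirement(samples_ms, limit_ms, percent=95):
    """True when the chosen percentile of the samples is under the limit."""
    return percentile(samples_ms, percent) < limit_ms


# Hypothetical page load times in milliseconds, checked against "less than a second".
page_loads = [620, 710, 680, 1450, 590]
print(meets_requirement(page_loads, limit_ms=1000, percent=50))  # True
print(meets_requirement(page_loads, limit_ms=1000, percent=95))  # False
```

The same measurements pass or fail depending on the percentile chosen, which is why that choice belongs in the requirement itself.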

With the measurable requirement in hand, we can decide how to achieve the required performance. It might be easy, and just be met as the system is designed. We might need to cache data, or use more powerful hardware (trading cost for performance). We may find we need to use a lower level language or code module (trading maintainability for performance). We might have to use new or unproven technologies (trading risk for performance).
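Caching is usually the cheapest of these options to try. As a minimal sketch, Python’s standard `functools.lru_cache` keeps recent results in memory so that repeated calls skip the slow path. The `load_profile` function is a hypothetical stand-in for a slow database or service call.

```python
from functools import lru_cache

lookups = []  # records every call that reaches the slow path


@lru_cache(maxsize=256)
def load_profile(user_id: str) -> str:
    """Stand-in for a slow database or service call."""
    lookups.append(user_id)
    return f"profile for {user_id}"


load_profile("alice")
load_profile("alice")  # served from the cache, no second lookup
load_profile("bob")
print(len(lookups))  # 2
```

The trade here is memory, and the risk of serving stale data, in exchange for speed.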

Once the requirement is understood we can look at the key performance tradeoffs, make the system design with these in mind and ensure that the stakeholders in the system are aware of the choices and options available to them. If we don’t have measurable performance requirements, we can’t make these informed decisions, and the system will suffer.

Cloud Computing System Design

Map Reduce

The idea of Map Reduce is very simple. It is a method to split up a complicated problem into smaller work packets that can be solved on a distributed system.

There are two main parts to Map Reduce: the Mapping function and the Reducing function.

The master node coordinates the work. The dataset is split into manageable chunks, and each chunk is handed to a worker node. The Mapping function is run on the worker nodes, turning each chunk into a list of key-value pairs. Those pairs are then grouped by key and the Reducing function is run on the worker nodes. It takes all of the values mapped to one key, processes them, and returns the result. The reduced values are then collated into the final results.

There are some further complexities to Map Reduce implementations, namely how the data is actually sent to a worker, how the results are returned and how the scheduling is managed. This is basically what a system like Hadoop will manage for you, so you can concentrate on the details of your Mapping and Reducing.

The canonical example is to produce a count of words in a document. The input to your mapping function is a string containing the text of the document. The mapper splits this into words and emits a key-value pair for each one, with the word as the key and a count of 1 as the value. The pairs are grouped by word and sent to the worker nodes running the reduce function, which simply sums the values provided for each word and returns the results.

"A simple string with a repeated word"

Map function

function map(string mapInput)
  foreach(var word in mapInput.ToLower().Split(" "))
    emit(word, 1)

Reduce function

function reduce(string word, int[] reduceInput)
  emit(word, sum(reduceInput))

This is rough pseudo-code showing what you’d need to implement. It will not work exactly as written, so you will need to customise it for your actual implementation.

From this we’d expect to see the following results:

a 2
repeated 1
simple 1
string 1
with 1
word 1
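For a version you can actually run, here is a minimal single-process sketch in Python. It is not how Hadoop or any other framework is implemented. The `shuffle` step stands in for the grouping a real system does between the map and reduce phases, and all of the function names are my own.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Group values by key, the job a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(document):
    """Run map, shuffle and reduce in order, returning words in alphabetical order."""
    grouped = shuffle(map_words(document))
    return dict(reduce_counts(word, counts) for word, counts in sorted(grouped.items()))


result = word_count("A simple string with a repeated word")
for word, count in result.items():
    print(word, count)
```

Running this prints the same six lines as the results listed above.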

From this simple example, you should be able to see how we can expand to cope with much more complicated and interesting problems.

System Design

Tiered Applications

We design systems in tiers to enable us to simplify complex problems and to give us greater understanding of the solutions we are attempting to create.

The most common separation is to split Data, Display and the methods to convert one to the other (generally known as Business Logic). This three tier structure can be seen across many different applications, and extends throughout the entire software industry. Often developers will be hired to work on only a single tier, as the skills for working at each level can be very different.

If we achieve a good separation into tiers, we can replace the implementation of one tier with another. The commonest replacement is to implement new Display modes, such as adding a mobile display to an existing website, or switching a Windows display to the web.
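As a rough illustration of that separation, the sketch below keeps the three tiers apart so a second Display can be added without touching the other two. Every name and price in it is hypothetical, and a real system would have far more in each tier.

```python
from typing import Protocol


class ProductData(Protocol):
    """Data tier: anything that can return product prices in pence."""

    def prices(self) -> dict[str, int]: ...


class InMemoryProducts:
    """One possible data tier, with hard-coded illustrative prices."""

    def prices(self) -> dict[str, int]:
        return {"tea": 250, "coffee": 300}


def basket_total(data: ProductData, items: list[str]) -> int:
    """Business Logic tier: knows nothing about storage or display."""
    prices = data.prices()
    return sum(prices[item] for item in items)


def render_text(total_pence: int) -> str:
    """Display tier for a console or plain-text client."""
    return f"Total: {total_pence / 100:.2f}"


def render_html(total_pence: int) -> str:
    """A second Display tier, added without changing the tiers below it."""
    return f'<p class="total">{total_pence / 100:.2f}</p>'


total = basket_total(InMemoryProducts(), ["tea", "coffee"])
print(render_text(total))  # Total: 5.50
print(render_html(total))  # <p class="total">5.50</p>
```

Swapping `InMemoryProducts` for a database-backed class with the same `prices` method would replace the Data tier in the same way.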

Changing the data source is almost always more problematic, as you must move your data as well, but it is still possible.

We give up the possibility of the most efficient system when we create a tiered system, but in exchange we gain simplicity and understanding. As technology gets ever more powerful, the systems we build can ever more easily afford this trade-off in the interest of creating maintainable and extendable systems.