The lesson? Simple: design for redundancy (or have contingency measures in place for a quick restore). Let's take a look at a few of the options available for AWS resilience and restore that could help you save face should Godzilla strike.
Amazon Web Services operates a global infrastructure of data centres grouped into Regions. Inside each Region are a number (3+) of Availability Zones (AZs). Spreading a deployment across them (a 'Multi-AZ' design) provides redundancy within the Region, and an intelligently designed platform will span at least two zones, with application data (where possible) held in AWS S3 storage (S3 stores objects redundantly across multiple facilities within a Region).
Technologies used to provide redundancy and spread application load across multiple AZs include AWS Elastic Load Balancing (ELB) and auto-scaling, with options for database replication using either the AWS Relational Database Service (RDS) or a multiple-instance configuration with a tried, tested (and easy to configure) MySQL master/slave setup. Replicating Apache can be achieved at a basic level through cron-triggered rsync jobs. If you're mad enough to be using an MS/Srv08/IIS setup, steer clear of the MS Web Farm Framework and contact Cirronix for advice; we have a MUCH easier way, seriously.
Scaling in vanilla AWS is (at time of writing) via scripted config using their suite of CLI tools only, and does require a degree of familiarity with the backend; scaling configs are also confined to set AMIs (Amazon Machine Images). Instance replacement is possible; however, automation for linked DNS and re-assignment of Elastic IPs (EIPs) is not catered for and has to be applied manually.
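If you'd rather script the EIP step yourself than click through the console, it can be quite short. The sketch below is only an illustration and assumes the boto3 Python SDK, which this post doesn't otherwise cover; the function name and the IDs in the usage note are made up for the example.

```python
def reassign_elastic_ip(ec2, allocation_id, replacement_instance_id):
    """Point an existing Elastic IP at a replacement instance.

    `ec2` is assumed to behave like a boto3 EC2 client, e.g. one created
    with boto3.client("ec2"). It is passed in rather than created here so
    the function can be exercised without AWS credentials.
    """
    response = ec2.associate_address(
        AllocationId=allocation_id,          # identifies the Elastic IP
        InstanceId=replacement_instance_id,  # the freshly launched instance
        AllowReassociation=True,             # take the EIP over from the old instance
    )
    # The response carries the ID of the new association, when AWS returns one.
    return response.get("AssociationId")
```

Usage would look something like `reassign_elastic_ip(boto3.client("ec2"), "eipalloc-11111111", "i-22222222")`, with both IDs swapped for your own. Linked DNS is a separate job and isn't touched here.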
Multi-AZ design is a no-brainer, but what happens when Godzilla storms down the East Coast in a really bad mood and takes out a full region? What you need, ideally, is some sort of cross-region platform, right? Well, yes, and it can be achieved: AWS Route 53 can deliver weighted or round-robin DNS for your multi-region ELBs. Doing so, though, means mirroring on a global scale, with all the accompanying cost and maintenance overheads of two (or more) complete platforms. If you're prepared for that then good, just be aware that global redundancy costs.
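To give a feel for what weighted DNS across regions involves, here is a rough sketch that builds the change batch Route 53 expects for one weighted CNAME record per regional ELB. It assumes the boto3 Python SDK; the function name, hostnames and weights are placeholders invented for the example, not anything from a real setup.

```python
def weighted_change_batch(record_name, regional_elbs, ttl=60):
    """Build a Route 53 change batch with one weighted CNAME per region.

    regional_elbs maps a region name to (elb_dns_name, weight).
    Equal weights give a round-robin style split; unequal weights skew it.
    """
    changes = []
    for region, (elb_dns_name, weight) in sorted(regional_elbs.items()):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": region,  # must be unique per weighted record
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": elb_dns_name}],
            },
        })
    return {"Comment": "weighted multi-region records", "Changes": changes}
```

The resulting dict would be handed to `change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)` on a boto3 Route 53 client. Note that a CNAME can't sit at the zone apex, so this shape suits something like `www.example.com.` rather than the bare domain.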
AWS CloudFormation allows you to replicate your full stack from JSON templates, either in its home zone(s) or adapted for alternate regions. CF is incredibly useful for immediate re-launch (of all instances and accompanying parameters), not only for disaster recovery but also for development and testing.
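As an illustration of what "adapted for alternate regions" can look like, the sketch below generates a deliberately tiny CloudFormation template in which the AMI is looked up per region, so the same JSON can be launched anywhere it has a mapping. The resource name, AMI IDs and instance type are placeholders; a real stack would carry far more (security groups, ELBs, volumes and so on).

```python
import json


def build_template(region_amis, instance_type="t2.micro"):
    """Return a minimal CloudFormation template as a Python dict.

    region_amis maps region name -> AMI ID (placeholder values here).
    The AMI is resolved at launch time from the region the stack runs in.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Single web server, relaunchable per region (sketch)",
        "Mappings": {
            "RegionAmi": {
                region: {"ami": ami} for region, ami in region_amis.items()
            }
        },
        "Resources": {
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "InstanceType": instance_type,
                    # Look up the AMI for whichever region the stack launches in.
                    "ImageId": {
                        "Fn::FindInMap": ["RegionAmi", {"Ref": "AWS::Region"}, "ami"]
                    },
                },
            }
        },
    }


def to_json(template):
    """Serialise the template to the JSON text CloudFormation consumes."""
    return json.dumps(template, indent=2, sort_keys=True)
```

Calling `to_json(build_template({"us-east-1": "ami-11111111", "eu-west-1": "ami-22222222"}))` gives the JSON you would upload; keeping it in version control means a relaunch is one step away.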
At the most basic end of redundancy (especially if you don't run S3 backed app data) are instance snapshots. In a word: take them and/or automate them (Here's how). If you have current snapshots of your EBS volume data you at least stand some chance of restoration, as creating new volumes from them and attaching one as a replacement instance's /dev/sda1 boot volume is relatively quick and straightforward.
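For a flavour of what automating snapshots can involve, here is a minimal sketch of a "snapshot, then prune old ones" routine. It assumes the boto3 Python SDK and a simple keep-the-newest-N retention rule, both of which are choices made for this example rather than a recommendation; the function names are made up, and pagination and error handling are left out.

```python
def snapshots_to_prune(snapshots, keep=7):
    """Return the IDs of all but the `keep` most recent snapshots.

    Each snapshot is a dict with "SnapshotId" and "StartTime"; boto3 supplies
    StartTime as a datetime, but any consistently comparable value works.
    """
    ordered = sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)
    return [s["SnapshotId"] for s in ordered[keep:]]


def snapshot_and_prune(ec2, volume_id, keep=7):
    """Snapshot one EBS volume, then delete its oldest snapshots.

    `ec2` is assumed to behave like a boto3 EC2 client.
    Returns (new_snapshot_id, list_of_deleted_snapshot_ids).
    """
    created = ec2.create_snapshot(
        VolumeId=volume_id,
        Description="automated backup of " + volume_id,
    )
    # Only a single page of results is read; large accounts would need paging.
    existing = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [volume_id]}],
    )["Snapshots"]
    stale = snapshots_to_prune(existing, keep)
    for snapshot_id in stale:
        ec2.delete_snapshot(SnapshotId=snapshot_id)
    return created["SnapshotId"], stale
```

Run from a daily cron job with `keep=7`, this would hold roughly a week of restore points per volume. Splitting the retention rule into its own function makes it easy to check without touching AWS.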
3rd Party Tools
Vanilla AWS provides an amazing base level cloud framework but is admittedly lacking for easy config of automated redundancy. Thankfully, to make our lives easier, there are a number of 3rd party cloud management suites available which add substantial value to backend AWS. Scalr is one such offering, and one which Cirronix prefer and highly recommend.
With plans from $99, Scalr is feature rich and includes the following on top of base AWS:
- Replication of instance changes to live (scaled) copies (includes base AMI update & live replacement).
- Automated instance replacement from failure.
- MySQL backup, rollback & instance resilience (Slave > Master promotion).
- Multi-Region farms.
- Multi-cloud deployment.
- Managed (easy) vHosting.
- Cron scripting.
You can, of course, never have enough redundancy, though by taking note of what's on offer and designing to your limits you will certainly reduce the potential for downtime. And even if you're a sole operator on a limited budget running nothing but a single micro-instance on the free tier, you can still take advantage of cost-effective options to implement solid disaster recovery. Scalr is the cream of the crop, but Pingdom and AutoSnappy can go a long way to fighting Godzilla.