Archive for the 'scalability' Category

AWSome Meetup

Thursday, October 25th, 2007

I probably should have mentioned it here earlier, but last night I was the guest speaker at the new monthly AWSome Meetup. It was a fun event, and a chance to tell people about my experiences with Amazon Web Services and some related tools and products. I think around 20 to 25 people attended.
My talk seemed to be well received, and people were very interested in how to get started, some of the teething problems I’d had, and the learning curve involved in using AWS. Hopefully some other people will write up some of the things I talked about; I’ll link to their posts here when I hear about them (please ping this post if you’re writing about it!).
Thanks to Sebastian and Donnie for setting up the meetup and for inviting me to speak. I look forward to attending future events and meeting more people interested in and using AWS.

Ongoing Amazon EC2 Observations

Tuesday, August 21st, 2007

FeedBlendr has been hosted on Amazon EC2 for a couple of months now, and some people have shown interest in hearing more about that experience, so I thought I’d follow up with some observations about EC2 in general and my own experiences and configuration in particular.

Here goes:

  1. Dynamic DNS is a SLOW way of faking load balancing. It’s reasonably functional for normal day-to-day management, but in an emergency it would not be pretty. Setting your TTLs down to 300 seconds (5 minutes) should mean that changes propagate pretty quickly, yet 48 hours after removing an instance from my DNS records, I still see requests coming directly to that instance. Amazon, please offer us internal load balancing between instances somehow!
  2. Disposable Instances are something you need to get used to. I’ve already had one instance get itself into trouble because of a “degraded ephemeral data store” (according to an email I received from Amazon). If you’re in the habit of making instances completely disposable, then you can just launch a new one and terminate the one with problems.
  3. Instance Cycling is something I’m starting to believe in: periodically starting new, clean instances, moving your operations over to them, and then shutting down the previous ones. I do the same for new code releases. Rather than upgrading the code on instances that are already running, I slide onto new instances via DNS changes.
  4. Alternate AMIs are something I need to deal with, so that I can bundle and test a new AMI without throwing away the previous one. Otherwise, if there’s a problem with the new one, how do you roll back? This is something I still need to figure out in my own deployment process.
  5. Inconsistent Performance seems to be quite common on instances. I wasn’t expecting this one, but if you think about it, it probably makes sense. I can launch identical instances and weight them equally in DNS, yet they will perform quite differently. I suspect this is due to contention for resources with other users’ instances running on the same physical machine within EC2. This is something Amazon probably needs to look into.
  6. Slow External Connections are not something you want as a required part of your core system. Currently my database is hosted outside the cloud (on DreamHost), and this seems to be a bottleneck in the system. I also pay for the bandwidth generated by that traffic, which is a cost I could avoid by hosting the database within the cloud.
  7. Pricing on EC2 is interesting to consider. Right now I run 2 instances 24/7, which means I’m up for $144 per month plus bandwidth, so let’s just call it $200 per month for a self-managed service. Is that actually worth it compared to getting a dedicated machine somewhere else? More thoughts on this below.
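To illustrate the contrast in item 1, here is a minimal sketch of the kind of internal round-robin load balancer I’m wishing for. It is not an Amazon API; the `RoundRobinBalancer` class and the addresses are hypothetical. The point is that removing a backend takes effect on the very next request, whereas with DNS every resolver that cached the old record keeps sending traffic until it decides to refresh.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical internal balancer: hands out backends in strict rotation."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._reset()

    def _reset(self):
        # Rebuild the rotation from the current backend list.
        self._ring = cycle(list(self.backends))

    def remove(self, backend):
        # Unlike a DNS change, no client-side cache has to expire first.
        self.backends.remove(backend)
        self._reset()

    def pick(self):
        if not self.backends:
            raise RuntimeError("no backends available")
        return next(self._ring)


lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2"])  # hypothetical instance addresses
lb.remove("10.0.0.1")  # e.g. an instance with a degraded data store
print([lb.pick() for _ in range(3)])  # ['10.0.0.2', '10.0.0.2', '10.0.0.2']
```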
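Items 3 and 4 (instance cycling, plus keeping the previous AMI around for rollback) can be sketched as a small in-memory model. This is a hypothetical illustration rather than my actual deployment process: the `Fleet` class, its methods and the AMI IDs are made up, and a real version would call the EC2 tools to launch and terminate instances and update DNS where the comments indicate.

```python
from dataclasses import dataclass, field
from itertools import count

_instance_numbers = count(1)  # stand-in for EC2 assigning instance IDs


@dataclass
class Instance:
    instance_id: str
    ami_id: str


@dataclass
class Fleet:
    """Hypothetical model of a set of instances serving traffic via DNS."""
    size: int
    live: list = field(default_factory=list)         # instances currently in DNS
    ami_history: list = field(default_factory=list)  # deployed AMIs, oldest first

    def _launch(self, ami_id):
        # Stand-in for launching a fresh, clean instance from an AMI.
        return Instance(f"i-{next(_instance_numbers):04d}", ami_id)

    def cycle_to(self, ami_id):
        """Launch new instances from ami_id, switch traffic to them, return the old ones."""
        new = [self._launch(ami_id) for _ in range(self.size)]
        old, self.live = self.live, new  # the "DNS change"
        if not self.ami_history or self.ami_history[-1] != ami_id:
            self.ami_history.append(ami_id)
        return old  # terminate these only once cached DNS records have drained

    def rollback(self):
        """Cycle back onto the previous AMI, which was kept rather than thrown away."""
        if len(self.ami_history) < 2:
            raise RuntimeError("no previous AMI to roll back to")
        self.ami_history.pop()
        return self.cycle_to(self.ami_history[-1])


fleet = Fleet(size=2)
fleet.cycle_to("ami-v1")
retired = fleet.cycle_to("ami-v2")  # a release: new instances, no in-place upgrade
retired = fleet.rollback()          # problem with v2: cycle back onto v1
print([i.ami_id for i in fleet.live])  # ['ami-v1', 'ami-v1']
```

Treating a release and a rollback as the same operation (cycle onto instances from a given AMI) is what makes the old AMI worth keeping: rolling back is just another cycle.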
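The $144 figure in item 7 works out if you assume EC2’s standard rate at the time of $0.10 per instance-hour and a 30-day month (720 hours); both are assumptions here rather than numbers from my bill. A rough sketch of the arithmetic, including a break-even point against a flat-fee dedicated host (the fee is whatever quote you have in hand; the function names are made up for illustration):

```python
# Assumptions: EC2's 2007 standard-instance rate and a 30-day month.
HOURLY_RATE = 0.10         # USD per instance-hour
HOURS_PER_MONTH = 24 * 30  # 720 hours


def monthly_compute_cost(instances, hourly_rate=HOURLY_RATE, hours=HOURS_PER_MONTH):
    """Cost of running `instances` machines 24/7 for a month, excluding bandwidth."""
    return instances * hourly_rate * hours


def monthly_total(instances, bandwidth_cost):
    """Compute cost plus a separately estimated bandwidth bill (none is bundled)."""
    return monthly_compute_cost(instances) + bandwidth_cost


def break_even_instance_hours(dedicated_monthly_fee, bandwidth_cost=0.0, hourly_rate=HOURLY_RATE):
    """Instance-hours per month at which EC2 costs the same as a dedicated host,
    assuming the dedicated host's flat fee already includes its bandwidth."""
    return max(0.0, (dedicated_monthly_fee - bandwidth_cost) / hourly_rate)


print(monthly_compute_cost(2))  # 144.0
```

For example, against a hypothetical $150/month dedicated host with bandwidth ignored, `break_even_instance_hours(150)` comes to 1,500 instance-hours, a little over two instances running all month (2 × 720 = 1,440 hours). Any EC2 bandwidth you pay for pulls that break-even point down.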

EC2 vs. “Traditional Dedicated Host”

Potentially “FOR” EC2

  • You can easily launch as many instances as you need, rather than just having one server to work with
  • Setting up unique AMIs is relatively easy, so you can scale any part of your application
  • Pay-as-you-drink means you’re not paying for infrastructure you’re not using
  • No up-front/set-up costs
  • Scalability through commodity computing
  • More choice over operating system, because it’s entirely up to you

Potentially “AGAINST” EC2

  • Because you are technically sharing hardware, you’re subject to resource-starvation/sharing with other users!
  • No control over your network configuration (e.g. no hardware load balancers, no option to put instances on the same switch, etc.; you get what you’re given and that’s it)
  • No choice of custom hardware, so you can’t get better hard drives, more RAM, etc.
  • You need to learn how to build an AMI and work with the Amazon tools for bundling and launching instances (not that it’s hard)
  • You have to use Amazon S3 to store your AMIs in
  • There is no bundled bandwidth with the service, so you only pay for what you use, but you pay for all of it
  • No control panel, monitoring, reporting or statistics are provided (unless you install them yourself), other than very basic bandwidth numbers and “instance-hours” reporting
  • No SLA at all, so you have no guarantees on uptime

Anyone else have any other ideas/additions to those lists?