Archive for August, 2007

Ongoing Amazon EC2 Observations

Tuesday, August 21st, 2007

FeedBlendr has been hosted on Amazon EC2 for a couple of months now, and some people have shown interest in hearing more about that experience, so I thought I’d follow up with some observations about EC2 in general and my own experience and configuration in particular.

Here goes:

  1. Dynamic DNS is a SLOW way of faking load balancing. It’s reasonably functional from a normal management perspective, but an emergency situation would not be pretty. Setting your TTLs down to 300 seconds should mean that changes take effect within about five minutes, but 48 hours after making changes and removing an instance from my DNS records, I still see requests coming directly to that instance, presumably from resolvers and clients that cache records for longer than the TTL allows. Amazon, please offer us internal load balancing between instances somehow!
  2. Disposable Instances are something you need to get used to. I’ve already had one instance get itself into trouble because of a “degraded ephemeral data store” (according to an email I received from Amazon). If you’re in the habit of treating instances as completely disposable, you can just launch a new one and terminate the one with problems.
  3. Instance Cycling is something I’m starting to believe in: periodically starting new, clean instances, moving your operations over to them, and then shutting down the old ones. I do the same for new code releases. Rather than upgrading the code on instances that are already running, I slide onto new instances via DNS changes.
  4. Alternate AMIs are something I still need to deal with, so that I can bundle and test a new AMI without throwing away my previous one. Otherwise, if there’s a problem with the new one, how do you roll back? This is something I need to figure out in my own deployment process.
  5. Inconsistent Performance seems to be quite common on instances. I wasn’t expecting this one, but if you think about it, it probably makes sense. I can launch identical instances and give them equal weight in DNS, yet they perform quite differently. I suspect this is due to resource sharing/exhaustion caused by other users’ instances on the same physical machine within EC2. This is something Amazon probably needs to look into.
  6. Slow External Connections are not something you want as a required part of your core system. Currently my database is hosted outside the cloud (on DreamHost), and this seems to be a bit of a bottleneck in the system. I also pay for the bandwidth generated by that traffic, which is a cost I could avoid by hosting the database within the cloud.
  7. Pricing on EC2 is interesting to consider. Right now I run 2 instances, 24/7, which means I’m up for $144 per month plus bandwidth, so let’s just call it $200 per month for a self-managed service. Is that actually worth it compared to getting a dedicated machine somewhere else? More thoughts on this below.
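The $144 figure in point 7 lines up with EC2’s small-instance rate at the time, $0.10 per instance-hour. Here is a minimal Python sketch of that arithmetic, assuming that rate and a 30-day month. The dedicated-host price in the break-even helper is a hypothetical input, since no actual quote is given here.

```python
# Rough monthly cost of always-on EC2 instances (bandwidth excluded).
# Assumptions: a flat $0.10 per instance-hour (the small-instance rate
# at the time) and a 30-day month. Neither value comes from an Amazon API.

HOURLY_RATE_USD = 0.10        # assumed price per instance-hour
HOURS_PER_MONTH = 24 * 30     # assumed 30-day month = 720 hours


def monthly_instance_cost(instances: int,
                          hourly_rate: float = HOURLY_RATE_USD,
                          hours: int = HOURS_PER_MONTH) -> float:
    """Cost of running `instances` machines around the clock for a month."""
    return instances * hourly_rate * hours


def break_even_instances(dedicated_monthly_usd: float,
                         hourly_rate: float = HOURLY_RATE_USD,
                         hours: int = HOURS_PER_MONTH) -> float:
    """Number of always-on instances whose cost equals a dedicated host.

    `dedicated_monthly_usd` is a hypothetical input: plug in a real quote.
    """
    return dedicated_monthly_usd / (hourly_rate * hours)


if __name__ == "__main__":
    cost = monthly_instance_cost(2)
    print(f"2 instances, 24/7: ${cost:.2f} per month before bandwidth")
    # Headroom left for bandwidth inside the rounded $200 estimate.
    print(f"Left for bandwidth in a $200 budget: ${200 - cost:.2f}")
```

Under these assumptions two always-on instances come to $144.00, leaving about $56 of the rounded $200 estimate for bandwidth. The break-even helper compares sticker prices only; it says nothing about the hardware, SLA or control differences between the two options.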

EC2 vs. “Traditional Dedicated Host”

Potentially “FOR” EC2

  • You can easily launch as many instances as you need, rather than just having one server to work with
  • Setting up unique AMIs is relatively easy, so you can scale any part of your application
  • Pay-as-you-drink means you’re not paying for infrastructure you’re not using
  • No up-front/set-up costs
  • Scalability through commodity computing
  • More choice of operating system, because it’s entirely up to you

Potentially “AGAINST” EC2

  • Because you are technically sharing hardware, you’re subject to resource-starvation/sharing with other users!
  • No control over your network configuration (e.g. no hardware load balancers, no choice to put instances on the same switch, etc.; you get what you’re given and that’s it)
  • No choice of custom hardware, so you can’t get better hard drives, more RAM, etc.
  • You need to learn how to build an AMI and work with the Amazon tools for bundling and launching instances etc (not that it’s hard)
  • You have to use Amazon S3 to store your AMIs in
  • There is no bundled bandwidth with the service, so you only pay for what you use, but you pay for all of it
  • No control panels, monitoring, reporting, statistics or anything like that (unless you install them yourself), other than very basic bandwidth numbers and “instance-hours” reporting
  • No SLA at all, so you have no guarantees on uptime

Anyone else have any other ideas/additions to those lists?

New FeedBlendr Backend Code

Thursday, August 16th, 2007

I am in the process of moving FeedBlendr over to a new set of internal code to support future enhancements, so you might see some funkiness over the next few days (although hopefully not!). The new version introduces a few improvements, including some slight changes to existing features:

  • Internal APIs now use Atom-based formats where possible (the OPML stuff is still there where it makes sense, though).
  • Added a custom namespace (fv, for FeedVille) to the OPML output and to some other outputs, to provide additional information to users and developers.
  • Added some new output format options (look for them on blend information pages soon).
  • Added customization options to output formats for restricting the number of entries displayed and for limiting output to headlines only.
  • LOTS of internal changes, but those shouldn’t affect you :-)
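As a rough illustration of the entry-limit and headlines-only options, here is a minimal Python sketch that post-processes an Atom feed with the standard library’s `xml.etree.ElementTree`. The function `trim_atom_feed` and its `max_entries`/`headlines_only` parameters are hypothetical names, not FeedBlendr’s actual options or code, and treating “headlines only” as dropping each entry’s `<summary>` and `<content>` is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"
# Write Atom elements back out without an "ns0:" prefix.
ET.register_namespace("", ATOM_NS)


def trim_atom_feed(feed_xml: str, max_entries: Optional[int] = None,
                   headlines_only: bool = False) -> str:
    """Return the feed with at most `max_entries` entries and, optionally,
    each remaining entry's <summary> and <content> removed."""
    if max_entries is not None and max_entries < 0:
        raise ValueError("max_entries must be non-negative")

    root = ET.fromstring(feed_xml)
    entries = root.findall(f"{{{ATOM_NS}}}entry")

    # Keep the first `max_entries` entries in document order; drop the rest.
    if max_entries is not None:
        for entry in entries[max_entries:]:
            root.remove(entry)
        entries = entries[:max_entries]

    # "Headlines only" (assumed meaning): keep titles/links, drop bodies.
    if headlines_only:
        for entry in entries:
            for tag in ("summary", "content"):
                for body in entry.findall(f"{{{ATOM_NS}}}{tag}"):
                    entry.remove(body)

    return ET.tostring(root, encoding="unicode")


# Simplified sample feed (omits required Atom elements such as <id>).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blend</title>
  <entry><title>One</title><summary>First body</summary></entry>
  <entry><title>Two</title><summary>Second body</summary></entry>
  <entry><title>Three</title><summary>Third body</summary></entry>
</feed>"""

if __name__ == "__main__":
    print(trim_atom_feed(SAMPLE_FEED, max_entries=2, headlines_only=True))
```

Removing elements in place, rather than rebuilding the feed, keeps any other entry metadata (links, authors, dates) intact.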

Hopefully all servers will be switched over to the new codebase by the end of today, and some new documentation will be available to detail these changes.

UPDATE 2007-08-21: This code has been fully deployed on all servers as of 2007-08-17 and appears to be working successfully :-)