Archive for the 'dreamhost' Category

AWSome Meetup

Thursday, October 25th, 2007

I probably should have mentioned it on here earlier, but last night I was the guest speaker at the new, monthly AWSome Meetup. It was a fun event, and a chance to tell people about my experiences with Amazon Web Services and some related tools and products. I think around 20-25 people attended.
My talk seemed to be well-received, and people were very interested in how to get started, some of the teething problems I’d had, and the learning curve involved in using AWS. Hopefully some other people will be writing up some of the things I talked about, so I’ll link to them here when I hear about them (please ping this post if you’re writing about it!).
Thanks to Sebastian and Donnie for setting up the meetup and for inviting me to speak. I look forward to attending future events and meeting more people interested in and using AWS.

Ongoing Amazon EC2 Observations

Tuesday, August 21st, 2007

It’s been a couple of months now that FeedBlendr has been hosted with Amazon EC2, and some people have shown interest in hearing more about that experience, so I thought I’d follow up with some observations about EC2 in general and my experiences/configuration in particular.

Here goes:

  1. Dynamic DNS is a SLOW way of faking load balancing. It’s reasonably functional from a normal management perspective, but an emergency situation would not be pretty. Setting your TTLs down to 300 seconds should mean that changes happen pretty quickly, but 48 hours after making changes and removing an instance from my DNS records, I still see requests coming directly to that instance. Amazon, please offer us internal load balancing between instances somehow!
  2. Disposable Instances are something you need to get used to. I’ve already had one instance get itself into trouble because of a “degraded ephemeral data store” (according to an email I received from Amazon). If you’re in the habit of making instances completely disposable then you can just launch a new one and terminate the one with problems.
  3. Instance Cycling is something I’m starting to believe in – periodically starting new, clean instances, moving your operations over to them, then shutting down the previous ones. I do the same thing for new code releases: rather than upgrading the code on my instances that are already running, I slide onto new instances via DNS changes (see the sketch after this list).
  4. Alternate AMIs are something I need to deal with, so that I can have a new AMI being bundled and tested, without throwing away my previous one. If there’s a problem with the new one, how do you roll back otherwise? This is something I need to figure out in my own deployment process.
  5. Inconsistent Performance seems to be quite common on instances. I wasn’t expecting this one, but if you think about it, it probably makes sense. I can launch identical instances, and load them equally in DNS, but they will perform quite differently. I believe this is probably due to resource sharing/exhaustion on specific instances due to other users’ instances on the same physical machine within EC2. This is something Amazon probably needs to look into.
  6. Slow External Connections are not something you want to be a required part of your core system. Currently my database is hosted outside the cloud (on DreamHost) and this seems to be a bit of a bottleneck in the system. I also pay for the bandwidth generated by those communications, which is a cost I could avoid by hosting the database within the cloud.
  7. Pricing on EC2 is something interesting to consider. Right now I run 2 instances, 24/7, which means I’m up for $144 per month plus bandwidth, so let’s call it $200 per month for a self-managed service. Is that actually worth it compared to getting a dedicated machine somewhere else? More thoughts on this below.
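
To make the cycling idea (point 3 above) concrete, here’s a minimal sketch of one cycle using the standard EC2 command-line tools. The AMI ID, keypair name and instance ID are placeholders, and the exact flow is up to you:

    # Launch a replacement instance from the freshly bundled image
    ec2-run-instances ami-xxxxxxxx -k my-keypair

    # Watch for it to reach "running" and note its public IP
    ec2-describe-instances

    # After adding the new IP to DNS and waiting out the TTL,
    # retire the old instance
    ec2-terminate-instances i-xxxxxxxx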

EC2 vs. “Traditional Dedicated Host”

Potentially “FOR” EC2

  • You can easily launch as many instances as you need, rather than just having one server to work with
  • Setting up unique AMIs is relatively easy, so you can scale any part of your application
  • Pay-as-you-drink means you’re not paying for infrastructure you’re not using
  • No up-front/set-up costs
  • Scalability through commodity computing
  • More choice of operating system, because the whole stack is up to you

Potentially “AGAINST” EC2

  • Because you are technically sharing hardware, you’re subject to resource-starvation/sharing with other users!
  • No control over your network configuration (e.g. no hardware load balancers, no option to put instances on the same switch, etc.) – you get what you’re given and that’s it
  • No custom hardware options, so you can’t get better hard drives, more RAM, etc.
  • You need to learn how to build an AMI and work with the Amazon tools for bundling and launching instances etc (not that it’s hard)
  • You have to store your AMIs in Amazon S3
  • There is no bundled bandwidth with the service, so you only pay for what you use, but you pay for all of it
  • No control panels, monitoring, reporting or statistics provided (unless you install them yourself), other than very basic bandwidth numbers and “instance-hours” reporting
  • No SLA at all, so you have no guarantees on uptime

Anyone else have any other ideas/additions to those lists?

The Trials and Tribulations of Using Amazon EC2 and S3

Tuesday, May 22nd, 2007

DISCLAIMER: I’m not a “sysadmin” by any stretch of the imagination, but I know my way around the Internets and have spent my fair share of time dealing with DNS, networks, server configuration, automation, HTTP-related stuff etc etc to know my way around things like this. I’m sure some of this would have been a lot easier for someone else, but hey – it works.

ASSUMPTIONS: I’m going to assume you know about Amazon EC2 and S3 and some of the terminology involved therein, so if not, please go read up a little on that first.

So — when I was looking for a new home for FeedBlendr, I wanted something extremely scalable, because I have high hopes (obviously), and it’s part of a much bigger puzzle for me. In this sort of application, the biggest issue with scaling and load is processor time and memory, since my system spends a lot of time downloading feeds from the ‘net and then holding them in memory while it blends and re-orders them. My main issue is not database “bandwidth”, it’s “web processing power”. With that in mind, here’s what I’ve done.

  1. Right now, my database remains on DreamHost (outside of Amazon entirely)
  2. I have a relatively dynamic system configured where I can call up a new instance from EC2 based on my own customized AMI. When it loads, it grabs a copy of my latest “distribution” of my web app from S3, installs it on itself, and then sends me an email (and an SMS) to let me know it’s ready to roll, so I can add it into DNS if I want it to be part of my main cluster (a sketch of this startup flow follows the list).
  3. I have 2 instances (servers) running in Amazon, configured using round-robin DNS to handle/balance the requests involved in powering FeedBlendr.
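
Here’s a rough sketch of what that startup flow might look like; the bucket, package, script and addresses are all placeholders rather than my actual names, and it assumes a publicly readable package URL (a private bucket would need a signed request, e.g. via s3curl.pl):

    #!/bin/bash
    # Grab the latest distribution package from S3 at boot
    curl -s -o /tmp/dist.tar.gz http://s3.amazonaws.com/my-bucket/feedblendr-dist.tar.gz

    # Unpack it and hand off to the deploy script it contains
    mkdir -p /tmp/dist && tar -xzf /tmp/dist.tar.gz -C /tmp/dist
    /tmp/dist/deploy.sh

    # Look up our public IP from the instance metadata service, then send
    # the "ready to roll" email (the SMS goes via an email-to-SMS gateway)
    IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
    echo "New instance up at $IP" | mail -s "EC2 instance ready" me@example.com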

Setting Up an AMI

My first task (once getting myself set up with the EC2 Tools) was to actually set up my own Amazon Machine Image (AMI). This is your “server” if you like – operating system and all. I worked from a Fedora Core 6 base image that someone had shared on the Amazon Developer Forums, so that was a good starting point for me. Basically, this is what I did:

  1. Got the AMI running, then logged into it (logging in using shared certificates etc. was new for me, but I got it sorted out)
  2. Ran a bunch of yum update and yum install commands to install the things I needed (Apache, PHP etc.)
  3. Configured everything to work as I wanted. Remember to use name-based VirtualHost configuration on your images, because you don’t know what IP they’ll have when they come online (unless you want to factor that into your launch procedure somehow) – see the sample vhost after this list
  4. While doing all of this, kept track of the steps I needed to go through to actually install the codebase that runs FeedBlendr (and some other things), the permissions that needed to be changed, etc.
  5. Built out the deployment process/scripts and did some iterative testing to make sure it worked etc.
  6. Deployed 2 instances, added their IPs to my DNS service and switched everything over to being hosted by Amazon — EASY! Right?
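
Here’s a minimal sketch of point 3, writing a name-based vhost as part of the image setup. The domain and paths are placeholders, and the file locations assume a Fedora-style Apache layout:

    # Write a name-based vhost so the config never references the instance's IP
    cat > /etc/httpd/conf.d/feedblendr.conf <<'EOF'
    NameVirtualHost *:80

    <VirtualHost *:80>
        ServerName feedblendr.com
        ServerAlias www.feedblendr.com
        DocumentRoot /var/www/feedblendr
    </VirtualHost>
    EOF

    # Apache on Fedora Core 6
    service httpd restart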

Custom Deployment Process

I think what makes my setup a little interesting is the deployment process. Rather than installing and configuring my complete application on a server, then taking a snapshot of that and bundling it up as my AMI, I opted for a process where my AMI doesn’t actually contain my code at all. The AMI is configured as a relatively barebones Apache+PHP system, capable of serving anything. When it launches, it calls a few very simple commands, which grab a package from S3, then extract it and execute a script contained within it.

That script does all the magic. It handles relocating files to where they need to be, fixing permissions, creating symbolic links, etc. It does everything it needs to do to deploy my entire system (including 2 websites, the custom feed-handling core, a WordPress installation etc.) in about 15 lines of bash script.
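
For the curious, here’s a hedged sketch of what such a deploy script might look like; every path, name and permission here is a placeholder, not the actual FeedBlendr layout:

    #!/bin/bash
    set -e
    DIST=/tmp/dist

    # Move the code into place
    rm -rf /var/www/feedblendr
    cp -R $DIST/feedblendr /var/www/feedblendr

    # Fix ownership/permissions for Apache, including a writable cache dir
    chown -R apache:apache /var/www/feedblendr
    chmod -R 755 /var/www/feedblendr
    chmod -R 777 /var/www/feedblendr/cache

    # Relative symlinks so the package stays relocatable
    ln -sf ../shared/config.php /var/www/feedblendr/config.php

    # Drop the bundled Apache/PHP config into place and restart
    # (see "Keeping My AMI Generic" below)
    cp $DIST/extras/vhosts.conf /etc/httpd/conf.d/
    cp $DIST/extras/php.ini /etc/php.ini
    service httpd restart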

Why go to all the trouble of having this de-coupled AMI/deployment process? Simple: I work on the code for FeedBlendr a lot, and it’s undergoing pretty constant revisions. I realized very quickly that making AMIs and uploading them into S3 and registering them etc… sucks. It’s slow, it’s tedious, and I wanted to do it as little as possible. Doing things this way, I don’t have to make a new AMI every time I change my code – I just make a new distro package, throw it in S3, then launch a new instance and it’s got it all running. I can also “re-launch” an instance that’s already running (to save me dealing with DNS) by running a simple script that repeats the same startup process: grab the new code and overwrite everything currently running.

Dealing with DNS

DNS will come up pretty quickly as an issue if you’re working with EC2 – obviously. You launch an instance, you get a new IP. Close it down, launch a new one. New IP. Problem.

The short answer is just get yourself a custom DNS account somewhere. I’m using DynDNS, but they may not be the best. One specific problem I have with them is that there’s no programmatic way to update a hostname that’s configured with round-robin load balancing. I have 2 IPs allocated to the same domain name (feedblendr.com), so I can’t use any of their clients to add/remove/change IPs for that host. That’s something I specifically want to be able to do – have instances automatically jump into my round-robin and start balancing load – so if you know of a provider that supports it, let me know! ZoneEdit might be another option, and I know there are all sorts of other providers out there as well.

Set up your hostname in your new DNS service, and configure it with a low TTL (Time To Live), since you want to be able to change the authoritative IPs for your host quickly in case an instance goes away. I have mine set to 300 seconds, but you might even want to go shorter (if your provider will allow it). On DynDNS, their Custom DNS service (to enable all of this) costs $25 per year, per host. Not too bad.
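
You can confirm the TTL and the round-robin records took effect with a quick dig (the IPs here are illustrative, not my real instances):

    $ dig +noall +answer feedblendr.com A
    feedblendr.com.   300   IN   A   10.0.0.1
    feedblendr.com.   300   IN   A   10.0.0.2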

Now you’re in a position to add/remove IPs to that host and load balance, shift requests to a new instance etc as required. Always remember – instances in EC2 are transient! They may disappear and never come back.

Deployment Distributions

If you’re wondering how I build my packages for distribution purposes – here’s the deal:

  • I use subversion as my code repository/version control system, so everything is in there and up to date at all times (hopefully :-p)
  • I love make, it’s capable of some really cool things, so I use it here and there to automate some project management related tasks
  • I already used make to do local testing (handling exporting from SVN and then setting up permissions/links etc within the project), so it made sense to extend that process to my deployment packages.
  • I can check everything into SVN, then go to my “extras” directory and type make ec2-distro and that’s it
  • It exports all the sub-projects that make up everything that will be deployed on the server, sets up permissions within the scope of the project, creates some internal symlinks (relative file-paths of course) and then tars it all up. From there, it uses s3curl.pl to send a copy up into S3 in a pre-defined location, and then it’s done (a rough sketch follows this list).
  • That package is what gets downloaded to instances and deployed when they launch.
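
Expressed as plain shell commands (the actual version lives in a Makefile), that build might look roughly like this; the repository URL, sub-project names and bucket are all placeholders:

    #!/bin/bash
    set -e
    DIST=feedblendr-dist
    SVN=http://svn.example.com/feedblendr/trunk

    # Export clean copies of each sub-project (no .svn directories)
    svn export $SVN/core $DIST/core
    svn export $SVN/site $DIST/site
    svn export $SVN/extras $DIST/extras

    # Permissions and relative symlinks within the package
    chmod -R 755 $DIST
    ln -sf ../core/lib $DIST/site/lib

    # Tar it up and push it to the pre-defined spot in S3
    tar -czf $DIST.tar.gz $DIST
    s3curl.pl --id=$AWS_ACCESS_KEY_ID --key=$AWS_SECRET_ACCESS_KEY \
        --put=$DIST.tar.gz -- http://s3.amazonaws.com/my-bucket/$DIST.tar.gz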

Challenges I Faced/Face

It’s not all smooth sailing. I have had, and continue to have, some things I’m not entirely happy with in this process, and in my experience with EC2. Here are a couple of specific ones, in no particular order:

  • DNS: I’m not entirely happy with my DNS setup. Right now if an instance disappears, it relies on me noticing and removing it from my DNS entries, and then it takes some amount of time before that change is noticed. I plan on trying to improve this by figuring out some sort of heartbeat-based monitoring of my instances, possibly using Nagios or something like that (a bare-bones sketch of the idea follows this list). I wanted to use something like WeoCEO, but I hadn’t heard back from the guys there in the timeframe I was working under, so I had to go it alone.
  • Shared Filesystems: I had hoped to make use of the promising S3DFS system, which provides a fast-access (through lots of internal caching) shared filesystem, backed onto S3 but accessible as a normal, local filesystem (via FUSE). Here’s the kicker: it promises to let multiple instances access the same filesystem simultaneously. I had hoped to use it to share a cache repository between instances, to improve the performance of my caching backend and stop both instances downloading the same content right after each other because of round-robin issues. To make a long story short, there were performance problems that meant it wasn’t an option. BUT! I’ve been in touch with the developers, and they’re working on a beta right now which should address all my problems, so I’m hoping to try it again and use it in the future.
  • Web Stats: I used to use Analog/Webalizer-type tools to look at my server logs, but with multiple instances serving content that starts to get difficult, unless you’re willing to log to a central server or write something custom to merge the logs. Rather than do that, I installed Google Analytics on my site, so I now get centralized stats from that, but it doesn’t cover my non-JavaScript content (e.g. any feed accesses). Luckily I log those details myself, but now it’s more important that I build some good tools for peering into that data.
  • Hosting a Database: I’ve read all sorts of interesting posts about hosting databases within EC2, but something about it just makes me uneasy 🙂 Call me old-fashioned, but I’d like to know that my database is hosted on a machine that’s not going to disappear if it crashes. I suppose it’s just one more level of true redundancy to deal with, right? I haven’t figured out master-master replication (which seems to be an obvious requirement for that) yet, so I’m not 100% happy with my database situation just yet.
  • Keeping My AMI Generic: Because I wanted to modify my AMI as little as possible over time, I actually ended up moving my PHP and Apache configuration files into my distribution package as well. I have a directory called “extras” which contains things generally related to deployment, including a vhosts.conf file and a php.ini. During deployment, these files are copied into place on the server and then Apache is automatically restarted. This allows me to customize my Apache configuration (including RewriteRules etc.) without having to modify the AMI.
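
As a stop-gap before something like Nagios, even a dumb cron job on a machine outside EC2 could cover the heartbeat idea mentioned above. A bare-bones sketch, with placeholder IPs, URL and address:

    #!/bin/bash
    # Hit each instance directly (bypassing DNS) and yell if one is down
    for ip in 10.0.0.1 10.0.0.2; do
        if ! curl -sf --max-time 10 -o /dev/null "http://$ip/heartbeat.php"; then
            echo "Instance $ip failed its heartbeat check" \
                | mail -s "EC2 heartbeat FAILED: $ip" me@example.com
            # A smarter version would also pull $ip out of the round-robin
        fi
    done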

Handy Tools for EC2/S3

Here are a couple of tools I found useful in this process, which might help you out as well:

  • s3curl.pl — a really handy little Perl script that you can use as a cURL wrapper for doing command-line requests against S3. Great because it handles the complex authentication stuff: you give it your access keys and it takes care of the request signing, so you can use it basically the same way you would use cURL on the command line (sample invocations follow this list).
  • S3 Browser — a very cool, lightweight and simple tool for checking out what you have in S3 buckets (and uploading/downloading/deleting things)
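
For example (bucket and filenames are placeholders, and flag names may vary between versions of the script, so check its own docs):

    # Upload a file to S3
    s3curl.pl --id=$AWS_ACCESS_KEY_ID --key=$AWS_SECRET_ACCESS_KEY \
        --put=dist.tar.gz -- http://s3.amazonaws.com/my-bucket/dist.tar.gz

    # Download it again, passing normal cURL options after the --
    s3curl.pl --id=$AWS_ACCESS_KEY_ID --key=$AWS_SECRET_ACCESS_KEY \
        -- -o dist.tar.gz http://s3.amazonaws.com/my-bucket/dist.tar.gz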

That will do for now – please ask in the comments if you have any questions and I’ll answer them here and/or revise this post to reflect new information.

Cheers — Beau

Amazon EC2+S3, Here We Come

Tuesday, May 8th, 2007

I’ll explain in detail once I’m moved, but if you’re having problems with FeedBlendr, it’s probably because I’m in the process of moving servers (for real this time).

UPDATE 2007-05-13: As promised, here’s some more information about this Amazon business.

After running FeedBlendr on DreamHost for a year and a bit, and having a few problems along the way with causing too much load, they finally pulled the pin on me. They’ve been very good about things, and worked with me to help identify and solve the problems I was causing (I was on a shared server, so my usage was affecting other customers). Basically, in the end there wasn’t a “fix” per se, because I just had too many people requesting blends too many times a day (over a MILLION times a month!), so I had to do something about it.

In comes Amazon. For those of you who don’t know, Amazon has started getting into the Web Services world, and 2 of their offerings are of particular interest to me (us!):

  1. Elastic Compute Cloud (EC2): A system whereby I can request a new “copy” of a complete server on-demand, and use it as part of a cluster of machines to power FeedBlendr.
  2. Simple Storage Service (S3): Unlimited, fully-redundant (in the good way) online storage, allowing me to keep copies of things out there in “Amazon-space” where my new EC2 servers can get at them.

I won’t bore you with all the details (although feel free to get in touch if you’re interested), but FeedBlendr is now running on 2 Amazon EC2 “instances” in a balanced manner (requests go to both machines), so hopefully performance is a lot better and things will be more reliable. You may also have noticed that I fixed some caching bugs, so blends load faster and should be more stable. I’ve also bumped up the minimum age for blends slightly, so blends can now get slightly older before they’ll be refreshed. This is mainly because people were requesting their blends too often, forcing my servers to rebuild a complete blend every 5 minutes just because a single feed changed. I’ll be looking at better ways to address this, but in the meantime, please let me know if this causes any problems.

Here’s to looking forward, and seeing FeedBlendr continue to improve and serve your feed-blending needs better!

FeedBlendr… Not Relocating Anywhere

Thursday, June 8th, 2006

Change of plans!

Thanks to a new policy at DreamHost, there’s no immediate need to relocate, so FeedBlendr will be staying right where it is for now.

I’m hoping to get a bit of a chance to do some refactoring and what-not in the near future though, which will make blendr a little more flexible, and a little more standards compliant 🙂

FeedBlendr Relocating To A New Server

Tuesday, May 30th, 2006

Some time over the next week, I will be relocating FeedBlendr.com to a new home at hosting.com. During that process, you may experience some problems accessing blends, but I hope to minimize those problems as much as possible.

I will post again when things look stable 🙂

Server Load Problems – Now Accepting Donations

Tuesday, May 9th, 2006

With increasing popularity come a number of problems that are only to be expected. Right now, FeedBlendr is experiencing one of those problems – scalability/load issues.

After a number of warnings/notifications from DreamHost, I’ve been asked to either figure out a way to lower my processor usage immediately, or risk having my account disabled until I can do so. I’m using too much of the processor on my shared server and it’s unfair to the other users – fair enough.

Now I’m faced with a decision: do I shut FeedBlendr down (I don’t want to)? Is there something I can do to lower usage (nothing I’ve found yet)? Can I justify upgrading my hosting (without making any money from FeedBlendr under the current model)?

At the moment I’m considering upgrading to a dedicated server from Hosting.com, which should give me the power/flexibility I need, but it’s a lot of extra management/setup that I’d have to deal with just to get it all happening, and life is just plain busy right now. So as a stop-gap (and either way), I’ve decided to open up for donations, allowing anyone who would like to show their appreciation for the service FeedBlendr offers to drop a few bucks in my virtual tip-jar.

This may well affect my decision on whether or not to keep FeedBlendr live – if people aren’t even willing to drop a few dollars once-off in appreciation of the service, then I don’t know if I can continue providing it, and paying extra for the privilege of keeping FeedBlendr public and free.

It’s up to you people – so please show how you feel about FeedBlendr and donate now!

Heavy Load

Sunday, April 2nd, 2006

FeedBlendr has been experiencing a bit of popularity growth, and so I’m handling more feeds/blends now. This has started raising some eyebrows over at DreamHost (where this site is hosted), and fair enough – looks like I’m using a lot more processor time than is fair on a shared server.

I’m making some modifications to ensure that caching is working locally every time, and trying to avoid remote calls when possible, because they seem to be causing the problem. Please bear with me in the meantime if there are any problems. If you’d like to do your part, then please reassess how often you’re requesting your blend – you really don’t need to be hitting it every minute (which it appears some users are doing!). In the future I may have to throttle blends that are requested too often from the same IP, and just block some requests.

On a brighter note, if you love FeedBlendr and want to support its future development (bandwidth costs, development time etc), then please jump over here and Make A Donation!

Please let me know via comments or email if something is broken from the changes that I’ve made, or the ones I make in the near future.

UPDATE 2006-04-12: Previous changes didn’t have enough of an effect, so I’ve made some more caching changes and will wait to see what effect they have on my processor load problems.