For some strange reason I hadn't dived into Amazon's set of web services until yesterday. Previously I had dismissed them (without looking at them) as web services of interest to book stores and the like. After reading about S3 in the book RESTful Web Services by Leonard Richardson and Sam Ruby, I understood that I had been wrong. And so yesterday I signed up, and was hit by the epiphany.
Amazon web services are without a doubt software as a service. You buy storage space. They have the software (and vast amounts of hardware, no doubt). You want to store some data? Fine, call this library function and it will be stored. You want it back? Here's this other function. Just as Java has "new FileOutputStream(String)", Amazon's S3 gives programmers a way to store data. The programmer doesn't care about where it's being stored! That's completely irrelevant when it comes to data accumulated over the long term (decades).
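To make the "one function to store, another to get it back" idea concrete, here is a minimal sketch. The BitBucket class and its put/get methods are names I made up for illustration; it keeps everything in memory and is not Amazon's actual API. The point is only the shape of the interface: a key goes in with some bytes, and the same key brings them back.

```python
class BitBucket:
    """Hypothetical in-memory stand-in for a remote object store.

    This is NOT Amazon's API; it only mimics the store/retrieve shape.
    """

    def __init__(self):
        # Maps object keys (strings) to their stored bytes.
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # Store a copy of the data under the given key.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        # Return the bytes stored under the key (KeyError if missing).
        return self._objects[key]


bucket = BitBucket()
bucket.put("photos/2007/beach.jpg", b"...jpeg bytes...")
restored = bucket.get("photos/2007/beach.jpg")
```

The caller never says which disk or data centre the bytes end up on, which is exactly the part the programmer gets to stop caring about.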
Ok, first of all, you as a developer need to know that there is a service which can take your data and keep it secure and allow you to get it back, just like a file system, only you don't need any hardware. A bit like Flickr can take your holiday snapshots and keep them for you. A bit like Blogger keeps your blog posts around. Amazon S3 keeps stuff around for you too, except that Amazon S3 is designed for software developers, and not your aunt.
And it's cheap. Since Amazon is so big and they have all of this data already (think about how many books they have scanned, and how many mp3s can be downloaded) they have a lot of know-how in managing large (what an understatement) sets of data. So the incremental cost of a gigabyte is pretty low. They've priced it at ten cents. Oh, that's per month, so $1.20 for a whole year. It's just silly.
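The arithmetic is simple enough to sketch. This assumes only the flat ten cents per gigabyte per month quoted above and ignores anything else that might end up on the bill (such as transfer charges); the function name is my own. Amounts are kept in whole cents to avoid floating-point rounding.

```python
# Assumption: flat storage price of ten cents per gigabyte per month.
RATE_CENTS_PER_GB_MONTH = 10


def storage_cost_cents(gigabytes: int, months: int) -> int:
    """Storage-only cost in cents for holding `gigabytes` for `months`."""
    return gigabytes * months * RATE_CENTS_PER_GB_MONTH


one_gb_for_a_year = storage_cost_cents(1, 12)    # 120 cents
hundred_gb_a_month = storage_cost_cents(100, 1)  # 1000 cents
```

So one gigabyte for a year comes to a little over a dollar, and even a hundred gigabytes is ten dollars a month on storage alone.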
Your data is secure and available 24/7/365. You don't need to buy a new disk, or a RAID controller, or a NAS, or a SAN, or book redundant locations in case of fire. Just send it over the wire to a bit bucket hosted by Amazon...
Oh, I forgot, this story was more about discovering Amazon's web services and my epiphany. Right.
Well, you can imagine going to Amazon's web service portal and seeing their plethora of acronyms staring you in the face; they're not very intuitive. S3 is for storage. Check. The rest are as follows:
EC2, Elastic Compute Cloud: Need a Linux box to host your application? Upload it to EC2 and they will actually run it for you on some really big boxes. You pay for uptime.
FPS, Flexible Payments Service: Need to transfer money? Allow Amazon to handle the transaction.
Mechanical Turk: Do you need an army of human workers to do some work on your data? Amazon has an army of human workers, and they will work on your data.
SQS, Simple Queue Service: Do you need to be able to queue up work or data and retrieve it FIFO from another location or application? Amazon has a queue for you.
I clicked my way through the descriptions of these web services and I thought: "Wow, it really takes the pain out of scaling an application!" Most of the services are close to free: if you have an application and make use of all of the services, the costs may rise to a dollar a month if you're lucky enough to see that amount of traffic. And if your application really takes off and you're thinking "I hope my servers can handle the load" then your monthly bill will probably be higher, but hopefully you'll be making a few bucks off the traffic too.
You can host your application in a virtual machine running on Amazon's hardware. That's neat. You write code. You upload it. You pay for CPU cycles. Your app can store its data in Amazon's storage solution. You can queue things between different applications, and bill your customers, all using web services. Your application would be scalable and reliable should the need arise, in which case the application would also probably be profitable.
After signing up I needed to test S3 out, of course. I've been looking for a place to store all my digital photos, genealogy data and other "important" stuff. And I would like to pay for such a service. So S3 seemed like the obvious choice. I googled around and found s3sync.rb to upload lots of data. Through the night it toiled away and after a day I had around 6 GB backed up, and a bill of around 72 cents. Sweet.
As I was signing up I noticed that there was another service called "SimpleDB", and although I could have sworn I had gone through the entire list of services I thought "Was this just released, or what?" It didn't say so, just that it was in limited beta. It's a service which allows you to post map structures (key-value maps) and query them later. A lot like a database, except that you don't have to tell the server what keys there are. It just discovers them as you post them, and allows you to extract them using an SQL-like declarative language. Like most other Amazon services, it offers SOAP and plain HTTP based interaction.
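Here is a rough sketch of the "no schema, keys are discovered as you post them" idea. ItemStore, put, known_keys and select are hypothetical names of mine, everything lives in memory, and the query is a plain equality match rather than SimpleDB's actual query language; it is only meant to show the shape of the thing.

```python
class ItemStore:
    """Hypothetical in-memory sketch of a schema-less key-value item store.

    This is NOT SimpleDB's API; it only illustrates the concept.
    """

    def __init__(self):
        # Maps an item name to its dict of attribute name -> value.
        self._items = {}

    def put(self, item_name, attributes):
        # No schema declared up front: any attribute name is accepted.
        self._items.setdefault(item_name, {}).update(attributes)

    def known_keys(self):
        # The set of attribute names "discovered" from what was posted.
        return sorted({k for attrs in self._items.values() for k in attrs})

    def select(self, **conditions):
        # Names of items whose attributes equal every given condition.
        return sorted(
            name
            for name, attrs in self._items.items()
            if all(k in attrs and attrs[k] == v for k, v in conditions.items())
        )


store = ItemStore()
store.put("photo-1", {"year": "2007", "place": "beach"})
store.put("photo-2", {"year": "2007", "camera": "compact"})
store.put("ancestor-1", {"name": "Ola", "born": "1901"})

photos_2007 = store.select(year="2007")
keys_seen = store.known_keys()
```

Note that the photo items and the ancestor item share no attributes at all, and nobody had to tell the store about "camera" or "born" before posting them.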
I googled it and found blog postings about it from nine months back, so I deduced that it wasn't new. (Not that new anyway.) To my surprise, I was reading blog postings on SimpleDB the day after, about how cool it was that it was finally out!
Damn! The one time I know I was at the right place at the right time and I didn't recognise it! :-D Double damn!
Ach so! Today I thought I'd check out a Linux box offered by the Compute Cloud...