White Paper on 'Cloud Architectures' and Best Practices of Amazon S3, EC2, SimpleDB, SQS

Filed Under (Thought Pieces) by AWS Editor on 16-07-2008

I am very happy to announce my white paper on Cloud Architectures is now ready. This is one incarnation of the Emerging Cloud Service Architectures that Jeff wrote about a few weeks ago.

If you are new to the cloud, the first section of the paper will help you understand the benefits of building applications in-the-cloud. If you are using the cloud already, the second section of the paper will help you to use the cloud more effectively by utilizing some of the best practices.

In this paper, I discuss a new way to design architectures. Cloud Architectures are Services-Oriented Architectures that are designed to use on-demand infrastructure more effectively. Applications built on Cloud Architectures use the underlying computing infrastructure only when it is needed (for example, to process a user request): they draw the necessary resources on demand (such as compute servers or storage), perform a specific job, and then relinquish the unneeded resources once the job is done. While in operation, the application scales up or down elastically based on its actual need for resources. Everything is automated and operates without any human intervention.
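To make the "draw resources on demand, then relinquish them" idea concrete, here is a minimal sketch in Python. It is not taken from the white paper: the OnDemandPool class, the workers_needed and rescale functions, and the figures of 10 jobs per worker and a 100-worker cap are all assumptions made up for illustration, and the pool only counts workers instead of calling any real provider API.

```python
import math


class OnDemandPool:
    """Hypothetical stand-in for a provider that rents out workers.

    It only tracks a counter; a real system would start and stop servers.
    """

    def __init__(self):
        self.active = 0

    def acquire(self, count):
        self.active += count

    def release(self, count):
        self.active -= count


def workers_needed(pending_jobs, jobs_per_worker=10, max_workers=100):
    """Return how many workers the current backlog calls for (assumed policy)."""
    if pending_jobs <= 0:
        return 0
    return min(max_workers, math.ceil(pending_jobs / jobs_per_worker))


def rescale(pool, pending_jobs):
    """Grow or shrink the pool to match demand, with no human in the loop."""
    target = workers_needed(pending_jobs)
    delta = target - pool.active
    if delta > 0:
        pool.acquire(delta)    # scale up: draw resources on demand
    elif delta < 0:
        pool.release(-delta)   # scale down: relinquish unneeded resources
    return pool.active
```

Calling rescale repeatedly as the backlog changes moves the pool up and down, and an empty backlog brings it back to zero workers, which is the "use it only when needed" property described above.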

As an example of a Cloud Architecture, I discuss the GrepTheWeb application. This application runs a regular expression against millions of documents from the web and returns the filtered results that match the query. The architecture is interesting because it runs completely on-demand in an automated fashion. Triggered by a regex request, hundreds of Amazon EC2 instances are launched, a Hadoop cluster is started on them, transient messages are stored in Amazon SQS queues, statuses are stored in Amazon SimpleDB, and all Map/Reduce jobs are run in parallel. Each Map task fetches a file from Amazon S3 and runs the regular expression against it; the results are aggregated in the Reduce/Combine phase, and once the Hadoop job has been processed, all of the infrastructure is released back into the cloud.
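The map and reduce steps described above can be sketched on a single machine. This is only an illustration of the pattern, not the GrepTheWeb code: the DOCUMENTS dictionary stands in for files stored in Amazon S3, a thread pool stands in for the Hadoop cluster, and the names map_task, reduce_task, and grep_the_docs are hypothetical.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for documents stored in Amazon S3: key -> text.
DOCUMENTS = {
    "doc1": "cloud computing scales on demand",
    "doc2": "a regular expression filters text",
    "doc3": "elastic compute in the cloud",
}


def map_task(key, pattern):
    """Map: read one document and emit (key, matching_line) pairs."""
    text = DOCUMENTS[key]  # the real system would fetch the file from S3 here
    regex = re.compile(pattern)
    return [(key, line) for line in text.splitlines() if regex.search(line)]


def reduce_task(pairs):
    """Reduce: aggregate the matching lines per document key."""
    results = defaultdict(list)
    for key, line in pairs:
        results[key].append(line)
    return dict(results)


def grep_the_docs(pattern, workers=3):
    """Run the map tasks in parallel, then combine their output."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(lambda key: map_task(key, pattern), sorted(DOCUMENTS))
        pairs = [pair for chunk in mapped for pair in chunk]
    return reduce_task(pairs)
```

For example, grep_the_docs(r"cloud") returns the matching lines from doc1 and doc3 and nothing for doc2. Leaving the with block shuts the worker threads down, loosely mirroring how the cluster is released once the job has been processed.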

GrepTheWeb is one of many applications built by Amazon that uses all our services (Amazon EC2, Amazon SimpleDB, Amazon SQS, Amazon S3) together.

A wide variety of applications can be built using this design approach, from nightly batch processing systems to media processing pipelines.

An excerpt:

Cloud Architectures address key difficulties surrounding large-scale data processing. First, in traditional data processing it is difficult to get as many machines as an application needs. Second, it is difficult to get the machines when one needs them. Third, it is difficult to distribute and co-ordinate a large-scale job on different machines, run processes on them, and provision another machine to recover if one machine fails. Fourth, it is difficult to auto-scale up and down based on dynamic workloads. Fifth, it is difficult to get rid of all those machines when the job is done. Cloud Architectures solve such difficulties.

Applications built on Cloud Architectures run in-the-cloud where the physical location of the infrastructure is determined by the provider. They take advantage of simple APIs of Internet-accessible services that scale on-demand, that are industrial-strength, where the complex reliability and scalability logic of the underlying services remains implemented and hidden inside-the-cloud. The usage of resources in Cloud Architectures is as needed, sometimes ephemeral or seasonal, thereby providing the highest utilization and optimum bang for the buck.

In the first section I discuss the advantages and business benefits of Cloud Architectures and how each service was used. In the second section, I discuss best practices for the various Amazon Web Services.

You can download the PDF version or access it on the AWS Resource Center.

I talked about this briefly at the Hadoop Summit 2008 and QCon 2007. I got some good reviews after the talks, so I decided to put all my thoughts in this paper along with some Best Practices for the use of Amazon Web Services (Amazon EC2, Amazon SQS, Amazon S3 and Amazon SimpleDB together). Many developers from our community have been asking for a real-world example of a complex, large-scale application. I will be presenting this paper at the 2008 NSF Data-Intensive Scalable Computing Workshop at UW and the 9th IEEE/NATEA Conference on Cloud Computing later this week.

I believe this new and emerging way of building applications that run in-the-cloud is going to change the way we do business.

– Jinesh

The Emerging Cloud Service Architecture

Filed Under (Thought Pieces) by AWS Editor on 03-06-2008

I’m going to go out on a limb today and try to paint a picture of where some of this cool and crazy cloud-based infrastructure may be going. While none of what I will write about is idle speculation, it is based on just a few data points, and may be totally off base. However, I do get to talk to plenty of entrepreneurs and developers on a daily basis, and I am starting to see a very interesting pattern emerge.

The existing state of the art in cloud-based architectures takes the shape of an application running in the cloud, calling upon services running within and provided by the operator of the cloud. There are any number of great examples of this type of architecture. Doug Kaye at IT Conversations built and documented his implementation over a year ago. Earlier today, Don MacAskill of SmugMug sent me a link to his new post, SkyNet Lives (aka EC2 @ SmugMug). In that article, Don provides a detailed review of SmugMug’s use of Amazon EC2 and S3 to implement a dynamic, highly scalable system which simultaneously minimizes response time and cost by optimizing the number of EC2 instances.

As I said, I am starting to see something which goes beyond this in a subtle yet important way. Developers are now building services in the cloud for other developers, with the understanding that important (and perhaps primary) consumers of the service will also be resident within the same cloud.

I’m going to call this the CSA, or Cloud Service Architecture.

Applications communicating with each other inside of the Amazon cloud enjoy some important benefits. They get high-bandwidth, low-latency communication, at little or no cost. They inherit all of the other attributes of cloud-based applications such as on-demand scalability, fault tolerance, cloud-wide network security, and cost efficiency. Applications running in loosely coupled fashion within the cloud can share data using SQS, S3, or other communication protocols of their choosing.
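As a rough picture of what "loosely coupled" means here, the sketch below has two components that never call each other directly and only exchange messages through a queue. It is an illustration under stated assumptions: Python's in-process queue.Queue stands in for a hosted queue such as SQS, the None sentinel is a convention chosen for this example, and the names producer, consumer, and run_pipeline are hypothetical.

```python
import queue
import threading


def producer(q, items):
    """First component: publishes work items, knows nothing about the consumer."""
    for item in items:
        q.put(item)
    q.put(None)  # sentinel meaning "no more messages" (assumed convention)


def consumer(q, results):
    """Second component: reads messages until it sees the sentinel."""
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item.upper())  # placeholder for real processing


def run_pipeline(items):
    """Wire the two components together through the shared queue."""
    q = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=producer, args=(q, items)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the only shared contract is the message format, either side could be replaced or scaled independently, which is the property that makes services built for other developers in the same cloud practical.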

Right now, I see that forward-looking companies are starting to build components which fit into the CSA. On the database side, we have Vertica for the Cloud and MySQL Enterprise for EC2. On the media side, there’s Cruxy’s MuxCloud, IntrIdea’s MediaPlug, and Wowza Media Server Pro for Amazon EC2. I’m sure that there are others that I don’t know about.

So who’s calling these services from other EC2 instances within the cloud? Here are my first two data points (that’s enough to draw a trend line, right?):

  1. I had breakfast with the CEO of Sonian yesterday. He told me that they are now using the Vertica product to help them store, index, and retrieve massive amounts of data (more info can be found in their case study).
  2. Earlier this year I paid a visit to VisualCV in Reston, Virginia. They use MediaPlug to support uploading and processing of a variety of types of images and videos.

My sense is that this is the start of something big. Web services made it possible to cross organizational boundaries with a simple HTTP request. Now, running within the cloud makes it possible to do this with minimal network latency.

As individual developers learn more about cloud computing, they will naturally look for some very high-level components up and running within the cloud. Over time I am sure that there will be a need for more sophisticated tracking and billing mechanisms, key management, a catalog of services, and other facilities that we can’t even envision just yet. As always, we love to get this feedback from you, so let us know what you need.

I’m sure that there are some other CSA-style applications running in the Amazon cloud now. If you’ve built one, post a comment!

– Jeff;

Two Good Podcasts

Filed Under (Thought Pieces) by AWS Editor on 28-04-2008

I hardly ever listen to broadcast radio in my car anymore. Instead, I subscribe to a whole bunch of podcasts, some technical, some fun, and others educational. Here are two episodes which should be of interest to anyone who reads this blog:

The Mashable Podcast interviews Michael Crandell, CEO of RightScale. Michael talks about their product and how it helps organizations to use Amazon EC2 in a cost-effective fashion.

The IT Conversations Podcast captures Amazon CTO Werner Vogels as he talks about AWS at last year's ETech conference.

You can listen to either or both of these on the respective sites or you can simply subscribe to their RSS feeds.

– Jeff;

PS - Congratulations are due to RightScale for the successful completion of their fund raising endeavor.
