Cloud Computing & Google Architecture

This was one of the most interesting presentations on Google's infrastructure I have ever seen.
You don't have to be an IT guy to enjoy this one.

I have always been a Google fan, and being able to understand the way their clusters work is just great.
Not everything in these videos is easy to follow, but they do their best to explain everything from a newcomer's perspective (at one point they put you through a TCP/IP course).

You will get acquainted with:
-- GFS and the way it stores its data in 64 MB chunks
-- Bigtable, Google's simple implementation of a non-relational database
-- MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware
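To make the MapReduce idea a bit more concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps, using word counting as the example. It is not Google's API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real framework would spread these steps over many machines instead of running them in one process.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence on one machine (hypothetical driver)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cluster is big", "the cluster is cheap"]))
```

The point of the split is that every map call and every reduce call is independent of the others, which is what lets a framework hand them out to thousands of commodity machines.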

The one thing that really amazed me is the way they provide fault tolerance.
All the applications that run on top of the Linux-driven servers at Google are "aware of the fact" that they are running on commodity hardware and that anything can crash at any point, so they are built to survive a crash of 30% of the servers.

Some interesting facts:
-- each piece of data is kept on at least 3 different servers
-- because they don't want to be I/O limited by their HDDs, most of the information is kept in memory
-- the directory servers that coordinate GFS, and that keep track of file placement on the storage servers and of the metadata, run in a Linux HA setup
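The chunking and replication facts above can be sketched in a few lines. This is only an illustration under stated assumptions: the 64 MB chunk size and the 3 copies come from the talk, but the round-robin placement and the names (chunk_count, place_replicas, available_chunks) are hypothetical and are not how GFS actually picks servers.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as mentioned in the talk
REPLICAS = 3                   # each piece of data lives on at least 3 servers


def chunk_count(file_size):
    """Number of chunks needed for a file of file_size bytes (ceiling division)."""
    return -(-file_size // CHUNK_SIZE)


def place_replicas(num_chunks, servers, replicas=REPLICAS):
    """Assign each chunk to `replicas` distinct servers.

    Round-robin placement is an assumption made for this sketch only.
    """
    if len(servers) < replicas:
        raise ValueError("need at least as many servers as replicas")
    return {
        chunk: [servers[(chunk + i) % len(servers)] for i in range(replicas)]
        for chunk in range(num_chunks)
    }


def available_chunks(placement, failed):
    """Chunks that still have at least one copy on a server that has not failed."""
    return [
        chunk
        for chunk, holders in placement.items()
        if any(server not in failed for server in holders)
    ]


if __name__ == "__main__":
    servers = ["s0", "s1", "s2", "s3", "s4"]
    placement = place_replicas(chunk_count(200 * 1024 * 1024), servers)
    # Two of the five servers die; every chunk should still be readable.
    print(available_chunks(placement, failed={"s0", "s1"}))
```

With three copies of every chunk, losing any two servers in this toy setup still leaves one readable copy of each chunk, which is the intuition behind running on hardware that is expected to fail.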

But you will have to watch it to understand my enthusiasm.
Of course most of the software (if not all) is proprietary, but open-source implementations of this kind of architecture are being developed and used by Yahoo (Hadoop, with its HDFS filesystem, as the equivalent of GFS, Nutch for the crawler, HBase, the Hadoop database, for Bigtable).

Enjoy it
