BizJournals Portfolio

Google's Secret Formula


Cluster Control

Google’s genius lies in its networking software, which helps thousands of cheap computers in a cluster act like one huge hard drive. Those inexpensive computers allow Google to replace parts without stopping the whole show: If a computer drops dead, there are at least two others ready to take its place while an engineer swaps out the busted machine.

Power

Just about the only thing limiting Google’s performance is how much electricity the company can buy. One of its newest data centers (code name: Project 02) is near the Columbia River in The Dalles, Oregon, which has access to 1.8 gigawatts of cheap hydroelectric power; not coincidentally, this is where major internet hookups from Asia connect to U.S. networks. The byte factory has two computing centers, each the size of a football field.

Petabytes

Based on the few numbers Google releases, experts guess that at least 20 petabytes of data are stored on its servers. But Googlers are famous for understatement; Wired says Google may have 200 petabytes of capacity. So how much is that? If your iPod held just 1 petabyte (one million gigabytes), you’d have about 200 million songs to shuffle. And if you started downloading a petabyte over your high-speed internet connection, your great-great-great-great-grandchild might still be around when the last few bytes get transferred, in 2514.
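For readers who want to check the math, here is a rough back-of-the-envelope sketch. The song size (5 megabytes) and the connection speed (a sustained 0.5 megabits per second) are assumptions chosen for illustration, not figures from the article; the names in the code are hypothetical too.

```python
# Back-of-the-envelope check of the petabyte comparisons.
# Assumptions (not from the article): decimal units, 5 MB per song,
# and a connection that sustains 0.5 megabits per second.

GB_PER_PB = 1_000_000            # "one million gigabytes" per petabyte
BYTES_PER_GB = 1_000_000_000
BYTES_PER_PB = GB_PER_PB * BYTES_PER_GB

ASSUMED_SONG_BYTES = 5_000_000        # hypothetical average song size
ASSUMED_LINK_BITS_PER_SEC = 500_000   # hypothetical sustained speed
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def songs_per_petabyte(song_bytes=ASSUMED_SONG_BYTES):
    """Number of whole songs that fit in one petabyte."""
    return BYTES_PER_PB // song_bytes


def download_years(total_bytes=BYTES_PER_PB,
                   bits_per_sec=ASSUMED_LINK_BITS_PER_SEC):
    """Years needed to transfer total_bytes at a constant bit rate."""
    seconds = total_bytes * 8 / bits_per_sec
    return seconds / SECONDS_PER_YEAR


print(songs_per_petabyte())        # songs in one petabyte
print(round(download_years()))     # years to download one petabyte
```

Under those assumptions a petabyte holds 200 million songs, and the download takes roughly five centuries. A faster connection shortens the wait proportionally.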

PageRank

Google decides how reliable a site is—and thus how important the site’s content will be when Google forms a list of search results—by considering more than 200 factors as it analyzes content. But the secret sauce is Google’s patented formula for following and scoring every link on a page to learn how different sites connect, which means a site is deemed reliable based largely on the quality of the sites that link to it.
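The idea that a site earns its standing from the sites linking to it can be sketched in a few lines. What follows is the simplified textbook version of link scoring, not Google's actual patented formula or any of its 200-plus factors; the damping value of 0.85, the function name, and the four-page toy web are all assumptions for illustration.

```python
# A simplified, textbook-style link-scoring sketch (power iteration).
# This is NOT Google's production formula; damping=0.85 and the toy
# link graph below are illustrative assumptions.

def pagerank(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links only to "a".
toy_links = {"a": ["c"], "b": ["c"], "d": ["c"], "c": ["a"]}
scores = pagerank(toy_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In the toy web, "c" scores highest because three pages point to it. Page "a" outranks "b" and "d" even though it has only one inbound link, because that link comes from the highly rated "c".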

Googlebots

Google deploys programs called spiders to build its copies of the internet. On popular sites, Googlebots may follow every link several times an hour. As they scour the pages, the spiders save every bit of text or code. The raw data are pulled back into the cluster, run through the mill, and scheduled to incrementally replace the older data already on the index and doc servers, ensuring that results are fresh, never frozen.
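The follow-every-link, save-everything loop can be sketched as a simple breadth-first crawl. This is a toy under stated assumptions, not how Googlebot is built: the `crawl` function, the `fetch` callback, and the in-memory `TOY_WEB` are hypothetical stand-ins, and nothing here touches a real network or handles the rescheduling and incremental index updates described above.

```python
# A toy breadth-first crawler over an in-memory "web".
# All names here (crawl, fetch, TOY_WEB) are hypothetical; a real
# spider would fetch pages over HTTP and revisit them on a schedule.
from collections import deque


def crawl(fetch, seeds, max_pages=100):
    """Follow links outward from the seed pages, saving each page's text.

    fetch(url) returns (text, links) for a page, or None if it is missing.
    """
    saved = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue  # dead link: nothing to save
        text, links = page
        saved[url] = text
        for link in links:
            if link not in seen:  # queue each page only once per crawl
                seen.add(link)
                queue.append(link)
    return saved


# Hypothetical three-page web; "d" is linked to but does not exist.
TOY_WEB = {
    "a": ("Page A text", ["b", "c"]),
    "b": ("Page B text", ["a"]),
    "c": ("Page C text", ["d"]),
}

copies = crawl(TOY_WEB.get, ["a"])
print(sorted(copies))
```

Starting from page "a", the sketch saves the text of "a", "b", and "c" and skips the dead link to "d".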

