Battle of the search engines: secret Google weapons
Very few people have asked why Google Inc., through Google Labs, is involved in Google Compute, a program within Google Inc. that promotes distributed computing projects.
Microdoc News explores this issue. According to the Google Compute page, Google Labs is involved in a distributed computing project because Google Inc. has considerable expertise in the area of distributed computing:
The Google search engine uses more than 10,000 networked computers to deliver results to millions of users worldwide, making it one of the largest distributed computing systems in existence. In addition to providing leading search technology, we are also interested in solving other important computationally intense problems. While we're not experts on protein folding, we do know quite a bit about using networked computers to solve difficult problems involving terabytes of data.
Indeed, Google Inc. has capability in this area. However, universities also have many minds working on a problem. By using its clout with users to help a university obtain enough computing time, Google Inc. also gains access to the people who are solving distributed computing problems. Google Inc. has much to gain from such cooperation.
Google Inc. is in need of staff: the Google Labs front page carries an advertisement for staff. By working with university people who are tackling distributed computing problems, Google Inc. gets to see those minds at work and to identify ideal people to hire in the future. The benefits go beyond this, however. Distributed computing models are very effective at handling large amounts of data, and Google Inc. is in the business of handling large amounts of data, whether in its mainstream search business, in Blogger, in Froogle, or in whatever else it takes on. The more experience Google Inc. has in distributed computing, the better off it will be.
Added to this, by operating a voluntary distributed computing program, Google Labs gets to do some market research. How effective is volunteer distributed computing? What are the pitfalls? How many volunteers does one need to extend the capabilities of an existing set of computers effectively? Distributed computing is not new, and there are a number of distributed computing projects available on the Internet. In fact, there is a distributed computing project that uses volunteer computing time to create an extensive search engine database. Google constantly needs to evaluate the effectiveness of running a huge PC farm against co-opting volunteer time, and what better way to do so than to give back to the community while learning a lot about volunteer distributed computing?
Would Google Inc. use volunteer distributed computing for its crawling needs? I doubt it very much, especially given these comments about Grub:
"I don't want more computers or bandwidth," he said. "I want more clues about which page to look at rather than another page. The problem is how to rank the right pages. I don't think whether you are a distributed architecture affects that. The problem for us is how do we direct the crawl, not do we have enough resources to get the crawl."
Google is also experimenting with distributed computing. The Google Search Bar, which adds search capabilities to a Web browser's toolbar, donates spare cycles to Stanford's Folding@Home project, which simulates the ultra-complex process of protein folding. However, the director of search, Norvig, did not say Google would never use distributed computing in the future. He simply said it was not applicable to Google Search at the moment.
Google Inc. is learning much about distributed computing, and it will always need to keep its options open as to how distributed computing, whether voluntary or not, can be used in the future. Added to this, blogs, the web, and in fact the Internet as a whole are largely built on a distributed computing model, and in many cases a voluntary one. It would not surprise me if Google Inc. came out with a voluntary distributed computing model for collecting information from blogs, a way of sidestepping the need to crawl them.
Source: Microdoc News