Distributed computers power new search engine
A distributed computing project called Grub, which harnesses individual users' spare computing power and internet bandwidth, began cataloguing millions of web pages this week.
The project's home page says that in the last 24 hours over 36 million web pages have been catalogued by Grub software installed by users on about 1000 personal computers around the globe. Like SETI@home and other distributed computing projects, Grub runs in the background, using a computer's spare capacity. It automatically trawls the web, collects details on thousands of pages per hour and returns this information to a central database. A Grub screen saver displays the websites the program is scouring.
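The cataloguing step each volunteer client performs can be sketched in a few lines: fetch a page, pull out its title and outbound links, and package the result for upload to the central database. This is a minimal illustration using Python's standard library; the record format and function names are assumptions for the example, not Grub's actual protocol.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the page title and outbound links -- the sort of
    metadata a volunteer crawler would report back."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def catalogue_page(url, html):
    """Summarise one fetched page into the record a client
    might upload to the project's central database."""
    parser = LinkExtractor(url)
    parser.feed(html)
    return {"url": url, "title": parser.title.strip(), "links": parser.links}
```

A real client would of course also fetch the page over the network and transmit the record upstream; this sketch covers only the parsing stage, which is where the catalogued information comes from.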
LookSmart, the US company behind Grub, hopes that eventually the project could provide enough raw data to keep a comprehensive search engine up to date. The company believes the distributed service has the potential to one day rival Google, the web's most popular search service. Google's success is built on a different approach. It relies on huge banks of servers to catalogue the internet and uses yet more computing power to return search results to millions of users almost instantaneously.
Website information collected by Grub is already being fed into one of LookSmart's search services, called WiseNut. But the collected data are also freely accessible to the public, so they can be incorporated into any website or desktop application. Antony Rowstron, an expert in distributed computing at Microsoft's research laboratory in Cambridge, in the UK, told New Scientist: "Technically I can't see anything wrong with it." But he adds that, as there is no clear incentive to download the client, the service will depend on the goodwill of volunteers for its success.
Danny Sullivan, editor of Search Engine Watch, says the biggest challenge facing most search engines is choosing which pages to index, rather than simply indexing as many as possible. Google is thought to search over 150 million pages per day, but also relies on other signals, such as the number of links between pages, to rank their significance. LookSmart has said Grub may eventually incorporate more advanced features like this.
Sullivan also raises concerns that the software could be manipulated to make searches favour particular sites over others. "I have more faith in companies that control their own crawl and index than I do in approaches that ask people to submit their own data," he told Wired News.
Source: New Scientist.com