Google to create search tool for weblogs
Google is to create a search tool specifically for weblogs, most likely giving material generated by the self-publishing tools its own tab.
CEO Eric Schmidt made the announcement on Monday, at the JP Morgan Technology and Telecom conference. 'Soon the company will also offer a service for searching Web logs, known as "blogs,"' reported Reuters.
It isn't clear if weblogs will be removed from the main search results, but precedent suggests they will be. After Google acquired Usenet groups from Deja.com, it developed a unique user interface and a refined search engine, and removed the groups from the main index. After a sticky start, Usenet veterans welcomed the new interface. Google recently acquired Blogger, and sources suggest this is the most likely option.
Bloggers too are likely to welcome their very own tab as a legitimization of the publishing format. But many others will breathe a sigh of relief as blogs disappear from the main index. "I just want a search engine that works," laments Chris Roddy, a politics and linguistics undergraduate at Emory University.
"I can get a Google search with porn turned off; why can't I get blogs turned off too?" he asked on Slashdot. Google has strived in vain to maintain the quality of its search results in the face of a blizzard of links generated by a small number of sources. (Google searches 3,083,324,652 pages as of 4PM PT today. Assuming there are one million bloggers, and generously assuming they have a hundred pages each, that amounts to roughly 3.2 per cent of the web content indexed by Google. Recent research by Pew put the number of blog readers, as opposed to writers, as "statistically insignificant.")
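The back-of-envelope arithmetic above is easy to check. A quick sketch, using only the figures the article itself assumes (the index size cited, one million bloggers, a hundred pages each):

```python
# Check the article's estimate of blogs' share of Google's index.
# All figures are the article's own assumptions, not measured data.
pages_indexed = 3_083_324_652   # index size cited in the article
bloggers = 1_000_000            # assumed number of bloggers
pages_per_blogger = 100         # "generous" assumption from the article

blog_pages = bloggers * pages_per_blogger
share = blog_pages / pages_indexed
print(f"{share:.2%}")  # prints "3.24%"
```

Even on generous assumptions, blogs would make up only a few per cent of the index by page count; the complaint in the article is about their weight in rankings, not their volume.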
However, through dense and incestuous linking, results from blogs can drown out other sources. "The main problem with blogs is that, as far as Google is concerned, they masquerade as useful information when all they contain is idle chatter," wrote Roddy. "And through some fluke of their evil software, they seem to get indexed really fast, so when a major political or social event happens, Google is noised to the brim with blogs and you have to start at result number 40 or so before you get past the blogs." We'd noticed.
"Taking Usenet out of the general search was great, because it is not really interfering with general Internet searching," Roddy told us. "Usenet was a public forum in the first place." A Slashdot discussion prompted a suggestion that Google add a -noblog option, which it effectively appears to be introducing by default.
Gary Stock, chief technology officer of Nexcerpt, Inc., agrees. "A year or two ago you could hit 'I'm Feeling Lucky' and there was a good chance that you could find a good and authoritative page," he told us.
"It is less the case today. More and more people have more text to type, and may not have anything authoritative to say - they just throw up characters on the screen." He says that the link-based PageRank™ algorithm was designed, at Stanford University, with very different assumptions about the quality of information.
"They didn't foresee a tightly-bound body of writers," reckons Stock. "They presumed that technicians at USC would link to the best papers from MIT, to the best local sites from a land trust or a river study - rather than a clique, a small group of people writing about each other constantly. They obviously bump the rankings system in a way for which it wasn't prepared."
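Stock's point is easy to reproduce in miniature. Below is a minimal power-iteration sketch of PageRank on a hypothetical five-page graph (the page names and link structure are invented for illustration; the 0.85 damping factor matches the original Stanford paper). One "authority" page receives a single considered editorial link, while three blogs link only to one another:

```python
# Toy PageRank: a small clique of mutually linking pages can each end up
# with a higher score than a page with one editorial inbound link.

def pagerank(links, damping=0.85, iters=100):
    """links maps each page to the list of pages it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in links.items():
            if not outs:
                # Dangling page: spread its rank evenly over all pages.
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                for u in outs:
                    new[u] += damping * rank[v] / len(outs)
        rank = new
    return rank

graph = {
    "authority": [],                    # expert page, links to nothing
    "curator":   ["authority"],         # one considered editorial link
    "blog_a":    ["blog_b", "blog_c"],  # three blogs linking to each other
    "blog_b":    ["blog_a", "blog_c"],
    "blog_c":    ["blog_a", "blog_b"],
}

ranks = pagerank(graph)
# Each clique member scores higher than the single-inlink authority page.
assert ranks["blog_a"] > ranks["authority"]
```

The clique keeps recycling its own rank among its members every iteration, which is the "bump" Stock describes: density of linking, not editorial judgment, drives the score.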
For Stock and Roddy, the problem is that the resulting degradation in the quality of information makes it even harder to find primary source material. Roddy said the realization came after searching through 500 blog entries to find a primary source.
Exacerbating the problem, says Stock - who devised 'Googlewhacking', or the art of producing a search query that returns just one result - is the frequency with which the sites are indexed. "If they are really spidering all 3 billion pages, then they must have changed some law of physics," he explains.
"Someone has made a choice whether to go to a site every hour or every three years. That begs the question - if I know something to be a high-traffic site and I train my robots to visit often, do I discount it when I feed my information to PageRank?"
For example, he cites a hypothetical. "Suppose turtle-rescue.org has authoritative information about turtles. And it changes every month. Then BoingBoing puts up a page about turtles and that becomes a big deal.
"Each of us gets a vote," jokes Stock. "And someone votes every day and I vote once every four years. The blogs push very quickly up to the top of the search results."
"To me the power of what Dave Winer and Ev Williams have done, and it's great, is that I can easily publish ResourceShelf in seconds, giving me time to do other things," says respected author and librarian Gary Price. Price doesn't regard his site as a weblog, even though he uses Blogger tools, now owned by Google. Price co-authored The Invisible Web, a guide to little-known public resources on the Internet [Amazon - review].
"But what happens when the weblog fad dies down?" he asks. "The public think that they can put 2.1 words into Google and the best answer will appear; they don't ask how long it is taking them to get it. For the average person it's very good, but there are choices out there, and a lot of people aren't aware of them."
"You have to realize there are other information sources, and that information costs money. This is why the New York Public Library has a sign, 'Here's where you find the stuff that isn't in Google,' and much of this is publicly accessible," Price points out.
Or as Seth Finkelstein reminds us, "Google is good, but not God." (We'll follow up on what that stuff is, and how to get it, soon.)
Source: The Register