The Google Page Rank explained
Interview with Chris Ridings - Google PageRank expert
From Search Engine Blog.com:
Today we talk to the rather knowledgeable Chris Ridings about (surprise, surprise) PageRank, SearchKing and cowboys.
Thanks for taking the time to talk with us Chris. Can you tell us a bit about yourself? What is your background?
Well, it looks like people are stuck with me rather than the "blind lesbian circus performers with a grudge" that you practically promised everyone last week. It's a shame, I was really looking forward to reading what they had to say.
I'm sure your readers will try to hide their obvious disappointment. I come from the silicon valley of SEO, otherwise known as the UK. I could try and make myself sound interesting by starting out with all my rubbish jokes like "Well, I was born and everyone just had to make the best of it", but Mike Grehan had the cheek to be funny so it would just show me up.
I could try to dazzle you with my knowledge of business, but Ammon Johns seems to have covered that angle too. So I guess you just get the real me - one bloke stuck in an office on the one sunny day a year we get on an island where it rains a lot and everybody apparently drinks tea and eats fish and chips. I've been using computers for a pretty long time and in fact I cut my teeth on a ZX81 when I was 7 (I think they were called Timexes in the US?).
Let's face it, about that time in history every parent plonked their child in front of a ZX81 because it promised their child a prosperous future in IT. I quickly learnt that if pilots pressed the "j" key too much in their aeroplanes then their cockpits filled up with grey blocks and the plane silently melted into oblivion. Pretty soon I'd managed to crash most of Sir Clive Sinclair's Boeing fleet into what I can only presume was Iceland, so I learnt to program by typing in listings from magazines.
That was the only way you could get decent games back then. Okay, more accurately - that was the only way you could get any games back then (decent being a relative term). When I went to university in 1993 they had this bizarre thing called the Internet, which for some reason I had never heard of until then. I learnt that it was invented by the military so that, presumably, they too could email their friends across the room to arrange which pub they were going to that night. After a while of struggling with lynx (the text based browser, not the cat) somebody showed me this revolutionary tool - Mosaic. That would be when I first got into web design.
Whilst the founders of Yahoo were busy putting together a multi-million dollar list of sites I was busy putting a picture of a cathedral and a link that said "this is my friend's site" onto a web page that virtually nobody ever saw.
It took me weeks, mostly trying to find somebody to host it. Nowadays, of course, things have progressed and we have geocities to create those pages in minutes. Frankly, I'm disgusted - if you don't have to spend six months learning "vi" to create a web page that says "Hello World" then what's the point?
After I left university I went into a techie job, gradually ending up doing network support. To cut a long story short, during that time period I didn't use the internet that much. It was harder for me to get hold of and didn't seem worth the effort; there was little need for me to email people across the room - that's what pagers were for. Then I went back to college and university to do a business degree and the internet was back in force, although apparently we were now meant to use it to find information and do research.
Whilst I was doing my business degree I started getting back into doing web sites and such, and obviously I wanted more visitors to my site. So I started reading as much as I could. Obviously I became a skilled master of meta tags, automatically submitting sites, link exchanges and FFAs. I knew to follow the advice I read to the letter in order to maximise my success. Or in other words - I was naïve. "It's right there on the web, and they rank okay, so it must be true". So I decided that anybody who gave Internet Marketing advice was not to be trusted!
If you don't trust anybody that gives Internet marketing advice, that tends to give you a minor problem in terms of getting information! So at the time I began to look at research papers and so on, taking me closer to the original sources.
There was one in particular that I read, from Stanford about Google. Google by this time was pretty much everything, so I set about reading it. Whilst I read it I made a few notes, which turned into a list of points. The list of points turned into a very short article about how to rank in Google.
Having used the points myself, I gave the article away freely and a new SEO commentator was born (although nobody would have ever heard of him and nobody cared :-) ). Following backlinks and the like I started to find sources of internet marketing information that were perhaps less wrong than the ones I had previously found. It's that age-old thing that good information tends to cluster together. So I gradually picked up hints and tips on things that actually worked! Over time, my faith in some internet marketers grew and I realised there was good information amongst a lot of junk.
About that time PageRank was one of the most misunderstood things in the SEO world. So I started gathering together information and reading research papers to enable me to grasp an understanding of PageRank. At the time I was also programming some small scale search engines, so I had the opportunity to do some tests and experiments. I wrote all this down and PageRank Explained was born. I should probably, at this point, remark that PageRank Explained was a grammatical nightmare and a lot of its readability was down to Jill Whalen who offered to put little yellow comments on it and refused to take them off until I spelled and said things proper like. PageRank Explained turned out to be more popular than I thought.
Skipping time a bit, so this doesn't become the longest background ever, PageRank Explained afforded me the opportunity to get to know some very clever people, start Support Forums and of course rewrite it with Mike Shishigin as "PageRank Uncovered".
Whether I'm technically an SEO or not is probably a matter of opinion, perhaps we could have SEORank and judge that with a little green bar on a scale of 1 to 10? I have very few clients; they generally approach me with specific SEO work that needs doing.
It's all top secret hush hush due to client confidentiality, competitive advantage, and of course my men in black obsession. I don't work on SEO full time, mainly because nobody's offered me a fortune to work for them, and because this enables me to have more time to research. Probably one of the differences between myself and many SEOs is that my background leads me to build a general understanding but then focus on a few things really well. I'm the kind of person that has to take things apart to see how they actually work, that won't believe the sun goes round the earth unless he's actually done the work to deduce it himself.
The kind of person you wouldn't want to show your new computer to because at the end of it you would have a mass of disconnected cards, a motherboard and a few sticks of memory.
You're well known for your deconstruction of PageRank. For those new to the topic, can you explain briefly what PageRank is and how it operates?
"Well known", you know how it is Peter - there's hardly a day I can walk down the street without somebody asking me about PageRank, it's difficult to go to a restaurant without getting troubled by the paparazzi and holidays are just impossible.
It has its advantages, the groupies, the emails from adoring female fans awe-struck by my knowledge of PageRank. But the telephoto lenses and the rumours hit hard. It's all okay now, I've done rehab and the Lucozade addiction is under control. If I wear dark glasses and a hat I don't get recognised by quite so many people and the press crews have moved on. Should we maybe settle on just known? :)
One of the reasons that PageRank is so hard to explain and understand is because it is analogous to something we all do all the time without thinking about it. Say I want to buy a new DVD player; I might ask a group of friends what the best DVD player to buy is.
Now some of them are going to give me names of DVD players, but some of them are going to say "I don't know, Tim knows a lot about DVD players". When I talk to Tim, I have a greater respect for his advice because everybody said he knows this stuff. Now if Tim says "Ask Harry too, he knows a lot about DVD players", then despite the fact that nobody else has told me to ask Harry I can assume that Harry probably does know more than the rest (although probably not more than Tim).
PageRank is the same mechanism. Instead of trying to find out who knows most about DVDs, it tries to find out what pages are the most "important". It's hard to actually "ask" a page something, and talking to a computer monitor is not the best way to impress your colleagues, so they make a general assumption. That assumption is "If a page links to another page then it thinks that page is important". There are lots of things wrong with this assumption, but you said "briefly" so I'll leave people to research more if they want to.
Just like Tim's advice to "ask Harry also" is given greater weighting because most people told me to ask Tim, if a page has lots of "important" links, when it says another page is important then PageRank gives that more merit.
By using a mathematical formula to work all this out, Google ends up with a numerical value for each page, known as its PageRank. One of the common misconceptions is that if a page has a higher PageRank than another page, then it must rank higher in the results.
This, of course, cannot be true or the page with the highest PageRank on the web would always show first whatever you searched for. To get the actual results you see on Google's results page, Google first calculates the relevancy of each page to the search term you typed and then applies PageRank to sort them. When PageRank is applied it gives a tendency for higher PageRanked pages to do well rather than merely sorting the results into PageRank order.
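The "ask Tim, then ask Harry" mechanism Chris describes can be sketched as a toy power-iteration in Python. This is a simplified model of the original Stanford formulation, not Google's actual implementation; the page names and the 0.85 damping factor are illustrative assumptions:

```python
# Toy PageRank via power iteration (a sketch of the Stanford
# formulation, not Google's production code).
def pagerank(links, d=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}           # start with a uniform guess
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}  # the "random jump" share
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = pr[page] / len(outlinks)   # a page splits its vote evenly
            for target in outlinks:
                new[target] += d * share
        pr = new
    return pr

# Everybody points at Tim; Tim recommends Harry (who returns the favour).
graph = {"me": ["tim"], "ann": ["tim"], "bob": ["tim"],
         "tim": ["harry"], "harry": ["tim"]}
ranks = pagerank(graph)
# Tim ends up with the highest score; Harry, recommended only by the
# "important" Tim, still outranks everybody else.
```

Run on this tiny graph, the scores mirror the DVD-player story: Tim's many recommendations make him most important, and a single vote from Tim is worth more than several votes from unknowns.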
PageRank Uncovered is certainly a comprehensive document, and obviously required a deep level of research. Can you tell us a little about how you went about your analysis of PageRank?
I have a problem defining that, if you find a topic really interesting then the line between research and obsession becomes a very blurry thing. I probably started research before I could consciously define it as research.
Am I sounding geeky enough yet? Before you start researching a topic you clearly need to start with an idea of what that topic is, which means that I had to get to a basic understanding first. When I was first researching for PageRank Explained, there wasn't that much reliable information around so I started from the original Stanford papers.
One of the things about it being a logical and mathematical topic is that we can extrapolate a lot of information from a very small amount of base information. So for example, whilst the Stanford paper may not specifically say that internal pages can have an effect on your PageRank, we can see from what they present that they do. Thus, hidden in the depths is the counter to one of the more common myths at the time: "Only links coming in from other sites matter".
The problem with extrapolating information is that you need to test it, which being able to program helped no end with. Another example: the Stanford paper's formula shows that the PageRank a page passes on is divided by its number of links, so we can deduce that by putting more internal links on pages that have outgoing links we keep more total PageRank across all pages of the site.
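The division Chris refers to sits right in the published formula. As a sketch (the notation follows the Stanford paper; the numbers below are hypothetical, not measurements):

```python
# The Stanford formula for a single page A, where d is the damping
# factor (typically 0.85) and C(T) is the number of outgoing links
# on a linking page T:
#
#   PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
#
# Because each linking page's PageRank is divided by its link count,
# adding an internal link to a page that already links out redirects
# part of that "vote" back into your own site.
d = 0.85

def contribution(pr_of_linker, outgoing_link_count):
    """PageRank that a single link passes to its target page."""
    return d * pr_of_linker / outgoing_link_count

# A page of PageRank 1.0 with one external link passes everything out...
print(contribution(1.0, 1))   # 0.85 leaves the site
# ...but add one internal link and half the vote stays home.
print(contribution(1.0, 2))   # 0.425 out, 0.425 kept internally
```

This is exactly the kind of deduction that can then be tested with a small programmed search engine, as described above.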
PageRank Uncovered is really an extension to those basics I deduced in PageRank Explained. Once you have an understanding of PageRank you can then begin to watch and adjust your opinions depending on what Google do.
For example, when looking at the Google toolbar there were a couple of occasions on which a lot of pages' PageRank dropped, followed not many months later by a rise in the index size. Now until you know the basics, that's not understandable. But as soon as you know the basics then you can begin to work out why that's happening, how it relates to what the toolbar shows and how PageRank works, or at least you can begin to say "maybe it's because…".
Essentially, because we know Google are never going to come out and tell us exactly how it works, the process is to watch for many of these little "effects" and see how they fit into or change the overall picture. Because PageRank Uncovered was a year later, there are a lot of these little effects that can help solidify deductions or discount them. In effect we're talking about a process where you're continually theorising, modifying those theories and refining them based on the data presented.
I think that when we talk about research, in this context, we are really talking more along the lines of scientific research. It is akin to early scientists trying to work out whether the earth goes round the sun or the sun goes round the earth, we can theorise and then test those theories. We need to do that again and again and again. Whilst we may end up with an established, well accepted, answer that describes all situations that we have ever seen it still remains a theory.
One thing I've always wondered is how much PageRank may have changed since Brin & Page published their paper. After all, why would you want a key company asset on public display? How much do you think PageRank has changed since the original paper was written?
If PageRank hasn't changed since Brin & Page published their paper then I'd be shocked. This is one of the reasons I try hard to stress the "theoretical aspect". One of my pet hates, incidentally, is those who repeat the theory as fact.
However, a theory can still describe every conceivable event that could occur in real life, and I believe this to be the case with PageRank. When I read about PageRank, researched it, and did the testing I came to the conclusion that small and subtle changes would almost certainly have been made but that the main principles must always be the same. Anything more than the principles is of little worth to anybody but Google and perhaps their competitors. Unless you're the kind of nut that would take apart a friend's new computer to see what was in it.
It's a bit like driving a car, I need to know what the pedals are, how the steering wheel works and how the stereo works but I don't need to know the detailed scientific specifics of combustion. If I get a hire car, it may be different.
The layout may change slightly and it might be electric; but for my purposes I can still work out what the pedals do, where the steering wheel is and how the stereo works. Even those of us who are information junkies have to accept that there are limits to what we need to know, and that those limits define the tolerance to which a theory will apply or not. Many companies do have their key assets on display; it's one of the prices paid for doing business and also for a patent. I think that in Google's case, even without the original paper somebody else would have come up with a similar mechanism and realised it closely resembled the results Google were giving. There's nothing mystical or magical about it; who knows - it could well have been DaVanzoRank!
I have to ask you (and this is your chance to put your side of the story) - Ian Rogers claims you made a fundamental mistake in your earlier "PageRank Explained" analysis and said some of the recommendations in the paper are not quite accurate. Did he have a point?
Naughty Naughty, you're a lot more Jerry Springer and less Oprah when it comes to being the king of search engine chat aren't you?! :) It would appear that since then Ian has become an SEO (http://www.iprcom.com/services/search_engine_optimisation.html).
It's a little hard for me to preach ethics whilst simultaneously criticising a competitor so I'll keep this brief. Let's just say that I'd be perfectly happy for anybody who actually reads what I write and thinks about it, or for anybody who's written anything specifically to the capabilities of a target reader group at a particular time, to make their own judgement on this.
On a general view (just to expand a little), earlier on you asked me to explain PageRank briefly. I could have said it's "a recursive algorithm to calculate the statistical probability that a random surfer would come across a web page". Does this make me wrong earlier, or did I tailor it and generalise to what the readership can best understand at a minimal loss of technical accuracy? In a year's time would I write it the same way? If I have done a good job with my general description, have I ensured the reader's knowledge of PageRank is at least equal to that statement?
And if I have does that not make it easy for them to criticise my example? And will that criticism be as a result of my success in explaining or my failure? And don't you just hate a person who answers a question with lots of their own? :)
How can webmasters benefit from knowing about PageRank? What is the best thing they can do to help boost this score, without annoying Google of course? ;)
I think that the biggest benefit to knowing about PageRank is to know when to worry about it and when not to. Seriously. When I read anything about PageRank it tends to be the "PageRank is everything" or "PageRank is worthless" type comments, occasionally misquoting me, which show little regard for the situation. For example: the more key phrases you want a page to rank for, the more you need good PageRank on that page.
They will of course prove to be self-fulfilling prophecies, if you believe PageRank is everything then you will concentrate so heavily on PageRank that you will eventually get what you desire. If you believe PageRank is worthless then you will fail to use it at all and be blissfully unaware of what you could have achieved. It's not exactly a state of affairs that will help progress the fledgling SEO industry, but at least everybody's happy :)
Those who take the bother to truly look into PageRank and assess it objectively are those who will use it when it benefits them. The easiest way to boost PageRank is to write something that people will want to link to. It sounds silly to say, but time spent writing information is often a lot more effectively spent than time spent asking for links. Many successful sites only ever ask for a handful of links and submit to a handful of directories. I'm not one for mantras or catch phrases, I don't want to say "Content is King" because frankly it's another tool and there are no Kings. I also don't want to use that phrase because it's become synonymous with "only ever worry about content".
But in terms of effectiveness, one well written page can draw in a lot of links. With the search engines steadfastly refusing to actually define their rules in black and white, it's also the only method that can be considered truly safe (unless of course you happen to write about ranking methods they don't approve of, review software they don't like, etc).
More important than any of these is to know what to do with your PageRank. If I made a million dollars it is essentially worthless to me if I leave it in the bank all my life until I die, there are ways I could use my millions to my benefit and ways I effectively waste it even though it is still in my possession.
The same holds true of PageRank. I've always talked a lot about internal structure, and to be honest over time my views on the importance of internal link structure have only strengthened. It is largely irrelevant how much PageRank you have, and there is little use in having more, if you don't make maximum use of the PageRank you already have in your site. I think too many people are keen to have a toolbar PR7 on their home page targeting one easy to get key phrase whilst neglecting their internal pages. By arranging internal structure to focus PageRank on important internal pages they can maximise the use of PageRank; as it happens there are also good benefits in terms of navigation for the user. And how could Google possibly complain at that?!
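The effect of arranging internal links can be seen in the same toy model. The sketch below compares two hypothetical three-page layouts; the page names and structures are made up for illustration, not taken from any real site:

```python
# Comparing two internal link layouts under a toy PageRank model
# (illustrative sketch only; page names are hypothetical).
def pagerank(links, d=0.85, iterations=100):
    n = len(links)
    pr = {p: 1.0 / n for p in links}
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in links}
        for page, outs in links.items():
            if not outs:
                continue
            for target in outs:
                new[target] += d * pr[page] / len(outs)
        pr = new
    return pr

# Layout A: every page links to every other page equally.
flat = {"home": ["about", "product"],
        "about": ["home", "product"],
        "product": ["home", "about"]}

# Layout B: the same pages, but the links steer rank at "product".
focused = {"home": ["product"],
           "about": ["product"],
           "product": ["home"]}

print(pagerank(flat)["product"])     # roughly an equal share
print(pagerank(focused)["product"])  # noticeably higher
```

The total PageRank in the site is the same in both cases; only its distribution changes, which is the point being made about making maximum use of what you already have.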
On your site, supportforums.org, you're big on dispelling myths propagated by cowboy SEOs. What are some of the more pervasive myths out there? Any new tips you'd care to share?
I promised you some news, so I'm sure you won't mind me digressing from the question slightly - by the time your readers read this supportforums.org won't be my pet project anymore! Frankly supportforums.org and I have been together for a while, and we've gradually come to realize that we want different things.
She wants to grow and further her career, she wants to be more than just a mouthpiece for an idealist and I want to work on other things without stifling her ability to evolve. It's an amicable split, I get to keep the server and she gets the links and articles. She's gone to live with her new owner, and she's very happy.
I'll miss her, we just seemed to click and she used to organise everything so gracefully. But it's time to move on, and I'm sure there's another site out there for me…somewhere. As an industry commentator, I will be taking much more of a back seat. As much as I think this is a good idea for the site, and me, I will miss the ability to afford some people the opportunity to talk when they might not be able to elsewhere. I don't really want to lose that opportunity, I visit the bulletin board style forums a lot and have seen their potential to allow fully open discussion in a way that helps progress the industry.
For that reason, I have designed a bulletin board system at http://www.searchguild.com/ that has been built from the ground up to enable people to speak freely and honestly. My presence there, should people choose to come, is really only in an administrative capacity to provide the facility.
Back to the question - I spent a long time being fooled by myths when I first started to try to understand the search engines, so I guess perhaps I'm on a little bit of a moral crusade to point them out when I see them.
This isn't the "Joe Bloggs is spamming, isn't he a naughty boy" type crusade, I find those petty and annoying. I would say that rather than acting against any individual cowboy SEOs I am reacting to the propagation of bad or misleading information. I'm not sure if you could define it as a myth, but one of the biggest and most annoying things is those pages where SEOs list key phrases that nobody searches on as examples of how good they are.
If you want to detect a cowboy SEO, apart from looking for a horse and some spurs (obvious signs, they hardly need saying), then one of the best things to do is run the key phrases they advertise through Wordtracker. Unfortunately this trickery is a growing trend and your average webmaster can't be expected to know this. If there was a proper industry body for SEO I'd be the one petitioning them to make their members put the number of searches per day on their pages alongside those terms.
If not maybe percentage ROI, or something that means anything other than number one for "tibetan goats carrying backpacks up a mountain for a balding man with a beard". As there isn't, I'll just have to say that I personally applaud anybody who does do that. If everybody put up that further information then the trickery would eventually be obvious to all.
Beyond that there are probably more I could list here but I think I'm probably in the running for having given the longest interview answers already. Often, myths are distributed unwittingly by people with good intentions, rather than propagated by cowboy SEOs.
They seem to make sense. I'll pick just one example. It's recently been stated that in Google the anchor text of links to a page only counts if that term appears on the page. Intuitively this has a logical basis, but we can say it's likely wrong or at best not fully considered. Partially indexed pages are pages that the search engine hasn't crawled but knows about from link information alone.
If we accept this "theory" to be true as it stands then we must also accept that partially indexed pages can show in searches. The search engine does not know what text is on the page so it can't rank it from that; if the theory is true, every anchor text must be discounted because the search engine cannot see the terms on the page. That would make having partially indexed pages pretty pointless, and yet partially indexed pages do show in searches. Now if you really think about it, the probability that in general anchor text is going to appear on the actual page as well is quite high - so it would be easy to imagine that somebody testing this theory might indeed build up a body of evidence supporting it.
But that body of evidence really only demonstrates human nature and natural linking process. Ergo, there would appear to be very little evidence to support this theory in general and at least for the case of partially indexed pages the theory must be false.
The SearchKing case has highlighted the power relationship between search engines and webmasters. Do you think the search engines have wider responsibilities to the webmaster community?
Often, it is easier to explain things with analogies. Consider that there is an election of some kind, let's say for leader of a university's student union. George is elected to be president for the year, saying that he'll be the best and cut the price of beer in the student union.
George has obtained power because he was the best candidate. However, George doesn't really care for people with ginger hair - "they're unimportant" - and George is in a position of power, so George decides that only people who don't have ginger hair get half price beer. Now George is the president, he was fairly elected, it turns out he's a good president (unless you're ginger), and there's no rule against him doing this. It's his right and there's nothing that can be done. But when George was elected he took on moral and ethical responsibilities to the people that put him there (even some misguided gingers voted for him).
The same is true with the search engines, because they have the same degree of control. A search engine may be put in power because it is the best technologically. The process of attaining that power may be free and fair and they may have a right to do certain things, but they still have a moral and ethical responsibility to all parties concerned whether they consider them unimportant or not.
On the modern day Internet, the search engines are so important that they effectively act as a barrier between users and web sites. Regardless of how you get that big, when you are that big you get responsibilities. The most important of which being the universal responsibility of fairness.
To be fair, they must be accountable. And if I say accountable, let's be clear here, because I'm well aware that there are some who would be more than happy to blow that statement up into "search engines should be legally regulated" for me. Accountable to me means that they should have procedures for situations that affect more parties than just themselves. Those procedures should be subject to frequent review and independent advice taken on them.
Such procedures should be published and all stakeholders should have their eyes wide open. It's not enough to ban a site, there must be reasons for banning a site and those reasons must be verifiable. That's important not only for the progression of the industry but to stop individuals within the search engines acting in a rogue capacity.
If webmasters don't have an involvement in that then a major control and checking mechanism is missed. It's a myth that webmasters cannot be told what they are doing wrong and offered the chance to correct it, and I dare say that the first engines to awaken to that will be the most successful in the future.
It's a topic where it's easy to cloud the issue with things like "goodness" and the "correctness of the path that led to that power", some like to believe that search engines are the only businesses on earth with responsibility to only one stakeholder (the searchers).
So let me isolate it down to some very simple questions: If I have a big red button on my desk that, when I push it, destroys several businesses and seriously damages many more, then regardless of how that button got there, regardless of whether I'm a good person or Dr Evil, if I choose to press that button do I have an ethical duty to have sound reasons and to explain them when asked? I.e. does that big red button place responsibility on me simply because it is on my desk? If I spill my coffee on it and it sets it off, is it enough of an excuse that I'm a good person and it wasn't intentional?
I note you've been following the SearchKing case closely. What are your views on the judges ruling?
I think that judges face problems with each and every case they come to, in that they must try to understand the issues involved. When we move in to the area of technology they are faced with a myriad of concepts and difficulties and the tendency is to relate that to things that they can personally understand in the real world.
This is no bad thing, and is what allows us all to understand computers but it allows us to understand only at a very superficial level. Whilst it might serve us well if we're learning to use Word, if we're making a judgement that affects the ability of each and every webmaster or of each and every search engine to do particular things in the future then that somehow seems inadequate.
Maybe in technological cases it's even more important for the parties involved to be extremely careful to explain things well, but I have to confess to being a little disappointed in the judge's understanding of PageRank. For example, take the statement "Page 3 - The PageRank is derived from a combination of factors that include text-matching and the number of links from other web pages that point to the PageRanked web site. The higher the PageRank, the more closely the web site in question supposedly matches the search query, and vice versa.
The highest possible PageRank is 10, and the lowest is 1.". I might be tempted to offer this up as an example of all of the most common misconceptions of PageRank. PageRank, by definition, cannot include "text-matching", because to do so would mean calculating PageRank for every possible query that could be typed into Google (there are an infinite number of those, so it would take till infinity). And PageRank is not measured on a scale of 1 to 10; for a start, that 1-to-10 scale is what Google's toolbar shows, not the actual PageRank from which it is derived, and secondly even the toolbar scale starts at 0.
I don't claim to be as sure of the outcome as many seem to be on one side or another, but from an observer's viewpoint I would like to see decisions and rulings based on actual understanding of the technology and issues.
If you could build the ultimate search engine, what would it look like? How would it operate?
Google, fair, not for profit, with input from all the stakeholders, without the bits I don't like (The Tour, Google Answers, News, We're Feeling Cocky - just a plain old search engine), and ranks all my sites at number one :-)
Oh, you want me to be imaginative and future looking? Very well, but it's a lot like several before. More user based and less general. You'd turn up at the home page and it would know who you were (you're already cookied so why not?). It knows what you like and what searches you do at particular times of day, times of the year and on particular days (all the information's in their logs anyway, so why not use it).
You'd get the page design/layout you'd chosen from a selection of thousands, and down the list of searches you likely want to do would be "mother's day card". It would recognise that not everyone's from the US, so that when you clicked it would tend to give you mother's day card sites for your geographical location. It would try to get more information from you to turn a general query into a more specific one, like whether you want to actually buy a mother's day card or be cheap and send a cheesy email one.
Here's where I differ from most other viewpoints of the ultimate search engine. I don't want it to try and guess what I mean. If I type in the word "jaguar" it shouldn't try to guess whether I mean the car or the animal and then just return that set. That just reminds me of a little paper clip popping up in a certain application and saying "it looks like you're writing a letter" or that same application constantly changing my spelling. The ultimate search engine would get a collection of results for both the animal and the car and provide me with a mixture of both right there on the front page, it should then provide easy links to view either one or the other.
By every search result would be buttons which would let you say you really liked that page or you really hated it. The search engine would use this information to help raise or lower that site and sites most like it in the future for any searches you do.
Because this is specific to you it would begin to learn what design elements you like, what level of textual content you like and so on. This data may be cross-referenced from everybody to provide a general starting point for every user on to which the user specific preference data could be mapped. In terms of communication: It's a search engine, so I figure that a proper searchable database of frequently asked questions, how to documents, rules and instructions isn't completely out of the question. And beyond that there should be some kind of live support with a real, instant response. I'd suspect that this would have to be paid for, but that would seem fair.
Thanks Chris. And best of luck with the new forums. Over the next few weeks we'll be talking to Fast and Looksmart.