Oct 2, 2009
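Google PageRank (PR value): letting links "vote"
Author: humenad&ithost
A page's "vote count" is determined by the importance of all the pages that link to it; a hyperlink to a page counts as one vote for that page. A page's PageRank is computed by a recursive algorithm from the importance of all the pages that link to it (its "inbound pages"). A page with more inbound links gets a higher rank; conversely, a page with no inbound links has no rank.
In early 2005, Google introduced a new link attribute, nofollow, that lets webmasters and bloggers create links that Google does not count, meaning those links are not treated as "votes". The nofollow setting helps combat comment spam.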
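As a rough illustration of how a link counter might treat nofollow, the sketch below separates the links on a page into counted and ignored ones. It uses only Python's standard `html.parser` module; the class name `VoteLinkParser`, the helper `split_links`, and the sample URLs are hypothetical, and nothing here describes how Google's crawler actually works.

```python
from html.parser import HTMLParser


class VoteLinkParser(HTMLParser):
    """Collect hrefs, separating counted links from nofollow ones."""

    def __init__(self):
        super().__init__()
        self.counted = []  # links that would count as a "vote"
        self.ignored = []  # links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow external"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.ignored.append(href)
        else:
            self.counted.append(href)


def split_links(html):
    """Return (counted, ignored) lists of hrefs found in the HTML string."""
    parser = VoteLinkParser()
    parser.feed(html)
    parser.close()
    return parser.counted, parser.ignored


sample = (
    '<p><a href="http://example.com/a">normal link</a> '
    '<a href="http://example.com/spam" rel="nofollow">comment link</a></p>'
)
counted, ignored = split_links(sample)
print(counted)  # ['http://example.com/a']
print(ignored)  # ['http://example.com/spam']
```

A blog engine that adds rel="nofollow" to every link in its comment section would therefore hand out no votes from comments, which is what removes the incentive for comment spam.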
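The PageRank indicator on the Google Toolbar ranges from 0 to 10. It appears to use a logarithmic scale, but the details are unknown. PageRank is a Google trademark, and a patent has been filed for the technology.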
The related HITS link-analysis algorithm was proposed by Jon Kleinberg.
The PageRank algorithm
Suppose a small group of four pages: A, B, C, and D. If all pages link to A, then A's PR (PageRank) value is the sum of the PR values of B, C, and D.
PR(A) = PR(B) + PR(C) + PR(D)
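Now suppose that B also links to C, and that D links to three pages including A. A page cannot vote twice, so B gives half a vote to each page it links to. By the same logic, only one third of D's vote counts toward A's PageRank.
In other words, a page's PR value is divided evenly among its outbound links.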
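Written out for the four-page example (B has two outbound links, D has three, and C is assumed to link only to A, as in the initial setup), the vote splitting looks roughly like this. L(X) here stands for the number of outbound links on page X:

```latex
PR(A) = \frac{PR(B)}{2} + \frac{PR(C)}{1} + \frac{PR(D)}{3}
      = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)}
```

For instance, if B, C, and D each held a PR of 0.25, A would receive 0.125 + 0.25 + 0.0833, or about 0.458.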
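Finally, all of this is converted to a percentage and multiplied by a coefficient 1 − q, where q is the probability that the surfer jumps to a random page instead of following a link. Because of the algorithm below, no page's PageRank will be 0: mathematically, Google gives every page a minimum value of q.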
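So a page's PageRank is computed from the PageRank of other pages. Google recalculates every page's PageRank over and over. If you give every page a random nonzero PageRank value, then after repeated recalculation the PR values settle toward stable values. This is why search engines use it.
This equation introduces the idea of random surfing: someone who is bored online opens pages at random and clicks some links. A page's PageRank value also influences the probability that it is visited at random. For ease of understanding, assume the surfer keeps clicking links until reaching a page with no outbound links, and then starts browsing again from another page chosen at random.
To be fair to pages that do have outbound links, q = 0.15 (see above for the meaning of q) is applied to every page, as an estimate of the probability that the surfer reaches a page through a bookmark.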
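So the equation is as follows:
$$PR(p_i) = \frac{q}{N} + (1 - q) \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
Here p_1, p_2, …, p_N are the pages under consideration, M(p_i) is the set of pages that link to p_i, L(p_j) is the number of outbound links on p_j, and N is the total number of pages.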
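The repeated recalculation can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: it assumes the update PR(p_i) = q/N + (1 − q) Σ PR(p_j)/L(p_j) with q = 0.15, spreads the rank of a page without outbound links evenly over all pages (the surfer restarting at random), and assumes page A in the four-page example has no outbound links. The function name `pagerank` and its parameters are hypothetical, and every link target must itself be a key of `links`.

```python
def pagerank(links, q=0.15, tol=1e-12, max_iter=1000, start=None):
    """Iterate PR(p_i) = q/N + (1 - q) * sum(PR(p_j) / L(p_j)).

    links maps each page to the list of pages it links to.
    start optionally maps each page to a nonzero initial value.
    """
    pages = sorted(links)
    n = len(pages)
    rank = dict(start) if start else {p: 1.0 / n for p in pages}
    total = sum(rank.values())
    rank = {p: r / total for p, r in rank.items()}  # normalise to sum 1

    for _ in range(max_iter):
        # Rank held by pages with no outbound links is shared by everyone,
        # mirroring the surfer who restarts on a random page.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: q / n + (1 - q) * dangling / n for p in pages}
        for p in pages:
            targets = set(links[p])  # a page cannot vote twice for a target
            for t in targets:
                new_rank[t] += (1 - q) * rank[p] / len(targets)
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# The four-page example: B links to A and C, C to A, D to A, B and C.
graph = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
ranks = pagerank(graph)
other = pagerank(graph, start={"A": 5, "B": 1, "C": 3, "D": 7})
for page in sorted(ranks):
    print(page, round(ranks[page], 4), round(other[page], 4))
```

Both runs print the same values for each page, which illustrates the point that the starting values do not matter as long as the calculation is repeated until it stabilises.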
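The PageRank values are the entries of an eigenvector of a special matrix. This eigenvector is
$$R = \begin{bmatrix} PR(p_1) \\ PR(p_2) \\ \vdots \\ PR(p_N) \end{bmatrix}$$
R is the solution of the equation
$$R = \begin{bmatrix} q/N \\ q/N \\ \vdots \\ q/N \end{bmatrix} + (1 - q) \begin{bmatrix} \ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_N) \\ \ell(p_2,p_1) & \ddots & & \vdots \\ \vdots & & \ell(p_i,p_j) & \\ \ell(p_N,p_1) & \cdots & & \ell(p_N,p_N) \end{bmatrix} R$$
where \ell(p_i, p_j) equals 0 if p_j does not link to p_i, and for every j,
$$\sum_{i=1}^{N} \ell(p_i, p_j) = 1$$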
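The matrix view can be checked numerically with plain Python lists. The sketch below builds a link matrix whose entry (i, j) is 1/L(p_j) when p_j links to p_i and 0 otherwise, then repeatedly applies R → q/N + (1 − q)·ℓ·R. Several things here are assumptions for illustration: the helper names `link_matrix`, `apply_equation`, and `solve`, the fixed iteration count, the choice q = 0.15, and the uniform column given to a page with no outbound links so that every column still sums to 1.

```python
def link_matrix(links, pages):
    """Return ell where ell[i][j] = 1/L(p_j) if p_j links to p_i, else 0.

    A page with no outbound links gets a uniform column (1/N), an
    assumption that keeps every column summing to 1.
    """
    n = len(pages)
    index = {p: i for i, p in enumerate(pages)}
    matrix = [[0.0] * n for _ in range(n)]
    for j, page in enumerate(pages):
        targets = set(links[page])
        if not targets:
            for i in range(n):
                matrix[i][j] = 1.0 / n
        else:
            for t in targets:
                matrix[index[t]][j] = 1.0 / len(targets)
    return matrix


def apply_equation(matrix, r, q=0.15):
    """One application of R -> q/N + (1 - q) * ell * R."""
    n = len(r)
    return [
        q / n + (1 - q) * sum(matrix[i][j] * r[j] for j in range(n))
        for i in range(n)
    ]


def solve(matrix, q=0.15, iterations=200):
    """Start from a uniform vector and apply the equation repeatedly."""
    n = len(matrix)
    r = [1.0 / n] * n
    for _ in range(iterations):
        r = apply_equation(matrix, r, q)
    return r


pages = ["A", "B", "C", "D"]
links = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
ell = link_matrix(links, pages)
R = solve(ell)
for page, value in zip(pages, R):
    print(page, round(value, 4))
```

Because the entries of R sum to 1, the fixed point of this update is also an eigenvector, with eigenvalue 1, of the matrix whose entries are q/N + (1 − q)·ℓ(p_i, p_j), which is one way to read the "special matrix" mentioned above.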
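The main drawback of this technique is that older pages rank higher than new ones, because even a very good new page will not have many inbound links unless it is a subsite of an existing site.
This is why PageRank has to be combined with several other algorithms. PageRank seems to favor Wikipedia pages, which appear ahead of most or all other pages in search results for an article's title. The main reasons are that Wikipedia pages link to one another heavily and that many sites link to them.
Google often penalizes attempts to inflate PageRank maliciously; how it distinguishes normal link exchange from abnormal link stacking remains a trade secret.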
Technology Overview
We stand alone in our focus on developing the “perfect search engine,” defined by co-founder Larry Page as something that, “understands exactly what you mean and gives you back exactly what you want.” To that end, we have persistently pursued innovation and refused to accept the limitations of existing models. As a result, we developed our serving infrastructure and breakthrough PageRank™ technology that changed the way searches are conducted.
From the beginning, our developers recognized that providing the fastest, most accurate results required a new kind of server setup. Whereas most search engines ran off a handful of large servers that often slowed under peak loads, ours employed linked PCs to quickly find each query’s answer. The innovation paid off in faster response times, greater scalability and lower costs. It’s an idea that others have since copied, while we have continued to refine our back-end technology to make it even more efficient.
The software behind our search technology conducts a series of simultaneous calculations requiring only a fraction of a second. Traditional search engines rely heavily on how often a word appears on a web page. We use more than 200 signals, including our patented PageRank™ algorithm, to examine the entire link structure of the web and determine which pages are most important. We then conduct hypertext-matching analysis to determine which pages are relevant to the specific search being conducted. By combining overall importance and query-specific relevance, we’re able to put the most relevant and reliable results first.
- PageRank Technology: PageRank reflects our view of the importance of web pages by considering more than 500 million variables and 2 billion terms. Pages that we believe are important pages receive a higher PageRank and are more likely to appear at the top of the search results.
PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. We have always taken a pragmatic approach to help improve search quality and create useful products, and our technology uses the collective intelligence of the web to determine a page’s importance.
- Hypertext-Matching Analysis: Our search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), our technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. We also analyze the content of neighboring web pages to ensure the results returned are the most relevant to a user’s query.
Our innovations don’t stop at the desktop. To give people access to the information they need, whenever and wherever they need it, we continue to develop new mobile applications and services that are more accessible and customizable. And we’re partnering with industry-leading carriers and device manufacturers to deliver these innovative services globally. We’re working with many of these industry leaders through the Open Handset Alliance to develop Android, the first complete, open, and free mobile platform, which will offer people a less expensive and better mobile experience.
Life of a Google Query
The life span of a Google query normally lasts less than half a second, yet involves a number of different steps that must be completed before results can be delivered to a person seeking information.
Tags: SEO