Google Search Technology, Illustrated

2010-11-14  Source: original to this site  Category: Tech  Views: 91

Original http://server.51cto.com/NGDC-221158.htm

Google is a highly successful yet secretive Internet search giant with a somewhat idealistic streak. It is also a formidable advertising company: the search button on Google's home page is the killer application behind 20 billion U.S. dollars of annual profit, and one of the Internet's leading business and technology legends. Recently a foreign website, PPCblog, drew a detailed flow chart of a Google search. The chart shows what goes on behind the Google Search button, which handles 3 million searches every day and processes each one in less than a second.

What happens in the less than one second between clicking the Google Search button and seeing the results? How does Google find content on the Internet, and what kind of content gets indexed? Surely everyone wants to know the secrets behind the search button. Before we start, though, let's first take a look at Google's mysterious data centers.

Design of Google's own servers

Google's data centers are highly confidential, and the information available is very limited. Here are some figures: Google has more than 19 data centers in the United States and another 17 elsewhere around the world. Each data center covers 500,000 square feet (46,450 m²) and costs about 6 billion U.S. dollars to build. Google's data centers are among the most efficient facilities in the world and are very green. A data center draws 50 to 100 megawatts of electricity and, because of cooling needs, is usually built where water is easily available. The servers are housed in standard shipping containers, each holding 1,160 servers. That is about all we know about Google's data centers.

Figure 1: A server of Google's own design

Figure 2: The server's built-in battery

Google's hundreds of thousands of servers are all of its own design, which the company regards as one of its core technologies (51CTO recommended article: "Google to make servers? Intel should be careful"). Each server carries its own 12-volt battery to keep it running if the main power supply fails.

Why give every server its own battery? Google's answer is cost. Data centers generally rely on a central UPS (uninterruptible power supply), essentially a giant battery that supplies power temporarily when the mains fail, until the main generators have had time to start. Google considers batteries built directly into the servers cheaper, and the cost scales directly with the number of servers, so no extra capacity is wasted. Another reason is efficiency: a large UPS is only 92-95% efficient, which means a lot of power is wasted, whereas Google's built-in batteries are more than 99.9% efficient.
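To make the efficiency figures concrete, here is a small arithmetic sketch. The 1 MW server load is a hypothetical number chosen for illustration; only the efficiency percentages come from the text above, and the model (losses equal input power minus delivered power) is a simplification.

```python
def wasted_watts(load_watts: float, efficiency: float) -> float:
    """Power lost in the backup stage while delivering load_watts to the servers."""
    input_watts = load_watts / efficiency  # power that must be drawn upstream
    return input_watts - load_watts


LOAD_WATTS = 1_000_000  # hypothetical: 1 MW of server load

for label, efficiency in [
    ("central UPS at 92%", 0.92),
    ("central UPS at 95%", 0.95),
    ("per-server battery at 99.9%", 0.999),
]:
    lost_kw = wasted_watts(LOAD_WATTS, efficiency) / 1000
    print(f"{label}: about {lost_kw:.1f} kW lost")
```

Under these assumptions the central UPS loses tens of kilowatts per megawatt of load, while the per-server battery loses roughly one kilowatt.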

Figure 3: Google's servers are installed in containers, each holding 1,160 units

Figure 4: A Google employee at work

How does Google find and index the content you upload?

Figure 5: Before the user searches

Google's "crawlers" travel to every corner of the Internet around the clock. The figure above depicts six steps, from content appearing on the Internet, to its inclusion in Google's database, to its becoming searchable by users. Steps 2, 3 and 5 each have several sub-branches. All six steps serve to build an information "pool"; this is the first stage. The second stage filters this pool for the content users need. Let's look step by step at how Google collects and integrates information.

1, A user uploads content: a blog post, a microblog update, or some other type of web content appears or is updated on the web.

2, Google's "crawlers" discover the update. At this step Google applies a number of criteria, including the following:

2.1, Google's crawlers travel the Internet by following links (URLs). If no URL points to a site, that site will not be indexed.

2.2, If your robots.txt disallows indexing of some or all of your site, Google's crawlers will not crawl the corresponding content.

2.3, If a link pointing to your site carries a nofollow tag, Google's crawlers will not follow that URL to your site, as shown below:

Figures 6 and 7: The nofollow tag in the page source code

URLs are like signposts for Google's crawlers as they travel the Internet. Google naturally wants to reach your valuable pages, so it needs some mechanism for identifying which URLs are spam, and the nofollow tag is one of the methods Google advocates. A site's legitimate editors will almost never post junk URLs, but such URLs turn up in large numbers in comment threads and forums. URLs like the ones in the example above are meaningless to Google, so to keep crawlers from reaching a site through them, a nofollow tag is added to them automatically in the source code.

2.4, Google also finds your site through blog software or XML sitemaps.

2.5, The more authoritative the sites that link to your site, the higher your own site's authority. Google's crawlers, however, always ignore URLs marked with the nofollow tag.
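Rules 2.2 and 2.3 can be sketched with Python's standard library. This is only an illustration of the two checks, not Google's crawler: the robots.txt content, the page HTML, the `ExampleBot` agent name and the `crawlable_links` helper are all hypothetical, and the sketch assumes every link stays on the same site so that one robots.txt applies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com (rule 2.2).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page: one ordinary link, one link into a disallowed
# directory, and one nofollow link left in a comment (rule 2.3).
PAGE_URL = "http://example.com/blog/post-1"
PAGE_HTML = """
<p>See the <a href="/blog/post-2">next post</a> and my <a href="/private/draft">draft</a>.</p>
<div class="comments">
  <a rel="nofollow" href="http://example.com/spam">great post!!!</a>
</div>
"""


class LinkCollector(HTMLParser):
    """Collects (href, is_nofollow) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href:
            rel_values = (attr.get("rel") or "").lower().split()
            self.links.append((href, "nofollow" in rel_values))


def crawlable_links(page_url, html, robots_txt, agent="ExampleBot"):
    """Return the absolute URLs a polite crawler would follow from this page."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    collector = LinkCollector()
    collector.feed(html)

    followable = []
    for href, is_nofollow in collector.links:
        url = urljoin(page_url, href)
        if is_nofollow:
            continue  # rule 2.3: do not follow nofollow links
        if not robots.can_fetch(agent, url):
            continue  # rule 2.2: robots.txt disallows this path
        followable.append(url)
    return followable


print(crawlable_links(PAGE_URL, PAGE_HTML, ROBOTS_TXT))
```

Of the three links on the sample page, only the ordinary link to the next post survives both checks.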

The points above are roughly Google's "admission" requirements for the content it collects. Evidently, the trick of posting large numbers of URLs in open areas (such as forums) to attract Google's attention has no effect. That covers what happens before Google collects information. What happens once Google has collected it? Read on:

Figure 8: Information "processing" and storage

3, Once collected, the information has to be processed. This mainly involves two steps: first, "processing" and storing the information; second, optimizing the information that has been included. As the figure shows, processing and storage consists of two parts. Page titles and link data are stored in one index, used for breadth-first searches (so titles matter a great deal, and whoever edits them should keep this in mind). Page content is stored in another index, used for the less frequent long-tail, personalized, depth-first searches.
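The two-index idea can be illustrated with a toy inverted index. This is a minimal sketch under stated assumptions, not Google's storage design: the sample pages are invented, and the rule "try the title index first, then fall back to the content index" is an assumption made here to show why a separate title index might be useful.

```python
from collections import defaultdict

# Hypothetical crawled pages.
PAGES = {
    "http://example.com/a": {
        "title": "Google data centers",
        "body": "servers sit in shipping containers with batteries",
    },
    "http://example.com/b": {
        "title": "Search basics",
        "body": "google crawlers follow links between servers",
    },
}


def build_index(field):
    """Map each word in the given field to the set of URLs containing it."""
    index = defaultdict(set)
    for url, page in PAGES.items():
        for word in page[field].lower().split():
            index[word].add(url)
    return index


title_index = build_index("title")  # small index, consulted first
body_index = build_index("body")    # larger index for long-tail queries


def search(word):
    """Look the word up in the title index, falling back to the body index."""
    word = word.lower()
    hits = title_index.get(word)
    if hits:
        return sorted(hits), "title index"
    return sorted(body_index.get(word, set())), "body index"


print(search("google"))
print(search("batteries"))
```

A common word such as a page title term is answered from the small title index, while a rarer term only found in page text needs the second lookup.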

By now you may have realized that when you search with Google you are not searching the live Internet but Google's cache. Google updates that cache very quickly, however, and keeps it as closely synchronized with the content on the Internet as it can.

Figure 9: Optimizing the information that has been included

4, Google assesses the overall authority of domains and web pages, based on URLs.

5, Google checks websites to prevent cheating, including the following:

5.1, Google's own search quality and anti-spam reviews.

5.2, More than 10,000 remote testers evaluate the quality of search results.

5.3, Google solicits reports of suspected spam from users.

5.4, Google removes pirated content under the Digital Millennium Copyright Act (DMCA).

6, After a page has been analyzed, many pieces of data are attached to it to assist users' searches.
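Step 4 and rule 2.5 both describe authority flowing along links. One publicly described way to compute such a score is the PageRank iteration; the sketch below is a simplified version of that published idea on a hypothetical four-page link graph. It is not a description of how Google actually scores pages today, and the damping factor and iteration count are conventional illustrative values.

```python
def link_authority(links, damping=0.85, iterations=50):
    """Simplified PageRank-style score; links maps page -> list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    score = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of inbound links.
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score evenly among its outbound links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                for other in pages:
                    new_score[other] += damping * score[page] / n
        score = new_score
    return score


# Hypothetical link graph. Nofollow links are assumed to have been dropped
# before this stage, so they contribute nothing.
LINKS = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "forum": ["hub"],  # nothing links to "forum"
}

scores = link_authority(LINKS)
for page, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.3f}")
```

In this toy graph the page that everything links to ends up with the highest score, and the page with no inbound links keeps only the baseline.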

From information appearing on the Internet, to Google including it, to Google analyzing and optimizing the data: in this way a continuously updated "pool" of Internet information is built up. You could say that Google stores a snapshot of the entire Internet. That is what happens before we click the Google Search button. Next we look at how Google responds to a user's search request, and at how Google's ads reach us along the way. Don't forget that Google makes its living from advertising.

As long as people use Google's services, Google makes money. What it fears is what has happened with the Android phone system: some rogue manufacturers ship Android on their own smartphones but strip out all of Google's services and substitute their own, cutting Google out. Naturally, every Android update makes these rogue manufacturers nervous.

How does Google help users search?

Figure 10: From the user's query to the preliminary results

From the user's query to the generation of preliminary results (which are not yet presented to the user directly), there are four steps:

1, The user submits a search request. Patrick Riley, a Google search quality engineer, said that for most searches your query passes through several concurrent control or experimental processes run by Google's project teams; you could say that every query takes part in some of Google's experiments. So are we the lab mice?

2, Google offers suggestions based on the keywords the user has entered.

3, Google uses synonyms to match your search terms with results for semantically similar queries.

4, The initial query results are generated. Google may claim to have found tens of thousands of relevant results, but it generally shows fewer than 1,000. The results are also localized, so local sites appear first.
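Steps 3 and 4 above can be sketched as a tiny query pipeline: expand the query with synonyms, match documents, put local sites first, and cap the list. Everything here is hypothetical (the synonym table, the documents, the region codes and the function names); only the cap of 1,000 results and the local-first ordering come from the text.

```python
# Hypothetical synonym table; a real system would learn these from data.
SYNONYMS = {"car": {"automobile"}, "cheap": {"inexpensive"}}
MAX_SHOWN = 1000  # the article says fewer than 1,000 results are generally shown


def expand(terms):
    """Return the query terms plus any known synonyms (step 3)."""
    expanded = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)
        expanded |= SYNONYMS.get(term, set())
    return expanded


def preliminary_results(query, documents, user_region, limit=MAX_SHOWN):
    """Match documents, list local ones first, and cap the result list (step 4)."""
    terms = expand(query.split())
    matches = [doc for doc in documents if terms & set(doc["text"].lower().split())]
    # Stable sort: local documents (key False) come before the rest (key True).
    matches.sort(key=lambda doc: doc["region"] != user_region)
    return [doc["url"] for doc in matches[:limit]]


DOCS = [
    {"url": "http://example.com/us-autos", "region": "US", "text": "used automobile listings"},
    {"url": "http://example.cn/cars", "region": "CN", "text": "cheap car rental"},
    {"url": "http://example.com/bikes", "region": "US", "text": "bicycle repair"},
]

print(preliminary_results("car", DOCS, user_region="CN"))
```

The query "car" also matches the page that only says "automobile", and the page from the user's own region is listed ahead of it.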
