GrapefruitSculpin (talk | contribs) |
Jennydoscta123 (talk | contribs) m Added a missing wiki link. Tag: Visual edit |
Revision as of 06:27, 29 May 2018
Original author(s) | |
---|---|
Type | Web crawler |
Website | Googlebot FAQ |
Googlebot is the search bot software used by Google, which collects documents from the web to build a searchable index for the Google Search engine.
If a webmaster wishes to restrict the information on their site available to Googlebot, or another well-behaved spider, they can do so with the appropriate directives in a robots.txt file,[1][2] or by adding the meta tag <meta name="Googlebot" content="nofollow" /> to the web page.[1] Googlebot requests to Web servers are identifiable by a user-agent string containing "Googlebot" and a host address containing "googlebot.com".[3]
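As a minimal sketch of the two mechanisms described above, the following Python uses the standard library's `urllib.robotparser` to evaluate a robots.txt file and applies the user-agent and host-name checks to a request. The robots.txt rules, the `example.com` URLs, and the helper names `build_parser` and `looks_like_googlebot` are hypothetical and chosen only for illustration; the host check here requires the name to end in ".googlebot.com", which is a stricter assumption than "containing".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/ and
# leave all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def looks_like_googlebot(user_agent: str, hostname: str) -> bool:
    """Apply both checks from the text: the user-agent string mentions
    "Googlebot" and the resolved host name belongs to googlebot.com."""
    agent_ok = "googlebot" in user_agent.lower()
    host_ok = hostname.lower().rstrip(".").endswith(".googlebot.com")
    return agent_ok and host_ok


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    print(rules.can_fetch("Googlebot", "http://example.com/private/page.html"))
    print(rules.can_fetch("Googlebot", "http://example.com/public/page.html"))
    print(looks_like_googlebot(
        "Mozilla/5.0 (compatible; Googlebot/2.1)",
        "crawl-1-2-3-4.googlebot.com",
    ))
```

A user-agent string alone is easy to forge, which is why the sketch requires both conditions; in practice the host name would come from a reverse DNS lookup of the requesting address.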
Currently, Googlebot follows HREF links and SRC links.[1] There is increasing evidence that Googlebot can also execute JavaScript and parse content generated by Ajax calls.[4][5] There are many theories about how well Googlebot can process JavaScript, with opinions ranging from minimal ability derived from custom interpreters to the view that Googlebot is effectively a full browser.[6][7][8] Googlebot discovers pages by harvesting all the links on every page it finds, and then follows these links to other web pages. To be crawled and indexed, a new web page must either be linked to from other known pages on the web or be manually submitted by the webmaster.
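The link-harvesting step can be illustrated with a short sketch that collects every `href` and `src` attribute from a page and resolves it against the page's URL. This is not Googlebot's actual implementation; the class `LinkHarvester`, the function `harvest_links`, and the sample URLs are hypothetical, and the sketch does not execute JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkHarvester(HTMLParser):
    """Collect the targets of href and src attributes from an HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases attribute names, so HREF and href both match.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def harvest_links(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for every href/src found, in document order."""
    harvester = LinkHarvester(base_url)
    harvester.feed(html)
    harvester.close()
    return harvester.links


if __name__ == "__main__":
    page = (
        '<a HREF="/about">About</a>'
        '<img src="logo.png">'
        '<script src="https://cdn.example.org/app.js"></script>'
    )
    for url in harvest_links(page, "http://example.com/index.html"):
        print(url)
```

A crawler built on this would add each harvested URL to a queue of pages to visit, which is why a page with no inbound links is never reached unless it is submitted directly.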
A problem that webmasters[like whom?] have often noted with Googlebot is that it takes up an enormous amount of bandwidth.[citation needed] This can cause websites to exceed their bandwidth limit and be taken down temporarily, which is especially troublesome for mirror sites that host many gigabytes of data. Google provides "Webmaster Tools", which allows website owners to throttle the crawl rate.[9]
How often Googlebot crawls a site depends on the site's crawl budget, an estimate of how often the website is updated. The crawl budget is determined by how many incoming links the site has and how frequently it is updated.[10][11]
References
- ^ a b c "Webmaster Tools".
- ^ "Having trouble getting your website indexed by Google? - Exsociety". 2016-09-12. Retrieved 2016-09-14.
- ^ Exact Googlebot client info can be found in Google-cached copies of pages which display such data to visitors. For example, see [1]
- ^ "Googlebot makes POST requests via AJAX".
- ^ "Google, the Jig is Up! Googlebot is actually a browser..."
- ^ "Googlebot's Javascript Interpreter: A Diagnostic".
- ^ "Googlebot is Chrome".
- ^ "How Googlebot crawls JavaScript".
- ^ "Google - Webmasters". Google.com. Retrieved 2012-12-15.
- ^ "What is Crawl Budget and How to Optimise it?".
- ^ "What Is Crawl Budget And How It Will Affect Your Website Ranking?". Retrieved 2018-05-29.