Here are some ways to use Google ([1]) and Alexa ([2]) to check articles and other information.
Types of Google tests
On Wikipedia, a Google test is any use of Google or other search engines as references. Several very distinct kinds of information can be gleaned by this method. It should be stressed that none of these applications yields conclusive evidence; each is simply a first-pass heuristic or rule of thumb.
- Unencyclopedic or spurious topics. Some topics introduced to Wikipedia articles don't belong here. Some of these can be detected by running a Google search on a relevant phrase and counting the number of search results. This technique works reasonably well for weeding out hoaxes, fictions, and personal theories and hypotheses. It can also be used to ascertain whether a topic is of sufficiently broad interest to merit inclusion in the wiki, though this application is highly subject to bias (see below). See Wikipedia:What Wikipedia is not for a more comprehensive list of unencyclopedic topics.
- Copyrighted material. Large pieces of poorly wikified text, submitted to the wiki all at once, particularly by a new or anonymous user, are often copied and pasted from outside sources. Some of these are submitted in violation of copyright. (See also Wikipedia:Spotting possible copyright violations, Wikipedia:Copyrights.) A copy-and-paste operation from an online source can often be detected by running searches for excerpts.
- Idiosyncratic usage. The English language often has multiple terms for a single concept, particularly given regional dialects. A series of searches for different forms of a name reveals some approximation of their relative popularity.
- Related sites. If an article is of high quality (see Wikipedia:Featured articles), Google may be used to look for sites that might take an interest in it and be convinced to link to it.
- Research. Of course, search engines are good for finding sources of further information.
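The copyright check in the list above comes down to searching for exact excerpts of the suspect text. The following is a minimal sketch of one way to prepare such searches. The helper name `excerpt_queries`, the excerpt length of eight words, and the limit of three excerpts are all illustrative assumptions, not an established procedure.

```python
def excerpt_queries(text, words_per_excerpt=8, max_queries=3):
    """Return up to max_queries quoted excerpts sampled evenly from text.

    Hypothetical helper: each excerpt is wrapped in double quotes so that
    a search engine treats it as an exact phrase.
    """
    words = text.split()
    if not words:
        return []
    if len(words) <= words_per_excerpt:
        # The text is too short to sample; search for all of it.
        return ['"' + " ".join(words) + '"']
    # Spread the starting positions evenly from the beginning to the end.
    last_start = len(words) - words_per_excerpt
    step = max(1, last_start // max(1, max_queries - 1))
    starts = list(range(0, last_start + 1, step))[:max_queries]
    return ['"' + " ".join(words[s:s + words_per_excerpt]) + '"' for s in starts]
```

Each returned string can be pasted into a search box as it stands. A hit on an exact eight-word run suggests copying far more strongly than a hit on isolated keywords, though the match may also be a mirror of Wikipedia itself.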
Techniques
The Google Web search is not the only Google search. In performing a Google test, consider searching Groups (USENET newsgroups). This is a significantly different sample and represents, for the most part, conversations in English conducted by people who are not deliberately trying to sell products or reach a mass audience. Other things being equal, a Groups search will typically return very roughly one-fifth as many hits as a Web search. Because Groups and Web searches have very different "systemic biases," raw hit numbers are not directly comparable. Nevertheless, Groups searches are particularly helpful in identifying entities whose Web presence may have been artificially inflated by promotional techniques; it is suspicious if a phrase gets, say, 100,000 Web hits but only 20 Groups hits.
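The Web-versus-Groups comparison can be written as a rough check. The sketch below assumes the hit counts have already been read off the search result pages by hand. The function name `looks_inflated` and the `tolerance` factor of 100 are illustrative assumptions; only the rough one-fifth ratio and the 100,000 versus 20 example come from the text above.

```python
def looks_inflated(web_hits, groups_hits, expected_ratio=0.2, tolerance=100):
    """Return True if Groups hits fall far below the rough expectation.

    expected_ratio is the very rough "one-fifth" rule of thumb.
    tolerance (an assumed value) is how many times smaller than expected
    the observed ratio must be before the result is called suspicious.
    """
    if web_hits <= 0:
        # Nothing to compare against.
        return False
    observed_ratio = groups_hits / web_hits
    return observed_ratio < expected_ratio / tolerance


# The example from the text: 100,000 Web hits but only 20 Groups hits.
print(looks_inflated(100_000, 20))      # True
# A phrase close to the rough one-fifth expectation.
print(looks_inflated(100_000, 20_000))  # False
```

A `True` result is only a prompt for a closer look, since the two samples have different biases and some legitimate topics are simply not discussed on USENET.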
USENET postings are date-stamped and have been archived for over twenty years, making them more useful than Web searches as a record of recent history. Using a Groups "advanced search", it is possible to restrict a search by date, which can help in identifying how recent the widespread use of a term is.
Google News searches can assess whether something is currently newsworthy. Whereas it is easy and inexpensive to create websites or post to USENET, it is harder to convince a news source indexed by Google News to run a story. Thus Google News, in comparison to Web or Groups, is less susceptible to manipulation by self-promoters. Note that Google News indexes many "news" sources that reflect specific points of view, and many news sources that are only of local interest.
Depending on the subject, advanced search functions may be useful. For example, adding "site:gov" or "site:edu" will restrict your search to U.S. government sites or U.S. college and university sites.
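As an illustration of the "site:" restriction, the sketch below builds an exact-phrase query and a matching search URL. The helper names are hypothetical, and the base address `https://www.google.com/search` with a `q` parameter is assumed here for illustration; typing the query string into the search box by hand works just as well.

```python
from urllib.parse import urlencode


def restricted_query(phrase, site=None):
    """Build an exact-phrase query, optionally limited with site:."""
    query = f'"{phrase}"'
    if site:
        query += f" site:{site}"
    return query


def search_url(phrase, site=None, base="https://www.google.com/search"):
    """Return a search URL for the query (the base address is an assumption)."""
    return base + "?" + urlencode({"q": restricted_query(phrase, site)})


print(restricted_query("ontology", "edu"))  # "ontology" site:edu
print(search_url("ontology", "edu"))
```

Running the same phrase with and without the restriction shows how much of a term's presence comes from government or academic sites rather than from the Web at large.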
Alexa test
Although Wikipedia is not a web directory, we can have articles about web sites if they meet the same criteria for encyclopedic interest as other articles.
If you're interested in writing a Wikipedia article about a particular web site, go to Alexa (http://www.alexa.com) and type in the URL. The traffic rank may help you decide whether a site is important enough. Most would agree that we should certainly have articles on top 100 sites and possibly on top 1,000 sites. For a page not in the top 100,000, most would agree that popularity alone would not suffice to justify its inclusion in Wikipedia. The intermediate area is a grey area where opinions differ.
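The rank thresholds above can be summarized as a small lookup. This is only a sketch of the rule of thumb as stated. The function name `alexa_guidance` and the wording of the returned strings are illustrative, and the rank is assumed to have been looked up by hand.

```python
def alexa_guidance(rank):
    """Map an Alexa traffic rank to the rough guidance given in the text.

    rank is a positive integer (1 is the most visited site), or None if
    the site has no rank at all.
    """
    if rank is not None and rank < 1:
        raise ValueError("rank must be a positive integer or None")
    if rank is None or rank > 100_000:
        return "popularity alone does not justify an article"
    if rank <= 100:
        return "certainly merits an article"
    if rank <= 1_000:
        return "possibly merits an article"
    return "grey area: opinions differ"


print(alexa_guidance(42))       # certainly merits an article
print(alexa_guidance(50_000))   # grey area: opinions differ
```

A site outside the top 100,000 may still merit an article on other grounds; the lookup speaks only to popularity.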
For some websites (e.g., microsoft.com) in the top thousand, a redirect to a broader article may be appropriate: in that case, Microsoft. (This is somewhat controversial.)
Also note that the Alexa rating includes significant bias, due to various factors. For example, the Alexa software is only available for Microsoft Windows, and requires installation.
See also Wikipedia:Web comics for some specific advice related to web comics.
Google bias
When using Google to test for importance or existence, bear in mind that this will be biased in favor of modern subjects of interest to people from developed countries with Internet access, so it should be used with some judgment. For example, a current popular-music group from the United States will probably need many thousands of Google hits before most Wikipedians consider it worthy of inclusion. A similarly important group in a country with less Internet presence will have many fewer hits, if any. An important musician of the 1300s might not show up on Google at all.
Q. What is the minimum number of matches you should see if a term is not made up? (3? 27? 81?)
A. A couple hundred perhaps! It depends on several things:
- The article's point of view: If narrow, fewer references are required. Try to categorize the point of view (whether it is NPOV or other); for example, notice the difference between Ontology (philosophy) and Ontology (computer science).
- The subject: If it's about some historical person, one or two mentions in reliable texts might be enough; if it's some Internet neologism, it may be on 100 pages and might still not be considered 'existing' for Wikipedia's purposes.
- The type of sites you find: Pay attention to how open the sites are about accepting submissions. The Urban Dictionary, for example, accepts submissions freely. This is especially important if you suspect an author is self-promoting, or is promoting an idiosyncratic viewpoint. A single Internet user can submit the same ideas to message boards and open-submission sites all over the Internet.
Further judgment: the Google test checks popular usage, not correctness. For example, a search for the incorrect Charles Windsor gives 10 times more results than the correct Charles Mountbatten-Windsor.
Also, some topics may not be on the Web because of low Internet use in certain areas and cultures of the world.
Validity of the Google test
Because the results of a Google test are interpreted subjectively, the test is not always applied consistently. This reflects the fact that it is used on a case-by-case basis.
In some cases, articles have been kept with Google hit counts as low as 15, and some claim that this undermines the validity of the Google test in its entirety. In fact, however, this reflects on the rather uneven and subjective nature of the Wikipedia:Votes for deletion process more than on the usefulness of the Google test. The Google test has always been, and very likely always will remain, an imperfect tool used to produce a general gauge of notability. It is not and should never be considered definitive.
Major factors that may lower a Google hit count include subjects from countries where the Internet is not prevalent and topics of a historical nature that have not yet been well documented on the Internet. In other cases, it is a matter of speculation why one subject merits inclusion with a hit count below 100 while other such articles are frequently deleted.