List of known web archive services in use on English Wikipedia, sorted roughly by number of uses from most to least. The Wayback Machine accounts for about 80% of the total. Data initially compiled by User:GreenC as of March 2017; updates and corrections are welcome.
Archive services
- Wayback Machine
- Domain: archive.org, waybackmachine.org
- Hostname: <none>, web, wayback, liveweb, www, www.web, classic-web, web-beta, replay, replay.web, web.wayback
- Path: <none>, web
- Timestamp: A single digit, or 4–14 digits, or "*", or "?", or a combination of these. May also carry a trailing modifier such as "re_" (purpose unclear), "if_" for frames, or "im_" for images. If the timestamp is missing, the service returns the best available page.
- Examples:
- Oldest:
- Newest:
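The fields above can be combined into a single pattern. The following is a minimal sketch, not a reference implementation: the names `HOSTS`, `WAYBACK_RE` and `parse_wayback` are illustrative, the sketch assumes the archived target starts with a scheme or "www.", and it does not handle the "?" timestamp or mixed forms.

```python
import re

# Hostnames from the list above; the bare domain (no hostname) is also accepted.
HOSTS = ["web", "wayback", "liveweb", "www", "www.web", "classic-web",
         "web-beta", "replay", "replay.web", "web.wayback"]

WAYBACK_RE = re.compile(
    r"^https?://"
    r"(?:(?P<host>" + "|".join(re.escape(h) for h in HOSTS) + r")\.)?"
    r"(?P<domain>archive\.org|waybackmachine\.org)"
    r"(?:/web)?"                                         # optional "web" path
    r"(?:/(?P<ts>\d{4,14}|\d|\*)(?P<mod>[a-z]{2}_)?)?"   # optional timestamp and modifier
    r"/(?P<target>(?:https?://|www\.).+)$",              # assumption: target has a scheme or "www."
    re.IGNORECASE,
)


def parse_wayback(url):
    """Return the matched fields as a dict, or None if the URL does not match."""
    m = WAYBACK_RE.match(url)
    return m.groupdict() if m else None
```

For a URL with no timestamp, the `ts` and `mod` entries are `None`, which mirrors the "best available page" case described above.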
- Archive.today
- Domain: archive.today, archive.is, archive.fo, archive.li, archive.vn, archive.md, archive.ph
- Hostname: <none>, www
- Path: <none>
- Timestamp: 4–14 digits; or digits + characters (see example)
- Examples:
- WebCite
- Domain: webcitation.org
- Deprecated: no longer accepting new archive requests; old archives are no longer available
- Hostname: <none>, www
- Path: base62ID, query, cache, getfile.php, <number>
- Timestamp: None in the path. The query?url= form carries the date as &date=2012-06-01+21:40:03; the short ID is base62 and decodes to a Unix time
- Examples:
- http://www.webcitation.org/gT64fd
- http://www.webcitation.org/66lmEkpE8?url=http://www.ariacharts.com.au/pages/charts_display_album.asp?chart%3D1G50
- http://www.webcitation.org/query?id=1138911916587475
- http://www.webcitation.org/query?url=http..&date=2012-06-01+21:40:03
- http://www.webcitation.org/1138911916587475
- http://www.webcitation.org/cache/73e53dd1f16cf8c5da298418d2a6e452870cf50e
- http://www.webcitation.org/getfile.php?fileid=1c46e791d68e89e12d0c2532cc3cf629b8bc8c8e
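The base62 short ID mentioned above can be decoded along these lines. This is a sketch under two assumptions that the list does not confirm: the alphabet order (digits, then uppercase, then lowercase) and the unit of the decoded number (microseconds since the Unix epoch, which is consistent with the 16-digit numeric IDs in the examples). The function names are illustrative.

```python
from datetime import datetime, timezone

# ASSUMPTION: digits, then uppercase, then lowercase. A different order
# would shift the decoded time.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_decode(short_id: str) -> int:
    """Decode a base62 string to an integer (raises ValueError on bad characters)."""
    value = 0
    for ch in short_id:
        value = value * 62 + ALPHABET.index(ch)
    return value


def snapshot_time(short_id: str) -> datetime:
    """Interpret the decoded ID as Unix time in microseconds (an assumption)."""
    micros = base62_decode(short_id)
    return datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
```

Under these assumptions, `snapshot_time("66lmEkpE8")` falls in 2012, which agrees with the date range of the other examples in this entry.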
- National Archives UK
- Domain: nationalarchives.gov.uk
- Hostname: webarchive, yourarchives
- Path: <none>
- Timestamp: 4–14 digits
- Examples:
- Australian Web Archive (Trove)
- Domain: nla.gov.au
- Hostname: webarchive
- Path: see examples. First-level path can contain "awa" or "wayback". Second-level path might contain a pandora.nla.gov.au URL with a third-level destination URL which may or may not have a scheme (http://). Or the second-level path might be the final destination URL. A special archive index page URL is also available as "/tep/"
- Timestamp: Two types (20120727-0512, 20120326012340)
- Examples:
- https://webarchive.nla.gov.au/awa/20120726200849/http://pandora.nla.gov.au/pan/14231/20120727-0512/www.howlspace.com.au/en2/inxs/inxs.htm
- https://webarchive.nla.gov.au/awa/20110824211656/http://pandora.nla.gov.au/pan/128344/20110810-1451/www.theaureview.com/guide/festivals/bam-festival-2010-ivorys-rock-qld.html
- https://webarchive.nla.gov.au/awa/20010328130000/http://www.howlspace.com.au/en2/arenatina/arenatina.htm
- https://webarchive.nla.gov.au/wayback/20120326012340/http://news.defence.gov.au/2011/09/09/army-airborne-insertion-capability/
- https://webarchive.nla.gov.au/gov/20070831165847/http://www.defence.gov.au/opacolyte/default.cfm
- https://webarchive.nla.gov.au/tep/23790 (redirect from http://pandora.nla.gov.au/tep/23790 ) .. index page
- Note: The Australian Web Archive incorporates the Pandora archive as well as the Australian Government Web Archive and the National Library of Australia's archive of the .au domain.
- Note: No memento access
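The two timestamp types listed for this service can be normalized with a small helper. This is a sketch only: `parse_nla_timestamp` is an illustrative name, and reading the four digits after the hyphen in "20120727-0512" as hours and minutes is an assumption.

```python
from datetime import datetime


def parse_nla_timestamp(ts: str) -> datetime:
    """Parse either a 14-digit timestamp or the hyphenated Pandora form."""
    # ASSUMPTION: "YYYYMMDD-HHMM" for the hyphenated form.
    for fmt in ("%Y%m%d%H%M%S", "%Y%m%d-%H%M"):
        try:
            return datetime.strptime(ts, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {ts!r}")
```

Shorter Wayback-style timestamps (4 to 13 digits) are not covered by this helper.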
- NLA Australia
- Deprecated. Integrated into the Australian Web Archive (Trove) above.
- Domain: nla.gov.au
- Hostname: pandora, trove, tep, webarchive, content.webarchive
- Path: see examples. The /pan/ regex should be /pan/[0-9]{4,7}/
- Timestamp: Three types (20120727-0512, S2000-Dec-5, 20120326012340)
- Examples:
- http://pandora.nla.gov.au/pan/14231/20120727-0512/www.howlspace.com.au/en2/inxs/inxs.htm
- http://pandora.nla.gov.au/pan/128344/20110810-1451/www.theaureview.com/guide/festivals/bam-festival-2010-ivorys-rock-qld.html
- http://pandora.nla.gov.au/nph-wb/20010328130000/http://www.howlspace.com.au/en2/arenatina/arenatina.htm
- http://pandora.nla.gov.au/nph-arch/2000/S2000-Dec-5/http://www.paralympic.org.au/athletes/athleteprofile60da.html
- http://pandora.nla.gov.au/tep/23790
- http://webarchive.nla.gov.au/gov/20120326012340/http://news.defence.gov.au/2011/09/09/army-airborne-insertion-capability/
- http://content.webarchive.nla.gov.au/gov/wayback/20120326012340/http://news.defence.gov.au/2011/09/09/army-airborne-insertion-capability
- Note: Not to be confused with non-webarchive URLs that appear similar:
- Note: No memento access
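The /pan/ regex given in the Path field above can be applied as follows. This is a minimal sketch; the name `pan_collection_id` is illustrative, and the capture group around the number is an addition for convenience.

```python
import re

# The pattern from the Path field, with a capture group added around the ID.
PAN_RE = re.compile(r"/pan/([0-9]{4,7})/")


def pan_collection_id(url: str):
    """Return the numeric /pan/ ID as a string, or None if the URL has none."""
    m = PAN_RE.search(url)
    return m.group(1) if m else None
```

Applied to the examples above, the first two URLs yield an ID, while the /nph-wb/, /nph-arch/ and /tep/ forms return `None` and need their own handling.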
- Ghostarchive
- Domain: ghostarchive.org
- Hostname: <none>
- Path: archive, varchive/<site>, iarchive/<site>
- Timestamp: 4–14 digits
- Examples:
- https://ghostarchive.org/archive/fwAS7 (short-form)
- https://ghostarchive.org/archive/20210728022510/https://rms-support-letter.github.io/
- https://ghostarchive.org/varchive/UhCiGY75wVw (short form)
- https://ghostarchive.org/varchive/youtube/20100711020000/UhCiGY75wVw
- https://ghostarchive.org/iarchive/instagram/georgemofficial/1374848874216391600
- https://ghostarchive.org/iarchive/s/instagram/BMUccRPg4Ow
- https://ghostarchive.org/archive/20210728022510/https://www.instagram.com/p/BMUccRPg4Ow/
- Note: to convert short-form to long:
- For regular web pages:
- For video pages, e.g. YouTube:
- To find the earliest and latest archive available use a timestamp of "1990" or "3000" e.g.
- https://ghostarchive.org/archive/1990/https://rms-support-letter.github.io/ will find the earliest archived copy of that webpage, while
- https://ghostarchive.org/archive/3000/https://rms-support-letter.github.io/ will find the latest.
- The /archive/ path long form will work for all types of archives, for example https://ghostarchive.org/archive/20210728022510/https://youtube.com/watch?v=UhCiGY75wVw will redirect to the video, and https://ghostarchive.org/archive/20210728022510/https://www.instagram.com/p/BMUccRPg4Ow/ will redirect to the image.
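The earliest/latest lookup described above amounts to building a long-form URL with a sentinel timestamp. A minimal sketch follows; the function and constant names are illustrative, and the code only constructs the URL without contacting the service.

```python
EARLIEST = "1990"  # sentinel timestamp that resolves to the earliest copy
LATEST = "3000"    # sentinel timestamp that resolves to the latest copy


def ghostarchive_url(target: str, timestamp: str = LATEST) -> str:
    """Build a long-form /archive/ URL for the given target page."""
    return f"https://ghostarchive.org/archive/{timestamp}/{target}"
```

For example, `ghostarchive_url("https://rms-support-letter.github.io/", EARLIEST)` reproduces the first URL in the note above.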
- Freezepage.com
- Domain: freezepage.com
- Hostname: <none>, www
- Path: <none>
- Timestamp: <none> (only available via web scrape)
- Examples:
- Note: If the account ID which created the snapshot expires for lack of activity (no login to freezepage), the snapshot is deleted from freezepage.com
- Note: No memento access
- Library of Congress
- Domain: loc.gov
- Hostname: webarchive
- Path: all, lcwa####
- Timestamp: 4–14 digits
- Examples:
- Arquivo.pt (Portugal)
- Domain: arquivo.pt
- Hostname: <none>
- Path: wayback, wayback/wayback, noFrame/replay
- Timestamp: 4–14 digits; may be followed by "mp_" (see example)
- Examples:
- Stanford Edu
- Domain: stanford.edu
- Hostname: swap, sul-swap-prod
- Path: <none>
- Timestamp: 4–14 digits
- Examples:
- Archive-It
- Domain: archive-it.org
- Hostname: wayback
- Path: all, a 4 digit number
- Timestamp: 4–14 digits
- Examples:
- BibAlex
- Domain: bibalex.org:80
- Hostname: web.archive, web.petabox
- Path: web
- Timestamp: 4–14 digits
- Examples:
- WikiWix
- Domain: wikiwix.com
- Hostname: archive
- Path: cache
- Timestamp: 4–14 digits
- Examples:
- Note: Does not support https. Does not support Memento
- Note: API access was added in March 2018: append &apiresponse=1 to the end of the URL (http://archive.wikiwix.com/cache/?url=http://www.linterweb.fr&apiresponse=1). Any other & inside the url= value may need to be encoded.
- Note: Supports an optional &title argument at the end of the URL, outside the source URL (similar to &apiresponse). It gives the name of the Wikipedia article in which the link is used.
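The API note above can be sketched as a URL builder. The function name is illustrative, and encoding "&" as "%26" is one plausible reading of "may require encoding"; the list does not specify the exact encoding the service expects.

```python
def wikiwix_api_url(target: str) -> str:
    """Build a Wikiwix cache URL that requests the API response."""
    # ASSUMPTION: percent-encode "&" in the target so it is not read as a
    # separator before the trailing &apiresponse=1 argument.
    encoded = target.replace("&", "%26")
    return f"http://archive.wikiwix.com/cache/?url={encoded}&apiresponse=1"
```

Calling `wikiwix_api_url("http://www.linterweb.fr")` reproduces the example URL in the note.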
- National Archives US
- Domain: webharvest.gov
- Hostname: <none>
- Path: <variable>
- Timestamp: 4–14 digits
- Examples:
- National Archives Iceland
- Domain: vefsafn.is
- Hostname: wayback
- Path: wayback
- Timestamp: 4–14 digits
- Examples:
- Europa Archives (Ireland)
- Deceased. In May 2018 all archives were moved to collections.internetmemory.org; as of September 2018 they were moved again to Archive-It [1]
- Domain: europarchive.org
- Hostname: collection
- Path: nli
- Timestamp: 4–14 digits
- Move example:
- Perma CC
- Domain: perma-archives.org, perma.cc
- Hostname: <none>
- Path: <none>, warc
- Timestamp: 4–14 digits for perma-archives.org, or snapshot ID
- Examples:
- Proni Web Archives
- Deceased. As of October 2018, all archives were moved to Archive-It [2]
- Domain: proni.gov.uk
- Hostname: webarchive
- Path: <none>
- Timestamp: 4–14 digits
- Examples:
- http://webarchive.proni.gov.uk/20111213123846/http
- Move example:
- Original: http://webarchive.proni.gov.uk/20100218151844/http://www.berr.gov.uk/
- New: http://wayback.archive-it.org/11112/20100218151844/http://www.berr.gov.uk/
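The move example for Proni suggests a mechanical rewrite of old links. The sketch below follows that single example; the function name is illustrative, and it is an assumption that every Proni capture lives in Archive-It collection 11112 with the same timestamp.

```python
import re

PRONI_RE = re.compile(r"^https?://webarchive\.proni\.gov\.uk/(\d{4,14})/(.+)$")


def proni_to_archive_it(url: str):
    """Rewrite a Proni archive URL to its Archive-It form, or return None."""
    m = PRONI_RE.match(url)
    if m is None:
        return None
    timestamp, target = m.groups()
    # ASSUMPTION: collection number 11112 is taken from the one move example.
    return f"http://wayback.archive-it.org/11112/{timestamp}/{target}"
```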
- Parliament UK
- Domain: parliament.uk
- Hostname: webarchive
- Path: <none>
- Timestamp: 4–14 digits
- Examples:
- UK Web Archive (British Library)
- Domain: webarchive.org.uk
- Hostname: www
- Path: wayback/archive
- Timestamp: 4–14 digits with possibility of "mp_" at end
- Examples:
- Canada
- Deceased. As of May 2018, all archives were moved to webarchive.bac-lac.gc.ca [3]
- Domain: collectionscanada.gc.ca
- Hostname: www
- Path: archivesweb, webarchives
- Timestamp: 4–14 digits
- Examples:
- http://www.collectionscanada.gc.ca/webarchives/20061104084225/http://broadband.gc.ca/maps/province.html?prov=48
- http://www.collectionscanada.gc.ca/archivesweb/20060209004933/http
- Note: Not to be confused with other close URL variants. Only capture "/webarchives/" or "/archivesweb/"
- Move example:
- Original: http://www.collectionscanada.gc.ca/webarchives/20061104084225/http://broadband.gc.ca/maps/province.html?prov=48
- New: http://webarchive.bac-lac.gc.ca:8080/wayback/20061104084225/http://broadband.gc.ca/maps/province.html?prov=48
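For collectionscanada.gc.ca, the note about close URL variants matters when rewriting: only the "/webarchives/" and "/archivesweb/" paths should be captured. A minimal sketch follows, with an illustrative function name; it assumes the "/archivesweb/" form maps to the new host the same way as the "/webarchives/" form in the move example.

```python
import re

# Only the two archive paths are accepted; other similar paths are ignored.
CANADA_RE = re.compile(
    r"^https?://www\.collectionscanada\.gc\.ca/"
    r"(?:webarchives|archivesweb)/(\d{4,14})/(.+)$"
)


def canada_to_bac_lac(url: str):
    """Rewrite an old collectionscanada.gc.ca archive URL, or return None."""
    m = CANADA_RE.match(url)
    if m is None:
        return None
    timestamp, target = m.groups()
    return f"http://webarchive.bac-lac.gc.ca:8080/wayback/{timestamp}/{target}"
```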
- Canada
- Domain: bac-lac.gc.ca:8080
- Hostname: webarchive
- Path: wayback
- Timestamp: 4–14 digits
- Examples:
- Note: Formerly collectionscanada.gc.ca; see above
- Catalonian Archive
- Domain: padi.cat(:8080)?
- Hostname: www, <none>
- Path: wayback
- Timestamp: 4–14 digits
- Examples:
- Web Archives Singapore
- Domain: nlb.gov.sg
- Hostname: eresources
- Path: webarchives/wayback
- Timestamp: 4–14 digits
- Examples:
- https://eresources.nlb.gov.sg/webarchives/2016-04-25%2019:07:06.000/wp/details/http://www.lta.gov.sg/apps/news/page.aspx?c=2&id=2dzk9l67sx9j40a1rhgdw3hvhrnxgq3zh34l77r37dj4w72jf1
- https://eresources.nlb.gov.sg/webarchives/wayback/20160425174854/https://www.lta.gov.sg/apps/news/page.aspx?c=2&id=2dzk9l67sx9j40a1rhgdw3hvhrnxgq3zh34l77r37dj4w72jf1
- Note: Not to be confused with other close URL variants. Only capture "/webarchives/wayback/"
- Slovenian Archives (Spletni)
- Domain: nuk.uni-lj.si:8080
- Hostname: nukrobi2 (may change)
- Path: wayback
- Timestamp: 4–14 digits
- Examples:
- Estonia Archives
- Domain: digar.ee
- Hostname: veebiarhiiv
- Path: a
- Timestamp: 4–14 digits
- Examples:
- Bavarian Archives
- Domain: bib-bvb.de
- Hostname: langzeitarchivierung
- Path: wayback
- Timestamp: 4–14 digits
- Examples:
- York University Digital Library
- Domain: yorku.ca
- Hostname: digital.library
- Path: wayback
- Timestamp: 4–14 digits
- Examples:
Other
- Memento
- Note: Redirects to an external archive service based on cached data in the Memento database, which can fluctuate or be inaccurate when the cache goes out of sync with the client service.
- Note: Links quickly expire.
- Note: Cannot be accessed with Memento.
Undocumented
- https://cachedview.nl/ [1]
- https://cachedview.com/ [2]
- http://www.cachedpages.com/ [2]
- https://commoncrawl.org/ [3]
- https://www.bravenewtech.org/ "Video Vault" [4][5]
- https://conifer.rhizome.org/ [6]
- https://archive.st/ [7]
- https://github.com/jsvine/waybackpack [1]
- https://github.com/motherboardgithub/mass_archive [1]
See also
References
- ^ a b c "Wayback Archives". Tracetools. Retrieved 27 June 2022.
- ^ a b Bederov, Igor S. (8 May 2022). "Web Archive as an OSINT Tool". Medium. Retrieved 27 June 2022.
- ^ "Examples using Common Crawl Data". Common Crawl. Retrieved 27 June 2022.
- ^ "RightsLab". rightslab.org. Retrieved 27 June 2022.
- ^ "rightslab". Twitter. Retrieved 27 June 2022.
- ^ "Webrecorder". webrecorder.net. Retrieved 27 June 2022.
- ^ "wiki.Archiveteam". archiveteam.org. Retrieved 27 June 2022.