This is a summary of enwiki's various copyright violation detector bots and tools.
Detection via Google searches
Earwig copyvio detector
- https://copyvios.toolforge.org/
- maintainer: The Earwig, Chlod
- source code: https://github.com/earwig/copyvios
- last commit: 2 years ago
- tech: Python
- uses the Google search API and the WMF EranBot Turnitin API
- WMF pays for credits
- no discount (NPerry (WMF) used to work on Wikimedia's partnership with Google, maybe this is something worth bringing up?)
- hard limit of 10,000 queries per day (the maximum for any user of this API)
- costs US$50 per day
- makes up to 8 queries per page
- 2,000ish checks per day (not all checks use all 8 queries)
- has issues with concurrent queries
- uptime report: https://stats.uptimerobot.com/BN16RUOP5/784331770
- false positive handling via a community-maintained exclusion list at User:EarwigBot/Copyvios/Exclusions
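The quota figures in the list above imply some per-query and per-check numbers. The following is a rough worked calculation, not an official cost breakdown. It assumes the US$50 daily cost corresponds to the full 10,000-query quota, and that the quota is fully used on a typical day; neither assumption is confirmed by the list above.

```python
# Figures quoted in the list above.
DAILY_QUERY_LIMIT = 10_000     # hard cap on queries per day
DAILY_COST_USD = 50            # cost per day
MAX_QUERIES_PER_CHECK = 8      # a single page check makes up to 8 queries
APPROX_CHECKS_PER_DAY = 2_000  # "2,000ish" checks per day

# Assumption: US$50 buys the whole 10,000-query quota.
cost_per_query = DAILY_COST_USD / DAILY_QUERY_LIMIT

# Upper bound on the cost of one check (all 8 queries used).
max_cost_per_check = cost_per_query * MAX_QUERIES_PER_CHECK

# Checks the quota guarantees if every check used all 8 queries.
worst_case_checks = DAILY_QUERY_LIMIT // MAX_QUERIES_PER_CHECK

# Assumption: the quota is exhausted on a typical day.
avg_queries_per_check = DAILY_QUERY_LIMIT / APPROX_CHECKS_PER_DAY

print(f"cost per query:        US${cost_per_query:.3f}")
print(f"max cost per check:    US${max_cost_per_check:.2f}")
print(f"worst-case checks/day: {worst_case_checks}")
print(f"avg queries per check: {avg_queries_per_check:.1f}")
```

Under these assumptions, roughly 2,000 checks per day is only possible because the average check uses about 5 of its 8 allowed queries.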
Detection via Turnitin
CopyPatrol (original; undeployed)
Frontend (wikimedia-slimapp)
- https://copypatrol.toolforge.org/en
- maintainer: WMF Community Tech team (most active recent committer: MusikAnimal)
- source code: https://github.com/wikimedia/CopyPatrol/tree/569f76e113da307d3810e1333531fcfc8449dbcf
- last commit: 7 months ago
- tech: PHP, Twig (wikimedia-slimapp)
- is mostly a viewer for an SQL database that the copyright detection bot(s) below write to
- users can mark pages/revisions as being fixed or requiring no action. (However, this information is not reflected on enwiki)
- the CopyPatrol interface has a "compare" feature; clicking it sends an API query to the Earwig tool above
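As a rough illustration of the "compare" query mentioned above, the sketch below builds a request URL for the Earwig tool. The endpoint path (api.json) and the parameter names (version, action, project, lang, title, url) are assumptions based on the Earwig tool's public API and are not taken from the CopyPatrol source; the helper name build_compare_url is hypothetical. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint path on the Earwig tool; not confirmed by this essay.
EARWIG_API = "https://copyvios.toolforge.org/api.json"


def build_compare_url(title, source_url, lang="en", project="wikipedia"):
    """Build a URL comparing one wiki page against one external source.

    Hypothetical helper: the parameter names are assumptions.
    """
    params = {
        "version": 1,
        "action": "compare",
        "project": project,
        "lang": lang,
        "title": title,
        "url": source_url,
    }
    return f"{EARWIG_API}?{urlencode(params)}"


compare_url = build_compare_url("Example", "https://example.org/page")
print(compare_url)
```

Building the URL separately from sending it makes the query easy to inspect and avoids spending search credits while experimenting.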
Backend (EranBot)
- bot name: EranBot
- BRFA: Wikipedia:Bots/Requests for approval/EranBot 3
- maintainer: ערן
- also involved: Doc James, Ocaasi
- source code: https://github.com/valhallasw/plagiabot
- last commit: 2 years ago
- tech: Python, Pywikibot
- writes to an SQL database that CopyPatrol uses
- uses the PageTriage pagetriagetagcopyvio API to mark pages/revisions as probable copyright violations
- writes this to the pagetriage-copyvio log: https://en.wikipedia.org/wiki/Special:Log?type=pagetriage-copyvio
- displays the tag in Special:NewPagesFeed and the Page Curation toolbar's info flyout
- need the copyviobot permission to use this API (not assignable by admins, probably need a bureaucrat to do it)
- uses Turnitin's iThenticate API
- does WMF pay for it, or is it comped?
- what are the daily limits?
- do we hit these limits?
- false positive handling via a community-maintained exclusion list at meta:User:EranBot/Copyright/Blacklist
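The pagetriage-copyvio log linked in the list above can also be read programmatically. The sketch below builds a standard MediaWiki Action API logevents query for that log type. The helper name build_copyvio_log_url and the default limit are illustrative choices, and the sketch assumes the log type is exposed through list=logevents in the same way as on Special:Log. No request is sent.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_copyvio_log_url(limit=10):
    """Build an Action API URL listing recent pagetriage-copyvio log entries.

    Hypothetical helper; only constructs the URL, does not fetch it.
    """
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "pagetriage-copyvio",  # log type named in the list above
        "lelimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


log_url = build_copyvio_log_url(5)
print(log_url)
```

Fetching this URL with any HTTP client should return the most recent entries as JSON, which is one way to check how often the bot is tagging revisions.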
CopyPatrol (rewrite)
Frontend (Symfony)
- https://copypatrol.wmcloud.org/en
- maintainer: WMF Community Tech team (most active recent committer: MusikAnimal)
- source code: https://github.com/wikimedia/CopyPatrol
- last commit: 3 months ago
- tech: Symfony (PHP)
- replaced https://copypatrol.toolforge.org/en
- is mostly a viewer for an SQL database that the copyright detection bot(s) below write to
- users can mark pages/revisions as being fixed or requiring no action. (However, this information is not reflected on enwiki)
- the CopyPatrol interface has a "compare" feature; clicking it sends an API query to the Earwig tool above
Backend
- bot name: CopyPatrolBot
- BRFA: Wikipedia:Bots/Requests for approval/CopyPatrolBot
- maintainer: JJMC89
- source code: https://github.com/JJMC89/copypatrol-backend
- last commit: 2 months ago
- tech: Python
- rewrite of EranBot's copyright tasks
See also
- phab:T330435 - I read this and added its contents to this essay
- Wikipedia:Turnitin
- Wikipedia:Village pump (idea lab)#Brainstorming a COPYVIO-hunter bot - I read this and added its contents to this essay
- Wikipedia:WikiProject Articles for creation/AfC Process Improvement May 2018
- Wikipedia:WikiProject Articles for creation/AfC Process Improvement May 2018/Copyvio solutions comparison report