This is my simple guide to editing at CCI (Contributor Copyright Investigations). It covers marking listings and what to do in special situations, based on my own experience.
If you are experienced with this area on Wikipedia, feel free to add other advice. For a list I have made of CCIs, see User:Moneytrees/CCI Sort.
Basic steps of CCI
- NOTE: Earwig does not work on books or journals and cannot translate non-English sources into English, so you will have to either manually compare those sources with the articles, or paste the content into a page like User:Moneytrees/dummy if you want it to be read by Earwig.
- 1. Before running Earwig, I recommend fixing all dead links in the article. This may not be needed for edits within the last year, but is required for anything older. Make sure you check the box to add archives for all links, as the bot doesn't always detect dead links using the default option. Earwig will sometimes have trouble reading certain archives, so be patient if it doesn't work initially.
- 2. On an article that has a long history and many edits from many different users, you may not even need to run a full check. Instead, take the diff link from the listing, paste it into the "URL comparison" box, and run the comparison against the article. Look at the highlighted text, which will show whether the added content is still in the article.
- 2.1 If it is still in the article, then compare the source cited in the edit to the article using the above process. If there is no source, then look through the next few edits in the page's history to determine when a ref was inserted. If the ref is dead, archive it in the current version using the fix dead links option. If it is not, manually archive it.
- 2.2 If it is no longer in the article, then mark the listing with
{{?}}Rewritten/removed since --~~~~
Make sure it wasn't moved to a different article.
- 3. Enter the article title and run the scan. Alternatively, just compare the sources cited in the edit against the edit ID, as long as the source is not dead. I strongly encourage looking at the sources cited in the initial edits, as they may no longer be in the article, and Earwig only does a limited web search, making it unlikely to find them.
- 4. View the results. Try to ignore the percentage and go by the highlighted text. At a minimum, check every result above ten percent.
- 5. If you find no violation, write
{{n}} Checked --~~~~
- 6. If you do find one or more violations, remove them. Depending on how large the violation is, and if you are a non-admin, mark the article for revdel; I highly recommend you install User:Enterprisey/cv-revdel for this. Remove the diffs next to the listing, and write
{{y}} removed --~~~~
- 7. Keep an eye out for sources in the public domain or under a free license; some of them are attributed properly, some are not. They tend to be US government sources or very old (pre-1927) books. Keep in mind that the public domain status of books in other countries differs from the United States'; if you are unsure of the public domain status, look at Commons:Commons:Copyright rules by territory. See the bottom of this page for a chart showing the compatible licenses.
- 7.1. If it is unattributed, add {{Source-attribution}} or {{Creative Commons text attribution notice}} inside the ref, like I do here.
- 8. For partially attributed or unattributed interwiki translations, add the article it was translated from to the talk page, like I do here.
- 9. For unattributed in-wiki copying, add a note to the talk page, like I do here.
- 10. For cut-and-paste moves that have parallel histories (edits in between the paste on both articles, making history merging impossible), add a note to the talk page; for those that don't, tag the article with Template:History merge (can be found in Twinkle).
- 11. For cases where you are unsure about who copied from what, the paste is very complicated, or it could be deleted but is not a straight G12, blank the article using
{{subst:copyvio|url=INSERTURL}}
and follow the instructions on the generated notice. Notifying contributors that an article was blanked is not necessary.
- 12. For book violations, look up sentences used in the edit on Google Books to try to find a match, although this is not always reliable, as most books have no preview and Google can be random in what it decides to show. Asking someone for the book through Wikipedia:WikiProject Resource Exchange/Resource Request, looking around for a copy on archive.org, or simply getting a copy through purchase or a library can also work. If none of these options are workable and the content is suspicious, it is best to remove it.
- 13. For translation violations, if you don't want to manually compare the source, go to a site such as Copyleaks and paste in the translation and then the text of the article to compare. Alternatively, you can paste the translation into User:Moneytrees/dummy, have Earwig scan it with URL comparison mode, remove it quickly afterwards, and I'll revdel it.
- 14. For journals/paywalls, you once again have the option of asking others or getting access through an institution (The Wikipedia Library has access to a good deal). If you can't do that, feel free to use more dubious methods; sometimes you need to break a rule to enforce another.
- 15. Finally, think critically about things and use your judgement. If you have any questions, don't be afraid to ask me or any other user familiar with the area.
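When Earwig can't be used (for example, with text typed out from a book), the "go by the highlighted text, not the percentage" idea from step 4 can be roughly approximated offline. The following is only a minimal sketch, not how Earwig itself works: it uses Python's standard `difflib` to list runs of words shared between two texts. The function names and the five-word threshold are my own assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shared_passages(text_a, text_b, min_words=5):
    """Return runs of at least min_words consecutive words found in both texts.

    Hypothetical helper: min_words=5 is an arbitrary cut-off, not a rule.
    """
    a, b = words(text_a), words(text_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]


source = "The castle was built in 1204 by the local lord and later destroyed by fire."
article = "According to records, the castle was built in 1204 by the local lord. It is now a museum."
for passage in shared_passages(source, article):
    print(passage)
```

Because `SequenceMatcher` aligns the two texts in order, passages that were shuffled around may be missed, so treat an empty result as "nothing obvious" rather than as a clean check. The same function can be pointed at the text added in a diff and the current article to see whether the content is still there (step 2).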
Detecting mirrors
Keep in mind that many sites have copied from Wikipedia over the years, and using the search engine with Earwig will almost always find a handful, so be careful when removing content. If it seems like the website copied from Wikipedia, press Ctrl+F and type "Wikipedia", which will often highlight something along the lines of "Taken from wikipedia" on the scanned web page. Ignoring direct mirrors, websites that often copy from Wikipedia include:
- BlogSpot blogs (every Wikipedia article has been copied by at least one)
- Social media sites
- Travel/vacation sites
- Cheap online shopping outlets
- Forums and question answering sites
- YouTube video descriptions
- Academia.edu
- forums in general
- Slideshare.net
- Flickr descriptions
A copyvio from these websites is still possible, but is less likely. Check to see if the page is used as a source.
95% OF CCI SUBJECTS COPY FROM SOURCES THEY CITE. IF A SOURCE USED IN THE ARTICLE POPS UP ON EARWIG, IT IS ALMOST CERTAINLY A VIO.
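If you have already copied the text of a suspected mirror, the Ctrl+F check can be sketched in a few lines of Python. This is only an illustration under my own assumptions (the function name and the 40-character context window are made up); it prints the text around each mention of "Wikipedia" so you can judge whether it is an attribution line.

```python
import re


def wikipedia_mentions(page_text, context=40):
    """Return a snippet of surrounding text for each mention of 'wikipedia'.

    Hypothetical helper: the match is case-insensitive, and 'context' is the
    number of characters kept on each side of the match.
    """
    snippets = []
    for match in re.finditer(r"wikipedia", page_text, flags=re.IGNORECASE):
        start = max(0, match.start() - context)
        end = min(len(page_text), match.end() + context)
        snippets.append(page_text[start:end].strip())
    return snippets


page = "Great hotel deals. Taken from wikipedia, the free encyclopedia."
for snippet in wikipedia_mentions(page):
    print(snippet)
```

A mention is only a hint: a page can cite Wikipedia for one paragraph and still be the original source of another, so the judgement call stays with you.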
Earwig times out when loading up this one site
Certain sites don't like Earwig and will time out when it tries scanning them; The Independent and some PDFs are examples. If this happens, go to a website that will find Google web caches, which are saved versions of pages that Earwig should always be able to read. https://cachedpage.co/ is an example.
Presumptive removals
In some cases, sources used by the CCI subjects are inaccessible, or money would have to be spent to access them. In these cases, it is best to remove the content inserted. Note that this is a last-resort option; try to find out whether you can access the content before doing this. Presumptive removals may also be warranted in cases where the subject copied a specific kind of content (e.g. plot summaries), where figuring out where the subject copied from would be too difficult, or where the CCI could be wrapped up quicker by just removing everything. If the sources are inaccessible and removing everything would be too difficult, tag the article for presumptive deletion, although that is only for worst-case scenarios. For presumptive removals and deletions:
Presumptive removal over copyright concerns, please see: [[Wikipedia:Contributor copyright investigations/INSERTNAME]]
{{subst:copyvio|url=Presumptive deletion over copyright concerns, please see: [[Wikipedia:Contributor copyright investigations/INSERTNAME]]}}
If the amount of text you remove is major (+500 or important text), please leave a note on the article's talk page with {{subst:CCI|INSERTNAME}}
License guide
License Compatibility with Wikipedia [note 1] | |
---|---|
Licenses compatible with Wikipedia | Licenses not compatible with Wikipedia |
Creative Commons Licenses | |
|
|
Other Licenses | |
|
|
- ^ For text only; please see Wikipedia:File_copyright_tags for licences allowed with files.
- ^ According to the WMF legal team, CC BY-SA 4.0 is not backwards compatible with CC BY-SA 3.0. Therefore, mixing text licensed under 3.0 and 4.0 would be problematic; however, media files uploaded under this license are fine.