Welcome to the edit filter noticeboard
Recent filter changes:
Filter 1014 — Pattern modified
Filter 1213 (new) — Actions: none; Flags: enabled,public; Pattern modified
Filter 1212 (new) — Actions: none; Flags: enabled,public; Pattern modified
Filter 1168 — Actions: disallow; Pattern modified
This is the edit filter noticeboard, for coordination and discussion of edit filter use and management. If you wish to request an edit filter, please post at Wikipedia:Edit filter/Requested. If you would like to report a false positive, please post at Wikipedia:Edit filter/False positives. Private filters should not be discussed in detail here; please email an edit filter manager if you have specific concerns or questions about the content of hidden filters.
Sections older than 10 days may be automatically archived by Lowercase sigmabot III. |
Request for EFH permission for PhantomTech
- The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
- PhantomTech
This request is for EFH. I previously requested EFM permission at Special:Permalink/654597783#Request_for_permission_for_PhantomTech and will notify those participants shortly after posting this request. At the time of that request there was no local user group with abusefilter-view-private, and it was suggested that such a permission might have been more appropriate to grant.
My focus on Wikipedia has been counter vandalism, including LTA. I've somewhat recently returned and have been getting back into the areas that have interested me before. I've previously watched for false positive reports that I could respond to and have started to resume doing that, along with providing information or suggestions in filter requests. My primary purpose for the permission would currently be to be able to respond to false positives on private filters. I'd also like to help with modifications to private edit filters and have done so in the past with public ones, but I'm not sure what the process for that would be for someone with EFH since they're private (-en-editfilters?). I expect to continue helping with requests but don't expect the permission to be very helpful in that area.
It's hard to self assess competency but I think I have a very good understanding of RegEx. I've used it extensively in the past, including for things related to Wikipedia. PHANTOMTECH (talk) 03:39, 13 July 2022 (UTC)
- Endorse: The concerns raised in your previous request mainly seem to revolve around editing filters. Seeing as EFH only grants abusefilter-view-private, and given that you have been fairly active i.r.t. assisting with filters (49 edits to EF/FP, 31 edits to EF/R, with decent suggestions), I'm happy to support this request. -- TNT (talk • she/her) 04:10, 13 July 2022 (UTC)
- Timestamp to prevent possible archiving before close. PhantomTech[talk] 01:10, 22 July 2022 (UTC)
- An odd predicament here: low participation but no objection. Speak now (in the next 24 hours) or forever hold your peace, y'all! -- Tamzin[cetacean needed] (she|they|xe) 02:02, 29 July 2022 (UTC)
- Support seems fine, has been around long enough without causing trouble. I opposed for EFM, and still would right now, but not worried about EFH. — xaosflux Talk 13:37, 29 July 2022 (UTC)
EF AbuseFilter/614, individual matches since last year
614 (hist · log)
Hello, I have written some Rust code to identify individual matches on 614, as a way to find unused cases and also get some interesting data.
Here's a list of individual cases in that regex and how many times it was hit for the past 52 weeks:
Data dump
|
---|
(backslashes are escaped in the output)
Case { src: "fetus\\s*deletus", count: 1 }
Case { src: "you'?ve\\s*been\\s*gnomed", count: 1 }
Case { src: "e+s+k+e+t+i+t", count: 2 }
Case { src: "gucci\\s*gang[\\s,]*gucci\\s*gang", count: 3 }
Case { src: "hard\\s+(?:pp|peepee)", count: 3 }
Case { src: "hit\\s*or\\s*miss[\\s,]*I\\s*guess", count: 3 }
Case { src: "#sw[4ae]g", count: 4 }
Case { src: "(?:f[u\\*][c\\*]k(?:ing?|ed|s)|sex\\s*with?)\\s*chickens?", count: 4 }
Case { src: "#redirect\\s*\\[\\[donald[\\s_]trump\\]\\]", count: 5 }
Case { src: "#yolo", count: 6 }
Case { src: "ugandan\\s*knuckles", count: 7 }
Case { src: "420\\s*b+l+a+z+e+\\s*i+t+", count: 8 }
Case { src: "tran?s?.?manian?\\b", count: 10 }
Case { src: "y\\s*o\\s*[lo\\s]+s\\s*w\\s*[4ae]+\\s*g+", count: 11 }
Case { src: "\\by+o+l+o[lo]+", count: 11 }
Case { src: "s+w+[4ae]+gg[g]+", count: 11 }
Case { src: "(?:pp|peepee)\\s+hard", count: 12 }
Case { src: "chicken\\s*f[u\\*]?[c\\*]k(?:er|s|ing)?", count: 16 }
Case { src: "bush\\s*did\\s*9.?11", count: 16 }
Case { src: "hitler\\s*did\\s*nothing?\\s*wrong", count: 20 }
Case { src: "\\bdat\\s* boi", count: 24 }
Case { src: "epst(?:ei|ie)n\\W+did\\s*n.?t\\s+kill", count: 30 }
Case { src: "rawr\\s*xd", count: 31 }
Case { src: "dank\\s*meme", count: 36 }
Case { src: "ok(?:ay)?,? boomer", count: 40 }
Case { src: "drumpf", count: 41 }
Case { src: "\\s*i\\s*n\\s*t\\s*h\\s*[ae]\\s*p\\s*(?:(?:[@uv*]\\s*)+(?:[zs$*]\\s*)+|[zs$*]{2,})\\s*a*y+", count: 43 }
Case { src: "(?:them'?s?|dems?|those\\s+are)'?\\s+(?:th[ea]|da)\\s+fa(?:cts|x)!?", count: 48 }
Case { src: "\\booo+f+\\b", count: 52 }
Case { src: "sw[4ae]g\\s*(?:yolo|daddy|money|lord|master)", count: 57 }
Case { src: "\\bt+r+o+l(?:o+l|ll)", count: 60 }
Case { src: "absolute\\s*unit", count: 75 }
Case { src: "\\bayyy", count: 89 }
Case { src: "\\bnibb+a+\\b", count: 98 }
Case { src: "\\bbruv+\\b", count: 103 }
Case { src: "sub(?:scrib(?:e|es|ed|ing))?\\s*(?:to|2)\\s*(?:p(ew|ud|ue|uw|oo)|te*.?series)", count: 115 }
Case { src: "\\b(?:ranboo|tubbo)", count: 153 }
Case { src: "\\beats?\\s*ass\\b", count: 155 }
Case { src: "b+o+iii", count: 234 }
Case { src: "\\br+eeeeee", count: 279 }
Case { src: "\\bg+a+yy(?:y|\\b)", count: 415 }
Case { src: "quandale\\s*dingle", count: 445 }
Case { src: "\\bt+\\s*h+\\s*i+\\s*c\\s*c", count: 456 }
Case { src: "aviation\\s*,[\\s\\S]*?there\\s*is\\s*no[\\s\\S]*?bee[\\s\\S]*?be\\s*able\\s*to\\s*fly", count: 478 }
Case { src: "chung[uea]s\\b", count: 564 }
Case { src: "lolo(?:lo)+", count: 807 }
Case { src: "\\by+ee+t+(?:e+(?:r+|d+))?\\b", count: 858 }
Case { src: "h+iiiii", count: 1014 }
Case { src: "dQw4w9WgXcQ", count: 1033 }
Case { src: "\\bh+iii+\\b", count: 1035 }
Case { src: "hehehe", count: 1414 }
Case { src: "\\bbruh+\\b", count: 1819 }
Case { src: "\\bl+m+a+o", count: 1868 }
Case { src: "\\buwu\\b", count: 1885 }
Case { src: "joe m[oa]m+a", count: 1966 }
Case { src: "(?:69\\D*420|420\\D*69|(?:69\\D{0,50}){3,})", count: 3164 }
Case { src: "(?:d[3e](?:[3e]+[sz]+|[sz][sz]*)e*|th[3e][zs$][3e])\\s*n+u+t+[zs$]", count: 5537 }
|
I removed the negative lookarounds from some regexes: they are not needed for correctness here and, more to the point, the library I am using does not support them.
Should this still be called yolo swag..?
Inspect the sources at GitHub if you would like. 0xDeadbeef 16:51, 15 July 2022 (UTC)
- Filter 614 is called "Yolo swag and other vandalism trends" since 2014 - are you seeing it just say the former somewhere? — xaosflux Talk 12:01, 16 July 2022 (UTC)
- Okay, there might have been a screw up. I didn't make it case insensitive. I will look into this again and post a more correct data dump. 0xDeadbeef 13:41, 16 July 2022 (UTC)
- Updated. Looks like they all hit some stuff. It would be nicer still if I added a date for the latest hit. 0xDeadbeef 14:56, 16 July 2022 (UTC)
- Thanks again! I renamed it to "Memes and other vandalism trends". I might remove some of those low-hit patterns too. Suffusion of Yellow (talk) 18:57, 18 July 2022 (UTC)
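The per-case tallies in this thread can be reproduced in spirit with a short script. The sketch below is in Python rather than the Rust tool mentioned above; `count_case_hits`, the choice of three patterns and the sample texts are illustrative assumptions, not the actual code or log data. It counts how many texts each individual alternative matches, and passes `re.IGNORECASE` to avoid the case-sensitivity slip described above.

```python
import re
from collections import Counter

def count_case_hits(cases, texts):
    """Count how many texts each individual pattern matches (case-insensitively)."""
    compiled = [(src, re.compile(src, re.IGNORECASE)) for src in cases]
    counts = Counter({src: 0 for src in cases})
    for text in texts:
        for src, rx in compiled:
            if rx.search(text):
                counts[src] += 1
    return counts

# Three alternatives copied from the 614 dump; the texts are made up.
cases = [r"\bbruh+\b", r"dank\s*meme", r"#yolo"]
texts = ["BRUHHH moment", "such a Dank Meme", "nothing to see here"]
hits = count_case_hits(cases, texts)

# Print least-hit cases first, like the dump above.
for src, n in sorted(hits.items(), key=lambda kv: kv[1]):
    print(f"{src!r}: {n}")
```

Keeping zero-count entries in the result is deliberate, since unused cases are exactly what the exercise is trying to find.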
EF AbuseFilter/260, individual matches since last year
No screw ups this time, I promise. I implemented ccnorm for analyzing AbuseFilter/260's matches so it should be fine:
data
|
---|
Case { src: "AUSFAILIA", count: 0 }
Case { src: "YAMAL AO2", count: 0 }
Case { src: "NJOTRANS", count: 0 }
Case { src: "GENERISK", count: 0 }
Case { src: "ION (?:DETOXIFY|CLEANSE)", count: 0 }
Case { src: "DETOX FOOT SPA", count: 0 }
Case { src: "NATION(?:AL)? DEBT RELIEF", count: 0 }
Case { src: "YOUR ANONYMOUS VPN", count: 0 }
Case { src: "LSGLASSES\\.COM", count: 0 }
Case { src: "MEEBO BAR WAS LAUNCHED", count: 0 }
Case { src: "SHAVED MY BALLS", count: 0 }
Case { src: "CORY JUDE", count: 0 }
Case { src: "CLEM WASHINGTON", count: 0 }
Case { src: "THE BROTHER ZEN", count: 0 }
Case { src: "YOLOSWAG", count: 0 }
Case { src: "CENSORS THE TRUTH", count: 0 }
Case { src: "9GSOO7EGS8", count: 0 }
Case { src: "I[- ]?8(OO[- ]?(?:98G[- ]?AS2O|87O[- ]?7AI2)|77[- ]?929[- ]?EE7E)", count: 0 }
Case { src: "DIFF=82(?:SI88EG8|GII9SGO)\\b", count: 0 }
Case { src: "OLDID=82(?:SSE78OE|GII9A82)\\b", count: 0 }
Case { src: "USER:BONADEA_AND_USER:FAVONIAN_ARE_SOCKS", count: 0 }
Case { src: "RICHARD MADENFORT", count: 0 }
Case { src: "REGGIE BRADLEY", count: 0 }
Case { src: "78ESOAIEOI", count: 0 }
Case { src: "DREU[FV]", count: 0 }
Case { src: "DISUSEKID}", count: 0 }
Case { src: "HPHELPNUMBER", count: 0 }
Case { src: "CRUI[SZ]IR", count: 0 }
Case { src: "\\bMY SEMEN", count: 1 }
Case { src: "\\bROYER\\d", count: 1 }
Case { src: "▄▄█▀▀", count: 2 }
Case { src: "BUTTPEE", count: 2 }
Case { src: "OH ?SHI-", count: 3 }
Case { src: "\\bPE[MN]JS", count: 3 }
Case { src: "CAT EYES 2OO7", count: 5 }
Case { src: "HUEHUEHUE", count: 6 }
Case { src: "RICK MARTY", count: 6 }
Case { src: "CRUI?Z[IE]?[DR]+\\b", count: 6 }
Case { src: "TASLIMSON", count: 6 }
Case { src: "HUE HUE HUE", count: 8 }
Case { src: "U MAD BRO", count: 10 }
Case { src: "[FV]UERD", count: 14 }
Case { src: "EA8O7778EAG", count: 16 }
Case { src: "NIGGAH", count: 17 }
Case { src: "\\bASS+[ \\-]HAT\\b", count: 18 }
Case { src: "GNAA\\b", count: 18 }
Case { src: "(?:DICK|COCK) IN (?:THE)? (?:ASS+|BU[TM]+|MOUTH)", count: 19 }
Case { src: "SHREK IS LOVE", count: 45 }
Case { src: "FUCKING KILL", count: 71 }
Case { src: "STUPID BITCH", count: 98 }
Case { src: "WIKIPEDIA (?:BLOW|SUCK)", count: 132 }
Case { src: "I LIKE (?:PIE|CHEESE)", count: 156 }
Case { src: "CLETUS ?BARTHOLOMEW", count: 195 }
Case { src: "\\b(?:ASS+|BU(?:TT+|M+))[- ]?(?:R(?:APE|AEP)|(?:PH|F)UCK)", count: 259 }
Case { src: "\\bLIGMA\\b", count: 1052 }
Case { src: "PORN[- ]?STAR", count: 1744 }
Case { src: "HAHAHAHA", count: 2496 }
Case { src: "\\b(?:MAS+IVE|ENORMO?US|HUMO+NGO?US|HUGE|TINY|SUCK(?:ED|ING|S)?|MY|MAH|HIS) (?:(?:KAW|[CK]O)[CK]+|DICK?|PRICK|BALL*S|A[SZ]{2,})\\b", count: 6510 }
Case { src: "NIGG(?:A(H\\W)?|ER)", count: 8377 }
|
This counts the matches for the last 52 weeks. 0xDeadbeef 14:03, 16 July 2022 (UTC)
- @0xDeadbeef: Thanks! I removed everything with 0 hits, and also the redundant "NIGGAH". Doesn't seem to cause any false negatives in the last 2000 hits (about two months). Suffusion of Yellow (talk) 17:58, 18 July 2022 (UTC)
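For readers wondering why the 260 patterns are uppercase with letters where digits might be expected (for example `2OO7`): the text is normalized before matching. The Python sketch below imitates that idea with a tiny hand-written table. `TOY_EQUIV` and `toy_ccnorm` are hypothetical stand-ins whose mapping is only inferred from patterns in the dump; the real `ccnorm` relies on a much larger confusable-character table, so treat this as an approximation.

```python
import re

# Toy table inferred from patterns in the dump such as "CAT EYES 2OO7";
# an assumption for illustration, not the real ccnorm table.
TOY_EQUIV = str.maketrans({"0": "O", "1": "I", "3": "E", "4": "A", "5": "S", "6": "G"})

def toy_ccnorm(text):
    """Uppercase the text and fold a few look-alike digits into letters."""
    return text.upper().translate(TOY_EQUIV)

normalized = toy_ccnorm("cat eyes 2007")
print(normalized)
print(bool(re.search(r"CAT EYES 2OO7", normalized)))
```

Because the input is normalized first, the patterns themselves have to be written in the normalized alphabet, which is why a literal `0` never appears in them.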
Suggestion
Should there be (or is there already) a filter to tag edits/page creations by new users in the Template, Category and Wikipedia (except AfD and SPI) namespaces? --Minorax«¦talk¦» 11:12, 25 July 2022 (UTC)
- Theoretically this should work:
!("confirmed" in user_groups) &
page_age == 0 &
contains_any(page_namespace, 10, 14, 4) &
!(page_prefixedtitle in "^Wikipedia:(Articles for deletion\/.+|Categories for discussion\/Log\/\d{4} \w{3,9} \d{1,2}|Files for discussion\/\d{4} \w{3,9} \d{1,2}|Miscellany for deletion\/.+|Redirects for discussion\/Log\/\d{4} \w{3,9} \d{1,2}|Templates for discussion\/Log\/\d{4} \w{3,9} \d{1,2})$")
- I don't have access to EFH, so I can't test it, but if an EFM could create a test filter for this, it would be much appreciated. 🐶 EpicPupper (he/him | talk) 03:29, 4 August 2022 (UTC)
- Here's a regex101 test: [1] 🐶 EpicPupper (he/him | talk) 03:36, 4 August 2022 (UTC)
- The middle 2 lines should be:
page_age == 0 & contains_any(page_namespace, 10, 14, 4) &
- I'm not sure if there's a better way to do the title check. It can be done using substr, but that won't be better than regex for accuracy and might not make a significantly beneficial performance difference, if any at all. PhantomTech[talk] 04:36, 4 August 2022 (UTC)
- I forgot to add SPI. It's in there now. 🐶 EpicPupper (he/him | talk) 04:40, 4 August 2022 (UTC)
- And thanks! 🐶 EpicPupper (he/him | talk) 04:40, 4 August 2022 (UTC)
- Err, on second thought, SPI probably shouldn't be included; I'm thinking this should be a create-type filter rather than edit. There would be too many good-faith pages where IPs/non-AC'ed users could edit (the Teahouse comes to mind, but excluding that I think there would be much more still). 🐶 EpicPupper (he/him | talk) 05:40, 4 August 2022 (UTC)
- Summarizing!
!("confirmed" in user_groups) &
page_age == 0 &
contains_any(page_namespace, 4, 10, 12, 14, 710, 828) &
!(page_prefixedtitle in "^Wikipedia:(Articles for deletion\/.+|Categories for discussion\/Log\/\d{4} \w{3,9} \d{1,2}|Files for discussion\/\d{4} \w{3,9} \d{1,2}|Miscellany for deletion\/.+|Redirects for discussion\/Log\/\d{4} \w{3,9} \d{1,2}|Templates for discussion\/Log\/\d{4} \w{3,9} \d{1,2})|Template:Did you know nominations\/.+$")
- The above code checks if non-autoconfirmed users create a new page in the Wikipedia, Template, Help, Category, TimedText or Module namespaces. It excludes XfD and DYK. This filter should likely be a tag filter. TheresNoTime, can I ask you for a favor? 🐶 EpicPupper (he/him | talk) 22:25, 4 August 2022 (UTC)
- Testing at Special:AbuseFilter/1 (previous test promoted to filter) — TheresNoTime (talk • she/her) 22:31, 4 August 2022 (UTC)
- A few comments:
- contains_any(page_namespace, 10) is not the proper way to check for namespaces; for example, it tests true for namespaces 100 and 101. Use equals_to_any(page_namespace, 10)
- if you're checking for "Wikipedia:Articles for deletion/" PLUS "whatever", you might as well check for "Wikipedia:Articles for deletion/" alone; saves some processing time. Same for other strings, in most cases there's no need to be that specific unless you're gonna use the rest of the string for something.
- "in" is not a regex operator, use rlike. When using ^ and $, what's in between needs to go inside a noncapturing group, "^(?: )$"
- Documentation is here.
- Ponor (talk) 00:29, 5 August 2022 (UTC)
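The first point above can be illustrated outside AbuseFilter. The Python functions below are simplified stand-ins written for this example, not the extension's implementation: the assumption is that `contains_any` does a substring test on the string form of its arguments, while `equals_to_any` compares whole values.

```python
def contains_any(haystack, *needles):
    """Stand-in: true if any needle occurs as a substring of the haystack."""
    return any(str(n) in str(haystack) for n in needles)

def equals_to_any(value, *candidates):
    """Stand-in: true only if the value equals one of the candidates exactly."""
    return any(value == c for c in candidates)

# Namespace 10 is Template; 100 and 101 are unrelated namespaces.
for ns in (10, 100, 101, 2):
    print(ns, contains_any(ns, 10), equals_to_any(ns, 10))
```

Under these assumptions the substring version also fires for 100 and 101, which is the false match described in the comment, while the equality version fires only for 10.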
- This is right, here's the modified filter
!("confirmed" in user_groups) & page_age == 0 & equals_to_any(page_namespace, 4, 10, 12, 14, 710, 828) & !(page_prefixedtitle rlike "^(?:Wikipedia:(?:Articles for deletion\/|Categories for discussion\/Log\/\d{4} \w{3,9} \d{1,2}$|Files for discussion\/\d{4} \w{3,9} \d{1,2}$|Miscellany for deletion\/|Redirects for discussion\/Log\/\d{4} \w{3,9} \d{1,2}$|Templates for discussion\/Log\/\d{4} \w{3,9} \d{1,2}$)|Template:Did you know nominations\/)")
PhantomTech[talk] 04:33, 5 August 2022 (UTC)
- Thanks for being speedy, a step ahead of me. @TheresNoTime could you tweak the filter? Sorry, this was my suggestion. 🐶 EpicPupper (he/him | talk) 04:38, 5 August 2022 (UTC)
- Changes made (more fool me for only clicking "check syntax" and not properly reading it, given it was log-only.. will be a bit more careful next time). The debugging tools in AbuseFilter are really useful for testing out things like this. -- TheresNoTime (talk • she/her) 10:23, 5 August 2022 (UTC)
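Two of the fixes applied in this thread, switching from `in` to `rlike` and wrapping the alternatives in `^(?: ... )`, can be checked with any regex engine. A sketch in Python follows. The shortened patterns and the sample titles are made up for illustration, and Python's `in` and `re.search` are used as rough analogues of the filter operators, which is an assumption rather than a statement about AbuseFilter internals.

```python
import re

title = "Wikipedia:Articles for deletion/Example"

# Substring test: asks whether the pattern text literally contains the title.
pattern_text = r"^Wikipedia:(Articles for deletion/.+)$"
substring_result = title in pattern_text
regex_result = bool(re.search(pattern_text, title))
print(substring_result, regex_result)

# Without a group, ^ binds only to the first alternative.
ungrouped = r"^Wikipedia:Articles for deletion/|Template:Did you know nominations/"
grouped = r"^(?:Wikipedia:Articles for deletion/|Template:Did you know nominations/)"
stray = "User:Foo/Template:Did you know nominations/Bar"
ungrouped_result = bool(re.search(ungrouped, stray))
grouped_result = bool(re.search(grouped, stray))
print(ungrouped_result, grouped_result)
```

The substring test never succeeds for a real title, so the exclusion silently did nothing, and the ungrouped pattern matches the second alternative anywhere in the title instead of only at the start.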
- To further simplify: while it doesn't hurt, forward slashes do not need to be escaped (\/). Use "rescape" function in debug console to see that. Ponor (talk) 08:57, 5 August 2022 (UTC)
- While it isn't a special character in regex and doesn't need to be escaped for the filters here, it is the default delimiter at regex101 for PCRE so I think that's why it is escaped. PhantomTech[talk] 09:22, 5 August 2022 (UTC)
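The point about forward slashes is easy to confirm in a regex engine that has no pattern delimiter. The small Python check below is an analogue only: `re.escape` (Python 3.7 or later) plays a role similar to the `rescape` function mentioned above, though the two are not guaranteed to behave identically, and the sample title is made up.

```python
import re

escaped = re.compile(r"Articles for deletion\/")
plain = re.compile(r"Articles for deletion/")
sample = "Wikipedia:Articles for deletion/Example"

# Both spellings match the same text at the same position.
same = escaped.search(sample).span() == plain.search(sample).span()
print(same)

# The escaping helper leaves the slash alone because it is not special.
print(re.escape("a/b"))
```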
Changes to the lead.
My thought is a tag that would be useful in detecting Wikipedia:Lead dos and don'ts violations. The difficult part is deciding where the cutoff should be so that adding hatnotes or making minor grammatical changes is not tagged. So basically just new prose. Richard-of-Earth (talk) 17:04, 2 August 2022 (UTC)
- Which aspects specifically would you like to look into detecting? Length? Parenthetical phrase count? Bolding? I don't know if this stuff is a good fit for an edit filter, but hammering out the details is the next step to discussing this, I think. –Novem Linguae (talk) 19:09, 2 August 2022 (UTC)
- Well my first thought was people who add new information to an article in the lead, but not in the body. So I guess we would look at the size of the addition. This is likely to include a lot of false positives, but might still be useful. Maybe a separate tag for bolding in the lead. Such tags might encourage editors to patrol the leads. This is very much a half-baked idea. Richard-of-Earth (talk) 21:49, 2 August 2022 (UTC)
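One way to make the "size of the addition" idea concrete is to compare the lead before and after an edit. The Python sketch below treats everything before the first `==` heading as the lead. `lead_of`, `lead_growth`, the sample wikitext and the 200-character threshold are hypothetical choices for illustration, and this is not AbuseFilter syntax.

```python
import re

# A section heading is a line of the form "== Title ==".
HEADING = re.compile(r"^==.*==\s*$", re.MULTILINE)

def lead_of(wikitext):
    """Return the text before the first section heading."""
    m = HEADING.search(wikitext)
    return wikitext[:m.start()] if m else wikitext

def lead_growth(old, new):
    """Number of characters the lead grew by (negative if it shrank)."""
    return len(lead_of(new)) - len(lead_of(old))

old = "Foo is a town.\n\n== History ==\nFounded in 1900.\n"
new = "Foo is a town. " + "It hosts an annual fair. " * 10 + "\n\n== History ==\nFounded in 1900.\n"
growth = lead_growth(old, new)
print(growth, growth > 200)
```

A raw character count like this would still flag long hatnotes or template additions, so the false-positive concern raised above applies to it as well.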