Human editors "should not" make large-scale cosmetic edits
I have boldly changed the wording of the policy to read that human editors "should not" make large-scale cosmetic edits. Please feel free to revert. Enterprisey (talk!) 04:03, 7 January 2022 (UTC)
- I've tightened the language, bringing the should together with the bot-like manner. Open to further tweaks. Headbomb {t · c · p · b} 05:34, 7 January 2022 (UTC)
- Looks good, thanks. Enterprisey (talk!) 05:44, 7 January 2022 (UTC)
WP:MEATBOT
What's the proper forum for raising concerns about a user running semi-automated tools at a high rate for an extended period of time? ANI seems a bit harsh, as they are not acting in bad faith, but multiple editors have expressed concern on their talk page, to the point that community consensus/guidance may be wise, even if it is only to approve their conduct and cut down on talk page inquiries. Slywriter (talk) 21:01, 12 January 2022 (UTC)
- Village Pump proposal or policy would get the most exposure. Make a proposal ("Should a bot do this...") linking to the bot, diffs, and what it does. -- GreenC 21:10, 12 January 2022 (UTC)
- Depends on how disruptive it is; if a single user's behavior seems disruptive and direct discussion is failing, ANI is the normal next step. — xaosflux Talk 23:52, 12 January 2022 (UTC)
I have raised a question at User talk:Ser Amantio di Nicolao on this matter. The edits are not disruptive, but they should be performed on an alternate account with the bot flag, to avoid flooding watchlists and to ensure BAG approval. I will wait to see the response — Martin (MSGJ · talk) 03:57, 24 February 2022 (UTC)
Is MEATBOT not relevant any more?
I'm closing this, especially in relationship to the forum-shopping element. To summarize, the main points of MEATBOT are relevant, including the general call-out that "...merely editing quickly, particularly for a short time, is not by itself disruptive." MEATBOT is primarily about avoiding disruptive editing, and while fast human editing can be disruptive due to flooding or other reasons, a consensus that such a batch of edits is not disruptive moots the bot policy violation argument. Forums such as ANI are already equipped to make determinations if there is actionable disruption occurring - follow-up may continue there. — xaosflux Talk 18:40, 21 April 2022 (UTC)
- The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
An editor has been running a big AWB task for about a week now, with over five thousand edits each day. This has led to some issues with people's watchlists, and even ended up at ANI, where the consensus was that they had done nothing wrong and that the watchlist flooding was a price worth paying.
But isn't that precisely what the bot flag is there for? I had earlier asked that editor to go through WP:BRFA for this task, but their response was that they weren't going to because the process was "dysfunctional".
Is this really so? Is bot approval only optional nowadays, or are there some limitations to the applicability of WP:MEATBOT in circumstances like that? – Uanfala (talk) 12:19, 21 April 2022 (UTC)
- @Uanfala: In general someone flooding recent changes and watchlists may be Wikipedia:Disruptive editing; if a compromise can't be resolved via discussion, you can list them at WP:ANI for administrator review. — xaosflux Talk 13:43, 21 April 2022 (UTC)
- "consensus was that they had done nothing wrong and that the watchlist flooding was a price worth paying." This is the part that matters. If there is consensus that it's a productive use of AWB, it's all gravy. Headbomb {t · c · p · b} 14:08, 21 April 2022 (UTC)
- So if someone starts doing an AWB task affecting, say, 70,000 articles, they're not required to get the bot flag? – Uanfala (talk) 14:18, 21 April 2022 (UTC)
- Like most things, "it depends". If they are doing something like replacing all "xyz's" with "xyz'z" - it probably should be a bot task; if they are doing something like inserting article-specific variable data, it probably shouldn't be flagged bot. While "speed" is an important factor for what should be flagged as bot, if the edit should be subject to reduced editor review is more important (bot flagged edits should never be used to add controversial "facts" to articles). — xaosflux Talk 14:33, 21 April 2022 (UTC)
- I think the only way we're going to avoid constant fighting about this is a bright line on when a BRFA is needed. Something like: (Semi-)automated tasks being run as more than a limited-scope one-off (so excluding things like massrollback) need to comply with WP:BOTPERF's limit of 1 edit per 10 seconds (allowed the same moderate amount of flexibility that bots are in that regard). Tasks anticipated to affect more than 2,500 pages per day (about a workday's worth of edits at 6 edits per minute) require BRFA. -- Tamzin[cetacean needed] (she/they) 14:33, 21 April 2022 (UTC)
- I don't see "constant fighting". I see one user who is not dropping a stick. Levivich 17:27, 21 April 2022 (UTC)
- @Levivich: ... if you look at just this one conflict rather than the many others that have popped up at ANI over the years, often pitting our most productive editors against each other, then I suppose you're right. Given how much time you spend at ANI, I'm surprised you hadn't noticed this trend. I'm not taking Uanfala's side here in the dispute with BHG. I'm acknowledging a reality: we as a community keep fighting over this. -- Tamzin[cetacean needed] (she/they) 18:41, 21 April 2022 (UTC)
- Procedural comment: Please note that this discussion is part of a wider campaign against me by @Uanfala, which now extends to a total of 4 venues: my user-talk, ANI, this page, and WT:DATESPROJ#Should_all_articles_have_a_dmy/mdy_template?. I have raised the issue at WP:ANI#Sneaky_forum-shopping_by_Uanafla. --BrownHairedGirl (talk) • (contribs) 17:42, 21 April 2022 (UTC)
- User:Uanfala *really* needs to drop this. Malcolmxl5 (talk) 17:46, 21 April 2022 (UTC)
Regarding WP:BOTPERF
A few comments about WP:BOTPERF. I understand that recently Wbm has run a script purging a couple million pages on enwiki (per phab:T157670); a couple of sysadmins were in the discussion and did not seem to object. Reviewing BOTPERF, I notice it (correctly) says performance is an issue for the sysadmins, not the community, and that some of its notes (like the one on editing speed) are disregarded by many large bots, as is the guidance on times of day.
I notice there's the sysadmins' policy at wikitech:Robot policy, and I would've suggested just linking to that and not imposing anything further locally (at least not for performance reasons), but then I noticed that page has also been tagged as outdated, with a note that some things there may no longer apply and that it was written in 2009. Tbh I'd suspect, given the scale of Wikipedia in 2022 and the fact that it also gets a lot of random requests from unaffiliated bots, that the infrastructure can handle high request rates and that imposing these requirements on our project's bots isn't awfully worth it. Basically wondering if its provisions are still relevant today? ProcrastinatingReader (talk) 23:16, 10 May 2022 (UTC)
- I don't think it should be scrapped, and the first two bullet points for example are very relevant to non-system performance factors. Some of it can likely be updated. — xaosflux Talk 23:23, 10 May 2022 (UTC)
- The first two seem like community restrictions/considerations rather than performance ones. The first should probably be mentioned in the paragraph about bot trials, and the second maybe in WP:BOTFLAG. ProcrastinatingReader (talk) 23:32, 10 May 2022 (UTC)
- I don't think anyone does the whole "slowly during peak hours" thing (correct me if I'm wrong though), and I don't think it's that relevant. Using maxlag is definitely the better way to do it if the bot's edits really need to be slowed, so I think that part can be scrapped.
- I think edit rate limits make sense, but I guess not for performance reasons. They give people time to spot errors, see that a bot task is happening and object if needed, and avoid spamming watchlists. Editing every 10 seconds is over 8,000 edits a day, which is a pretty decent rate. Not many tasks need to be faster than that. I think the current limits are pretty decent defaults - they could be increased if current practice is more than that (I think many bots do 10 edits/minute?), but I don't think they should be removed - people shouldn't be doing 30-40 edits/minute unnecessarily.
- Maybe rename the section to "Edit rate"/"Run rate", if that's the concern. Galobtter (pingó mió) 23:54, 10 May 2022 (UTC)
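The arithmetic behind these edit-rate numbers can be sketched in code. This is purely an illustration, not a mechanism from AWB or any bot framework; the class and function names are made up:

```python
import time

def min_interval(edits_per_minute: float) -> float:
    """Seconds a bot should wait between edits to stay at the given rate."""
    return 60.0 / edits_per_minute

class Throttle:
    """Simple client-side rate limiter: call wait() before each edit."""

    def __init__(self, edits_per_minute: float = 10.0):
        self.interval = min_interval(edits_per_minute)
        self._last = float("-inf")  # first wait() never sleeps

    def wait(self) -> None:
        now = time.monotonic()
        remaining = self.interval - (now - self._last)
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()
```

At 10 edits/minute the interval is 6 seconds; at 1 edit per 10 seconds, a full day of nonstop editing is 8,640 edits.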
- I agree that the section should be updated. The only time I have seen the website getting disrupted by a bot is wikitech:Incidents/2021-07-26 ruwikinews DynamicPageList. That was because of a bot importing ~100k pages per day and even then it was resolved in 30 minutes. If it needs such a high load to cause issues, I don't see how advising to slow down during peak hours is still relevant. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 04:35, 11 May 2022 (UTC)
- Skimming that report, it seems it was primarily due to the DynamicPageList extension (since disabled on ruwikinews). Without that extension, which enwiki does not have per Special:Version, it seems the servers could've handled even that load, and sysadmins didn't seem to require shutting down the bot.
- To Galobtter, those two primarily, but also the restriction on fetching pages ("requesting many individual pages are not permitted") doesn't seem too relevant today. The others also seem to exist for non-performance reasons (namely, for community harmony), so should be described as such, I think. It doesn't seem like sysadmins actually require anything performance-related from bots (otherwise I think it's just the User-Agent stuff they ask for)? Would be best to get sysadmin clarification though, as I'm just guessing. ProcrastinatingReader (talk) 11:18, 12 May 2022 (UTC)
- With my sysadmin hat on, the advice about running at different speeds for peak hours vs. quiet time is super outdated and can go. In general the requirements for bots are: 1) follow the user-agent policy, 2) run with maxlag=5, 3) Action API requests should be made in series, not in parallel. The exception for #2 is for bots that must continue to edit as long as humans are editing, like anti-vandalism bots or my TFA Protector Bot. If you're doing something weird or unusual (like the purging mentioned earlier), it's always nice to flag it for sysadmins ahead of time in case there are concerns. The problem that happened on ruwikinews was that after an action had been identified as problematic (mass creation of articles used in DPL queries), the operator had been asked to tell sysadmins again if they did it in future - they didn't, which unsurprisingly led to the same result of a sitewide outage.
- Edit speeds are a bit harder to give advice on, my personal opinion (not backed by science or numbers) is that running at ~10 edits/minute (6 seconds between edits rather than the recommended 10) is usually good enough, and if a bulk task takes a week or a month, so be it. If some bots started editing faster like 15epm or 30epm, no one would notice. If every bot started editing faster, I don't know what would happen. Legoktm (talk) 22:12, 12 May 2022 (UTC)
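The three requirements listed above (descriptive User-Agent, maxlag on every request, serial requests) can be sketched roughly as follows. The bot name, contact details, and helper names here are invented for illustration, not from any real framework:

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

# Per the User-Agent policy: identify the bot and how to reach its operator.
# (Bot name and contact address are placeholders.)
USER_AGENT = "ExampleBot/1.0 (https://en.wikipedia.org/wiki/User:ExampleBot; ops@example.org)"

def api_params(extra: dict, maxlag: int = 5) -> dict:
    """Base parameters for every Action API call, including maxlag=5."""
    params = {"format": "json", "maxlag": maxlag}
    params.update(extra)
    return params

def query(extra: dict) -> dict:
    """One Action API request. Calls block, so requests are made in series."""
    url = API + "?" + urllib.parse.urlencode(api_params(extra))
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```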
- I think it would be sensible to have some way to run a bot at significantly higher speeds than currently recommended. For example, when I was doing my short description additions, after thousands of checked edits it would have been very convenient to ramp up the speed a few times so the 150-250k edits needed didn't take weeks. There are, however, significant benefits to starting out slowly, and I think that should continue to be the norm, even if it isn't for performance reasons but rather to help with finding and resolving potential issues. It is a lot easier to deal with 5 bad edits than 500. --Trialpears (talk) 12:04, 12 May 2022 (UTC)
For some context, some past incidents I remembered off the top of my head where sysadmins intervened: the infamous status bots; a single bot that was constantly null editing, so much so that it amounted to 30% of all edits across all Wikimedia projects; a bot that was sending harmless yet bogus requests that were interfering with dashboards that tracked 4xx and 5xx errors; and a bot that null edited every page on Wikipedia, exposing a MediaWiki bug, which combined with some new Parsoid stuff being deployed caused a huge buildup in the job queue. Bots can definitely cause trouble, but it's usually the exception because bot ops are decent about self-regulating. Legoktm (talk) 01:45, 19 May 2022 (UTC)
Proposal
Based on the above discussion as well as an informal discussion with some other sysadmins, I propose replacing the entire WP:BOTPERF section with the following:
While editors generally should not worry about performance, bot operators should recognize that a bot making many requests, editing at a high speed or creating many pages can have a much greater impact and cause genuine problems. Bots are exempted from MediaWiki's normal rate limits and given higher API limits to enable more opportunities at the cost of more responsibility. System administrators expect communities to self-regulate bots but will inform users if issues do arise, and in such situations, their directives must be followed.
- Bots must follow all applicable global policies, including the User-Agent policy and API etiquette guideline.
- Bots must use maxlag=5 (see documentation) unless they are e.g. an anti-vandalism bot that should keep editing as long as humans are.
- Bots should use a speed of about 10 actions/edits per minute. There may be other non-performance related reasons to run at slower speeds, such as clogging up watchlists or potential bugs that require mass-reversion of edits.
Bots should always strive to use the most efficient method to operate. When dealing with bulk data, see m:Research:Data and wikitech:Portal:Data Services for different options.
If you are doing something unusual or different and are unsure about the impact you might have, please feel free to ask system administrators for assistance or guidance.
Let me know what you think. Legoktm (talk) 01:45, 19 May 2022 (UTC)
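For what honoring maxlag looks like in practice, here is a hedged sketch (the helper names are invented): when the servers are lagged, the Action API returns an error with code "maxlag", and the client is expected to back off and retry rather than keep hammering:

```python
import time

def is_lagged(response_json: dict) -> bool:
    """True if the Action API rejected the request because of replication lag."""
    return response_json.get("error", {}).get("code") == "maxlag"

def call_with_backoff(do_request, retries: int = 5, delay: float = 5.0) -> dict:
    """Run do_request() (which returns parsed API JSON), retrying on maxlag errors."""
    for _ in range(retries):
        result = do_request()
        if not is_lagged(result):
            return result
        time.sleep(delay)  # back off before retrying
    raise RuntimeError("API still lagged after %d attempts" % retries)
```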
- I find that too big a deviation from the current guideline in subtle but significant ways (e.g. general considerations, trial bots, unflagged bots, etc...). I'd rather keep the current section and tweak the one or two bullets that need adjusting, rather than TNT the section. Headbomb {t · c · p · b} 02:40, 19 May 2022 (UTC)
- @Headbomb: It was pointed out above that the parts about trial and unflagged bots aren't relevant to performance, they're for "community harmony" (as @ProcrastinatingReader put it). Probably the trial part can go into WP:BOTAPPROVAL and I don't know where unflagged bots should be discussed.
- Which part are you referring to as "general considerations"? If you mean the intro paragraph, the substantive changes (as I intended them) are 1) calling out rapid page creation as a potentially problematic activity 2) explain what special permissions bots have that can lead to perf issues 3) state that sysadmins expect communities to self-regulate. I think all of those are justified changes. Legoktm (talk) 05:00, 19 May 2022 (UTC)
- Why it's needed (greater effect/potential to be disruptive), urgency of the task is a factor, the downloading of pages, not making unnecessary requests, making use of dumps, etc...
- I find little that needs to be tweaked outside of one or two bullets, which is basically that the guidance on when to make bot edits is superseded by the modern maxlag standards. Headbomb {t · c · p · b} 07:01, 19 May 2022 (UTC)
- Don't the first two sentences, which mention "greater impact" and discuss specific permissions, cover the "why it's needed"? Based on yours and Anomie's feedback, I'll add a "community performance" section as well that covers stuff like urgency of tasks, flagged vs unflagged.
- Re: "the downloading of pages, not making unnecessary requests, making use of dumps, etc.", I don't think it really makes sense to call these out anymore. It's probably more efficient to use insource/regex searches than to scan an entire dump. And we have APIs like Restbase that are intended to provide bulk content access. Are there use cases that I'm overlooking? But they're still indirectly mentioned anyways; both the Meta and Wikitech links prominently feature dumps as an option. And "use the most efficient method to operate" is the positive phrasing of not making unnecessary requests. Legoktm (talk) 05:31, 20 May 2022 (UTC)
- I personally disagree with 10epm; AWB basically caps bot edits at 20epm these days and that's what I've been using since day 1 (when the general guidance was "max 30epm"). I don't see why we're cutting that by a third. Primefac (talk) 07:20, 19 May 2022 (UTC)
- @Primefac: sorry, where does it say 30epm? For urgent stuff, it currently has "once every five seconds" aka 12epm. Legoktm (talk) 05:24, 20 May 2022 (UTC)
- PrimeBOT 7's task gave explicit approval, it was discussed briefly in this discussion (and no one contested my statement "recommended 20epm"), 15epm was approved here (though 20 was discussed), and there are a handful of others. I will admit that most of these are either for short-term edits or "we must fix this immediately" mass edits, and there are plenty of discussions where 30epm (or even 8epm) is listed as "too high", but we either need to start enforcing these speed limits (which we apparently, currently, don't) or have them match the standard usage; I can't be the only botop running in the 15-20epm range on a regular basis. Primefac (talk) 07:49, 20 May 2022 (UTC)
- @Primefac: sorry, where does it say 30epm? For urgent stuff, it currently has "once every five seconds" aka 12epm. Legoktm (talk) 05:24, 20 May 2022 (UTC)
- Seems ok to me as far as it goes, but finding a home for the community-related performance recommendations (versus systems performance, which this is now focused on) would be good. As it is that's only hinted at now. Anomie⚔ 12:23, 19 May 2022 (UTC)