dial-up? older browsers?
Please tell me you're kidding when you, in the year 2023, talk about "The text on a 32 kB page takes about five seconds to load for editing on a dial-up connection". How can it be that this sentence has been overlooked for what must be decades since it was relevant?
Can we please agree that 32 kB is nothing. Maybe if you were concerned about a 32 GB page I could see the point.
TL;DR: This isn't 1991 when 9600 baud modems were the newest and shiniest. I certainly didn't do any math, but I can absolutely believe loading your example page would take five seconds on that thing. Even in the poorest and most remote areas of the world people no longer connect using 8-bit Ataris. Thank you. CapnZapp (talk) 07:33, 7 March 2023 (UTC)
- While broadband is certainly more popular than it was in 2001, we should keep in mind that not everyone has access to the blazing fast connections of today. Hell, I'm still technically on modem because they didn't route fibre to my area. From what I gather, the point of the limit is to allow English editors outside of urban areas to edit with some degree of efficiency. No one wants to wait 2 hours to load an article. - MountainKemono (talk) 09:59, 16 May 2023 (UTC)
- A bigger consideration is computer processing speeds, especially when running multiple programs and web pages, on desktop and mobile devices. Onetwothreeip (talk) 22:41, 21 May 2023 (UTC)
Regards, CapnZapp (talk) 07:33, 7 March 2023 (UTC)
- Over a quarter of a million Americans still use dial-up internet. [2] Hawkeye7 (discuss) 03:22, 11 April 2023 (UTC)
- And 22% have no internet at all. In consideration of them, let's limit articles to zero bytes. EEng 03:53, 11 April 2023 (UTC)
- The bit about internet connection speed is completely irrelevant when it comes to text. The amount of data involved is completely overshadowed by images and other graphical elements, for example. It's simply not a relevant enough factor when discussing prose size. Peter Isotalo 09:16, 6 July 2023 (UTC)
- I agree that the technical issues section is obsolete at this point. It's also contributing to the ridiculous proliferation of articles based on every facet of a topic. I propose removing it entirely. Riposte97 (talk) 23:03, 19 October 2023 (UTC)
- Inclined to concur with wholesale removal. In particular, kb size of pages is more influenced by the number/size of images they have than by the "article size" per se. So it's not in practice within the remit of this policy page. Jo-Jo Eumerus (talk) 10:27, 20 October 2023 (UTC)
- Good point. Images are far heavier on page size than simple text. Eyeballing it alone, this thread would be close to 2K. If anything, we should have a policy more directed towards images and their raw file size. 256K is about the upper limit for what web devs use in regard to image size; what's stopping us from doing the same? - MountainKemono (talk) 10:41, 20 October 2023 (UTC)
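Since the opening comment notes that nobody did the math on the "five seconds" figure, here is a rough back-of-the-envelope sketch. It assumes that "32 kB" means 32 × 1024 bytes of uncompressed text and that the line delivers its full nominal rate with no protocol overhead, latency, or modem compression; the function name and the selection of line rates are illustrative choices, not anything taken from the guideline.

```python
def transfer_seconds(size_bytes: int, line_rate_bps: int) -> float:
    """Idealised transfer time: payload bits divided by the nominal line rate."""
    return size_bytes * 8 / line_rate_bps


# Assumption: "32 kB" is read as 32 KiB of uncompressed text.
PAGE_BYTES = 32 * 1024

# Nominal modem line rates in bits per second (illustrative selection).
LINE_RATES_BPS = {
    "9600 bit/s": 9_600,
    "28.8 kbit/s": 28_800,
    "56 kbit/s": 56_000,
}

for label, rate in LINE_RATES_BPS.items():
    print(f"{label}: {transfer_seconds(PAGE_BYTES, rate):.1f} s")
```

Under these assumptions the "about five seconds" figure matches a nominal 56 kbit/s connection (roughly 4.7 s), while a 9600 bit/s modem would need closer to 27 s. Real-world times would differ because of overhead and compression.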
How do we measure "readability"?
This guideline has for almost two decades assumed just one type of external consumer: a reader with the goal of making it through an entire article, preferably in one sitting, regardless of size, topic and context. Any other way of consuming an article is not even hinted at. We present no evidence whatsoever that an article of, say, 15000 words is actually a bigger problem for most readers compared to an article of 6000 words.
Assuming that our highest priority is to provide the best possible content for readers, how do we know what readers really want and are best served by? Why is reading an article from start to finish in a single sitting the only reader-focused ideal we strive for? Peter Isotalo 09:07, 4 July 2023 (UTC)
- I fully agree that readers typically spend less time on an article than fully reading it. I think we need to rephrase the readability section to make that clear.
- Most readers now access the site via mobile and read one or two sections. With the loss of search functionality compared to a PC reader experience, this limits the effectiveness of finding data in long articles, and a long article may be cumbersome to wade through. So the arguments have changed since this was first written.
- Unfortunately, I do not think research exists to justify a longer or shorter article length. —Femke 🐦 (talk) 16:14, 4 July 2023 (UTC)
- At least some does – see Wikipedia talk:Manual of Style/Linking#DL, sections, and_mobile readers, which changed MOS:DL in response to such data. My understanding is that it rather indicates that length is much less important than good sectionalization and clear headings, because people jump around a lot to get at the specifics they've come for. But this is not a topic I've "wallowed" in, so maybe there's more and different usability data to be had. — SMcCandlish ☏ ¢ 😼 12:18, 27 July 2023 (UTC)
Is there verifiable scientific basis for the article length guidelines?
Why isn't there any sourcing on the article size guidelines? This seems very much like a "trust me, bro" situation. And when I look at the discussions above to try to gain insight, things spin rapidly into secondary topics like "well it isn't the size per se, but about bandwidth, or editorial issues, or WP:UNDUE," or whatever. I realize that Wikipedia is built on a delicate balance of community consensus, but based on the discussions here, this guideline seems one of the most arbitrary; and therefore, one of the least respectable. Orange Suede Sofa (talk) 04:06, 6 July 2023 (UTC)
- Which part of our various manual of style pages do you feel to be the most scientifically grounded? CMD (talk) 05:08, 6 July 2023 (UTC)
- CMD, when you look at the MOS overall, it's just an arbitrary standard of how to format text that we all find consensus on. No one would demand of us that we "prove" that this or that spelling or punctuation is superior to another. It's primarily a matter of having a standard to avoid pointless disputes.
- The guidance on article size is a completely different beast. It makes claims about "readability" and that's a real-world issue that isn't just an arbitrary standard. Peter Isotalo 08:34, 6 July 2023 (UTC)
- I beg to differ with this comment as correctly formally needs to be adhered to, otherwise what is the point of an encyclopedia? Furthermore, there is only one way! I am but a mere man and stand to be corrected! Michaelcockrell7 (talk) 06:02, 12 November 2023 (UTC)
- This guideline is a standard to help avoid pointless disputes. CMD (talk) 10:05, 6 July 2023 (UTC)
- Why is the standard argued solely from the perspective of a reader who wants to read through one particular article in a single sitting? Peter Isotalo 12:09, 6 July 2023 (UTC)
- What would you wish to base the standard on? CMD (talk) 13:03, 6 July 2023 (UTC)
- It isn't (that's not even true). But now that you mention it, wouldn't it be nice if someone expanded the other sections to include all of the logic discussed here over the years, like the maintenance burden, and the typically poor and redundant and off-topic prose found in many(most?) excessively long articles. SandyGeorgia (Talk) 13:54, 6 July 2023 (UTC)
- I think Peter Isotalo said it more succinctly than I did, which is that the guideline makes a claim to something that is concretely measurable, yet there is nothing here to back that claim up. I can easily see the difference between this and community consensus on things like serial commas or how we treat numerals. Right now the guideline is clearly a WP:COATRACK for many other topics, which I feel doesn't address any underlying issues. If an article has off-topic content, why isn't that specifically addressed instead of just waving an arbitrary size limit around? If I come across an article with redundant information, what happens if I cut out all the good stuff and the article goes below the size limit? Is the article now fixed, or does it have some other problem that the size guideline covered up? Orange Suede Sofa (talk) 19:03, 6 July 2023 (UTC)
- Sandy, my impression from discussions about the size of specific articles is basically what Sofa pointed out: people will get hung up on counting kB or words at the expense of any other consideration. Even the 60 kB limit appears to have drifted towards being some sort of hard rule. I feel that people will often say that something's "too long to read" with this guideline as the only argument. Unless it's backed up by solid evidence, that will not improve articles per se.
- CMD, I would like to focus on the reader experience to start with, not all the other factors. What evidence can we find of any type of relevant reading behavior? Has there been discussion about this before? How can we find something more concrete than just guesswork and anecdotal evidence? Peter Isotalo 22:42, 6 July 2023 (UTC)
- If I recall correctly originally it was based on academic submission size......Oxford University Press. Moxy- 22:48, 6 July 2023 (UTC)
- If that's where the guideline comes from, then it's even more problematic than I thought. That's a guideline from a publisher of paper journals that makes no reference to readability; they're going to need a size limit for their own, and very different, practical reasons. Orange Suede Sofa (talk) 22:55, 6 July 2023 (UTC)
- When Wikipedia started, the vast majority of us were academics, so we simply followed academic norms. That said, many, many studies have been done about the 10,000 word count for readability and reader retention. I assume everyone has the capability of searching this. Moxy- 23:01, 6 July 2023 (UTC)
- I'm enough of an academic myself to know that academic norms vary; for example, Elsevier does not have strict word count limits, at least not for their engineering journals. And as for searching for the many studies, do we not observe WP:BURDEN here? Orange Suede Sofa (talk) 23:25, 6 July 2023 (UTC)
- We don't, this is a guideline. In general, we know that people don't read things past a certain length, but obviously it's not something which has a clear scientifically defined golden number of words people will read. So we have a guideline, same as any other style guideline, (and which is not strict,) which helps us as editors. CMD (talk) 00:25, 7 July 2023 (UTC)
- Since everyone is throwing around baseless claims about how they think people read articles, I'm going to join in and claim that nobody actually reads a long article from start to finish, and that all the editors defending this standard have various arguments that are neither cohesive nor based in any actual data. The even more disturbing thing is that there are participants here who I know from my fifteen years here have consistently whined about why even change anything because the WMF will just overrule it, yet here they are with equivalent ancien régime arguments that are nothing more than appeals to authority. The most disappointing thing for me, personally, is that I have respected 99% of Wikipedia's consensus policies to the point where I have taught them in public to others as respectable examples of how communities come together to arrive at a common good, but nobody here is able to agree on a common defense. The responses here are like a clown car; everyone has a different rationalization for something that isn't justifiable. I won't be tendentious and argue about it any more, but now I'm going to use this as a counter-example of how even a long-standing and ultimately productive community like Wikipedia can find itself completely up in itself. Orange Suede Sofa (talk) 06:27, 7 July 2023 (UTC)
- This page isn't (typically) used to address truly off-topic content, but rather level of detail. Keep in mind that some of the topics covered by articles here have literally millions of words written about them, and we need some way of identifying a good middle ground between that and a one-sentence stub in order to make an article that is reasonably useful. This also isn't the only page giving the message of "be concise" - cf WP:DETAIL.
- So with that in mind, what makes sense for drawing that line? WP:CANYOUREADTHIS proposes attention span times reading speed as the basis for that determination. It gives an average attention span of 30 to 40 minutes, cited to a 2005 reference - many more recent sources actually suggest smaller numbers, such as 15 or 20 minutes[3][4][5][6]. After that point, information processing is impeded[7] and information recall suffers[8]; cognitive fatigue[9] and mind wandering[10] both impact reading comprehension. Average reading speed meanwhile is roughly 238 words per minute[11], although there are some assumptions built into that estimation (education level, neurotypicality) that might support a lower number for accessibility purposes. So that calculation suggests a reasonable maximum around 9500 words of readable prose - lower than the current limits at TOOBIG.
- Other than readability, you could also consider, as Sandy mentioned, maintenance burden - but if anything that's likely to promote even shorter limits. Nikkimaria (talk) 04:55, 7 July 2023 (UTC)
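The attention-span-times-reading-speed estimate above can be reproduced with a short sketch. The 238 words per minute and the 15, 20, 30 and 40 minute spans are the figures quoted in the comment; the function name is a hypothetical one chosen for illustration, and the model is nothing more than a plain product of the two numbers.

```python
# Average reading speed quoted in the comment above (words per minute).
WORDS_PER_MINUTE = 238


def max_readable_words(attention_minutes: float, wpm: float = WORDS_PER_MINUTE) -> int:
    """Words a reader gets through before attention lapses, as minutes times speed."""
    return round(attention_minutes * wpm)


# Attention spans mentioned in the discussion, in minutes.
for minutes in (15, 20, 30, 40):
    print(f"{minutes} min -> {max_readable_words(minutes)} words")
```

With the 40-minute upper bound this gives 9,520 words, which is where the "around 9500 words" figure comes from; the shorter spans from the more recent sources would give between 3,570 and 4,760 words.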
- I deeply appreciate that you have been the first editor to make the effort to provide relevant data, and I have enormous amounts of respect for that. Yet, I predict that your analysis of "lower than the current limits at TOOBIG" will not result in an actual lowering of the guideline, indicating that the guideline itself is a WP:COATRACK of other issues, and shouldn't pretend to be based in data. Orange Suede Sofa (talk) 06:39, 7 July 2023 (UTC)
- Nikkimaria, from what I can tell these links about attention span seem to all be about attention span while listening to a continuous spoken lecture.
- I would certainly agree that if you gathered a room full of ordinary adults and had someone at the front read a long Wikipedia article straight through in an unexciting manner, there would very likely be a significant proportion of the audience who were unable or uninterested in paying attention, pretty close to immediately, and if you kept the reading going for an hour almost none of the audience would catch every part.
- However, that doesn't really seem like the most relevant criterion for deciding what to include or not include in an encyclopedia article. –jacobolus (t) 12:39, 26 October 2023 (UTC)
- Nikkimaria, we absolutely need some form of upper limit on article size for technical or editorial reasons, but we need to argue those things separately from the reader experience. The concept of "readability" is here entirely built on the idea that unless an article is read in full, from start to finish, it does not live up to quality standards. There's no evidence that this has anything to do with what's good for readers. It's the underlying assumption on what "readability" is that needs to be addressed here.
- Regarding this issue, I emailed the WMF research list the other day and asked about research relevant to article length. I just received a very friendly reply with a "non-exhaustive" list of relevant research (everything in italics below):
- Improving Website Hyperlink Structure Using Server Logs[12]: Figure 3 shows that probability for a reader to click a link decreases substantially the later they appear in the text
- Research:Reading time[13]: characterizes how much time readers spent on a page reading an article. also interesting are the related projects under See also: i) impact of having section collapsed or not[14], as well as which parts of articles are read[15]
- A Large-Scale Characterization of How Readers Browse Wikipedia[16]: Figure 11c shows that readers have a much higher chance to stop navigating when encountering an article with low quality. this is related to the length in the sense that length is often used as one proxy to assess quality.
- A large scale study of reader interactions with images on Wikipedia[17]: explores how images in articles help readers to navigate.
- Ongoing development of models to measure readability of Wikipedia articles across languages [18]
- This is the kind of research we need to be looking into. And in the meantime, I propose we remove the parts of the guideline that purport to be based on research.
- Peter Isotalo 06:55, 7 July 2023 (UTC)
- God yes. Let's hope that at long last sanity finally prevails. EEng 07:13, 7 July 2023 (UTC)
- If you agree that we need some kind of upper limit, but don't agree on the reader-experience-based metrics, what specifically would you propose as the upper limit based on technical/editorial reasons? Nikkimaria (talk) 01:00, 8 July 2023 (UTC)
- There seems to be a point at which browsers struggle and lag to render the page (e.g. when going from reading to edit mode, when clicking preview, when saving the changes) making it kind of frustrating. I have a honkin'-fast machine with more RAM than god, and I've still been hit with this issue on really long articles. But I have no idea how to concretize it into a number, and I don't think there's any kind of research that could be cited. It's likely to vary widely by browser and by machine capabilities. So it seems to be a legit issue but one hard to reduce to a "this is too big" specific number. — SMcCandlish ☏ ¢ 😼 14:58, 26 October 2023 (UTC)
- I figure the WMF might have information on when bytesize or display size becomes too much. Granted, as noted repeatedly bytesize is more about the number of images in an article than about its prose contents. Jo-Jo Eumerus (talk) 17:29, 26 October 2023 (UTC)
- I have a strong suspicion that it has more to do with code complexity – how much render parsing the browser has to do (elements and style applied to them). On Facebook, I can load screen after screen after screen of images and videos, while the article I'm thinking of where I had this page loading delay recently doesn't have any more images on it than a few screenfuls of Facebook. But it has much more text with complex markup in it. The total byte-size of all that text might be less than that of a single image, but the browser has to do much more work to interpret and styled-display it. — SMcCandlish ☏ ¢ 😼 18:29, 26 October 2023 (UTC)
- Not at all. The speed of modern processors is very great, and the CPU time will not be noticeable. Download time is what is important. A Wikipedia article will load slowly on first download; afterwards it will be cached on the server and the network. A popular article will therefore load quickly, because it will be cached. See performance tuning for details. Hawkeye7 (discuss) 19:30, 26 October 2023 (UTC)
- The quantity of text should really not be any particular problem on the user browser side. The part that makes pages require more bandwidth is mostly images, and the part that is slowest to render for the server is going to be stuff like templates invoking scripts, fancy mediawiki features, footnotes, etc. In practice I've only ever had a problem with large numbers of math formulas, and only during specific times when the backend math renderer was having some kind of bug causing unexpected slowdowns, which seems to have been resolved. –jacobolus (t) 20:10, 26 October 2023 (UTC)
I appreciate that you ask, but I don't want to pull a figure out of thin air. I don't see that there's any more solid data than what's in the table right now. Plus, technical and editorial are two completely different things.
I'd like to focus first on getting rid of unsupported claims about what's best for readers. Can we start looking at options of how to clean up what's currently under "Readability"? I started tinkering on new wording on my own here for example. Should I make a concrete suggestion here on the talkpage?
Peter Isotalo 09:29, 8 July 2023 (UTC)
- Given that we agree there should be limits, I would not support getting rid of limits entirely unless/until we have a proposed replacement. Nikkimaria (talk) 12:59, 8 July 2023 (UTC)
- Re
Sandy, my impression from discussions about the size of specific articles is basically what Sofa pointed out: people will get hung up on counting kB or words at the expense of any other consideration. Even the 60 kB limit appears to have drifted towards being some sort of hard rule. I feel that people will often say that something's "too long to read" with this guideline as the only argument. Unless it's backed up by solid evidence, that will not improve articles per se.
Then I suggest you need to read more discussions, and that your reading has been selective. I don't support removing any text here, rather expanding it to include all the other reasons that have come up in all the other discussions over the years. First, I've never seen anyone in a real discussion refer to KB; readable prose is the relevant metric. Second, I've never encountered an FA well over these limits that did not suffer from excess detail, verbosity, redundancy, and off-topic matter that could not be better included in a different article and summarized back to the main article. And every time I find that, I provide concrete example after example. It is not "counting words at the expense of any other consideration" (you can find same in an 8,000-word article); it is cutting words that were excess verbosity impeding readability to begin with. It's not just "too long to read"; it's more about "too boring to read as it doesn't come to the point and instead provides unencyclopedic trivia". Some editors believe that to meet comprehensive (WP:WIAFA) they must include every trivial fact ever written on the topic, with no other discretion applied. We aren't writing journal articles; we're writing encyclopedic entries. If I ever see an FA that passes the recommended size considerably, and isn't overly detailed and excessively verbose, I'll support it at FAC or FAR. To date, I haven't. SIZE helps prevent poor writing, as well as helping assure that articles are maintainable and encyclopedic. SandyGeorgia (Talk) 13:23, 8 July 2023 (UTC)
- PS, I believe it was Femke who tried to make the chart make more sense by removing the useless KB metric, and I think that got stalled. SandyGeorgia (Talk) 13:29, 8 July 2023 (UTC)
- I'm sorry, what are you commenting on here, Sandy? I haven't proposed removing the current size limit. Peter Isotalo 14:26, 8 July 2023 (UTC)
- I've quoted in green exactly what I'm commenting on (and agreeing with Nikkimaria that "I would not support getting rid of limits entirely unless ... "). SandyGeorgia (Talk) 18:28, 8 July 2023 (UTC)
- Okay, I don't see what you mean by that. I hadn't proposed specific size changes. I was talking about adjusting the text under "Readability" so that it doesn't include claims that it's based on actual research.
- Regarding your other comments, I see that we have different perspectives on who invokes this guideline, how, where and why. I'm commenting from what I believe are my experiences. You're welcome to pick that apart if you want to, but this is my genuine impression of things. Take it or leave it.
- Now regarding the "Readability" section, below in gray is a concrete suggestion of how to reword it to get rid of the unverified claims about how people read articles. I'm excluding the shortcut note and see also for convenience. It's only for the main section, not the sub-sections.
Each Wikipedia article is in a process of evolution and is likely to continue growing. Other editors will add to articles when you are done with them. Wikipedia has practically unlimited storage space; however, long articles may be more difficult to navigate, and comprehend. An article that has grown to more than five paragraphs, or about 500 words, should start being split up into one or more sections. This helps organize content, especially for readers that are looking for information about a specific aspect of the article topic (see Wikipedia:Manual of Style and Wikipedia:Layout for guidance). Individual sections should not be so long that they impede the ability to find information and should be further divided into sub-sections. At around 10,000 words it may be beneficial to move some sections to other articles and replace them with summaries per Wikipedia:Summary style – see also Size guideline (rule of thumb) below.
- I'm not putting this up for a vote, just discussion. I'd appreciate if we didn't start splitting up in oppose/support camps. Peter Isotalo 22:25, 8 July 2023 (UTC)
- Concur with (please) don't start !voting before discussing. Sorry I was rolling two answers in to one (what I quoted in green from Peter, and what Nikki said separately). Is this the page for addressing how articles are split into sections? I'm honestly confused by that suggestion, as we must have another MOS guideline somewhere about that specific subject, while this page is about the size of an article overall and when to think about splitting to a different article, as opposed to how to create sub-headings/sections. But if this is the page, I would expand the second line from "difficult to navigate, and comprehend" to something along the lines of ... "difficult to navigate, maintain, and comprehend; and may contain extraneous, repetitive or off-topic content that would be better contained in a sub-article". I also wouldn't restrict the need to use summary style to articles reaching 10,000 words, because the need to use summary style can be present well below that word count. SandyGeorgia (Talk) 23:49, 8 July 2023 (UTC)
- Suggested "difficult to navigate"-addition sounds like a good idea. I'm thinking we could also avoid getting into details of what paragraphs should look like here by just removing the first "An article that"-sentence and simply start with "Splitting up articles into sections helps..."
- As far as I know, this is the first time that someone has mentioned the size of a section. What the MOS currently says is
Overly lengthy continuous blocks of text should be avoided; sections which are so long as to impede reader understanding should be broken down into subsections. There remains some disagreement regarding the precise point at which a section becomes too long, so editors are encouraged to use their own judgment on the matter.
(WP:MILMOS#SECTLEN) I question whether this is appropriate to have here, rather than in MOS:LAYOUT or Help:Section. I think it should be in MOS:LAYOUT, where people would be most likely to find it. Hawkeye7 (discuss) 06:52, 10 July 2023 (UTC)
- 2+2=4, yes you're right! Michaelcockrell7 (talk) 06:05, 12 November 2023 (UTC)
Discussion seems to have dropped off, but I still think this issue is important. So here's an attempt at a second suggestion. My proposal is to replace what's currently under "Readability" with the text below in gray.
Each Wikipedia article is in a process of evolution and is likely to continue growing. Other editors will add to articles when you are done with them. Wikipedia has practically unlimited storage space; however, long articles may be more difficult to navigate, and comprehend. Once an article has grown large enough, it should be split into sections. This helps organize content, especially for readers that are looking for information about a specific aspect of the article topic. Individual sections should not be so long that they impede the ability to find information and should be further divided into sub-sections. For more guidance on how to organize sections, see Wikipedia:Manual of Style and Wikipedia:Layout. When an article has grown very large, it may be beneficial to move some sections to other articles and replace them with summaries per Wikipedia:Summary style – see also Size guideline (rule of thumb) below.
Still only for discussion, but I would like to hear how close we might be to some sort of reasonable consensus. Peter Isotalo 20:10, 18 July 2023 (UTC)
- I would support adding "This helps organize content, especially for readers that are looking for information about a specific aspect of the article topic" to the existing second paragraph of this section. Nikkimaria (talk) 04:10, 19 July 2023 (UTC)
- no. You will start creating loopholes! IE. I, myself am commenting on this issue and don't even have a concept of the article! No, it is I who needs to read the article first, to understand, without going to subsections, then I will recommend audio in a scenario of size. Yours sincerely. Michaelcockrell7 (talk) 06:14, 12 November 2023 (UTC)
- On something way up near the top:
like the maintenance burden, and the typically poor and redundant and off-topic prose found in many(most?) excessively long articles.
I've been in the process of writing (not from scratch, but from a really bad allegedly C-class but in length only, more like Stub in quality) a detailed article on a semi-major topic, using virtually all available reliable source material (I'm even keeping a log on the talk page of more sources to get), and covering it all has produced a very long article. In the course of doing all this, these sources have been also very useful in other related topics, and it's become clear through direct experience that the "maintenance burden" argument is faulty. It is vastly easier to work the new sources and their facts into a single article, to keep the citations in good order, and to keep the material from becoming repetitive, self-contradictory, or otherwise problematic, than it has been to use the same sources across related articles. While I've opened a thread on the talk page about plans for splitting, I'm in no hurry to do it until I've exhausted the sources available to me, for that very reason. The difficulty of "mining" the sources for what they are worth across that topic space would become almost exponentially more troublesome, and thus discouraging and laziness-inducing (like leave this fact out of that other article even though per WP:SUMMARY it should be in there, etc.). Maybe even discouraging to continue at all. And see also Hawkeye7 way below: "Splitting an article means that some information will be duplicated, which increases our maintenance burden, as the two will have to be kept in step." 
While the guideline briefly touches on the idea that there is no big hurry in splitting up a long article, I think it should more clearly state the point that if someone is actively developing the material and doesn't want it split yet, they should be listened to since they're doing (or doing a significant portion of) the work.
Second, maybe there are lots of long articles that are full of redundancy and other crap prose, but – while I won't blow my own horn about the alleged quality of my writing – there is no redundancy in that long piece at all that I have not intentionally put in there in preparation for splitting into multiple articles, nor is anything in it off-topic, though some short bits have been written with an intent of merging them out to other pre-existing related articles or new ones that will result from the split-up. (They are not non sequitur or COATRACKS, but they are in a few places more detailed than necessary for that article, but are just right for a post-split side article, and will be swapped out with SUMMARY versions in the presently-long article). The point being, "long = low-quality" isn't a good assumption to make.
I realize this has no real impact on the ultimate size of the more-or-less-completed article (or I wouldn't've proposed splitting it and been preparing for that). Just want to avoid people pre-emptively splitting it in the middle of my now three-month marathon of work on it. — SMcCandlish ☏ ¢ 😼 12:14, 27 July 2023 (UTC)
- I think the situation you mention is a red herring in terms of the maintenance discussion. Articles built by a single user are not a big maintenance burden at any length. They are fully maintained by the individual in question, in the same way that a lengthy book written by a single author is similarly simple enough to maintain, with small updates by the author in new revisions.
It's articles which sprawl over time due to various edits to particular sections and subsections by a number of different editors which are difficult. The difficulty of maintenance is also a function of broadness. The broader the concept, the more it could sprawl and the more difficult it is for any editor to be across the various topics covered in the article. (I don't know what article you are working on, but given you are close to using all available sources, I assume it must be somewhat specific.) Broadness also affects how summary style can be applied. The broader a topic, the more likely it is to cover something that could sensibly make up its own article. CMD (talk) 12:47, 27 July 2023 (UTC)
- I see where you're coming from, in the shape of the larger debate about this material, but my concern is narrower. The thing is, as soon as it got long, someone slapped a length "objection" tag on it, so it's not a red herring from my position; someone at least in theory wants to split it up before it's ready for that from the perspective of the person doing the work, but this guideline doesn't give me any solid rationale along those lines that I can cite. It's not urgent because no one is actually trying to force that split right this moment, but that could change in a day or an hour, meanwhile I have at least another 2 months probably of work to do on the piece, and a split up in the middle of it would be very disruptive to that work (or inimical if you like, since "disruptive" has a specially defined meaning in WP jargon). — SMcCandlish ☏ ¢ 😼 23:51, 27 July 2023 (UTC)
- I agree that the size limit is arbitrary and should go. SMcCandlish's argument is good, and I want to extend that argument to current event articles; it's true that those articles may have proseline issues, but I think the length limit prematurely kneecaps those articles, and worse, it favours early coverage (which got freely added to the article) at the expense of later coverage (whose addition is impeded by the {{too long}} warning; even if the later coverage is more relevant, there's an inertia towards keeping what we already have). After {{too long}} was added to 2023 Israel-Hamas war, prose additions slowed down significantly, and surprisingly, that activity didn't shift to the child articles. Incidentally, there are far too many child articles, which are getting BLAR'd and reinstated repeatedly, and they're disorganized to the point where there's now an "outline" article for the war. It'll take a while to fix. If we hadn't added the tag, and allowed the article to keep growing, and only later did calmly reasoned splits, I think the overall result would be superior. (FTR, I'm mostly on the "pro-NOTNEWS" side, if you've followed those recent arguments elsewhere, but I think the size limit is a lose-lose for both sides of that argument, not a win for the "NOTNEWS"/"higher-level coverage" side as is thought).
- As others said, while the data doesn't support any clear conclusion, people tend to seek out specific parts of articles, not read top-to-bottom; I doubt readers care about overall length, though they do care if our coverage is biased towards early events and against recent events and analysis, and if things are organized clearly (not prematurely split) and, ideally, easily accessible (not needing an "Outline" article to figure out where things are).
- My argument only addresses the impact on major current event articles, which is a minor point; many other good reasons have already been given here. DFlhb (talk) 08:25, 30 October 2023 (UTC)
Research-based observations
![](https://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Wikipedia_mobile_web_main_section_usage_-_en_Barack_Obama%2C_Nov_2015.png/220px-Wikipedia_mobile_web_main_section_usage_-_en_Barack_Obama%2C_Nov_2015.png)
![](https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Wikipedia_mobile_web_main_section_usage_-_en_Word_War_II%2C_Nov-Dec_2015.png/220px-Wikipedia_mobile_web_main_section_usage_-_en_Word_War_II%2C_Nov-Dec_2015.png)
I've observations from two of the studies listed above, one on average reading time and one on the use of sections.
The measured average time that readers linger on articles that they actually want to read a bit more extensively is about 45 seconds.[19]
Applying a reading speed of 238 words/minute (as cited by Nikki above), that would mean that a reading session is about 180 words or just over 1 kB of prose. And that's assuming people actually read the entire time rather than simply browse. So the average reading session is below a "don't bother to split"-limit that has been around for over 20 years.[20] It also has zero relevance to the current "proven" attention span argument, and is not even 10% of the limits proposed in the earliest "readers may tire"-argument from March 2004.[21]
Regarding how readers navigate articles, content further down in an article is far less likely to be read, but readers also seem perfectly capable of picking out specific sections they want to read, regardless of how far down they are.[22] If you look at the two images on the right that present the results you'll note how readers seem to favor quite specific sections, not just those that are closest to the top. To me it seems to indicate that proper organization of an article may matter far more than its total size.
Peter Isotalo 14:47, 7 July 2023 (UTC)
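For anyone who wants to check the arithmetic behind the "about 180 words or just over 1 kB" estimate, here is a minimal sketch. The reading speed and session length are the figures quoted above; the bytes-per-word value is only an assumption made for illustration (roughly six bytes per English word including the following space) and does not come from either study.

```python
# Figures quoted in the comment above
READING_SPEED_WPM = 238   # words per minute
SESSION_SECONDS = 45      # average time spent on an article

# Assumption for illustration only: ~6 bytes per English word,
# counting the space that follows it. Not taken from the cited studies.
BYTES_PER_WORD = 6


def session_words(seconds: float, words_per_minute: float) -> float:
    """Words read in one session at a constant reading speed."""
    return words_per_minute * seconds / 60


def session_bytes(words: float, bytes_per_word: float = BYTES_PER_WORD) -> float:
    """Approximate prose size in bytes for a given word count."""
    return words * bytes_per_word


words = session_words(SESSION_SECONDS, READING_SPEED_WPM)
size = session_bytes(words)
print(f"{words:.1f} words, roughly {size / 1000:.2f} kB of prose")
```

With a different bytes-per-word assumption the byte figure shifts proportionally, but the word count depends only on the two quoted numbers.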
- That 2003 limit appears to be on total bytes, not readable prose as in the current page. Nikkimaria (talk) 01:00, 8 July 2023 (UTC)
- That's true, but the point is that it's the origin of the "readers may tire"-argument used today. It didn't appear based on fact or research, but was simply something someone just made up at some point. Peter Isotalo 08:30, 8 July 2023 (UTC)
- It's not, though - the original limits were rather more heavily focused on technical limitations. Nikkimaria (talk) 12:59, 8 July 2023 (UTC)
- You're splitting hairs. The 2003 version clearly says that "[r]eaders may also tire". It wasn't the primary argument back then, but it's exactly this that morphed into the "Readability" section that is currently based on pure conjecture and the notion of a fantasy reader type that doesn't exist.
- Yesterday you cited a whole bunch of research about attention span. Now the data says that the average reader reads at most 1 kB at a time. Are you going to argue that we reduce the maximum article size to 200 words? Peter Isotalo 14:35, 8 July 2023 (UTC)
- No, I'm going to argue that the data you've presented doesn't help us identify an alternate appropriate maximum article size. Nikkimaria (talk) 14:46, 8 July 2023 (UTC)
- (Disclaimer: I am the author of African humid period which is pretty lengthy) Well, I would argue that these data show that the amount of prose in the article is irrelevant to readers and thus can't be used to justify any maximum article size. That leaves editability (because VE in particular hangs up on very long articles), technical issues (the template limits) and connection issues (articles+image combinations that are overly large in terms of byte size can overtax connections) however. Jo-Jo Eumerus (talk) 05:31, 11 July 2023 (UTC)
- This is pretty much what I believe we need to be moving towards. Not saying we need to go full yolo and just abolish maximum size completely, but I think we need to focus on arguing for two separate issues:
- editorial capacity to maintain article quality
- technical limitations that actually make article loading difficult
- I think the issues need to be argued separately from each other. And I think the technical limitations absolutely need to be backed up by actual reader data. Personal experiences and anecdotal evidence make for awfully messy discussions. Peter Isotalo 01:17, 13 July 2023 (UTC)
- I would formulate it as such:
- How long it takes in VE to edit a long article.
- For FAC/GAN/other content processes, how much work it takes to review a long article.
- How long a text+image combination can be before loading the article becomes difficult.
- Jo-Jo Eumerus (talk) 06:23, 13 July 2023 (UTC)
- I agree with Jo-Jo that the amount of prose in an article is irrelevant to readers and thus can't be used to justify any maximum article size. I don't think attention span is a good argument for cutting article size because attention span varies from person to person. There's a possibility people will read an article from start to finish if they find it interesting enough but why does that matter? Everyone has their own reason for why they are reading a WP article in the first place. Some are only looking for a specific piece of information while others are just curious about the article subject. Many (most?) people don't read articles from start to finish even if they have a prose size of less than 50 kB. Does that mean articles should be split at a smaller size? It just doesn't seem logical to me. Volcanoguy 01:16, 21 July 2023 (UTC)
- My view is that we need to get rid of the readability argument at least for the time being. The reasoning we've relied on for the past 20 years simply doesn't hold up to scrutiny. See the suggestion above for a rewording of what's currently under "Readability". Peter Isotalo 20:05, 22 July 2023 (UTC)
- Or at least moderate the readability claims, and start including real evidence-based material. E.g., MOS:DL was recently overhauled on the basis of a study (WMF's I think) that showed readers, especially on mobile (now over 50% of our readership at any given moment), don't read top-to-bottom but jump around all over the place; thread is over here. I also agree with much of what Jo-Jo Eumerus wrote, other than I think that FAC/GAN are non-concerns unless/until an article is headed for one of those processes. Readers generally don't notice or care about them, only a certain camp of editors do, and many of us are not in that camp. E.g., I consider it vastly more important to improve crappy Stub and C articles into B-class than to polish the chrome on articles that are already encyclopedic enough to be useful. And I'm not alone in that. I don't think articles should be split up on a GAN/FAC review basis unless there's certainty that the review is going to happen and soon. Splitting well can take a tremendous amount of work, and when it's done poorly, the results can be very unhelpful to readers. — SMcCandlish ☏ ¢ 😼 12:14, 27 July 2023 (UTC)
- There is a counter-balancing technical point: splitting an article means that some information will be duplicated, which increases our maintenance burden, as the two will have to be kept in step. Hawkeye7 (discuss) 06:52, 10 July 2023 (UTC)
- Isn't that why we try to write ledes? Essentially, we begin each article with a heavily condensed version of the article that covers and summarizes the main points... this lets people decide whether they are done after the first few paragraphs or whether they want to go deeper (and either way, we have something tailored to their preference). jp×g 04:58, 9 August 2023 (UTC)
- Partly - see the explanation at WP:DETAIL. Nikkimaria (talk) 02:45, 10 August 2023 (UTC)
- "A reading session is about 180 words". That reflects my usage of Wikipedia as a reader as well. I usually consult it for concise answers of who, what, when, why, not to read in general more than a small section. That does not mean I don't support long articles though. Regards, Thinker78 (talk) 20:27, 22 October 2023 (UTC)
Research about how long readers spend on the page before bouncing, how long a typical person can read continuously before tiring, etc. are not really that relevant to this question in my opinion. Wikipedia articles should not only be designed to be read in one sitting by a single committed reader, nor should they be judged by the typical reader behavior as described by a couple pieces of timing data stripped from other context. Articles are many things to many readers: readers arrive from various sources (search engine, social media / direct mail or message from an acquaintance, link in another web document, personal bookmark, wikilink from various other articles, ...) and have widely varying levels of preparation, interests, goals, needs, and reading styles. Some would be satisfied with a half-sentence, and others could read a book and still want more.
Some want a short description of WTF the subject is / what field it's in, some are procrastinating by browsing around following wikilinks from one topic to another, some are opening up 20 wiki articles in tabs and skipping back and forth between them, some are reading a textbook and opening wiki articles about the unfamiliar terms as a supplementary resource, some want to fact check a particular detail they heard on the news, some want to learn about a specific subtopic and might skip most of an article but compare details across multiple articles, some want to learn a new challenging technical subject and plan to repeatedly revisit an article over time, some are looking for an interesting bedtime read and will keep reading as long as they find a nice narrative arc, some are students writing a paper who want to ~~plagiarize the article~~ look at Wikipedia's sources so they can cite them, some are researchers who will follow up the cited sources and do a deep dive, some are book authors (or blog authors) who want to find freely available images to use in their own works, some are programmers using articles as a reference for code/formulas to adapt in implementing their own code, etc. etc.
I would guess a vanishingly small proportion (less than 1/100, maybe orders of magnitude less) of total page views to articles of even moderate length ever result in someone reading the whole article straight through end to end. But that doesn't mean any particular part of the article is bad, or too long, or off topic, or should be removed. For instance, a section of niche interest to specialists, containing tabular data, or including rarely needed technical reference details might only be examined carefully by a trivial fraction of page readers, but could still be important to include somewhere (ideally nearer the bottom than the top of a page). Such sections generally do no harm to anyone who isn't looking for them, so long as they don't become magnets for collections of unrelated trivia or material that is clearly unencyclopedic; deciding this should be left up to local consensus and editors' discretion.
To concretely/specifically understand how people use articles is going to take much more detailed research than anything I've seen above or elsewhere. It will take finding (a large number of) specific readers who have some particular goal in using their computer / the internet (write a paper, learn about a topic, answer a question, ...), tracking their full browsing session (to see how/whence they arrive at articles), and then analyzing it in detail, including asking them detailed questions about which parts they read / skimmed, which parts they needed to know, which parts they found meaningful/interesting, what they plan to come back to later, etc. etc.
Trying to set a uniform standard for how long every encyclopedia article should be, irrespective of subject matter, article style, relation to other articles, importance of the topic, etc. is hopeless, and should never have been described as a "guideline", even if qualified as a "rule of thumb". The "WP:TOOBIG standard" was inevitably going to be a half-assed justification to abusively plaster {{too long}} banners all over the place and wikilawyer about one editor or another's preferred inter-article organization scheme.
The important criteria for articles are that they have clear scope, are clearly organized, stay on topic, have a moderately clear narrative flow (esp. within sections), put the most important information nearer the top, are well illustrated especially near the top, link obviously to relevant nearby/overlapping topics, etc. What byte/word count an article has is nowhere near as important, and does not in my opinion meaningfully help readers except insofar as it focuses attention on one of these more important primary goals. To the extent that bikeshedding about byte counts distracts from those criteria, it is actively harmful. –jacobolus (t) 04:41, 26 October 2023 (UTC)
- In a word: yep. — SMcCandlish ☏ ¢ 😼 05:22, 26 October 2023 (UTC)
- Agreed. XOR'easter (talk) 16:28, 27 October 2023 (UTC)
- The main purpose of WP:SIZERULE is to complement WP:NOT and make sure that Wikipedia stays an encyclopedia, rather than becoming a dumping ground of unmanageable crap. If you think readability is the overriding motive, you've obviously never read a Wikipedia article! Nosferattus (talk) 01:46, 28 October 2023 (UTC)
- I don't understand your point. Can you explain how "don't dump crap" is co-extensive with the standard of "try hard to split articles at 10k words and never allow them to grow bigger than 15k words", as this is employed by wikilawyers in practice? Those two things seem largely unrelated. There are heaps of individually short "unmanageable crap" articles strewn across the project, as well as several excellent very long articles. If the purpose is just to tell people not to dump crap, wouldn't it be better to just say so directly? –jacobolus (t) 02:05, 28 October 2023 (UTC)
- Generally have to agree with jacobolus on this. WP's crap problem is almost entirely in two forms: junk articles, mostly in popular-culture spheres, and indiscriminate "In popular culture" sections that just add as much trivia as humanly possible. Our average long article is not full of unencyclopedic crap, but is the work of concerted editors trying to be comprehensive. If the content in them were crap, this would not at all be an issue for article size and for splitting, but for removing crap per WP:NOT's various criteria. We would have no interest in taking unencyclopedic garbage found in a long article and spinning it out into new side articles of the same unencyclopedic garbage. (Aside: The problem of junk pop-cult articles is only ultimately solvable by adjusting WP:GNG to be more stringent, probably discounting entertainment news as sources that help establish notability. The fact that some random actor has credits in 5 movies or TV shows and has been mentioned repeatedly in entertainment news – which over-dwells on name-dropping of actors and serves little purpose other than promoting actors and the works they are in, to the benefit of the media-company advertisers who keep these publications alive at all, and thus they lack independence from the subject – doesn't really make the actor encyclopedically notable, it just makes them marginally competent enough in their field to not have given up and gone back to waiting tables or driving for Uber.) — SMcCandlish ☏ ¢ 😼 04:33, 28 October 2023 (UTC)
- The connection with WP:NOT lies in the spurious belief that an encyclopaedia must consist of small articles. This was necessary in the days of print encyclopaedias because there was limited space. Wikipedia, however, is WP:NOTPAPER and has no such limits. (The Encyclopaedia Britannica also has the Macropedia with its large articles, but this was not what the authors of WP:SIZE had in mind.) The "In popular culture" sections have a similar origin: the widespread belief that the Wikipedia should be restricted to articles on popular culture. By adding trivia they are, in their mind, saving the article from being deleted under WP:NOT. Hawkeye7 (discuss) 09:12, 28 October 2023 (UTC)
- We might take for example the Dictionary of Scientific Biography, which includes a biography of Isaac Newton which runs to 61 pages. –jacobolus (t) 02:24, 30 October 2023 (UTC)
- (de-indent) But... we do have a 61 page biography of Isaac Newton here? It's just split across Category:Isaac Newton for summary style reasons. Let's say that book has around 450 words per page; 450*60 = 27000 words. From the PROSESIZE tool, Isaac Newton comes back with 7000 words, Early life of Isaac Newton at 5000 words, Later life of Isaac Newton at 5000 words, Religious views of Isaac Newton at 2500 words, Isaac Newton's occult studies at 4000 words, and Isaac Newton's apple tree at 2000 words. That's 25500 words out of 27000 already; surely the DSB biography delves into some of the science he was involved in, and if we include the other Newton-adjacent things in the category like rotating spheres, Leibniz–Newton calculus controversy, and others, Wikipedia surely beats out the Dictionary of Scientific Biography in word count. I don't think anyone is demanding Wikipedia narrow its focus on topics with a lot of ground to cover like Newton, just... spread it out, so that the casuals looking for a basic overview read just the lede of the main article, mildly more diligent people read all of the main Isaac Newton article, and graduate students or Newton fans who really want to drill down into the nitty gritty read the subarticles. And to be clear, there's some pretty savage dropoff in reader views in subarticles, traditionally (see this set of well-maintained articles, where the main article has ~3,000 hits daily, and the spinoff articles generally have single-digit daily hits, with just two of the subarticles squeaking up to ~100 hits daily). That suggests that stuffing the information back in the main article will just result in it getting skipped if it's getting so few clicks relatively: most readers aren't bothering. But screw it, I've certainly worked on plenty of single digit daily hit articles for that one random reader who's interested. 
So in-depth coverage is welcome, but it can be done in compatibility with existing summary style, and further, it's a good idea. SnowFire (talk) 02:22, 2 November 2023 (UTC)
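The word-count comparison above can be reproduced with a few lines. This is only a sketch of the back-of-the-envelope estimate: the 450 words per page and the 60-page rounding are the guesses made in the comment, and the per-article figures are the rounded PROSESIZE numbers quoted there, not fresh measurements.

```python
# Guesses taken from the comment above, not measured values
WORDS_PER_PAGE = 450
PAGES = 60  # the 61-page biography, rounded as in the comment

# Rounded PROSESIZE word counts as quoted in the comment
article_words = {
    "Isaac Newton": 7000,
    "Early life of Isaac Newton": 5000,
    "Later life of Isaac Newton": 5000,
    "Religious views of Isaac Newton": 2500,
    "Isaac Newton's occult studies": 4000,
    "Isaac Newton's apple tree": 2000,
}

dsb_estimate = WORDS_PER_PAGE * PAGES          # estimated size of the print biography
wikipedia_total = sum(article_words.values())  # combined size of the listed articles
share = wikipedia_total / dsb_estimate

print(f"Print biography estimate: {dsb_estimate} words")
print(f"Listed Wikipedia articles: {wikipedia_total} words ({share:.0%} of the estimate)")
```

Adding further articles from the category to `article_words` would show how quickly the combined total passes the print estimate.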
- Sure. I only oppose arbitrary splitting, splitting just for the sake of it, or to meet arbitrary word limits. I would have opposed, for example, splitting John McCain III's political career in two at 2000. This comes, in part, from splitting articles and then having to defend the new subarticle at AfD. If you had asked me who John McCain was, I would have said he was an admiral. Hawkeye7 (discuss) 02:56, 2 November 2023 (UTC)
- I can't agree with the rather odd conclusion "stuffing the information back in the main article will just result in it getting skipped" leapt to after this: "there's some pretty savage dropoff in reader views in subarticles, traditionally (see this set of well-maintained articles, where the main article has ~3,000 hits daily, and the spinoff articles generally have single-digit daily hits, with just two of the subarticles squeaking up to ~100 hits daily)." SnowFire seems to be assuming that the facts/content itself that has been shunted into a side page is intrinsically of lower reader interest (a quality that sticks to it, no matter where we put it), when there is no evidence of this, but a lot of counter-evidence from various studies of WP and general web usage that people are simply resistant to following links to additional pages to get information. Side articles get lower views because they don't match up with simple searches, they take more work to get to, and they pertain to narrowed scopes that match fewer aggregate interests. These are properties of the "container", the side article, not of a discrete fact that someone subjectively puts into that container. What this all tells me is that moving content into side-articles has a ghettoizing effect. The presence of "fact X" in John McCain means it will necessarily get more readers seeing it than moving it into something like Early life and military career of John McCain, and someone looking to whitewash or otherwise PoV-push in our content will probably know this and use it to their ill-motivated advantage. This effect is actually doubled, because side articles almost always have far fewer watchlisters and other interested parties, so it becomes much easier to completely suppress or PoV-alter material after it has been ghettoized to a side article. 
This isn't to say no long articles should be split, of course, but we have to be aware of potential consequences, and also should not be jumping to unsupportable conclusions about the intrinsic interest to the reader of something that could be moved to another article. That's putting the cart before the horse. "Fact X is in side article B" doesn't magically make fact X of lower innate interest to anyone, but it certainly will translate to more difficult findability by a reader looking for it, and lower visibility/access by readers in general. — SMcCandlish ☏ ¢ 😼 07:20, 2 November 2023 (UTC)
- From my experience, biographical subarticles in particular get ignored by search engines. You would hope that a search for
john mccain pow vietnam
would turn up Early life and military career of John McCain in its first page of results, but it does not on either Google or Bing. You just get John McCain. Similarly with a search for
john mccain senator
not finding either of the congressional career subarticles (which, in retrospect, were a bad idea to begin with). Wasted Time R (talk) 11:22, 2 November 2023 (UTC)
- Much of what SMcCandlish says above is non-controversial. Yes, spinoff articles can be used for POV-pushing and can easily fall into a trap if the few watchers they have move on and they become "dumping grounds". And yes, picking and choosing which facts make the main article is inherently a powerful editorial decision, with facts relegated to subarticles likely being deemed as less important. That is in fact precisely what I was saying. And I think people are aware of the consequences of subarticles (I certainly am, at least). And it sounds like we both agree that there are still plenty of times when this is the right trade-off. So... this isn't something that goes against what I said.
- We're going off the rails as far as the claim about lesser facts. I'm not saying that less important facts are less important because they're in a side article (which is obviously backwards), but rather that less important facts should go in side articles (well, when there's cause for a side article at all, for topics with tons of stuff written about 'em that don't fit). McCain getting 89,116 votes in his 1982 election is probably less of reader interest than him being held as a POW in Vietnam, hence why one is in the Electoral history subarticle and the other is in the lede of the main article. (And why I picked a well-maintained set of articles that can reasonably be trusted as having good judgment on what facts to stick where.)
- Anyway, as far as research goes, per the initial comments in this section, the vast majority of hits (some of which are bots, in fairness) stick around on the page for a very short period of time - enough to read the first paragraph or maybe the lede. People who read the entirety of our longer articles are rare, and would become rarer if we weakened the SIZE guidelines and just started stuffing articles with everything. Being able to get to the gist is part of what makes a good writer. All of the potential problems with split-offs SMcCandlish mentioned are absolutely true, but (and here's the value judgment part) adhering to size guidance is still more important. We should be offering a snappy, concise 15-to-20 pager as an introduction to Newton (or whoever) for readers interested in that. SnowFire (talk) 18:04, 2 November 2023 (UTC)
- Agree with SnowFire; expanding size opens up to poor additions of everything under the sun, rather than encouraging encyclopedic focus. SandyGeorgia (Talk) 18:22, 2 November 2023 (UTC)
- Wikipedia is a compendium of knowledge, but the form it takes - what is regarded as "encyclopaedic" - is not bound by the conventions of paper encyclopaedias. I've had readers who resented having to go to a subarticle for the details they were looking for, yet could find the time to complain about it on the talk page. Our solution to keeping the articles to an arbitrary size - which increasingly lacks cogent justification - is summary style, the creation of subarticles, since unlike a paper encyclopaedia we are not limited in the number of articles we can have. But there is an inherent tension between summary style and notability. There are the POV splits, and the practice of unloading toxic waste like "in popular culture" sections into subarticles (which then get nominated for deletion). Hawkeye7 (discuss) 23:16, 2 November 2023 (UTC)
- SnowFire's response was well put, and assuages many of my concerns, but Hawkeye7's issue about an arbitrary size limit remains a live one. I'm not sure what the way around it is (or would have posted a concrete proposal for it by now!). — SMcCandlish ☏ ¢ 😼 08:55, 3 November 2023 (UTC)
- My point is that "length of an encyclopedia article" in the broader world doesn't have some kind of inherent cap, even in a paper encyclopedia where there are relatively steep trade-offs for every extra page. This is not a Newton word count contest. In my opinion any "main" article should cover all of the important aspects of the subject in a reasonably self-contained way, in sufficient detail to match the importance/extent of each subtopic. A side article with even further detail doesn't absolve the main article from its "responsibility" to cover that subtopic. Too often on articles here, the {{main}} template and "summary style" are used as an excuse to put a uselessly and often misleadingly short summary, under the theory that anyone who cares will just click the link. This article size guideline should not be used as a justification for such changes. When material is removed (whether or not it gets summarized), it should be on the basis that the removed material was veering off topic or out of scope, giving undue consideration to a particular subtopic at the expense of the main subject, interrupting the narrative flow of the article, or the like. Not just that the whole article hit some hard word count limit. Different subjects take more or less detail to adequately cover. There are some subjects about which barely anything has been written, and the most we can write is a few hundred words, mostly about the broader context. For other subjects, 15,000 words is really not enough. –jacobolus (t) 14:36, 3 November 2023 (UTC)
- I think this is bringing up one issue to argue against something else. Bad writing is bad, but I could easily flip around your example and say that spinning a long digression off into a subarticle is improving "the narrative flow of the article." And I suspect this case is far more common than the improper spin-off that interrupts the flow. Discipline about size tends to improve writing, not make it worse. Sure, there's no "inherent" limit, but there is a practical limit, and if you're going over 10,000 words you've probably hit it.
- There's no shame in summary style removals. Take Chemistry and Category:Chemistry. These other articles in the tree are not discussing matters "off-topic" or "out of scope" for the top-level Chemistry article; just the top-level Chemistry article needs to be an encyclopedia article and not a five-volume textbook.
- Undoubtedly there are topics where 15,000 words are not enough, nor even 1,500,000 words. Great, make subarticles! Or a freely licensed Wikibook, perhaps. That doesn't argue against the reasonable WP:SIZE limits of how long one single page of an encyclopedia article should be. SnowFire (talk) 19:28, 3 November 2023 (UTC)
- Category:Chemistry is not an article, and it gets 1% of the views of Chemistry so is demonstrably not something that readers look at or care about in practice.
Chemistry article needs to be an encyclopedia article and not a five-volume textbook
– nobody has ever proposed anything like this hyperbolic straw man. –jacobolus (t) 21:51, 3 November 2023 (UTC)
- I trust that you will understand what is actually meant here: not the category page itself, but rather the articles within the category (Organic chemistry, etc.), many of which are topics that are perfectly validly part of chemistry and not off-topic, etc. SnowFire (talk) 23:35, 3 November 2023 (UTC)
- An article about a very expansive topic like "Chemistry" or "History" necessarily has a huge number of topics to cover. Not only subject content about the table of elements and chemical reactions and so on, but also meta-information about the history of chemistry, the methods and tools used by chemists, the relation of chemistry to other scientific disciplines, the practical applications of chemistry, the economic impact and organization of the chemical industry and other industries with a heavy reliance on chemistry, chemistry as a career, chemistry education from secondary through postgraduate school, the organization of the chemistry research community, etc. etc.
- Our current article is quite limited and if I try to imagine an ideal Wikipedia article about the subject, it would stretch easily to 15,000 words if not beyond – that is, we could probably triple the length of the current article without getting bogged down with an excessive level of detail about any particular subtopic. Most of the additions belonging in an ideal article would be topics that our current article doesn't even mention let alone cover adequately, rather than additional detail about the subjects already discussed (though I'm sure there's room for that too). Disclaimer: I don't know that much about chemistry and have very limited personal connection to the subject, beyond taking a 1 year course in high school and sometimes watching NileRed youtube videos with my 4-year-old. –jacobolus (t) 00:52, 4 November 2023 (UTC)
- The chemistry article does look rather paltry right now — and I'm saying that as a physicist, so I'm not trying to hype my own field. :-) XOR'easter (talk) 18:15, 4 November 2023 (UTC)
- Creating subarticles is not easy. First, unless we are going to relax WP:GNG for subarticles, we have to have a subtopic that is itself notable. So we cannot have "Article (part one)" and "Article (part two)" (except for list articles). Usually, we look for a section that can be split off, but not all articles have these, and creating them may involve restructuring the whole article. Since the section will be replaced with a three or four paragraph summary, it will have to be larger than that, or we won't substantially reduce the size of the parent article, which would defeat the purpose of the exercise. The subarticle has to stand on its own, so we may have to add a background summary that fits it into the subarticle. So there is considerable work involved. Hawkeye7 (discuss) 22:09, 3 November 2023 (UTC)
- I wouldn't take the alleged consensus against "inherited notability" too seriously. I think that if there's valid sources, an AFD is very unlikely to succeed on even tiny subtopics, if it can be shown that the sources are of strong quality (something like Influences on J. R. R. Tolkien, perhaps). It only gets dicey when it's, say, a fictional character spin-off and all the sources are primary sources. (And I personally would be disinclined to even think of that as a huge problem, but eh, no need to re-fight the "fancruft" wars of 2007-2011).
- If you want to avoid duplicate content, my personal suggestion is to slap a {{Main}} at the very top of the lede section of a spin-off article as a clue for "we really expect you to have read the above article as background." The nice thing about writing for readers hardcore enough to find their way to a subarticle in the first place is that you can somewhat trust them to click links if need be. It doesn't look like Battle of Gettysburg, first day spends tons of time going over the basic background to the battle itself, for example - it's understood that a reader clueless about that needs to read the top-level article first. SnowFire (talk) 23:35, 3 November 2023 (UTC)
- Regarding this:
I wouldn't take the alleged consensus against "inherited notability" too seriously.
If there's some other encyclopedia project where people don't take alleged consensuses about article-worthiness deadly seriously, maybe I should be editing over there instead. In general, deciding how to organize content across multiple articles is a hard problem, and I don't see how this guideline helps in any meaningful way to solve it. XOR'easter (talk) 18:22, 4 November 2023 (UTC)
- Agreed. I split Assessment of the Battle of Long Tan (4,500 words) off from the 16,000-word main article. Then had to defend the decision at AfD - twice (Wikipedia:Articles for deletion/Assessment of the Battle of Long Tan, Wikipedia:Articles for deletion/Assessment of the Battle of Long Tan (2nd nomination)) Hawkeye7 (discuss) 20:14, 4 November 2023 (UTC)
Tables and lists
After a recent (and correct) edit, a paragraph in the guideline now reads:
Readable prose is the main body of the text, excluding material such as footnotes and reference sections ("see also", "external links", bibliography, etc.), diagrams and images, tables and lists, Wikilinks and external URLs, and formatting and mark-up. The measure may substantially underestimate the amount of content in articles that summarize much of their information in tables, especially when these contain notes and explanations in text columns.
I propose that it would make more sense to remove "tables and lists", and remove the newly added second sentence. Some articles (including some of our longest) consist almost entirely of lists (sometimes formatted as tables). — SMcCandlish ☏ ¢ 😼 16:34, 25 August 2023 (UTC)
- Your final sentence is true, but I'm having trouble seeing why the previous one follows from that. Could you explain? Nikkimaria (talk) 03:20, 26 August 2023 (UTC)
- What's not clear? The "readable" article content at a long list is the list. The current wording a) creates a loophole such that list articles are not subject to length limits at all, and b) another loophole whereby an article that consists of, say, 75% a list ignores the entire list for purposes of length calculation. I doubt anyone actually agrees that's a good idea. Hell, it could be a [bad] excuse to convert prose material into inappropriate lists/tables, just to skirt the length guidelines. — SMcCandlish ☏ ¢ 😼 05:10, 26 August 2023 (UTC)
- I am inclined to concur that tables and lists should be treated like normal wikitext. Jo-Jo Eumerus (talk) 07:40, 26 August 2023 (UTC)
- It is unclear to me why tables and lists should be treated like normal wikitext for the purpose of article size. Size limits are to do with readability, and tables are for data presentation. I am unaware of people who look up lists to read from beginning to end. Tables and lists are reference material, while articles are a presentation of information about a particular subject, which I judge to be completely different subjects. As an engineer, I have a steam table book written in the 50s (prior to computers and the internet) that is almost entirely tables about temperature and pressure for various fluids. I do not believe anyone would read the book from beginning to end as a subject matter description of steam temperature and pressure, one would just go to the table needed for the particular values. The point I am trying to make is that tables and lists should not be subject to readability limits, but certainly should be subject to technical limits, such as maximum character limit, or limits on how many citations can be included before the article breaks, or general reports on slow-down on download speed on limited access machines such as commonly used smartphones in nations with more limited data carriers. But putting a size limit on tables and lists based on the subjective readability limits would not be a good idea. It is not a loophole, it is a different perspective. Mburrell (talk) 21:58, 26 August 2023 (UTC)
- As people have noted in the paragraphs above, though, people mostly don't read articles top to bottom, either. Jo-Jo Eumerus (talk) 17:46, 27 August 2023 (UTC)
- Agree with Mburrell. SandyGeorgia (Talk) 11:40, 30 October 2023 (UTC)
- I agree too. Also, at least with tables, collapsing them can put them out of sight and out of mind. Riposte97 (talk) 22:15, 30 October 2023 (UTC)
- Ah, okay, I misunderstood. I don't object to the principle but we may need to deal with the fact that the added sentence is true wrt the tools often used for assessment of this issue. Nikkimaria (talk) 13:12, 26 August 2023 (UTC)
- MOS uses the term appendix (sometimes, footers) to refer to the bottom matter (another term!) that we'd like to exclude, so maybe we could borrow that. Mathglot (talk) 09:23, 26 August 2023 (UTC)
- Sure, but it still shouldn't include "tables and lists" which are part of the main-body content of the article. — SMcCandlish ☏ ¢ 😼 05:24, 26 October 2023 (UTC)
- Disagree entirely that tables and lists should be added; the issue is readable prose, and tables are skimmed. SandyGeorgia (Talk) 11:39, 30 October 2023 (UTC)
- Tables and lists wouldn't be considered the readable prose in prose articles, but they would be considered the readable prose, or whatever is closest to that, for list articles. Onetwothreeip (talk) 09:43, 9 November 2023 (UTC)
- The problem with tables and lists is that we don't have an automated tool for counting the "readable prose" in them. The reason is that we have not determined a way of counting the text in them. We need to agree on this first. Only then can we consider size limits. Hawkeye7 (discuss) 18:22, 23 November 2023 (UTC)
Deferring to those doing the actual work instead of drive-by taggers
Kind of buried in above discussions, I proposed: While the guideline briefly touches on the idea that there is no big hurry in splitting up a long article, I think it should more clearly state the point that if someone is actively developing the material and doesn't want it split yet, that they should be listened to since they're doing (or doing a significant portion of) the work.
And some detailed rationale behind that: [23] That was in July, and nothing's happened in this regard, probably because it's mired in a broader thread. Might take a little work to craft up a guideline-worthy line item about this, but it's a pretty simple point. — SMcCandlish ☏ ¢ 😼 07:29, 2 November 2023 (UTC)
- Sounds reasonable. Hawkeye7 (discuss) 08:21, 16 November 2023 (UTC)
- The wording will need significant improvement over the green-i-fied portion above, but I would support in theory. I've been part of a situation where someone split content prematurely and without discussion, to the wrong title, without a lead, based on the usual misunderstanding of readable prose vs. KB, leaving undefined citations that couldn't be recovered by bot, not attributing with WP:CWW, creating a double maintenance load and separating primary from secondary sources, when in fact cleaning up prose redundancies and overquoting brought the main article back within size recommendations so that the split wasn't needed anyway. The wording should focus on something like taking care not to do a BOLD split, rather discussing first to see what other options there are, and outlining the steps to do it correctly, or waiting for an editor who knows how to do it correctly. The editor who did that defended it as BOLD and accused me of OWN; in fact, it was reckless and took me days of work to correct, ending up not being needed at all. I wouldn't position it as "those doing the actual work", rather not a situation where bold should be implemented without prior discussion, to a) make sure it's necessary and b) get it done right when it is. A lot of this, though, should actually be addressed at WP:PROPERSPLIT-- and people don't read either page anyway, but reinforcement of the concept at both pages might help. SandyGeorgia (Talk) 11:34, 16 November 2023 (UTC)
- WP:PROPERSPLIT should be enhanced to add the steps to keeping the referencing in order. Hawkeye7 (discuss) 23:52, 16 November 2023 (UTC)
- Isn't that going to vary a lot depending on what citation techniques are involved? — SMcCandlish ☏ ¢ 😼 23:58, 16 November 2023 (UTC)
- Yes, but in this case, the editor just split the content and never even glanced at the mess left. First, if you're using sfns, you have to bring over the matching cite template. Second, they had made so many other messes before the split that the bot couldn't figure out what named refs to use. Third, editor never bothered to check after the bot run that citations were a mess. The general message would be that the editor who does the split should be responsible to make sure all citation info is copied over. SandyGeorgia (Talk) 01:10, 17 November 2023 (UTC)
- We should be clearer in point 5 on the need to make sure references themselves are in place, not just "A References section". Point 6 should also include a note that you may need to add references, such as if the base named reference is removed (although luckily bots seem to be okay at catching this). CMD (talk) 02:31, 17 November 2023 (UTC)
- Sure, that all sounds reasonable (both of you, I mean). — SMcCandlish ☏ ¢ 😼 07:50, 17 November 2023 (UTC)
Iconoclasm
I've noticed the {{too long}} tag on three articles recently: Winston Churchill, John F. Kennedy and Napoleon. The readership for all of these great men will always be high but it will be especially so currently as JFK is doubly-featured on the main page today as it's the anniversary of the assassination. And there's a big new biopic about Napoleon which was released today.
The typical reader will not be surprised that these pages are long as there's obviously a lot to say about these people. For example, Ridley Scott said that there are over 10,000 books about Napoleon – one for every week since he died. What may surprise the thousands of readers is that Wikipedia is complaining about this length and is expecting them to do something about it. But they probably suspect that, if they do start taking an axe to the content, it won't end well. Other readers may suspect that the tags have been placed by iconoclastic vandals who want to diminish and disrespect these great men. But again, if you revert these tags as vandalism, that's not likely to end well either. And so, as usual, the tags linger to annoy all those readers while nothing is actually done about them. It's not a good look for our highest-profile articles.
As for the question of the best or most appropriate length, note that Scott uses the "bum ache factor" for his movies. That makes sense for a captive audience in a cinema but Wikipedia readers are not like that, are they?
Andrew🐉(talk) 22:01, 22 November 2023 (UTC)
- Thank you, yes, they are! Most people scrolling through an encyclopedia article of 100,000 words will feel brain ache, won't they, hence the guidelines including 15,000 words. The ache factor is likely normally distributed, with mean about 2-3 hours for films. People have effectively been trying to estimate the ache factor above in the "Is there verifiable scientific basis for the article length guidelines?" section, about readability. I'll see if I can find any evidence on readability, Tom B (talk) 17:35, 23 November 2023 (UTC)
- "Brain-ache" from scrolling through a long article? I hardly think so; maybe scroll-wheel finger-ache; but the ToC helps with that. Or did you mean, "from *reading* through" a long article? Again, no; when you're on the sofa or your commuter train, you read as little or as long as you like, then switch to the puzzle, or your song playlist, and pick up the reading later; just like with a book. By your reckoning, the printed book industry (and eBooks) might as well shut their doors because nobody could possibly read a 100-page novella, containing around 25,000 words, let alone War and Peace. And by the way, there is no Wikipedia article with 100,000 words; that's about four times bigger than the largest one. An extreme high end article like Presidency of Donald Trump is #29 in the list and weighs in at 525kb raw and 23,606 prose words (all the larger ones are list- or table-rich and prose words can't be counted accurately). And I don't even know why we bother arguing about this, as some studies (which maybe someone will link for me) have shown that most readers don't read past the lead. Mathglot (talk) 06:38, 24 November 2023 (UTC)
- Most readers use the mobile view which gives quite a different perspective. I just tried looking at Napoleon on my phone. The many large sections of prose are not a problem in this because they are condensed into single lines with fairly clear titles like Early life and Exile on Saint Helena. They can be expanded with a single click and that seems fine as a way of handling such a big topic.
- What causes brain ache is not the condensed body but the clutter at the start of the article. The first screen starts with some tiresome disambiguation and then an even more tiresome tag complaining that the article is too long. The actual article text starts over half way down the screen and its first sentence is quite complex and cluttered.
Napoleon Bonaparte (born Napoleone Buonaparte;[1][a] 15 August 1769 – 5 May 1821), later known by his regnal name Napoleon I, was a French emperor and military commander who rose to prominence during the French Revolution and led successful campaigns during the Revolutionary Wars.
The source-code for this is even more complex and convoluted and so would immediately repel anyone who dared to start trimming it.
- If the mobile reader then tries to understand why the article is considered too long and scrolls down through it, they may conclude that this is because it has too many categories – a huge list of over 100 which are not condensed in the same way as the prose sections and so require a lot of scrolling to get through.
- So, this "too long" issue is very dependent on your device and preferences. The people placing these tags are not typical readers, right?
- Andrew🐉(talk) 10:48, 24 November 2023 (UTC)
- @Andrew Davidson, thank you, I agree it's about preferences. One thing we've not managed to sift are reader preferences about length of articles, except for the good point you both make that leads are key. You and @Mathglot have positively influenced my behaviour in concentrating on leads more, Tom B (talk) 15:44, 24 November 2023 (UTC)
- Andrew, you raise two issues, each of which deserves its own discussion section: mobile view (which I'll leave for later), and lead clutter, and of course there's some overlap. With respect to just one aspect of lead clutter (which applies to all devices) I've always thought that we have a WP:FIRSTSENTENCE problem in most articles, where we separate the subject of the sentence, which is typically the bolded article topic, from the verb (often is or was) by all sorts of interpolated information that hardly anybody cares about, or at least, not that soon. I have an informal measure about how egregious it is for a given article, by simply counting the number of words between the subject, and the verb; for example, the Napoleon article has a score of "16", and it's not that unusual to find articles with scores in the 30s. This really detracts a lot from readability in my view. It's not that the information isn't important to the article at all, it's just that it shouldn't be crammed into the WP:FIRSTSENTENCE between the subject and verb. Imho, that is just a ridiculous holdover from print encyclopedias which do it this way, and a failure to fully embrace the advantages of hypertext in the first sentence, perhaps in an attempt to borrow some gravitas from staid, old print encyclopedias. We don't need to do this anymore. I'd like to see the verb come right after the subject, with rare exceptions. The interpolated material could go later in the lead, later in the body, or my preference as of now, an explanatory note. That would make the lead sentence of Napoleon look like this:
- Napoleon Bonaparte[c] was a French emperor and military commander who rose to prominence during the French Revolution and led successful campaigns during the Revolutionary Wars.
- with the note showing up in the Notes section, as usual. (For the purposes of this example, I bundled everything into one note, but some of that could be dispersed in different body sections, and needn't all be covered in the note.) Mathglot (talk) 22:19, 24 November 2023 (UTC)
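The informal subject-to-verb measure described above can be sketched roughly as follows. This is only an illustration: the function name `subject_verb_gap` is made up, and how footnote markers, dates and numerals are tokenized is an assumption, so the exact score depends on these choices and can differ by a word or two from a hand count.

```python
import re


def subject_verb_gap(sentence: str, subject: str, verb: str) -> int:
    """Count word tokens between the end of `subject` and the first later `verb`.

    Assumptions (not from the discussion): footnote markers such as [1] or [a]
    are ignored, numerals count as words, and punctuation/dashes do not.
    Raises ValueError if the subject or verb is not found.
    """
    start = sentence.index(subject) + len(subject)
    rest = sentence[start:]
    # Drop bracketed footnote markers before tokenizing.
    rest = re.sub(r"\[[^\]]*\]", " ", rest)
    tokens = re.findall(r"[A-Za-z0-9']+", rest)
    return tokens.index(verb)


current_lead = (
    "Napoleon Bonaparte (born Napoleone Buonaparte;[1][a] 15 August 1769 "
    "– 5 May 1821), later known by his regnal name Napoleon I, was a French "
    "emperor and military commander"
)
proposed_lead = "Napoleon Bonaparte[c] was a French emperor and military commander"

for lead in (current_lead, proposed_lead):
    print(subject_verb_gap(lead, "Napoleon Bonaparte", "was"))
```

A lower number means the verb arrives sooner after the bolded topic, which is the property being argued for here.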
- @Mathglot hiya, we're editing encyclopedia articles rather than 100-page books! This is a source of the disagreement: some editors are ok with articles being book-length, others aren't. I believe you, thanks for the good point: that most don't read past the lead. Surely it's worth bothering discussing that? Why would someone only read the lead of Donald Trump, rather than scrolling through the 17,000 words below? If the guideline was reduced from 15,000 to 12,000, would that increase quality, increase readers scrolling past leads, reduce quality or have little effect? Tom B (talk) 15:40, 24 November 2023 (UTC)
- Charles de Gaulle recently underwent huge chops, and is still undergoing them despite being below 80kB prose size. Material (usually valuable context, or relevant detail) is getting split to new articles with ~10 daily pageviews. I expressed support for these splits a year ago, but the discussion above changed my mind, they weren't an improvement; I agree with Andrew. It likely wouldn't go well, but I wish the template was deleted altogether. The first step should be to remove any word-count/prosesize-based limit from this guideline. DFlhb (talk) 10:41, 24 November 2023 (UTC)
I second Mathglot's comment at 06:38. We should think of articles as books. How do most readers treat books? Added to that is the fact "that most readers don't read past the lead", so I'd vote to delete this guideline or severely redefine and restrict its use. (That is a bit of an extreme statement which I will explain below.) -- Valjean (talk) (PING me) 18:39, 24 November 2023 (UTC)
- WP articles are not like books. WP articles are also not like conventional encyclopedia articles. They are a new thing unto themselves, hence the difficulty of knowing how large they should be able to be. Wasted Time R (talk) 19:54, 24 November 2023 (UTC)
- See Wikipedia:Five pillars and What Wikipedia is not. I have two copies of The Last Lion if anyone needs to read a book about Winston Churchill. Wikipedia is an encyclopedia; articles are not books. SandyGeorgia (Talk) 20:25, 24 November 2023 (UTC)
- This defines what Wikipedia is not, but not what it is. Nowadays, if someone wants to know what regiment of the British Army Winston Churchill served in, they don't borrow The Last Lion from the library, they turn to the internet. A Google search will tell you. For further information, they can try more queries or turn to Wikipedia. This is not in the lead, but in the body they will find it. Hawkeye7 (discuss) 20:56, 24 November 2023 (UTC)
Yes, our articles are obviously not books. (My comment was too simplistic.) I meant that the reader's approach to our articles could be seen more like the way people approach books, in the sense that they rarely read a whole book in one sitting, with no pauses. (Yes, yes, and yes again, there are people who read whole books in one sitting, but they are exceptional.) Also, not just like paper books, but like digital books, which can be infinite in size.
Some of those who most strongly advocate for limiting the size of our long articles (one seems to have OCD about it) do so with the flawed approach of "paper" mindset editors, rather than "digital" mindset editors who embrace the newer possibilities of documenting the "sum of all human knowledge". (That was previously an unimaginable thought, so thanks to Jimbo for opening our minds to the new possibilities of the digital age.) Their Wikipedia would miserably fail, according to Baseball Bugs: "If I go looking for info, and Wikipedia doesn't have it, then Wikipedia has failed."
The old-fashioned editorial approach toward creating paper encyclopedias must be abandoned, which is why we have NOTPAPER. In spite of that, we have editors who still treat our articles as if they are paper, with the constraints that are involved, and they cite this page. No, the English Wikipedia, without media, can fit on an Apple Watch, and regardless of size, we can instantly find specific words, phrases, and sections of interest. Our articles can be searched for keywords by researchers and scholars who have no intention of reading the whole article. We do not create articles only for those who sit down and read a whole article. I don't give a flying eff if they never finish the article. They will usually be satisfied with the lead anyway. They can also hop to the sections they want to read.
Our primary concern/goal is documenting the sum of all human knowledge about that subject without violating any of our PAG. "Size" limits should never get in the way of that goal. PRESERVE and NOTPAPER trump "size". In fact, if there is any guideline that should be the first to fall victim to IAR, it is this one. The size in words or bytes is rather irrelevant. We need to get away from the "paper" mindset. Wasted Time R said it well: "They are a new thing unto themselves, hence the difficulty of knowing how large they should be able to be." We now know that size constraints are a thing of the past in a digital age. -- Valjean (talk) (PING me) 21:23, 24 November 2023 (UTC)
- @Andrew Davidson: It would seem that the tags on those articles are working, as the sizes of those articles have been reducing. Typically, excessive length is more a symptom of other problems, than a problem itself, and the editors of the individual articles would generally be those who can best determine whether an article is too long or not, with some exceptions. Wikipedia has developed the solution of the summary style, to ensure that as much information can be kept in Wikipedia as possible, but that individual articles remain accessible, readable and manageable. It's generally good practice to ensure that content removed from one article can be found in another article, typically a sub-article of the main subject. Onetwothreeip (talk) 21:32, 24 November 2023 (UTC)
Notes and refs
- Notes
- ^ English: /nəˈpoʊliən ˈboʊnəpɑːrt/, French: Napoléon Bonaparte [napɔleɔ̃ bɔnapaʁt]; Corsican: Napulione Buonaparte.
- ^ English: /nəˈpoʊliən ˈboʊnəpɑːrt/, French: Napoléon Bonaparte [napɔleɔ̃ bɔnapaʁt]; Corsican: Napulione Buonaparte.
- ^ Born Napoleone Buonaparte;[2][b] 15 August 1769 – 5 May 1821, later known by his regnal name Napoleon I.
- References
- ^ "CPA: corse, AJACCIO, L'ACTE DE BAPTEME DE NAPOLEON Ier". www.antiqu-arts.com. Retrieved 2023-09-24.
- ^ "CPA: corse, AJACCIO, L'ACTE DE BAPTEME DE NAPOLEON Ier". www.antiqu-arts.com. Retrieved 2023-09-24.
Quality and the 15,000 guideline
Hi everyone, guidelines say articles over 15,000 'readable' words "Almost certainly should be divided or trimmed." 15,000 comes from a compromise relating to a 2007 change to do with 100 kilobytes. I.e. 15,000 is not based on what would lead to higher quality, more readable articles, hence the readability discussion above. I've read the links helpfully posted by @Peter Isotalo and not found anything useful on readability. But what about using verifiable evidence on quality, which is closely related to readability? Quality is not mentioned once in the article size guidelines. I looked at recently promoted featured articles - October 2023 - and found the largest was about 12,000 words. We could analyse the maximum length of 'recently' promoted featured articles to help improve the guidelines, or put them on a better footing? Grateful for evidence on quality and readability, Tom B (talk) 20:05, 23 November 2023 (UTC)
- The problem is that quality is a different thing than quantity. Badly written text is badly written, no matter whether it's 1500 or 150000 words long. Jo-Jo Eumerus (talk) 07:24, 24 November 2023 (UTC)
- There's a semi-informal 10k limit for FAs, so this is to be expected; there's no causality. DFlhb (talk) 10:35, 24 November 2023 (UTC)
- hi @DFlhb, thank you, there is a formal length requirement yes, but no exact number as you intimate. I was surprised to find a recently promoted article at 12k, that might effectively be the informal limit? For me and others there is causality, the informal limit aids quality, Tom B (talk) 16:05, 24 November 2023 (UTC)
- Neither quality nor readability is a sensible reason to sub-divide articles, because quality has nothing to do with size while readability has to do with the chunking and navigational structure of topics at multiple levels – sentence, paragraph, section, page, topic, category and so forth.
- The real issue is the technical size of the page and this seems to be most affected by the amount of templates rather than the amount of prose. For example, the popular page Deaths in 2023 has an edit note that "References should be in <ref>[url & title]</ref> format, as full citations make the page too slow to load, and too big to edit."
- Andrew🐉(talk) 12:37, 24 November 2023 (UTC)
- @Andrew Davidson, thank you, we have a simple disagreement: you say quality has nothing to do with size, I say it does. For me the Napoleon article increases in quality from 1,000 words to 8,000 and starts decreasing before about 12,000 words. I got it promoted to GA at 8,000 words and it got demoted at 18,000. Don't most think the quality decreases at some point? We just disagree when? I appreciate it will be different amounts for different articles. I don't think technical size is the big issue any more. The consensus appears to be that readability is now key? Some think we should remove the limit, some like me think we should reduce it e.g. to 12,000, but I'm open to evidence, others might think the 15,000 guideline is fine. Everyone thinks their position will improve quality or not affect it? Tom B (talk) 16:26, 24 November 2023 (UTC)
Summarising evidence, arguments on limits
Issue | Summary of arguments |
---|---|
Readability | Some believe tightening the guideline will increase readability. Attention span time. The average reading session is below the "don't bother to split"-limit.[24] It is not even 10% of the limits proposed in the earliest "readers may tire"-argument from 2004.[25] Content further down is less likely to be read, but readers can pick out sections they want to read.[26] |
Comprehensiveness | Some believe there is a trade-off between comprehensiveness and readability, others believe there is no trade-off. |
Accessibility | Concision is recommended in guidelines for dyslexia-friendly writing and for readers with fatiguing conditions. Accessible text should be structured well. This is more challenging with longer articles, especially on mobile, which only allows navigation on top-level headings. Search engines often direct the reader to the main article even when there is a subarticle on the exact topic. Some believe technical issues for readers with slower connections should mean limiting length. |
Quality | Some believe tightening the guideline will increase quality. |
Maintenance | Long articles have more content to maintain. On the other hand, when articles are split to resolve length issues, the maintenance load over multiple articles may become even larger. |
Explicit consensus | It is difficult to achieve explicit consensus on large bodies of text; there is a higher risk of single-authored text that may not reflect consensus. |
Guideline limits, existing and others (words) | Summary of evidence |
---|---|
8,000-10,000 | Length of journal articles.[1][2][3][4][5][6][7][8] For attention span, a 2005 study includes this session estimate: 40 minutes x 238 words per minute = 9,520 words, i.e. roughly 10,000 words.[27] |
12,000 | In 2023 the largest article promoted to featured might be 12,000.[28] What was the largest promoted each year 2020-23? This metric may circle back to the current 15,000 guideline. |
15,000 | Current guideline. The largest articles promoted to featured in 2010-15 were 15,000 [29] [30] [31] [32] [33]. |
No limit | Some editors believe removing the guideline would increase comprehensiveness. Removing limits reduces rules, Avoid instruction creep, WP:IAR and MOS:BLOAT. |
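To make the attention-span row easier to check, here is a rough sketch of the arithmetic, assuming the 238 words per minute and 40-minute session figures quoted in the table. The function names are made up for illustration, and a steady reading pace is itself an assumption.

```python
def session_words(minutes: float, words_per_minute: float) -> float:
    """Words read in one sitting at a steady pace."""
    return minutes * words_per_minute


def minutes_to_read(words: float, words_per_minute: float) -> float:
    """Minutes needed to read `words` at a steady pace."""
    return words / words_per_minute


WPM = 238             # reading speed quoted in the table
SESSION_MINUTES = 40  # session length quoted in the table

estimate = session_words(SESSION_MINUTES, WPM)  # 40 * 238 = 9520

# Compare the limits listed in the table against one such session.
for limit in (8_000, 10_000, 12_000, 15_000):
    print(
        f"{limit:>6} words: about {minutes_to_read(limit, WPM):.0f} min, "
        f"{limit / estimate:.2f} sessions"
    )
```

On these assumptions a 15,000-word article takes a little over an hour of continuous reading, which is why the session estimate lands nearer the 8,000-10,000 row than the current guideline.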
With thanks to @Nikkimaria and others, I've started drafting these summaries. Please add, amend if I've been unfair, particularly Nikkimaria, Tom B (talk) 18:02, 24 November 2023 (UTC)
- I have edited this with additional material and corrections. Substituted Elvis Presley as FA example and made clearer the distinction between bytes and words. Hawkeye7 (discuss) 18:30, 24 November 2023 (UTC)
- I see no mention of maintenance issues, and don't understand where the 15,000 comes from. SandyGeorgia (Talk) 18:12, 24 November 2023 (UTC)
- Probably because length cuts in two directions, maintenance-wise: A long article has more content to check, but updating multiple articles is harder than just one. Jo-Jo Eumerus (talk) 18:26, 24 November 2023 (UTC)
- Added to the table. Hawkeye7 (discuss) 18:38, 24 November 2023 (UTC)
- That's why splitting is no longer touted as the only solution in the guideline. Sometimes, it's about selecting only the interesting and important, and not covering ever-changing details at all. For me maintenance, and the ability to achieve active consensus on a larger proportion of the text are the key arguments in favour of not creating too long an article. —Femke 🐦 (talk) 18:39, 24 November 2023 (UTC)
- I've discussed the three different methods of reducing page size in an essay. Hawkeye7 (discuss) 21:28, 24 November 2023 (UTC)
- In your essay, the sentence "Material must be preserved unless it is unsourced, libelous, patent nonsense, vandalism or violates copyright." is stronger than the policy being cited (which says should and caveats with ), and that policy as written contradicts other key policies and guidelines and daily practice. For instance, we delete information when it's outdated (WP:MEDDATE, but also common in other science articles), overly detailed (WP:summary style), information discouraged by Wikipedia:What Wikipedia is not, and I'm probably overlooking others. Starting from a clean slate is common when wanting to meet FA criteria with their focus on HQRS. —Femke 🐦 (talk) 12:37, 25 November 2023 (UTC)
- Your "caveats with" appears to be missing a quote. :-) — SMcCandlish ☏ ¢ 😼 13:08, 25 November 2023 (UTC)
- WP:RETAIN is a policy, so it overrides WP:MEDDATE. I tried to get WP:FALSE upgraded to a guideline without success. So unfortunately, correcting outdated information relies on WP:IAR. There is no contradiction between WP:RETAIN and WP:Summary style; the latter can never be used to justify deletion of sourced material on the basis that it is "overly detailed"; rather, it recommends the creation of or movement to a subarticle. Hawkeye7 (discuss) 19:22, 25 November 2023 (UTC)
- Although WP:Summary can justify deleting sourced material if such content is already contained in another article, usually a sub-article. Onetwothreeip (talk) 02:57, 26 November 2023 (UTC)
- These tables may be a good faith effort to determine the common views regarding article size, but seem quite biased. Who are these "some editors"? I haven't seen anybody ever advocate that there should be no guidelines at all regarding size, or any editors advocating for Wikipedia to be comprised of very short articles. Editors generally agree that there is compromise between articles being comprehensive and being accessible (as in readable), it's not as though some editors want articles to be readable and others want them to be comprehensive. As for the technical issues, this comprises much more than the page size limit of 2 million bytes, such as downloading and loading speeds, displaying particularly on mobile devices, and editing particularly with visual editor. Onetwothreeip (talk) 02:55, 26 November 2023 (UTC)
- There are editors advocating the abolition of the guidelines regarding size; it is not generally agreed that there is compromise between articles being comprehensive and being accessible (which is not supported by the studies); and it has been repeatedly pointed out that downloading and loading speeds have nothing to do with readable prose size. Hawkeye7 (discuss) 04:03, 26 November 2023 (UTC)
- I'm sure you would agree that all of Wikipedia should not exist in one article. While such an article would certainly be comprehensive, it would not be accessible. Likewise, each article should not be as small as one sentence, despite being very easy to read. Regarding the abolition of guidelines, I'll address that in my response to Tpbradbury. Onetwothreeip (talk) 06:30, 26 November 2023 (UTC)
- @Onetwothreeip, editors have said above: "I'd vote to delete this guideline", "if there is any guideline that should be the first to fall victim to IAR, it is this one". "The first step should be to remove any word-count/prosesize-based limit from this guideline." You're right that no one is advocating WP be comprised of very short articles, Tom B (talk) 04:05, 26 November 2023 (UTC)
- Tom B, that was me, and it was a bit of an extreme comment taken alone, so let's look at the rest of that sentence: "I'd vote to delete this guideline or severely redefine and restrict its use." I, of course, favor redefinition and restriction, rather than total deletion. I have written my thoughts in a section below about "accessibility". -- Valjean (talk) (PING me) 17:40, 26 November 2023 (UTC)
- They would still support some guideline, whether written or not, which would relate to the size of the article. It may not be explicit, and they may not like the current written guidelines, but obviously there are no editors who would realistically say that it would be fine for a single article to be millions of words long. Onetwothreeip (talk) 06:33, 26 November 2023 (UTC)
- I think one caveat is that "the article combines too many unrelated topics" is an objection that you could expect being applied to many lengthy articles, but it's not (necessarily) about the size, nor can it be defined as a size issue. The caveat with technical issues is that a lot of people confuse the size of the page (which is often dominated by images) with the size of the prose (which isn't). Jo-Jo Eumerus (talk) 08:14, 26 November 2023 (UTC)
- @Onetwothreeip, if editors say "I want to delete this guideline", it is not obvious they would support even an unwritten guideline. It would make sense for guidelines to be written? We have different estimates of good comprehensiveness, it will be different for each article, reader, editor. Many readers just need the lead, where 300 words is the suggested total. For the full Churchill article, 1k words isn't comprehensive, what's too comprehensive? Many think readability or quality starts deteriorating at 10-15k given the evidence; others aren't being explicit about where they think it starts deteriorating. It seems they think tens of thousands of words are ok, which is difficult to navigate. Most agree 100,000, 90,000, 80,000 is too comprehensive. Down at 20,000 some will start disagreeing and saying we need to add more words, to be comprehensive. There is a trade-off between comprehensiveness and readability? Tom B (talk) 12:56, 26 November 2023 (UTC)
- There is no trade-off between comprehensiveness and readability. Articles are simply as large as they need to be. Summary style comes into play when a section becomes undue or is a subject that the readers would search for in its own right. Hawkeye7 (discuss) 19:22, 26 November 2023 (UTC)
- Of course there is a trade-off between readability and article length. If you need to spend longer to find the main points of a text because there is more text, you're more likely to abort or drift off halfway, and have a lower understanding of the material at hand. Readability and accessibility are intricately intertwined and there is a reason why it's included in various accessibility guidelines. —Femke 🐦 (talk) 19:30, 26 November 2023 (UTC)
- Even though usability is maybe a more correct term for this [34]: the ability of a reader to locate information. Too little information, and the reader has to go to another page. Too much information, and the reader cannot find the information among intricate details. —Femke 🐦 (talk) 19:39, 26 November 2023 (UTC)
- Have added the bit I think we all agree on from your essay Hawkeye: there is a trade-off between readability and unnecessary wordiness. I think the 5% in that essay may be a testament of good quality writing from MILHIST, I think I often achieve 10% in climate change related articles. —Femke 🐦 (talk) 20:29, 26 November 2023 (UTC)
- The summary style guideline is one about comprehensiveness and readability. Onetwothreeip (talk) 20:57, 26 November 2023 (UTC)
Removing kb limits
My understanding is there is strong consensus to remove the kb limits thus:
Readable prose size | What to do |
---|---|
> 15,000 words | Almost certainly should be divided or trimmed. |
> 9,000 words | Probably should be divided or trimmed, although the scope of a topic can sometimes justify the added reading material. |
> 8,000 words | May need to be divided or trimmed; likelihood goes up with size. |
< 6,000 words | Length alone does not justify division or trimming. |
< 150 words | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, the article could be expanded; see Wikipedia:Stub. |
But @Onetwothreeip reverted with "I am sympathetic to some kind of change like this, but really need a strong consensus". I thought we only needed consensus, but I'm not used to guideline changes. Is there consensus to remove the kb limits or good reasons to retain them? Tom B (talk) 15:48, 26 November 2023 (UTC)
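For anyone who wants to see how the word-only table above would be applied, here is a minimal sketch. It assumes a crude whitespace word count (this is not how the Prosesize gadget works) and made-up function names; note the table as written has no row for 6,000 to 8,000 words, and the sketch leaves that gap visible rather than guessing.

```python
def count_words(prose: str) -> int:
    """Crude count: whitespace-separated tokens in already-extracted prose."""
    return len(prose.split())


def size_guidance(words: int) -> str:
    """Return the table row matching a readable-prose word count."""
    if words > 15_000:
        return "Almost certainly should be divided or trimmed."
    if words > 9_000:
        return "Probably should be divided or trimmed."
    if words > 8_000:
        return "May need to be divided or trimmed; likelihood goes up with size."
    if words < 150:
        return "Consider combining with a related page, or expanding."
    if words < 6_000:
        return "Length alone does not justify division or trimming."
    # 6,000 to 8,000 words: the table gives no guidance for this range.
    return "No row in the table covers this range."


# Word counts mentioned in this discussion, plus two illustrative values.
for n in (23_606, 12_000, 8_500, 7_000, 100):
    print(n, "->", size_guidance(n))
```

The point of the sketch is only that a word count is enough to pick a row; no byte figure is needed.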
- I'd be in favour, but I think we need to make sure that WP:SIZESPLIT has the same units, so as not to confuse people. When we added the word counts earlier, some people suggested a transition period of a few years to ease people referring to the old kb units. —Femke 🐦 (talk) 16:02, 26 November 2023 (UTC)
- When asked, I have told my students for years about page length over kb (as the academic world does). "Articles should range between 8,000 and 10,000 words, or approximately thirty-five pages in length, and would include a 150 word lead." Moxy- 16:20, 26 November 2023 (UTC)[1][2][3][4][5][6][7][8]
- I see no reason to remove the byte guideline. Each article's word count is not going to be immediately apparent. It is right that the word count is the more prominent guideline, but the byte count is simply an equivalent to that. However, I would support a change that makes it clearer that this table applies to articles that are predominantly prose content. Onetwothreeip (talk) 21:05, 26 November 2023 (UTC)
- Au contraire; few (even experienced) editors understand how to calculate readable prose, and quite often misquote the KB when referring to size. I support removal of the confusing and dated KB bits. SandyGeorgia (Talk) 21:09, 26 November 2023 (UTC)
- I would think that calculating readable prose is simply a matter of counting how many words there are in the prose of the article. I accomplish this myself by copying and pasting the text into something that counts words. Saying that a certain article kilobyte size of a prose article is generally equivalent to a certain amount of words should be complementary to the word count guideline. If it's being misquoted or confused, we should rewrite the guidelines or add clarification instead. Onetwothreeip (talk) 21:39, 26 November 2023 (UTC)
- Rewriting doesn't help considering the most frequent misapplication of this guideline is by people who don't read it. They look at the page, see the KB, and apply that without any knowledge of how readable prose is calculated. I see this routinely (and we're still seeing it, even on this page). Removing KB is the way to go here, as it is not a good approximation of readable prose. SandyGeorgia (Talk) 12:55, 27 November 2023 (UTC)
- My impression is that few or no people have defended using kb size as a metric in the discussion so far. Granted it's not a huge number of participants. And it might be worth thinking about having a guideline on kb size as well, given technical issues ... but on a different page. Jo-Jo Eumerus (talk) 10:02, 28 November 2023 (UTC)
I support Tom B's edit. We really need to accept that Wikipedia is NOTPAPER. Traditional size limits are a thing of the past, with the exception of mobile users.
In fact, I support removal of any form of maximum total word or byte length for articles and instead support "ease of access limits" on section size. It should be possible to easily hop to a section and open it using a mobile device. AFAIK, unduly large sections might be problematic. (Maybe not, so let's discuss that.) Section size is more of a concern than total article size. Finding info by searching is not a problem, including trillions of bytes. OTOH, "opening" an unduly large section might be a problem for some users. -- Valjean (talk) (PING me) 17:19, 26 November 2023 (UTC)
Remove KB, keep readable prose word limits; we've covered this at length elsewhere on this page. SandyGeorgia (Talk) 13:06, 27 November 2023 (UTC)
- Counting the number of words has become much easier than it used to be. The Wikipedia:Prosesize script is available in Special:Preferences#mw-prefsection-gadgets. It adds a "Page size" item to end of the Tools section (sidebar or dropdown menu, depending on your skin). WhatamIdoing (talk) 18:44, 27 November 2023 (UTC)
- While these tools are great, they are unfortunately not accessible to the majority of Wikipedia editors, who are not going to know how to use them. That is why guidelines like these are important. Onetwothreeip (talk) 20:17, 27 November 2023 (UTC)
- @SandyGeorgia: Do you have any examples of the guideline being misapplied in such a way? Onetwothreeip (talk) 20:15, 27 November 2023 (UTC)
- So that I don't have to troll back through my contribs for an example, look no further than a spinoff from the discussion here, which uses overall size in KB, rather than readable prose in words, for articles that don't necessarily need to be split. SandyGeorgia (Talk) 22:55, 27 November 2023 (UTC)
References
- ^ a b "European Journal of Futures Research". SpringerOpen. May 20, 2013. Retrieved November 26, 2023.
- ^ a b "instructions". academic.oup.com. Retrieved November 26, 2023.
- ^ a b "Manuscript Submission Guidelines: AERA Open: Sage Journals". Sage Journals. January 1, 2023. Retrieved November 26, 2023.
- ^ a b "Early Modern Women: An Interdisciplinary Journal: Instructions for authors". Early Modern Women: An Interdisciplinary Journal. November 17, 2019. Retrieved November 26, 2023.
- ^ a b "Development and Change". OnlineLibrary.Wiley.com. Wiley. doi:10.1111/(issn)1467-7660. ISSN 0012-155X.
- ^ a b "Submissions". Global Labour Journal. February 3, 2022. Retrieved November 26, 2023.
- ^ a b "BGSU SSCI Journal Publishing Guide" (PDF). Retrieved November 26, 2023.
- ^ a b "Guide for authors". ScienceDirect.com by Elsevier. January 6, 2016. Retrieved November 26, 2023.
"Ease of access limits" on section size
I support "ease of access limits" on section size. It should be possible to easily hop to a section and open it using a mobile device. AFAIK, unduly large sections might be problematic. (Maybe not, so let's discuss that.) Section size is more of a concern than total article size. Finding info by searching is not a problem, including trillions of bytes. OTOH, "opening" an unduly large section might be a problem for some users. -- Valjean (talk) (PING me) 17:27, 26 November 2023 (UTC)
- I agree that section lengths are perhaps more strongly related to many of the reader issues described above than overall article length. Scanning 1000 words for the pertinent information is quite difficult, and search functions on mobile phone are not super intuitive (assuming that the information has easy keywords to search for). I would be surprised if my parents could find them.
- I would support adding a (better-phrased) consideration like this:
- There are less reader issues with long articles if they are well-structured, and section are not unduly long. Visa versa, articles with long sections may benefit more from trimming or splitting if restructuring is impossible. Of course, arguments around maintainability remain.
- —Femke 🐦 (talk) 17:39, 26 November 2023 (UTC)
- 70% are now mobile users, so section size matters because most will only scroll a few times.[1] This also affects article size, as in how many sections, because most will not scroll 5 times to see a huge TOC to begin with. As we know, many look at the TOC for navigation (it should not be collapsed by default).[1] If the article is huge and full of sections, it may appear overwhelming to find basic information. Moxy- 17:40, 26 November 2023 (UTC)
- I do take Moxy's point that there is a limit to how many sections we can have without impeding navigation. I think this is sort of the difference between a 6000 or 10000 max to allow for the other considerations. —Femke 🐦 (talk) 17:46, 26 November 2023 (UTC)
- Splitting does have its place, but can also create accessibility problems, and even function as improper forking. Let's look at an example. An extreme and OCD SPA type of behavior enforcement of this guideline (by one editor) using splitting has rendered it extremely!! difficult to find and access the information linked in this article Timelines related to Donald Trump and Russian interference in United States elections. (I created that article to somewhat ameliorate the situation, but it's still a problem for readers to find information because they are forced to search many lists.) A reader has to know where to find the relevant list. Before all this splitting, a reader could find it all in one list, but that was considered (by one editor) to be too long, citing this guideline. The result is effectively a way to hide the uncomfortable information that negatively impacts Trump's administration and its proven cooperation ("conspiracy" is unproven) with Russian election interference. Splitting with this effect violates the spirit of improper forking, even if that may not have been the motive. The result is the same. -- Valjean (talk) (PING me) 17:58, 26 November 2023 (UTC)
- While I've also had my issues with that main editor of the timeline article(s), and I have long found that group of articles to be a mess, it's very concerning to see an editor suggest they have a mental illness, obsessive-compulsive disorder. Can we not do that? Onetwothreeip (talk) 21:33, 26 November 2023 (UTC)
- Sounds like a reasonable argument for the creation of a Wikipedia:Section size article (currently a redirect). This page would continue to be about article size, and editors can use whichever guideline they find more useful at the time. Onetwothreeip (talk) 21:41, 26 November 2023 (UTC)
- Well, if we are going to start thought-policing, be aware that a lot of people with OCD are quite offended by calling their condition a "mental illness" instead of a common neurodivergence. — SMcCandlish ☏ ¢ 😼 10:37, 27 November 2023 (UTC)
- Like who? The Wikipedia article on the subject refers to it as an illness, and not as a neurodivergence. The neurodiversity article doesn't mention obsessive-compulsive disorder either. Onetwothreeip (talk) 20:12, 27 November 2023 (UTC)
- When replying to editors about sections within large articles I point to WP:DETAIL, saying MOS:LEADLENGTH is a good guide. But yes, a new page or section here would be good. Lead info would be good here too. Moxy- 22:43, 26 November 2023 (UTC)
- WP:MILMOS#SECTLEN:
There remains some disagreement regarding the precise point at which a section becomes too long, so editors are encouraged to use their own judgment on the matter.
Hawkeye7 (discuss) 01:47, 27 November 2023 (UTC)
- This is a pointless guideline. I now have more questions after reading it. Moxy- 21:44, 27 November 2023 (UTC)
I've found references that give 1000 words as the limit for sections of journal articles.[2] Would such references be useful for applying to sub-sections, i.e. level ===? Is this what people are talking about? There are fewer references than for total article length. Would it be simpler to have only a total article length guideline, rather than getting into non-lead section sizes too?
References
- ^ a b "Research:Which parts of an article do readers read". Meta. April 22, 2015. Retrieved November 26, 2023.
- ^ [1]
MilHist oped
Wikipedia:WikiProject Military history/News/November 2023/Op-ed. SandyGeorgia (Talk) 21:11, 26 November 2023 (UTC)
WP:SPLIT
I've raised a concern at Wikipedia talk:Splitting#Numbers and changes about the guideline there having numbers badly out-of-step with actual practice (as does the present text at WP:SIZE despite all the arguing above). It will eventually need to be normalized to whatever more solidly emerges from discussion here. Just saying we need to be mindful not to create a WP:POLICYFORK.
At any rate, the idea that an article of 100K is "too long" is clearly not tenable. Most of our country and major city articles are much larger, and they are not broken (plus have almost always already been split many times, to numerous extant side articles). — SMcCandlish ☏ ¢ 😼 09:06, 27 November 2023 (UTC)
- India, Canada, Minneapolis, Cleveland -- all Featured articles, all about 11,000 words of readable prose -- all reasonably within guideline more or less, no problem, although it's likely one could find a way to split a piece from any of those if someone insisted. If "most of our country and major city articles are much larger", they're probably a mess, as "most of our country and major city articles" are. It's unfortunate to have this discussion split to the talk page of an informational essay. SandyGeorgia (Talk) 13:20, 27 November 2023 (UTC)
- It would be useful if we could inform editors on the statistics for featured articles by giving the average word count, and giving the upper and lower amounts for something like an 80% range, i.e. the word count which 10% of featured articles are under and which 10% of them are over. This should be counted at the time when the article becomes a featured article. Onetwothreeip (talk) 20:20, 27 November 2023 (UTC)
- Not really, because a huge number of FAs have grown beyond the size they were promoted at, and now need to go to WP:FAR. It will take a script writer to go back and dig up the promoted version and calculate its prose size. Dr pda used to do that work, but I'm unaware of anyone else doing it since he left. SandyGeorgia (Talk) 22:51, 27 November 2023 (UTC)
Lead size
Could someone remind me where the 300-word lead size on Featured articles came from? [35] I've looked at "my own" (and I've seen much longer, eg climate change), and they're all around 400 (Tourette syndrome, Dementia with Lewy bodies, J. K. Rowling, Samuel Johnson). I suspect the 300 is heavily influenced by short articles like hurricanes and pop culture, and am concerned that stating the 300 as fact without a range or more qualifiers, we'll see it misused to imply adequate leads are too long. I suspect the range on lead size is dependent on topic, more technical articles have longer leads in order to adequately summarize the content. SandyGeorgia (Talk) 12:52, 27 November 2023 (UTC)
- As an example of how the average is skewed by highly represented FAs, look at the number of Featured articles in Wikipedia:Featured articles#Meteorology and climate, and glance at their lead sizes, and then contrast that to the number of articles in Wikipedia:Featured articles#Health and medicine, and look at their leads. The size of the lead is more appropriately governed at WP:LEAD, and more usefully measured as a percentage of the article size. SandyGeorgia (Talk) 13:11, 27 November 2023 (UTC)
- @SandyGeorgia, 300 was added in Jan 2023 [36] "Calculated from last month's TFAs in [37]". 300 was actually the average, the range appears to be usually 200-400 Tom B (talk) 14:01, 27 November 2023 (UTC)
- Ah, I see ... we should be very cautious about using one-month's TFA data or an average without better qualifiers. What's left to run TFA isn't highly representative, either, since we're running out of TFA material. I'm unsure how we can better word this, but I'm also not sure why we need to get into lead size on this page, when WP:LEAD is the page governing leads. If we do mention it here, it needs much better qualifiers than just a one-month TFA average; for example, there are no medical FAs left to run TFA, but plenty of hurricanes. SandyGeorgia (Talk) 14:24, 27 November 2023 (UTC)
- WP:LEAD had the 300 on it. i've amended it there to: "Most Featured articles have a lead length of about three paragraphs, containing 12 to 15 sentences, or 200–400 words". yes that page governs, but useful to pull across all the length guidelines to here? i.e. total, lead, non-lead. on the one-month average point, i've found very limited, simple evidence can often be enough, particularly when compared with no evidence! Tom B (talk) 17:01, 27 November 2023 (UTC)
- As MOS:Lead is the main location, let's discuss further at Wikipedia_talk:Manual_of_Style/Lead_section#FA_numbers —Femke 🐦 (talk) 17:37, 27 November 2023 (UTC)
- See also User:WhatamIdoing/Sandbox#WPMED FAs. Nobody who knows my editing will be surprised to see that my numbers aren't based on pop culture articles. ;-) I do think that a range of 250–400 would be fine. 200 is not unheard of, but it is lower than normal. WhatamIdoing (talk) 18:49, 27 November 2023 (UTC)
- Is there a way to get more content editors involved here ... as in prolific FA article writers? Do we have stats to find people? Moxy- 21:40, 27 November 2023 (UTC)
- I doubt that any good FA writer spends a lot of time counting words (or sentences). If we want these kinds of numbers, I think it would be more effective to see what the end results are. WhatamIdoing (talk) 22:30, 27 November 2023 (UTC)
- Sure don't ... this fixation on word/sentence counts in FA leads is a bit troubling, as it overlooks the overarching points of WP:LEAD, which covers the territory quite well. But the way to hear from more FA writers is to post to WT:FAC. SandyGeorgia (Talk) 22:50, 27 November 2023 (UTC)
- I have six pieces of featured content, with an average lead of 345 words. These range from Ai-Khanoum with 484 to Boukephala and Nikaia with 199. I have just finished rewriting Genghis Khan, which I hope to take to FA in the near future; this has a lead of 587 words, as you would expect for a pivotal figure in world history. As a rule, I think more about how long a lead should be to properly summarize an article, rather than word counts. ~~ AirshipJungleman29 (talk) 23:06, 27 November 2023 (UTC)
- I have successfully nominated 63 articles at FAC in the past five years. I have no idea what my shortest or longest leads are, nor their mean, mode or median. And I have no interest in finding out. I can only quote Sandy "this fixation on word/sentence counts in FA leads is a bit troubling, as it overlooks the overarching points of WP:LEAD, which covers the territory quite well." I am also an FAC coordinator, and wearing that hat I care - if it is possible - even less about word counts: if a lead fits WP:LEAD, fine; if it doesn't, not fine.
- There seems to be an element of a solution in search of a problem about this discussion. Just what is it that is considered to be "broke"? Gog the Mild (talk) 23:26, 27 November 2023 (UTC)
- Bingo! -- Valjean (talk) (PING me) 23:50, 27 November 2023 (UTC)
- Gog, we've got two problems:
- The first is that we previously recommended a paragraphs:article length ratio, which is pretty silly. You could make an article "comply" or "violate" the advice by just adding or removing a line break. That's not really a way to improve a lead.
- The second is that not everyone is a good writer, and we want to give folks a basic handle on what a typical result is. The statement is not "Your lead should be n words long"; it reports only the fact that a lot of well-written leads end up being approximately this size. The idea is that if you don't really know what you're doing, you'll be able to figure out if yours is significantly different from typical. This isn't really aimed at the FAC process, but at articles like Donald Trump on the long side, whose lead is presently 7 paragraphs, 700 words, and about twice as long as the leads for all the other modern US presidents, and at the many articles with very brief, even single-sentence, leads.
- Our best writers don't need this. It's a crutch to help those who are just learning how to write. WhatamIdoing (talk) 00:45, 28 November 2023 (UTC)
- I'm still trying to figure out what Moxy's question was about, in a section started by a person who has read several thousand FACs and promoted a thousand or so FAs :) Maybe they were looking for WP:WBFAN, so they wouldn't have to take my word for it :) A "crutch" may be a start, but for those who may or may not understand the ranges and complexity and how to interpret a guideline, when guideline pages get too WP:CREEPy, they tend to be misused by those who take them literally, and we still can't base statements about what most FA leads look like based on a one-month sample of TFA. Some types of articles have longer leads than others. SandyGeorgia (Talk) 07:59, 28 November 2023 (UTC)
- My intuition is that a relative word count is likely a more relevant metric than an absolute word count (see e.g. my previous comments at Talk:Édith Piaf/GA1 and Talk:Bellona's Husband: A Romance/GA2). Because of this, as well as out of curiosity, I took a look at my three WP:Featured articles (specifically, I copied the leads to https://wordcounter.net/ and got the full word counts from https://xtools.wmcloud.org/articleinfo – I'm not sure, but I think that the lead is included in the WP:XTools word count but e.g. image captions and headings are not). Mars in fiction has a 414-word lead, and the entire article is 8,336 words (5.0%). Venus in fiction has a 271-word lead, and the entire article is 4,006 words (6.8%). Sun in fiction has a 448-word lead, and the entire article is 3,304 words (13.6%). I also took a look at my current WP:Featured article candidate: George Griffith, where the lead is 528 words and the entire article is 5,704 words (9.3%). So these articles seem to have leads with roughly 5–15% of the entire article's word count, which is a fairly large span. To me, this indicates that the figures themselves aren't really all that important (seeing as I also don't think that my relatively shorter leads need to be longer or the relatively longer ones need to be shorter). The qualitative aspects are more important than the quantitative ones here, as noted above. Word counts are occasionally useful to illustrate that a lead is way too long or way too short, but shouldn't be viewed as targets in themselves lest we fall victim to Goodhart's law. TompaDompa (talk) 00:13, 28 November 2023 (UTC)
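For anyone who wants to re-check the percentages quoted above, here is a minimal Python sketch. The word counts are the ones given in the comment; the names `ARTICLES` and `lead_share` are made up for illustration and are not part of any Wikipedia tool.

```python
# Lead and whole-article word counts, as quoted in the comment above.
# (lead words, total words) per article.
ARTICLES = {
    "Mars in fiction": (414, 8336),
    "Venus in fiction": (271, 4006),
    "Sun in fiction": (448, 3304),
    "George Griffith": (528, 5704),
}


def lead_share(lead_words, total_words):
    """Return the lead's share of the article's word count, as a percentage."""
    return 100 * lead_words / total_words


# Round to one decimal place, matching the precision used in the comment.
shares = {
    title: round(lead_share(lead, total), 1)
    for title, (lead, total) in ARTICLES.items()
}

for title, pct in shares.items():
    print(f"{title}: {pct}%")
```

The same function could be applied to any lead and article word counts, however they are obtained; it says nothing about whether a given share is appropriate.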
I agree with TompaDompa: "The qualitative aspects are more important than the quantitative ones here, as noted above. Word counts are occasionally useful to illustrate that a lead is way too long or way too short, but shouldn't be viewed as targets in themselves lest we fall victim to Goodhart's law":
Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.
Applied to the current topic, it means that even the best leads will be damaged and not serve their purpose as well when revised after focusing on word count. No, we need to think differently about this and use other metrics, and they are summarized in the nutshell at WP:LEAD: "The lead should identify the topic and summarize the body of the article with appropriate weight."
Here is a section from my essay How to create and manage a good lead section: (not for list articles)
This rule of thumb will ensure the lead covers all significant subject matter in the article:
If a subject is worth a whole section, then it deserves short mention in the lead according to its real due weight.
That due weight should also include careful consideration of the real weight of sections that summarize child articles. Those sections have much more weight than their visible size. Their weight is equal to the weight of the child article(s).
If we do not follow that equation, then POV warriors can successfully hide negative material away from many readers' notice by spinning it off and leaving a small section which is then viewed as not worthy of mention in the lead. That must not happen. It should still be mentioned in the lead according to its real due weight.
There should not be anything in the lead that does not refer to specific content in the article and is not backed up by specific references found in the article. There should not be any unnecessary elaboration or detail in the lead. Elaboration should be reserved for the body of the article. Remember to awaken the reader's interest without satisfying their hunger.
A lead written this way will stand on its own, and someone who reads it will not later be surprised by anything they find in the article or what someone else tells them about the topic. They should know because they read our article. -- Valjean (talk) (PING me) 03:36, 28 November 2023 (UTC)
- Even "See also"? "In popular culture"? "List of publications"? Hawkeye7 (discuss) 04:03, 28 November 2023 (UTC)
- No. That should be clarified better than the "(not for list articles)" above. I'm speaking of actual thematic content. Do you have a suggestion? -- Valjean (talk) (PING me) 04:20, 28 November 2023 (UTC)
- If you want to test your advice against an edge case, consider Talk:Hallowe'en Party#RfC on mention of film adaptation in the lead. WhatamIdoing (talk) 05:03, 28 November 2023 (UTC)
- The section->lead due weight assessment is a good one. I was advised that when I started editing and it has served well since then. That also provides a rough feel between article content and lead length, which is better than a 300 word limit. CMD (talk) 04:24, 28 November 2023 (UTC)
- Yes, a content-rich and complex section might deserve 1-3 whole sentences in the lead. An insignificant section might deserve two words. Each section deserves some form of mention in the lead.
- The general guide of 3-4 paragraphs is a rough minimum for normal-length, relatively uncomplicated and uncontroversial articles. If an article is very long because it is extremely notable, widely covered, very controversial, and is very significant using many different parameters, the body will have many sections and the lead will reflect that with a length of 5-7 paragraphs, and that would be the proper functional lead length the topic deserves. A short lead would not be able to serve the function properly. -- Valjean (talk) (PING me) 04:51, 28 November 2023 (UTC)
- I don't recall ever coming across a well-written lead with seven paragraphs. SandyGeorgia (Talk) 04:57, 28 November 2023 (UTC)
- I think I encountered one once in an article that might justify a long lead, but I'm not sure it was optimally written. -- Valjean (talk) (PING me) 06:54, 28 November 2023 (UTC)
- Four paragraphs has traditionally been considered the maximum. I don't ever remember seeing an FA or GA with more than four paragraphs. Most of them don't even have four. WhatamIdoing (talk) 06:59, 28 November 2023 (UTC)
- There are more than a few FAs with five-paragraph leads; I've already given one sample in these discussions (climate change). And Introduction to viruses and Chagas disease and Subarachnoid hemorrhage in the med dep't. As a contrast, view India's four-paragraph lead which would be better written as five, but has been artificially constrained to meet this imaginary four limit. (India was noticed as needing a Featured article review three years ago.) Are leads getting too long? I think so, but then I think articles are too long, so of course leads are growing as well. SandyGeorgia (Talk) 07:05, 28 November 2023 (UTC)
- There might be situations where I'd choose a properly written lead over FA or GA status any day. I don't get this obsession with FA and GA. That should never get in the way of other legitimate objectives. Ideally, we should be able to do both, but if achieving FA or GA means dumbing down an article or failing to document the sum of all human knowledge on the subject, then forget about FA and GA and do what we are supposed to do. FA and GA are not the ultimate purpose of Wikipedia, and they should not be a stumbling block. The ones evaluating for FA or GA status should loosen up their lead length criteria and allow longer leads when justified. -- Valjean (talk) (PING me) 07:11, 28 November 2023 (UTC)
- FA and GA reinforce the wider guidelines and MOS, rather than do anything specific by themselves. I don't think I've seen dumbing down as a consistent issue there, although there are some rewrites to deal with WP:TECHNICAL. Documenting the sum of human knowledge is something FAC often assists with, finding holes in coverage for nominated articles. CMD (talk) 07:18, 28 November 2023 (UTC)
- Well, that's good. Maybe it's just the "lead" issue that needs fixing. -- Valjean (talk) (PING me) 07:22, 28 November 2023 (UTC)
- What needs fixing? Guidelines are guidelines, they are interpreted as such at FAC, and there's not necessarily anything wrong with a five-paragraph lead. India has needed a WP:FAR for three years, so it's not representative. SandyGeorgia (Talk) 07:31, 28 November 2023 (UTC)
- Exactly. Guidelines are rubbery, and that's on purpose. We should never force articles to stay at the middle of the bell curve. Some articles should be outliers without being penalized for it. -- Valjean (talk) (PING me) 07:35, 28 November 2023 (UTC)
- I had thought that generally understood and agreed. Gog the Mild (talk) 08:45, 28 November 2023 (UTC)
- I think the problem (some of us/me) are having with the text being introduced here is that, while FAC/FAR Coords are experienced writers who know how to interpret a guideline (that is, what's on this page isn't going to change anything at FAC/FAR), we know hard data in guideline pages can be misinterpreted by editors who aren't exposed to work at the FA level, and those are the very editors who frequently misinterpret pages like these ... so while we're basing numbers on FAs, we should be explaining in better detail to those not accustomed to working at that level either how to use these numbers, or we should avoid the creep entirely. We will see editors say, "lead exceeds 400 words, too long", so we should anticipate that our shorthand might not serve those we intend to reach with this page. SandyGeorgia (Talk) 16:10, 28 November 2023 (UTC)
- So "most are 250–400 words long, but some are longer or shorter"? It irritates me to be reminded that there are people who don't understand that "most" means "not all", but it's a fact that some people are completely innumerate. WhatamIdoing (talk) 17:06, 28 November 2023 (UTC)
- No ... a better qualifier ... something like "varies by content area" or some such ... anything else that can be added to remind not to apply the numbers as absolutes. SandyGeorgia (Talk) 17:08, 28 November 2023 (UTC)
- We have no actual evidence that it varies by content area, aside from the fact that clicking semi-randomly on FAs about hurricanes this morning produced a sample set of five articles, 100% of which had two paragraphs in the lead. WhatamIdoing (talk) 17:11, 28 November 2023 (UTC)
- Well, then, we have no actual evidence of anything but random samples which could be outliers, so we should be REALLY careful about adding this content at all. SandyGeorgia (Talk) 17:24, 28 November 2023 (UTC)
- Further, I'm not sure what the investment in getting someone to produce the longest article of each year is going to result in, other than a) articles which have since been defeatured, and b) outliers (all the Dynasty articles, and some others since defeatured). We're asking someone at WP:VPT to take a lot of time to produce something I can already tell you, and which will be somewhat meaningless because it by definition looks at outliers. And it appears that we now have very short articles coming through FAC, which is also perhaps an oddity, perhaps temporary. SandyGeorgia (Talk) 17:28, 28 November 2023 (UTC)
- We have no actual evidence that it varies by content area, aside from the fact that clicking semi-randomly on FAs about hurricanes this morning produced a sample set of five articles, 100% of which had two paragraphs in the lead. WhatamIdoing (talk) 17:11, 28 November 2023 (UTC)
- No ... a better qualifier ... something like "varies by content area" or some such ... anything else that can be added to remind not to apply the numbers as absolutes. SandyGeorgia (Talk) 17:08, 28 November 2023 (UTC)
- So "most are 250–400 words long, but some are longer or shorter"? It irritates me to be reminded that there are people who don't understand that "most" means "not all", but it's a fact that some people are completely innumerate. WhatamIdoing (talk) 17:06, 28 November 2023 (UTC)
- I think the problem some of us (me included) are having with the text being introduced here is that, while FAC/FAR Coords are experienced writers who know how to interpret a guideline (that is, what's on this page isn't going to change anything at FAC/FAR), we know hard data in guideline pages can be misinterpreted by editors who aren't exposed to work at the FA level, and those are the very editors who frequently misinterpret pages like these ... so while we're basing numbers on FAs, we should be explaining in better detail to those not accustomed to working at that level either how to use these numbers, or we should avoid the creep entirely. We will see editors say, "lead exceeds 400 words, too long", so we should anticipate that our shorthand might not serve those we intend to reach with this page. SandyGeorgia (Talk) 16:10, 28 November 2023 (UTC)
- I had thought that generally understood and agreed. Gog the Mild (talk) 08:45, 28 November 2023 (UTC)
- Exactly. Guidelines are rubbery, and that's on purpose. We should never force articles to stay at the middle of the bell curve. Some articles should be outliers without being penalized for it. -- Valjean (talk) (PING me) 07:35, 28 November 2023 (UTC)
- What needs fixing? Guidelines are guidelines, they are interpreted as such at FAC, and there's not necessarily anything wrong with a five-paragraph lead. India has needed a WP:FAR for three years, so it's not representative. SandyGeorgia (Talk) 07:31, 28 November 2023 (UTC)
- Well, that's good. Maybe it's just the "lead" issue that needs fixing. -- Valjean (talk) (PING me) 07:22, 28 November 2023 (UTC)
- FA and GA reinforce the wider guidelines and MOS, rather than do anything specific by themselves. I don't think I've seen dumbing down as a consistent issue there, although there are some rewrites to deal with WP:TECHNICAL. Documenting the sum of human knowledge is something FAC often assists with, finding holes in coverage for nominated articles. CMD (talk) 07:18, 28 November 2023 (UTC)
- There might be situations where I'd choose a properly written lead over FA or GA status any day. I don't get this obsession with FA and GA. That should never get in the way of other legitimate objectives. Ideally, we should be able to do both, but if achieving FA or GA means dumbing down an article or failing to document the sum of all human knowledge on the subject, then forget about FA and GA and do what we are supposed to do. FA and GA are not the ultimate purpose of Wikipedia, and they should not be a stumbling block. The ones evaluating for FA or GA status should loosen up their lead length criteria and allow longer leads when justified. -- Valjean (talk) (PING me) 07:11, 28 November 2023 (UTC)
- There are more than a few FAs with five-paragraph leads; I've already given one sample in these discussions (climate change). And Introduction to viruses and Chagas disease and Subarachnoid hemorrhage in the med dep't. As a contrast, view India's four-paragraph lead which would be better written as five, but has been artificially constrained to meet this imaginary four limit. (India was noticed as needing a Featured article review three years ago.) Are leads getting too long? I think so, but then I think articles are too long, so of course leads are growing as well. SandyGeorgia (Talk) 07:05, 28 November 2023 (UTC)
- Four paragraphs has traditionally been considered the maximum. I don't ever remember seeing an FA or GA with more than four paragraphs. Most of them don't even have four. WhatamIdoing (talk) 06:59, 28 November 2023 (UTC)
- I think I encountered one once in an article that might justify a long lead, but I'm not sure it was optimally written.
- I don't recall ever coming across a well-written lead with seven paragraphs. SandyGeorgia (Talk) 04:57, 28 November 2023 (UTC)