→Very minimal demo: Nice job |
- Hey, these look great! I've paid money for software that didn't work as well! This is really good, thanks for putting these together. Walkerma (talk) 02:58, 7 August 2008 (UTC)
Revision as of 02:58, 7 August 2008
Question
Hey CBM, where are you going to get three-four folks to maintain the new bot? :) Oleg Alexandrov (talk) 14:52, 13 July 2008 (UTC)
- I'm not sure. I don't have any candidates in mind, so at this point it may be just wishful thinking. Especially since the people with experience are usually busy enough already. Time will tell if anyone steps forward over the next couple months.
- I think breaking the code into pieces may help. A new person could just write one component instead of working with the entire codebase. I'm thinking in particular about CGI scripts to run queries against the ratings data - a lot of people have the ability to write that sort of thing. — Carl (CBM · talk) 01:14, 14 July 2008 (UTC)
- I'll be happy to help with maintenance, of course, subject to available time and as long as the logic of the code is kept reasonably separate from the database queries, which I guess will be many. I worked with PHP before a bit. Wish you luck, it will be a lot of work I think. Oleg Alexandrov (talk) 03:49, 14 July 2008 (UTC)
- I could probably do a little bit of PHP, if the code does not have the complexity of MediaWiki. My programming background is essentially QBasic (no kidding), C++ and Java, so I can probably play around with the code a bit. Titoxd(?!? - cool stuff) 04:09, 14 July 2008 (UTC)
- We will most definitely need somebody with knowledge of SQL if we want to add a database to the backend (which appears to be the case) and also a place to host the database. Would the toolserver accept it? Titoxd(?!? - cool stuff) 00:14, 16 July 2008 (UTC)
- I know enough SQL and database stuff to get by; I've already been implementing some things to see what issues come up. Also, I have a toolserver account, which comes with personal database privileges. The toolserver has a "stable server" that we could probably use as well. — Carl (CBM · talk) 01:37, 16 July 2008 (UTC)
Feature requests
Titoxd started a list of feature requests, which is a good idea. The only one I don't understand is #2, "Wikiproject preferences".
Request #1 will require some discussion, but it's one of the goals for the second generation code. #3, #4, and #5 are also goals to be implemented.
I think that #3 and #6 are both about the same issue, which is that the current bot code doesn't generate updated statistics until the end of the run, and that one long project can block other projects from running. My idea for the new bot is that the script to generate overall statistics would be unrelated to the script that gathers data; the statistics generated would just be the statistics present in the database at that time. That would mean that we could update the "small" projects more often than the "large" projects.
The issue of task forces and subprojects will need to be discussed more widely. It seems like a good idea, but there are some technical issues that need to be talked through. — Carl (CBM · talk) 13:56, 14 July 2008 (UTC)
- #2 is essentially #3 in #Motivation for an update, and I'll clarify it. Also, #3 and #6 are indeed about having smaller / new projects updated more frequently, while larger projects would be split off to a separate instance of the bot, maybe running at the same frequency as is being used now. Titoxd(?!? - cool stuff) 18:55, 14 July 2008 (UTC)
Category intersection
I believe that a rigorous category naming convention and a category scheme should be set up right now. We should make room for a new input (-Type), as it is possible that it would be implemented later:
I suggest a categorization scheme that follows this pattern, with this exact capitalization. For a fictitious WikiProject Whatever:
First level:
- [[Category:Whatever page of INPUT-Class]]
- [[Category:Whatever page of INPUT-Importance]]
And if -Type is accepted:
- [[Category:Whatever page of INPUT-Type]]
Second level:
- [[Category:Whatever page of INPUT-Class of INPUT-Importance]]
And if -Type is accepted:
- [[Category:Whatever page of INPUT-Type of INPUT-Class]]
- [[Category:Whatever page of INPUT-Type of INPUT-Importance]]
Third Level (if type is accepted):
- [[Category:Whatever page of INPUT-Type of INPUT-Class of INPUT-Importance]]
Inputs:
- INPUT (-Type)= Article, List, Portal, WikiProject, Disambiguation, Redirect, Image, Category, Template, Needed, NA, Unspecified
- INPUT (-Class, For now) = FA, FL, GA, A, B, C, Start, Stub, List, Template, Category, Image, Needed, Disambig, NA, Unassessed
- INPUT (-Class, Future?) = Featured, Good, A, B, C, Start, Stub, NA, Unassessed
- INPUT (-Importance) = Top, High, Mid, Low, NA, Unknown
Headbomb {ταλκ – WP Physics: PotW} 06:32, 15 July 2008 (UTC)
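As a rough illustration of the naming pattern proposed above, the following sketch builds a category name from whichever of the three inputs are given, always in the order Type, Class, Importance. The function name `category_name` and its parameters are made up for this example.

```python
def category_name(project, type_=None, class_=None, importance=None):
    """Build a category name following the proposed pattern.

    Inputs are appended in the fixed order Type, Class, Importance,
    each with its exact suffix and capitalization.
    """
    parts = []
    if type_ is not None:
        parts.append(f"{type_}-Type")
    if class_ is not None:
        parts.append(f"{class_}-Class")
    if importance is not None:
        parts.append(f"{importance}-Importance")
    if not parts:
        raise ValueError("at least one of type_, class_, importance is required")
    return f"Category:{project} page of " + " of ".join(parts)


# First level, second level, and third level examples:
print(category_name("Whatever", class_="B"))
print(category_name("Whatever", class_="B", importance="Top"))
print(category_name("Whatever", type_="Article", class_="B", importance="Top"))
```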
- I think that changing the categorization system is beyond the scope of this page. If the system is changed, the bot will change to work with it. But the categories should be set up by the people who contribute to the WP 1.0 group, rather than by the bot operators.
- Apart from the Type addition, the main change that I think you are proposing is to add "second-level" category/importance categories. That is one issue that definitely needs to be resolved in the next six months. — Carl (CBM · talk) 13:31, 15 July 2008 (UTC)
- Aside from that, none of those changes are actually needed to make the feature you suggest work. If the bot can pick up X-Class, Y-Importance and maybe Z-Type information independently, it can do the back-end work of putting the three together in an "Article" object. The results of the category intersection can be displayed by linking to a CGI script on the backend (toolserver?), instead of on the wiki, as creating and updating those pages on the wiki would cause a lot of unnecessary writes to the en.wikipedia database. Titoxd(?!? - cool stuff) 19:17, 15 July 2008 (UTC)
- Actually, thinking about this, I flat-out can't support implementing this on the wiki front end. WP:1.0/I says that right now, there are 1376 participating projects in bot assessments; assuming that the projects use standard 8-point quality scales (FA, A, GA, B, C, Start, Stub, Unassessed) and 5-point importance scales (Top, High, Mid, Low, Unassessed), the bot would need to create an additional 55,040 pages. Since the bot is coded to stop for five seconds after page writes, it would mean that for a single bot run, 275,200 seconds of idle time would be added to the bot's processing time. This is 3.18 days of just waiting, without counting the processing time to generate those pages; a conservative estimate of the added processing time would be about half a day for the whole index. So, we're talking about almost doubling the time a bot run would take, making bot runs occur only once a week (or twice a month if a run fails). I'd think it would be much more efficient to just make the 1.0 bot create links to a CGI script in the toolserver that returns the processed request from the bot's assessment database. Titoxd(?!? - cool stuff) 07:15, 23 July 2008 (UTC)
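The estimate above can be reproduced with a few lines of arithmetic. The figures are the ones quoted in the comment (1376 projects, 8 quality levels, 5 importance levels, a 5-second pause per page write); the function name is only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def intersection_page_cost(projects=1376, quality_levels=8,
                           importance_levels=5, pause_seconds=5):
    # One extra page per (project, quality, importance) combination.
    pages = projects * quality_levels * importance_levels
    # The bot idles for a fixed pause after every page write.
    idle_seconds = pages * pause_seconds
    idle_days = idle_seconds / SECONDS_PER_DAY
    return pages, idle_seconds, idle_days


pages, idle_seconds, idle_days = intersection_page_cost()
print(pages, idle_seconds, idle_days)
```

This gives 55,040 pages and 275,200 seconds of waiting, a little under 3.2 days, before any processing time is counted.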
Well, low-use sections wouldn't need to be processed as often. Template, Disambig, Category, Image, etc. pages would not have many changes made to them. In reality, only the Article and List types would need to be considered for regular updates (say, more than weekly). WP 1.0 really doesn't need to keep track of "A-Class" and "B-Class" disambiguation pages every day, or of "high" and "low" importance categories, etc. Run the basic bot for regular updates (full parameters for Articles and Lists; tally up templates, categories, disambiguation pages, images, Projects, Portals, etc. (about )), and run the full thing (full parameters on Articles, Lists, WikiProjects, Images, Portals, Categories, Templates, etc., and tally up the NA, Disambig, and Needed types).
Other things could be considered, such as placing individual type, class, and importance categories, then letting WP 1.0 build a database and do the intersections rather than querying Wikipedia for each subcategory. Headbomb {ταλκ – WP Physics: PotW} 07:53, 23 July 2008 (UTC)
- Actually, I didn't consider any of those intersections in my calculations above. Adding those, even though not used on all projects, would probably add another day or two to the bot run, which is what I'm trying to avoid. And having the individual type/class/importance categories is actually what I'm suggesting, with the only thing being that instead of accessing a page on Wikipedia such as Wikipedia:Version 1.0 Editorial Team/Tropical meteorology A-Class Top-Importance articles, we link to tools:~titoxd/wp10cat.php?project=Tropical meteorology&class=A&importance=Top (dead link) or something similar. Titoxd(?!? - cool stuff) 08:36, 23 July 2008 (UTC)
- There are three types of output that the current bot generates: tables, logs, and lists. My idea for the second generation is that the tables would still be uploaded to the wiki, but the lists would be generated on demand by a web-based program. That would reduce the huge number of edits that have to be made to keep the lists up to date, which would in turn allow the bot to update the tables more often. Once the lists are dynamically generated, category intersections can be made with very little extra work. — Carl (CBM · talk) 14:19, 23 July 2008 (UTC)
- Indeed, and the logs could be generated dynamically as well, showing a page's prior history. I'd be interested in helping code that. Titoxd(?!? - cool stuff) 21:24, 23 July 2008 (UTC)
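A minimal sketch of what "lists generated on demand" with a class/importance intersection might look like, assuming the ratings are stored in a database. The `ratings` table, its columns, the sample rows, and the function names are hypothetical; only the `project`, `class`, and `importance` query parameters come from the example link mentioned in this thread.

```python
import sqlite3
from urllib.parse import urlencode


def make_demo_db():
    # Hypothetical schema with placeholder rows, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings "
        "(project TEXT, article TEXT, quality TEXT, importance TEXT)"
    )
    conn.executemany(
        "INSERT INTO ratings VALUES (?, ?, ?, ?)",
        [
            ("Tropical meteorology", "Example article 1", "A", "Top"),
            ("Tropical meteorology", "Example article 2", "A", "Mid"),
            ("Tropical meteorology", "Example article 3", "B", "Top"),
            ("Physics", "Example article 4", "A", "Top"),
        ],
    )
    return conn


def list_articles(conn, project, quality=None, importance=None):
    # Each optional filter narrows the list; giving both is the intersection.
    sql = "SELECT article FROM ratings WHERE project = ?"
    params = [project]
    if quality is not None:
        sql += " AND quality = ?"
        params.append(quality)
    if importance is not None:
        sql += " AND importance = ?"
        params.append(importance)
    sql += " ORDER BY article"
    return [row[0] for row in conn.execute(sql, params)]


def list_query_string(project, quality, importance):
    # Query string the wiki tables could link to instead of a wiki page.
    return urlencode(
        {"project": project, "class": quality, "importance": importance}
    )
```

Because the intersection is just an extra condition in the query, no additional pages need to be written to the wiki to support it.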
WP News
What I mean by this is that it would be really nice for WikiProjects to have news about the status of their articles. I don't know if this is the right place to suggest this, but it's at least something to think about. Specifically, I mean the ability for WikiProjects to get bot-delivered news about which articles under their wing are up for Peer Review, nominated for deletion, nominated for FA, passed FA, failed FA, got demoted from FA, ... It's unrelated to the WP 1.0 bot's tasks per se, but perhaps the WP 1.0 bot could build the database in a way that helps a "WikiProject News Bot (WPNBot?)"? Or perhaps WPNBot wouldn't need this database at all, and my request is unnecessary. Headbomb {ταλκ – WP Physics: PotW} 06:03, 15 July 2008 (UTC)
- It would certainly be possible to permit other bots to download a list of all articles from a particular project. Once that feature is available, it would be easy for someone to make a bot that watches peer review, FA, etc. and notifies wikiprojects when their pages are active. It would probably need to be an opt-out or opt-in system. — Carl (CBM · talk) 13:18, 15 July 2008 (UTC)
- Yeah it probably would. The bot might not even need this list, but it probably wouldn't hurt to structure it in a bot-accessible way. Headbomb {ταλκ – WP Physics: PotW} 13:26, 15 July 2008 (UTC)
Current tasks
A list of current tasks done by the bot would be useful. Titoxd(?!? - cool stuff) 00:13, 16 July 2008 (UTC)
Feature requests 10 & 11
I added a couple of feature requests. I suspect that neither of them will make it into the second generation bot, but I thought it was worth mentioning them. Maybe for the third generation?
- 10 may be better done by the SelectionBot, but I list it here in case. It's not critical; also, we may decide that we don't want to promote petty competition, "my article is more important than yours". I'm only proposing it be done for articles with a WP1.0 template on the talk page; currently importance is (in effect) not used in the 1.0 template. The purpose: to show why certain articles have been chosen. I thought it would be a nice idea to put on the table, at least.
- 11 would allow projects to tag specific versions of articles for the release. This could be very valuable to the 1.0 project itself, but it would require projects to be more actively involved in designing releases. This is unlikely for a general release like Version 0.7, but is VERY likely when we start having the WikiProjects designing their own mini-releases or WikiReaders, such as "Atlantic Hurricanes since 1950." I'm guessing that the code for this could be nasty, though!
Thanks, Walkerma (talk) 02:04, 7 August 2008 (UTC)
- Item #10 is very hard for selectionbot to do, because it would mean editing the individual talk pages of articles. It's also hard to program into the template code because it requires access to data like interwiki count and hitcount that isn't available to wiki code. The best solution might be to add these scores to the output lists that the bot generates. — Carl (CBM · talk) 02:47, 7 August 2008 (UTC)
Recording GA or FA
Another thing I'd like to discuss is something that regularly comes up in 1.0 discussions. Could we use the bot to record if an article is a GA or an FA, using the GA or FA talk page template (or the article milestones template)? Perhaps the bot could add it into the log and the tables? The bot already does something similar, for recording an article's inclusion in Version 0.5.
Many (including myself) would like to see GA and FA removed from the assessment scale, because they are not WikiProject-based. Why? A huge number of people misunderstand the current system - some start tagging articles as GA-Class when they aren't GAs, and there is a CONTINUAL and POINTLESS discussion about "Shouldn't A be higher than GA?" or "What's the point of A-Class?" and related things. (To me, this is like asking, shouldn't cats be higher than dogs? They are different animals, you can't reduce them to a hierarchy.) The idea would be that projects would simply assess Stub-Start-C-B-A, and let the bot tag things as GA or FA.
We don't NEED to do this right away; but if I'm to moderate a serious discussion on this topic, I need to know if it CAN be done fairly easily. Currently I believe opinion is fairly evenly split, but most of the opposition centers around the (very reasonable) idea that projects want to keep track of GAs and FAs too. If we knew unequivocally that the bot could do this, it might finally resolve this thorny issue. Only if there was a clear vote in favor would we need the code to be written.
Noting one of the second generation proposals, that WikiProjects be able to have the bot more tailored to their specific needs, it may be that this could be one such feature that could be turned on or off on a per-project basis. Walkerma (talk) 02:15, 7 August 2008 (UTC)
- This can certainly be achieved if there is desire for it. Currently, there is a consistency problem where an article can be rated GA by projects without actually being a GA, or (less likely) could be a GA but not rated GA by projects. This means that the WP 1.0 bot's count of good articles often differs from the GA project's count of good articles. Using the "real" GA/FA categories to get the lists of good/featured articles would remove that problem. — Carl (CBM · talk) 02:40, 7 August 2008 (UTC)
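The consistency problem described above amounts to comparing two sets of article titles. A minimal sketch, assuming the three lists have already been fetched somehow (the function and argument names are invented for this example):

```python
def ga_discrepancies(project_articles, project_rated_ga, actual_ga):
    """Compare a project's GA-Class ratings with the real GA list.

    project_articles: all articles tagged by the project
    project_rated_ga: articles the project has rated GA-Class
    actual_ga:        articles in the "real" GA category
    """
    project_articles = set(project_articles)
    rated = set(project_rated_ga)
    # Only real GAs that belong to this project are relevant.
    actual = set(actual_ga) & project_articles
    return {
        "rated_but_not_ga": sorted(rated - actual),
        "ga_but_not_rated": sorted(actual - rated),
    }
```

If the bot took GA/FA status from the real categories, both lists would be empty by construction.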
WikiProject preferences aka custom ratings by project
One commonly requested feature is the ability for a project to add its own ratings to the bot's tables. After thinking about it, and trying a different approach, I think the best way to do this is to add a template to the project's category (like Category:Foo articles by quality) that has the information in it. The template would look something like this:
{{WP10params
| homepage=Wikipedia:WikiProject Foo
| extra1-name=Bplus
| extra1-type=quality
| extra1-category=Bplus mathematics articles
| extra1-ranking=400
}}
That would tell the bot where the project's home page is, and also tell the bot to add the quality rating "Bplus" to the project's table. The "ranking" value is needed to sort the table correctly. — Carl (CBM · talk) 02:33, 7 August 2008 (UTC)
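A rough sketch of how a bot might read such a template and use the ranking to place the extra rating in the table. The parser is deliberately naive (it would break on nested templates or piped links), the function names are made up, and the default ranking numbers in the usage example are invented purely for illustration.

```python
import re


def parse_wp10params(wikitext):
    # Find the first {{WP10params ...}} call and split it into key=value pairs.
    match = re.search(r"\{\{\s*WP10params\s*\|(.*?)\}\}", wikitext, re.DOTALL)
    if match is None:
        return None
    params = {}
    for field in match.group(1).split("|"):
        if "=" in field:
            key, value = field.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def extra_ratings(params):
    # Group extraN-* parameters into one dict per extra rating.
    extras = {}
    for key, value in params.items():
        m = re.fullmatch(r"extra(\d+)-(\w+)", key)
        if m:
            extras.setdefault(int(m.group(1)), {})[m.group(2)] = value
    return [extras[i] for i in sorted(extras)]


def sorted_quality_ratings(default_rankings, extras):
    # Merge the project's extra quality ratings into the defaults,
    # then order all ratings from highest to lowest ranking.
    rankings = dict(default_rankings)
    for extra in extras:
        if extra.get("type") == "quality":
            rankings[extra["name"]] = int(extra["ranking"])
    return sorted(rankings, key=rankings.get, reverse=True)


text = """{{WP10params
| homepage=Wikipedia:WikiProject Foo
| extra1-name=Bplus
| extra1-type=quality
| extra1-category=Bplus mathematics articles
| extra1-ranking=400
}}"""
params = parse_wp10params(text)
# Invented default rankings, chosen only so that 400 falls between A and B.
print(sorted_quality_ratings({"A": 500, "B": 300, "C": 200}, extra_ratings(params)))
```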
Very minimal demo
I have put a very minimal demo online.
- http://toolserver.org/~cbm//cgi-bin/wp10.2g/alpha/cgi-bin/list.pl - list articles from a project
- http://toolserver.org/~cbm//cgi-bin/wp10.2g/alpha/cgi-bin/list2.pl - intersection of two projects
- http://toolserver.org/~cbm//cgi-bin/wp10.2g/alpha/cgi-bin/table.pl - ratings table for a project
These are very ugly and incomplete, and are just intended to spark discussion. — Carl (CBM · talk) 02:35, 7 August 2008 (UTC)