Stack Overflow has been wildly successful. And maybe in some ways too successful.
I am concerned that Stack Overflow is being inundated by a stream of low-quality questions from users who are accidentally poisoning our well -- by turning off and turning away the core answerers who do all the real work in the system.
In theory there is "no such thing as a stupid question" but in practice, there are:
I mean a pattern of the above. Not an isolated incident, but 5-10 questions (or dozens or hundreds!) all showing the exact same negative characteristics over a period of days or weeks.
Now, a few of these questions are no problem for our community -- that's why we have voting, reputation, question closing, community moderators, flagging, etc. I am happy to intervene if there is a pattern of negligent, irresponsible, failure-to-learn-anything-at-all questions from a particular user. It's easily the #1 reason I mete out timed suspensions at this point.
All of these systems work, and have worked to date! That's the good news. That's why we have a community worth participating in, and a community worth visiting.
But.
I'm starting to see cracks in Stack Overflow as its popularity grows. At some point you have to face up to the hard reality: there are an infinite number of bad questions that can be asked in willful ignorance.
However smart our software, however smart our users -- we can't scale enough to defeat a million monkeys randomly typing. Not possible.
I worry that we're not doing enough to automatically filter out obviously bad / malicious / inept questions from the system, before the burden of having to deal with these questions lands on our talented audience of answerers.
It's an explicit goal to make moderation easy and effortless. I can't in good conscience say we're doing that, if users have to face down a neverending flood of truly horrible, careless questions.. and hope for an occasional gem to float along.
What can we do -- what do you suggest -- to detect and prevent these kinds of bottom-of-the-barrel questions from even entering our system in the first place? I am willing to sacrifice a small percentage of new questions (up to 10%) as collateral damage if necessary.
(hint as a starting point: think new user / IP address restrictions, around question asking, perhaps based on history..)
I think there are several things you can do, and they should all be taken as parts of a whole: They should be implemented together for full effect.
Extend "Vote to Close" voting limits for duplicates. In other words, don't rate limit voting to close duplicates.
Effect: This will solve a common problem that these types of questions have: they've been asked before [1]. It will also allow the question to be closed and merged, and then editors can clean up the question (and we generally do).
Create a flag reason called "Flag for editing" [2]. Add this to the existing flag reasons. I can't be everywhere; but I can guarantee if I saw posts that were flagged for editing I'd give those my attention (they should accumulate like other flags so that I can tell which ones need it most).
Editors like editing questions. I happen to love trying to find clearer ways to help the asker ask his question. It's a fun and interesting challenge for me.
Give out a badge for users that edit questions that were at negative score and then received 5 upvotes after their edit. This is only for the editor that changed the most percentage from the original post, and only the first editor to do so before it accumulated the votes (or items that make it work better).
If a user has an excessive number (3?) of posts flagged 6 or more times as needing editing, he's rate limited in the next question he can ask. That's the built-in time-delay system.
My train of thought on the matter goes something like this:
Users who consistently ask bad questions are not just bad communicators. They also don't care. If they cared, they'd be in there at least adding information and thus bringing it to the attention of someone who can edit it, and the quality would improve over time both through edits and their own practice of at least trying to get an answer.
Users who don't care deserve to get banned. Most won't even care (it's kinda their thing). The exception is those who'll try to make a stink about it, but we can deal with them. I think the one real requirement for participation is that you care about your content.
Users who don't care about their questions also won't care about creating new accounts. This means pretty much anything we do will require going quickly for the IP-level restriction. Otherwise, we've just made the problem harder to track and sent the users a little further underground.
Unfortunately, IP-level restrictions for this behavior seem ... dangerous in terms of friendly fire. Even if we don't catch any active users this way, we'll certainly catch some new users (which is probably worse, as an active user can complain and get re-instated, and they tend to provide more answers than questions anyway).
I think keeping the site "low friction" is an important key to its continued success over time. Some day Jon Skeet and other high-level contributors will all decide to move on (hopefully not on the same day). We need to be careful not to turn away tomorrow's top user today. I think anything that adds friction to the asking process is dangerous. A user with 1 rep should always be able to post their first question, even if they share an IP with an imbecile.
This all means that as tempting as it is, I don't think restricting or limiting low rep accounts from asking new questions is a good idea. I come at the problem from the other direction — increase our ability to moderate these questions. Don't reduce the rep required to gain access to abilities, but do increase the power of those abilities or reduce the number of people required to participate. With that in mind, here are some suggestions:
Based on the feedback from this post, we have now implemented a form of screening during the question ask period.
Questions from IP addresses or accounts with a history of extremely poor questions will no longer be accepted. This is intended to weed out the worst quality questions.
(hint hint, question votes matter, so please continue to vote the best questions up and vote the worst questions down.)
Based on our queries and a random audit sample of affected accounts / IP addresses, it seems effective, but we'll have to see now that it's deployed and perhaps tweak further.
Note, the /ask page error text is of the form:
Additionally, see How does Stack Overflow attempt to prevent low-quality questions and answers? [1] for several other measures we now take to assist in keeping quality high.
[1] http://meta.stackoverflow.com/questions/84668/how-does-stack-overflow-attempt-to-prevent-low-quality-questions-and-answers
I've suggested this before, but I'll suggest it again - when a question is closed, even temporarily, the people who voted to close it get their close votes back. This would allow repeated bad questions from the same user, which is a common phenomenon, to be closed without draining the available close-vote pool.
I'd also like to see more stress put on the notion that closing is a good thing to do - currently I think many people see it as something only done by evil bastards like me. This could be achieved by use of badges, blog/meta postings encouraging it, and more involvement in the process by those who should know better.
A question delayer.
Start by giving new users the benefit of the doubt: a first-offense bad question is permissible. Someone will fix the question and if the user just missed something he'll see the edits and hopefully understand what he did wrong. Smart users are smart, even if they look dumb when they show up.
Subsequently, the user will ask more questions. Using some metric based on what already exists (votes, accept rate, number of closes/deletes/flags, never answers/votes), determine if the user sucks and if so how much. The metric is open to elaboration by the commenters of this answer and will need to be studied by examining the data to see if this idea correlates with bad users.
Having definitive information that the user sucks, delay his question from appearing in the list of questions. Maybe just an hour, maybe a day, maybe a week depending on how bad the user is. Presumably the user is lazy and wants an answer quickly. Stack Overflow is good at getting answers quickly. Circumvent this for users taking advantage of it.
Obviously, if the user had more time he might take that time to edit his question to be better. Or he might take the time to do research on his own while he waits for the question to appear.
Look, the point here is that Stack Overflow answers questions fast and this really fosters help vampirism [1]. Adding a delay from the time the user enters the question that scales with the badness metric of that user and making it obvious to the user why this is happening will either drive the user away (no loss) or make him realize that he needs to put more effort into his questions (no loss).
Either way it's a win for the community.
So how's that sound?
[1] http://slash7.com/2006/12/22/vampires/
How about a "StackOverflow induction"?
The induction would basically be a 5-minute test of the user's understanding of basic SO concepts and skills such as:
I imagine it would be interactive, and styled to feel like SO. For instance the user could be given a list of titles, a list of question texts, and a list of tags, and would be required to pick the best title, best question body, and best tags, and enter them as if they were asking a real SO question. There would be some source code in the body and the user would need to apply appropriate code formatting to it. There could also be some test of his ability to comment/vote/select an answer.
Now, users who have, let's say less than 250 rep can be "recommended for induction" by other SO users. (A nice place to put the "recommend for induction" button/checkbox would be on the Edit page. Or perhaps if it's noticed that an edit substantially changes a question - particularly its title, tags and formatting - the editor has the "recommend for induction" option.)
After getting such a recommendation, the user receives a prominent message suggesting they take the induction. If a user gets say 3 such recommendations, he will be prevented from asking any more questions until he completes the induction.
The worst offenders will probably be too lazy to do this and good riddance to them. The others will at least know that code formatting exists...
You requested stats sir, well here you go:
I think there are 2 patterns that need addressing:
Some things that we could possibly do (still needs a bit of thought):
Reduce the visibility of newbie questions until they get an upvote. Perhaps add a new section for "newbie questions", once they get N votes allow them into the normal sort orders.
Tax people who ask tons of crap questions, (−N rep for a new question beyond a certain point, if and only if on average your questions are crap)
Require a rep threshold for asking questions beyond the first N questions.
Look at better ways to give high rep users better visibility of these questions as they happen. (perhaps some pages for high rep users to browse through new user questions or something along that line)
Get rid of crap questions, close and delete any questions that provide no value.
Provide an incentive to repair old poorly phrased questions and posts (either badges or rep)
Longer term
This seems to me too obvious to be plausible, but I'll post it anyhow
if rep_of_asker < 100 and rep_of_closer > 10000:
    treat_closer_as_diamond_moderator()  # one vote is enough, no limit
This would allow a large group of people to note and quarantine really bad questions. And also to reopen them for business if someone edits them into some useful shape.
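For what it's worth, the pseudocode above could be fleshed out roughly like this. It is only a sketch: the 100 and 10000 thresholds come from the post, while the function name `close_votes_needed` and the normal count of five close votes are assumptions made for illustration.

```python
# Assumed normal number of close votes; not a confirmed site setting.
DEFAULT_CLOSE_VOTES = 5


def close_votes_needed(rep_of_asker: int, rep_of_closer: int) -> int:
    """Return how many close votes this closer's vote requires (hypothetical helper)."""
    # Thresholds taken from the pseudocode in the post.
    if rep_of_asker < 100 and rep_of_closer > 10000:
        return 1  # the closer's single vote is enough
    return DEFAULT_CLOSE_VOTES
```

The same check could gate reopening, so that a question edited into useful shape can be reopened just as quickly.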
I've noticed this a lot, and it's certainly impacting my willingness to provide answers on SO.
A lot of the bad questions aren't duplicates - they are simply bad questions: Badly written, badly formed, with insufficient information. "What wrong with my codes?" followed by 200 lines of atrocious C++ that's inevitably unrelated to the real problem...
I think questions seem to usually come from 'fresh' users (<10 rep, joined within last 48 hrs), often with a poor grasp of spelling and/or grammar, let alone programming.
I've got two suggestions:-
a: "Cooling off" period: No anonymous questions, and you can only ask a question ?48 hours AFTER registering for the site or when your rep reaches ?50. Users with 1K rep could 'recommend' 1 new user per week - with the new user automatically getting the required 50 rep.
b: "Captcha". Captcha is designed to differentiate computers from people - I'm thinking of a kind of captcha that can distinguish "professional and enthusiast programmers" from, well, everyone else. You'd get the "captcha" before your question was accepted by the site if your rep was lower than, say, 50. The captcha question would be a simple multiple choice, selected possibly according to the tags specified on your question. Failing the captcha locks you out from asking for 24 hours. Let's say you ask a question tagged C++. You could get asked: "How many states does a bool have? 0/1/2/3/4/5"
Dumb example, but you get the idea...
I think you do need help from the Community on this. I think you need some UI to gather feedback from appropriate high-rep users about these questions. A "flag as bad question" is a minimum feature.
Similarly, I think you need a "flag as not an answer", since I'm seeing a whole lot of those. I currently flag for moderator attention and say "not an answer", but I think you might want a separate category, if only so that you can track how many of these are happening, and from which users.
And I think you may need more than this. I think you also aren't getting certain things across to some users. I'm seeing a lot of users who don't understand tags, for instance. Also, a large number who don't bother learning how to format code. I actually saw two cases of [code][/code] tags in the last week. Similarly, there are a large number who create some really crap titles. My favorite of the day is " Data not returned [1]".
I know that one of the goals of the site was to have Google as our front page. That's great for people who are searching for answers - it should be fast and easy to do so. However, I think we need to slow down some of the new users who come here and just dump their crap questions on us (pun intended).
If you had a valid email address from these users, I'd suggest you send them a one-page "read this or else" document, but without that I'm not quite sure what to do. Maybe accept their question, but before actually posting it to the site, display a page saying something like, "Are you sure this isn't crap?", and giving a little checklist:
New users are laying waste to the collective pool; strike them down at first sight with extreme prejudice.
If one of a new user's first three posts (within their first couple of days/first week) is deleted as spam/offensive, they should be automatically put on suspension at the very least.
Better still, if they're down two and out, just delete all their posts and save the moderators a good deal of time from ploughing that muck.
There's no way a new user to the system posting junk from the outset wants to learn or will bother with constructive posts.
If they quickly rack up spam/offensive deleted posts, just destroy them already. They're a lost cause.
Auto-suspension and/or auto-Windexing will save moderators time in setting the suspension period and blasting caps and allow for the true community model of keeping the place clean.
Remember, together we can all put out these fires.
Lurking and posting answers is a good way to learn the system and what's expected of you as a Stack Overflow/Super User/... participant.
So allow users to post their first question "free of charge" - after all they've come here because they've got a problem.
Then don't allow them to post again until they've done some/all of the following:
There would need to be IP logging, e-mail checking etc. to prevent users just posting using a new ID & probably some other stuff I've not thought of.
While it won't totally prevent bad questions from turning up it should slow down the rate at which they do to a manageable level.
Maybe you should add a throttle for new questions based on the average of votes of the user's past questions*.
For example, if the average scoring of questions asked in the last 30 days is -0.5 or less (and the user posted at least 2 or 3 questions), then throttle the account to a maximum of 1 question per day.
If the average is -3 or less, throttle the account to a maximum of 1 per week. If the average is -5 or less then don't let the user post questions for 30 days. The numbers obviously can be tweaked.
This should include deleted questions. Maybe in calculating the average, cap the minimum score to -6, so users won't get suspended for a very long time if they post a single exceptionally bad question.
*Everything should be both account based and IP based.
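To make the tiers above concrete, here is a minimal sketch. The thresholds (-0.5, -3, -5), the -6 floor and the 30-day window are the numbers suggested in this answer; the function name, the tier strings and the choice of 2 as the minimum question count are assumptions for illustration, and all of them "obviously can be tweaked".

```python
from typing import Optional, Sequence

MIN_QUESTIONS = 2  # "at least 2 or 3 questions"; 2 is an assumed choice
SCORE_FLOOR = -6   # per-question floor so one terrible question can't dominate


def question_throttle(scores_last_30_days: Sequence[int]) -> Optional[str]:
    """Return the throttle tier for a user, or None if no throttle applies.

    `scores_last_30_days` holds the score of every question (including
    deleted ones) the user asked in the last 30 days.
    """
    if len(scores_last_30_days) < MIN_QUESTIONS:
        return None
    capped = [max(score, SCORE_FLOOR) for score in scores_last_30_days]
    average = sum(capped) / len(capped)
    # Check the harshest tier first.
    if average <= -5:
        return "no questions for 30 days"
    if average <= -3:
        return "1 question per week"
    if average <= -0.5:
        return "1 question per day"
    return None
```

The same function would be run once on the account's questions and once on the questions from its IP address, taking the stricter of the two results.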
I would suggest the following combination:
Incentivizing the finding and closing of duplicates is a must. Stack Overflow's question base is already being swamped with too many dupes. This is going to become a problem.
While Stack Overflow in general has very few barriers, I feel clear feedback and timed suspensions would be the better approach for this problem. How about introducing a "bad question" flag as suggested by John Saunders. Five flags close the question. The user would get a clear message along the lines of
Substandard question
The question is entirely unintelligible, or is part of a series of questions viewed as substandard by the community. While Stack Overflow welcomes users new to programming and questions of all levels of experience, some effort when asking a question is expected.
From my own experience with users with patterns of bad questions, I'd say five to seven questions closed this way would have to result in a 24 hour suspension; twenty flags in a deletion of the account.
That is, if it is made sure that really only the crappy questions get flagged. You and I can tell what a crappy question is when we see them, but they are hard to define. A "bad" flag mustn't be misused by users who don't like a question, e.g. in open/close wars. Maybe the flags could be made contestable in that you can appeal to a moderator to have it "un-badded". I'm pretty sure most users who ask bad questions and are unwilling to improve them - except maybe for the few real trolls - would not make use of the possibility. An addition to the message above could be:
If you feel your question was flagged unjustly, you can flag the question for review by a moderator.
One thing I would like to add; 12 close votes per day is not enough; I would prefer you to increase the limits to 30.
I make a habit of only closing questions that have 4 close votes, and there was not a single day when 12 close votes was enough; there are just too many lousy questions, inappropriate questions, duplicated questions, belongs-on-superuser/serverfault questions that require merciless closing.
In addition, to create incentive to close questions, you should really, really award badges for users who already close a certain number of questions [1]. Better yet, this set of badges should be able to be awarded numerous times.
[1] http://meta.stackoverflow.com/questions/25053/badge-for-closing-question
I don't think it's the luser n00bs who are the problem. It's regulars who see bad behaviour and endorse it, by providing answers to duplicate questions.
make down-votes on questions free.
The first few things I can think of are possibly some sort of grammar filter, or even "community enacted restrictions"?
If we know many of the characteristics of annoying questions, possibly make a filter for them - e.g. a question with only one paragraph and/or 50 characters, a question that has only one paragraph with over 500 characters, any question that has more than five question marks in less than two paragraphs of text etc.
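A rough sketch of such a filter, using only the three example rules above. The function name and the reason strings are made up, paragraphs are assumed to be separated by blank lines, and "one paragraph and/or 50 characters" is read here as "at most one paragraph and at most 50 characters"; other readings are possible.

```python
import re
from typing import List


def suspicious_reasons(body: str) -> List[str]:
    """Return the reasons a question body looks suspicious (empty list if none)."""
    text = body.strip()
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    reasons = []
    if len(paragraphs) <= 1 and len(text) <= 50:
        reasons.append("very short")
    if len(paragraphs) == 1 and len(text) > 500:
        reasons.append("wall of text")
    if len(paragraphs) < 2 and text.count("?") > 5:
        reasons.append("too many question marks")
    return reasons
```

For example, `suspicious_reasons("help plz")` returns `["very short"]`, while a two-paragraph question with a description and a code excerpt returns an empty list.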
The next one seems a bit overkill, but if you have taken the time to make your question, I am guessing that you are annoyed at the situation and do not mind thinking slightly outside the box / making a bigger system for the greater good.
The idea is an additional button next to the link/flag/edit buttons - a "Bad Question" button - What this button can do is for starters is display all questions with over 5 markings on an additional section e.g. stackoverflow.com/bad_qs and allow people to suggest new grammar filters based on past questions. (Community Bayesian filter? :S ).
Or if you do not like the above, perhaps even as well as that, the button for "Bad Question" could be linked to the users account as well.
If the user gets 2 or more for a single question, the next time they ask a question, it can go to an FAQ about writing nicely formatted questions.
If the user gets 2 or more on a second question, it then makes a proper warning saying that restrictions may follow.
If the user gets reports on another question, they are then throttled by whatever you decide.
As most likely the users will only be posting one question every few days (unless I have underestimated the bad question problem), I am not sure what the limits can be - but I think the actual warning would be all that is needed in most cases.
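The escalation could be sketched as below. The "2 or more" threshold is from the steps above; the function name and the action strings are invented, and applying the same threshold to the third question is an assumption, since that step only says "gets reports".

```python
from typing import Sequence

REPORTS_PER_QUESTION = 2  # "2 or more" Bad Question reports on one question


def next_ask_action(reports_per_question: Sequence[int]) -> str:
    """Decide what happens the next time this user asks a question.

    `reports_per_question` holds the number of "Bad Question" reports
    each of the user's past questions received.
    """
    flagged = sum(1 for n in reports_per_question if n >= REPORTS_PER_QUESTION)
    if flagged == 0:
        return "allow"
    if flagged == 1:
        return "show FAQ"  # first offence: point at the formatting FAQ
    if flagged == 2:
        return "warn"      # second offence: restrictions may follow
    return "throttle"      # third offence onwards
```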
I am not a fan of IP based restrictions on this sort of thing because I think there may be many companies where multiple users will use the system, although, it may work to disallow new users from an IP that has had excessive "Bad Answer" type warnings?
I am not the best at English/Grammar, so I am not really the best to come up with rules, but I hope you get the basic idea and like them!
Possible spam to Bad question. Spam means indiscriminately bulk advertising a commercial product. That word should not be used for bad questions that contain no advertisements. - Andreas Bonini
The How to ask [1] page desperately needs examples. For example:
If you ask a vague question, you'll get a vague answer. But if you give us details and context, we can provide a useful answer.
is vague about the definition of the word "vague"! What are details and context? These are spoken like the reader knows what we mean by them.
I'd prefer to see some good examples & bad examples, and what makes them good and bad, to give the user who really does want to ask some toehold. Maybe like:
BAD: My php page is not showing the mysql resutls when i click "OK" but i get a Internal Error WTF?
GOOD: I'm using PHP and MySQL and I submit a form, and I'm getting a 500 Internal Error. Here is the query I'm trying to run:
select * from users where....
This may well be the very first time the user has posted to a site like this, and has never considered these questions. I don't mind having an EULA, but let's make it useful for the user, too.
More guides along the right on the submission form might help. There's a link to the FAQ [2] but the FAQ doesn't cover what a user posting a question wants to know. A user asking a question doesn't need to worry about how reputation works.
SO is also treating the "how to ask a question" from the point of view of "Here's how we want you to ask the question." It feels like Question Police. The focus should instead be "Here's how to get more people to give better answers to your question" or "How to get the best answer to your question" or "How to get your question answered faster." All of those are why we guide the users to ask questions better, but we don't tell them that!
[1] http://stackoverflow.com/questions/how-to-ask
You're forgetting that most of the good questions have already been asked on SO. I've seen the same thing happening at Experts Exchange, where several topics started to become saturated with the hard questions and answers, thus anyone with some common sense would just find his answer in an existing question with no need to post a new one. But there will always be people who ask low-quality questions, who are new here and who don't understand the system. But while the high-quality questions start to slow down because the system is saturated, the low-quality questions will never run out. SO is at a point where it's hard to ask a high-quality question that hasn't been asked before. But there's still plenty of room for low-grade questions. So I fear that we just have to accept that as the popularity of SO rises, the quality of the questions will just drop a bit.
Idea 1:
There are language models and classifiers. I bet that a little NLP would identify a significant fraction of the drek.
However, I do not propose to auto-reject drek. Just to force moderation. Questions that flunked the NLP 'remotely like a question we've ever liked' test would go into limbo, and emerge only if rescued by someone with some rep. It would pop up on the post button, with a message like, 'Your question does not appear to be made up of conventional English sentences. Do you want to edit further? If not, it will go into a moderation queue.'
Idea 2: Give up on 'no registration.' OK, you won't like this one.
Idea 3: Give many more people more tools to resolve questions by attaching them to extant, better, questions. One of the results of success is that many new questions, even non-drekky, are not so new. This has been thrashed in other threads, but I continue to believe that 'sit for a while accumulating answers while 5 close votes pile up, then wait for an overworked diamond to work the magic merge machinery' is not good enough. E.g., allow OPs to agree with a proposed duplicate and trigger a merge, or allow non-diamonds to work the merge machine.
My concern with any system is what effect it will have on legitimate new users. Assuming anyone with under 20 rep is in some way suspicious or in need of a kiddy pool means that the initial experience of new people is one of negative trust. As it sits right now, anyone can ask any question, so the barrier to entry is pretty low. We should probably keep that rather than relegate new users to an even less trusted state. Everyone starts with neutral reputation, I think that's key.
It does mean that one-shotters, who discover the site and think
Gee, there are a lot of smart people here. I wonder if they can explain what a pointer is.
And ask it. It's like they don't trust the search engine.
Once they've been here, then we can start assigning negative reputation to them.
Language is hard, since we do have a lot of ESL users. Much as I'd like to see some kind of question-filter that pings when a post contains no capital letters, using a shift-key is not strictly required to communicate meaning. This is the kind of thing that might go into a 'might need editing' queue of some kind rather than not get posted at all.
Part of the problem here is that we don't really know anything about these problem users. We are making assumptions about their newbie-ness, motivations, laziness, language skills etc., but we don't really know. I think this makes finding a solution more difficult.
Because if they're newbies, the solution is education. If they're lazy, the solution is making their life more difficult. If their problem is language skills, maybe we need a 'excusemyenglish' tag that they add themselves. etc.
I think flagging for editing isn't a bad idea, though I agree that in most cases it's easier just to go in and edit the question yourself (if you have the rep).
This may sound very naive (since I'm no professional programmer), but have you considered teaching a neural network (or other kind of AI) your decisions to assist (not replace!) moderation?
I would like to be able to single-handedly delete the really gnarly questions that come by the 10K tools. Questions that are pure unredeemable noise. E.g., http://stackoverflow.com/questions/3288720/can-no-one-answer-this-question-o-closed.
Another thing is that questions have something like a 2 day minimum before they can be killed. Not all questions deserve this honor.
One actionable suggestion I have is this:
In respect to problem users, I would suggest this actionable suggestion:
The reasoning here is that Bad Users are Bad. They aren't going to Be Good and Ask Good Questions (if they asked good questions, they wouldn't be bad).
I am a fairly prolific question-asker (in relation to my answering), and it doesn't look like I have a single negative question (at the moment... looks around for vengeful meta users).
How about allowing users to filter questions based on the asker's reputation? If "bad" questions offend you, don't look at any until the asker has amassed a few hundred reputation.
Sorry, I know I'm coming to this discussion late, but I think there's an outside-the-box type of solution: provide localized SO sites. I know I'm beating a dead horse because you and Joel talked this to death on one of the podcasts, but I really think a lot of the bad questions are people who can barely communicate in English. It's not their fault that the best site ever made for programmers requires English.
Place first posts in quarantine. The first post of new user wouldn't be visible publicly on the site until it is approved by an experienced user.
Right now, the First Posts review queue is empty, so it would only add a small delay to their publication.
Another benefit is that it might well reduce the Close vote queue.
You are already trying to estimate the quality of the style of the written text in questions and answers. However, there's something that should be easier to implement: If there is code, auto-lint it. If there is stuff that should be formatted, lint the formatting. At least for
and other things like that, show lint warnings or errors to the user. If a low-rep user chooses to ignore the lint warnings, auto-flag his post for high-rep-user attention and put the lint warnings and errors in his question text. Lint errors for stuff like
class foo {
public int A;
int b;
static void main(String[] args){
}
should not be ignorable for the user.
A user who does not care enough to at least use proper indentation probably is not a good member for the SO community.
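A very small indentation lint along these lines might look like the sketch below. It is deliberately naive and everything about it is an assumption for illustration: it only understands blocks opened by a `{` at the end of a line and closed by a line starting with `}`, and it ignores braces in strings and comments. A real linter would be per-language.

```python
from typing import List


def lint_brace_indentation(code: str) -> List[str]:
    """Return warnings for lines that are not indented inside their {} block."""
    warnings = []
    opener_indents = []  # indentation of each line that opened a still-open block
    for number, line in enumerate(code.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        if stripped.startswith("}"):
            if opener_indents:
                opener_indents.pop()
            else:
                warnings.append(f"line {number}: unmatched closing brace")
        elif opener_indents and indent <= opener_indents[-1]:
            warnings.append(f"line {number}: not indented inside its block")
        if stripped.endswith("{"):
            opener_indents.append(indent)
    if opener_indents:
        warnings.append(f"{len(opener_indents)} unclosed block(s)")
    return warnings
```

Consistently indented code with balanced braces yields an empty list; code where the body of a block sits at the same level as the line that opened it gets one warning per offending line.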
Examples:
{}-delimited blocks that contain code on the same indent level)
} followed by a } that is indented more)
A downside to this approach is that not all questions contain code or so.
[1] http://www.emacswiki.org/emacs/TabsSpacesBoth
; in JS, but there are some, like uppercase classnames, that you will just have to learn at some point, and learning earlier is better, I think. Also, questions and answers should be reusable as learning material, so the code style in questions has an impact on readers. - thejh
Allow to sort newest questions by score.
Then bad questions won't get too much attention. They won't attract users who upvote answers, therefore they won't attract bad answerers whoring some quick rep.
Next, allow these "bad" old questions to be closed as duplicates of new, better-worded ones when they are asked. Currently it's impossible to close an old question as a dupe of a new one (at least I recently couldn't).
How about allowing bounties against a question? That is, I give someone a bit of reputation if they capture, dead or alive (close or delete), duplicate or otherwise inappropriate questions.
I know there's flagging, but that hardly ever seems to work.
Proposal: Below the field for asking a question there is a field for Tags. We could add a mandatory selection box for "I tried something", for the asker to tick one of two check boxes (_ Yes _ No), and an extra field for a brief description of what was tried (an Excel formula, "The subroutine in the question", etc.; there are many options), which will be described in detail in the question itself.
It is worth noting that:
I guess whether it will be a good method depends on the balance between these two aspects (and perhaps something else I am missing). If something like this is ever implemented, its usage may even evolve in time, so it is perhaps hard to qualify it simply as "useful" or "not useful".
Take this as brainstorming.
PS1: As a possible alternative, add a field for the asker to complete: "For this question, I googled the following: _____________"
PS2: If this works, it might be expanded to other fields, e.g., "Is this a question of General Reference [1]?" (_ Yes _ No)
PS3: Some are pessimistic about this endeavor [2].
PS4: Many of the answers here actually focus on "How can we deal with low-quality questions that already entered our system?" The preventing action in these cases is by deterrence for future occasions, but not for the current about-to-be-posted question. I find it comforting that the accepted answer [3], even if not the most upvoted, addresses the original question.
[1] http://meta.stackoverflow.com/a/102813/232130
-1 for not enough jQuery is just a meme I guess. - Amarghosh