wip
@@ -47,7 +47,7 @@ For each community on StackExchange, a \emph Meta page is offered where members
\section{State of the Art}

Since the introduction of Web 2.0 and the subsequent emergence of platforms for social interaction, researchers have been investigating the emerging online communities. Research focuses strongly on the interactions of users on various platforms. Community knowledge platforms are of special interest, for instance, StackExchange/StackOverflow \cite{slag2015one, ford2018we, bazelli2013personality, movshovitz2013analysis, bosu2013building, yanovsky2019one, kusmierczyk2018causal, anderson2013steering, immorlica2015social, tausczik2011predicting}, Quora \cite{wang2013wisdom}, Reddit \cite{lin2017better, chandrasekharan2017you}, Yahoo! Answers \cite{bian2008finding, kayes2015social}, and Wikipedia \cite{yazdanian2019eliciting}.
These platforms allow communication over large distances and facilitate fast and easy knowledge exchange and acquisition by connecting thousands or even millions of users, creating valuable repositories of knowledge in the process. Users create, edit, and consume small pieces of information and collectively build a community and a knowledge repository. However, not every piece of information is factual \cite{wang2013wisdom, bian2008finding}, and platforms often employ some form of moderation to maintain the value of the platform and to ensure a certain standard within the community.
%allow communication over large distances
%fast and easy knowledge exchange
@@ -136,10 +136,11 @@ Unwelcomeness is a large problem on StackExchange \cite{ford2016paradise}\footre
\subsection{Keeping users engaged, contributing and well behaved}
While attracting and onboarding new users is an important step for growing a community, keeping them on the platform and turning them into long-lasting community members is equally important for growth as well as sustainability. Users have to feel the benefits of staying with the community. Without these benefits, a user has little to no motivation to interact with the community and will most likely drop out of it. Benefits are diverse; however, they can be grouped into five categories: information exchange, social support, social interaction, time and location flexibility, and permanency \cite{iriberri2009life}. %TODO look at refs of table 4 in \cite{iriberri2009life} and add refs if applicable
As StackExchange is a CQA platform, the benefits from information exchange, time and location flexibility, and permanency are more prevalent, while social support and social interaction are more in the background. Furthermore, StackExchange is driven by the community and therefore depends even more on the voluntarism of its users, making benefits even more important. %TODO somehow a ref?

In a community, users can generally be split into two groups by their motivation to contribute voluntarily: one group acts out of altruism, where users contribute in order to help others and do good for the community; the second group acts out of egoism and selfish reasons, for instance, getting recognition from other people \cite{ginsburg2004framework}. Users of the second group still help the community, but their primary goal is not necessarily the health of the community but gaining reputation and making a name for themselves. In contrast, users of the first group primarily focus on helping the community and see reputation as a positive side effect, which also feeds back into their ability to help others. While these groups have different objectives, both need recognition of their efforts \cite{iriberri2009life}. There are several methods for recognizing the value a member provides to the community: reputation, awards, trust, identity, etc. \cite{ginsburg2004framework}. %TODO maybe elaborate on reputation, awards, trust, identity, see paper ginsburg2004framework

%TODO first 2 sentences buuuuu
Voluntarism is always a key part of communities. The backbone of a community is always its user base. Even if the community is led by a commercial core team, the community is almost always several orders of magnitude larger than the number of paid employees forming the core team. The core team often provides the infrastructure for the community and does some community work. However, most of the community work is done by volunteers of the community \cite{butler2002community}. %TODO get number on employees and volunteers on stackexchange/overflow
%This is also true for the StackExchange platform where the core team of paid employees is XXX and the number of voluntary community members performing community work is XXX \footnote{\url{LINK}}
@@ -182,6 +183,7 @@ Volunarism is always a key part in communities. The backbone of a community is a
%badge system
%quality

%TODO improve this paragraph, maybe double in length
StackExchange employs several features to engage users with the platform, for instance, the reputation system and the badge system. These systems reward contributing users with achievements and encourage further contribution to the community. Both systems try to maintain and increase the quality of the posts on the platform.

Reputation plays an important role on StackExchange: it indicates the credibility of a user, and high-reputation users are a primary source of high-quality answers \cite{movshovitz2013analysis}. Although the largest share of all questions is posted by low-reputation users, high-reputation users post more questions on average. To earn a high reputation, a user has to invest a lot of effort and time into the community, for instance, by asking good questions or providing useful answers to the questions of others. Reputation is earned when a question or answer is upvoted by other users, or when an answer is accepted as the solution to a question by the question's creator. \citeauthor{mamykina2011design} found that the reputation system of StackOverflow encourages users to compete productively \cite{mamykina2011design}. However, not every user participates equally, and participation depends on the personality of the user \cite{bazelli2013personality}. \citeauthor{bazelli2013personality} showed that the top-reputation users on StackOverflow are more extroverted than users with less reputation. By analyzing the StackOverflow community network, \citeauthor{movshovitz2013analysis} found that experts can be reliably identified by their contributions within the first few months after their registration. Graph analysis also allowed the authors to find spamming users or users with other extreme behavior.
@@ -214,9 +216,28 @@ Different badges also create status classes \cite{immorlica2015social}. The hard
Quality is often a concern in online communities. Platform moderators and admins want to maintain a certain level of quality or even raise it. However, higher-quality posts take more time and effort than lower-quality posts. In the case of CQA platforms, this is an even bigger problem, as higher-quality answers compete with fast responses. In addition, StackOverflow has a problem with low-quality, low-effort questions and the subsequent unwelcoming answers and comments\footref{silge2019welcome}. StackOverflow has grown into a large community, and larger communities are harder to control. \citeauthor{lin2017better} investigated how growth affects a community. They looked at Reddit communities that were added to the default set of subscribed communities of every new user (defaulting), which led to a huge influx of new users to these communities. The authors found that, contrary to expectations, the quality stays largely the same. The vote score dips shortly after defaulting but quickly recovers or even rises to higher levels than before. Complaints about low-quality content did not increase, and the language used in the community stayed the same. However, the community clustered around fewer posts than before defaulting. \citeauthor{srba2016stack} did a similar study on the StackOverflow community \cite{srba2016stack}. They found a similar pattern in the quality of posts. The quality of questions dipped momentarily due to the huge influx of new users. However, the quality recovered after 3 months. They also identified three types of users causing the decline in quality: \emph{Help Vampires} (these spend little to no effort researching their questions, which leads to many duplicates), \emph{Noobs} (they create mostly trivial questions), and \emph{Reputation Collectors}. The latter try to gain reputation as fast as possible by methods described by \citeauthor{bosu2013building} \cite{bosu2013building}, but often with no regard for the effects their behavior has on the community, for instance, lowering overall content quality, turning other users away from the platform, and encouraging the behavior of \emph{Help Vampires} and \emph{Noobs} even more.

The authors also proposed some solutions to the quality problems. One suggestion is to restrict the openness of a community. This can be accomplished in different ways, for instance, by introducing a daily posting limit for questions \cite{srba2016stack}. While this certainly limits the number of low-quality posts, it does not eliminate the problem. Furthermore, this limitation would also hurt engaged users, who would otherwise create a large volume of higher-quality content. A much more intricate solution that adapts to user behavior would be required; otherwise, the limitation would hurt the community more than it helps.

Questions of help vampires and noobs also direct answerers away from much more demanding questions. This leads, on the one hand, to knowledgeable answerers answering questions for which they are overqualified, and, on the other hand, to a lack of answers of adequate quality for difficult questions. \citeauthor{srba2016stack} suggest a system that tries to match questions with answerers who satisfy the knowledge requirement but are not grossly overqualified to answer the question. Such a system would avoid suggesting simple questions to overqualified answerers and prevent an answer vacuum for questions on more advanced topics. This would ensure a better utilization of the answering capability of the community.
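As an illustration only, this matching idea can be written down under the assumption that each question $q$ has a difficulty score $d(q)$ and each user $u$ an expertise score $e(u)$ on a common scale; these scores and the tolerance $\delta \geq 0$ are hypothetical and not taken from \cite{srba2016stack}. The set of suitable answerers would then be
\[
  A(q) = \{\, u \mid d(q) \leq e(u) \leq d(q) + \delta \,\},
\]
so that users with $e(u) < d(q)$ lack the required knowledge and users with $e(u) > d(q) + \delta$ are considered overqualified.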

Another solution is to find content abusers (noobs, help vampires, etc.) directly. One approach is to add a reporting system to the community; however, a system of this kind is also driven by user input and can therefore be manipulated as well. This would lead to wrongly excluding users who were falsely flagged while missing a portion of content abusers completely. A better approach is to systematically find these users by their behavior. \citeauthor{kayes2015social} describe a classifier that achieves an accuracy of 83\% on the \emph{Yahoo! Answers} platform \cite{kayes2015social}. The classifier is based on empirical data: they looked at historical user activity, report data, and which users were banned from the platform. From these statistics, they created the classifier, which is able to distinguish between falsely and fairly banned users. \citeauthor{cheng2015antisocial} performed a similar study on antisocial behavior on various platforms. They, too, looked at historical data of users and their eventual bans as well as their rates of deleted posts. Their classifier achieved an accuracy of 80\%.
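To make the reported figures concrete, and assuming accuracy is used in its standard sense for a binary classifier (the cited works may define their metrics differently), it is the share of correctly classified users among all classified users:
\[
  \mathrm{accuracy} = \frac{\mathit{TP} + \mathit{TN}}{\mathit{TP} + \mathit{TN} + \mathit{FP} + \mathit{FN}},
\]
where $\mathit{TP}$ and $\mathit{TN}$ denote correctly identified abusive and legitimate users, while $\mathit{FP}$ and $\mathit{FN}$ denote legitimate users flagged as abusive and abusive users that were missed. Under this reading, an accuracy of 83\% means that roughly 17 out of 100 users are still misclassified.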

\citeauthor{ponzanelli2014improving} performed a study on post quality on StackOverflow \cite{ponzanelli2014improving}. They aim to improve the automatic low-quality post detection system that is already in place and to reduce the size of the review queue that selected individuals have to go through. Their classifier improves on the existing one by including popularity metrics of the posting user and the readability of the post itself. With these additional factors, they managed to reduce the number of misclassified quality posts with only a minimal decrease in correctly classified low-quality posts. Their improvement to the classifier reduced the review queue size by 9\%.

% other studies which suggest changes to improve community interaction/quality/sustainability
% -> help vampires, noobs, reputation collectors \cite{srba2016stack}
% -> quality solution suggestions \cite{srba2016stack}
% -> restrict openness of the community, not desirable (e.g. restrict number of questions to combat low-quality questions), will not be 100% effective \cite{srba2016stack}
% -> ``Improving Low Quality Stack Overflow Post Detection'' \cite{ponzanelli2014improving}, reduce review queue for moderators
% -> finding content abusers, yahoo answers \cite{kayes2015social}, other communities \cite{cheng2015antisocial}
% -> matching questions with answerers \cite{srba2016stack} (difficult questions -> expert users, easier questions -> answerers that know it but are not experts), don't overload experts, utilize capacities of the many nonexperts

\citeauthor{tausczik2011predicting} found that reputation is linked to the perceived quality of posts in multiple ways \cite{tausczik2011predicting}. They suggest that reputation could be used as an indicator of quality. Quality also depends on the type of platform. \citeauthor{lin2017better} showed that expert sites that charge fees, for instance, library reference services, have higher-quality answers than free sites \cite{lin2017better}. Also, the higher the fee, the higher the quality of the answers. However, free community sites outperform expert sites in terms of answer density and responsiveness.

% quality
@@ -227,6 +248,7 @@ Quality also depends on the type of platform. \citeauthor{lin2017better} showed
% other
% DONE Discovering Value from Community Activity on Focused Question Answering Sites: A Case Study of Stack Overflow \cite{anderson2012discovering} accepted answer strongly depends on when answers arrive, considered not only the question and accepted answer but the set of answers to a question
% DONE Quizz: Targeted Crowdsourcing with a Billion (Potential) Users \cite{ipeirotis2014quizz} many online communities are based on the voluntarism of users, not paid workers