This commit is contained in:
wea_ondara
2021-10-06 19:40:39 +02:00
parent b48c7c8845
commit b32d368695


@@ -1,6 +1,6 @@
\chapter{Related Work}
This section is divided into three parts. The first part explains what StackExchange is, how it has developed since its inception, and how it works. The second part presents previous and related work. The third part covers approaches to analyzing sentiment as well as methods to analyze trends over time.
\section{Background}
@@ -244,15 +244,22 @@ Different badges also create status classes \cite{immorlica2015social}. The hard
\subsection{Regulation}
Regulation revolves around the user actions and the content a community creates. It is required to steer the community and keep it civil. Naturally, some users will not have the best intentions for the community in mind. The actions of such users must be accounted for, and harmful actions must be dealt with. Otherwise, the community and its content will deteriorate.
\textbf{Content quality}\\
Quality is a concern in online communities. Platform moderators and admins want to keep a certain level of quality or even raise it. However, higher-quality posts take more time and effort than lower-quality posts. In the case of CQA platforms, this is an even bigger problem, as higher-quality answers compete with fast responses. Despite that, StackOverflow also has a problem with low-quality, low-effort questions and the subsequent unwelcoming answers and comments\footref{silge2019welcome}.
\citeauthor{lin2017better} investigated how growth affects a community\cite{lin2017better}. They looked at Reddit communities that were added to the default set of subscribed communities of every new user (defaulting), which led to a huge influx of new users to these communities. The authors found that, contrary to expectations, the quality stays largely the same. The vote score dips shortly after defaulting but quickly recovers or even rises to higher levels than before. Complaints about low-quality content did not increase, and the language used in the community stayed the same. However, the community clustered around fewer posts than before defaulting. \citeauthor{srba2016stack} did a similar study on the StackOverflow community\cite{srba2016stack}. They found a similar pattern in the quality of posts. The quality of questions dipped momentarily due to the huge influx of new users. However, the quality recovered after three months.
\citeauthor{tausczik2011predicting} found that reputation is linked to the perceived quality of posts in multiple ways\cite{tausczik2011predicting}. They suggest reputation could be used as an indicator of quality. Quality also depends on the type of platform. \citeauthor{lin2017better} showed that expert sites that charge fees, for instance, library reference services, have higher-quality answers compared to free sites\cite{lin2017better}. Moreover, the higher the fee, the higher the quality of the answers. However, free community sites outperform expert sites in terms of answer density and responsiveness.
\textbf{Content abuse}\\
\citeauthor{srba2016stack} identified three types of users responsible for lowering quality: \emph{Help Vampires} (who spend little to no effort researching their questions, which leads to many duplicates), \emph{Noobs} (who mostly create trivial questions), and \emph{Reputation Collectors}\cite{srba2016stack}. The latter try to gain reputation as fast as possible using methods described by \citeauthor{bosu2013building}\cite{bosu2013building}, often with no regard for the effects their behavior has on the community, for instance, lowering overall content quality, turning other users away from the platform, and further encouraging the behavior of \emph{Help Vampires} and \emph{Noobs}.
Questions of \emph{Help Vampires} and \emph{Noobs} also direct answerers away from much more demanding questions. On the one hand, this leads to knowledgeable answerers answering questions for which they are overqualified, and on the other hand, to a lack of adequate-quality answers for more difficult questions. \citeauthor{srba2016stack} suggest a system which tries to match questions with answerers that satisfy the knowledge requirement but are not grossly overqualified to answer the question. Such a system would prevent suggesting simple questions to overqualified answerers and prevent an answer vacuum for questions on more advanced topics. This would ensure a more optimal utilization of the answering capacity of the community.
\textbf{Content moderation}\\
\citeauthor{srba2016stack} proposed some solutions to the quality problems. One suggestion is to restrict the openness of a community. This can be accomplished in different ways, for instance, by introducing a daily posting limit for questions\cite{srba2016stack}. While this certainly limits the amount of low-quality posts, it does not eliminate the problem. Furthermore, this limitation would also hurt engaged users, who create a large volume of higher-quality content. A much more intricate solution which adapts to user behavior would be required; otherwise, the limitation would hurt the community more than it improves it.
\citeauthor{ponzanelli2014improving} performed a study where they looked at post quality on StackOverflow\cite{ponzanelli2014improving}. They aim to improve the automatic low-quality post detection system which is already in place and to reduce the size of the review queue selected individuals have to go through. Their classifier improves on the existing one by including popularity metrics of the posting user and the readability of the post itself. With these additional factors, they managed to reduce the amount of misclassified quality posts with only a minimal decrease in correctly classified low-quality posts. Their improvement to the classifier reduced the review queue size by 9\%.
@@ -266,7 +273,7 @@ Another solution is to find content abusers (noobs, help vampires, etc.) directl
% -> matching questions with answerers \cite{srba2016stack} (difficult questions -> expert users, easier questions -> answerers that know it but are not experts), dont overload experts, utilize capacities of the many nonexperts
Another solution is to find content abusers (\emph{Noobs}, \emph{Help Vampires}, etc.) directly. One approach is to add a reporting system to the community; however, a system of this kind is also driven by user input and can therefore be manipulated as well. This would exclude users flagged as false positives and miss a portion of content abusers completely. A better approach is to systematically identify these users by their behavior. \citeauthor{kayes2015social} describe a classifier which achieves an accuracy of 83\% on the \emph{Yahoo! Answers} platform\cite{kayes2015social}. The classifier is based on empirical data: they looked at historical user activity, report data, and which users were banned from the platform. From these statistics they created a classifier which is able to distinguish between falsely and fairly banned users. \citeauthor{cheng2015antisocial} performed a similar study on antisocial behavior on various platforms. They too looked at historical user data and eventual bans, as well as deleted-post rates. Their classifier achieved an accuracy of 80\%.
@@ -336,6 +343,8 @@ In general, sentiment analysis tools can be grouped into two categories: handcra
%generally fast sentiment computation
%relatively easy to update (added words, ...)
%comprehensible results
\textbf{Handcrafted Approaches}\\
Creating handcrafted tools is often a huge undertaking. They depend on a handcrafted lexicon (a gold-standard, human-curated lexicon), which maps features of a text to a value. In the simplest case, such a lexicon just maps a word to a binary value: -1 (negative word) or 1 (positive word). However, most tools use a more complex lexicon to capture more features of a piece of text. By design, they allow fast computation of the sentiment of a given piece of text. Also, handcrafted lexicons are easy to update and extend. Furthermore, handcrafted tools produce easily comprehensible results. The following paragraphs explain some of the analysis tools in this category.
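The binary-lexicon idea above can be sketched in a few lines of Python. The word list here is a hypothetical toy example, not a curated gold-standard lexicon:

```python
import re

# Hypothetical toy lexicon: word -> -1 (negative) or 1 (positive).
# Real handcrafted tools use large, human-curated lexicons.
LEXICON = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "terrible": -1, "hate": -1,
}

def lexicon_sentiment(text: str) -> int:
    """Sum the lexicon values of all known words; the sign gives the polarity."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(t, 0) for t in tokens)

print(lexicon_sentiment("a great answer, love it"))   # 2 (positive)
print(lexicon_sentiment("terrible, I hate it"))       # -2 (negative)
```

Because scoring is a single pass over the tokens, this illustrates why lexicon-based tools are fast and why adding a word to the lexicon is trivial.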
@@ -424,6 +433,7 @@ Word-Sense Disambiguation (WSD)\cite{akkaya2009subjectivity} is not a sentiment
%generalization problem (vader)
%updating (extend/modify) hard (e.g. new domain) (vader)
\textbf{Machine Learning Approaches}\\
Because handcrafting sentiment analysis tools requires a lot of effort, researchers turned to approaches which offload the labor-intensive part to machine learning (ML). However, this results in a new challenge, namely gathering a \emph{good} data set to feed the machine learning algorithms for training.
Firstly, a \emph{good} data set needs to represent as many features as possible, or the algorithm will not recognize them. Secondly, the data set has to be unbiased and representative of all the data of which it is a part. The data set has to represent each feature in an appropriate amount, or the algorithms may discriminate against a feature in favor of other, more represented features. These requirements are hard to fulfill, and often they are not\cite{hutto2014vader}. After a data set is acquired, a model has to be learned by the ML algorithm, which is, depending on the complexity of the algorithm, a very computation- and memory-intensive process. After training is completed, the algorithm can predict sentiment values for new pieces of text it has never seen before. However, due to the nature of this approach, the results cannot be comprehended by humans easily, if at all. ML approaches also suffer from a generalization problem and therefore cannot be transferred to other domains without accepting bad performance or updating the training data set to fit the new domain. Updating (extending or modifying) the training data also requires complete retraining from scratch. These drawbacks make ML algorithms useful only in narrow situations where changes are not required and the training data is static and unbiased.
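As a minimal illustration of the training step, the sketch below fits a tiny multinomial Naive Bayes classifier, one of the classic ML approaches for sentiment, on a hypothetical four-example data set. Real training sets must be far larger, balanced, and representative, as discussed above:

```python
import math
from collections import Counter

# Hypothetical toy training data (text, label); purely illustrative.
TRAIN = [
    ("great answer thanks", "pos"),
    ("love this solution", "pos"),
    ("terrible answer", "neg"),
    ("this does not work bad", "neg"),
]

def train_nb(examples):
    """Count word frequencies per class (the 'model' of multinomial Naive Bayes)."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def classify(counts, text):
    """Pick the class with the highest log-likelihood, using Laplace smoothing.

    Class priors are equal in this toy data set and therefore omitted.
    """
    vocab = {w for c in counts.values() for w in c}
    best, best_lp = None, float("-inf")
    for label, c in counts.items():
        total = sum(c.values())
        lp = sum(math.log((c[w] + 1) / (total + len(vocab))) for w in text.split())
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train_nb(TRAIN)
print(classify(model, "great solution"))  # pos
```

Note that adding a single new training example changes the counts and thus requires rebuilding the model, which hints at why updating larger ML models means retraining.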
% naive bayes
@@ -454,6 +464,7 @@ In general, ML approaches do not provide an improvment over hand crafted lexicon
% - context awareness
% - disambiguation of words if they have multiple meanings (contextual meaning)
\textbf{VADER}\\
This shortcoming was addressed by \citeauthor{hutto2014vader}, who introduced a new sentiment analysis tool: Valence Aware Dictionary for sEntiment Reasoning (VADER)\cite{hutto2014vader}. \citeauthor{hutto2014vader} acknowledged the problems that many tools have and designed VADER to address these shortcomings. Their aim was to introduce a tool which works well in the social media domain, provides good coverage of features occurring in that domain (acronyms, initialisms, slang, etc.), and is able to work with online streams (live processing) of texts. VADER is also able to distinguish between different meanings of words (WSD), and it is able to take sentiment intensity into account. These properties make VADER an excellent choice when analyzing sentiment in the social media domain.
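To illustrate the kind of rules behind such a tool, the sketch below applies two VADER-style heuristics on top of a plain lexicon: booster words scale intensity, and a preceding negation flips polarity. The lexicon and weights are hypothetical toy values, not VADER's actual ones:

```python
# Hypothetical toy lexicon with graded (non-binary) intensities.
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5}
BOOSTERS = {"very": 1.3, "extremely": 1.5}   # scale the following word's intensity
NEGATIONS = {"not", "never"}                 # flip polarity of a nearby word

def score(text: str) -> float:
    """Sum lexicon values, adjusted by booster and negation heuristics."""
    tokens = text.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = LEXICON[tok]
        # Booster directly before the sentiment word increases its intensity.
        if i > 0 and tokens[i - 1] in BOOSTERS:
            value *= BOOSTERS[tokens[i - 1]]
        # Negation within the two preceding tokens flips the polarity.
        if (i > 0 and tokens[i - 1] in NEGATIONS) or \
           (i > 1 and tokens[i - 2] in NEGATIONS):
            value = -value
        total += value
    return total

print(round(score("very good"), 2))  # 2.47  (1.9 boosted by 1.3)
print(score("not good"))             # -1.9  (polarity flipped)
```

The real VADER combines a much larger curated lexicon with additional rules (punctuation emphasis, capitalization, contrastive conjunctions) and normalizes the result to a compound score, but the principle of rule-adjusted lexicon values is the same.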
%The authors used a lexicon based approach as performance was one of the most important requirements.