StackExchange also employs a badge system to steer the community\footnote{\label{stackoverflowbadges}\url{https://stackoverflow.com/help/badges/}}. Some badges can be obtained by performing one-time actions, for instance, reading the tour page, which contains necessary details for newly registered users; others are earned by performing certain actions, for instance, editing and answering the same question within 12 hours.
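
The two badge types described above can be modeled as predicates over a user's activity log. The sketch below is purely illustrative: the event names, log format, and badge names are assumptions, not StackExchange's actual implementation.

```python
from datetime import datetime, timedelta

# Hypothetical activity log: a list of (event, post_id, timestamp) tuples.

def earned_tour_badge(events):
    """One-time-action badge: awarded for reading the tour page once."""
    return any(event == "read_tour" for event, _, _ in events)

def earned_edit_answer_badge(events, window=timedelta(hours=12)):
    """Badge for editing and answering the same question within 12 hours."""
    edits, answers = {}, {}
    for event, post_id, ts in events:
        if event == "edit":
            edits.setdefault(post_id, []).append(ts)
        elif event == "answer":
            answers.setdefault(post_id, []).append(ts)
    # Award the badge if any edit/answer pair on the same question
    # falls within the 12-hour window.
    return any(
        abs(t_answer - t_edit) <= window
        for post_id, edit_times in edits.items()
        for t_edit in edit_times
        for t_answer in answers.get(post_id, [])
    )
```
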
Furthermore, users can comment on every question and answer. Comments can be used to further clarify an answer or for a short discussion of a question or answer.

For each community on StackExchange, a \emph{Meta} page is offered where members of the respective community can discuss the associated community \cite{mamykina2011design}\footnote{\url{https://stackoverflow.com/help/whats-meta/}}. This place is used by site admins to interact with the community. The \emph{Meta} pages are also used for proposing and voting on new features and reporting bugs. \emph{Meta} pages run the same software as the normal CQA pages, so users vote on ideas and suggestions in the same way they would on the actual CQA sites.

\begin{figure}
\includegraphics[scale=0.47]{figures/stackoverflow_example_post}
\section{State of the Art}

Since the introduction of Web 2.0 and the subsequent spawning of platforms for social interaction, researchers have started investigating the emerging online communities. Research strongly focuses on the interactions of users on various platforms. Community knowledge platforms are of special interest, for instance, StackExchange/StackOverflow \cite{slag2015one, ford2018we, bazelli2013personality, movshovitz2013analysis, bosu2013building, yanovsky2019one, kusmierczyk2018causal, anderson2013steering, immorlica2015social, tausczik2011predicting}, Quora \cite{wang2013wisdom}, Reddit \cite{lin2017better, chandrasekharan2017you}, Yahoo! Answers \cite{bian2008finding, kayes2015social}, and Wikipedia \cite{yazdanian2019eliciting}.
These platforms allow communication over large distances and facilitate fast and easy knowledge exchange and acquisition by connecting thousands or even millions of users, creating valuable repositories of knowledge in the process. Users create, edit, and consume small pieces of information and collectively build a community and a knowledge repository. However, not every piece of information is factual \cite{wang2013wisdom, bian2008finding}, and platforms often employ some kind of moderation to keep up the value of the platform and to ensure a certain standard within the community.
%allow communication over large distances
%fast and easy knowledge exchange
%many answers to invaluable \cite{bian2008finding}
CQA sites are very effective at code review \cite{treude2011programmers}. Code may be understood in the traditional sense of source code in programming-related fields, but this also translates to other fields, for instance, mathematics, where formulas represent code. CQA sites are also very effective at solving conceptual questions. This is because people have different areas of knowledge and expertise \cite{robillard1999role} and because of the large user base established CQA sites have, which again increases the variety of users with expertise in different fields.

\subsection{Running an online community}
Despite the differences in purpose and manifestation of these communities, they are social communities and they have to follow certain laws. In their book ``Building successful online communities: Evidence-based social design'' \cite{kraut2012building}, \citeauthor{kraut2012building} lay out five equally important criteria online platforms have to fulfill in order to thrive:

1) When starting a community, it has to have a critical mass of users who create content. StackOverflow already had a critical mass of users from the beginning due to the StackOverflow team already being experts in the domain \cite{mamykina2011design} and the private beta\footref{atwood2008stack}. Both aspects ensured a strong community core early on.

2) The platform must attract new users to grow as well as to replace leaving users. Depending on the type of community, new users should bring certain skills, for example, a programming background in open-source software development or extended knowledge of certain domains, or certain qualities, for example, a certain illness in medical communities. New users also bring the challenge of onboarding with them. Most newcomers will not be familiar with all the rules and nuances of the community \cite{yazdanian2019eliciting}\footnote{\label{hanlon2018stack}\url{https://stackoverflow.blog/2018/04/26/stack-overflow-isnt-very-welcoming-its-time-for-that-to-change/}}.

3) The platform should encourage users to commit to the community. Online communities are often based on the voluntary commitment of their users \cite{ipeirotis2014quizz}, hence the platform has to ensure users are willing to stay. Most platforms do not have contracts with their users, so users should see benefits for staying with the community.

\textbf{One-day-flies}\\
\citeauthor{slag2015one} investigated why many users on StackOverflow only post once after their registration \cite{slag2015one}. They found that 47\% of all users on StackOverflow posted only once and called them one-day-flies. They suggest that the code example quality of these users is lower than that of more involved users, which often leads to answers and comments that first improve the question and code instead of answering the stated question. This likely discourages new users from using the site further. Negative feedback instead of constructive feedback is another cause for discontinuation of usage. The StackOverflow staff also conducted their own research on negative feedback in the community\footnote{\label{silge2019welcome}\url{https://stackoverflow.blog/2018/07/10/welcome-wagon-classifying-comments-on-stack-overflow/}}. They investigated the comment sections of questions by recruiting staff members to rate a set of comments and found that more than 7\% of the reviewed comments were unwelcoming.

One-day-flies are not unique to StackOverflow. \citeauthor{steinmacher2015social} investigated the social barriers newcomers face when they submit their first contribution to an open-source software project \cite{steinmacher2015social}. Based on empirical data and interviews, they identified several social barriers preventing newcomers from making their first contribution to a project. Furthermore, newcomers are often on their own in open-source projects; the lack of support and of peers to ask for help hinders them. \citeauthor{yazdanian2019eliciting} found that new contributors on Wikipedia face challenges when editing articles. Wikipedia hosts millions of articles\footnote{\url{https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia}} and new contributors often do not know which articles they could edit and improve. Recommender systems can solve this problem by suggesting articles to edit, but they suffer from the cold start problem because they rely on past user activity, which is missing for new contributors. \citeauthor{yazdanian2019eliciting} proposed a solution by establishing a framework that automatically creates questionnaires to fill this gap. This also helps match new contributors with more experienced contributors who could help newcomers when they face a problem.
\citeauthor{allen2006organizational} showed that the one-time-contributors phenomenon also translates to workplaces and organizations \cite{allen2006organizational}. They found that socialization with other members of an organization plays an important role in turnover: the better the socialization within the organization, the less likely newcomers are to leave. This socialization process has to be actively pursued by the organization.

\textbf{Lurking}\\
\subsection{Invoke commitment}
While attracting and onboarding new users is an important step for growing a community, keeping them on the platform and turning them into long-lasting community members is equally important for growth as well as sustainability. Users have to feel the benefits of staying with the community. Without these benefits, a user has little to no motivation to interact with the community and will most likely drop out of it. Benefits are diverse; however, they can be grouped into five categories: information exchange, social support, social interaction, time and location flexibility, and permanency \cite{iriberri2009life}.

As StackExchange is a CQA platform, the benefits from information exchange, time and location flexibility, and permanency are more prevalent, while social support and social interaction are more in the background. Social support and social interaction are more relevant in communities where individuals communicate about topics regarding themselves, for instance, communities where health aspects are the main focus \cite{maloney2005multilevel}. Time and location flexibility is important for all online communities. Information exchange and permanency are important for StackExchange as it is a large collection of knowledge that mostly does not change over time or from one individual to another. StackExchange's content is driven by the community and therefore depends on the voluntarism of its users, making benefits even more important.

%TODO abc this seem wrong here
The backbone of a community is always the user base and its willingness to voluntarily participate in the community. Even if the community is led by a commercial core team, the community is almost always several orders of magnitude larger than the number of paid employees forming the core team \cite{butler2002community}. The core team often provides the infrastructure for the community and does some community work. However, most of the community work is done by volunteers of the community.
This is also true for the StackExchange platform, where the core team of paid employees is between 200 and 500\footnote{\url{https://www.linkedin.com/company/stack-overflow}} (this includes employees working on other products) and the number of voluntary community members performing community work (users with access to moderation tools) is around 10,000\footnote{\url{https://data.stackexchange.com/stackoverflow/revision/1412005/1735651/users-with-rep-20k}}.

\subsection{Encourage contribution}
In a community, users can generally be split into two groups by their motivation to voluntarily contribute: one group acts out of altruism, where users contribute in order to help others and do good to the community; the second group acts out of egoism and selfish reasons, for instance, getting recognition from other people \cite{ginsburg2004framework}. Users of the second group still help the community, but their primary goal is not necessarily the health of the community but gaining reputation and making a name for themselves. In contrast, users of the first group primarily focus on helping the community and see reputation as a positive side effect, which also feeds back into their ability to help others. While these groups have different objectives, both need recognition of their efforts \cite{iriberri2009life}. There are several methods for recognizing the value a member provides to the community: reputation, awards, trust, identity, etc. \cite{ginsburg2004framework}. Reputation, trust, and identity are often built gradually over time by continuously working on them, while awards are reached at discrete points in time. Awards often take some time and effort to achieve. However, awards should not be easily achievable, as their value comes from the work required to earn them \cite{lawler2000rewarding}. They should also be meaningful in the community they are used in. Most importantly, awards have to be visible to the public, so other members can see them. In this way, awards become a powerful motivator for users.

%TODO maybe look at finding of https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.3093&rep=rep1&type=pdf , in discussion bullet point list: subgroups, working and less feature > not working and more features, selfmoderation
%quality

StackExchange employs several features to engage users with the platform, for instance, the reputation system and the badge (award) system. These systems reward contributing users with achievements and encourage further contribution to the community. Both systems try to keep and increase the quality of the posts on the platform.

\textbf{Reputation}\\
Reputation plays an important role on StackExchange: it indicates the credibility of a user, and high-reputation users are a primary source of high-quality answers \cite{movshovitz2013analysis}. Although the largest chunk of all questions is posted by low-reputation users, high-reputation users post more questions on average. To earn a high reputation, a user has to invest a lot of effort and time into the community, for instance, by asking good questions or providing useful answers to the questions of others. Reputation is earned when a question or answer is upvoted by other users, or when an answer is accepted as the solution to a question by the question creator. \citeauthor{mamykina2011design} found that the reputation system of StackOverflow encourages users to compete productively \cite{mamykina2011design}. But not every user participates equally, and participation depends on the personality of the user \cite{bazelli2013personality}. \citeauthor{bazelli2013personality} showed that the top-reputation users on StackOverflow are more extroverted than users with less reputation. \citeauthor{movshovitz2013analysis} found that, by analyzing the StackOverflow community network, experts can be reliably identified by their contributions within the first few months after registration. Graph analysis also allowed the authors to find spamming users or users with other extreme behavior.
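
The accrual mechanism described above can be sketched as a simple fold over a user's voting events. The point values below are illustrative assumptions loosely modeled on StackOverflow's published rules, not the platform's authoritative rule set.

```python
# Assumed point values per event; StackOverflow's real values differ in
# detail and have changed over time, so treat these as placeholders.
REPUTATION_RULES = {
    "question_upvote": 10,
    "answer_upvote": 10,
    "answer_accepted": 15,
    "downvote_received": -2,
}

def reputation(events, start=1):
    """Fold a user's event stream into a reputation score.

    Every user starts with a small base score; unknown events are ignored.
    """
    return start + sum(REPUTATION_RULES.get(event, 0) for event in events)
```
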
\textbf{Content abuse}\\
\citeauthor{srba2016stack} identified three types of users responsible for lowering quality: \emph{Help Vampires} (who spend little to no effort researching their questions, which leads to many duplicates), \emph{Noobs} (who create mostly trivial questions), and \emph{Reputation Collectors} \cite{srba2016stack}. The latter try to gain reputation as fast as possible using methods described by \citeauthor{bosu2013building} \cite{bosu2013building}, but often with no regard for the effects their behavior has on the community, for instance, lowering overall content quality, turning other users away from the platform, and encouraging the behavior of \emph{Help Vampires} and \emph{Noobs} even more.

Questions of \emph{Help Vampires} and \emph{Noobs} direct answerers away from much more demanding questions. On one hand, this leads to knowledgeable answerers answering questions for which they are overqualified to answer, and on the other hand to a lack of adequate quality answers for more difficult questions. \citeauthor{srba2016stack} suggest a system that tries to match questions with answerers that satisfy the knowledge requirement but are not grossly overqualified to answer the question. A system with this quality would prevent suggesting simple questions to overqualified answerers, and prevent an answer vacuum for questions with more advanced topics. This would ensure a more optimal utilization of the answering capability of the community.
|
Questions of \emph{Help Vampires} and \emph{Noobs} direct answerers away from much more demanding questions. On one hand, this leads to knowledgeable answerers answering questions for which they are overqualified to answer, and on the other hand to a lack of adequate quality answers for more difficult questions. \citeauthor{srba2016stack} suggest a system that tries to match questions with answerers that satisfy the knowledge requirement but are not grossly overqualified to answer the question. A system with this quality would prevent suggesting simple questions to overqualified answerers, and prevent an answer vacuum for questions with more advanced topics. This would ensure more optimal utilization of the answering capability of the community.
\textbf{Content moderation}\\
\citeauthor{srba2016stack} proposed some solutions to the quality problems. One suggestion is to restrict the openness of a community. This can be accomplished in different ways, for instance, by introducing a daily posting limit for questions\cite{srba2016stack}. While this certainly limits the number of low-quality posts, it does not eliminate the problem. Furthermore, this limitation would also hurt engaged users who would otherwise create a large volume of higher-quality content. A much more intricate solution that adapts to user behavior would be required; otherwise, the limitation would hurt the community more than it helps it.
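The daily posting limit discussed above can be sketched as a simple per-user counter. The class name, method names, and the default limit are illustrative assumptions, not StackExchange's actual mechanism:

```python
from collections import defaultdict
from datetime import date

class DailyQuestionLimiter:
    """Illustrative daily posting limit: reject questions once a user
    has reached `limit` questions on the same calendar day."""

    def __init__(self, limit=5):
        self.limit = limit
        self.counts = defaultdict(int)  # (user_id, date) -> questions posted

    def try_post(self, user_id, posted_on):
        key = (user_id, posted_on)
        if self.counts[key] >= self.limit:
            return False  # limit reached, question rejected
        self.counts[key] += 1
        return True
```

As argued above, such a fixed limit treats engaged, high-quality contributors the same as \emph{Help Vampires}, which is why a behavior-adaptive variant would be needed in practice.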
% -> matching questions with answerers \cite{srba2016stack} (difficult questions -> expert users, easier questions -> answerers that know it but are not experts), dont overload experts, utilize capacities of the many nonexperts
Another solution is to find content abusers (noobs, help vampires, etc.) directly. One approach is to add a reporting system to the community; however, a system of this kind is also driven by user input and can therefore be manipulated as well. This would lead to excluding users flagged as false positives while missing a portion of content abusers completely. A better approach is to systematically find these users by their behavior. \citeauthor{kayes2015social} describe a classifier which achieves an accuracy of 83\% on the \emph{Yahoo! Answers} platform \cite{kayes2015social}. The classifier is based on empirical data: they looked at historical user activity, report data, and which users were banned from the platform. From these statistics, they created a classifier which is able to distinguish between falsely and fairly banned users. \citeauthor{cheng2015antisocial} performed a similar study on antisocial behavior on various platforms. They too looked at historical data of users and their eventual bans, as well as their deleted-post rates. Their classifier achieved an accuracy of 80\%.
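The idea of behavior-based detection can be illustrated with a toy threshold rule. The feature names and thresholds below are illustrative assumptions, not the actual classifiers of the cited studies, which use far richer features and machine learning:

```python
def flag_content_abuser(posts_total, posts_deleted, reports_received,
                        deleted_rate_threshold=0.3, report_threshold=10):
    """Toy behavioral flagging rule (illustrative thresholds only):
    flag users whose share of deleted posts or number of received
    reports is unusually high."""
    deleted_rate = posts_deleted / posts_total if posts_total else 0.0
    return deleted_rate >= deleted_rate_threshold or reports_received >= report_threshold
```

Unlike a report button alone, a rule of this kind is driven by observed behavior rather than by manipulable user votes, which is the core advantage discussed above.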
% - very old (1966), continuously refined, still in use (vader)
% - misses lexical feature detection (acronyms, ...) and sentiment intensity (vader)
General Inquirer (GI)\cite{stone1966general} is one of the oldest sentiment tools still in use. It was originally designed in 1966, has been continuously refined, and now consists of about 11000 words, of which about 1900 are rated positive and about 2300 negative. Like LIWC, GI uses a polarity-based lexicon and therefore is not able to capture sentiment intensity\cite{hutto2014vader}. Also, GI does not recognize lexical features, such as acronyms, initialisms, etc.
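Polarity-based scoring of the GI/LIWC kind can be sketched as follows; the tiny lexicon is purely illustrative, not GI's actual word list:

```python
# Tiny illustrative polarity lexicon (not GI's actual word list).
POSITIVE = {"good", "brave", "great"}
NEGATIVE = {"bad", "evil", "horrendous"}

def polarity_score(text):
    """GI/LIWC-style polarity scoring: each positive word counts +1,
    each negative word -1. All words weigh the same, so sentiment
    intensity is lost."""
    tokens = text.lower().split()
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
```

Note that a mildly and a strongly negative word receive the same score, which is exactly the loss of intensity information discussed above.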
%Hu-Liu04 \cite{hu2004mining,liu2005opinion}, 2004
% - group synonyms (synsets) together (vader)
WordNet analyzes text with a dictionary that contains lexical concepts \cite{miller1995wordnet,miller1998wordnet}. Each lexical concept groups multiple synonymous words; these groups are called synsets. The synsets are linked by semantic relations. With this lexicon, text can be queried in multiple different ways.
%sentiwordnet \cite{baccianella2010sentiwordnet}
% - derive meaning from context -> disambiguation (vader, akkaya2009subjectivity)
% - distinguish subjective and objective word usage, sentences can only contain negative words used in object ways -> sentence not negative, TODO example sentence (akkaya2009subjectivity)
Word-Sense Disambiguation (WSD)\cite{akkaya2009subjectivity} is not a sentiment analysis tool per se; however, it can be used to enhance others. In natural languages, certain words have different meanings depending on the context they are used in. When sentiment tools that do not use WSD analyze a piece of text, words whose meaning depends on the context may skew the resulting sentiment. Some words can even change from positive to negative or vice versa depending on the context. WSD tries to distinguish between subjective and objective word usage. For example, consider \emph{The party was great.} and \emph{The party lost many votes.} Although \emph{party} is written exactly the same, it has 2 completely different meanings. Depending on the context, ambiguous words can have different sentiments.
%%%%% automated (machine learning)
%updating (extending/modifying) hard (e.g. new domain) (vader)
\textbf{Machine Learning Approaches}\\
Because handcrafting sentiment analysis requires a lot of effort, researchers turned to approaches that offload the labor-intensive part to machine learning (ML). However, this results in a new challenge, namely gathering a \emph{good} data set to feed the machine learning algorithms for training. Firstly, a \emph{good} data set needs to represent as many features as possible; otherwise, the algorithm will not recognize them. Secondly, the data set has to be unbiased and representative of the whole population of data it is drawn from. The data set has to represent each feature in an appropriate amount; otherwise, the algorithm may discriminate against a feature in favor of other, more represented features. These requirements are hard to fulfill and often are not\cite{hutto2014vader}. After a data set is acquired, a model has to be learned by the ML algorithm, which is, depending on the complexity of the algorithm, a very computation-intensive and memory-intensive process. After training is completed, the algorithm can predict sentiment values for new pieces of text which it has never seen before. However, due to the nature of this approach, the results cannot be comprehended by humans easily, if at all. ML approaches also suffer from a generalization problem and therefore cannot be transferred to other domains without accepting bad performance or updating the training data set to fit the new domain. Updating (extending or modifying) the model also requires complete retraining from scratch. These drawbacks make ML algorithms useful only in narrow situations where changes are not required and the training data is static and unbiased.
% naive bayes
% - simple (vader)
% - assumption: feature probabilities are independent of each other (vader)
The Naive Bayes (NB) classifier is one of the simplest ML algorithms. It uses Bayesian probability to classify samples. This requires the assumption that the probabilities of the features are independent of one another, which they often are not, because languages impose structural dependencies between features.
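A minimal sketch of such a classifier, with add-one smoothing (toy code, not a production implementation):

```python
import math
from collections import Counter

class NaiveBayesSentiment:
    """Minimal multinomial Naive Bayes with add-one smoothing,
    as a sketch of the approach described above."""

    def fit(self, samples):
        # samples: list of (text, label) pairs
        self.word_counts = {}         # label -> Counter of word frequencies
        self.label_counts = Counter()
        self.vocab = set()
        for text, label in samples:
            words = text.lower().split()
            self.word_counts.setdefault(label, Counter()).update(words)
            self.label_counts[label] += 1
            self.vocab.update(words)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n in self.label_counts.items():
            logp = math.log(n / total)  # class prior
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            for word in text.lower().split():
                # the 'naive' step: per-word likelihoods are combined
                # as if the words occurred independently of each other
                logp += math.log((counts[word] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

The loop over words is precisely the independence assumption: each word's likelihood is factored in without regard to the words around it.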
% Maximum Entropy
% - exponential model + logistic regression (vader)
% - feature weighting through not assuming independence as in naive bayes (vader)
Maximum Entropy (ME) is a more sophisticated algorithm. It uses an exponential model and logistic regression. It distinguishes itself from NB by not assuming conditional independence of features. It also supports weighting features by their entropy.
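For the binary case, a maximum-entropy classifier reduces to logistic regression. The following toy sketch trains one by stochastic gradient ascent on hand-made feature vectors; the data and all names are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Binary maximum-entropy model as logistic regression: learn
    feature weights by gradient ascent instead of assuming feature
    independence as Naive Bayes does."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = yi - p  # gradient of the log-likelihood
            w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)
```

Here the learned weights play the role of the feature weighting mentioned above: informative features receive large weights, uninformative ones stay near zero.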
%svm
%- mathematically demanding (vader)
%- separate datapoints using hyperplanes (vader)
%- long training period (other methods do not need training at all because lexica) (vader)
Support Vector Machines (SVM) use a different approach. SVMs place data points in an $n$-dimensional space and separate them with hyperplanes ($(n-1)$-dimensional planes), so data points fall into one of the two halves of the space divided by the hyperplane. This approach is usually very memory-intensive and computation-intensive, as each data point is represented by an $n$-dimensional vector, where $n$ denotes the number of trained features.
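The decision rule itself is simple and can be sketched as follows; the weights here are hand-set for illustration, whereas a real SVM learns $w$ and $b$ during the expensive training phase by maximizing the margin:

```python
def svm_side(w, b, x):
    """Which side of the hyperplane w·x + b = 0 the point x falls on.
    A trained SVM learns w and b by maximizing the margin; here they
    are hand-set to illustrate the decision rule only."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```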
%general remarks, transition to vader
When introducing a change to a system (an experiment), one often wants to know whether the intervention achieves its intended purpose. This leads to 3 possible outcomes: a) the intervention shows an effect and the system changes in the desired way, b) the intervention shows an effect and the system changes in an undesired way, or c) the system does not react to the change at all. There are multiple ways to determine which of these outcomes occurs. To analyze the behavior of the system, data from before and after the intervention, as well as the nature of the intervention, has to be acquired. There are multiple ways to run such an experiment, and one has to choose which type of experiment fits best. There are 2 categories of approaches: actively creating an experiment where one designs the experiment before it is executed (for example, randomized controlled trials in medical fields), or using existing data of an experiment that was not designed beforehand, or where setting up a designed experiment is not possible (a quasi-experiment).
As this thesis investigates a change that has already been implemented by another party, this thesis covers quasi-experiments. A tool that is often used for this purpose is an \emph{Interrupted Time Series} (ITS) analysis. ITS analysis is a form of segmented regression analysis, where data from before, after, and during the intervention is regressed with separate line segments\cite{mcdowall2019interrupted}. ITS requires data at (regular) intervals from before and after the intervention (a time series). The interrupt signifies the intervention, and the time at which it occurred must be known. The intervention can be at a single point in time or it can be stretched out over a certain time span. This property must also be known so it can be taken into account when designing the regression. Also, as the data is acquired from a quasi-experiment, it may be biased\cite{bernal2017interrupted}, for example, by seasonality, time-varying confounders (for example, a change in how data is measured), variance in the number of single observations grouped together in an interval measurement, etc. These biases need to be addressed if present. Seasonality can be accounted for by subtracting the average value of each of the months in successive years (i.e., subtracting the average value of all Januaries in the data set from the values in Januaries).
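A minimal sketch of the segmented-regression idea: fit the pre- and post-intervention segments separately with ordinary least squares and read off the level change at the interrupt. A real ITS analysis would additionally model confounders and seasonality, as noted above:

```python
def fit_line(ts, ys):
    """Ordinary least squares for y = a + b*t; returns (a, b)."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    b = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
         / sum((t - mt) ** 2 for t in ts))
    return my - b * mt, b

def its_effect(times, values, interruption):
    """Minimal interrupted-time-series sketch: regress the pre- and
    post-intervention segments separately and report the level change
    (jump) and slope change at the interruption."""
    pre = [(t, v) for t, v in zip(times, values) if t < interruption]
    post = [(t, v) for t, v in zip(times, values) if t >= interruption]
    a1, b1 = fit_line(*zip(*pre))
    a2, b2 = fit_line(*zip(*post))
    level_change = (a2 + b2 * interruption) - (a1 + b1 * interruption)
    return level_change, b2 - b1
```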
%\begin{lstlisting}
% deseasonalized = datasample - average(dataSamplesInMonth(month(datasample)))
%\end{lstlisting}
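The commented formula above can be implemented, for data grouped by month number, roughly as follows (a sketch; a real analysis would operate on dated observations):

```python
from collections import defaultdict

def deseasonalize(samples):
    """Subtract from each observation the average of all observations
    falling in the same calendar month across the years in the data
    set. `samples` is a list of (month, value) pairs, month in 1..12."""
    by_month = defaultdict(list)
    for month, v in samples:
        by_month[month].append(v)
    month_avg = {m: sum(vs) / len(vs) for m, vs in by_month.items()}
    return [(m, v - month_avg[m]) for m, v in samples]
```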
\chapter{Method}
StackExchange introduced a \emph{new contributor} indicator to all communities on the $21^{st}$ of August 2018 at 9 pm UTC\footnote{\label{post2018come}\url{https://meta.stackexchange.com/questions/314287/come-take-a-look-at-our-new-contributor-indicator}}. This step is one of many StackExchange took to make the platform and its members more welcoming towards new users. The indicator is shown to potential answerers in the answer text box of a question from a new contributor, as shown in figure \ref{newcontributor}. The indicator is added to a question if the question is the first contribution of the user or if the first contribution (question or answer) of the user was less than 7 days ago\footnote{\label{sonic2018what}\url{https://meta.stackexchange.com/questions/314472/what-are-the-exact-criteria-for-the-new-contributor-indicator-to-be-shown}}. The indicator is then shown for 7 days from the creation date of the question. Note that a user can be registered for a long time and then post their first question, and it is still counted as a question from a new contributor. Also, if a user decides to delete all their existing contributions from the site and then creates a new question, this question will have the \emph{new contributor} indicator attached. The sole deciding factor for the indicator is the date and time of the first non-deleted contribution and the 7-day window afterward.
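The rule just described can be expressed as a small predicate; the function and parameter names are ours, not StackExchange's:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)

def shows_new_contributor_indicator(first_contribution, question_created, viewed_at):
    """Sketch of the indicator rule described above: the indicator is
    attached if the question falls within 7 days of the user's first
    non-deleted contribution (which may be the question itself), and
    it is then shown for 7 days from the question's creation date."""
    attached = question_created - first_contribution <= WINDOW
    still_shown = viewed_at - question_created <= WINDOW
    return attached and still_shown
```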
\begin{figure}
\centering\includegraphics[scale=0.47]{figures/new_contributor}
%TODO state plots of sec 5 here and why these were chosen
% -> also limitations, other factors
This thesis investigates the following criteria to determine whether the change affected a community positively or negatively, or whether the community is largely unaffected:
\begin{itemize}
\item \textbf{Sentiment of answers to a question}. This symbolizes the quality of communication between different individuals. Better values indicate better communication. Through the display of the \emph{new contributor} indicator, the answerer should react less negatively towards the new user when they behave outside the community standards.
\item \textbf{Vote score of questions}. This symbolizes the feedback the community gives to a question. Voters will likely vote more positively (not voting instead of down-voting, or upvoting instead of not voting) due to the \emph{new contributor} indicator. Thereby the vote score should increase after the change.
\item \textbf{Amount of first and follow-up questions}. This symbolizes the willingness of users to participate in the community. Higher numbers of first questions indicate a higher number of newly participating users. Higher numbers of follow-up questions indicate that users are more willing to stay within the community and continue their active participation.
\end{itemize}
If these criteria improve after the change is introduced, the community is affected positively. If they worsen, the community is affected negatively. If the criteria stay largely the same, then the community is unaffected. Here it is important to note that a question may receive answers and votes after the \emph{new contributor} indicator is no longer shown and therefore these are not considered as part of the data set to analyze.
%only when new contributor indicator is shown
To measure the effect of the change on sentiment, this thesis utilizes the Vader\cite{hutto2014vader} sentiment analysis tool. This decision is based on its performance in analyzing and categorizing microblog-like texts, its processing speed, and its simplicity of use. Vader uses a lexicon of words and rules related to grammar and syntax. This lexicon was manually created by \citeauthor{hutto2014vader} and is therefore considered a \emph{gold standard lexicon}. Each word has a sentiment value attached to it. Negative words, for instance, \emph{evil}, have negative values; good words, for instance, \emph{brave}, have positive values. The range of these values is continuous, so words can have different intensities; for instance, \emph{bad} has a higher value than \emph{evil}. This feature of intensity distinction makes Vader a valence-based approach.
However, just simply looking at the words in a text is not enough and therefore Vader also uses rules to determine how words are used in conjunction with other words. Some words can boost other words. For example, ``They did well.'' is less intense than ``They did extremely well.''. This works for both positive and negative sentences. Moreover, words can have different meanings depending on the context, for instance, ``Fire provides warmth.'' and ``Boss is about to fire an employee.'' This feature is called \emph{Word Sense Disambiguation}.
After all these features are considered, Vader assigns a sentiment value between -1 and 1 on a continuous range. The sentiment range is divided into 3 classes: negative (-1 to -0.05), neutral (-0.05 to 0.05), and positive (0.05 to 1). The outer edges of this range are rarely reached as the text would have to be extremely negative or positive which is very unlikely.
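Mapping the compound score onto the three classes is straightforward; the sketch below uses the thresholds just given, with the boundary values $\pm 0.05$ assigned following the common Vader convention:

```python
def classify_compound(compound):
    """Map Vader's continuous compound score in [-1, 1] onto the
    three classes used in this thesis."""
    if compound <= -0.05:
        return "negative"
    if compound >= 0.05:
        return "positive"
    return "neutral"
```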
%speed
Due to this mathematical simplicity, Vader is really fast when computing a sentiment value for a given text. This feature is one of the requirements \citeauthor{hutto2014vader} originally posed. They proposed that Vader shall be fast enough to do online (real-time) analysis of social media text.
|
||||||
%simplicity
Vader is also easy to use. It does not require any pre-training on a dataset, as it already comes with a human-curated lexicon and rules related to grammar and syntax. Therefore, the sentiment analysis only requires an input to evaluate. This thesis uses a publicly available implementation of Vader.\footnote{\url{https://github.com/cjhutto/vaderSentiment}}
The design of Vader allows fast and verifiable analysis.
% lexicon approach
%valence based (sentiment intensity, (-1,1) continuous)
% broken entries, missing user id
% answers in html -> strip html and remove code sections, no contribution to sentiment
After preprocessing the raw data, the relevant data is filtered and computed. Questions and answers in the data dumps are mixed together and have to be separated, and answers have to be linked to their questions. Also, questions in these datasets do not have the \emph{new contributor} indicator attached to them, and neither do users. So, the date and time of each user's first contribution have to be calculated from the creation dates of the questions and answers the user has posted. Then, questions are filtered per user by whether they were created within the 7-day window after the user's first contribution. These questions were created during the period where the \emph{new contributor} indicator would have been displayed (in case the question was posted before the change) or has been displayed (after the change). From these questions, all answers which arrived within the 7-day window are considered for the analysis. Answers which arrived at a later point are excluded, as the answerer most likely has not seen the disclaimer shown in figure \ref{newcontributor}. Included answers are then analyzed with Vader and the resulting sentiments are stored. Furthermore, votes on questions of new contributors are counted if they arrived within the 7-day window, counting +1 for an upvote and -1 for a downvote. Moreover, the number of questions new contributors ask is counted and divided into two classes: the first question of a user and follow-up questions of a new contributor.
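The window filtering can be sketched as follows; the record layout (the \texttt{user\_id} and \texttt{created} fields) is illustrative only and does not reflect the actual schema of the data dumps:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)

def first_contributions(posts):
    """Earliest creation time per user over all questions and answers.
    The 'user_id'/'created' field names are illustrative only."""
    first = {}
    for post in posts:
        uid = post["user_id"]
        if uid not in first or post["created"] < first[uid]:
            first[uid] = post["created"]
    return first

def in_window(post, first):
    """True if the post lies inside its author's 7-day window."""
    return post["created"] - first[post["user_id"]] <= WINDOW
```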
% calc sentiment for answers
% questions do not have a tag if from a new contributor -> calc first contributor
\section{Analysis}
An interrupted time series (ITS) analysis captures trends before and after a change in a system and fits very well with the question this thesis investigates. ITS can be applied to a large variety of data, provided the data contains the same kind of data points before and after the change and the date and time of the change are known. \citeauthor{bernal2017interrupted} published a paper on how ITS works \cite{bernal2017interrupted}. ITS performs well on medical data; for instance, when a new treatment is introduced, ITS can visualize whether the treatment improves a condition. ITS requires no control group, which is convenient because control groups are often not feasible. It needs only the data from before and after the change and the point in time at which the change was introduced.
ITS relies on linear regression and tries to fit a three-segment linear function to the data. The authors also describe cases where more than three segments are used, but such models quickly raise the complexity of the analysis, and for this thesis a three-segment linear regression is sufficient. The three segments are lines fitting the data before and after the change as well as one line connecting the other two at the change date. Figure \ref{itsexample} shows an example of an ITS. The model is captured by the formula $Y_t = \beta_0 + \beta_1T + \beta_2X_t + \beta_3TX_t$, where $T$ represents time as a number, for instance, the number of months since the start of data recording, $X_t$ is 0 or 1 depending on whether the change is in effect, $\beta_0$ represents the value at $T = 0$, $\beta_1$ represents the slope before the change, $\beta_2$ represents the change in level when the change is introduced, and $\beta_3$ represents the change in slope after the change (so the post-change slope is $\beta_1 + \beta_3$). Contrary to the basic method explained in \cite{bernal2017interrupted}, where the ITS is performed on values aggregated per month, this thesis performs the ITS on single data points, as the premise that the aggregated values all have the same weight within a certain margin is not fulfilled for sentiment and vote score values. Performing the ITS with aggregated values would skew the linear regression towards data points with less weight. Fitting single data points prevents this, as weight is taken into account through the number of data points. To filter out seasonal effects, the average value of all data points from the same calendar month across all years is subtracted from each data point (i.e., the average value of all Januaries is subtracted from each data point in a January). This thesis uses the least-squares method for regression.
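A minimal sketch of such a segmented least-squares fit, assuming NumPy is available and using the $Y_t = \beta_0 + \beta_1T + \beta_2X_t + \beta_3TX_t$ model from above:

```python
import numpy as np

def fit_its(t, y, change_t):
    """Least-squares fit of Y = b0 + b1*T + b2*X + b3*T*X, where the
    indicator X is 1 once the change is in effect and 0 before."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    x = (t >= change_t).astype(float)
    design = np.column_stack([np.ones_like(t), t, x, t * x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta  # [b0, b1, b2 (level change), b3 (slope change)]
```

On noiseless synthetic data the fit recovers the generating coefficients exactly, which makes the method easy to verify.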
Although the ITS analysis takes data density variability and seasonality into account, there is always a possibility that an unknown factor or event is contained in the data. A visual inspection of the data is therefore always recommended. This thesis contains one example where the data density increases so drastically in a particular time span that this form of analysis loses accuracy.
%limitations
% large sudden changes (maybe include example from analysis)
% autocorrelation?
\begin{figure}
\centering\includegraphics[scale=0.7]{figures/itsexample}
\caption{An example that visualizes how ITS works. The change of the system occurs at month 0. The blue line shows the average sentiment of fictional answers grouped by month. The numbers attached to the blue line show the number of sentiment values for a given month. The yellow line represents the ITS analysis as a three-segment line. This example shows the expected behavior of the data sets in the following sections.}
\label{itsexample}
\end{figure}