diff --git a/text/0_abstract.tex b/text/0_abstract.tex index cdfa97d..b73d15c 100644 --- a/text/0_abstract.tex +++ b/text/0_abstract.tex @@ -1,7 +1,7 @@ \chapter*{Abstract} \label{cha:abstract} -StackExchange is a question and answer platform and like other social platforms, StackExchange is eager to provide a good first impression to users. StackExchange made many decisions to attract new users. One of these decisions was to introduce the \emph{new contributor} indicator which is shown to users that may answer a question from a new user. This thesis investigates whether this change improved the impression, new users experience. To measure whether the change achieved its intended target, this thesis uses VADER to quantify the sentiment of the answers to questions of new contributors which are then used in an interrupted time series. The results indicate that in some of the communities the change did indeed achieve its intended purpose. +StackExchange is a question and answer platform and, like many other social platforms, StackExchange is eager to make a good first impression on users. StackExchange made many decisions to attract new users. One of these decisions was to introduce the \emph{new contributor} indicator, which is shown to users who may answer a question from a new user. This thesis investigates whether this change improved the impression new users experience. To measure whether the change achieved its intended target, this thesis uses VADER to quantify the sentiment of the answers to questions of new contributors and then uses these sentiment values in an interrupted time series analysis. The results indicate that in some of the communities the change did indeed achieve its intended purpose. %This is a place-holder for the abstract. 
diff --git a/text/1_intro.tex b/text/1_intro.tex index 4c008bf..38ad76f 100644 --- a/text/1_intro.tex +++ b/text/1_intro.tex @@ -10,7 +10,7 @@ %DONE different types of communities: social exchange(facebook, twitter, div messaging apps), social support platforms, information exchange (community knownledge platforms (CQA, forums, wikis, ...), ...) -With the introduction of the Web 2.0 and its core feature of user interaction, users interact with each other in online communities. These communities come in various shapes and forms. There are communities for social interaction, for instance, Facebook\footnote{\url{https://facebook.com}}, Twitter\footnote{\url{https://twitter.com}}, and instant messaging apps. There are communities for social support, i.e. communities where users have certain common qualities, for instance, illnesses. There are also communities with the purpose of information exchange. Information exchange platforms can be grouped into expert and community knowledge platforms. While expert platforms are rarely known and often only used by niece groups, community knowledge platforms are widely known and used by the general public. Community knowledge platforms can be divided into 1) wikis, for instance, Wikipedia\footnote{\url{https://wikipedia.org}}, 2) forums, and 3) Q\&A platforms, for instance, \emph{Yahoo! Answers}\footnote{\url{https://answers.yahoo.com}}, \emph Quora\footnote{\url{https://quora.com}}, and StackExchange\footnote{\url{https://stackexchange.com}}. +With the introduction of Web 2.0 and its core feature of user interaction, users began to interact with each other in online communities. These communities come in various shapes and forms. There are communities for social interaction, for instance, Facebook\footnote{\url{https://facebook.com}}, Twitter\footnote{\url{https://twitter.com}}, and instant messaging apps. There are communities for social support, i.e., communities whose users share certain characteristics, for instance, an illness. 
There are also communities with the purpose of information exchange. Information exchange platforms can be grouped into expert and community knowledge platforms. While expert platforms are less widely known and often used only by niche groups, community knowledge platforms are widely known and used by the general public. Community knowledge platforms can be divided into 1) wikis, for instance, Wikipedia\footnote{\url{https://wikipedia.org}}, 2) forums, and 3) Q\&A platforms, for instance, \emph{Yahoo! Answers}\footnote{\url{https://answers.yahoo.com}}, \emph{Quora}\footnote{\url{https://quora.com}}, and StackExchange\footnote{\url{https://stackexchange.com}}. StackExchange is a Q\&A platform and consists of 174 communities\footnote{\url{https://stackexchange.com/tour}}. Each community evolves around a specific topic, for instance, StackOverflow focuses on software engineering, and AskUbuntu focuses on the Ubuntu operating system. This distinguishes StackExchange from other Q\&A sites such as \emph{Yahoo! Answers} where no such differentiation into topics exists. %TODO ref diff --git a/text/2_relwork.tex b/text/2_relwork.tex index a41c4af..5c0ae15 100644 --- a/text/2_relwork.tex +++ b/text/2_relwork.tex @@ -6,7 +6,7 @@ This section is divided into three parts. The first part explains what StackExch StackExchange\footnote{\url{https://stackexchange.com}} is a community question and answering (CQA) platform where users can ask and answer questions, accept answers as an appropriate solution to the question, and up-/downvote questions and answers. StackExchange uses a community-driven knowledge creation process by allowing everyone who registers to participate in the community. Invested users also get access to moderation tools to help maintain the vast community. All posts on the StackExchange platform are publicly visible, allowing non-users to benefit from the community as well. 
Posts are also accessible for web search engines so users can find questions and answers easily with a simple web search. StackExchange keeps an archive of all questions and answers posted, creating a knowledge archive for future visitors to look into. -Originally, StackExchange started with StackOverflow\footnote{\url{https://stackoverflow.com}} in 2008\footnote{\label{atwood2008stack}\url{https://stackoverflow.blog/2008/08/01/stack-overflow-private-beta-begins/}}. Since then StackExchange grew into a platform hosting sites for 174 different topics\footnote{\label{stackexchangetour}\url{https://stackexchange.com/tour}}, for instance, programming (StackOverflow), maths (MathOverflow\footnote{\url{https://mathoverflow.net}} and Math StackExchange\footnote{\url{https://math.stackexchange.com}}), and typesetting (TeX/LaTeX\footnote{\url{https://tex.stackexchange.com}}). Questions on StackExchange are stated in the natural English language and consist of a title, a body containing a detailed description of the problem or information needed, and tags to categorize the question. After a question is posted the community can submit answers to the question. The author of the question can then accept an appropriate answer which satisfies their question. The accepted answer is then marked as such with a green checkmark and shown on top of all the other answers. Figure \ref{soexamplepost} shows an example of a StackOverflow question. Questions and answers can be up-/downvoted by every user registered on the site. Votes typically reflect the quality and importance of the respective question or answers. Answers with a high voting score raise to the top of the answer list as answers are sorted by the vote score in descending order by default. Voting also influences a user's reputation \cite{movshovitz2013analysis}\footref{stackexchangetour}. When a post (question or answer) is voted upon the reputation of the poster changes accordingly. 
Furthermore, downvoting of answers also decreases the reputation of the user who voted\footnote{\url{https://stackoverflow.com/help/privileges/vote-down}}. +Originally, StackExchange started with StackOverflow\footnote{\url{https://stackoverflow.com}} in 2008\footnote{\label{atwood2008stack}\url{https://stackoverflow.blog/2008/08/01/stack-overflow-private-beta-begins/}}. Since then, StackExchange has grown into a platform hosting sites for 174 different topics\footnote{\label{stackexchangetour}\url{https://stackexchange.com/tour}}, for instance, programming (StackOverflow), maths (MathOverflow\footnote{\url{https://mathoverflow.net}} and Math StackExchange\footnote{\url{https://math.stackexchange.com}}), and typesetting (TeX/LaTeX\footnote{\url{https://tex.stackexchange.com}}). Questions on StackExchange are written in natural English and consist of a title, a body containing a detailed description of the problem or the information needed, and tags to categorize the question. After a question is posted, the community can submit answers to it. The author of the question can then accept the answer that best solves their problem. The accepted answer is then marked with a green checkmark and shown above all other answers. Figure \ref{soexamplepost} shows an example of a StackOverflow question. Registered users can upvote and downvote questions and answers. Votes typically reflect the quality and importance of the respective question or answer. Answers with a high vote score rise to the top of the answer list, as answers are sorted by vote score in descending order by default. Voting also influences a user's reputation \cite{movshovitz2013analysis}\footref{stackexchangetour}. When a post (question or answer) is voted upon, the reputation of the poster changes accordingly. 
Furthermore, downvoting of answers also decreases the reputation of the user who voted\footnote{\url{https://stackoverflow.com/help/privileges/vote-down}}. Reputation on StackExchange indicates how trustworthy a user is. To gain a high reputation value a user has to invest a lot of time and effort to reach a high reputation value by asking good questions and posting good answers to questions. Reputation also unlocks privileges which may differ slightly from one community to another\footnote{\url{https://mathoverflow.com/help/privileges/}}\mfs\footnote{\url{https://stackoverflow.com/help/privileges/}}. With privileges, users can, for instance, create new tags if the need for a new tag arises, cast votes on closing or reopening questions if the question is off-topic or a duplicate of another question, or when a question had been closed for no or a wrong reason, or even get access to moderation tools. @@ -64,7 +64,7 @@ All these communities differ in their design. Wikipedia is a community-driven kn CQA sites are very effective at code review \cite{treude2011programmers}. Code may be understood in the traditional sense of source code in programming-related fields but this also translates to other fields, for instance, mathematics where formulas represent code. CQA sites are also very effective at solving conceptual questions. This is due to the fact that people have different areas of knowledge and expertise \cite{robillard1999role} and due to the large user base established CQA sites have, which again increases the variety of users with expertise in different fields. \subsection{Running an online community} -Despite the differences in purpose and manifestation of these communities, they are social communities and they have to follow certain laws. 
In their book on ''Building successful online communities: Evidence-based social design`` \cite{kraut2012building} \citeauthor{kraut2012building} lie out five equally important criteria online platforms have to fulfill in order to thrive: +Despite the differences in purpose and manifestation of these communities, they are social communities and they have to follow certain common principles. In their book ``Building successful online communities: Evidence-based social design'' \cite{kraut2012building}, \citeauthor{kraut2012building} lay out five equally important criteria that online platforms have to fulfill in order to thrive: 1) When starting a community, it has to have a critical mass of users who create content. StackOverflow already had a critical mass of users from the beginning due to the StackOverflow team already being experts in the domain \cite{mamykina2011design} and the private beta\footref{atwood2008stack}. Both aspects ensured a strong community core early on. @@ -96,7 +96,7 @@ The onboarding process of new users is a permanent challenge for online communit \textbf{One-day-flies}\\ \citeauthor{slag2015one} investigated why many users on StackOverflow only post once after their registration \cite{slag2015one}. They found that 47\% of all users on StackOverflow posted only once and called them one-day-flies. They suggest that code example quality is lower than that of more involved users, which often leads to answers and comments to first improve the question and code instead of answering the stated question. This likely discourages new users from using the site further. Negative feedback instead of constructive feedback is another cause for discontinuation of usage. The StackOverflow staff also conducted their own research on negative feedback of the community\footnote{\label{silge2019welcome}\url{https://stackoverflow.blog/2018/07/10/welcome-wagon-classifying-comments-on-stack-overflow/}}. 
They investigated the comment sections of questions by recruiting their staff members to rate a set of comments and they found more than 7\% of the reviewed comments are unwelcoming. -One-day-flies are not unique to StackOverflow. \citeauthor{steinmacher2015social} investigated the social barriers newcomers face when they submit their first contribution to an open-source software project \cite{steinmacher2015social}. They based their work on empirical data and interviews and identified several social barriers preventing newcomers from placing their first contribution to a project. Furthermore, newcomers are often on their own in open-source projects. The lack of support and peers to ask for help hinders them. \citeauthor{yazdanian2019eliciting} found that new contributors on Wikipedia face challenges when editing articles. Wikipedia hosts millions of articles\footnote{\url{https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia}} and new contributors often do not know which articles they could edit and improve. Recommender systems can solve this problem by suggesting articles to edit but they suffer from the cold start problem because they rely on past user activity which is missing for new contributors. \citeauthor{yazdanian2019eliciting} proposed a solution by establishing a framework that automatically creates questionnaires to fill this gap. This also helps match new contributors with more experienced contributors that could help newcomers when they face a problem. +One-day-flies are not unique to StackOverflow. \citeauthor{steinmacher2015social} investigated the social barriers newcomers face when they submit their first contribution to an open-source software project \cite{steinmacher2015social}. Based on empirical data and interviews, they identified several social barriers that prevent newcomers from making their first contribution to a project. Furthermore, newcomers are often on their own in open-source projects. 
The lack of support and peers to ask for help hinders them. \citeauthor{yazdanian2019eliciting} found that new contributors on Wikipedia face challenges when editing articles \cite{yazdanian2019eliciting}. Wikipedia hosts millions of articles\footnote{\url{https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia}} and new contributors often do not know which articles they could edit and improve. Recommender systems can solve this problem by suggesting articles to edit but they suffer from the cold start problem because they rely on past user activity which is missing for new contributors. \citeauthor{yazdanian2019eliciting} proposed a solution by establishing a framework that automatically creates questionnaires to fill this gap. This also helps match new contributors with more experienced contributors that could help newcomers when they face a problem. \citeauthor{allen2006organizational} showed that the one-time-contributors phenomenon also translates to workplaces and organizations \cite{allen2006organizational}. They found out that socialization with other members of an organization plays an important role in turnover. The better the socialization within the organization the less likely newcomers are to leave. This socialization process has to be actively pursued by the organization. \textbf{Lurking}\\ @@ -105,7 +105,8 @@ One-day-flies may partially be a result of lurking. 
Lurking is consuming content % DONE Non-public and public online community participation: Needs, attitudes and behavior \cite{nonnecke2006non} about lurking, many programmers do that probably, not even registering, lurking not a bad behavior but observing, lurkers are more introverted, passive behavior, less optimistic and positive than posters, prviously lurking was thought of free riding, not contributing, taking not giving to comunity, important for getting to know a community, better integration when joining \textbf{Reflection}\\ -The StackOverflow team acknowledged the one-time-contributors trend\footref{hanlon2018stack}\footref{silge2019welcome} and took efforts to make the site more welcoming to new users\footnote{\label{friend2018rolling}\url{https://stackoverflow.blog/2018/06/21/rolling-out-the-welcome-wagon-june-update/}}. They lied out various reasons: Firstly, they have sent mixed messages whether the site is an expert site or for everyone. Secondly, they gave too little guidance to new users which resulted in poor questions from new users and in the unwelcoming behavior of more integrated users towards the new users. New users do not know all the rules and nuances of communication in the communities. An example is that ''Please`` and ''Thank you`` are not well received on the site as they are deemed unnecessary. Also the quality, clearness, and language quality of the questions of new users is lower than more experienced users which leads to unwelcoming or even toxic answers and comments. Moreover, users who gained moderation tool access could close questions with predefined reasons which often are not meaningful enough for the poster of the question\footnote{\label{hanlon2013war}\url{https://stackoverflow.blog/2013/06/25/the-war-of-the-closes/}}. 
Thirdly, marginalized groups, for instance, women and people of color \cite{ford2016paradise}\footref{hanlon2018stack}\mfs\footnote{\label{stackoversurvey2019}\url{https://insights.stackoverflow.com/survey/2019}}, are more likely to drop out of the community due to unwelcoming behavior from other users\footref{hanlon2018stack}. They feel the site is an elitist and hostile place. +The StackOverflow team acknowledged the one-time-contributors trend\footref{hanlon2018stack}\footref{silge2019welcome} and took steps to make the site more welcoming to new users\footnote{\label{friend2018rolling}\url{https://stackoverflow.blog/2018/06/21/rolling-out-the-welcome-wagon-june-update/}}. They laid out various reasons: Firstly, they had sent mixed messages about whether the site is an expert site or for everyone. Secondly, they gave too little guidance to new users, which resulted in poor questions from new users and in unwelcoming behavior of more integrated users towards the new users. New users do not know all the rules and nuances of communication in the communities. For example, ``Please'' and ``Thank you'' are not well received on the site as they are deemed unnecessary. Also, the quality, clarity, and language of questions from new users are lower than those of more experienced users, which leads to unwelcoming or even toxic answers and comments. Moreover, users who gained access to moderation tools could close questions with predefined reasons which often are not meaningful enough for the poster of the question\footnote{\label{hanlon2013war}\url{https://stackoverflow.blog/2013/06/25/the-war-of-the-closes/}}. Thirdly, marginalized groups, for instance, women and people of color \cite{ford2016paradise}\footref{hanlon2018stack}\mfs\footnote{\label{stackoversurvey2019}\url{https://insights.stackoverflow.com/survey/2019}}, are more likely to drop out of the community due to unwelcoming behavior from other users\footref{hanlon2018stack}. 
They feel the site is an elitist and hostile place. + The team suggested several steps to mitigate these problems. Some of these steps include appealing to the users to be more welcoming and forgiving towards new users\footref{hanlon2018stack}\footref{silge2019welcome}\mfs\footnote{\url{https://stackoverflow.blog/2012/07/20/kicking-off-the-summer-of-love/}}, other steps are geared towards changes to the platform itself: The \emph{Be nice policy} (code of conduct) was updated with feedback from the community\footnote{\url{https://meta.stackexchange.com/questions/240839/the-new-new-be-nice-policy-code-of-conduct-updated-with-your-feedback}}. This includes: new users should not be judged for not knowing all things. Furthermore, the closing reasons were updated to be more meaningful to the poster, and questions that are closed are shown as ''on hold`` instead of ''closed`` for the first 5 days\footref{hanlon2013war}. Moreover, the team investigates how the comment sections can be improved to lessen the unwelcomeness and hostility and keep civility up. \textbf{Mentorship Research Project}\\ @@ -128,9 +129,9 @@ For this project, four mentors were hand-selected and therefore the project woul % Rolling out the Welcome Wagon: June Update \cite{friend2018rolling} “Ask a Question Wizard” prototype, reduce exclusion (negative feelings, expectations and experiences), improve inclusion (learn from other communities facing similar problems), classification of abusive and unwelcoming comments -%Unwelcomeness is a large problem on StackExchange; not so strong; maybe other sentence +%Unwelcomeness is a large problem on StackExchange; not so strong; maybe other sentence \textbf{Unwelcomeness}\\ -Unwelcomeness is a large problem on StackExchange \cite{ford2016paradise}\footref{friend2018rolling}\footref{hanlon2018stack}. Although unwelcomeness affects all new users, users from marginalized groups suffer significantly more \cite{vasilescu2014gender}\footref{hanlon2018stack}. 
\citeauthor{ford2016paradise} investigated barriers users face when contributing to StackOverflow. The authors identified 14 barriers in total hindering newcomers to contribute and five barriers were rated significantly more problematic for women than men. On StackOverflow only 5.8\% (2015\footnote{\url{https://insights.stackoverflow.com/survey/2015}}, 7.9\% 2019\footref{stackoversurvey2019}) of active users identify as women. \citeauthor{david2008community} found similar results of 5\% women in their work on \emph{Community-based production of open-source software} \cite{david2008community}. These numbers are comparatively small to the number of degrees in Science, Technology, Engineering, and Mathematics (STEM) \cite{clark2005women} where 20\% are achieved by women \cite{hill2010so}. Despite the difference, the percentage of women on StackOverflow has increased in recent years. +Unwelcomeness is a large problem on StackExchange \cite{ford2016paradise}\footref{hanlon2018stack}\footref{friend2018rolling}. Although unwelcomeness affects all new users, users from marginalized groups suffer significantly more \cite{vasilescu2014gender}\footref{hanlon2018stack}. \citeauthor{ford2016paradise} investigated barriers users face when contributing to StackOverflow. The authors identified 14 barriers in total that hinder newcomers from contributing, and five of these barriers were rated significantly more problematic for women than for men. On StackOverflow, only 5.8\% (2015\footnote{\url{https://insights.stackoverflow.com/survey/2015}}) and 7.9\% (2019\footref{stackoversurvey2019}) of active users identify as women. \citeauthor{david2008community} found a similar share of 5\% women in their work on \emph{Community-based production of open-source software} \cite{david2008community}. These numbers are small compared to the share of degrees in Science, Technology, Engineering, and Mathematics (STEM) \cite{clark2005women}, of which 20\% are earned by women \cite{hill2010so}. 
Despite the difference, the percentage of women on StackOverflow has increased in recent years. %discrimitation % DONE Paradise Unplugged: Identifying Barriers for Female Participation on Stack Overflow \cite{ford2016paradise} gender gap, females only 5\%, contribution barriers, found 5 gender specific (women) barriers among 14 barrier in total, barriers also affect groups like industry programmers @@ -470,7 +471,7 @@ This shortcoming was addressed by \citeauthor{hutto2014vader} who introduced a n % ursprüngliches paper ITS, wie hat man das früher (davor) gemacht \subsection{Trend analysis} -When introducing a change to a system (experiment), one often wants to know whether the intervention achieves its intended purpose. This leads to 3 possible outcomes: a) the intervention shows an effect and the system changes in the desired way, b) the intervention shows an effect and the system changes in an undesired way, or c) the system did not react at all to the change. There are multiple ways to determine which of these outcomes occur. To analyze the behavior of the system, data from before and after the intervention as well as the nature of the intervention has to be acquired. The are multiple ways to run such an experiment and one has to choose which type of experiment fits best. There are 2 categories of approaches: actively creating an experiment where one designs the experiment before it is executed (for example randomized control trials in medical fields), or using existing data of an experiment that was not designed beforehand, or when setting up a designed experiment is not possible (quasi-experiment). +When introducing a change to a system (experiment), one often wants to know whether the intervention achieves its intended purpose. 
This leads to three possible outcomes: a) the intervention has an effect and the system changes in the desired way, b) the intervention has an effect and the system changes in an undesired way, or c) the system does not react to the change at all. There are multiple ways to determine which of these outcomes occurs. To analyze the behavior of the system, data from before and after the intervention as well as information about the nature of the intervention have to be acquired. There are multiple ways to run such an experiment and one has to choose which type of experiment fits best. The approaches fall into two categories: actively designing an experiment before it is executed (for example, randomized controlled trials in medical fields), or using existing data when the experiment was not designed beforehand or when setting up a designed experiment is not possible (quasi-experiment). As this thesis investigates a change that has already been implemented by another party, this thesis covers quasi-experiments. A tool that is often used for this purpose is an \emph{Interrupted Time Series} (ITS) analysis. The ITS analysis is a form of segmented regression analysis, where data from before, after, and during the intervention is regressed with separate line segements\cite{mcdowall2019interrupted}. ITS requires data at (regular) intervals from before and after the intervention (time series). The interrupt signifies the intervention and the time when it occurred must be known. The intervention can be at a single point in time or it can be stretched out over a certain time span. This property must also be known to take into account when designing the regression. Also, as the data is acquired from a quasi-experiment, it may be baised\cite{bernal2017interrupted}, for example, seasonality, time-varying confounders (for example, a change in measuring data), variance in the number of single observations grouped together in an interval measurement, etc. 
These biases need to be addressed if present. Seasonality can be accounted for by subtracting the average value of each of the months in successive years (i.e. subtract the average value of all Januaries in the data set from the values in Januaries). %\begin{lstlisting} diff --git a/text/3_method.tex b/text/3_method.tex index 4adf4e2..0310d87 100644 --- a/text/3_method.tex +++ b/text/3_method.tex @@ -19,14 +19,14 @@ This thesis investigates the following criteria to determine whether the change \begin{itemize} \item \textbf{Sentiment of answers to a question}. This symbolizes the quality of communication between different individuals. Better values indicate better communication. Through the display of the \emph{new contributor} indicator, the answerer should react less negatively towards the new user when they behave outside the community standards. \item \textbf{Vote score of questions}. This symbolizes the feedback the community gives to a question. Voters will likely vote more positively (not voting instead of down-voting, or upvoting instead of not voting) due to the \emph{new contributor} indicator. Thereby the vote score should increase after the change. - \item \textbf{Amount of first and follow-up question}. This symbolizes the willingness of users to participate in the community. Higher amounts of first questions indicate a higher number of new participating users. Higher follow-up questions indicate that users are more willing to stay within the community and continue their active participation. + \item \textbf{Number of first and follow-up questions}. This symbolizes the willingness of users to participate in the community. A higher number of first questions indicates more new users participating. A higher number of follow-up questions indicates that users are more willing to stay within the community and continue their active participation. \end{itemize} If these criteria improve after the change is introduced, the community is affected positively.
If they worsen, the community is affected negatively. If the criteria stay largely the same, then the community is unaffected. Here it is important to note that a question may receive answers and votes after the \emph{new contributor} indicator is no longer shown and therefore these are not considered part of the data set to analyze. %only when new contributor insicator is shown \section{Vader} -To measure the effect on the sentiment of the change this thesis utilizes the Vader\cite{hutto2014vader} sentiment analysis tool. This decision is based on the performance in analyzing and categorizing microblog-like texts, the speed of processing, and the simplicity of use. Vader uses a lexicon of words, and rules related to grammar and syntax. This lexicon was manually created by \citeauthor{hutto2014vader} and is therefore considered a \emph{gold standard lexicon}. Each word has a sentiment value attached to it. Negative words, for instance, \emph evil, have negative values; good words, for instance, \emph brave, have positive values. The range of these values is continuous, so words can have different intensities, for instance, \emph bad has a higher value than \emph evil. This feature of intensity distinction makes Vader a valance-based approach. +To measure the effect of the change on sentiment, this thesis utilizes the Vader sentiment analysis tool \cite{hutto2014vader}. This decision is based on its performance in analyzing and categorizing microblog-like texts, its processing speed, and its simplicity of use. Vader combines a lexicon of words with rules related to grammar and syntax. This lexicon was manually created by \citeauthor{hutto2014vader} and is therefore considered a \emph{gold standard lexicon}. Each word has a sentiment value attached to it. Negative words, for instance, \emph{evil}, have negative values; positive words, for instance, \emph{brave}, have positive values. 
The values lie on a continuous scale (from $-4$ to $+4$), so words can have different intensities, for instance, \emph{bad} has a less negative value than \emph{evil}. This feature of intensity distinction makes Vader a valence-based approach. However, just simply looking at the words in a text is not enough and therefore Vader also uses rules to determine how words are used in conjunction with other words. Some words can boost other words. For example, ``They did well.'' is less intense than ``They did extremely well.''. This works for both positive and negative sentences. Moreover, words can have different meanings depending on the context, for instance, ``Fire provides warmth.'' and ``Boss is about to fire an employee.'' This feature is called \emph{Word Sense Disambiguation}. @@ -74,7 +74,7 @@ After preprocessing the raw data, relevant data is filtered and computed. Questi \section{Analysis} An interrupted time series (ITS) analysis captures trends before and after a change in a system and fits very well with the question this thesis investigates. ITS can be applied to a large variety of data if the data contains the same kind of data points before and after the change and when the change date and time are known. \citeauthor{bernal2017interrupted} published a paper on how ITS works \cite{bernal2017interrupted}. ITS performs well on medical data, for instance, when a new treatment is introduced ITS can visualize if the treatment improves a condition. For ITS no control group is required and often control groups are not feasible. ITS only works with the before and after data and a point in time when a change is introduced. -ITS relies on linear regression and tries to fit a three-segment linear function to the data. The authors also described cases where more than three segments are used but these models quickly raise the complexity of the analysis and for this thesis a three-segment linear regression is sufficient. 
The three segments are lines to fit the data before and after the change as well as one line to connect the other two lines at the change date. Figure \ref{itsexample} shows an example of an ITS. Each segment is captured by a tensor of the following formula $Y_t = \beta_0 + \beta_1T + \beta_2X_t + \beta_3TX_t$, where $T$ represents time as a number, for instance, the number of months since the start of data recording, $X_t$ represents 0 or 1 depending on whether the change is in effect, $\beta_0$ represents the value at $T = 0$, $\beta_1$ represents the slope before the change, $\beta_2$ represents the value when the change is introduced, and $\beta_3$ represents the slope after the change. +An ITS relies on linear regression and fits a three-segment linear function to the data. \citeauthor{bernal2017interrupted} also describe models with more than three segments, but these quickly increase the complexity of the analysis; for this thesis, a three-segment linear regression is sufficient. The three segments are one line fitted to the data before the change, one line fitted to the data after the change, and one line connecting the two at the change date. Figure \ref{itsexample} shows an example of an ITS. The fitted function is given by the segmented regression model $Y_t = \beta_0 + \beta_1T + \beta_2X_t + \beta_3TX_t$, where $T$ represents time as a number, for instance, the number of months since the start of data recording, $X_t$ is 0 before the change and 1 once the change is in effect, $\beta_0$ represents the baseline level at $T = 0$, $\beta_1$ represents the slope before the change, $\beta_2$ represents the level change when the change is introduced, and $\beta_3$ represents the change in slope after the change, so that the slope after the change is $\beta_1 + \beta_3$.
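The segmented regression model above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for this thesis: the function names `design_row`, `solve`, and `fit_its` are hypothetical, and the least-squares fit solves the normal equations directly instead of relying on a statistics library. Each observation is a single data point (for instance, the sentiment of one answer) with its own time value $T$, so several points may share the same month.

```python
from typing import List, Sequence, Tuple


def design_row(t: float, change_t: float) -> List[float]:
    """Regressors [1, T, X_t, T * X_t] of the segmented model for one data point."""
    x = 1.0 if t >= change_t else 0.0  # X_t: 0 before the change, 1 once it is in effect
    return [1.0, t, x, t * x]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_its(times: Sequence[float], values: Sequence[float],
            change_t: float) -> Tuple[float, float, float, float]:
    """Least-squares fit of Y_t = b0 + b1*T + b2*X_t + b3*T*X_t on single data points."""
    rows = [design_row(t, change_t) for t in times]
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    b0, b1, b2, b3 = solve(xtx, xty)
    return b0, b1, b2, b3


if __name__ == "__main__":
    # Synthetic, noise-free data: three data points per month, change at month 12.
    months = [t for t in range(24) for _ in range(3)]
    ys = [1.0 + 0.5 * t + (2.0 - 0.3 * t if t >= 12 else 0.0) for t in months]
    print(fit_its(months, ys, change_t=12))  # approximately (1.0, 0.5, 2.0, -0.3)
```

Note that with $T$ counted from the start of data recording, as in the formula above, the jump of the fitted function at the change date $T_c$ works out to $\beta_2 + \beta_3 T_c$; centring $T$ at the change date makes this jump equal to $\beta_2$.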
Contrary to the basic method explained in \cite{bernal2017interrupted} where the ITS is performed on aggregated values per month, this thesis performs the ITS on single data points, as the premise that the aggregated values all have the same weight within a certain margin is not fulfilled for sentiment and vote score values. Performing the ITS with aggregated values would skew the linear regression more towards data points with less weight. Single data point fitting prevents this, as weight is taken into account with more data points. To filter out seasonal effects, the average value of all data points with the same month of all years is subtracted from the data points (i.e. subtract the average value of all Januaries from each data point in a January). This thesis uses the least-squares method for regression. diff --git a/text/5_results.tex b/text/5_results.tex index c4dc3fb..96357dc 100644 --- a/text/5_results.tex +++ b/text/5_results.tex @@ -8,8 +8,7 @@ In diagrams (a), the blue line states the average sentiment (\emph{average senti Similarly, in diagrams (b), the blue line represents the average vote score of the questions of new users. The number attached to the blue line indicates the number of questions that formed the average vote score. The ITS (orange, green, red, purple, and brown lines) are computed the same way as in diagrams (a). -In diagrams (c), the blue line represents the number of 1st questions from new users, whereas the orange line denotes the follow-up questions from new users. The green and red lines -represent the ITS of the blue and orange lines respectively. In these diagrams, no weighting is performed as each data point has equivalent weight. +In diagrams (c), the blue line represents the number of 1st questions from new users, whereas the orange line represents the number of follow-up questions from new users. The green and red lines represent the ITS of the blue and orange lines, respectively.
In these diagrams, no weighting is applied, as each data point carries the same weight. \pagebreak @@ -87,7 +86,7 @@ ServerFault shows gradually rising average sentiments prior to the change. At th The vote score falls prior to the change, made a huge jump upward, and quickly returns to the levels just prior to the change. Even though the leap at the change date is big and the ITS fits the data very well, the vote score does not improve in the long term after the change. Despite, sentiment and vote score not being affected in the long run, the number of 1st questions sees a drastic change and improves dramatically. Prior to the change, the number of 1st questions decreases steadily, while after the change the numbers increase at the same pace as they fall prior to the change. -The number of follow-up questions also sees the same course direction, falling prior to and raising after the change, albeit not the change is not as drastic. +The number of follow-up questions follows the same course, falling before and rising after the change, albeit less drastically. In summarizing, even though the sentiment and vote score are not really affected, the turn in the number of first question and follow-up questions indicates that the change positively affected the community. \begin{figure}[H] @@ -222,7 +221,7 @@ In summary, the sentiment improves, the vote score is unaffected, and the number % sentiment rose in most of the communities % the vote score is mostly uncorrelated with the change \section*{Benefitters} -More than half of the communities show benefits from the change. The number of first questions increases in all of the 6 previously shown communities. Also, for most of these communities, the number of follow-up questions increased too. Furthermore, the sentiment ITS shows an improvement in all except 1 community. The vote score analysis yielded no meaningful results for these communities.
The vote score does not change with the introduction of Stackexchange' policy, with the exception of ServerFault, however, the increase in the vote score did not last for long. +More than half of the communities benefit from the change. The number of first questions increases in all 6 previously shown communities, and for most of them the number of follow-up questions increases as well. Furthermore, the sentiment ITS shows an improvement in all but one of these communities. The vote score analysis yields no meaningful results for these communities, as the vote score does not change with the introduction of StackExchange's policy. The only exception is ServerFault, where the increase in the vote score does not last long. @@ -261,11 +260,11 @@ In summary, the sentiment and vote score does not seem to be affected, however, % sentiments falling faster than before the change \section{MathOverflow.net} -On MathOverflow the sentiment shows a constant regression before the change, however, average sentiments are low at about 10 months before the change and spike high directly before the change. When the change is introduced the regression makes a small jump up and decreases thereafter. The sentiment falls sharply at the time the change is introduced, indicating that the change negatively affected the sentiment. +On MathOverflow, the sentiment shows a constant regression line before the change; however, the average sentiment is low about 10 months before the change and spikes directly before it. When the change is introduced, the regression line makes a small jump up and decreases thereafter. The average sentiment itself falls sharply at the time the change is introduced, indicating that the change negatively affects the sentiment. The votes score steadily increases prior to the change and then quickly returns to the level from 3 years before the change.
However, the vote score does not change in course at the change date but several months after the change is introduced, leading to an inconclusive result. -Contrary, the number of questions asked by new contributors does improve. The number of 1st questions falls prior to the change and stabilizes to a constant trend thereafter. However, the number of follow-up questions that is constant before the change starts decreasing after the change. The number of the 1st questions, the months of -41, -29, -17, -5, and 7 are local maxima, indicating seasonality in the data \cite{bernal2017interrupted}. These months are all in March. Also while the number of 1st questions stabilized to a constant trend, the number of follow-up questions descreases, indicating that the new users tend more to become one-day-flies as time passed on \cite{slag2015one}. +In contrast, the number of questions asked by new contributors partly improves: the number of 1st questions falls prior to the change and stabilizes to a constant trend thereafter. However, the number of follow-up questions, which is constant before the change, starts to decrease after the change. In the number of 1st questions, the months -41, -29, -17, -5, and 7 are local maxima; they are exactly 12 months apart and all fall in March, indicating seasonality in the data \cite{bernal2017interrupted}. While the number of 1st questions stabilizes, the number of follow-up questions decreases, indicating that new users increasingly become one-day-flies over time \cite{slag2015one}. In summary, the sentiment, vote score, and the number of follow-up questions are affected negatively. Only the number of 1st questions from new contributors trend stabilizes. The change does not indicate a clear improvement according to its goal. This data set is sparse compared to the other datasets. Also, the vote scores are high compared to other datasets.
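The seasonality argument above relies on the local maxima lying a whole number of years apart, since month offsets that differ by a multiple of 12 fall on the same calendar month. A minimal Python sketch of that check, using a hypothetical helper `same_calendar_month` and the month offsets relative to the change reported for MathOverflow, could look like this:

```python
from typing import Sequence


def same_calendar_month(relative_months: Sequence[int]) -> bool:
    """Return True if all month offsets (relative to the change) share a calendar month.

    Two offsets fall on the same calendar month exactly when they differ by a multiple of 12.
    """
    first = relative_months[0]
    return all((m - first) % 12 == 0 for m in relative_months)


if __name__ == "__main__":
    print(same_calendar_month([-41, -29, -17, -5, 7]))  # True: local maxima on MathOverflow
    print(same_calendar_month([0, 20, 40]))  # False: a 20-month cycle is not annual
```

A pattern that repeats every 20 months fails this check, which is why such a pattern cannot be explained by seasonality alone.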
\begin{figure}[H] @@ -372,4 +371,4 @@ When looking at the results of SuperUser, the community stands out and shows int \section*{Summary} -In summary, the change introduced by StackExchange clearly improved the engagement in 6 of the 10 investigated communities. Sentiment, vote score, and number (1st and follow-up) questions rose as a result. The other 4 communities do not profit from the change. Although, many statistics jump up to a higher level the downward trends are not stopped. The statistics of SuperUser show a large influx of new users about 6 months after the change sending the sentiment and vote score on a deep dive and with the decrease in new users they raise again. However, this event is not related to the change but the magnitude of the huge change in new user numbers renders the analysis incomparable. +In summary, the change introduced by StackExchange improved the engagement in 6 of the 10 investigated communities. In these communities, the sentiment and the number of 1st and follow-up questions mostly rose as a result, while the vote score remained largely unaffected. The other 4 communities do not profit from the change: although many statistics jump to a higher level at the change date, the downward trends do not stop. The statistics of SuperUser show a large influx of new users about 6 months after the change, which sends the sentiment and vote score into a deep dive; as the number of new users decreases, both rise again. However, this event is not related to the change, and the magnitude of the shift in new user numbers makes the SuperUser results incomparable to those of the other communities. diff --git a/text/6_discussion.tex b/text/6_discussion.tex index a87aede..14c7733 100644 --- a/text/6_discussion.tex +++ b/text/6_discussion.tex @@ -10,7 +10,7 @@ The StackOverflow community has a fairly stable average sentiment before the cha AskUbuntu shows an interesting zig-zag pattern in the average sentiment graph every 20 months. However, this is not a seasonal effect, as seasonal effects are based on a 12-month cycle \cite{bernal2017interrupted}.
Also, the average sentiment falls before the change and raises thereafter, indicating that the change works for this community. However, further data is needed to see if the zig-zag pattern repeats itself. The number of 1st questions starts increasing again after the change stopping the downward trend before that. -On stats.stackexchange.com the average sentiment falls before the change but since the change, the downward trend stops and the sentiment starts to rise slowly, suggesting the change has a positive effect on the community. This is supported by the increase in the number of 1st and follow-up questions by new contributors. The vote score takes a dip after the change but starts to recover after 12 months which could be the result of another factor. In the same time frame, the number of 1st questions increases a lot which means more new contributors contribute to the community. Due to this influx of new users, the community metrics suffer for a period of time but recover afterward. This effect is also described in \cite{lin2017better} however the cause and effect, in this case, are not as pronounced. +On stats.stackexchange.com, the average sentiment falls before the change, but after the change the downward trend stops and the sentiment starts to rise slowly, suggesting that the change has a positive effect on the community. This is supported by the increase in the number of 1st and follow-up questions by new contributors. The vote score dips after the change and starts to recover after 12 months; this dip could be the result of another factor. In the same time frame, the number of 1st questions increases considerably, which means that more new contributors participate in the community. Due to this influx of new users, the community metrics suffer for a period of time but recover afterward. This effect is also described in \cite{lin2017better}; in this case, however, the effect is less pronounced.
In the tex.stackexchange.com community sentiments are stable before the change and show a stark rising pattern after the change. The change seems to work for this community but future data will be necessary to see if the rising pattern continues in the shown manner. The votes score ITS does not fit the model and values before and after the change indicate a linear downward trend. However, the number of 1st questions increases slightly after the change while the prior trend shows a decreasing development. The number of follow-up questions still continues a downward trend, indicating that the new contributors tend to become one-day-flies \cite{slag2015one}. By looking at the graph of the 1st questions, the months of -44, -32, -20, -8, 4, and 16 are local minima, indicating seasonality in the data \cite{bernal2017interrupted}. These months are all in December when the people of large parts of the world are on holiday.
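The analysis filters out such seasonal effects by subtracting, from each data point, the average of all data points in the same calendar month (for instance, the average of all Decembers from each December data point). A minimal Python sketch of this step is shown below; it assumes each data point is given as a hypothetical `(calendar_month, value)` pair and illustrates the idea rather than the thesis implementation.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def deseasonalize(points: Sequence[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Subtract, from every value, the mean of all values in the same calendar month (1-12)."""
    by_month: Dict[int, List[float]] = defaultdict(list)
    for month, value in points:
        by_month[month].append(value)
    month_mean = {month: mean(values) for month, values in by_month.items()}
    return [(month, value - month_mean[month]) for month, value in points]


if __name__ == "__main__":
    # Made-up values for illustration: two December data points and one January data point.
    data = [(12, 2.0), (12, 4.0), (1, 10.0)]
    print(deseasonalize(data))  # [(12, -1.0), (12, 1.0), (1, 0.0)]
```

Because the means are computed over single data points rather than monthly aggregates, months with many data points contribute proportionally more to their month's average, which matches the single data point fitting used for the ITS.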