wea_ondara
2022-11-20 10:49:28 +01:00
parent f7e803dc45
commit a8c99b45f9
7 changed files with 92 additions and 93 deletions


%DONE different types of communities: social exchange(facebook, twitter, div messaging apps), social support platforms, information exchange (community knownledge platforms (CQA, forums, wikis, ...), ...)
With the introduction of Web 2.0 and its core feature of user interaction, users interact with each other in online communities. These communities come in various shapes and forms. There are communities for social interaction, for instance, Facebook\footnote{\url{https://facebook.com}}, Twitter\footnote{\url{https://twitter.com}}, and instant messaging apps. There are communities for social support, i.e., communities where users have certain common qualities, for instance, illnesses. There are also communities with the purpose of information exchange. Information exchange platforms can be grouped into expert and community knowledge platforms. While expert platforms are rarely known and often only used by niche groups, community knowledge platforms are widely known and used by the general public. Community knowledge platforms can be divided into 1) wikis, for instance, Wikipedia\footnote{\url{https://wikipedia.org}}, 2) forums, and 3) Q\&A platforms, for instance, \emph{Yahoo! Answers}\footnote{\url{https://answers.yahoo.com}}, \emph{Quora}\footnote{\url{https://quora.com}}, and StackExchange\footnote{\url{https://stackexchange.com}}.
StackExchange is a Q\&A platform and consists of 174 communities\footnote{\url{https://stackexchange.com/tour}}. Each community revolves around a specific topic; for instance, StackOverflow focuses on software engineering, and AskUbuntu focuses on the Ubuntu operating system. This distinguishes StackExchange from other Q\&A sites such as \emph{Yahoo! Answers} where no such differentiation into topics exists. %TODO ref
% stackexchange and how it developed via stackoverflow \cite{mamykina2011design} good description of SO
% Design Lessons from the Fastest Q&A Site in the West \cite{mamykina2011design} early investigation of so
%DONE -> add mentor ship program+desc rough description: exampl for onboarding
%DONE stackexhange tries many things over the years: mentorship program+ref, list more examples here, see section 2
Communities face different challenges during their lifetime \cite{kraut2012building}. In the beginning, bootstrapping the community and gaining a critical mass of users is the main challenge. In the following phase, community growth is the main challenge. In the third phase, the goal is to keep the community in a lively and ordered state. The main challenges in this phase are onboarding new users, ensuring steady user engagement and contributions, and regulation. Running a community also includes challenges in other areas, for instance, technical, financial, and personnel challenges. A community has to solve all these challenges to a certain degree to exist and to continue existing. Many communities have been created over the years, and many of them have also gone extinct. A recent example of a larger-scale community shutting down is \emph{Yahoo! Answers}, which shut down in May 2021\footnote{\url{https://help.yahoo.com/kb/SLN35642.html}}.
StackExchange is continually working on improving its platform. The team implemented several changes to the platform to tackle different challenges that arose over time, for instance, updating the \emph{code of conduct} to ensure a more friendly tone in user interactions\footnote{\url{https://meta.stackexchange.com/questions/240839/the-new-new-be-nice-policy-code-of-conduct-updated-with-your-feedback}}, improving the review queue for reported content for moderators \cite{ponzanelli2014improving}, or the \emph{Mentorship Research Project} \cite{ford2018we}\footnote{\url{https://meta.stackoverflow.com/questions/357198/mentorship-research-project-results-wrap-up}}.
The \emph{Mentorship Research Project} was a research project to improve the onboarding process of new users. In the study, new users who created their first question had the option to let their question be reviewed by a mentor (a user familiar with the community). The mentor would review the question and suggest changes, for instance, adding more context to the question. The user would then adjust their question and post it in the community. The result of the study was that mentored questions were received significantly better than non-mentored questions. Although this was just a research project, StackExchange could create automated systems to help new users during their onboarding phase.
In August 2018, the StackExchange team introduced a small change that may have had a huge impact on the platform. They added a new feature to visibly highlight questions from new contributors, as part of their effort to make the site more welcoming for new users\footnote{\url{https://meta.stackexchange.com/questions/314287/come-take-a-look-at-our-new-contributor-indicator}}. Specifically, members who want to answer a question created by a new contributor are shown a notification in the answer box that this question is from a new contributor. The StackExchange team hopes that this little change encourages members to be more friendly and forgiving toward new users.
% write about the change investigated
% stackexchange new contriutor post: https://meta.stackexchange.com/questions/314287/come-take-a-look-at-our-new-contributor-indicator?cb=1
% what did change intend?
This thesis evaluates whether this change has a real impact on the community.
\begin{enumerate}
\item Perform an \emph{Interrupted Time Series} (ITS) analysis on the gathered data
\end{enumerate}
This thesis utilizes Vader \cite{hutto2014vader}, a sentiment analysis tool, to measure the sentiments of the answers submitted to questions of new contributors. The ITS analysis evaluates whether the change achieved its purpose of making the platform more welcoming.
Higher sentiment values, higher vote scores, and higher question counts indicate more friendly community interactions and welcomingness towards new contributors. Also, when new contributors have a good experience with their first question, they are more likely to post further questions.
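To illustrate how a lexicon-based tool such as Vader assigns sentiment scores, the following toy scorer sums per-word valences and squashes the result into $[-1, 1]$. The word list, weights, and normalization below are invented for illustration only; they are not Vader's actual lexicon or compound-score formula.

```python
# Toy lexicon-based sentiment scorer illustrating the principle behind
# tools like VADER. The word weights are invented for illustration;
# VADER's real lexicon and normalization are far more sophisticated.

LEXICON = {
    "great": 3.0, "thanks": 2.0, "helpful": 2.0, "welcome": 1.5,
    "bad": -2.5, "wrong": -1.5, "useless": -3.0, "duplicate": -1.0,
}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: summed word valences, squashed."""
    raw = sum(LEXICON.get(w.strip(".,!?"), 0.0) for w in text.lower().split())
    # Squash into [-1, 1], loosely in the spirit of VADER's compound score.
    return raw / (abs(raw) + 4.0) if raw else 0.0

print(sentiment_score("Thanks, this answer was really helpful!"))  # positive
print(sentiment_score("This is a useless duplicate."))             # negative
```

A real analysis would of course use Vader's published lexicon and scoring rather than this sketch.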
% how is change investigated by this thesis
% vader library
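The ITS analysis mentioned above can be sketched as a segmented regression: the outcome is modeled with a baseline level and trend plus a level change and slope change at the intervention point. The synthetic data and the plain least-squares fit below are an illustrative sketch, not the exact specification used in this thesis (a real analysis would also account for autocorrelation and seasonality).

```python
# Minimal Interrupted Time Series (segmented regression) sketch on
# synthetic data: y = b0 + b1*t + b2*level + b3*slope, where `level`
# flags post-intervention points and `slope` counts time since t0.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(48, dtype=float)   # e.g. 48 monthly observations
t0 = 24.0                        # intervention point (e.g. the Aug 2018 change)
post = (t >= t0).astype(float)

# Synthetic outcome: baseline trend 0.05/month plus a level jump of 2.0.
y = 1.0 + 0.05 * t + 2.0 * post + rng.normal(0.0, 0.2, t.size)

X = np.column_stack([np.ones_like(t), t, post, post * (t - t0)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b3 = beta
print(f"estimated level change at the intervention: {b2:.2f}")  # near 2.0
```

The estimated coefficient on the level term recovers the injected jump, which is exactly the kind of effect the ITS analysis looks for in the sentiment and vote-score series.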


This section is divided into three parts. The first part explains what StackExchange is.
StackExchange\footnote{\url{https://stackexchange.com}} is a community question and answering (CQA) platform where users can ask and answer questions, accept answers as an appropriate solution to the question, and up-/downvote questions and answers. StackExchange uses a community-driven knowledge creation process by allowing everyone who registers to participate in the community. Invested users also get access to moderation tools to help maintain the vast community. All posts on the StackExchange platform are publicly visible, allowing non-users to benefit from the community as well. Posts are also accessible for web search engines so users can find questions and answers easily with a simple web search. StackExchange keeps an archive of all questions and answers posted, creating a knowledge archive for future visitors to look into.
Originally, StackExchange started with StackOverflow\footnote{\url{https://stackoverflow.com}} in 2008\footnote{\label{atwood2008stack}\url{https://stackoverflow.blog/2008/08/01/stack-overflow-private-beta-begins/}}. Since then, StackExchange has grown into a platform hosting sites for 174 different topics\footnote{\label{stackexchangetour}\url{https://stackexchange.com/tour}}, for instance, programming (StackOverflow), maths (MathOverflow\footnote{\url{https://mathoverflow.net}} and Math StackExchange\footnote{\url{https://math.stackexchange.com}}), and typesetting (TeX/LaTeX\footnote{\url{https://tex.stackexchange.com}}). Questions on StackExchange are stated in natural English and consist of a title, a body containing a detailed description of the problem or information need, and tags to categorize the question. After a question is posted, the community can submit answers to the question. The author of the question can then accept an appropriate answer which satisfies their question. The accepted answer is then marked as such with a green checkmark and shown on top of all the other answers. Figure \ref{soexamplepost} shows an example of a StackOverflow question. Questions and answers can be up-/downvoted by every user registered on the site. Votes typically reflect the quality and importance of the respective question or answer. Answers with a high voting score rise to the top of the answer list, as answers are sorted by vote score in descending order by default. Voting also influences a user's reputation \cite{movshovitz2013analysis}\footref{stackexchangetour}. When a post (question or answer) is voted upon, the reputation of the poster changes accordingly. Furthermore, downvoting an answer also decreases the reputation of the user who voted\footnote{\url{https://stackoverflow.com/help/privileges/vote-down}}.
Reputation on StackExchange indicates how trustworthy a user is. To reach a high reputation value, a user has to invest a lot of time and effort by asking good questions and posting good answers. Reputation also unlocks privileges, which may differ slightly from one community to another\footnote{\url{https://mathoverflow.com/help/privileges/}}\mfs\footnote{\url{https://stackoverflow.com/help/privileges/}}.
With privileges, users can, for instance, create new tags when the need arises, cast votes to close questions that are off-topic or duplicates of other questions, cast votes to reopen questions that were closed without or for a wrong reason, or even get access to moderation tools.
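The voting and reputation mechanics described above can be sketched as follows. The reputation amounts in this snippet are placeholder values, not StackExchange's actual (and historically changing) amounts; only the structure of the rules, that upvotes reward the author, downvotes penalize the author, and downvoting an answer also costs the voter, follows the description above.

```python
# Illustrative sketch of the voting mechanics. The reputation amounts
# below are placeholders, NOT StackExchange's actual values.
UPVOTE_GAIN = 10      # assumed gain for the post author per upvote
DOWNVOTE_LOSS = 2     # assumed loss for the post author per downvote
DOWNVOTER_COST = 1    # assumed cost to the voter for downvoting an answer

def apply_vote(author_rep: int, voter_rep: int, up: bool,
               is_answer: bool = True) -> tuple[int, int]:
    """Return the (author, voter) reputation after one vote."""
    if up:
        return author_rep + UPVOTE_GAIN, voter_rep
    cost = DOWNVOTER_COST if is_answer else 0
    return author_rep - DOWNVOTE_LOSS, voter_rep - cost

# Answers are sorted by vote score in descending order by default.
answers = [{"id": 1, "score": 3}, {"id": 2, "score": 7}, {"id": 3, "score": 0}]
answers.sort(key=lambda a: a["score"], reverse=True)
print([a["id"] for a in answers])  # [2, 1, 3]
```

The asymmetric cost of downvoting, where the voter also pays, is the design choice the text refers to: it discourages casual downvotes.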
These platforms allow communication over large distances and facilitate fast and easy information exchange.
% DONE How Do Programmers Ask and Answer Questions on the Web? \cite{treude2011programmers} qa sites very effective at code review and conceptual questions
% DONE The role of knowledge in software development \cite{robillard1999role} people have different areas of knowledge and expertise
All these communities differ in their design. Wikipedia is a community-driven knowledge repository and consists of a collection of articles. Every user can create an article. Articles are edited collaboratively and continually improved and expanded. Reddit is a platform for social interaction where users create posts and comment on other posts or comments. Quora, StackExchange, and Yahoo! Answers are community question-and-answer (CQA) platforms. On Quora and Yahoo! Answers users can ask any question regarding any topic whereas on StackExchange users have to post their questions in the appropriate subcommunity, for instance, StackOverflow for programming-related questions or MathOverflow for math-related questions.
CQA sites are very effective at code review \cite{treude2011programmers}. Code may be understood in the traditional sense of source code in programming-related fields, but this also translates to other fields, for instance, mathematics, where formulas represent code. CQA sites are also very effective at solving conceptual questions. This is because people have different areas of knowledge and expertise \cite{robillard1999role} and because established CQA sites have a large user base, which again increases the variety of users with expertise in different fields.
Despite the differences in purpose and manifestation of these communities, they face similar challenges:
1) When starting a community, it has to have a critical mass of users who create content. StackOverflow already had a critical mass of users from the beginning due to the StackOverflow team already being experts in the domain \cite{mamykina2011design} and the private beta\footref{atwood2008stack}. Both aspects ensured a strong community core early on.
2) The platform must attract new users to grow as well as replace leaving users. Depending on the type of community new users should bring certain skills, for example, a programming background in open-source software development, or extended knowledge on certain domains; or qualities, for example, a certain illness in medical communities. New users also bring the challenge of onboarding with them. Most newcomers will not be familiar with all the rules and nuances of the community \cite{yazdanian2019eliciting}\footnote{\label{hanlon2018stack}\url{https://stackoverflow.blog/2018/04/26/stack-overflow-isnt-very-welcoming-its-time-for-that-to-change/}}.
3) The platform should encourage users to commit to the community. Online communities are often based on the voluntary commitment of their users \cite{ipeirotis2014quizz}, hence the platform has to ensure users are willing to stay. Most platforms do not have contracts with their users, so users should see benefits for staying with the community.
The onboarding process of new users is a permanent challenge for online communities.
\textbf{One-day-flies}\\
\citeauthor{slag2015one} investigated why many users on StackOverflow only post once after their registration \cite{slag2015one}. They found that 47\% of all users on StackOverflow posted only once and called them one-day-flies. They suggest that the code example quality in these users' posts is lower than that of more involved users, which often leads to answers and comments that first improve the question and code instead of answering the stated question. This likely discourages new users from using the site further. Negative feedback instead of constructive feedback is another cause of discontinued usage. The StackOverflow staff also conducted their own research on negative feedback in the community\footnote{\label{silge2019welcome}\url{https://stackoverflow.blog/2018/07/10/welcome-wagon-classifying-comments-on-stack-overflow/}}. They investigated the comment sections of questions by recruiting staff members to rate a set of comments and found that more than 7\% of the reviewed comments were unwelcoming.
One-day-flies are not unique to StackOverflow. \citeauthor{steinmacher2015social} investigated the social barriers newcomers face when they submit their first contribution to an open-source software project \cite{steinmacher2015social}. They based their work on empirical data and interviews and identified several social barriers preventing newcomers from placing their first contribution to a project. Furthermore, newcomers are often on their own in open-source projects. The lack of support and peers to ask for help hinders them. \citeauthor{yazdanian2019eliciting} found that new contributors on Wikipedia face challenges when editing articles. Wikipedia hosts millions of articles\footnote{\url{https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia}} and new contributors often do not know which articles they could edit and improve. Recommender systems can solve this problem by suggesting articles to edit, but they suffer from the cold start problem because they rely on past user activity, which is missing for new contributors. \citeauthor{yazdanian2019eliciting} proposed a solution by establishing a framework that automatically creates questionnaires to fill this gap. This also helps match new contributors with more experienced contributors who could help newcomers when they face a problem.
\citeauthor{allen2006organizational} showed that the one-time-contributors phenomenon also translates to workplaces and organizations \cite{allen2006organizational}. They found that socialization with other members of an organization plays an important role in turnover: the better the socialization within the organization, the less likely newcomers are to leave. This socialization process has to be actively pursued by the organization.
\textbf{Lurking}\\
One-day-flies may partially be a result of lurking. Lurking is consuming content without contributing.
% DONE Non-public and public online community participation: Needs, attitudes and behavior \cite{nonnecke2006non} about lurking, many programmers do that probably, not even registering, lurking not a bad behavior but observing, lurkers are more introverted, passive behavior, less optimistic and positive than posters, prviously lurking was thought of free riding, not contributing, taking not giving to comunity, important for getting to know a community, better integration when joining
\textbf{Reflection}\\
The StackOverflow team acknowledged the one-time-contributors trend\footref{hanlon2018stack}\footref{silge2019welcome} and took efforts to make the site more welcoming to new users\footnote{\label{friend2018rolling}\url{https://stackoverflow.blog/2018/06/21/rolling-out-the-welcome-wagon-june-update/}}. They laid out various reasons: Firstly, they have sent mixed messages about whether the site is an expert site or for everyone. Secondly, they gave too little guidance to new users, which resulted in poor questions from new users and in unwelcoming behavior of more integrated users towards the new users. New users do not know all the rules and nuances of communication in the communities. An example is that ``Please'' and ``Thank you'' are not well received on the site as they are deemed unnecessary. Also, the clarity and language quality of new users' questions are lower than those of more experienced users, which leads to unwelcoming or even toxic answers and comments. Moreover, users who gained moderation tool access could close questions with predefined reasons which often are not meaningful enough for the poster of the question\footnote{\label{hanlon2013war}\url{https://stackoverflow.blog/2013/06/25/the-war-of-the-closes/}}. Thirdly, marginalized groups, for instance, women and people of color \cite{ford2016paradise}\footref{hanlon2018stack}\mfs\footnote{\label{stackoversurvey2019}\url{https://insights.stackoverflow.com/survey/2019}}, are more likely to drop out of the community due to unwelcoming behavior from other users\footref{hanlon2018stack}. They feel the site is an elitist and hostile place.
The team suggested several steps to mitigate these problems. Some of these steps include appealing to the users to be more welcoming and forgiving towards new users\footref{hanlon2018stack}\footref{silge2019welcome}\mfs\footnote{\url{https://stackoverflow.blog/2012/07/20/kicking-off-the-summer-of-love/}}, other steps are geared towards changes to the platform itself: The \emph{Be nice policy} (code of conduct) was updated with feedback from the community\footnote{\url{https://meta.stackexchange.com/questions/240839/the-new-new-be-nice-policy-code-of-conduct-updated-with-your-feedback}}. This includes that new users should not be judged for not knowing all things. Furthermore, the closing reasons were updated to be more meaningful to the poster, and questions that are closed are shown as ``on hold'' instead of ``closed'' for the first 5 days\footref{hanlon2013war}. Moreover, the team investigates how the comment sections can be improved to lessen the unwelcomeness and hostility and keep civility up.
\textbf{Mentorship Research Project}\\
The StackOverflow team partnered with \citeauthor{ford2018we} and implemented the Mentorship Research Project \cite{ford2018we}\footnote{\url{https://meta.stackoverflow.com/questions/357198/mentorship-research-project-results-wrap-up}}. The project lasted one month and aimed to help newcomers improve their first questions before they are posted publicly. The program went as follows: When a user is about to post a question, the user is asked whether they want their question to be reviewed by a mentor. If they confirm, they are forwarded to a help room with a mentor, who is an experienced user. The question is then reviewed and the mentor suggests changes if applicable. These changes may include narrowing the question for more precise answers, adding a code example or adjusting code, or removing \emph{Please} and \emph{Thank you} from the question. After the review and editing, the question is posted publicly by the user. The authors found that mentored questions are received significantly better by the community than non-mentored questions. The questions also received higher scores and were less likely to be off-topic and of poor quality. Furthermore, newcomers are more comfortable when their question is reviewed by a mentor.
For this project, four mentors were hand-selected, so the project would not scale very well as the number of mentors is very limited, but it gave the authors an idea of how to pursue their goal of increasing the welcomingness on StackExchange. The project was followed up by an \emph{Ask a question wizard} to help new users, as well as more experienced users, improve the structure, quality, and clarity of their questions\footref{friend2018rolling}.
% DONE One-day flies on StackOverflow \cite{slag2015one}, 1 contribution during whole registration, only user with 6 month of registration
As StackExchange is a CQA platform, the benefits from information exchange, time and location flexibility, and permanency are more prevalent, while social support and social interaction are more in the background. Social support and social interaction are more relevant in communities where individuals communicate about topics regarding themselves, for instance, communities where health aspects are the main focus \cite{maloney2005multilevel}. Time and location flexibility is important for all online communities. Information exchange and permanency are important for StackExchange as it is a large collection of knowledge that mostly does not change over time or from one individual to another. StackExchange's content is driven by the community and therefore depends on the voluntarism of its users, making benefits even more important.
%TODO abc this seem wrong here
The backbone of a community is always the user base and its voluntarism to participate in the community. Even if the community is led by a commercial core team, the community is almost always several orders of magnitude larger than the number of paid employees forming the core team \cite{butler2002community}. The core team often provides the infrastructure for the community and does some community work. However, most of the community work is done by volunteers from the community.
This is also true for the StackExchange platform, where the core team of paid employees numbers between 200 and 500\footnote{\url{https://www.linkedin.com/company/stack-overflow}} (this includes employees working on other products) and the number of voluntary community members (these users have access to moderation tools) performing community work is around 10,000\footnote{\url{https://data.stackexchange.com/stackoverflow/revision/1412005/1735651/users-with-rep-20k}}.
\subsection{Encourage contribution}
In a community, users can generally be split into two groups by their motivation to contribute voluntarily: one group acts out of altruism, where users contribute to help others and do good for the community; the second group acts out of egoism and selfish reasons, for instance, getting recognition from other people \cite{ginsburg2004framework}. Users of the second group still help the community, but their primary goal is not necessarily the health of the community but gaining reputation and making a name for themselves. In contrast, users of the first group primarily focus on helping the community and see reputation as a positive side effect which also feeds back into their ability to help others. While these groups have different objectives, both need recognition of their efforts \cite{iriberri2009life}. There are several methods for recognizing the value a member provides to the community: reputation, awards, trust, identity, etc. \cite{ginsburg2004framework}. Reputation, trust, and identity are often built gradually over time by continuously working on them, while awards are reached at discrete points in time. Awards often take some time and effort to achieve. However, awards should not be easily achievable, as their value comes from the work that is required for them \cite{lawler2000rewarding}. They should also be meaningful in the community they are used in. Most importantly, awards have to be visible to the public so other members can see them. In this way, awards become a powerful motivator for users.
%TODO maybe look at finding of https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.3093&rep=rep1&type=pdf , in discussion bullet point list: subgroups, working and less feature > not working and more features, selfmoderation
Regulation revolves around the user actions and the content a community creates. It is required to steer the community and keep it civil. Naturally, some users will not have the best intentions for the community in mind. The actions of such users must be accounted for, and harmful actions must be dealt with. Otherwise, the community and its content will deteriorate.
\textbf{Content quality}\\
Quality is a concern in online communities. Platform moderators and admins want to keep a certain level of quality or even raise it. However, higher-quality posts take more time and effort than lower-quality posts. In the case of CQA platforms, this is an even bigger problem, as higher-quality answers compete against fast responses. Despite that, StackOverflow also has a problem with low-quality and low-effort questions and the subsequent unwelcoming answers and comments\footref{silge2019welcome}.
\citeauthor{lin2017better} investigated how growth affects a community \cite{lin2017better}. They looked at Reddit communities that were added to the default set of subscribed communities of every new user (defaulting), which led to a huge influx of new users to these communities. The authors found that, contrary to expectations, the quality stays largely the same. The vote score dips shortly after defaulting but quickly recovers or even rises to higher levels than before. Complaints about low-quality content did not increase, and the language used in the community stayed the same. However, the community clustered around fewer posts than before defaulting. \citeauthor{srba2016stack} did a similar study on the StackOverflow community \cite{srba2016stack}. They found a similar pattern in the quality of posts. The quality of questions dipped momentarily due to the huge influx of new users. However, the quality recovered after 3 months.
\citeauthor{tausczik2011predicting} found that reputation is linked to the perceived quality of posts in multiple ways \cite{tausczik2011predicting}. They suggest reputation could be used as an indicator of quality. Quality also depends on the type of platform. \citeauthor{lin2017better} showed that expert sites that charge fees, for instance, library reference services, have higher-quality answers compared to free sites \cite{lin2017better}. Also, the higher the fee, the higher the quality of the answers. However, free community sites outperform expert sites in terms of answer density and responsiveness.
\textbf{Content abuse}\\
\citeauthor{srba2016stack} identified three types of users causing the lowering of quality: \emph{Help Vampires} (these spend little to no effort researching their questions, which leads to many duplicates), \emph{Noobs} (they create mostly trivial questions), and \emph{Reputation Collectors} \cite{srba2016stack}. The latter try to gain reputation as fast as possible by methods described by \citeauthor{bosu2013building} \cite{bosu2013building}, but often with no regard for the effects their behavior has on the community, for instance, lowering overall content quality, turning other users away from the platform, and encouraging the behavior of \emph{Help Vampires} and \emph{Noobs} even more.
Questions of \emph{Help Vampires} and \emph{Noobs} draw answerers away from much more demanding questions. On the one hand, this leads to knowledgeable answerers answering questions for which they are overqualified, and on the other hand, to a lack of adequate-quality answers for more difficult questions. \citeauthor{srba2016stack} suggest a system that tries to match questions with answerers who satisfy the knowledge requirement but are not grossly overqualified to answer the question. Such a system would avoid suggesting simple questions to overqualified answerers and prevent an answer vacuum for questions on more advanced topics. This would ensure a more optimal utilization of the answering capability of the community.
\textbf{Content moderation}\\
\citeauthor{srba2016stack} proposed some solutions to the quality problems. One suggestion is to restrict the openness of a community. This can be accomplished in different ways, for instance, by introducing a daily posting limit for questions \cite{srba2016stack}. While this certainly limits the amount of low-quality posts, it does not eliminate the problem. Furthermore, this limitation would also hurt engaged users who would otherwise create a large volume of higher-quality content. A much more intricate solution that adapts to user behavior would be required, otherwise the limitation would hurt the community more than it improves it.
\citeauthor{ponzanelli2014improving} performed a study where they looked at post quality on StackOverflow \cite{ponzanelli2014improving}. They aimed to improve the automatic low-quality post detection system already in place and to reduce the size of the review queue selected individuals have to go through. Their classifier improves on the existing one by including popularity metrics of the posting user and the readability of the post itself. With these additional factors, they managed to reduce the number of misclassified quality posts with only a minimal decrease in correctly classified low-quality posts. Their improvement to the classifier reduced the review queue size by 9\%.
% other studies which suggest changes to improve community interaction/qualtity/sustainability
% -> matching questions with answerers \cite{srba2016stack} (difficult questions -> expert users, easier questions -> answerers that know it but are not experts), dont overload experts, utilize capacities of the many nonexperts
Another solution is to find content abusers (noobs, help vampires, etc.) directly. One approach is to add a reporting system to the community; however, a system of this kind is also driven by user input and can therefore be manipulated as well. This would lead to excluding users flagged as false positives and missing a portion of content abusers completely. A better approach is to systematically find these users by their behavior. \citeauthor{kayes2015social} describe a classifier which achieves an accuracy of 83\% on the \emph{Yahoo! Answers} platform \cite{kayes2015social}. The classifier is based on empirical data: they looked at historical user activity, report data, and which users were banned from the platform. From these statistics, they created a classifier which is able to distinguish between falsely and fairly banned users. \citeauthor{cheng2015antisocial} performed a similar study on antisocial behavior on various platforms. They too looked at historical data of users and their eventual bans, as well as their deleted-post rates. Their classifier achieved an accuracy of 80\%.
% alle sentiment methoden + vader
\subsection{Sentiment analysis}
Researchers have put forth many tools for sentiment analysis over the years. Each tool has its advantages and drawbacks, and there is no silver-bullet solution that fits all research questions. Researchers have to choose a tool that best fits their needs, and they need to be aware of the drawbacks of their choice. Sentiment analysis poses three important challenges:
\begin{itemize}
\item Coverage: detecting as many features as possible from a given piece of text
\item Weighting: assigning one or multiple values (value range and granularity) to detected features
% - TODO list some application examples
% ...
Linguistic Inquiry and Word Count (LIWC) \cite{pennebaker2001linguistic,pennebakerdevelopment} is one of the more popular tools. Due to its widespread usage, LIWC is well-verified, both internally and externally. Its lexicon consists of about 6,400 words, where words are categorized into one or more of the 76 defined categories \cite{pennebaker2015development}. 620 words have a positive and 744 words a negative emotion. Examples of positive words are love, nice, and sweet; examples of negative words are hurt, ugly, and nasty. LIWC also has some drawbacks; for instance, it does not capture acronyms, emoticons, or slang words. Furthermore, LIWC's lexicon uses a polarity-based approach, meaning that it cannot distinguish between the sentences ``This pizza is good'' and ``This pizza is excellent'' \cite{hutto2014vader}. \emph{Good} and \emph{excellent} are both in the category of positive emotion, but LIWC does not distinguish between single words in the same category.
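This polarity limitation can be illustrated with a minimal lexicon scorer (a toy sketch with invented word lists, not LIWC's actual lexicon or implementation):

```python
# Toy polarity-based scorer: every positive word counts +1, every
# negative word -1, with no notion of intensity.
POSITIVE = {"love", "nice", "sweet", "good", "excellent"}
NEGATIVE = {"hurt", "ugly", "nasty"}

def polarity_score(text):
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

# Both sentences receive the same score: intensity is lost.
print(polarity_score("this pizza is good"))       # 1
print(polarity_score("this pizza is excellent"))  # 1
```

Any scorer of this shape assigns identical values to all words within a category, which is exactly the behavior \citeauthor{hutto2014vader} criticize.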
%General Inquirer (GI) \cite{stone1966general} 1966 TODO ref wrong?
% - 11k words, 1900 pos, 2300 neg, all approx (vader)
% - bootstrapped from wordnet (wellknown english lexical database) (vader, hu2004mining)
%TODO refs
Hu-Liu04 \cite{hu2004mining,liu2005opinion} is an opinion mining tool. It searches for features in multiple pieces of text, for instance, product reviews, and rates the opinion on each feature using a binary classification \cite{hu2004mining}. Crucially, Hu-Liu04 does not summarize the texts but summarizes the ratings of the opinions about features mentioned in the texts. Hu-Liu04 was bootstrapped from WordNet \cite{hu2004mining} and then extended further. It now uses a lexicon consisting of about 6,800 words, where 2,000 words have a positive and 4,800 words a negative sentiment attached \cite{hutto2014vader}. This tool is, by design, better suited for social media texts, although it also misses emoticons, acronyms, and initialisms.
%SenticNet \cite{cambria2010senticnet} 2010
% - concept-level opinion and sentiment analysis tool (vader)
%updateing (extend/modify) hard (e.g. new domain) (vader)
\textbf{Machine Learning Approaches}\\
Because handcrafting sentiment analysis tools requires a lot of effort, researchers turned to approaches that offload the labor-intensive part to machine learning (ML). However, this creates a new challenge, namely gathering a \emph{good} data set to feed the machine learning algorithms for training. Firstly, a \emph{good} data set needs to represent as many features as possible, otherwise the algorithm will not recognize them. Secondly, the data set has to be unbiased and representative of all the data from which it is drawn. The data set has to represent each feature in an appropriate amount, otherwise the algorithms may discriminate against a feature in favor of other, more represented features. These requirements are hard to fulfill and often are not \cite{hutto2014vader}. After a data set is acquired, a model has to be learned by the ML algorithm, which is, depending on the complexity of the algorithm, a very computation- and memory-intensive process. After training is completed, the algorithm can predict sentiment values for new pieces of text that it has never seen before. However, due to the nature of this approach, the results cannot easily be comprehended by humans, if at all. ML approaches also suffer from a generalization problem and therefore cannot be transferred to other domains without accepting bad performance or updating the training data set to fit the new domain. Updating (extending or modifying) the model also requires complete retraining from scratch. These drawbacks make ML algorithms useful only in narrow situations where changes are not required and the training data is static and unbiased.
% naive bayes
% - simple (vader)
%- mathemtical anspruchsvoll (vader)
%- seperate datapoints using hyper planes (vader)
%- long training period (other methods do not need training at all because lexica) (vader)
Support Vector Machines (SVMs) use a different approach. SVMs place data points in an $n$-dimensional space and separate them with hyperplanes ($(n-1)$-dimensional planes), so data points fall into one of the two halves of the space divided by the hyperplane. This approach is usually very memory- and computation-intensive, as each data point is represented by an $n$-dimensional vector, where $n$ denotes the number of trained features.
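The hyperplane idea can be sketched with a hand-picked hyperplane on invented 2-dimensional data (a toy illustration, not a trained SVM; training would instead choose the weights and bias to maximize the margin between the classes):

```python
# A hyperplane is given by a normal vector w and an offset b; the sign
# of w.x + b decides on which side of the hyperplane a point x falls.
w = (1.0, -1.0)   # hand-picked normal vector (not learned)
b = 0.0

def classify(x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "positive" if score >= 0 else "negative"

print(classify((3.0, 1.0)))  # below the line x1 = x2 -> "positive"
print(classify((1.0, 3.0)))  # above the line x1 = x2 -> "negative"
```

With $n$ trained features, each text becomes one such $n$-dimensional vector, which is where the memory and computation cost mentioned above comes from.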
%generall blyabla, transition to vader
% ursprüngliches paper ITS, wie hat man das früher (davor) gemacht
\subsection{Trend analysis}
When introducing a change to a system (experiment), one often wants to know whether the intervention achieves its intended purpose. This leads to three possible outcomes: a) the intervention shows an effect and the system changes in the desired way, b) the intervention shows an effect and the system changes in an undesired way, or c) the system did not react to the change at all. There are multiple ways to determine which of these outcomes occurs. To analyze the behavior of the system, data from before and after the intervention, as well as the nature of the intervention, has to be acquired. There are multiple ways to run such an experiment, and one has to choose which type of experiment fits best. There are two categories of approaches: actively creating an experiment where one designs the experiment before it is executed (for example, randomized controlled trials in medical fields), or using existing data of an experiment that was not designed beforehand or where setting up a designed experiment is not possible (quasi-experiment).
As this thesis investigates a change that has already been implemented by another party, it covers quasi-experiments. A tool that is often used for this purpose is an \emph{Interrupted Time Series} (ITS) analysis. The ITS analysis is a form of segmented regression analysis, where data from before, after, and during the intervention is regressed with separate line segments \cite{mcdowall2019interrupted}. ITS requires data at (regular) intervals from before and after the intervention (time series). The interrupt signifies the intervention, and the time when it occurred must be known. The intervention can happen at a single point in time or be stretched out over a certain time span. This property must also be known so it can be taken into account when designing the regression. Also, as the data is acquired from a quasi-experiment, it may be biased \cite{bernal2017interrupted}, for example, by seasonality, time-varying confounders (for example, a change in how data is measured), variance in the number of single observations grouped together in an interval measurement, etc. These biases need to be addressed if present. Seasonality can be accounted for by subtracting the average value of each month across successive years (i.e. subtract the average value of all Januaries in the data set from the values in Januaries).
%\begin{lstlisting}
% deseasonalized = datasample - average(dataSamplesInMonth(month(datasample)))
%\end{lstlisting}
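The commented pseudocode above can be made concrete. The following is a minimal sketch of the deseasonalization step; the function name and the (month, value) data layout are illustrative assumptions, not the thesis code:

```python
from collections import defaultdict

def deseasonalize(samples):
    """Remove seasonality from (month, value) pairs (month in 1..12)
    by subtracting each calendar month's average across all years."""
    by_month = defaultdict(list)
    for month, value in samples:
        by_month[month].append(value)
    # average value per calendar month, pooled over all years
    month_avg = {m: sum(vs) / len(vs) for m, vs in by_month.items()}
    return [(m, v - month_avg[m]) for m, v in samples]
```

For example, two January values of 1.0 and 3.0 (average 2.0) become -1.0 and 1.0 after deseasonalization.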


\begin{figure}
\centering\includegraphics[scale=0.47]{figures/new_contributor}
\caption{The answer box a potential answerer sees when viewing a question from a new contributor. \copyright{Tim Post, 2018, \url{https://meta.stackexchange.com/users/50049/tim-post}}\footref{post2018come}}
\label{newcontributor}
\end{figure}
\item \textbf{Vote score of questions}. This symbolizes the feedback the community gives to a question. Voters will likely vote more positively (not voting instead of down-voting, or up-voting instead of not voting) due to the \emph{new contributor} indicator. Thereby the vote score should increase after the change.
\item \textbf{Amount of first and follow-up questions}. This symbolizes the willingness of users to participate in the community. A higher number of first questions indicates a higher number of new participating users. A higher number of follow-up questions indicates that users are more willing to stay within the community and continue their active participation.
\end{itemize}
If these criteria improve after the change is introduced, the community is affected positively. If they worsen, the community is affected negatively. If the criteria stay largely the same, then the community is unaffected. Here it is important to note that a question may receive answers and votes after the \emph{new contributor} indicator is no longer shown and therefore these are not considered part of the data set to analyze.
%only when new contributor indicator is shown
However, simply looking at the words in a text is not enough and therefore Vader also uses rules to determine how words are used in conjunction with other words. Some words can boost other words. For example, ``They did well.'' is less intense than ``They did extremely well.''. This works for both positive and negative sentences. Moreover, words can have different meanings depending on the context, for instance, ``Fire provides warmth.'' and ``A boss is about to fire an employee.'' This feature is called \emph{Word Sense Disambiguation}.
Furthermore, Vader also detects language features commonly found in social media text which may not be present in other forms of text, for instance, books or newspapers. Social media texts may contain acronyms, initialisms (for instance, \emph{afaik} (as far as I know)), slang words, emojis, caps words (often used to emphasize meaning), punctuation (for instance, \emph{!!!} and \emph{?!?!}), etc. These features can convey a lot of meaning and drastically change the sentiment of a text.
After all these features are considered, Vader assigns a sentiment value between -1 and 1 on a continuous range. The sentiment range is divided into 3 classes: negative (-1 to -0.05), neutral (-0.05 to 0.05), and positive (0.05 to 1). The outer edges of this range are rarely reached as the text would have to be extremely negative or positive, which is very unlikely.
%speed
Due to this mathematical simplicity, Vader is very fast when computing a sentiment value for a given text. This feature is one of the requirements \citeauthor{hutto2014vader} originally posed. They proposed that Vader shall be fast enough to do an online (real-time) analysis of social media text.
%simplicity
Vader is also easy to use. It does not require any pre-training on a dataset as it already has a human-curated lexicon and rules related to grammar and syntax. Therefore the sentiment analysis only requires an input to evaluate. This thesis uses a publicly available implementation of Vader.\footnote{\url{https://github.com/cjhutto/vaderSentiment}}
The design of Vader allows fast and verifiable analysis.
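As a concrete illustration, the three-class mapping described above can be sketched as follows; the helper name is an assumption, and the commented lines show how the compound score would typically be obtained from the public vaderSentiment package:

```python
def sentiment_class(compound):
    """Map Vader's continuous compound score in [-1, 1] to the three
    classes used here: negative, neutral, positive (cutoffs at +/-0.05)."""
    if compound <= -0.05:
        return "negative"
    if compound < 0.05:
        return "neutral"
    return "positive"

# With the public implementation, the compound score would come from e.g.:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
```

Since no pre-training is needed, this is the entire pipeline: feed a text to the analyzer, read the compound score, classify.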
% sentiment calculation via vaderlib, write whole paragraph and explain, also add ref to paper \cite{hutto2014vader}
\section{Data gathering and preprocessing}
StackExchange provides anonymized data dumps of all their communities for researchers to investigate at no cost on archive.org\footnote{\label{archivestackexchange}\url{https://archive.org/download/stackexchange}}. These data dumps contain users, posts (questions and answers), badges, comments, tags, votes, and a post history containing all versions of posts. Each entry contains the necessary information, for instance, id, creation date, title, body, and how the data is linked together (which user posted a question/answer/comment). However, not all data entries are valid and therefore cannot be used in the analysis, for instance, questions or answers for which the user is unknown, but this only affects a very small number of entries. So before the actual analysis, the data has to be cleaned. Moreover, the answer texts are in HTML format, containing tags that could skew the sentiment values, and they need to be stripped away beforehand. Additionally, answers may contain code sections that also would skew the results and are therefore omitted.
% data sets as xml files from archive.org \cite{archivestackexchange}
%cleaning data
% broken entries, missing user id
% answers in html -> strip html and remove code sections, no contribution to sentiment
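A minimal sketch of this cleaning step, assuming Python's standard-library HTML parser; the names and the exact set of skipped tags are illustrative, not the thesis implementation:

```python
from html.parser import HTMLParser

class AnswerTextExtractor(HTMLParser):
    """Collects text content while dropping everything inside
    <code>/<pre> sections, so code does not skew the sentiment."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # >0 while inside a code/pre section

    def handle_starttag(self, tag, attrs):
        if tag in ("code", "pre"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("code", "pre") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

def strip_answer_html(body):
    """Return the plain text of an answer body, without tags or code."""
    parser = AnswerTextExtractor()
    parser.feed(body)
    return "".join(parser.parts).strip()
```

For example, `strip_answer_html('<p>Use <code>rm -rf /</code> carefully.</p>')` keeps the prose but drops the command.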
After preprocessing the raw data, relevant data is filtered and computed. Questions and answers are mixed together in the data and have to be separated, and answers have to be linked to their questions. Also, questions in these datasets do not have the \emph{new contributor} indicator attached to them and neither do users. So, the first contribution date and time of each user have to be calculated via the creation dates of the questions and answers the user has posted. Then, questions are filtered per user and by whether they were created within the 7-day window after the first contribution of the user. These questions were created during the period in which the \emph{new contributor} indicator would have been displayed (for questions posted before the change) or was displayed (for questions posted after the change). From these questions, all answers which arrived within the 7-day window are considered for the analysis. Answers which arrived at a later point are excluded as the answerer most likely has not seen the disclaimer shown in figure \ref{newcontributor}. Included answers are then analyzed with Vader and the resulting sentiments are stored. Furthermore, votes on questions of new contributors are counted if they arrived within the 7-day window, counting +1 for an upvote and -1 for a downvote. Moreover, the number of questions new contributors ask is counted and divided into two classes: the 1st question of a new contributor and the follow-up questions of a new contributor.
% calc sentiment for answers
% questions do not have a tag if from a new contributor -> calc first contribution
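The 7-day-window filtering can be sketched as follows. This is a simplified, assumption-laden version: it approximates a user's first contribution by their first question only (the thesis also considers answers), and the tuple layout is illustrative:

```python
from datetime import datetime, timedelta

def new_contributor_questions(questions, window_days=7):
    """questions: list of (user_id, creation_datetime, question_id).
    Returns the ids of questions posted within `window_days` of the
    poster's first contribution (approximated by their first question)."""
    # derive each user's first contribution time from the data itself
    first_post = {}
    for user, created, _ in sorted(questions, key=lambda q: q[1]):
        first_post.setdefault(user, created)
    window = timedelta(days=window_days)
    return [qid for user, created, qid in questions
            if created - first_post[user] < window]
```

Questions outside the window are dropped because the \emph{new contributor} indicator would no longer have been shown next to them.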
\section{Analysis}
An interrupted time series (ITS) analysis captures trends before and after a change in a system and fits very well with the question this thesis investigates. ITS can be applied to a large variety of data if the data contains the same kind of data points before and after the change and when the change date and time are known. \citeauthor{bernal2017interrupted} published a paper on how ITS works \cite{bernal2017interrupted}. ITS performs well on medical data, for instance, when a new treatment is introduced ITS can visualize whether the treatment improves a condition. For ITS no control group is required, and often control groups are not feasible. ITS only requires the before and after data and the point in time when the change was introduced.
ITS relies on linear regression and tries to fit a three-segment linear function to the data. The authors also described cases where more than three segments are used but these models quickly raise the complexity of the analysis and for this thesis a three-segment linear regression is sufficient. The three segments are lines to fit the data before and after the change as well as one line to connect the other two lines at the change date. Figure \ref{itsexample} shows an example of an ITS. Each segment is captured by a term of the following formula $Y_t = \beta_0 + \beta_1T + \beta_2X_t + \beta_3TX_t$, where $T$ represents time as a number, for instance, the number of months since the start of data recording, $X_t$ is 0 or 1 depending on whether the change is in effect, $\beta_0$ represents the value at $T = 0$, $\beta_1$ represents the slope before the change, $\beta_2$ represents the level change at the moment the change is introduced, and $\beta_3$ represents the change in slope after the change (so the slope after the change is $\beta_1 + \beta_3$).
In contrast to the basic method explained in \cite{bernal2017interrupted}, where the ITS is performed on aggregated values per month, this thesis performs the ITS on single data points, as the premise that the aggregated values all have the same weight within a certain margin is not fulfilled for sentiment and vote score values. Performing the ITS with aggregated values would skew the linear regression towards data points with less weight. Fitting single data points prevents this, as months with more data points naturally carry more weight. To filter out seasonal effects, the average value of all data points with the same month over all years is subtracted from the data points (i.e. subtract the average value of all Januaries from each data point in a January). This thesis uses the least-squares method for regression.
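A pure-Python sketch of the least-squares fit of the model $Y_t = \beta_0 + \beta_1T + \beta_2X_t + \beta_3TX_t$ via the normal equations, assuming the change occurs at $T = 0$; the function name is illustrative and a real analysis would use a statistics library:

```python
def fit_its(times, values, change_time=0.0):
    """Least-squares fit of Y = b0 + b1*T + b2*X + b3*T*X.
    Returns [b0, b1, b2, b3]; X is 1 once the change is in effect."""
    # design matrix rows [1, T, X_t, T*X_t], one per data point
    rows = [[1.0, t, float(t >= change_time), t * float(t >= change_time)]
            for t in times]
    n = 4
    # normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(ata[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (aty[i] - s) / ata[i][i]
    return beta
```

Because every observation enters the design matrix individually, months with more data points automatically contribute more rows, which is exactly the weighting behavior described above.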
%3 segment example like it will be used later
% with lower sentiment first and higher sentiment after the change
%
For demonstration purposes, this section shows how to create a synthetic example for an ITS analysis. The example has 3 segments, equal to the number of segments that will be used in the analysis in the next sections. In this example, the sentiment is lower before the change occurs and higher after the change has occurred. This example also includes data density variability, i.e., the number of data points differs from month to month. The example shown visually in figure \ref{itsexample} is generated by the following algorithm:
\begin{itemize}
\item Select time frame: for instance, 15 months before and after the change
\item Select base values: before the change choose a base value of $0.10$ and after the change choose a base value of $0.15$
\item Choose sample size (data density): choose a random sample size in $[200, 400)$ for each month and replicate the value from the previous step sample-size times in the respective month
\item Compute the ITS: while taking data density variability into account
\end{itemize}
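The steps above can be sketched as follows; the magnitude of the noise term is an assumption, since the exact noise step is elided in the listing:

```python
import random

def generate_synthetic_its(months_before=15, months_after=15, seed=42):
    """Generate synthetic (month, value) data with a level shift at month 0
    and a random sample size (data density) per month."""
    rng = random.Random(seed)
    times, values = [], []
    for month in range(-months_before, months_after):
        base = 0.10 if month < 0 else 0.15   # base values before/after
        # hypothetical noise term; the original elides the exact noise step
        monthly_value = base + rng.uniform(-0.02, 0.02)
        sample_size = rng.randrange(200, 400)  # data density variability
        times.extend([month] * sample_size)
        values.extend([monthly_value] * sample_size)
    return times, values
```

Feeding this data to a weighted (per-data-point) ITS regression reproduces the level shift from 0.10 to 0.15 while the slopes remain random.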
This algorithm generates an ITS where the line before the change is on a lower level than the line after the change. However, this algorithm does not control the slopes of the segments before and after the change. The slopes of the lines in figure \ref{itsexample} are random. The algorithm could be extended to also control the slopes of the lines; however, for demonstration purposes in this thesis this is enough.
\begin{figure}


\item tex.stackexchange.com
\item unix.stackexchange.com
\end{itemize}
These datasets are selected due to their size, as larger datasets yield more consistent results. Smaller datasets may be too sparse to draw any meaningful conclusions. Also, outliers would influence the results more when compared to outliers in bigger datasets. The datasets contain all the necessary data from the creation of the respective community until the last day of February 2020.
% from archive.org \cite{archivestackexchange}
% list of datasets
``Mathematics Stack Exchange is a question and answer site for people studying math at any level and professionals in related fields.''\footnote{\url{https://math.stackexchange.com/}}
The community has 624671 registered users of which 17074 were active between December 2019 and February 2020.
Members asked 1170938 questions in total and gave 1565188 answers with an average answer density of 1.336 answers per question.
New users asked 265704 questions with an average of 1.336 questions per new user during their first week after their first contribution.
\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
MathOverflow.net is a rather small community for professional mathematicians.
The community has 105471 registered users of which 1501 were active between December 2019 and February 2020.
Members asked 108083 questions in total and gave 144918 answers with an average answer density of 1.34 answers per question.
New users asked 23746 questions with an average of 1.131 questions per new user during their first week after their first contribution.
\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
AskUbuntu.com is a rather small community for Ubuntu users and developers.
The community has 783614 registered users of which 7033 were active between December 2019 and February 2020.
Members asked 334194 questions in total and gave 418051 answers with an average answer density of 1.25 answers per question.
New users asked 157018 questions with an average of 1.101 questions per new user during their first week after their first contribution.
\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
ServerFault.com is a rather small community for system and network administrators.
The community has 451180 registered users of which 3947 were active between December 2019 and February 2020.
Members asked 274564 questions in total and gave 432334 answers with an average answer density of 1.574 answers per question.
New users asked 88547 questions with an average of 1.106 questions per new user during their first week after their first contribution.
\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
SuperUser.com is a rather small community for computer enthusiasts and power users.
The community has 861533 registered users of which 7392 were active between December 2019 and February 2020.
Members asked 424718 questions in total and gave 587559 answers with an average answer density of 1.383 answers per question.
New users asked 161397 questions with an average of 1.085 questions per new user during their first week after their first contribution.
\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
electronics.stackexchange.com is a rather small community for electrical engineering.
The community has 184795 registered users of which 3172 were active between December 2019 and February 2020.
Members asked 130025 questions in total and gave 221811 answers with an average answer density of 1.705 answers per question.
New users asked 47035 questions with an average of 1.126 questions per new user during their first week after their first contribution.
\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
``Cross Validated is a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization.''\footnote{\url{https://stats.stackexchange.com/}}
The community has 227032 registered users of which 4485 were active between December 2019 and February 2020.
Members asked 151777 questions in total and gave 148046 answers with an average answer density of 0.975 answers per question.
New users asked 57636 questions with an average of 1.112 questions per new user during their first week after their first contribution.
\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
tex.stackexchange.com is a rather small community for TEX and related typesetting systems.
The community has 171867 registered users of which 3280 were active between December 2019 and February 2020.
Members asked 188860 questions in total and gave 227875 answers with an average answer density of 1.206 answers per question.
New users asked 59692 questions with an average of 1.191 questions per new user during their first week after their first contribution.
\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
unix.stackexchange.com is a rather small community for Linux and Unix-like operating systems.
The community has 356498 registered users of which 4565 were active between December 2019 and February 2020.
Members asked 174625 questions in total and gave 256007 answers with an average answer density of 1.466 answers per question.
New users asked 62437 questions with an average of 1.124 questions per new user during their first week after their first contribution.
\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}


In diagrams (a), the blue line shows the average sentiment (\emph{average sentiment} in the diagram legend) of the answers to questions from new contributors. Also, the numbers attached to the blue line indicate the number of answers to questions from new users that formed the average sentiment. The orange line (\emph{sm single ITS} in the diagram legend) represents the ITS over the whole period of the available data. As stated in section 3.2, data density variability is a factor to take into account, therefore, the orange line represents the weighted ITS. The green, red, purple, and brown lines also represent weighted ITS, however, the time periods considered for the ITS before and after the change are limited to 6, 9, 12, and 15 months respectively.
Similarly, in diagrams (b), the blue line represents the average vote score of the questions of new users. The number attached to the blue line indicates the number of questions that formed the average vote score. The ITS (orange, green, red, purple, and brown lines) are computed the same way as in diagrams (a).
In diagrams (c), the blue line represents the number of 1st questions from new users, whereas the orange line denotes the follow-up questions from new users. The green and red lines
represent the ITS of the blue and orange lines respectively. In these diagrams, no weighting is performed as each data point has equivalent weight.
The average vote score rises right before the change and stays fairly constant afterwards. The rise of the vote score before the change indicates an outside factor other than the inspected change that improved the vote score. The ITS itself shows that the change raises the base level of the vote score, but the trend is the same after the change, indicating the change did not bring a long-term effect. Either way, the vote score is not affected by the change.
The number of questions from new contributors increases after the change, while it is fairly constant before the change. The number of follow-up questions from new contributors declines before the change and rises after the change. Both ITS show that new contributors ask more questions than before. The graph shows a good example of seasonality in data \cite{bernal2017interrupted}. Month 0 indicates August. For the 1st questions, the months -44, -32, -20, -8, 4, and 16 are all local minima. These months are all in December. The months -31 to -27, -19 to -15, -7 to -3, and 5 to 9 show a pattern with 3 upward spikes. During December, large portions of the world go through a holiday season, which likely explains these regular dips in contribution. The number of follow-up questions also shows dips in the months of December.
In summary, the sentiment improves, the vote score is unaffected, and the number of questions asked by new contributors improves, suggesting that the community benefits from the change.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../stackoverflow.com/output/its/average_sentiments-i1.png}
\caption{An interrupted time series analysis of the sentiments of answers to questions created by new contributors on StackOverflow.com}
\label{stackoverflow_its}
\end{subfigure}
\begin{subfigure}[t]{0.5\textwidth}
\section{AskUbuntu.com}
AskUbuntu sees a decrease in average sentiments prior to the change. After the introduction of the change, the ITS dips, but sentiments rise drastically from then on, indicating the change has a positive effect.
The vote score fluctuates over a huge range of values both prior to and after the change. Prior to the change, the average vote score follows a nearly constant trend. After the change, however, the trend turns downwards. The graph indicates the vote score is lower after the change; however, due to the huge fluctuation, no clear conclusion can be reached on whether the change improves the vote score.
In contrast, the number of questions asked by new users improves after the change.
The number of 1st questions slightly decreases prior to the change and starts rising after the change. The number of follow-up questions stabilizes from a slightly decreasing trend. This indicates more new users ask their first question.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../askubuntu.com/output/its/average_sentiments-i1.png}
\caption{An interrupted time series analysis of the sentiments of answers to questions created by new contributors on AskUbuntu.com}
\label{ubuntu_its}
\end{subfigure}
\begin{subfigure}[t]{0.5\textwidth}
% maybe: sentiments did not change drastically as seen in maths communities
\section{ServerFault.com}
ServerFault shows gradually rising average sentiments prior to the change. At the time of the change, the regression makes a jump upward and the average sentiment decreases slowly afterward. The sentiment stays largely the same before and after the change. Although it rises slowly at first and falls after the change, the small jump at the change date keeps the overall sentiment level fairly stable.
The vote score falls prior to the change, makes a huge jump upward, and quickly returns to the levels just prior to the change. Even though the leap at the change date is big and the ITS fits the data very well, the vote score does not improve in the long term after the change.
Despite sentiment and vote score not being affected in the long run, the number of 1st questions sees a drastic change and improves dramatically. Prior to the change, the number of 1st questions decreases steadily, while after the change the numbers increase at the same pace as they fell prior to the change.
The number of follow-up questions follows the same course, falling prior to and rising after the change, albeit not as drastically.
In summary, even though the sentiment and vote score are not really affected, the reversal in the number of first questions and follow-up questions indicates that the change positively affected the community.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../serverfault.com/output/its/average_sentiments-i1.png}
\caption{An interrupted time series analysis of the sentiments of answers to questions created by new contributors on ServerFault.com}
\label{fault_its}
\end{subfigure}
\begin{subfigure}[t]{0.5\textwidth}
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../stats.stackexchange.com/output/its/average_sentiments-i1.png}
\caption{An interrupted time series analysis of the sentiments of answers to questions created by new contributors on stats.stackexchange.com}
\label{stats_its}
\end{subfigure}
\begin{subfigure}[t]{0.5\textwidth}
In stark contrast, the vote score is on a continuous downward trend with a peak around the change date but does not improve in the long term. Although there is a short window around the change date where vote scores are higher than before and after the change, this is not a result of the change but a coincidence: the vote score starts increasing several months before the change actually occurs. Either way, this indicates the change did not affect the vote score.
The amount of 1st questions improves after the change, turning the downward trend into an upward trend of the same grade. The number of follow-up questions does not see an improvement and continues the downward trend from before the change. This shows that more new contributors ask their 1st question than before; however, they still tend to become one-day-flies \cite{slag2015one}. Also, for the number of 1st questions, the months -44, -32, -20, -8, 4, and 16 are local minima, indicating seasonality in the data \cite{bernal2017interrupted}. These months are all in December, when people in large parts of the world are on holiday.
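The December dips can also be located programmatically. The following sketch, on synthetic question counts (all numbers hypothetical), flags interior local minima and maps them back to the month index, with month 0 taken as August as in the text:

```python
import numpy as np

months = np.arange(-44, 17)             # month 0 = August, as in the analysis
is_december = (months - 4) % 12 == 0    # -44, -32, -20, -8, 4, 16 are Decembers

# Synthetic 1st-question counts: mild linear trend plus a dip each December
counts = 100 + 0.5 * months - 20 * is_december

# Interior local minima: strictly below both neighbours
i = np.arange(1, months.size - 1)
is_min = (counts[i] < counts[i - 1]) & (counts[i] < counts[i + 1])
print(months[i[is_min]])  # the December months away from the series endpoints
```

On the real monthly counts, the recovered indices would be compared against the December months; note that the endpoints of the series (here -44 and 16) cannot be flagged by this neighbour test.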
In summary, the sentiment improves, the vote score is unaffected, and the number of 1st questions does improve, suggesting that the community benefits from the change.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../tex.stackexchange.com/output/its/average_sentiments-i1.png}
\caption{An interrupted time series analysis of the sentiments of answers to questions created by new contributors on tex.stackexchange.com}
\label{tex_its}
\end{subfigure}
\begin{subfigure}[t]{0.5\textwidth}
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../unix.stackexchange.com/output/its/average_sentiments-i1.png}
\caption{An interrupted time series analysis of the sentiments of answers to questions created by new contributors on unix.stackexchange.com}
\label{unix_its}
\end{subfigure}
\begin{subfigure}[t]{0.5\textwidth}
Similarly, the vote score does not improve either and keeps decreasing after the change, although the decrease slows down a little. Also, the vote score rises several months before the change, indicating the effect of an unrelated cause. The vote score analysis is inconclusive.
In contrast, the number of questions asked by new contributors does improve. The number of 1st questions seems to stabilize a bit and is only decreasing slowly. The number of follow-up questions even reverses the trend and starts increasing after the change. This shows a good example of seasonality in data \cite{bernal2017interrupted}. The month 0 indicates August. For the 1st questions, the months -44, -32, -20, -8, 4, and 16 are all local minima. These months are all in December. Similarly, the months -38 and -37, -26 and -25, -14 and -13, -2 and -1, and 10 and 11 are all in June and July. During both these times, people in large portions of the world are going through a holiday season, which likely explains these regular dips in contribution. The graph for the follow-up questions also shows dips at the same times, although the dips in December are not always as discernible.
In summary, the sentiment and vote score do not seem to be affected; however, the number of questions from new contributors tends to improve. This shows users seem to be more willing to interact with the community, even though the sentiment of the interactions still decreases. The change does not indicate a clear improvement according to its goal.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../math.stackexchange.com/output/its/average_sentiments-i1.png}
\caption{An interrupted time series analysis of the sentiments of answers to questions created by new contributors on math.stackexchange.com}
\label{math_its}
\end{subfigure}
\begin{subfigure}[t]{0.5\textwidth}
The vote score steadily increases prior to the change and then quickly returns to the level from 3 years before the change. However, the vote score does not change course at the change date but several months after the change is introduced, leading to an inconclusive result.
In contrast, the number of questions asked by new contributors does improve. The number of 1st questions falls prior to the change and stabilizes to a constant trend thereafter. However, the number of follow-up questions, which is constant before the change, starts decreasing after the change. For the number of 1st questions, the months -41, -29, -17, -5, and 7 are local maxima, indicating seasonality in the data \cite{bernal2017interrupted}. These months are all in March. Also, while the number of 1st questions stabilizes to a constant trend, the number of follow-up questions decreases, indicating that new users tend more and more to become one-day-flies as time passes \cite{slag2015one}.
In summary, the sentiment, vote score, and the number of follow-up questions are affected negatively. Only the trend in the number of 1st questions from new contributors stabilizes. The change does not indicate a clear improvement according to its goal. This dataset is sparse compared to the other datasets. Also, the vote scores are high compared to the other datasets.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../mathoverflow.net/output/its/average_sentiments-i1.png}
\caption{An interrupted time series analysis of the sentiments of answers to questions created by new contributors on MathOverflow.net}
\label{matho_its}
\end{subfigure}
\begin{subfigure}[t]{0.5\textwidth}
Similarly, the vote score trend does not improve either and keeps decreasing after the change; however, the vote score does make a big leap upwards at the change. The vote score trend is not affected by the change.
The number of 1st questions rises continuously prior to the change but decreases thereafter. The number of follow-up questions falls slightly prior to the change and stabilizes afterward. This indicates fewer new users, and that the change negatively impacted the number of new users. However, the number of follow-up questions increases slightly after the change. Even though the number of new users decreases after the change, the number of follow-up questions increases, indicating the number of one-day-flies decreases \cite{slag2015one}.
In summary, the sentiment does not seem to be affected. The vote score continues its downward trend although on a higher level than before. The number of questions from new contributors trend does not show real improvements. This indicates that the change does not clearly improve interactions within the community.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../electronics.stackexchange.com/output/its/average_sentiments-i1.png}
\caption{An interrupted time series analysis of the sentiments of answers to questions created by new contributors on electronics.stackexchange.com}
\label{ele_its}
\end{subfigure}
\begin{subfigure}[t]{0.5\textwidth}
\section{SuperUser.com}
SuperUser shows only slightly decreasing average sentiment and vote score up to the change. At the change date, the regressions take a dip down and show a downward trend thereafter. However, the huge drop in sentiment and vote score does not align with the change date but happens 4 months after the change.
In the same time frame, the number of 1st questions skyrockets to more than triple the previous levels. This is similar to the feature found in the results from stats.stackexchange.com, although this example is much more pronounced. This feature also seems to be produced by the huge influx of new users to the community. As described in \cite{lin2017better}, the quality of interactions in the community dips for a while but recovers over time. The sentiment recovers after about 13 months. The vote score also starts to recover at the same time, but not as quickly as the sentiment value. Due to this spike in the number of new users, the analysis does not yield any meaningful results.
The number of 1st questions decreases prior to the change and then goes through the roof, indicating a drastic influx of new users. Data available in the future will show whether the recovery at the end of the time frame is persistent. Even though a lot of new users joined the community, the number of follow-up questions stayed largely the same.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../superuser.com/output/its/average_sentiments-i1.png}
\caption{An interrupted time series analysis of the sentiments of answers to questions created by new contributors on SuperUser.com}
\label{super_its}
\end{subfigure}
\begin{subfigure}[t]{0.5\textwidth}
\chapter{Discussion}
%TODO ~1 ref/paragraph
The ITS analysis of the investigated communities shows mixed results. Some communities show an improvement in the measured qualities while others are not affected at all or show a decrease in these qualities. By and large, the majority of the investigated communities benefit from the change while the minority sees either no change or a change for the worse. Some communities show interesting features unrelated to the analysis which will also be mentioned in this chapter.
\section{Benefitters}
There are 6 communities that profit from the change in some form: StackOverflow, AskUbuntu, ServerFault, stats.stackexchange.com, tex.stackexchange.com, and unix.stackexchange.com.
The StackOverflow community has a fairly stable average sentiment before the change. The average sentiment jumps to a higher level and keeps rising after the change is introduced. Furthermore, the number of 1st questions from new contributors starts rising drastically after the change while prior levels stagnate. Also, the follow-up questions start increasing slightly. The vote score trend takes a new direction 9 months before the change and is unrelated to it. The change has a positive effect on the StackOverflow community. The graph with the number of questions from new contributors shows a good example of seasonality in data \cite{bernal2017interrupted}. The month 0 indicates August. For the 1st questions, the months -44, -32, -20, -8, 4, and 16 are all local minima. These months are all in December. The months -31 to -27, -19 to -15, -7 to -3, and 5 to 9 show a pattern with 3 upward spikes. During December, people in large portions of the world are going through a holiday season, which likely explains these regular dips in contribution. The graph for the follow-up questions also shows dips in the months of December.
AskUbuntu shows an interesting zig-zag pattern in the average sentiment graph every 20 months. However, this is not a seasonal effect, as seasonal effects are based on a 12-month cycle \cite{bernal2017interrupted}. Also, the average sentiment falls before the change and rises thereafter, indicating that the change works for this community. However, further data is needed to see if the zig-zag pattern repeats itself. The number of 1st questions starts increasing again after the change, halting the prior downward trend.
On stats.stackexchange.com the average sentiment falls before the change, but since the change, the downward trend stops and the sentiment starts to rise slowly, suggesting the change has a positive effect on the community. This is supported by the increase in the number of 1st and follow-up questions by new contributors. The vote score takes a dip after the change but starts to recover after 12 months, which could be the result of another factor. In the same time frame, the number of 1st questions increases a lot, which means more new contributors contribute to the community. Due to this influx of new users, the community metrics suffer for a period of time but recover afterward. This effect is also described in \cite{lin2017better}; however, the cause and effect in this case are not as pronounced.
In the tex.stackexchange.com community, sentiments are stable before the change and show a stark rising pattern after the change. The change seems to work for this community, but future data will be necessary to see if the rising pattern continues in the shown manner. The vote score ITS does not fit the data well, and the values before and after the change indicate a linear downward trend. However, the number of 1st questions increases slightly after the change while the prior trend shows a decreasing development. The number of follow-up questions still continues a downward trend, indicating that the new contributors tend to become one-day-flies \cite{slag2015one}. By looking at the graph of the 1st questions, the months -44, -32, -20, -8, 4, and 16 are local minima, indicating seasonality in the data \cite{bernal2017interrupted}. These months are all in December, when people in large parts of the world are on holiday.
unix.stackexchange.com also shows a decreasing pattern prior to and a rising pattern in sentiment after the change. The vote score analysis shows a fairly linear downward trend before and after the change and is not affected by it. However, the number of 1st questions by new contributors starts to drastically increase while before the change the levels are constant, indicating this community also profits from the change. %TODO 1 ref, nothing found
On ServerFault the sentiment rises gradually before the change, jumps upward by a small value when the change is introduced, and falls slowly thereafter, but the levels are fairly stable over the analyzed period. The vote scores show the change has a huge impact on the community. The previously decreasing trend jumps up by a large amount. However, the vote score rapidly returns to the levels right before the change. In contrast, the number of first questions reverses direction and starts increasing at the same rate at which it fell previously. %TODO 1 ref, nothing found
%~ - -
\section{No benefits/no evidence}
The other 4 communities do not seem to profit from the change so clearly: MathOverflow, math.stackexchange.com, electronics.stackexchange.com, and SuperUser. Some of these communities still improve in certain aspects, but the overall picture of the analysis does not support a conclusion of improvement.
The average sentiment stays constant on MathOverflow before the change and decreases afterward. The sentiment levels start increasing six months before the change and are unrelated to it. However, the sentiment falls sharply at the change date, indicating the sentiment values are affected negatively by the change. The vote score steadily increases before the change and crashes shortly after the change. However, the vote score is very high compared to other communities. The number of 1st questions stabilizes after the change compared to the slight downward trend previously. By looking at the graph of the 1st questions, the months -41, -29, -17, -5, and 7 are local maxima, indicating seasonality in the data \cite{bernal2017interrupted}. These months are all in March. Also, while the number of 1st questions stabilizes to a constant trend, the number of follow-up questions decreases, indicating that new users tend more and more to become one-day-flies as time passes \cite{slag2015one}.
math.stackexchange.com shows a downward trend before and after the change in sentiment and vote score. The sentiment ITS is particularly affected by the low sentiment values at the end, and future data is required to determine whether this trend continues. However, the number of 1st questions stabilizes somewhat after the change, and follow-up questions even see a slight increase after the change. The graph with the number of questions from new contributors is a good example of seasonality in data \cite{bernal2017interrupted}. Month 0 indicates August. For the 1st questions, the months -44, -32, -20, -8, 4, and 16 are all local minima. These months are all in December. Similarly, the months -38 and -37, -26 and -25, -14 and -13, -2 and -1, and 10 and 11 are all in June and July. During both these times, large portions of the world are going through a holiday season, which likely explains these regular dips in contributions. The graph for the follow-up questions shows dips at the same times, although the dips in December are not always as discernible.
The electronics.stackexchange.com community has a pattern in sentiment values and vote scores similar to math.stackexchange.com. However, the sentiment values seem to recover after about 12 months, and future data is required to see if the rise at the end of the period is a long-term trend. The rising number of first questions from new contributors stops at the change date and transitions into a decreasing pattern. However, the number of follow-up questions increases slightly after the change. Even though the number of new users decreases after the change, the number of follow-up questions increases, indicating that the number of one-day-flies decreases \cite{slag2015one}.
SuperUser shows an odd pattern. The average sentiment values and vote scores are stable before the change and decrease dramatically shortly afterward. However, the sentiment recovers after 12 months. The ITS model chosen in this thesis is not able to capture this pattern. However, the number of 1st questions skyrockets, indicating a huge influx of new users. The time frames of the falling sentiment values and vote scores and the rising number of first questions overlap, indicating that the huge influx of new users is responsible for the falling patterns. This is a good example of the \emph{defaulting} described in \cite{lin2017better}. While the community metrics suffer for a period of time, they start to recover after some time. Also, while the number of 1st questions skyrockets, the number of follow-up questions stays largely the same, indicating that most of the new users are in fact one-day-flies \cite{slag2015one}.
% similarities in results and differences
% so: only community that shows a clear improvement when compared to prior-to-change sentiment
%DROP write about onedayflies. SuperUser, electronics.stackexchange.com, MathOverflow, tex.stackexchange.com, more likely dont write about this, justification too hard
%DONE write about defaulting
%TODO more refs?
By and large, the change introduced by the StackExchange team has a clear positive effect on more than half of the investigated communities. The sentiment of answers to questions from new contributors increases, as does the number of questions from new contributors. The vote score is not particularly affected by the change.
math.stackexchange.com is not really affected by the change, although the number of 1st questions stabilizes a bit and follow-up questions from new contributors increase again. MathOverflow shows a similar picture. The sentiment on electronics.stackexchange.com is also not particularly affected by the change and continues to decrease.
Two of the communities, SuperUser and stats.stackexchange.com, show a delayed temporary decrease in sentiment that recovers after about 12 months, which may be attributable to the larger influx of new contributors \cite{lin2017better}. The selected ITS model is not designed to capture the sentiment pattern of these communities.
Five of the communities, AskUbuntu, math.stackexchange.com, MathOverflow, StackOverflow, and tex.stackexchange.com, show seasonality \cite{bernal2017interrupted} in the number of contributions from new users. In most of these communities, in the month of December, the number of contributions falls to a local minimum. The most likely explanation is that large parts of the world are on holiday in the latter half of December, thus decreasing the number of contributions.
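The recurring December dips described above can also be located programmatically rather than visually, by scanning the monthly count series for local minima and checking whether they recur at the same month offset across years. The sketch below uses a hypothetical, made-up count series for illustration; the thesis' actual data is not reproduced here.

```python
def seasonal_minima(counts):
    """Return the indices of local minima in a monthly count series.

    `counts` lists monthly contribution counts, oldest first.
    """
    return [i for i in range(1, len(counts) - 1)
            if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]]

# Hypothetical monthly counts (index 0 = January, two years of data),
# with dips in June and December of each year.
counts = [50, 52, 60, 55, 54, 40, 41, 53, 56, 57, 55, 35,
          51, 53, 61, 56, 55, 41, 42, 54, 57, 58, 56, 36]
minima = seasonal_minima(counts)
months = [i % 12 + 1 for i in minima]  # 1 = January, ..., 12 = December
# minima -> [5, 11, 17]; months -> [6, 12, 6]
```

If the same month number keeps reappearing in `months` across years (here June, and December where the series does not end mid-dip), that is the kind of seasonality the visual inspection detected.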
% expectations from before the experiment and how they match with results
% did change from SE produce the desired results?
% interesting single results?
The average sentiment of the StackOverflow community is the most stable in terms of deviation from the regression. This is expected, as StackOverflow is the largest community by far and has the most questions created by newcomers. On the other hand, MathOverflow is the sparsest community and has the fewest questions from new contributors. The level of the average sentiment also varies greatly between communities: stats.stackexchange.com has the highest average sentiment, whereas tex.stackexchange.com has the lowest. MathOverflow has the highest vote scores by far. Also, in most communities, the number of questions from new contributors slowly decreases over time. This may be a result of the filling of gaps in the knowledge repository over time. %TODO ref for last sentence
% as expected #answers per month vary greatly -> mabye into data sets section
% some communties have a high average sentiment compared to others

\chapter{Conclusion}
%repeat motivation
StackExchange introduced the \emph{new contributor} indicator as one of many efforts to improve the quality of the interactions in its communities. The indicator is shown in the answer box if the question is from a new contributor. It reminds the answerer that the question was asked by a new user who might not know the rules that well, and to mind the code of conduct when giving their answer. This measure is supposed to make the communities friendlier and more welcoming towards new users, while also being more forgiving when new users break a rule or convention unknowingly.
The motivation of this thesis is to find out whether the \emph{new contributor} indicator actually achieved its intended goal of getting the communities to be more welcoming towards new users and giving new contributors a good interaction experience. To measure this improvement, this thesis uses 3 properties: the sentiment of the answers, the vote score of the questions, and the number of questions from new contributors. To measure the sentiment, the answer texts are passed to VADER \cite{hutto2014vader}. An Interrupted Time Series analysis is then performed on all 3 properties.
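The ITS step just described amounts to a segmented regression with a level change and a slope change at the intervention month. A minimal sketch of that fit, assuming a standard segmented-regression specification and a hypothetical monthly sentiment series (the thesis' actual data preparation is not reproduced here):

```python
import numpy as np

# Segmented-regression ITS with intervention at month t0:
#   y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t
# b2 is the immediate level change, b3 the change in slope.
def fit_its(y, t0):
    t = np.arange(len(y))
    post = (t >= t0).astype(float)
    X = np.column_stack([np.ones(len(t)), t, post, (t - t0) * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [b0, b1, b2, b3]

# Hypothetical series: flat average sentiment of 0.30 before month 24,
# then a small jump and a rising trend afterward.
y = np.concatenate([np.full(24, 0.30), 0.30 + 0.01 * np.arange(1, 25)])
b0, b1, b2, b3 = fit_its(y, t0=24)
# b2 ~ 0.01 (level change), b3 ~ 0.01 (slope change per month)
```

Per-answer sentiment would be obtained beforehand, e.g. with VADER's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the `vaderSentiment` package, averaged per month before fitting.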
% chage is 1 of many efforts to make community better
% describe change shortly
% goal was to find out whether change improves user interaction experience for new users
% eval: vader -> sentiment, ITs, by looking at the 10 largest communities (list here, note so is by far the largest)
%repeat datensätze
As StackExchange has over 150 communities, this thesis only focuses on the 10 largest communities: StackOverflow, AskUbuntu, ServerFault, MathOverflow, SuperUser, stats.stackexchange.com, tex.stackexchange.com, unix.stackexchange.com, math.stackexchange.com, and electronics.stackexchange.com. StackOverflow is the largest community by far, more than 10 times the size of the 2nd largest community, math.stackexchange.com. The other communities are closer together in terms of size.
% 10 largest se communities
% stack overflow by far the largest, >10x times math.se
%
%repeat results
% change seems to be a success, not a one size fits all solution that works everywhere but a majority of communities benefit
% benefitters
The change introduced by the StackExchange team is a success, although it does not work for all communities: the new contributor indicator is not a silver bullet that works everywhere. The change produced the desired results in more than half of the investigated communities: StackOverflow, AskUbuntu, ServerFault, stats.stackexchange.com, tex.stackexchange.com, and unix.stackexchange.com. In general, the number of 1st questions from new contributors increases in all of these 6 communities, while the number of follow-up questions increases in most of them. The sentiment values of the answers rise in all of these communities except ServerFault. The vote score analysis does not yield meaningful results: the vote score is either not affected at all or changes drastically before or after the change. Only for ServerFault does the timing match, and there is a huge spike, but the vote score quickly returns to previous levels.
%in general
% first questions improved in all, followup in too in most of these communities
% sentiment raises in all except ServerFault
%
% no benefits/no evidence
MathOverflow, SuperUser, math.stackexchange.com, and electronics.stackexchange.com do not profit as much from the change and show not an increase but a decrease, or a continuing decrease, in sentiment. The picture is not clear for these communities; only one statistic improves, in contrast to the benefitting communities, where the measured properties rise across the board. The number of 1st questions decreases for all 4 of these communities, and the number of follow-up questions also decreases for all of them except math.stackexchange.com. Similarly to the benefitting communities, the vote score is either not particularly affected or changes drastically before or after the change. The sentiment does not improve in these communities.
% the picture is often not clear for these communities, only one statistic improves, in contrast to the benfitting communtites where stats rise accross the board
% 1st question decrease for all communities -> bad sign
% follow up questions decrease for all except math.se
%repeat discussion
%
Some of the investigated communities have interesting features in their data. In half of the communities, seasonality \cite{bernal2017interrupted} can be detected visually. In most of these communities, the month of December shows fewer community interactions, which may be attributed to the holiday season at the end of December. SuperUser saw a huge influx of new contributors shortly after the change, who asked a lot of questions and dropped the sentiment and vote score values during that period. stats.stackexchange.com has a similar pattern, although not as pronounced. This effect is also described in \cite{lin2017better}.
% one day files, state communities
% seasonality, state communities, state months
% large influx, state communities, state gravity
%other odd things (peculiarities)
The results of the StackOverflow community most closely resemble the expectation of improving the welcomingness and also most closely match the example Interrupted Time Series shown in section 3. StackOverflow is also the community with the densest and most stable data set, while MathOverflow is the community with the sparsest data set. The sentiment levels vary greatly between communities: stats.stackexchange.com has the highest, while tex.stackexchange.com has the lowest. The vote score level is the highest on MathOverflow by far.
% - stack overflow most stable, expected as largest community, also closly matches section 3 example
% - MathOverflow sparsest
% - sentiment levels vary drastically between communities, stats has highest, tex lowest
% - same goes for vote score, MathOverflow highest by far
%closing sentences
Overall, the new contributor indicator is a success and the majority of the communities benefitted from it. It is not a silver bullet solution that works for all communities. There is no simple solution, as the communities are too diverse and the new contributor indicator is only one of many measures StackExchange has taken to improve the user experience in its communities.
% change worked over all, majority benefitted
% no silver bullet, no simple solution, no 1 size fits all, communities to diverse