From a362d99c429123f6a2611627a4046ed4769adfbe Mon Sep 17 00:00:00 2001 From: wea_ondara Date: Thu, 14 May 2020 11:33:13 +0200 Subject: [PATCH] wip --- text/3_method.tex | 2 +- text/4_datasets.tex | 26 +++++++++++++------------- text/5_results.tex | 34 ++++++++++++++++++++++------------ text/6_discussion.tex | 3 +++ 4 files changed, 39 insertions(+), 26 deletions(-) diff --git a/text/3_method.tex b/text/3_method.tex index 0920dbf..11e3f04 100644 --- a/text/3_method.tex +++ b/text/3_method.tex @@ -1,6 +1,6 @@ \chapter{Method} -StackExchange introduced a \emph{new contributor} indicator to all communities on $21^{st}$ of August in 2018 at 9 pm UTC \cite{post2018come}. This step is one of many StackExchange took to make the platform and its members more welcoming towards new users. This indicator is shown the potential answerers in the answer text box of a question flagged as from a new contributor as shown in figure \ref{newcontributor}. The indicator is added to a question if the question is the first contribution of a user or if the first contribution (question or answer) of the user was less than 7 days ago \cite{sonic2018what}. The indicator is then shown for 7 days from the creation date of the question. Note that the user can be registered for a long time and then post their first question and it is counted as a question from a new contributor. Also, if a user decides to delete all their contributions from the site and then creates a new question this question will have the \emph{new contributor} indicator attached. The sole deciding factor for the indicator is the date and time of the first non-deleted contribution and the 7-day window afterward. +StackExchange introduced a \emph{new contributor} indicator to all communities on $21^{st}$ of August in 2018 at 9 pm UTC \cite{post2018come}. This step is one of many StackExchange took to make the platform and its members more welcoming towards new users. 
This indicator is shown to potential answerers in the answer text box of a question flagged as from a new contributor as shown in figure \ref{newcontributor}. The indicator is added to a question if the question is the first contribution of a user or if the first contribution (question or answer) of the user was less than 7 days ago \cite{sonic2018what}. The indicator is then shown for 7 days from the creation date of the question. Note that the user can be registered for a long time and then post their first question and it is counted as a question from a new contributor. Also, if a user decides to delete all their contributions from the site and then creates a new question this question will have the \emph{new contributor} indicator attached. The sole deciding factor for the indicator is the date and time of the first non-deleted contribution and the 7-day window afterward. \begin{figure} \centering\includegraphics[scale=0.47]{figures/new_contributor} diff --git a/text/4_datasets.tex b/text/4_datasets.tex index be9affc..c342c38 100644 --- a/text/4_datasets.tex +++ b/text/4_datasets.tex @@ -26,11 +26,11 @@ These datasets are selected due to their size as larger datasets yield more cons %sections 1 per site -\section{StackOverflow.com} datavalues not computed yet. %TODO insert values +\section{StackOverflow.com} StackOverflow is the largest and oldest community of the StackExchange platform. -The community has 165567 registered users of which 3467 were active between December 2019 and February 2020. -Members asked 116797 questions in total and gave 202751 answers with an average answer density of 1.73 answers per question. -New users asked 42996 questions with an average of 1.129 questions per new user during their first week after registration. +The community has 11867244 registered users of which 297192 were active between December 2019 and February 2020. 
+Members asked 18699974 questions in total and gave 27981749 answers with an average answer density of 1.496 answers per question. +New users asked 2880039 questions with an average of 1.240 questions per new user during their first week after their first contribution. \begin{figure}[H] \begin{subfigure}[c]{0.5\textwidth} @@ -49,7 +49,7 @@ New users asked 42996 questions with an average of 1.129 questions per new user ``Mathematics Stack Exchange is a question and answer site for people studying math at any level and professionals in related fields.'' \cite{mathstackexchangecom} The community has 624671 registered users of which 17074 were active between December 2019 and February 2020. Members asked 1170938 questions in total and gave 1565188 answers with an average answer density of 1.336 answers per question. -New users asked 265704 questions with an average of 1.336 questions per new user during their first week after registration. +New users asked 265704 questions with an average of 1.336 questions per new user during their first week after first contribution. \begin{figure}[H] \begin{subfigure}[c]{0.5\textwidth} @@ -68,7 +68,7 @@ New users asked 265704 questions with an average of 1.336 questions per new user MathOverflow.net is a rather small community for professional mathematicians. The community has 105471 registered users of which 1501 were active between December 2019 and February 2020. Members asked 108083 questions in total and gave 144918 answers with an average answer density of 1.34 answers per question. -New users asked 23746 questions with an average of 1.131 questions per new user during their first week after registration. +New users asked 23746 questions with an average of 1.131 questions per new user during their first week after first contribution. 
\begin{figure}[H] \begin{subfigure}[c]{0.5\textwidth} @@ -87,7 +87,7 @@ New users asked 23746 questions with an average of 1.131 questions per new user AskUbuntu.com is a rather small community for Ubuntu users and developers. The community has 783614 registered users of which 7033 were active between December 2019 and February 2020. Members asked 334194 questions in total and gave 418051 answers with an average answer density of 1.25 answers per question. -New users asked 157018 questions with an average of 1.101 questions per new user during their first week after registration. +New users asked 157018 questions with an average of 1.101 questions per new user during their first week after first contribution. \begin{figure}[H] \begin{subfigure}[c]{0.5\textwidth} @@ -106,7 +106,7 @@ New users asked 157018 questions with an average of 1.101 questions per new user ServerFault.com is a rather small community for system and network administrators. The community has 451180 registered users of which 3947 were active between December 2019 and February 2020. Members asked 274564 questions in total and gave 432334 answers with an average answer density of 1.574 answers per question. -New users asked 88547 questions with an average of 1.106 questions per new user during their first week after registration. +New users asked 88547 questions with an average of 1.106 questions per new user during their first week after first contribution. \begin{figure}[H] \begin{subfigure}[c]{0.5\textwidth} @@ -125,7 +125,7 @@ New users asked 88547 questions with an average of 1.106 questions per new user SuperUser.com is a rather small community for computer enthusiasts and power users. The community has 861533 registered users of which 7392 were active between December 2019 and February 2020. Members asked 424718 questions in total and gave 587559 answers with an average answer density of 1.383 answers per question. 
-New users asked 161397 questions with an average of 1.085 questions per new user during their first week after registration. +New users asked 161397 questions with an average of 1.085 questions per new user during their first week after first contribution. \begin{figure}[H] \begin{subfigure}[c]{0.5\textwidth} @@ -144,7 +144,7 @@ New users asked 161397 questions with an average of 1.085 questions per new user electronics.stackexchange.com is a rather small community for electrical engineering. The community has 184795 registered users of which 3172 were active between December 2019 and February 2020. Members asked 130025 questions in total and gave 221811 answers with an average answer density of 1.705 answers per question. -New users asked 47035 questions with an average of 1.126 questions per new user during their first week after registration. +New users asked 47035 questions with an average of 1.126 questions per new user during their first week after first contribution. \begin{figure}[H] \begin{subfigure}[c]{0.5\textwidth} @@ -163,7 +163,7 @@ New users asked 47035 questions with an average of 1.126 questions per new user ``Cross Validated is a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization.'' \cite{statsstackexchangecom} The community has 227032 registered users of which 4485 were active between December 2019 and February 2020. Members asked 151777 questions in total and gave 148046 answers with an average answer density of 0.975 answers per question. -New users asked 57636 questions with an average of 1.112 questions per new user during their first week after registration. +New users asked 57636 questions with an average of 1.112 questions per new user during their first week after first contribution. 
\begin{figure}[H] \begin{subfigure}[c]{0.5\textwidth} @@ -182,7 +182,7 @@ New users asked 57636 questions with an average of 1.112 questions per new user tex.stackexchange.com is a rather small community for TEX and related typesetting systems. The community has 171867 registered users of which 3280 were active between December 2019 and February 2020. Members asked 188860 questions in total and gave 227875 answers with an average answer density of 1.206 answers per question. -New users asked 59692 questions with an average of 1.191 questions per new user during their first week after registration. +New users asked 59692 questions with an average of 1.191 questions per new user during their first week after first contribution. \begin{figure}[H] \begin{subfigure}[c]{0.5\textwidth} @@ -201,7 +201,7 @@ New users asked 59692 questions with an average of 1.191 questions per new user unix.stackexchange.com is a rather small community for Linux and Unix-like operating systems. The community has 356498 registered users of which 4565 were active between December 2019 and February 2020. Members asked 174625 questions in total and gave 256007 answers with an average answer density of 1.466 answers per question. -New users asked 62437 questions with an average of 1.124 questions per new user during their first week after registration. +New users asked 62437 questions with an average of 1.124 questions per new user during their first week after first contribution. \begin{figure}[H] \begin{subfigure}[c]{0.5\textwidth} diff --git a/text/5_results.tex b/text/5_results.tex index 5552203..1e70a9e 100644 --- a/text/5_results.tex +++ b/text/5_results.tex @@ -1,6 +1,7 @@ \chapter{Results} %TODO some text here +This section shows the results of the experiments described in section 3 on the data sets described in section 4. In the following diagrams, the blue line states the average sentiment of the answers to questions from new contributors. 
This line also has numbers attached to it at every datapoint and shows the number of answers that formed the sentiment average. The orange line shows ITS analysis as a 3-segment line. @@ -8,36 +9,40 @@ - -% pvalues ... +% pvalues ... %TODO write some text to each result \section{StackOverflow.com} \begin{figure}[H] -% \centering\includegraphics[scale=0.47]{../stackoverflow.com/output/its/average_sentiments-i1.png} + \centering\includegraphics[scale=0.47]{../stackoverflow.com/output/its/average_sentiments-i1.png} \caption{An interrupted time series analysis of the sentiments of answer to questions created by new contributors on StackOverflow.com} \label{stackoverflow_its} \end{figure} +StackOverflow shows a very slight decrease in average sentiment over time before the change was introduced. When the change occurred, the average sentiment jumped up by about 0.003. After the change, the sentiments reached higher levels and kept rising. +% sentiment falling prior to change +% jump upward at the change +% sentiments rising after change \section{math.stackexchange.com} -\begin{figure}[H] +The math.stackexchange.com community shows a decrease in average sentiments prior to the change. 
The sentiment makes a small jump upward when the change is introduced; however, the sentiments decrease faster after the introduction of the change than before it.\begin{figure}[H] \centering\includegraphics[scale=0.47]{../math.stackexchange.com/output/its/average_sentiments-i1.png} \caption{An interrupted time series analysis of the sentiments of answer to questions created by new contributors on math.stackexchange.com} \label{math_its} \end{figure} -% noticable crash in sentiments -% its not required in this case, change in sentiment so obvious -%TODO maybe investigate if this is an error +% sentiments falling prior to the change +% sentiments falling faster than before the change \section{MathOverflow.net} +MathOverflow shows a constant regression before the change; however, average sentiments were low about 10 months before the change and spiked directly before it. When the change is introduced, the regression makes a small jump up and decreases thereafter. This data set is sparse compared to the other data sets. \begin{figure}[H] \centering\includegraphics[scale=0.47]{../mathoverflow.net/output/its/average_sentiments-i1.png} \caption{An interrupted time series analysis of the sentiments of answer to questions created by new contributors on MathOverflow.com} \label{matho_its} \end{figure} -% same as previous example, big crash -% avg sentiment was higher to beginning but snetiments dropped even more compared to previous +% sentiments stable/constant prior to the change +% falling after the change \section{AskUbuntu.com} +AskUbuntu saw a decrease in average sentiments prior to the change. After the introduction of the change, the regression dipped, but sentiments have risen drastically since then. 
\begin{figure}[H] \centering\includegraphics[scale=0.47]{../askubuntu.com/output/its/average_sentiments-i1.png} \caption{An interrupted time series analysis of the sentiments of answer to questions created by new contributors on AskUbuntu.com} @@ -48,6 +53,7 @@ %maybe: sentiments did not change drastically as seen in maths communities \section{ServerFault.com} +ServerFault shows gradually rising average sentiments prior to the change. At the time of the change, the regression makes a jump upward, and the average sentiment decreases slowly afterward. \begin{figure}[H] \centering\includegraphics[scale=0.47]{../serverfault.com/output/its/average_sentiments-i1.png} \caption{An interrupted time series analysis of the sentiments of answer to questions created by new contributors on ServerFault.com} @@ -57,6 +63,7 @@ % small jump in avg sentiments at change date \section{SuperUser.com} +SuperUser shows only slightly decreasing average sentiment up to the change. At the time of the change, the regression dips and then shows a downward trend. Indeed, the average sentiment dipped considerably when the change was introduced and recovered about 13 months later. Data available in the future will show whether the recovery is persistent. \begin{figure}[H] \centering\includegraphics[scale=0.47]{../superuser.com/output/its/average_sentiments-i1.png} \caption{An interrupted time series analysis of the sentiments of answer to questions created by new contributors on SuperUser.com} @@ -67,6 +74,7 @@ % recovery after after 13 months to not quite the previous levels \section{electronics.stackexchange.com} +On electronics.stackexchange.com the average sentiment decreases continuously prior to the change. At the change date the regression makes a little jump upward but the trend from before the change continues afterward. 
Similarly to SuperUser, the average sentiment recovers about 12 months after the change is introduced, and future data will be necessary to determine if the recovery is persistent. \begin{figure}[H] \centering\includegraphics[scale=0.47]{../electronics.stackexchange.com/output/its/average_sentiments-i1.png} \caption{An interrupted time series analysis of the sentiments of answer to questions created by new contributors on electronics.stackexchange.com} @@ -77,17 +85,18 @@ % more data in the future will be required to determine if upward trend in the end continues \section{stats.stackexchange.com} +On stats.stackexchange.com the average sentiment is steadily decreasing prior to the change. The regression dips when the change is introduced. However, the average sentiment after the change indicates a slight upward trend. \begin{figure}[H] \centering\includegraphics[scale=0.47]{../stats.stackexchange.com/output/its/average_sentiments-i1.png} \caption{An interrupted time series analysis of the sentiments of answer to questions created by new contributors on stats.stackexchange.com} \label{stats_its} \end{figure} % sentiments steadily decreasing prior to the change -% huge dip in avg sentiment after the change -% downward trend continues after the change at about the same rate -% TODO same error as in math sites? +% dip in avg sentiment at the change date +% slight upward trend after the change \section{tex.stackexchange.com} +On tex.stackexchange.com the average sentiment is low compared to the other investigated data sets. Prior to the change the average sentiment only slightly decreases. When the change is introduced the regression takes a dip down. After the change the analysis indicates a strong increase in average sentiment. Future data will be required to see if this upward trend continues or evens out. 
\begin{figure}[H] \centering\includegraphics[scale=0.47]{../tex.stackexchange.com/output/its/average_sentiments-i1.png} \caption{An interrupted time series analysis of the sentiments of answer to questions created by new contributors on tex.stackexchange.com} @@ -99,6 +108,7 @@ % trend after change strongly upward \section{unix.stackexchange.com} +On unix.stackexchange.com the average sentiment is decreasing prior to the change. When the change is introduced the regression takes a small dip; however, the average sentiment increases quickly after the change. \begin{figure}[H] \centering\includegraphics[scale=0.47]{../unix.stackexchange.com/output/its/average_sentiments-i1.png} \caption{An interrupted time series analysis of the sentiments of answer to questions created by new contributors on unix.stackexchange.com} diff --git a/text/6_discussion.tex b/text/6_discussion.tex index bcc257c..49a71a3 100644 --- a/text/6_discussion.tex +++ b/text/6_discussion.tex @@ -5,6 +5,9 @@ % interesting single results? maybe strong dip in maths sites % did change from SE produce the desired results? +% as expected #answers per month vary greatly +% some communities have a high average sentiment compared to others + %future research % investigate different change pattern and why they occured