This commit is contained in:
wea_ondara
2022-09-04 10:27:29 +02:00
parent a5e60a7335
commit 5154ed89e9
4 changed files with 38 additions and 4 deletions

View File

@@ -25,7 +25,7 @@ If these criteria improve after the change is introduced, the community is affec
%only when new contributor indicator is shown
\section{Vader}
To measure the effect of the change on sentiment, this thesis utilizes the Vader\cite{hutto2014vader} sentiment analysis tool. This decision is based on its performance in analyzing and categorizing microblog-like texts, its processing speed, and its simplicity of use. Vader uses a lexicon of words and rules related to grammar and syntax. This lexicon was manually created by \citeauthor{hutto2014vader} and is therefore considered a \emph{gold standard lexicon}. Each word has a sentiment value attached to it. Negative words, for instance, \emph{evil}, have negative values; positive words, for instance, \emph{brave}, have positive values. The range of these values is continuous, so words can have different intensities; for instance, \emph{bad} has a higher value than \emph{evil}. This feature of intensity distinction makes Vader a valence-based approach.
However, simply looking at the words in a text is not enough; Vader therefore also uses rules to determine how words are used in conjunction with other words. Some words can boost other words. For example, ``They did well.'' is less intense than ``They did extremely well.'' This works for both positive and negative sentences. Moreover, words can have different meanings depending on the context, for instance, ``Fire provides warmth.'' and ``Boss is about to fire an employee.'' This feature is called \emph{Word Sense Disambiguation}.
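The boosting rule can be illustrated with a short worked sketch. It assumes the constants of the reference Vader implementation (a booster increment of $0.293$ and a normalization constant $\alpha = 15$); $v$ is only a placeholder for the lexicon value of \emph{well}, not the actual lexicon entry.
\[
s_{\mathrm{plain}} = v, \qquad s_{\mathrm{boosted}} = v + 0.293, \qquad \mathrm{compound}(s) = \frac{s}{\sqrt{s^{2} + \alpha}}
\]
Because $\mathrm{compound}(s)$ is strictly increasing in $s$ and bounded by $-1$ and $1$, the boosted sentence receives the higher score for any positive $v$. For a negative word the increment is subtracted instead, so the boosted sentence becomes more negative.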

View File

@@ -13,6 +13,9 @@ represent the ITS of the blue and orange line respectively. In these diagrams no
\pagebreak
\section{StackOverflow.com}
StackOverflow shows a very slight decrease in the average sentiment over time before the change is introduced. When the change occurs, the average sentiment jumps up. After the change, the sentiments reach higher levels and keep rising. The average vote score rises right before and stays fairly constant after the change. This indicates that the vote score is not affected by the change. However, the number of questions from new contributors increases after the change, while before the change it is fairly constant. The number of follow-up questions from new contributors declines before the change and rises after the change.
The sentiments improve after the change compared to before the change, indicating the change has a positive effect. The trend of the vote score is not affected at all. Although the change raises the base level of the vote score, the trend is the same after the change, indicating the change did not bring a long-term effect. The number of 1st questions improved after the change, turning the stagnant trend into an increasing trend. The follow-up questions also improved in the same manner. This shows that new contributors ask more questions than before. Summarizing, the sentiment improves, the vote score is largely unaffected, and the number of questions improves, suggesting that the community benefits from the change.
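The terms \emph{jump}, \emph{dip}, and \emph{trend} used in this chapter can be read against the generic segmented-regression form of an ITS model. The following is only an assumed textbook form for orientation; the symbols $\beta_i$, $D_t$, and $t_0$ are introduced here for illustration and the exact specification used for the figures may differ.
\[
y_t = \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3 (t - t_0) D_t + \varepsilon_t,
\qquad
D_t =
\begin{cases}
0 & \text{if } t < t_0 \\
1 & \text{if } t \geq t_0
\end{cases}
\]
Here $y_t$ is the measured value (for instance, the average sentiment) in month $t$ and $t_0$ is the change date. Under this form, $\beta_1$ is the trend before the change, $\beta_2$ is the jump or dip at the change date, and $\beta_1 + \beta_3$ is the trend after the change.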
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../stackoverflow.com/output/its/average_sentiments-i1.png}
@@ -31,6 +34,7 @@ StackOverflow shows a very slight decrease in the average sentiment of time befo
\label{stackoverflow_questionsits}
\end{subfigure}
\end{center}
\caption{Interrupted time series analysis on StackOverflow.com}
\end{figure}
\pagebreak
% sentiment falling prior to change
@@ -38,7 +42,9 @@ StackOverflow shows a very slight decrease in the average sentiment of time befo
% sentiments rising after change
\section{AskUbuntu.com}
AskUbuntu sees a decrease in average sentiments prior to the change. After the introduction of the change, the regression dips but sentiments keep rising drastically since then. The vote score has a huge range of values prior to and after the change, however, the graph indicates the vote score declines after the change. The number of 1st questions slightly decreases prior to the change and starts rising after the change.
AskUbuntu sees a decrease in average sentiments prior to the change. After the introduction of the change, the regression dips but sentiments keep rising drastically since then. The vote score has a huge range of values prior to and after the change, however, the graph indicates the vote score declines after the change. The number of 1st questions slightly decreases prior to the change and starts rising after the change. The number of followup questions stays largely the same after the change.
The sentiments improve after the change compared to before the change, indicating the change has a positive effect. The vote score changes from a fairly stable trend and takes a turn downwards after the change. In contrast, the number of questions asked by new users improves after the change. The trend of the number of 1st questions turns from decreasing to increasing after the change, and the follow-up questions stabilize from a slightly decreasing trend. Summarizing, the sentiment does improve after the change, as does the number of questions asked by new users. The vote score does seem to be affected negatively. In general, the results indicate that the community benefits from the change.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../askubuntu.com/output/its/average_sentiments-i1.png}
@@ -57,6 +63,7 @@ AskUbuntu sees a decrease in average sentiments prior to the change. After the i
\label{ubuntu_questionsits}
\end{subfigure}
\end{center}
\caption{Interrupted time series analysis on AskUbuntu.com}
\end{figure}
\pagebreak
% sentiments have gradually fallen prior to the change
@@ -65,6 +72,8 @@ AskUbuntu sees a decrease in average sentiments prior to the change. After the i
\section{ServerFault.com}
ServerFault shows gradually rising average sentiments prior to the change. At the time of the change, the regression makes a jump upward and the average sentiment decreases slowly afterward. The vote score falls prior to the change, makes a huge jump upward, and quickly returns to the levels just prior to the change. The number of 1st questions, however, sees a drastic change. Prior to the change, the number of 1st questions decreases steadily, while after the change the numbers increase at the same pace as they fall prior to the change. The number of follow-up questions follows the same course, falling prior to and rising after the change.
The sentiment stays largely the same before and after the change. Even though it is slowly rising at first and falling after the change, the small jump in sentiment at the change date keeps the overall sentiment value fairly stable. The vote score does not really improve after the change. Although the vote score makes a huge leap upward, it quickly returns to the values before the change. Despite sentiment and vote score not being affected in the long run, the number of 1st questions improved dramatically. The downward trend reversed into an upward trend with roughly the same grade. The follow-up questions show the same trends, albeit not as drastic. Summarizing, even though the sentiment and vote score are not really affected, the turn in the number of 1st questions and follow-up questions indicates that the change positively affected the community.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../serverfault.com/output/its/average_sentiments-i1.png}
@@ -83,6 +92,7 @@ ServerFault shows gradually rising average sentiments prior to the change. At th
\label{fault_questionsits}
\end{subfigure}
\end{center}
\caption{Interrupted time series analysis on ServerFault.com}
\end{figure}
\pagebreak
% sentiments fairly stable before and after the change
@@ -90,6 +100,8 @@ ServerFault shows gradually rising average sentiments prior to the change. At th
\section{stats.stackexchange.com}
On stats.stackexchange.com the average sentiment decreases steadily prior to the change. The regression dips when the change is introduced. However, the average sentiment after the change indicates a slight upward trend. The vote score also decreases prior to the change but does not recover afterward. However, the number of 1st questions and follow-up questions rise prior to the change and increase even faster after the change.
The sentiment trend improves after the change: the sentiment is rising after the change, whereas it was falling before. Even though the sentiment is on a lower level after the change, the trend after the change already outperforms the trend before the change after 10 to 15 months. The vote score is not really affected by the change. However, 4 to 5 months after the change the vote score falls into a valley for about 10 months before recovering. This can be the result of another outside factor. Looking at the number of 1st questions, the vote score may have dipped because the number of 1st questions spiked during the previously stated time frame. This theory would be supported by \cite{lin2017better}. While the trends for 1st and follow-up questions are stagnant before the change, they improve after the change. Summarizing, the sentiment improves after the change, the vote score is not affected, and the number of 1st and follow-up questions improves, indicating the community benefits from the change.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../stats.stackexchange.com/output/its/average_sentiments-i1.png}
@@ -108,6 +120,7 @@ On stats.stackexchange.com the average sentiment decreases steadily prior to the
\label{stats_questionsits}
\end{subfigure}
\end{center}
\caption{Interrupted time series analysis on stats.stackexchange.com}
\end{figure}
\pagebreak
% sentiments steadily decreasing prior to the change
@@ -116,6 +129,9 @@ On stats.stackexchange.com the average sentiment decreases steadily prior to the
\section{tex.stackexchange.com}
On tex.stackexchange.com the average sentiment is low compared to the other investigated data sets. Prior to the change the average sentiment only slightly decreases. When the change is introduced the regression takes a dip down and after the change, the average sentiment increases drastically. Future data will be required to see if this upward trend continues or evens out. In stark contrast, the vote score shows a downward trend, although there is a short window around the change date where vote scores are higher compared to before and after the change. The number of 1st questions has a downward trend before the change and an upward trend afterward. The downward trend of the number of follow-up questions is uninterrupted by the change.
The sentiments improve after the change compared to the stagnant trend before the change, indicating the change has a positive effect. The trend of the vote score is not affected at all. Although the vote score is high around the change date, this is not a result of the change but a coincidence, as the vote score increases several months before the change actually occurs. The vote score is on a continuous downward trend with a peak around the change date. This indicates the change did not affect the vote score. The number of 1st questions improved after the change, turning the downward trend into an upward trend with the same grade. The follow-up questions do not see an improvement. This shows that more new contributors ask their 1st question than before; however, they still tend to become one-day-flies. Summarizing, the sentiment improves, the vote score is largely unaffected, and the number of 1st questions does improve, suggesting that the community benefits from the change.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../tex.stackexchange.com/output/its/average_sentiments-i1.png}
@@ -134,6 +150,7 @@ On tex.stackexchange.com the average sentiment is low compared to the other inve
\label{tex_questionsits}
\end{subfigure}
\end{center}
\caption{Interrupted time series analysis on tex.stackexchange.com}
\end{figure}
\pagebreak
% avg sentiment fairly low compared to the other investigated communities
@@ -143,6 +160,8 @@ On tex.stackexchange.com the average sentiment is low compared to the other inve
\section{unix.stackexchange.com}
On unix.stackexchange.com the average sentiment decreases prior to the change. When the change is introduced the regression takes a small dip down, however, the average sentiment increases fast after the change. The vote score shows a continuous downward trend and the number of 1st and follow-up questions fall slightly prior to the change and increase afterward.
The sentiments improve after the change compared to before the change, indicating the change has a positive effect. The trend of the vote score is not affected at all; the downward trend is almost continuous, indicating the change does not affect the vote score. The number of 1st questions improved after the change, turning the stagnant trend into an increasing trend. The follow-up questions also improved in a similar manner. This shows that new contributors ask more questions than before. Summarizing, the sentiment improves, the vote score is largely unaffected, and the number of questions improves, suggesting that the community benefits from the change.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../unix.stackexchange.com/output/its/average_sentiments-i1.png}
@@ -161,6 +180,7 @@ On unix.stackexchange.com the average sentiment decreases prior to the change. W
\label{unix_questionsits}
\end{subfigure}
\end{center}
\caption{Interrupted time series analysis on unix.stackexchange.com}
\end{figure}
\pagebreak
% sentiments decreasing prior to the change
@@ -181,6 +201,8 @@ More than half of the communities show benefits from the change. The number of f
\section{math.stackexchange.com}
The math.stackexchange.com community shows a decrease in average sentiments, vote score, and the number of questions prior to the change. The measurements make a small jump upward when the change is introduced, however, they continue their downward trend after the introduction of the change. Only the number of follow-up questions stabilizes and begins to increase after the change.
The sentiment trend does not improve in the long term. Even though the sentiment jumps up a bit at the change date, the decreasing trend is reinforced. Similarly, the vote score does not improve either and keeps decreasing after the change. In contrast, the number of questions asked by new contributors does improve. The number of 1st questions seems to stabilize, and the number of follow-up questions even reverses its trend and starts increasing after the change. Summarizing, the sentiment and vote score do not seem to be affected; however, the number of questions from new contributors tends to improve. This shows users seem to be more willing to interact with the community, even though the sentiment of the interactions still decreases. The change does not indicate a clear improvement according to its goal.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../math.stackexchange.com/output/its/average_sentiments-i1.png}
@@ -199,6 +221,7 @@ The math.stackexchange.com community shows a decrease in average sentiments, vot
\label{math_questionsits}
\end{subfigure}
\end{center}
\caption{Interrupted time series analysis on math.stackexchange.com}
\end{figure}
\pagebreak
% sentiments falling prior to the change
@@ -206,6 +229,8 @@ The math.stackexchange.com community shows a decrease in average sentiments, vot
\section{MathOverflow.net}
MathOverflow shows a constant regression before the change; however, average sentiments are low at about 10 months before the change and spike high directly before the change. When the change is introduced, the regression makes a small jump up and decreases thereafter. The vote score steadily increases prior to the change and then quickly returns to the level from 3 years before the change. The number of 1st questions falls prior to the change and stabilizes thereafter. This data set is sparse compared to the other data sets. Also, the vote scores are high compared to other data sets.
The sentiment trend does not improve in the long term and even changes from a constant to a decreasing trend after the change. The vote score does not improve either and changes from a rising to a sharply falling trend. In contrast, the number of questions asked by new contributors does improve. The number of 1st questions stabilizes. However, the number of follow-up questions starts decreasing after the change. Summarizing, the sentiment, vote score, and number of follow-up questions are affected negatively. In contrast, the trend of the number of 1st questions from new contributors stabilizes. The change does not indicate a clear improvement according to its goal.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../mathoverflow.net/output/its/average_sentiments-i1.png}
@@ -224,6 +249,7 @@ MathOverflow shows a constant regression before the change, however, average sen
\label{matho_questionsits}
\end{subfigure}
\end{center}
\caption{Interrupted time series analysis on MathOverflow.net}
\end{figure}
\pagebreak
% sentiments stable/constant prior to the change
@@ -231,6 +257,8 @@ MathOverflow shows a constant regression before the change, however, average sen
\section{electronics.stackexchange.com}
On electronics.stackexchange.com the average sentiment and votes decrease continuously prior to the change. At the change date, the regression makes a little jump upward but the trend from before the change continues afterward. Similarly to SuperUser, the average sentiment recovers at about 12 months after the change is introduced and future data will be necessary to determine if the recovery is persistent. The number of 1st questions rises continuously prior to the change and decreases thereafter. The number of follow-up questions falls slightly prior to the change and stabilizes afterward.
The sentiment trend does not improve in the long term. Even though the sentiment jumps up a bit at the change date, the same decreasing trend still continues. The vote score trend does not improve either and keeps decreasing after the change; however, the vote score does make a big leap upwards at the change. The number of 1st questions asked by new contributors decreases. The number of follow-up questions seems to stabilize. Summarizing, the sentiment does not seem to be affected. The vote score continues its downward trend, although on a higher level than before. The trend of the number of questions from new contributors does not show real improvements. The change does not indicate a clear improvement according to its goal.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../electronics.stackexchange.com/output/its/average_sentiments-i1.png}
@@ -249,6 +277,7 @@ On electronics.stackexchange.com the average sentiment and votes decrease contin
\label{ele_questionsits}
\end{subfigure}
\end{center}
\caption{Interrupted time series analysis on electronics.stackexchange.com}
\end{figure}
\pagebreak
% sentiments were falling continuously before and after the change
@@ -257,6 +286,9 @@ On electronics.stackexchange.com the average sentiment and votes decrease contin
\section{SuperUser.com}
SuperUser shows only slightly decreasing average sentiment and vote score up to the change. At the change date the regressions take a dip down and show a downward trend after the change. Indeed, the average sentiments and vote score dip considerably after the change is introduced. The average sentiment recovers about 13 months later, while the vote score does not recover as well. The number of 1st questions decreases prior to the change and then rises sharply, indicating a huge wave of new users. This drastic influx of new users may explain the crash of the average sentiment and vote score that occurs at the same time. Data available in the future will show if the recovery is persistent.
The sentiment and vote score analyses show a huge dip starting 4 months after the change is introduced. In the same time frame, the number of 1st questions skyrockets to more than double the previous levels. This is similar to the feature found in the results from stats.stackexchange.com, although this example is much more pronounced. This feature also seems to be produced by the huge influx of new users to the community. As described in \cite{lin2017better}, the quality of interactions in the community dips for a while but recovers over time. The sentiment recovers after about 13 months. The vote score also starts to recover at the same time, however, not as quickly as the sentiment value. Even though a lot of new users joined the community, the number of follow-up questions stayed largely the same. Summarizing, the sentiment and vote score analyses do not yield a meaningful result, as the time frame after the change includes an outside factor with a huge impact. The number of follow-up questions does not seem to increase despite the number of 1st questions doubling, indicating that a lot of the new users are one-day-flies%TODO ref?
. The results of this analysis are inconclusive.
\begin{figure}[H]
\begin{subfigure}[t]{0.5\textwidth}
\includegraphics[scale=0.37]{../superuser.com/output/its/average_sentiments-i1.png}
@@ -275,6 +307,7 @@ SuperUser shows only sightly decreasing average sentiment and vote score up to t
\label{super_questionsits}
\end{subfigure}
\end{center}
\caption{Interrupted time series analysis on SuperUser.com}
\end{figure}
\pagebreak
% sentiments fairly stable until the change date

View File

@@ -4,7 +4,8 @@ The ITS analysis of the investigated communities shows mixed results. Some commu
Besides StackOverflow, 5 other communities seem to profit from the change: AskUbuntu, ServerFault, stats.stackexchange.com, tex.stackexchange.com, and unix.stackexchange.com. AskUbuntu shows an interesting zig-zag pattern in the average sentiment graph. Also, the average sentiment falls before the change and rises thereafter, indicating that the change works for this community. However, further data is needed to see if the zig-zag pattern repeats itself. The number of 1st questions starts increasing again after the change, stopping the downward trend before that.
On stats.stackexchange.com the average sentiment falls before the change but since the change, the downward trend stops and the sentiment starts to rise slowly, suggesting the change has a positive effect on the community. This is supported by the increase in the number of 1st and followup questions by new contributors. The vote score takes a dip after the change but starts to recover after 12 months which could be the result of another change.
On stats.stackexchange.com the average sentiment falls before the change but since the change, the downward trend stops and the sentiment starts to rise slowly, suggesting the change has a positive effect on the community. This is supported by the increase in the number of 1st and followup questions by new contributors. The vote score takes a dip after the change but starts to recover after 12 months which could be the result of another factor.
%TODO see text in results, vote score and 1st question same timeframe
In the tex.stackexchange.com community, sentiments are stable before the change and show a stark rising pattern after the change. The change seems to work for this community, but future data will be necessary to see if the rising pattern continues in the shown manner. The vote score ITS does not fit the model, and values before and after the change indicate a linear downward trend. However, the number of 1st questions increases slightly after the change, while the prior trend shows a decreasing development.