diff --git a/text/3_method.tex b/text/3_method.tex
index 4b5bb43..af00199 100644
--- a/text/3_method.tex
+++ b/text/3_method.tex
@@ -34,7 +34,7 @@ Furthermore, Vader also detects language features commonly found in social media
 After all these features are considered, Vader assigns a sentiment value between -1 and 1 on a continuous range. The sentiment range is divided into 3 classes: negative (-1 to -0.05), neutral (-0.05 to 0.05), and positive (0.05 to 1). The outer edges of this range are rarely reached as the text would have to be extremely negative or positive which is very unlikely.
 
 %speed
-Due to this mathematical simplicity, Vader is really fast when computing a sentiment value for a given text. This feature is one of the requirements \citeauthor{hutto2014vader} originally posed. They proposed that Vader shall be fast enough to do online (real-time) analysis of social media text.
+Due to this mathematical simplicity, Vader is really fast when computing a sentiment value for a given text. This feature is one of the requirements \citeauthor{hutto2014vader} originally posed. They proposed that Vader shall be fast enough to do online (real-time) analysis of social media text.
 
 %simplicy
 Vader is also easy to use. It does not require any pre-training on a dataset as it already has a human-curated lexicon and rules related to grammar and syntax. Therefore the sentiment analysis only requires an input to evaluate. This thesis uses a publicly available implementation of Vader.\footnote{\url{https://github.com/cjhutto/vaderSentiment}} The design of Vader allows fast and verifiable analysis.
@@ -101,7 +101,7 @@ This algorihm generates an ITS where the line before the change is on a lower le
 \centering\includegraphics[scale=0.7]{figures/itsexample}
 \caption{An example that visualizes how ITS works. The change of the system occurs at month 0. The blue line shows the average sentiment of fictional answers grouped by month. The numbers attached to the blue line show the number of sentiment values for a given month. The yellow line represents the ITS analysis as a three-segment line. This example shows the expected behavior of the data sets in the following sections.}
 \label{itsexample}
-\end{figure}\label{itsexample}
+\end{figure}
 
 %interrupted time series
 
diff --git a/text/5_results.tex b/text/5_results.tex
index 44141b5..0269cc0 100644
--- a/text/5_results.tex
+++ b/text/5_results.tex
@@ -1,6 +1,6 @@
 \chapter{Results}
 
-This section shows the results of the experiments described in section 3 on the data sets described in section 4. In the following pages, there 3 diagrams for each community.
+This section shows the results of the experiments described in section 3 on the data sets described in section 4. In the following pages, there 3 diagrams for each community.
 
 The diagrams capture 3 different aspects: the sentiment of answers, the vote score of questions, and number of questions. These aspects are all measured with regard to questions from new users. In diagrams (a), the blue line states the average sentiment (\emph{average sentiment} in diagram legend) of the answers to questions from new contributors. Also, the numbers attached to the blue line indicate number of answers to questions from new users that formed the average sentiment. The orange line (\emph{sm single ITS} in the diagram legend) represents the ITS over the whole period of the avaiable data. As stated in section 3.2, data density variabilty is a factor to take into account, therefore, the orange line represents the weighted ITS. The green, red, purple, and brown lines also represent ITS, however the time period considered for ITS before and after the change are limited to 6, 9, 12, and 15 months respectively.
diff --git a/text/6_discussion.tex b/text/6_discussion.tex
index 701ccfd..cf2a8ad 100644
--- a/text/6_discussion.tex
+++ b/text/6_discussion.tex
@@ -1,9 +1,25 @@
 \chapter{Discussion}
 
-The ITS analysis of the investigated communities shows mixed results. Some communities show an increase in sentiment while others are not affected at all or show a decrease in sentiment. The StackOverflow community has a fairly stable average sentiment before the change. The average sentiment jumps to a higher level and keeps rising after the change is introduced. Furthermore, the number of 1st questions from new contributors starts rising drastically after the change while prior levels stagnate. Also, the follow-up questions start increasing slightly. The votes score trend takes a new direction 9 months before the change and is unrelated to it. The change has a positive effect on the StackOverflow community. Beside StackOverflow, 5 other communities seem to profit from the change: AskUbuntu, ServerFault, stats.stackexchange.com, tex.stackexchange.com, and unix.stackexchange.com. AskUbuntu shows an interesting zig-zag pattern in the average sentiment graph. Also, the average sentiment falls before the change and raises thereafter, indicating that the change works for this community. However, further data is needed to see if the zig-zag pattern repeats itself. The number of 1st questions starts increasing again after the change stopping the downward trend before that. On stats.stackexchange.com the average sentiment falls before the change but since the change, the downward trend stops and the sentiment starts to rise slowly, suggesting the change has a positive effect on the community. This is supported by the increase in the number of 1st and followup questions by new contributors. The vote score takes a dip after the change but starts to recover after 12 months which could be the result of another change. In the tex.stackexchange.com community sentiments are stable before the change and show a stark rising pattern after the change. The change seems to work for this community but future data will be necessary to see if the rising pattern continues in the shown manner. The votes score ITS does not fit the model and values before and after the change indicate a linear downward trend. However, the number of 1st questions increases slightly after the change while the prior trend shows a decreasing development. unix.stackexchange.com also shows a decreasing pattern prior and a rising pattern after the change. The vote score analysis shows a fairly linear downward trend before and after the change and is not affected by it. However, the number of 1st questions by new contributors starts to drastically increase while before the change the levels are constant, indicating this community also profits from the change. On ServerFault the sentiment rises gradually before the change, jumps upward by a small value when the change is introduced and the sentiment falls slowly thereafter but the levels are pretty stable over the analyzed period. The vote scores show the change has a huge impact on the community. The previously decreasing trend jumps up by a large amount. However, the vote score rapidly returns to levels right before the change. Contrary, the number of first questions turns direction and starts increasing at the same rate it is falling previously.
+The ITS analysis of the investigated communities shows mixed results. Some communities show an increase in sentiment while others are not affected at all or show a decrease in sentiment. The StackOverflow community has a fairly stable average sentiment before the change. The average sentiment jumps to a higher level and keeps rising after the change is introduced. Furthermore, the number of 1st questions from new contributors starts rising drastically after the change while prior levels stagnate. Also, the follow-up questions start increasing slightly. The votes score trend takes a new direction 9 months before the change and is unrelated to it. The change has a positive effect on the StackOverflow community.
+
+Beside StackOverflow, 5 other communities seem to profit from the change: AskUbuntu, ServerFault, stats.stackexchange.com, tex.stackexchange.com, and unix.stackexchange.com. AskUbuntu shows an interesting zig-zag pattern in the average sentiment graph. Also, the average sentiment falls before the change and raises thereafter, indicating that the change works for this community. However, further data is needed to see if the zig-zag pattern repeats itself. The number of 1st questions starts increasing again after the change stopping the downward trend before that.
+
+On stats.stackexchange.com the average sentiment falls before the change but since the change, the downward trend stops and the sentiment starts to rise slowly, suggesting the change has a positive effect on the community. This is supported by the increase in the number of 1st and followup questions by new contributors. The vote score takes a dip after the change but starts to recover after 12 months which could be the result of another change.
+
+In the tex.stackexchange.com community sentiments are stable before the change and show a stark rising pattern after the change. The change seems to work for this community but future data will be necessary to see if the rising pattern continues in the shown manner. The votes score ITS does not fit the model and values before and after the change indicate a linear downward trend. However, the number of 1st questions increases slightly after the change while the prior trend shows a decreasing development.
+
+unix.stackexchange.com also shows a decreasing pattern prior and a rising pattern after the change. The vote score analysis shows a fairly linear downward trend before and after the change and is not affected by it. However, the number of 1st questions by new contributors starts to drastically increase while before the change the levels are constant, indicating this community also profits from the change.
+
+On ServerFault the sentiment rises gradually before the change, jumps upward by a small value when the change is introduced and the sentiment falls slowly thereafter but the levels are pretty stable over the analyzed period. The vote scores show the change has a huge impact on the community. The previously decreasing trend jumps up by a large amount. However, the vote score rapidly returns to levels right before the change. Contrary, the number of first questions turns direction and starts increasing at the same rate it is falling previously.
 %~
-
-
-The other communities do not seem to profit from the change so clearly. The average sentiment stays constant on MathOverflow before the change and decreases afterward. However, the sentiment levels start increasing six months before the change and are unrelated, indicating the sentiment values are not particularly affected by the change. The vote score is steadily increasing before the change and the crashes down shortly after the change. However, the vote score is very high compared to other communities. The number of 1st questions stabilizes after the change compared to the slight downward previously. math.stackexchange.com shows a downward trend before and after the change for sentiment and vote score. The sentiment ITS is particularly affected by the low sentiment values at the end and future data is required to determine if this trend continues. However, the number of 1st questions stabilizes a bit after changes and follow up questions even see and a slight increase after the change. The electronics.stackexchange.com community has a similar pattern for the sentiment value and vote scores compared to math.stackexchange.com. However, the sentiment values seem to recover after about 12 months and future data is required to see if the rise at the end of the period is a long term trend. The rising number of first questions of new contributors stops at the change date and transition into a decreasing pattern. SuperUser shows an odd pattern. The average sentiment values and votes scores are stable before the change and decrease dramatically shortly afterward. However, the sentiment recovers after 12 months. The ITS model chosen in this thesis is not able to capture the apparent pattern. However, the number of 1st question skyrockets indicating a huge influx of new users. The time frames of the falling sentiment values and vote scores, and the rising number of first questions overlap, indicating the huge influx of new users is responsible for the falling patterns.
+The other communities do not seem to profit from the change so clearly. The average sentiment stays constant on MathOverflow before the change and decreases afterward. However, the sentiment levels start increasing six months before the change and are unrelated, indicating the sentiment values are not particularly affected by the change. The vote score is steadily increasing before the change and the crashes down shortly after the change. However, the vote score is very high compared to other communities. The number of 1st questions stabilizes after the change compared to the slight downward previously.
+
+math.stackexchange.com shows a downward trend before and after the change for sentiment and vote score. The sentiment ITS is particularly affected by the low sentiment values at the end and future data is required to determine if this trend continues. However, the number of 1st questions stabilizes a bit after changes and follow up questions even see and a slight increase after the change.
+
+The electronics.stackexchange.com community has a similar pattern for the sentiment value and vote scores compared to math.stackexchange.com. However, the sentiment values seem to recover after about 12 months and future data is required to see if the rise at the end of the period is a long term trend. The rising number of first questions of new contributors stops at the change date and transition into a decreasing pattern.
+
+SuperUser shows an odd pattern. The average sentiment values and votes scores are stable before the change and decrease dramatically shortly afterward. However, the sentiment recovers after 12 months. The ITS model chosen in this thesis is not able to capture the apparent pattern. However, the number of 1st question skyrockets indicating a huge influx of new users. The time frames of the falling sentiment values and vote scores, and the rising number of first questions overlap, indicating the huge influx of new users is responsible for the falling patterns.
 % similarities in results and differences
 % so: only community that shows a clear improvement when comapred to prior to change sentiment
diff --git a/todo3.txt b/todo3.txt
index f830a7f..e4c389a 100644
--- a/todo3.txt
+++ b/todo3.txt
@@ -12,7 +12,7 @@ DONE > abschließend, große table mit stats die eh schon drin sind results captions ausbauen
-describe what the lines are thres
+DONE describe what the lines are thres
 describe the number
 describe everything
 describe single sm