wea_ondara
2020-05-02 11:52:30 +02:00
parent a02705666e
commit 9d195f0f68
7 changed files with 52 additions and 26 deletions


@@ -14,7 +14,7 @@ StackExchange introduced a \emph{new contributor} indicator to all communities o
% new user indicator visible for 1 week ...
%TODO more vader explanation
To measure the effectiveness of the change we chose Vader, a sentiment analysis tool designed for social media interactions \cite{hutto2014vader}. Vader uses a lexicon of words with attached sentiment values and rules related to grammar and syntax to determine a sentiment value between -1 and 1 to a given piece of text. The sentiment range is divided into 3 classes: negative (-1 to -0.05), neutral (-0.05 to 0.05), and positive (0.05 to 1). The outer edges of the value space are rarely reached as the text would have to be negative or positive to the extremes which is very unlikely.
To measure the effectiveness of the change, this thesis uses Vader, a sentiment analysis tool with exceptional performance in analysing and categorizing microblog-like texts as well as good generalization to other domains \cite{hutto2014vader}. Vader was chosen for its speed and simplicity. It uses a lexicon of words with attached sentiment values, together with rules related to grammar and syntax, to assign a sentiment value between $-1$ and $1$ to a given piece of text. The sentiment range is divided into three classes: negative ($-1$ to $-0.05$), neutral ($-0.05$ to $0.05$), and positive ($0.05$ to $1$). The outer edges of the value space are rarely reached, as the text would have to be extremely negative or positive, which is very unlikely. This lexicon- and rule-based design allows fast and verifiable analysis.
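For illustration, the three classes can be written as a piecewise function of the sentiment value $s$ that Vader returns. The symbol $s$ and the assignment of the boundary values $\pm 0.05$ to the outer classes are assumptions made here: the ranges above leave the boundaries open, and this sketch follows the convention documented for the Vader reference implementation.
\[
\mathrm{class}(s) = \left\{
\begin{array}{ll}
\mbox{negative} & \mbox{if } s \leq -0.05, \\
\mbox{neutral} & \mbox{if } -0.05 < s < 0.05, \\
\mbox{positive} & \mbox{if } s \geq 0.05.
\end{array}
\right.
\]
Under this reading, a text with $s = 0.03$ counts as neutral, while a text with $s = 0.05$ already counts as positive.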
% sentiment calculation via vaderlib, write whole paragraph and explain, also add ref to paper \cite{hutto2014vader}
@@ -44,6 +44,14 @@ After preprocessing the raw data, relevant data is filtered and computed. Questi
\section{Analysis}
An interrupted time series (ITS) analysis captures trends before and after a change in a system and fits very well with the question this thesis investigates. ITS can be applied to a large variety of data, provided the data contains the same kind of data points before and after the change and the date and time of the change are known. \citeauthor{bernal2017interrupted} published a paper on how ITS works \cite{bernal2017interrupted}. ITS works well on medical data: for instance, when a new treatment is introduced, ITS can show whether the treatment improves a condition. ITS requires no control group, which is often not feasible anyway. It needs only the data before and after the change and the date on which the change was introduced.
ITS relies on linear regression and tries to fit a three-segment linear function to the data. The authors also describe cases where more than three segments are used, but these models quickly raise the complexity of the analysis, and for this thesis a three-segment linear regression is sufficient. The three segments are two lines that fit the data before and after the change, and one line that connects the other two at the change date. Figure~\ref{itsexample} shows an example of an ITS. Each segment is captured by a term of the following formula: $Y_t = \beta_0 + \beta_1T + \beta_2X_t + \beta_3TX_t$, where $T$ represents time as a number, for instance, the number of months since the start of data recording, $X_t$ is 0 or 1 depending on whether the change is in effect, $\beta_0$ represents the value at $T = 0$, $\beta_1$ represents the slope before the change, $\beta_2$ represents the change in level when the change is introduced, and $\beta_3$ represents the change in slope after the change, so that the slope after the change is $\beta_1 + \beta_3$. Contrary to the method in \cite{bernal2017interrupted}, where the ITS is performed on values aggregated per month, this thesis performs the ITS on single data points, because the premise that all aggregated values carry roughly the same weight is not fulfilled. Performing the ITS on aggregated values would give months with few data points the same influence as months with many, and so skew the linear regression towards data points with less weight. Fitting single data points prevents this, as a month with more data points automatically contributes more to the fit.
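As a sketch of how the formula yields the segments, assume it is read literally with $X_t = 0$ before and $X_t = 1$ after the change. Substituting both values gives
\[
Y_t = \left\{
\begin{array}{ll}
\beta_0 + \beta_1 T & \mbox{if } X_t = 0, \\
(\beta_0 + \beta_2) + (\beta_1 + \beta_3) T & \mbox{if } X_t = 1.
\end{array}
\right.
\]
The slope therefore changes from $\beta_1$ to $\beta_1 + \beta_3$. The jump at a change time $T_c$ (a symbol introduced here for illustration only) is $\beta_2 + \beta_3 T_c$, which reduces to $\beta_2$ if time is counted from the change date.

The effect of fitting single data points can be made explicit under one further assumption, namely that all $n_m$ data points $y_{m,i}$ of a month $m$ share the same time value $T_m$. This notation is not part of the method description above. With $\hat{Y}_m$ denoting the model value at $T_m$ and $\bar{y}_m$ the mean of the data points in month $m$, the sum of squared errors decomposes as
\[
\sum_m \sum_{i=1}^{n_m} \left(y_{m,i} - \hat{Y}_m\right)^2
= \sum_m n_m \left(\bar{y}_m - \hat{Y}_m\right)^2
+ \sum_m \sum_{i=1}^{n_m} \left(y_{m,i} - \bar{y}_m\right)^2 .
\]
The last term does not depend on the coefficients $\beta_0, \dots, \beta_3$. Under this assumption, fitting single data points is therefore equivalent to fitting the monthly means with each month weighted by its number of data points $n_m$, whereas an unweighted fit on the monthly means sets all weights to 1.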
%TODO include ITS example img
\begin{figure}
\centering\includegraphics[scale=0.7]{figures/itsexample}
\caption{An example that visualizes how ITS works. The change of the system occurs at month 0. The blue line shows the average sentiment of fictional answers grouped by month. The numbers attached to the blue line show the number of sentiment values for a given month. The yellow line represents the ITS analysis as a three-segment line.}
\label{itsexample}
\end{figure}
%interrupted time series
% ref tutorial paper \cite{bernal2017interrupted}