wip
@@ -42,9 +42,9 @@ After preprocessing the raw data, relevant data is filtered and computed. Questi
\section{Analysis}
An interrupted time series (ITS) analysis captures trends before and after a change in a system and fits very well with the question this thesis investigates. ITS can be applied to a large variety of data if the data contains the same kind of data points before and after the change and when the change date and time are known. \citeauthor{bernal2017interrupted} published a paper on how ITS works \cite{bernal2017interrupted}. ITS works well on medical data, for instance, when a new treatment is introduced ITS can visualize if the treatment improves a condition. For ITS no control group is required and often control groups are not feasible. ITS only works with the before and after data and a date where a change was introduced.
An interrupted time series (ITS) analysis captures trends before and after a change in a system and is therefore well suited to the question this thesis investigates. ITS can be applied to a wide variety of data, provided that the data contains the same kind of data points before and after the change and that the date and time of the change are known. \citeauthor{bernal2017interrupted} describe how ITS works \cite{bernal2017interrupted}. ITS performs well on medical data: for instance, when a new treatment is introduced, ITS can visualize whether the treatment improves a condition. ITS requires no control group, which matters because control groups are often not feasible. It only needs the data before and after the change and the point in time at which the change was introduced.
ITS relies on linear regression and tries to fit a three-segment linear function to the data. The authors also describe cases where more than three segments are used, but these models quickly raise the complexity of the analysis, and for this thesis a three-segment linear regression is sufficient. The three segments are one line fitted to the data before the change, one line fitted to the data after the change, and one line connecting the other two at the change date. Figure \ref{itsexample} shows an example of an ITS. Each segment is captured by a term of the following formula: $Y_t = \beta_0 + \beta_1T + \beta_2X_t + \beta_3TX_t$, where $T$ represents time as a number (for instance, the number of months since the start of data recording), $X_t$ is 0 or 1 depending on whether the change is in effect, $\beta_0$ represents the value at $T = 0$, $\beta_1$ represents the slope before the change, $\beta_2$ represents the change in level when the change is introduced, and $\beta_3$ represents the change in slope after the change, so that the slope after the change is $\beta_1 + \beta_3$. Contrary to the method in \cite{bernal2017interrupted}, where the ITS is performed on values aggregated per month, this thesis performs the ITS on single data points, because the premise that all aggregated values carry the same weight within a certain margin is not fulfilled. Performing the ITS on aggregated values would skew the linear regression towards the aggregated values with less weight, since each of them would count as much as one with more weight. Fitting single data points prevents this, because a period with more data points automatically contributes more to the fit.
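As a worked restatement of the formula above (not an additional modelling assumption), substituting the two possible values of $X_t$ makes the two fitted lines explicit:
\[
Y_t = \beta_0 + \beta_1 T \quad \mbox{for } X_t = 0,
\]
\[
Y_t = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,T \quad \mbox{for } X_t = 1.
\]
Under this parametrization the line after the change has intercept $\beta_0 + \beta_2$ and slope $\beta_1 + \beta_3$. Writing $T_0$ for the time of the change (a symbol introduced here only for illustration), the vertical distance between the two lines at the change is $\beta_2 + \beta_3 T_0$, which reduces to $\beta_2$ if time is counted from the change date.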
%TODO include ITS example img
\begin{figure}
\centering\includegraphics[scale=0.7]{figures/itsexample}
@@ -12,7 +12,7 @@
\label{stackoverflow_its}
\end{figure}
\section{math.stackexchange.com}
\begin{figure}[H]
\centering\includegraphics[scale=0.47]{../math.stackexchange.com/output/its/average_sentiments-i1.png}
\caption{An interrupted time series analysis of the sentiments of answers to questions created by new contributors on math.stackexchange.com}