\chapter{Datasets}

%TODO maybe more text

%general

StackExchange provides complete datasets of its communities for research purposes on archive.org \cite{archivestackexchange}. StackExchange also provides a short guide on how to interpret the provided data, since some data values are purely numerical and convey no meaning unless one knows what they represent.
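
The following sketch illustrates why such a guide is needed. In the file \texttt{Posts.xml} of a dump, every post is a \texttt{row} element whose attribute \texttt{PostTypeId} encodes the kind of post as a number, with 1 denoting a question and 2 an answer. The sketch is a minimal Python example, not the tooling used in this thesis; the function \texttt{count\_post\_types} and the sample rows are hypothetical, and all other post types are lumped together as ``other''.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Meaning of the numeric PostTypeId values (only the two types used here).
POST_TYPES = {"1": "question", "2": "answer"}


def count_post_types(source):
    """Stream the <row> elements of a Posts.xml-like source and count them by post type."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            counts[POST_TYPES.get(elem.get("PostTypeId"), "other")] += 1
            elem.clear()  # drop the row's attributes once counted to limit memory use
    return counts


# Hypothetical sample in the row format of the dump (values are illustrative).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" CreationDate="2020-01-05T10:00:00.000" OwnerUserId="7" />
  <row Id="2" PostTypeId="2" ParentId="1" CreationDate="2020-01-05T11:30:00.000" OwnerUserId="8" />
  <row Id="3" PostTypeId="2" ParentId="1" CreationDate="2020-01-06T09:15:00.000" OwnerUserId="9" />
  <row Id="4" PostTypeId="5" CreationDate="2020-01-07T08:00:00.000" />
</posts>"""

counts = count_post_types(io.BytesIO(SAMPLE))
print(dict(counts))
```

Because the dumps of the larger communities can be several gigabytes in size, the sketch streams the file with \texttt{iterparse} instead of loading the whole tree into memory.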

This thesis investigates the largest of the available datasets, namely those of the following communities:

\begin{itemize}
\item StackOverflow.com
\item math.stackexchange.com
\item MathOverflow.net
\item AskUbuntu.com
\item ServerFault.com
\item SuperUser.com
\item electronics.stackexchange.com
\item stats.stackexchange.com
\item tex.stackexchange.com
\item unix.stackexchange.com
\end{itemize}

These datasets were selected because of their size, as larger datasets yield more consistent results. Smaller datasets may be too sparse to draw meaningful conclusions, and outliers influence the results of a small dataset more strongly than those of a large one. Each dataset contains the data of the respective community from its creation until the end of February 2020.
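
To make the size argument concrete, consider $n$ observations with arithmetic mean $\bar{x}_n$. If a single outlier $x_o$ is added, the new mean $\bar{x}_{n+1}$ differs from the old one by

```latex
\begin{equation}
  \bar{x}_{n+1} - \bar{x}_n
    = \frac{n\,\bar{x}_n + x_o}{n+1} - \bar{x}_n
    = \frac{x_o - \bar{x}_n}{n+1}.
\end{equation}
```

For example, with $\bar{x}_n = 1$ and $x_o = 50$, the mean shifts by about $4.45$ for $n = 10$ but only by about $0.0049$ for $n = 10\,000$. This example only considers the arithmetic mean and is meant as an illustration; other statistics react to outliers to different degrees.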

% from archive.org \cite{archivestackexchange}

% list of datasets

% selected largest datasets, smaller datasets' data too sparse to draw conclusions, statistical chance of outliers too big, outliers would affect the outcome too much

% larger datasets yield more consistent results

% datasets include data since inception of community until some date

%sections 1 per site

\section{StackOverflow.com} Data values not computed yet. %TODO insert values

StackOverflow is the largest and oldest community of the StackExchange platform.

The community has 165567 registered users, of which 3467 were active between December 2019 and February 2020.
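
A minimal Python sketch of how such an activity count could be obtained, assuming that a user counts as active if they authored at least one post whose \texttt{CreationDate} lies between 1 December 2019 (inclusive) and 1 March 2020 (exclusive). This definition of activity is an assumption, and the names \texttt{active\_users} and \texttt{parse\_date} are hypothetical.

```python
from datetime import datetime

# Assumed definition of "active": at least one authored post created in [start, end).
WINDOW_START = datetime(2019, 12, 1)
WINDOW_END = datetime(2020, 3, 1)  # exclusive, so all of 29 February 2020 is included


def parse_date(value):
    """Parse a dump timestamp such as '2020-01-05T10:00:00.000'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f")


def active_users(posts, start=WINDOW_START, end=WINDOW_END):
    """Return the ids of all users who authored a post within [start, end).

    `posts` is an iterable of attribute dictionaries, e.g. `elem.attrib`
    of the <row> elements in Posts.xml.
    """
    active = set()
    for post in posts:
        owner = post.get("OwnerUserId")
        if owner is None:  # posts without an owner id cannot be attributed
            continue
        if start <= parse_date(post.get("CreationDate")) < end:
            active.add(owner)
    return active


posts = [
    {"OwnerUserId": "7", "CreationDate": "2019-11-30T23:59:59.000"},  # before the window
    {"OwnerUserId": "8", "CreationDate": "2019-12-01T00:00:00.000"},  # first instant inside
    {"OwnerUserId": "8", "CreationDate": "2020-02-29T12:00:00.000"},  # inside, same user
    {"OwnerUserId": "9", "CreationDate": "2020-03-01T00:00:00.000"},  # after the window
    {"CreationDate": "2020-01-15T08:00:00.000"},                       # no owner id
]
print(sorted(active_users(posts)))
```

Using a half-open interval avoids having to know the last timestamp of February and counts each user only once, no matter how many posts they wrote.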

Members asked 116797 questions in total and gave 202751 answers, an average answer density of 1.73 answers per question.
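
The answer density is the number of answers divided by the number of questions. The Python sketch below computes it from the reported counts; the helper names \texttt{answer\_density} and \texttt{truncate} are hypothetical. The densities reported in this chapter appear to be truncated rather than rounded: for math.stackexchange.com, ServerFault.com and electronics.stackexchange.com, truncating to three decimals reproduces the reported 1.336, 1.574 and 1.705, whereas rounding would give 1.337, 1.575 and 1.706.

```python
import math


def answer_density(num_questions: int, num_answers: int) -> float:
    """Average number of answers per question."""
    if num_questions <= 0:
        raise ValueError("answer density is undefined without questions")
    return num_answers / num_questions


def truncate(value: float, digits: int = 3) -> float:
    """Cut off (instead of rounding) a non-negative value after `digits` decimals."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


# (questions, answers) as reported for math.stackexchange.com,
# ServerFault.com and electronics.stackexchange.com.
for questions, answers in [(1170938, 1565188), (274564, 432334), (130025, 221811)]:
    density = answer_density(questions, answers)
    print(f"truncated {truncate(density)}, rounded {round(density, 3)}")
```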

During their first week after registration, new users asked 42996 questions, an average of 1.129 questions per new user.
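
Every reported average in this chapter is at least 1, which suggests that it is taken over those new users who asked at least one question in their first week. The Python sketch below makes this assumption explicit and additionally assumes that the first week covers the seven days following the \texttt{CreationDate} of the user in \texttt{Users.xml}; the function \texttt{first\_week\_questions} and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

FIRST_WEEK = timedelta(days=7)  # assumption: seven days after registration
TIMESTAMP = "%Y-%m-%dT%H:%M:%S.%f"


def parse(ts):
    return datetime.strptime(ts, TIMESTAMP)


def first_week_questions(users, questions):
    """Count, per user, the questions asked within the first week after registration.

    users: mapping user id -> registration timestamp (CreationDate in Users.xml)
    questions: iterable of (owner id, creation timestamp) pairs of posts with PostTypeId 1
    """
    counts = Counter()
    for owner, created in questions:
        registered = users.get(owner)
        if registered is None:  # owner unknown, e.g. missing from the users file
            continue
        delta = parse(created) - parse(registered)
        if timedelta(0) <= delta < FIRST_WEEK:
            counts[owner] += 1
    return counts


users = {"1": "2020-01-01T00:00:00.000", "2": "2020-01-10T12:00:00.000", "3": "2020-02-01T00:00:00.000"}
questions = [
    ("1", "2020-01-01T08:00:00.000"),  # day 0
    ("1", "2020-01-06T08:00:00.000"),  # day 5
    ("1", "2020-01-20T08:00:00.000"),  # too late
    ("2", "2020-01-11T09:00:00.000"),  # day 1
    ("3", "2020-03-01T00:00:00.000"),  # too late
]
counts = first_week_questions(users, questions)
total = sum(counts.values())
average = total / len(counts)  # only users with at least one first-week question
print(total, average)
```

Averaging over all registered users instead would yield values well below 1, which is why the sketch divides by the number of users who actually asked a question.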

\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../stackoverflow.com/output/posthist/activeusers-i3.png}
\subcaption{Active users (activity within the last 3 months)}
\label{so_activeusers}
\end{subfigure}%
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../stackoverflow.com/output/posthist/postsanswers-i3.png}
\subcaption{Question and answer counts over time}
\label{so_postsanswers}
\end{subfigure}
\end{figure}
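
The active-user plots presumably count, for each point in time, the users with activity within the preceding three months. A minimal Python sketch of such a rolling count, assuming monthly resolution and that activity means authoring a post, could look as follows; \texttt{rolling\_active\_users} and the sample posts are hypothetical.

```python
from collections import defaultdict


def month_index(ts):
    """Map a 'YYYY-MM-...' timestamp to a running month number."""
    return int(ts[0:4]) * 12 + int(ts[5:7]) - 1


def rolling_active_users(posts, window=3):
    """Return {(year, month): number of distinct users with a post in that month
    or in the window - 1 preceding months}."""
    users_by_month = defaultdict(set)
    for owner, created in posts:
        users_by_month[month_index(created)].add(owner)
    if not users_by_month:
        return {}
    first, last = min(users_by_month), max(users_by_month)
    result = {}
    for m in range(first, last + 1):
        active = set()
        for k in range(m - window + 1, m + 1):
            active |= users_by_month.get(k, set())
        result[(m // 12, m % 12 + 1)] = len(active)
    return result


posts = [
    ("a", "2019-12-03T10:00:00.000"),
    ("b", "2020-01-15T10:00:00.000"),
    ("a", "2020-02-01T10:00:00.000"),
    ("c", "2020-04-20T10:00:00.000"),
]
print(rolling_active_users(posts))
```

Grouping users by month first keeps the window computation cheap, since each month only merges a few precomputed sets.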

\section{math.stackexchange.com}

``Mathematics Stack Exchange is a question and answer site for people studying math at any level and professionals in related fields.'' \cite{mathstackexchangecom}

The community has 624671 registered users, of which 17074 were active between December 2019 and February 2020.

Members asked 1170938 questions in total and gave 1565188 answers, an average answer density of 1.336 answers per question.

During their first week after registration, new users asked 265704 questions, an average of 1.336 questions per new user.

\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../math.stackexchange.com/output/posthist/activeusers-i3.png}
\subcaption{Active users (activity within the last 3 months)}
\label{math_activeusers}
\end{subfigure}%
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../math.stackexchange.com/output/posthist/postsanswers-i3.png}
\subcaption{Question and answer counts over time}
\label{math_postsanswers}
\end{subfigure}
\end{figure}

\section{MathOverflow.net}

Compared to StackOverflow, MathOverflow.net is a small community for professional mathematicians.

The community has 105471 registered users, of which 1501 were active between December 2019 and February 2020.

Members asked 108083 questions in total and gave 144918 answers, an average answer density of 1.34 answers per question.

During their first week after registration, new users asked 23746 questions, an average of 1.131 questions per new user.

\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../mathoverflow.net/output/posthist/activeusers-i3.png}
\subcaption{Active users (activity within the last 3 months)}
\label{matho_activeusers}
\end{subfigure}%
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../mathoverflow.net/output/posthist/postsanswers-i3.png}
\subcaption{Question and answer counts over time}
\label{matho_postsanswers}
\end{subfigure}
\end{figure}

\section{AskUbuntu.com}

Compared to StackOverflow, AskUbuntu.com is a small community for Ubuntu users and developers.

The community has 783614 registered users, of which 7033 were active between December 2019 and February 2020.

Members asked 334194 questions in total and gave 418051 answers, an average answer density of 1.25 answers per question.

During their first week after registration, new users asked 157018 questions, an average of 1.101 questions per new user.

\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../askubuntu.com/output/posthist/activeusers-i3.png}
\subcaption{Active users (activity within the last 3 months)}
\label{ubuntu_activeusers}
\end{subfigure}%
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../askubuntu.com/output/posthist/postsanswers-i3.png}
\subcaption{Question and answer counts over time}
\label{ubuntu_postsanswers}
\end{subfigure}
\end{figure}

\section{ServerFault.com}

Compared to StackOverflow, ServerFault.com is a small community for system and network administrators.

The community has 451180 registered users, of which 3947 were active between December 2019 and February 2020.

Members asked 274564 questions in total and gave 432334 answers, an average answer density of 1.574 answers per question.

During their first week after registration, new users asked 88547 questions, an average of 1.106 questions per new user.

\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../serverfault.com/output/posthist/activeusers-i3.png}
\subcaption{Active users (activity within the last 3 months)}
\label{fault_activeusers}
\end{subfigure}%
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../serverfault.com/output/posthist/postsanswers-i3.png}
\subcaption{Question and answer counts over time}
\label{fault_postsanswers}
\end{subfigure}
\end{figure}

\section{SuperUser.com}

Compared to StackOverflow, SuperUser.com is a small community for computer enthusiasts and power users.

The community has 861533 registered users, of which 7392 were active between December 2019 and February 2020.

Members asked 424718 questions in total and gave 587559 answers, an average answer density of 1.383 answers per question.

During their first week after registration, new users asked 161397 questions, an average of 1.085 questions per new user.

\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../superuser.com/output/posthist/activeusers-i3.png}
\subcaption{Active users (activity within the last 3 months)}
\label{super_activeusers}
\end{subfigure}%
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../superuser.com/output/posthist/postsanswers-i3.png}
\subcaption{Question and answer counts over time}
\label{super_postsanswers}
\end{subfigure}
\end{figure}

\section{electronics.stackexchange.com}

Compared to StackOverflow, electronics.stackexchange.com is a small community for electrical engineering.

The community has 184795 registered users, of which 3172 were active between December 2019 and February 2020.

Members asked 130025 questions in total and gave 221811 answers, an average answer density of 1.705 answers per question.

During their first week after registration, new users asked 47035 questions, an average of 1.126 questions per new user.

\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../electronics.stackexchange.com/output/posthist/activeusers-i3.png}
\subcaption{Active users (activity within the last 3 months)}
\label{elec_activeusers}
\end{subfigure}%
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../electronics.stackexchange.com/output/posthist/postsanswers-i3.png}
\subcaption{Question and answer counts over time}
\label{elec_postsanswers}
\end{subfigure}
\end{figure}

\section{stats.stackexchange.com (Cross Validated)}

``Cross Validated is a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization.'' \cite{statsstackexchangecom}

The community has 227032 registered users, of which 4485 were active between December 2019 and February 2020.

Members asked 151777 questions in total and gave 148046 answers, an average answer density of 0.975 answers per question.

During their first week after registration, new users asked 57636 questions, an average of 1.112 questions per new user.

\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../stats.stackexchange.com/output/posthist/activeusers-i3.png}
\subcaption{Active users (activity within the last 3 months)}
\label{stats_activeusers}
\end{subfigure}%
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../stats.stackexchange.com/output/posthist/postsanswers-i3.png}
\subcaption{Question and answer counts over time}
\label{stats_postsanswers}
\end{subfigure}
\end{figure}

\section{tex.stackexchange.com}

Compared to StackOverflow, tex.stackexchange.com is a small community for \TeX{} and related typesetting systems.

The community has 171867 registered users, of which 3280 were active between December 2019 and February 2020.

Members asked 188860 questions in total and gave 227875 answers, an average answer density of 1.206 answers per question.

During their first week after registration, new users asked 59692 questions, an average of 1.191 questions per new user.

\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../tex.stackexchange.com/output/posthist/activeusers-i3.png}
\subcaption{Active users (activity within the last 3 months)}
\label{tex_activeusers}
\end{subfigure}%
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../tex.stackexchange.com/output/posthist/postsanswers-i3.png}
\subcaption{Question and answer counts over time}
\label{tex_postsanswers}
\end{subfigure}
\end{figure}

\section{unix.stackexchange.com}

Compared to StackOverflow, unix.stackexchange.com is a small community for Linux and Unix-like operating systems.

The community has 356498 registered users, of which 4565 were active between December 2019 and February 2020.

Members asked 174625 questions in total and gave 256007 answers, an average answer density of 1.466 answers per question.

During their first week after registration, new users asked 62437 questions, an average of 1.124 questions per new user.

\begin{figure}[H]
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../unix.stackexchange.com/output/posthist/activeusers-i3.png}
\subcaption{Active users (activity within the last 3 months)}
\label{unix_activeusers}
\end{subfigure}%
\begin{subfigure}[c]{0.5\textwidth}
\includegraphics[scale=0.35]{../unix.stackexchange.com/output/posthist/postsanswers-i3.png}
\subcaption{Question and answer counts over time}
\label{unix_postsanswers}
\end{subfigure}
\end{figure}

% general information

% dataset from to dates

% #user, #questions, #answers, #votes, #avg answer/question

%plots

% #users

% #questions, #answers