wea_ondara
2020-01-06 14:00:37 +01:00
parent b0a45a25aa
commit bd6d1dbe6b


@@ -6,10 +6,10 @@ new users: Users are considered new users if their first contribution (question
Data: The data sets were acquired from archive.org [https://archive.org/download/stackexchange]. We analysed the following data sets:
- electronics.stackexchange.com
- math.stackexchange.com
- math.stackexchange.com (broken: timeout)
- mathoverflow.net
- serverfault.com
- stats.stackexchange.com
- stats.stackexchange.com (broken: analyse_batch last plot, 42, 37 data points)
- stackoverflow.com (not yet)
- superuser.com
- tex.stackexchange.com
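The archive.org dumps ship each site as a set of XML files; posts are stored in `Posts.xml`, one `<row>` element per post, with attributes such as `Id`, `PostTypeId` (1 = question, 2 = answer), `ParentId`, `OwnerUserId`, `CreationDate` and `Body`. A minimal sketch of streaming that file, assuming this layout; `iter_posts` and the dict keys it produces are hypothetical names for illustration, not the project's actual loader.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def iter_posts(source):
    """Stream one dict per <row> element of a Stack Exchange Posts.xml dump.

    source: a path or a binary file object. The dict keys are hypothetical.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue  # skip the enclosing <posts> element
        a = elem.attrib
        post = {
            "id": int(a["Id"]),
            "type": int(a["PostTypeId"]),  # 1 = question, 2 = answer
            "parent": int(a["ParentId"]) if "ParentId" in a else None,  # set on answers
            "owner": int(a["OwnerUserId"]) if "OwnerUserId" in a else None,  # may be missing
            "created": datetime.fromisoformat(a["CreationDate"]),
            "body": a.get("Body", ""),  # HTML markup
        }
        elem.clear()  # drop attributes/children to limit memory use on large dumps
        yield post
```

Streaming with `iterparse` avoids loading multi-gigabyte files such as the stackoverflow.com dump into memory at once.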
@@ -20,7 +20,7 @@ question and answers may contain code sections. These sections should not contri
Therefore, code sections are excluded from the analysis.
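In the Stack Exchange dumps the `Body` attribute holds HTML, with code blocks wrapped in `<pre><code>` and inline code in `<code>`. A minimal sketch of excluding code sections before scoring sentiment, assuming that markup; `strip_code` is a hypothetical helper, and regex-based HTML handling is only an approximation (a real HTML parser is more robust against unusual markup).

```python
import html
import re

# <pre>...</pre> blocks (which contain <code>) and inline <code>...</code> spans
_CODE_RE = re.compile(r"<pre\b.*?</pre>|<code\b.*?</code>", re.DOTALL | re.IGNORECASE)
# any remaining HTML tag
_TAG_RE = re.compile(r"<[^>]+>")


def strip_code(body_html):
    """Remove code sections from a post body and return whitespace-normalised plain text."""
    no_code = _CODE_RE.sub(" ", body_html)  # drop code first, so its text never reaches the scorer
    text = _TAG_RE.sub(" ", no_code)        # strip the remaining tags
    text = html.unescape(text)              # &amp; -> &, &lt; -> <, ...
    return " ".join(text.split())
```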
Familiarizing ourselves with the data sets: We created plots for:
Familiarizing ourselves with the data sets: We investigated the following questions:
- How many answers were given to questions in each time interval? (posthist.py)
- How many users were active in each time interval? (posthist.py)
- What is the distribution of users with exactly X answers in a given time interval? (posthist.py)
@@ -30,5 +30,5 @@ Familiarizing with the data sets: We created plots for:
- What are the reactions (answer sentiments) to questions of new users and of users who post the most (95th percentile)?
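The counting questions above reduce to grouping posts by time interval. A minimal sketch, assuming calendar months as the interval, "active" meaning a user posted anything in that month, and post dicts with hypothetical keys `type` (2 = answer), `owner` and `created` (a `datetime`); posthist.py may define intervals and activity differently. The nearest-rank percentile is one common way to pick the 95th-percentile cut-off for the most active posters, not necessarily the one used in the project.

```python
import math
from collections import Counter, defaultdict


def interval_stats(posts):
    """Per-month answer counts, active-user counts and answers-per-user histograms.

    posts: iterable of dicts with hypothetical keys "type" (2 = answer),
    "owner" (user id or None) and "created" (datetime).
    """
    answers = Counter()              # month -> number of answers
    active = defaultdict(set)        # month -> ids of users who posted anything
    per_user = defaultdict(Counter)  # month -> Counter(user id -> number of answers)
    for p in posts:
        month = (p["created"].year, p["created"].month)
        if p["owner"] is not None:
            active[month].add(p["owner"])
        if p["type"] == 2:
            answers[month] += 1
            if p["owner"] is not None:
                per_user[month][p["owner"]] += 1
    # month -> Counter(X -> number of users with exactly X answers in that month)
    dist = {m: Counter(c.values()) for m, c in per_user.items()}
    return answers, {m: len(u) for m, u in active.items()}, dist


def percentile_threshold(values, pct=95):
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    s = sorted(values)
    k = math.ceil(pct * len(s) / 100)
    return s[max(k, 1) - 1]
```

Users whose answer count in an interval is at or above `percentile_threshold` of all per-user counts would then form the "post the most" group.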
Analysis:
ITS: We performed an ITS with 3 terms (slope before, slope at change, slope after) on the sentiments of answers to questions of new users (answers within 7 days of the first contribution). We chose not to aggregate the sentiments to an average per month but rather to use the sentiment of every answer to a question individually (this gives better results, as the number of observations in each time frame may vary greatly, which would skew averaged results).
ITS: We performed an interrupted time series (ITS) with 3 terms (slope before, slope at change, slope after) on the sentiments of answers to questions of new users (answers within 7 days of the first contribution). We chose not to aggregate the sentiments to an average per month but rather to use the sentiment of every answer to a question individually (this gives better results, as the number of observations in each time frame may vary greatly, which would skew averaged results).
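An ITS of this kind can be expressed as a segmented linear regression fitted on individual answer sentiments instead of monthly means. The sketch below uses the standard formulation y = b0 + b1*t + b2*D + b3*(t - t0)*D, where t is time in days and D is 1 from the change date t0 onward; reading "slope before", "slope at change" and "slope after" as b1, b2 (level change) and b3 (change in slope) is our assumption. `within_window`, `its_fit` and the `(answer_time, first_contribution_time, sentiment)` row layout are hypothetical, and NumPy's `lstsq` returns point estimates only, without standard errors.

```python
from datetime import timedelta

import numpy as np


def within_window(rows, days=7):
    """Keep (answer_time, sentiment) for answers posted within `days` of the asker's first contribution.

    rows: iterable of (answer_time, first_contribution_time, sentiment); hypothetical layout.
    """
    limit = timedelta(days=days)
    return [(a, s) for a, first, s in rows if timedelta(0) <= a - first <= limit]


def its_fit(times, sentiments, change):
    """Segmented regression on individual observations (no monthly averaging).

    Model: y = b0 + b1*t + b2*D + b3*(t - t0)*D, t in days since the first observation,
    D = 1 for observations at or after `change`.
    Returns [b0, b1, b2, b3]: intercept, slope before, level change, slope change.
    """
    origin = min(times)
    t = np.array([(x - origin).total_seconds() / 86400 for x in times])
    t0 = (change - origin).total_seconds() / 86400
    d = (t >= t0).astype(float)
    X = np.column_stack([np.ones_like(t), t, d, (t - t0) * d])
    coef, *_ = np.linalg.lstsq(X, np.asarray(sentiments, dtype=float), rcond=None)
    return coef
```

Because every answer is one row of `X`, months with many answers contribute proportionally more observations instead of being collapsed into a single averaged point.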