This commit is contained in:
wea_ondara
2020-01-03 13:24:55 +01:00
parent 1947846aa8
commit e04da245ea

34
summary Normal file

@@ -0,0 +1,34 @@
Question: Did the "new contributor" badge have an impact on social interactions on Stack Exchange sites?
This badge was introduced around August to September 2018. The "new contributor" badge is visible until 1 week after a user's first contribution.
Definitions:
new users: Users are considered new users if their first contribution (question or answer) was less than 7 days ago.
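The definition above can be sketched as a small predicate. This is a minimal illustration, not the project's code: the function name `is_new_user` and its parameters are hypothetical, and it assumes the timestamps are comparable `datetime` objects.

```python
from datetime import datetime, timedelta

# Window taken from the definition above: less than 7 days after the first contribution.
NEW_USER_WINDOW = timedelta(days=7)


def is_new_user(first_contribution, at_time):
    """Return True if at_time lies less than 7 days after first_contribution.

    Both arguments are datetime objects. Times before the first
    contribution are treated as "not new" (an assumption of this sketch).
    """
    elapsed = at_time - first_contribution
    return timedelta(0) <= elapsed < NEW_USER_WINDOW
```

With this definition, a user whose first post is exactly 7 days old no longer counts as new, because the window is strict ("less than 7 days").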
Data: The data sets were acquired from archive.org [https://archive.org/download/stackexchange]. We analysed the following data sets:
- electronics.stackexchange.com
- math.stackexchange.com
- mathoverflow.net
- serverfault.com
- stats.stackexchange.com
- stackoverflow.com (not yet)
- superuser.com
- tex.stackexchange.com
- unix.stackexchange.com
Preprocessing: Some entries in the data sets are broken (e.g. no unique identifiers) and are filtered out. Furthermore,
questions and answers may contain code sections. These sections should not contribute to the sentiment as they may skew results.
Therefore, code sections are excluded from the analysis.
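One way to exclude code sections is sketched below. It assumes that post bodies are stored as HTML, with code wrapped in `<pre>` or `<code>` elements. The function name `strip_code` is hypothetical, and the regex approach is a simplification that may differ from the project's actual preprocessing.

```python
import html
import re

# Assumption: code appears as <pre>...</pre> blocks or inline <code>...</code> spans.
_CODE_BLOCK = re.compile(r"<pre\b[^>]*>.*?</pre>", re.DOTALL | re.IGNORECASE)
_INLINE_CODE = re.compile(r"<code\b[^>]*>.*?</code>", re.DOTALL | re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def strip_code(body):
    """Return the plain text of an HTML post body with code sections removed."""
    text = _CODE_BLOCK.sub(" ", body)    # drop whole code blocks first
    text = _INLINE_CODE.sub(" ", text)   # then inline code spans
    text = _TAG.sub(" ", text)           # remove the remaining markup
    text = html.unescape(text)           # decode entities such as &lt;
    return " ".join(text.split())        # normalise whitespace
```

Removing block-level code before inline code keeps a `<code>` element nested in `<pre>` from leaving a dangling `</pre>` behind.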
Familiarizing with the data sets: We created plots for:
- How many answers were given to questions in each time interval? (posthist.py)
- How many users were active in each time interval? (posthist.py)
- What is the distribution of users with exactly X answers in a given time interval? (posthist.py)
- What are the proportions of negative, neutral, and positive answers in each time interval? (posthist.py)
- What are the differences between new users and others regarding sentiment? (analyse_batch.py)
- What is the distribution of sentiments in each time interval? (analyse_batch.py)
- What are the reactions (answer sentiments) to questions of new users and users who post the most (95th percentile)?
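The first two questions in the list above come down to bucketing answers by time interval. The sketch below is not taken from posthist.py. It assumes monthly intervals and an input of `(user_id, creation_time)` pairs, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def month_key(timestamp):
    """Map a datetime to its month bucket, e.g. '2018-08' (monthly interval assumed)."""
    return timestamp.strftime("%Y-%m")


def answers_and_active_users_per_month(answers):
    """Count answers and distinct answering users per month.

    answers: iterable of (user_id, creation_datetime) pairs.
    Returns (answer_counts, active_user_counts), both dicts keyed by month.
    """
    answer_counts = Counter()
    users_by_month = defaultdict(set)
    for user_id, created in answers:
        key = month_key(created)
        answer_counts[key] += 1
        users_by_month[key].add(user_id)
    active_users = {key: len(users) for key, users in users_by_month.items()}
    return dict(answer_counts), active_users
```

The resulting dictionaries can be passed to any plotting library to draw the per-interval histograms.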
Analysis:
ITS: We performed an interrupted time series (ITS) analysis with 3 terms (slope before, slope at change, slope after) on the sentiments of answers to questions of new users (answers within 7 days of the first contribution). We chose not to aggregate the sentiments into a monthly average but rather to use the sentiment of every answer individually, which gave better results because the number of observations in each time frame may vary greatly.
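An ITS of this kind is commonly fitted as a segmented linear regression. The sketch below uses the common textbook form y = b0 + b1*t + b2*D + b3*(t - t0)*D, where D is 1 from the change point t0 onwards. That form is an assumption here and may differ from the model actually used. The names `fit_its` and `solve_linear_system` are hypothetical, and the code uses only the standard library, solving the least-squares normal equations directly. Each observation is its own row, so several sentiments at the same time point are handled without averaging.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def fit_its(times, values, change_time):
    """Least-squares fit of y = b0 + b1*t + b2*D + b3*(t - change_time)*D.

    times and values hold one entry per observation (not per interval).
    D is 1 for observations at or after change_time, else 0.
    """
    rows = []
    for t in times:
        d = 1.0 if t >= change_time else 0.0
        rows.append([1.0, float(t), d, (t - change_time) * d])
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    b0, b1, b2, b3 = solve_linear_system(xtx, xty)
    return {
        "intercept": b0,
        "slope_before": b1,
        "level_change": b2,       # jump at the change point
        "slope_after": b1 + b3,   # slope once the change has happened
    }
```

The sketch needs observations on both sides of the change point, otherwise the system is singular. In practice a statistics library that also reports confidence intervals would be the more usual choice.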