wea_ondara 8877747692 wip
2020-01-23 13:04:33 +01:00

Question: Did the "new contributor" badge have an impact on social interactions on Stack Exchange sites?
The badge was introduced around August to September 2018. The "new contributor" badge is visible until 1 week after a user's first contribution.
Definitions:
new users: Users are considered new users if their first contribution (question or answer) was less than 7 days ago.
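The definition above can be written as a small predicate. This is a minimal sketch, not code from the repository: the function name is_new_user and the constant NEW_USER_WINDOW are hypothetical, and it assumes the date of the first contribution has already been determined per user.

```python
from datetime import datetime, timedelta

# Assumed window length, taken from the "less than 7 days" definition.
NEW_USER_WINDOW = timedelta(days=7)


def is_new_user(first_contribution: datetime, now: datetime) -> bool:
    """Return True if the first question or answer is less than 7 days before `now`.

    A `now` that lies before the first contribution yields False.
    """
    age = now - first_contribution
    return timedelta(0) <= age < NEW_USER_WINDOW
```

For an answer, `now` would be the creation date of the answer and `first_contribution` the date of the asker's first post.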
Data: The data sets are acquired from archive.org [https://archive.org/download/stackexchange]. We analysed the following data sets:
- electronics.stackexchange.com
- math.stackexchange.com (broken: timeout)
- mathoverflow.net
- serverfault.com
- stats.stackexchange.com
- stackoverflow.com (not yet)
- superuser.com
- tex.stackexchange.com
- unix.stackexchange.com
Preprocessing: Some entries in the data sets are broken (e.g. no unique identifiers) and are filtered out. Furthermore,
questions and answers may contain code sections. These sections should not contribute to the sentiment, as they may skew the results.
Therefore, code sections are excluded from the analysis.
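One way to exclude code sections is sketched below. It assumes that post bodies are HTML in which code appears inside <pre> or <code> elements; the function name strip_code and the regular expressions are illustrative and not taken from the repository. A regex is a simplification here and may mishandle nested or malformed markup.

```python
import re
from html import unescape

# Assumption: code sections are wrapped in <pre>...</pre> or <code>...</code>.
CODE_SECTION = re.compile(
    r"<pre\b[^>]*>.*?</pre>|<code\b[^>]*>.*?</code>",
    re.DOTALL | re.IGNORECASE,
)
ANY_TAG = re.compile(r"<[^>]+>")


def strip_code(body_html: str) -> str:
    """Remove code sections and remaining tags, returning plain text."""
    without_code = CODE_SECTION.sub(" ", body_html)
    text = ANY_TAG.sub(" ", without_code)
    # Decode HTML entities and collapse runs of whitespace.
    return " ".join(unescape(text).split())
```

The returned text is what would then be passed to the sentiment analysis.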
Familiarizing with the data sets: We investigated the following questions:
- How many answers were given to questions in each time interval? (posthist.py)
- How many users were active in each time interval? (posthist.py)
- What is the distribution of users with exactly X answers in a given time interval? (posthist.py)
- What are the proportions of negative, neutral, and positive answers in each time interval? (posthist.py)
- What are the differences between new users and others regarding sentiment? (analyse_batch.py)
- What is the distribution of sentiments in each time interval? (analyse_batch.py)
- What are the reactions (answer sentiments) to questions of new users and of users who post the most (95th percentile)?
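As an illustration of the question about negative, neutral, and positive proportions, the following sketch groups answers by month. It is not the code in posthist.py. It assumes that every answer already has a sentiment score in [-1, 1]; the cut-off of 0.05 and the names label and proportions_per_month are assumptions made for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime

LABELS = ("negative", "neutral", "positive")


def label(score: float, threshold: float = 0.05) -> str:
    """Map a score in [-1, 1] to a label; the threshold is an assumed cut-off."""
    if score <= -threshold:
        return "negative"
    if score >= threshold:
        return "positive"
    return "neutral"


def proportions_per_month(answers):
    """answers: iterable of (creation datetime, sentiment score) pairs.

    Returns {(year, month): {label: share}} ordered by month.
    """
    counts = defaultdict(Counter)
    for created, score in answers:
        counts[(created.year, created.month)][label(score)] += 1
    result = {}
    for month, counter in sorted(counts.items()):
        total = sum(counter.values())
        result[month] = {name: counter[name] / total for name in LABELS}
    return result
```

Months without any answers do not appear in the result, so plots over time need to handle such gaps explicitly.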
Analysis:
ITS: We performed an interrupted time series (ITS) analysis with 3 terms (slope before, slope at change, slope after) on the sentiments of answers to questions of new users (answers within 7 days of the first contribution). We chose not to aggregate the sentiments into a monthly average but rather to use the sentiment of every answer individually (this gives better results, as the number of observations in each time frame may vary greatly, which would otherwise skew the results).
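A minimal sketch of such a fit on individual observations is given below. It is a common segmented-regression formulation and not necessarily the exact model used here: it fits one least-squares line before and one after the change point t0 and reports the jump between the two lines at t0, which is one possible reading of "slope at change". The names ols_line and fit_its are hypothetical, and t is assumed to be a numeric time value (e.g. days since the first post in the data set).

```python
def ols_line(points):
    """Least-squares line through (t, y) points; returns (intercept, slope).

    Needs at least two distinct t values.
    """
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope


def fit_its(observations, t0):
    """observations: list of (t, sentiment), one entry per answer (not aggregated)."""
    before = [(t, y) for t, y in observations if t < t0]
    after = [(t, y) for t, y in observations if t >= t0]
    a_before, slope_before = ols_line(before)
    a_after, slope_after = ols_line(after)
    # Jump between the two fitted lines at the change point.
    level_change = (a_after + slope_after * t0) - (a_before + slope_before * t0)
    return {
        "slope_before": slope_before,
        "level_change": level_change,
        "slope_after": slope_after,
    }
```

Because every answer enters as its own point, time frames with many answers carry more weight than sparse ones, which matches the decision not to average per month. For standard errors and significance tests a statistics package would be needed; this sketch only yields the point estimates.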