Question: Did the "new contributor" badge have an impact on social interactions on Stack Exchange sites?
The badge was introduced around August to September 2018. It is visible until 1 week after a user's first contribution.

Definitions:
new users: A user is considered a new user if their first contribution (question or answer) was less than 7 days ago.
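
The definition above can be sketched as a small check. This is only an illustration: the function name is_new_user and the constant NEW_USER_WINDOW are hypothetical, and "less than 7 days" is read as a strict comparison.

```python
from datetime import datetime, timedelta

# Window taken from the definition above; the name is hypothetical.
NEW_USER_WINDOW = timedelta(days=7)


def is_new_user(first_contribution: datetime, at: datetime) -> bool:
    """Return True if `at` lies less than 7 days after the first contribution."""
    age = at - first_contribution
    # Times before the first contribution do not count as "new user" activity.
    return timedelta(0) <= age < NEW_USER_WINDOW


first = datetime(2018, 9, 1, 12, 0)
print(is_new_user(first, datetime(2018, 9, 5, 12, 0)))  # True (4 days)
print(is_new_user(first, datetime(2018, 9, 8, 12, 0)))  # False (exactly 7 days)
```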

Data: The data sets were acquired from archive.org [https://archive.org/download/stackexchange]. We analysed the following data sets:
- electronics.stackexchange.com
- math.stackexchange.com (broken: timeout)
- mathoverflow.net
- serverfault.com
- stats.stackexchange.com
- stackoverflow.com (not yet)
- superuser.com
- tex.stackexchange.com
- unix.stackexchange.com

Preprocessing: Some entries in the data sets are broken (e.g. missing unique identifiers) and are filtered out. Furthermore,
questions and answers may contain code sections. These sections should not contribute to the sentiment, as they may skew the results.
Therefore, code sections are excluded from the analysis.
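
A minimal sketch of how code sections could be removed before sentiment analysis. It assumes that post bodies are HTML and that code appears inside <pre> and <code> elements; whether inline <code> should be dropped as well is an assumption here. The names CodeStripper and strip_code are hypothetical and not taken from the project scripts.

```python
from html.parser import HTMLParser

# Tags after which a space is inserted so that paragraphs do not run together.
BLOCK_TAGS = {"p", "pre", "li", "blockquote"}
# Tags whose content is treated as code and dropped (assumption).
CODE_TAGS = {"pre", "code"}


class CodeStripper(HTMLParser):
    """Collects the text of an HTML fragment, skipping <pre>/<code> content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a code element
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in CODE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in CODE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        if tag in BLOCK_TAGS:
            self._parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by removed sections.
        return " ".join("".join(self._parts).split())


def strip_code(body_html: str) -> str:
    parser = CodeStripper()
    parser.feed(body_html)
    parser.close()
    return parser.text()


body = ("<p>Use <code>ls -l</code> to list files.</p>"
        "<pre><code>ls -l /tmp\n</code></pre><p>Thanks!</p>")
print(strip_code(body))  # Use to list files. Thanks!
```

Dropping inline code leaves small gaps in sentences ("Use to list files."), which seems acceptable for sentiment scoring but is a design choice of this sketch.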

Familiarizing with the data sets: We investigated the following questions:
- How many answers were given to questions in each time interval? (posthist.py)
- How many users were active in each time interval? (posthist.py)
- What is the distribution of users with exactly X answers in a given time interval? (posthist.py)
- What are the proportions of negative, neutral, and positive answers in each time interval? (posthist.py)
- What are the differences between new users and others regarding sentiment? (analyse_batch.py)
- What is the distribution of sentiments in each time interval? (analyse_batch.py)
- What are the reactions (answer sentiments) to questions of new users and of users who post the most (95th percentile)?

Analysis:
ITS: We performed an interrupted time series (ITS) analysis with 3 tensors (slope before, slope at change, slope after) on the sentiments of answers to questions of new users (answers within 7 days of the first contribution). We chose not to aggregate the sentiments into a monthly average but rather to use the sentiment of every answer individually. This gave better results, because the number of observations per time frame may vary greatly, which would skew an aggregated series.
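
A minimal sketch of such an ITS fit on individual observations, written as a segmented least-squares regression. It assumes the common model y = b0 + b1*t + b2*D + b3*(t - T0)*D with D = 1 for t >= T0, and reads "slope before" as b1, "slope at change" as the level change b2, and "slope after" as b1 + b3. This mapping, the function name fit_its, and the synthetic noise-free data are assumptions for illustration, not the project's actual code or results.

```python
def fit_its(times, values, change_point):
    """Least-squares fit of y = b0 + b1*t + b2*D + b3*(t - T0)*D, D = 1 for t >= T0."""
    # One design-matrix row per observation, so no aggregation per interval is needed.
    rows = []
    for t in times:
        d = 1.0 if t >= change_point else 0.0
        rows.append([1.0, t, d, (t - change_point) * d])

    k = 4
    # Augmented normal equations: (X^T X | X^T y).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, values))]
         for i in range(k)]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("not enough observations on both sides of the change point")
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]

    b0, b1, b2, b3 = (a[i][k] / a[i][i] for i in range(k))
    return {"intercept": b0, "slope_before": b1,
            "level_change": b2, "slope_after": b1 + b3}


# Synthetic, noise-free example: change at day 10, uneven number of answers per day.
times, values = [], []
for day in range(20):
    d = 1.0 if day >= 10 else 0.0
    y = 0.1 + 0.01 * day + 0.2 * d + 0.02 * (day - 10) * d
    for _ in range(1 + day % 3):
        times.append(float(day))
        values.append(y)

result = fit_its(times, values, change_point=10.0)
print({name: round(value, 4) for name, value in result.items()})
```

Because every answer contributes its own row, days with many answers weigh more than days with few, which is the effect of not averaging per month described above.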