wip

2021-05-23 21:51:24 +02:00
parent cb827b384e
commit 24c7cea26f
3 changed files with 39 additions and 24 deletions
--- a/text/2_relwork.tex
+++ b/text/2_relwork.tex
@@ -120,7 +120,7 @@ For this project four mentors were hand selected and therefore the project would
 % Rolling out the Welcome Wagon: June Update \cite{friend2018rolling} “Ask a Question Wizard” prototype, reduce exclusion (negative feelings, expectations and experiences), improve inclusion (learn from other communities facing similar problems), classification of abusive and unwelcoming comments


-%TODO Unwelcomeness is a large problem on StackExchange; not so strong; maybe other sentence
+%Unwelcomeness is a large problem on StackExchange; not so strong; maybe other sentence
 Unwelcomeness is a large problem on StackExchange \cite{ford2016paradise}\footref{friend2018rolling}\footref{hanlon2018stack}. Although unwelcomeness affects all new users, users from marginalized groups suffer significantly more \cite{vasilescu2014gender}\footref{hanlon2018stack}. \citeauthor{ford2016paradise} investigated barriers users face when contributing to StackOverflow. The authors identified 14 barriers in total that hinder newcomers from contributing, and five of these barriers were rated significantly more problematic for women than for men. On StackOverflow only 5.8\% (2015\footnote{\url{https://insights.stackoverflow.com/survey/2015}}; 7.9\% in 2019\footref{stackoversurvey2019}) of active users identify as women. \citeauthor{david2008community} found a similar share of 5\% women in their work on \emph{Community-based production of open-source software} \cite{david2008community}. These numbers are small compared to the share of degrees in Science, Technology, Engineering, and Mathematics (STEM) \cite{clark2005women}, of which 20\% are achieved by women \cite{hill2010so}. Despite the difference, the percentage of women on StackOverflow has increased in recent years.

 %discrimination
@@ -138,8 +138,8 @@ Unwelcomeness is a large problem on StackExchange \cite{ford2016paradise}\footre
 While attracting and onboarding new users is an important step for growing a community, keeping them on the platform and turning them into long-lasting community members is equally important for growth as well as sustainability. Users have to feel the benefits of staying with the community. Without these benefits, a user has little to no motivation to interact with the community and will most likely drop out of it. Benefits are diverse; however, they can be grouped into five categories: information exchange, social support, social interaction, time and location flexibility, and permanency \cite{iriberri2009life}.
 As StackExchange is a CQA platform, the benefits from information exchange, time and location flexibility, and permanency are more prevalent, while social support and social interaction are more in the background. Social support and social interaction are more relevant in communities where individuals communicate about topics regarding themselves, for instance, communities where health aspects are the main focus \cite{maloney2005multilevel}. Time and location flexibility is important for all online communities. Information exchange and permanency are important for StackExchange as it is a large collection of knowledge which mostly does not change over time or from one individual to another. StackExchange's content is driven by the community and therefore depends on the voluntarism of its users, making benefits even more important.

-The backbone of a community is always the user base and its volunarism to participate with the community. Even if the community is lead by a commerical core team, the community is almost always several orders of magnitude greater than the number of the paid employees forming the core team \cite{butler2002community}. The core team often provides the infrastructur the community and does some cummunity. However, most of the community work is done by volunteers of the community . %TODO get number on employees and volunteers on stackexchange/overflow
-%This is also true for the StackExchange platform where the core team of paid employees is XXX and the number of voluntary community members performing community work is XXX \footnote{\url{LINK}}
+The backbone of a community is always its user base and the users' voluntarism in participating in the community. Even if the community is led by a commercial core team, the community is almost always several orders of magnitude larger than the core team of paid employees \cite{butler2002community}. The core team often provides the infrastructure for the community and does some community work. However, most of the community work is done by volunteers of the community.
+This is also true for the StackExchange platform, where the core team consists of between 200 and 500 paid employees\footnote{\url{https://www.linkedin.com/company/stack-overflow}} (this includes employees working on other products) and the number of voluntary community members performing community work (these users have access to moderation tools) is around 10,000\footnote{\url{https://data.stackexchange.com/stackoverflow/revision/1412005/1735651/users-with-rep-20k}}.

 In a community, users can generally be split into two groups by their motivation to voluntarily contribute: one group acts out of altruism, where users contribute in order to help others and do good for the community; the second group acts out of egoism and selfish reasons, for instance, getting recognition from other people \cite{ginsburg2004framework}. Users of the second group still help the community, but their primary goal is not necessarily the health of the community but gaining reputation and making a name for themselves. In contrast, users of the first group primarily focus on helping the community and see reputation as a positive side effect which also feeds back into their ability to help others. While these groups have different objectives, both groups need recognition of their efforts \cite{iriberri2009life}. There are several methods for recognizing the value a member provides to the community: reputation, awards, trust, identity, etc. \cite{ginsburg2004framework}. Reputation, trust, and identity are often reached gradually over time by continuously working on them, whereas awards are reached at discrete points in time. Awards often take some time and effort to achieve. This is intended: awards should not be easily achievable, as their value comes from the work that is required for them\cite{lawler2000rewarding}. They should also be meaningful in the community they are used in. Most importantly, awards have to be visible to the public, so other members can see them. In this way, awards become a powerful motivator for users.

@@ -189,7 +189,7 @@ StackExchange employes serveral features to engage users with the platform, for

 Reputation plays an important role on StackExchange: it indicates the credibility of a user and marks users as a primary source of high-quality answers \cite{movshovitz2013analysis}. Although the largest share of all questions is posted by low-reputation users, high-reputation users post more questions on average. To earn a high reputation, a user has to invest a lot of effort and time into the community, for instance, by asking good questions or providing useful answers to the questions of others. Reputation is earned when a question or answer is upvoted by other users, or when an answer is accepted as the solution to a question by the question creator. \citeauthor{mamykina2011design} found that the reputation system of StackOverflow encourages users to compete productively \cite{mamykina2011design}. But not every user participates equally, and participation depends on the personality of the user \cite{bazelli2013personality}. \citeauthor{bazelli2013personality} showed that the top-reputation users on StackOverflow are more extroverted compared to users with less reputation. \citeauthor{movshovitz2013analysis} found that by analyzing the StackOverflow community network, experts can be reliably identified by their contributions within the first few months after their registration. Graph analysis also allowed the authors to find spamming users or users with other extreme behavior.

-Although gaining reputation takes time and effort, users can take certain advantages to gain reputation faster by gaming the system \cite{bosu2013building}. \citeauthor{bosu2013building} analyzed the reputation system and found five strategies to increase the reputation in a fast way: Firstly, answering questions with tags that have a small expertise density. This reduces competitiveness against other users and increases the chance of upvotes and answer acceptance. Secondly, questions should be answered promptly. The question asker will most likely accept the first arriving answer that solves the question. This is also supported by \cite{anderson2012discovering}. Thirdly, answering first also gives the user an advantage over other answerers. Fourthly, activity during off-peak hours reduces the competition from other users. Finally, contributing to diverse areas will also help in developing a higher reputation. %TODO help vamipires, noobs, reputation collectors \cite{srba2016stack}
+Although gaining reputation takes time and effort, users can gain reputation faster by gaming the system \cite{bosu2013building, srba2016stack}. \citeauthor{bosu2013building} analyzed the reputation system and found five strategies to increase reputation quickly: Firstly, answering questions with tags that have a small expertise density. This reduces competition from other users and increases the chance of upvotes and answer acceptance. Secondly, questions should be answered promptly. The question asker will most likely accept the first arriving answer that solves the question. This is also supported by \cite{anderson2012discovering}. Thirdly, answering first also gives the user an advantage over other answerers. Fourthly, activity during off-peak hours reduces the competition from other users. Finally, contributing to diverse areas will also help in developing a higher reputation. This behavior may, however, decrease answer quality when users focus too much on reputation collection and disregard the quality of their posts\cite{srba2016stack}.

 % DONE Discovering Value from Community Activity on Focused Question Answering Sites: A Case Study of Stack Overflow \cite{anderson2012discovering} accepted answer strongly depends on when answers arrive, considered not only the question and accepted answer but the set of answers to a question

@@ -267,8 +267,11 @@ Another solution is to find content abusers (noobs, help vampires, etc.) directl

 \section{Analysis}

-%general blabla
-% sentiment intensity (Valence based), lexical features
+When analyzing a community, one typically finds two types of data: text and meta data. Meta data is relatively easy to quantify, while text is much more complicated and intricate to quantify. Text contains a large variety of features and, depending on the research question, researchers have to decide which features they want to include. This thesis investigates the (un-)friendliness in the communication between users and will therefore perform sentiment analysis on the texts. The next section will go into more detail on sentiment analysis. After the data (text and meta data) is quantified, one often wants to know how the data has changed over time. The trend analysis section therefore follows the sentiment analysis section.
+
+%
+%assign values to text
+%analyze trend



@@ -318,17 +321,15 @@ Creating hand crafted tools is often a huge undertaking. They depend on a hand c
 % - TODO list some application examples 
 % ...

-Linguistic Inquiry and Word Count (LIWC) \cite{pennebaker2001linguistic,pennebakerdevelopment} is one of the more popular tools. Due to its widespread usage, LIWC is well verfied, both internally and externally. Its lexicon consists of about 4500 words where words are categorized into one or more of the 76 defined categories. Approximatly 400 words have a positive and 500 words have a negative emotion. %TODO ref for 400 500, list example see todo
-However, the lexicon is proprietary, so .... %TODO ref or remove
-LIWC also has some drawbacks, for instance, it does not capture acronyms, emoticons, or slang words. Furthermore, LIWC's lexicon uses a polarity-based approach, meaning that it cannot distinguish between the scentences ''This pizza is good`` and ''This pizza is excellent``. \emph Good and \emph excellent are both in the category of positive emotion but LIWC does not distinguish between single words in the same category.
+Linguistic Inquiry and Word Count (LIWC) \cite{pennebaker2001linguistic,pennebakerdevelopment} is one of the more popular tools. Due to its widespread usage, LIWC is well verified, both internally and externally. Its lexicon consists of about 6,400 words, where each word is categorized into one or more of the 76 defined categories \cite{pennebaker2015development}. 620 words are associated with a positive emotion and 744 words with a negative emotion. Examples of positive words are: love, nice, sweet; examples of negative words are: hurt, ugly, nasty. LIWC also has some drawbacks, for instance, it does not capture acronyms, emoticons, or slang words. Furthermore, LIWC's lexicon uses a polarity-based approach, meaning that it cannot distinguish between the sentences ``This pizza is good'' and ``This pizza is excellent''\cite{hutto2014vader}. \emph{Good} and \emph{excellent} are both in the category of positive emotion, but LIWC does not distinguish between single words in the same category.

 %General Inquirer (GI) \cite{stone1966general} 1966 TODO ref wrong?
 % - 11k words, 1900 pos, 2300 neg, all approx (vader)
 % - very old (1966), continuously refined, still in use (vader)
 % - misses lexical feature detection (acronyms, ...) and sentiment intensity (vader)

-General Inquirer (GI)\cite{stone1966general} is one of the oldest sentiment tools still in use. It was originally designed in 1966 and has been continuously refined and now consists of about 11000 words where 1900 positively rated words and 2300 negatively rated words. %TODO how does it work
-Like LIWC, GI uses a polarity-based lexicon and therefore is not able to capture sentiment intensity. Also, GI does not recognize lexical features, such as, acronyms, initalisms, etc.. %TODO ref
+General Inquirer (GI)\cite{stone1966general} is one of the oldest sentiment tools still in use. It was originally designed in 1966, has been continuously refined, and now consists of about 11000 words, of which about 1900 are rated positively and about 2300 are rated negatively.
+Like LIWC, GI uses a polarity-based lexicon and therefore is not able to capture sentiment intensity\cite{hutto2014vader}. Also, GI does not recognize lexical features such as acronyms, initialisms, etc.


 %Hu-Liu04 \cite{hu2004mining,liu2005opinion}, 2004
@@ -339,9 +340,7 @@ Like LIWC, GI uses a polarity-based lexicon and therefore is not able to capture
 % - bootstrapped from wordnet (wellknown english lexical database) (vader, hu2004mining)

 %TODO refs
-Hu-Liu04 \cite{hu2004mining,liu2005opinion} is a opinion mining tool. It searches for features in multiple pieces of text, for instance, product reviews, and 
-rates the opinion of the feature by using a binary classification\cite{hu2004mining}. Crutially Hu-Liu04 does not summarize the texts but summarizes ratings of the opinions about features mentioned in the texts. Hu-Liu04 was bootstrapped from WordNet\cite{TODO} and then extended further. It now uses a lexicon consisting of about 6800 words where 2000 words have a positive sentiment and 4800 word have a negative sentiment attached. %TODO ref
-This tools is, by design, better suited for social media texts, although it also misses emiticons, acronyms and initialisms.
+Hu-Liu04 \cite{hu2004mining,liu2005opinion} is an opinion mining tool. It searches for features in multiple pieces of text, for instance, product reviews, and rates the opinion about each feature by using a binary classification\cite{hu2004mining}. Crucially, Hu-Liu04 does not summarize the texts but summarizes ratings of the opinions about features mentioned in the texts. Hu-Liu04 was bootstrapped from WordNet\cite{hu2004mining} and then extended further. It now uses a lexicon consisting of about 6800 words, where 2000 words have a positive sentiment and 4800 words have a negative sentiment attached\cite{hutto2014vader}. This tool is, by design, better suited for social media texts, although it also misses emoticons, acronyms and initialisms.

 %SenticNet \cite{cambria2010senticnet} 2010
 % - concept-level opinion and sentiment analysis tool (vader)
@@ -351,7 +350,7 @@ This tools is, by design, better suited for social media texts, although it also
 % - lexicon: 14250 common-sense concepts, with polarity scores [-1,1] continuous, and many other values (vader)
 % - TODO list some concepts (vader) or maybe not

-SenticNet \cite{cambria2010senticnet} is also an opinion mining tool but it focuses on concept-level opinions. SenticNet is based on a paradigm called \emph{Sentic Mining} which uses a combination of concepts from artificial integelligence and the Semantic Web. More specifically, it uses graph mining and dimentionality reduction. SenticNets lexicon consists of about 14250 common-sense concepts which a have rating on many scales of which one is a polarity score with a continuous range from -1 to 1. This continuous range of polarity scores enables SenticNet to be sentiment-intensity aware.
+SenticNet \cite{cambria2010senticnet} is also an opinion mining tool but it focuses on concept-level opinions. SenticNet is based on a paradigm called \emph{Sentic Mining} which uses a combination of concepts from artificial intelligence and the Semantic Web. More specifically, it uses graph mining and dimensionality reduction. SenticNet's lexicon consists of about 14250 common-sense concepts which have ratings on many scales, one of which is a polarity score with a continuous range from -1 to 1\cite{hutto2014vader}. This continuous range of polarity scores enables SenticNet to be sentiment-intensity aware.


 %ANEW (Affective Norms for English Words) \cite{bradley1999affective} 1999
@@ -360,12 +359,14 @@ SenticNet \cite{cambria2010senticnet} is also an opinion mining tool but it focu
 % - words get value 1-9 (neg-pos, continuous), 5 neutral (TODO maybe list word examples with associated value) (vader, bradley1999affective)
 % - therefore captures sentiement intensity (vader, bradley1999affective)
 % - misses lexical features (e.g. acronyms, ...) (vader)
-Affective Norms for English Words (ANEW) \cite{bradley1999affective} is sentiment analysis tool and was introducted to standardize research and offer a way to compare research. Its lexicon is fairly small and consists of only 1034 words which are ranked pleasure, arousal, and dominance. However, ANEW uses a continuous scale from 1 to 9 where 1 represents the negative end, 9 represents the positive end, and 5 is considered neutral. With this design, ANEW is able to capture sentiment intensity. However, ANEW still misses lexical features, for instance, acronyms.
+Affective Norms for English Words (ANEW) \cite{bradley1999affective} is a sentiment analysis tool and was introduced to standardize research and offer a way to compare research. Its lexicon is fairly small and consists of only 1034 words which are rated for pleasure, arousal, and dominance. However, ANEW uses a continuous scale from 1 to 9 where 1 represents the negative end, 9 represents the positive end, and 5 is considered neutral. With this design, ANEW is able to capture sentiment intensity. However, ANEW still misses lexical features, for instance, acronyms\cite{hutto2014vader}.

 %wordnet \cite{miller1998wordnet} 1998, TODO maybe exlcude or just mention briefly in sentiwordnet
 % - well-known English lexical database (vader)
 % - group synonyms (synsets) together (vader)
-% - TODO
+% - 
+
+WordNet analyzes text with a dictionary which contains lexical concepts \cite{miller1995wordnet,miller1998wordnet}. Each lexical concept groups multiple synonymous words into a so-called synset. These synsets are then linked by semantic relations. With this lexicon, text can be queried in multiple different ways.


 %sentiwordnet \cite{baccianella2010sentiwordnet}
@@ -376,8 +377,7 @@ Affective Norms for English Words (ANEW) \cite{bradley1999affective} is sentimen
 % - lexicon very noisy, most synset not pos or neg but mix (vader)
 % - misses lexical features (vader)

-SentiWordNet \cite{baccianella2010sentiwordnet} is an extension of WordNet and adds ... %TODO whats the difference
-Its lexicon consists of about 147000 synsets, each having 3 value (positive, neutral, negative) attached to them. The each value has a continuous range from 0 to 1 and the sum of these 3 values is set to be 1. The values of each synset are calculated by a mix of semi supervised algorithms, mostly propergation and classifiers. This distinguishes SentiWordNet from previously explained sentiment tools, where the lexica are exclusively created by humans (except for simple mathemtical operations, for instance, averaging of values). Therefore, SentiWordNets lexicon is not considered to be a human-curated gold standard. Furthermore, the lexicon is very noisy and most of the synsets neigher positive or negative but a mix of both\cite{hutto2014vader}. Moreover, SentiWordNet misses lexical features, for instance, acronyms, initalisms and emoticons.
+SentiWordNet \cite{baccianella2010sentiwordnet} is an extension of WordNet and adds sentiment scores to the synsets. Its lexicon consists of about 147000 synsets, each having 3 values (positive, neutral, negative) attached to them. Each value has a continuous range from 0 to 1 and the sum of these 3 values is set to be 1. The values of each synset are calculated by a mix of semi-supervised algorithms, mostly propagation and classifiers. This distinguishes SentiWordNet from the previously explained sentiment tools, where the lexica are exclusively created by humans (except for simple mathematical operations, for instance, averaging of values). Therefore, SentiWordNet's lexicon is not considered to be a human-curated gold standard. Furthermore, the lexicon is very noisy and most of the synsets are neither positive nor negative but a mix of both\cite{hutto2014vader}. Moreover, SentiWordNet misses lexical features, for instance, acronyms, initialisms and emoticons.

 %Word-Sense Disambiguation (WSD) \cite{akkaya2009subjectivity}, 2009
 % - TODO
@@ -386,7 +386,7 @@ Its lexicon consists of about 147000 synsets, each having 3 value (positive, neu
 % - derive meaning from context -> disambiguation (vader, akkaya2009subjectivity)
 % - distinguish subjective and objective word usage, sentences can only contain negative words used in object ways -> sentence not negative, TODO example sentence (akkaya2009subjectivity)

-Word-Sense Disambiguation (WSD)\cite{akkaya2009subjectivity} is not a sentiment analysis tool per se but it can be used to enhance others. In languages certain words have different meanings depending on the context they are used in. When sentiment tools, which do not use WSD, analyze a piece of text, some words which have different meanings depending on the context may skew the resulting sentiment. Some words can even change from positive to negative or vice versa depending on the context. WSD tries to distinguish between subjective and objective word usage. For example: ... %TODO insert example
+Word-Sense Disambiguation (WSD)\cite{akkaya2009subjectivity} is not a sentiment analysis tool per se but it can be used to enhance others. In natural languages, certain words have different meanings depending on the context they are used in. When sentiment tools which do not use WSD analyze a piece of text, such words may skew the resulting sentiment. Some words can even change from positive to negative or vice versa depending on the context. WSD tries to distinguish between subjective and objective word usage. For example: \emph{The party was great.} and \emph{The party lost many votes.} Although \emph{party} is written exactly the same, it has two completely different meanings. Depending on the context, ambiguous words can have different sentiment.


 %%%%% automated (machine learning)