diff --git a/text/2_relwork.tex b/text/2_relwork.tex
index 6717c4f..32269e4 100644
--- a/text/2_relwork.tex
+++ b/text/2_relwork.tex
@@ -205,7 +205,7 @@ Quality also depends on the type of platform. \cite{lin2017better} showed that e
 % A comprehensive survey and classification of approaches for community question answering \cite{srba2016comprehensive}, meta study on papers published between 2005 and 2014
 
-\subsection{Analysis}
+\section{Analysis}
 %general blabla
 
 
 % sentiment intensity (Valence based), lexical features
@@ -215,21 +215,27 @@ Quality also depends on the type of platform. \cite{lin2017better} showed that e
 % sentiment analysis: there are 10-15 methods,
 % all sentiment methods + vader
 
-\subsubsection{Sentiment analysis}
+\subsection{Sentiment analysis}
 %challenges (vader)
 % - coverage (e.g. of lexical features, important in microblog texts)
 % - sentiment intensity (some of the following tools ignore intensity completely (just -1 or 1))
 % - creating a human-validated gold standard lexicon is very time consuming/labor intensive, with sentiment valence scores, feature detection and context awareness,
-%%%%% handcrafted
+
+% polarity-based -> binary
+% valence-based -> continuous
+
+%%%%% handcrafted - TODO order by sophistication, sentiwordnet last
 
 %liwc (Linguistic Inquiry and Word Count) \cite{pennebaker2001linguistic,pennebakerdevelopment}, 2001
-% - acronyms, initialisms, emoticons, or slang, which are known to be important for sentiment analysis of social text (vader)
+% - well validated
+% - ignores acronyms, initialisms, emoticons, or slang, which are known to be important for sentiment analysis of social text (vader)
 % - cannot recognise sentiment intensity (all words have an equal weight) (vader)
 % - ca 4500 words (up to date?), ca 400 pos words, ca 500 neg words, lexicon proprietary (vader)
+% - TODO list some application examples
 % ...
 
 %General Inquirer (GI) \cite{stone1966general} 1966
-% - 11k words, 1900 pos, 2300 neg, all approx
+% - 11k words, 1900 pos, 2300 neg, all approx (vader)
 % - very old (1966), continuously refined, still in use (vader)
 % - misses lexical feature detection (acronyms, ...) and sentiment intensity (vader)
 %Hu-Liu04 \cite{hu2004mining,liu2005opinion}, 2004
@@ -238,18 +244,35 @@ Quality also depends on the type of platform. \cite{lin2017better} showed that e
 % - bootstrapped from wordnet (well-known English lexical database) (vader)
 %Word-Sense Disambiguation (WSD) \cite{akkaya2009subjectivity}, 2009
 % - TODO
+% - not a sentiment analysis tool per se, but can be combined with a sentiment analysis tool to distinguish multiple meanings of a word (vader)
+% - a word can have multiple meanings, pos/neu/neg depending on context (vader)
+% - derive meaning from context -> disambiguation (vader)
+%ANEW (Affective Norms for English Words) \cite{bradley1999affective} 1999
+% - lexicon: 1034 words, ranked by pleasure, arousal, and dominance (vader)
+% - words get a value of 1-9 (neg-pos, continuous), 5 is neutral (TODO maybe list word examples with associated value) (vader)
+% - therefore captures sentiment intensity (vader)
+% - misses lexical features (e.g. acronyms, ...) (vader)
+%SenticNet \cite{cambria2010senticnet} 2010
+% - concept-level opinion and sentiment analysis tool (vader)
+% - sentic mining: combination of AI and Semantic Web (vader)
+% - graph mining and dimensionality reduction (vader)
+% - lexicon: 14250 common-sense concepts, with polarity scores [-1,1] continuous, and many other values (vader)
+% - TODO list some concepts (vader)
 %wordnet \cite{miller1998wordnet} 1998
+% - well-known English lexical database (vader)
+% - groups synonyms together into synsets (vader)
 % - TODO
 %sentiwordnet \cite{baccianella2010sentiwordnet}
-% - TODO
-%ANEW (Affective Norms for English Words) \cite{bradley1999affective}
-% - TODO
-%SenticNet \cite{cambria2010senticnet}
-% - TODO
+% - extension of wordnet (vader)
+% - 147k synsets, each with 3 values for pos, neu, neg; sum per synset (TODO pos neu neg?) = 1, range 0-1 continuous (vader)
+% - synset values calculated by a complex mix of semi-supervised algorithms (propagation methods and classifiers) -> not a gold-standard lexicon (vader)
+% - lexicon very noisy, most synsets are not pos or neg but a mix (vader)
+% - misses lexical features (vader)
 
 %%%%% automated (machine learning)
 %often require large training sets, compared to creating a lexicon (vader)
 %training data must represent as many features as possible, otherwise the feature is not learned, which is often not the case (vader)
+%training data should be unbiased, otherwise the wrong patterns are learned (NOT VADER)
 %very cpu and memory intensive, slow, compared to lexicon-based approaches (vader)
 %derived features not comprehensible to a human (black-box) (vader)
 %generalization problem (vader)
@@ -283,7 +306,7 @@ Quality also depends on the type of platform. \cite{lin2017better} showed that e
 % its
 % original ITS paper, how was this done previously (before that)
-\subsubsection{Trend analysis}
+\subsection{Trend analysis}