wip
@@ -205,7 +205,7 @@ Quality also depends on the type of platform. \cite{lin2017better} showed that e
|
|||||||
% A comprehensive survey and classification of approaches for community question answering \cite{srba2016comprehensive}, meta study on papers published between 2005 and 2014
|
% A comprehensive survey and classification of approaches for community question answering \cite{srba2016comprehensive}, meta study on papers published between 2005 and 2014
|
||||||
|
|
||||||
|
|
||||||
\subsection{Analysis}
|
\section{Analysis}
|
||||||
|
|
||||||
%general blabla
|
%general blabla
|
||||||
% sentiment intensity (Valence based), lexical features
|
% sentiment intensity (Valence based), lexical features
|
||||||
@@ -215,21 +215,27 @@ Quality also depends on the type of platform. \cite{lin2017better} showed that e
|
|||||||
|
|
||||||
% sentiment analysis: there are 10-15 methods,
|
% sentiment analysis: there are 10-15 methods,
|
||||||
% all sentiment methods + vader
|
% all sentiment methods + vader
|
||||||
\subsubsection{Sentiment analysis}
|
\subsection{Sentiment analysis}
|
||||||
|
|
||||||
%challenges (vader)
|
%challenges (vader)
|
||||||
% - coverage (e.g. of lexical features, important in microblog texts)
|
% - coverage (e.g. of lexical features, important in microblog texts)
|
||||||
% - sentiment intensity (some of the following tools ignore intensity completely (just -1 or 1))
|
% - sentiment intensity (some of the following tools ignore intensity completely (just -1 or 1))
|
||||||
% - creating a human-validated gold-standard lexicon (with sentiment valence scores, feature detection and context awareness) is very time-consuming and labor-intensive
|
% - creating a human-validated gold-standard lexicon (with sentiment valence scores, feature detection and context awareness) is very time-consuming and labor-intensive
|
||||||
|
|
||||||
%%%%% handcrafted
|
|
||||||
|
% polarity-based -> binary
|
||||||
|
% valence-based -> continuous
|
||||||
|
|
||||||
|
%%%%% handcrafted - TODO order by sophistication, sentiwordnet last
|
||||||
%liwc (Linguistic Inquiry and Word Count) \cite{pennebaker2001linguistic,pennebakerdevelopment}, 2001
|
%liwc (Linguistic Inquiry and Word Count) \cite{pennebaker2001linguistic,pennebakerdevelopment}, 2001
|
||||||
% - acronyms, initialisms, emoticons, or slang, which are known to be important for sentiment analysis of social text (vader)
|
% - well verified
|
||||||
|
% - ignores acronyms, initialisms, emoticons, or slang, which are known to be important for sentiment analysis of social text (vader)
|
||||||
% - cannot recognise sentiment intensity (all words have equal weight) (vader)
|
% - cannot recognise sentiment intensity (all words have equal weight) (vader)
|
||||||
% - ca 4500 words (up to date?), ca 400 pos words, ca 500 neg words, lexicon proprietary (vader)
|
% - ca 4500 words (up to date?), ca 400 pos words, ca 500 neg words, lexicon proprietary (vader)
|
||||||
|
% - TODO list some application examples
|
||||||
% ...
|
% ...
|
||||||
%General Inquirer (GI) \cite{stone1966general} 1966
|
%General Inquirer (GI) \cite{stone1966general} 1966
|
||||||
% - 11k words, 1900 pos, 2300 neg, all approx
|
% - 11k words, 1900 pos, 2300 neg, all approx (vader)
|
||||||
% - very old (1966), continuously refined, still in use (vader)
|
% - very old (1966), continuously refined, still in use (vader)
|
||||||
% - misses lexical feature detection (acronyms, ...) and sentiment intensity (vader)
|
% - misses lexical feature detection (acronyms, ...) and sentiment intensity (vader)
|
||||||
%Hu-Liu04 \cite{hu2004mining,liu2005opinion}, 2004
|
%Hu-Liu04 \cite{hu2004mining,liu2005opinion}, 2004
|
||||||
@@ -238,18 +244,35 @@ Quality also depends on the type of platform. \cite{lin2017better} showed that e
|
|||||||
% - bootstrapped from wordnet (well-known English lexical database) (vader)
|
% - bootstrapped from wordnet (well-known English lexical database) (vader)
|
||||||
%Word-Sense Disambiguation (WSD) \cite{akkaya2009subjectivity}, 2009
|
%Word-Sense Disambiguation (WSD) \cite{akkaya2009subjectivity}, 2009
|
||||||
% - TODO
|
% - TODO
|
||||||
|
% - not a sentiment analysis tool per se, but can be combined with a sentiment analysis tool to distinguish multiple meanings of a word (vader)
|
||||||
|
% - a word can have multiple meanings, pos neu neg depending on context (vader)
|
||||||
|
% - derive meaning from context -> disambiguation (vader)
|
||||||
|
%ANEW (Affective Norms for English Words) \cite{bradley1999affective} 1999
|
||||||
|
% - lexicon: 1034 words, ranked by pleasure, arousal, and dominance (vader)
|
||||||
|
% - words get value 1-9 (neg-pos, continuous), 5 neutral (TODO maybe list word examples with associated value) (vader)
|
||||||
|
% - therefore captures sentiment intensity (vader)
|
||||||
|
% - misses lexical features (e.g. acronyms, ...) (vader)
|
||||||
|
%SenticNet \cite{cambria2010senticnet} 2010
|
||||||
|
% - concept-level opinion and sentiment analysis tool (vader)
|
||||||
|
% - sentic mining: combination of AI and Semantic Web (vader)
|
||||||
|
% - graphmining and dimensionality reduction (vader)
|
||||||
|
% - lexicon: 14250 common-sense concepts, with polarity scores [-1,1] continuous, and many other values (vader)
|
||||||
|
% - TODO list some concepts (vader)
|
||||||
%wordnet \cite{miller1998wordnet} 1998
|
%wordnet \cite{miller1998wordnet} 1998
|
||||||
|
% - well-known English lexical database (vader)
|
||||||
|
% - group synonyms (synsets) together (vader)
|
||||||
% - TODO
|
% - TODO
|
||||||
%sentiwordnet \cite{baccianella2010sentiwordnet}
|
%sentiwordnet \cite{baccianella2010sentiwordnet}
|
||||||
% - TODO
|
% - extension of wordnet (vader)
|
||||||
%ANEW (Affective Norms for English Words) \cite{bradley1999affective}
|
% - 147k synsets, each with 3 values for pos neu neg, sum per synset (TODO pos neu neg?) = 1, range 0-1 continuous (vader)
|
||||||
% - TODO
|
% - synset values calculated by a complex mix of semi-supervised algorithms (propagation methods and classifiers) -> not a gold standard lexicon (vader)
|
||||||
%SenticNet \cite{cambria2010senticnet}
|
% - lexicon very noisy, most synsets are not pos or neg but a mix (vader)
|
||||||
% - TODO
|
% - misses lexical features (vader)
|
||||||
|
|
||||||
%%%%% automated (machine learning)
|
%%%%% automated (machine learning)
|
||||||
%often require large training sets, compared to creating a lexicon (vader)
|
%often require large training sets, compared to creating a lexicon (vader)
|
||||||
%training data must represent as many features as possible, otherwise feature is not learned, often not the case (vader)
|
%training data must represent as many features as possible, otherwise feature is not learned, often not the case (vader)
|
||||||
|
%training data should be unbiased, or else wrong learning (NOT VADER)
|
||||||
%very cpu and memory intensive, slow, compared to lexicon-based (vader)
|
%very cpu and memory intensive, slow, compared to lexicon-based (vader)
|
||||||
%derived features not comprehensible to a human (black-box) (vader)
|
%derived features not comprehensible to a human (black-box) (vader)
|
||||||
%generalization problem (vader)
|
%generalization problem (vader)
|
||||||
@@ -283,7 +306,7 @@ Quality also depends on the type of platform. \cite{lin2017better} showed that e
|
|||||||
|
|
||||||
% its
|
% its
|
||||||
% original ITS paper, how was this done earlier (before)
|
% original ITS paper, how was this done earlier (before)
|
||||||
\subsubsection{Trend analysis}
|
\subsection{Trend analysis}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|||||||