Monday, August 4, 2008

Short Note on Linguistic Features

Example of Linguistic Features

Lexical Features
  • word
  • pos
  • syllables (estimated based on distribution patterns of vowels and consonants)
  • position (begin/end/middle)
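The syllable estimate above can be sketched with a simple vowel-group heuristic (the note does not specify the exact estimator, so this counting rule is an assumption for illustration):

```python
def estimate_syllables(word):
    """Estimate syllable count as the number of contiguous vowel groups,
    a rough heuristic based on the vowel/consonant pattern of the word."""
    vowels = set("aeiouy")
    count = 0
    prev_is_vowel = False
    for ch in word.lower():
        is_vowel = ch in vowels
        if is_vowel and not prev_is_vowel:
            count += 1  # a new vowel group starts here
        prev_is_vowel = is_vowel
    return max(count, 1)  # every word has at least one syllable
```

For example, "linguistic" has the vowel groups "i", "ui", "i" and is estimated at three syllables.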
Syntactic Features
  • depends on the tool, but mostly features derived from the parse tree?
Example Tool for Generating Syntactic Features
  • Link Grammar: a context-free lexicalized grammar. Rules are link requirements (sets of disjuncts describing the possible usages of a word). A word sequence belongs to the grammar if it has a planar linkage: a connected graph with at most one link between each word pair and no crossing links
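The planarity condition can be checked directly on a candidate linkage. A minimal sketch, representing links as word-index pairs (the function name and representation are my own, not Link Grammar's API):

```python
from collections import defaultdict

def is_planar_linkage(links, n_words):
    """Check the Link Grammar linkage conditions: at most one link per
    word pair, no two links cross when drawn above the sentence, and
    the linkage connects all words."""
    # at most one link between each word pair
    if len(links) != len(set(map(frozenset, links))):
        return False
    norm = [tuple(sorted(l)) for l in links]
    # links (a,b) and (c,d) cross iff a < c < b < d
    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            (a, b), (c, d) = sorted((norm[i], norm[j]))
            if a < c < b < d:
                return False
    # connectivity: every word reachable from word 0
    adj = defaultdict(set)
    for a, b in norm:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n_words
```

For instance, links (0,2) and (1,3) cross, so a linkage containing both is rejected.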

Some open issues in NLP
  • Still a lack of semantic parsers for the general domain [Shi05EMNLP]
Note: Don't forget that "A journey of a thousand miles begins with a single step" (Laozi, Tao Te Ching)

Sunday, August 3, 2008

Some Trends of Asian Language Technology

  • Segmentation and tokenization: segment text into individual word tokens
  • Lemmatization: map an inflected verb or adjective to its dictionary base form
  • Noun Decompounding: separate compound nouns
  • POS tagging
  • Sentence boundary detection
  • Base NP analysis: identify the set of words, including a noun, that together form a single nominal expression
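A few of these preprocessing steps can be illustrated with naive rule-based stand-ins (real systems use dictionaries and trained models; these toy rules are assumptions for illustration only):

```python
import re

def sentence_split(text):
    # naive sentence boundary detection: split after . ! ? followed by space
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # naive tokenization: words and punctuation marks as separate tokens
    return re.findall(r"\w+|[^\w\s]", sentence)

def lemmatize(token):
    # toy lemmatizer: strip a few common English inflectional suffixes
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token
```

For example, `lemmatize("barked")` strips the "ed" suffix to give "bark"; a dictionary-based lemmatizer would also handle irregular forms, which these rules cannot.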

Sunday, July 6, 2008

Estimating Conditional Probabilities for SVMs

Promising Approaches from Paper
Stefan Rüping, A Simple Method For Estimating Conditional Probabilities For SVMs.
J. Milgram, M. Cheriet, R. Sabourin, Estimating Accurate Multi-class Probabilities with Support Vector Machines.
Ben Van Calster, Jan Luts, Johan A. K. Suykens, George Condous, Tom Bourne, Dirk Timmerman and Sabine Van Huffel, Comparing Methods for Multi-class Probabilities in Medical Decision Making Using LS-SVMs and Kernel Logistic Regression.

Some of them, ordered by effectiveness:

KLR (Kernel Logistic Regression): the optimization problem is similar to the SVM's, except that the logistic loss is used instead of the hinge (L1) loss
P(y|x) = \frac{1}{1 + e^{-y(w \cdot x - b)}}
Drawback: typically all \alpha_i are non-zero, so every training example plays a role in the estimate, whereas an SVM needs only its support vectors. Hence the higher computational cost.
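The KLR posterior above is just a sigmoid of the signed decision value, which can be written directly (function name is my own; f(x) = w.x - b is assumed to come from the trained model):

```python
import math

def klr_probability(f_x, y):
    """KLR posterior P(y|x) = 1 / (1 + exp(-y * f(x))) for a label
    y in {-1, +1} and decision function value f(x) = w.x - b."""
    return 1.0 / (1.0 + math.exp(-y * f_x))
```

Note that the two class probabilities sum to one by construction: P(+1|x) + P(-1|x) = 1.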
SVM-Platt: \sigma_{a,b}(z) = \frac{1}{1+e^{-az+b}}
with a>0 to obtain monotonically increasing function
a, b are found by minimizing the cross-entropy (CRE) error over a subset of the data that was not used for training:
CRE = -\sum_i \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right], \quad p_i = \sigma_{a,b}(z_i)
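Fitting a and b could be sketched as plain gradient descent on the cross-entropy (Platt's original algorithm uses a regularized Newton-style method with smoothed targets, so this is only an illustrative stand-in):

```python
import math

def fit_platt(z, y, lr=0.1, iters=2000):
    """Fit sigma_{a,b}(z) = 1 / (1 + exp(-a*z + b)) by gradient descent
    on the cross-entropy over held-out decision values z with labels
    y in {0, 1}. A sketch, not Platt's original fitting procedure."""
    a, b = 1.0, 0.0
    for _ in range(iters):
        ga = gb = 0.0
        for zi, yi in zip(z, y):
            p = 1.0 / (1.0 + math.exp(-a * zi + b))
            ga += (p - yi) * zi  # dCRE/da
            gb += (yi - p)       # dCRE/db
        a -= lr * ga
        b -= lr * gb
    return a, b
```

On decision values that correlate with the labels, the fitted a comes out positive, giving the monotonically increasing sigmoid required above.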
SVM-Beta: \sigma(z) = \beta^{-1}_{\alpha_2,\beta_2}\left(\beta_{\alpha_1,\beta_1}(z)\right)
where \beta_{\alpha,\beta} denotes the cumulative distribution function of a beta distribution [garczarek02classification]
SVM-Softmax: \sigma_{softmax}(z)_j = \frac{e^{\gamma z_j}}{\sum_{l=1}^{k} e^{\gamma z_l}}
\gamma is derived by minimizing the negative log-likelihood of the training data, which takes the form:
-\sum_{i=1}^{n} \sum_{j=1}^{c} t_i^j \log(z_i^j)
where t_i^j = 1 if example i belongs to class j and 0 otherwise
The softmax maps the decision function values monotonically to the interval [0,1]. It is introduced with the objective of exploiting the outputs of all k SVMs to estimate the overall class probabilities, and it can be regarded as a generalization of the sigmoid to the multi-class case.
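A minimal sketch of the softmax mapping over the k decision values (the function name is my own, and \gamma is passed in rather than fitted by the negative log-likelihood minimization described above):

```python
import math

def softmax_probs(z, gamma=1.0):
    """Map k SVM decision values z_1..z_k to class probabilities
    P(j) = exp(gamma * z_j) / sum_l exp(gamma * z_l)."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(gamma * (zj - m)) for zj in z]
    total = sum(exps)
    return [e / total for e in exps]
```

The probabilities sum to one, and gamma controls the sharpness: gamma = 0 gives a uniform distribution, while larger gamma concentrates mass on the class with the highest decision value.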


Bibtex:
@inproceedings{ruping04simple,
author = {Stefan R{\"u}ping},
title = {A Simple Method For Estimating Conditional Probabilities
For SVMs},
booktitle = {LWA 2004: Lernen - Wissensentdeckung - Adaptivit{\"a}t},
year = {2004},
pages = {206-210},
crossref = {DBLP:conf/lwa/2004},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
