Monday, August 4, 2008

Short Note on Linguistic Features

Example of Linguistic Features

Lexical Features
  • word
  • pos
  • syllables (estimated based on distribution patterns of vowels and consonants)
  • position (begin/end/middle)
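The syllable estimate above can be sketched with a simple vowel-group heuristic (the note does not specify the exact estimator, so this counting rule is an assumption for illustration):

```python
def estimate_syllables(word):
    """Estimate syllable count as the number of contiguous vowel groups,
    a rough heuristic based on the vowel/consonant pattern of the word."""
    vowels = set("aeiouy")
    count = 0
    prev_is_vowel = False
    for ch in word.lower():
        is_vowel = ch in vowels
        if is_vowel and not prev_is_vowel:
            count += 1  # a new vowel group starts here
        prev_is_vowel = is_vowel
    return max(count, 1)  # every word has at least one syllable
```

For example, "linguistic" has the vowel groups "i", "ui", "i" and is estimated at three syllables.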
Syntactic Features
  • depends on the tool, but mostly features derived from the parse tree?
Example Tool for Generating Syntactic Features
  • Link Grammar: a context-free lexicalized grammar. Rules are link requirements (sets of disjuncts describing the possible usages of a word). A word sequence belongs to the grammar if it has a planar linkage: a connected graph with at most one link between each word pair and no crossing links
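The planarity condition can be checked directly on a candidate linkage. A minimal sketch, representing links as word-index pairs (the function name and representation are my own, not Link Grammar's API):

```python
from collections import defaultdict

def is_planar_linkage(links, n_words):
    """Check the Link Grammar linkage conditions: at most one link per
    word pair, no two links cross when drawn above the sentence, and
    the linkage connects all words."""
    # at most one link between each word pair
    if len(links) != len(set(map(frozenset, links))):
        return False
    norm = [tuple(sorted(l)) for l in links]
    # links (a,b) and (c,d) cross iff a < c < b < d
    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            (a, b), (c, d) = sorted((norm[i], norm[j]))
            if a < c < b < d:
                return False
    # connectivity: every word reachable from word 0
    adj = defaultdict(set)
    for a, b in norm:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n_words
```

For instance, links (0,2) and (1,3) cross, so a linkage containing both is rejected.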

Some open issues in NLP
  • Still a lack of semantic parsers for the general domain [Shi05EMNLP]
Note: Don't forget that "A journey of a thousand miles begins with a single step" (Laozi, Tao Te Ching)

Sunday, August 3, 2008

Some Trends of Asian Language Technology

  • Segmentation and tokenization: segment text into individual word tokens
  • Lemmatization: map an inflected verb or adjective to its dictionary base form
  • Noun Decompounding: separate compound nouns
  • POS tagging
  • Sentence boundary detection
  • Base NP analysis: identify the set of words, including a noun, that together form a single nominal expression
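A few of these preprocessing steps can be illustrated with naive rule-based stand-ins (real systems use dictionaries and trained models; these toy rules are assumptions for illustration only):

```python
import re

def sentence_split(text):
    # naive sentence boundary detection: split after . ! ? followed by space
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # naive tokenization: words and punctuation marks as separate tokens
    return re.findall(r"\w+|[^\w\s]", sentence)

def lemmatize(token):
    # toy lemmatizer: strip a few common English inflectional suffixes
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token
```

For example, `lemmatize("barked")` strips the "ed" suffix to give "bark"; a dictionary-based lemmatizer would also handle irregular forms, which these rules cannot.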

Sunday, July 6, 2008

Estimating Conditional Probabilities for SVMs

Promising Approaches from Paper
Stefan Rüping, A Simple Method For Estimating Conditional Probabilities For SVMs.
J. Milgram, M. Cheriet, R. Sabourin, Estimating Accurate Multi-class Probabilities with Support Vector Machines.
Ben Van Calster, Jan Luts, Johan A. K. Suykens, George Condous, Tom Bourne, Dirk Timmerman and Sabine Van Huffel, Comparing Methods for Multi-class Probabilities in Medical Decision Making Using LS-SVMs and Kernel Logistic Regression.

Some of them, ordered by effectiveness:

KLR (Kernel Logistic Regression): the optimization problem is similar to the SVM's, except that the logistic loss is used instead of the hinge (L1) loss
P(y|x) = \frac{1}{1 + e^{-y(w \cdot x - b)}}
Drawback: typically all \alpha_i are non-zero, so every training example plays a role in the estimate, whereas an SVM needs only its support vectors. Hence the higher computational cost.
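The KLR posterior above is just a sigmoid of the signed decision value, which can be written directly (function name is my own; f(x) = w.x - b is assumed to come from the trained model):

```python
import math

def klr_probability(f_x, y):
    """KLR posterior P(y|x) = 1 / (1 + exp(-y * f(x))) for a label
    y in {-1, +1} and decision function value f(x) = w.x - b."""
    return 1.0 / (1.0 + math.exp(-y * f_x))
```

Note that the two class probabilities sum to one by construction: P(+1|x) + P(-1|x) = 1.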
SVM-Platt: \sigma_{a,b}(z) = \frac{1}{1+e^{-az+b}}
with a>0 to obtain monotonically increasing function
a, b are found by minimizing the cross-entropy (CRE) error over a subset of the data that was not used for training:
CRE = -\sum_i \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right], \quad p_i = \sigma_{a,b}(z_i)
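Fitting a and b could be sketched as plain gradient descent on the cross-entropy (Platt's original algorithm uses a regularized Newton-style method with smoothed targets, so this is only an illustrative stand-in):

```python
import math

def fit_platt(z, y, lr=0.1, iters=2000):
    """Fit sigma_{a,b}(z) = 1 / (1 + exp(-a*z + b)) by gradient descent
    on the cross-entropy over held-out decision values z with labels
    y in {0, 1}. A sketch, not Platt's original fitting procedure."""
    a, b = 1.0, 0.0
    for _ in range(iters):
        ga = gb = 0.0
        for zi, yi in zip(z, y):
            p = 1.0 / (1.0 + math.exp(-a * zi + b))
            ga += (p - yi) * zi  # dCRE/da
            gb += (yi - p)       # dCRE/db
        a -= lr * ga
        b -= lr * gb
    return a, b
```

On decision values that correlate with the labels, the fitted a comes out positive, giving the monotonically increasing sigmoid required above.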
SVM-Beta: \sigma(z) = \beta^{-1}_{\alpha_2,\beta_2}\left(\beta_{\alpha_1,\beta_1}(z)\right)
where \beta_{\alpha,\beta} denotes the cumulative distribution function of a beta distribution [garczarek02classification]
SVM-Softmax: \sigma_{softmax}(z)_j = \frac{e^{\gamma z_j}}{\sum_{l=1}^{k} e^{\gamma z_l}}
\gamma is derived by minimizing the negative log-likelihood of the training data, which takes the form:
-\sum_{i=1}^{n} \sum_{j=1}^{c} t_i^j \log(z_i^j)
where t_i^j = 1 if example i belongs to class j and 0 otherwise
The softmax maps the decision function values monotonically to the interval [0,1]. It is introduced with the objective of exploiting the outputs of all k SVMs to estimate the overall class probabilities, and it can be regarded as a generalization of the sigmoid to the multi-class case.
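A minimal sketch of the softmax mapping over the k decision values (the function name is my own, and \gamma is passed in rather than fitted by the negative log-likelihood minimization described above):

```python
import math

def softmax_probs(z, gamma=1.0):
    """Map k SVM decision values z_1..z_k to class probabilities
    P(j) = exp(gamma * z_j) / sum_l exp(gamma * z_l)."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(gamma * (zj - m)) for zj in z]
    total = sum(exps)
    return [e / total for e in exps]
```

The probabilities sum to one, and gamma controls the sharpness: gamma = 0 gives a uniform distribution, while larger gamma concentrates mass on the class with the highest decision value.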


Bibtex:
@inproceedings{ruping04simple,
author = {Stefan R{\"u}ping},
title = {A Simple Method For Estimating Conditional Probabilities
For SVMs},
booktitle = {LWA 2004: Lernen - Wissensentdeckung - Adaptivit{\"a}t},
year = {2004},
pages = {206-210},
crossref = {DBLP:conf/lwa/2004},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
