.:: Home Page ::.

I did my PhD in the Machine Learning & Applications (A³) research team of the Computer Science Laboratory of Paris 13 University (LIPN). My PhD advisor was Prof. Younès Bennani.

My thesis was on kernel methods for structured data, in particular text data. I am currently investigating kernels, support vector machines, and semi-supervised methods. I am also interested in sequence mining.

Thesis Topic: Machine Learning with Semantic Kernels for Textual Data

PhD Dissertation Abstract

   Since the early 1980s, statistical methods, and machine learning in particular, have attracted considerable and growing interest for textual data processing. This is mainly because the number of documents to process is growing exponentially. As a result, expert-based methods have become too costly, and research attention has shifted toward machine learning-based methods.
   In my thesis, I focus on two main issues. The first is the processing of semi-structured textual data with kernel-based methods. In this context, I present a semantic kernel for documents that are structured into sections in XML format. This kernel captures semantic information by using an external source of knowledge such as a thesaurus. The kernel was evaluated on a medical document corpus with the UMLS thesaurus. It ranked in the top ten, by F1-score, among 44 algorithms at the 2007 CMC Medical NLP International Challenge.
   The second issue is the use of latent concepts extracted by statistical methods such as Latent Semantic Analysis (LSA). In a first part, I present kernels based on linguistic concepts from external sources and on latent concepts from LSA, and I show that a kernel integrating both kinds of concepts improves text categorization performance. In a second part, I present a kernel that uses local LSAs to extract latent concepts. These local latent concepts give a finer-grained representation of the documents.
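
As a rough illustration of the concept-based kernels mentioned in the abstract, the sketch below maps each document to a bag of concepts through a thesaurus lookup and takes the inner product in concept space. It is a minimal sketch under stated assumptions, not the kernel from the thesis: the names TOY_THESAURUS, concept_vector, concept_kernel and normalized_kernel are hypothetical, the five-entry thesaurus merely stands in for a resource such as UMLS, and the XML section structure handled by the thesis kernel is ignored here.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (term -> set of concepts). This is an assumption
# for illustration only; a real system would query a resource such as UMLS.
TOY_THESAURUS = {
    "cough": {"respiratory_symptom"},
    "wheezing": {"respiratory_symptom"},
    "fever": {"systemic_symptom"},
    "pyrexia": {"systemic_symptom"},
    "pneumonia": {"respiratory_disease"},
}


def concept_vector(tokens, thesaurus=TOY_THESAURUS):
    """Map a list of tokens to concept counts (an explicit feature map)."""
    counts = Counter()
    for token in tokens:
        # Terms missing from the thesaurus contribute nothing.
        for concept in thesaurus.get(token.lower(), ()):
            counts[concept] += 1
    return counts


def concept_kernel(doc_a, doc_b, thesaurus=TOY_THESAURUS):
    """Inner product of two documents in concept space.

    Because it is a dot product of explicit feature vectors, the resulting
    Gram matrix is positive semi-definite.
    """
    va = concept_vector(doc_a, thesaurus)
    vb = concept_vector(doc_b, thesaurus)
    return sum(va[c] * vb[c] for c in va.keys() & vb.keys())


def normalized_kernel(doc_a, doc_b, thesaurus=TOY_THESAURUS):
    """Cosine-normalized kernel; returns 0.0 if a document has no known concept."""
    k_aa = concept_kernel(doc_a, doc_a, thesaurus)
    k_bb = concept_kernel(doc_b, doc_b, thesaurus)
    if k_aa == 0 or k_bb == 0:
        return 0.0
    return concept_kernel(doc_a, doc_b, thesaurus) / math.sqrt(k_aa * k_bb)


# Two documents that share no surface word but express the same concepts.
doc_1 = ["fever", "cough"]
doc_2 = ["pyrexia", "wheezing"]
print(concept_kernel(doc_1, doc_2), normalized_kernel(doc_1, doc_2))
```

A plain bag-of-words kernel would score these two toy documents as unrelated because they have no term in common, whereas the concept-space kernel treats them as similar. A function of this kind could, for instance, be used to build a precomputed Gram matrix for a support vector machine.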

Keywords: Machine Learning, Kernels, Support Vector Machine, Text categorization, Semantic Similarity Measures.
Advisor: Prof. Younès Bennani
Date of defense: December 12, 2007 (Passed with the highest distinction)

Doctoral Committee:
Download my PhD thesis (in French)

Research Topics

  • Kernels and Support Vector Machines
  • Machine Learning, Supervised and Semi-Supervised algorithms
  • Text classification and text mining

I also worked in the following fields:
  • Sequence and itemset mining
  • Model-based diagnostics