I did my PhD in the Machine Learning & Applications (A³) research team of the Computer Science Laboratory of Paris 13 University (LIPN). My PhD advisor was Pr. Younès Bennani.
My thesis was on kernel methods for structured data, in particular text data. I am currently investigating kernels, Support Vector Machines, and semi-supervised methods. I'm also interested in sequence mining.
Thesis Topic: Machine Learning with Semantic Kernels for Textual Data
PhD Dissertation Abstract
Since the early eighties, statistical methods and, more specifically, machine learning methods for textual data processing have attracted rapidly growing interest. This is mainly because the number of documents to process is growing exponentially. As a result, expert-based methods have become too costly, and the research focus has shifted toward machine learning-based methods.
Download my PhD thesis (in French)
In my thesis, I focus on two main issues. The first is the processing of semi-structured textual data with kernel-based methods. In this context, I present a semantic kernel for documents structured by sections in XML format. This kernel captures semantic information by using an external source of knowledge, e.g., a thesaurus. The kernel was evaluated on a medical document corpus with the UMLS thesaurus. It ranked in the top ten, according to the F1-score, among 44 algorithms at the 2007 CMC Medical NLP International Challenge.
The second issue is the use of latent concepts extracted by statistical methods such as Latent Semantic Analysis (LSA). In a first part, I present kernels based on linguistic concepts from external sources and on latent concepts from LSA. I show that a kernel integrating both kinds of concepts improves text categorization performance. Then, in a second part, I present a kernel that uses local LSAs to extract latent concepts. Local latent concepts are used to obtain a finer representation of the documents.
Keywords: Machine Learning, Kernels, Support Vector Machine, Text categorization, Semantic Similarity Measures.
Advisor: Pr. Younès Bennani
Date of defense: December 12, 2007 (Passed with the highest distinction)
Doctoral Committee:
Research Topics
I also worked in the following fields: