"Using Symbolic Knowledge in the UMLS to Disambiguate Words in Small Da" by Gondy Leroy and Thomas C. Rindflesch
 

Document Type

Conference Proceeding

Department

Information Systems and Technology (CGU)

Publication Date

2004

Disciplines

Databases and Information Systems | Management Information Systems

Abstract

Current approaches to word sense disambiguation use and combine various machine-learning techniques. Most refer to characteristics of the ambiguous word and surrounding words and are based on hundreds of examples. Unfortunately, developing large training sets is time-consuming. We investigate the use of symbolic knowledge to augment machine-learning techniques for small datasets. UMLS semantic types assigned to concepts found in the sentence and relationships between these semantic types form the knowledge base. A naïve Bayes classifier was trained for 15 words with 100 examples for each. The most frequent sense of a word served as the baseline. The effect of increasingly accurate symbolic knowledge was evaluated in eight experimental conditions. Performance was measured by accuracy based on 10-fold cross-validation. The best condition used only the semantic types of the words in the sentence. Accuracy was then on average 10% higher than the baseline; however, it varied from 8% deterioration to 29% improvement. In a follow-up evaluation, we noted a trend that the best disambiguation was found for words that were the least troublesome to the human evaluators.
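The evaluation setup described in the abstract (a naïve Bayes classifier over semantic-type features, a most-frequent-sense baseline, and accuracy under 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the add-one smoothing, the ambiguous word "cold", its two sense labels, and the toy examples are all assumptions made for illustration; only the semantic type names are taken from the UMLS Semantic Network.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Count senses and features. `examples` is a list of (features, sense)
    pairs, where `features` is a list of semantic-type names (hypothetical
    feature encoding, not the paper's)."""
    sense_counts = Counter(sense for _, sense in examples)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in examples:
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab


def predict_nb(model, feats):
    """Return the sense with the highest log posterior (add-one smoothing,
    an assumption; the abstract does not state the smoothing used)."""
    sense_counts, feat_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        # One extra slot in the denominator covers features unseen in training.
        denom = sum(feat_counts[sense].values()) + len(vocab) + 1
        score = math.log(sense_counts[sense] / total)
        for f in feats:
            score += math.log((feat_counts[sense][f] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


def most_frequent_sense(examples):
    """Baseline: always predict the most frequent sense in the training data."""
    return Counter(sense for _, sense in examples).most_common(1)[0][0]


def cross_validate(examples, k=10):
    """Return (naive Bayes accuracy, baseline accuracy) under k-fold CV."""
    folds = [examples[i::k] for i in range(k)]
    nb_hits = base_hits = 0
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_nb(train)
        baseline = most_frequent_sense(train)
        for feats, sense in test:
            nb_hits += predict_nb(model, feats) == sense
            base_hits += baseline == sense
    return nb_hits / len(examples), base_hits / len(examples)


# Toy data (hypothetical): 100 examples of the ambiguous word "cold".
disease = (["Sign or Symptom", "Disease or Syndrome"], "disease")
temperature = (["Natural Phenomenon or Process", "Quantitative Concept"], "temperature")
data = [disease] * 60 + [temperature] * 40

nb_acc, base_acc = cross_validate(data, k=10)
print(f"naive Bayes: {nb_acc:.2f}, baseline: {base_acc:.2f}")
```

The toy senses are perfectly separable by their semantic types, so the gap between the two accuracies here is far larger than the average 10% improvement reported in the abstract. The sketch only shows how the baseline and the classifier are scored on the same folds.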

Rights Information

© 2004 Gondy Leroy and Thomas C. Rindflesch

