CGU Faculty Publications and Research

Using Symbolic Knowledge in the UMLS to Disambiguate Words in Small Datasets with a Naive Bayes Classifier

Gondy Leroy, Claremont Graduate University
Thomas C. Rindflesch, National Library of Medicine

Document Type

Conference Proceeding

Department

Information Systems and Technology (CGU)

Publication Date

2004

Disciplines

Databases and Information Systems | Management Information Systems

Abstract

Current approaches to word sense disambiguation use and combine various machine-learning techniques. Most refer to characteristics of the ambiguous word and surrounding words and are based on hundreds of examples. Unfortunately, developing large training sets is time-consuming. We investigate the use of symbolic knowledge to augment machine-learning techniques for small datasets. UMLS semantic types assigned to concepts found in the sentence and relationships between these semantic types form the knowledge base. A naïve Bayes classifier was trained for 15 words with 100 examples for each. The most frequent sense of a word served as the baseline. The effect of increasingly accurate symbolic knowledge was evaluated in eight experimental conditions. Performance was measured by accuracy based on 10-fold cross-validation. The best condition used only the semantic types of the words in the sentence. Accuracy was then on average 10% higher than the baseline; however, it varied from 8% deterioration to 29% improvement. In a follow-up evaluation, we noted a trend that the best disambiguation was found for words that were the least troublesome to the human evaluators.
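The evaluation setup described in the abstract (a naive Bayes classifier over semantic types, a most-frequent-sense baseline, and k-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. The add-one smoothing, the fold assignment, the sense labels, and the toy examples are all assumptions made for illustration, and the semantic-type assignments in the toy data are invented rather than drawn from the UMLS.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count senses and semantic types; examples are (semantic_types, sense) pairs."""
    sense_counts = Counter(sense for _, sense in examples)
    type_counts = defaultdict(Counter)  # sense -> semantic type -> count
    vocabulary = set()
    for types, sense in examples:
        type_counts[sense].update(types)
        vocabulary.update(types)
    return sense_counts, type_counts, vocabulary


def predict(model, types):
    """Return the sense with the highest log posterior (add-one smoothing, an assumption)."""
    sense_counts, type_counts, vocabulary = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):  # sorted so that ties resolve deterministically
        score = math.log(sense_counts[sense] / total)
        denom = sum(type_counts[sense].values()) + len(vocabulary)
        for t in types:
            if t in vocabulary:  # skip semantic types never seen in training
                score += math.log((type_counts[sense][t] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


def most_frequent_sense(examples):
    """Baseline: always answer with the most common sense in the training data."""
    return Counter(sense for _, sense in examples).most_common(1)[0][0]


def cross_validate(examples, k=10):
    """Return (naive Bayes accuracy, baseline accuracy) over k folds."""
    correct_nb = correct_base = 0
    for fold in range(k):
        test = examples[fold::k]  # every k-th example, starting at index `fold`
        train = [ex for i, ex in enumerate(examples) if i % k != fold]
        model = train_naive_bayes(train)
        baseline = most_frequent_sense(train)
        for types, sense in test:
            correct_nb += predict(model, types) == sense
            correct_base += baseline == sense
    return correct_nb / len(examples), correct_base / len(examples)


# Hypothetical toy data for the ambiguous word "cold"; deliberately easy to separate.
toy_examples = (
    [(["Virus", "Sign or Symptom"], "common_cold")] * 6
    + [(["Natural Phenomenon or Process", "Quantitative Concept"], "low_temperature")] * 4
) * 2

if __name__ == "__main__":
    nb_accuracy, baseline_accuracy = cross_validate(toy_examples, k=10)
    print(f"naive Bayes: {nb_accuracy:.2f}, baseline: {baseline_accuracy:.2f}")
```

Because the toy senses never share a semantic type, the classifier separates them trivially. Real sentences would mix types across senses, which is where the gap between classifier and baseline becomes informative.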

DOI

10.3233/978-1-60750-949-3-381

Recommended Citation

G. Leroy and T. C. Rindflesch. "Using Symbolic Knowledge in the UMLS to Disambiguate Words in Small Datasets with a Naive Bayes Classifier," MedInfo, San Francisco, 2004.

Included in

Databases and Information Systems Commons, Management Information Systems Commons
