CGU Faculty Publications and Research

Effects of Information and Machine Learning Algorithms on Word Sense Disambiguation with Small Datasets

Gondy Leroy, Claremont Graduate University
Thomas C. Rindflesch

Document Type

Article

Department

Information Systems and Technology (CGU)

Publication Date

8-2005

Disciplines

Computer Sciences | Databases and Information Systems | Medicine and Health Sciences

Abstract

Current approaches to word sense disambiguation use (and often combine) various machine learning techniques. Most refer to characteristics of the ambiguity and its surrounding words and are based on thousands of examples. Unfortunately, developing large training sets is burdensome, and in response to this challenge, we investigate the use of symbolic knowledge for small datasets. A naïve Bayes classifier was trained for 15 words with 100 examples for each. Unified Medical Language System (UMLS) semantic types assigned to concepts found in the sentence and relationships between these semantic types form the knowledge base. The most frequent sense of a word served as the baseline. The effect of increasingly accurate symbolic knowledge was evaluated in nine experimental conditions. Performance was measured by accuracy based on 10-fold cross-validation. The best condition used only the semantic types of the words in the sentence. Accuracy was then on average 10% higher than the baseline; however, it varied from 8% deterioration to 29% improvement. To investigate this large variance, we performed several follow-up evaluations, testing additional algorithms (decision tree and neural network) and gold standards (per expert), but the results did not significantly differ. However, we noted a trend that the best disambiguation was found for words that were the least troublesome to the human evaluators. We conclude that neither algorithm nor individual human behavior causes these large differences, but that the structure of the UMLS Metathesaurus (used to represent senses of ambiguous words) contributes to inaccuracies in the gold standard, leading to varied performance of word sense disambiguation techniques.
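
The evaluation setup described in the abstract (a naïve Bayes classifier over semantic types, compared against a most-frequent-sense baseline under 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the add-one smoothing, the ambiguous word "cold", its two senses, and the semantic-type codes in the toy data are all assumptions made up for illustration, and the resulting accuracies say nothing about the paper's results.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(examples):
    """Count senses and per-sense feature occurrences.

    examples: list of (features, sense), where features is a list of
    semantic-type labels found in the sentence (hypothetical encoding).
    """
    sense_counts = Counter(sense for _, sense in examples)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in examples:
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab


def predict_nb(model, feats):
    """Return the sense with the highest log-probability (add-one smoothing)."""
    sense_counts, feat_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        score = math.log(sense_counts[sense] / total)
        denom = sum(feat_counts[sense].values()) + len(vocab)
        for f in feats:
            if f in vocab:  # features unseen in training are ignored
                score += math.log((feat_counts[sense][f] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


def most_frequent_sense(examples):
    """Baseline: always predict the most common sense in the training data."""
    return Counter(sense for _, sense in examples).most_common(1)[0][0]


def cross_validate(examples, k=10, seed=0):
    """Return (naive Bayes accuracy, baseline accuracy) under k-fold CV."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    nb_correct = base_correct = 0
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_nb(train)
        baseline = most_frequent_sense(train)
        for feats, sense in folds[i]:
            nb_correct += predict_nb(model, feats) == sense
            base_correct += baseline == sense
    return nb_correct / len(data), base_correct / len(data)


# Invented toy data: 100 examples of one ambiguous word, two senses,
# each tagged with illustrative semantic-type codes (an assumption).
toy = ([(["sosy", "dsyn"], "illness")] * 60
       + [(["npop", "qnco"], "temperature")] * 40)

nb_acc, base_acc = cross_validate(toy, k=10)
print(f"naive Bayes: {nb_acc:.2f}  baseline: {base_acc:.2f}")
```

Because the toy senses are perfectly separable by their features, the classifier trivially beats the baseline here; with real sentences the margin would depend on how well the semantic types discriminate between senses.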

Comments

Final published version can be found at: Gondy Leroy, Thomas C. Rindflesch, Effects of information and machine learning algorithms on word sense disambiguation with small datasets, International Journal of Medical Informatics, Volume 74, Issues 7–8, August 2005, Pages 573-585, ISSN 1386-5056, http://dx.doi.org/10.1016/j.ijmedinf.2005.03.013. (http://www.sciencedirect.com/science/article/pii/S1386505605000262)

Rights Information

Terms of Use & License Information

Recommended Citation

Leroy, Gondy and Rindflesch, Thomas C., "Effects of Information and Machine Learning Algorithms on Word Sense Disambiguation with Small Datasets" (2005). CGU Faculty Publications and Research. 92.
https://scholarship.claremont.edu/cgu_fac_pub/92

Included in

Databases and Information Systems Commons, Medicine and Health Sciences Commons
