Student Co-author

CGU Graduate

Document Type



Information Systems and Technology (CGU)

Publication Date



Artificial Intelligence and Robotics | Computational Linguistics | Computer Sciences | Databases and Information Systems


Consumer health information written by health care professionals is often inaccessible to the consumers it is written for. Traditional readability formulas examine syntactic features like sentence length and number of syllables, ignoring the target audience's grasp of the words themselves. The use of specialized vocabulary disrupts the understanding of patients with low reading skills, causing a decrease in comprehension. A naive Bayes classifier for three levels of increasing medical terminology specificity (consumer/patient, novice health learner, medical professional) was created with a lexicon generated from a representative medical corpus. Ninety-six percent accuracy in classification was attained. The classifier was then applied to existing consumer health web pages. We found that only 4% of pages were classified at a layperson level, regardless of the Flesch reading ease scores, while the remaining pages were at the level of medical professionals. This indicates that consumer health web pages are not using appropriate language for their target audience.
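As an illustration of the kind of classifier the abstract describes, the following is a minimal sketch of a multinomial naive Bayes text classifier with add-one smoothing over the three terminology levels. It is not the authors' implementation. The training sentences, the label strings, the whitespace tokenizer, and the function names `tokenize`, `train`, and `classify` are all assumptions made for this example, and the paper's lexicon and corpus are not reproduced here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (text, level) pairs. The three labels mirror
# the levels named in the abstract; the sentences themselves are invented.
TRAIN = [
    ("your heart pumps blood around your body", "consumer"),
    ("high blood pressure can hurt your heart", "consumer"),
    ("hypertension raises the risk of heart attack and stroke", "novice"),
    ("a stroke happens when an artery in the brain is blocked", "novice"),
    ("myocardial infarction secondary to coronary artery thrombosis", "professional"),
    ("ischemic cerebrovascular accident with left hemiparesis", "professional"),
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log posterior under naive Bayes."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior: fraction of training documents in this class.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothed log likelihood of the word given the class.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(classify("coronary artery thrombosis", model))
print(classify("your heart pumps blood", model))
```

Because the score depends on which words appear and not on sentence or word length, a short passage built from specialized terms can still be assigned to the professional level. This is the property that a syntactic measure such as Flesch reading ease does not capture.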


Best Paper Award

Publisher pdf reproduced with permission.

Rights Information

© 2007 Institute of Electrical and Electronics Engineers (IEEE). This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of The Claremont Colleges's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to