TY - JOUR
T1 - The role of domain knowledge in automating medical text report classification
AU - Wilcox, Adam B.
AU - Hripcsak, George
N1 - Funding Information: This work was supported by National Library of Medicine Grants R01 LM06910 “Discovering and Applying Knowledge in Clinical Databases,” R01 LM06274 “Unlocking Data from Medical Records with Text Processing,” and Pfizer, Inc. grant “Using Information Systems to Advance Clinical Research and Clinical Care.”
PY - 2003
Y1 - 2003
AB - Objective: To analyze the effect of expert knowledge on the inductive learning process in creating classifiers for medical text reports. Design: The authors converted medical text reports to a structured form through natural language processing. They then inductively created classifiers for medical text reports using varying degrees and types of expert knowledge and different inductive learning algorithms. The authors measured performance of the different classifiers as well as the costs to induce classifiers and acquire expert knowledge. Measurements: The measurements used were classifier performance, training-set size efficiency, and classifier creation cost. Results: Expert knowledge was shown to be the most significant factor affecting inductive learning performance, outweighing differences in learning algorithms. The use of expert knowledge can affect comparisons between learning algorithms. This expert knowledge may be obtained and represented separately as knowledge about the clinical task or about the data representation used. The benefit of the expert knowledge is more than that of inductive learning itself, with less cost to obtain. Conclusion: For medical text report classification, expert knowledge acquisition is more significant to performance and more cost-effective to obtain than knowledge discovery. Building classifiers should therefore focus more on acquiring knowledge from experts than trying to learn this knowledge inductively.
UR - http://www.scopus.com/inward/record.url?scp=0037634427&partnerID=8YFLogxK
U2 - 10.1197/jamia.M1157
DO - 10.1197/jamia.M1157
M3 - Article
C2 - 12668687
AN - SCOPUS:0037634427
SN - 1067-5027
VL - 10
SP - 330
EP - 338
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 4
ER -