Accuracy and reliability of forensic handwriting comparisons
Authors: R. Austin Hicklin, Linda Eisenhart, Nicole Richetelli, Meredith D. Miller, Peter Belcastro, Ted M. Burkes, Connie L. Parks, Michael A. Smith, JoAnn Buscaglia, Eugene M. Peters, Rebecca Schwartz Perlman, Jocelyn V. Abonamah, Brian A. Eckenrode
Affiliations: (a) Noblis, Inc., Reston, VA 20191; (b) Federal Bureau of Investigation, Laboratory Division, Questioned Document Unit, Quantico, VA 22135; (c) Meredith DeKalb Miller, Green Cove Springs, FL 32043; (d) Federal Bureau of Investigation, Laboratory Division, Research and Support Unit, Quantico, VA 22135; (e) Ideal Innovations, Inc., Arlington, VA 22203
Abstract: Forensic handwriting examination involves the comparison of writing samples by forensic document examiners (FDEs) to determine whether or not they were written by the same person. Here we report the results of a large-scale study conducted to assess the accuracy and reliability of handwriting comparison conclusions. Eighty-six practicing FDEs each conducted up to 100 handwriting comparisons, resulting in 7,196 conclusions on 180 distinct comparison sets, using a five-level conclusion scale. Erroneous “written by” conclusions (false positives) were reached in 3.1% of the nonmated comparisons, while 1.1% of the mated comparisons yielded erroneous “not written by” conclusions (false negatives). False positive rates were markedly higher for nonmated samples written by twins (8.7%) compared to nontwins (2.5%). Notable associations between training and performance were observed: FDEs with less than 2 y of formal training generally had higher error rates, but they also had higher true positive and true negative rates because they tended to provide more definitive conclusions; FDEs with at least 2 y of formal training were less likely to make definitive conclusions, but those definitive conclusions they made were more likely to be correct (higher positive predictive and negative predictive values). We did not observe any association between writing style (cursive vs. printing) and rates of errors or incorrect conclusions. This report also provides details on the repeatability and reproducibility of conclusions, and reports how conclusions are affected by the quantity of writing and the similarity of content.
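The abstract reports several rate metrics: false positive and false negative rates computed over all nonmated or mated comparisons, and positive and negative predictive values computed over the definitive conclusions reached. As a reader's aid, the following is a minimal Python sketch of how such rates are conventionally computed from tallied conclusions; the counts, field names, and treatment of inconclusives here are illustrative assumptions, not the study's data or its exact definitions.

    from dataclasses import dataclass

    @dataclass
    class Tally:
        """Tallied conclusions against ground truth (hypothetical counts,
        not the study's data). Inconclusive calls are kept separate: they
        enter the error-rate denominators but not the predictive values."""
        mated_id: int        # mated pairs called "written by" (true positives)
        mated_excl: int      # mated pairs called "not written by" (false negatives)
        mated_inc: int       # mated pairs with no definitive conclusion
        nonmated_id: int     # nonmated pairs called "written by" (false positives)
        nonmated_excl: int   # nonmated pairs called "not written by" (true negatives)
        nonmated_inc: int    # nonmated pairs with no definitive conclusion

    def rates(t: Tally) -> dict:
        mated = t.mated_id + t.mated_excl + t.mated_inc
        nonmated = t.nonmated_id + t.nonmated_excl + t.nonmated_inc
        return {
            # Shares of all mated/nonmated comparisons, matching the
            # abstract's phrasing "3.1% of the nonmated comparisons".
            "false_positive_rate": t.nonmated_id / nonmated,
            "false_negative_rate": t.mated_excl / mated,
            "true_positive_rate": t.mated_id / mated,
            "true_negative_rate": t.nonmated_excl / nonmated,
            # Predictive values condition on the definitive conclusion reached.
            "positive_predictive_value": t.mated_id / (t.mated_id + t.nonmated_id),
            "negative_predictive_value": t.nonmated_excl / (t.nonmated_excl + t.mated_excl),
        }

    # Placeholder counts only, chosen for readability.
    print(rates(Tally(mated_id=800, mated_excl=10, mated_inc=190,
                      nonmated_id=30, nonmated_excl=750, nonmated_inc=220)))

This separation also makes the abstract's training effect legible: issuing more definitive conclusions raises the true positive/negative rates (and the error rates), while withholding weak conclusions improves the predictive values of the conclusions that are issued.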

Forensic science is under scrutiny, particularly for pattern-based disciplines in which source conclusions are reported. The National Research Council report Strengthening Forensic Science in the United States: A Path Forward (1) stated that “The scientific basis for handwriting comparisons needs to be strengthened” and noted that “there has been only limited research to quantify the reliability and replicability of the practices used by trained document examiners.” The President’s Council of Advisors on Science and Technology (PCAST) report Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods (2) expressed concerns regarding the validity and reliability of conclusions made by forensic examiners and called for empirical testing: “The only way to establish the scientific validity and degree of reliability of a subjective forensic feature-comparison method—that is, one involving significant human judgment—is to test it empirically by seeing how often examiners actually get the right answer. Such an empirical test of a subjective forensic feature-comparison method is referred to as a ‘black-box test.’” The National Commission on Forensic Science also called for such testing (3). Although the accuracy and reliability of conclusions made by forensic document examiners (FDEs) have been the focus of multiple studies over the years (4–10), those studies differ notably in design from this study (and from PCAST’s recommendations), so the resulting rates are not directly comparable; the differences include open-set vs. closed-set designs, one-to-one vs. one-to-many examinations, and notably different conclusion scales (see SI Appendix, Appendix B for a summary).

This study was conducted to provide data that can be used to assess the scientific validity of handwriting comparisons, for use by policy makers, laboratory managers, the legal community, and FDEs. It follows the approach used in the previous FBI Laboratory–Noblis latent print black box study (11) and later recommended by the PCAST report. The design uses open-set, one-to-one document comparisons to evaluate the conclusions reached by practicing FDEs when comparing writing samples selected to be broadly comparable to casework. The primary purposes of the study are to measure the accuracy of FDEs’ conclusions when comparing handwriting samples and to assess reliability by measuring the reproducibility (interexaminer variability) and repeatability (intraexaminer variability) of those conclusions. Secondary purposes include reporting any associations between the accuracy of the decisions in this study, factors related to the participants (such as training or experience), and factors related to the samples (such as quantity of writing, comparability of content, limitations, or style of writing).
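Repeatability (intraexaminer variability) and reproducibility (interexaminer variability) are, at their simplest, agreement rates over repeated presentations of the same comparison sets. The sketch below assumes a hypothetical record layout of (examiner, comparison set, round, conclusion) and shows one conventional pairwise-agreement computation; it is not the study's analysis code, and the study reports these reliability measures in more detail.

    from collections import defaultdict
    from itertools import combinations

    # Illustrative records, not the study's data format:
    # (examiner_id, comparison_set_id, round, conclusion)
    records = [
        ("E1", "S1", 1, "written by"), ("E1", "S1", 2, "written by"),
        ("E2", "S1", 1, "inconclusive"), ("E2", "S1", 2, "written by"),
        ("E1", "S2", 1, "not written by"), ("E2", "S2", 1, "not written by"),
    ]

    def repeatability(records):
        """Fraction of same-examiner, same-set retest pairs that agree
        (intraexaminer agreement)."""
        by_key = defaultdict(list)
        for examiner, cset, _, conclusion in records:
            by_key[(examiner, cset)].append(conclusion)
        pairs = [(a, b) for conclusions in by_key.values()
                 for a, b in combinations(conclusions, 2)]
        return sum(a == b for a, b in pairs) / len(pairs)

    def reproducibility(records):
        """Fraction of examiner pairs agreeing on the same set in round 1
        (interexaminer agreement)."""
        by_set = defaultdict(list)
        for _, cset, rnd, conclusion in records:
            if rnd == 1:
                by_set[cset].append(conclusion)
        pairs = [(a, b) for conclusions in by_set.values()
                 for a, b in combinations(conclusions, 2)]
        return sum(a == b for a, b in pairs) / len(pairs)

    print(repeatability(records), reproducibility(records))

Pairwise exact agreement is only one possible summary; with a five-level conclusion scale, agreement can also be assessed on the full scale or after collapsing it to identification/inconclusive/exclusion.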
Keywords: forensics, handwriting, decision analysis, documents, error rates