
Since it was published in 2011, the third edition of the Federal Judicial Center’s Reference Manual on Scientific Evidence has been the go-to guide for federal judges seeking to sort out scientific testimony, and a major source of non-precedential authority for both sides when arguing motions under Fed. R. Evid. 702.  2011, however, was fifteen years ago.  The FJC and its academic collaborators have been promising an update for several years.

It’s finally here, and you can get a free PDF copy of your very own here.

The Reference Manual on Scientific Evidence, Fourth Edition checks in at 1682 pages, so don’t expect a substantive analysis here.  But here is the table of contents:

Liesa L. Richter & Daniel J. Capra, “The Admissibility of Expert Testimony,” 1

Michael Weisberg & Anastasia Thanukos, “How Science Works,” 47

Valena E. Beety, Jane Campbell Moriarty, & Andrea L. Roth, “Reference Guide on Forensic Feature Comparison Evidence,” 113

David H. Kaye, “Reference Guide on Human DNA Identification Evidence,” 207

Thomas D. Albright & Brandon L. Garrett, “Reference Guide on Eyewitness Identification,” 361

David H. Kaye & Hal S. Stern, “Reference Guide on Statistics and Research Methods,” 463

Daniel L. Rubinfeld & David Card, “Reference Guide on Multiple Regression and Advanced Statistical Models,” 577

Shari Seidman Diamond, Matthew Kugler, & James N. Druckman, “Reference Guide on Survey Research,” 681

Mark A. Allen, Carlos Brain, & Filipe Lacerda, “Reference Guide on Estimation of Economic Damages,” 749

M. Elizabeth Marder & Joseph V. Rodricks, “Reference Guide on Exposure Science and Exposure Assessment,” 831

Steve C. Gold, Michael D. Green, Jonathan Chevrier, & Brenda Eskenazi, “Reference Guide on Epidemiology,” 897

David L. Eaton, Bernard D. Goldstein, & Mary Sue Henifin, “Reference Guide on Toxicology,” 1027

John B. Wong, Lawrence O. Gostin, & Oscar A. Cabrera, “Reference Guide on Medical Testimony,” 1105

Henry T. Greely & Nita A. Farahany, “Reference Guide on Neuroscience,” 1185

Kirk Heilbrun, David DeMatteo, & Paul S. Appelbaum, “Reference Guide on Mental Health Evidence,” 1269

Chaouki T. Abdallah, Bert Black, & Edl Schamiloglu, “Reference Guide on Engineering,” 1353

Brian N. Levine, Joanne Pasquarelli, & Clay Shields, “Reference Guide on Computer Science,” 1409

James E. Baker & Laurie N. Hobart, “Reference Guide on Artificial Intelligence,” 1481

Jessica Wentz & Radley Horton, “Reference Guide on Climate Science,” 1561

We compared this table of contents to the one for the Third Edition.  There is considerable turnover in authorship, with only one chapter unchanged from 2011.  That’s not surprising, since authors who were considered grey-beard experts in their fields fifteen years ago have only gotten greyer (and older) since.  What’s of more import is the addition of three entirely new chapters – on computer science, artificial intelligence, and climate science.

Frankly, we’re surprised and disappointed that there wasn’t a fourth new chapter on genetics and genomics, a subject we view as having far greater general impact than “climate science,” which is much more of a niche area.

We have skimmed the chapter on AI, because Bexis has been involved with the Lawyers for Civil Justice’s recent submission on proposed Fed. R. Evid. 707, which would create a possible avenue for admitting computer-generated evidence without a supporting expert.  “Computer-generated” is a broad term that encompasses a lot more than AI (one of the problems with the current draft), but as for AI, the Reference Manual’s new chapter only reinforces our previously stated belief that there is no way AI-generated evidence can be admissible without the proponent offering expert testimony to support it.  The Reference Manual lists multiple questions that are “essential to authenticating and validating the use of AI.”  Reference Manual (4th), at 1514.

  • What is the AI trained to identify, how has it been weighted, and how is it currently weighted?
  • Does the system have a method to transparently identify these answers?  If not, why not?
  • Are the false positive and false negative rates known, if applicable, or hallucination rates?  If so, how do these rates relate to the case at hand?
  • How has AI accuracy been validated, and is the accuracy of the AI updated on a constant basis?
  • What are the AI’s biases? (See “Probing for Bias” questions below.)
  • Is authenticity an issue?
  • How do each of these questions and answers align with how the AI application is being used by the court or proffered as evidence?

Id.  None of these questions is self-evident, nor do we think that any layperson could competently address them.  Nor does the Manual:

Judges might also consider that a qualified AI expert or witness ought to be able to credibly answer these questions, or perhaps the expert or witness may not be qualified to address the application at issue.

Id.

As mentioned in connection with the above questions, the Manual also suggests still more questions specifically designed “to probe for bias.”  Id. at 1529.  This set is even more extensive – and more detailed – than the recommended general questions.  Here are only some of the questions recommended by the Manual – limited to those that could be applicable to prescription medical product liability litigation:

  • Who designed the algorithm at issue?
  • What process of review was the algorithm subjected to?
  • Were stakeholders – groups likely to be affected by the AI application – consulted in its conception, design, development, operation, and maintenance?
  • What is in the underlying training, validation, and testing data?
  • How has the chosen data been cleaned, altered, or assessed for bias?
  • How have the data points been evaluated for relevancy to the task at hand?
  • Is the data temporally relevant or stale?
  • Are certain groups improperly over- or under-represented?
  • How might definitions of the data points used impact the algorithm analysis?
  • Do the data or weighted factors include real or perceived racial, ethnic, gender, or other sensitive categories of social identity descriptors, or any proxies for those categories?  If so, why?
  • Have engineers and lawyers reviewed the way these criteria are weighted in and by the algorithm as part of the design and on an ongoing basis?  In accord with what process of validation and review?
  • Is the model the state of the art?  How does it compare against any industry standard evaluation metrics or application-specific benchmarks?
  • How might the terms or phrasings in the user-generated prompts bias the systems’ outputs?  Can these prompts be phrased in a more neutral way?
  • Do any of the terms used have alternative meanings?
  • Are the algorithm’s selection criteria known?  Iterative?  Retrievable in a transparent form?  If not, why not?
  • Does the application rely on a neural network?  If so, are the parameters and weights utilized within the neural network known or retrievable?
  • Does the design allow for emerging methodologies that provide for such transparency?  If a transparent methodology is possible, has it been used?  (If not, why not?)
  • If transparency is not possible with state of the art technology, and a less transparent methodology is employed, what is the risk that the system will rely on parameters that are unintended or unknown to the designers or operators?  How high is the risk?  Is the risk demonstrated?  How is the risk mitigated?
  • Is the input query or prompt asking for a judgment, a fact, or a prediction?
  • Is the judgment, fact, or prediction subject to ambiguity in response?
  • Are there situational factors or facts in play that could, or should, alter the algorithm’s predictive accuracy?
  • Is the application one in which nuance and cultural knowledge are essential to determine its accuracy or to properly query it?
  • Are the search terms and equations objective or ambiguous?  Can they be more precise and more objective? If not, why?
  • What is the application’s false positive rate?
  • What is the false negative rate?
  • What information corroborates or disputes the determination reached by the AI application?
  • Is the application designed to allow for real-time assessment?  If not, is operational necessity the reason, or is it simply a matter of design?
  • Is there a process for such assessment that occurs after the fact?
  • Is the AI being used for the purpose for which it was designed and trained?
  • Is the AI being used to inform or corroborate a human decision?  Are humans relying on the AI to decide or to inform and augment human decisions?

Id. at 1529-30.

To the extent – if at all – that proposed Rule 707 contemplates AI evidence being admissible without expert testimony that could answer these questions (to the extent relevant in a given case), we wonder whether the federal judiciary’s right hand (the Rules Committee) has been following what its left hand (the Committee on Science for Judges) has been doing.  We frankly don’t see any plausible avenue (other than consent of the parties) by which AI evidence could be admitted without supporting – and probably extensive – expert testimony prepared to address the questions posed in this new chapter of the Reference Manual on Scientific Evidence.

And that’s just one aspect of one chapter.  Have fun reading.