
This “just desserts” story caught our eye earlier this year – a hot-shot expert witness, on artificial intelligence, no less, got caught with his own hand in the AI cookie jar.  As a result, his credibility was destroyed, and his testimony was excluded.  The litigation leading to Kohls v. Ellison, 2025 WL 66514 (D. Minn. Jan. 10, 2025), concerned a Minnesota anti-deepfake statute.  The plaintiffs were political operatives claiming a First Amendment right to create deepfakes of candidates they opposed.  Id. at *1.  The defendant hired a California-based professor to testify “about artificial intelligence (“AI”), deepfakes, and the dangers of deepfakes to free speech and democracy.”  Id.

The AI expert, however, used AI himself in preparing his material and “included fabricated material in his declaration.”  Id.  Specifically, the would-be expert “admitted that his declaration inadvertently included citations to two non-existent academic articles, and incorrectly cited the authors of a third article.”  Id.  AI had provided “fake citations to academic articles, which [the expert] failed to verify before including them in his declaration.”  Id.  The state sought to submit a belated amendment removing the fictitious citations, but the court was having none of it.  Id.

The AI expert’s AI-based declaration was excluded in its entirety.

[T]he Court cannot accept false statements − innocent or not − in an expert’s declaration submitted under penalty of perjury.  Accordingly, given that the [expert] Declaration’s errors undermine its competence and credibility, the Court will exclude consideration of [that] expert testimony

Id. at *5.  “The irony. . . . a credentialed expert on the dangers of AI and misinformation, has fallen victim to the siren call of relying too heavily on AI − in a case that revolves around the dangers of AI, no less.”  Id. at *3.  Moreover, the expert had committed a cardinal, but very common, sin of a litigation-engaged expert.  He had not lived up to his usual professional standards in reaching his paid opinions:

It is particularly troubling to the Court that [the expert] typically validates citations with a reference software when he writes academic articles but did not do so when submitting the . . . Declaration. . . .  One would expect that greater attention would be paid to a document submitted under penalty of perjury than academic articles.

Id.  The expert “abdicate[d his] independent judgment and critical thinking skills in favor of ready-made, AI-generated answers.”  Id. at *4.

Even though the counsel who hired this AI-dependent expert professed no knowledge of the AI-generated falsehoods in the expert’s declaration, that did not excuse them.  Rule 11 “imposes a ‘personal, nondelegable responsibility’ to ‘validate the truth and legal reasonableness of the papers filed’ in an action.”  Id. (citation and quotation marks omitted).  In this context, attorneys have an obligation “to ask their witnesses whether they have used AI in drafting their declarations and what they have done to verify any AI-generated content.”  Id.

In excluding the declaration, the court emphasized that the false citations had destroyed the expert’s credibility and that the consequences of citing fake, AI-generated sources are “steep”:

[The expert’s] citation to fake, AI-generated sources in his declaration . . . shatters his credibility with this Court.  At a minimum, expert testimony is supposed to be reliable.  Fed. R. Evid. 702.  More fundamentally, signing a declaration under penalty of perjury is not a mere formality. . . .  The Court should be able to trust the indicia of truthfulness that declarations made under penalty of perjury carry, but that trust was broken here.

Moreover, citing to fake sources imposes many harms, including wasting the opposing party’s time and money, the Court’s time and resources, and reputational harms to the legal system. . . .  Courts therefore do not, and should not, make allowances for a party who cites to fake, nonexistent, misleading authorities − particularly in a document submitted under penalty of perjury.  The consequences of citing fake, AI-generated sources for attorneys and litigants are steep.  Those consequences should be no different for an expert offering testimony to assist the Court under penalty of perjury.

Id. at *4-5 (citations and quotation marks omitted).

We think that the court reached the right result in Kohls, but for the wrong reason.  The problem with AI-generated expert testimony is not limited to AI hallucinations.  Instead, it’s deeper and goes to the concept of “expertise” itself.  Who is the expert?  Is it the person who signs the report and is proffered as an expert, or is it whatever AI program the expert used?  It is well established that one expert cannot simply “parrot” the opinions of another.  We’ve written several posts that make this point.  Why should it be any different when the expert blindly parrots something that a black-box AI program spits out, rather than some other expert? 

With that question in mind, we decided to look for other decisions that have addressed experts who used AI to create their submissions.  We think that asking the right questions led to the right answer in In re Celsius Network LLC, 655 B.R. 301 (Bankr. S.D.N.Y. 2023).  Celsius Network was a bankruptcy case applying Rule 702.  The report in question, however, was not written by the expert who signed it.  Rather, it was written by AI, and for that reason it was excluded.

The [expert] Report was not written by [the expert]. Although [he] directed and guided its creation, the 172-page Report, which was generated within 72 hours, was written by artificial intelligence at the instruction of [the expert].  By his own testimony, a comprehensive human-authored report would have taken over 1,000 hours to complete.  In fact, it took [the expert] longer to read [the] report than to generate it.  The Court therefore separately evaluates the [expert] Report. . . .  [T]he Court finds that the . . . Report is unreliable and fails to meet the standard for admission.

Id. at 308.  This AI-generated expert report could not be reliable, for several reasons:

  • “In preparing the report, [the expert] did not review the underlying source material for any sources cited, nor does he know what his team did (or did not do) to review and summarize those materials.”
  • “There were no standards controlling the operation of the artificial intelligence that generated the Report.”
  • “The Report contained numerous errors, ranging from duplicated paragraphs to mistakes in its description of [relevant parameters].”
  • “The [expert] Report was not the product of reliable or peer-reviewed principles and methods.”

Id. at 308.  Thus, Celsius Network determined “that the Report does not meet the standard set forth under Rule 702.”  Id. at 309.

In an earlier case, an expert offered analysis of data that he had fed into a set of algorithms and “click[ed] ‘Go.’”  In re Marriott International, Inc., Customer Data Security Breach Litigation, 602 F. Supp. 3d 767, 787 (D. Md. 2022).  That was not enough to be admissible under Rule 702.

Algorithms are not omniscient, omnipotent, or infallible.  They are nothing more than a systematic method of performing some particular process from a beginning to an end.  If improperly programmed, if the analytical steps incorporated within them are erroneous or incomplete, or if they are not tested to confirm their output is the product of a system or process capable of producing accurate results (a condition precedent to their admissibility), then the results they generate cannot be shown to be relevant, reliable, helpful to the fact finder, or to fit the circumstances of the particular case in which they are used. . . .  [The expert’s] willingness to rely on his own untested conclusion that his model could reliably be applied to the facts of this case is insufficient to meet the requirements of Rule 702.

Id. (footnote omitted).

A similar state-law decision, Matter of Weber, 220 N.Y.S.3d 620 (N.Y. Sur. 2024), comes from a New York trial court.  Although it’s not entirely clear that the damages opinions excluded in Weber were even those of a qualified expert, the decision treated them as such and found them “inherently unreliable.”  Id. at 633.  They had been generated by an AI program (Copilot).  The Weber expert simply parroted whatever the AI program generated:

Despite his reliance on artificial intelligence, [the expert] could not recall what input or prompt he used to assist him. . . .  He also could not state what sources [AI] relied upon and could not explain any details about how [the AI] works or how it arrives at a given output.  There was no testimony on whether these [AI] calculations considered any fund fees or tax implications.

Id.  The would-be expert nonetheless claimed that AI use was “generally accepted” in the relevant field.  Id. at 634 (New York state courts follow Frye).

The court had “no objective understanding as to how [the AI program] works,” and thus tried it out itself.  The program gave three different answers to what should have been a simple mathematical calculation – and none of those matched the supposed expert’s number.  Id. at 633.  “[T]he fact there are variations at all calls into question the reliability and accuracy of [AI] to generate evidence to be relied upon in a court proceeding.”  Id.  Interestingly, when asked “are your calculations reliable enough for use in court,” the program responded that, standing alone, it was probably not ready for legal prime time.

[The AI] responded with “[w]hen it comes to legal matters, any calculations or data need to meet strict standards. I can provide accurate info, but it should always be verified by experts and accompanied by professional evaluations before being used in court. . . .  ”  It would seem that even [the program] itself self-checks and relies on human oversight and analysis.  It is clear from these responses that the developers of the [AI] program recognize the need for its supervision by a trained human operator to verify the accuracy of the submitted information as well as the output.

Id. at 634.  To prevent “garbage in, garbage out . . . a user of . . . artificial intelligence software must be trained or have knowledge of the appropriate inputs to ensure the most accurate results.”  Id. at 634 n.25.

Weber thus rejected the testimony, citing “due process issues” that “arise when decisions are made by a software program, rather than by, or at the direction of a [human].”  Id. at 634.

[T]he record is devoid of any evidence as to the reliability of [the AI program] in general, let alone as it relates to how it was applied here.  Without more, the Court cannot blindly accept as accurate, calculations which are performed by artificial intelligence.

Id.  Weber made several “findings” with respect to AI:

  • AI is “any technology that uses machine learning, natural language processing, or any other computational mechanism to simulate human intelligence, including . . . evidence creation or analysis, and legal research.”
  • “‘Generative A.I.’ [i]s artificial intelligence that is capable of generating new content (such as images or text) in response to a submitted prompt (such as a query).”
  • “[P]rior to evidence being introduced which has been generated by an artificial intelligence product or system, counsel has an affirmative duty to disclose the use of artificial intelligence.”
  • AI generated evidence “should properly be subject to a Frye hearing prior to its admission.”

Id. at 635.

Concord Music Group, Inc. v. Anthropic PBC, 2025 WL 1482734 (Mag. N.D. Cal. May 23, 2025), is another instance of an expert exposed by an AI hallucination – “a citation to an article that did not exist and whose purported authors had never worked together.”  Id. at *3.  The court considered the infraction “serious,” but not as “grave as it first appeared.”  Id.

[Proponent’s] counsel protests that this was “an honest citation mistake” but admits that Claude.ai was used to “properly format” at least three citations and, in doing so, generated a fictitious article name with inaccurate authors (who have never worked together) for the citation at issue.  That is a plain and simple AI hallucination.

Id. (citation omitted).  However, “the underlying article exists, was properly linked to and was located by a human being using Google search.”  Id.  For that reason, Concord did not view the situation as one where “attorneys and experts have abdicated their independent judgment and critical thinking skills in favor of ready-made, AI-generated answers.”  Id. (indirectly quoting Kohls).  Still, the existence of the hallucination was fishy enough that the relevant paragraph from the expert report was stricken:

It is not clear how such an error − including a complete change in article title − could have escaped correction during manual cite-check by a human being. . . .  [The court’s] Civil Standing Order requires a certification “that lead trial counsel has personally verified the content’s accuracy.” Neither the certification nor verification has occurred here.

Id.  Further, as in Kohls, “this issue undermines the overall credibility of [the expert’s] written declaration, a factor in the Court’s conclusion.”  Id.  Cf. Shoraka v. Bank of Am., N.A., 2023 WL 8709700, at *3 (C.D. Cal. Dec. 1, 2023) (excluding non-AI expert report that “consist[ed] almost entirely of paragraphs . . . simply copied and pasted from online sources”).

On the other hand, we have Ferlito v. Harbor Freight Tools USA, Inc., 2025 WL 1181699 (E.D.N.Y. April 23, 2025).  The plaintiff’s expert, lacking formal credentials, claimed considerable practical experience.  Among other reasons, the defendant sought to exclude his report because “after completing the report, he entered a query into ChatGPT about the best way to secure a hammer head to a handle, which produced a response consistent with his expert opinion.”  Id. at *1.  Ferlito denied exclusion because the expert had only used AI “after he had written his report to confirm his findings” – findings initially “based on his decades of experience.”  Id. at *4.  The expert “professed to being ‘quite amazed’ that the ‘ChatGPT search confirmed what [he] had already opined’” and claimed “that he did not rely on ChatGPT.”  Id.  Taking that testimony at face value, Ferlito allowed the expert opinions:

There is no indication that [the expert] used ChatGPT to generate a report with false authority or that his use of AI would render his testimony less reliable.  Accordingly, the Court finds no issue with [the expert’s] use of ChatGPT in this instance.

Id.  Ferlito is not necessarily inconsistent with the previous decisions, given the expert’s denial that he had used AI to generate the report itself.

Considering this precedent, while some courts addressing Rule 702 issues do seem distracted by AI’s propensity for hallucinations, most of them understand the more basic problem with expert use of AI: the opinions are no longer those of the experts themselves.  Rather, when experts use AI to generate their reports, they reduce themselves to “parrot” status, blindly reciting whatever the AI program generates.  As such, AI-generated expert reports should not be admissible without some means of validating the workings of the AI algorithms themselves, which we understand is not possible in most (if not all) large language models.