Predictive coding (also called “technology assisted review” (“TAR”)) uses computerized artificial intelligence to extrapolate from attorney coding of small (and repeated) sample document sets, and ultimately to govern huge document productions. This technology has appeared (to us) to be probably the most promising development in discovery since that subject went electronic … and promptly ran badly off the rails due to exorbitant cost. Nothing else we know of – short of significantly tighter legal limits on discovery – has the promise of reducing ediscovery costs to the extent that predictive coding can. Thus, we’ve blogged about it several times since 2012, when the first cases contemplating its use were decided.
But we haven’t said much recently.
Eighteen months can be forever on the technological frontier, so we decided to take another look at the case law to see what had happened to predictive coding since the first three cases in 2012.
The case law has exploded. Where only a handful of cases existed back then, now we find dozens. Substantively, we’re happy to report that courts don’t seem to have anything bad to say about using computers to undertake relevance review for documents subject to production in litigation.
The two Moore cases cited in our May 2012 post − Moore v. Publicis Groupe, 287 F.R.D. 182 (Mag. S.D.N.Y. 2012), and Moore v. Publicis Groupe SA, 2012 WL 1446534 (S.D.N.Y. April 26, 2012), adopting the magistrate’s report – still rank right up there in terms of the quality of discussion. Last year, the same magistrate (Peck) who presided over Moore stated that, in the interim, “the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.” Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125, 127 (Mag. S.D.N.Y. 2015) (footnote omitted).
[W]hile I generally believe in cooperation, requesting parties can insure that training and review was done appropriately by other means, such as statistical estimation of recall at the conclusion of the review as well as by whether there are gaps in the production, and quality control review of samples from the documents categorized as non-responsive. . . . One point must be stressed − it is inappropriate to hold TAR to a higher standard than keywords or manual review. Doing so discourages parties from using TAR for fear of spending more in motion practice than the savings from using TAR for review.
Id. at 128-29 (footnote omitted). While Magistrate Peck has been an active proselytizer for predictive coding, see Da Silva Moore v. Publicis Groupe, 868 F. Supp. 2d 137, 140-46 (Mag. S.D.N.Y. 2012), adopted, slip op. (S.D.N.Y. Nov. 8, 2012) (unsuccessful recusal attempt based on that advocacy), he is by no means a lone voice in the wilderness any longer.
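For readers wondering what the “statistical estimation of recall” and “quality control review of samples from the documents categorized as non-responsive” mentioned in Rio Tinto might look like in practice, here is a rough sketch of one common approach, sometimes called an elusion test. Everything in it is our own illustration and assumption, not anything prescribed by the court: the function names, the choice of a Wilson score interval, and the numbers in the example are all hypothetical.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a sampled proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def estimate_recall(responsive_found, null_set_size, sample_size, sample_hits, z=1.96):
    """Estimate recall from a random sample of the discard ("null") set.

    responsive_found: responsive documents identified by the review
    null_set_size:    documents categorized as non-responsive
    sample_size:      documents randomly sampled from the null set
    sample_hits:      sampled documents that turned out to be responsive
    Returns (point_estimate, low, high).
    """
    if responsive_found <= 0:
        raise ValueError("need at least one responsive document found")
    elusion = sample_hits / sample_size        # share of the null set that is responsive
    missed = elusion * null_set_size           # estimated responsive documents left behind
    point = responsive_found / (responsive_found + missed)
    lo_rate, hi_rate = wilson_interval(sample_hits, sample_size, z)
    low = responsive_found / (responsive_found + hi_rate * null_set_size)
    high = responsive_found / (responsive_found + lo_rate * null_set_size)
    return point, low, high


# Hypothetical numbers: 9,000 responsive documents found, 100,000 discarded,
# and 15 responsive documents turned up in a 1,500-document sample of the discards.
point, low, high = estimate_recall(9000, 100000, 1500, 15)
print(f"estimated recall {point:.1%} (range {low:.1%} to {high:.1%})")
```

The point of the exercise is that a requesting party can check the end result with a modest sample, rather than litigating every step of how the software was trained.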
Rio Tinto relied to a significant degree on Dynamo Holdings Ltd. Partnership v. CIR, 2014 WL 4636526 (T.C. Sept. 17, 2014), in which the Tax Court approved the use of predictive coding to respond to a government subpoena for electronically stored information (“ESI”). The court rejected the government’s assertion that predictive coding was “unproven technology.” To the contrary,
Although predictive coding is a relatively new technique, and a technique that has yet to be sanctioned (let alone mentioned) by this Court in a published Opinion, the understanding of e-discovery and electronic media has advanced significantly in the last few years, thus making predictive coding more acceptable in the technology industry than it may have previously been. In fact, we understand that the technology industry now considers predictive coding to be widely accepted for limiting e-discovery to relevant documents and effecting discovery of ESI without an undue burden. Where, as here, petitioners reasonably request to use predictive coding to conserve time and expense, and represent to the Court that they will retain electronic discovery experts to meet with [the government’s] counsel or his experts to conduct a search acceptable to [the government], we see no reason petitioners should not be allowed to use predictive coding to respond to [the government’s] discovery request.
Id. at *5 (citations and footnotes omitted). Rule 1 – mandating that all civil rules “be construed to secure the just, speedy, and inexpensive determination of every case” – supported the use of predictive coding. Id. at *7.
Predictive coding was also invoked as a discovery best practice in Nat’l Day Laborer Organizing Network v. U.S. Immigration & Customs Enforcement Agency, 877 F. Supp.2d 87 (S.D.N.Y. 2012), and a reason for doubting the sufficiency of manually conducted discovery:
[P]arties can (and frequently should) rely on latent semantic indexing, statistical probability models, and machine learning tools to find responsive documents. Through iterative learning, these methods (known as “computer-assisted” or “predictive” coding) allow humans to teach computers what documents are and are not responsive to a particular FOIA or discovery request and they can significantly increase the effectiveness and efficiency of searches.
Id. at 109 (footnote omitted). See also Johnson v. Ford Motor Co., 2015 WL 4137707, at *11 (Mag. S.D.W. Va. July 8, 2015) (ordering parties “to consider other methods of searching such as predictive coding”), objections sustained in part and overruled in part on other grounds, 2015 WL 6758234 (S.D.W. Va. Nov. 5, 2015); Burnett v. Ford Motor Co., 2015 WL 4137847, at *11 (Mag. S.D.W. Va. July 8, 2015) (same); Burd v. Ford Motor Co., 2015 WL 4137915, at *11 (Mag. S.D.W. Va. July 8, 2015) (same); Malone v. Kantner Ingredients, Inc., 2015 WL 1470334, at *3 (D. Neb. March 31, 2015) (“Predictive coding is now promoted (and gaining acceptance) as not only a more efficient and cost effective method of ESI review, but a more accurate one.”) (citations omitted); In re Cellular Telephones, 2014 WL 7793690, at *9 (Mag. D. Kan. Dec. 30, 2014) (“sophisticated new techniques, including metadata filtering, predictive coding, and other forms of technology-assisted review, are immensely advantageous . . . in terms of efficiency”); Federal Housing Finance Agency v. HSBC North America Holdings Inc., 2014 WL 584300, at *3 (S.D.N.Y. Feb. 14, 2014) (“predictive coding had a better track record in the production of responsive documents than human review”); In re Search of Information Associated with Facebook Account Identified by Username Aaron.Alexis that is Stored at Premises Controlled by Facebook, Inc., 21 F. Supp.3d 1, 11 (D.D.C. 2013) (“there has been a sea change in the manner in which computers, which now contain enormous amounts of data, are searched with technology assisted review replacing other forms of searching, including the once thought gold standard of file-by-file and document-by-document review”) (footnote omitted); Harris v. Subcontracting Concepts, LLC, 2013 WL 951336, at *5 (Mag. N.D.N.Y. March 11, 2013) (“With the advent of software, predictive coding, spreadsheets, and similar advances, the time and cost to produce large reams of documents can be dramatically reduced.”).
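To make the “iterative learning” described in National Day Laborer a bit more concrete, here is a toy sketch of the loop: attorneys code a seed set, the software ranks the unreviewed documents, attorneys code the top-ranked batch, and the software retrains. This is our own simplified illustration using a bare-bones naive Bayes word model. The function names, the sample documents, and the `attorney_codes` stand-in for a human reviewer are all hypothetical, and no commercial TAR product should be assumed to work this way.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """labeled: list of (text, is_responsive). Returns per-class word and document counts."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, label in labeled:
        counts[label].update(tokenize(text))
        docs[label] += 1
    return counts, docs


def score(model, text):
    """Log-odds that a document is responsive (naive Bayes, add-one smoothing)."""
    counts, docs = model
    vocab_size = len(set(counts[True]) | set(counts[False])) + 1
    total_resp = sum(counts[True].values())
    total_non = sum(counts[False].values())
    s = math.log((docs[True] + 1) / (docs[False] + 1))
    for word in tokenize(text):
        s += math.log((counts[True][word] + 1) / (total_resp + vocab_size))
        s -= math.log((counts[False][word] + 1) / (total_non + vocab_size))
    return s


def review_rounds(seed, unreviewed, attorney_codes, rounds=3, batch=2):
    """Repeat: retrain, rank the pool, have a human code the top-ranked batch."""
    labeled = list(seed)
    pool = list(unreviewed)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        pool.sort(key=lambda t: score(model, t), reverse=True)
        batch_docs, pool = pool[:batch], pool[batch:]
        labeled.extend((t, attorney_codes(t)) for t in batch_docs)
    return train(labeled), labeled, pool


# Hypothetical seed set and document pool.
seed = [("hip implant metal wear complaint", True),
        ("cafeteria lunch menu friday", False)]
unreviewed = ["implant wear report", "lunch menu update",
              "metal debris complaint", "friday parking notice"]


def attorney_codes(text):
    # Stand-in for a human reviewer's relevance call.
    return "implant" in text or "metal" in text


model, labeled, remaining = review_rounds(seed, unreviewed, attorney_codes, rounds=1)
print([text for text, is_responsive in labeled if is_responsive])
```

Even in this toy version, the documents most like the attorney-coded responsive examples rise to the top of the review queue, which is where the efficiency gains the courts describe come from.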
As we mentioned before, plaintiffs are ill-advised to sit back and refuse to participate in discussions about predictive coding and then try to act as Monday-morning quarterbacks after the defendant has expended money, time, and effort in the process:
[Plaintiffs’] request that [defendant] go back to Square One . . . and institute predictive coding at that earlier stage sits uneasily with the proportionality standard in Rule 26(b)(2)(C). . . . It might well be that predictive coding, instead of a keyword search . . . would unearth additional relevant documents. But it would cost . . . millions of dollars to test [plaintiffs’] theory that predictive coding would produce a significantly greater number of relevant documents. Even in light of the needs of the hundreds of plaintiffs in this case, the very large amount in controversy, the parties’ resources, the importance of the issues at stake, and the importance of this discovery in resolving the issues, I can’t find that the likely benefits of the discovery proposed by [plaintiffs] equals or outweighs its additional burden on, and additional expense to, [defendant].
In re Biomet M2a Magnum Hip Implant Products Liability Litigation, 2013 WL 1729682, at *2-3 (N.D. Ind. Apr. 18, 2013) (citations omitted). See In re Biomet M2a Magnum Hip Implant Products Liability Litigation, 2013 WL 6405156, at *2 (N.D. Ind. Aug. 21, 2013) (rejecting further harassment by plaintiffs – no entitlement to irrelevant documents used in “seed set” to calibrate predictive coding).
Sometimes litigants that started document productions without predictive coding concluded that what they were doing was inordinately expensive and sought to switch to predictive coding in the midst of their productions. Such attempts to “switch horses in midstream” have led to conflicting results. In Bridgestone Americas, Inc. v. IBM Corp., 2014 WL 4923014 (Mag. M.D. Tenn. July 22, 2014), the court allowed such a switch, holding that “the uses of predictive coding is a judgment call, hopefully keeping in mind the exhortation of Rule 26 that discovery be tailored by the court to be as efficient and cost-effective as possible.” Id. at *1. The switch was conditioned on “openness and transparency,” such as turning over all of the “seed documents” (those used in the initial testing of predictive coding) to the other side. Id. Conversely, in Progressive Casualty Ins. Co. v. Delaney, 2014 WL 3563467 (Mag. D. Nev. July 18, 2014), the court would not allow one party to alter prior discovery stipulations and adopt predictive coding “without the Defendants’ agreement . . . and without seeking leave of the court.” Id. at *2. Still, the court agreed that predictive coding was “more accurate” than human review:
[T]he traditional ways lawyers have culled the universe of potentially responsive documents for production − manual human review, or keyword searches − are ineffective tools to cull responsive ESI in discovery. Predictive coding has emerged as a far more accurate means of producing responsive ESI in discovery. Studies show it is far more accurate than human review or keyword searches which have their own limitations.
Id. at *8 (citations omitted). “Had the parties worked with their e-discovery consultants and agreed at the onset of this case to a predictive coding-based ESI protocol, the court would not hesitate to approve a transparent, mutually agreed upon ESI protocol.” Id. at *9. However, the court would not countenance a unilateral attempt to impose a predictive coding regime while reneging on prior agreements – particularly when the proposal “fails to comply with all of the best practices” and “lacks transparency and cooperation.” Id. The takeaway here is to consider predictive coding from the get-go, and if things change, try cooperation first.
Finally, for the sake of completeness, here are some miscellaneous decisions indicating judicial approval of the use of predictive coding with respect to document (almost always ESI) discovery in particular cases: Ghorbanian v. Guardian Life Insurance Co., 2016 WL 1077251, at *2 (W.D. Wash. March 18, 2016) (citing to local rule encouraging use of TAR); Kissing Camels Surgery Center, LLC v. Centura Health Corp., 2016 WL 277721, at *4 (Mag. D. Colo. Jan. 22, 2016) (discussion indicating use of predictive coding); Knauf Insulation, LLC v. Johns Manville Corp., 2015 WL 7089725, at *3 (Mag. S.D. Ind. Nov. 13, 2015) (same); In re Domestic Drywall Antitrust Litigation, 300 F.R.D. 228, 233 (E.D. Pa. 2014) (recognizing predictive coding as a “more sophisticated methodolog[y]” that might be used); Green v. American Modern Home Insurance Co., 2014 WL 6668422, at *1 (W.D. Ark. Nov. 24, 2014) (case management order advising use of TAR); Arnett v. Bank of America, N.A., 2014 WL 4672458, at *9 (D. Or. Sept. 18, 2014) (discussion indicating use of predictive coding); In re Bridgepoint Education, Inc., 2014 WL 3867495, at *2-4 (Mag. S.D. Cal. Aug. 6, 2014) (discussion of discovery disputes indicating that both sides are using predictive coding); FDIC v. Bowden, 2014 WL 2548137, at *13 (Mag. S.D. Ga. June 6, 2014) (ordering parties to “consider” predictive coding; citing Delaney); Aurora Cooperative Elevator Co. v. Aventine Renewable Energy-Aurora W. LLC, slip op. at 1-2 (Mag. D. Neb. March 10, 2014) (requiring parties to use predictive coding); Chen-Oster v. Goldman, Sachs & Co., 2014 WL 716521, at *1 (Mag. S.D.N.Y. Feb. 18, 2014) (refusing to order production of irrelevant predictive coding seed documents); Hinterberger v. Catholic Health Systems, Inc., 2013 WL 2250591, at *34 (Mag. W.D.N.Y. May 21, 2013) (attempt to disqualify predictive coding expert); Hinterberger v. Catholic Health Systems, Inc., 2013 WL 2250603, at *3 (Mag. W.D.N.Y. May 21, 2013) (dispute about predictive coding details); Gordon v. Kaleida Health, 2013 WL 2250506, at *27-28 (Mag. W.D.N.Y. May 21, 2013) (attempt to disqualify predictive coding expert); Gordon v. Kaleida Health, 2013 WL 2250579, at *3 (Mag. W.D.N.Y. May 21, 2013) (dispute about predictive coding details); Edwards v. Nat’l Milk Producers Federation, slip op. (N.D. Cal. Apr. 16, 2013) (stipulation detailing predictive coding protocol); W Holding Co. v. Chartis Insurance Co., 2013 WL 1352562, at *5 (D.P.R. April 3, 2013) (requiring parties to confer about using TAR); In re Actos (Pioglitazone) Products Liability Litigation, 2012 WL 7861249, at *3 (W.D. La. July 27, 2012) (MDL case management order describing predictive coding protocol); EORHB, Inc. v. HOA Holdings LLC, 2013 WL 1960621, at *1 (Del. Ch. May 6, 2013) (predictive coding agreement vacated where number of documents too small for it to be cost effective); Global Aerospace Inc. v. Landow Aviation LP, 2012 WL 1431215 (Va. Cir. April 23, 2012) (order approving use of predictive coding).
We would like to acknowledge assistance from Reed Smith’s ediscovery “Red Team” with some of the preliminary research for this post.