On June 22nd, 2023, two lawyers and their firm were fined for citing fake court cases in legal proceedings. The lawyers didn't invent the fake case law on purpose – the citations were hallucinated by ChatGPT.
How did this happen? Well, the lawyer who wrote the filing that cited the fake case law said he was under the impression that ChatGPT was a "super search engine" that would only source real information (he also argued that he was not great with new technology). The other lawyer who was sanctioned admitted that while he was working on the case, he didn't actually review his colleague's work, instead just assuming that it had been fact-checked and was correct.
There’s a lot happening in this story, and while it’s a particularly egregious case of LLM misuse, it’s a great example of some missteps that happen all the time when adopting AI. If we could go back in time and play puppeteer, here are some things we’d change about this law firm’s approach.
When It Really Matters, Use AI With Guardrails
ChatGPT can feel like a "super search engine," but it isn't one – it's a language model that generates plausible-sounding text, and it's still prone to making things up. While it's great to harness the power of an LLM, accuracy is paramount, especially when dealing with something as sensitive as a lawsuit.
An AI agent built with retrieval augmented generation (RAG) is a great solution to this problem. RAG improves the output of an LLM-based tool by retrieving relevant passages from a verified, authoritative knowledge base and having the model answer from those passages rather than from memory. In this situation, a law firm could configure their agent (or whatever bot they're using) to source answers only from a database of real case law (like Westlaw).
There would still be setup, tuning, and evaluation work involved, but RAG is a great start toward ensuring that an LLM produces accurate, verifiable results.
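To make the idea concrete, here's a minimal sketch of the RAG pattern in Python. Everything in it is illustrative: the tiny in-memory "case law" corpus, the naive keyword-overlap retriever (a real system would use vector embeddings and an actual legal database), and the final LLM call, which is left as a comment because the model endpoint would depend on whatever provider the firm uses.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Illustrative only: the corpus and retriever are toy stand-ins for a real
# legal knowledge base and a production retrieval pipeline.

def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Rank documents by naive keyword overlap with the query.
    A production system would use vector embeddings and a verified database."""
    query_terms = set(query.lower().split())
    scored = [
        (len(query_terms & set(text.lower().split())), name, text)
        for name, text in corpus.items()
    ]
    scored.sort(reverse=True)
    return [(name, text) for score, name, text in scored[:k] if score > 0]


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Constrain the model to answer only from retrieved, verified passages."""
    context = "\n\n".join(f"[{name}]\n{text}" for name, text in passages)
    return (
        "Answer the question using ONLY the case law excerpts below. "
        "Cite the case name for every claim. If the excerpts do not contain "
        "the answer, say you cannot answer instead of guessing.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Toy stand-in for a verified knowledge base of real case law.
    case_law = {
        "Example v. Sample (2010)": "Holds that contract formation requires mutual assent ...",
        "Sample v. Demo (2015)": "Addresses the standard for summary judgment ...",
    }
    question = "What is required for mutual assent in contract formation?"
    passages = retrieve(question, case_law)
    if not passages:
        print("No supporting case law found; escalate to a human researcher.")
    else:
        prompt = build_prompt(question, passages)
        print(prompt)
        # In production, this prompt would be sent to the firm's LLM provider,
        # e.g. answer = call_llm(prompt)
```

The important design choice here is the prompt contract: the model is only allowed to answer from retrieved, verified text and must cite its sources, and when nothing relevant is retrieved, the question goes to a human – exactly the guardrail that was missing in the original filing.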
Make Time to Double Check
ChatGPT failed in this case, but the humans failed too, and arguably in a bigger way. The attorney writing the filing should have checked case law, but his partner should have checked too. This is pretty obvious in hindsight, and it’s easy to make fun of their mistake. But double checking is sometimes easier said than done.
It's important to regularly assess the situations in which LLMs are used. Are they used for basic, time-saving tasks? Or are workers turning to them because they don't have adequate time to perform their own specialized work? Do teams have time to double check, or are they working in a pressure cooker where a full review isn't realistic? While we don't know what working at this law firm is like, it's easy to see how culture issues like these could create problems in other environments where LLMs are being used.
Implementing AI in a way that is useful, helpful, and functional requires careful attention to technical details as well as to the culture of the organization adopting it. Thinking through these challenges in advance is vital to avoiding fiascos like this one.