TY - JOUR
T1 - Large language models as tax attorneys: a case study in legal capabilities emergence
AU - Nay, John J.
AU - Karamardian, David
AU - Lawsky, Sarah B.
AU - Tao, Wenting
AU - Bhat, Meghana
AU - Jain, Raghav
AU - Lee, Aaron Travis
AU - Choi, Jonathan H.
AU - Kasai, Jungo
N1 - Publisher Copyright:
© 2024 The Authors.
PY - 2024/4/15
Y1 - 2024/4/15
N2 - Better understanding of Large Language Models' (LLMs) legal analysis abilities can contribute to improving the efficiency of legal services, governing artificial intelligence and leveraging LLMs to identify inconsistencies in law. This paper explores LLM capabilities in applying tax law. We choose this area of law because it has a structure that allows us to set up automated validation pipelines across thousands of examples, requires logical reasoning and maths skills, and enables us to test LLM capabilities in a manner relevant to real-world economic lives of citizens and companies. Our experiments demonstrate emerging legal understanding capabilities, with improved performance in each subsequent OpenAI model release. We experiment with retrieving and using the relevant legal authority to assess the impact of providing additional legal context to LLMs. Few-shot prompting, presenting examples of question-answer pairs, is also found to significantly enhance the performance of the most advanced model, GPT-4. The findings indicate that LLMs, particularly when combined with prompting enhancements and the correct legal texts, can perform at high levels of accuracy but not yet at expert tax lawyer levels. As LLMs continue to advance, their ability to reason about law autonomously could have significant implications for the legal profession and AI governance. This article is part of the theme issue 'A complexity science approach to law and governance'.
KW - artificial intelligence
KW - computational law
KW - large language models
KW - law informs code
KW - law-informed AI
KW - machine learning
UR - https://www.scopus.com/pages/publications/85186140789
U2 - 10.1098/rsta.2023.0159
DO - 10.1098/rsta.2023.0159
M3 - Article
C2 - 38403061
AN - SCOPUS:85186140789
SN - 1364-503X
VL - 382
JO - Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
JF - Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
IS - 2270
M1 - 20230159
ER -