Interrogating LLM design under copyright law

  • Johnny Tian-Zheng Wei
  • Maggie Wang
  • Ameya Godbole
  • Jonathan Choi
  • Robin Jia

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

    Abstract

    The current discourse on large language models (LLMs) and copyright largely takes a "behavioral" perspective, focusing on model outputs and evaluating whether they are substantially similar to training data. However, substantial similarity is difficult to define algorithmically and a narrow focus on model outputs is insufficient to address all copyright risks. In this interdisciplinary work, we take a complementary "structural" perspective and shift our focus to how LLMs are trained. We operationalize a notion of "fair learning" by measuring whether any training decision substantially affected the model's memorization. As a case study, we deconstruct Pythia, an open-source LLM, and demonstrate the use of causal and correlational analyses to make factual determinations about Pythia's training decisions. By proposing a legal standard for fair learning and connecting memorization analyses to this standard, we identify how judges may advance the goals of copyright law through adjudication. Finally, we discuss how a fair learning standard might evolve to enhance its clarity by becoming more rule-like and incorporating external technical guidelines.

    Original language: English
    Title of host publication: ACM FAccT 2025 - Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency
    Publisher: Association for Computing Machinery, Inc
    Pages: 3030-3045
    Number of pages: 16
    ISBN (Electronic): 9798400714825
    DOIs
    State: Published - Jun 23 2025
    Event: 8th Annual ACM Conference on Fairness, Accountability, and Transparency, FAccT 2025 - Athens, Greece
    Duration: Jun 23 2025 to Jun 26 2025

    Publication series

    Name: ACM FAccT 2025 - Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency

    Conference

    Conference: 8th Annual ACM Conference on Fairness, Accountability, and Transparency, FAccT 2025
    Country/Territory: Greece
    City: Athens
    Period: 06/23/25 to 06/26/25

    Keywords

    • Copyright
    • LLMs
    • regression
