An In-Depth Investigation of Data Collection in LLM App Ecosystems

  • Yuhao Wu
  • Evin Jaff
  • Ke Yang
  • Ning Zhang
  • Umar Iqbal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

LLM app (tool) ecosystems are rapidly evolving to support sophisticated use cases that often require extensive user data collection. Because LLM apps are developed by third parties, and anecdotal evidence indicates that LLM platforms enforce their policies inconsistently, sharing user data with these apps presents significant privacy risks. In this paper, we aim to bring transparency to the data practices of LLM app ecosystems, using OpenAI's GPT app ecosystem as a case study. We propose an LLM-based framework to analyze the natural language specifications of GPT Actions (custom tools) and assess their data collection practices. Our analysis reveals that Actions collect excessive data across 24 categories and 145 data types, with third-party Actions collecting 6.03% more data on average. We find that several Actions violate OpenAI's policies by collecting sensitive information, such as passwords, which OpenAI explicitly prohibits. Lastly, we develop an LLM-based privacy policy analysis framework to automatically check whether the data collection by Actions is consistent with the disclosures in their privacy policies. Our measurements indicate that disclosures are omitted for most of the collected data types, with only 5.8% of Actions clearly disclosing their data collection practices.

Original language: English
Title of host publication: IMC 2025 - Proceedings of the 2025 ACM Internet Measurement Conference
Publisher: Association for Computing Machinery
Pages: 150-170
Number of pages: 21
ISBN (Electronic): 9798400718601
State: Published - Oct 15 2025
Event: 25th ACM Internet Measurement Conference, IMC 2025 - Madison, United States
Duration: Oct 31 2025 → Oct 31 2025

Publication series

Name: Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC
Volume: Part of 213823
ISSN (Print): 2150-3761

Conference

Conference: 25th ACM Internet Measurement Conference, IMC 2025
Country/Territory: United States
City: Madison
Period: 10/31/25 → 10/31/25

Keywords

  • large language models
  • llm platforms
  • llm tools
  • privacy
  • security
  • third-party applications
