TY - JOUR
T1 - A scalable entropy estimator
AU - Golia, Priyanka
AU - Juba, Brendan
AU - Meel, Kuldeep S.
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
PY - 2025
Y1 - 2025
N2 - We revisit the well-studied problem of estimating the Shannon entropy of a probability distribution, now given access to a probability-revealing conditional sampling oracle. In this model, the oracle takes as input the representation of a set S, and returns a sample from the distribution obtained by conditioning on S, together with the probability of that sample in the distribution. Our work is motivated by applications of such algorithms in Quantitative Information Flow analysis (QIF) in programming-language-based security. Here, information-theoretic quantities capture the effort required on the part of an adversary to obtain access to confidential information. These applications demand accurate measurements when the entropy is small. Existing algorithms that do not use conditional samples require a number of queries that scales inversely with the entropy, which is unacceptable in this regime. On the other hand, prior work in the conditional sampling model only obtained a high-order polynomial query complexity, O(m^7/ε^8 · log(1/δ)) queries, to obtain additive ε-approximations on a domain of size O(2^m); note, furthermore, that additive approximations are also unacceptable for such applications. No prior work could obtain polynomial-query multiplicative approximations to the entropy in the low-entropy regime. We obtain multiplicative (1 + ε)-approximations using only O(m/ε^2 · log(1/δ)) queries to the probability-revealing conditional sampling oracle. Moreover, we obtain small, explicit constants, and demonstrate that our algorithm yields a substantial improvement in practice over the previous state-of-the-art methods used for entropy estimation in QIF.
AB - We revisit the well-studied problem of estimating the Shannon entropy of a probability distribution, now given access to a probability-revealing conditional sampling oracle. In this model, the oracle takes as input the representation of a set S, and returns a sample from the distribution obtained by conditioning on S, together with the probability of that sample in the distribution. Our work is motivated by applications of such algorithms in Quantitative Information Flow analysis (QIF) in programming-language-based security. Here, information-theoretic quantities capture the effort required on the part of an adversary to obtain access to confidential information. These applications demand accurate measurements when the entropy is small. Existing algorithms that do not use conditional samples require a number of queries that scales inversely with the entropy, which is unacceptable in this regime. On the other hand, prior work in the conditional sampling model only obtained a high-order polynomial query complexity, O(m^7/ε^8 · log(1/δ)) queries, to obtain additive ε-approximations on a domain of size O(2^m); note, furthermore, that additive approximations are also unacceptable for such applications. No prior work could obtain polynomial-query multiplicative approximations to the entropy in the low-entropy regime. We obtain multiplicative (1 + ε)-approximations using only O(m/ε^2 · log(1/δ)) queries to the probability-revealing conditional sampling oracle. Moreover, we obtain small, explicit constants, and demonstrate that our algorithm yields a substantial improvement in practice over the previous state-of-the-art methods used for entropy estimation in QIF.
KW - Entropy estimation
KW - Information leakage
KW - Quantified information flow
KW - Shannon entropy
UR - http://www.scopus.com/inward/record.url?scp=85217394042&partnerID=8YFLogxK
U2 - 10.1007/s10703-024-00467-w
DO - 10.1007/s10703-024-00467-w
M3 - Article
AN - SCOPUS:85217394042
SN - 0925-9856
JO - Formal Methods in System Design
JF - Formal Methods in System Design
ER -