What is the Value of Data? on Mathematical Methods for Data Quality Estimation

  • Netanel Raviv
  • , Siddharth Jain
  • , Jehoshua Bruck

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a given dataset. We assess a dataset's quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain it, and has recently found applications in active learning. We focus on Boolean hyperplanes, and utilize a collection of Fourier analytic, algebraic, and probabilistic methods to come up with theoretical guarantees and practical solutions for the computation of the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.

Original languageEnglish
Title of host publication2020 IEEE International Symposium on Information Theory, ISIT 2020 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2825-2830
Number of pages6
ISBN (Electronic)9781728164328
DOIs
StatePublished - Jun 2020
Event2020 IEEE International Symposium on Information Theory, ISIT 2020 - Los Angeles, United States
Duration: Jul 21 2020Jul 26 2020

Publication series

NameIEEE International Symposium on Information Theory - Proceedings
Volume2020-June
ISSN (Print)2157-8095

Conference

Conference2020 IEEE International Symposium on Information Theory, ISIT 2020
Country/TerritoryUnited States
CityLos Angeles
Period07/21/2007/26/20

Fingerprint

Dive into the research topics of 'What is the Value of Data? on Mathematical Methods for Data Quality Estimation'. Together they form a unique fingerprint.

Cite this