TY - JOUR
T1 - Cost effective active search
AU - Jiang, Shali
AU - Moseley, Benjamin
AU - Garnett, Roman
N1 - Publisher Copyright:
© 2019 Neural information processing systems foundation. All rights reserved.
PY - 2019
Y1 - 2019
N2 - We study a paradigm of active learning we call cost effective active search, where the goal is to find a given number of positive points from a large unlabeled pool with minimum labeling cost. Most existing methods solve this problem heuristically, and few theoretical results have been established. Here we adopt a principled Bayesian approach for the first time. We first derive the Bayesian optimal policy and establish a strong hardness result: the optimal policy is hard to approximate, with the best-possible approximation ratio bounded below by ?(n0.16). We then propose an efficient and nonmyopic policy, simulating future search progress using the negative Poisson binomial distribution. We propose simple and fast approximations for its expectation, which serves as an essential role in our proposed policy. We conduct comprehensive experiments on drug and materials discovery datasets and demonstrate that our proposed method is superior to a popular (greedy) baseline.
AB - We study a paradigm of active learning we call cost effective active search, where the goal is to find a given number of positive points from a large unlabeled pool with minimum labeling cost. Most existing methods solve this problem heuristically, and few theoretical results have been established. Here we adopt a principled Bayesian approach for the first time. We first derive the Bayesian optimal policy and establish a strong hardness result: the optimal policy is hard to approximate, with the best-possible approximation ratio bounded below by ?(n0.16). We then propose an efficient and nonmyopic policy, simulating future search progress using the negative Poisson binomial distribution. We propose simple and fast approximations for its expectation, which serves as an essential role in our proposed policy. We conduct comprehensive experiments on drug and materials discovery datasets and demonstrate that our proposed method is superior to a popular (greedy) baseline.
UR - http://www.scopus.com/inward/record.url?scp=85090170091&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85090170091
SN - 1049-5258
VL - 32
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019
Y2 - 8 December 2019 through 14 December 2019
ER -