Active Learning from Relative ComparisonsIEEE Trans. Knowl. Data Eng.

About

Authors
Sicheng Xiong, Yuanli Pei, Romer Rosales, Xiaoli Fern
Year
2015
DOI
10.1109/TKDE.2015.2462365
Subject
Computational Theory and Mathematics / Information Systems / Computer Science Applications

Similar

Epidemiological observations of old age

Authors:
Per From Hansen
1983

A comparison of two algorithms for robot learning from demonstration

Authors:
Halit Bener Suay, Sonia Chernova
2011

Putting Women First: Women and Health in a Rural Community

Authors:
Rahul Goswami (from the Foreword)
2011

Hidden Foreign Body as a Cause of Recurrent Hemoptysis in a Teenage Girl

Authors:
C.W. Pattison, A.J. Learning, E.R. Townsend
1988

Relative effects of posture and activity on human height estimation from surveillance footage

Authors:
Nerrolyn Ramstrand, Simon Ramstrand, Per Brolund, Kristin Norell, Peter Bergström
2011

Text

1041-4347 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2015.2462365, IEEE Transactions on Knowledge and Data Engineering 1

Active Learning from Relative Comparisons

Sicheng Xiong, Yuanli Pei, Ro´mer Rosales, and Xiaoli Z. Fern

Abstract—This work focuses on active learning from relative comparison information. A relative comparison specifies, for a data triplet (xi, xj , xk), that instance xi is more similar to xj than to xk. Such constraints, when available, have been shown to be useful toward learning tasks such as defining appropriate distance metrics or finding good clustering solutions. In real-world applications, acquiring constraints often involves considerable human effort, as it requires the user to manually inspect the instances. This motivates us to study how to select and query the most useful relative comparisons to achieve effective learning with minimum user effort. Given an underlying class concept that is employed by the user to provide such constraints, we present an information-theoretic criterion that selects the triplet whose answer leads to the highest expected information gain about the classes of a set of examples. Directly applying the proposed criterion requires examining O(n3) triplets with n instances, which is prohibitive even for datasets of moderate size. We show that a randomized selection strategy can be used to reduce the selection pool from O(n3) to O(n) with minimal loss in efficiency, allowing us to scale up to considerably larger problems. Experiments show that the proposed method consistently outperforms baseline policies.

Index Terms—Active Learning, Relative Comparisons. ✦ 1 INTRODUCTION

R ECENT work has shown that domain knowledge in the formof relative comparisons, which reveal relative similarities among instances, can be useful for many learning tasks such as distance metric learning and data clustering [1], [6], [12], [14], [15], [17], [19]. Specifically, a relative comparison regarding instance triplets (xi,xj ,xk) specifies whether instance xi is more similar to xj than xk . Such relative comparisons can provide constraints or hints that are approximately consistent with the underlying distance and can help learning an appropriate distance metric. They can also help pointing out the instance-to-cluster assignment relations and improve clustering results.

While in some scenarios one can acquire relative comparisons automatically (e.g., using user click-through data in information retrieval tasks [19]), for many real applications, acquiring such comparisons requires manual inspection of the instances, which is both label intensive and time consuming. For example, in one of our applications, we want to learn a distance metric that best categorizes different vocalizations of birds into their species.

Providing a relative comparison constraint in this case will require a human to examine the bird vocalizations either by visually inspecting their spectrograms or listening to the audio segments, which costs a lot of human effort. This motivates us to study the active learning problem to optimize the selection of relative comparisons in order to achieve effective metric learning with minimum user effort. To the best of our knowledge, existing studies have only considered randomly or sub-optimally selected triplets, and there has been no attempt to optimize the triplet selection for metric learning tasks. • This work was done while Sicheng Xiong was a student at the School of

Electrical Engineering and Computer Science, Oregon State University.

E-mail: sichengxiong@gmail.com. • Yuanli Pei and Xiaoli Z. Fern are with School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331,

USA. E-mail: {peiy, xfern}@eecs.oregonstate.edu. • Ro´mer Rosales is with LinkedIn, Mountain View, CA 94043, USA.

E-mail:rrosales@linkedin.com.

Manuscript received April 16, 2015;

In this paper, we study the problem of active learning from relative comparisons. We consider the scenario where the relative comparisons are obtained by iteratively querying an oracle, and the answers are returned in a way that instances from the same class are considered more similar than those from different classes. We study this scenario because in practice the class concept is often employed by a user to answer such queries. Our goal is to learn an appropriate distance metric by iteratively posing such queries.

Here we focus on learning a distance metric simply for the reason that relative comparisons have been mostly applied to distance metric learning tasks.

In this work, we propose an information theoretic objective that selects a query whose answer will, in expectation, lead to the largest information gain about the classes of the instances.

While the proposed objective can be used with any metric learning algorithm that considers relative comparisons, a main obstacle is the need to evaluate O(n3) triplets to select the optimal query, where n is the number of instances. To reduce this complexity for selecting a triplet, we introduce a simple yet effective sampling procedure that is guaranteed to identify quasi-optimal triplets in

O(n) with clear performance guarantees. Our experimental results show that the triplets selected by the proposed active learning method allow for: 1) learning better metrics than those selected by two different baseline policies, and 2) increasing the classification accuracy of nearest neighbor classifiers. 2 RELATED WORK

Learning with relative comparisons. Learning with relative comparisons has been studied in the contexts of distance metric learning, data clustering and data dimension reduction [12], [14], [15], [17], [19], [23]. Schultz et al. [19] formulated a constrained optimization problem where the constraints are defined by relative comparisons and the objective is to learn a distance metric that remains as close to an unweighted Euclidean metric as possible. Rosales et al. [17] proposed to learn a projection matrix from relative comparisons. This approach also employed relative 1041-4347 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.