Ranking and combining multiple predictors without labeled data
Authors: Fabio Parisi, Francesco Strino, Boaz Nadler, Yuval Kluger
Institution: aDepartment of Pathology, Yale University School of Medicine, New Haven, CT 06520; bDepartment of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; and cNew York University Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, New York, NY 10016
Abstract: In a broad range of classification and decision-making problems, one is given the advice or predictions of several classifiers, of unknown reliability, over multiple questions or queries. This scenario is different from the standard supervised setting, where each classifier’s accuracy can be assessed using available labeled data, and raises two questions: Given only the predictions of several classifiers over a large set of unlabeled test data, is it possible to (i) reliably rank them and (ii) construct a metaclassifier more accurate than most classifiers in the ensemble? Here we present a spectral approach to address these questions. First, assuming conditional independence between classifiers, we show that the off-diagonal entries of their covariance matrix correspond to a rank-one matrix. Moreover, the classifiers can be ranked using the leading eigenvector of this covariance matrix, because its entries are proportional to their balanced accuracies. Second, via a linear approximation to the maximum likelihood estimator, we derive the Spectral Meta-Learner (SML), an unsupervised ensemble classifier whose weights are equal to these eigenvector entries. On both simulated and real data, SML typically achieves a higher accuracy than most classifiers in the ensemble and can provide a better starting point than majority voting for estimating the maximum likelihood solution. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to veer the ensemble prediction away from the (unknown) ground truth.

Every day, multiple decisions are made based on input and suggestions from several sources, either algorithms or advisers, of unknown reliability. Investment companies handle their portfolios by combining reports from several analysts, each providing recommendations on buying, selling, or holding multiple stocks (1, 2). Central banks combine surveys of several professional forecasters to monitor rates of inflation, real gross domestic product growth, and unemployment (3–6). Biologists study the genomic binding locations of proteins by combining or ranking the predictions of several peak detection algorithms applied to large-scale genomics data (7). Physician tumor boards convene a number of experts from different disciplines to discuss patients whose diseases pose diagnostic and therapeutic challenges (8). Peer-review panels discuss multiple grant applications and make recommendations to fund or reject them (9). The examples above describe scenarios in which several human advisers or algorithms provide their predictions or answers to a list of queries or questions. A key challenge is to improve decision making by combining these multiple predictions of unknown reliability. Automating this process of combining multiple predictors is an active field of research in decision science (cci.mit.edu/research), medicine (10), business (refs. 11 and 12 and www.kaggle.com/competitions), and government (www.iarpa.gov/Programs/ia/ACE/ace.html and www.goodjudgmentproject.com), as well as in statistics and machine learning.

Such scenarios, whereby advisers of unknown reliability provide potentially conflicting opinions, or propose to take opposite actions, raise several interesting questions. How should the decision maker proceed to identify who, among the advisers, is the most reliable? Moreover, is it possible for the decision maker to cleverly combine the collection of answers from all of the advisers and provide even more accurate answers?

In statistical terms, the first question corresponds to the problem of estimating prediction performances of preconstructed classifiers (e.g., the advisers) in the absence of class labels. Namely, each classifier was constructed independently on a potentially different training dataset (e.g., each adviser trained on his/her own using possibly different sources of information), yet they are all being applied to the same new test data (e.g., list of queries) for which labels are not available, either because they are expensive to obtain or because they will only be available in the future, after the decision has been made. In addition, the accuracy of each classifier on its own training data is unknown. This scenario is markedly different from the standard supervised setting in machine learning and statistics. There, classifiers are typically trained on the same labeled data and can be ranked, for example, by comparing their empirical accuracy on a common labeled validation set. In this paper we show that, under standard assumptions of independence between classifier errors, their unknown performances can still be ranked even in the absence of labeled data.

The second question raised above corresponds to the problem of combining predictions of preconstructed classifiers to form a metaclassifier with improved prediction performance. This problem arises in many fields, including combination of forecasts in decision science and crowdsourcing in machine learning, which have each derived different approaches to address it. If we had external knowledge or historical data to assess the reliability of the available classifiers, we could use well-established solutions relying on panels of experts or forecast combinations (11–14). In our problem such knowledge is not always available, and thus these solutions are in general not applicable. The oldest solution that does not require additional information is majority voting, whereby the predicted class label is determined by a rule of majority, with all advisers assigned the same weight. More recently, iterative likelihood maximization procedures, pioneered by Dawid and Skene (15), have been proposed, in particular in crowdsourcing applications (16–23). Owing to the nonconvexity of the likelihood function, these techniques often converge only to a local, rather than global, maximum and require careful initialization. Furthermore, there are typically no guarantees on the quality of the resulting solution.

In this paper we address these questions via a spectral analysis that yields four major insights:
  1. Under standard assumptions of independence between classifier errors, in the limit of an infinite test set, the off-diagonal entries of the population covariance matrix of the classifiers correspond to a rank-one matrix.
  2. The entries of the leading eigenvector of this rank-one matrix are proportional to the balanced accuracies of the classifiers. Thus, a spectral decomposition of this rank-one matrix provides a computationally efficient approach to rank the performances of an ensemble of classifiers (restated in formulas after this list).
  3. A linear approximation of the maximum likelihood estimator yields an ensemble learner whose weights are proportional to the entries of this eigenvector. This represents an efficient, easily constructed, unsupervised ensemble learner, which we term Spectral Meta-Learner (SML); a code sketch follows at the end of this section.
  4. An interest group of conspiring classifiers (a cartel) that maliciously attempts to veer the overall ensemble solution away from the (unknown) ground truth leads to a rank-two covariance matrix. Furthermore, in contrast to majority voting, SML is robust to the presence of a small-enough cartel whose members are unknown.
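To make the first two insights concrete, here is a minimal sketch of the underlying formulas, assuming the standard binary setup (labels y ∈ {−1, +1}, class imbalance b = Pr(y = 1) − Pr(y = −1), and balanced accuracy π_i for classifier f_i); the notation is introduced here only to restate the claims above. Under conditional independence of classifier errors, the population covariance of the predictions satisfies, for i ≠ j,

\[ \operatorname{Cov}(f_i, f_j) \;=\; (1 - b^2)\,(2\pi_i - 1)\,(2\pi_j - 1), \]

so the off-diagonal entries agree with those of a rank-one matrix \( R = \lambda v v^{\top} \) whose leading eigenvector has entries \( v_i \propto 2\pi_i - 1 \). Sorting these entries ranks the classifiers by balanced accuracy, and the SML of insight 3 is the weighted vote

\[ \hat{y}_{\mathrm{SML}}(x) \;=\; \operatorname{sign}\Bigl( \textstyle\sum_i \hat{v}_i \, f_i(x) \Bigr), \]

with \( \hat{v} \) estimated from the sample covariance of the classifiers’ predictions on the unlabeled test data.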
In addition, we demonstrate the advantages of spectral approaches based on these insights, using both simulated and real-world datasets. When the independence assumptions hold approximately, SML is typically better than most classifiers in the ensemble and their majority vote, achieving results comparable to the maximum likelihood estimator (MLE). Empirically, we find SML to be a better starting point for computing the MLE, one that consistently leads to improved performance. Finally, spectral approaches are also robust to cartels and are therefore helpful in analyzing surveys where a biased subgroup of advisers (a cartel) may have corrupted the data.
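To illustrate how these pieces fit together in practice, the following is a minimal, unofficial Python sketch of an SML-style pipeline; the function name, the iterative diagonal-completion heuristic, and the sign convention are our own choices, not code from the paper.

    import numpy as np

    def spectral_meta_learner(F, n_iter=20):
        # F: (m, n) array of predictions in {-1, +1}
        # (m classifiers, n unlabeled test instances).
        Q = np.cov(F)                        # m x m sample covariance
        R = Q.copy()
        for _ in range(n_iter):
            # Re-estimate the diagonal so that R matches a rank-one
            # matrix on its off-diagonal entries (alternating fit).
            vals, vecs = np.linalg.eigh(R)
            lead = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
            np.fill_diagonal(R, lead ** 2)
        vals, vecs = np.linalg.eigh(R)
        v = vecs[:, -1]                      # entries track balanced accuracies
        if v.sum() < 0:                      # resolve the global sign, assuming
            v = -v                           # most classifiers beat random guessing
        y_hat = np.sign(v @ F)               # SML: eigenvector-weighted vote
        return v, y_hat

Majority voting corresponds to replacing v with the all-ones vector; the data-driven weights are what allow such a learner to down-weight weak classifiers and, per insight 4, a small-enough cartel. The resulting y_hat can also serve as the initialization for an iterative MLE procedure of the Dawid and Skene type mentioned above.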
Keywords: spectral analysis; classifier balanced accuracy; unsupervised learning; cartels; crowdsourcing