| Author | Title | Year | Journal/Proceedings | Reftype | DOI/URL |
|---|---|---|---|---|---|
| Eckstein, J. & Goldberg, N. | An Improved Branch-and-bound Method for Maximum Monomial Agreement | 2008 | OPT 2008 Optimization for Machine Learning, NIPS 2008 Workshop. | conference | URL |
| Abstract: The $NP$-hard maximum monomial agreement (MMA) problem consists of finding a single logical conjunction that best fits a weighted dataset of ``positive'' and ``negative'' binary vectors. Computing classifiers using boosting methods involves a maximum agreement subproblem at each iteration, although such subproblems are typically solved by heuristic methods. Here, we describe an exact branch and bound method for maximum agreement over Boolean monomials, improving on the earlier work of Goldberg and Shan. In particular, we develop a tighter upper bounding function and an improved branching procedure that exploits knowledge of the bound and the dataset, while having a lower branching factor. Experimental results show that the new method is able to solve larger problem instances and runs faster within a linear programming boosting procedure applied to medium-sized datasets from the UCI repository. |
|||||
BibTeX:
@conference{EG08,
author = {Jonathan Eckstein and Noam Goldberg},
title = {An Improved Branch-and-bound Method for Maximum Monomial Agreement},
booktitle = {OPT 2008 Optimization for Machine Learning, NIPS 2008 Workshop.},
year = {2008},
url = {http://opt2008.kyb.tuebingen.mpg.de/eckstein.pdf}
}
|
|||||
| Goldberg, N., Kolda, T. G. & Yoshimura, A. S. | Concurrent Optimization with DUET: DIRECT Using External Trial Points | 2008 | techreport | ||
| Abstract: We consider the general problem of minimizing a real function subject to bound constraints. We are motivated by applications in which the derivatives of the objective function are unavailable or expensive to compute. This can occur, for example, when the function is not given analytically and is evaluated by running a simulation. The DIRECT algorithm by Jones, Pertunnen, and Stuckman is a global derivative-free op- timization algorithm. It is a deterministic algorithm that samples the space following a scheme that is justiable theoretically in case of Lipschitz continuous functions. In practice, DIRECT has proven to be ecient, with respect to the number of function evaluations, in nding a global minimum for a wide variety of functions. We extend the DIRECT algorithm to make use of additional free trial points that are made avail- able when running DIRECT simultaneously with other optimization algorithms. We compare several deterministic and randomized variants of the DIRECT algorithm that use the external trial points (DUET). We present experimental results with external trial points that are sampled at random to show an improvement of DUET over the performance of the DIRECT algorithm. |
|||||
BibTeX:
@techreport{GKY08,
author = {Noam Goldberg and Tamara G. Kolda and Ann S. Yoshimura},
title = {Concurrent Optimization with DUET: DIRECT Using External Trial Points},
year = {2008},
number = {SAND2008-5844}
}
|
|||||
| Goldberg, N. & Shan, C. | Boosting Optimal Logical Patterns | 2007 | Proceedings of the Seventh SIAM International Conference on Data Mining Edited by Chid Apte, Bing Liu, Srinivasan Parthasarathy, and David Skillicorn |
inproceedings | URL |
| Abstract: We consider the supervised learning of a binary clas- sifier from noisy observations. We use smooth boost- ing to linearly combine abstaining hypotheses, each of which maps a subcube of the attribute space to one of the two classes. We introduce a new branch-and-bound weak learner to maximize the agreement rate of each hypothesis. Dobkin et al. give an algorithm for maxi- mizing agreement with real-valued attributes. Our algorithm improves on the time complexity of Dobkin et al.’s as long as the data can be binarized so that the number of binary attributes is o(log of the number of observations × number of real-valued attributes). Fur- thermore, we have fine-tuned our branch-and-bound al- gorithm with a queuing discipline and optimality gap to make it fast in practice. Finally, since logical patterns in Hammer et al.’s Logical Analysis of Data (LAD) frame- work are equivalent to abstaining monomial hy- potheses, any boosting algorithm can be combined with our proposed weak learner to construct LAD models. On various data sets, our method outperforms state-of- the-art methods that use suboptimal or heuristic weak learners, such as SLIPPER. It is competitive with other optimizing classifiers that combine monomials, such as LAD. Compared to LAD, our method eliminates many free parameters that restrict the hypothesis space and require extensive fine-tuning by cross-validation. |
|||||
BibTeX:
@inproceedings{GS07,
author = {Noam Goldberg and Chung--chieh Shan},
title = {Boosting Optimal Logical Patterns},
booktitle = {Proceedings of the Seventh SIAM International Conference on Data Mining |
|||||
| Goldberg, N., Word, J., Boros, E. & Kantor, P. | Optimal Sequential Inspection Policies | 2008 | techreport | URL | |
| Abstract: We consider the problem of combining a given set of diagnostic tests into an inspection system that can classify items of interest (cases) with maximum accuracy subject to budget constraints. One motivating application is sequencing diagnostic tests for container inspection problems, where the diagnostic tests may correspond to radiation sensors, documents checks, or imaging systems. We consider mixtures of decision trees as inspection systems following the work of Boros, Fedzorah, Kantor, Saeger and Stroud. We establish some properties of efficient inspection systems. Given an available set of elementary or compound tests (all are called "devices"), we characterize the optimal classification of cases, based on the the various "readings" or test scores given by the device. More generally, we consider the problem of optimally assigning a set of cases to a set of possible actions, based on the device ``readings'', while satisfying a budget constraint. The measure of performance is the fraction of all cases, in a specific class of interest, which are classified correctly. We find that this problem is a special case of a known variant of the knapsack problem, known as the Linear Multiple Choice Knapsack problem (LMCK). Exploiting the special features that we establish, we propose an algorithm that solves our special variant more efficiently than the existing LMCK algorithms. Finally, we propose a dynamic programming algorithm that enumerates all of the efficient, undominated, inspection policies in a two dimensional cost-detection space. Our inspection policies may sequence any arbitrary number of tests and are not restricted in the branching factor (or number of channels). Our approach directly solves the bi-criterion optimization problem of maximizing detection and minimizing cost, and thus supports sensitivity analysis over different budget or detection requirements, and the optimization of any monotone (possibly nonlinear) utility function over the efficient set. | |||||
BibTeX:
@techreport{GWBK08,
author = {Noam Goldberg and Jonathan Word and Endre Boros and Paul Kantor},
title = {Optimal Sequential Inspection Policies},
year = {2008},
number = {RRR-14},
note = {Also avaiable as DIMACS technical report},
url = {http://rutcor.rutgers.edu/pub/rrr/reports2008/14_2008.pdf}
}
|
|||||
Created by JabRef on 25/11/2008.