Identification of Threshold, Regular and Submodular Monotone Systems:
Theory and Algorithms
Endre Boros (PI)
RUTCOR, Rutgers University
640 Bartholomew Road
Piscataway, NJ 08854-8003
Phone: (732) 445-3235
Fax: (732) 445-5472
Email: boros@rutcor.rutgers.edu
URL: http:\\rutcor.rutgers.edu\~boros
|
Vladimir Gurvich (Co-PI)
RUTCOR, Rutgers University
640 Bartholomew Road
Piscataway, NJ 08854-8003
Phone: (732) 445-3664
Fax: (732) 445-5472
Email: gurvich@rutcor.rutgers.edu
|
Leonid Khachiyan (Co-PI)
Computer Science Department
Rutgers University
110 Frelinghuysen Road
Piscataway, NJ 08854-8004
Phone: (732) 445-2003
Fax: (732) 445-0537
Email: leonid@cs.rutgers.edu
|
|
|
Khaled Elbassioni (Post-Doctoral Fellow)
RUTCOR, Rutgers University
640 Bartholomew Road
Piscataway, NJ 08854-8003
Phone: (732) 445-3664
Fax: (732) 445-5472
Email: elbassio@paul.rutgers.edu
|
Project Award Information
- Award Number: IIS-0118635
- Award Duration: December 1, 2001 - November 30, 2004
Keywords
Incremental generation,
representation and learning of monotone systems,
data mining, frequent and infrequent sets,
hypergraph transversals, Boolean dualization.
Project Summary
This project is aimed at the development of algorithms,
theory, and applications for the identification, including the
enumeration of all minimal representatives, of implicitly given
monotone systems. Monotone systems are a frequent target of
knowledge discovery as they represent important information
hidden in large data bases and complex networks. Problems
involving the determination of monotone systems arise naturally in
a multitude of areas, including data mining (finding all maximal
frequent, minimal infrequent sets), text mining (finding the best
linear query in vector space models), machine learning (finding
the best rules or patterns), hypergraph dualization (generating
all minimal transversals), reliability theory (generating all
minimal working and/or maximal failing states), integer
programming (generating all minimal feasible solutions),
stochastic programming (constructing deterministic equivalents to
certain stochastic models), etc. Both the determination, and
efficient and complete representation of such systems are the
subject of this project.
We study the problem of identifying implicitly given monotone systems, and building on several recent results we aim at developing a
coherent theory of such identification problems. A major part of our research effort focuses on
identifying tractable classes, developing efficient algorithms for those, and implementing and testing the resulting new methods. In particular we
study monotone systems definable via systems of threshold, regular
or submodular functions, since these classes seem to be the most
important building blocks of tractable classes.
Publications and Products
- E. Boros, K. Elbassioni, V. Gurvich, L. Khachiyan and K. Makino.
Dual-Bounded Generating Problems: Efficient and Inefficient points for Discrete Probability Distributions and Sparse Boxes for Multidimensional Data.
Submitted (October 6, 2003), Theoretical Computer Science, special issue, ed., G. Woeginger
- E. Boros, K. Elbassioni, V. Gurvich and L. Khachiyan.
On the Complexity of Some Enumeration Problems for Matroids.
Submitted (May 22, 2003), SIAM Journal on Discrete Mathematics.
- E. Boros, K. Elbassioni, V. Gurvich and L. Khachiyan.
Algorithms for Enumerating Cycles in Matroids.
Accepted, 14th Annual International Symposium on Algorithms and Computation (ISAAC 2003)
(Kyoto, Japan, December 2003.)
- E. Boros, K. Elbassioni, V. Gurvich and L. Khachiyan.
An Efficient Implementation of a Quasi-Polynomial Algorithm for Generating Hypergraph Transversals.
In: Algorithms -- ESA 2003 (11th Annual European Symposium on Algorithms), (G. D. Battista and U. Zwick, eds., Budapest, Hungary, September 15-19, 2003.),
Lecture Notes in Computer Science 2832 (2003) pp. 556-567.
(Springer Verlag, Berlin, Heidelberg, New York)
- E. Boros, K. Elbassioni, V. Gurvich, L. Khachiyan and K. Makino.
An Intersection Inequality for Discrete Distributions and Related Generation Problems.
In: Automata, Languages and Programming, 30th International Colloquium, {ICALP} 2003.
(J.C.M. Baeten and J.K. Lenstra and J. Parrow and G.J. Woeginger, eds., Eindhoven, The Netherlands, June 30 – July 4, 2003.)
Lecture Notes in Computer Science 2719 (2003) pp. 543-555.
(Springer Verlag, Berlin, Heidelberg, New York)
- E. Boros, K. Elbassioni, V. Gurvich and L. Khachiyan.
Extending the Balas-Yu Inequality on the Number of Maximal Independent Sets of Graphs to Hypergraphs and Lattice Products with Applications.
Mathematical Programming (B), 98(1-3) (2003) pp. 355-368.
- E. Boros, V. Gurvich, L. Khachiyan and K. Makino.
On maximal frequent and minimal infrequent sets in binary matrices.
Annals of Mathematics and Artificial Intelligence 39(3) (2003) pp. 211-221.
- E. Boros, T. Horiyama, T. Ibaraki, K. Makino and M. Yagiura.
Finding Essential Attributes from Binary Data.
Annals of Mathematics and Artificial Intelligence 39(3) (2003) pp. 223-257.
- E. Boros, K. Elbassioni, V. Gurvich and L. Khachiyan.
An inequality for polymatroid functions and its applications.
Discrete Applied Mathematics, 131(2) (2003) pp. 255-281.
- E. Boros, V. Gurvich, L. Khachiyan and K. Makino.
Dual-Bounded Generating Problems: Weighted Transversals of a Hypergraph.
Discrete Applied Mathematics, accepted, 2003.
- E. Boros, K. Elbassioni, V. Gurvich and L. Khachiyan.
Generating Dual-Bounded Hypergraphs.
Optimization Methods and Software, 17(5) (2002) pp. 749 - 781.
- E. Boros, K. Elbassioni, V. Gurvich, L. Khachiyan and K. Makino.
Dual-Bounded Generating Problems: All Minimal Integer Solutions for a Monotone System of Linear Inequalities.
SIAM Journal on Computing, 31(1-3) (2002) pp. 1624-1643.
- E. Boros, K. Elbassioni, V. Gurvich and L. Khachiyan.
Matroid Intersections, Polymatroid Inequalities, and Related Problems.
In: The 27th International Symposium on Mathematical Foundations of Computer Science, MFCS 2002.
(K. Diks and W. Rytter, eds., Warsaw, Poland, August 26-30, 2002),
Lecture Notes in Computer Science 2420 (2002) pp. 143-154,
(Springer Verlag, Berlin, Heidelberg, New York).
- E. Boros, K. Elbassioni, V. Gurvich, L. Khachiyan and K. Makino.
Dual-Bounded Hypergraphs: A Survey.
In: Workshop on Discrete Mathematics and Data Mining, 2nd SIAM International Conference on Data Mining,
Arlington, VA, April 11-13, 2002,
pp. 87-98.
- E. Boros, V. Gurvich, L. Khachiyan and K. Makino.
On the complexity of generating maximal frequent and minimal infrequent sets in binary matrices.
In: Proceedings of the 19th International Symposium on Theoretical Aspects of Computer Science (STACS 2002).
(H. Alt and A. Ferreira, eds.), Lecture Notes in Computer Science 2285 (2002) pp. 133-141.
- K. Elbassioni. An algorithm for dualization in products of lattices.
In: Proceedings of the 10th Annual European Symposium on Algorithms (ESA 2002).
(Rolf H. Möhring and Rajeev Raman, eds.),
Lecture Notes in Computer Science, 2461, (2002) pp. 424-435.
- T. Imielinski, L. Khachiyan, A. Abdulghani.
Cubegrades: Generalizing Association Rules,
Data Mining and Knowledge Discovery 6 (2002) pp. 219-256.
- E. Boros, K. Elbassioni, V. Gurvich, L. Khachiyan and K. Makino.
Generating all minimal integer solutions to a monotone system of linear inequalities.
In: Automata, Languanges and Programming, 28th International Colloquium (ICALP 2001).
(F. Orejas, P.G. Spirakis and Jan van Leeuwen, eds.),
Lecture Notes in Computer Science 2076 (2001) pp. 92-103.
- E. Boros, V. Gurvich, L. Khachiyan and K. Makino.
Dual-Bounded Generating Problems: Partial and Multiple Transversals of a Hypergraph.
SIAM Journal on Computing 30(6) (2001) pp. 2036-2050.
- E. Boros, V. Gurvich, L. Khachiyan and K. Makino.
Weighted Transversals of a Hypergraph.
In: Proceedings of 2nd Japanese-Hungarian Symposium on Discrete Mathematics and Its Applications
(2001) pp. 13-22.
- E. Boros, K. Elbassioni, V. Gurvich and L. Khachiyan.
An Incremental RNC Algorithm for Generating All Maximal Independent Sets in Hypergraphs of Bounded Dimension.
Parallel Processing Letters 10(4) (2000) pp. 253-266.
- E. Boros, V. Gurvich, L. Khachiyan and K. Makino.
Generating Partial and Multiple Transversals of a Hypergraph.
In: Automata, Languanges and Programming, 27th International Colloquium (ICALP 2000).
(Montanari, J.D.P. Rolim and E. Welzl, eds.)
Lecture Notes in Computer Science 1853 (2000) pp. 588-599.
- Implementation of a quasi-polynomial algorithm for generating hypergraph transversals, completed;
results available on the project's web-site, and are to be presented at ESA'2003.
- Experimental implementation of joint-generation of monotone systems represented by a membership oracle;
currently under experimental evaluation by the postdoctoral associate supported by the project.
Project Related Activities, Tutorials and Invited Lectures
- E. Boros: How to generate many items.
Presented at the REU Seminar, DIMACS, Rutgers University,
July 30, 2003.
- Organization of a tutorial and three invited sessions at the
EURO/INFORMS Joint International Conference,
Istanbul, Turkey,
July 6-10, 2003. Talks related to the project:
- E. Boros: Patterns and theories in LAD (tutorial).
- V. Gurvich: On the complexity of generation.
- K. Elbassioni: An Intersection Inequality for Discrete Distributions
and Related Generation Problems.
Presented at The 30-th International Colloquium on Automata, Languages and Programming (ICALP 2003),
Eindhoven, The Netherlands, June 30 - July 4, 2003.
- E. Boros: Listing all minimal solutions of a monotone system.
Presented at the Discrete Algorithm Colloquium, Japanese OR Society, Kyoto University, Kyoto, Japan,
May 30, 2003.
- L. Khachiyan: Dual Bounded Monotone Properties.
Presented at the Theoretical Computer Science and Discrete Mathematics Seminar, IAS,
Princeton, May 5 2003.
- E. Boros: Frequent sets in binary matrices.
Presented at the Alfréd Rényi Mathematical Institute of the Hungarian Academy of Sciences,
Budapest, Hungary, February 27, 2003.
- E. Boros: Operations Research in Data Analysis.
Presented at the Day of Applied Mathematics, Eötvös Loránd University, Budapest, Hungary,
February 26, 2003.
- L. Khachiyan: Generating Dual Bounded Hypergraphs.
Presented at The First International Conference on Optimization Methods and Software,
Hangzhou, China, December 15-18, 2002.
- E. Boros: Justifiable learning.
Presented at INFORMS Annual Meeting,
San Jose, California, November 17-20, 2002.
- K. Elbassioni: An algorithm for dualization in products of lattices.
Presented at 10th European Symposium on Algorithms,
University of Rome "La Sapienza", Italy, September 17-21, 2002.
- L. Khachiyan: Matroid Intersections, Polymatroid Inequalities, and Related Problems.
Presented at The 27th International Symposium on Mathematical Foundations of Computer Science,
Warszawa - Otwock, Poland, August 26-30, 2002.
- K. Elbassioni: An inequality for polymatroid functions.
Presented at
Integer Programming Conference in honor of Egon Balas,
June 3-5, 2002, Carnegie Mellon University, Pittsburgh, Pennsylvania.
- Organization of a one day workshop on Discrete Mathematics and Data Mining
at the
2nd SIAM International Conference on Data Mining,
Arlington, Virginia, April 11-13, 2002. Talks related to the project:
- E. Boros: Strongest patterns in LAD (tutorial).
- K. Elbassioni: Dual-bounded hypergraphs: a survey.
- E. Boros: On the complexity of generating maximal frequent and minimal infrequent sets.
19th International Symposium on Theoretical Aspects of Computer Science,
Antibes Juan-les-Pins, France
March 15, 2002.
- K. Elbassioni: On dualization in products of forests.
19th International Symposium on Theoretical Aspects of Computer Science,
Antibes Juan-les-Pins, France
March 15, 2002.
- Organization of a tutorial and two invited sessions at the
7th International Symposium on Artificial Intelligence and Mathematics,
Fort Lauderdale, Florida, January 2-4, 2002. Talks related to the project:
- E. Boros: Logical analysis of data (tutorial).
- E. Boros: How to handle categorical variables in machine learning.
- K. Elbassioni: Learning monotone binary functions in products of lattices.
Project Impact
The most visible effects of our project are the increasing number of presentations at various conferences related to this subject.
The general approach, reducing the generation of a number of interesting monotone systems to hypergraph dualization,
brought back this exciting family of equivalent problems to the forefront of current research.
This can be seen, on the one hand, from the number of the project's submissions accepted for
presentation at various refereed computer science conferences (e.g. at ICALP, ESA, ISAAC in 2003, STACS, SIAM ICDM, ESA, MFCS in 2002, etc.)
A more direct impact can be observed, on the other hand,
from recent research inspired by our results, including the discovery of
new efficient cases of monotone dualization (Eiter, Gottlob and Makino, 2002),
an improved quasi-polynomial dualization algorithm (Tamaki, 2000), and an elegant construction,
demonstrating that simpler,
but perhaps better known dualization algorithms cannot work efficiently,
even if sorting the input is optimally exploited (Takata, 2002).
Two graduate students at Rutgers University was involved with this line of research in 2001-2002.
Khaled Elbassioni is supported by the project as a postdoctoral fellow in 2003.
Several undergraduate students were involved with related research problems within the REU program at DIMACS, Rutgers University under the guidance of the PI-s of this project
in each of the summers of 2001, 2002 and 2003.
We have organized tutorials, workshops and several sessions at the EURO/INFORMS Joint International Meeting in 2003, and at the
7th International Symposium on Artificial Intelligence and Mathematics and the
2nd SIAM International Conference on Data Mining in 2002.
Goals, Objectives and Targeted Activities
Our long term goals include:
(a) Characterization of monotone systems in terms of algebraic separators
between their minimal member and maximal non-member elements,
and the identification of tractable cases, when it is possible to generate efficiently all
minimal members (or all maximal non-members) using such a separator as an orcale for the system.
(b) Devise new and efficient algorithms for the (joint)
generation of monotone systems in the tractable cases,
including the family of minimal feasible solutions to systems of threshold,
regular and submodular inequalities defined over the set of binary vectors,
over an integer lattice, or over the product of certain partial orders.
(c) Create a toolkit of efficient serial and parallel
implementations for dualization, and for the generation of
monotone systems. Test the toolkit and the new algorithms on
standard test beds of machine learning and data mining.
To achieve these goals, we implemented a basic quasi-polynomial (in worst case)
dualization algorithm for monotone subfamilies of an integer lattice,
with the aim of focusing on applications dealing with non-binary attributes/properties,
so that in the future we can avoid a separate ``binarization'' phase.
We also continued our research of submodular (threshold, regular, etc.) separators of monotone systems, and
released or finalized 3 publications in the past few months.
To enhance the disemination of our results, we organized several special sessions, workshops, and
gave tutorials and invited lectures at various conferences (see list above).
We also continued to maintain active research contacts with other research groups of this area,
in particular with researchers from the University of Illinois, Chicago
(with G. Turan),
Osaka University, Osaka, Japan (with K. Makino),
and with Kyoto University, Kyoto, Japan (with T. Ibaraki and M. Yagiura).
Area Background
Monotone systems frequently appear in knowledge discovery as
they are representing important information hidden in large data
bases, complex networks, etc.
In this project we study dual bounded monotone systems, that is monotone systems
in which the number of maximal non-members is bounded by a polynomial of the number of minimal elements.
Such systems can be characterized in terms of the mathematical nature of the
separating surface between their minimal members and maximal non-members.
A surprisingly large number of practically important monotone systems can be shown to belong to this class,
including the family of feasible solutions to a system of monotone inequalities
(linear, regular or submodular inequalities in binary or integer variables),
the working states of a complex network (in which links may fail),
infrequent and frequent sets of a binary matrix, etc.
The algorithmic approach we follow exploits the properties of the separating surface,
guaranteeing an efficient bound on the number of maximal non-members of the monotone system in question,
and applies joint generation. Since hypergraph dualization is the core of this method,
and since most practical examples involve non-binary input
(data vectors from an integral lattice, or from a product of certain partial orders)
our research also focuses on extending efficient
dualization techniques for integer lattices and for the products of certain partial orders.
Area References
- R. Agrawal, H. Mannila, R. Srikant, H. Toivonen and A.I. Verkamo,
Fast discovery of association rules, In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy eds.,
Advances in Knowledge Discovery and Data Mining, 307-328, AAAI Press, Menlo Park, California, 1996.
- M. Anthony and N. Biggs, Computational Learning Theory, Cambridge University Press, 1992.
- S. Brin, R. Motwani, J. Ullman, and S. Tsur.
Dynamic itemset counting and implication rules for market basket data.
In: Proceedings of the 1997 ACM-SIGMOD Conference on Management of Data, pp. 255-264.
- C. J. Colbourn, The combinatorics of network reliability, Oxford University Press, 1987.
- T. Eiter and G. Gottlob, Identifying the minimal transversals of a hypergraph and related problems,
SIAM Journal on Computing, 24 (1995) 1278-1304.
- M.I. Fredman and L. Khachiyan, On the complexity of dualization of monotone disjunctive normal forms.
J. of Algorithms, 21 (1996) 618-628.
- D. Gunopulos, R. Khardon, H. Mannila, and H. Toivonen,
Data mining, hypergraph transversals and machine learning.
In: Proceedings of the 16th ACM-SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, (1997) pp. 12-15.
- V. Gurvich and L. Khachiyan, On generating the irredundant conjunctive
and disjunctive normal forms of monotone Boolean functions.
Discrete Applied Mathematics 96-97 (1999), 363-373.
- D.S. Johnson, M. Yannakakis and C.H. Papadimitriou,
On generating all maximal independent sets,
Information Processing Letters, 27 (1988) 119-123.
- R. Karp and M, Luby, Monte-Carlo algorithms for enumeration and reliability problems,
in Proc. 24th IEEE Symp. on Foundations of Computer Science (1983) 56-64.
- E.L. Lawler, J.K. Lenstra and A.H.G. Rinnooy Kan,
Generating all maximal independent sets: NP-hardness and polynomial-time algorithms,
SIAM Journal on Computing, 9 (1980) 558-565.
- L. Lovász, Submodular functions and convexity. In:
Mathematical Programming: The State of the Art, Bonn 1982, pp. 235-257, (Springer Verlag, 1983).
- H. Mannila and H. Toivonen, Multiple uses of frequent sets and condensed representations. In:
Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, (1996) pp. 189-194.
- S. Muroga, Threshold Logic and Its Applications, Wiley-Interscience, 1971.
- R.C. Read and R.E. Tarjan, Bounds on backtrack algorithms for listing cycles, paths, and spanning trees,
Networks 5 (1975) 237-252.
- R.H. Sloan, K. Takata, G. Turan, On frequent sets of Boolean matrices,
Annals of Mathematics and Artificial Intelligence 24 (1998) 1-4.
- J.D. Ullman, Principles of Database and Knowledge Base Systems, Vols. 1 and 2, Computer Science Press, 1988.
Project Website
http://rutcor.rutgers.edu/~boros/IDM/0118635.html
Illustrations
Computational results and PDF presentations can be accessed via the project website, see above.