Identification of Threshold, Regular and Submodular Monotone Systems: Theory and Algorithms



Endre Boros (PI)
RUTCOR, Rutgers University
640 Bartholomew Road
Piscataway, NJ 08854-8003
Phone: (732) 445-3235
Fax: (732) 445-5472
Email: boros@rutcor.rutgers.edu
URL: http:\\rutcor.rutgers.edu\~boros
Vladimir Gurvich (Co-PI)
RUTCOR, Rutgers University
640 Bartholomew Road
Piscataway, NJ 08854-8003
Phone: (732) 445-3664
Fax: (732) 445-5472
Email: gurvich@rutcor.rutgers.edu
Leonid Khachiyan (Co-PI)
Computer Science Department
Rutgers University
110 Frelinghuysen Road
Piscataway, NJ 08854-8004
Phone: (732) 445-2003
Fax: (732) 445-0537
Email: leonid@cs.rutgers.edu


Khaled Elbassioni (Post-Doctoral Fellow)
RUTCOR, Rutgers University
640 Bartholomew Road
Piscataway, NJ 08854-8003
Phone: (732) 445-3664
Fax: (732) 445-5472
Email: elbassio@paul.rutgers.edu

Project Award Information


Keywords

Incremental generation, representation and learning of monotone systems, data mining, frequent and infrequent sets, hypergraph transversals, Boolean dualization.


Project Summary

This project is aimed at the development of algorithms, theory, and applications for the identification, including the enumeration of all minimal representatives, of implicitly given monotone systems. Monotone systems are a frequent target of knowledge discovery as they represent important information hidden in large data bases and complex networks. Problems involving the determination of monotone systems arise naturally in a multitude of areas, including data mining (finding all maximal frequent, minimal infrequent sets), text mining (finding the best linear query in vector space models), machine learning (finding the best rules or patterns), hypergraph dualization (generating all minimal transversals), reliability theory (generating all minimal working and/or maximal failing states), integer programming (generating all minimal feasible solutions), stochastic programming (constructing deterministic equivalents to certain stochastic models), etc. Both the determination, and efficient and complete representation of such systems are the subject of this project.

We study the problem of identifying implicitly given monotone systems, and building on several recent results we aim at developing a coherent theory of such identification problems. A major part of our research effort focuses on identifying tractable classes, developing efficient algorithms for those, and implementing and testing the resulting new methods. In particular we study monotone systems definable via systems of threshold, regular or submodular functions, since these classes seem to be the most important building blocks of tractable classes.


Publications and Products


Project Related Activities, Tutorials and Invited Lectures


Project Impact

The most visible effects of our project are the increasing number of presentations at various conferences related to this subject. The general approach, reducing the generation of a number of interesting monotone systems to hypergraph dualization, brought back this exciting family of equivalent problems to the forefront of current research. This can be seen, on the one hand, from the number of the project's submissions accepted for presentation at various refereed computer science conferences (e.g. at ICALP, ESA, ISAAC in 2003, STACS, SIAM ICDM, ESA, MFCS in 2002, etc.) A more direct impact can be observed, on the other hand, from recent research inspired by our results, including the discovery of new efficient cases of monotone dualization (Eiter, Gottlob and Makino, 2002), an improved quasi-polynomial dualization algorithm (Tamaki, 2000), and an elegant construction, demonstrating that simpler, but perhaps better known dualization algorithms cannot work efficiently, even if sorting the input is optimally exploited (Takata, 2002).

Two graduate students at Rutgers University was involved with this line of research in 2001-2002. Khaled Elbassioni is supported by the project as a postdoctoral fellow in 2003. Several undergraduate students were involved with related research problems within the REU program at DIMACS, Rutgers University under the guidance of the PI-s of this project in each of the summers of 2001, 2002 and 2003.

We have organized tutorials, workshops and several sessions at the EURO/INFORMS Joint International Meeting in 2003, and at the 7th International Symposium on Artificial Intelligence and Mathematics and the 2nd SIAM International Conference on Data Mining in 2002.


Goals, Objectives and Targeted Activities

Our long term goals include: (a) Characterization of monotone systems in terms of algebraic separators between their minimal member and maximal non-member elements, and the identification of tractable cases, when it is possible to generate efficiently all minimal members (or all maximal non-members) using such a separator as an orcale for the system. (b) Devise new and efficient algorithms for the (joint) generation of monotone systems in the tractable cases, including the family of minimal feasible solutions to systems of threshold, regular and submodular inequalities defined over the set of binary vectors, over an integer lattice, or over the product of certain partial orders. (c) Create a toolkit of efficient serial and parallel implementations for dualization, and for the generation of monotone systems. Test the toolkit and the new algorithms on standard test beds of machine learning and data mining.

To achieve these goals, we implemented a basic quasi-polynomial (in worst case) dualization algorithm for monotone subfamilies of an integer lattice, with the aim of focusing on applications dealing with non-binary attributes/properties, so that in the future we can avoid a separate ``binarization'' phase. We also continued our research of submodular (threshold, regular, etc.) separators of monotone systems, and released or finalized 3 publications in the past few months. To enhance the disemination of our results, we organized several special sessions, workshops, and gave tutorials and invited lectures at various conferences (see list above). We also continued to maintain active research contacts with other research groups of this area, in particular with researchers from the University of Illinois, Chicago (with G. Turan), Osaka University, Osaka, Japan (with K. Makino), and with Kyoto University, Kyoto, Japan (with T. Ibaraki and M. Yagiura).


Area Background

Monotone systems frequently appear in knowledge discovery as they are representing important information hidden in large data bases, complex networks, etc. In this project we study dual bounded monotone systems, that is monotone systems in which the number of maximal non-members is bounded by a polynomial of the number of minimal elements. Such systems can be characterized in terms of the mathematical nature of the separating surface between their minimal members and maximal non-members. A surprisingly large number of practically important monotone systems can be shown to belong to this class, including the family of feasible solutions to a system of monotone inequalities (linear, regular or submodular inequalities in binary or integer variables), the working states of a complex network (in which links may fail), infrequent and frequent sets of a binary matrix, etc. The algorithmic approach we follow exploits the properties of the separating surface, guaranteeing an efficient bound on the number of maximal non-members of the monotone system in question, and applies joint generation. Since hypergraph dualization is the core of this method, and since most practical examples involve non-binary input (data vectors from an integral lattice, or from a product of certain partial orders) our research also focuses on extending efficient dualization techniques for integer lattices and for the products of certain partial orders.


Area References

  1. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen and A.I. Verkamo, Fast discovery of association rules, In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy eds., Advances in Knowledge Discovery and Data Mining, 307-328, AAAI Press, Menlo Park, California, 1996.
  2. M. Anthony and N. Biggs, Computational Learning Theory, Cambridge University Press, 1992.
  3. S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In: Proceedings of the 1997 ACM-SIGMOD Conference on Management of Data, pp. 255-264.
  4. C. J. Colbourn, The combinatorics of network reliability, Oxford University Press, 1987.
  5. T. Eiter and G. Gottlob, Identifying the minimal transversals of a hypergraph and related problems, SIAM Journal on Computing, 24 (1995) 1278-1304.
  6. M.I. Fredman and L. Khachiyan, On the complexity of dualization of monotone disjunctive normal forms. J. of Algorithms, 21 (1996) 618-628.
  7. D. Gunopulos, R. Khardon, H. Mannila, and H. Toivonen, Data mining, hypergraph transversals and machine learning. In: Proceedings of the 16th ACM-SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, (1997) pp. 12-15.
  8. V. Gurvich and L. Khachiyan, On generating the irredundant conjunctive and disjunctive normal forms of monotone Boolean functions. Discrete Applied Mathematics 96-97 (1999), 363-373.
  9. D.S. Johnson, M. Yannakakis and C.H. Papadimitriou, On generating all maximal independent sets, Information Processing Letters, 27 (1988) 119-123.
  10. R. Karp and M, Luby, Monte-Carlo algorithms for enumeration and reliability problems, in Proc. 24th IEEE Symp. on Foundations of Computer Science (1983) 56-64.
  11. E.L. Lawler, J.K. Lenstra and A.H.G. Rinnooy Kan, Generating all maximal independent sets: NP-hardness and polynomial-time algorithms, SIAM Journal on Computing, 9 (1980) 558-565.
  12. L. Lovász, Submodular functions and convexity. In: Mathematical Programming: The State of the Art, Bonn 1982, pp. 235-257, (Springer Verlag, 1983).
  13. H. Mannila and H. Toivonen, Multiple uses of frequent sets and condensed representations. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, (1996) pp. 189-194.
  14. S. Muroga, Threshold Logic and Its Applications, Wiley-Interscience, 1971.
  15. R.C. Read and R.E. Tarjan, Bounds on backtrack algorithms for listing cycles, paths, and spanning trees, Networks 5 (1975) 237-252.
  16. R.H. Sloan, K. Takata, G. Turan, On frequent sets of Boolean matrices, Annals of Mathematics and Artificial Intelligence 24 (1998) 1-4.
  17. J.D. Ullman, Principles of Database and Knowledge Base Systems, Vols. 1 and 2, Computer Science Press, 1988.

Project Website

http://rutcor.rutgers.edu/~boros/IDM/0118635.html

Illustrations

Computational results and PDF presentations can be accessed via the project website, see above.