Identification of Threshold, Regular and Submodular Monotone Systems: Theory and Algorithms



Endre Boros
RUTCOR, Rutgers University
640 Bartholomew Road
Piscataway, NJ 08854-8003
Phone: (732) 445-3235
Fax: (732) 445-5472
Email: boros@rutcor.rutgers.edu
URL: http:\\rutcor.rutgers.edu\~boros
Vladimir Gurvich
RUTCOR, Rutgers University
640 Bartholomew Road
Piscataway, NJ 08854-8003
Phone: (732) 445-3664
Fax: (732) 445-5472
Email: gurvich@rutcor.rutgers.edu
Leonid Khachiyan
Computer Science Depratment
Rutgers University
110 Frelinghuysen Road
Piscataway, NJ 08854-8004
Phone: (732) 445-2003
Fax: (732) 445-0537
Email: leonid@cs.rutgers.edu

Project Award Information


Keywords

Incremental generation, representation and learning of monotone systems, data mining, frequent and infrequent sets, hypergraph transversals, Boolean dualization.


Project Summary

This project is aimed at the development of algorithms, theory, and applications for the identification, including the enumeration of all minimal representatives, of implicitly given monotone systems. Monotone systems are a frequent target of knowledge discovery as they represent important information hidden in large data bases and complex networks. Problems involving the determination of monotone systems arise naturally in a multitude of areas, including data mining (finding all maximal frequent, minimal infrequent sets), text mining (finding the best linear query in vector space models), machine learning (finding the best rules or patterns), hypergraph dualization (generating all minimal transversals), reliability theory (generating all minimal working and/or maximal failing states), integer programming (generating all minimal feasible solutions), stochastic programming (constructing deterministic equivalents to certain stochastic models), etc. Both the determination, and efficient and complete representation of such systems are the subject of the proposed project.

We plan to study monotone systems, and building on several recent results to develop a coherent theory of such identification problems. We plan to identify tractable classes, and develop efficient algorithms for those. In particular we plan to study monotone systems definable via systems of threshold, regular or submodular functions, since these classes seem to be the most important building blocks of tractable classes. Finally, we plan to implement the resulting new methods, and test those on a variety of test applications.


Publications and Products


Project Related Activities, Tutorials and Invited Lectures


Project Impact

Despite the early stage in our project (it started just a few months ago), there are already some visible effects, due primarily to the fact that this line of research we started much earlier. The general approach, reducing the generation of a number of interesting monotone systems to hypergraph dualization, brought back this exciting family of equivalent problems to the forefront of current research. This can be seen, on the one hand, from the increased number of related papers accepted for presentation at various computer science conferences (see e.g. ICALP 2001, STACS 2002, STOC 2002, SIAM ICDM 2002, etc.). A more direct impact can be observed, on the other hand, from recent research inspired by our results, including the discovery of new efficient cases of monotone dualization (Eiter, Gottlob and Makino, 2002), an improved quasi-polynomial dualization algorithm (Tamaki, 2000), and an elegant construction, demonstrating that simpler, but perhaps better known dualization algorithms cannot work efficiently, even if sorting the input is optimally exploited (Takata, 2002).

Two graduate students at Rutgers University was involved with this line of research. Lijie Shi worked on extremal combinatorial problems related to the generation of frequent sets in binary matrices, and defended successfully his Ph.D. thesis last Fall. Khaled Elbassioni is currently working on generalizations of dualization, and is expected to defend his Ph.D. thesis later this year.

We have organized a well attended session at the 7th International Symposium on Artificial Intelligence and Mathematics devoted to Combinatorial data analysis; and actively participated in the organization of a one day workshop on Discrete Mathematics and Data Mining within the 2nd SIAM International Conference on Data Mining (we have received 44 submissions, accepted 16, out of which 5 belong to this area of research).


Goals, Objectives and Targeted Activities

Our long term goals include: (a) Characterization of monotone systems in terms of algebraic separators between their minimal member and maximal non-member elements, and the identification of tractable cases, when it is possible to generate efficiently all minimal members (or all maximal non-members) using such a separator as an orcale for the system. (b) Devise new and efficient algorithms for the (joint) generation of monotone systems in the tractable cases, including the family of minimal feasible solutions to systems of threshold, regular and submodular inequalities defined over the set of binary vectors, over an integer lattice, or over the product of certain partial orders. (c) Create a toolkit of efficient serial and parallel implementations for dualization, and for the generation of monotone systems. Test the toolkit and the new algorithms on standard test beds of machine learning and data mining.

To achieve these goals, we started the development of a basic quasi-polynomial (worst case) dualization algorithm for monotone subfamilies of an integer lattice, with the aim of focusing on applications dealing with non-binary attributes/properties, so that in the future we can avoid the ``binarization'' phase. We also continued our research of submodular (threshold, regular, etc.) separators of monotone systems, and released or finalized 4 publications in the past few months. To enhance the disemination of our results, we organized several special sessions, workshops, and gave tutorials and invited lectures at various conferences (see list above). We also started to build active research contacts with other research groups of this area, in particular with researchers from the University of Illinois, Chicago (with R.B. Sloan and G. Turan), Osaka University, Osaka, Japan (with K. Makino), and with Kyoto University, Kyoto, Japan (with T. Ibaraki and M. Yagiura).


Project References


Area Background

Monotone systems frequently appear in knowledge discovery as they are representing important information hidden in large data bases, complex networks, etc. In this project we study dual bounded monotone systems, that is monotone systems in which the number of maximal non-members is bounded by a polynomial of the number of minimal elements. Such systems can be characterized in terms of the mathematical nature of the separating surface between their minimal members and maximal non-members. A surprisingly large number of practically important monotone systems can be shown to belong to this class, including the family of feasible solutions to a system of monotone inequalities (linear, regular or submodular inequalities in binary or integer variables), the working states of a complex network (in which links may fail), infrequent and frequent sets of a binary matrix, etc. The algorithmic approach we follow exploits the properties of the separating surface, guaranteeing an efficient bound on the number of maximal non-members of the monotone system in question, and applies joint generation. Since hypergraph dualization is the core of this method, and since most practical examples involve non-binary input (data vectors from an integral lattice, or from a product of certain partial orders) our research also focuses on extending efficient dualization techniques for integer lattices and for the products of certain partial orders.


Area References

  1. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen and A.I. Verkamo, Fast discovery of association rules, In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy eds., Advances in Knowledge Discovery and Data Mining, 307-328, AAAI Press, Menlo Park, California, 1996.
  2. M. Anthony and N. Biggs, Computational Learning Theory, Cambridge University Press, 1992.
  3. S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In: Proceedings of the 1997 ACM-SIGMOD Conference on Management of Data, pp. 255-264.
  4. C. J. Colbourn, The combinatorics of network reliability, Oxford University Press, 1987.
  5. T. Eiter and G. Gottlob, Identifying the minimal transversals of a hypergraph and related problems, SIAM Journal on Computing, 24 (1995) 1278-1304.
  6. M.I. Fredman and L. Khachiyan, On the complexity of dualization of monotone disjunctive normal forms. J. of Algorithms, 21 (1996) 618-628.
  7. D. Gunopulos, R. Khardon, H. Mannila, and H. Toivonen, Data mining, hypergraph transversals and machine learning. In: Proceedings of the 16th ACM-SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, (1997) pp. 12-15.
  8. V. Gurvich and L. Khachiyan, On generating the irredundant conjunctive and disjunctive normal forms of monotone Boolean functions. Discrete Applied Mathematics 96-97 (1999), 363-373.
  9. D.S. Johnson, M. Yannakakis and C.H. Papadimitriou, On generating all maximal independent sets, Information Processing Letters, 27 (1988) 119-123.
  10. R. Karp and M, Luby, Monte-Carlo algorithms for enumeration and reliability problems, in Proc. 24th IEEE Symp. on Foundations of Computer Science (1983) 56-64.
  11. E.L. Lawler, J.K. Lenstra and A.H.G. Rinnooy Kan, Generating all maximal independent sets: NP-hardness and polynomial-time algorithms, SIAM Journal on Computing, 9 (1980) 558-565.
  12. L. Lovász, Submodular functions and convexity. In: Mathematical Programming: The State of the Art, Bonn 1982, pp. 235-257, (Springer Verlag, 1983).
  13. H. Mannila and H. Toivonen, Multiple uses of frequent sets and condensed representations. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, (1996) pp. 189-194.
  14. S. Muroga, Threshold Logic and Its Applications, Wiley-Interscience, 1971.
  15. R.C. Read and R.E. Tarjan, Bounds on backtrack algorithms for listing cycles, paths, and spanning trees, Networks 5 (1975) 237-252.
  16. R.H. Sloan, K. Takata, G. Turan, On frequent sets of Boolean matrices, Annals of Mathematics and Artificial Intelligence 24 (1998) 1-4.
  17. J.D. Ullman, Principles of Database and Knowledge Base Systems, Vols. 1 and 2, Computer Science Press, 1988.