|
RUTCOR, Rutgers University 640 Bartholomew Road Piscataway, NJ 08854-8003 Phone: (732) 445-3235 Fax: (732) 445-5472 Email: boros@rutcor.rutgers.edu URL: http:\\rutcor.rutgers.edu\~boros |
RUTCOR, Rutgers University 640 Bartholomew Road Piscataway, NJ 08854-8003 Phone: (732) 445-3664 Fax: (732) 445-5472 Email: gurvich@rutcor.rutgers.edu |
Computer Science Depratment Rutgers University 110 Frelinghuysen Road Piscataway, NJ 08854-8004 Phone: (732) 445-2003 Fax: (732) 445-0537 Email: leonid@cs.rutgers.edu |
Incremental generation, representation and learning of monotone systems, data mining, frequent and infrequent sets, hypergraph transversals, Boolean dualization.
This project is aimed at the development of algorithms, theory, and applications for the identification, including the enumeration of all minimal representatives, of implicitly given monotone systems. Monotone systems are a frequent target of knowledge discovery as they represent important information hidden in large data bases and complex networks. Problems involving the determination of monotone systems arise naturally in a multitude of areas, including data mining (finding all maximal frequent, minimal infrequent sets), text mining (finding the best linear query in vector space models), machine learning (finding the best rules or patterns), hypergraph dualization (generating all minimal transversals), reliability theory (generating all minimal working and/or maximal failing states), integer programming (generating all minimal feasible solutions), stochastic programming (constructing deterministic equivalents to certain stochastic models), etc. Both the determination, and efficient and complete representation of such systems are the subject of the proposed project.
We plan to study monotone systems, and building on several recent results to develop a coherent theory of such identification problems. We plan to identify tractable classes, and develop efficient algorithms for those. In particular we plan to study monotone systems definable via systems of threshold, regular or submodular functions, since these classes seem to be the most important building blocks of tractable classes. Finally, we plan to implement the resulting new methods, and test those on a variety of test applications.
Despite the early stage in our project (it started just a few months ago), there are already some visible effects, due primarily to the fact that this line of research we started much earlier. The general approach, reducing the generation of a number of interesting monotone systems to hypergraph dualization, brought back this exciting family of equivalent problems to the forefront of current research. This can be seen, on the one hand, from the increased number of related papers accepted for presentation at various computer science conferences (see e.g. ICALP 2001, STACS 2002, STOC 2002, SIAM ICDM 2002, etc.). A more direct impact can be observed, on the other hand, from recent research inspired by our results, including the discovery of new efficient cases of monotone dualization (Eiter, Gottlob and Makino, 2002), an improved quasi-polynomial dualization algorithm (Tamaki, 2000), and an elegant construction, demonstrating that simpler, but perhaps better known dualization algorithms cannot work efficiently, even if sorting the input is optimally exploited (Takata, 2002).
Two graduate students at Rutgers University was involved with this line of research. Lijie Shi worked on extremal combinatorial problems related to the generation of frequent sets in binary matrices, and defended successfully his Ph.D. thesis last Fall. Khaled Elbassioni is currently working on generalizations of dualization, and is expected to defend his Ph.D. thesis later this year.
We have organized a well attended session at the 7th International Symposium on Artificial Intelligence and Mathematics devoted to Combinatorial data analysis; and actively participated in the organization of a one day workshop on Discrete Mathematics and Data Mining within the 2nd SIAM International Conference on Data Mining (we have received 44 submissions, accepted 16, out of which 5 belong to this area of research).
Our long term goals include: (a) Characterization of monotone systems in terms of algebraic separators between their minimal member and maximal non-member elements, and the identification of tractable cases, when it is possible to generate efficiently all minimal members (or all maximal non-members) using such a separator as an orcale for the system. (b) Devise new and efficient algorithms for the (joint) generation of monotone systems in the tractable cases, including the family of minimal feasible solutions to systems of threshold, regular and submodular inequalities defined over the set of binary vectors, over an integer lattice, or over the product of certain partial orders. (c) Create a toolkit of efficient serial and parallel implementations for dualization, and for the generation of monotone systems. Test the toolkit and the new algorithms on standard test beds of machine learning and data mining.
To achieve these goals, we started the development of a basic quasi-polynomial (worst case) dualization algorithm for monotone subfamilies of an integer lattice, with the aim of focusing on applications dealing with non-binary attributes/properties, so that in the future we can avoid the ``binarization'' phase. We also continued our research of submodular (threshold, regular, etc.) separators of monotone systems, and released or finalized 4 publications in the past few months. To enhance the disemination of our results, we organized several special sessions, workshops, and gave tutorials and invited lectures at various conferences (see list above). We also started to build active research contacts with other research groups of this area, in particular with researchers from the University of Illinois, Chicago (with R.B. Sloan and G. Turan), Osaka University, Osaka, Japan (with K. Makino), and with Kyoto University, Kyoto, Japan (with T. Ibaraki and M. Yagiura).
Monotone systems frequently appear in knowledge discovery as they are representing important information hidden in large data bases, complex networks, etc. In this project we study dual bounded monotone systems, that is monotone systems in which the number of maximal non-members is bounded by a polynomial of the number of minimal elements. Such systems can be characterized in terms of the mathematical nature of the separating surface between their minimal members and maximal non-members. A surprisingly large number of practically important monotone systems can be shown to belong to this class, including the family of feasible solutions to a system of monotone inequalities (linear, regular or submodular inequalities in binary or integer variables), the working states of a complex network (in which links may fail), infrequent and frequent sets of a binary matrix, etc. The algorithmic approach we follow exploits the properties of the separating surface, guaranteeing an efficient bound on the number of maximal non-members of the monotone system in question, and applies joint generation. Since hypergraph dualization is the core of this method, and since most practical examples involve non-binary input (data vectors from an integral lattice, or from a product of certain partial orders) our research also focuses on extending efficient dualization techniques for integer lattices and for the products of certain partial orders.