Fifth International Symposium on
Artificial Intelligence and Mathematics
January 4-6, 1998, Fort Lauderdale, Florida

Abstracts of Talks

Abstracts are listed in alphabetical order by the speakers' names printed in bold face.

On the complexity of designing compact perceptrons and some consequences, by Edoardo Amaldi School of Operations Research and Theory Center, Cornell University, Ithaca, NY 14853. (SU 13:30)

When training perceptrons (linear threshold functions) not only the performance on the training set is important but also the number of features that are actually needed. For instance, if only a small fraction of features suffice to correctly classify all given examples, a training algorithm that also minimizes the number of nonzero weights would require far fewer examples to achieve a given level of accuracy than any algorithm that does not try to discard unrelevant features. To investigate the computational complexity of the problem of designing compact networks, we first focus on linearly separable training sets and we improve the nonapproximability bound for the corresponding feature minimization problem. In particular, we disprove the conjecture by Van Horn and Martinez and show that no polynomial time algorithm is guaranteed to yield a weight vector that correctly classifies all training examples and whose number of nonzero weights is within a logarithmic factor of the optimum, unless all problems in NP can be solved in quasi-polynomial time. O(log p) of the optimum, where p is the number of examples Then we discuss some consequences on the complexity of designing compact neural networks and of finding sparse approximate solutions to linear systems. The latter problem arises, for instance, when designing compact radial basis function networks. According to theoretical and experimental results, feature minimization or more generally sparse approximation is an important issue in learning from examples since it is a natural way of implementing Occam's razor principle.

Perceive This as That -- Analogical Cognitive Transitions with Categorical Tools, by Zippora Arzi­Gonczarowski Typographics, Ltd. Jerusalem, Israel. (TU 8:30)

This study formalizes cognitive transitions between artificial perceptions that consist of an analogical transference of perception. The formalization is performed within a framework that has been used before to formalize other aspects of artificial cognition. The mathematical infrastructure consists of a basic category of `artificial perceptions'. Each `perception' consists of a set of `world elements', a set of `connotations', and a three­valued (true, false, undefined) predicative connection between the two sets. `Perception morphisms' describe paths between perceptions. This setting provides the foundations for a theory of artificial perceptions. Artificial cognitive processes are captured by perception morphisms and other categorical constructs. In the present study the theory is incremented by showing how analogical transitions can also be captured by categorical constructs. A categorical factorization of analogical transitions is then shown to capture transitions into metaphorical perceptions inspired by the analogy. It is further shown how structural aspects of `better' analogies can be captured and evaluated by the same categorical setting. The formalization is illustrated by examples. The presentation is concluded by fusing the new results into the exist­ ing collection of formalizations of other artificial cognitive processes within the same premises. The unified theory adds some novel extensions of pre­ theoretical conceptions. Intuitions concerning the connection between ob­ servant, insightful perceptions (formalized by valid and complete boolean perceptions) and distinctive, profound analogies or metaphors (formalized by highly structured perception morphisms) obtain formal support (in the form of a natural transformation). By employing well­developed tools of mathematics, it is thus possible to capture complex phenomena that could be grasped only loosely by verbal descriptions. The mathematical framework models them in a precise, testable and applicable form, augmenting the evi­ dence that mathematical categorization of artificial perceptions can be useful to AI.

If-Then Statements in Games and AI, by Robert John Aumann, Hebrew University, Israel. (MO 11:00)

In mathematics, an assertion of the form "if p, then q" means simply "q or not p." Philosophers call this a "material conditional." In everyday discourse, though, "if-then" statements have more content. For example, the statement "if Kruschev had not backed down in the Cuban Missile Crisis, then a nuclear conflict would have ensued" is not automatically true just because Kruschev DID back down. We call it a "substantive conditional".

Substantive conditionals play a vital role in everyday decision making, as well as in Game Theory and, presumably, AI. In this lecture we will discuss the task of formalizing them, and draw conclusions about rational behavior, and "common knowledge" of such behavior, in games.

On the conversion between non-binary and binary constraint satisfaction problems, by Fahiem Bacchus and Peter van Beek University of Alberta, Canada (MO 8:30)

Constraint-based reasoning or constraint satisfaction is a simple, yet powerful paradigm in which many interesting problems---from natural language parsing to scheduling---can be formulated.  It is well known that any non-binary discrete constraint satisfaction problem (CSP) can be translated to an equivalent binary CSP.  Two general translations are known: the dual graph translation and the hidden variable translation.  The dual graph representation can often be a natural CSP formulation of a problem (crossword puzzles and scene labeling are two examples from the literature), and in such cases it is also possible to reverse the translation converting a binary CSP into an non-binary one.  There has been little theoretical or experimental work on how well various backtracking algorithms perform on the dual graph and the hidden variable representations in comparison to their performance on the corresponding non-binary CSP.  In this talk, I will present our first steps towards addressing this issue, and I will present some results, both theoretical and empirical, to help understand the tradeoffs involved.  One way of efficiently solving a CSP representation of a problem is to translate it to a new representation.  The ultimate aim of this research is to give guidance for when one should consider translating between non-binary and binary representations.

Characterization of non-monotone non-constructive systems, by Philippe Besnard, CNRS, IRISA, Campus de Beaulieu, F-35042 Rennes Cedex, France, and Torsten Schaub, Institut fur Informatik, Universitat Potsdam, Postfach 60 15 53, D-14415 Potsdam, Germany. (SU 14:30)

Logical systems have traditionnally been monotone. Artificial Intelligence has introduced nonmonotone systems. A crucial issue in connection with such systems is that of a characterizing theory. On the one hand, there have been proposals to weakening a theory of monotone systems so as to make it encompass nonmonotone systems but the resulting theory clearly falls short of delineating nonmonotony. On the other hand, failure of monotony cannot be taken as a founding feature because no theory worthy of the name is based on the mere absence of a property.

We refer to an analogous case, not in logic but in lattice theory, and show that a similar solution of exploiting subtructures even yields a decent theory for the most negative notion: nonmonotone non-constructive systems. The main result is that in an appropriate context, failure of monotony for a system is equivalent to it being a proper extension of an underlying traditional logic.

Appropriate context means that deductive closure of the underlying non-constr- uctive logic applies to the nonmonotone system while consistency is preserved. Technically, let 0 denote absurdity, p, q denote formulas whereas S, T stand for sets of formulas, and S |-/ p denies S |- p. The main theorem writes:

  1. S |- 0 whenever S |~ 0
  2. if S |~ p and S, p |- q then S |~ q
  3. if S |-/ p and S, p |-/ 0 then S, T, p |- 0 and S, T |-/ 0 for some T
Taking (a)-(b)-(c) for granted, S |~ p and S, T |~/ p for some p, S, T if and only if |~ is a proper extension of |-.

Consequently, non-constructive nonmonotone reasoning is given a characterization which is more satisfactory than via the usual poor idea of failure of monotony.

Writing Cn and Th as the operations for |~ and |- resp., - condition (a) is consistency preservation: 0 is in Cn(S) only if 0 is in Th(S); - condition (b) is a weak form of deductive closure which merely states that if p is in Cn(S) then Th(S U {p}) is a subset of Cn(S) but Th(Cn(S))=Cn(S) need not hold. Such a feature can be found in default logic, a well-known nonmonotone system where Cn(W) can contain p and q while not containing p&q,... (but Cn(W) contains all classical consequences of p such as q->p&q,... and the same for q). - condition (c) stands for non-constructiveness: variant of reductio ad absurdum. In (c), negation of p in the context S is represented by T which can be infinite and need not involve any particular symbol. Indeed, no assumption is made about the language as another example is absurdity that can have the form of a nullary connective or it may correspond to deducibility of all formulas or whatever else is desired. As for inference, there need not be any compactness property holding and the underlying logic need not have any theorem.

In view of the literature, we argue that the result is most general. We also discuss (monotone) intermediate logics and nonmonotone constructive systems: When the underlying logic is intuitionistic logic, failure of the main theorem is shown to coincide with the existence of intermediate logics.

Monotonocity, decision lists  and partially defined discrete functions, by Jan C. Bioch Department of Computer Science, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands. (TU 11:00)

In many real-world classification problems in which the classes have a natural ordering it is necessary that objects with better attribute values are classified in higher classes. These problems can be formulated as inducing a monotone extension of a partially defined discrete function.

The class of discrete functions contains the class of binary functions and Boolean functions and it is well-known that discrete functions  occur in many problems in artificial intelligence, machine learning, operations research and many other areas.

In this paper we discuss the problem of generating decision lists of monotone extensions of partially defined discrete functions and in general the representation of (monotone) discrete functions by decision lists.

A model theory for Figure Ground Location, by Thomas Bittner Department of Geoinformation, Technical University of Vienna and National Center of Geographic Information and Analysis, NCGIA. (MO 13:30)

Much effort has been expended on the problem of constructing formal theories for qualitative spatial reasoning (QSR). Most of these theories are about spatial individuals. This paper takes a different approach: It is about location of individuals, rather than about individuals themselves. Equivalence classes of individuals are defined in terms of their location within an ordering frame of references. Individuals are regions in the plane. The frame of reference is formed by a finite regional partition of the plane.

The coordinate language of figure ground location is built upon the RCC theory. Figure ground location is expressed in terms predicates about relations between regional individuals (the figures) and a set of regional individuals satisfying a partition axiom (the ground). The ground provides a frame of reference in which the figures are located. Primitive expressions of the figure ground coordinate language are n­tuples of figure ground predicates. Figure ground predicates are statements about relations between a pair of `ground regions sharing a common boundary segment' and a figure region. The base of the coordinate n­tuples is a corresponding n­tuple of pairs of `ground regions sharing a boundary segment'.

A model theoretic approach is taken to characterize location, i.e., equivalence classes of individuals, and their relation to the individuals themselves. This provides a formal basis to evaluate which properties of the domain of individuals are preserved in the domain of equivalence classes. A set theoretic interpretation of figure ground location is given. It is based on the interpretation of the RCC theory in the domain of non empty regular closed (NERC) sets of a T_3 topological space (Gotts 1996a). Figure ground locations are interpreted in the power set of NERC sets of two dimensional T_4 topological spaces. Operations on sets of NERC sets of equivalent location within a regional partition are defined. These operations form a co­Heyting algebra, and a double commutative monoid. A discrete, graph theoretical interpretation of figure ground location is given. It is shown to be isomorphic to the algebra of sets of NERC sets of equivalent location.

From the methodological point of view this approach is analytical rather than synthetic . Whatever can be known about location is already encoded in the structure of the frame of reference and the locations within it. This has the advantage that complete knowledge about location is independent of knowledge about particular individuals. The disadvantage is that the degree to which properties of the domain of individuals are preserved depends on the frame of reference, e.g., its resolution, its adjustment within the space,..., and the definition of location within it.

Abstraction from individuals to location has been also investigated in the domain of linguistics and cognitive science, e.g., (Talmy 1983). It results in reduced complexity but can cause wrong conclusions if the structure of the domain of location does not correspond to the structure of the domain of individuals (Stevens and Coupe 1978).

  • Gotts, N. M. (1996a). An Axiomatic Approach to Topology for Spatial Information Systems. Leeds, School of Computer Studies.
  • Stevens, A. and P. Coupe (1978). ``Disortions in Judged Spatial Relations.'' Cognitive Psychology 10: 422­437.
  • Talmy, L. (1983). How Language Structures Space. Spatial Orientation: Theory, Research, and Application. H.
  • Pick and L. Acredolo. New York, NY, Plenum Press.

On the structure of some classes of minimal unsatisfiable formulas in CNF, by Hans Kleine Büning University of Paderborn, Department of Mathematics and Computer Science, D-33095 Paderborn, Germany. (MO 13:30)

Minimal unsatisfiable propositional formulas in conjunctive normal form are formulas which are unsatisfiable and deleting an arbitrary clause results in a satisfiable formula. It is well known that the minimal unsatisfiability problem is DP--complete.

For fixed k we consider classes of formulas over n variables with n+k clauses. Let MU(k) be the class of minimal unsatisfiable formulas of this form. We present some results on the complexity of the minimal unsatisfiability problem for MU(k) and characterize for some k the structure of MU(k)-formulas.

In particular we can show that MU(1) and MU(2) are solvable in time O(n2). For MU(1) the algorithm is based on the observation that in a minimal unsatisfiable formula at least one variable occurs exactly twice. For MU(2) after a simplification the formulas have for each n a unique form.

Finally we show for MU(1) an representation in form of basic matrices.

A Comparison of Linear Logic with Wave Logic, by Thomas L. Clarke University of Central Florida, Institute for Simulation and Training, 3280 Progress Drive, Orlando, FL. (SU 13:30)

Linear logic, since its introduction by Girard in 1987 [Theoretical Computer Science, 50:1-102], has proven expressive and powerful.  Linear logic has provided natural encodings of Turing machines, Petri nets and other computational models.  Linear logic is also capable of naturally modeling resource dependent aspects of reasoning.

Wave logic was first described by Orlov in 1978 [Philosophica Naturalis, 17:120-9].  Related to quantum logic, wave logic has not yet found wide application.

This paper discusses the relation between linear logic and wave logic.  Several theorems detailing the reduction of the logics one to the other are proven.  Lie groups provide the connection between the exponential modal storage operators of linear logic and the eigenfunctions of wave logic.

Implications of these results for modeling reason and intelligence are discussed.  Particularly suggestive are possible relations between complexity results for linear logic and non-computablity results for the wave mechanics underlying wave logic. 

Characterizing consistency based diagnosis, by Sylvie Coste-Marquis and Pierre Marquis, CRIL, Université d'Artois & IUT de Lens, rue de l'Université ­ S.P. 16 ­ F­62307 Lens Cedex, France. (TU 9:00)

The talk will be centered on the problem of characterizing consistency-based diagnoses for a system. We introduce three properties that any valuable characterization should satisfy: information-preservation, operationality and conciseness. We show that the characterizations pointed out so far in the literature (especially, the minimal diagnoses, kernel diagnoses and MT-diagnoses) do not necessarily satisfy all these requirements.

As an alternative to the kernel diagnoses approach, we introduce the theory minimal conflicts for a system. As in the MT-diagnoses approach, a theory is used so as to incorporate hierarchical information about the components within the diagnosis process. The size of theory minimal conflicts characterizations can prove exponentially smaller than the size of the corresponding kernel diagnoses characterization, even if it is not always the case. We will show how theory minimal conflicts characterizations can be used to check whether a candidate diagnosis is a consistency-based one (or a kernel diagnosis) in polynomial time and to find out one such diagnosis in polynomial time. We will present an algorithm for computing theory minimal conflicts, based on their theory prime implicates characterization. This algorithm does not require the minimal conflicts for the system to be generated.

Optimizing with constraints: a case study in scheduling maintenance of electric power units, by Rina Dechter and Dan Frost, Information and Computer Science Dept., University of California, Irvine, Irvine, CA 92717. (MO 9:00)

A well-studied problem in the electric power industry is that of optimally scheduling preventative maintenance of power generating units within a power plant.  We show how these problems can be cast as constraint satisfaction problems  and provide an "iterative learning" algorithm which solves the problem in the following manner. In order to find an optimal schedule, the algorithm solves a series of CSPs with successively tighter cost-bound constraints. For the solution of each problem in the series we use constraint learning,  which involves recording additional constraints that are uncovered during search. However, instead of solving each problem independently, after a problem with a certain cost-bound is solved successfully the new constraints recorded by learning are used in subsequent attempts to find a schedule with lower cost-bound.

We show empirically that on a class of randomly generated maintenance scheduling problems ''iterative learning'' reduces the time to find a good schedule. We also provide a comparative study of  the most competitive CSP algorithms on the maintenance scheduling benchmark.

Applications of Linear Logic to AI and Natural Language Processing, by Christophe Fouquere University of Paris, 13, France. (MO 16:30)

Classical logic is well suited for the formalization of mathematics, but is notoriously insufficient for a lot of problems in Artificial Intelligence, Data Bases, Linguistics. For these cases, a lot of nonmonotonic systems have been developed. Our viewpoint is different: we tried to represent some of these problems in Linear Logic (introduced in 1985 by J.-Y. Girard). We will focus during this talk on three specific problems: the two first are formalized in fragments of the commutative Linear Logic, the last one in a fragment of noncommutative Linear Logic.

    The Applications:
  1. Plan generation, i.e. sequences of actions, can easily be represented in Linear Logic (LL), as LL allows resources to be explicitly managed, hence solving the so-called "frame problem". We represent deterministic as well as non-deterministic sequences of actions.
  2. Representing rules with default knowledge and exceptions is useful for AI situations, as well as for object langages and data bases. Though LL is monotonic, some of its connectives allow one to model non-monotonicity inside it.
  3. The last application concerns linguistics. We give a representation of the TAG formalism (Tree Adjoining Grammar) in an extension of the Lambek calculus. In particular, we show how the adjunction operation may be formalized as a (admissible) rule in a fragment of this calculus.

Introduction to the Session on Constraints, by Eugene C. Freuder (Constraint Computation Center and Department of Computer Science, University of New Hampshire, Durham, NH 03824). (Monday Morning)

Constraint computation is emerging as a major new computing paradigm out of developments in artificial intelligence, programming languages, operations research, discrete mathematics, and other fields.

Many problems are naturally viewed as Constraint Satisfaction Problems, consisting of variables with domains of potential values, and constraints specifying which combinations of values are allowed. For example, in a graph coloring problem, the variables are the vertices, the values are the colors, and the constraints specify that neighboring vertices have different colors. AI applications are found in many areas, including planning, design, diagnosis, and robotics. There is a natural extension to constraint optimization.

Mathematics has supported the theoretical underpinnings of constraint computation and informed the development of constraint satisfaction algorithms. For example:

  • Group theory has been used to study symmetric problem structures.
  • Graph theory has been used to derive computational complexity results.
  • Interval arithmetic has been used to cope with inexact data.

Constraint programming in turn has tackled hard mathematical problems. For example:

  • SAT problems.
  • Graph problems.
  • Numerical analysis problems.

A Propositional Theorem Prover to Solve Planning and Other Problems, by Allen Van Gelder and Fumiaki Kamiya Okushi, University of California, Santa Cruz. (SU 8:45)

Classical STRIPS-style planning problems are formulated as theorems to be proven from a new point of view.  The result for a refutation-based theorem prover may be a propositional formula that is to be proven unsatisfiable.  This formula is identical to the formula that may be derived directly by various ``sat compilers'', but the theorem-proving view provides valuable additional information not in the formula: namely, the theorem to be proven.  Traditional satisfiability methods, most of which are based on model search, are unable to exploit this additional information.  However, a new algorithm, called ``Modoc'', is able to exploit this information and has achieved performance comparable or superior to the fastest known satisfiability methods, including stochastic search methods, on planning problems that have been reported by other researchers, as well as formulas derived from other applications.  Unlike most theorem provers, Modoc performs well on both satisfiable and unsatisfiable formulas.

Modoc works by a combination of back-chaining from the theorem clauses and forward-chaining on tractable subformulas.  In many cases, Modoc is able to solve a planning problem without finding a complete assignment, because the back-chaining methodology is able to ignore irrelevant clauses.

Experimental results are presented for planning problems and formulas derived from other applications.

Propositional Search with k-Clause Introduction Can be Polynomially Simulated by Resolution, by Allen Van Gelder, University of California, Santa Cruz. (MO 8:30)

Purdom proposed ``complement search'' as a method to introduce additional constraints into a satisfiability search, without changing whether the formula is satisfiable, with the hope of reducing the amount of search. Kullmann proposed ``blocked-clause introduction'', which is closely related to ``complement search'', with the same aim. This paper generalizes the concept of ``blocked clause'' to ``nondecisive clause''. A ``nondecisive clause'' has the same useful property that it can be either added to or removed from a CNF formula without changing whether the formula is satisfiable. However, the ``nondecisive'' property is preserved under more operations than is ``blocked''.

Consider a class of satisfiability algorithms that uses as its operations resolution, subsumption, and the ``splitting'' rule, such as the classical branching algorithm of Davis, Loveland and Logemann, and its modern variants. It is known that these algorithms can be ``polynomially simulated'' by resolution proofs; therefore, exponential lower bounds known for resolution apply to these algorithms.

This paper analyzes the above class of algorithms with the enhancement that nondecisive clauses (including blocked clauses) may be introduced at any search node, with the restrictions that (1) the clause does not contain variables that are new to the formula, and (2) the clause has at most $k$ literals, where $k$ is a constant. This paper shows that this enhanced class of algorithms still permits polynomial simulation by resolution, but the degree of the polynomial depends on $k$ in the simulation described.

If only restriction (1) is removed, it is already known that introduction of ternary blocked clauses suffices to produce an exponential speed-up relative to resolution. If only restriction (2) is removed, the question is open.

Exact Classification with 2-Layer Neural Nets: Theoretical Results and Open Problems, by Gavin Gibson Biomathematics and Statistics Scotland, Cathy Z. W. Hassell Sweatman, and Bernard Mulgrew Department of Electrical Engineering, University of Edinburgh, UK. (SU 8:45)

This paper considers the classification capabilities of neural networks which incorporate a single hidden layer of McCulloch-Pitts units, as characterised by the decisions regions they can realise exactly (i.e. such the misclassified region has zero measure).  It builds on research which demonstrated that the realisability of a set can be determined by restricting attention to an arbitrary neighbourhood of its boundary.  This enables the existence of a network realising a given region to be proved without explicitly constructing its architecture.  The paper describes how this leads to a characterisation of all bounded 2-dimensional realisable regions whose bounding lines lie in general position. A generalisation of this result to higher dimensions is presented.  The paper concludes by conjecturing a geometric characterisation of exactly realisable regions in n dimensions. Strategies for obtaining a proof of such a result are suggested and discussed.

Geometric Foundations for Interval-Based Probabilities, by Vu Ha, and Peter Haddawy, Dept of EE & CS, University of Wisconsin-Milwaukee. (TU 9:30)

The need to reason with imprecise probabilities arises in a wealth of situations ranging from pooling of knowledge from multiple experts to abstraction-based probabilistic planning. Researchers have typically represented imprecise probabilities using intervals and have developed a wide array of different techniques to suit their particular requirements. In this paper we provide an analysis of some of the central issues in representing and reasoning with interval probabilities. We use the well-developed area of convex geometry as the underlying foundation for our analysis. In particular, we point out the ubiquity of the probability cross-product operator, a generalization of which is known in the convex geometry literature as the Minkowski operator.  We provide a throrough analysis of some key properties of this operator relative to manipulation of sets of probability distributions. We demonstrate the application of our results to the problems of inference in interval Bayesian networks and projection of abstract probabilistic actions.

Set­Theoretic Completeness for Epistemic and Conditional Logic, by Joseph Y. Halpern Dept. Computer Science, Cornell University, Ithaca, NY 14853. (SU 14:00)

The standard approach to logic in the literature in philosophy and mathemat­ ics, which has also been adopted in computer science, is to define a language (the syntax), an appropriate class of models together with an interpretation of formu­ las in the language (the semantics), a collection of axioms and rules of inference characterizing reasoning (the proof theory), and then relate the proof theory to the semantics via soundness and completeness results. Here we consider an ap­ proach that is more common in the economics literature, which works purely at the semantic, set­theoretic level. We provide set­theoretic completeness results for a number of epistemic and conditional logics, and contrast the expressive power of the syntactic and set­theoretic approaches.

Using Multi-Agent Systems to Represent Uncertainty, by Joseph Halpern Dept. Computer Science, Cornell University, Ithaca, NY 14853. (MO 15:15)

Uncertainty is a fundamental---and unavoidable---feature of daily life. In order to reason about uncertainty, we need to have the tools to represent it well.  I'll discuss a general framework, that incorporates knowledge, time, and probability, and gives us a systematic way of representing uncertainty.  In this framework, we  can reasoning about agents' beliefs about the world, their beliefs about other agents' beliefs, and their beliefs about the future. Just as importantly, the framework gives us the tools to model at a semantic level many phenomena of interest, from adversarial games to belief revision.  Finally, the framework emphasizes the importance of specifying clearly the protocol that generates a system, and provides the tools for describing protocols and the systems they generate.   I'll show by example how the framework can give insight into a wide range of problems, from coordination to knowledge base queries to puzzles like the Monty Hall puzzle. 

Convexity and Logical Analysis of Data, by Oya Ekin, Peter L. Hammer, and Alexander Kogan, RUTCOR, Rutgers University, New Jersey. (SU 16:30)

A Boolean function is called k-convex if for any pair x,y of its true points at Hamming distance at most k, every point "between" x and y is also true. Given a set of true points and a set of false points, the central question of Logical Analysis of Data is the study of the Boolean functions whose values agree with those of the given points. In this paper we examine such data sets which admit k-convex Boolean extensions. We provide polynomial algorithms for finding a k-convex extension, if any, and for finding the maximum k for which a k-convex extension exists. We study the problem of uniqueness, and provide a polynomial algorithm for checking whether all k-convex extensions agree in a point outside the given data set. We estimate the number of k-convex Boolean functions, and show that for small k this number is double exponential. On the other hand, we show that for very large k the class of k-convex Boolean functions is PAC-learnable.

Simulated Annealing and Tabu Search for Constraint Solving, by Jin­Kao Hao, and Jerome Pannier LGI2P/EMA­EERIE, Parc Scientifique Georges Besse, F­30000 Nimes, France. (Canceled)

Constraint solving either for satisfaction or optimization purpose occupies a very important place in Artificial Intelligence (AI) and Mathematics. Informally, constraint solving consists in finding assignment of values to variables while respecting some constraints and eventually optimizing a cost function. Constraint problems embody many well­known problems such as graph coloring and satisfiability, and numerous practical applications related to resource assignment, planning, scheduling and so on. Constraint problems are in general NP­complete or NP­hard. Given the fact that it is unlikely there exists any efficient solution for these problems, we are interested in their practical solving using heuristic methods.

In this paper, we present an experimental study of local search for constraint solving. For this purpose, we experiment with two algorithms based on Simulated Annealing (SA) and Tabu Search (TS) for solving the maximal constraint satisfaction problem. These two algorithms were tested on a wide range of large random instances going from 100 to 300 variables with 10 to 30 values per variable. Experimental results lead to the following conclusions:

  • TS gives better solutions than SA for a fixed number of moves;
  • TS needs fewer moves to get solutions of the same quality as SA;
  • TS is faster in terms of running time to carry out a same number of moves.
These results are consistent with previous comparative studies on special classes of constraint problems such as graph coloring and Max­SAT. This study confirms that an appropriate neighborhood and good data structures are indispensable for good performance of an algorithm. It confirms also the importance and difficulty of an appropriate tuning of the parameters. It remains to know how and to which extent these results can be generalized to other constraint networks (for instance, highly over­constrainted networks) and other constraint problems.

A closer look at the pure implicational calculus, by Peter Heusch, and Ewald Speckenmeyer, Universität zu Küln, Institut für Informatik, Pohligstr. 1, D-50969 Köln, Germany. (MO 14:00)

In their paper titled An Algorithm for the Class of Pure Implicational Formulas Franco et al. show how the falsifiability problem for Boolean formulas in pure implicational form can be solved in time O(kkn2), where k denotes the number of Boolean constants false and n denotes the number of variables, each one occuring at most twice. This is an improvement over the previously known runtime bound O(nk) and shows that the problem is fixed parameter tractable. We improve this result further by modifying the algorithm such that a runtime of O(2kp(k,n)), where p(k,n) is a polynomial function wrt. k and n. We study further the functional behavior of pure implicational formulas.

The phase transition in random Horn satisfiability, by Gabriel Istrate, and Mitsunori Ogihara, Department of Computer Science, University of Rochester, Rochester, NY 14627. (MO 17:00)

This paper studies the phase transition in random Horn satisfiability: under the random model O(n; m) in which a formula is obtained by choosing m clauses independently, uniformly at random, and with repetition from all Horn clauses in n variables, t(n) = 2n is the satisfiability threshold for the Horn satisfiability. Unlike all other cases of satisfiability, we prove that the threshold is coarse since, if m(n) = c 2n then lim Pr H in O(n,m(n)) [H is satisfiable] = 1-F(e-c) where F(x)=(1-x)(1-x2)(1-x4).... This resolves both of the remaining tractable cases in Schaefer's Dichotomy Theorem. We obtain our result by a probabilistic analysis of the following algorithm, denoted PUR, for Horn satisfiability:

    Procedure 1 PUR(H)
  • if (H contains two opposite literals) then >
    • return FALSE
  • else
    • if (H contains no positive literals) then
      • return TRUE;
    • else
      • choose such a positive literal x
      • let H' be the formula obtained by setting x to 1 in H
      • return PUR(H' );
    • end if
  • end if

Continuous And Discrete-Time Nonlinear Gradient Descent Relative Loss Bounds and Convergence, by Manfred Warmuth and Arun Jagota Department of Computer Science, University of California at Santa Cruz. (TU 11:00)

We introduce a general algorithm for continuous-time non-linear gradient-descent, with the non-linearity captured by the choice of a certain possibly non-linear ``link'' function. Euler-discretizations of this algorithm yield, for various choices of link function, the conventional gradient-descent algorithm and several exponentiated gradient algorithms. We obtain relative loss bounds for the general algorithm in an on-line setting for both the continuous-time and discrete-time versions. These bounds reveal the dependence on the link function and show that an additional term is present in the discrete-time case that disappears in the continuous-time version. This additional term is responsible for the pair of dual norms that appear in the relative loss bounds for linear and logistic regression. The continuous-time version is shown to have a simple proof of convergence. Convergence of Hopfield recurrent neural networks is seen as a special case. Consequences of our on-line loss results to batch (off-line) learning are also briefly discussed.

Constraints and Universal Algebra, by Peter Jeavons, and David Cohen, Department of Computer Science, Royal Holloway, University of London, UK, and by Justin Pearson Department of Information Technology Mid Sweden University, S­851 70 Sundsvall, Sweden. (MO 9:30)

In this paper we explore the links between constraint satisfaction problems and universal algebra. We show that a constraint satisfaction problem instance can be viewed as a pair of relational structures, and the solutions to the problem are then the structure preserving mappings between these two relational structures. We give a number of examples to illustrate how this framework can be used to express a wide variety of combinatorial problems, many of which are not generally considered as constraint satisfaction problems. We also show that certain key aspects of the mathematical structure of constraint satisfaction problems can be precisely described in terms of the notion of a Galois connection, which is a standard notion of universal algebra. Using this result, we obtain an algebraic characterisation of the property of minimality in a constraint satisfaction problem. We also obtain a similar algebraic criterion for determining whether or not a given set of solutions can be expressed by a constraint satisfaction problem of a given arity.

Semantic Dimension: On the Effectiveness of Naive Data Fusion Methods in Certain Learning and Detection Problems, by Paul B. Kantor, SCILS and RUTCOR, Rutgers University. (TU 8:30)

The effectiveness of certain naive methods for combining information from two or more sensors, or classifiers is considered. A vector space model representing the ideal solution to such a problem is proposed. The effective dimension of this vector space is called the ``semantic dimension'' (SD) of the Problem-Method combination . It is proposed that imperfect classifiers or detectors be represented as random-variable-valued functions defined on the space of ideal solutions. It is shown that for spaces of low SD, and normal distributions for the functions involved, the imperfect solution has a good chance to be ``near'' the ideal solution. It is further shown that, for spaces of high SD, any single imperfect solution is highly unlikely to lie near the ideal. This ``repulsion effect'' is due to the volume element in the space. The notion of stochastically independent classifiers or detectors is defined in a natural way. It is then shown analytically that a naive symmetric combination of classifiers or sensors which can be represented as a linear combination of the several methods is likely to be more effective than any of them alone. In this way, combination rules are shown to be able, quite generally, to produce solutions which, unlike most mixtures, do not lie ''between'' their constituent parts, but ``beyond'' them. An expression for the optimum weights for system combination is obtained in terms of the ideal performance of systems in the class considered.

Research supported in part by the Alexandria Digital Library Project, UCSB. Project Director: T.R. Smith.

Discontinuous Phase Transitions in Random Decision Problems, by Scott Kirkpatrick, IBM, Remi Monasson, ENS, Paris, Bart Selman, Cornell, Lidror Troyansky, Jerusalem, and Riccardo Zecchina, Torino. (SU 15:15)

I will present analytical and numerical results for a satisfiability problem which interpolates between the polynomial 2-SAT and the NP-complete 3-SAT problems.  Experiments show that the onset of exponential search cost in the most probable case occurs not at the point where the problem becomes NP, but at a discontinuous phase transition, in which many solutions compete to become the ground state.  We view large combinatorial problems as finite instances of infinite random structures for which the ideas of phase transitions and the methods of statistical mechanics are appropriate. "Finite size scaling" techniques provide a link between the physical analogies suggested by phase transitions and the design of practical algorithms.

Functional Dependencies in Horn Theories, by Toshihide Ibaraki, Department of Applied Mathematics and Physics, Graduate School of Engineering, Kyoto University, Kyoto 606, Japan, Alexander Kogan, RUTCOR, Rutgers University, New Jersey, and Kazuhisa Makino, Department of Systems and Human Science, Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka 560, Japan. (TU 11:30)

Functional dependencies are commonly used in relational database design to express integrity constraints. The most popular way of representing knowledge in artificial intelligence is by means of a Horn theory. For any given Horn theory, we study the structure of the set of functional dependencies, prove that it is quasi-acyclic, and provide polynomial algorithms for the recognition of whether a functional dependency holds, and for the generation of some representative sets of functional dependencies.

Gains from Concurrenting of the Constraint Solving, by Richard Krajcoviech, and Margareta Kotocova, Slovak University of Technology, Ilkovicova 3, 812 19 Bratislava, Slovak Republic. (MO 9:00)

Difficulty of the search-based solving of the constraint satisfaction problems is under strong investigation. Already, it is well known that the hardest searches appear over some problems that are in the underconstrained region. Although they are rare there, they significantly affect a mean search cost.

The talk focuses on the concurrent execution of several instances of a search algorithm. The concurrency-control mechanism over the instances is investigated. A new, non-uniform time-sharing mechanism will be presented. If applied over a search algorithm, it is able to reduce significantly the number of the hardest problems in the underconstrained region.

Power of the exposed mechanism is proved experimentally. The well-known minimal-remaining-values heuristic, if used with the new time-sharing mechanism, is more efficient than the more sophisticated, Brelaz's heuristic, although the simpler heuristic is so poor in single evaluation that the experiments did not end in a reasonable time. All experiments use the graph three-colouring problem.

Type Grammar Revisited, by J. Lambeck, McGill University, Montreal, Canada. (MO 17:00)

A protogroup  is an ordered monoid in which each element a has both a left proto-inverse al such that ala ¾ 1 and a right proto-inverse ar such that aar  ¾ 1. We explore the assignment of elements of a free protogroup to English words as an aid for checking which strings are well-formed sentences.  By a pregroup we mean a protogroup which also satisfies 1 ¾ ala and 1 ¾ aar, rendering al a left adjoint and ar a right adjoint of a.  A pregroup is precisely a poset model of classical non-commutative linear logic in which the tensor product coincides with the dual.

An Information-Theoretic Approach to Data Mining, by Mark Last, and Oded Maimon, Department of Industrial Engineering, Tel-Aviv University, Tel-Aviv 69978, Israel. (TU 9:00)

We suggest a novel, information-theoretic statistical approach to one of the important problems of Data Mining - finding a minimal set of input (predictive) attributes explaining a target (classification) attribute.  The method is useful for knowledge discovery in relational databases with many irrelevant and noisy attributes of mixed nature (including continuous, ordinal, nominal, and binary-valued features).

A multilevel information-theoretic connectionist network is constructed to evaluate the mutual information of input and target attributes in a database.  Each hidden layer represents an interaction of an input attribute with other input attributes.  A stepwise gradient procedure is developed to select, at each iteration, next input attribute providing the maximum decrease in the conditional entropy of a target attribute (maximum increase in the mutual information between the input attributes and the target attribute).  A likelihood-ratio test is applied to checking the statistical significance of the entropy change.  When the change becomes insignificant, the search stops.

The final hidden layer is used to predict values of a target attribute. Meaningful rules are extracted from the network connection weights and ranked by their contribution to the mutual information between the input attributes and the target attribute.  Thus, the most informative rules can be discovered automatically.

Due to the distribution-free nature of the information-theoretic calculations, the method is especially efficient when applied to multi-attribute relations, where some input attributes may be dependent or conditionally dependent, given a target attribute.  We conclude the paper with describing the application of the developed approach to three real-life data sets from different application domains:  a credit approval database, a manufacturing database and a university registration database. 

Nonlinear Regularization, by Joerg C. Lemm, Institut fuer Theoretische Physik I, Wilhelm-Klemm Str.9, D-48149 Muenster, Germany. (TU 9:30)

Function approximation with a finite number of training data requires the specification of additional restrictions to allow generalization. Those appear as additional terms in the regularization functional. Usually they measure the degree of invariance under  symmetry operations relevant for input noise, like smoothness in the case of infinitesimal translations. From a Bayesian point of view this represents prior knowledge which, for example, has to be extracted from verbal statements by fuzzy methods.

A probabilistic AND can be implemented in a quadratic approximation, as it is done in most commonly used regularization functionals, yielding stationarity equations linear in the function values. Utilizing also knowledge in form of a probabilistic or fuzzy XOR, however, yields nonquadratic functionals with nonlinear stationarity equations. In an analogy to physics two such approaches are discussed, one using a mixture model corresponding to a system at finite temperature, another specifying effective interactions similar to the Landau--Ginzburg treatment of phase transitions.

A numerically solved model is used to examplify the two theoretical approaches and to discuss iteration procedures for solving the nonlinear stationarity equations.

Computational Learning by an Optical Thin­Film Model , by Xiaodong Li, School of Environmental and Information Sciences, Charles Sturt University, Albury, NSW 2640, Australia, and Martin Purvis Computer and Information Science, University of Otago Dunedin, New Zealand. (TU 11:30)

This paper describes a computational learning model inspired by the technology of optical thin­film multilayers from the field of optics. It can be used as an alternative to the widely used neural network models. With the thicknesses of thin­film layers serving as adjustable ``weights'' for the computation, the optical thin­film model is capable of approximating virtually any kind of nonlinear mapping. The thin­film multilayer structure and its computational algorithm are presented. A pattern recognition problem is used as an example to demonstrate the model's learning capability.

Automating the Finite Element Method: a Test Bed for Soft Computing Methods, by Larry Manevitz, and Dan Givoli, Department of Computer Science, University of Haifa, Israel. (SU 9:15)

The finite element method is a numerical method for the solution of partial differential equations with boundary conditions.   However, it is not a "black-box" system;  good numerical results are dependent on the use of high level human expertise.Thus, on the high level, it looks like an "expert system" style problem.  However, more careful inspection shows this global problem is sub-divided into a variety of optimization problems; each potentially solvable by various soft computing (and more standard) techniques. So a solution replacing all or some of the human expertise is a fairly sophisticated hybrid system, which can be implemented in whole or in part.

In addition, this architecture and the variety of problems to be solved lends the finite element method as a "test-bed" for the various techniques because:  (1)  the various sub-problems can be solved in different ways; these partial solutions are practically independent of each other and thus one solution can be substituted for another in the global system.  (2) the finite element method itself is a tool heavily used to solve serious applications - thus we have an automatic interface to "non-toy" problems. Evaluation of the efficacy of solutions can be done on a variety of "real" problems;  essentially every partial differential equation gives an application. Thus, for example, one can choose problems with analytic solutions to give a "gold-standard" for evaluation purposes.

So far we have implemented and evaluated two such sub-systems, (an expert system for efficient node numbering, and a self-organizing neural network for the optimal automatic placement of a mesh).  Underway is  a system for the automatic adaptation of a mesh for time varying equations. (This is important for solving shock waves, fluid flows in cavities, airflow around airplanes, etc.)  Additional sub-problems are identified together with potential soft-computing solutions.

Forecasting electricity demand using gated neural networks and statistical pruning, by D. Morgan Mangeas, National Research Institute on Transport and Security, France. (MO 14:30)

Linear or nonlinear global model are known to be well suited to problems with stationnary dynamics.  However, the assumption of stationnarity is violated in many real-world phenomena.  In the field of forecasting time series, an important sub-class of nonstationnarity is piece-wise stationnarity where the the series switches between different regimes. For example, regimes of electricity demand depend on the seasons, and regimes of financial forecasts (grown and recession) depend on the economy. Adressing these problems, we present a class of models for time series prediction that we call mixture of experts.  They were introduced into the connectionnist community by R. Jacon, M. Jordan, S Nowlan and G. Hinton in 1991. In our model, we use nonlinear gating network gated nonlinear experts. The input space can be split nonlinearly through the hidden units of a gating network and the learned sub-processes can be nonlinear through the hidden units of the expert networks.

The basic idea, behind mixture of experts, is simple: simultaneously, the gating networks learn to split the input space, and the experts learn local features from the data. The problem is that the splitting of the input space is unknown because the only information available is the next value of the series. This requires blending supervised and unsupervised learning. The supervised component learns to forecast the next value, the unsupervised component discovers the hidden regimes. The trade-off between flexibility in the gates and flexibility in the experts is an important degree of freedom in this model class.

This model allows a soft partitioning of the input space and adaptative noise levels (variances) of the experts. It is is promising in terms of prediction and analysis. In particlar three elements yield deeper understanding of the underlying process:

  • the analysis of the individual experts (e.g., average predictability in the different regimes, and sensitivity of the importance of the inputs),
  • the analysis of the gate (e.g. the correlation between the outputs and the auxiliary variables),
  • the analysis the noise levels, associated to each experts.
After describing the mathematical framework, we detail and analyze the forecasting model of the electricity demand in France, using mixture of experts and a statistical pruning method called SSM.

Introduction to the Sessions on Artificial Neural Networks, by Eddy Mayoraz IDIAP, Switzerland. (Sunday)

Two special sessions on Artificial Neural Networks will be held at the Conference. These sessions are devoted to combinatorial and complexity issues arising in the field of neural networks.

There is an abundant literature studying the ability of neural networks to approximate any arbitrary real valued function. A consequence of these powerful results states for example that a feedforward neural network with

  • one hidden layer made of units computing a sigmoidal function of a linear combination of the inputs,
  • one output unit computing a linear combination of the hidden units,
can approximate any arbitrary real valued function defined on a compact set of the Euclidian space.

Lately, some papers were also devoted to the relationship between the number of hidden units and the quality of the approximation.

A similar topic consists of studying the ability of feedforward neural networks with threshold units (heavyside functions instead of sigmoidal functions) to recognize a subset of the Euclidian space, or equivalently, to compute exactly its characteristic function. If depth-3 networks can recognize any arbitrary region of the Euclidian space, the set of regions recognizable by a depth-2 network is not well understood and its characterization generated a few interested papers in the last 5 years.

The aim of the first session is to present the state of the art in this field with a geometrical and combinatorial flavor. The first two papers are devoted to the characterization of the regions that can be recognized by a depth-2 feedforward neural network with threshold gates. The last paper addresses the computational complexity of recognizing such regions.

Artificial neural networks, and in particular feedforward networks, are effectively used nowadays to solve real life applications. However, among the predominant criticisms, the slowness of the training process is always mentioned. To cope with this drawback, numerous improvements of the standard training algorithms have been proposed and alternative architectures have been developed. To have a deep understanding of the potentialities and limits of a new computational model, a study of the computational complexity of its parametrization for solve a particular problem is essential.

The second session starts with two talks on computational complexity issues arising in two different and particular architectures of neural networks. It ends with the presentation of third type of neural networks, known as mixture of experts or gated neural network, and its application to an industrial problem is described.

On the computational complexity of recognizing regions classifiable by a 2-layer perceptron, by Eddy Mayoraz IDIAP, Switzerland. (SU 9:45)

This work is concerned with the computational complexity of the recognition of LP2, the class of regions of the Euclidian space that can be classified exactly by a two-layered perceptron.

First, the difficulty of the general question is illustrated by a very simple region V known to be in LP2 but for which there is no known two-layered perceptron of polynomial size recognizing V.

Then, some subclasses of LP2 of particular interest are studied. A natural restriction of LP2 consists in limiting the hidden units of the two-layered perceptron to those computing essential half-spaces of the region to be classified. The complexity of the class of regions recognizable by such model is discussed in different framework.

Iterated differences of polyhedra have been proposed in the literature as being elements of LP2. The complexity of deciding whether a region is an iterated difference of polyhedra is also studied.

Automatic and Practical Equational Deduction, by William McCune, Mathematics and Computer Science Division, Argonne National Laboratory. (SU 11:00)

This talk is about the use of automatic deduction programs to solve problems and explore new areas in mathematics and logic. Three successful cases are presented in the areas of (1) algebraic geometry, where an equational inference rule was formulated to attack some problems about cubic curves, (2) axiomatization of Boolean algebra, where the Robbins conjecture has been proved automatically, and (3) quantum logic, where new bases for ortholattices are being explored.  Some of the difficulties that arise when applying the methods to richer theories are given.

Planning and Presenting Construction Proofs, by Erica Melis, Universität des Saarlandes, Fachbereich Informatik,  D-66041 Saarbrücken, Germany. (SU 9:45)

Automated theorem provers, such as OTTER  have gained considerably strength. In general, however, traditional automated theorem proving suffers from exhaustive search in super-exponential search spaces while humans can cope with long and complex proofs and have strategies to avoid less promising proof paths. Moreover, automated theorem provers produce proofs that can hardly be understood by humans.

Now proof planning seems to be a promising framework for systems that truly assist mathematicians and students because it can produce hierarchically structured proofs whose steps comprise chunks of proofs. These steps in proof plans are called methods because they are intended represent mathematical methods such as induction or diagonalization.

We show how knowledge-based proof planning needs relatively little search and produces comprehensible results for proofs in which mathematical objects are constructed, in particular proofs of limit theorems such as LIM+ or LIM*. 

A method used here, LimHeuristic,  asserts the existence of an auxiliary variable M and reduces an inequality goal to subgoals in which M occurs. Other methods reduce inequality goals directly by passing them to a constraint solver.

The construction of mathematical objects with certain properties can be supported by constraint reasoning. Therefore, we integrate a constraint solver into  the proof planner. This constraint solver incrementally restricts the range of (existentially quantified) variables and yields a final constraint store from which an answer assertion can be extracted by a projection algorithm. The constraints are projected onto those  variables for which  witnesses have to be constructed in the proof.

There are two ways of presenting a proof that involves constructions: one that states the definition of an object and another that provides the construction of the object. The latter  communicates more information and  should, hence, be preferred in teaching proofs.

Hence, we  provide two different styles of proof presentation:

  1. one that starts with the final constraint store (let ...) and
  2. another  presentation follows the proof plan whose nodes are annotated by constraint states. This plan is a high-level description on how to find the proof including the construction of δ.

The first presentation (textbook style) still needs to be extracted from the proof plan. This is achieved in the following way.  The expansion of the proof plan methods yields a proof. This proof has to be augmented by  an answer assertion that captures the projection of the final constraint store.  The resulting ND-level proof can serve as a  basis for OMEGA's natural language presentation by proverb which in the near future will produces a textbook style proof in English with the beginning `Let δ be ...'. In such a proof presentation, however, the information on how δ has been constructed is not present any more.

Formal Interoperability, by José Meseguer, Computer Science Laboratory, SRI International, Menlo Park, CA 94025. (MO 13:30)

At present, formal methods for software specification and verification are monolithic, in the sense that in each approach only one formal system or specification language is used to formalize the desired system properties. For this reason, although formal approaches are very useful at their appropriate level of abstraction, their formal systems, and the tools based on them, are as it were autistic, because they lack the theoretical basis for being related to other formalisms and to their supporting tools. As a consequence, it is at present extremely difficult to integrate in a rigorous way different formal descriptions, and to reason across such descriptions. This situation is very unsatisfactory, and presents one of the biggest obstacles to the use of formal methods in software engineering because, given the complexity of large software systems, it is a fact of life that no single perspective, no single formalization or level of abstraction suffices to represent a system and reason about its behavior. In practice we find ourselves in constant need of moving back and forth between different formalizations capturing different aspects of a system. For example, as pointed out by Walter and Bellman, in the simulation area this need is felt very strongly because of the different types of mathematical models and corresponding constraints that a simulation must satisfy. Similarly, in a large software system we typically have very different requirements, such as functional correctness, proper real time behavior, concurrency, security, and fault­tolerance, which correspond to different views of the system and that are typically expressed in different formal systems. Often these requirements affect each other, but it can be extremely difficult to reason about their mutual interaction, and no tools exist to support such reasoning. We need mathematical foundations, (meta­)formal methods, and tools to achieve Formal Interoperability that is, the capacity to move in a mathematically rigorous way across the different formalizations of a system, and to use in a rigorously integrated way the different tools supporting such formalizations. Since the most basic need appears at the level of relating and integrating different mathematical formalisms, the foundations required must be metamathematical in nature. That is, they must provide rigorous foundational answers to the questions:

What is a logical formalism?

How can different formalisms be related?

In this paper I briefly sketch recent work on meta­logical foundations that seems promising as a conceptual basis on which to achieve the goal of formal interoperability. Specificaly, I will briefly discuss: (i) the theory of general logics , and the associated notion of map between logics; (ii) the notion of a meta­logical framework , and the use of rewriting logic for this purpose; (iii) the idea of a meta­logical module calculus to combine specifications across different formalisms; and (iv) the concept of reflection in general logics and in rewriting logic as a very powerful technique to achieve formal interoperability. At the mathematical level, category theory provides very good methods to articulate all these concepts. In fact, this is entirely natural, since formal interoperability has essentially to do with mappings between different logical systems. Therefore, I also sketch some ways in which category theory is very useful as a foundation.

Supported by Office of Naval Research Contracts N00014­95­C­0225, N00014­96­C­0114, and NSF Grant CCR­9633363.

Phenomenon Oriented Grammar: Category Theory Meets Formal Linguistics, by M. Andrew Moshier, Department of Computer Science, Mathematics and Physics, Chapman University, Orange, CA. (MO 14:00)

Phenomenon Oriented Grammar (POG), a grammar formalism being developed jointly at Chapman University and the University of the Saarland, emphasizes universal linguistic claims, as opposed to the analyses of individual signs according to those universals. This emphasis leads to the notion that category theory provides a more suitable foundation for formal linguistics than the traditional foundation of graphs (trees and feature structures). In this talk, I will criticize feature-based grammar formalisms from this perspective and argue that a category-theoretic approach to the semantics of grammar formalisms yields a richer stock of tools with which to explain linguistic universals. Via a brief introduction to POG, I will demonstrate several pointsat which the category-theoretic perspective has influenced the design of the formalism and several examples where linguistic theory internal concepts, e.g., the Head Feature Principle of HPSG, are better explained by category-theoretic mechanisms.

Combining a logical and an analogical framework for route generation and description, by Bernard Moulin, and Driss Kettani, Computer Science Department, Pouliot Building, and Research Center of Geomatics, Casault Building, Laval University, Ste Foy (QC) G1K 7P4, Canada. (MO 14:00)

Route generation is an ideal subject for studying the interactions of spatial and temporal reasoning in a practical setting. Starting from a cognitive study of pedestrian route descriptions in urban environments generated by human subjects, we identified the main requirements for a knowledge-based-system that would generate and describe routes in a cognitively plausible way. In this paper we present the main characteristics of the GRAAD system which provides an integrated logical and analogical framework for reasoning about space and time when generating and describing the routes of a virtual pedestrian (VP) in a virtual urban environment. The analogical framework is based on the notion of spatial conceptual map (SCM) which is used to specify a spatial environment composed of landmark objects and medium objects (streets, avenues, etc.) and to simulate the displacements of VP in that environment. A module uses this analogical framework to generate candidate paths between any two positions located on medium objects of the SCM. The analogical model is used to simulate the topological properties of the SCM and to determine possible paths in it. The logical framework is used to formalize an agent's reasoning when it tries to determine a route in a SCM based on "decision criteria" ensuring that the resulting route description will be cognitively acceptable to users. Examples of decision criteria are: "minimize route length", "minimize orientation changes", "use most salient objects as landmarks". The logical framework must enable the agent to reason about VP's possible moves  along ways in the SCM in order to establish a route between the positions containing the origin and destination points of the route. To this end, we need a temporal framework that allows considering alternative courses of actions: we use a first-order, multi-modal, branching time logic which is a first-order variant of CTL*, Emerson's Computational Tree Logic (1990), extended to a possible-worlds framework. The logical framework is based on a possible worlds approach used to carry out temporal reasoning and to generate plans that are used to compute VP's displacements in the SCM and to generate natural language descriptions of these displacements. These descriptions are cognitively plausible because the SCM's medium objects are partitioned into areas which can be directly described using verbal expressions similar to those used by people when describing routes.

Parallel Cooperative Propositional Theorem Proving, by Fumiaki Kamiya Okushi, University of California at Santa Cruz. (MO 9:00)

A parallel satisfiability testing algorithm called Parallel Modoc is presented. Parallel Modoc uses Modoc as a building block. Modoc is an implementation of Model Elimination, extended by a mechanism to prune away branches that cannot lead to a successful (sub)refutation. The pruning information is encoded in a partial truth assignment called an autarky. As a descendant of Model Elimination, Modoc also includes a mechanism to record successful (sub)refutations as lemmas and to recall them as necessary.

In contrast with many of the other parallel SAT testers, Parallel Modoc uses parallel processing to allow processes to cooperate by sharing lemmas and autarkies as they are found. Combining autarkies generally is not straightforward because two autarkies found by two separate processes may have conflicting assignments. The paper presents an algorithm to combine two arbitrary autarkies to form a larger autarky.

Parallel Modoc executes multiple instances of Modoc as separate processes. When a Modoc process finds a new autarky or lemma, apart from saving it for its own future use, it sends the information out to other Modoc processes. On receiving the information, a Modoc process combines it with local information in a consistent way and henceforth uses it as if it had been found locally.

Experimental results obtained on up to 8 processor systems are reported. For many of the formulas, Parallel Modoc achieves significant super-linear speedup. Problems that could not be solved in an hour by Modoc were often solved by Parallel Modoc in the order of minutes, and in some cases, in seconds.

Lemma and Cut Strategies for Propositional Model Elimination, by Allen Van Gelder and Fumiaki Kamiya Okushi, University of California at Santa Cruz. (MO 16:30)

This paper describes new ``lemma'' and ``cut'' strategies that are efficient to apply in the setting of propositional Model Elimination. It builds upon the C­literal strategy proposed by Shostak, and studied further by Letz, Mayr and Goller. Previous strategies for managing lemmas and C­literals in Model Elimination were oriented toward first­order theorem proving. The original ``cumulative'' strategy re­ members lemmas forever, and was found to be too inefficient. The previously reported C­literal and unit lemma strategies, such as ``strong regularity'', forget them unnecessarily soon in the propositional domain. An intermediate strategy, called ``quasi­persistent'' lemmas, is introduced. Supplementing this strategy, methods for ``eager'' lemmas, and two forms of controlled cut provide further efficiencies. The techniques have been incorporated into Modoc, which is an implementation of Model Elimination, extended with a new pruning method that is designed to eliminate certain refutation attempts that cannot succeed. While these strategies are not necessary for the theoretical completeness of Modoc, experiments show them to increase the efficiency substantially, in practice. Experimental data based on an implementation in C is reported. On random 3CNF formulas at the ``hard'' ratio of 4.27 clauses per variable, Modoc is not as effective as recently reported model­searching methods. On more structured formulas from applications, such as circuit­fault detection, it is superior. This performance is achieved in spite of the fact that Modoc incorporates almost no heuristics to guide its searches.

Pattern Recognition using Artificial Neural Networks with White Noise, by J. M. Blackledge, and A. Osanlou, Department of Mathematical Sciences, De Montfort University, Leicester, UK. (TU 12:00)

This paper discusses the most significant findings of a comprehensive study on the use and suitability of Artificial Neural Networks (ANNs) for process modelling and control. Using a pattern recognition problem associated with a fluid flow process control unit commonly used in the Petrochemicals industries, this study has shown that ANN may be suitable for solving many process control engineering problems particularly in modelling control applications in a noisy environment. They have already been used successfully as non-linear filters to extract signals from noise. In contrast, conventional approaches suffer from sharp increases in computation and implementation costs with non-linearity. As a result such approaches (e.g. the general lest-mean-square method extended on the basis of a truncated Wiener-Voltaire series) have only been used in weak non-linear cases. Rather than filtering the noise as a pre-processing operation, this work has investigated and analysed the effect of adding noise. To accomplish this, two principal objectives were set, namely,to address process control issues regarding noise and to investigate and evaluate mapping and generalising attributes of various paradigms for neuro-control solutions. These objectives have been achieved and in doing so have provided the following important conclusions:

  1. The whole process of applying feed forward neural networks to a process control application, particularly in a noisy environment has been experimentally analysed. Such networks are inherently stable as no information is fed back during operation.
  2. By extending the static networks with feedback connections, the utility of dynamic networks in neuro- control applications were practically examined. When conducting the experiments, it was found that training these networks with feedback is more problematic (e.g. in terms of stability). However, such recurrent networks offer a great potential for further research efforts in process control solutions.
  3. Neural networks robustness to noisy data has been analysed. By adding a significant amount of noise to each element of all the input training vectors independently, a robust weight set was developed.  On testing, it was found that careful use of slightly noisy data effectively improved the network's performance in terms of generalisation and learning as well as causing a reduction in the number of training cycles. Adding a small amount of noise randomly, forced the networks to train on a series of approximations of the input patterns. This drove the networks to generalise, resulting in an improvement in their performance.  In other words the added noise actually helped the networks in their pattern recognition task. Since noise is a major problem for real world applications, this is a particularly important result.
From the neuro-control viewpoint, such robustness to noisy data that occurs in real world applications is an important attribute. It indicates the neural networks potential in solving problems where conventional solutions have proved inadequate, if not impossible.  A reasonable conclusion to be drawn from the experimental work carried out for this paper, would recommend process control as a promising area for further analysis and development of neuro-control systems. 

Multilayer neural nerworks and polyhedral dichotomies, by Claire Kenyon, LRI, Universite de Paris-Sud, France, and Hélène Paugam-Moisy, LIP, Ecole Normale Superieure de Lyon, France. (SU 9:15)

We study the number of hidden layers required by a multilayer network with threshold units to compute a dichotomy f from Rd to {0,1}, defined by a finite set of hyperplanes. We show that this question is far more intricate than computing Boolean functions, although this well-known problem is underlying our research. We present recent advances on the characterization of dichotomies, from Rd to {0,1}, which require two hidden layers to be exactly realized.

Generation and comparison of decision strategies for solving satisfiability problems, by Robert Rodosek, IC-Park, Imperial College, London SW7 2AZ, England. (MO 14:30)

The paper presents a framework for building algorithms yielding even better worst-case performance of the current well-known algorithms for solving satisfiability problems. We demonstrate how to generate algorithms with a good average- and worst-case performance.

About arc-consistency in semiring-based constraint problems, by Stefano Bistarelli, and Francesca Rossi, Dipartimento di Informatica, Corso Italia 40, 56125 Pisa, Italy. (MO 10:00)

Semiring-based constraint problems (SCSPs) [1] extend classical constraint problems (CSPs) by allowing preferences, costs, priorities, probabilities, and other non-crisp features. They are based over the notion of a semiring, that is, a set plus two operations (+ and *), in the sense that each combination of values for some variables connected by a constraint has a semiring value associated with it, which is to be interpreted as its level of preference. Then, the two operations are used, respectively, to project out some variables and to combine constraints. It has been shown [1,2] that classical local consistency techniques, which are very successful in CSPs in discovering inconsistencies and speeding up the search for a solution, can be extended to the general SCSP framework if the * operation is idempotent. Here we study the properties and possible advantages of the extension to SCSPs of the most successful of such techniques, called arc-consistency, although we claim that all our results can be extended to higer levels of consistency. In particular, we study the relationship between classical arc-consistency (AC) and semiring-based arc-consistency (SAC) over SCSPs, showing that they discover the same inconsistencies. Thus sometimes it may be reasonable to apply just classical AC instead of SAC because AC is more efficient. Moreover, we show that, if an SCSP is SAC, then

  • some hidden variables can be removed without changing its set of solutions and associated semiring values,
  • if its constraint graph is tree-shaped, an optimal solution can be found without backtracking, and
  • non-tree-shaped SCSPs can be solved by performing backtracking only on a subset of the variables (those involved in a cycle-cutset). 
These last three results are extensions of similar results already existing for the CSP case, which however here are particularly interesting both because of the generality of the SCSP framework, and also because SCSPs usually express constraint optimization, and not just satisfaction, problems.
  1. S. Bistarelli, U. Montanari, and F. Rossi. Constraint Solving over Semirings. In: Proc. IJCAI95. Morgan Kaufman, 1995.
  2. S. Bistarelli, U. Montanari, and F. Rossi. Semiring-based Constraint Solving and Optimization. In JACM, vol.44, n.2, pp. 201-236, March 1997.

On the Characteristic Models of Boolean Functions, by Dan Roth, University Of Illinois at Urbana-Champaign. (SU 17:00)

The characteristic models of a Boolean function are the extreme models of the function with respect to various order relations on the Boolean cube. This notion has been shown useful in the study of Boolean functions and has found applications in knowledge representation and reasoning (reasoning with models), learning theory, database theory (relational data bases) and data mining. I will present some old and new results on characteristic models and their usage in representing Boolean functions.

An Algorithm for the Class of Pure Implicational Formulas, by John Franco, University of Cincinnati, Judy Goldsmith, University of Kentucky, John Schlipf, University of Cincinnati Ewald Speckenmeyer, Universität zu Köln, and R. P. Swaminathan, University of Cincinnati. (MO 10:00)

Peter Heusch, in his doctoral dissertation at Cologne, investigated testing falsifiability for pure implicational formulas --- formulas built up from proposition letters using only ==> --- containing at most k occurrences of one proposition variable z and at most two occurrences of each other proposition variable.  He showed that (1) testing falsifiability for such formulas is NP-complete, and (2) testing falsifiability for such formulas can be performed in time O(nk), where n is the length of the formula.

Better understanding of how features of a problem contribute to its complexity, it is hoped, may lead to a better understanding, for example, of the classes P and NP and the differences between them.  We asked whether Heusch's algorithm is optimal; in particular, is Heusch's problem fixed parameter tractable? An algorithm with a distinguished parameter k is fixed parameter tractable, as defined by Downey and Fellows, if the algorithm takes time O(g(k)p(n)), where p is a polynomial and g is an arbitrary total function.  Heusch's algorithm is not fixed parameter tractable, under the parameter k above, since the degree of the polynomial increases with k.

We chose to work instead with formulas built up with the set ==>, f of connectives, where f is a predicate constant representing ``false,'' since this set is well-known to be a complete.  Parallel to Heusch's class of formulas, we consider formulas with at most 2 occurrences of each proposition letter and k occurrences of f.  Heusch showed that the difficulty of his problem is maximized when z is forced to be false in any falsifying assignment, making his problem reducible to ours.  From Heusch's result the analogous NP-completeness result, and an O(nk+1) time bound for testing satisfiability, follow trivially.

We present an O((k-1)kn2) algorithm for testing satisfiability of such formulas.  An O(kk+1n2) algorithm for testing falsifiability follows trivially.  It follows that Heusch's problem is indeed fixed parameter tractable.

On the Complexity of Computing and Learning with Networks of Spiking Neurons, by Wolfgang Maass, and Michael Schmitt, Institute for Theoretical Computer Science, Technische Universität Graz, Klosterwiesgasse 32/2, A--8010 Graz, Austria (SU 14:00)

In a network of spiking neurons a new set of parameters becomes relevant which has no counterpart in traditional neural network models: the time that a pulse needs to travel through a connection between two neurons (henceforth called ``delay'' of a connection). It is known that these delays are tuned in biological neural systems through a variety of mechanisms. We investigate the VC-dimension of networks of spiking neurons where the delays are viewed as ``programmable parameters'' and we prove tight bounds for this VC-dimension. Thus we get quantitative estimates for the diversity of functions that a network with fixed architecture can compute with different settings of its delays. It turns out that a network of spiking neurons with $k$ adjustable delays is able to compute a much richer class of Boolean functions than a threshold circuit with $k$ adjustable weights. The results also yield bounds for the number of training examples that an algorithm needs for tuning the delays of a network of spiking neurons. Results about the computational complexity of such algorithms are also given. \end{abstract}

Switching Portfolios, by Yoram Singer, AT&T Labs, Room A277, 180 Park Avenue, Florham Park, NJ 07932. (Canceled)

Recently, there has been work on on­line portfolio selection algorithms which are competitive with the best constant rebalanced portfolio determined in hindsight [2, 6, 3]. By their nature, these algorithms employ the assumption that high yield returns can be achieved using a fixed asset allocation strategy. However, stock markets are far from being stationary and in many cases the return of a constant rebalanced portfolio is much smaller than the return of an ad­hoc investment strategy that adapts to changes in the market. In this paper we present an efficient portfolio selection algorithm that is able to track a changing market. We also describe a simple extension of the algorithm for the case of a general transaction cost, including a fixed percentage transaction cost which was recently investigated [1]. We provide a simple analysis of the competitiveness of the algorithm and check its performance on real stock data from the New York Stock Exchange accumulated during a 22­year period. On this data, our algorithm outperforms all the algorithms referenced above, with and without transaction costs.

A Hybrid Concept Language, by Patrick Blackburn, and Miroslava Tzakova, Max-Planck Institute fur Informatik, Im Stadtwald, 66123 Saarbrucken, Germany. (TU 10:00)

We introduce a concept language for representing knowledge.  Exploring and extending the well-known correspondences between modal and description logics (and in particular, the fact that ALC is a notational variant of the multi-modal logic K(m)) we define a hybrid language extending K(m).

Our language is called `hybrid' because, as well as the familiar modal operators, it also contains variables across individuals and a quantifier that binds these variables.  The individual variables (which in their free variable form are known in the modal literature as `names' or `nominals') are interpreted by singleton sets.  Syntactically, they are used exactly as ordinary propositional variables: they can be combined with Boolean and modal operators in the standard way.  Semantically, however, by identifying singletons with the individuals they contain, we are free to view our individual variables as first-order variables and to give an essentially classical treatment of variable binding and quantification.  The resulting system is thus a hybrid of the best of modal and classical logic: it combines the naturalness of the modal notation with the power of variable binding.

As a consequence of being a hybrid of modal and first-order ideas, the language increases the expressivity of ALC in a natural and uniform way.  In particular, it can define number restrictions N and collections of individuals O and hence possesses at least the expressivity of ALCNO.  It also has the expressivity required to define concept and role membership assertions. In fact, the hybrid language is essentially a general formalism in which a variety of descriptive concepts can be formulated in a natural way.

The purpose of this paper is to establish the basic meta-theoretic results for the hybrid language. In particular, we present a complete axiomatisation, and show how so-called witnessed models can be used as a bridge between modal and first-order completeness techniques.

Diagnosing Double Regular Systems, by Endre Boros, and Tonguç Ünlüyurt, RUTCOR, Rutgers University. (TU 12:00)

We consider the problem of testing sequentially the components of a double regular system, when the testing of each component is costly. Generalizing earlier results about k­out­of­n systems, we provide a polynomial time algorithm for the most cost­effficient sequential testing of double regular systems. The algorithm can be implemented to work efficiently both for explicitely given systems, and for systems given by an oracle.

Efficient Graph Search by a Smell-Oriented Vertex Process, by Israel A. Wagner, IBM Haifa Research Lab, Haifa, Israel, and Michael Lindenbaum, and Alfred M. Bruckstein, Technion, Haifa, Israel. (Canceled)

Efficient graph search is a central issue in many aspects of AI. In most of existing work there is a distinction between the ``searcher'', which both executes the algorithm and holds the memory, and the``searched graph'', over which the searcher has no control at all. Large dynamic networks like the Internet, where the nodes are powerful computers and the links have narrow band and are heavily-loaded, call for a different paradigm, in which most of the burden of computing and memorizing is moved from the searching agent to the nodes of the network. In this paper we suggest a method for searching an undirected graph using the Vertex-Ant-Walk method, where an a(ge)nt walks along the edges of a graph G, occasionally leaving pheromone traces at nodes, and using those traces to guide its exploration. We show that the ant can cover the graph within time O(nd) where n is the number of vertices and d the diameter of G. The use of traces achieves a trade-off between random and self-avoiding walks, as it can give lower priority to already visited neighbors. Further properties of the suggested method are:

  • modularity : several cooperating search-agents can share the effort with minimal explicit communication between them.
  • good limit cycle : a Hamiltonian path in G (if one exists) is a limit cycle of the process.

An Automated Conversion of Documents Containing Math into SGML, by Janusz Wnek, and Robert Price, Science Applications International Corporation, 1953 Gallows Road, Vienna, VA 22182. (Canceled)

Intelligent document understanding (IDU) systems convert scanned document pages into an electronic format which preserves layout and logical document structure in addition to document content.  Most of the IDU experimental systems, however, lack the capability of full exploitation of recognition results, i.e., the reconstruction and   utilization of complete documents. In this paper we present an integrated IDU system that processes documents all the way from recognition to full utilization   using standard generalized markup language (SGML). SGML recognizes that document data, structure, and format are separable elements. The   data in a document may include text, graphics, images and even multimedia objects. The structure of a document refers to the relationship among the   data elements. The format of a document is its appearance. An SGML document has an associated document type definition (DTD) that specifies the rules for the structure of the document. SGML preserves the data and structure, but does not specify the format of the document. The standardization and widespread use of SGML-based tools provides the means for filling the gap between document recognition and seamless document reuse. The proposed system was designed with the SGML application in mind. From the early stages of document processing the SGML components are used and incrementally constructed. Following SGML convention, in order to convert a document from a paper format to SGML, its logical structure and content have to be mapped to a DTD. To   facilitate recognition of the structure, the current version of our system assumes   that the paper document was prepared in accordance with some formatting guidelines. These guidelines define standard phrases, ordering, and formatting rules that produce flexible yet consistent document structure,   with elements that have either direct or indirect correspondence to the DTD.  Directly mapped elements have phrases or formatting styles that correspond to SGML tags. Indirectly mapped elements require special processing and are exemplified here by tabular data and mathematical expressions. The conversion process involves OCR of a multi-page document, document structure analysis, processing of indirectly mapped elements, and generation of the final SGML description. Document structure analysis is reduced to parsing OCR results and recreating document structure by performing fuzzy searches for standard phrases and some format analysis. Tabular data processing utilizes OCR results with positional data, horizontal lines and heuristic rules to determine cell boundaries and contents. Recognition of mathematical expressions involves OCR on an extended symbol set, and equation structure recognition. The equation recognition engine employs a tree   representation to store intermediate transformations. The transformations are ordered   and involve connecting of separated symbols, context-sensitive OCR correction, extraction of horizontally aligned subexpressions, subscript   and superscript processing, and a general processing of symbols detected above or under the target symbol. The system was tested on examples of documents in the standard format. The test cases included tabular data and mathematical equations. The results can be demonstrated using the Panorama Pro SGML browser.

Categories and Problem Solving, by Robert Zimmer, Brunel University, London, England. (MO 14:30)

A state space can be viewed as a finite concrete category.  Once this has been done various decomposition theorems for concrete categories can be applied, enabling a decomposition of the problem solving task into a sequence of smaller tasks.  In this paper, we explain a decomposition theorem that is derived from a notion of kernels of category morphisms.  The theorem is then applied to some standard AI problem spaces, including the Missionaries and Cannibals.