Having presented Shapiro's Model Inference System, we can now address some of the questions posed in the introduction:
An immediate, if partial, answer to this is that logic has a very simple and regular semantics which is intimately reflected in its syntax. This is exemplified in Section 3, where ``refinement'' is characterised by a simple syntactic criterion.
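To make this concrete, the following refinement chain for member/2 (an illustrative sketch, not MIS's exact operator) specialises a clause by purely syntactic steps:

\begin{verbatim}
% Each line is a single clause, obtained from the one above by a
% purely syntactic specialisation; they are alternatives, not one
% program.

member(X, Ys).                        % most general clause
member(X, [Y|Ys]).                    % instantiate an argument
member(X, [X|Ys]).                    % unify two head variables
member(X, [Y|Ys]) :- member(X, Ys).   % add a body literal
\end{verbatim}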
In terms of the error diagnosis tools, pure Prolog has the advantage that it has only one way of getting work done: viz, unifying a goal with the head of a clause. Combined with the lack of side-effects, this implies that we can think about predicates purely in terms of their input/output behaviour. In other words, we can (mostly) treat predicates as black boxes and only unfold those that are known to be buggy. This is in contrast to C, where the presence of side-effects allows the source of a bug to be syntactically quite distant from where it manifests.
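As a small illustration (our own example, not Shapiro's), consider insertion sort with its comparisons flipped:

\begin{verbatim}
% The bug is local to insert/3 but only manifests in the output
% of isort/2.

isort([], []).
isort([X|Xs], Ys) :- isort(Xs, Zs), insert(X, Zs, Ys).

insert(X, [], [X]).
insert(X, [Y|Ys], [X,Y|Ys]) :- X >= Y.   % BUG: should be X =< Y
insert(X, [Y|Ys], [Y|Zs])   :- X < Y,    % BUG: should be X > Y
                               insert(X, Ys, Zs).
\end{verbatim}

The query isort([2,1], Ys) wrongly returns Ys = [2,1]. Since isort([1], [1]) is correct, the false instance insert(2, [1], [2,1]) is implicated, and only insert/3 need be unfolded; isort/2 is treated as a black box throughout.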
Finally, for the purposes of the Model Inference System itself, adding or removing a definite clause has a predictable and incremental effect on a theory. Again, this follows from the simple proof-theoretic semantics of pure Prolog.
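A toy theory (our own) illustrates this monotonicity: adding a clause only adds consequences, and never invalidates atoms that were already provable.

\begin{verbatim}
edge(a, b).
edge(b, c).
path(X, Y) :- edge(X, Y).
% ?- path(a, c).     % fails under this theory

% Adding the single clause
%   path(X, Y) :- edge(X, Z), path(Z, Y).
% makes path(a, c) provable, while everything previously
% provable, e.g. path(a, b), remains so.
\end{verbatim}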
The most obvious atypicality of MIS is its reliance on an oracle for the concept it is trying to induce in addition to the training dataset. While this requirement is perfectly reasonable for abstract symbolic domains (such as inducing grammars of programming languages or axiomatising mathematical theories), it fails dismally when the domain has an a priori unknown structure. Concretely, this implies that MIS and indeed any supervised learning technique would be completely useless for, e.g., inducing a grammar for whale songs [Ric00].
The presence of noise in the training data implies that there will be some overlap between the sets of positive and negative data. As first-order logic does not pretend to deal with contradictions in a very sophisticated way, it is not at all obvious how to retain the logical properties so effectively exploited by MIS in their presence.
MIS must make efficient use of the oracle if it is to induce larger-than-toy programs. As our implementation is quite naive and resource-hungry even on small examples, we have no empirical data on this front.
The other issue that needs to be addressed is limiting the exponential growth of the refinement graph so that interesting programs can be induced in a reasonable amount of time. One way of doing this is to introduce meta-information such as type and mode annotations for the predicate and function symbols in the hypothesis language. However, biasing the search in this way almost always sacrifices completeness, which can be problematic.
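As a sketch of what such annotations look like, loosely following the declaration syntax of later ILP systems such as Aleph (not MIS's own notation):

\begin{verbatim}
:- modeh(1, append(+list, +list, -list)).  % head: two input
                                           % lists, one output
:- modeb(*, append(+list, +list, -list)).  % body literals may
                                           % recurse

% Under these modes, refinements that bind an output argument
% before its inputs, or place a non-list term in a list position,
% are never generated, cutting the branching factor of the search.
\end{verbatim}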
These last three issues indicate that we must go beyond pure classical first-order logic to find a learning algorithm usable on practical, non-trivial domains. The most obvious recourse is the same as that for the physical sciences: apply statistics. See [MR99] for some recent work on this.
Some concepts cannot be learnt as a single set of clauses that all have the same head symbol (that is, that define a single predicate). This arises whenever two or more levels of recursion (analogous to nested or sequenced loops) are required to express the relation, as in successor-arithmetic multiplication or in flattening a list of lists into a list.
Hence, it must be possible to introduce auxiliary predicates. If the learning algorithm can do this automatically, it is said to perform predicate invention.
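The multiplication example makes this concrete: the standard successor-arithmetic definition of times/3 relies on the auxiliary predicate plus/3 in its body.

\begin{verbatim}
plus(0, Y, Y).
plus(s(X), Y, s(Z)) :- plus(X, Y, Z).

times(0, _Y, 0).
times(s(X), Y, Z) :- times(X, Y, W), plus(W, Y, Z).

% ?- times(s(s(0)), s(s(s(0))), Z).
%    Z = s(s(s(s(s(s(0))))))        % i.e. 2 * 3 = 6
\end{verbatim}

Unless plus/3 is supplied as background knowledge or invented, no set of clauses for times/3 alone expresses the relation.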
MIS will not create such theoretical terms on its own initiative, although it can do multi-predicate learning, that is, learn several predicates as part of a single learning task. Additionally, MIS can either learn these predicates concurrently, or learn one and then use it as background knowledge for a later one. In either case, the learning task must be structured, as the oracle must be able to answer queries about every predicate in the target theory.
FOIL [QCJ93] is a well-known ILP system with quite distinct properties from MIS.