This is a short writeup (in advance) of my remarks in a panel presentation at the Constructive Induction Workshop at the 1994 International Conference on Machine Learning.  I believe this is what appeared in the working notes of the workshop.


Constructive Induction Needs a Methodology Based on Continuing Learning

Richard S. Sutton
GTE Laboratories


Everyone knows that good representations are key to 99% of good learning performance.  Why then has constructive induction, the science of finding good representations, been able to make only incremental improvements in the performance of machine learning systems?


People can learn amazingly fast because they bring good representations to the problem, representations they learned on previous problems.  For people, then, constructive induction does make a large difference in performance.  The difference, I argue, lies not between people and machines, but in the way we assess performance.


The standard machine learning methodology is to consider a single concept to be learned.  That itself is the crux of the problem.  Within this paradigm, constructive induction is doomed to appear as a small, incremental, second-order effect.  Within a single problem, constructive induction can use the first half of the training set to learn a better representation for the second half, and thus potentially improve performance during the second half.  But by then most of the learning is already over.  Most learning occurs very early in training.  It may be possible to detect improvements due to constructive induction in this paradigm, but they will always be second order.  They will always be swamped by first-order effects such as the quality of the base learning system or, most importantly, the quality of the original representation.
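
As a minimal sketch of why, consider the single-concept paradigm in the Python sketch below.  All particulars (a linear concept, an online LMS learner, the step-size) are my own illustrative choices, and the constructive-induction step is simply granted for free at mid-training:

    import numpy as np

    rng = np.random.default_rng(0)
    n_features, n_relevant, n_examples = 100, 5, 2000
    true_w = np.zeros(n_features)
    true_w[:n_relevant] = rng.normal(size=n_relevant)   # only 5 features matter

    def run(switch_at=None, alpha=0.005):
        """Online LMS on one fixed concept.  If switch_at is given, the input
        is restricted to the 5 relevant features from that point on, standing
        in for a perfectly successful constructive-induction step."""
        w = np.zeros(n_features)
        sq_errors = []
        for t in range(n_examples):
            x = rng.normal(size=n_features)
            if switch_at is not None and t >= switch_at:
                x[n_relevant:] = 0.0                 # the improved representation
            y = true_w @ x + 0.1 * rng.normal()      # noisy target
            delta = y - w @ x
            sq_errors.append(delta * delta)
            w += alpha * delta * x
        return np.array(sq_errors)

    fixed = run()                             # one representation throughout
    switched = run(switch_at=n_examples // 2)
    print(f"MSE over first tenth of training: {fixed[:200].mean():.3f}")
    print(f"MSE over last tenth: {fixed[-200:].mean():.3f} (fixed)"
          f" vs {switched[-200:].mean():.3f} (switched)")

Even granting a perfect representation change for free, the difference shows up only in the late, nearly converged part of the run, long after most of the error has already been driven out.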


This is not the way to study constructive induction!  We need a methodology, a way of testing our methods, that will emphasize, not minimize, the effect of constructive induction.  The standard one-concept learning task will never do this for us and must be abandoned.  Instead we should look to natural learning systems, such as people, to get a better sense of the real task facing them.  When we do this, I think we find the key difference: for all practical purposes, people face not one task, but a series of tasks.  The different tasks have different solutions, but they often share the same useful representations.


This completely breaks the dilemma facing constructive induction, which now becomes a first-order effect.  If you can come to the nth task with an excellent representation learned from the preceding n-1 tasks, then you can learn dramatically faster than a system that does not use constructive induction.  A system without constructive induction will learn no faster on the nth task than on the 1st.  Constructive induction becomes a major effect, a 99% effect rather than a 1% effect.  Most importantly, we now have a sensitive measure of the quality of our constructive induction methods, a measure unpolluted by tricky issues such as the quality of the original learner or the original representation.  All those things are factored out.  For the first time we will see pure effects due to changes in representation.  This, I hope, will enable us to evaluate our methods better and lead to faster progress in the field.
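
To make the measurement concrete, here is a minimal Python sketch of the series-of-tasks protocol.  Every particular is an illustrative choice of mine: the tasks are linear in the same 5 of 100 features, the learner is online LMS, and a crude weight-magnitude memory stands in for a real constructive induction method:

    import numpy as np

    rng = np.random.default_rng(1)
    n_features, n_relevant = 100, 5

    def make_task():
        """A fresh concept: new random weights, always on the same 5 features."""
        w = np.zeros(n_features)
        w[:n_relevant] = rng.normal(size=n_relevant)
        return w

    def learn_task(true_w, alphas, steps=400):
        """Online LMS with a per-feature step-size vector; returns the total
        squared error over the task and the final weights."""
        w = np.zeros(n_features)
        total = 0.0
        for _ in range(steps):
            x = rng.normal(size=n_features)
            delta = true_w @ x - w @ x
            total += delta * delta
            w += alphas * delta * x
        return total, w

    uniform = np.full(n_features, 0.005)  # the learner with no cross-task memory
    score = np.zeros(n_features)          # crude memory of which features mattered

    for task in range(1, 21):
        true_w = make_task()
        plain_err, _ = learn_task(true_w, uniform)
        if score.sum() > 0:
            # concentrate step-size on features that carried weight before,
            # with a small floor so relevance is allowed to shift
            shaped = np.maximum(0.5 * score / score.sum(), 1e-4)
        else:
            shaped = uniform
        ci_err, w = learn_task(true_w, shaped)
        score += np.abs(w)
        if task in (1, 5, 20):
            print(f"task {task:2d}: no carry-over {plain_err:7.1f}, "
                  f"carried representation {ci_err:7.1f}")

The learner with no carry-over should do about as well on the 20th task as on the 1st, while the carried-over representation should speed learning dramatically as the useful features are identified.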



Finally, let me note that an explicit sequence of tasks is not necessary (though it may be the easiest way to do this).  One could also have one task that continues to drift over time, the objective then being to track it as closely as possible as it changes.  The real need is for a task that involves CONTINUAL learning, rather than a single learning event.  For example, one could have a learning problem in 100 features, 5 of which are particularly important because the concept drifts in terms of its dependence on those features.  A constructive induction method could then identify those 5 features and make the continual learning most sensitive to them, thus coming to track the concept much better.
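
Here is a sketch of that drifting-concept test in Python.  I assume, for simplicity, that the constructive induction step has already succeeded, so the five drifting features are simply given larger step-sizes; discovering them online is of course the real problem.  The drift rate, step-sizes, and linear form of the concept are illustrative choices only:

    import numpy as np

    rng = np.random.default_rng(2)
    n_features, n_relevant, steps = 100, 5, 20000

    def track(alphas):
        """LMS tracking a concept that drifts only in its dependence on the
        first 5 features; returns the mean squared tracking error."""
        w_true = np.zeros(n_features)
        w = np.zeros(n_features)
        total = 0.0
        for _ in range(steps):
            w_true[:n_relevant] += 0.01 * rng.normal(size=n_relevant)  # drift
            x = rng.normal(size=n_features)
            delta = w_true @ x - w @ x
            total += delta * delta
            w += alphas * delta * x
        return total / steps

    uniform = np.full(n_features, 0.005)   # treats all 100 features alike
    focused = np.full(n_features, 1e-4)    # step-size concentrated on the
    focused[:n_relevant] = 0.1             # five drifting features

    print(f"mean squared tracking error, uniform: {track(uniform):.3f}")
    print(f"mean squared tracking error, focused: {track(focused):.3f}")

The focused learner should track the drift far more closely, because its step-size is spent where the concept actually changes.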



My own published work relevant to this includes my paper in AAAI-92 and my paper with Steve Whitehead in ML93.  The latter proposed a particular learning framework that might be a good one for looking at this sort of thing.