The results in the previous slides were obtained by training on a dataset that someone classified for us, i.e., a poor student (or two, or three) sat for weeks and labeled each webpage as a faculty page, a student page, and so on.  We then used that model, unchanged, in our reasoning process.  But do we really expect that our agent will get this nicely categorized training data for every task it has to consider?  Do we really want our agent to stop learning once it hits the real world?

Fortunately, our ability to perform general-purpose reasoning, as in these examples, can also be used to bootstrap our learning.  How?  Let’s consider our university domain.  There, no one tells us how bright the students are, or how hard the classes are.  All we have are grades.  Assume that our agent starts out with a pretty bad model for our university domain, whether elicited by hand or randomly selected.  We can use that model in our standard reasoning algorithm and figure out how likely each student is to be intelligent, and how likely each class is to be difficult.  These are not good conclusions, but they give us one possible “filling in” for the hidden data.  We can then use that filled-in data, as if it were real, to train a better model, using the standard learning techniques that I mentioned earlier.  We can then reiterate this process, as sketched below.
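To make the loop concrete, here is a minimal sketch of this fill-in-then-retrain cycle (an EM-style procedure).  The toy grade model, the parameters, and names like smart_belief are illustrative assumptions, not from the slides; only the grades are treated as observed, while each student's intelligence and each class's difficulty stay hidden.

```python
# Illustrative sketch (assumed toy model): alternate between filling in the
# hidden intelligence/difficulty values and retraining the model on them.
import random

random.seed(0)

students = [f"s{i}" for i in range(8)]
classes = [f"c{j}" for j in range(4)]

# Hidden ground truth, used only to generate synthetic grades.
true_smart = {s: random.random() < 0.5 for s in students}
true_hard = {c: random.random() < 0.5 for c in classes}
TRUE_P_A = {(True, False): 0.9, (True, True): 0.6,
            (False, False): 0.5, (False, True): 0.2}  # P(grade=A | smart, hard)
grades = {(s, c): "A" if random.random() < TRUE_P_A[(true_smart[s], true_hard[c])] else "B"
          for s in students for c in classes}

# Start from a deliberately poor model (only weakly ordered, to break symmetry
# between the "intelligent" and "unintelligent" labels).
p_smart, p_hard = 0.5, 0.5
p_A = {(True, False): 0.6, (True, True): 0.55,
       (False, False): 0.5, (False, True): 0.45}

smart_belief = {s: 0.5 for s in students}  # current P(student is intelligent)
hard_belief = {c: 0.5 for c in classes}    # current P(class is difficult)

def p_grade(g, q_smart, q_hard):
    """Probability of one grade, averaging over the current beliefs."""
    pA = sum(ws * wd * p_A[(si, di)]
             for si, ws in ((True, q_smart), (False, 1 - q_smart))
             for di, wd in ((True, q_hard), (False, 1 - q_hard)))
    return pA if g == "A" else 1 - pA

for _ in range(20):
    # "Fill in" step: how likely is each student to be intelligent, and each
    # class to be difficult, under the current (possibly bad) model?
    for s in students:
        like = {si: (p_smart if si else 1 - p_smart) for si in (True, False)}
        for si in like:
            for c in classes:
                like[si] *= p_grade(grades[(s, c)], float(si), hard_belief[c])
        smart_belief[s] = like[True] / (like[True] + like[False])
    for c in classes:
        like = {di: (p_hard if di else 1 - p_hard) for di in (True, False)}
        for di in like:
            for s in students:
                like[di] *= p_grade(grades[(s, c)], smart_belief[s], float(di))
        hard_belief[c] = like[True] / (like[True] + like[False])

    # "Retrain" step: treat the filled-in values as if they were real data
    # and re-estimate the model parameters by (expected) counting.
    p_smart = sum(smart_belief.values()) / len(students)
    p_hard = sum(hard_belief.values()) / len(classes)
    for si in (True, False):
        for di in (True, False):
            num = den = 1e-9
            for (s, c), g in grades.items():
                w = ((smart_belief[s] if si else 1 - smart_belief[s]) *
                     (hard_belief[c] if di else 1 - hard_belief[c]))
                den += w
                num += w * (g == "A")
            p_A[(si, di)] = num / den

for s in students:
    print(s, f"P(intelligent) = {smart_belief[s]:.2f}", "truth:", true_smart[s])
```

The slightly ordered starting guesses in p_A are what break the symmetry between the two hidden labels; beyond that, nothing about the true model is given to the learner.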

Somewhat surprisingly, this process not only converges, it converges to something that’s actually pretty good.  In the end, the learned model is pretty close to the true one, and, even more impressively, the most likely values of the students’ intelligence and the classes’ difficulty are exactly right!