This chapter presents a number of projects that deal with diagnostic and tutoring issues in problem-solving contexts:
The FLOW tutor: structured memory organization
Teaching strategies expressed in terms of schemataOne of the main tenet of Norman and Gentner's cognitive theory (Norman et al., 1976) is that memory is organized as a network of Schemata or prototype concepts. According to them a schema is a named frame with a list of slots that can be instantiated with values. Since values for slots can be pointers to other schemata, memory can be viewed as a semantic network of connected schemata. This is called an active structural network so as to convey the idea that it can contain both procedural and factual knowledge. In this view of memory organization, the aim of teaching can be defined as fostering the construction of these networks by supporting the acquisition of new schemata, the connection of new schemata to existing ones, and the revision of incorrect ones. Two types of pedagogical strategy can be derived from these structural characteristics, contrasting in the kinds of intermediate knowledge states they generate:
Tutors built on top of active structural networks
For factual knowledge, they represent the history of the American Civil War as a network of event-schemata in which events are causally linked (Norman, 19769. To interact with the student, the system travels along the links, supporting interactions very similar to SCHOLAR's.
For procedural knowledge, the domain is FLOW, a very simple programming language that can be learned in a matter of a few hours ( Gentner et al., 1974; Gentner and Norman, 1977; Gentner, 1977, 1979). Here the active structural network is used to represent programming knowledge and to interpret the student's programs in terms of this knowledge. The schemata are organized hierarchically: at the top are elementary sections of the instructional booklet that the student follows, at the bottom are the keys that he can press. The system's interpretation of a program takes the form of a tree, which connects the individual characters of the program text entered by the student to a schema of functional specifications.
The FLOW tutor's diagnostic mechanism takes advantage of the hierarchical organization of schemata to give an interpretation of the student's program. As the tutor observes every character entered by the student, high-level schemata make predictions about likely next keystrokes and, when predictions are not met, low-level schemata search for possible interpretations. These schemata trigger possible hierarchical parents, which are placed on an agenda. When these parents are in turn activated, they look for further confirmation in the lower schemata and for their own parent. Therefore the programming knowledge of the system is actually contained in the links between schemata.
Since buggy schemata for common errors are included in the hierarchy to catch mistakes, FLOW's knowledge representation can be considered a precursor of later theories of bugs. Furthermore, inasmuch as the relations between schemata can be viewed as planning knowledge, this is an early attempt at using a hierarchical set of "planning" methods to diagnose problem-solving activities. The interplay of top-down expectations and bottom-up reconstruction is also a common feature of a number of diagnostic plan analyzers. However, the FLOW tutor can hardly be said to analyze plans since FLOW programs are all extremely simple.
For unreported reasons, the FLOW tutor project was abandoned before significant results could be achieved in terms of usable systems. This is unfortunate, because the idea of building tutors around a theory of memory organization is very interesting.
SPADE: toward a tutor based on a theory of designThe SPADE project grew out of LOGO as a way to concentrate on the problem solving performed in the process of program design, and to conceive a tutor that could provide guidance to students in their programming attempts.
Ideas for a theory of designThe theme of SPADE project is Structured Planning and Debugging that Goldstein undertook with his student Mark Miller (Goldstein and Miller, 1976b; Miller and Goldstein, 1976a, 1977a). The purpose is to build a programming tutor based on a general theory of planning and debugging that would capture the essential ingredients of the program design experience (Goldstein and Miller, 1976a). Although the ultimate goal of constructing a complete learning environment was never fully realized, the results achieved in the process are interesting in their own right.
A linguistic view of the design processGoldstein and Miller borrow a formalism from linguistics to suggest context-free grammars for modeling the problem-solving process. They feel that the hierarchical organization of grammar rules makes for a more structured model, in which rewrite rules will stand for planning decisions, and the lexicon will consist of the programmer's observable actions. The vocabulary used to describe the different stages of the problem-solving process is based on a taxonomy of concepts involved in both planning and debugging, with the hope that both aspects can be captured in a unified theory.
Limitations of context-free grammars:
Diagnostic concerns: parsing protocols with PATN
In addition to its value as a performance model for pedagogical guidance, PATN was also to be used for diagnostic purposes. PAZATN is an interactive Protocol AnalyZer (Miller and Goldstein, 1976c), which cooperates with an expert to parse protocols of actual problem-solving sessions. It relies on a PATN parser to come up with hypotheses for the planning strategies applied by the programmer. When presented with the protocol of a problem-solving session, the PATN parser produces a set of possible derivation trees whose branches are annotated with global considerations. PAZATN then selects the most likely hypotheses.
From a diagnostic standpoint, PAZATN seems to ignore the difficult issue of recognizing and diagnosing errors in the plan whether they are due to incorrect planning actions or to the incorrect application of correct planning actions or to the incorrect application of correct planning actions.
To account for individual differences, Miller and Goldstein (1977b) propose to perturb an archetypical grammar representing expert knowledge into a tailored grammar representing the student's planning knowledge. This is equivalent to inferring her language of primitive actions and decisions, and addresses the fundamental issue of domain representation in an interesting way since the learning of grammars is an active research issue. However, methods and constraints to make search for an individual grammar manageable are unfortunately left unspecified, leaving the idea unsupported.
SPADE-0: a plan-based editorSPADE-0 (Miller and Goldstein, 1976b; Miller, 1979) is a first step toward implementing a tutoring system based on the above theory. The purpose of SPADE-0's editor is to encourage good design strategies by making decision processes explicit. To this end, it interacts with the student mainly through questions and answers in terms of plans, debugging techniques, choices among next possible steps, and design alternatives. It uses the vocabulary developed for the formal model, with words like "decompose" or "reformulate". It also includes concepts for the episodes of a program, like the "setup" or the "interface". Only at the lowest level does it deal with actual code. The fact that the user must learn this specialized vocabulary can be seen both as a difficulty and as an advantage. On the one hand it requires extra work, but on the other it forces a dialogue at the problem-solving level within the structure of a formal theory, rather than at the level of the syntax of a programming language.
In a programming session with SPADE-0, the purely top-down, left-to-right design process of the theory is not imposed on the student, who can start in the middle of the "Sequential", and can postpone parts of the problem. In fact, experiments with the system have revealed that programmers do not follow a strict top-down method, but usually make design decisions "in that order which minimizes the probable scope of future modifications" (Miller, 1979, p. 127). The student's planning decisions are recorded in a decision tree that represents a developmental history of the program. The editor functions as a bookkeeper, updating this tree and allowing the student to access, expand, or revise any node. In this way, the process of refining the plan into a program is explicitly viewed as a succession of trials and repairs, very much in the theory-building tradition of LOGO. A partial rule-based model of task-independent problem-solving expertise gives the editor a limited capability to advise the student on which step to take next. However, Miller describes these tutoring capabilities as rather ad hoc, and SPADE-0 does not build individual student models.
Since SPADE-0 was conceived only as an experimental "limited didactic system," many extensions would have been necessary for the project to become a full-size tutor that took advantage of al the machinery of the theory. However, it seems that its successors were never realized. Indeed, a remarkable feature of SPADE-0 is that the theory is not only a background that supports the tutoring but is brought to the foreground in the interaction with the student. This high-level dialogue is well in line with LOGO's Piagetian view of the student as epistemologist.
The MACSYMA ADVISOR: plans and beliefsThe ADVISOR built by Michael Genesereth (1977, 1978) for his doctoral dissertation, deals with the use of plans in a problem-solving context. The goal here is to build an on-line consultant, geared towards providing a reactive environment with intelligent feedback à la SOPHIE. The ASVISOR's domain is the complex mathematical package MACSYMA. Users can define sequences of mathematical operations for MACSYMA to perform, but they often need help. The user questions most commonly encountered in protocols of human consultants can be divided into five types of requests. The user is asking for:
Since the ADVISOR is merely helping the user in her problem-solving attempt rather than teaching her, understanding the user's approach well is critical for aligning comments or corrections. Therefore, much effort was directied toward designing the MUSER model and using this model to interpret the user's actions. The fifth type of question ---- explanation of behavior ---- is particularly delicate and important. Indeed, novices are often puzzled by results returned by MACSYMA because they are unfamiliar with the semantics of MACSYMA's operations. When a user asks for help in such cases, the ADVISOR needs to give a form of feedback that reflects a context broader than merely that of the current operation. In contrast with the usual local error messages and in-line help systems, it is to act as an intelligent consultant: it must be able to explicate unexpected responses in terms of the user's actual intentions and to provide appropriate advice.
Like Miller and Goldstein, Genesereth views plan recognition as an instance of a parsing problem: the trace of the user's actions is lie a sentence that must be parsed in terms of the fixed set of planning methods. However, Genesereth ignores the semantic and pragmatic considerations expressed in PATN. Instead, the ADVISOR concentrates on the relation between the user's goal and her beliefs about MACSYMA. To this end, the ADVISOR's parser considers two key factors in addition to its set of planning methods: the constraints provided by the dataflow between operations and the subgoals generated by previous operations. The dataflow between the operations must be propagated up through the planning methods as the dependency graph is constructed, whereas subgoal expectations must be propagated down. These constraints, along with some heuristics, guide the associations of subgoals and methods that structure the graph in an alternation of bottom-up recognition and top-down expectation.
Inferring the user's beliefs about the domain allows the ADVISOR not only to provide pointed remediation, but also to involve the user in choosing between competing hypotheses, once possible plans have been identified. In fact, interaction with the user plays an important role in the diagnosis, and supplements purely inferential processes with direct interactive feedback even before all possible candidate plans have been considered. In contrast with the SPADE-0 editor, the ADVISOR does not try to enforce reasoning about plans by interacting with the user in terms of a planning vocabulary. On the contrary, it completely avoids talking directly about plans in esoteric terms, since it merely uses them as a diagnostic tool. Instead, after selecting a likely candidate plan heuristically, it directly asks the user about her beliefs concerning operations in the domain of MACSYMA, as suggested by assumptions explicitly mentioned in the the plan.
This interactive approach to confirming diagnosis is attractive for two reasons. first, it involves the user early in the process. Second, it keeps the dialogue at the level of beliefs, where misconceptions are thought to occur, hiding the actual diagnostic process from the user. Once the ADVISOR has detected and confirmed misconceptions, it can correct them and offer the user the alternative most closely in line with her own plan.
Unfortunately, the ADVISOR has only been tested with knowledge of three different problems, and must therefore be viewed as a feasibility study. At this level it was successful, since it was able to reconstruct plans in the simple cases with which it was presented; but doubts persist about its generality, since even within a single domain like MACSYMA, the set of user beliefs is not easily bounded.
When tested in the context of an actual introductory course, MENO-II's relatively simple matching scheme, which ignores the process of program development and issues of control and dataflow, failed to correctly diagnose a large portion of bugs in student programs.
As a first step toward a generative theory of bugs for programming constructs, Bonar and Soloway study the influence of these functional and syntactic relations on programming errors committed by novices. While this program is probably too buggy for any syntactically oriented program-analysis system to make sense of, it reflects a misconception on the student's part that is both identifiable and explainable if one considers transfers from pre-existing knowledge.
From these analyses, Bonar and Soloway (1985) present a tentative catalog of bug generators for movice programming. In the style of REPAIR theory, these bug generators combine a "patch" with an impasse, though the absence of a complete model makes the definitions of both phenomena less precise.
These abbstract bug types are admittedly not complete generative explanations, and much nore work is required before a full generative theory is available.
The interface with the student avoids problems of natural-language processing
by the use of a set of informal programming languages. menus of
phrases are presented on the screen for the student to select from in composing
sentences. Informal programs at each level are then formed by different
combinations of these sentences. The use of these key phrases allows the
tutor to recognize mnot only what portion of the target program the student
is working on, but also at which stage in the theory of planning knowledge
she is forming a plan. In this way, tutoring can be adapted to help the
student complete the current level and move to the next one.
This setup has two diagnostic values:
After the fourth level has been completed, the student moves to a new phase curing which she constructs a visual solution by piecing individual plans togehter. This plan-level phase makes external use of plans that are similar to those used internally by PROUST, described in the next section.
BRIDGE needs more exposure to students to determine whether they find the framework overly confining, since they are forced through a planning process that they may or may not find natural. The most interesting feature of this experimental system is that it provides another example of an environment explicitly articulating an underlyin study of performance in the domain. As in SPADE's plan-based editor, the theory is brought to the foreground to become an instrument of communication between the tutor and the student. In the case of BRIDGE, the vocabulary not only deals with performance, but follows a study oriented toward the genesis of programming knowledge.
PROUST searches for the most plausible interpretaion of the program with respect to the specifications. To this end, it needs to infer a plausible design process that replays the programmer'sintentions. Hence, the theme of PROUST's method is analysis ba synthesis. The method combines reconstruction of intentions with detection of bugs. both must occur togehter, because bugs can lead to misinterpretations of intentions, and intentions are necessary to distinguish bugs from unusual but correct code.
An interpretation can be seen as a sophisticated kind of parse tree since it is defined as a mapping of the code onto the specifications via a hierarchical goalstructure for the program. the space of possible interpretations in which PROUST conducts its search is organized into three layers according to the design stages mentioned above. At the top are the various possible decompositions of the specifications into goals and subgoals, then the plans that could be selected as implementation mehtods for each one, and finally the different ways in which plans can match the code. In view of the wide variability of novice programs and of the possibility of bugs at all three levels, the interpretaion space is quite large even for relatively simple programs.
With this knowledge, PROUST thries to construct an interpretationfor the program to be analyzed. Starting with a goal agenda derived from the problem specifications, PROUST selects successive goals for analysis, though not necessarily in the order in which they were attached ba the student. To optimize its depth-first gemeration of a plausible goal decomposition, it gives priority to global contol structures that macimize the coherence of its growing interpretation.
Hypothesized plans are then evaluated according to how well they fit in the context of the overall interpretation. Attempts are also made to resolve partial mismatches with bug or transformation rules. A few intermal critics watch for inconsistent hypotheses. Finally, in a form of differential diagnosis, competing hypotheses are compared to one another, at the ottom as to how much code they can explain, and at the top as to how severe they assume the student's misconceptions to be.The best one explains the most data with the most conservative assumptions.
PROUST was tested on the first syntactically correct versions of the rainfall program produced by 206 students. 89% of these programs contained bugs. PROUST was fully confident about its interpretation in 81% of cases, and found only 4% of the programs inpossible to comprehend. PROUST runs into problems in that more compex assignments leave more design decisions to students, who then have to elaborate the problem requirements creatively. On PROUST's part, less detailed specifications require a grater ability to infer goal decompositions primarily from the code. Since such bottom-up analysesare extremely difficult, Johnson (1987b) is now considering ways of allowing students to discuss their intentions explicitly with the diagnostic system.
In comparison with other program analyzers described i this chapter, PROUST's main contribution is its handling of multiple goals. The FLOW tutor, SPADE, and the ADVISOR assume that all the operations to be accounted for can be reduced to a sungle high-level goal via a simple hierarchy. This assumption is possible because they only deal with very simple programs. In real programming tasks, even those presented i introductory courses, the specifications generate multiple, often interacting, goals. By addressing this issue, PROUST's explicit reconstruction of the programmer's intentions constitutes an important advance whose ramifications are not limited to computer programming. Of course, whether PROUST's current approach will be able to cope with large classes of problems remains to be seen. We have just mentioned some difficulties that result from its reliance on specifications to guide its reasoning about goals. An additional problem with PROUST's primarily top-sown approach is in the way it recognizes progrmming structures. Because of its limited ability to perceive functional intent by inspection, it is still overly dependent on syntactic clues.
Necertheless, PROUST couples serious engineering concerns with the investigations of program design that followed MENO-II, reaching a point where the theory may well be able to handle real tasks. PROUST's reasoning in terms of interntions not only gives leverage to diagnosis, but provides the kind of information needed for effective remedial action. With this depthe of analysis, a tutoring module based on PROUST, now in development at Yale (Littman et al., 1986), should be able to point to the cause rather than the manifestation of bugs and to address remedial issues in the context of the student's apparent intentions.
From an engineering standpoint, it is worth noting that MENO-TUTOR achieves an interesting modularity, which makes for a clean framework for implementation. First, it separates discourse strategies from domain knowledge and language generation ---although the exact nature of the interface between these parts has not yet been well defined. Then, within the strategy, it deals with local decisions and global changes of context using different mechanisms, which are connected by thir references to the sane discourse management network.
The main purpose of MENO-TUTOR is to serve as a generic tool for exploring barious tutorial strategies. The hierarchical network provides a set of tutorial prinitives with default sequences, so that a variety of pedagogical approaches can be generated by the addition of different metarules. Although the dialogues generated in the tess were all very short, they were able to cover small topics in many different but coherent ways when differnt sets of metarules were tried. This ability to maintain coherence under changes of metarules is largely attributable to the articulation of tutorial dicourse management provided by MENO-TUTOR's prinitives. It is this avility that makes the scheme a good candidate as testbed for exploring the space of dicourse strategies. However, the articulation is only behavioral: there is still no mechanism to explicitly represent the communicatio principles on which decisions are vased and interpret them in specific dialogue situations. These principles are still merely embodied by the set of arcs and metarules.
|structure||search process||specialty||mehtod of detecting errors|
|FLOW tutor||utilizes schemata organized in an active structural network||works on-line and uses an interplay of expectations and comfirmations||performs its analyses on-line||triggers buggy schemata|
|SPADE||views a plan as a parse tree||its linguistic view of plans as parse trees suggests a variety of parsing techniques.||incorporates global aspects of the design process, which bring to bear constraints external to purely hierarchical planning.|
|ADVISOR||constructs a dependency graph||builds its dependency graph in a bidirectional fashion, propagating constraints derived from dataflow information.||seeks to infer the user's beliefs about the domain and to engage in a dialogue in terms of these beliefs.||can infer a restricted class of mistaken beliefs directly from the data.|
|PROUST||attaches plans to a goal hierarchy||two phases:
first, it constructs an initial set of hypotheses in a mostly top-down search;
second, it performs a differntial analysis that takes intl account the interplay of intentions from above and the coverage of data from below, along with the severity of hypothesized errors.
|is able to cope with interactions between multiple goals, and had shown promise in real instructional settings.||uses buggy plans as well as bug rules theat resolve plan differnces during the matching process. MENO-II and PROUST bothe hypothesize misconceptions about the domain heuristically, using prstored catalogs.|
The MENO project has also evolved into a study of tutorial dialogues reminiscent of WHY's Socratic rules. In MENO-TUTOR, domain-independent discourse strategies are organized in a discourse management network: that is, a hierarchical augmented transition network divided into layers of abstractions. States represent tutorial primitives, and local strategies are setoff by transitions between these states. In addition to these local decisions, the scheme has facilities for specifying metarules that can change the context when the global discourse situation warants it. This has resulted in short dialogues that remain coherent despite changes in global strategy.