

Problem solving and design: diagnostic plan analysis

This chapter presents a number of projects that deal with diagnostic and tutoring issues in problem-solving contexts:

the FLOW tutor,
the SPADE project,
the MACSYMA ADVISOR, and
some systems developed in the context of the MENO project, including BRIDGE and MENO-TUTOR.

The FLOW tutor: structured memory organization

The first project examined here was undertaken by two psychologists, Donald Gentner and Donald Norman, at the Center for Human Information Processing (CHIP) at the University of California at San Diego. They turned to the task of building an intelligent tutor in order to test and explore their theories about human memory organization.

Teaching strategies expressed in terms of schemata

One of the main tenets of Norman and Gentner's cognitive theory (Norman et al., 1976) is that memory is organized as a network of schemata, or prototype concepts. According to them, a schema is a named frame with a list of slots that can be instantiated with values. Since values for slots can be pointers to other schemata, memory can be viewed as a semantic network of connected schemata. This is called an active structural network so as to convey the idea that it can contain both procedural and factual knowledge. In this view of memory organization, the aim of teaching can be defined as fostering the construction of these networks by supporting the acquisition of new schemata, the connection of new schemata to existing ones, and the revision of incorrect ones. Two types of pedagogical strategy can be derived from these structural characteristics, contrasting in the kinds of intermediate knowledge states they generate:

In linear teaching, the network grows by increments consisting of full nodes. This is similar to the notion of frontier teaching in the genetic graph.
In web teaching (Norman, 1973, 1976), a skeleton network covering the entire subject matter is introduced first (i.e., a form of overview). Then successive increments consist of increasingly complete levels of detail. Here, the frontier of learning determines the depth rather than the breadth of knowledge.
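
The contrast between the two strategies can be sketched as two orderings over the same material. The following is only an illustration: the topics, the three levels of detail, and the function names are hypothetical and do not come from Norman and Gentner's work.

```python
# Hypothetical subject matter: each topic (node) has three levels of detail.
TOPICS = {
    "variables": ["overview", "usage", "edge cases"],
    "loops": ["overview", "usage", "edge cases"],
}

def linear_teaching(topics):
    """Present each node in full before moving on to the next node."""
    return [(name, detail)
            for name, details in topics.items()
            for detail in details]

def web_teaching(topics):
    """Present a skeleton of every node first, then deepen level by level."""
    depth = max(len(details) for details in topics.values())
    return [(name, details[level])
            for level in range(depth)
            for name, details in topics.items()
            if level < len(details)]
```

In the linear ordering the frontier moves across nodes (breadth), whereas in the web ordering it moves down the levels of detail (depth).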

Tutors built on top of active structural networks

Gentner and Norman have started to build two tutors: one for factual knowledge and the other for procedural knowledge. For both tutors, the idea was to have a computer-based system and a human teacher cooperate at first, and thence to progressively implement the system so that it would take over the human teacher's functions.

For factual knowledge, they represent the history of the American Civil War as a network of event-schemata in which events are causally linked (Norman, 1976). To interact with the student, the system travels along the links, supporting interactions very similar to SCHOLAR's.

For procedural knowledge, the domain is FLOW, a very simple programming language that can be learned in a matter of a few hours (Gentner et al., 1974; Gentner and Norman, 1977; Gentner, 1977, 1979). Here the active structural network is used to represent programming knowledge and to interpret the student's programs in terms of this knowledge. The schemata are organized hierarchically: at the top are elementary sections of the instructional booklet that the student follows, and at the bottom are the keys that the student can press. The system's interpretation of a program takes the form of a tree, which connects the individual characters of the program text entered by the student to a schema of functional specifications.

The FLOW tutor's diagnostic mechanism takes advantage of the hierarchical organization of schemata to give an interpretation of the student's program. As the tutor observes every character entered by the student, high-level schemata make predictions about likely next keystrokes and, when predictions are not met, low-level schemata search for possible interpretations. These schemata trigger possible hierarchical parents, which are placed on an agenda. When these parents are in turn activated, they look for further confirmation in the lower schemata and for their own parent. Therefore the programming knowledge of the system is actually contained in the links between schemata.
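
The bottom-up half of this mechanism can be sketched with a small agenda loop. This is a simplified illustration, not the FLOW tutor's actual algorithm: the schema names and the `PARTS` table are hypothetical, and the sketch ignores keystroke order, top-down predictions, and buggy schemata.

```python
from collections import deque

# Hypothetical hierarchy: each parent schema lists the parts that confirm it.
PARTS = {
    "display-statement": ["key-D", "key-quote"],
    "program-line": ["line-number", "display-statement"],
}

def interpret(observed):
    """Return every schema confirmed by the observed low-level events."""
    confirmed = set(observed)
    agenda = deque(observed)  # schemata whose possible parents remain to be checked
    while agenda:
        child = agenda.popleft()
        for parent, parts in PARTS.items():
            if child in parts and parent not in confirmed:
                # The triggered parent looks for confirmation in its lower schemata.
                if all(part in confirmed for part in parts):
                    confirmed.add(parent)
                    agenda.append(parent)  # it will in turn trigger its own parent
    return confirmed
```

Note that all the "knowledge" in the sketch sits in the links recorded in `PARTS`, which mirrors the remark above.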

Since buggy schemata for common errors are included in the hierarchy to catch mistakes, FLOW's knowledge representation can be considered a precursor of later theories of bugs. Furthermore, inasmuch as the relations between schemata can be viewed as planning knowledge, this is an early attempt at using a hierarchical set of "planning" methods to diagnose problem-solving activities. The interplay of top-down expectations and bottom-up reconstruction is also a common feature of a number of diagnostic plan analyzers. However, the FLOW tutor can hardly be said to analyze plans since FLOW programs are all extremely simple.

For unreported reasons, the FLOW tutor project was abandoned before significant results could be achieved in terms of usable systems. This is unfortunate, because the idea of building tutors around a theory of memory organization is very interesting.

SPADE: toward a tutor based on a theory of design

The SPADE project grew out of LOGO as a way to concentrate on the problem solving performed in the process of program design, and to conceive a tutor that could provide guidance to students in their programming attempts.

Ideas for a theory of design

SPADE stands for Structured Planning and Debugging, a project that Goldstein undertook with his student Mark Miller (Goldstein and Miller, 1976b; Miller and Goldstein, 1976a, 1977a). The purpose was to build a programming tutor based on a general theory of planning and debugging that would capture the essential ingredients of the program design experience (Goldstein and Miller, 1976a). Although the ultimate goal of constructing a complete learning environment was never fully realized, the results achieved in the process are interesting in their own right.

A linguistic view of the design process

Goldstein and Miller borrow a formalism from linguistics to suggest context-free grammars for modeling the problem-solving process. They feel that the hierarchical organization of grammar rules makes for a more structured model, in which rewrite rules will stand for planning decisions, and the lexicon will consist of the programmer's observable actions. The vocabulary used to describe the different stages of the problem-solving process is based on a taxonomy of concepts involved in both planning and debugging, with the hope that both aspects can be captured in a unified theory.
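
To make the grammatical view concrete, here is a minimal sketch in which nonterminals are planning decisions and terminals are observable actions. The grammar itself is hypothetical (it is not Goldstein and Miller's taxonomy), and `derive` is an illustrative helper that replays a sequence of rule choices.

```python
# Hypothetical planning grammar: uppercase symbols are planning decisions,
# lowercase symbols are the programmer's observable actions (the lexicon).
GRAMMAR = {
    "SOLVE": [["PLAN", "DEBUG"], ["PLAN"]],
    "PLAN": [["DECOMPOSE"], ["write-code"]],
    "DECOMPOSE": [["PLAN", "PLAN"]],
    "DEBUG": [["run-program", "edit-code"]],
}

def derive(symbol, choices):
    """Expand `symbol` leftmost-first, consuming one rule index per decision."""
    if symbol not in GRAMMAR:  # terminal: an observable action
        return [symbol]
    rule = GRAMMAR[symbol][next(choices)]
    actions = []
    for part in rule:
        actions.extend(derive(part, choices))
    return actions
```

For instance, `derive("SOLVE", iter([0, 1, 0]))` chooses "plan then debug", plans by writing code directly, and yields the action sequence `write-code`, `run-program`, `edit-code`.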

Limitations of context-free grammars:

The top-down view of problem solving that these grammars embody is a substantial simplification in that it ignores the benefits of bottom-up exploration.
The problem-solving decisions are not really context-free: they depend on the semantics of the problem, like the expected form of the solution, and on pragmatic considerations, like the existence of reusable code.

Diagnostic concerns: parsing protocols with PATN

In a formalism called PATN (for Planning Augmented Transition Networks), Miller and Goldstein (1977a) use a set of global registers to contain problem-specific parameters (e.g., the problem description) and use global assertions associated with arcs in the parse tree to include pragmatic information (e.g., testing for the existence of a library subroutine).
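
The role of registers and arc assertions can be illustrated with a very small sketch. The arc labels, register names, and `choose_arc` function are assumptions made for this example and are not taken from the PATN formalism itself.

```python
def choose_arc(arcs, registers):
    """Return the label of the first arc whose assertion holds for the registers."""
    for label, assertion in arcs:
        if assertion(registers):
            return label
    return None

# Hypothetical arcs leaving a "plan" state, each guarded by a pragmatic test
# on the global registers.
ARCS = [
    ("reuse-subroutine", lambda r: r["goal"] in r["library"]),
    ("decompose", lambda r: True),
]
```

With registers `{"goal": "draw-square", "library": {"draw-square"}}` the guarded arc is taken; with an empty library the network falls back to decomposition.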

In addition to its value as a performance model for pedagogical guidance, PATN was also to be used for diagnostic purposes. PAZATN is an interactive Protocol AnalyZer (Miller and Goldstein, 1976c), which cooperates with an expert to parse protocols of actual problem-solving sessions. It relies on a PATN parser to come up with hypotheses for the planning strategies applied by the programmer. When presented with the protocol of a problem-solving session, the PATN parser produces a set of possible derivation trees whose branches are annotated with global considerations. PAZATN then selects the most likely hypotheses.

Limits:
From a diagnostic standpoint, PAZATN seems to ignore the difficult issue of recognizing and diagnosing errors in the plan, whether they are due to incorrect planning actions or to the incorrect application of correct planning actions.

To account for individual differences, Miller and Goldstein (1977b) propose to perturb an archetypical grammar representing expert knowledge into a tailored grammar representing the student's planning knowledge. This is equivalent to inferring her language of primitive actions and decisions, and addresses the fundamental issue of domain representation in an interesting way, since the learning of grammars is an active research issue. However, the methods and constraints needed to make the search for an individual grammar manageable are left unspecified, leaving the idea unsupported.

SPADE-0: a plan-based editor

SPADE-0 (Miller and Goldstein, 1976b; Miller, 1979) is a first step toward implementing a tutoring system based on the above theory. The purpose of SPADE-0's editor is to encourage good design strategies by making decision processes explicit. To this end, it interacts with the student mainly through questions and answers in terms of plans, debugging techniques, choices among next possible steps, and design alternatives. It uses the vocabulary developed for the formal model, with words like "decompose" or "reformulate". It also includes concepts for the episodes of a program, like the "setup" or the "interface". Only at the lowest level does it deal with actual code. The fact that the user must learn this specialized vocabulary can be seen both as a difficulty and as an advantage. On the one hand it requires extra work, but on the other it forces a dialogue at the problem-solving level within the structure of a formal theory, rather than at the level of the syntax of a programming language.

In a programming session with SPADE-0, the purely top-down, left-to-right design process of the theory is not imposed on the student, who can start in the middle of the "Sequential", and can postpone parts of the problem. In fact, experiments with the system have revealed that programmers do not follow a strict top-down method, but usually make design decisions "in that order which minimizes the probable scope of future modifications" (Miller, 1979, p. 127). The student's planning decisions are recorded in a decision tree that represents a developmental history of the program. The editor functions as a bookkeeper, updating this tree and allowing the student to access, expand, or revise any node. In this way, the process of refining the plan into a program is explicitly viewed as a succession of trials and repairs, very much in the theory-building tradition of LOGO. A partial rule-based model of task-independent problem-solving expertise gives the editor a limited capability to advise the student on which step to take next. However, Miller describes these tutoring capabilities as rather ad hoc, and SPADE-0 does not build individual student models.
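
The bookkeeping role of the editor can be sketched as a small tree structure. This is only an illustration under stated assumptions: the class name, the "postponed" marker, and the methods are hypothetical and are not SPADE-0's actual design.

```python
class DecisionNode:
    """One planning decision in the developmental history of a program."""

    def __init__(self, decision):
        self.decision = decision
        self.children = []
        self.history = [decision]  # every value this decision has taken

    def expand(self, decision):
        """Refine this node with a new sub-decision and return it."""
        child = DecisionNode(decision)
        self.children.append(child)
        return child

    def revise(self, decision):
        """Replace the current decision, keeping the earlier trial on record."""
        self.decision = decision
        self.history.append(decision)

    def pending(self):
        """Return the leaves that the student has postponed."""
        if not self.children:
            return [self] if self.decision == "postponed" else []
        return [node for child in self.children for node in child.pending()]
```

Because any node can be revisited, the student is free to postpone a part of the problem and return to it later, and the recorded history reflects the succession of trials and repairs.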

Since SPADE-0 was conceived only as an experimental "limited didactic system," many extensions would have been necessary for the project to become a full-size tutor that took advantage of all the machinery of the theory. However, it seems that its successors were never realized. Nevertheless, a remarkable feature of SPADE-0 is that the theory is not only a background that supports the tutoring but is brought to the foreground in the interaction with the student. This high-level dialogue is well in line with LOGO's Piagetian view of the student as epistemologist.

The MACSYMA ADVISOR: plans and beliefs

The ADVISOR, built by Michael Genesereth (1977, 1978) for his doctoral dissertation, deals with the use of plans in a problem-solving context. The goal here is to build an on-line consultant, geared towards providing a reactive environment with intelligent feedback à la SOPHIE. The ADVISOR's domain is the complex mathematical package MACSYMA. Users can define sequences of mathematical operations for MACSYMA to perform, but they often need help. The user questions most commonly encountered in protocols of human consultants can be divided into five types of requests. The user is asking for:

Directions: "How do I construct a matrix?"
Factual information: "What are the arguments of the function COEFF?"
Verification of facts: "Is it the case that RATSIMP can be used for expanding with respect to a variable?"
Explanation of method: "How does MACSYMA invert a matrix?"
Explanation of behavior: "Why did this operation return 0?"

In the ADVISOR, each type of question is handled by a specialized module. The sources of information available are a semantic-net representation of declarative knowledge with some inferential capabilities, and a problem-solving model (called MUSER, for MACSYMA User). Even though most of these modules were implemented, the ADVISOR is only an experimental system that lacks an interface and never reached the point of being made available to MACSYMA users.

Since the ADVISOR is merely helping the user in her problem-solving attempt rather than teaching her, understanding the user's approach well is critical for aligning comments or corrections. Therefore, much effort was directed toward designing the MUSER model and using this model to interpret the user's actions. The fifth type of question (explanation of behavior) is particularly delicate and important. Indeed, novices are often puzzled by results returned by MACSYMA because they are unfamiliar with the semantics of MACSYMA's operations. When a user asks for help in such cases, the ADVISOR needs to give a form of feedback that reflects a context broader than merely that of the current operation. In contrast with the usual local error messages and in-line help systems, it is to act as an intelligent consultant: it must be able to explicate unexpected responses in terms of the user's actual intentions and to provide appropriate advice.

The dependency graph: a plan annotated with assumptions

Genesereth (1982) starts by formalizing the notion of a plan as a dependency graph. Here mental operations are annotated with assumptions about their applicability, their input, and their expected effects. These annotations stand for beliefs about MACSYMA's operations held by the user. The MUSER model's mental operations are a set of annotated planning methods, which the user is assumed to be employing in interacting with MACSYMA. Thus, a dependency graph can be considered a proof that a sequence of actions achieves a goal, with respect to a model of MACSYMA's operations and a model of problem solving.

Like Miller and Goldstein, Genesereth views plan recognition as an instance of a parsing problem: the trace of the user's actions is like a sentence that must be parsed in terms of the fixed set of planning methods. However, Genesereth ignores the semantic and pragmatic considerations expressed in PATN. Instead, the ADVISOR concentrates on the relation between the user's goal and her beliefs about MACSYMA. To this end, the ADVISOR's parser considers two key factors in addition to its set of planning methods: the constraints provided by the dataflow between operations and the subgoals generated by previous operations. The dataflow between the operations must be propagated up through the planning methods as the dependency graph is constructed, whereas subgoal expectations must be propagated down. These constraints, along with some heuristics, guide the associations of subgoals and methods that structure the graph in an alternation of bottom-up recognition and top-down expectation.

Inferring misconceptions and interactive diagnosis

This propagation of dataflow constraints between planning methods is crucial to inferring misconceptions. Here is how it works: as the plan is being formed bottom-up, these constraints climb the derivation tree; when they reach an assumption or a retrieval operation, they provide information about the user's beliefs and the contents of her database. The ADVISOR can then search for modifications to these assumptions that will be consistent with these propagated constraints and will complete the parsing process. This is a powerful scheme, which does not require a priori knowledge of expected misconceptions. In practice, the ADVISOR's database includes a few prestored misconceptions that are incorporated into planning methods, but Genesereth (1982) claims that this model-driven recognition of common misconceptions is done only for efficiency. In general, the mechanism of dataflow propagation is sufficient. Of course, this assumes that MUSER's set of planning methods covers the user's and that all errors are due to misconceptions about MACSYMA and not to buggy planning methods.
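
The last step of this scheme, comparing the assumptions annotated in a recognized plan with a model of the system, can be sketched as follows. This is a deliberately reduced illustration: it skips the parsing and the dataflow propagation entirely, and the operation names (`op1`, `op2`), the fact table, and `find_misconceptions` are all hypothetical rather than drawn from the ADVISOR or from MACSYMA.

```python
# Hypothetical model of what the system's operations really do.
SYSTEM_FACTS = {
    ("op1", "returns"): "list",
    ("op2", "accepts"): "matrix",
}

# Hypothetical recognized plan: each step is annotated with the beliefs
# the user must hold for the step to make sense.
PLAN = [
    {"action": "op1", "assumes": {("op1", "returns"): "matrix"}},
    {"action": "op2", "assumes": {("op2", "accepts"): "matrix"}},
]

def find_misconceptions(plan, facts):
    """Collect the plan's assumptions that disagree with the system model."""
    found = []
    for step in plan:
        for key, believed in step["assumes"].items():
            actual = facts.get(key)
            if actual != believed:
                found.append((step["action"], key[1], believed, actual))
    return found
```

No catalog of expected misconceptions is consulted here: a candidate misconception is simply any assumption the plan needs that the system model does not support.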

Inferring the user's beliefs about the domain allows the ADVISOR not only to provide pointed remediation, but also to involve the user in choosing between competing hypotheses, once possible plans have been identified. In fact, interaction with the user plays an important role in the diagnosis, and supplements purely inferential processes with direct interactive feedback even before all possible candidate plans have been considered. In contrast with the SPADE-0 editor, the ADVISOR does not try to enforce reasoning about plans by interacting with the user in terms of a planning vocabulary. On the contrary, it completely avoids talking directly about plans in esoteric terms, since it merely uses them as a diagnostic tool. Instead, after selecting a likely candidate plan heuristically, it directly asks the user about her beliefs concerning operations in the domain of MACSYMA, as suggested by assumptions explicitly mentioned in the plan.

This interactive approach to confirming diagnosis is attractive for two reasons. First, it involves the user early in the process. Second, it keeps the dialogue at the level of beliefs, where misconceptions are thought to occur, hiding the actual diagnostic process from the user. Once the ADVISOR has detected and confirmed misconceptions, it can correct them and offer the user the alternative most closely in line with her own plan.

Unfortunately, the ADVISOR has only been tested with knowledge of three different problems, and must therefore be viewed as a feasibility study. At this level it was successful, since it was able to reconstruct plans in the simple cases with which it was presented; but doubts persist about its generality, since even within a single domain like MACSYMA, the set of user beliefs is not easily bounded.

MENO: debugging and tutoring

The MENO project started in the late seventies at the University of Massachusetts at Amherst, as an ambitious attempt to build an intelligent tutor for novice Pascal programmers. The project's goals were to diagnose nonsyntactic errors in simple programs, to connect these bugs to underlying misconceptions, and to tutor the student with respect to these misconceptions.

MENO-II: error-oriented program analysis

Following the first system, MENO-II is a diagnostic system that specializes in the analysis of loops and related variables, and concentrates on the goals of detecting bugs and of relating them to underlying misconceptions. First, the "Bug Finder" parses a student's program into a parse tree that is matched against a simple description of the solution. This is done with the help of specialized knowledge about types of loops and corresponding plans, as well as a library of known bug types. If a bug is discovered, it is then analyzed by a set of specific inference routines that suggest possible underlying misconceptions. This does not require very sophisticated reasoning, since MENO-II's known misconceptions are organized in a network and are directly associated with the bugs listed in the library. If multiple misconceptions are plausible for a given bug, they are all reported. In addition, the tutoring component provides some correcting statements, which are simply stored in connection with the bugs known to MENO-II.
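
The direct association between library bugs and misconceptions amounts to a lookup, as the following sketch suggests. The bug names, misconceptions, and correcting statements are invented for illustration and are not entries from MENO-II's actual library.

```python
# Hypothetical bug library: each known bug is linked to the misconceptions
# that could plausibly underlie it, plus a stored correcting statement.
BUG_LIBRARY = {
    "counter-not-initialized": {
        "misconceptions": ["variables start at zero automatically"],
        "correction": "Assign the counter a starting value before the loop.",
    },
    "read-outside-loop": {
        "misconceptions": [
            "a single read fetches all the inputs",
            "the loop re-executes the statement before it",
        ],
        "correction": "Place the read inside the loop body.",
    },
}

def report(bugs):
    """List every plausible misconception and stored correction for known bugs."""
    return {
        bug: (BUG_LIBRARY[bug]["misconceptions"], BUG_LIBRARY[bug]["correction"])
        for bug in bugs
        if bug in BUG_LIBRARY
    }
```

A bug that is absent from the library is silently dropped by this sketch, which hints at why a scheme of this kind struggles with programs outside its catalog.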

When tested in the context of an actual introductory course, MENO-II's relatively simple matching scheme, which ignores the process of program development and issues of control and dataflow, failed to correctly diagnose a large portion of bugs in student programs.

The knowledge of novice programmers

Their confidence "only a bit shaken" (Soloway and Johnson, 1984, p. 57), Soloway and his colleagues set out to conduct the empirical studies and build the theory of programming knowledge that were lacking in MENO-II (Soloway and Ehrlich, 1984; Spohrer et al., 1985). In these investigations of programming expertise centering on the concept of iterative loops, the researchers consider the knowledge that novices bring to bear on the task of designing programs. This includes not only their burgeoning knowledge about programming, but also the knowledge they apply naturally when thinking about iterative plans and communicating instructions for iterative tasks in their native language. Since such studies provide a deeper understanding of the difficulties novices encounter, they have implications beyond the design of tutoring systems. For instance, they can guide the design of more "natural" programming languages that take advantage of existing tendencies (Soloway et al., 1983b). They can also suggest the construction of better curricula in computer programming (Soloway et al., 1982; Soloway, 1986).

Toward a generative theory of bugs for looping plans

To understand the difficulties that novices encounter, Bonar and Soloway observe people describing tasks "step by step" using natural language, as for instance when giving directions or instructions. They compare subjects who describe an iterative task in natural language with others who attempt to program an isomorphic task in a formal computer-programming language such as Pascal. The units of analysis are problem-solving schemata (or plans), which play a central role in both the informal realm of linguistic descriptions and the formal realm of computer programming. The theme of the research is that there exist close but sometimes deceptive resemblances between natural linguistic procedures and the formal plans required in using a programming language.

Links can be found at two levels. At the functional level, both types of description provide instructions for tasks involving similar actions, such as iterations or conditional choices. At the surface level, both tend to use similar vocabulary and syntax even though the functional semantics of shared terms may have very limited overlap. As grounds for transfer, these relations both facilitate understanding and memorization and create confusion.

As a first step toward a generative theory of bugs for programming constructs, Bonar and Soloway study the influence of these functional and syntactic relations on programming errors committed by novices. A novice program may be too buggy for any syntactically oriented program-analysis system to make sense of, and yet reflect a misconception on the student's part that is both identifiable and explainable if one considers transfers from pre-existing knowledge.

From these analyses, Bonar and Soloway (1985) present a tentative catalog of bug generators for novice programming. In the style of REPAIR theory, these bug generators combine a "patch" with an impasse, though the absence of a complete model makes the definitions of both phenomena less precise.

These abstract bug types are admittedly not complete generative explanations, and much more work is required before a full generative theory is available.

BRIDGE: from natural language to programming

A student learning to program must develop new "mental models" of iterative plans, models that comply with the formalism of computer programming. Bonar (1985b) argues that there are increasingly detailed stages in the way one can define a plan, moving from a mere restatement of the problem in natural language to a detailed, runnable program. BRIDGE (Bonar and Weil, 1985) is a tutoring system that takes advantage of these observations to help students make the necessary transitions. The theme is to find natural evolving stages in the development of plans, and to articulate each stage explicitly as the student designs a program. The following are the four stages traversed by BRIDGE:

nonprocedural restatement: "compute the average of"

description in terms of aggregates: "sum all the integers"

sample step description: "add the next integer to"

programming language specification: "repeat"

For example, a student at Level 2 describes inputs in terms of data aggregates, and the notion of a loop is not yet required. At the next stage, the student is asked to define the operations for a sample step, and finally these operations are incorporated into a structured loop. The four levels containing the plans required at each stage of the "averaging" problem are shown at the top of the screen. Plans that the student has dealt with are indicated in reverse video (white on black).

The interface with the student avoids problems of natural-language processing by the use of a set of informal programming languages. Menus of phrases are presented on the screen for the student to select from in composing sentences. Informal programs at each level are then formed by different combinations of these sentences. The use of these key phrases allows the tutor to recognize not only what portion of the target program the student is working on, but also at which stage in the theory of planning knowledge she is forming a plan. In this way, tutoring can be adapted to help the student complete the current level and move to the next one.
This setup has two diagnostic values:

it allows the student to express clearly intermediate stages in problem solving; in this way, the tutor does not have to perform very complex analyses.

the student is taken step by step from her understanding of the problem in natural linguistic terms to the development of a program.

Thus, daily linguistic knowledge about task descriptions can be brought to bear for instructional purposes in an effort to integrate programming expertise with existing knowledge.
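
Because the student composes sentences from fixed menu phrases, recognizing her current stage reduces to a table lookup. The sketch below uses the four sample phrases listed above; treating the most advanced selected phrase as the current stage is an assumption of this example, not a documented BRIDGE rule.

```python
# Menu phrases mapped to the BRIDGE stage they belong to (phrases taken from
# the stage list above; the mapping structure itself is illustrative).
STAGE_OF_PHRASE = {
    "compute the average of": 1,   # nonprocedural restatement
    "sum all the integers": 2,     # description in terms of aggregates
    "add the next integer to": 3,  # sample step description
    "repeat": 4,                   # programming language specification
}

def current_stage(selected_phrases):
    """Return the most advanced stage among the phrases selected so far (0 if none)."""
    return max((STAGE_OF_PHRASE[p] for p in selected_phrases), default=0)
```

No natural-language parsing is involved, which is the point of the menu-based interface.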

After the fourth level has been completed, the student moves to a new phase during which she constructs a visual solution by piecing individual plans together. This plan-level phase makes external use of plans that are similar to those used internally by PROUST, described in the next section.

BRIDGE needs more exposure to students to determine whether they find the framework overly confining, since they are forced through a planning process that they may or may not find natural. The most interesting feature of this experimental system is that it provides another example of an environment explicitly articulating an underlying study of performance in the domain. As in SPADE's plan-based editor, the theory is brought to the foreground to become an instrument of communication between the tutor and the student. In the case of BRIDGE, the vocabulary not only deals with performance, but follows a study oriented toward the genesis of programming knowledge.

PROUST: intention-based diagnosis

Soloway and Johnson (1984) work on a new system called PROUST in which they reconsider the problem of program analysis in the light of the careful investigations of programming spawned by MENO-II's difficulties. While BRIDGE brings this new understanding to the foreground in the tutorial session, as SPADE does, PROUST uses it in the background for diagnostic purposes, more as the ADVISOR does. Soloway and Johnson are less interested in explaining the origins of misconceptions in programming knowledge with a generative theory of bugs than in reconstructing a plausible program-design process so as to provide a problem-specific context for the recognition and discussion of bugs.

The importance of intentions

Diagnostic methods that look only at the code fail to recognize that nonsyntactic bugs are not an intrinsic property of the faulty program, but reside in the relation between the programmer's intentions and their realization in the code. The gist of intention-based program analysis is a comparison of intended functions and structures to actual ones. Underlying PROUST's approach to diagnosis is a view of the design process that distinguishes between three levels. First, the problem specifications give rise to an agenda of goals and subgoals. These in turn lead to the selection of plans, which are finally implemented as code. By considering all three levels, not only does this approach make the analysis less susceptible to misinterpreting unusual code, but it also produces more meaningful bug reports for a student struggling with a programming assignment.

PROUST searches for the most plausible interpretation of the program with respect to the specifications. To this end, it needs to infer a plausible design process that replays the programmer's intentions. Hence, the theme of PROUST's method is analysis by synthesis. The method combines reconstruction of intentions with detection of bugs. Both must occur together, because bugs can lead to misinterpretations of intentions, and intentions are necessary to distinguish bugs from unusual but correct code.

An interpretation can be seen as a sophisticated kind of parse tree, since it is defined as a mapping of the code onto the specifications via a hierarchical goal structure for the program. The space of possible interpretations in which PROUST conducts its search is organized into three layers according to the design stages mentioned above. At the top are the various possible decompositions of the specifications into goals and subgoals, then the plans that could be selected as implementation methods for each one, and finally the different ways in which plans can match the code. In view of the wide variability of novice programs and of the possibility of bugs at all three levels, the interpretation space is quite large even for relatively simple programs.

PROUST's knowledge base and interpretation process

PROUST is specifically geared toward interpretation, combining expert knowledge about programming with knowledge about likely novice errors. The main components of PROUST's knowledge base are as follows:

Goals and object classes: PROUST has information about the goals and objects mentioned in its problem specifications and the ways in which they can be implemented or reformulated. In addition, PROUST knows about implicit goals and objects that have to be inferred, and thus can sometimes be omitted in the problem statement. Finally, it possesses heuristic rules that can detect likely goal interactions and generate new goal expectations in connection with certain errors.
Plans: PROUST has a list of plans indexed by the goals they achieve.
Code: Such detailed plans often match the code only partially. To deal with these plan differences, PROUST applies two types of rule that are somewhat reminiscent of DEBUGGY's coercions:
transformation rules of the type used in code optimizers, which preserve equivalence between two versions of a piece of code. These rules adapt the plan so as to repair the match.
bug rules that explain mismatches by hypothesizing a bug of a known type. These bugs are to be reported to the student.

Since a number of these plan-difference rules can chain to resolve a given mismatch, their use is controlled by a metric for the quality of matches.
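
The two kinds of plan-difference rule can be contrasted with a toy matcher. This is a sketch under strong simplifying assumptions: code and plans are compared as plain strings, rules do not chain, there is no quality metric, and the rule tables are hypothetical rather than taken from PROUST.

```python
# Hypothetical plan-difference rules, keyed by (code fragment, plan fragment).
TRANSFORMATION_RULES = {
    ("count := 1 + count", "count := count + 1"): "commuted addition",
}
BUG_RULES = {
    ("count := count", "count := count + 1"): "missing increment",
}

def explain_mismatch(code, plan):
    """Classify how a code fragment relates to the plan fragment it should match."""
    if code == plan:
        return ("match", None)
    if (code, plan) in TRANSFORMATION_RULES:
        # Equivalent code: the plan is adapted and nothing is reported.
        return ("equivalent", TRANSFORMATION_RULES[(code, plan)])
    if (code, plan) in BUG_RULES:
        # A known bug type explains the difference and is reported.
        return ("bug", BUG_RULES[(code, plan)])
    return ("unexplained", None)
```

Only the "bug" outcome would be reported to the student; an "equivalent" outcome merely repairs the match.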

With this knowledge, PROUST tries to construct an interpretation for the program to be analyzed. Starting with a goal agenda derived from the problem specifications, PROUST selects successive goals for analysis, though not necessarily in the order in which they were attacked by the student. To optimize its depth-first generation of a plausible goal decomposition, it gives priority to global control structures that maximize the coherence of its growing interpretation.

Hypothesized plans are then evaluated according to how well they fit in the context of the overall interpretation. Attempts are also made to resolve partial mismatches with bug or transformation rules. A few internal critics watch for inconsistent hypotheses. Finally, in a form of differential diagnosis, competing hypotheses are compared to one another, at the bottom as to how much code they can explain, and at the top as to how severe they assume the student's misconceptions to be. The best one explains the most data with the most conservative assumptions.
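The comparison of competing hypotheses can be sketched as a ranking over two criteria. The `Hypothesis` class, the numeric severity scale, and the choice to rank coverage before severity are assumptions made for illustration; the text does not say how PROUST actually weighs the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical record of one candidate interpretation."""
    name: str
    explained_lines: frozenset  # code lines this interpretation accounts for
    severity: int               # assumed total severity of the misconceptions it posits


def best_hypothesis(hypotheses, total_lines):
    """Prefer higher code coverage, then less severe assumed misconceptions."""
    return max(
        hypotheses,
        key=lambda h: (len(h.explained_lines) / total_lines, -h.severity),
    )


candidates = [
    Hypothesis("missing-sentinel-guard", frozenset({1, 2, 3, 4}), severity=1),
    Hypothesis("confused-loop-semantics", frozenset({1, 2, 3, 4}), severity=3),
    Hypothesis("unrelated-plan", frozenset({1, 2}), severity=0),
]

winner = best_hypothesis(candidates, total_lines=4)
```

Here the first two candidates explain all four lines, so the tie is broken in favor of the one that assumes the milder misconception.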

PROUST used in real settings

After it has converged on one interpretation, PROUST evaluates its own reliability by measuring how fully it accounts for elements of the code and the specifications, and by applying its internal critics to detect flaws remaining in its final interpretation. If it is not fully confident, it discards uncertain portions of its analysis and warns the student about the incompleteness of its interpretation. Then it sorts bugs to be reported, trying to group them so that it can point to common underlying misconceptions.
Johnson (1986) reports, however, that in actual settings some students found PROUST's explanations somewhat difficult to use, suggesting the need for an active tutoring module.

PROUST was tested on the first syntactically correct versions of the rainfall program produced by 206 students. 89% of these programs contained bugs. PROUST was fully confident about its interpretation in 81% of cases, and found only 4% of the programs impossible to comprehend. PROUST runs into problems, however, with more complex assignments, which leave more design decisions to students, who then have to elaborate the problem requirements creatively. On PROUST's part, less detailed specifications require a greater ability to infer goal decompositions primarily from the code. Since such bottom-up analyses are extremely difficult, Johnson (1987b) is now considering ways of allowing students to discuss their intentions explicitly with the diagnostic system.

In comparison with other program analyzers described in this chapter, PROUST's main contribution is its handling of multiple goals. The FLOW tutor, SPADE, and the ADVISOR assume that all the operations to be accounted for can be reduced to a single high-level goal via a simple hierarchy. This assumption is possible because they only deal with very simple programs. In real programming tasks, even those presented in introductory courses, the specifications generate multiple, often interacting, goals. By addressing this issue, PROUST's explicit reconstruction of the programmer's intentions constitutes an important advance whose ramifications are not limited to computer programming. Of course, whether PROUST's current approach will be able to cope with large classes of problems remains to be seen. We have just mentioned some difficulties that result from its reliance on specifications to guide its reasoning about goals. An additional problem with PROUST's primarily top-down approach is in the way it recognizes programming structures. Because of its limited ability to perceive functional intent by inspection, it is still overly dependent on syntactic clues.

Nevertheless, PROUST couples serious engineering concerns with the investigations of program design that followed MENO-II, reaching a point where the theory may well be able to handle real tasks. PROUST's reasoning in terms of intentions not only gives leverage to diagnosis, but provides the kind of information needed for effective remedial action. With this depth of analysis, a tutoring module based on PROUST, now in development at Yale (Littman et al., 1986), should be able to point to the cause rather than the manifestation of bugs and to address remedial issues in the context of the student's apparent intentions.

MENO-TUTOR: strategies for tutorial discourses

Complementing the diagnostic abilities of PROUST, MENO-TUTOR (Woolf and McDonald, 1984a, 1984b; Woolf, 1984) begins to address the issue of remediation. It attempts to capture the discourse strategies observed in human tutors who strive to be sensitive to their listeners (Woolf and McDonald, 1983). While this work is similar to and inspired by the study of Socratic dialogues in WHY, the current focus of the research is not so much on defining individual rules as on designing a coherent framework for representing and organizing elements of a discourse strategy. MENO-TUTOR attempts to formalize the type of discourse procedures developed earlier for GUIDON. This domain-independent discourse strategy is to be coupled with a domain-specific language generator that implements the strategic decisions by means of utterances constructed from a database of domain knowledge.

The discourse management network: articulate tutorial strategies

The strategy is cast in what the authors call a discourse management network, which is a kind of augmented transition network. The nodes or states correspond to tutorial actions that constitute the basic components of a theory of tutorial dialogues. These states are hierarchically organized into three strategic layers that make the pedagogical decision process more transparent.

The performance of MENO-TUTOR as an exploration tool

So far, MENO-TUTOR has been tested in two domains: rainfall processes---the domain of WHY---in which most of the basic work was done; and Pascal programs, the original domain of MENO. In both domains, the language generators were shallow and the domain database small, because the natural-language interface had not yet been given much attention. The next step is to build these additional components so that the scheme can be tested in more complex contexts.

In one dialogue, the tutor first adopts an interactive approach: it explores the competency of the student with two general questions to make sure that student and tutor share the same vocabulary. Once it is sure that the student understands the basics of looping constructs, MENO-TUTOR verifies her exact misconception with the next two questions. Finally, it attempts to repair her misconception with the tactic called "grain of truth correction," reinforcing what is correct about the student's thinking before proposing the required refinement.

From an engineering standpoint, it is worth noting that MENO-TUTOR achieves an interesting modularity, which makes for a clean framework for implementation. First, it separates discourse strategies from domain knowledge and language generation---although the exact nature of the interface between these parts has not yet been well defined. Then, within the strategy, it deals with local decisions and global changes of context using different mechanisms, which are connected by their references to the same discourse management network.

The main purpose of MENO-TUTOR is to serve as a generic tool for exploring various tutorial strategies. The hierarchical network provides a set of tutorial primitives with default sequences, so that a variety of pedagogical approaches can be generated by the addition of different metarules. Although the dialogues generated in the tests were all very short, they were able to cover small topics in many different but coherent ways when different sets of metarules were tried. This ability to maintain coherence under changes of metarules is largely attributable to the articulation of tutorial discourse management provided by MENO-TUTOR's primitives. It is this ability that makes the scheme a good candidate as a testbed for exploring the space of discourse strategies. However, the articulation is only behavioral: there is still no mechanism to explicitly represent the communication principles on which decisions are based and interpret them in specific dialogue situations. These principles are still merely embodied by the set of arcs and metarules.
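The interplay of default sequences and metarules can be made concrete with a toy network. This is only a sketch under strong assumptions: the state names, the flat (single-layer) structure, the `run` function, and the form of a metarule as a (condition, target state) pair are all invented for illustration and are not taken from MENO-TUTOR itself.

```python
# Default arcs: each hypothetical tutorial state has one default successor.
DEFAULT_NEXT = {
    "introduce_topic": "ask_question",
    "ask_question": "evaluate_answer",
    "evaluate_answer": "teach_fact",
    "teach_fact": "complete_topic",
    "grain_of_truth_correction": "complete_topic",
}


def run(context, metarules, start="introduce_topic", max_steps=10):
    """Follow default arcs unless a metarule pre-empts them; return the state trace."""
    state = start
    trace = [state]
    while state != "complete_topic" and len(trace) < max_steps:
        for condition, target in metarules:
            if condition(state, context):
                state = target  # a metarule changes the discourse context
                break
        else:
            state = DEFAULT_NEXT[state]  # no metarule fired: take the default arc
        trace.append(state)
    return trace


# Assumed metarule: after a wrong answer, switch to a corrective tactic.
def wrong_answer_seen(state, context):
    return state == "evaluate_answer" and context["wrong_answers"] > 0


context = {"wrong_answers": 1}
default_trace = run(context, metarules=[])
corrective_trace = run(context, metarules=[(wrong_answer_seen, "grain_of_truth_correction")])
```

Swapping the metarule set changes the path through the same network while every step remains a legal transition, which is one way to read the claim that coherence survives changes of metarules.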

Summary and conclusion

This chapter has covered a number of systems that attempt to construct interpretations of students' solutions in problem-solving contexts. The types of solutions analyzed by these systems result from a planning process whereby goals are implemented as sequences of actions.

	structure	search process	specialty	method of detecting errors
FLOW tutor	utilizes schemata organized in an active structural network	works on-line and uses an interplay of expectations and confirmations	performs its analyses on-line	triggers buggy schemata
SPADE	views a plan as a parse tree	its linguistic view of plans as parse trees suggests a variety of parsing techniques.	incorporates global aspects of the design process, which bring to bear constraints external to purely hierarchical planning.
ADVISOR	constructs a dependency graph	builds its dependency graph in a bidirectional fashion, propagating constraints derived from dataflow information.	seeks to infer the user's beliefs about the domain and to engage in a dialogue in terms of these beliefs.	can infer a restricted class of mistaken beliefs directly from the data.
PROUST	attaches plans to a goal hierarchy	two phases: first, it constructs an initial set of hypotheses in a mostly top-down search; second, it performs a differential analysis that takes into account the interplay of intentions from above and the coverage of data from below, along with the severity of hypothesized errors.	is able to cope with interactions between multiple goals, and has shown promise in real instructional settings.	uses buggy plans as well as bug rules that resolve plan differences during the matching process. MENO-II and PROUST both hypothesize misconceptions about the domain heuristically, using prestored catalogs.

BRIDGE is based on a study of misconceptions about the programming domain which takes into consideration the linguistic knowledge novices bring to learning computer programming. This work has not yet given rise to a diagnostic system. Instead, BRIDGE uses the concepts developed in the study to create a supportive learning environment for novices. In this chapter, we have again seen how the design of AI-based instructional systems motivates a variety of domain analyses and empirical investigations. The results of these studies are useful both in the background, for internal computation---as in the ADVISOR or PROUST---and in the foreground, for pedagogical purposes---as in SPADE-0 and BRIDGE.

The MENO project has also evolved into a study of tutorial dialogues reminiscent of WHY's Socratic rules. In MENO-TUTOR, domain-independent discourse strategies are organized in a discourse management network: that is, a hierarchical augmented transition network divided into layers of abstraction. States represent tutorial primitives, and local strategies are set off by transitions between these states. In addition to these local decisions, the scheme has facilities for specifying metarules that can change the context when the global discourse situation warrants it. This has resulted in short dialogues that remain coherent despite changes in global strategy.
