evaluating software: Reiser/Dick

cyberSlang: the ultimate instant online encyclopedia

Evaluating Instructional Software
The Model of Reiser and Dick
Robert A. Reiser and Walter Dick (1990)

Explained by Chris

Contents

Introduction
Description of the model
Initial evaluation of the model
Implications of the initial evaluation of the model

Abstract

Reiser and Dick (1990) present their new model for evaluating instructional software, including a study in which the new model was field tested. Their model focuses on the extent to which students learn the skills a software package is intended to teach.
Using this approach, educators will be better able to reliably identify software that is instructionally effective.

- Abstract - Literature - Introduction - Model - Evaluation - Implications

Introduction

Software evaluation organizations help teachers and school administrators select software appropriate to their students' needs. Most of these organizations use similar criteria to evaluate instructional software, and their evaluators have to make subjective judgements about:
- the accuracy of the content and its match to typical school objectives,
- the instructional and technical quality of the software.

Without testing students after they use the software, judgements regarding the instructional effectiveness of software are primarily speculative.

Subject-matter experts (teachers) are not able to reliably identify software that is instructionally effective

Jolicoeur and Berger (1988b) asked three teachers to subjectively evaluate four spelling programs and four programs on fractions. These programs were also tried out with students: pretest, immediate posttest, and delayed posttest covering the skills taught in the programs.
Results:
- The teachers' ratings were not valid indicators of the instructional effectiveness of the software.
- The most effective spelling program received the lowest teacher rating.
- The least effective spelling program was highly rated by the teachers.

Rothkopf (1963) asked 12 educators to read through seven versions of a programmed text and rate the quality of each one. Each version of the program was also tried out with students, who were given a recall test.
Results:
- The rank correlation between the educators' ratings and the actual instructional effectiveness of the various versions of the program was -.75.
- Those versions that the educators rated as likely to be highly effective were actually the least effective, and vice versa.
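
To make the -.75 figure concrete, here is a minimal sketch of a rank correlation computed with Spearman's formula for untied ranks. The summary does not say which rank statistic Rothkopf used, so the choice of Spearman is an assumption. The two rank lists are invented for illustration (they are not Rothkopf's data) and were picked only so that the result lands on the reported value; the function name is hypothetical as well.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings of the same items (no ties)."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same items")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two ranked items")
    # Sum of squared differences between the two ranks of each item.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for seven versions of a text (1 = best). Not study data.
educator_rank = [1, 2, 3, 4, 5, 6, 7]
effectiveness_rank = [5, 6, 7, 3, 2, 4, 1]

rho = spearman_rho(educator_rank, effectiveness_rank)
print(rho)  # prints -0.75
```

A value of -1 would mean the educators' ranking was the exact reverse of the measured effectiveness, so -.75 is a strong disagreement rather than mere noise.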

Subjective ratings are more negative than ratings based on field tests

Owston and Wideman (1987) reported that evaluators' opinions regarding a software package are likely to be quite different depending on whether the evaluators were involved in field testing the software. They asked panels of teachers who had been trained to use a software rating system to evaluate 36 commercially available software packages. Afterward, these software packages were field tested with students in classroom settings. In each case, a classroom teacher observed the students using the software, asked the students about their reactions to the software, and prepared a brief report rating the software on the basis of the information gathered from the field test.
Results:
- In 10 of the 36 cases, the earlier ratings were more negative than the ratings based on the field tests.
- In another seven instances, there were areas of agreement and disagreement.
- Valuable information can be gathered by field testing software.

Subjective ratings differ largely from person to person

Jolicoeur and Berger (1986) compared the overall ratings of 82 pieces of software evaluated by two different rating services.
Results:
- Very low correlation between the two sets of ratings.
- Their ratings of the instructional and technical characteristics of 29 pieces of software contrasted even more.
- Subjective evaluations of software are not reliable.

Description of the model

The authors' primary criterion for judging the effectiveness of software is the extent to which students learn the skills the software is intended to teach.

Step 1: Identify software of interest
The software program may be chosen because it deals with a particular subject matter or because it was recommended.
It is likely the individual who identifies the piece of software has rather limited knowledge of its characteristics.

Step 2: Identify general characteristics of software
Software characteristics include such factors as:
- content,
- general goals,
- instructional techniques,
- intended grade levels,
- required hardware.
The evaluator can get the clearest picture of these features by working through the software at this time.

Step 3: Still interested in software?
If no: go back to step 1.
If yes: continue to step 4.

Step 4: Identify or develop instructional objectives
In some cases, these objectives will be explicitly described in the software itself or in the documentation accompanying it.
Often, however, the evaluator will have to derive the instructional objectives by carefully examining the instructional activities.

Step 5: Identify or develop test items and attitude questions
Test items are designed to assess student attainment of the instructional objectives. It is important to determine whether those test items are appropriate in light of the objectives that have been identified.
Some software packages will already include test items, but more often than not, they will have to be developed by the evaluator.
In addition, a series of questions should be developed to assess student attitudes toward the software and the contents.

Step 6: Conduct one-on-one evaluation
Based on the results of a pretest, three students from the target group should be chosen to work through the software. They should be representative of the various ability levels of the students for whom the software is intended: one high-ability, one average-ability, and one low-ability student.

They are asked to work through the software individually, with the evaluator present. They are closely observed as they go through the instruction. The evaluator may ask questions in order to better understand the performance and reactions of the students.

After completing the instruction, each student takes the test that is based on the objectives for the software.

And finally, each student responds to the attitude questionnaire.
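
The choice of the three students at the start of this step can be sketched as follows. The model only asks for one high-ability, one average-ability, and one low-ability student based on the pretest; splitting the ranked class into thirds and taking the middle student of each band is an assumption made here for illustration, as are the function name and the sample scores.

```python
def pick_one_on_one_students(pretest_scores):
    """Pick one low-, one average-, and one high-scoring student.

    pretest_scores maps a student name to that student's pretest score.
    """
    # Rank students from lowest to highest pretest score.
    ranked = sorted(pretest_scores, key=pretest_scores.get)
    if len(ranked) < 3:
        raise ValueError("need at least three students")
    third = len(ranked) // 3
    bands = {
        "low": ranked[:third],
        "average": ranked[third:len(ranked) - third],
        "high": ranked[len(ranked) - third:],
    }
    # Take the middle student of each band (assumed rule, not from the model).
    return {level: band[len(band) // 2] for level, band in bands.items()}


# Hypothetical pretest scores out of 15.
scores = {"Ana": 1, "Ben": 3, "Cal": 4, "Dee": 5, "Eli": 6, "Fay": 8, "Gus": 11}
print(pick_one_on_one_students(scores))
```

In practice the evaluator's own knowledge of the students would also inform the choice; the sketch only shows how pretest scores can narrow it down.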

Step 7: Is further evaluation necessary?
If no: jump to step 12.
If yes: continue to step 8.

Step 8: Need to change test items?
If no: jump to step 10.
If yes: continue to step 9.

Step 9: Make changes to test items
Reexamine the test items that were employed to determine whether:
- any items are unclear,
- any items should be revised,
- any items should be eliminated,
- additional items are necessary in order to adequately assess attainment of a particular objective.
(Changes to the test items may be extensive enough to cause the evaluator to repeat the one-on-one stage. In this case: go back to step 6.)

Step 10: Conduct small group evaluation
Select 8 to 20 students whose abilities, as measured by a pretest, represent the range that would be found in the target population for the instruction.

Arrangements are made for these students to study the software in the same setting and under the same conditions typically encountered in their school.

They are tested in order to assess how much they learned.

They respond to the questionnaire to determine their attitudes toward the software and the content it focuses on.

Step 11 (two weeks later): Administer retention test
The retention test is given to the students who participated in the small-group evaluation.
The test should be the same as, or an alternate form of, the original posttest.

Step 12: Write evaluation report
Review the information that has been collected during the process.
Prepare a brief evaluation report:
- a summary of the collected information, documenting the findings,
- the evaluator's recommendation regarding use of the software.
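
The branching between the twelve steps can be sketched as a small lookup table. This is only an illustration of the flow described above, not code from Reiser and Dick. Two things are assumptions: step 9 is encoded as a yes/no question ("were the changes extensive enough to repeat the one-on-one stage?"), and after ordinary changes in step 9 the evaluator is taken to continue with step 10. All names are hypothetical.

```python
# Decision steps and where each answer leads, following the step descriptions.
DECISIONS = {
    3: {"yes": 4, "no": 1},    # Still interested in software?
    7: {"yes": 8, "no": 12},   # Is further evaluation necessary?
    8: {"yes": 9, "no": 10},   # Need to change test items?
    9: {"yes": 6, "no": 10},   # Assumed encoding: changes extensive enough to redo step 6?
}
LAST_STEP = 12


def next_step(step, answer=None):
    """Return the step that follows `step`, or None after the final report."""
    if step == LAST_STEP:
        return None
    if step in DECISIONS:
        return DECISIONS[step][answer]
    return step + 1


def trace(answers):
    """Walk the model from step 1, consuming one answer per decision step."""
    answers = iter(answers)
    path, step = [], 1
    while step is not None:
        path.append(step)
        answer = next(answers) if step in DECISIONS else None
        step = next_step(step, answer)
    return path


# Still interested, further evaluation wanted, no changes to the test items.
print(trace(["yes", "yes", "no"]))  # prints [1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12]
```

The shortest complete path answers "no" at step 7 and goes straight from the one-on-one evaluation to the report.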

Initial evaluation of the model

The authors field tested the model to determine its effectiveness. The numbers below correspond to the steps of the model:

1
They asked a fifth-grade teacher to identify an instructional problem for which instructional software might be the solution.
In order to meet the students' need for more spelling instruction, a spelling program (IBM) was chosen. The program consisted of a series of drill-and-practice lessons with 15 different words per lesson.

2
The classroom teacher examined the goals and objectives of the software, the content it covered, and the instructional strategy it employed.

3
She decided the software might meet the students' needs.

4
Instructional objectives: the purpose of the software was to teach students how to spell each of the 15 words that were presented within each lesson.

5
Test items: to assess the students on this objective, the teacher orally stated each word, placing it in the context of a sentence, and then asked the students to spell the word.
Attitude questions: a five-item questionnaire asked about the students' reactions to the software.

6
One-on-one evaluation:

The teacher selected one lesson in the software package (each lesson in the program employed the same instructional strategy).

A general pretest revealed how many of the 15 words in the lesson the students could already spell.

Three students were chosen for the one-on-one evaluation:
- one with 8 correct answers out of 15 (high ability),
- one with 5 correct (average ability),
- and one with 1 correct (low ability).

In the computer laboratory, each student was told the purpose for studying the lesson, and that the teacher and one of the researchers would watch the student go through the lesson and would ask questions occasionally.

The student followed the instructions delivered by the computer.

At the end of the lesson, the teacher gave each student the posttest: writing each word as it was pronounced.

Each student completed the five-item questionnaire that asked about the reactions to the software.

Results

The students learned to spell five to eight more words in the lesson:
- The high-ability student (8 in pretest) spelled 14 words correctly, gain: 6.
- The average student (5 in pretest) spelled 10 words correctly, gain: 5.
- The low-ability student (1 in pretest) spelled 9 words correctly, gain: 8.

The students liked the software, had no problem using it, and would recommend it to a friend.

7
The software proved to be effective, but teachers and researchers decided to collect more data.

8 and 9
No changes to test items.

10
Small-group evaluation:
Ten students were selected:
- one scored very high on the pretest (11 out of 15),
- three in the high range (6 or 5),
- three average (4 or 3),
- three below average (1).

The students worked in the computer lab in groups of five. They went through the instruction as part of their spelling lessons.

Posttest: After completing the lesson, they were given the oral posttest.

They were asked to fill out the same attitude questionnaire.

11
Retention test: About two weeks later, all the students in the class, including those who received no instruction on the computer, took the same test over the 15 spelling words.

12
Evaluation report:

The averages for the 10 students:
- pretest 4.2 correct,
- posttest 11.1 correct,
- retention test 6.8 correct.

Gain score (= retention test minus pretest): average 2.6 words.

Percentage of possible gain (= amount gained versus possible maximum):
2.6 (average gain) * 100 / 10.8 (possible average gain = 15-4.2) = 24%.
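
The two measures can be reproduced with a short sketch. The averages are the ones reported above; the function names are hypothetical. The last calculation applies the same formula to the immediate posttest, which is a figure this summary does not report and is shown only for comparison.

```python
def gain_score(pretest, later_test):
    """Words gained between the pretest and a later test."""
    return later_test - pretest


def percent_of_possible_gain(pretest, later_test, max_score):
    """Gain as a percentage of the room for improvement left after the pretest."""
    possible = max_score - pretest
    if possible <= 0:
        raise ValueError("no room for gain: pretest is already at the maximum")
    return 100 * gain_score(pretest, later_test) / possible


MAX_SCORE = 15                               # words per lesson
PRETEST, POSTTEST, RETENTION = 4.2, 11.1, 6.8  # reported averages

retention_gain = gain_score(PRETEST, RETENTION)
retention_pct = percent_of_possible_gain(PRETEST, RETENTION, MAX_SCORE)
print(f"{retention_gain:.1f} words, {retention_pct:.0f}% of possible gain")
# prints: 2.6 words, 24% of possible gain

# Same formula on the immediate posttest (derived here, not reported above).
immediate_pct = percent_of_possible_gain(PRETEST, POSTTEST, MAX_SCORE)
print(f"{immediate_pct:.0f}% of possible gain right after the lesson")
# prints: 64% of possible gain right after the lesson
```

Dividing by the possible gain rather than by the maximum score keeps students who already knew many words from making the software look weaker than it is.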

Students' attitudes (score of 1 = strongly agree):
- Have learned something from lesson (1.1 average response).
- Program operated smoothly and effectively (1.1 average response).
- Would recommend the instruction to a friend (1.1 average response).
- Enjoyed the experience (1.3 average response).

Thirteen of the students in the spelling class did not study the words in the software lesson, but took both the pretest and the retention test. Without any instruction, these students averaged a gain of about 0.5 words between the two tests, compared with the 2.6-word gain of the students who received instruction.

Final decision about this spelling program:

The researchers concluded that the gains were rather marginal.

The teachers had a different perception:
- They were pleased with the large initial gain on the posttest,
- and they were impressed with the extremely positive response of the students toward this instruction.
- They liked the information about the performance of particular students (who did better than expected).
Therefore, the teachers indicated that they would recommend this software program to their colleagues.

Implications of the initial evaluation of the model

The authors interviewed the classroom teacher and the resource teacher to get their reactions to the software evaluation process:
- They appreciated the idea of collecting student data as part of the software evaluation process.
- If they had the time to do so, they would use the model to evaluate other pieces of software.
- Unless they were given some release time to evaluate software, they would be unlikely to use the model.
- They suggested that resource teachers could conduct such evaluations, or that software evaluation services should incorporate the model.
- The teachers suggested simplifying the model: keep only the one-on-one stage (no small-group session), with all three students working individually but at the same time in a single session.

New focus
The authors focus their further research on additional questions:

Does the use of this model lead to different conclusions about the quality of software than the use of a standard checklist and subjective judgement?

Is the additional information (gathered as a result of collecting student data) worth the extra time involved in collecting it?

Do different groups of individuals who are presented with the same evaluation data arrive at the same conclusion?

Finally, the authors believe that trying out software with learners is worthwhile, as long as the information obtained is useful. Subject-matter experts are often unable to reliably identify software that is instructionally effective. Conclusions drawn from field tests often differ from those drawn from more subjective evaluations.
The authors intend to contribute an instrument to reliably identify software that is instructionally effective.
Their goal is to increase the probability that educators will select effective instructional software for use in their schools.

Chris Mueller (prolingua@access.ch)

++41 (0)52 301 3301 phone
++41 (0)52 301 3304 fax

97 05 04