Developing And Evaluating A Machine‐Scorable, Constrained Constructed‐Response Item
dc.contributor.author | Braun, Henry I. | en_US |
dc.contributor.author | Bennett, Randy Elliot | en_US |
dc.contributor.author | Frye, Douglas | en_US |
dc.contributor.author | Soloway, Elliot | en_US |
dc.date.accessioned | 2014-10-07T16:09:08Z | |
dc.date.available | 2014-10-07T16:09:08Z | |
dc.date.issued | 1989-12 | en_US |
dc.identifier.citation | Braun, Henry I.; Bennett, Randy Elliot; Frye, Douglas; Soloway, Elliot (1989). "Developing And Evaluating A Machine‐Scorable, Constrained Constructed‐Response Item." ETS Research Report Series 1989(2): i-44. <http://hdl.handle.net/2027.42/108589> | en_US |
dc.identifier.issn | 2330-8516 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/108589 | |
dc.description.abstract | The use of constructed response items in large scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non‐multiple choice item type. The item type presents a faulty solution to a computer programming problem and asks the student to correct the solution. This item type was administered to a sample of high school seniors enrolled in an Advanced Placement course in Computer Science who also took the Advanced Placement Computer Science (APCS) Test. Results indicated that the expert systems were able to produce scores for between 82% and 97% of the solutions encountered and to display high agreement with a human reader on which solutions were and were not correct. Diagnoses of the specific errors produced by students were less accurate. Correlations with scores on the objective and free‐response sections of the APCS examination were moderate. Implications for additional research and for testing practice are offered. | en_US |
dc.publisher | Lawrence Erlbaum Associates | en_US |
dc.publisher | Wiley Periodicals, Inc. | en_US |
dc.title | Developing And Evaluating A Machine‐Scorable, Constrained Constructed‐Response Item | en_US |
dc.type | Article | en_US |
dc.rights.robots | IndexNoFollow | en_US |
dc.subject.hlbsecondlevel | Education | en_US |
dc.subject.hlbtoplevel | Social Sciences | en_US |
dc.description.peerreviewed | Peer Reviewed | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/108589/1/ets200144.pdf | |
dc.identifier.doi | 10.1002/j.2330-8516.1989.tb00144.x | en_US |
dc.identifier.source | ETS Research Report Series | en_US |
dc.identifier.citedreference | Frederiksen, N., & Ward, W. C. ( 1978 ). Measures for the study of creativity in scientific problem solving. Applied Psychological Measurement, 2, 1 – 24. | en_US |
dc.identifier.citedreference | Johnson, W. L., & Soloway, E. ( 1985 ). PROUST: An automatic debugger for Pascal programs. Byte, 10 ( 4 ), 179 – 190. | en_US |
dc.identifier.citedreference | Waterman, D. A. ( 1986 ). A guide to expert systems. Reading, MA: Addison‐Wesley. | en_US |
dc.identifier.citedreference | Spohrer, J. C., Frye, D., & Soloway, E. ( 1988 ). A note on one aspect of MicroPROUST's performance. Unpublished manuscript. | en_US |
dc.identifier.citedreference | Spohrer, J. C. ( 1989 ). MARCEL: A generate‐test‐and‐debug (GTD) impasse/repair model of student programmers (CSD/RR #687). New Haven, CT: Yale University, Department of Computer Science. | en_US |
dc.identifier.citedreference | Landis, J. R., & Koch, G. G. ( 1977 ). The measurement of observer agreement for categorical data. Biometrics, 33, 159 – 174. | en_US |
dc.identifier.citedreference | Johnson, W. L., Soloway, E., Cutler, B., & Draper, S. ( 1983 ). Bug Collection I (Tech. Report No. 296). New Haven, CT: Yale University, Department of Computer Science. | en_US |
dc.identifier.citedreference | Bennett, R. E., Gong, B., Kershaw, R. C., Rock, D. A., Soloway, E., & Macalalad, A. (In press). Assessment of an expert system's ability to automatically grade and diagnose students' constructed‐responses to computer science problems. In R. O. Freedle (Ed), Artificial intelligence and the future of testing. Hillsdale, NJ: Lawrence Erlbaum Associates. | en_US |
dc.identifier.citedreference | Birenbaum, M., & Tatsuoka, K. K. ( 1987 ). Open‐ended versus multiple‐choice response formats–It does make a difference for diagnostic purposes. Applied Psychological Measurement, 11, 385 – 395. | en_US |
dc.identifier.citedreference | Fleiss, J. L. ( 1981 ). Statistical methods for rates and proportions. New York: Wiley. | en_US |
dc.identifier.citedreference | Johnson, W. L. ( 1985 ). Intention‐based diagnosis of errors in novice programs (Tech. Report No. 395). New Haven, CT: Yale University, Department of Computer Science. | en_US |
dc.identifier.citedreference | Ward, W. C., Frederiksen, N., & Carlson, S. B. ( 1980 ). Construct validity of free‐response and machine‐scorable forms of a test. Journal of Educational Measurement, 17, 11 – 29. | en_US |
dc.identifier.citedreference | Soloway, E., Macalalad, A., Spohrer, J., Sack, W., & Sebrechts, M. M. ( 1987 ). Computer‐based analysis of constructed‐response items: A demonstration of the effectiveness of the intention‐based diagnosis strategy across domains (Final Report). New Haven, CT: Yale University. | en_US |
dc.identifier.citedreference | Soloway, E., & Ehrlich, K. ( 1984 ). Empirical studies of programming knowledge (Research Report #16). New Haven, CT: Yale University, Department of Computer Science Cognition and Programming Project. | en_US |
dc.identifier.citedreference | Sebrechts, M. M., Schooler, L. J., & Soloway, E. ( 1987, May). Diagnosing student errors in statistics: An empirical evaluation of GIDE (abstract). Proceedings of the Third International Conference on Artificial Intelligence and Education. | en_US |
dc.identifier.citedreference | Sebrechts, M. M., LaClaire, L., Schooler, L. J., & Soloway, E. ( 1986 ). Toward generalized intention‐based diagnosis: GIDE. Proceedings of the 7th National Educational Computing Conference. | en_US |
dc.identifier.citedreference | McNemar, Q. ( 1962 ). Psychological statistics. New York: Wiley. | en_US |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |