Developing And Evaluating A Machine‐Scorable, Constrained Constructed‐Response Item

Braun, Henry I.; Bennett, Randy Elliot; Frye, Douglas; Soloway, Elliot

dc.contributor.author	Braun, Henry I.	en_US
dc.contributor.author	Bennett, Randy Elliot	en_US
dc.contributor.author	Frye, Douglas	en_US
dc.contributor.author	Soloway, Elliot	en_US
dc.date.accessioned	2014-10-07T16:09:08Z
dc.date.available	2014-10-07T16:09:08Z
dc.date.issued	1989-12	en_US
dc.identifier.citation	Braun, Henry I.; Bennett, Randy Elliot; Frye, Douglas; Soloway, Elliot (1989). "Developing And Evaluating A Machine‐Scorable, Constrained Constructed‐Response Item." ETS Research Report Series 1989(2): i-44. <http://hdl.handle.net/2027.42/108589>	en_US
dc.identifier.issn	2330-8516	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/108589
dc.description.abstract	The use of constructed response items in large scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non‐multiple choice item type. The item type presents a faulty solution to a computer programming problem and asks the student to correct the solution. This item type was administered to a sample of high school seniors enrolled in an Advanced Placement course in Computer Science who also took the Advanced Placement Computer Science (APCS) Test. Results indicated that the expert systems were able to produce scores for between 82% and 97% of the solutions encountered and to display high agreement with a human reader on which solutions were and were not correct. Diagnoses of the specific errors produced by students were less accurate. Correlations with scores on the objective and free‐response sections of the APCS examination were moderate. Implications for additional research and for testing practice are offered.	en_US
dc.publisher	Lawrence Erlbaum Associates	en_US
dc.publisher	Wiley Periodicals, Inc.	en_US
dc.title	Developing And Evaluating A Machine‐Scorable, Constrained Constructed‐Response Item	en_US
dc.type	Article	en_US
dc.rights.robots	IndexNoFollow	en_US
dc.subject.hlbsecondlevel	Education	en_US
dc.subject.hlbtoplevel	Social Sciences	en_US
dc.description.peerreviewed	Peer Reviewed	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/108589/1/ets200144.pdf
dc.identifier.doi	10.1002/j.2330-8516.1989.tb00144.x	en_US
dc.identifier.source	ETS Research Report Series	en_US
dc.identifier.citedreference	Frederiksen, N., & Ward, W. C. (1978). Measures for the study of creativity in scientific problem solving. Applied Psychological Measurement, 2, 1–24.	en_US
dc.identifier.citedreference	Johnson, W. L., & Soloway, E. (1985). PROUST: An automatic debugger for Pascal programs. Byte, 10(4), 179–190.	en_US
dc.identifier.citedreference	Waterman, D. A. (1986). A guide to expert systems. Reading, MA: Addison‐Wesley.	en_US
dc.identifier.citedreference	Spohrer, J. C., Frye, D., & Soloway, E. (1988). A note on one aspect of MicroPROUST's performance. Unpublished manuscript.	en_US
dc.identifier.citedreference	Spohrer, J. C. (1989). MARCEL: A generate‐test‐and‐debug (GTD) impasse/repair model of student programmers (CSD/RR #687). New Haven, CT: Yale University, Department of Computer Science.	en_US
dc.identifier.citedreference	Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.	en_US
dc.identifier.citedreference	Johnson, W. L., Soloway, E., Cutler, B., & Draper, S. (1983). Bug Collection I (Tech. Report No. 296). New Haven, CT: Yale University, Department of Computer Science.	en_US
dc.identifier.citedreference	Bennett, R. E., Gong, B., Kershaw, R. C., Rock, D. A., Soloway, E., & Macalalad, A. (In press). Assessment of an expert system's ability to automatically grade and diagnose students' constructed‐responses to computer science problems. In R. O. Freedle (Ed.), Artificial intelligence and the future of testing. Hillsdale, NJ: Lawrence Erlbaum Associates.	en_US
dc.identifier.citedreference	Birenbaum, M., & Tatsuoka, K. K. (1987). Open‐ended versus multiple‐choice response formats–It does make a difference for diagnostic purposes. Applied Psychological Measurement, 11, 385–395.	en_US
dc.identifier.citedreference	Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: Wiley.	en_US
dc.identifier.citedreference	Johnson, W. L. (1985). Intention‐based diagnosis of errors in novice programs (Tech. Report No. 395). New Haven, CT: Yale University, Department of Computer Science.	en_US
dc.identifier.citedreference	Ward, W. C., Frederiksen, N., & Carlson, S. B. (1980). Construct validity of free‐response and machine‐scorable forms of a test. Journal of Educational Measurement, 17, 11–29.	en_US
dc.identifier.citedreference	Soloway, E., Macalalad, A., Spohrer, J., Sack, W., & Sebrechts, M. M. (1987). Computer‐based analysis of constructed‐response items: A demonstration of the effectiveness of the intention‐based diagnosis strategy across domains (Final Report). New Haven, CT: Yale University.	en_US
dc.identifier.citedreference	Soloway, E., & Ehrlich, K. (1984). Empirical studies of programming knowledge (Research Report #16). New Haven, CT: Yale University, Department of Computer Science Cognition and Programming Project.	en_US
dc.identifier.citedreference	Sebrechts, M. M., Schooler, L. J., & Soloway, E. (1987, May). Diagnosing student errors in statistics: An empirical evaluation of GIDE (abstract). Proceedings of the Third International Conference on Artificial Intelligence and Education.	en_US
dc.identifier.citedreference	Sebrechts, M. M., LaClaire, L., Schooler, L. J., & Soloway, E. (1986). Toward generalized intention‐based diagnosis: GIDE. Proceedings of the 7th National Educational Computing Conference.	en_US
dc.identifier.citedreference	McNemar, Q. (1962). Psychological statistics. New York: Wiley.	en_US
dc.owningcollname	Interdisciplinary and Peer-Reviewed

Files in this item

Name: ets200144.pdf
Size: 3.079MB
Format: PDF
