Show simple item record

Developing And Evaluating A Machine‐Scorable, Constrained Constructed‐Response Item

dc.contributor.author: Braun, Henry I. [en_US]
dc.contributor.author: Bennett, Randy Elliot [en_US]
dc.contributor.author: Frye, Douglas [en_US]
dc.contributor.author: Soloway, Elliot [en_US]
dc.date.accessioned: 2014-10-07T16:09:08Z
dc.date.available: 2014-10-07T16:09:08Z
dc.date.issued: 1989-12 [en_US]
dc.identifier.citation: Braun, Henry I.; Bennett, Randy Elliot; Frye, Douglas; Soloway, Elliot (1989). "Developing And Evaluating A Machine-Scorable, Constrained Constructed-Response Item." ETS Research Report Series 1989(2): i-44. <http://hdl.handle.net/2027.42/108589> [en_US]
dc.identifier.issn: 2330-8516 [en_US]
dc.identifier.uri: https://hdl.handle.net/2027.42/108589
dc.description.abstract: The use of constructed-response items in large-scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non-multiple-choice item type. The item type presents a faulty solution to a computer programming problem and asks the student to correct the solution. This item type was administered to a sample of high school seniors enrolled in an Advanced Placement course in Computer Science who also took the Advanced Placement Computer Science (APCS) Test. Results indicated that the expert systems were able to produce scores for between 82% and 97% of the solutions encountered and to display high agreement with a human reader on which solutions were and were not correct. Diagnoses of the specific errors produced by students were less accurate. Correlations with scores on the objective and free-response sections of the APCS examination were moderate. Implications for additional research and for testing practice are offered. [en_US]
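The abstract reports the expert systems' agreement with a human reader on which solutions were and were not correct. The following sketch is illustrative only (it is not drawn from the report, and the scorer data are hypothetical): it shows how exact agreement and a chance-corrected kappa coefficient, the kind of statistic treated in the Landis & Koch (1977) and Fleiss (1981) references cited below, could be computed for paired right/wrong scores.

# Illustrative sketch, not from the report: agreement between hypothetical
# expert-system scores and human-reader scores on a 1 (correct) / 0 (incorrect) scale.
from collections import Counter

def percent_agreement(system, human):
    """Proportion of solutions on which the two scorers assign the same score."""
    matches = sum(1 for s, h in zip(system, human) if s == h)
    return matches / len(system)

def cohens_kappa(system, human):
    """Chance-corrected agreement (Cohen's kappa) for two raters on categorical scores."""
    n = len(system)
    observed = percent_agreement(system, human)
    sys_counts = Counter(system)
    hum_counts = Counter(human)
    categories = set(sys_counts) | set(hum_counts)
    # Expected agreement if the two scorers assigned categories independently.
    expected = sum((sys_counts[c] / n) * (hum_counts[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical paired scores for ten student solutions.
system_scores = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
human_scores  = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
print(percent_agreement(system_scores, human_scores))  # 0.8
print(cohens_kappa(system_scores, human_scores))       # approximately 0.52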
dc.publisher: Lawrence Erlbaum Associates [en_US]
dc.publisher: Wiley Periodicals, Inc. [en_US]
dc.title: Developing And Evaluating A Machine-Scorable, Constrained Constructed-Response Item [en_US]
dc.type: Article [en_US]
dc.rights.robots: IndexNoFollow [en_US]
dc.subject.hlbsecondlevel: Education [en_US]
dc.subject.hlbtoplevel: Social Sciences [en_US]
dc.description.peerreviewed: Peer Reviewed [en_US]
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/108589/1/ets200144.pdf
dc.identifier.doi: 10.1002/j.2330-8516.1989.tb00144.x [en_US]
dc.identifier.source: ETS Research Report Series [en_US]
dc.identifier.citedreference: Frederiksen, N., & Ward, W. C. (1978). Measures for the study of creativity in scientific problem solving. Applied Psychological Measurement, 2, 1-24. [en_US]
dc.identifier.citedreference: Johnson, W. L., & Soloway, E. (1985). PROUST: An automatic debugger for Pascal programs. Byte, 10(4), 179-190. [en_US]
dc.identifier.citedreference: Waterman, D. A. (1986). A guide to expert systems. Reading, MA: Addison-Wesley. [en_US]
dc.identifier.citedreference: Spohrer, J. C., Frye, D., & Soloway, E. (1988). A note on one aspect of MicroPROUST's performance. Unpublished manuscript. [en_US]
dc.identifier.citedreference: Spohrer, J. C. (1989). MARCEL: A generate-test-and-debug (GTD) impasse/repair model of student programmers (CSD/RR #687). New Haven, CT: Yale University, Department of Computer Science. [en_US]
dc.identifier.citedreference: Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174. [en_US]
dc.identifier.citedreference: Johnson, W. L., Soloway, E., Cutler, B., & Draper, S. (1983). Bug Collection I (Tech. Report No. 296). New Haven, CT: Yale University, Department of Computer Science. [en_US]
dc.identifier.citedreference: Bennett, R. E., Gong, B., Kershaw, R. C., Rock, D. A., Soloway, E., & Macalalad, A. (In press). Assessment of an expert system's ability to automatically grade and diagnose students' constructed-responses to computer science problems. In R. O. Freedle (Ed.), Artificial intelligence and the future of testing. Hillsdale, NJ: Lawrence Erlbaum Associates. [en_US]
dc.identifier.citedreference: Birenbaum, M., & Tatsuoka, K. K. (1987). Open-ended versus multiple-choice response formats: It does make a difference for diagnostic purposes. Applied Psychological Measurement, 11, 385-395. [en_US]
dc.identifier.citedreference: Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: Wiley. [en_US]
dc.identifier.citedreference: Johnson, W. L. (1985). Intention-based diagnosis of errors in novice programs (Tech. Report No. 395). New Haven, CT: Yale University, Department of Computer Science. [en_US]
dc.identifier.citedreference: Ward, W. C., Frederiksen, N., & Carlson, S. B. (1980). Construct validity of free-response and machine-scorable forms of a test. Journal of Educational Measurement, 17, 11-29. [en_US]
dc.identifier.citedreference: Soloway, E., Macalalad, A., Spohrer, J., Sack, W., & Sebrechts, M. M. (1987). Computer-based analysis of constructed-response items: A demonstration of the effectiveness of the intention-based diagnosis strategy across domains (Final Report). New Haven, CT: Yale University. [en_US]
dc.identifier.citedreference: Soloway, E., & Ehrlich, K. (1984). Empirical studies of programming knowledge (Research Report #16). New Haven, CT: Yale University, Department of Computer Science, Cognition and Programming Project. [en_US]
dc.identifier.citedreference: Sebrechts, M. M., Schooler, L. J., & Soloway, E. (1987, May). Diagnosing student errors in statistics: An empirical evaluation of GIDE (abstract). Proceedings of the Third International Conference on Artificial Intelligence and Education. [en_US]
dc.identifier.citedreference: Sebrechts, M. M., LaClaire, L., Schooler, L. J., & Soloway, E. (1986). Toward generalized intention-based diagnosis: GIDE. Proceedings of the 7th National Educational Computing Conference. [en_US]
dc.identifier.citedreference: McNemar, Q. (1962). Psychological statistics. New York: Wiley. [en_US]
dc.owningcollname: Interdisciplinary and Peer-Reviewed

