A Generalization Error for Q-Learning
dc.contributor.author | Murphy, Susan A. | en_US |
dc.date.accessioned | 2007-12-06T19:26:00Z | |
dc.date.available | 2007-12-06T19:26:00Z | |
dc.date.issued | 2005-07 | en_US |
dc.identifier.citation | Journal of Machine Learning Research 2005; 6(Jul):1073-1097 <http://hdl.handle.net/2027.42/57425> | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/57425 | |
dc.identifier.uri | http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&list_uids=16763665&dopt=citation | en_US |
dc.description.abstract | Planning problems that involve learning a policy from a single training set of finite horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function. | en_US |
dc.description.sponsorship | National Institutes of Health (NIDA grants K02 DA15674 and P50 DA 10075 to the Methodology Center) | en_US |
dc.format.extent | 1343 bytes | |
dc.format.extent | 196538 bytes | |
dc.format.extent | 73733 bytes | |
dc.format.mimetype | text/plain | |
dc.format.mimetype | application/pdf | |
dc.format.mimetype | text/plain | |
dc.language.iso | en_US | en_US |
dc.subject | Multistage Decisions | en_US |
dc.subject | Dynamic Programming | en_US |
dc.subject | Reinforcement Learning | en_US |
dc.subject | Batch Data | en_US |
dc.subject.classification | ISR - Institute for Social Research | en_US |
dc.title | A Generalization Error for Q-Learning | en_US |
dc.type | Article | en_US |
dc.subject.hlbsecondlevel | Social Sciences (General) | en_US |
dc.subject.hlbtoplevel | Social Sciences | en_US |
dc.description.peerreviewed | Peer Reviewed | en_US |
dc.contributor.affiliationum | Institute for Social Research | en_US |
dc.contributor.affiliationum | Department of Statistics | en_US |
dc.contributor.affiliationum | Department of Psychiatry | en_US |
dc.contributor.affiliationumcampus | Ann Arbor | en_US |
dc.identifier.pmid | 1475741 | en_US |
dc.identifier.pmid | 16763665 | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/57425/2/murphy05a.pdf | en_US |
dc.owningcollname | Institute for Social Research (ISR) |