Cursive Script Word Recognition.
Brown, Michael Kenneth
1981
Abstract
New techniques and features have been developed for machine recognition of cursive script, and some experimental results obtained with them are described in this thesis. Cursive script is input to a computer using a Rand-type graphics tablet. The cursive script pattern is stored as a chronological sequence of X and Y coordinates. New preprocessing techniques eliminate input noise and employ feature extraction utilizing special features to perform "prerecognition," a phase in which certain characteristics are recognized before the actual recognition phase is entered, in order to aid in script orientation and normalization. Preprocessing operations include rotation, vertical and horizontal scaling, and deskewing, along with some gap filling and spot noise elimination. These operations are performed in a "closed-loop" manner in which verification is performed after each operation, and the process may be repeated until satisfactory results are obtained. Next, feature extraction is performed, and recognition is achieved by applying a modified k-nearest neighbor (k-NN) rule. Cursive script words are "pre-selected" as the most likely class candidates by a "distance to class mean" measure, and the k-NN algorithm is applied only to the pre-selected classes. Class locations are established by supervised machine learning, in which identified cursive script samples are presented to the machine for feature extraction analysis. The resulting parameters are stored in a "dictionary" for future reference. For testing purposes, a special set of computer-selected words is used. These words are chosen from the set of all possible two-letter words for difficulty of discrimination (by the recognition algorithms). For selection purposes, words are generated by computer linking of handwritten letters using a cubic spline algorithm.
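The two-stage recognition rule described above (pre-select the classes whose means lie closest to the unknown word, then run k-NN only over the stored samples of those classes) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes feature extraction has already reduced each word to a fixed-length numeric vector, uses a plain Euclidean distance, and the names `class_means`, `preselect`, and `classify`, the parameters `m` (number of pre-selected classes) and `k` (number of neighbours), and the example feature values are all hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_means(dictionary):
    """Mean feature vector of each class in a {label: [sample vectors]} dictionary."""
    means = {}
    for label, samples in dictionary.items():
        n = len(samples)
        means[label] = [sum(col) / n for col in zip(*samples)]
    return means


def preselect(x, means, m):
    """Return the m class labels whose means are closest to x ("distance to class mean")."""
    return sorted(means, key=lambda label: euclidean(x, means[label]))[:m]


def classify(x, dictionary, k=3, m=2):
    """Modified k-NN (hypothetical parameters): vote only among samples of pre-selected classes."""
    candidates = preselect(x, class_means(dictionary), m)
    # Distances from x to every stored sample of the candidate classes only.
    neighbours = sorted(
        (euclidean(x, sample), label)
        for label in candidates
        for sample in dictionary[label]
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy "dictionary" of labelled feature vectors (illustrative values only).
dictionary = {
    "on": [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]],
    "no": [[3.0, 3.0], [3.1, 2.9], [2.8, 3.2]],
    "an": [[8.0, 8.0], [8.2, 7.9]],
}
print(classify([1.1, 1.0], dictionary, k=3, m=2))  # on
```

The pre-selection step keeps the more expensive neighbour search confined to a few plausible classes, so its cost grows with the number of samples in `m` classes rather than with the whole dictionary.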
Of the 676 possible two-letter classes, the twenty that are most densely packed in the feature space (and, hence, most difficult to discriminate between) are obtained by a computer search. These words are then linked to form a dense feature-space distribution of 400 four-letter classes, which is again searched. The twenty most difficult-to-recognize four-letter words thus obtained are used as the initial testing set. Results on 200 samples of the author's handwriting indicate that greater than 90% recognition accuracy is achievable using this "worst case" set of words. Better performance can be expected for virtually all other sets of words. Improvements in recognition accuracy over previously reported results have been obtained by several techniques. These include an improved feature set containing local character-level information, a weighted metric for classification, more sophisticated preprocessing techniques, and the use of unsupervised learning, which allows the machine to use unknown script input samples to update its "dictionary" as the style of the handwriting varies with time or among script authors. From individual handwriting styles it is possible to recognize the script author and use this information to recognize handwriting more accurately by providing an individual "dictionary" for each author. Substantial recognition accuracy improvements are expected when the machine is expanded to read at the sentence level, thus allowing contextual information to be used.
Types: Thesis
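The abstract names two of the improvements (a weighted metric and unsupervised updating of the "dictionary") without giving their exact form. One way they could fit together is sketched below, under explicit assumptions: the weighted metric is taken to be a per-feature weighted Euclidean distance, the dictionary stores one running mean per class, and an unlabelled sample is folded into the winning class mean only if it lies within a reject threshold. The class `AdaptiveDictionary`, the `weights` and `max_distance` parameters, and the incremental-mean update are hypothetical choices for illustration, not the procedure used in the thesis.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; weights[i] scales the squared difference of feature i."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


class AdaptiveDictionary:
    """Hypothetical per-class mean 'dictionary' that can learn from unlabelled samples."""

    def __init__(self, means, counts, weights, max_distance=math.inf):
        self.means = {label: list(v) for label, v in means.items()}
        self.counts = dict(counts)        # samples folded into each mean so far
        self.weights = list(weights)      # per-feature weights of the metric (assumed)
        self.max_distance = max_distance  # reject threshold for self-updates (assumed)

    def nearest(self, x):
        """Return (label, distance) of the closest class mean under the weighted metric."""
        label = min(
            self.means,
            key=lambda c: weighted_distance(x, self.means[c], self.weights),
        )
        return label, weighted_distance(x, self.means[label], self.weights)

    def classify_and_learn(self, x):
        """Classify an unlabelled sample; if it is close enough, fold it into that class mean."""
        label, dist = self.nearest(x)
        if dist <= self.max_distance:
            n = self.counts[label] + 1
            # Incremental mean: m_new = m_old + (x - m_old) / n
            self.means[label] = [m + (xi - m) / n for m, xi in zip(self.means[label], x)]
            self.counts[label] = n
        return label


d = AdaptiveDictionary(
    {"in": [0.0, 0.0], "on": [4.0, 4.0]},
    {"in": 1, "on": 1},
    weights=[1.0, 0.5],
    max_distance=2.0,
)
print(d.classify_and_learn([1.0, 1.0]), d.means["in"])  # in [0.5, 0.5]
```

The reject threshold guards against a misclassified sample dragging a class mean toward the wrong region of feature space. Keeping one such object per script author would be one way to realize the individual "dictionaries" mentioned above.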