THE UNIVERSITY OF MICHIGAN INDUSTRY PROGRAM OF THE COLLEGE OF ENGINEERING

CONTINUOUS HUMAN ESTIMATION OF A TIME-VARYING, SEQUENTIALLY DISPLAYED PROBABILITY

Gordon H. Robinson

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The University of Michigan, Department of Aeronautical and Astronautical Engineering

August, 1962

IP-578

Doctoral Committee:
Professor Paul M. Fitts, Co-chairman
Associate Professor Elmer G. Gilbert, Co-chairman
Associate Professor Frederick J. Beutler
Associate Professor Ward D. Edwards

ACKNOWLEDGEMENTS

The author is grateful to Professor Elmer G. Gilbert for his guidance and instruction during the author's graduate program, and to Professor Paul M. Fitts for his considerate leadership into the field of Engineering Psychology. Professor Arthur W. Melton, Director of the Engineering Psychology Group, provided support and direction for the laboratory under which this work was performed. Special gratitude must be extended to Professor Ward Edwards for his direct supervision of the work and his many suggestions concerning interesting directions of investigation. The author also greatly appreciates the intellectual stimulation and leadership that Professor Edwards has provided during the past three years.

Support for this work was provided by Project MICHIGAN under Department of the Army Contract DA-36-039 SC-78801, administered by the U.S. Army Signal Corps.

TABLE OF CONTENTS

                                                              Page
ACKNOWLEDGEMENTS ............................................ ii
LIST OF TABLES .............................................. iv
LIST OF FIGURES ............................................. v
LIST OF APPENDICES .......................................... viii
LIST OF SYMBOLS ............................................. ix
INTRODUCTION ................................................ 1
    Research on Estimation and Prediction ................... 2
THE EXPERIMENT .............................................. 7
    Task Selection .......................................... 7
    Input Selection ......................................... 10
    Flash Series Generation ................................. 11
    Experimental Variables .................................. 11
    Task Instructions ....................................... 13
EXPERIMENTAL RESULTS ........................................ 15
    The Pilot Experiment .................................... 15
    Response Measures ....................................... 17
    Data Analysis ........................................... 20
    Experimental Data ....................................... 20
    Summary of Results ...................................... 35
MATHEMATICAL MODELS ......................................... 39
    A Model With Geometric Weighting ........................ 41
    A Model With Constant Weighting ......................... 48
    A Descriptive Model ..................................... 52
    A Normative Variation ................................... 60
DISCUSSION .................................................. 63
    Conclusion .............................................. 68
BIBLIOGRAPHY ................................................ 89

LIST OF TABLES

Table                                                         Page
I   Parameter Sets for the Descriptive Model Yielding Minimum Values of <e>p ........ 59
II  Comparison Between the Performances of the Subjects and the Mathematical Models ........ 69

LIST OF FIGURES

Figure                                                        Page
1   Tracking Console ........ 8
2   A Typical Response to a Sub-Problem ........ 16
3   Detection as a Function of Step Size and Flash Rate. The small and large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64. Detection is measured in flashes ........ 24
4   Percentage of Sub-Problems in which "No Detection" Occurs as a Function of Step Size and Flash Rate. The small and the large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64 ........ 24
5   Detection as a Function of Probability, Constraint, and Small and Large Step Problems. Detection is measured in flashes ........ 25
6   Detection as a Function of Step Size, Sample Rate, and Constraint. The small and large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64. Detection is measured in flashes ........ 25
7   Convergence as a Function of Step Size and Flash Rate. The small and large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64. Convergence is measured in flashes ........ 27
8   Percentage of Sub-Problems in which "No Convergence" Occurs as a Function of Step Size and Flash Rate. The small and large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64 ........ 27
9   Percentage of Sub-Problems in which "Accidental Initial Convergence" Occurs as a Function of Step Size and Flash Rate ........ 28
10  Convergence as a Function of Probability, Constraint, and Small and Large Step Problems. Convergence is measured in flashes ........ 28

11  Convergence as a Function of Step Size, Flash Rate, and Constraint. The small and large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64. Convergence is measured in flashes ........ 29
12  Root Mean Square Error Over the Whole Sub-Problem as a Function of Flash Rate ........ 31
13  Mean Error as a Function of Probability. This measure is made from convergence to the end of the sub-problem ........ 31
14  Root Mean Square Error as a Function of Step Size and Constraint. This measure is made from convergence to the end of the sub-problem ........ 33
15  Root Mean Square Error as a Function of Flash Rate and Constraint. This measure is made from convergence to the end of the sub-problem ........ 33
16  Root Mean Square Error as a Function of Probability and Constraint. This measure is made from convergence to the end of the sub-problem. The standard deviation for a 1735 sample mean is shown for the random problem ........ 34
17  False Alarm Rate in False Alarms Per Flash as a Function of Step Size and Constraint. This measure is made from convergence to the end of the sub-problem ........ 36
18  False Alarm Rate in False Alarms Per Flash as a Function of Flash Rate and Constraint. This measure is made from convergence to the end of the sub-problem ........ 36
19  False Alarm Rate in False Alarms Per Flash as a Function of Probability and Constraint. The measure is made from convergence to the end of the sub-problem ........ 37
20  Responses of Two Mathematical Models and a Subject to a Portion of a Large Step Problem, Random Constraint, at 1 FPS. Normative Model K1 = 0.02, K2 = 0.10, K3 = 1; Descriptive Model K1 = 0.12, K2 = 0.60, K3 = 12, K4 = 0 ........ 62
21  Responses of Two Mathematical Models and a Subject to a Portion of a Small Step Problem, Random Constraint, at 1 FPS. Normative Model K1 = 0.20, K2 = 0.10, K3 = 6; Descriptive Model K1 = 0.15, K2 = 0.40, K3 = 10, K4 = 0 ........ 62
22  Detection as a Function of Step Size for Four Subjects. Detection is measured in flashes ........ 82

23  Detection as a Function of Flash Rate for Four Subjects. Detection is measured in flashes ........ 83
24  Convergence as a Function of Step Size for Four Subjects. Convergence is measured in flashes ........ 84
25  Convergence as a Function of Flash Rate for Four Subjects. Convergence is measured in flashes ........ 85
26  Root Mean Square Error as a Function of Flash Rate for Four Subjects ........ 86
27  Root Mean Square Error After Convergence as a Function of Flash Rate for Four Subjects ........ 87
28  False Alarm Rate, in False Alarms Per Flash, as a Function of Flash Rate for Four Subjects ........ 88

LIST OF APPENDICES

Appendix                                                      Page
A   PROBABILITY TRACKING APPARATUS ........ 70
B   INPUT PROBABILITY GENERATION ........ 71
C   VARIANCES OF SAMPLE AVERAGES FROM FINITE POPULATIONS ........ 73
D   EXPERIMENTAL PRESENTATION ORDER ........ 75
E   INSTRUCTIONS ........ 77
F   ANALOG COMPUTER CIRCUIT FOR COMPUTING PAY ........ 78
G   TWO QUALITATIVE RESPONSE EXCEPTIONS ........ 79
H   DATA NOT AVERAGED OVER SUBJECTS ........ 81

LIST OF SYMBOLS

a        a constant
C        convergence, a measure made on the subject's response (see page 17)
C̄        no convergence, a measure made on the subject's response (see page 18)
D        detection, a measure made on the subject's response (see page 17)
D̄        no detection, a measure made on the subject's response (see page 17)
e(n)     a model's error at n, e(n) = r(n) - P
ē²(n)    the expected value of e²(n), an ensemble average
<e²>A    the average value of e²(n) over some set of samples A
eM(n)    the descriptive model's error at n, eM(n) = r(n) - P
eMS(n)   the error between the subject and the descriptive model at n, eMS(n) = Rn - r(n)
eS(n)    the subject's error at n, eS(n) = Rn - P
<eS²>p, <eM²>p, <eMS>p   the average values of eS²(n), eM²(n), and eM(n)eMS(n) over a problem
E(x)     the expected value of x
FAR      false alarm rate, a measure made on the subject's response (see page 18)
fps      flashes per second, the flash rate
IC       initial convergence, a measure made on the subject's response (see page 18)
k̄        (1/T) Σ from i=1 to S of Mi(Pi - Pi-1)
k1       the decision criterion level in the descriptive model
k2       the fractional response adjustment in the descriptive model

LIST OF SYMBOLS (CONT'D)

k3       the number of flashes in u(n) in the descriptive model
k4       the flash shift between the subject's and the descriptive model's responses
M        the last flash in a sub-problem
Mi       the length of sub-problem i
ME,C     mean error after convergence, a measure made on the subject's response (see page 18)
n        a flash index; n = 1 is the first flash in a sub-problem
N        the total number of flashes in the model's summation
P        the probability of a 1 in a 0,1 binary series
Pi       the probability of a 1 in the binary series i
Pr(x)    the probability of x
r        the geometric ratio in the geometric model
r(n)     a model's response at n
r̄(n)     the expected value of r(n), an ensemble average
Rn       the subject's response at n
RMSE     the square root of the mean squared error, a measure made on the subject's response (see page 18)
RMSE,C   the square root of the mean squared error after the point of convergence, a measure made on the subject's response (see page 18)
ρ        a correlation coefficient
S        the total number of sub-problems in a problem
sn       the nth flash in a binary series
σi²      the variance of a sample from a population with P = Pi
σr²(n)   the variance of r(n) at n, an ensemble average
σs²(n)   the variance of sn at n, an ensemble average

LIST OF SYMBOLS (CONT'D)

T        the total number of flashes in a problem
u(n)     an average of k3 flashes in the descriptive model
σ̄²       (1/S) Σ from i=1 to S of σi²
Wi       a weight attached to sample s(n-i+1)

I. INTRODUCTION

Human decision tasks can be described as static or dynamic. In a dynamic decision task some of the relevant stimuli vary as a function of time, or of past decisions, or both. The decision maker must keep track of these changes in order to perform satisfactorily. This experiment examines the human ability to follow or estimate a time-varying probability. This probability could be an important input to a dynamic decision task. This experiment attempts to isolate the estimation of the probability from the utilization of the estimate in making decisions.

The task selected was the estimation of the mean of a binary (Bernoulli) distribution. Samples (0 or 1) from the distribution were displayed sequentially and the subject made a continuous estimate of the distribution mean. The mean varied with time. The experimental details are described in Chapter II.

The study of probability estimation isolated from decision making is important for two reasons. In decision making under uncertainty the estimation of probabilities is always at least an implicit part of the task. The ability of the decision maker to produce decisions according to a maximum expected value criterion will depend directly on his ability to estimate the probabilities of the various alternative courses of action. There is also an applied interest in the human ability to turn uncertainty into probability. In many systems involving stochastic inputs it is relatively easy to automate the application of

decision rules. It is far harder, however, to find automatic means for supplying the probabilities and payoffs necessary for the application of the rules. Probability estimation is a candidate for inclusion as a human task in semi-automatic information processing and decision making systems in which the subsequent choice of a course of action is performed automatically.

Research on Estimation and Prediction

Human binary choice behavior has been studied extensively. This experiment complements previous studies by isolating the estimation function and by introducing dynamic probabilities. Most of the studies on human binary choice have not included estimation as an explicit part of the subjects' task. These studies generally generate prediction data which are averaged over blocks of trials (decisions, choices) to produce prediction frequencies. A prediction frequency of 0.67 on trials 121 through 150 would indicate that the predictions during these 30 trials were distributed about two to one between the two choices. The subject may, or may not, be told the correct choice after each trial. The correct choices are drawn in some manner from a stationary binary distribution. Examples of these experiments, often called probability learning experiments, are Grant, 1953; Hake and Hyman, 1953; Hake, 1954; Estes, 1957; and Neimark and Shuford, 1959. Most of these studies report prediction frequencies approaching asymptotically the frequency of correct choice or the generating probability. This phenomenon has been named "probability matching". This behavior is

not optimum. The optimum strategy, under instructions to maximize correct choices, is to predict consistently the event having the larger probability. This event can be inferred from the relative frequency of previous events. Behavior significantly different from matching has been reported by Gardner, 1959 and Edwards, 1961. The number of trials may have been insufficient in some of the experiments in which matching was found. An unpublished experiment by Tannenbaum and Edwards at The University of Michigan indicates an interaction between the amount of reward for a correct choice and the prediction frequency. Some subjects used near-optimum strategies.

A few studies have looked at the estimation ability of the binary decision maker. Grant (1953) reports an experiment by Hornseth in which the subject was asked to guess, at the end of 150 choice trials, which event had been the more frequent. The prediction frequencies for the last 30 trial block were close to the matching level. The data on guessing the overall frequency were plotted as the percentage of guesses that one of the events was the more frequent against the actual event frequency. These data showed the percentage of correct guesses at a particular event frequency to be higher than the event frequency. (I.e., an event frequency of 0.70 would be guessed to be the more frequent more than 70 percent of the time.) Grant concludes from this experiment, and presumably from other probability learning experiments, that the processes of estimation and prediction are distinct and that prediction is the more

accurate. The processes may indeed be distinct, but the accuracy of estimation was not measured by Hornseth's experiment. Hake (1954) surveys a major portion of the probability learning literature, including experiments by Estes, Grant, Hornseth, Hake, and Hyman, and concludes that estimation is not accurate enough to be the basis for binary predictions. The choice behavior in these experiments should have been based on an estimate of the event frequencies or generating probabilities. To conclude that the non-optimum performance was an indicator of probability estimation ability is unjustified, however.

Neimark and Shuford (1959) included estimation as an explicit part of the task in a probability learning experiment. In addition to making a choice at each trial, some subjects were required to estimate the proportions of the past events. The event frequency was 0.67. These subjects gave an unbiased estimate and had prediction frequencies significantly higher than the matching level. Subjects who predicted only had prediction frequencies at the matching level. These results suggest that explicit estimation improved prediction.

Erlick (1959) looked at estimation without a decision task. He presented 100 binary events at a rate of five per second and asked for an estimate of the more frequent event and for an actual estimate of the event frequency on a continuous scale. Four event frequencies were used: 0.50 - 0.50, 0.48 - 0.52, 0.45 - 0.55, and 0.43 - 0.57. The data indicated that the more frequent event would be selected

correctly 75 percent of the time if the event frequency difference was approximately 0.08 (0.46 - 0.54). The median estimate of the frequency was within 0.01 for 0.50 and 0.52; 0.55 and 0.57 were each estimated about 0.02 high.

All of the experiments reviewed above used stationary processes to generate the binary events. A few experiments have used a dynamic generating process. These experiments have all used prediction as the required task, however, and thus give only indirect evidence on estimation. Grant (1953) reports an experiment in which the generating probability changed periodically as a square wave. The probability values always differed by 0.50, with higher values 1.00, 0.90, 0.80, and 0.70. The period was 40 events and two and one-half cycles were presented. A prediction frequency was calculated by averaging over five trials and about 40 subjects. This prediction frequency followed the cyclic change only when the higher probability was 1.00 or 0.90, reaching 0.95 in 20 trials at 1.00 and 0.70 in 20 trials at 0.90. No systematic performance changes apparently occurred during the two and one-half cycles. The subjects were evidently not instructed about the non-stationarity of the generating process. Such instructions would probably have an appreciable effect.

Goodnow and Pettigrew (1955) performed a binary prediction experiment in which a change from 0.50 - 0.50 to 0 - 1.00 occurred. They found that the response to such a change was more rapid when the

subjects had initially experienced a 0 - 1.00 series prior to the 0.50 - 0.50 series. Again, however, no specific instructions were given concerning the non-stationarity of the generating process. In both the Grant and Goodnow experiments there is evidence that a change in the generating probability of 0.50 will produce an appropriate change in the prediction frequency if the change is to an extreme probability. These extreme probabilities (1.00 and 0.90) evidently represent obvious changes even with no expectancy for change induced by the instructions. Flood (1954) discusses the strategy of a subject who may not be convinced that the probabilities are stationary. No particular results were obtained in an experiment designed to induce certainty versus uncertainty in the stationarity of a stationary generating process.

The human ability to give a direct magnitude estimate of a stationary binary probability is uncertain. Most experimenters have postulated estimation only as an intervening variable between the display and a decision task. Decision behavior was improved in one experiment by including explicit estimation in the task. Two questions seem appropriate: What role does estimation play in a decision task, and how well can this estimation be performed? This experiment sheds light on the second question as well as providing a fairly comprehensive look at the continuous estimation of dynamic probabilities.
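The sub-optimality of probability matching noted in the review above can be made concrete with a short calculation. The sketch below is an illustration added here, not part of the original analysis: it compares the expected proportion of correct predictions under matching with that of the optimum strategy of always predicting the more frequent event, for a stationary series with event probability p.

```python
def matching_accuracy(p):
    # Predict each event with its own frequency; a prediction is
    # correct whenever prediction and event coincide.
    return p * p + (1 - p) * (1 - p)

def optimal_accuracy(p):
    # Always predict the more frequent event.
    return max(p, 1 - p)

for p in (0.6, 0.7, 0.9):
    print(f"p = {p}: matching {matching_accuracy(p):.2f}, "
          f"optimal {optimal_accuracy(p):.2f}")
```

At a generating probability of 0.70, for example, matching yields an expected 0.58 correct against 0.70 for the optimum strategy, which is why the matching behavior reported in the studies above is non-optimum under a maximize-correct-choices instruction.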

II. THE EXPERIMENT

Task Selection

The task was estimating the mean of a binary distribution as samples (individual drawings) from that distribution were sequentially displayed. This task was selected for two reasons. The binary distribution is completely described by one parameter, its mean. It is thus easily understood by people unfamiliar with the mathematical aspects of probability. The second reason for a binary presentation was its relevance to the binary decision and estimation literature discussed in Chapter I. Except where specifically defined otherwise, the word "probability" will refer to the current probability of drawing a right sample.

Samples from the binary distributions were presented at a fixed rate by flashing either the right or left light of the apparatus shown in Figure 1. Directly beneath the two lights is an illuminated dial indicating the position of the manual response lever. A continuous response mechanism was selected as appropriate for the estimation of a continuously varying stimulus. The display and response mechanisms were designed for convenient and effective interpretation and control. Details of the apparatus are given in Appendix A.

The lever was free to move between stops at 0 and 100 on the scale. The lever and associated mechanisms contained enough Coulomb friction to retain a setting without constant force. Neither springs nor viscous friction was used. The smallest scale division was 2. It was possible to position the pointer precisely to one half of the least division, corresponding to a probability change of 0.01. A main scale division occurred at every fifth small division and was marked: 0, 10, 20, ..., 90, 100.

Figure 1. Tracking Console. (Two lights above an illuminated dial indicating lever position, 0-100 in 50 divisions, with the tracking lever below.)
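In statistical terms the task is sequential estimation of a Bernoulli mean. As a point of reference only (this is an added illustration, not a claim about the subjects' strategy), the running relative frequency of right flashes is the maximum-likelihood estimate of a stationary mean after each flash:

```python
import random

def running_estimates(flashes):
    # Relative frequency of 1s after each flash: the maximum-likelihood
    # estimate of a stationary Bernoulli mean.
    ones, estimates = 0, []
    for n, s in enumerate(flashes, start=1):
        ones += s
        estimates.append(ones / n)
    return estimates

rng = random.Random(0)
flashes = [1 if rng.random() < 0.7 else 0 for _ in range(50)]
print(running_estimates(flashes)[-1])  # settles near 0.7 for a long series
```

For the time-varying probabilities used in this experiment, an equal-weight average of this kind responds progressively more slowly to step changes as the series lengthens, which is one motivation for the recency-weighted models considered in Chapter IV.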

The position of the lever was recorded by two means, a Friden punched paper tape and a Sanborn continuous strip recorder. The paper tape was punched in a Gray, or cyclically permuted, binary code, using six channels of an eight channel punch to encode 101 symbols, 0 through 100. Pilot studies indicated that the rate of output sampling necessary to recover the response information was dependent on the flash rate and that a response sampling rate equal to the flash rate would be adequate. A sample was thus taken every two seconds at the slowest presentation rate, 0.5 flashes per second, and every 0.125 seconds at the fastest presentation rate, 8 flashes per second. The punched tape record was later transferred to IBM cards on a modified IBM Tape to Card Converter and the data analyzed on an IBM 709 Data Processing System. The Sanborn records were used to make qualitative judgements about the response and to select appropriate criteria for the computer analysis. They also permitted continuous monitoring of the task as it was performed.

The subject and his console were housed in a small isolated room. The subject wore noise insulating ear muffs. He had a two-way communication system with the experimenter. A low level white noise was presented by the earphones during the experimental run. His microphone was always on, and comments during the experimental run were permitted.1 When the experimenter spoke to the subject the noise was automatically switched off.

The task has a strong resemblance to a standard unidimensional manual tracking task. The main difference is the presentation of the

1 Few comments were actually made. Most of these were not printable.

target. Instead of being displayed explicitly as a dot or a line, the target exists only as a parametric description of the method used to select the flash sequence. In this experiment the generating process was time-variant and the target could be defined as the mean of the distribution from which the last flash was drawn. It is impossible to recover the precise target from the information available to the subjects. The cursor, or 0-100 dial pointer in this case, is pointed at an estimated value of the target. The system is essentially open loop since the lack of an explicit target prohibits the formation of an error signal. The dynamics are almost entirely in the mental computation and there was no indication that motor skill was a limiting factor.

The use of a tracking lever as a response means is unique in research on probability estimation. It is appropriate to the task and permits an easy understanding of the response scale by the subjects. Both end points are well fixed in the same sense that events which are impossible and certain are fixed in value on a personal or subjective probability scale. The 50 point on the scale might also be considered an anchor point in the sense that all subjects clearly understood that 50 per cent meant equally frequent flashes.

Input Selection

The input probability changed in a series of discrete steps. This input form permitted visual, qualitative interpretations to be made from the data as well as the more extensive analysis done by the computer. It also permitted static as well as dynamic measurements to be made. The step change sizes and their directions, as well as the number of flashes

between steps, were selected randomly from a finite set of values. The mechanism for the generation of the sequences is described in detail in Appendix B. The sequence of steps so generated is called a problem. Preliminary investigations revealed that step changes ranging from 0.06 to 0.64 in eight values would adequately cover the interesting range of probability change.2 The number of flashes between step changes was selected from a set ranging from 34 to 89 flashes. Thirty-four flashes was considered the smallest number required to minimize interaction between successive step changes. The range between 34 and 89 was considered sufficient to prevent any performance improvement due to the learning of inter-step length. A step change and the flashes up to the next change is called a sub-problem.

Flash Series Generation

The flashes were drawn from finite populations without replacement. The population size was an experimental variable and is discussed below. Finite populations were selected to fix the average value of the flashes for each sub-problem. The effects of finite population sampling on variances are shown in Appendix C.

Experimental Variables

Five independent variables were used in the experiment: the rate at which the flashes were presented, the magnitude and sign of the step change, the probability that the step changed to, a constraint on the randomness of the flash series, and subjects. The number of flashes between step changes was not studied as a variable. The values of these

2 A pilot experiment with a simplified apparatus was run before the main console was built in order to establish the general form of the response and reasonable ranges for the independent variables. It is discussed in more detail in Chapter III.

variables were as follows:

Rate: 0.5, 1.0, 2.0, 4.0, and 8.0 flashes per second (fps)
Step size: 0.06, 0.12, 0.16, 0.18, 0.24, 0.32, 0.48, 0.64; both + and -
Probability: 0.02, 0.08, 0.14, 0.18, 0.26, 0.32, 0.34, 0.44, 0.50, and the complementary values between 0.50 and 1.00.

The step changes and probability values were arranged in two problem types as described in detail in Appendix B. For one type, the small step problems, the mean step change is approximately 0.15; for the other, the large step problems, the mean step change is approximately 0.40. Both types contain the entire range of probabilities and are symmetric about 0.50.

The constraint variable had two values, leading to the random and the constrained problem types. The random problems were generated from finite populations which were the length of the respective sub-problems being generated. These finite populations were thus of size 34 through 89 flashes. It was felt that these sizes would be large enough to yield experimental results fairly close to those which would result from infinite populations. The constrained problems were generated from finite populations of 17 flashes. The lengths of the sub-problems were arranged in whole number multiples of 17: 34, 51, 68, and 85 flashes. It was assumed that this constraint would be sufficient to indicate those aspects of performance that constraint would affect. It is not a severe enough constraint to be readily perceived from inspection of the flash series, however. The same series of steps and probabilities were used in the random and in the constrained problems.
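The generation scheme just described can be sketched as follows. This is an illustrative reconstruction in modern code, not the actual mechanism, which is described in Appendix B. A random sub-problem is drawn without replacement from a single population the length of the sub-problem; a constrained sub-problem concatenates draws from 17-flash populations.

```python
import random

def draw_population(p, size, rng):
    # Sampling without replacement from a finite 0/1 population whose
    # composition fixes the number of 1s (the nearest integer to p * size).
    ones = round(p * size)
    population = [1] * ones + [0] * (size - ones)
    rng.shuffle(population)
    return population

def subproblem(p, length, rng, constrained=False):
    if not constrained:
        return draw_population(p, length, rng)   # random problem
    assert length % 17 == 0                      # lengths were multiples of 17
    flashes = []
    for _ in range(length // 17):
        flashes += draw_population(p, 17, rng)   # constrained problem
    return flashes

rng = random.Random(0)
series = subproblem(0.26, 51, rng, constrained=True)
assert len(series) == 51
# Every 17-flash block contains exactly round(0.26 * 17) ones.
assert all(sum(series[i:i + 17]) == round(0.26 * 17) for i in range(0, 51, 17))
```

Because each population's composition is fixed, the average of the flashes in every sub-problem (and, in the constrained case, in every 17-flash block) is fixed up to rounding rather than merely expected, which is why Appendix C treats the variances of samples from finite populations rather than infinite ones.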

Each of the four subjects performed the task in 15 sessions. Each session consisted of two or three problems separated by a short rest period. Each session lasted for about an hour. Each subject saw the same series of problems in the same order. Rates, small and large step problems, subjects, and constraints were exhaustively combined. The order of presentation was chosen at random under the constraint that the tracking sessions were of about the same length. Appendix D gives the sequence used. The pilot experiment had indicated that about 25 minutes at two flashes per second was the maximum time that a subject could be expected to track without a significant decrement in his performance. The problems presented at 0.5 and 1.0 flashes per second were given in four and two separate sessions respectively, in order to limit all sessions to a maximum of 25 minutes of continuous tracking.

Task Instructions

Careful attention was paid to the instruction of the subjects prior to the recorded experimental sessions. This effort was repaid with an excellent consistency in the tracking behavior of the eight subjects, four in the pilot and four in the main experiment. The standard instructions used are shown in Appendix E. These served only as the initial, formal introduction, however. Actually about 10 minutes was spent discussing the task to be performed and the purpose of the experiment. Instruction was concluded when the experimenter was satisfied that all important concepts were understood. A 45 minute practice session was used preceding the 15 hours of data recording. During this session the response was continuously monitored

and the subject was assured of the quality of his performance. The lack of error feedback made it difficult for the subject to evaluate his own performance until he had some experience with the task. The instructions were as complete as the subject seemed to need in all but one important area. He was told nothing about the dynamics of the input sequence aside from the fact that there would be changes in the probability. He was told to expect both rapid and slow changes. He was instructed that the pay he would receive would be a constant rate per minute of tracking minus the accumulated squared error during the same interval. The amount was computed automatically on an analog computer operating during the tracking session. The circuit used for the pay scheme is shown in Appendix F.
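The pay scheme lends itself to a simple digital restatement. The original amount was computed on an analog circuit (Appendix F); the rate and penalty constants below are arbitrary illustrations, not the amounts actually paid.

```python
def pay(errors, dt, rate_per_minute=1.0, penalty=1.0):
    # Constant rate per minute of tracking, minus the squared error
    # accumulated (integrated) over the same interval. `errors` holds
    # the tracking error sampled every `dt` seconds.
    minutes = len(errors) * dt / 60.0
    accumulated = sum(e * e for e in errors) * dt
    return rate_per_minute * minutes - penalty * accumulated

# One minute of perfect tracking, sampled every 0.1 s, earns the full rate;
# any sustained error reduces the amount.
assert abs(pay([0.0] * 600, 0.1) - 1.0) < 1e-9
```

A squared-error penalty of this form makes the expected loss smallest when the lever is held at the current probability, so the instruction rewards honest estimation rather than hedging toward the extremes.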

III. EXPERIMENTAL RESULTS

The Pilot Experiment

A pilot experiment was run prior to the main experiment in an attempt to answer three questions. The first concerned the general form and quality of the response. The responses found were qualitatively similar to the response in Figure 2. Both the response to change and the estimation of probability were better than expected.

The second question answered by the pilot experiment concerned changes in response with continued performance of the task, reflecting learning or fatigue. The identical problems were presented to the four subjects in each of six sessions about two days apart. There was no indication of a significant change in performance after the first session. To test for specific problem learning, the problem which had been presented for six sessions was presented again, only backwards. No decrement in performance was observed on this reversed problem. It was concluded that no specific problem learning had occurred. None of the subjects recognized that the problem had been the same in each of the six sessions, nor were they able to describe the changes which had occurred in the probabilities. Tracking sessions up to 15 minutes caused no particular fatigue or boredom and it was concluded that sessions of 25 minutes would be permissible on the more isolated, impressive, and comfortable console used in the main experiment.

The third question answered by the pilot experiment concerned the kind and amount of instruction needed to bring the subjects up to a reasonably consistent level of performance. The instructional method described in Chapter II was the result. The subjects in the main experiment performed consistently after the instruction

Figure 2. A Typical Response to a Sub-Problem. (The plot shows the input probability and the response over 0 to 50 samples, with the points of detection and convergence and the false alarms marked.)

and practice. Appendix G presents two interesting exceptions. The important task learning evidently occurs during the first few minutes of performance, and the 45-minute practice session was sufficient.

Response Measures

The response measures were chosen after study of the Sanborn records from the main experiment. The form of these responses was the same as in the pilot experiment and is shown in Figure 2. The response was characterized by fairly rapid changes separated by periods of little or no change. This discontinuous form indicates that the behavior might be described in terms of a series of decisions concerning changes in the probability. A descriptive model with this characteristic is developed in Chapter IV. Several of the response measures were chosen to fit this response form. All of the response measures refer to individual sub-problems. The following measures were calculated:

1. DETECTION, D: The number of samples from the step change to the point where the response has changed 0.05 in the direction of the new probability from its value at the point of the step change. If R_n is the response at point n in a sub-problem which starts at n = 1, the point of detection is where R_n = R_0 ± 0.05, the plus sign for an increasing step and the minus for a decreasing step.

2. NO DETECTION, D̄: The number of sub-problems in which detection did not occur; R_n never changed 0.05 in the direction of the new probability.

3. CONVERGENCE, C: The number of samples from the step change to the point where the response is within 0.05 of the new probability.

The point of convergence is where P − 0.05 ≤ R_n ≤ P + 0.05, where P is the probability following the step change. The point of convergence is the first entry into this region from either side.

4. NO CONVERGENCE, C̄: The number of sub-problems in which convergence did not occur; R_n was always outside the 0.05 region about P.

5. INITIAL CONVERGENCE, IC: The number of sub-problems in which the response was within the convergence region about the new probability at the point of the step change: P − 0.05 ≤ R_0 ≤ P + 0.05.

6. ROOT MEAN SQUARE ERROR, RMSE: The square root of the mean squared error over the entire sub-problem. Error equals the response minus the probability. The response was measured on a 0 to 1 scale corresponding to the probability measure, and the error can thus be considered to be an error in probability. For a sub-problem of length M,

RMSE = [ (1/M) Σ_{n=1}^{M} (R_n − P)² ]^{1/2}

7. ROOT MEAN SQUARE ERROR AFTER C, RMSE,C: The square root of the mean squared error from the point of convergence to the end of the sub-problem:

RMSE,C = [ (1/(M−C)) Σ_{n=C+1}^{M} (R_n − P)² ]^{1/2}

This measure and the following two were made only when either convergence or initial convergence was measured.

8. MEAN ERROR AFTER C, ME,C: The mean error from the point of convergence to the end of the sub-problem:

ME,C = (1/(M−C)) Σ_{n=C+1}^{M} (R_n − P)

9. FALSE ALARM RATE, FAR: The number of times per sample that the response left the 0.05 convergence region between the point of convergence

and the end of the sub-problem. If P − 0.05 ≤ R_{n−1} ≤ P + 0.05 and either R_n < P − 0.05 or R_n > P + 0.05, then the point n is a false alarm point.

Detection and convergence were measures designed to describe the discontinuous response form. The 0.05 criterion used in these measures was selected after an extensive study of the data. A sudden response to the new probability occurred shortly after a step change in about 80 percent of the sub-problems. This movement was interpreted as the result of the perception of the change in the probability. The 0.05 detection criterion was selected as measuring this point with fair consistency. For step changes greater than about 0.15 this measure is relatively insensitive to the choice of the 0.05 criterion, since the sudden response was characteristically 0.10 or greater. Convergence is more dependent on the selection of 0.05 as a criterion. The point of convergence was most useful, however, in determining the beginning of measures 7, 8, and 9. These measures were all averaged over flashes, and the location of the convergence point did not affect their values. Detection and convergence, as measured with the 0.05 criterion, are not particularly informative for the smallest step change, 0.06. Measures 7, 8, and 9, the three starting at the point of convergence, indicate the subjects' static estimation ability. The subject is operating under what might be called a dynamic set, however. He has an expectancy for changes in the probability. Changes in his responses during this period could be called "microstructure" tracking, since the subject was not aware that the probability was constant. No measures were made of the persistence of this microstructure tracking on the longer sub-problems.
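The measures above can be sketched in code. The following is a minimal version assuming the response is available as a list of per-flash readings R_1 … R_M, which is an idealization of the continuous Sanborn record:

```python
# Sketches of response measures 1, 3, 6/7, and 9 for one sub-problem.
# resp is the list of responses R_1..R_M after the step change; p is the
# new probability; r0 is the response at the point of the step change.
def detection(resp, r0, step_up, crit=0.05):
    """Measure 1: samples until the response has moved `crit` in the
    direction of the new probability; None indicates no detection."""
    for n, r in enumerate(resp, start=1):
        if (r >= r0 + crit) if step_up else (r <= r0 - crit):
            return n
    return None

def convergence(resp, p, crit=0.05):
    """Measure 3: first entry into the +/-crit region about the new
    probability, from either side; None indicates no convergence."""
    for n, r in enumerate(resp, start=1):
        if abs(r - p) <= crit:
            return n
    return None

def rmse(resp, p, start=0):
    """Measures 6 and 7: RMS error from index `start` to the end."""
    tail = resp[start:]
    return (sum((r - p) ** 2 for r in tail) / len(tail)) ** 0.5

def false_alarm_rate(resp, p, c, crit=0.05):
    """Measure 9: exits from the +/-crit region about p, per flash,
    counted from the point of convergence c to the end."""
    alarms = sum(1 for prev, cur in zip(resp[c - 1:], resp[c:])
                 if abs(prev - p) <= crit and abs(cur - p) > crit)
    return alarms / (len(resp) - c)
```

The `step_up` flag selects the plus or minus sign in the detection criterion R_n = R_0 ± 0.05.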

This behavior might begin to diminish with long presentations of a constant probability. RMSE was the only measure made on all sub-problems regardless of their response form. It indicates the overall quality of performance. RMSE is a common measure in continuous tasks of this kind, largely because it is easily derived and manipulated in mathematical expressions.

Data Analysis

There was one sub-problem for each combination of rate, step size, step direction, probability, constraint, and subject, 3440 in all. The combinations of variables presented here were judged to be the most informative set from among the total available from the computer analysis. These quantitative performance measures were the intended output of this experiment, and since no testable hypotheses were generated, no tests of statistical significance were made.

Experimental Data

Differences Between Subjects. No qualitative differences existed among the four subjects used in the main experiment. The four subjects in the pilot experiment behaved similarly. Inspection of the data indicated that for general performance information it would be best to average the data over subjects. Appendix H presents some of the subject-by-subject data.

Detection, D. Figures 3 through 6 show the effects of the independent variables on detection. The data on step direction showed no appreciable difference between positive and negative directions, and they are averaged together in all figures. The interaction of step size and rate shown in

Figure 3 was the most interesting relation found. Detection decreases fairly linearly with step size and increases fairly linearly with the logarithm of rate. The linear increase in detection with the logarithm of the rate probably reflects a combination of factors influencing the response. A small linear increase with rate would be caused by a constant reaction and movement time. For the usual tracking tasks this might be expected to be on the order of 0.5 seconds, yielding a lag of 2 flashes at 4 fps and 4 flashes at 8 fps. The more important factor is probably a change in the method of performing the task as rate changes. At rates of 0.5 and 1 fps the subjects reported counting the flashes at times, occasionally counting the number of flashes of the lower frequency and comparing this to an estimate of the total number of flashes. They did not use any procedure of this sort consistently, however; at least not one apparent to them. They all reported that the rate of 2 fps was the most difficult. Evidently the methods which they had used effectively at 0.5 and 1 fps became difficult if not impossible at 2 fps. Beginning at 4 fps it is clearly impossible to respond to separate flashes, and the series is probably perceived in groups of flashes. The task becomes similar to a continuous tracking task at these rates. Reese (1943) postulated a change in the mechanism of counting light flashes at about 4 flashes per second. Figure 3 shows an effect due to the presentation of the step changes in two separate series, the small and the large step problems. There is a region of overlap in step size between these two problems.

The smallest change in the large step problem is 0.16 and the largest in the small step problem is 0.24. In this overlapping region the small step problem yields detections from one to six flashes higher than the large step problem at all rates. The subjects were evidently modifying their tracking method according to the type of problem being presented. The large and small step problems were ordered randomly, of course, and the subjects had no prior indication that there were two problem types. This change is perhaps not surprising considering the difference between the two problem types. The average step changes were 0.15 in the small step problem and 0.40 in the large step problem. Step changes of about 0.50 and larger are readily noticed. The subjects appear to have made larger, more decisive response changes on the large step problem than on the small step problem. This change to a more responsive behavior is appropriate in quickly reducing the large errors following the larger step changes. "No detection" as a percentage of the total sub-problems is shown in Figure 4 as a function of step size and rate. About 90 percent of the "no detections" occurred with the combination of rate above 4 fps and step size below 0.15. Some of the "no detections" were probably caused by occasional lapses of attention. At 4 fps a 42-flash sub-problem is over in 11 seconds. In 25 minutes of continuous tracking a few 11-second lapses are certainly reasonable. The effect of probability on detection is shown in Figure 5. Perhaps the most interesting finding is that detection is not appreciably smaller for the extreme probabilities. In Chapter IV it will be seen that

responses generated by simple running averages produce detections which are similarly independent of probability. The variability among a set of detections of a particular step size and rate will depend on the probability, however. Detections of central probabilities, those near 0.5, will have more variability than detections of extreme probabilities, those nearer 0 or 1. The effect of the flash generation constraint on detection is shown in Figures 5 and 6. Constraint has no particular effect on average detection. As in the case of probability, however, constraint will affect the variability of a set of detections. The constrained problems will yield less variable detections.

Convergence, C. Figures 7 through 11 show the effects of the independent variables on convergence. The interesting effects are again with step size and rate. The effect of rate on convergence is similar to its effect on detection: a linear increase in convergence with the logarithm of rate. The effect of increasing step size is to increase convergence, although the increase is small. The number of flashes between detection and convergence increases as step size increases. This probably reflects the size of the response more than any other factor. The majority of sub-problems show a response successively approaching the new probability rather than one that overshoots. In the region of overlapping step size, convergence shows a difference between the small and the large step problems similar to that noted for detection.

Figure 3. Detection as a Function of Step Size and Flash Rate. The small and large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64. Detection is measured in flashes.

Figure 4. Percentage of Sub-Problems in which "No Detection" Occurs as a Function of Step Size and Flash Rate. The small and the large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64.

Figure 5. Detection as a Function of Probability, Constraint, and Small and Large Step Problems. Detection is measured in flashes.

Figure 6. Detection as a Function of Step Size, Sample Rate, and Constraint. The small and large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64. Detection is measured in flashes.

"No convergence," expressed as a percentage of sub-problems, is shown in Figure 8. "No convergence" remains relatively insensitive to changes in step size except for the largest step, 0.64, where it is zero. It is approximately 10 per cent for the large step problem and 12.5 per cent for the small step problem. "No convergence" rises sharply with increasing rate, reaching about 28 per cent at 8 fps. This is consistent with the data which show convergence equal to 35 flashes at 8 fps, about the length of the shortest sub-problem. "Initial convergence" has a high of 35 per cent for a step change of 0.06 and goes to zero for steps of 0.48 and 0.64. It increases slightly with rate, from about 8 to 12 per cent. The relationship between probability and convergence is shown in Figure 10. Convergence is relatively insensitive to probability, as was detection. The effects of constraint on the sample generation are shown in Figures 10 and 11. Again, as with detection, there is little if any effect.

Root Mean Square Error, RMSE. This measure was introduced to provide a single, overall indicator of the task performance. The most informative variation of RMSE is with rate, as shown in Figure 12. RMSE increases linearly with rate from 1 to 8 fps. It is interesting to evaluate this performance measure on a time basis, as it might be in a situation where it was desirable to perform the estimation in the shortest time possible. RMSE divided by fps yields values of error-seconds per flash which decrease as rate increases, going from 0.134 at 1 fps to 0.022 at 8 fps. This decrease might well continue with even higher rates as the task becomes the

Figure 7. Convergence as a Function of Step Size and Flash Rate. The small and large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64. Convergence is measured in flashes.

Figure 8. Percentage of Sub-Problems in which "No Convergence" Occurs as a Function of Step Size and Flash Rate. The small and large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64.

Figure 9. Percentage of Sub-Problems in which "Accidental Initial Convergence" Occurs as a Function of Step Size and Flash Rate.

Figure 10. Convergence as a Function of Probability, Constraint, and Small and Large Step Problems. Convergence is measured in flashes.

Figure 11. Convergence as a Function of Step Size, Sample Rate, and Constraint. The small and large step problems are plotted separately, the small extending from 0.06 to 0.24 and the large from 0.16 to 0.64. Convergence is measured in flashes.

tracking of the relative brightness of the lights. Either the limitations on the judgment of relative brightness or simple reaction time would finally limit the performance. This performance index must be viewed with caution. The error itself has a meaningful upper bound at the level where the lever is left stationary or is moved in some manner independent of the flashes. As this error level is approached, further increases in rate would continue to decrease the index of error-seconds per flash, but the index would have little meaning.

The following three measures were made from the point of convergence to the end of the sub-problem, or from the beginning of the sub-problem if "initial convergence" occurred. They are therefore measures made on an average of 85 to 90 per cent of all of the sub-problems, and on about 95 per cent of those sub-problems with step changes above 0.15 at rates below 4 fps.

Mean Error After Convergence, ME,C. The mean error is shown as a function of probability in Figure 13. The average estimate is essentially unbiased at all probabilities. The largest error is smaller than the least scale division on the subject's response indicator (0.02). Mean error was not significantly affected by rate, constraint, step size, or subjects. This finding contradicts a body of conjecture based in part on the results of static estimation and choice experiments. Neither the overestimation of high nor the underestimation of low probabilities appears. The two distinctive features of this task were the dynamic estimation and the tracking lever as a response mechanism. The excellence in static estimation was undoubtedly due at least in part to these two features.

Figure 12. Root Mean Square Error Over the Whole Sub-Problem as a Function of Flash Rate.

Figure 13. Mean Error as a Function of Probability. This measure is made from convergence to the end of the sub-problem.

Root Mean Square Error After Convergence, RMSE,C. RMSE,C is shown in Figures 14 through 16. The only independent variable not affecting RMSE,C is step size. This indicates that the period after the point of convergence is not affected by step size. The constraint on the generation of the flashes reduced RMSE,C by about 0.014 and does not appear to interact with either step size or rate. RMSE,C decreases with increasing rate from 0.5 to 2 fps and thereafter remains relatively constant.³ Considered together with the data indicating smaller detection values at the lower rates, it is highly probable that the number of decisions concerning changes in the probability, on a per-flash basis, is highest at the lowest rate. Thus the additional decision time available at the lower rates permitted smaller detection values but resulted in larger RMSE,C when the probability was constant. The effect of probability on RMSE,C is shown in Figure 16. The "random" problems are consistently higher than the "constrained" problems at all probabilities. The N = 17.3 line is the RMS error, or standard deviation, of a 17.3-flash average. The subject's response is about this good or better at all probabilities.

False Alarm Rate, FAR. The number of false alarms per flash is shown in Figures 17 through 19. Its behavior is similar to that of RMSE,C. It is similarly insensitive to the size of the step change. Increasing rate causes a decrease in FAR up to 4 fps, with an apparent leveling off above 4 fps. These data lend additional support to the hypothesis concerning an increase in

³Inter-subject variation was high for 4 and 8 fps. See Figure 27, Appendix H.

Figure 14. Root Mean Square Error as a Function of Step Size and Constraint. This measure is made from convergence to the end of the sub-problem.

Figure 15. Root Mean Square Error as a Function of Flash Rate and Constraint. This measure is made from convergence to the end of the sub-problem.

Figure 16. Root Mean Square Error as a Function of Probability and Constraint. This measure is made from convergence to the end of the sub-problem. The standard deviation for a 17.3-sample mean is shown for the random problem.

number of decisions per flash at the lower rates. False alarms can be considered as indicating decisive changes in the estimate. FAR remains constant over the entire probability range with the exception of the extreme values, 0.02 and 0.98. These probabilities were usually estimated as 0 or 1, with an excursion away from 0 or 1 only following an occurrence of the infrequent flash. Since FAR did not change with probability, it appears that the rate of "decisive" movements (greater than 0.05 from the probability) remained constant for all probabilities. The reduction in RMSE,C as the probability tends to extreme values therefore indicates that the time spent at these "erroneous" estimates decreased with extreme probabilities. This hypothesis is supported by observations made during the tracking sessions. The lever movements appeared larger although less frequent at the more extreme probabilities. The increase in magnitude evidently compensated for the decrease in frequency to maintain the FAR at a constant level. The constrained series produced a slightly higher false alarm rate than the random series. The constrained series has a greater number of runs of right or left flashes and would be expected to yield a higher decision rate. All of the FAR data will be dependent on the false alarm criterion level. A larger criterion could well reverse the constraint finding, for example, since the random series probably produces larger decision movements than the constrained series.

Summary of Results

The response to a step change in probability can be described in three regions: the period before any response to the change, before the

Figure 17. False Alarm Rate in False Alarms Per Flash as a Function of Step Size and Constraint. This measure is made from convergence to the end of the sub-problem.

Figure 18. False Alarm Rate in False Alarms Per Flash as a Function of Flash Rate and Constraint. This measure is made from convergence to the end of the sub-problem.

Figure 19. False Alarm Rate in False Alarms Per Flash as a Function of Probability and Constraint. The measure is made from convergence to the end of the sub-problem.

point of detection; the period before the convergence on a new estimate; and the period from the convergence point to the end of the sub-problem. These regions were defined mathematically as functions of probability, response form, and somewhat arbitrary constants in order to achieve a complete description of the response. Detection increases with increasing rate and decreases with increasing step size. The range was from 4 to 24 flashes for a rate range of 0.5 to 8 flashes per second and a step size range of 0.06 to 0.64. Detection was approximately nine flashes for a step of 0.32 at 1 fps. Convergence increases with both rate and step size. The range was from 11 to 35 flashes for the same step and rate ranges stated above. Convergence was approximately 15 flashes for a step of 0.32 at 1 fps. Both detection and convergence were independent of the constraint imposed on the generation of the flash series. Both were independent of probability. After the point of convergence the average estimate was unbiased at all probabilities. This unbiased estimate had an RMS error, or standard deviation, of about 0.08. The overall task performance was measured by the RMS error over the whole sub-problem. RMSE increased linearly with rate from 0.135 at 1 fps to 0.180 at 8 fps.

IV. MATHEMATICAL MODELS

Three mathematical models will be derived in this chapter. Two of these will be called normative, in that the purpose of their derivation is to provide standards against which to compare the data presented in Chapter III. The third is a descriptive model designed to simulate the human performance. The normative models are somewhat arbitrary in their form. They are optimized within this form, however, to provide a best RMS error fit to the various inputs used in the probability tracking task. One form selected is a constant weighted average over a finite number of past flashes. The simplicity of this model makes it ideal for intuitive comparisons with the subjects' performance. The number of flashes in the running average is selected to give the best fit. The other model has geometrically decreasing weights for flashes extending into the past. This model is more appealing from the standpoint of response to the step inputs. It also corresponds to assumptions often made concerning the human immediate memory function. The best fit is found by selecting the appropriate geometric ratio. More sophisticated linear, and certainly some non-linear, models would undoubtedly perform this task with a lower RMS error than the two models selected. The value of more complex models for providing simple standards is marginal, however. The descriptive model was derived from thoughts on how the task was performed by the subjects. Its form arises from the qualitative aspects of the data and from observations of the subjects' behavior.

It has four parameters which are adjusted to yield a minimum RMS error fit to a subject's response.

The normative models to be considered have the form

r(n) = Σ_{i=1}^{N} w_i s_{n−i+1}     (4.1)

where r(n) is the model's response, or output, at the point n in the sample series, and w_i is the weight attached to the sample s_{n−i+1}. The response at n is thus the weighted average of the sample at n and its N−1 immediate predecessors. This is an averaging or smoothing model intuitively appropriate to this task. It is limited to samples at and prior to the response point, considering only a finite number of these, and is therefore physically realizable. w_i is not a function of n and could be described as sample-invariant. The random variables s_n are drawn from an infinite population and are independent. They have values 0 or 1, corresponding respectively to left and right on the subject's display. The probability of a 1 is P, and the probability of a 0 is therefore 1−P. In the situation where the N samples are all generated from a static distribution described by the probability P, it is desirable that the estimate be unbiased, or that

r̄(n) = P     (4.2)

where r̄(n) is the expected value of r(n), an ensemble average. This simply requires that

Σ_{i=1}^{N} w_i = 1     (4.3)

The responses of the two model forms will be derived for a sub-problem beginning with n = 1 as the first sample of the new probability and ending with n = M. The previous probability will be P_1 and the sub-problem probability P_2. The step change is therefore P_2 − P_1. For N < n ≤ M the response will be called steady-state, since the samples are all from a static distribution. For 1 ≤ n ≤ N the response will be called transient.

A Model With Geometric Weighting

The first model to be considered is one having the weighting function

w_i = a r^{i−1}     (4.4)

where a and r are constants and 0 < r < 1. This function assigns geometrically decreasing weights to the samples. Limiting r to the range 0 to 1, exclusive, confines the function to one assigning monotonically decreasing weights to samples receding from n. The value of N, the number of samples included in one computation, will be selected as a number large enough to assure the relative unimportance of the weight at i = N, a r^{N−1}, compared to the weight at i = 1, a. This merely implies that the function's memory extends smoothly to the point of essentially complete "forgetting." The exact value of N in any particular model of this form is relatively unimportant to the considerations that follow. It will simply be assumed that

r^N ≪ 1     (4.5)

and all quantities of this magnitude will be dropped.

Of primary interest is the selection of r to produce an optimum model in the least mean squared error sense. This particular measure of performance is the same one used as a measure of the subject's performance and used in the payoff scheme. The constant a is selected to satisfy Equation (4.3), which becomes

Σ_{i=1}^{N} a r^{i−1} = a + ar + ar² + … + a r^{N−1} = a (1 − r^N)/(1 − r) = 1     (4.6)

or

a = 1 − r,  r^N ≪ 1     (4.7)

We will be concerned with two quantities: r̄(n), the expected value of r(n) at the point n, and σ²(n), the variance of r(n) at the point n. These are ensemble averages. For the expected value of r we have

r̄(n) = E[ Σ_{i=1}^{N} w_i s_{n−i+1} ]     (4.8)

and, since the w_i are constant over the ensemble,

r̄(n) = Σ_{i=1}^{N} w_i E(s_{n−i+1})     (4.9)

For the step function input, r̄(n) will depend on P_1 and P_2 during the transient phase and on P_2 alone during the steady-state phase. For 1 ≤ n ≤ N we have

r̄(n) = [(1−r)(1−r^n)/(1−r)] P_2 + [1 − (1−r)(1−r^n)/(1−r)] P_1
     = (1−r^n) P_2 + r^n P_1     (4.10)
     = P_2 − r^n (P_2 − P_1)

For N < n ≤ M,

r̄(n) = P_2     (4.11)

For the variance we have the variance of the sum of the w_i s_{n−i+1} terms. Since the s_n are independent, we have the sum of the variances of the individual terms,

σ²(n) = a² σ_s²(n) + a² r² σ_s²(n−1) + … + a² r^{2(N−1)} σ_s²(n−N+1)     (4.12)

where σ_s²(n) is the variance of the sample s_n. Since σ_s²(n) = P(1−P), where P is the probability with which s_n was generated, σ_s²(n) will be a constant for constant P, and in particular it will have two values during the transient phase, σ_1² = P_1(1 − P_1) and σ_2² = P_2(1 − P_2). For the transient phase we then have

σ²(n) = [(1−r)²(1−r^{2n})/(1−r²)] σ_2² + [(1−r)²(1−r^{2N})/(1−r²) − (1−r)²(1−r^{2n})/(1−r²)] σ_1²
      = [(1−r)/(1+r)] [σ_2² − r^{2n}(σ_2² − σ_1²)],  r^N ≪ 1     (4.13)

and for the steady-state phase

σ²(n) = [(1−r)/(1+r)] σ_2²     (4.14)

We can now proceed with the formulation of the model's performance in terms of its mean squared error. During the transient phase the error can be written as

e(n) = r(n) − P_2 = [r̄(n) − P_2] + [r(n) − r̄(n)]     (4.15)
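The transient mean (4.10) and the steady-state variance (4.14) can be checked by simulation. The sketch below uses the recursion est = r·est + (1−r)·s, which is equivalent to the geometric average when r^N ≪ 1, and initializes the estimate at P_1 as a stand-in for a long prior run at the old probability (an assumption that ignores the small σ_1² variance carried over from the prior samples).

```python
import random

# Monte-Carlo check of the geometric-weighting model.  The finite sum
# r(n) = sum_i (1-r) r^(i-1) s_{n-i+1} is simulated via the equivalent
# recursion est = rho*est + (1-rho)*s, valid when rho^N << 1.
def simulate(rho, p1, p2, n_samples, trials=10000, seed=7):
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        est = p1  # stand-in for a long prior run at probability p1
        for _ in range(n_samples):
            s = 1.0 if rng.random() < p2 else 0.0
            est = rho * est + (1.0 - rho) * s
        finals.append(est)
    mean = sum(finals) / trials
    var = sum((x - mean) ** 2 for x in finals) / trials
    return mean, var

# Transient mean after n samples of the new probability, Eq. (4.10):
# rbar(n) = P2 - rho**n * (P2 - P1).
m10, _ = simulate(rho=0.9, p1=0.2, p2=0.8, n_samples=10)

# Steady-state variance, Eq. (4.14): (1-rho)/(1+rho) * P2*(1-P2).
m_ss, v_ss = simulate(rho=0.9, p1=0.3, p2=0.3, n_samples=150)
```

With rho = 0.9 the simulated values agree, to within sampling error, with 0.8 − 0.9¹⁰(0.6) ≈ 0.59 for the transient mean and (0.1/1.9)(0.3)(0.7) ≈ 0.011 for the steady-state variance.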

and the squared error is then

e²(n) = [r̄(n)−P_2]² + [r(n)−r̄(n)]² + 2[r̄(n)−P_2][r(n)−r̄(n)]     (4.16)

The expected value of the squared error is then

ē²(n) = [r̄(n) − P_2]² + σ²(n)     (4.17)

since [r̄(n) − P_2] is a constant for a particular n and E[r(n) − r̄(n)] is 0. The average value of this mean squared error over the transient phase of the sub-problem will be ⟨e²⟩_T, representing the average over the ensemble and also over samples in the sub-problem. We then have

⟨e²⟩_T = (1/N) Σ_{n=1}^{N} ē²(n) = (1/N) Σ_{n=1}^{N} [r̄(n)−P_2]² + (1/N) Σ_{n=1}^{N} σ²(n)     (4.18)

Using Equation (4.10), the first term on the right side of Equation (4.18) becomes

(1/N) Σ_{n=1}^{N} [r̄(n) − P_2]² = (1/N) Σ_{n=1}^{N} [r^n(P_1 − P_2)]²
    = [(P_1 − P_2)²/N] r²/(1 − r²),  r^N ≪ 1     (4.19)

Using Equation (4.13), the second term on the right side of Equation (4.18) becomes

(1/N) Σ_{n=1}^{N} σ²(n) = (1/N) Σ_{n=1}^{N} [(1−r)/(1+r)] [σ_2² − r^{2n}(σ_2² − σ_1²)]
    = [(1−r)/(N(1+r))] [N σ_2² − (r²(1−r^{2N})/(1−r²))(σ_2² − σ_1²)]     (4.20)
    = [(1−r)/(1+r)] [σ_2² − (r²/(N(1−r²)))(σ_2² − σ_1²)],  r^N ≪ 1

The average mean squared error during the transient phase, ⟨e²⟩_T, is then

⟨e²⟩_T = [(P_1−P_2)²/N] r²/(1−r²) + [(1−r)/(1+r)] [σ_2² − (r²/(N(1−r²)))(σ_2² − σ_1²)]     (4.21)

The average mean squared error during the steady-state phase, ⟨e²⟩_SS, is simply the variance σ²(n), as given by Equation (4.14):

⟨e²⟩_SS = [(1−r)/(1+r)] σ_2²     (4.22)

The average mean squared error over the whole sub-problem is then

⟨e²⟩_P = (N/M) ⟨e²⟩_T + [(M−N)/M] ⟨e²⟩_SS
       = [(P_1−P_2)²/M] r²/(1−r²) + [(1−r)/(1+r)] [σ_2² − (r²/(M(1−r²)))(σ_2² − σ_1²)]     (4.23)

We are interested in the performance of this model over the same types of problems given to the subjects. The average mean squared error over a problem is given by

⟨e²⟩_P = (1/T) Σ_{i=1}^{S} M_i { [(P_{i−1} − P_i)²/M_i] r²/(1−r²) + [(1−r)/(1+r)] [σ_i² − (r²/(M_i(1−r²)))(σ_i² − σ_{i−1}²)] }     (4.24)

where M_i is the length of sub-problem i, T is the total problem length in samples, and S is the total number of sub-problems.

This expression may be simplified by making the following assumptions, based on the methods used for generating the problems (Appendix B). The M_i were selected randomly, without replacement, from a set of equally frequent values and assigned to the sub-problems. The sum of (σ_i² − σ_{i−1}²)/M_i therefore approaches zero for long series of sub-problems. Similarly, the term (1/T) Σ M_i σ_i² will approach simply (1/S) Σ σ_i². Equation (4.24) can therefore be written as

⟨e²⟩_P = [r²/(1−r²)] (1/T) Σ_{i=1}^{S} (P_{i−1} − P_i)² + [(1−r)/(1+r)] (1/S) Σ_{i=1}^{S} σ_i²     (4.25)

The large step problem had values of |P_1 − P_2| of 0.16, 0.32, 0.48, and 0.64, occurring in 12, 10, 8, and 6 sub-problems respectively, yielding

(1/T) Σ_{i=1}^{S} (P_{i−1} − P_i)² = 0.00251

σ_i² had values of 0.250, 0.224, 0.148, and 0.020, occurring in 6, 12, 10, and 8 sub-problems respectively, yielding

(1/S) Σ_{i=1}^{S} σ_i² = 0.162

T was 2241 samples and S was 36 sub-problems. The average mean squared error over the large step problem is then

⟨e²⟩_LSP = 0.00251 r²/(1−r²) + 0.162 (1−r)/(1+r)     (4.26)

The small step problem had values of |P_1 − P_2| of 0.06, 0.12, 0.18, and 0.24, occurring in 12, 10, 16, and 12 sub-problems respectively, yielding

(1/T) Σ_{i=1}^{S} (P_{i−1} − P_i)² = 0.000465

σᵢ² had values of 0.250, 0.246, 0.217, 0.192, 0.120, and 0.085, occurring in 6, 10, 12, 10, 6, and 6 sub-problems respectively, yielding

    (1/S) Σ_{i=1}^{S} σᵢ² = 0.194

T was 3000 samples and S was 50 sub-problems. The average mean squared error over the small step problem is then

    ⟨e²⟩_SSP = 0.000465 r²/(1 − r²) + 0.194 (1 − r)/(1 + r)   (4.27)

It will be of interest for comparative purposes to evaluate this model for the case in which only one value of r is used for both the large and small step problems. This model will be called nondiscriminating in Chapter V. In this case the sums in Equation (4.25) are over both problem types, with T equal to 5241 samples and S equal to 86 sub-problems. We have

    (1/T) Σ_{i=1}^{S} (Pᵢ₋₁ − Pᵢ)² = 0.00134

and

    (1/S) Σ_{i=1}^{S} σᵢ² = 0.181

The average mean squared error over the large plus the small step problems is then

    ⟨e²⟩_L+S = 0.00134 r²/(1 − r²) + 0.181 (1 − r)/(1 + r)   (4.28)

We are interested in the selection of an optimum value of r for these three problem types. Equation (4.25) can be written as

    ⟨e²⟩_P = k r²/(1 − r²) + v (1 − r)/(1 + r)   (4.29)

where k and v are the constants for the specific problem type. The minimum of this function over r can then be found by setting

    d⟨e²⟩_P/dr = [−2vr² + (2k + 4v)r − 2v]/(1 − r²)² = 0   (4.30)

which yields the two roots

    r₁,₂ = (k + 2v)/(2v) ± {[(k + 2v)/(2v)]² − 1}^(1/2)   (4.31)

The minus sign yields a value of r between 0 and 1 and is also the minimum. For the large step problem Equation (4.31) gives an optimum r = 0.883. Using this value of r in Equation (4.26) we have a corresponding minimum mean squared error of 0.0190. For the small step problem Equation (4.31) gives an optimum r = 0.953. Using this value of r in Equation (4.27) we have a corresponding minimum mean squared error of 0.00931. For the large plus small step problems Equation (4.31) gives an optimum r = 0.915. Using this value of r in Equation (4.28) we have a corresponding minimum mean squared error of 0.0147.

A Model With Constant Weighting

The second model to be considered gives a constant weight to each of N samples; it is a simple averaging model. The derivation of the response and errors for this model parallels that for the geometric model, and some of the detailed explanations will be omitted. The weighting function is

    wᵢ = 1/N   (4.32)

where N is the number of samples in the average; the weight is 1/N to satisfy Equation (4.3). In this case the transient response will be

    r̄(n) = (n/N) P₂ + [(N − n)/N] P₁ = P₁ + (P₂ − P₁)(n/N)   (4.33)

and the steady-state response

    r̄(n) = P₂   (4.34)

The variance of r(n) during the transient phase will be the variance of the sum of N terms each with weight 1/N,

    σ_r²(n) = [(N − n)/N²] σ₁² + (n/N²) σ₂² = (1/N)[σ₁² + (n/N)(σ₂² − σ₁²)]   (4.35)

The variance of r(n) during the steady-state phase will simply be

    σ_r²(n) = σ₂²/N   (4.36)

Following the same procedures and arguments developed in the derivation of Equations (4.15) through (4.19) we have

    (1/N) Σ_{n=1}^{N} [r̄(n) − P₂]² = (1/N) Σ_{n=1}^{N} [P₁ + (P₂ − P₁)(n/N) − P₂]²
                                   = [(P₂ − P₁)²/N] Σ_{n=1}^{N} (1 − n/N)²
                                   = (P₂ − P₁)² [1/3 − 1/(2N) + 1/(6N²)]   (4.37)

and, analogous to Equation (4.20), we have

    (1/N) Σ_{n=1}^{N} σ_r²(n) = (1/N²) Σ_{n=1}^{N} [σ₁² + (n/N)(σ₂² − σ₁²)]
                              = σ₁²/N + [(N + 1)/(2N²)](σ₂² − σ₁²)   (4.38)

The average mean squared error during the transient phase is then

    ⟨e²⟩_T = (P₂ − P₁)² [1/3 − 1/(2N) + 1/(6N²)] + σ₁²/N + [(N + 1)/(2N²)](σ₂² − σ₁²)   (4.39)

The average mean squared error during the steady-state phase is the variance given by Equation (4.36),

    ⟨e²⟩_SS = σ₂²/N   (4.40)

The average mean squared error over the whole sub-problem is then

    ⟨e²⟩_SP = (N/M)⟨e²⟩_T + [(M − N)/M]⟨e²⟩_SS
            = [(P₂ − P₁)²/M][N/3 − 1/2 + 1/(6N)] − [(N − 1)/(2MN)](σ₂² − σ₁²) + σ₂²/N   (4.41)

As with the geometric weighted model we are interested in the performance of this model over the problems given to the subjects. The average mean squared error over a problem is given by

    ⟨e²⟩_P = Σ_{i=1}^{S} (Mᵢ/T){[(Pᵢ₋₁ − Pᵢ)²/Mᵢ][N/3 − 1/2 + 1/(6N)] − [(N − 1)/(2MᵢN)](σᵢ² − σᵢ₋₁²) + σᵢ²/N}   (4.42)

With the same arguments leading to Equation (4.25) we have

    ⟨e²⟩_P = [N/3 − 1/2 + 1/(6N)](1/T) Σ_{i=1}^{S} (Pᵢ₋₁ − Pᵢ)² + (1/N)(1/S) Σ_{i=1}^{S} σᵢ²   (4.43)

These sums are the same as those calculated for the geometric weighted model. For the large step problem we have

    ⟨e²⟩_LSP = 0.00251 [N/3 − 1/2 + 1/(6N)] + 0.162/N   (4.44)

For the small step problem

    ⟨e²⟩_SSP = 0.000465 [N/3 − 1/2 + 1/(6N)] + 0.194/N   (4.45)

And for the large plus small step problems

    ⟨e²⟩_L+S = 0.00134 [N/3 − 1/2 + 1/(6N)] + 0.181/N   (4.46)

The minima can again be selected by letting k and v be the constants for the particular problem type,

    ⟨e²⟩_P = k [N/3 − 1/2 + 1/(6N)] + v/N   (4.47)

and solving

    d⟨e²⟩_P/dN = k/3 − k/(6N²) − v/N² = 0   (4.48)

yielding

    N = [3v/k + 1/2]^(1/2)   (4.49)

For the large step problem Equation (4.49) gives an optimum N = 14.1. Using this value of N in Equation (4.44) we have a corresponding minimum mean squared error of 0.0220. For the small step problem Equation (4.49) gives an optimum N = 35.3. Using this value of N in Equation (4.45) we have a corresponding minimum mean squared error of 0.0107.
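The two optimum conditions, Equations (4.31) and (4.49), are easy to check numerically. A minimal sketch, with hypothetical function names (small rounding differences from the reported optima are possible):

```python
import math

def optimal_r(k, v):
    """Geometric model: the root of Eq. (4.31) lying between 0 and 1."""
    a = (k + 2 * v) / (2 * v)
    return a - math.sqrt(a * a - 1)

def optimal_N(k, v):
    """Constant model: Eq. (4.49), N = (3v/k + 1/2)^(1/2)."""
    return math.sqrt(3 * v / k + 0.5)

# Large step problem, k = 0.00251, v = 0.162:
print(round(optimal_r(0.00251, 0.162), 3))   # → 0.883
# Large plus small step problems, k = 0.00134, v = 0.181:
print(round(optimal_N(0.00134, 0.181), 1))   # → 20.1
```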

For the large plus small step problems Equation (4.49) gives an optimum N = 20.1. Using this value of N in Equation (4.46) we have a corresponding minimum mean squared error of 0.0172.

A Descriptive Model

Upon inspection of the subjects' response data it is evident that the estimation task was not performed in the smooth manner of the two normative models. The responses were characterized by rapid adjustments separated by periods of little or no movement. This evidence, together with thoughts on how this task might be performed, led to the postulation of the following model as an attempt to describe the human performance.

This model operates in the following manner: The subject maintains a short, running average of the previous k₃ flashes. This average is of exactly the same type as the second normative model discussed above. At each flash this average is compared with the existing setting of the response lever and the difference noted. If this distance measure is greater than a prescribed criterion level, the response is changed to a new value at some point intermediate between the old response and the running average. If the difference is less than the criterion level, the response remains unchanged.

This model has several features making it attractive from a descriptive standpoint. It uses the lever as a memory device, moving it only a fraction of the distance to the new average and thus preserving some of the information in the previous setting. This memory function permits a smaller number of flashes in the running average than would

otherwise be required to produce the levels of mean squared error measured from the subjects' responses. The criterion level corresponds to the subjects' smallest perceptible difference between the running average and the lever position. It permits the response to remain stationary through periods of small deviation of the running average from the response.

This model's operation can be thought of as a form of hypothesis testing. At each flash it tests the hypothesis that the running average is from a population described by the response lever setting, using the criterion level as a form of significance measure. The subject's performance is thus viewed as a succession of decision-making situations. This framework is appropriate to a task that includes higher mental processes than the usual manual tracking task.

This model can be described mathematically as follows:

    u(n) = (1/k₃) Σ_{i=1}^{k₃} s_{n−i+1}   (4.50)

where u(n) is the running average of the k₃ most recent flashes sₙ. With r(n) as the current lever setting, if

    |r(n) − u(n)| < k₁   (4.51)

where k₁ is the criterion band, then

    r(n+1) = r(n)   (4.52)

If, however,

    |r(n) − u(n)| ≥ k₁   (4.53)

then

    r(n+1) = r(n) + k₂[u(n) − r(n)]   (4.54)

where k₂ is the fractional lever adjustment. A fourth parameter, k₄, was also considered, representing a time (flash) shift between the subject's and the model's responses: the subject's response at n was compared to the model's at n − k₄.

The four parameters are constrained to the following ranges:

    0 < k₁ < 1   (4.55)

where 0 yields adjustment decisions at each sample and 1 would yield no adjustment decisions;

    0 < k₂ < 1   (4.56)

where 0 yields no response changes and 1 represents simply following the running average whenever an adjustment decision is made; and

    1 ≤ k₃ ≤ K, k₃ an integer   (4.57)

where K is some reasonable maximum number of flashes that the subject could be expected to assimilate in one averaging calculation. No definite values for K are known for this task. It is certainly reasonable to assume that the flashes are not simply remembered as a succession of binary symbols but are encoded into a larger symbol set, perhaps one depending on the lengths of runs of one of the binary symbols. Considering the nature of the task and its difficulty, it would seem unreasonable that more than 20 flashes could be used in an averaging calculation; a value closer to 10 would be more appropriate.
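The update rule of Equations (4.50) through (4.54) can be simulated directly. The sketch below is a hypothetical rendering, not the original program: the boundary case |r − u| = k₁ is counted as an adjustment, the first few flashes are averaged over however many are available, and the time shift k₄ is left to the comparison stage.

```python
from collections import deque

def descriptive_response(samples, k1, k2, k3, r0=0.5):
    """Descriptive model: running average u(n) of the last k3 flashes,
    Eq. (4.50); the lever moves a fraction k2 toward u(n) only when
    the discrepancy reaches the criterion k1, Eqs. (4.51)-(4.54)."""
    window = deque(maxlen=k3)        # memory of the last k3 binary flashes
    r, out = r0, []
    for s in samples:
        window.append(s)
        u = sum(window) / len(window)    # running average u(n)
        if abs(r - u) >= k1:             # adjustment decision
            r += k2 * (u - r)            # fractional lever adjustment
        out.append(r)
    return out

# One adjustment, then the lever sits inside the criterion band:
print(descriptive_response([1, 1, 1, 1], k1=0.3, k2=0.5, k3=4))
# → [0.75, 0.75, 0.75, 0.75]
```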

There are no particular constraints on k₄, except that k₄ < 0 implies subject prediction with respect to the model.

Having thus chosen the model form, the task is now to select parameter sets (k₁, k₂, k₃, k₄) which will make the model best describe the human performance. The criterion used for this selection was the minimization of the mean squared error between the subject's and the model's responses over particular problems. This measure was selected as providing the best single measure of performance, as was discussed in Chapter II. The selection of the minimum mean squared error for the criterion assures a fairly close fit to the transient portion of the response, where the error is large, at the possible expense of fit to the steady-state portion.

The actual minimization process was carried out as follows. The model was programmed on an IBM 709 computer. The computer was then fed four of the input problems used in the experiment plus the responses of one of the four subjects to these problems. At each sample point the squared difference between the subject's and the model's responses was calculated and accumulated. These values were simply printed out at the end of each particular problem-parameter set combination. The large number of possible parameter sets and the possibility of numerous minima precluded an automatic search technique for the minima. Several computer runs were made in which previously selected parameter ranges were either extended or filled in according to the results of the previous run. The total variation of the parameters was through the following ranges.

    0.02 ≤ k₁ ≤ 0.20   (6 values)
    0.10 ≤ k₂ ≤ 0.90   (7 values)
       1 ≤ k₃ ≤ 28     (12 values)
      −2 ≤ k₄ ≤ 4      (6 values)

The four problems investigated were the large and small step problems, random constraint, at 1 and 4 fps. The subject was S-2. Several parameter sets with approximately equal minimum error measures were found for each problem type. In each case these minima represented either a valley in the error function or fairly distinct minima separated by regions of higher error. Table I shows the various parameter sets and their corresponding minimum errors, ⟨e²_MS⟩_P. In each group of parameter sets various trade-offs can be seen among the parameters yielding the approximately equal error measures.

The following method was devised as a means for selecting the best descriptive model from among these parameter sets with approximately equal ⟨e²_MS⟩_P. The subject's error, e_S(n), can be written as

    e_S(n) = e_M(n) + e_MS(n)   (4.58)

where e_M(n) is the model's error and e_MS(n) is the error between the subject and the model. Squaring this error we have

    e_S²(n) = e_M²(n) + e_MS²(n) + 2e_M(n)e_MS(n)   (4.59)

The average value of this squared error over a particular problem is then

    ⟨e²_S⟩_P = ⟨e²_M⟩_P + ⟨e²_MS⟩_P + ⟨2e_M e_MS⟩_P   (4.60)

The minimization process used to select the parameter sets was concerned with finding minimum values of ⟨e²_MS⟩_P. The computer also calculated values for ⟨e²_M⟩_P, the model's error. ⟨e²_S⟩_P was, of course, one of the measures made on the subject's performance. The term ⟨2e_M e_MS⟩_P can therefore be calculated from Equation (4.60).

These error terms can be interpreted in the following manner. Consider the subject's error at any point in the sample sequence to be composed of two components: one dependent in some manner on the actual input samples, and the remainder on other phenomena not related to the input. The first part of this error might be termed coherent and the remainder noise.

Consider now a descriptive model and its relationship to these two error measures. If it performs the task in exactly the same manner as the subjects, it will have an error, ⟨e²_M⟩_P, which is equal to the subject's coherent, or sample-dependent, error, and the error between this model and the subject, ⟨e²_MS⟩_P, would be equal to the subject's noise, or sample-independent, error. The subject's noise can be considered to be random fluctuations in the response about the coherent value. Since ⟨e_M⟩_P approaches zero over a large set of sub-problems, ⟨e_M e_MS⟩_P will also approach zero.

If, on the other hand, the model does not represent the entire coherent part of the subject's response, that is, if it is not a complete descriptor of the subject's coherent behavior, then e_MS(n) will be partially dependent on the sample series and therefore correlated with e_M(n). In this case the term ⟨2e_M e_MS⟩_P will not approach zero. This correlation can therefore be used as an additional selection device. It

can be written in the normalized form

    ρ = ⟨e_M e_MS⟩_P / (⟨e²_M⟩_P ⟨e²_MS⟩_P)^(1/2)   (4.61)

Table I shows the values of ⟨e²_M⟩_P, ⟨2e_M e_MS⟩_P, and ρ. The normalized correlation ρ provides a measure giving good discrimination among the parameter sets for the large step problems. For the large step problem at 1 fps, parameter set 2 has a value of ρ which is essentially zero. At 4 fps, parameter set 4 has a very low value of ρ. Neither of the small step problems, however, produces a correlation which discriminates among the parameter sets or which is as small as that found for the large step problem. It would appear on the basis of this evidence that the postulated descriptive model better represents the subjects' performance on the large than on the small step problems.

Zero correlation, as defined by Equation (4.61), does not necessarily imply a complete lack of dependence of e_MS(n) on the sample series. Two hypotheses could be used to explain the fairly large ⟨e²_MS⟩_P which remained even for ρ ≈ 0. One would simply be that this level of noise did exist in the subjects' performance. The other would be that this "noise" component had at least some portion which was related to the sample series but which was uncorrelated with e_M(n). Perhaps one reason for a fairly large noise component would be variations in the subjects' method of performing the task during the problem run.
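The selection statistic of Equation (4.61) amounts to a correlation between the model's error series and the subject-model residual. A hypothetical sketch:

```python
import math

def noise_correlation(subject_err, model_err):
    """Form e_MS(n) = e_S(n) - e_M(n), Eq. (4.58), and return the
    normalized correlation rho of Eq. (4.61)."""
    e_ms = [es - em for es, em in zip(subject_err, model_err)]
    msq = lambda xs: sum(x * x for x in xs) / len(xs)
    cross = sum(em * ems for em, ems in zip(model_err, e_ms)) / len(e_ms)
    return cross / math.sqrt(msq(model_err) * msq(e_ms))

# A residual orthogonal to the model's error gives rho near zero,
# the signature of a complete descriptor of the coherent behavior:
print(round(noise_correlation([1.1, -0.9, 1.1, -0.9],
                              [1.0, -1.0, 1.0, -1.0]), 6))  # → 0.0
```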

TABLE I

PARAMETER SETS FOR THE DESCRIPTIVE MODEL YIELDING MINIMUM VALUES OF ⟨e²_MS⟩_P

       Criterion  Fractional    Memory   Lag
          k₁      Adjustment      k₃      k₄    ⟨e²_MS⟩_P   ⟨e²_M⟩_P   ⟨2e_M e_MS⟩_P     ρ
                      k₂
Large Step, 1 fps
  1.     0.05        0.20          8       0      0.0104      .0185        .0030        .109
  2.     0.12        0.60         12       0      0.0121      .0198        .0001        .003

Small Step, 1 fps
  1.     0.05        0.10          6       0      0.0076      .0130       −.0066       −.332
  2.     0.05        0.20         12       0      0.0079      .0140       −.0079       −.375
  3.     0.08        0.20         10       0      0.0083      .0130       −.0073       −.351
  4.     0.12        0.20          8       0      0.0086      .0130       −.0076       −.359
  5.     0.15        0.40         10       0      0.0092      .0120       −.0072       −.343

Large Step, 4 fps
  1.     0.05        0.10          8       2      0.0098      .0240        .0062        .209
  2.     0.05        0.10         12       0      0.0098      .0210        .0092        .321
  3.     0.10        0.10          8       2      0.0106      .0240        .0054        .170
  4.     0.15        0.10          8       2      0.0117      .0280        .0003        .008
  5.     0.15        0.30         12       2      0.0121      .0300       −.0021       −.060

Small Step, 4 fps
  1.     0.05        0.10         16       2      0.0102      .0140       −.0102       −.426
  2.     0.05        0.10         24       0      0.0102      .0160       −.0122       −.480
  3.     0.10        0.10         16       1      0.0102      .0140       −.0102       −.426
  4.     0.10        0.30         24       0      0.0106      .0140       −.0106       −.418
  5.     0.10        0.50         24       1      0.0106      .0150       −.0116       −.460

A Normative Variation

The descriptive model discussed above was constructed as an approximation to the human performance on this task. It is interesting, however, to see how well this model form can do if the parameters are selected to give a minimum ⟨e²_M⟩_P, that is, to be normative in the same sense as the models with geometric and constant weighting. Normative parameter sets for the large and small step problems were found using the computer to calculate ⟨e²_M⟩_P and converging on the minimum value by successive selection of the parameter sets, as in the selection of the minimum ⟨e²_MS⟩_P for the descriptive models. k₄ was set equal to zero for this selection.

The large step problem yielded one distinct and interesting minimum: k₁ = 0.02, k₂ = 0.10, and k₃ = 1. All three of these parameters are the smallest values examined, and this minimum is in one corner of the error surface. This model would operate as follows: with k₃ = 1, the running average would have values of either 0 or 1, depending on the most recent sample; with k₁ = 0.02 there would be a response adjustment at every sample except when the response was within 0.02 of either 0 or 1. This adjustment would be 0.10 of the distance between the previous response and 0 or 1. The root mean squared error for this model was 0.124.

The best normative parameter set for the small step problem was found to be: k₁ = 0.20, k₂ = 0.10, and k₃ = 6. Again we have the minimum occurring at the smallest value of k₂, but in this case the criterion for changing the response is fairly high and we have six samples in the memory. The root mean squared error for this model was 0.0990.

Both the geometric and constant weighted models are included as special cases of this descriptive model. With k₃ = 1 and k₁ = 0 the descriptive model is identical with the geometric weighted model with r = 1 − k₂. With k₁ = 0 and k₂ = 1 the model is identical with the constant weighted model with N = k₃. The best normative parameter set for the large step problem deviates from the simple geometric form only when the response is within 0.02 of either 0 or 1. The best normative parameter set for the small step problem does not yield as low an error as either the optimum geometric or constant weighted models. The equivalent parameter sets for these models were outside the parameter range investigated, however.

It would seem that this decision model, within its restricted parameter sets, represents a reasonable method for performing this task when the step changes are large, but not when they are small. It is interesting in this light to note that the decision model did not seem to describe the subjects' performance on the small step problem as well as on the large step problem.

Figures 20 and 21 show a small representative portion of three responses to the same input sample sequence. The normative model at the top is the parameter set selected above as the best normative set for the decision model. Note the rapid response changes of the large step model. The center response is that of two of the descriptive parameter sets, and the lower response, the subject's. The descriptive parameter set for the large step problem is the one with the low value of ρ. The set for the small step problem was selected somewhat arbitrarily as one of the five sets that seemed like a reasonable description. The fairly high coherent subject's error is clearly evident in these figures.
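The first special case is easy to verify numerically. In the sketch below (hypothetical code, with the criterion test taken as |r − u| ≥ k₁ so that k₁ = 0 disables it), the descriptive model with k₃ = 1 and k₁ = 0 reproduces the geometric weighted model with r = 1 − k₂:

```python
def geometric_response(samples, r, x0=0.5):
    """Geometric weighted model in recursive form: x <- x + (1 - r)(s - x)."""
    out, x = [], x0
    for s in samples:
        x = x + (1 - r) * (s - x)
        out.append(x)
    return out

def descriptive_response(samples, k1, k2, k3, r0=0.5):
    """Descriptive model, Eqs. (4.50)-(4.54), with a length-k3 memory."""
    window, r, out = [], r0, []
    for s in samples:
        window = (window + [s])[-k3:]
        u = sum(window) / len(window)
        if abs(r - u) >= k1:
            r += k2 * (u - r)
        out.append(r)
    return out

flashes = [1, 0, 0, 1, 1, 1, 0, 1]
a = geometric_response(flashes, r=0.6)
b = descriptive_response(flashes, k1=0.0, k2=0.4, k3=1)  # r = 1 - k2
print(max(abs(x - y) for x, y in zip(a, b)))  # → 0.0
```

With k₁ = 0 and k₂ = 1 the same function reduces to the running mean of the last k₃ samples, the constant weighted model.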

[Figure: input probability, normative model response, descriptive model response, and subject's response plotted against the number of flashes, 20 to 240.]

Figure 20. Responses of Two Mathematical Models and a Subject to a Portion of a Large Step Problem, Random Constraint, at 1 FPS. Normative Model K₁ = 0.02, K₂ = 0.10, K₃ = 1; Descriptive Model K₁ = 0.12, K₂ = 0.60, K₃ = 12, K₄ = 0.

[Figure: the same four traces for a small step problem.]

Figure 21. Responses of Two Mathematical Models and a Subject to a Portion of a Small Step Problem, Random Constraint, at 1 FPS. Normative Model K₁ = 0.20, K₂ = 0.10, K₃ = 6; Descriptive Model K₁ = 0.15, K₂ = 0.40, K₃ = 10, K₄ = 0.

V. DISCUSSION

The response measures presented in Chapter III do not provide a direct answer to the question of how good the performance is. Quantitative standards are necessary for the measures. Mean error is an exception, in that a standard of zero is reasonable and was in fact achieved. The normative models derived in Chapter IV provide the standards for the other measures. They permit a comparison to be made between the subjects' performance and that of several simple machines.

Several important differences exist between the subjects' and the models' knowledge of the task. The subjects were not instructed on the step-function nature of the input. In fact, they were specifically told to expect slow, continuous changes in the probability. The models, on the other hand, were optimized for step input functions. It is reasonable to assume, however, that the subjects' original ignorant and misinformed state did not persist for long after the tracking began. The rapid performance asymptote (less than 45 minutes) and the discrimination between the small and large step problems attest to this. The models do not have learning and adaptive abilities, of course, and they were therefore given the maximum knowledge that the subjects could theoretically derive from the task. The model-subject comparison thus includes the subjects' learning and adaptive abilities. This method of subject instruction will allow more valid generalization of the measured estimation ability to other input forms.

The same situation exists in the relative knowledge of the input statistics possessed by the subjects and the models. The models were completely informed of the distributions of step size, step direction,

sub-problem length, and probability. The subjects knew nothing of these initially. Again, however, it can be assumed that the subjects learned a considerable amount about these distributions while performing. The adaptation to the small and large step problems is an example of the subjects distinguishing between two distributions of step size.

The models were provided with a definite criterion for optimum performance: the minimum mean squared error. The subjects were instructed to use the same criterion. The actual criterion used by the subjects, however, corresponds to their conception of best performance and is a function of the instructions, of performing the task, and of personal abilities and sensitivities.

The comparison between the subjects and the models will be made using the following measures: detection, D; convergence, C; root mean squared error, RMSE; and root mean squared error after convergence, RMSE,C. These four measures, plus mean error, provide a fairly complete description of the performance. Detection and convergence were calculated using Equations (4.10) and (4.33). These measures are for the expected values of the responses and are not the expected values of detection and convergence. The difference is not important for this comparison. The root mean squared error was calculated using Equations (4.23) and (4.41). The root mean squared error after convergence was calculated using Equations (4.22) and (4.40), with the addition of a correction for the small error contributed by the remaining transient after convergence. This transient error was calculated using Equations (4.19) and (4.37).

Two values of RMSE,C were calculated: one with an infinite sample population, as implied in Chapter IV, and one with finite populations such as those used in the experiment. The correction factor for the finite populations is derived in Appendix C. It was calculated using an average number of flashes for the sub-problems and corresponds to the random problem type.

Two specific step sizes were selected for the comparison: 0.40 for the large step problem and 0.15 for the small step problem. These are approximately the average step sizes in their respective problems. These step changes are examined at a flash rate of 1 fps and with the random problem type.

Three forms of each normative model are used. Two of these correspond to the optimum models selected for the large and small step problems considered separately. They are called "discriminating" models. The third model is "nondiscriminating" in that it is required to be optimum over both the small and large step problems simultaneously. The parameters for these models were calculated in Chapter IV. The nondiscriminating models represent the only case where the models are not provided the complete statistical information.

The models' and subjects' performance measures are shown in Table II. Also included are values of detection and convergence for the descriptive models used in Figures 20 and 21. These measures are averages over a set of sub-problems with average step sizes of approximately 0.40 and 0.15. RMSE and RMSE,C were not available from the descriptive model's data.

The response speed of the nondiscriminating model lies between the rapid response of the large step discriminating models and the smoother

responses of the small step discriminating models. The discriminating models have lower values of RMSE, of course, since this was the optimization criterion. The nondiscriminating models are between the two discriminating models in RMSE,C, having lower values for the large step problem and higher values for the small step problem.

The subject-model comparison shows a striking difference in the detection values for the large step problem. The normative models have detection values considerably smaller than the subjects'. This results, to a large extent, from the difference between the models' smooth response and the decision, criterion-testing, nature of the subjects' response hypothesized in Chapter IV. The normative models begin to respond to the step change with the first flash of the new probability. The subjects require a number of flashes to perceive a significant probability change and the necessity of a response change. The large step descriptive model has a detection value comparable with the subjects'. On the small step problem the subjects' detection value is higher than any of the models', although it is comparable to that of the discriminating constant model.

In convergence, however, the subjects performed comparably to the models. On the large step problem only the discriminating constant model has a smaller value. On the small step problem the subjects' convergence value lies between those of the discriminating and nondiscriminating models.

The hypothesis that the subjects were adapting to the difference between the small and large step problems receives support from the convergence comparisons. The nondiscriminating models show a considerable decrease in convergence from the large to the small step problems. The

discriminating models show an increase in convergence from the large to the small step problems as they change to a smoother response form. The subjects showed a similar slight increase in convergence from the large to the small step problems.

The subjects' delayed detection with comparable convergence illustrates the discontinuous nature of their behavior. Although unable, or unwilling, to indicate the presence of a change in the probability for the first seven to twelve flashes, they were then able to converge on the new probability in five to seven more flashes.

The subjects were slightly higher than the models in RMSE. This is to a considerable extent the result of the subjects' pre-detection period, where the errors were large. The subjects compare favorably in RMSE,C with the infinite population values but are poorer than all but one of the finite population values. The introduction of the finite population correction caused an appreciable drop in RMSE,C, particularly for the models with long averages. It appears then, by comparison, that the subjects were not fully utilizing the series constraint. The subjects' RMSE,C dropped on the average only about 0.014 from the random problem type, with an average population of close to 60, to the constrained problem type, where the population was only 17.

In comparison with these models the subjects seem fairly adept at converging on a new probability after they decided a change had occurred. This aspect of the task may well have received the most attention. Concentration on this would lead to increased RMSE,C due to false decisions during the static portion of the sub-problem. This represents a deviation from the explicit instructions.

Conclusion

The human performance on this task was considerably better than had been expected. Two features distinguish this task from other investigations of probability estimation. One is the dynamic set under which the subjects were performing. This expectancy for changing probabilities was probably induced primarily by the subjects' actual experience in estimating the dynamic probabilities. The change in behavior from the large to the small step problems could be viewed as a partial loss of this dynamic set. The second distinguishing feature was the display and response mechanisms. The particular arrangement of lights, scale, and lever probably had a high stimulus-response compatibility.

It seems unlikely that probability estimation is, or at least need be, the limiting factor in human binary decision making. Furthermore, it is reasonable to inquire into probability estimation as a possible useful function of man in future man-machine systems involving information from uncertain or probabilistic sources.

TABLE II

COMPARISON BETWEEN THE PERFORMANCES OF THE SUBJECTS AND THE MATHEMATICAL MODELS

                           Detection  Convergence     RMSE        RMSE,C (Probability)
                           (Flashes)   (Flashes)  (Probability)  Infinite     Finite
                                                                 Population   Population
Large Step Problem (step = 0.40)
  Geometric   r = 0.883       1.1        16.7         0.143        0.101        0.087
            * r = 0.915       1.5        23.4         0.148        0.086        0.067
  Constant    N = 14          1.8        12.2         0.155        0.108        0.093
            * N = 20          2.5        17.5         0.161        0.090        0.072
  Subjects (1 fps)            7.5        15.0         0.170             0.091
  Descriptive Model           4.2        14.2

Small Step Problem (step = 0.15)
  Geometric   r = 0.953       8.4        22.8         0.094        0.073        0.042
            * r = 0.915       4.6        12.4         0.103        0.094        0.073
  Constant    N = 35         11.7        23.3         0.100        0.076        0.048
            * N = 20          6.7        13.3         0.110        0.099        0.079
  Subjects (1 fps)           12.5        17.5         0.112             0.095
  Descriptive Model           7.2        21.0

* r = 0.915 and N = 20 are the nondiscriminating models.

APPENDIX A

PROBABILITY TRACKING CONSOLE

[Figure: dimensioned sketch of the probability tracking console (principal dimensions 52 in. and 38 in.), showing the two No. 51 neon bulbs mounted above the illuminated scale and pointer (50 divisions, 10 main divisions marked 0, 10, 20, ..., 90, 100), and the shaft carrying the digital encoder and potentiometer that positions the pointer.]

The duration of the flash was approximately 0.020 seconds. The intensity was adjusted to provide a clear indication without glare. The room illumination was low.

APPENDIX B

INPUT PROBABILITY GENERATION

The input step sequence was generated by exhausting the step changes systematically using the tables shown below. The generation procedure was as follows. All problems were started at P = 0.50, the row identified as "probability from" 0.50. The table entries are step sizes, those to the right of the diagonal being positive and those to the left negative. One of the step changes in the 0.50 row was selected at random. This step selection led to a new probability, "probability to." This new probability was in turn located in the "probability from" list and a step change from it selected randomly. The step selections were made without replacement. This procedure was continued until the entire table was exhausted. It was actually necessary to constrain the random selection at times in order to exhaust the table without repeating steps. This selection method gave a "problem" with exactly one step of each size and direction to each probability. The large step problem and the small step problem were produced by separate tables.

The number of flashes at each probability was selected randomly from the set 42, 54, 66, and 78 for the small step problems, and 35, 51, 74, and 89 for the large step problems. For the constrained problems, both large and small step, the values were multiples of 17: 34, 51, 68, and 85. Five problems were generated from each table, one for each rate. The same series of steps was used for the random and constrained problem types.

Large Step Problem
(steps of + and - .16, .32, .48, and .64)

                               Probability To
                  .02   .18   .34   .50   .66   .82   .98
Probability .02    -    .16   .32   .48   .64    -     -
From        .18   .16    -    .16   .32   .48   .64    -
            .34   .32   .16    -    .16   .32   .48   .64
            .50   .48   .32   .16    -    .16   .32   .48
            .66   .64   .48   .32   .16    -    .16   .32
            .82    -    .64   .48   .32   .16    -    .16
            .98    -     -    .64   .48   .32   .16    -

Small Step Problem
(steps of + and - .06, .12, .18, and .24)

                               Probability To
                  .08   .14   .26   .32   .44   .50   .56   .68   .74   .86   .92
Probability .08    -    .06   .18   .24    -     -     -     -     -     -     -
From        .14   .06    -    .12   .18    -     -     -     -     -     -     -
            .26   .18   .12    -    .06   .18   .24    -     -     -     -     -
            .32   .24   .18   .06    -    .12   .18   .24    -     -     -     -
            .44    -     -    .18   .12    -    .06   .12   .24    -     -     -
            .50    -     -    .24   .18   .06    -    .06   .18   .24    -     -
            .56    -     -     -    .24   .12   .06    -    .12   .18    -     -
            .68    -     -     -     -    .24   .18   .12    -    .06   .18   .24
            .74    -     -     -     -     -    .24   .18   .06    -    .12   .18
            .86    -     -     -     -     -     -     -    .18   .12    -    .06
            .92    -     -     -     -     -     -     -    .24   .18   .06    -

Entries to the right of the diagonal are positive steps; those to the left are negative.
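The exhaustion procedure described above amounts to traversing every entry of the table exactly once. A minimal sketch, using the large step table, with randomized depth-first search and backtracking standing in for the hand selection (which constrained the random choice when needed to avoid dead ends):

```python
import random

# Probabilities and step sizes of the large step table.
LEVELS = [0.02, 0.18, 0.34, 0.50, 0.66, 0.82, 0.98]
STEPS = [0.16, 0.32, 0.48, 0.64]

def build_table(levels, steps):
    """Allowed transitions: from each probability, every probability
    reachable by one of the step sizes."""
    return {p: [q for q in levels if round(abs(q - p), 2) in steps]
            for p in levels}

def generate_problem(table, start=0.50, seed=None):
    """Exhaust every table entry exactly once, starting from `start`.
    Randomized backtracking replaces the occasional hand constraint."""
    rng = random.Random(seed)
    all_edges = frozenset((p, q) for p in table for q in table[p])

    def extend(current, remaining, path):
        if not remaining:
            return path
        options = [e for e in remaining if e[0] == current]
        rng.shuffle(options)
        for edge in options:
            found = extend(edge[1], remaining - {edge}, path + [edge[1]])
            if found is not None:
                return found
        return None

    return extend(start, all_edges, [start])
```

Because every transition in the table is reversible, an exhausting sequence from 0.50 always exists, and the search returns one sequence of probabilities using each step of each size and direction exactly once.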

APPENDIX C

VARIANCES OF SAMPLE AVERAGES FROM FINITE POPULATIONS

Consider a population $y_k$ with mean $\bar{Y}$ and with $M$ members. Let $N$ samples $x_i$ be drawn from $y_k$ without replacement, with

$$\bar{x} = \sum_{i=1}^{N} w_i x_i \quad \text{where} \quad \sum_{i=1}^{N} w_i = 1.$$

The variance of $\bar{x}$ is

$$E[(\bar{x} - \bar{Y})^2] = E\left\{\left[\sum_{i=1}^{N} w_i x_i - \bar{Y}\right]^2\right\} = E\left\{\left[\sum_{i=1}^{N} w_i (x_i - \bar{Y})\right]^2\right\} = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j E[(x_i - \bar{Y})(x_j - \bar{Y})].$$

This expression contains terms which can be written as

$$E[(x_i - \bar{Y})^2] = \frac{1}{M} \sum_{k=1}^{M} (y_k - \bar{Y})^2$$

and, for $i \neq j$,

$$E[(x_i - \bar{Y})(x_j - \bar{Y})] = \frac{1}{M(M-1)} \left[ \sum_{k=1}^{M} \sum_{\ell=1}^{M} (y_k - \bar{Y})(y_\ell - \bar{Y}) - \sum_{k=1}^{M} (y_k - \bar{Y})^2 \right] = -\frac{1}{M(M-1)} \sum_{k=1}^{M} (y_k - \bar{Y})^2,$$

since $\sum_{k=1}^{M} (y_k - \bar{Y}) = 0$.

The variance then becomes, using $\sum_{i \neq j} w_i w_j = 1 - \sum_{i=1}^{N} w_i^2$,

$$E[(\bar{x} - \bar{Y})^2] = \sum_{i=1}^{N} w_i^2 \cdot \frac{1}{M} \sum_{k=1}^{M} (y_k - \bar{Y})^2 - \frac{1}{M(M-1)} \sum_{i \neq j} w_i w_j \sum_{k=1}^{M} (y_k - \bar{Y})^2 = \frac{1}{M(M-1)} \left[ M \sum_{i=1}^{N} w_i^2 - 1 \right] \sum_{k=1}^{M} (y_k - \bar{Y})^2.$$

For a sub-problem with $P_i = \bar{Y}$ and of length $M_i$ we have

$$\sigma_i^2 = \frac{1}{M_i} \sum_{k=1}^{M_i} (y_k - \bar{Y})^2$$

and we can then write

$$\sigma_r^2(n) = \frac{1}{M_i - 1} \left[ M_i \sum_{i=1}^{N} w_i^2 - 1 \right] \sigma_i^2.$$

For the geometric weighted model with $w_i = a r^i$,

$$\sigma_r^2(n) = \frac{1}{M_i - 1} \left[ M_i \left( \frac{1-r}{1+r} \right) - 1 \right] \sigma_i^2.$$

For the constant weighted model with $w_i = \frac{1}{N}$,

$$\sigma_r^2(n) = \frac{1}{M_i - 1} \left[ \frac{M_i}{N} - 1 \right] \sigma_i^2.$$
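The variance result above can be checked by simulation. The sketch below draws weighted samples without replacement from a binary population, as the flash series is binary, and compares the empirical variance of the weighted mean with the Appendix C expression; the function names are illustrative.

```python
import random

def predicted_variance(pop, w):
    """Appendix C result: Var(xbar) = sigma^2 * (M*sum(w_i^2) - 1) / (M - 1),
    where sigma^2 is the variance of the finite population pop."""
    M = len(pop)
    ybar = sum(pop) / M
    sigma2 = sum((y - ybar) ** 2 for y in pop) / M
    return sigma2 * (M * sum(wi * wi for wi in w) - 1) / (M - 1)

def empirical_variance(pop, w, trials=50000, seed=1):
    """The same variance estimated by repeatedly drawing len(w) samples
    without replacement and forming the weighted mean."""
    rng = random.Random(seed)
    M = len(pop)
    ybar = sum(pop) / M
    acc = 0.0
    for _ in range(trials):
        sample = rng.sample(pop, len(w))
        xbar = sum(wi * xi for wi, xi in zip(w, sample))
        acc += (xbar - ybar) ** 2
    return acc / trials
```

For a binary sub-problem with P = 0.40 and M = 50, the empirical and predicted variances agree for both constant and geometric weights.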

APPENDIX D

EXPERIMENTAL PRESENTATION ORDER

The problems were presented to the subjects in the following order, where

    L, S = large and small step problems
    R, C = random and constrained problems
    1, 2, ..., 5 after R or C = the particular problem
    Part 1, 2, 3, 4 = divisions of a particular problem

Session   Problem   Part   Rate (fps)
   1        LR1              2
            LC1       1      1
   2        SC1              4
            SR1              8
            LC2       1      0.5
   3        LR2       1      1
            SR2              2
   4        LR3       1      0.5
            LR2       2      1
            LC5              8
   5        SR3              4
            LC2       2      0.5
            LR5              4
   6        SC2       1      1
            SR4       1      1
   7        SC3       1      0.5
            SR5       1      0.5
   8        LC3              2
            SR5       2      0.5
   9        SC3       2      0.5
            LC4              4
            LR4              8
  10        SR5       3      0.5
            SR4       2      1
  11        LC2       3      0.5
            SC4              8
            SC3       3      0.5

  12        SR5       4      0.5
            SC5              2
  13        LC2       4      0.5
            LR3       2      0.5
            LR5       3      0.5
  14        SC3       4      0.5
            LC1       2      1
  15        LR3       4      0.5
            SC2       2      1

APPENDIX E

INSTRUCTIONS

The following formal instructions were used. The instruction method is discussed in Chapter II.

"This experiment is concerned with your ability to estimate probabilities and to follow changes that occur in them as time passes. You will see a display of two lights, a left and a right light. At each flash one or the other of the lights will light, indicating right or left. This is exactly analogous to the drawing at regular intervals of red and green balls from a jar. You will be asked to estimate, by setting a dial, your best guess as to the percentage of balls that are right. The dial is calibrated from 0 to 100, representing no right to all right flashes. For example, if you think that about 68% of the flashes are right, then set the dial at 68. The actual percentages cover the entire range from 0 to 100 and have all values in between. The percentages do not necessarily fall on the dial markings.

"The important new work to come out of this experiment is your ability to notice changes in the percentages and to follow the changing percentage with the dial setting. The analogy with the balls in the jar is the case where one or the other color is being taken out of the jar by another person without your knowledge. At times the percentage will change slowly in a continuous fashion. At other times the percentage will change suddenly, as though a whole handful of one color had been removed. If you are uncertain as to the percentage, set the dial at 50.

"You will be paid according to how well you do. At the end of each problem, 10 to 25 minutes, you will be able to read the amount of money off the meter on the computer. The computer calculates the difference between your estimate and the actual probability, the error, and accumulates this error over the problem. It also adds up a constant amount of money per minute. You are paid the difference. The computer is adjusted so that if you left the lever at 50 you would get no money.
"You will wear a pair of earphones and have a microphone. A low 'seashore' type noise will be fed into the earphones in order to mask out noises from the street and the laboratory. When I talk to you the noise will be removed. You can be heard at all times through your microphone. You are welcome to make verbal comments during the experiment. These are not being recorded and any sort of language is acceptable."

APPENDIX F

ANALOG COMPUTER CIRCUIT FOR PAYOFF

[Analog computer diagram: the probability from tape and the subject's manual control setting are differenced to form the error, which is squared; the squared error and a constant pay rate are integrated, and their difference is displayed as the payoff. Legible component values include 1/64, 1/32, 100, .030, .563, and .005.]
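In digital form, the payoff computation described in the instructions of Appendix E, with the squared-error element shown in the circuit, might be sketched as follows; the time step and gain constants here are illustrative, not the values set on the analog computer.

```python
def payoff(responses, probabilities, dt=0.5, pay_rate=0.02, error_gain=0.25):
    """Accumulate a constant amount of pay per unit time and subtract
    the integrated squared error between the response and the true
    probability; the subject is paid the difference."""
    pay = 0.0
    for resp, p in zip(responses, probabilities):
        pay += pay_rate * dt                       # constant pay rate
        pay -= error_gain * (resp - p) ** 2 * dt   # squared-error penalty
    return pay
```

Perfect tracking earns the full pay rate; a response pinned away from the probability accumulates error faster than pay, driving the payoff negative as in the second anomaly of Appendix G.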

APPENDIX G

TWO QUALITATIVE RESPONSE EXCEPTIONS

On two occasions during approximately 70 hours of tracking, the tracking response was qualitatively different from the norm. These two situations lasted for a total of approximately 35 minutes.

The first occurred during the pilot experiment. During one particular problem in the third session a subject was accumulating error at a much higher rate than in any of the previous sessions or problems. Upon inspection of her records it was noted that detection was considerably higher than it had been before. The instructions concerning the error formation and the payoff were repeated, with special emphasis on the rapid error build-up with large discrepancies between probability and response. Her response returned to normal on the next problem. The hypothesis here is that she was computing the new probability to a high degree of accuracy before she responded to the change. The "normal" response produces movement toward the new probability as soon as it is perceived, with further refinements as more data, flashes, are accumulated.

The second anomaly occurred in the response of a subject in his 12th session of the main experiment. He was tracking a large step problem at 2 fps. The experimenter noted that the payoff was going negative; the error was accumulating faster than the pay. Upon examining the records it was established that for about the first 3/4 of the problem, about 15 minutes, the response was the mirror image of the proper or normal response. The scale was reversed in relation to the light flashes.

A check on the equipment failed to reveal any malfunction. Upon questioning after the session the subject stated that he was a bit mixed up at times. He evidently had no idea that he was doing a fairly good job of mirror-image tracking. He was given this particular problem again in a special 16th session, and this second run was used in the analysis.

APPENDIX H

DATA NOT AVERAGED OVER SUBJECTS

Figures 22 through 28 show some of the principal variable interactions for individual subjects. The data for detection and convergence show appreciable differences in magnitude between subjects but maintain the same qualitative relationships as regards direction of change and the distinction between the small and large step problems. Subject S-1 was quite consistently slower in his response than the other three. All subjects show a similar increase in RMSE with rate from 1 to 8 fps, with subject S-3 consistently higher. All subjects show a decrease in RMSE,C from 0.5 to 2 fps. Subjects S-2 and S-4 show a continued decrease at 4 and 8 fps, whereas S-3 and S-1 show an increase. FAR decreases with rate for three subjects, subject S-1 having little variation by comparison. Mean error did not vary significantly among subjects.

[Figure 22. Detection as a Function of Step Size for Four Subjects. Detection is measured in flashes.]

[Figure 23. Detection as a Function of Flash Rate for Four Subjects. Detection is measured in flashes.]

[Figure 24. Convergence as a Function of Step Size for Four Subjects. Convergence is measured in flashes.]

[Figure 25. Convergence as a Function of Flash Rate for Four Subjects. Convergence is measured in flashes.]

[Figure 26. Root Mean Square Error as a Function of Flash Rate for Four Subjects.]

[Figure 27. Root Mean Square Error After Convergence as a Function of Flash Rate for Four Subjects.]

[Figure 28. False Alarm Rate, in False Alarms per Flash, as a Function of Flash Rate for Four Subjects.]

REFERENCES

1. Cohen, J. and Dinnerstein, A. J. Flash Rate as a Visual Coding Dimension for Information. AF WADC TR 57-64, 1958.

2. Conrad, R. and Hille, B. A. "The Decay Theory of Immediate Memory and Paced Recall." Canadian J. Psychol., 12 (1958) 1-6.

3. Dember, W. N. "The Relation of Decision-Time to Stimulus Similarity." J. Exp. Psychol., 53 (1957) 68-72.

4. Edwards, W. "The Theory of Decision Making." Psychol. Bull., 51 (1954) 380-417.

5. Edwards, W. "Behavioral Decision Theory." Annu. Rev. Psychol., 12 (1961) 473-498.

6. Edwards, W. "Probability Learning in 1000 Trials." J. Exp. Psychol., 62 (1961) 385-394.

7. Erlick, D. E. Judgments of the Relative Frequency of Sequential Binary Events. Aerospace Medical Labs. WADC TR 59-580, 1959.

8. Flood, M. M. "Environmental Non-Stationarity in a Sequential Decision-Making Experiment." In R. M. Thrall, C. H. Coombs, and R. L. Davis (Eds.), Decision Processes. New York: Wiley, 1954.

9. Forsyth, D. M. and Chapanis, A. "Counting Repeated Light Flashes as a Function of Their Number, Rate of Presentation, and Retinal Location Stimulated." J. Exp. Psychol., 56 (1958) 385-391.

10. Gardner, R. A. Perception of Relative Frequency as a Function of the Number of Response Categories. AMRL (1959) 408.

11. Goodnow, J. J. and Pettigrew, T. F. "Effect of Prior Patterns of Experience Upon Strategies and Learning Sets." J. Exp. Psychol., 49 (1955) 381-389.

12. Grant, D. A. "Information Theory and the Discrimination of Sequences in Stimulus Events." In B. McMillan (Ed.), Current Trends in Information Theory. Pittsburgh: University of Pittsburgh Press, 1954.

13. Hake, H. W. "The Perception of Frequency of Occurrence and the Development of 'Expectancy' in Human Experimental Subjects." In H. Quastler (Ed.), Information Theory in Psychology. Glencoe, Illinois: The Free Press, 1954.

14. Hake, H. W. and Hyman, R. "Perception of the Statistical Structure of a Random Series of Binary Symbols." J. Exp. Psychol., 45 (1953) 64-74.

15. Hornseth, J. P. and Grant, D. A. The Discrimination of Random Series of Stimulus Frequencies as a Function of Their Relative and Absolute Values. AFPTRC Research Bulletin TR-54-76, 1954.

16. Hyman, R. "Stimulus Information as a Determinant of Reaction Time." J. Exp. Psychol., 45 (1953) 188-196.

17. Jarvik, M. E. "Probability Estimates and Gambling." In Mathematical Models of Human Behavior. Stamford, Connecticut: Dunlap and Associates, Inc., 1955.

18. Mackworth, J. F. "Paced Memorizing in a Continuous Task." J. Exp. Psychol., 58 (1959) 206-211.

19. Mackworth, N. H. and Mackworth, J. F. "Visual Search for Successive Decisions." British J. Psychol., 49 (1958) 210-221.

20. Neimark, E. D. and Shuford, E. H. "Comparisons of Predictions and Estimates in a Probability Learning Situation." J. Exp. Psychol., 57 (1959) 294-298.

21. Pollack, I., Johnson, L. B. and Knaff, P. R. "Running Memory Span." J. Exp. Psychol., 57 (1959) 137-146.

22. Reese, T. W. "The Application of the Theory of Physical Measurement to the Measurement of Psychological Magnitudes, with Three Experimental Examples." Psychological Monographs, 55 (1943) 1-88.

23. Schreiber, R. J. "Estimates of Expected Value as a Function of Distribution Parameters." J. Exp. Psychol., 53 (1957) 218-220.

24. Stevens, J. C. and Shickman, G. M. "The Perception of Repetition Rate." J. Exp. Psychol., 58 (1959) 433-440.

25. Taves, E. H. "Two Mechanisms for the Perception of Visual Numerousness." Archives of Psychol., No. 265, 1941.