WILLOW RUN RESEARCH CENTER - UNIVERSITY OF MICHIGAN
2063-4-J

To be presented at the National Conference of The Society of Aeronautical Weight Engineers, Inc., 10-13 May 1954, Lord Baltimore Hotel, Baltimore, Maryland

STATISTICAL CONSTANTS IN PREDICTING EQUATIONS

Joseph B. Tysver
Associate Research Engineer

Based Upon Work Done Under USAF Contract AF A3(616)-199. Published by Permission of the Wright Air Development Center.

ABSTRACT

A linear predicting equation is presented for which the coefficients are obtained by minimizing the sum of the squares of the percentage errors. Comparison with the ordinary least-squares regression equation (based on actual errors) shows that the two equations can yield substantially different estimates when the range of the variables in the sample to be fitted is large. If percentage errors are more important to the experimenter than actual errors, and if the range of the variables is large, the predicting equation presented in this paper is preferable to the ordinary regression equation.

TABLE OF CONTENTS

Abstract
I Introduction
II The Error of Estimation
III Least-Square Linear Prediction Based on Actual Error
IV Least-Square Linear Prediction Based on Relative Error
V Application
VI Conclusions
References
VII Appendix

I INTRODUCTION

In aircraft weight estimation three approaches are possible: analytical, statistical, or a combination of both. No purely analytical approach is possible at the present time because of two factors:

1) A complete analytic treatment has not been developed and will be quite complicated when it does become available.
2) The presence of differences (and trends) in fabrication procedures introduces departures from an analytic model which cannot be ignored.

On the other hand, the extrapolation which is, and will be, required introduces errors in a purely statistical approach which also cannot be ignored. Thus it appears that the only satisfactory method available for aircraft weight estimation will be based on analytic formulas with statistical constants. The analytic components will be determined by availability of theoretical development, adaptability to calculation, applicability to extrapolation, and acceptability of errors obtained; the statistical constants will be chosen to improve the fit of the analytic components to the weights to be estimated. Some of the aspects of fitting of statistical constants are discussed in this paper.

The equation to be used in weight estimation contains variables $x_i$ $(i = 1, \ldots, m)$ which may represent parameters of the aircraft or analytically derived weight components, which themselves are functions of aircraft parameters; it also contains constants $a_j$ $(j = 1, \ldots, k)$ which are to be determined from a sample of several aircraft so that the equation has a "best fit" to known weights. The definition of a "best fit" to be used here is the usual one of a least-squares fit. That is, the sum of the squares of the errors shall be a minimum.

The basic form of the estimation is assumed to be:

\[ Y = a_0 + a_1 x_1 + \cdots + a_m x_m = a_0 + \sum_{i=1}^{m} a_i x_i \tag{1} \]

Two other specific forms which can be transformed to this form and are of special interest are:

\[ Y = b_0 X_1^{b_1} X_2^{b_2} \cdots X_m^{b_m} = b_0 \prod_{i=1}^{m} X_i^{b_i} \tag{2} \]

and

\[ Y = a_0 + a_1 X + a_2 X^2 + \cdots + a_m X^m = \sum_{i=0}^{m} a_i X^i \tag{3} \]

for which the respective transformations are

\[ x_i = \log X_i \quad \text{and} \quad y = \log Y \tag{2'} \]

\[ x_i = X^i \quad \text{and} \quad y = Y \tag{3'} \]

where the transformed variable $y$ takes the place of $Y$ in Equation (1).
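The logarithmic transformation of the product form can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names and the numerical values are hypothetical and chosen only for illustration. It assumes the product form $Y = b_0 X_1^{b_1} \cdots X_m^{b_m}$, so that after taking logarithms the linear form has $a_0 = \log b_0$ and $a_i = b_i$.

```python
import math

def power_law(b0, b, X):
    """Product form: Y = b0 * X1**b1 * ... * Xm**bm."""
    Y = b0
    for Xi, bi in zip(X, b):
        Y *= Xi ** bi
    return Y

def linearize_power_law(X, Y):
    """Logarithmic transformation: x_i = log X_i and y = log Y."""
    return [math.log(Xi) for Xi in X], math.log(Y)

# Hypothetical constants and variables, for illustration only.
b0, b = 2.0, [1.5, -0.5]
X = [3.0, 4.0]

x, y = linearize_power_law(X, power_law(b0, b, X))

# In the linear form the constants are a0 = log b0 and a_i = b_i.
linear_y = math.log(b0) + sum(bi * xi for bi, xi in zip(b, x))
print(y, linear_y)  # the two values should agree up to rounding
```

Once the data are transformed in this way, any fitting procedure for the linear form applies to $(x_i, y)$ directly; note that the errors being minimized are then errors in $\log Y$ rather than in $Y$.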

II THE ERROR OF ESTIMATION

The choice of the quantity which is used to represent the error of estimation will depend on what the estimator wishes to guard against and what assumptions he is willing to make. Let us denote the actual weight of the jth object in a sample of n objects by $y_j$, and the estimated weight of the jth object by $Y_j$. The quantity to represent the error is then

\[ E_j = \pm (y_j - Y_j) \]

where the choice of the plus or minus sign will depend upon the estimator's point of view, but does not affect the mathematical treatment of $E_j^2$. The statistician uses the plus sign and considers $E_j$ as the error of observation.

If it can be assumed that the predicting variables ($x_i$) are without error, then this choice of $E_j$ leads to the ordinary linear regression theory found in standard statistical texts (Ref. 1) which is outlined in the next section of this paper. If the predicting variables cannot be assumed to be without error we are led to an orthogonal linear regression theory which also can be found in statistical texts (Ref. 2). The difference between these two theories is illustrated in Figure 1 for one predicting variable.

FIG. 1 ERROR OF ESTIMATION

In some applications the relative (percentage) error is more important than the actual error. For example, an error of 500 pounds may be acceptable in an observed value of 40,000 pounds, but would be intolerable in an observed value of 1,000 pounds. If the range of values is large it may be advisable to determine statistical constants which minimize the sum of squares of relative errors rather than of actual errors. A method for accomplishing this is presented in Section IV of this paper.

Four possible definitions of the relative error e, for the predicting equation Y = a + bx, are presented below.

TABLE I
RELATIVE ERROR, e

Denominator     Numerator y - Y               Numerator Y - y
Y = a + bx      [y - (a + bx)] / (a + bx)     [(a + bx) - y] / (a + bx)
y               1 - a(1/y) - b(x/y)           a(1/y) + b(x/y) - 1

On the basis of logical signs and simplicity of calculation the definition selected here is

\[ e = a\left(\frac{1}{y}\right) + b\left(\frac{x}{y}\right) - 1 \]
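The 500-pound example can be worked through numerically. The sketch below is illustrative only: the function name is hypothetical, and the choice a = 500, b = 1, x = y is simply a convenient way to produce an estimate that is 500 pounds too high. It evaluates all four candidate definitions and picks out the one with numerator Y - y and denominator y.

```python
def relative_errors(a, b, x, y):
    """Return the four candidate relative errors for Y = a + b*x."""
    Y = a + b * x
    return {
        "(y - Y)/Y": (y - Y) / Y,
        "(Y - y)/Y": (Y - y) / Y,
        "(y - Y)/y": 1.0 - a * (1.0 / y) - b * (x / y),
        "(Y - y)/y": a * (1.0 / y) + b * (x / y) - 1.0,  # definition selected in the text
    }

# A 500 lb overestimate at the two observed weights quoted in the text.
# a = 500, b = 1, x = y is a hypothetical way to obtain Y = y + 500.
selected = {}
for y in (40000.0, 1000.0):
    selected[y] = relative_errors(500.0, 1.0, y, y)["(Y - y)/y"]
    print(f"y = {y:>7.0f} lb: relative error = {selected[y]:.4f}")
```

The same 500-pound error is about 1.25 per cent of the larger weight and 50 per cent of the smaller one, which is the contrast the paragraph above describes.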

III LEAST-SQUARE LINEAR PREDICTION BASED ON ACTUAL ERROR

For purposes of review and comparison, the ordinary least-squares development is outlined here with one predictor x. A predicting equation

\[ Y = a + bx \]

is to be obtained from a sample of observed values $(x_j, y_j)$ with $j = 1, 2, \ldots, n$. The values of $x_j$ are assumed to be without error. The coefficients, a and b, are to be determined so that the sum of the squares of the actual errors, defined as

\[ E_j = y_j - Y_j, \tag{4} \]

is a minimum. The values of the coefficients which satisfy this condition are determined from the two equations

\[ \frac{\partial}{\partial a} \sum E_j^2 = 0 \quad \text{and} \quad \frac{\partial}{\partial b} \sum E_j^2 = 0 \]

or

\[ \sum y_j - na - b \sum x_j = 0 \quad \text{and} \quad \sum x_j y_j - a \sum x_j - b \sum x_j^2 = 0 \]

giving

\[ b = \frac{n \sum x_j y_j - \sum x_j \sum y_j}{n \sum x_j^2 - \left( \sum x_j \right)^2} \]

\[ a = \frac{\sum y_j - b \sum x_j}{n} \]

It should be noted that the first equation gives the condition $\sum E_j = 0$; hence this choice of coefficients gives errors with zero mean and minimum standard deviation.
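The closed-form solution for the actual-error fit can be sketched in a few lines. This is a minimal illustration, not taken from the paper: the function name and the sample values are hypothetical. It also checks the remark that the fitted errors sum to zero.

```python
def fit_actual_error(xs, ys):
    """Ordinary least squares for Y = a + b*x (minimizes the sum of E_j**2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

# Hypothetical sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 3.9, 6.1, 8.0, 9.8]

a, b = fit_actual_error(xs, ys)
errors = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b, sum(errors))  # the sum of the errors should vanish up to rounding
```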

IV LEAST-SQUARE LINEAR PREDICTION BASED ON RELATIVE ERROR

First we shall consider a predicting equation of the form

\[ Y = a + bx \]

The coefficients a and b are to be determined so as to minimize the sum of the squares of the relative errors

\[ e_j = \frac{Y_j - y_j}{y_j} \]

for a sample of n pairs $(x_j, y_j)$. The two conditions used to determine the coefficients are

\[ \frac{\partial}{\partial a} \sum e_j^2 = 0 \quad \text{and} \quad \frac{\partial}{\partial b} \sum e_j^2 = 0. \]

Setting

\[ u_j = x_j / y_j \quad \text{and} \quad v_j = 1 / y_j \tag{6} \]

we have

\[ e_j = a v_j + b u_j - 1. \]

The two conditions yield the equations

\[ a \sum v_j^2 + b \sum u_j v_j - \sum v_j = 0 \]

and

\[ a \sum u_j v_j + b \sum u_j^2 - \sum u_j = 0 \]

and the solutions

\[ a = \frac{\sum u_j^2 \sum v_j - \sum u_j \sum u_j v_j}{\sum u_j^2 \sum v_j^2 - \left( \sum u_j v_j \right)^2} \tag{7} \]

and

\[ b = \frac{\sum v_j^2 \sum u_j - \sum v_j \sum u_j v_j}{\sum u_j^2 \sum v_j^2 - \left( \sum u_j v_j \right)^2} \tag{8} \]

Note that the condition $\sum e_j = 0$ does not necessarily hold for this form. It can be shown that

\[ \sum e_j^2 = - \sum e_j. \]

This follows because the two conditions are equivalent to $\sum e_j v_j = 0$ and $\sum e_j u_j = 0$, and $a v_j + b u_j = e_j + 1$, so that $\sum e_j (e_j + 1) = 0$.

This theory may be generalized for application to a sample of n observations on m independent variables ($x_i$) and one dependent variable y with the estimation equation

\[ Y_j = a_0 + a_1 x_{1j} + \cdots + a_m x_{mj}, \quad j = 1, \ldots, n \]

and the relative errors

\[ e_j = \frac{Y_j - y_j}{y_j} = \sum_{i=0}^{m} a_i u_{ij} - 1 \]

where

\[ u_{ij} = \begin{cases} 1 / y_j & \text{for } i = 0 \\ x_{ij} / y_j & \text{for } i = 1, \ldots, m \end{cases} \]

The requirement that

\[ \sum_{j=1}^{n} e_j^2 = \text{minimum} \]

yields m + 1 linear equations

\[ \frac{\partial}{\partial a_i} \sum_j e_j^2 = 0, \quad i = 0, 1, \ldots, m \]

which can be solved for the m + 1 coefficients $a_i$. A computational form is presented in the Appendix.

Other variations of these results can be obtained by specifying such conditions as:

1) $\sum e_j = 0$ (the mean relative error is zero).
2) $a_0 = 0$ (the estimation equation passes through the origin).
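For the one-predictor case the relative-error fit reduces to two sums-of-products equations in u = x/y and v = 1/y, which can be solved directly. The sketch below is an illustration under stated assumptions: the function name and the sample values are hypothetical, and the closed-form expressions are obtained by solving the two linear conditions for a and b. It also checks numerically that the sum of the squared relative errors equals minus the sum of the relative errors at the fitted coefficients.

```python
def fit_relative_error(xs, ys):
    """Fit Y = a + b*x by minimizing the sum of squared relative errors (Y - y)/y."""
    u = [x / y for x, y in zip(xs, ys)]   # u_j = x_j / y_j
    v = [1.0 / y for y in ys]             # v_j = 1 / y_j
    suu = sum(t * t for t in u)
    svv = sum(t * t for t in v)
    suv = sum(p * q for p, q in zip(u, v))
    su, sv = sum(u), sum(v)
    d = suu * svv - suv ** 2
    a = (suu * sv - su * suv) / d         # solution for a
    b = (svv * su - sv * suv) / d         # solution for b
    return a, b

# Hypothetical sample spanning a wide range, for illustration only.
xs = [1.0, 2.0, 5.0, 10.0, 20.0, 40.0]
ys = [1.3, 1.9, 5.6, 9.0, 22.0, 37.0]

a, b = fit_relative_error(xs, ys)
e = [(a + b * x - y) / y for x, y in zip(xs, ys)]
sum_e = sum(e)
sum_e2 = sum(t * t for t in e)
print(a, b, sum_e, sum_e2)  # sum_e2 should equal -sum_e up to rounding
```

Because the sum of squares is non-negative, the identity implies that the mean relative error of this fit is zero or slightly negative, which is why the text lists a zero mean relative error as a separate optional condition.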

V APPLICATION

The analytically derived components of structural wing weight presented in Reference 3 will be used to illustrate the method of determining statistical constants. The four analytic components $x_1$, $x_2$, $x_3$, and $x_4$ are used to estimate structural wing weights for 12 fighters, 13 bombers, and 15 cargo aircraft with an equation of the form

\[ W_{we1} = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4. \tag{9} \]

From the analytic derivation the physical interpretations will be maintained only if $x_1$ and $x_3$ have positive coefficients and $x_2$ and $x_4$ have negative coefficients. These signs are not obtained in all of the three samples (fighters, bombers, and cargo aircraft) so that physical interpretation is lost in making the statistical fit.

A second attempt is made using the equation

\[ W_{we2} = a_0 + a_1 x_1' + a_2 x_2' \tag{10} \]

where $x_1' = x_1 - x_2$ and $x_2' = x_3 - x_4$. For this equation it is assumed that the poorer fit will be offset by a gain in physical interpretation and (it is hoped) a better applicability to extrapolation. The coefficients $a_1$ and $a_2$ should be positive to allow physical interpretation. The statistical fit of the samples again violates these sign requirements.

A further forcing of the sign requirements is made to give the equation

\[ W_{we3} = a_0 + a_1 x'' \tag{11} \]

where $x'' = x_1' + x_2'$. In this case the coefficient $a_1$, as determined by a statistical fit, is positive so that physical interpretation is maintained.

Since the right side of the predicting formula has been reduced to one predictor, other formulas using $x_1'$ and $x_2'$ respectively may be of interest. These are

\[ W_{we4} = a_0 + a_1 x_1' \tag{12} \]

and

\[ W_{we5} = a_0 + a_1 x_2'. \tag{13} \]

The following table gives the values of $e_{rms}$ for the different equations as applied to a sample of 12 fighter aircraft, where

\[ e_{rms} = \sqrt{\sum_j e_j^2 / n} = \text{root mean square relative error} \tag{14} \]

TABLE II
RELATIVE ERROR (rms)
FIGHTER AIRCRAFT (12)

Formula                                             e_rms
W_we1 = a0 + a1 x1 + a2 x2 + a3 x3 + a4 x4          0.122
W_we2 = a0 + a1 (x1 - x2) + a2 (x3 - x4)            0.149
W_we3 = a0 + a1 [(x1 - x2) + (x3 - x4)]             0.150
W_we4 = a0 + a1 (x1 - x2)                           0.157
W_we5 = a0 + a1 (x3 - x4)                           0.152

Note that the increase in $e_{rms}$ is comparatively small when the variables are combined to force the signs for physical interpretation.

A comparison of the formulas based on actual errors with those based on relative errors is presented for Equation (12). An over-all fit to all categories of aircraft (i.e., including the three samples of 12 fighters, 13 bombers, and 15 cargo aircraft in one sample of 40 aircraft) is included. Values of $e_{rms}$ are presented in Table III. Figures 2 and 3 present estimated wing weights versus actual wing weights for relative error fit and actual error fit, applied to the over-all category. Note the large relative errors at the lower range of Figure 3.

TABLE III
RELATIVE ERROR (e_rms) OF WING WEIGHT ESTIMATES
W_we = a0 + a1 x1'

                   Relative Error Fit              Actual Error Fit
Category           By Category   All Categories    By Category   All Categories
Fighters (12)      0.157         0.174             0.186         0.457
Bombers (13)       0.175         0.195             0.240         0.182
Cargos (15)        0.092         0.144             0.159         0.100
All (40)           0.143         0.171             0.197         0.278
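The kind of comparison reported in Table III can be reproduced in outline on any sample. The sketch below is illustrative only and does not use the aircraft data, which are not given in this paper: the sample values are synthetic, and the function names are hypothetical. It fits one predictor by both criteria and evaluates the root mean square relative error of each fit.

```python
import math

def fit_actual_error(xs, ys):
    """Ordinary least squares for Y = a + b*x (actual errors)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return (sy - b * sx) / n, b

def fit_relative_error(xs, ys):
    """Least squares for Y = a + b*x on relative errors (Y - y)/y."""
    u = [x / y for x, y in zip(xs, ys)]
    v = [1.0 / y for y in ys]
    suu = sum(t * t for t in u)
    svv = sum(t * t for t in v)
    suv = sum(p * q for p, q in zip(u, v))
    su, sv = sum(u), sum(v)
    d = suu * svv - suv ** 2
    return (suu * sv - su * suv) / d, (svv * su - sv * suv) / d

def e_rms(a, b, xs, ys):
    """Root mean square relative error of Y = a + b*x on the sample."""
    n = len(xs)
    return math.sqrt(sum(((a + b * x - y) / y) ** 2 for x, y in zip(xs, ys)) / n)

# Synthetic sample with a wide range of values, for illustration only.
xs = [1.0, 2.0, 4.0, 8.0, 50.0, 100.0, 200.0, 400.0]
ys = [1.6, 2.3, 5.1, 7.4, 56.0, 95.0, 215.0, 380.0]

rms_relative_fit = e_rms(*fit_relative_error(xs, ys), xs, ys)
rms_actual_fit = e_rms(*fit_actual_error(xs, ys), xs, ys)
print(rms_relative_fit, rms_actual_fit)
```

On the sample it was fitted to, the relative error fit cannot give a larger $e_{rms}$ than the actual error fit of the same equation, since it minimizes exactly that quantity. The size of the gap depends on the range of the sample.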

VI CONCLUSIONS

In fitting an equation to data statistically the experimenter must first determine what type of error he wishes to minimize. When percentage error is of primary importance, as it is in aircraft weight estimation, the method developed in this paper is appropriate for fitting a predicting equation to the data. This approach requires a slight increase in computation over that required for the equation based on actual errors, but it reduces the dominating role of the errors in the larger values of a sample. If the range of values in the sample is large, the two bases can yield substantially different percentage errors.

REFERENCES

1. P. G. Hoel, Introduction to Mathematical Statistics, Chapter VII.
2. H. Cramer, Mathematical Methods of Statistics, Princeton University Press, 1946, p. 275.
3. Spath, R. M., "A Simple Wing Weight Estimation Equation", University of Michigan, Willow Run Research Center Memorandum 2063-3-J.

FIG. 2 ESTIMATED WING WEIGHT USING RELATIVE ERROR FIT, ALL CATEGORIES
[Plot of estimated wing weight, W_we = a0 + a1 (x1 - x2), against actual wing weight (W_wa); e_rms = 0.171.]

FIG. 3 ESTIMATED WING WEIGHT USING ACTUAL ERROR FIT, ALL CATEGORIES
[Plot of estimated wing weight against actual wing weight (W_wa); e_rms = 0.278.]

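VII APPENDIX

LINEAR REGRESSION WITH LEAST-SQUARES RELATIVE ERROR

Observed Sample Values:

\[ (x_{ij}, y_j), \quad i = 1, \ldots, m; \quad j = 1, \ldots, n \]

Equation:

\[ Y_j = a_0 + a_1 x_{1j} + a_2 x_{2j} + \cdots + a_m x_{mj} \]

Transformation:

\[ u_{ij} = \begin{cases} 1 / y_j & i = 0 \\ x_{ij} / y_j & i = 1, 2, \ldots, m \end{cases} \]

Relative Error:

\[ e_j = \frac{Y_j - y_j}{y_j} = \sum_{i=0}^{m} a_i u_{ij} - 1 \]

Condition:

\[ \sum_j e_j^2 = \text{minimum} \quad \Rightarrow \quad \frac{\partial}{\partial a_r} \sum_j e_j^2 = 0, \quad r = 0, 1, \ldots, m \]

\[ \frac{\partial}{\partial a_r} \sum_j e_j^2 = 2 \sum_j e_j \frac{\partial e_j}{\partial a_r} = 2 \sum_j e_j u_{rj} \]

\[ \sum_j e_j u_{rj} = \sum_{i=0}^{m} a_i \left( \sum_{j=1}^{n} u_{rj} u_{ij} \right) - \sum_{j=1}^{n} u_{rj} = 0 \]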

System to be solved for $a_i$, $i = 0, 1, \ldots, m$:

\[
\begin{aligned}
a_0 \sum u_{0j}^2 + a_1 \sum u_{0j} u_{1j} + a_2 \sum u_{0j} u_{2j} + \cdots + a_m \sum u_{0j} u_{mj} &= \sum u_{0j} \\
a_0 \sum u_{0j} u_{1j} + a_1 \sum u_{1j}^2 + a_2 \sum u_{1j} u_{2j} + \cdots + a_m \sum u_{1j} u_{mj} &= \sum u_{1j} \\
a_0 \sum u_{0j} u_{2j} + a_1 \sum u_{1j} u_{2j} + a_2 \sum u_{2j}^2 + \cdots + a_m \sum u_{2j} u_{mj} &= \sum u_{2j} \\
&\;\;\vdots \\
a_0 \sum u_{0j} u_{mj} + a_1 \sum u_{1j} u_{mj} + a_2 \sum u_{2j} u_{mj} + \cdots + a_m \sum u_{mj}^2 &= \sum u_{mj}
\end{aligned}
\]

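COMPUTATIONAL FORM

Array of sums (upper triangle of the symmetric system, with the column of right-hand sides):

\[
\begin{array}{ccccc|c}
\sum u_{0j}^2 & \sum u_{0j} u_{1j} & \sum u_{0j} u_{2j} & \cdots & \sum u_{0j} u_{mj} & \sum u_{0j} \\
 & \sum u_{1j}^2 & \sum u_{1j} u_{2j} & \cdots & \sum u_{1j} u_{mj} & \sum u_{1j} \\
 & & \sum u_{2j}^2 & \cdots & \sum u_{2j} u_{mj} & \sum u_{2j} \\
 & & & \ddots & \vdots & \vdots \\
 & & & & \sum u_{mj}^2 & \sum u_{mj} \\
 & & & & & n
\end{array}
\]

First reduction:

\[
\begin{array}{cccc|c}
U_{11} & U_{12} & \cdots & U_{1m} & U_{10} \\
 & U_{22} & \cdots & U_{2m} & U_{20} \\
 & & \ddots & \vdots & \vdots \\
 & & & U_{mm} & U_{m0} \\
 & & & & U_{00}
\end{array}
\]

where

\[ U_{rs} = \sum u_{0j}^2 \sum u_{rj} u_{sj} - \sum u_{0j} u_{rj} \sum u_{0j} u_{sj} \]

\[ U_{r0} = \sum u_{0j}^2 \sum u_{rj} - \sum u_{0j} \sum u_{0j} u_{rj} \]

Second reduction:

\[
\begin{array}{ccc|c}
U_{22.1} & \cdots & U_{2m.1} & U_{20.1} \\
 & \ddots & \vdots & \vdots \\
 & & U_{mm.1} & U_{m0.1} \\
 & & & U_{00.1}
\end{array}
\]

where

\[ U_{rs.1} = \frac{U_{11} U_{rs} - U_{1s} U_{1r}}{\sum u_{0j}^2} \]

and in general

\[ U_{rs.k} = \frac{U_{kk.(k-1)} U_{rs.(k-1)} - U_{ks.(k-1)} U_{rk.(k-1)}}{U_{(k-1)(k-1).(k-2)}} \]

The reductions continue until the array contains only $U_{mm.(m-1)}$, $U_{m0.(m-1)}$, and $U_{00.(m-1)}$. The last line of the form lists $a_0, a_1, a_2, \ldots, a_m$ and $U_{00.m}$.

Solution:

\[ a_m = U_{m0.(m-1)} / U_{mm.(m-1)} \]

\[ a_{m-j} = \left[ U_{(m-j)0.(m-j-1)} - \sum_{i=m-j+1}^{m} a_i U_{(m-j)i.(m-j-1)} \right] / U_{(m-j)(m-j).(m-j-1)}, \quad j = 1, 2, \ldots, m-1 \]

\[ a_0 = \left[ \sum u_{0j} - \sum_{i=1}^{m} a_i \left( \sum u_{0j} u_{ij} \right) \right] / \sum u_{0j}^2 \]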
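The general fit with m predictors can be sketched as follows. This is an illustration under stated assumptions, not the paper's computational form: it builds the sums of squares and products of the transformed variables $u_{ij}$ and then solves the resulting linear system by ordinary Gaussian elimination with partial pivoting and back-substitution, which is a standard substitute for the hand-computation layout above. The function names and the sample values are hypothetical.

```python
def normal_equations(xs, ys):
    """Build the system sum_i a_i * sum_j(u_rj * u_ij) = sum_j u_rj.

    xs is a list of rows [x_1j, ..., x_mj]; ys is the list of y_j.
    """
    u = [[1.0 / y] + [x / y for x in row] for row, y in zip(xs, ys)]
    k = len(u[0])  # m + 1 coefficients
    A = [[sum(uj[r] * uj[i] for uj in u) for i in range(k)] for r in range(k)]
    rhs = [sum(uj[r] for uj in u) for r in range(k)]
    return A, rhs

def solve(A, rhs):
    """Gaussian elimination with partial pivoting, then back-substitution."""
    n = len(rhs)
    M = [row[:] + [c] for row, c in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (M[r][n] - s) / M[r][r]
    return a

def fit_relative_error(xs, ys):
    """Return [a_0, a_1, ..., a_m] minimizing the sum of squared relative errors."""
    return solve(*normal_equations(xs, ys))

# Hypothetical sample generated exactly from y = 2 + 3*x1 + 0.5*x2,
# so the fit should recover these constants with zero relative error.
xs = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [6.0, 8.0]]
ys = [2.0 + 3.0 * x1 + 0.5 * x2 for x1, x2 in xs]

coeffs = fit_relative_error(xs, ys)
print(coeffs)
```

Floating-point elimination is adequate for a small, well-conditioned system like this one. For predictors that are nearly collinear, a more careful solver would be advisable.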