RSD-TR-3-83

DETERMINING MOTION PARAMETERS FOR SCENES WITH TRANSLATION AND ROTATION

Charles Jerian and Ramesh Jain
Department of Electrical and Computer Engineering
June, 1983

CENTER FOR ROBOTICS AND INTEGRATED MANUFACTURING
Robot Systems Division
COLLEGE OF ENGINEERING
THE UNIVERSITY OF MICHIGAN
ANN ARBOR, MICHIGAN 48109

Abstract

A study of methods that determine the rotation parameters of a camera moving through synthetic and real scenes is conducted. Algorithms that combine ideas of Jain and Prazdny are developed to find translational and rotational parameters. An argument is made for using hypothesized motion parameters, rather than relaxation labelling, to find correspondence.

Introduction

There have been many different approaches to extracting motion information from sequences of dynamic scenes [NAG81a, NAG81b, ULL79a, ULL79b]. A number of these methods concentrate on determining the focus of expansion (FOE) form of the translational parameters. Most of these methods require optical flow vectors or corresponding points in different discrete frames as input. Other methods seek to use the focus of expansion to yield the correspondence [LAW82]. The FOE as the intersection of all optical flow vectors does not exist when the system undergoes rotation; however, we feel that the concept of an FOE can still be used in analyzing three-dimensional motion parameters and in solving the correspondence problem for such cases [PRA79b, PRA80, PRA81a, PRA81b]. One simple method for determining the focus of expansion in scenes where corner points or other matchable entities have been extracted has been proposed by Jain [JAI82a]. This method does not use a correspondence between the frames; rather, it uses the triangle inequality to show that, in approaching motion, the sum of the distances of points in the second frame from the FOE minus the sum of the distances of points in the first frame from the FOE is maximized. This method is difficult to apply directly to real scenes because most corner detectors can lose points from frame to frame and can also find multiple points in both frames. The idea of hypothesizing a likely focus of expansion and verifying it in some way against the image is an important method. In this paper we will suggest other variations of this basic paradigm. In Prazdny's work, optimization of a quality-of-fit function was also proposed to find the rotation parameters

from given optic flow vectors, and this was tested on synthetic data [PRA81a]. As an alternative to such methods, a linear solution for the rotation and translation parameters has been found by Tsai and Huang [TSA81, TSA82a, TSA82b]. Tsai and Huang have a linear algorithm for finding the direction of motion and the rotation parameters given optical flow. Their method is in contrast to the optimization methods in that it uses a system of linear equations; all other systems studied have non-linear equations in their methods. Their algorithm was implemented in a straightforward way by least squares using subroutines from the IMSL package. The results indicated that there are problems with using low-accuracy vectors, vectors for objects which are close in depth, or vectors that are spatially close. These problems become worse for objects that are far away. This study showed that their method requires some modification and that the least squares approach might not be the best one to use. It also showed that recovering the rotation from the hyperbolic versus linear error that occurs locally is next to impossible. If rotation, rather than the more general distortion, is desired, a problem arises in that their method does not specifically obtain a rotation represented by 3 real numbers, but rather the 8 parameters of a 3 by 3 distortion matrix. Only a subset (the orthonormal matrices) are actually rotations. Therefore it is possible that, because of noise or other errors, the optimal solution is not constrained to be a rotation. In any case, it seems that their method will not recover the motion of a synthetic version of Nagel's car [DRE81a] with the optical flow we can obtain using a least squares implementation [CPJ82].
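The 3 by 3 distortion matrix discussed above can be estimated from point correspondences by a direct linear method. The sketch below is our own illustration (the function name and data layout are ours, not Tsai and Huang's implementation): it solves for the matrix up to scale by least squares, and shows concretely why nothing in that step forces the result to be orthonormal, i.e. a true rotation.

```python
import numpy as np

def fit_distortion_matrix(p1, p2):
    """Estimate the 3x3 matrix H (8 free parameters once scale is fixed)
    mapping homogeneous image points of frame 1 to frame 2 by linear
    least squares. Nothing here constrains H to be a rotation, so noise
    can push the solution off the orthonormal subset."""
    rows = []
    for (x, y), (u, v) in zip(p1, p2):
        # Each correspondence gives two linear equations in the nine
        # entries of H (cross-multiplied projection equations).
        rows.append([x, y, 1, 0, 0, 0, -u*x, -u*y, -u])
        rows.append([0, 0, 0, x, y, 1, -v*x, -v*y, -v])
    # The least squares solution is the last right singular vector.
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Synthetic check: image motion induced by a 0.1 radian rotation about
# the y-axis (unit focal length), then recovery of the matrix.
theta = 0.1
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
p1 = np.random.default_rng(1).uniform(-0.5, 0.5, (10, 2))
h = np.column_stack([p1, np.ones(10)]) @ R.T        # rotate the viewing rays
p2 = h[:, :2] / h[:, 2:]                            # reproject to the image
H = fit_distortion_matrix(p1, p2)
```

On noise-free data the recovered H equals the rotation up to scale; with noisy flow vectors it is generally not orthonormal, which is the difficulty noted above.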
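Jain's triangle-inequality idea from the introduction is also easy to sketch: for each displacement, |p2 - F| - |p1 - F| <= |p2 - p1| for any candidate FOE F, with equality exactly when F lies on the flow line behind the first-frame point, so the summed difference peaks at the true FOE. The toy below is our own illustration; for clarity it uses known correspondences, whereas Jain's method sums over all points without them.

```python
import numpy as np

def jain_foe_score(foe, pts1, pts2):
    """Sum of distances of second-frame points from a candidate FOE
    minus the sum for first-frame points; for approaching motion this
    difference is maximized at the true FOE."""
    return (np.linalg.norm(pts2 - foe, axis=1).sum()
            - np.linalg.norm(pts1 - foe, axis=1).sum())

# Toy approaching scene: image points expand radially from a true FOE
# at the origin (20% expansion between frames).
rng = np.random.default_rng(0)
pts1 = rng.uniform(-1, 1, size=(50, 2))
pts2 = 1.2 * pts1

# Score candidate FOEs on a coarse grid and keep the maximizer.
grid = [np.array([x, y]) for x in (-0.5, 0.0, 0.5) for y in (-0.5, 0.0, 0.5)]
best = max(grid, key=lambda f: jain_foe_score(f, pts1, pts2))
```

The grid point at the true FOE attains the maximum, since there every term reaches its triangle-inequality bound |p2 - p1|.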

Determining rotation in synthetic scenes using feature points

If a collection of points comes from a translating rigid body or, equivalently, if the observer translates, the disparity (or optical flow) vectors lie on lines that intersect at a point called the FOE. The FOE gives the direction of motion of the object or observer. Figure 1 shows the disparity vectors for an observer moving toward two boxes. These boxes are at positions (0,0,10) and (0,10,25), where (0,0,1) represents the center of the image. The boxes are inclined by a rotation of -0.035 radians about the x-axis followed by -0.50 radians about the y-axis. The observer is displaced 2 units in the z direction. For this example the focus of expansion is at (0,0) and the direction of motion is (0,0,1). Imagine a spherical retina whose center is at (0,0,0). The lines in this figure would then correspond to a collection of great circles of the sphere which pass through a pole at (0,0,1). Now, if the imaging system is rotated before it is translated, a motion component is added to each point on the sphere along a latitude line of the axis of rotation. The projection of such latitude lines on a plane is discussed by Prazdny [PRA81a]. For rotations about axes in the x-y plane, these circles map onto hyperbolic arcs and, for rotations about the z-axis, they map onto circular arcs. The synthetic images in this section are all created using a pi/2 radian field of view, which gives an aperture ratio of 1. Locally, the hyperbolic arcs are almost linear, especially near the center of the camera's field of view. For a camera with a narrower field of view than our synthetic imaging system, they are fairly linear throughout the image. Rotation of the camera causes each point in the plane to be displaced in a way that depends on its position in the plane. The actual mapping of image points (X,Y) to rotated points

(NEWX,NEWY) under rotation about the y-axis by θ radians is given by the equations:

    NEWX = tan(arctan(X) + θ)
    NEWY = Y sqrt((1 + NEWX^2) / (1 + X^2))

Translation then moves the points along lines passing through the FOE. The effect, for most imaging systems with a reasonable field of view, is almost constant over the entire plane. Note that such rotations can be closely simulated by a translation of an object, at distance r, by rθ in the direction perpendicular to the axis of rotation. In figure 3 we have the same two objects as in figure 1; however, the camera is rotated 0.5 radians about the y-axis before translating 2.5 units along the z-axis. The motion vectors in figure 3 fall on lines (shown in figure 4) that tend to cluster at two intersection points along the x-axis. Any method that finds good possible FOEs will find likely values at these points. The object points that produce these values would then be found. This facilitates a segmentation of the scene based on the variation in depth. It is observed that the intersection points all lie roughly along a line (in this case the x-axis, since the FOE is at (0,0)) and the axis of rotation is always perpendicular to that line. The axis of rotation, therefore, can be quickly and easily determined by examining the local FOEs. Different objects in the image can be at different depths. The points at each depth are observed to intersect near a single point. If the image is presegmented into regions that are each part of one object of approximately constant depth, a clear single intersection point for the optical flow vectors of that

region would result. This is the converse of using the distinct local peaks to help segment the image. Standard techniques, such as finding the least squares intersection point, may suffice to obtain reliable results for such a region. Similarly, Jain's method would give a local FOE for such a segmented region. The different local peaks would be fitted with an appropriate curve to find the direction and amount of rotation. The positions of the local FOEs would give an ordering of the regions by depth. Rotational parameters seem to operate orthogonally to the radial direction on which Jain's function depends, so that rotations have little consistent effect on its value. Consider the case where the FOE is at (0,0) and rotation is about the z-axis. The value of the function at the correct FOE (0,0) is not affected by any amount of rotation. The maximum, however, may no longer be at (0,0). From a correct FOE, if we draw a radial line to a distinctive feature point, its matching point is found on the same line within a certain distance that depends on the maximum velocity allowed. A smooth error measure can be devised that accounts for points that do not fall precisely on a line by finding the nearest matching point within a range of angles and using the sum of the absolute or squared errors of the best matching points. If there is no match, a large amount such as π radians can be added to the sum of the errors. This allows us to effectively find an optimizable function for the FOE that can be tailored to find good peaks along the different clusters, or that can average the values to find an average FOE. To find strong peaking, the maximum angle for allowing a match is decreased and changes are made to the error function to cause poorer matches to weigh more heavily against a given choice for the FOE. Since rotation

of the camera is merely a mapping of the image to itself, the effects of such rotation can be reversed by performing an inverse mapping of the image using the equations previously described here, or those of Prazdny [PRA81a]. Such transformations are called de-rotation mappings here to distinguish them from three-dimensional rotations. Since such a measure is very sensitive to error in the FOE caused by rotation, it can be used to effect controlled de-rotation of the image by making the vectors intersect more tightly in the de-rotated image and reducing the error measure. This procedure does not require any a priori match; however, if any a priori matching information is available, the routine can selectively examine such matches to select the match that is most consistent with its hypothesized FOE. Figure 5 shows the same objects as figure 1, where the observer translates 2.5 units along the z-axis and rotates 0.5 radians about the z-axis; the optical flow vectors that result are shown. Figure 6 shows the lines that extend these vectors to intersection points. As these figures show, the lines intersect on a circle, with the FOE of the lines at the center of the circle. The width of the circle is proportional to the amount of rotation and inversely proportional to the translational velocity. If the rotation angle is zero, the circle degenerates to a point. If the angle is π radians, the circle becomes the point at infinity. A circle detector could be used on the local FOE histogram to find the circle and obtain the FOE and the amount of z-rotation. If only part of the circle is strongly represented, because objects are more concentrated in some parts of the image than others, an averaging method would locate the FOE at the center of mass of the points on the circle rather than at the center of the circle. The amount of error would depend on the non-uniformity of the distribution and the radius of

the circle. Jain's algorithm, for example, tends to find a value near such a center of mass. This leads to increasing inaccuracy if the feature points are nonuniformly distributed, and the inaccuracy grows as the radius of the circle increases. A local approach that finds the FOE in windows would be useful here, since we could plot the local FOEs and apply a circle detector to find the FOE and the amount of rotation. As a variation of the above methods, an FOE can be computed by Jain's method using data that has been de-rotated using Prazdny's equations. The computed FOE is used to obtain a correspondence of the feature points. This involves a search in the radial direction from the FOE for a matching feature point. In the synthetic scenes first explored, only position information was present, so no comparison of the feature points was possible. To discern the presence and amount of rotation, the following are computed: the distance in the x direction of the matched point, the perpendicular distance, and the difference in angle of the matching points. We only need to explore a few points in the neighborhood of a given point to find the matching point. All these points can be preselected by using a maximum velocity constraint. Additionally, if feature points have feature values, these would be used to further constrain the list to be searched. We de-rotate by moving the position of points in one of the images along hyperbolic paths. For de-rotation purposes, rotation about an axis in the x-y plane can be replaced by a rotation about the x-axis followed by a rotation about the y-axis. To simplify the work, only rotations about the y-axis were considered; however, everything will still work, with an increase in cost, for rotations

about two axes. The amount of rotation about a given axis is treated as a parameter to vary in an optimization method. The best average FOE is redetermined using Jain's method. This may require much re-computation. To avoid this, assume that the position of the best FOE varies smoothly and by a small amount with small changes in the amount of rotation. Then only a small amount of additional computation is required to find a new average FOE given a new angle. In this method we have tested various parameters to determine the amount of error from the correct FOE. Among these is the mean square angular error of all the corresponding points. Other measures tested were the distance of the hypothesized line connecting the corresponded points, along the x direction, from the FOE and from the average of all the x positions. One could also find the least squares value of the intersections and determine the orthogonal distance of each line from the least squares value. A different error measure is the amount of spread of the clusters at the intersections. Table 1 summarizes the results of an implementation of this theory. The table shows the extracted position of the FOE for different amounts of assumed rotation about the y-axis. The correct values used in generating the data would give an angle of -0.03 radians and locate the FOE at (0.0,-1.00), on the x axis at the extreme left of the image. The column labelled match angular error shows the average of the sum of the squared differences in angular position of vectors from the extracted FOE to pairs of points that are established as corresponding. The correct result yielded the minimum error. The average of the sum of the squares of the angular errors was computed as a measure of the quality of the FOE and is reported in Table 1. Lines were drawn through the matched points and extended to a line passing through the computed FOE orthogonal to the x-axis. The actual

average positions of the intersections of the extended optical flow lines and this line were determined. The x position column reports that position. If there is no systematic error, this value should be the same as the x coordinate of the FOE. The variance of the intersections is noted under the variance column. This is an excellent measure of the spread of the intersection cluster and gives the best indication of rotation of all the measures. The FOE variance column reports the variance computed using the x-coordinate of the FOE rather than the refined x position obtained from the intersecting vectors. As one can see from examining this table, all of the error measures are quite capable of determining the correct amount of rotation to within 0.01 radians. The plot represents the search space for an optimization algorithm that seeks the peak of the surface. The function is examined over a coarsely quantized subset of its range. The range is quantized using equal angles of the x and y axes of a polar coordinate sphere. This allows for the uniform treatment of cases where the z velocity is much less than the x or y velocities, or where the rotation is large and the FOE is not on the image. A finer search of the area showing the best results in the first search is then performed. This gives us a coarse-to-fine strategy. Figure 7 shows the FOE function for the de-rotated feature points, generated by the program from the same run whose results are in Table 1. Figure 8 shows a close-up of the function near the correct value. If a non-geometric check on correspondence is available, then the quality of correspondence can be evaluated. The possibility of no match could be included. Correspondence can be used with such equations as those of Tsai and Huang to extract rotation and translation parameters. If correspondence

can be verified with a non-geometrical test, then a hypothesize-and-test paradigm becomes possible. A preliminary correspondence of some points can be found. This can be used to determine some motion parameters. These parameters can then be used to establish a better correspondence for more points. The procedure can be iterated until a sufficiently satisfactory result ensues. If the process fails to converge, new starting values can be hypothesized from the data. A random sampling and consensus algorithm [FIS81, FIS82] may be useful here.

Real scenes

Since edge points seem to cause problems for the methods described in this paper [CPJ82], corner points were used for work with real scenes. The input consisted of a pair of 128 by 128 images quantized to 256 grey levels. The images contain a frame surrounding a cube. The observer is approaching the center of the cube with uniform velocity. A corner detector suggested by Kitchen and Rosenfeld [KIT80] was applied to the two scenes and corner points were extracted. All corner points within a certain fixed distance, determined by an estimate of the maximum allowed velocity (in this example 10 pixels), were considered as possible matching corner points. The squared grey-level difference of 5 by 5 windows was used as an error measure. To more accurately locate the actual matching point, a 3 by 3 neighborhood of the point in the previous frame was examined to find the lowest-error match. At most five of the best such matching corner points are recorded in an array, with the quality of the match, for each corner point in frame 2.
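The matching step just described can be sketched as follows. This is our own simplified reconstruction (the function name, window handling, and data layout are ours): for every corner in frame 2, corners of frame 1 inside the maximum-velocity radius are ranked by the sum of squared grey-level differences over 5 by 5 windows, and up to five candidates are kept.

```python
import numpy as np

def candidate_matches(img1, img2, corners1, corners2, max_disp=10, keep=5):
    """Rank, for each corner of frame 2, the corners of frame 1 within
    max_disp pixels by the SSD of 5x5 grey-level windows, keeping at
    most `keep` candidates. Corners are (row, col) tuples assumed to
    lie at least 2 pixels from the image border."""
    def window(img, p):
        r, c = p
        return img[r-2:r+3, c-2:c+3].astype(float)
    matches = {}
    for q in corners2:
        scored = []
        for p in corners1:
            if np.hypot(q[0] - p[0], q[1] - p[1]) > max_disp:
                continue                      # maximum-velocity constraint
            ssd = np.sum((window(img2, q) - window(img1, p))**2)
            scored.append((ssd, p))
        matches[q] = [p for _, p in sorted(scored, key=lambda t: t[0])[:keep]]
    return matches

# Demo: frame 2 is frame 1 shifted by (1, 2) pixels, so the corner at
# (10, 10) in frame 1 appears at (11, 12) in frame 2.
rng = np.random.default_rng(2)
img1 = rng.uniform(0.0, 255.0, (32, 32))
img2 = np.zeros_like(img1)
img2[1:, 2:] = img1[:-1, :-2]
matches = candidate_matches(img1, img2, [(10, 10), (20, 20)], [(11, 12)])
```

The distant corner (20, 20) is rejected by the velocity constraint, so the shifted corner is recovered as the only candidate.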

The space of possible FOEs is searched on a coarse 10 by 10 grid. For a given point in frame 2, only those possible matching points determined earlier which are not further away from the FOE in frame 1 are tested for approaching motion; for receding motion, only points which are not closer would be checked. Of these points, the point with the least difference in angle is selected, if there is such a point. A weighted sum of this angular error and the grey-value error measure would be better here. The chosen point is the best match consistent with the FOE. Running totals of angular error and grey-value errors are kept. An attempt was made to minimize the error by changing the FOE. Many choices of the FOE result in an identical correspondence of points in frame 1 and frame 2. These choices yield the same total error. The angular error values are not constant for these choices, but they are less reliable because of quantization errors. One could attempt to optimize one of these functions more accurately, but the results would be meaningless unless the disparity vectors are more precisely determined. With a correspondence of points in the two frames, a conventional approach to finding the FOE can be used, such as a least squares pseudo-intersection of the vectors. Because of the quantization error, such an answer is also no more accurate than the statement that the point is in a region near the center of the image (the image is 128 by 128 with center at (64,64)), between (40,40) and (60,60). One would obtain a high variance if such a method were applied. To position the FOE more accurately, the flow vectors must be determined to a higher degree of precision using interpolation techniques. Since our method gives an excellent correspondence of the points, the search costs of interpolation are greatly reduced. The more precise vectors can be intersected to find the FOE with more certainty. Figure 9 shows the disparity vectors

generated by this method for the real scene. Figure 10 shows the intersecting lines and gives an idea of the quantization errors in angle and the uncertainty of the position of the FOE. If the points extracted by this method are used with any version of Jain's algorithm, no reasonable answer results. For real scenes, Jain's algorithm [JAI82a] fails to deal with missing points and the addition of new points. It also ignores the invaluable information contained in the feature values returned by a corner detector, or contained in the actual grey-value window around the feature point. Quantization problems lead to uncertainty with some of the methods proposed in this paper when they are applied to real scenes. If a point moves one pixel, its direction of motion is only discernible to a precision of π/4 radians. To correctly distinguish rotation from translation and to precisely fix the FOE, information of much higher resolution is required. To some extent these problems may be ameliorated by the use of image sequences rather than image pairs. Such sequences can be fitted with a curve, and the curve used instead of the actual points in the calculations.

Conclusions

In this paper we have shown that methods that use preliminary or hypothesized motion parameters to find correspondence and better motion parameters are viable alternatives to the usual paradigm of attempting to obtain correspondence or optical flow first and then solving for motion parameters. This will be especially important in a system that has many sources of information that can generate hypothesized motion trajectories and integrate

these sources into the dynamic scene problem.

Acknowledgement

We gratefully acknowledge Richard Hill for his endless ideas, discussions, and editorial advice.

References

[CPJ82] Jerian, Charles. Determining Motion Parameters for Scenes with Translation and Rotation. Masters Thesis, Department of Computer Science, Wayne State University, Detroit, Michigan, 1982.

[DRE81a] Dreschler, L. and H.H. Nagel. "Volumetric Model and 3D-Trajectory of a Moving Car Derived from Monocular TV-Frame Sequences of a Street Scene." Proceedings IJCAI (1981), Vancouver, Canada.

[DRE81b] Dreschler, L. and H.H. Nagel. "On the Frame-to-Frame Correspondence between Greyvalue Characteristics in the Images of Moving Objects." Proceedings GI Workshop on Artificial Intelligence, Bad Honnef, Germany, 1981.

[FIS81] Fischler, M.A. and R.C. Bolles. "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography." CACM, Vol. 24(6), pp. 381-395 (June 1981).

[FIS82] Fischler, M.A., S.T. Barnard, R.C. Bolles, M. Lowry, L. Quam, G. Smith, and A. Witkin. "Modelling and Using Physical Constraints in Scene Analysis." AAAI-82, pp. 30-35.

[GLA81] Glazer. "A Simple Method for Determining Optical Flow." IJCAI-81.

[HIL80] Hill, Richard. "Determining Optical Flow." Masters Thesis, Computer Science Department, Wayne State University, 1980.

[HOR81] Horn, B.K.P. and B.G. Schunck. "Determining Optical Flow." Artificial Intelligence 17 (1981), pp. 185-203.

[JAI82a] Jain, R. "An Approach for the Direct Computation of the Focus of Expansion." PRIP-82, pp. 262-268.

[KIT80] Kitchen, L. and A. Rosenfeld. "Gray-level Corner Detection." TR-887, Computer Vision Laboratory, Computer Science Center, University of Maryland, College Park, MD 20742, April 1980.

[LAW82] Lawton, D.T. "Motion Analysis via Local Translational Processing." Workshop on Computer Vision: Representation and Control, pp. 59-72.

[MOR77] Moravec, H.P. "Towards Automatic Visual Obstacle Avoidance." Proceedings 5th IJCAI (August 1977), p. 584.

[MOR79] Moravec, H.P. "Visual Mapping by a Robot Rover." Proceedings 6th IJCAI (1979), pp. 598-600.

[MOR80] Moravec, H.P. Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover. Ph.D. Thesis, Department of Computer Science, Stanford University, 1980; also Stanford AI Lab Memo AIM-340; also Computer Science Department Report STAN-CS-80-813; also Robotics Institute, Carnegie-Mellon University, CMU-RI-TR-3, September 1980.

[NAG81a] Nagel, H.H. "On the Derivation of 3-D Rigid Point Configurations from Image Sequences." Proceedings IEEE Conference on Pattern Recognition and Image Processing (1981).

[NAG81b] Nagel, H.H. and B. Neumann. "On 3-D Reconstruction from Two Perspective Views." Proceedings IJCAI (1981).

[PRA79b] Prazdny, K. "Motion and Structure from Optical Flow." Proceedings 6th IJCAI (1979), pp. 702-704.

[PRA80] Prazdny, K. "Egomotion and Relative Depth Map from Optical Flow." Biological Cybernetics, 36 (1980), pp. 87-102.

[PRA81a] Prazdny, K. "Determining the Instantaneous Direction of Motion from Optical Flow Generated by a Curvilinearly Moving Observer." CGIP 17 (1981), pp. 238-248.

[PRA81b] Prazdny, K. "A Simple Method for Recovering Relative Depth Map in the Case of a Translating Sensor." IJCAI-81, pp. 698-699.

[TSA81] Tsai, R.Y. and T.S. Huang. "Estimating Three-Dimensional Motion Parameters of a Rigid Planar Patch." IEEE Transactions on ASSP, Vol. ASSP-29, No. 6 (December 1981), pp. 1147-1152.

[TSA82a] Tsai, R.Y. and T.S. Huang. "Uniqueness and Estimation of Three-Dimensional Motion Parameters of Rigid Objects with Curved Surfaces." PRIP-82, pp. 12-118.

[TSA82b] Tsai, R.Y. and T.S. Huang. "Estimating Three-Dimensional Motion Parameters of a Rigid Planar Patch." IEEE Transactions on ASSP, Vol. ASSP-30, No. 4 (August 1982), pp. 525-534.

[ULL79a] Ullman, S. The Interpretation of Visual Motion. Cambridge, Massachusetts: MIT Press, 1979.

[ULL79b] Ullman, S. "The Interpretation of Structure from Motion." Proc. Royal Soc. London, Vol. B, No. 203, pp. 405-426.

TABLE 1
Rotation about the y-axis: de-rotation results

Angle    FOE x    FOE y    Match angular error    X position    Variance    FOE variance
-.04     -.04     -1.00    9.2E-6                 .05           1.3E-2      1.4E-3
-.03      .04     -1.00    2.8E-6                 .03           5.0E-5      1.4E-4
-.02      .09     -1.00    6.6E-6                 .11           2.0E-2      2.0E-2
-.01      .18     -1.00    2.5E-5                 .18           1.6E-2      1.6E-2
-.00      .22     -1.00    4.1E-5                 .25           1.1E-1      1.1E-1
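The de-rotation experiment behind Table 1 can be sketched in code. This toy version is our own (the function names are ours; the spread measure is a least squares pseudo-intersection residual rather than the exact table measures): it recovers a hypothesized y-axis rotation by searching for the angle that makes the de-rotated flow lines intersect most tightly.

```python
import numpy as np

def derotate_y(pts, theta):
    """Undo a camera rotation of theta radians about the y-axis for
    image points at unit focal length (inverse of the forward mapping
    given earlier in the paper)."""
    X, Y = pts[:, 0], pts[:, 1]
    newx = np.tan(np.arctan(X) - theta)
    newy = Y * np.sqrt((1 + newx**2) / (1 + X**2))
    return np.column_stack([newx, newy])

def line_spread(pts1, pts2):
    """Residual of the least squares pseudo-intersection of the flow
    lines; near zero when all lines pass through a single FOE."""
    d = pts2 - pts1
    n = np.column_stack([-d[:, 1], d[:, 0]])          # normals to the lines
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    _, res, *_ = np.linalg.lstsq(n, (n * pts1).sum(axis=1), rcond=None)
    return float(res[0])

# Simulate pure z-translation (radial expansion from the FOE at (0,0))
# followed by a 0.05 radian rotation of the camera about the y-axis.
rng = np.random.default_rng(3)
pts1 = rng.uniform(-0.8, 0.8, size=(40, 2))
pts2 = derotate_y(1.15 * pts1, -0.05)                 # forward rotation

# Search hypothesized angles: the spread is smallest at the true angle.
angles = [-0.05, 0.0, 0.05, 0.1]
best = min(angles, key=lambda a: line_spread(pts1, derotate_y(pts2, a)))
```

At the true angle the de-rotated vectors intersect at a single point, so the residual drops to essentially zero, mirroring the minimum in the variance columns of the table.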

[Figure 1: Flow vectors for z translation]
[Figure 2: Lines extending flow vectors to meet at FOE]

[Figure 3: Flow vectors showing y-rotation and translation]
[Figure 4: Lines extending flow vectors for y-rotation and translation]

[Figure 5: Flow vectors for z-rotation and x and z translation]
[Figure 6: Lines extending z-rotated flow vectors showing circle pattern]

[Figure 7: FOE function for FOE at (0,-1)]
[Figure 8: Close-up of FOE function at (0,-1)]

[Figure 9: Flow vectors of cube scene]
[Figure 10: Lines extending flow vectors of cube scene]