RSD-TR.144 THE USE OF THE DEANZA IP6400 INAGE PROCESSOR FOR LOCAL INDOW OPERATIONS Edward J. Dp Nirwan Ansari Department of Electrical and Computer Engineering The University of- Michigan Ann Arbor, Michigan 48109-1L109 January 1L984 CENTER FOR ROBOTICS AND INTEGRATED MANUFACTURE Robot Systemis Division COLLEGE OF ENGINEERING

''A- i...

RSD-TR-1-84 ABSTRACT This report discusses the use of the DeAnza LP6400 image processor for high speed local window operations. A system overview of the image processor and its important features are irst introduced. Algorithms for linear and non-linear window operators are then presented.

TABLE OF CONTENTS 2.1. The DeAnza. IP6400............................................................................. 2 2.2. Digital Video Processor (DVP).............................................................. 3.1. 3X3 Window Operation....................................................................... 3.2. n X local window operation...............................................................12 3.3. 3X3 Median Filter.............................................................................. 3 3.4. n Xmn median flter...............................................................................19

RSD-TR-1-84. Introduction Most image processing operations take a large amount of memory space and cpu time. General purpose computers are not usually equipped with features necessary for high speed image processing operations. Nowadays, high speed image processing is desirable in many industrial and medical applications. The DeAnza 1P6400 is used for its special features that make real time processing feasible. This report will fist introduce an overview of the IP6400, and its most important option, the digital video processor (DVP), which provides facilities for various processing operations. Algorithms using the 1P6400 to perform general 3X3 local window operations, nXm local windowoperations, 3 X 3 median iltering, and n X m median filtering are then presented. 2. System Overvlew 2.1. The DeAnza E1P8400 The 1P6400 Series Image Array Processor contains many features which make it a powerful imaging system. It can be used solely as a display system which pirovides high resolution color, pseudo color, and multiple monochrome image display. Coupled with other options, real time image processing can be achieved using the 1P6400. Among the options acquired in our system are a video signal digitizer and control, alphanumeric overlay generator, graphic overlay channel, trackball cursor generator and control, joystick cursor generator and control, external synchronous source input, video output controller, and, most of all, the digital video processor arithmetic unit which makes window operations feasible. Our system also contains four 512x512 8-bit memory planes and four video cards with 8 bit DAC's. It is interfac-ed with a VAX 11/780 host computer running UNIX" (4.lbsd). Figure 1 shows an overview of the 1P6400 system.

RSD-TR-1-84 SYNC —> ALPHANUMERIC U N [B IMAGE IMAGE U REFRESH -,TI ITTARRAY S ~ |MEMORY PROCESSOR Ii~~~~~~~T Ii~I BYPASS DP BYASS Figure 1. Overview of the DeAnza IP64OO, 2.2. Digital Video Processor (DVP) The importance of the DVP is due to its ability to handle operations on a large amount of image data in one video frame time(33 ins). The data paths of the DVP are shown in Figure 2. The DVP has two arithmetic/logic units (ALU) which allows the conditional outputs from the test ALU to determine swapping of the two operation codes of the operational ALU. In other words, the results of a test on an individual pixel can be used to determine the operation ~~The block diagram is taken from ~~~~~~the160 rgamrsMnarr 3tenaiP40Iae rcso

RSD-TR-1-84 Bit A ~~Byte AUX L ~~~~~~~~~~of U ~~Shifter' Array Plane Mask chl A W Beit Bi ~~~~~~~~~~~~~~Plane B ~~~~~~~~~~~~~~~~~~~~Mask chi Ls B ~ ~ ~ ~ ~ ~~~~~~~~~~~~~~~Bt Write~~o Input ~~~~~~~~~~~~~~~~~~~~~~~~~Bit Ci ~~~~~~~~~~~~~Plane= ch2 Zero Byte~~~~Wit Enable Write~J Bi K2 ~~~~~~~~~~~~~~~~~~~~~-Enable Figure 2. Data Path of the DVP on that pixel through the operational ALU. InputsA. to- the ALUT are- dtr Ined by thInp t Data Path Registersv (ITNPDP) by me&-,an

RSD-TR-1-84 channels directly or through the Intensity Transformation Tables (ITT) under control of the Memory Port Control register (MEMPC). The INPDP also allows data acquisition from the camera A/D, and constant values from constant register or auxiliary input. Users can also make use of it to achieve compound operations such as 16-bit arithmetic operations. Under the control of the Output Data Paths and Shift Control register (ODPSH), output data from the ALU's can be shifted or rotated as either two independent 8-bit data values or a combined 16-bit data value to each image memory channel including the graphic overlay. The Bit Plane Mask register (BPM) controls the output writing to any single bit or combination of bits of each image memory channel. The Destination registers (DEST) may be used to scroll data output from the DVP to the target memories in multiples of 16 pixels. This is convenient for arithmetic operations on two memory channels. For arbitrary offsets, rather than restricted by multiples of 16 pixels, there are scroll registers associated with each memory to scroll, zoom, and selectively inhibit data from the source memories. Thus, arithmetic operations can readily be done by appropriate manipulation of these registers. The rest of the paper will discuss the philosophy of using the EP6400 to perform local window operations including median filtering. The mnemonics, virtual addresses and function of the DVP registers and other registers which are used for window operations are found in Appendix AN1. 3. Loca Window Operations'st image operation is the local window operation. These operations take a great deal of time if done on a general purpose computer because the operations are usually perfrme pixl b pixl. sin thVW40iaepocso%n a pedu h prto

RSD-TR-1-84 whole image (512x512). A DVP operation is a simple arithmetic or logic operation on an image through the DVP. A series of programs have been written to perform operations using the DVP. These programs mainly set up correct communication between the host computer and the P6400 by loading and synchronizing the registers and operations of the IP6400. Manual entries for the various programs can be found in Appendix B to G. 3.1. 3 X3 Window Operation A 3 X 3 local linear window operation on an image can be thought of as a convolution of a 3 X 3 window with the image. Consider the following weighted window: u w2 u W4 U% U6 As the window moves ross the image, the pixel value in the center of the window at that instant is replaced by the sum of the products of the window coefficients and the corresponding pixel values of the image. In a global sense, the operation can be perceived as the result of the sum of nine image planes, in which each plane is the scrolled'image multiplied by one of the window coefficients, i.e. wo1X (original image scrolled one pixel left and up) + u"X (original image scrolled one pixel up) + WLJX (original image scrolled one pixel right and up) + o4 X (original image scrolled one pixel left) + w.5X (original image) + uV6X (original image scrolled one pixel'right) + w" X (original image scrolled one pixel left and down) + tu X (original image scrolled one pixel down) +t X(original image scrolled one pixel right and down). The versa.. tility of the DVP comes into play for such operation. Each multiplication of one coefficient with the whole scrolled image takes only one video frame time (one DVP operation). The overall operation would take only nine video frame times if no modification is needed after the nine mentioned operations are completed. However, since image gray levels range from 0 to

RSD-TR-1-84 overflows. This timing calculation does not include interface timing between the 1P6400 and the host computer, which is usually small. To be exact timing for communication between the IP6400 and the host computer should also be considered. This timing depends on the speed of the host computer and the number of tasks being performed by the host computer. Another aspect that should be considered in using the DVP is that the two ALU's are unsigned 8-bit ALU's which are not capable of performing multiplication or division directly. In'order to get around this, the Intensity Transformation Table (ITT) associated with each memory plane must be employed as a look-up table multiplication. Algorithms that require logarithmic or linear tables can also be implemented. The logarithmic table has the advantage of taking care of both positive and negative values, but it loses one bit of accuracy because both positive and negative values are represented by 8-bit data. The linear table.is likely overloaded if the window coefficient is large. Hence, a scaling factor is necessary to avoid overloading the Intensity Transformation Table. The linear table is preferred in this operation since it can be implemented easier without the complication of the sign. To multiply a memory plane with a coefficient, the ITT associated with that plane is loaded with 256 bytes of data corresponding to values of the products of the absolute value of the coefficient and gray levels ranging from 0 to 256. The ITT is then used for indexing the pixel intensity.: For the 3 X3 window operation, the result can be accumulated in one memory plane by summing each of the nine scrolled images (scrolled according to each window position) through its associated ITT which was loaded with indices (products of the absolute value of the coefficient and the gray levels from 0 to 256) corresponding to gray levels from 0 to 256. If a coefficient is negative, the ALU performs subtraction. Synchronization is very crucial for programs that involve DVP operations. Any necessary calculations for the next DVP operation should be done in the host computer as soon as the APGO bit (initiating the DVP operation) of the Control and Status register (CSR, see Appendix A) has been set. Only when the APR bit (ready bit) of the Control and Status register becomes set (indicating the completion of a DVP operation), should the required 1P6400 regis

RSD-TR-1-84 The 3 X 3 window operation software is written in the C programming language using a VAX11/780 host computer running UNIX. Three memory planes of the IP6400 are used for the operations. The original image to be processed is assumed to be in memory plane 0, and the result is placed in memory plane 1. Memory plane 2 is used to hold the information of overflows and underflows. Both memory plane 1 and 2 should be cleared before the operation. The algorithm consists of the following steps: (1) read in nine window coefficients. (2) do step 3 to step 5 for each window coefficient (i.e., 9 times). (3) load ittO with indices corresponding to products of the scaled coefficient (each coefficient is divided by the scaling factor) and image gray levels ranging from 0 to 255. (4) set memory 2 as the input to the Operational ALU, and memories 1 and 0 as inputs to the test ALU. Note that data from memory 0 is scrolled accordingly before the input to the test ALU is set. (5) if a coefficient is negative, perform subtraction in the test ALU (memory 1 - memory 0), and place the result in memory 1 (accumulating the result in memory 1). At the same time, decrement the input of the operational ALU (decrement -memory 2 which stores underflows and overflows information), and place the result in memory 2. Otherwise, perform addition in the test ALU and increment the input of the operational ALU. Succes-. sive results are therefore accumulated in memory 1, while the information about overflows and underfiows is accumulated in memory 2. The result in memory 1 would be correct if there are no underflows or overflows and the scaling factor mentioned before is 1. If the indication that a pixel in memory 2 is an underfiow, the corresponding pixel in memory 1 (accumulated result) is actually a niegative number. But the ALU's of the DVP are unsigned and image gray level values cannot be negative. It is logical, to set to zero or take the absolute value of those pixels which are underflows. If a pixel is found to

RSD-TR-1-84 memory 2 which holds the overflow and underflow information should have a value of zero if there is no overflows or underflows. They should have values close to ff (hex) if there is underflows and values from 1 to 8 if there is overflows. >From this information overflows and underflows can be corrected using the swapping ability of the DVP's ALU's. The folowing steps are used for correcting the underflows and overflows: (6) do step 7 to step 8 twice. (7) place memory 1 as input to the operational ALU, and memory 2 as input to the test ALU. (8) perform addition (memory 2 + constant) in the test ALU. (a) zero underflows: let the constant be 80 (hex), if there is carry bit from the test ALU, set output of the operational ALU to zero (80 hex plus an underfow indicator of value close to ff hex would produce a carry bit), or otherwise, pass input of the operational ALU as the output (pixels not underfowed are passed unchanged). Place output of the test ALU back into memory 2 which now holds the sum of previous values and the constant. Place output of the opetional ALU in memory which now holds the result of having corrected underflows. set overflows: let the constant be 7f(hex), if there is a carry bit from the test ALU, set output of the operational ALU to ff(hex) (7f hex plus the sum of an overflow indicator with value from 1 to 8 and 80 hex (prev'iously done) would produce a carry), or otherwise, pass the input of the operational ALU as the output. Place output of the test ALU back in memory 2, and output of the operational ALU in memory 1 which now holds the correct result with both underflows and overflows taken care of. If complementing underflows (taking absolute values) is desired, step (8b) should be taken instead of (8a): (b) underflows: let the constant be 01(hex), perform complementation of the input of

RSD-TR-1-84 indicator would have value ff(hex), excessive underflows and overflows (more than once) are taken care of at the same time in the next step). Place output from the test ALU in memory 2, and the output from the operational ALU in memory 1 (which now holds the result with the underflows corrected). overflows: excessive underfows and overflows can be corrected like step (8a), except that the constant becomes fe(hex). Set output of the operational ALU to 255 if there is a cay bit from the test ALU, or otherwise pass the input unchanged (summing fe(hex), 1(hex) (previous operation) and the overflow indicator would not produce carry bit if and only if the indicator is zero, in other words, there is neither overflows nor underfows). Place output from the operational ALU in memory 1. Finally, the result in memory 1 should be scaled (multiplied) by the scaling factor. As mentioned earlier, the scaling factor is used to avoid overloading the ITT. Since partial results are accumulating in memory 1, severe truncation inaccuracy will result if the ITT is overloaded when it is loaded a a multiplication table. This truncation error would be accumulated and thus is u'ndesirable. As a rule of thumb, the scaling factor which is determined by' the user is usually chosen as the absolute value of the largest window coefficient provided that the (I ff,O)'F —- 1k(fe,O) -one pixel left and up (O,lf ti -- ff) f) -— unscrolled (1,1fe),,- --- -(O,lfe) -one pixel right and down (010) ~ ~ ~ ~ ~~~~~tlff,O)

RSD-TR-1-84 magnitude of the chosen coefficient is greater than one. It would be one if magnitudes of all the dow coefficients are never affected by this scaling factor. The final step is: (9) scale the result in memory 1 by the same scaling factor through the ITT of memory 1. Place the result back into memory 1. The final result is then found in memory 1. The video pixels are organized as a 512x512 array. Columns are numbered in the xdirection (from left to right) from 0 to 511. Rows are numbered in y.direction (from bottom to top) from 0 to 511. Pixel location (0, 511) is then the upper left hand corner of the video display. The 3X 3 window operation requires 9 different scrolling positions for the convolution. For example, the scrolling positions for wl (one pixel left and up) is scrx=lff(hex) and scry=O(hex) where scrx and scry are the scroil registers corresponding to x and y directions(see Figure 3). For 3X3 window operations the scrolling positions corresponding to the window positions wl to w9 would be as follows: window positions scrx (hex) scry (hex) wti l 1ff 0 _ w4 lff lff ws —- 0 1ff Ut 1 lff Ytt? lff Ife WA — 0 Ife t _ 1 Ilfe The overall 3X3 window operation takes n+3 DVP operations, where n is the number of non-zero window coefficients. There exists some inaccuracy using the IP6400 due to the roundoff errors caused by the Intensity Transformation Table and the fact that the ALU's of the DVP are only 8-bit ALU's. There is however a tremendous gain in speed. There is little visible done L o:..:l.: ___ t_n agneal puros computer

RSD-TR-1-84 3.2.1 n m local window operation The philosophy of using the IP6400 to perform n X m local linear window operation is very similar to that of 3X3 local window operation. In fact, all the necessary DVP operations are of the same type. It takes nm+3-z (z is the number of zero coefficients) DVP operations for a n X m local window operation, nm-z DVP operations for accumulating the sum of the products of each window coefficient and its correspondingly scrolled image, and 2 DVP operations (exactly the same operations described for 3X3 window operation) for correcting underflows and overflows, and 1 DVP operation to scale the result by the same scaling factor mentioned before. One point that should be mentioned here is the evaluation of the scrolling positions for different window size. Consider the following window: w(O,O) w(1,O)... w(m,O) w(O,Ol) * w(m w(O,)...... w(m,n) Since the upper left corner of the video display is positioned by scrx=O and scry=lff(hex), the scrolling positions (specified by scrx and scry registers assmociated with each memory channel, see Appendix A) can be related to the window coefficients by the following equations. For window position w(i,j): cryl = (scro+-2! +-j mask mas = Of e)

RSD-TR-1-84 i the logical operation AND The mask (lff, hex) is used to mask out bits that are not used for scrolling position (see Appendix A). Note that the way the scrolling positions is evaluated has restricted m and n to be odd integers. 3.3. 3 X 3 Medan Filter Using the IP6400 for two-dimensional median filtering is challenging. Median filtering is a non-linear window operator and consists of a great number of sorting operations[2l[3I. One of the best known properties of median filtering is that of noise reduction with edge intensity being preserved[3]. For a 3 X 3 window the median is the 5th largest pixel value within the window. The algorithm to determine whether a given pixel is a median value within a window is to count the number of other pixels within the window having values lem than, and the number equal to the specified pixel. If there are four pixels within the window having values less than a particular pixel, that pixel is the median of the window. If there are more than four pixels within the window having values less than the pixel, the pixel cannot be the median. When there are less than four pixels having values less than the particular pixel, that pixel can still be the median depending on how many pixels within the window have graY value equal to the pixel.. The conditions for a pixel being the median within the window can be summarized as follows: Conditions no. less minimum no. equal median 0 4 yes 1- 3 yes 2 2 yes 3 1 yes 4 0 yes 5 - no 6 - no 7 -_no 8 - no

RSD-TR-1I-84 Table Conditions for a pixel being the median within the winudow A program for 3 X3 median filtering using the IP6400 was written in the C programming language running UNIX on the host processor. The algorithm requires the four memory planes of the 1P6400. The original image to be processed is assumed to be in memory 0, and the final result is placed in memory 2. Memory 1 is used for comparison, and hence contains the same image. Memory 3 is used to accumulate the conditions for each pixel such that it is the possible median of its 3X3 window. The algorithm consists of the following steps: (1) copy original image in memory 0 to memory 1. (2) clear memory 2 and 3. (3) do step 4 to step 8 for each window position (9 times). (4) find the number of pixels within every 3X3 window that have values less than that specified by the window position. This takes eight comparisons. Accumulate this information in memory A3 (5) mask out the impossible choices that cannot be medians. (6) find the number of pixels within every 3X3 window that have values equal to that specified by the window position. This takes another eight comparisons. Accumulate this information with that of step 4 in memory 3. (7) Using the information in memory 3, possible pixels specified by that window position that are medians are determined. The result is accumulated in memory 2. (8) clear memory 3 so that it is ready to store the conditions for pixels specified by next window position as possible medians. The DVP must have the capability of comparison to carry out the above steps. Comparisons can be done through the two ALU's by performing one's complement subtraction in the test ALU whose carry output bit in turn determines the swapping of the two operation codes of the operational ALU. Comparisons are done m-emoryvplane by memory. plane. Consider'

RSD-TR-1-84 w(O,O) w(1,0) w(2,0 w(0,1) wl,1) w(2,1) w(0,2) w(1,2) w(2,2) where w(i,j) indicates window position Comparing pixels specified by w(0,0) with the rest of pixels specified by other window positions is done by comparing properly scrolled images corresponding to their relative positions to w(0,0) with pixels specified by w(0,0). For example, pixels specified by w(0,0) are compared to those of w(1,0) by comparing original image with the original image scrolled one pixel left, to those of w(l,1) with the image scrolled one pixel left and up, and so on. Since the upper left corner of the unscrolled image is positioned by scrx-O and scry=lffhex), the corresponding upper left corner of the image scrolled one pixel left is then positioned by scrx=lfhex) and scry=lff(hex) (see Figure 3). The scrolling positions for comparing window position w(k, 1) with respect to w(i, j) are related as follows: -cr:- (scro+(i-k)) ~ mask scry =(sRcryo+(l-Aj) 8mas9k where ser: and 8Cry are values for scroll registers: and y (see Appendix A) scrzo =0 scryo = fA hex) Mas9k = fA hex) is the logical operator AND For the sake of programming, scrolling positions with respect to the center of the window (assume the center positions: scrx=0,scry=lff) are first evaluated. Then scrolling positions for comparisons with respect to each specified position can be evaluated in a loop without concern

RSD-TR-1-84 fied by a window position can be found through the operation of the two ALU's as follows: (1) perform step 2 to step 3 for each window position within the window with respect to the specified window position (there are eight window positions with respect to the specified position, i.e. eight comparisons). (2) perform one's complement subtraction in the test ALU (C-D-1) with the inputs of the original image represented by D, and the scrolled image (different scrolling for each compared position) represented by C (see Figure 2). The test ALU would produce a carry bit if and only if C is greater than D, i.e. the compared pixel has value less than that specifed by the window position within its 3X3 neighborhood. (3) set memory plane 3, which stores conditions of pixels that are possible medians, as the input to the opetional ALU. Set up the operation codes such that the input is incremented if there is a carry bit from the test ALU, otherwise the input is passed unchanged. Accumulate the output in memory 3. After eight comparisons for each window position, memory plane 3 holds information on the number of compared pixels being less than the center pixel of the window. As discussed above, numbers ranging from 5 to 8 are impossible choices of medians. To mask out those impossible choices, the ALU's are employed again as follows: (1) add 7b(hex) to memory 3. Memory 3 consists of values ranging from 7b(hex) to 83(hex) in which 80(hex) to 83(hex) become impossible choices. (2) mask out 80(hex) to 83(hex). Note that adding 80(hex) to a number greater than or equal to 80(hex) would produce a carry bit. Therefore, add memory 3 to a constant (80 hex) in the test ALU. Set memory 3 as the input to the operational ALU. Pass the input unchanged if there is no carry bit from the test ALU. Otherwise, set output to zero. Memory 3 consists of zeroes and values from 7b to 7f corresponding to pixels in the'image that are possible medians. Next, the number of pixels having values equal to that specified by the

RSD-TR-1-84 (1) perform step 2 to step 3 for each window position within the window with respect to the specified window position. (2) perform one's complement subtraction (CD-1) in the test ALU with the inputs of the scrolled image (compared position) as C, and the unscrolled image (specified window position) as D. Thus, there is a carry out if and only if C is greater than D. Set REQ and RCC of the tstop (test ALU register) to 1. Hence, the conditional operation code of the operational ALU is carried out if and only if there is no carry out and C=D. But, there is no carry out from the test ALU if C is less than or equal to D. Thus, the conditional operation code of the operational ALU is carried if and only if C=D, i.e. pixels compared are equal. (3) set memory 3 as the input to the operational ALU. Increment the input of the operational ALU if the condition for the conditional operation code of the operational ALU is meort, i.e. compared pixels are equal. Otherwise, pass the input unchanged. Place the output of the operational ALU in memory 3. Memory 3 then contains conditions for corresponding pixels of the specified window position that are medians. As mentioned above, a pixel is a median if it meets the conditions listed in Table 1. Through the above operations, impossible choices have been masked out, and thus a pixel specified by the window position is a median if the corresponding pixel in memory 3 has a value greater than or equal to 7f. Finally, medians can be accumulated in memory 2 for each specified window position as follows: (1) set the operation code of the test ALU with addition (C+D), and the inputs of memory 3 (scrolled accord'ing to window position, see explanation below) as C, and a constant, 81(hex) as D. Hence, a carry bit occurs only if a pixel value in memory 3 is greater than or equal to 7f, in other words, the corresponding pixel in the original image is the median. (2) set the inputs of the operational ALU with memory 2 (accumulated result) as A, and the original imagte (scrolled accordingt to window position, see eXplanation below) as B. Pass B

RSD-TR-1-84 pass A unchanged. Thus, pixels determined to be medians for each window position are accumulated in memory 2. If a pixel is determined to be the median, the center of the neighborhood should be replaced by this pixel. For a specified window position, say w(O,O), a pixel determined to be the median should be scrolled one pixel right and down replacing the center pixel. That is why memory 3 and the original image mentioned above must be scrolled according to the speciried window position as inputs to the ALU's. These scrolling positions are related to their window positions by the following: For w(i,j): stcrzlsl = (s9cr'zo+ 1-i) & mas?k ocry[11 = (scryo-1+j) &Y mask where scrz(iJ and socry*1 are values of scroll register z and y (to specify a scrolled position) sRcrzo = 0 scryo = lfA hex) mask = lfA hex) & isP the logical operator AND The mask (1ff) is used for the same reason mentioned earlier. The 3 X 3 median filtering takes altogether 182 DVP operations. It requires 1 DVP operao. tion to preblank memory 2 and 3 and 1 DVP operation to copy the original image into memory 1. For each window poisition (there are nine) it requires 8 DVP operations to rnd, out the ante- D?oeatosfr,idn number- ofpxl haing- vauequl A2- DVopraton

RSD-TR-1-84 to mask out impossible choices of medians, 1 DVP operation to accumulate medians, and 1 DVP operation to clear memory 3. There is a tremendous gain in speed when compared to median filtering done on a general purpose computer. It takes only 144 comparisons between image planes (up to 512x512) for the above operation. If done on a general purpose computer, 8 comparisons for each pixel of a 512x512 image (2097162 comparisons)are required. Since the operation does not involve the ITT's, there is no loss of accuracy such as the truncation inaccu racy mentioned in Section 3.1 and 3.2 3.4. n X m median filter The algorithm for n X m (n is number of rows, m is number of columns) median filtering follows the same approach as of that 3X 3 median filtering. The median of the n X m window is the ((mX n) +1)th largest value within the window. Only slight modification has to be done 2 on the evaluation of the scrolling positions. For window position w(ij):. ecrz!il = (,scrzo+j!m -) 81 Mask scry[Al = (8cry~o-. n +j) & Mask 2 where ecrz4i and ocr*1 are values of the scroll registers (to specify a scrolled position) ocrzo = 0 scryo = ifA hex)

RSD-TR-1-84 i the logical operator AND The mask is used for the same reason mentioned earlier. Note that n and m must be odd numbers. Modification also has to be made to the constant added in memory 3 (step 1 of masking out impossible choices of the previous section) to mask out impossible choices so that conditions for following opertions to determine the medians remain the same. Since the constant added for a 3 X 3 window is 7b(hex), 80(hex)-5(hex), the constant added for. nXm window is then 80(hex( (nX m) +1)(hex). In this way, a pixel of an image is the median of its nXm 2 neighborhood if the corresponding pixel in memory 3, which holds conditions for possible medians, is greater than or equal to 7hex), the same conditions for 3X3 median filtering. The whole operation takes (2nm(nm+1)+2) DVP operations. It requires 2 DVP operations for copying the original image to memory 1, preblanking memory 2 and 3, and, for each window position (there are nm), 2(nm..l)+2 DVP operations for finding conditions(number less, mask out choices, number equal) of possible medians, 1 DVP operation for accumulating the result, and 1 DVP operation clearing memory 3 to set ready to store conditions of the next window position. It takes 2nm(nm.-1) comparisons to perform the filtering using the 1P6400 and 512x512x(nm-1) comparisons when using a general purpose computer.

RSD-TR-1-84 4. DIscusslon The advantage of using the IP6400 for window operations les in the capability of processing up to one megabyte of image data in a single video frame time. Various image processing operations can be implemented through the IP6400 to achieve high speed. The sobel operator (edge detector), for example, can be implemented using two 3X3 window operations, and one frame addition, a total of 25 DVP operations as follows (see Appendix B to G): (1) place original image in memory 0 (2) execute dwin33 -1 -2-1 0 0 0 1 2 1 2 a c (detect vertical edge, result is placed in memory 1) (3) execute dmove 1 3 (move result into memory 3) (4) execute dwin3_3 -1 0 1-2 0 2 -1 0 1 2 a c (detect horizontal edge, result is placed in memory 1) (5) exec'ute dadd 1 3 2 (add both horizontal and vertical edges and place result in memory 2) The Laplacian operator for edge enhancement can be performed in 8 DVP operations. Tremendous gain in speed is the major signaificance in using the [P6400 for various image processing operations. 5. References [1I IP6400 Programmer's Manual, DeAnza Systems, Inc., September, 1981. [21 T. S. Huang, G. T. Yang, and G. Y. Yang, "A Fast Two-dimensional Median Filtering Algorithm," IEEE Trans. Acoustt., Speech, Signal Processing, vol. ASSP. 27, pp.13-18, Feb 197g.

RSD-TR-1-84 [3 T. S. Huang Two-Dimenional Digital Signal Processing II New York: Springer, 1981.

RSD-TR-1- 84 Appendx A - P8400 regiters

RSD-~TR~ -6 MEMORY PORT CONTROL REGISTER Mnemonic IMEMPC, Virtual Address 117424 octal 1514131211109 8 7 6 5 4 3 2 1 0 p p p P v V v NOT USED 3 2 1 0 3 2 1 0 3 2 1 0 Memory thru ITU to video Memory thru ITU to array processor Memory thru ITU to optional port O disables and 1 enables as described. Bit Label Function 0-3 VO-V3 Enable data from the memory channels to pass through the ITU to video VO Memory Channel 0 thru ITU to video Vi Memory Channel 1 thru ITU to video V2 Memory Channel 2 thru ITU to video V3 Memory Channel 3 thru ITU to video 4-7 PO-P3 Enable data from the memory channels to pass through the ITU to the Array Processor P0 Memory Channel 0 th.ru ITU to Array Processor P1 Memory Channel I thru ITU to Array Processor P2 Memory Channel1 2 thru ITU to Array Processor P3 Memory Channel 3 thru ITU to Array Processor 8-11 I0-I3.Enable optional port I0 Memory Channel 0 thru ITU to- Optional Port 11I Memory Channel 1 thru ITU to Optional Port 12 Memory Channel 2 thru ITU to Optional Port 13 Memory Channel 3 thru ITU to Optional Port O disables and, 1 enables as described.

RSD-TR-1-8 4 Image Scroll and Zoom Registers - 12 Bits MNEMONIC VIRTUAL ADDRESS CHANNEL ISCRXO, ISCRYO 117460, 117462 octal 0 ISCRX1, ISCRY1 117464, 117466 octal 1 ISCRX2, ISCRY2 117470, 117472 octal 2 ISCRX3, ISCRY3 117474, 117476 octal 3 ISCRX4, ISCRY4 117514, 117516 octal 4 overlay X Scroll and Zoom Register 15I 14 13 12 1 110 9 7 6 5 4 3 2 1 0 __~ ~~~............. NOT USED BL BE SE E8 E7 E6 E5 E4 E3 E2 E1 EQ,, _..........,m,,,' Starting Element (Horizontal) Element Sisn Bit Scroll with Black Enable Horizontally _Scroll with Black Enable Vertically BIT LABEL FUNCTION 0-8 EO-E8 Specify starting element in horizontal direction 9 SE Element sign bit 0 Data zeroed after horizontal wrap around if BE is 1 1 Data zeroed before horizontal wrap around if BE is 1 10 BE Scroll with Black Enable Horizontally - Element 0 No da.ta i-s zeroed 1 Enable data -to be zeroed wi th horizontA.ai wrap around 11 E~L Scroll,with Black Enable Vertically Line. 0 No data zeroed

Y Scroll and Zoom Register 12 11 10 9 15 1 13 2 1110 9 8'7 6 [5 4 3 2 1 NOT USED DD Zi ZO SL E8 E7 E6 E5 E4 E3 E2 i_ ii _'........ Starting Line (Vertical) Line Sign Bit Zoom. __ Disable writing to image memory by the DVP BIT LABEL FUNCT I ON 0)8 LO-L8 Specify starting line in vertical directi to 0 octal is top to bottom) 9 SL Line sign bit O Data zeroed after vertical wrap if BL is 1(top part of video disc black) I Data zeroed before vertical wrap if BL is I (bottom part of video d is black) 10-11 ZO-Zi Zoom control bits Zi ZO o Ol1tol1(No Zoom) o l2tol1Zoom I O4tol1Zoomn 1 1 8toi1Zoom 12 DD Disable DVP writing to image memory channel 0- Enable writing to memory channel by DVP 1%- Disable

RSD-TR-1- 4 DESTINATION REGISTERS Mnemonic Virtual Address Function IDESTX 117420 Destination X - element IDESTY 117422 Destination Y- line Destination X Register 151 12 ~1110 9 8 -7 6 5 4 3 1 2 1 0 X X x x x NOT USEED 1 0. 8 7 6 5 4 NOT USED Starting Element Position hift Code for outout from digitizer A/D converter Bit Label Function 4-8 S4-S8 Starting Element Position in image memory for data. from the DVP. Must be multiples of 16. 10311 Si so Direction and number of bits to shift the output of the digitizers A/D converter before Bits zero on input' input to the array processor. the array process.C 6 bit 8 bit Digiltizer Digitize= 0 0 Normal no shift. Accept output 2 MSB's Nonce from the A/D converter. o 1 Shif t right 1 bit - divide 3 MSB's MSB AID converter output by 2 1 0 Shift left 1 bit multiply MSB3 LSB LISB AID Converter output by 2 1 1 Shift lef t 2 bits - multiply 2 LSBs 2 LSB

RSD-TR-1-8 4 Destination Y Register 15 9 8 7 6 5 4 3[2 1 y y y y y y y y NOT USED.8 7 6 5 4 3 2 1 Starting Line Position Bit Label Function 0-8 YO-Y8 Starting Line Position in image memory for d from DVP.

RSD-TR- l-8 4 DVP Bit Plane Mask Registers Mnemonic Virtual Address Function IBPMA 117440 Bit plane mask A IBP.MB 117442 Blt plane mask B 15 14. 13 12 11 10 9 8 7 6 5 4 3 2 0 M lS M MR M *M M bSMi M' -1 M M. M. H 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 t plane mask bits Bit o lane mask b:ts 0 - Do not write to bit plane (Disable) 1 - Write to bit plane (Enable) Bit Plane Mask Bits Label Register Memory channel 0-7 MO-M7 A 0 8-15 M8-M15 A 1 0-7 MO-M7 B 2 8-15 M8-M15 B 3

RSD-TR- 1- 84 Constants Register MNEMONIC - ICONST VIRTUAL ADDRESS 117444 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 O K K K K K K K K C C C C C C C c 7 6 5 4 3 2 1 O 7 6 5 4 3 2 1 O K1 Constant - Operation ALU R2 Constant - Test ALU Bit Label Function 0-7 CO-C7 K2 Constant - alternate input B of Test ALU 8-15 KO-K7 Kl Constant - alternate input D of Operational All values are positive and considered as intensities.*

RSD-TR-1-84 DVP Input Data Path MNEMONIC - INPDP VIRTUAL ADDRESS 117446 OCTAL 1514 13 1109 7 6 5 4 3 2 1 0O B B Z C C C T K D D ~~~~~0 XO O 1 0 I 1 1 0 0 1 1 0 S 2 1 0 A input B input C input D input memory memory memory memory channel channel channel channe 1 select select select select Auxiliary "AMS Camera A/D LS input Cnstant Constant Test ALU output Select D input Force LS to 0 D input control ~A tInD~ut Control B InDut Control C Input Control 0-Test ALU to shift 1-B -input to shif ter Operational ALUJ Test ALU INPUT Name ALU OUTPUT Name A Operational ALU MS Byte of SHIFTER B Operational ALU MS Byte of SHIFTER C Test ALU LS Byte of SHIFTER D Test ALU LS Byte of SHIFTER

RSD-TR-1-8 4 Output Data Paths and Shift Control Register MNEMONIC - IODPSH VIRTUAL ADDRESS 117454 OCTAL 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 o 0. O 0 O 0 V V R R D S S S NOT USED 3 2 1 0 U u U L 2 1 0 # of Bits and direction of shift Byte/Word mode Rotate (end around) or shift~(zero fill) Overflow (Left)-all bits set to 1 Selects memory channel: O-Lower byte of Shifter Output data path 1-Upper byte of Shifter Shift control bits Upper Byte - MS byte, output of Operational ALU Lower Byte - LS byte, output of Tet ALU (or BI selector) Bits Label Function 0-2 80-S2 Shift Code - specifies the direction and number of shifts S2 51 so Action

RSD-TR- 1- 84 Test ALU Operational Code Register MNEMONIC - ITSTOP VIRTUAL ADDRESS 117450 OCTAL 15 14 13 12.11 10 9 8 7 6 5 4 3 2 1 0 W, WI R R C C C C P P E E. C E C S S S S 3 2 O1 0 f r 1 o C __u 3 2 1 0 Condi tion Select one of code for 16 operations counter to be incremented Complement of Carry O-Carry, l-No Carry Force SO-S3 to 0 Select Arithmetic or if LSB of C input Logical Operation is 0 Determine one of 48 If CC is set (=1) operations to perform Force Bit 0 of data in shifter to 0 Inputs A = B condition code Write- eable region: Complement carry condition code anywhere, cursor defined, or overlay defined 2 bit condition code field for OP ALU code. Bit Label Function 0-3 SO-S3 Test ALU operational codes. See Table 1.2.11.F. 4 CC Complement of Carry

RSD-TR- 1- 84 Test Operational Code Table ARITHMETIC FUNCTIONS.Operational Code OutDut Based on Cm1 ement Cr INPUTS M CC S3 S2 S1 SO CC - 0 CC - C, D 0 X 0 0 O 0 C+ 1 C C, D O X O 0 0 1 (C or D) + (C or 1 C, D 0 X 0 0 1 0 (C or D). + 1 (C or D) C, D 0 X O 0 1 1 0 377 octal C, D 0 X 0 1 0 O C + (C and D) + 1 C + (C and D) C, D 0 X 0 1 O 1 (C or D)+(C and D) + 1 (C or D)+(C ane C, D 0 X 0 1 1 0 C - D C - D - 1 C, D O x O 1 1 1 C and D C and D -1 C, D 0 X 1 0 0 0 C + (C and D) + 1 C + (C and D) C, D 0 X 1 0 1 C + D + I C + D C, D 0 X 1 0 1 0 (C or D)+(C and D)+ 1 (C or D)+(C anc C, D 0 X 1 0 1 1 (C and D) (C and D) -1 C, D O X 1 1 O 0 C + C + 1 C + C *C, D 0 X 1 0 1 (C or D) + C + 1 (C or D) + c C, D O X 1 1 1 0 (C or D) + C + 1 (C or D) + C C,D 0 X 1 1 1 1 C C- 1 LOGICAL FUNCTIONS Operational Code Output Based on Complement Carry INPUTS M CC 83 82 S1 S0 (CC S DON'T CARE) C,D 1 X 0 0 0 C C,D 1 X O O O 1 C or D C, D 1 X O O 1 0 C and D C, D 1 X 0 0 1 1 0 C, D 1 O 0 1 0 (C and D C, D 1 X 0 1 0 1 D C, D 1 x 0 1 1 0 C exclusive or D C, D 1 X 0 1 1 1 C and D C, D I X 1 O 0 0 C or O C, D 1 X 1 0 0 1 C exclusive or D C, D 1 X 1 0 1. 0 D C, D 1 1 0 1 1 C and D C, D 1 X 1 1 0 0 377 octal, all bits set C, D 1 X 1 1 0 1 C or D C, D 1 X 1 1 0 C or D C, D 1 X 1 1 1 C Test Operational Code Table

RSD-TR-1-84 Operational Code Register MNEMONIC - IOPOP VIRTUAL ADDRESS 117452 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 r S. w M C | S S. S s W. C 3 2 1,0 w M,, 3 2 1 o OP ALU Codes Default OP ALU \Used if Test ALU' codes codes match Complement O-Force cari y Complement O-Force ca of carry 1-No carry of carry i-No carry O-Arithmetic O-Arithmetic Mode: 1-Logical Mode: 1-Logical Enable writing to memories Enable writing to memories Default OP ALU codes Conditional OP ALU codes Bits Label Function 0-3,8-11 SO-S3 Operational ALU operation codes 4,12 CC Complement of Force Carry 0 - Force Carry 1 - No Carry 5,13 w Enable writing to memory channels -36-__ _____

RSD-TR-1-8 4 Operational Code Table ARITHMETIC FUNCTIONS Operational Code Output Based on Complement Carry INPUTS M CC S3 S2 S1 SO CC O0 CC = 1 A, B 0 X O O 0 0 A + 1 A A, B 0 X 0 0 0 1 (A or B) + 1 (A or B) A, B 0 X 0 0 1 0 (A or B) + 1 (A or B) A, B O X 0 0 1 1 0 377 octal A, B 0 X 0 1 0 0 A + *(A and B) + 1 A + (A and B) A, B 0 X 0 1 0 1 (A or B)+(A and B) + 1 (A or B)+(A and B A, B O X 0 1 1 O A - B A - B - 1 A, B 0 X 0 1 1 1 A and B A and B -1 A, B 0 X 1 0 O 0 A + (A and B) + 1 A + (A and B) A, B O X 1 0 0 1 A + B + 1 A + B A, B 0 X 1 0 1 0 (A or B)+(A and B)+ I (A or B)+(A and B A, B O X 1 0 1 1 (A and' B) (A and B) -1 A, B O X I 1 O 0 A + A + 1 A + A A, O X 1 + A + 1 (A or (A_or) + A A, B O X 1 1 1 O (A or B) + A + 1 (A or B) + A A, B 1O X 1 1 i 1 A A - 1 LOGICAL FUNCTIONS Operational Code Output Based on Complement Carry (CC = DON'T CARE) INPUTS M CC S3 S2 S1 SO A, B I X O O O- O0 A A, B 1 X O O O 1 A or 3 A, B 1 X O O 1 O A and B A, B 1 X 0 0 1 1 0O A, B 1 X 0 l O O A ana B A, 8 1 1 1 B A, B 1 X 0 1 1 O A exclusive or B A, B I X O 1 1 1 A and B A, B 1 X I 0 O 0 A or, B A, B 1 X 1 O 0 1 A exclusive or B A, B l X 1 0 O B A, B I X 1 O 1 1 A and B A, B 1 X 1 10 0 377 octal, all bits set A, B 1 X 1 1 O 1 A or B A, B 1 X 11 O A or B A, B 1 X. 1 1 1 1 A........., iil, Il I J, n, 11ii-37-n'

RSD-TR-1-84 Appendx B 3 x 3 linear window operation

RSD-TR-1-84 Appendix ( B UNIX Programmer's Manual Appendix ( ~~~~~NAME~~i on dwin3_3 -'3 X 3 window operation SYNOPSIS (1) dwin33 Iwl w [w31 [w41 [w5j [w61 [w71 [w8J [w9 [Is] [a] Ic] (2) dwin3..3 [fJ datafile DESCRIPTION Three memory planes are used for this operation. The image to be processed should be in memory plane 0. The result is located in memory plane 1. Memory plane 2 is used for the overflow indicators. The parameters include the nine window coefficients, a scaling factor, underflow complement option, and memory 1 & 2 preblank option described as follows: wl to w window coefficients arranged in the order of wl to w3 as first row of the win dow, w4 to w6 as second row of the window, w7 to w9 as third row of the win dow. All coefficients must be numerical values. s scaling f -actor. It is a numerical value, usually selected to prevent overflow of the lookup table (ITT) of DeAnza. As a rule of thumb, when absolute values of some coefficients are larger than one, the absolute value of the largest coef ficient should be the scaling factor. This scaling factor scales down the coefficients for arithmetic manipulation in DVP of DeAnza, and then scales up the result. Ience, the scaling factor does not affect the coefficients on the whole window operation. Yet, it is necessary to prevent overflow of the ITT of DeAnza for arithmetic manipulation. a ThiS option -complements, instead of zeroes underflow. Default would zero underflow. c This option preblanks memory 1 and memory 2. Default would not preblank the memory planes. Parameters can be input from a datafile- by -typing f before the datafile. AUTIIOR N. Ansari (7/7/83) FILES / dev/deanza /usr/include/sys/iz.b SEE ALSO IP 6400 Programmer's Mlanual, DeAnza Syste

RSD-TR- 1- 84 Appendix C - nX near window operation

RSD-TR-1-8 4 Appendix ( C UNIX Programmer's Manual Appendix ( NAME dvwin - n'X n- window operation SYNOPSIS (1) dvwin [n [ml [w... mll [sl [a] [C] (2) dvwin If datafile DESCRIPTION Three memory planes are used for this operation. The image to be processed should be in memory plane 0. The result is located in memory plane 1. Memory plane 2 is used for the overflow indicators. The parameters include the size of the window, the window coefficients, a scaling factor, under. flow complement option, and memory 1 & 2 preblank option described as follows: n width, number of rows of the window. m length, number of columns of the window. wi to w[nm window coefficients arranged in the order of row by row basis. All coefficients must be numerical values. ~s scaling factor. It is a numerical value, usually selected to prevent overflow of the lookup table (ITT) of DeAnza. As a rule of thumb, when absolute values of some coefficients are larger than one, the absolute value of the largest coefficient should be the scaling factor. This scaling factor scales down the coefficients for arithmetic manipulation in DVP of DeAnza, and then scales up the result. Hence, the scaling factor does not affect the coefficients on the whole window operation. Yet, it is necessary to prevent overflow of the -ITT of DeAnza for arithmotic manipulati'on. a This option complements, instead of zeroes underfiow. Default would zero underflow. c This option preblanks memory 1 and memory 2. Default would not preblank the memory planes. Parameters can be input from a datafile by typing f before the datafile. AUTIIOR N. Ansari (7/14/83) FILES / dev/deanza /usr/include/sys/iz.h SEE ALSO IP 6400 Programmer's Manual, DeAnza Systems, Inc.

RSD-TR-1-84 Appendix D - n X m median filtering

RSD-TR-1-8 4 Appendix ( D UNIX Programmer's Manual Appendix ( NAME dvmedian~ -'Median filtering SY'NOPSIS dvmedian [nJ [mj DESCRIPTION dvmedian performs two-dimensional median filtering using the DeAnza IP6400. The default window size is 3X3. The maximum window size is 25X25. Four memory planes are used for this operation. The image to be processed should be in memory plane 0. The result is located in memory plane 2. Memory planes 1 and 3 are used for intermediate processes. The only parameters are to specify window size. ~n width, number of rows of the window. ~m length, number of columns of the window. AUTIIOR N. Ansari (7/23/83) FILES / dev /deanza /usr/include/sys/iz.h SEE ALSO [P 6400 Progra.mmer's Manual, DeAnza Systems, Inc.

RSD-TR-1- 8 4 Appendix E denhance

RSD-TR-1-8 4 Appendix ( E UNIX Programmer's Manual Appendix ( E NAME denhance- deblur a picture SYINOPSIS denhance DESCRIPTION denhance is a deanza version unsharp masking. It uses a Laplacian operator to deblur a picture. It is a 3x3 window operation with a weight of 5 at the center, -1 at the nearest neighbors, and a 0 at othr neighbors. The picture to be processed should be placed in memory plane 0, and the result is placed in memory plane 1. DIAGNOSTICS None AUTHOR N. Ansari (7/ 7/83) FILES /dev/deanza /usr/include/sys/iz.h SEE ALSO IP 6400 Programmer's Manual, DeAnza Systems, Inc.

RS!D- TX-1 - 8 4

RSD-TR-1-84 Appendix ( F UNIX Programmer's Manual Appendix ( NAMIE dadd - adds two images from Deanza. SYNOPSIS dadd chanl chan2 chan3 DESCRIPTION dadd will add the images stored in chanl and chan2 together.and stores the result in chan3. chanl, chan2, chan3 can be any of the four channels: 0,1,2 or 3. DIAGNOSTICS Bad flags are ignored. AUTHOR Nancy C am (3/08/83) FILES / dev/deanz-a BUGS No known bugs.

RSD-TR-1-8 4 ApPendix G - dmove

RSD-TR-1-8 4 Appendix (G) UNIX Programmer's Manual Appendix ( G NAME d move SYNOPSIS dmove (chanXj [chanYj DESCRIPTION This routine allows the user to move an image from chanX to chanY, where chanX and chanY can be from 0 to 3. DIAGNOSTICS Synopsis will be given upon detecting illegal parameters. AUTHOR N. Ansari (7/7/83) FILES /dev/deanza /usr/include/sys/iz.h SEE ALSO IP' 6-100 Programmer's Manual, DeAnza Systems, Inc.

UNIVERSITY OF MICHIGAN 3 9015 02519 6257