Home

Process Overview
Introduction
What is PCA?
Previous Research

Implementation
Training Phase

Coefficients Phase
Recognition Phase

Other
Downloads

Research Paper
Credits
Contact Me
 
 


 


 

Training Phase

We wish to make the reader aware, before continuing, of the scope of our supervised system. In an attempt to limit the extent of the project so as to yield results in a timely manner, and to illustrate the potential success of a sign language recognition system, the decision was made to train only five (5) letters. We chose the vowels of the American English alphabet: A, E, I, O and U. Success of this limited system would provide the base work and confidence to extend the system to other letters. With this in mind we continue.

Code was developed to handle input of a dynamic number of training images even though our system specifically handled only 140 images. A two-dimensional vector data structure was used to hold these values. Once pixel data was saved into memory the average pixel values from the ensemble were evaluated. This average image was saved back to the system in a file called avg_varphi.PNM for visual reference. The following graphic shows this average image. Note the blurring effect representative of the composite of all signed letters comprising the ensemble.


Average image taken by summing all pixel values in our ensemble of 140 images and dividing by 140. This image data is later subtracted from the elements in the
ensemble to yield our final ensemble.

We recall that for PCA vision techniques to work optimally an ensemble is required to consist of the difference of values from the original images and the newly retrieved average values. Kriby and Sirovich remind us that it is in our interest for efficiency that we compute this new ensemble. Such an ensemble, consisting of caricatures, also aids considerably to reducing time when yielding the average covariance matrix. The resulting “true” ensemble is thus made up of the difference images. These difference images were output to file during execution for visual reference. All 140 of the images were output to file in the form A4_carac.PNM. The following image illustrates a sampling of original images with their corresponding caricatures. Note that a caricature can actually have negative values, which are just as important as positive values. In GIMP, however, negative pixel values are treated as background (black) so we do not “see” these values.


From left to right signed letters A, E, I, O, U with originals on top and caricatures on bottom. We see here the interesting parts to each letter showing as the lighter parts of the hand providing a good basis for our eigenvectors to return the most expressive features of the ensemble.

After the ensemble was completed the average covariance matrix needed to be computed. In order to extract eigenvectors and corresponding eigenvalues a symmetric square matrix must be used, and more specifically our needs required an average covariance matrix based from our ensemble. Attempts to calculate this matrix by methodical brute force were found to be computationally inefficient, and thus completely infeasible. An elegant solution was proposed and referenced earlier which brought computation time down to literally a few seconds. It took advantage of the mathematical property that the transpose of our ensemble multiplied by the ensemble results in a 3072 (rows) by 3072 (columns) matrix representing the average covariance matrix. Such a property exists as a result of already having taken the difference of our images in the ensemble. We eagerly exploited this realization to our benefit!

The construction of the covariance matrix in our code was handled very nicely by functions provided by the Newmat10 (NM10) library created by Robert Davies. The NM10 library is a full-feature matrix manipulation and calculation set for engineering and mathematics research. Actual construction of the covariance matrix was handled by creating a NM10 matrix of type Matrix and filling it with our values of our ensemble. Next we created another NM10 matrix of type SymmetricMatrix which would hold our calculated values for the covariance matrix. The SymmetricMatrix was filled with the result of the transpose of our ensemble matrix times the ensemble matrix. With the covariance matrix completed the next step was to perform eigen decomposition.

While NM10 provided functions to handle eigen decomposition the ultimate decision was made to not rely on them. This was done for a few reasons. The first reason was that we were unsure of how the results would be generated by NM10 and in what form they would be returned to us for further use. We judged it to be beneficial to be in greater control of our results and our outputs from functions. In addition, while we found this software to be flawless in handling basic matrix transpositions, it was also provided with no guarantee and an accompanying disclaimer that memory leaks could result from its use. Since our system was already heavily dependent upon memory usage we thought it to be unwise to use a library that could possibly act as a catalyst for hurting our execution. The final reason we opted not to use NM10 for our eigen decomposition was because we were aware of another library which provided similar calculations and carried none of the potential negatives the NM10 library posed.

The software package we opted to utilize instead was Numerical Recipes in C (NRIC) which is a commercial compilation of numerous mathematical functions—some of which handle eigen decomposition. The decomposition process for our needs was able to be separated into three unique NRIC functions which delivered good results. NRIC was used because of its reputation for reliability and for the control it provided during coding. The use of NRIC functions, however, required a conversion to be made from the NM10 average covariance matrix to a NRIC supported matrix.

In an attempt to appease users of both NM10 and NRIC, the NM10 library included matrix types that were thought to cooperate, at least initially, with basic NRIC functions. Disappointingly it was found that the NM10 supported types for NRIC were for an older version of NRIC functions so this was not an option. Instead we chose to handle the conversion manually by taking our NM10 covariance matrix and converting it to an NRIC supported matrix type. This was then passed to three successive NRIC functions: tred2, tqli, and eigsrt. These functions quite effectively handled the eigen decomposition process. It should be noted that we chose not to use the NRIC Jacobi function which would have performed eigen decomposition in one step. Jacobi is incredibly inefficient and time consuming.

Instead of Jacobi we approached our eigen decomposition problem in a more computationally efficient manner. We first performed a Householder reduction of the symmetric matrix (our average covariance) to reduce it to tridiagonal form which is the first step to easily extract eigenvectors. Then a tridiagonal shifting algorithm computed the real eigenvectors and eigenvalues of the matrix. A basic insertion sort algorithm was performed to sort the eigenvectors and eigenvalues.

The included NRIC function tred2 performed the Householder reduction while tqli performed the QL with implicit shifts. The computational benefit of using the QL implicit shift algorithm arises from the matrix property that any real matrix can be decomposed upon by using a reduced lower triangular version of the original covariance matrix. The tqli function returned a matrix that was 3072 by 3072 consisting of our eigenvectors and a column vector with 3072 elements corresponding to the eigenvalues. tqli is algorithmically complex when extracting eigenvectors in addition to corresponding eigenvalues and is O(n3). The NRIC function eigsrt cleaned up our data. Altogether the eigen decomposition ran for 12 to 13 hours on a Pentium III 800 MHz Microsoft Windows XP Professional laptop.

After eigen decomposition the sorted eigenvectors were output to separate files in a sub directory called eigenvectors. Within this folder all of the eigenvectors were saved in the format evec_4.TXT. The index reference corresponds to the importance of a given eigenvector. For example evec_0.TXT represented the eigenvector associated with the greatest eigenvalue, and evec_3.TXT represented the eigenvector associated with the fourth greatest eigenvalue. The eigenvalues were stored in only one file in the main directory as eigenvalues.TXT and the lines alternated between index number followed by the rank of the eigenvalue, and the actual eigenvalue.

The eigen decomposition of our average covariance matrix delivered useful results. Recall that eigenvalues are a mathematical way to identify which eigenvectors represent the most expressive features of our system. The following table illustrates the actual data generated from the execution for the 140 image ensemble.


Eigenvalue results for the test ensemble of 140 images. The calculated difference between rank 3 and rank 4 illustrates clearly the decision to use the top three corresponding eigenvectors for future investigation and calculations.

For any PCA system it’s important to locate the “step” at which a great difference occurs in the eigenvalue data. This information provides us with knowledge of how many eigenvectors are to be considered our primary eigenvectors, or our principal components. Clearly the table shows that the difference between rank 3 and 4 is quite large (by a factor of 10) and represents the “step” we are trying to identify. Interestingly a much larger difference existed between the 48th and 49th ranked eigenvalues which might encourage further investigation into whether using the first 48 eigenvectors would be a better option.

The ultimate decision was to continue with project development utilizing the first three eigenvectors and decide later if this was a satisfactory choice. Human interpretation was used to make this decision and no program was employed to automatically calculate which “step” in the eigenvalue data would be best. Adhering to the hope that the first three eigenvectors were satisfactory we moved into the second phase of the project with the hopes we could prove the first three eigenvectors were in fact sufficient. At the same time we would calculate the final pieces of information needed in order to classify and recognize test images.
 

 
 
 
Legal shiz and other things...

Omniasoft WebDev, Inc. © Copyright 2005.  All rights reserved graphics and some source code.  Some of the source code was written by people other than myself, so I don't know what type of claim to rights they have but no claim is made here.  Source code I have written may be downloaded, modified, and redistributed freely for any reason whatsoever, as long as my name "Art Geigel" is clearly accompanying the code and the URL to this website shows up.  My code is not provided under the GNU license so however you want to use the code you can feel free to (except for source that is clearly stated that is not mine).  If you do use the code, or if you use some of my research, I'd appreciate an email at
art@geigel.com but even that is up to you.  Have fun and enjoy!  And while you're reading this you should visit www.groundbooth.com!