|
|
Training Phase
We wish to make the
reader aware, before continuing, of the scope of our supervised system.
In an attempt to limit the extent of the project so as to yield results
in a timely manner, and to illustrate the potential success of a sign
language recognition system, the decision was made to train only five
(5) letters. We chose the vowels of the American English alphabet: A, E,
I, O and U. Success of this limited system would provide the base work
and confidence to extend the system to other letters. With this in mind
we continue.
Code was developed to handle input of a dynamic number of training
images even though our system specifically handled only 140 images. A
two-dimensional vector data structure was used to hold these values.
Once pixel data was saved into memory the average pixel values from the
ensemble were evaluated. This average image was saved back to the system
in a file called avg_varphi.PNM for visual reference. The following
graphic shows this average image. Note the blurring effect
representative of the composite of all signed letters comprising the
ensemble.

Average image taken by summing all
pixel values in our ensemble of 140 images and dividing by 140. This
image data is later subtracted from the elements in the
ensemble to yield our final ensemble.
We recall that for PCA
vision techniques to work optimally an ensemble is required to consist
of the difference of values from the original images and the newly
retrieved average values. Kriby and Sirovich remind us that it is in our
interest for efficiency that we compute this new ensemble. Such an
ensemble, consisting of caricatures, also aids considerably to reducing
time when yielding the average covariance matrix. The resulting “true”
ensemble is thus made up of the difference images. These difference
images were output to file during execution for visual reference. All
140 of the images were output to file in the form A4_carac.PNM. The
following image illustrates a sampling of original images with their
corresponding caricatures. Note that a caricature can actually have
negative values, which are just as important as positive values. In
GIMP, however, negative pixel values are treated as background (black)
so we do not “see” these values.

From left to
right signed letters A, E, I, O, U with originals on top and caricatures
on bottom. We see here the interesting parts to each letter showing as
the lighter parts of the hand providing a good basis for our
eigenvectors to return the most expressive features of the ensemble.
After the ensemble was
completed the average covariance matrix needed to be computed. In order
to extract eigenvectors and corresponding eigenvalues a symmetric square
matrix must be used, and more specifically our needs required an average
covariance matrix based from our ensemble. Attempts to calculate this
matrix by methodical brute force were found to be computationally
inefficient, and thus completely infeasible. An elegant solution was
proposed and referenced earlier which brought computation time down to
literally a few seconds. It took advantage of the mathematical property
that the transpose of our ensemble multiplied by the ensemble results in
a 3072 (rows) by 3072 (columns) matrix representing the average
covariance matrix. Such a property exists as a result of already having
taken the difference of our images in the ensemble. We eagerly exploited
this realization to our benefit!
The construction of the covariance matrix in our code was handled very
nicely by functions provided by the Newmat10 (NM10) library created by
Robert Davies. The NM10 library is a full-feature matrix manipulation
and calculation set for engineering and mathematics research. Actual
construction of the covariance matrix was handled by creating a NM10
matrix of type Matrix and filling it with our values of our ensemble.
Next we created another NM10 matrix of type SymmetricMatrix which would
hold our calculated values for the covariance matrix. The
SymmetricMatrix was filled with the result of the transpose of our
ensemble matrix times the ensemble matrix. With the covariance matrix
completed the next step was to perform eigen decomposition.
While NM10 provided functions to handle eigen decomposition the ultimate
decision was made to not rely on them. This was done for a few reasons.
The first reason was that we were unsure of how the results would be
generated by NM10 and in what form they would be returned to us for
further use. We judged it to be beneficial to be in greater control of
our results and our outputs from functions. In addition, while we found
this software to be flawless in handling basic matrix transpositions, it
was also provided with no guarantee and an accompanying disclaimer that
memory leaks could result from its use. Since our system was already
heavily dependent upon memory usage we thought it to be unwise to use a
library that could possibly act as a catalyst for hurting our execution.
The final reason we opted not to use NM10 for our eigen decomposition
was because we were aware of another library which provided similar
calculations and carried none of the potential negatives the NM10
library posed.
The software package we opted to utilize instead was Numerical Recipes
in C (NRIC) which is a commercial compilation of numerous mathematical
functions—some of which handle eigen decomposition. The decomposition
process for our needs was able to be separated into three unique NRIC
functions which delivered good results. NRIC was used because of its
reputation for reliability and for the control it provided during
coding. The use of NRIC functions, however, required a conversion to be
made from the NM10 average covariance matrix to a NRIC supported matrix.
In an attempt to appease users of both NM10 and NRIC, the NM10 library
included matrix types that were thought to cooperate, at least
initially, with basic NRIC functions. Disappointingly it was found that
the NM10 supported types for NRIC were for an older version of NRIC
functions so this was not an option. Instead we chose to handle the
conversion manually by taking our NM10 covariance matrix and converting
it to an NRIC supported matrix type. This was then passed to three
successive NRIC functions: tred2, tqli, and eigsrt. These functions
quite effectively handled the eigen decomposition process. It should be
noted that we chose not to use the NRIC Jacobi function which would have
performed eigen decomposition in one step. Jacobi is incredibly
inefficient and time consuming.
Instead of Jacobi we approached our eigen decomposition problem in a
more computationally efficient manner. We first performed a Householder
reduction of the symmetric matrix (our average covariance) to reduce it
to tridiagonal form which is the first step to easily extract
eigenvectors. Then a tridiagonal shifting algorithm computed the real
eigenvectors and eigenvalues of the matrix. A basic insertion sort
algorithm was performed to sort the eigenvectors and eigenvalues.
The included NRIC function tred2 performed the Householder reduction
while tqli performed the QL with implicit shifts. The computational
benefit of using the QL implicit shift algorithm arises from the matrix
property that any real matrix can be decomposed upon by using a reduced
lower triangular version of the original covariance matrix. The tqli
function returned a matrix that was 3072 by 3072 consisting of our
eigenvectors and a column vector with 3072 elements corresponding to the
eigenvalues. tqli is algorithmically complex when extracting
eigenvectors in addition to corresponding eigenvalues and is O(n3). The
NRIC function eigsrt cleaned up our data. Altogether the eigen
decomposition ran for 12 to 13 hours on a Pentium III 800 MHz Microsoft
Windows XP Professional laptop.
After eigen decomposition the sorted eigenvectors were output to
separate files in a sub directory called eigenvectors. Within this
folder all of the eigenvectors were saved in the format evec_4.TXT. The
index reference corresponds to the importance of a given eigenvector.
For example evec_0.TXT represented the eigenvector associated with the
greatest eigenvalue, and evec_3.TXT represented the eigenvector
associated with the fourth greatest eigenvalue. The eigenvalues were
stored in only one file in the main directory as eigenvalues.TXT and the
lines alternated between index number followed by the rank of the
eigenvalue, and the actual eigenvalue.
The eigen decomposition of our average covariance matrix delivered
useful results. Recall that eigenvalues are a mathematical way to
identify which eigenvectors represent the most expressive features of
our system. The following table illustrates the actual data generated
from the execution for the 140 image ensemble.

Eigenvalue
results for the test ensemble of 140 images. The calculated difference
between rank 3 and rank 4 illustrates clearly the decision to use the
top three corresponding eigenvectors for future investigation and
calculations.
For any PCA
system it’s important to locate the “step” at which a great difference
occurs in the eigenvalue data. This information provides us with
knowledge of how many eigenvectors are to be considered our primary
eigenvectors, or our principal components. Clearly the table shows that
the difference between rank 3 and 4 is quite large (by a factor of 10)
and represents the “step” we are trying to identify. Interestingly a
much larger difference existed between the 48th and 49th ranked
eigenvalues which might encourage further investigation into whether
using the first 48 eigenvectors would be a better option.
The ultimate decision was to continue with project development utilizing
the first three eigenvectors and decide later if this was a satisfactory
choice. Human interpretation was used to make this decision and no
program was employed to automatically calculate which “step” in the
eigenvalue data would be best. Adhering to the hope that the first three
eigenvectors were satisfactory we moved into the second phase of the
project with the hopes we could prove the first three eigenvectors were
in fact sufficient. At the same time we would calculate the final pieces
of information needed in order to classify and recognize test images.
|
|