Shape Recognition with Application to Medical Imaging

                                                                                             Salih Burak Gokturk


Abstract

The use of 3-D images of the human body has recently attracted attention as an alternative screening technique for many diseases. In order to use CT or MRI images for computer aided diagnosis applications, robust features that represent the volume need to be constructed and subsequently used by a classification method. In this paper, we present a computer aided diagnosis system for the early diagnosis of colon cancer. The system extracts features by a new 3-D pattern processing method and processes them using a support vector machines classifier. Our 3-D pattern processing method mimics the radiologist's way of viewing these images and combines information, using vector quantization, from many random triples of mutually orthogonal planes through the volume. Another contribution of this paper is the idea of a feedback framework between the classification algorithm and the features. This framework, called Distinctive Component Analysis, maps the features to a lower dimensional space where the two classes of objects are optimally separated, so as to obtain better features. Finally, we show that the combination of these better features with support vector machines classification provides a good recognition rate with an acceptable ratio of false positive detections.


1. Introduction

Colon cancer is the second leading cause of cancer deaths in the USA [1]. Previous research has shown that adenomatous polyps in the human colon have a high probability of developing into colorectal carcinoma [2]. Detection and removal of small polyps can totally eliminate the disease. Therefore, a cost-effective and patient-comfortable screening procedure is desirable in order to diagnose the disease at an earlier stage.

Computerized tomography (CT) is a promising, non-invasive procedure for screening [9], as long as automatic diagnosis software assists the radiologist, thereby reducing the cost of analysis. Automated polyp detection is a recent and growing area of research. The problem of identifying colonic polyps is very challenging because polyps come in various sizes and shapes, and because thickened folds and retained stool may mimic their shape and density. Figure 1 demonstrates the appearance of polyps and other tissue in a virtual colonoscopy study. Because of the similarities between the shapes of polyps and normal tissue, a method that automatically extracts the subtle characteristics differentiating polyps from healthy tissue is needed.

Fig. 1: Virtual endoscopic reconstructions from CTC data showing (a-c) true polyps, (d) a normal thickened fold, and (e) retained stool.

Initial studies concerning automated polyp detection have started from the intuitive observation that the shape of a polyp is similar to a hemisphere [4,5,7]. Due to their large number of false positive detections, all of these methods should be considered polyp candidate detectors rather than polyp detectors. This report presents a statistical method that captures the more subtle distinctions between polyps and normal tissue. The input to this stage is a set of small candidate volumes, which may or may not contain polypoid shapes; this set can be computed by one of the previous methods. Our novel volume processing technique generates shape signatures for each candidate volume using vector quantization. The signatures are then fed to a support vector machine (SVM) classifier for the final diagnosis of the volume. Support vector machines identify the elements, the so-called support vectors, that carry the distinguishing characteristics between the two classes in the training set. One way of explicitly obtaining these distinguishing features, in order to gain further intuition, is to look at 3-D renderings of the support vectors. Here, we propose an automatic way of obtaining these distinguishing features. In addition, and in clear contrast with previous statistical classification work, we present a framework that uses feedback from the classification algorithm in order to obtain the distinguishing features of the volume, by a new concept called Distinctive Component Analysis (DCA). Using DCA, we show that improved features can be attained for classification purposes. More importantly, DCA might be used to understand the differentiating characteristics between similar polyps and healthy tissue.

The paper is organized as follows: Section 2 explains our method in detail, including the new 3-D pattern processing technique, vector quantization, the support vector machine classifier, and distinctive component analysis. Section 3 describes the experimental setup and discusses our results. Section 4 gives our conclusions and possible directions for future work.


1.1. Previous Work

In order to process 3-D medical images, 3-D pattern processing methods are necessary. Among the most popular pattern processing methods in medical imaging are moment invariants, elliptic Fourier descriptors, the medial axis transform, histograms of shape features, curvature, splines, and AR modeling [23]. Computer aided detection of colonic polyps has recently been investigated by various researchers. In [4], Summers et al. computed the minimum, maximum, mean, and Gaussian curvatures at all points on the colon wall. In [22], Yoshida et al. use curvedness to distinguish polyps from healthy tissue. In [5], Paik et al. introduced a method based on the concept that normals to the colon surface intersect with neighboring normals depending on the local curvature features of the colon. In [7], Gokturk and Tomasi designed a method where a sphere is fit locally to the isodensity surface passing through every CT voxel in the wall region, and densely populated nearby sphere centers are considered as polyp candidates.

Vector quantization (VQ) is a well-studied area of research [11] with various applications in communications, data compression, and speech recognition. In this report, we apply vector quantization to obtain histograms of high-dimensional data. While doing so, we considered two types of VQ: VQ with the Euclidean distance and entropy-constrained VQ [11].

Different from previous work in classification, we aim to use feedback from the classifier function for the design of better features. We use Distinctive Component Analysis for this purpose. Similar to linear discriminant analysis (LDA) [24], Distinctive Component Analysis (DCA) aims to find a mapping to a lower dimension where the different classes are well separated. While doing so, the emphasis is on the distinctive components (the mapping function itself); thus we call this analysis DCA. In addition, we address the locality limitations of a linear transformation such as LDA by applying DCA to the local data points that carry the "good" information, i.e., the support vectors from the SVM analysis.

Blind source separation, or the unsupervised learning problem, is a challenging one. Clustering algorithms such as k-means [11], Gaussian mixture analysis [25], and independent component analysis [26] aim to solve this problem. These methods, however, are not designed for our purpose, i.e., finding the distinguishing components between the classes. Although DCA was designed for this specific purpose, we also give an algorithm to apply DCA to the unsupervised clustering problem.


2.  Our Method

Figure 2 gives the flow diagram of our system. Here, we propose not only a good selection of features and a statistical classification method, but also a framework that uses feedback from the classification method in order to obtain a better combination of these features. The system consists mainly of three components: a novel 3-D pattern recognition approach for feature creation, a support vector machine classifier as the statistical classifier, and distinctive component analysis as the feedback mechanism between them.

                                       

Fig. 2: Overview of the system.

The flow diagram of the new 3-D pattern processing method is given in Figure 3. Here, we mimic one of the ways radiologists view colon CT images. Instead of using 3-D renderings of the tissue, they prefer to view the images through three perpendicular planes aligned with the transaxial, sagittal, and coronal anatomical directions [3] (Figure 4). Viewing through these three main axes gives substantial information about the 3-D shape, but is incomplete by itself. In our approach, we first obtain a large number of random triples of mutually orthogonal planes through a suspicious subvolume. On each triple of planes, several geometric attributes are calculated to characterize the shape as seen through that particular triple. Finally, using vector quantization, we generate a histogram of the geometric attributes obtained from all triples as a feature vector representing the shape. Taking histograms of these geometric attributes over many random triples makes the resulting signatures invariant to rotations and translations. In addition, the right choice of geometric attributes gives robust signatures of the volume. More details on the volume signatures are given in Section 2.1.

                                   

Fig. 3: Overview of the 3-D pattern processing approach.

                                                              

Fig. 4: (a) Two different volume renderings of a polyp. (b-d) Three mutually orthogonal tomographic planes through the same volume.

Support vector machines are then trained with volume signatures computed from an initial set, and are subsequently used to classify new volumes into polyps and normal tissue. As discussed in Section 2.2, support vectors are essentially the vectors that carry the most distinguishing components between polyps and healthy tissue. We explicitly obtain these components by distinctive component analysis, as described in Section 2.3, and subsequently improve our features using these distinctive components.


2.1.  Pattern Processing Using Vector Quantization

First, each candidate volume is sliced with several triples of perpendicular planes, and shape and intensity features are computed in each plane. These features should capture representative information about the candidate shape. For this purpose, we fit primitive shapes such as a circle, a quadratic curve, a line, and parallel lines (Figure 5(c,d,e,g)) to the largest connected edge component, i.e., the boundary of the shape, and calculate the least squares residuals to these primitive shapes. In addition, moment invariants [10], as well as the intensity mean and variance of the planes, are calculated and recorded.

                                                           

Fig. 5: Illustration of the primitive shapes: (a) a randomly oriented plane, (b) the edges in the image, (c-e) circle, quadratic curve, and line fit to the edges shown in (b), (f) another randomly oriented plane with nearly parallel lines, (g) parallel lines fit to the edges of the plane in (f).

All the attributes mentioned above are calculated for each random triple of images. The three images in each triple are sorted in order of increasing radius of curvature, and the features above are listed in a vector in this order. The vector containing the attributes from the three planes represents the signature of the shape relative to that particular triple of perpendicular planes.

The features computed from each triple of perpendicular planes depend on the position and orientation of that particular triple. However, if histograms of feature distributions are computed from sufficiently many triples with random positions and orientations, the histograms themselves are essentially invariant to position and orientation. 
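For concreteness, one simple way to draw such a random triple is to take the rows of a uniformly random rotation matrix as the three mutually orthogonal plane normals; a minimal sketch, assuming numpy (the QR-based construction is our illustration, not necessarily the original implementation):

    import numpy as np

    def random_orthogonal_triple(rng):
        # QR factorization of a Gaussian matrix yields a random rotation;
        # fixing the signs by diag(R) makes the distribution uniform.
        Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
        Q *= np.sign(np.diag(R))
        return Q.T   # rows: three mutually orthogonal unit plane normals

    rng = np.random.default_rng(0)
    normals = random_orthogonal_triple(rng)   # draw one triple; repeat, e.g., 150 times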

Note, however, that creating histograms of N-vectors is problematic for large N. A uniform distribution of, say, m bins per dimension would result in histograms with m^N bins and is, therefore, not tractable. A more efficient solution, proposed here, represents a histogram by first computing representatives for the main clusters of features over a large collection of vectors. New feature vectors are then assigned to these clusters, rather than to fixed uniform bins. This method is called vector quantization and is described in more detail in [11]. Figure 6 gives an example in 2-dimensional space. Note that the storage gain grows exponentially as the number of dimensions increases beyond 2.

                                                                   

Fig. 6: Illustration of high-dimensional histogram bins by an example in 2-D. (a) Data in 2-D. (b) Uniform histogram bins. (c) Bins found when clustering is applied. The storage gain (number of bins) is five-fold in this particular example.

Let Xij be the n-vector obtained from the jth random triple of perpendicular planes extracted from the ith shape in a training set. Having obtained the Xij's from all of the shapes, the k-means algorithm is used to compute vector clusters. The cluster centers are initialized to a random selection from the Xij's. Subsequent iterations of the k-means algorithm then alternately reassign vectors to clusters and recompute cluster centers, eventually resulting in locally optimal cluster centers. These iterations can be written as the following pseudocode:

1- Start with an initial selection of codewords (we chose them randomly). Let Vi be the codewords (cluster centers).

2- Assign each data point x to the cluster i for which the distance measure d(x, Vi) is smallest among all clusters.

3- Update each cluster center Vi as the centroid of cluster i.

4- Repeat steps 2 and 3 until the change in the cluster centers is less than a threshold.

While iterating the k-means algorithm, we used two different distance measures d. One is the well-known Euclidean distance measure:

d(x, Vi) = ||x - Vi||^2

The other is the generalized Lloyd distance measure, in which an entropy constraint is considered:

d(x, Vi) = ||x - Vi||^2 + λ log(pi)

where pi is the probability of cluster i, estimated in practice as the portion of the training set that belongs to cluster i, and λ is a constant weighting the entropy term. This distance function penalizes clusters with high probability, thereby pushing the clusters toward equal numbers of elements.
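For concreteness, the following is a minimal sketch of this clustering step, assuming numpy; the attribute vectors are stacked row-wise into X, and the lam term implements the entropy penalty as reconstructed above (its exact form in the original implementation is an assumption):

    import numpy as np

    def kmeans_vq(X, k, lam=0.0, n_iter=100, tol=1e-6, seed=0):
        # X: (num_vectors, n) planar attribute vectors; k: number of codewords (e.g., 45).
        # lam = 0.0 gives plain Euclidean k-means; lam > 0 adds lam * log(pi),
        # which penalizes clusters that already hold a large share of the data.
        rng = np.random.default_rng(seed)
        V = X[rng.choice(len(X), size=k, replace=False)]   # step 1: random initial codewords
        p = np.full(k, 1.0 / k)                            # cluster probabilities pi
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # step 2: squared distances
            if lam > 0.0:
                d2 = d2 + lam * np.log(np.maximum(p, 1e-12))
            labels = d2.argmin(axis=1)
            V_new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else V[i]
                              for i in range(k)])          # step 3: recompute centroids
            p = np.bincount(labels, minlength=k) / len(X)
            if np.linalg.norm(V_new - V) < tol:            # step 4: stop when centers settle
                V = V_new
                break
            V = V_new
        return V, labels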

We will report results for both of these techniques. Once the representative histogram bin centers are determined, a histogram of the feature vectors belonging to each particular shape is calculated. When forming these feature histograms, the simplest strategy would be to have each vector increment a counter for the nearest cluster center. This method, however, is overly sensitive to the particular choice of clusters. We adopted a more robust solution, in which each feature vector partitions a unit vote into fractions that are inversely proportional to the vector's distances to all histogram bin centers. The histograms thus obtained, one per candidate volume, are the rotation and translation invariant shape signatures used for classification, as described in the next section.
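Before moving on, here is a minimal sketch of this fractional-vote histogram, assuming numpy and the codewords V from the clustering step; the inverse-distance weighting below is one straightforward reading of the scheme:

    import numpy as np

    def soft_histogram(X, V, eps=1e-9):
        # X: (num_vectors, n) attribute vectors of one shape; V: (k, n) bin centers.
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)  # (num_vectors, k)
        w = 1.0 / (d + eps)                 # inverse-distance weights
        w /= w.sum(axis=1, keepdims=True)   # each vector casts one unit vote in fractions
        return w.sum(axis=0)                # k-dimensional signature of the shape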

As one might guess, the performance of the system is closely related to how well the representative histogram bin centers are chosen. A clustering algorithm with a nearest-neighbor distortion measure (such as the k-means clustering algorithm) fails in some situations. Figure 7 gives such an example. In (a), let the square and diamond points belong to two different classes. Although this seems an easy case to distinguish, when the k-means algorithm is executed, the two cluster centers are obtained as shown by the circles in Figure 7(b); in other words, the two clusters are not correctly identified. We address this problem by extracting the distinctive components in Section 2.3. Before that, we explain how the shape signatures are used for classification in the next section.

                                                       

Fig. 7: Illustration of a specific case where a clustering method with a nearest-neighbor distortion measure would fail.

Given the histogram vectors for the shapes, histogram matching might require some normalization. More specifically, the relative entropies of each cluster might be considered. For this, the value of a cluster (histogram bin) in one of the shapes is normalized by the values of the same cluster among all of the shapes, aiming to extract the relative importance of that particular bin across all shapes. In the experiments section, we give results for both regular histograms and relative-entropy-normalized histograms.
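As an illustration, a hedged sketch of one such normalization, assuming numpy; the exact normalization used in the experiments may differ from this bin-wise rescaling:

    import numpy as np

    def normalize_bins_across_shapes(H):
        # H: (num_shapes, k), one histogram signature per candidate shape.
        # Each bin is rescaled by its total mass over all shapes, so a bin's value
        # reflects its relative importance for that shape among all shapes.
        col_mass = H.sum(axis=0, keepdims=True)
        return H / np.maximum(col_mass, 1e-12)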


2.2.  Support Vector Machine Classification 

Given a training set of representative vectors for polyps and healthy tissue, an optimum classifier is obtained, and subsequently used for the classification of new test data. A classifier learning algorithm takes this training set as input and produces a classifier as its output. In our problem, a training set is a collection of candidate volumes that a radiologist has individually labeled as polyps or non-polyps. The learned classifier is then a function that, for any new candidate volume, tells whether it is a polyp or not.

Proposed first by Vapnik, the SVM classifier aims to find the optimal differentiating hypersurface between the two classes in the training set [8]. Finding this hypersurface is a well-studied optimization problem [8,12,13] that maximizes the distance of the hypersurface to the margin points, the so-called support vectors. Support vectors are essentially the elements that carry the differentiating characteristics between polyps and non-polyps. In the classification process of a test vector x, only the support vectors need to be considered, and the following simple expression needs to be calculated:

d(x) = Σi Bi yi K(xi, x) + b                                                              (1)

where d(x) is the distance of the test vector x to the optimum hypersurface. The constants Bi and b are computed by the classifier-learning algorithm, the yi are the given labels, i.e., 1 for polyp and -1 for non-polyp, and the xi are the computed support vectors. K is called the kernel function, and is used to map the data from its original dimension to a higher dimension in which the data becomes linearly separable; see [8] for details. The sign of expression 1 gives the side of the hypersurface on which the test data falls. More generally, the distance given in expression 1 is used as an SVM score for the test case.

The most distinguishing property of SVMs is the ability to minimize the structural risk, given as the probability of misclassifying previously unseen data. Besides that, SVMs pack all the relevant information in the training set into a small number of support vectors and use only these vectors to classify new data. This makes support vectors very appropriate for an application such as polyp recognition, where we need an automatic way of distinguishing the differentiating characteristics of polyps and non-polyps.

More generally, using a learning method is necessary for a complex problem such as polyp recognition, where it is not quite possible to hand-pick the distinguishing components of the two classes. Using an SVM rather than hand-crafted classification heuristics exploits all of the information in the training set optimally and eliminates the guesswork from the task of defining appropriate discrimination criteria.
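For illustration, a minimal sketch of the scoring step in expression 1, assuming numpy; the exponential radial basis kernel form and the gamma parameter are assumptions:

    import numpy as np

    def svm_score(x, support_vectors, B, y, b, gamma=1.0):
        # Expression (1): d(x) = sum_i Bi * yi * K(xi, x) + b, with an
        # exponential radial basis kernel K(u, v) = exp(-gamma * ||u - v||).
        K = np.exp(-gamma * np.linalg.norm(support_vectors - x, axis=1))
        return float(B @ (y * K) + b)

    # sign(svm_score(...)) classifies the volume; the raw value is the SVM score.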


2.3. Distinctive Component Analysis

Support vector machines identify the support vectors, the data that carry the distinctive characteristics of both classes, and implicitly use this information in the classification process. Here, we have developed a feedback framework that uses the support vectors in order to explicitly obtain these distinctive components. More specifically, the support vectors belong to the similar shapes among the polyps and the healthy tissue. We aim to find the distinctive planar attributes of nearby support vectors, and to give further emphasis to the right combination of those planar attributes while testing a new case. In this section, we describe a new analytical approach for finding the right combination and choice of the feature vectors, called Distinctive Component Analysis (DCA).

First, recall the example where a traditional clustering method, i.e., k-means clustering, fails. Although it seems an easy case to distinguish, the k-means algorithm fails to cluster the diamond points and the square points correctly, as shown in Figure 7(b). In order to identify the two clusters correctly, we propose to extract the distinctive components of the two classes. In this particular example, projection of the data onto the line presented in Figure 7(c) extracts the distinctive components optimally; that is, finding this optimum line amounts to obtaining the distinctive components. We note that the given example is not a fair case for traditional clustering methods. However, our method of obtaining high-dimensional histograms of the data may involve similarly unfair cases, which still need to be solved.

Without loss of generality, let us consider the two-class clustering problem. We also assume that we have many instances of each class, which we treat as vectors in the original space. Distinctive Component Analysis amounts to obtaining the components of the instances of each class that distinguish them from the instances of the other class. Our objective is then to find a projection of the feature vectors into a lower dimensional domain where the projections belonging to the two classes become well separated.

Mathematically, this amounts to finding the best low-dimensional space, where the intra-class variation of the projected vectors is minimized while the inter-class variation is maximized. Without loss of generality, let us develop the model for projecting the data from its original dimension n onto one dimension (a line). Let x1i, i = 1,...,N1, be the instances of the first class and x2i, i = 1,...,N2, be the instances of the second class, where N1 and N2 are the numbers of instances in each class. Let xx1i and xx2i be the projections of these vectors onto the distinguishing line. That is, given an n×1 projection vector α,

xx1i = α^T x1i    and    xx2i = α^T x2i

The distinctive component is then determined by the value of the projection vector α for which the following cost function is minimized:

J(α) = Σi (xx1i - xx1)^2 + Σi (xx2i - xx2)^2 - µ1 Σi (xx1i - xx2)^2 - µ2 Σi (xx2i - xx1)^2

where xx1 and xx2 are the averages of the xx1i's and xx2i's, respectively, and µ1 and µ2 are constants. We observed that values of µ1 = µ2 ranging between 0.1 and 0.5 gave similar results. The optimization problem can also be written in the following form,

min over α:  α^T H α                                                              (Equation 2)

where H is given by:

H = A1 A1^T + A2 A2^T - µ1 B1 B1^T - µ2 B2 B2^T,
A1 = X1 - m1 1_1^T,   A2 = X2 - m2 1_2^T,   B1 = X1 - m2 1_1^T,   B2 = X2 - m1 1_2^T,
m1 = (1/N1) X1 1_1,   m2 = (1/N2) X2 1_2,

where X1 is the matrix with each column containing an x1i and, similarly, X2 is the matrix with each column containing an x2i, and 1_1 and 1_2 are column vectors of sizes N1 and N2 with all elements equal to 1. Observe that H is symmetric and thus has the same left and right eigenvectors,

H = U Λ U^T,

where Λ is the diagonal matrix of sorted eigenvalues and U is the matrix of eigenvectors of H. Assuming |α| = 1, Equation 2 attains its minimum when α = Un, where Un is the last eigenvector of H, that is, the eigenvector associated with the smallest eigenvalue.
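A minimal numerical sketch of this solution, assuming numpy and following the reconstruction of H above (the exact weighting of the µ terms is an assumption):

    import numpy as np

    def dca_direction(X1, X2, mu1=0.3, mu2=0.3):
        # X1: (n, N1) and X2: (n, N2), one instance per column.
        m1 = X1.mean(axis=1, keepdims=True)    # class means
        m2 = X2.mean(axis=1, keepdims=True)
        A1, A2 = X1 - m1, X2 - m2              # intra-class deviations
        B1, B2 = X1 - m2, X2 - m1              # deviations from the other class mean
        H = A1 @ A1.T + A2 @ A2.T - mu1 * (B1 @ B1.T) - mu2 * (B2 @ B2.T)
        w, U = np.linalg.eigh(H)               # eigenvalues in ascending order
        return U[:, 0]                         # alpha: eigenvector of the smallest eigenvalue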


One can generalize the given solution for the mapping onto the optimum line to mappings onto higher dimensions. An iterative analysis of the above system shows that the m-dimensional distinctive component consists of the mappings by the last m eigenvectors, i.e., the eigenvectors belonging to the smallest m eigenvalues of H. However, the eigenvectors other than Un are perpendicular to Un, so adding them is of limited practical importance, since Un already captures most of the distinctive information.

An analysis that identifies the distinctive components of the objects helps not only by distinguishing these important features, but also by eliminating the distracting features that a clustering algorithm would misinterpret. More importantly, it helps us interpret the features that distinguish between the two classes and design new systems using this information.

We note that, in its current form, DCA has a mathematical model similar to linear discriminant analysis [24] and extracts local (separable) components but not global components in a complex space of vectors, i.e., the space of polyps and healthy tissue. Thus, we propose to apply distinctive component analysis to small clusters of support vectors. This process is discussed more thoroughly in the experiments.

In this application, we do not apply DCA to the unsupervised clustering problem, but to the labeled training set. We have recently designed an iterative algorithm and applied DCA to the unsupervised clustering (blind source separation) problem. To keep this paper coherent, we report the algorithm and an experiment at the following link:

PLEASE FOLLOW THIS LINK TO SEE THE APPLICATION OF DCA IN CLUSTERING


3. Experiments 

We applied two sets of experiments to two different input datasets. In the first experiment, we show that the addition of DCA improves the results obtained by SVM alone. In the second experiment, we compare the performance of VQ with the Euclidean distance measure against the entropy-constrained distance measure and relative-entropy-normalized histograms, as well as the performance of a previous method.

Experiment 1

We used a data set consisting of small candidate volumes from the CT scans of subjects enrolled in our CT colonography study, comprising 30 known colonic polyps and 212 other regions (non-polyps) containing tissue from the healthy mucosal surface. These healthy tissues were all false positives obtained in previous work [5,7], and look quite like true polyps (some examples were shown in Figure 1). All the polyps had a principal radius greater than 5 mm. 150 random triples of perpendicular images were extracted from each candidate shape. A 24-vector was obtained for each triple, measuring the following features on the random planes: the best-fit circle's radius; the residues to the best-fit circle, line, and quadratic curve; quadratic shape invariants; moment invariants; the angle between parallel lines (if there are parallel lines in the image); and the total residue to the line fits in the pair of parallel lines. Please see [3] and [6] for a detailed description of these features. Having obtained a 24-vector for each triple of mutually orthogonal images, 45 clusters were used in k-means clustering, resulting in a 45-vector signature per candidate shape.

The resulting signature vectors were used as feature vectors by the support vector machine classification algorithm with exponential radial basis functions as kernels [13]. In our analysis, we refer to sensitivity as the fraction of detected polyps, and to specificity as the fraction of correctly classified normal tissue volumes among the non-polyp locations. We applied a 10-fold cross-validation study, dividing the polyps and the healthy tissue into 10 disjoint sets and conducting 10 experiments. In each experiment, one of the 10 sets was used as the test set and the rest as the training set. Of the 30 polyps, 24 were correctly classified; of the 212 healthy tissue volumes, 186 were correctly classified. The total performance of the system was 87%.

To examine the trade-off between sensitivity and specificity more quantitatively, we analyzed what happens when the zero crossing ("sign" function) in expression 1 is replaced by a level crossing. As the level is decreased, more true polyps are detected, but at the cost of more false positives. The percentage of true polyp detections versus false positive detections is given in Figure 8(a). The main objective of this work is to achieve an accuracy of 80% or more with an acceptable number of false positives (FPs). In previous work, comparable accuracy was obtained with about 100 FPs per colon for polyps of size 5 mm or greater; our approach reduces the false positive rate by 85% and 75% for accuracy levels of 85% and 95%, respectively, which reduces the radiologist's interpretation time by the same amount.
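A minimal sketch of this level-crossing sweep, assuming numpy arrays of SVM scores and labels (1 for polyp, -1 for non-polyp):

    import numpy as np

    def sensitivity_fp_curve(scores, labels, levels):
        # Replace the zero crossing of expression (1) with a level crossing:
        # lowering the level detects more polyps at the cost of more false positives.
        curve = []
        for t in levels:
            detected = scores >= t
            sensitivity = detected[labels == 1].mean()   # fraction of polyps found
            false_pos = int(detected[labels == -1].sum())
            curve.append((t, sensitivity, false_pos))
        return curve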

                                           

Fig. 8: (a) Results obtained by the support vector machine classifier. (b) Improvements obtained using further analysis by DCA.

The performance of the system can be further increased if the aforementioned problems with k-means clustering are resolved. For this, we propose to use Distinctive Component Analysis when the system is not certain about its decision. Having obtained the support vectors from the previous analysis, the margin between the support vectors and the optimum hyperplane defines the region where the decision of the system is suspicious for a test case. For such a case, we use the local distinguishing characteristics of the support vectors near that test case. More explicitly, we obtain the closest k (k=2 in our experiment) polyp and non-polyp support vectors to the test case, and apply distinctive component analysis to the geometric attribute vectors of the many triples of mutually orthogonal planes belonging to those support vector shapes. This way, a distinctive α vector is found between the support vector cases close to the test shape. The distinctive components (α's) have been very informative in interpreting the distinguishing planar geometric attributes between the support vectors belonging to polyps and those belonging to healthy tissue. For example, the distinctive components of the tissues given in the first example in Figure 9 are the angle between the parallel lines and the total fitting error to the parallel lines in the second plane (the plane where parallel lines are present), and the errors to the best fitting circle and quadratic in the first (more curvy) and second planes. A similar example is given in Figure 10, where the distinctive component takes the form of the intensity attributes in the images. We think that Distinctive Component Analysis gives valuable feedback even to the radiologists regarding these distinguishing planar attributes.
    

                                               

Fig. 9: (a) Rendering of a polyp (top) and rendering of a similar healthy tissue (bottom). (b) Examples of two mutually orthogonal planes (the ones that were used out of the triple to obtain the planar attributes). (c) Distinctive component. The first 12 attributes belong to the first (more curvy) image, and the next 12 attributes belong to the second image. Observe that the parallel line analysis in the second image and also the circle error and quadratic invariants are the distinctive components.

                                               

Fig. 10: (a) Rendering of a polyp (top) and rendering of a similar healthy tissue (bottom). (b) Examples of two mutually orthogonal planes (the ones that were used out of the triple to obtain the planar attributes). (c) Distinctive component. Observe that the intensity average and also the error to the best fitting line in the second image are the distinctive components.

The above-mentioned 10-fold study was repeated to evaluate Distinctive Component Analysis. For each experiment, if the test case falls between the support vectors (the suspicious region where expression 1 takes values in (-|w|, |w|)), further local analysis of the nearby support vector cases is conducted using DCA. This analysis is applied not to the histogram signatures, but to the triple-plane vectors (24-vectors of geometric attributes) of those support vector shapes, and results in an optimum mapping α between the geometric attribute vectors of the nearby support vectors. Finally, this mapping is applied to the geometric attribute vectors of the test case, and the location of the average mapping with respect to the mappings of the polyp and non-polyp nearby support vectors is used as a measure of polypness. The measure thus obtained is linearly combined with the support vector machines' measure obtained by expression 1, and the final decision regarding the test case is made.
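A hedged sketch of this rescoring step, assuming numpy; the polypness normalization and the linear weight w are our assumptions about the exact combination:

    import numpy as np

    def dca_polypness(test_triples, polyp_sv_triples, normal_sv_triples, alpha):
        # *_triples: (num_triples, 24) geometric attribute vectors; alpha from DCA.
        t = (test_triples @ alpha).mean()
        p = (polyp_sv_triples @ alpha).mean()
        n = (normal_sv_triples @ alpha).mean()
        # place the test mapping between the two support-vector means:
        # +1 at the polyp mean, -1 at the non-polyp mean (assumes p != n)
        return float(np.clip(2.0 * (t - n) / (p - n) - 1.0, -1.0, 1.0))

    # final score (assumption): combined = svm_d + w * dca_polypness(...),
    # with w chosen on the training set.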

The results of the same 10-fold cross-validation study with DCA are summarized in Figure 8(b). Observe that the addition of Distinctive Component Analysis improves the previous results by a considerable amount: the ratio of false positive detections is further reduced from 24% to 18% and from 33% to 29% for sensitivity levels of 90% and 100%, respectively.

Experiment 2

In this experiment, we used a dataset consisting of 48 patients. The 3-D CT data was preprocessed by the Hough Transform (HT) method described in [4]. This yielded a total of 31099 hits of score 3000 or above on the 48-patient dataset. For computational efficiency, we kept all of the polyp hits but subsampled the non-polyp (healthy tissue) hits, eventually resulting in 40 polyp hits (sizes from 2 mm to 15 mm in diameter) and 2219 non-polyp hits. Subvolumes of size 32x32x32 voxels centered at each Hough Transform hit were used as input candidate data for our system.


We applied our new 3-D pattern processing method to each shape in the candidate volume dataset. One hundred random triples of mutually perpendicular images were extracted from each candidate volume. Each triple yielded a 26-vector measuring the following features on the random planes: the best-fit circle's radius; the residues to the best-fit circle, line, and quadratic curve; quadratic shape invariants; moment invariants; the angle and distance between nearly parallel lines (if such a pair of lines is present in the image); and the total residue to the line fits in the pair of parallel lines. Finally, the high-dimensional histograms obtained by vector quantization resulted in a 45-dimensional vector representing each particular candidate volume. As in the first experiment, these vectors were used by the SVM classification algorithm with exponential radial basis functions as kernel functions.

We conducted a 10-fold cross-validation experiment to evaluate our system. First, we randomly divided the candidate volumes into 10 uniformly distributed sets of 10 polyps (with repetitions; the average overlap between sets is 1.8 polyps and the maximum overlap is 4 polyps) and 221 or 222 non-polyps (healthy tissue, without repetitions). In each of the 10 experiments, one set was used as the test set, and the remaining 30 polyps and 1997 (or 1998) non-polyps as the training set.

We obtained a sensitivity-specificity curve for each experiment, for both our method and the previous method, as given in Figure 11. We experimented with Euclidean-distance-based VQ, generalized entropy-constrained VQ, and relative-entropy-normalized histograms. The sensitivity-specificity curves from each experiment were then averaged to obtain a final sensitivity-specificity curve. Among these curves, the previous method is denoted by HT, while our methods are denoted as Regular_VQ (Euclidean distance based), Entropy_VQ (entropy-constrained VQ), and Relative Entropy Normalized (relative-entropy-constrained normalization on the histograms). Observe that Regular_VQ performs best among all methods. The entropy-constrained methods did not work as well. This shows that the features of polyps and non-polyps do not span a uniform distribution; constraining the clusters to have equal size and equal importance does not improve the results, but even makes them worse.

                                                       

Fig. 11: The sensitivity-specificity curves of various methods.

The results of our method are far better than the previous work (HT): for 100% sensitivity, the specificity is increased from 0.09 to 0.48. However, the improvements are not as large as in the first experiment. To interpret the reason, we analyzed the two polyps with the lowest scores more thoroughly. One of these polyps was 6.5 mm in diameter and located on a haustral fold. We observed that the HT hit belonging to this polyp was 6.7 mm away from the real polyp center as manually identified by the radiologist. Figure 12 illustrates examples of random planes taken around the HT hit. In this case, our method processed the nearby fold but not the polyp, resulting in a low score for the candidate volume. We observed a similar situation with the other low-scoring polyp, which was 3 mm in diameter; its HT hit was 3 mm away from the actual location. Thus, the low scores that some of the polyps received in our system were caused by the location mismatch from the preprocessing stage. Excluding these two polyps results in a specificity level of 0.68 at a sensitivity level of 100%.

                                               

Fig. 12: Illustration of a polyp (ellipse) that has an HT hit displaced from the actual location. As seen on all three images, the edge of the fold is much closer to the center of the image; thus, in this case, the fold was processed instead of the actual polyp.
 

The current implementation of the method has adjustable parameters that may be optimized. Currently, the random images are centered at a random displacement, uniformly distributed between -5 and 5 voxels, around the HT hit. As mentioned above, we observed that some of the HT hits were more than 5 voxels away from the real polyp center as identified by the radiologist, resulting in poor performance. On the other hand, examining a larger volume around the HT hit might impair the system by placing planes on nearby structures. This trade-off is under investigation.

 


4. Discussion and Conclusion

Computer aided diagnosis has recently attracted attention in medical imaging. In order to process 3-D medical images, 3-D pattern processing methods need to be developed. In this paper, we proposed a new 3-D pattern recognition approach that mimics the radiologists' way of viewing the images. We obtain planar attributes from many randomly oriented triples of mutually orthogonal planes and use vector quantization to obtain histograms of these planar attributes. The resulting histograms are invariant to rotation and translation, and are used as signatures to represent the volumes. We combine our signatures with support vector machines classification and improve the results obtained by the previous study on human colon cancer detection.

Another major contribution of this paper is the idea of a feedback framework between the support vector machines classification algorithm and the features. This framework, called Distinctive Component Analysis, maps the features to a lower dimensional space where the two classes of objects are optimally separated, so as to obtain better features. We apply Distinctive Component Analysis to obtain a better combination of planar attributes. This analysis has been informative in understanding the nature of the distinguishing characteristics between polyps and normal tissue that have similar shapes. Finally, we show that the combination of these better attributes with support vector machines classification provides a good recognition rate with a reasonable ratio of false positive detections.

There are many possible directions for future investigation. The new 3-D pattern recognition algorithm can be applied to the output of any high sensitivity algorithm to reduce the ratio of false positive detections. Thus, we would like to apply the new 3-D pattern recognition approach to other medical areas such as lung nodule detection. In addition, we would like to conduct more experiments with Distinctive Component Analysis. For this, any application area where two classes of similar objects need to be distinguished is appropriate. A nonlinear mathematical definition of DCA is also under investigation.


References

[1] Wingo P.J., Cancer Statistics, Ca Cancer Journal Clin, 1995; 45:8-30.

[2] Thoeni R.F., Laufer I. “Polyps and cancer,” Textbook of Gastrointestinal Radiology, Philadelphia: W.B. Saunders, 1994; 1160.       

[3] Gokturk S.B., Tomasi C., Acar B., Paik D., Beaulieu C.,  Napel S., “A Learning Method for Automated Polyp Detection”, to appear in the proceedings of Medical Image Computing and Computer-Assisted Intervention MICCAI’01, 2001.

[4] Summers R.M., Beaulieu C.F., Pusanik L.M., Malley J.D., Jeffrey R.B., Glazer D.I., Napel S., “Automated polyp detector for CT colonography: feasibility study,” Radiology ,216(1)284-90, 2000.

[5] Paik D.S., Beaulieu C.F., Jeffrey R.B., Jr., Karadi C.A., Napel S., “Detection of Polyps in CT Colonography: A Comparison of a Computer-Aided Detection Algorithm to 3D Visualization Methods,” Radiological Society of North America 85th Scientific Sessions, Chicago, November 1999.

[6] Gokturk S.B.,  Tomasi C., Acar B, Beaulieu CF, Paik D, Jeffrey RB, Yee J, Napel S, A Statistical 3D Pattern Processing Method For Computer Aided Detection of Polyps in CT Colonography, submitted to IEEE Transactions on Medical Imaging.

[7] Gokturk S.B., Tomasi C., “A graph method for the conservative detection of polyps in the colon,” 2nd International Symposium on Virtual Colonoscopy, Boston , October 2000.

[8] Vapnik V., Statistical Learning Theory, New York, 1998.

[9] Vining D.J., “Virtual colonoscopy,” Gastrointest. Endosc. Clin. N. Am., vol. 7(2), pp. 285-291, 1997.

[10] Hu M.K., “Visual pattern recognition by moment invariants,” IRE Transactions on Information Theory, vol. IT-8, pp. 179-187, 1962.

[11] A. Gersho and R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Press, 1992.

[12] Vapnik V., Estimation of Dependencies Based on Empirical Data, Springer-Verlag, New York, 1982.

[13] Burges C.J.C., “A tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and knowledge discovery, 1998

[14] Winawer SJ, Zauber AG, O'Brien MJ, Gottlieb LS, Sternberg SS, Stewart ET, Bond JH, Schapiro M, Panish JF, Waye JD, et al. The National Polyp Study. Design, methods, and characteristics of patients with newly diagnosed polyps. The National Polyp Study Workgroup. Cancer 1992; 70:1236-45.

[15] Vining DJ, Shifrin RY, Grishaw EK, Liu K, Gelfand DW. Virtual colonoscopy. Radiology 1994; 193(P):446.

[16] Fenlon HM, Clarke PD, Ferrucci JT. Virtual colonoscopy: imaging features with colonoscopic correlation. AJR Am J Roentgenology 1998; 170:1303-9.

[17] Hara AK, Johnson CD, Reed JE, Ahlquist DA, Nelson H, Ehman RL, McCollough CH, Ilstrup DM. Detection of colorectal polyps by computed tomographic colography: feasibility of a novel technique. Gastroenterology 1996; 110:284-90.

[18] Kay CL, Evangelou HA. A review of the technical and clinical aspects of virtual endoscopy. Endoscopy 1996; 28:768-775.

[19] Royster AP, Gupta AK, Fenlon HM, Ferrucci JT. Virtual colonoscopy: current status and future implications. Acad Radiology 1998; 5:282-8.

[20] Ezer N., Anarim E., Sankur B., A comparative study of moment invariants and Fourier descriptors in planar shape recognition, 7th Mediterranean Electrotechnical Conference, pp 242-245 vol.1, 1994

[21] Rubin GD, Beaulieu CF, Argiro V, Ringl H, Norbash AM, Feller JF, Dake MD, Jeffrey RB, Napel S, Perspective volume rendering of CT and MR images: applications for endoscopic imaging. Radiology 1996 May;199(2):321-30.

[22] H. Yoshida, Y. Masutani, P.M. MacEneaney, K. Doi, Y. Kim, A.H. Dachman, “Detection of colonic polyps in CT colonography based on geometric features,” Radiology, vol. 217(SS), pp. 582-582, November 2000.  

[23] A.H. Mir, M. Hanmandlu and S.N. Tandon, Description of shapes in CT images: the usefulness of time-series modeling techniques for identifying organs, IEEE EMBS Mag., vol. 18(1), pages:79-84, 1999.

[24] S. Balakrishnama, A. Ganapathiraju, and J. Picone, “Linear discriminant analysis for signal processing problems,” Proceedings of IEEE Southeastcon '99, pp. 78-81.

[25] Xinhua Zhuang, Yan Huang, K. Palaniappan, and Yunxin Zhao, “Gaussian mixture density modeling, decomposition, and applications,” IEEE Transactions on Image Processing, vol. 5, no. 9, pp. 1293-1302, Sept. 1996.

[26] Te-Won Lee, M.S. Lewicki, and T.J. Sejnowski, “ICA mixture models for unsupervised classification of non-Gaussian classes and automatic context switching in blind signal separation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1078-1089, Oct. 2000.