Assessing the Accuracy of Non-Rigid Registration With and Without Ground Truth

R. S. Schestowitz^{1}, W. R. Crum^{2}, V. S. Petrovic^{1}, C. J. Twining^{1}, T. F. Cootes^{1} and C. J. Taylor^{1}

^{1}Imaging Science and Biomedical Engineering, University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PT, United Kingdom

^{2}Centre for Medical Image Computing, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom

Non-rigid registration (NRR) has, over the past few years, been used increasingly as a basis for medical image analysis. It has been proposed for registering both pairs and groups of images. The problem is highly under-constrained, and the many available algorithms will, in general, produce different results for the same set of images. We present two methods for assessing the performance of non-rigid registration algorithms. The first measures the overlap of ground-truth labels between the registered images using Tanimoto's formulation. The second assesses registration via the quality of a generative statistical appearance model constructed from the registered images, using the concepts of model specificity and generalisation, and requires no ground truth. We compare the two methods and show that they agree, with the second being the more sensitive.

The first method relies on the existence of ground-truth data, such as the boundaries of image structures produced by manual markup of distinguishable points. Having registered an image set, the method measures the overlap between the annotated structures, which indicates how good the registration was.

Our second method assesses registration without ground truth of any form. The approach automatically constructs an appearance model from the registered data and then evaluates the quality of that model using images it synthesises. The quality of a registration is tightly related to the quality of the resulting model, because the two tasks, model construction and image registration, are innately the same: both involve identifying corresponding points, known as landmarks in the context of model-building. Expressed differently, a registration produces a dense set of correspondences, and appearance models are built from the images together with these correspondences.

To test the validity of both methods, we assembled a set of 38 2-D MR images of the brain. Each image was carefully annotated to identify different compartments within the brain. These anatomical compartments can be viewed as simplified labels that faithfully define the structure of the brain. Our first method of assessment uses the Tanimoto overlap measure to calculate the degree to which labels agree across the image set. In that respect, it exploits ground truth, identified by an expert, to reason about registration quality.

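As a concrete illustration, the Tanimoto (Jaccard) overlap of two binary label masks can be sketched as follows. The function name and the toy masks are ours; the measure used in the paper aggregates overlaps across multiple labels and the whole registered image set:

```python
import numpy as np

def tanimoto_overlap(label_a, label_b):
    """Tanimoto (Jaccard) overlap between two binary label masks:
    |A intersect B| / |A union B|."""
    a = np.asarray(label_a, dtype=bool)
    b = np.asarray(label_b, dtype=bool)
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return intersection / union if union > 0 else 1.0

# Two toy 4x4 masks whose structures partially coincide.
a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True   # 4 pixels
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:4] = True   # 6 pixels
print(tanimoto_overlap(a, b))  # 4 shared pixels over a union of 6
```

A perfect registration of identical labels gives an overlap of 1; disjoint labels give 0.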
The second method takes an entirely different approach. It takes as input the results of a registration algorithm, in which correspondences have been identified, and builds an appearance model from the images and their correspondences. From that model, many synthetic brain images are derived. Vectorising these images allows us to embed them in a high-dimensional space. We can then compare the cloud that these synthetic images form with the cloud formed by the original image set from which the model was built. Computing the overlap between these clouds gives insight into the quality of the registration; simply put, it is a model-fit evaluation paradigm. The better the registration, the greater the overlap between the clouds.

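A minimal sketch of the model-building and synthesis step, assuming a linear (PCA) appearance model over vectorised images. The array shapes, mode count, and random stand-in data are illustrative only, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the registered image set: each row is one vectorised
# (flattened) image. Real data would be shape-normalised intensities;
# here we draw correlated random vectors purely for illustration.
n_images, n_pixels = 38, 100
training = rng.normal(size=(n_images, n_pixels)) @ rng.normal(size=(n_pixels, n_pixels)) * 0.1

# Build a linear (PCA) appearance model from the training cloud.
mean = training.mean(axis=0)
centred = training - mean
u, s, vt = np.linalg.svd(centred, full_matrices=False)
n_modes = 5
modes = vt[:n_modes]                      # principal modes of variation
sd = s[:n_modes] / np.sqrt(n_images - 1)  # per-mode standard deviations

# Synthesise new images by sampling mode weights from the model;
# each row of `synthetic` is one point in the high-dimensional space.
weights = rng.normal(size=(200, n_modes)) * sd
synthetic = mean + weights @ modes
print(synthetic.shape)  # (200, 100)
```

The `synthetic` rows form the model's cloud, which is then compared against the cloud of `training` rows.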
To compute the overlap between two clouds of data, we have devised measures that we refer to as Specificity and Generalisation. The former tells how well the model fits the data it was built from, whereas the latter tells how well the data fits the derived model. It is a reciprocal relationship that 'locks' data to its model and vice versa. We calculate Specificity and Generalisation by measuring distances in the embedding space. Since we seek a distance measure that is tolerant of slight anatomical differences, we use the shuffle distance, comparing it against the Euclidean distance. The shuffle distance compares each point in one image with a larger corresponding region in the other image and adopts the best fit, i.e. it matches the pair of points whose distance is minimal.

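The shuffle distance and the two cloud measures can be sketched as below. The function names are ours, `radius` sets the neighbourhood size (radius 0 reduces to a plain mean absolute difference, our stand-in for the Euclidean comparison), and the wrap-around borders of `np.roll` are a simplification a real implementation would handle properly:

```python
import numpy as np

def shuffle_distance(img_a, img_b, radius=1):
    """Mean, over pixels of img_a, of the minimum absolute intensity
    difference to any img_b pixel within a (2r+1)x(2r+1) window."""
    best = np.full(img_a.shape, np.inf)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(np.roll(img_b, dy, axis=0), dx, axis=1)
            best = np.minimum(best, np.abs(img_a - shifted))
    return best.mean()

def specificity(synthetic, training, dist):
    """Mean distance from each synthetic image to its nearest training image."""
    return np.mean([min(dist(s, t) for t in training) for s in synthetic])

def generalisation(synthetic, training, dist):
    """Mean distance from each training image to its nearest synthetic image."""
    return np.mean([min(dist(t, s) for s in synthetic) for t in training])

rng = np.random.default_rng(1)
train = [rng.random((8, 8)) for _ in range(4)]
synth = [rng.random((8, 8)) for _ in range(6)]
print(specificity(synth, train, shuffle_distance))
print(generalisation(synth, train, shuffle_distance))
```

Smaller values of either measure indicate a tighter fit between the model's cloud and the data's cloud.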
Our assessment framework, with which we test both methods, uses non-rigid registration, in which image transformations involve many degrees of freedom. To systematically generate data over which our hypotheses can be tested, we perturb the brain data using clamped-plate splines, which are diffeomorphic. In the brain data we use, the correspondences among images are taken to be perfect, so perturbation can only ever degrade them. We wish to show that as the degree of perturbation increases, the measures reported by our registration assessment methods degrade accordingly.

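The perturbation step can be illustrated with the sketch below. For simplicity we substitute a single Gaussian-bump displacement field with nearest-neighbour resampling for the paper's clamped-plate splines, so the warp is smooth and localised but only approximately invertible for small strengths; all names and parameters are ours:

```python
import numpy as np

def gaussian_warp(image, centre, strength, sigma):
    """Apply a smooth, localised displacement to `image`: pixels near
    `centre` are pulled by up to `strength` pixels, with the displacement
    decaying as a Gaussian of width `sigma`. A toy stand-in for a
    clamped-plate-spline perturbation; strength 0 is the identity."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    r2 = (ys - centre[0]) ** 2 + (xs - centre[1]) ** 2
    fall_off = np.exp(-r2 / (2 * sigma ** 2))
    # Pull each output pixel from a displaced source location
    # (nearest-neighbour lookup, clipped at the image border).
    src_y = np.clip(np.rint(ys - strength * fall_off), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(xs - strength * fall_off), 0, w - 1).astype(int)
    return image[src_y, src_x]

img = np.arange(64, dtype=float).reshape(8, 8)
warped = gaussian_warp(img, centre=(4, 4), strength=2.0, sigma=2.0)
print(warped.shape)  # (8, 8)
```

Increasing `strength` plays the role of the progressively larger perturbation levels used in the experiments.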
In an extensive batch of experiments, we perturbed the datasets at progressively increasing levels, producing well-understood mis-registrations of the data. We repeated these experiments 10 times to demonstrate that both approaches to assessment are consistent and that the results are unbiased. Plotting the measures of overlap for each perturbation extent, we see a roughly linear decrease in the amount of overlap (Figure X). This means that when a ground-truth-based registration is degraded, the overlap-based measure detects it, and its response is well-behaved, hence meaningful and reliable.

<Graphics file: ./Graphics/1.eps>
<Graphics file: ./Graphics/2.eps>

Figures X & Y. Registration quality as measured by the overlap-based evaluation (left) and the model-based evaluation (right).

We then undertook another assessment task, this time using the method that does not require ground truth. We observe very similar behaviour (Figure Y), which is evidence that this method, too, is a powerful and reliable way of assessing the degree of mis-registration, or, conversely, the quality of registration.

As a last step, we compare the two methods, identifying sensitivity as the most important factor. Sensitivity reflects our ability to confidently tell a good registration apart from a worse one: the slighter the difference that can be detected reliably, the more sensitive the method. To calculate sensitivity, we compute the amount of change in terms of mean pixel displacement, that is, deviation from the correct solution. We then look at the differences in the assessor's value, be it overlap, Specificity, or Generalisation. We must also take account of the error bars, as there is both an inter-instantiation error and a measure-specific error; the two must be composed carefully. The derivation of sensitivity can be expressed as follows:

placeholder

where X is... (TODO)

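One plausible reading of the prose above (the formal definition is left as a placeholder here) is to take the slope of the assessor value against mean pixel displacement and normalise it by the composed error bar. The function name and the toy numbers below are illustrative assumptions, not the paper's definition:

```python
import numpy as np

def sensitivity(displacements, values, errors):
    """Slope of the assessor value against mean pixel displacement,
    normalised by the mean error bar: larger values mean a smaller
    change in registration quality can be detected reliably."""
    slope = np.polyfit(displacements, values, 1)[0]
    return abs(slope) / np.mean(errors)

d = np.array([0.0, 1.0, 2.0, 3.0])       # mean pixel displacement
v = np.array([1.00, 0.90, 0.80, 0.70])   # e.g. mean overlap per level
e = np.array([0.02, 0.02, 0.02, 0.02])   # composed error bars
print(sensitivity(d, v, e))  # roughly 5: a 0.1 drop per pixel against 0.02 error
```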
<Graphics file: ./Graphics/3.eps>

Figure Z. The sensitivity of the registration assessment methods.

Figure Z suggests that, for essentially any choice of shuffle-distance neighbourhood, the method that does not require ground truth is more sensitive than the method that depends on it. When the trends of the curves are inspected closely, they are approximately parallel, which implies that the two methods are closely correlated.

In summary, we have demonstrated two valid methods for assessing non-rigid registration. The methods are correlated in practice, but the principles on which they build are quite separable, as are their prerequisites, if any. Registration can be evaluated with or without ground-truth annotation, and the behaviour of our measures is consistent across distinct datasets, well-behaved, and sensitive. Both methods have been successfully applied to the assessment of non-rigid registration algorithms, and both led to the expected conclusions; that aspect of the work, however, is beyond the scope of this paper.