mias-irc-2005-rev-2 [2014/05/31 17:37] (current)
admin created
 +Assessing the Accuracy of Non-Rigid
 +Registration With and Without Ground Truth
 +R. S. Schestowitz^{1}, W. R. Crum^{2}, V. S. Petrovic^{1},
 +C. J. Twining^{1}, T. F. Cootes^{1} and C. J. Taylor^{1}
 +^{1}Imaging Science and Biomedical Engineering, University
 +of Manchester
 +Stopford Building, Oxford Road, Manchester M13 9PT,
 +United Kingdom
 +^{2}Centre for Medical Image Computing, Department of
 +Computer Science, University
 +College London, Gower Street, London WC1E 6BT, United Kingdom
 +We present two methods for assessing the performance of
 +non-rigid registration algorithms. One of them requires
 +ground-truth solutions, whereas the other does not need
 +any form of ground truth. The former method is based on
 +label overlap, which can be computed using Tanimoto's
 +formulation. The method which requires no ground truth
 +exploits the fact that, given a set of non-rigidly
 +registered images, a generative statistical appearance
 +model can be constructed. The quality of the model
 +depends on the quality of the registration, and can be
 +evaluated by comparing images sampled from it with the
 +original image set. We derive indices of model
 +specificity and generalisation, and show that they
 +demonstrate the loss of registration as a set of
 +correctly registered images is progressively perturbed.
 +We finally compare the two methods of assessment and
 +show that the latter method, which requires no ground
 +truth, is in fact more sensitive than the one that does.
 +Over the past few years, non-rigid registration (NRR)
 +has been used increasingly as a basis for medical image
 +analysis. Applications include structural analysis,
 +atlas matching and change analysis. Many different
 +approaches to NRR have been proposed, for registering
 +both pairs and groups of images. These differ in terms
 +of the objective function used to assess the degree of
 +mis-registration, the representation of spatial
 +deformation fields, and the approach to minimising the
 +mis-registration with respect to the deformations. The
 +problem is highly under-constrained and, given a set of
 +images to be registered, each approach will, in
 +general, give a different result. This leads to a
 +requirement for methods of assessing the quality of registration.
 +Here we outline two methods of assessment, one of
 +which requires ground-truth solutions to be provided a
 +priori while the other does not. We present
 +results which confirm that both methods are valid and
 +then calculate their sensitivities. We find
 +that the method which requires ground-truth solutions
 +is not as sensitive as the method which requires
 +nothing but the raw images and the corresponding
 +deformation fields, i.e. the registration.
 +The first method relies on the
 +existence of ground-truth data such as boundaries of
 +image structures, produced by manual markup of
 +distinguishable points. Once an image set has been registered,
 +the method measures the overlap between the annotated
 +structures, which indicates how good the
 +registration is.
 +Our second method assesses registration
 +without ground truth of any form. The approach involves
 +automatic construction of an appearance model from the
 +registered data, followed by evaluation of the quality of
 +that model using images synthesised from it. The quality of the
 +registration is tightly related to the quality of the
 +resulting model because the two tasks, namely model
 +construction and image registration, are essentially the
 +same. Both involve the identification of corresponding
 +points, also known as landmarks in the context of
 +model-building. Expressed differently, a registration
 +produces a dense set of correspondences, and a model
 +of appearance requires the images and these
 +correspondences in order to be built.
 +To put the validity of both methods to the test, we
 +assembled a set of 38 2-D MR images of the brain. Each
 +of these images was carefully annotated to identify
 +different compartments within the brain. These
 +anatomical compartments can be perceived as simplified
 +labels that faithfully define the structure of the brain. Our
 +first method of assessment uses the Tanimoto overlap
 +measure to calculate the degree to which labels across
 +the image set concur. In that respect, it exploits
 +ground truth, which has been identified by an expert,
 +to reason about registration quality.
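 +As a minimal sketch of the overlap computation, the Python function below pools the
 +Tanimoto ratio (intersection over union) across all labelled compartments of two
 +flattened label images. The function name, the pooling over labels and the use of 0
 +as the background label are assumptions made for illustration rather than details
 +stated in this paper.

```python
def tanimoto_overlap(labels_a, labels_b, background=0):
    """Pooled Tanimoto overlap of two flattened label images.

    For every non-background label, count the pixels carrying that label in
    both images (intersection) and in at least one image (union), then
    divide the summed intersections by the summed unions.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label images must contain the same number of pixels")
    labels = (set(labels_a) | set(labels_b)) - {background}
    intersection = 0
    union = 0
    for label in labels:
        for a, b in zip(labels_a, labels_b):
            in_a = a == label
            in_b = b == label
            intersection += in_a and in_b
            union += in_a or in_b
    # Two images with no labelled pixels at all agree trivially.
    return intersection / union if union else 1.0


# Hypothetical 6-pixel label images with two compartments (1 and 2).
image_a = [0, 1, 1, 2, 2, 0]
image_b = [0, 1, 2, 2, 2, 1]
overlap = tanimoto_overlap(image_a, image_b)  # 3 shared / 6 in the union = 0.5
```

 +A value of 1 means the annotated compartments coincide exactly after registration,
 +and the value falls towards 0 as they drift apart.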
 +The second method takes an entirely different approach.
 +It takes the results of a registration algorithm,
 +in which correspondences have been identified, and builds
 +an appearance model from the images and their
 +correspondences. From that model, many synthetic brain
 +images are generated. Vectorisation of these images
 +allows us to embed them in a
 +high-dimensional space. We can then compare the
 +cloud that these synthetic images form with the cloud
 +formed by the original image set, i.e. the set
 +from which the model has been built. Computing the
 +overlap between these clouds gives insight into the
 +quality of the registration. Simply put, it is a
 +model-fit evaluation paradigm. The better the registration,
 +the greater the overlap between those clouds will be.
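 +The comparison of the two clouds can be sketched as follows. Each image is flattened
 +into a vector, and the closeness of one cloud to the other is summarised by the mean
 +distance from each point to its nearest neighbour in the other cloud. Summarising
 +overlap by mean nearest-neighbour distance, the Euclidean default and all function
 +names are assumptions of this sketch.

```python
import math


def vectorise(image):
    """Flatten a 2-D image (a list of rows) into a single vector."""
    return [pixel for row in image for pixel in row]


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_nearest_distance(cloud_from, cloud_to, distance=euclidean):
    """Average, over cloud_from, of the distance to the closest point in cloud_to."""
    nearest = [min(distance(p, q) for q in cloud_to) for p in cloud_from]
    return sum(nearest) / len(nearest)


# Tiny hypothetical 2x2 "images"; real data would be full brain slices.
originals = [vectorise(img) for img in ([[0, 0], [0, 0]], [[4, 4], [4, 4]])]
synthetic = [vectorise(img) for img in ([[1, 0], [0, 0]],
                                        [[4, 4], [4, 1]],
                                        [[0, 0], [0, 0]])]

synthetic_to_original = mean_nearest_distance(synthetic, originals)
original_to_synthetic = mean_nearest_distance(originals, synthetic)
```

 +The two directions are not symmetric, so both are computed. Smaller values mean that
 +the clouds lie closer together, i.e. that they overlap more.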
 +To compute the overlap between two clouds of data, we have
 +devised measures that we refer to as Specificity and
 +Generalisation. The former tells how well the model
 +fits the data from which it was built, whereas the latter tells how
 +well the data fits its derived model. It is a
 +reciprocal relationship that 'locks' data to its
 +model and vice versa. We calculate Specificity and
 +Generalisation by measuring distances in this space. As we
 +seek a distance measure that is tolerant of slight anatomical differences,
 +we use the shuffle distance and compare
 +it against Euclidean distance. The shuffle distance compares
 +each point in one image with a larger corresponding region in
 +the other image and keeps the best fit, i.e. it matches the
 +two points whose difference is minimal.
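 +A minimal sketch of a shuffle distance in Python follows. For every pixel of the first
 +image it searches a square window around the same position in the second image and
 +keeps the smallest absolute intensity difference. The window shape, the averaging over
 +pixels and the `radius` parameter are assumptions chosen for illustration.

```python
def shuffle_distance(image_a, image_b, radius=1):
    """Mean, over the pixels of image_a, of the smallest absolute intensity
    difference found in a (2 * radius + 1)-wide square window of image_b
    centred on the same position (clipped at the image border)."""
    rows, cols = len(image_a), len(image_a[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += min(
                abs(image_a[r][c] - image_b[rr][cc])
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            )
    return total / (rows * cols)


# A bright pixel shifted by one position between two hypothetical images.
first = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
second = [[0, 0, 0],
          [0, 0, 9],
          [0, 0, 0]]

strict = shuffle_distance(first, second, radius=0)    # plain pixel-wise comparison
tolerant = shuffle_distance(first, second, radius=1)  # the one-pixel shift is forgiven
```

 +With `radius=0` the measure reduces to a mean absolute pixel-wise difference, whereas a
 +larger radius tolerates small displacements of structures. Note that this form is not
 +symmetric in its two arguments.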
 +Our assessment framework, by which we test both
 +methods, uses non-rigid registration, whereby many
 +degrees of freedom are involved in image
 +transformations. To systematically generate data on
 +which our hypotheses can be tested, we perturb the
 +brain data using clamped-plate splines, which are diffeomorphic. In the brain
 +data which we use, correspondences among images are taken to be
 +perfect, so perturbation can only degrade them. We wish
 +to show that, as the degree of perturbation increases,
 +both assessment methods report a corresponding loss of registration quality.
 +In an extensive batch of experiments we perturbed the
 +datasets at progressively increasing levels, which led
 +to well-understood mis-registration of the data. We
 +repeated these experiments 10 times to demonstrate that
 +both approaches to assessment are consistent and that the
 +results are unbiased. Having plotted the
 +measures of overlap for each extent of perturbation, we
 +see an approximately linear decrease in the amount of overlap
 +(Figure X). This means that, when a registration that agrees
 +with the ground truth is degraded, the overlap-based measure
 +detects it, and its response is
 +well-behaved, hence meaningful and reliable.
 +<Graphics file: ./Graphics/1.eps>
 +<Graphics file: ./Graphics/2.eps>
 +Figures X and Y. The measured quality of registration as perceived
 +by the overlap-based evaluation (left) and the model-based
 +evaluation (right).
 +We then undertake another assessment task, this time
 +using the method which does not make use of ground truth.
 +We observe very similar behaviour (Figure Y), which is
 +evidence that this method, too, is a reliable
 +way of assessing the degree of mis-registration or,
 +conversely, the quality of registration.
 +As a last step, we compare the
 +two methods, taking sensitivity as the most
 +important factor. Sensitivity reflects our
 +ability to confidently tell apart a good registration
 +from a worse one. The smaller the difference which can
 +be detected reliably, the more sensitive the method.
 +To calculate sensitivity, we compute the amount of
 +change in terms of mean pixel displacement, that is, the
 +deviation from the correct solution. We then
 +look at the corresponding differences in the value of the assessment measure, be it
 +overlap, Specificity or Generalisation. We must also
 +take account of the error bars, as
 +there is both an inter-instantiation error and a
 +measure-specific error; the two must be combined
 +carefully. The derivation of sensitivity can be
 +expressed as follows:
 +where X is... (TODO)
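 +Because the expression is not yet written out here, the Python sketch below should be
 +read only as one plausible form and not as the definition used for Figure Z. It assumes
 +that sensitivity is the change in a measure per unit of mean pixel displacement, divided
 +by a combined error, and that the two error sources are independent and can be combined
 +in quadrature. The function name, parameter names and numbers are hypothetical.

```python
import math


def sensitivity(measure_perturbed, measure_reference, mean_displacement,
                inter_instance_error, measure_error):
    """Change in an assessment measure per unit of mean pixel displacement,
    scaled by the combined error on that measure (assumed form)."""
    if mean_displacement <= 0:
        raise ValueError("mean pixel displacement must be positive")
    # Assumption: the two error sources are independent, so add in quadrature.
    combined_error = math.hypot(inter_instance_error, measure_error)
    if combined_error == 0:
        raise ValueError("combined error must be non-zero")
    slope = (measure_perturbed - measure_reference) / mean_displacement
    return abs(slope) / combined_error


# Illustrative numbers only, not experimental results: an overlap that falls
# from 0.90 to 0.70 at a mean displacement of 2 pixels.
example = sensitivity(0.70, 0.90, 2.0,
                      inter_instance_error=0.03, measure_error=0.04)
```

 +Under these assumptions a larger value means that a smaller deviation from the correct
 +solution can be told apart from noise in the measure.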
 +<Graphics file: ./Graphics/3.eps>
 +Figure Z. The sensitivity of registration assessment methods.
 +Note to self: exclude Gen.? Combined? Plots?
 +          -.
 +Figure Z suggests that, for almost any choice of
 +shuffle distance neighbourhood, the method which does
 +not require ground truth is more sensitive than the
 +method which depends on it. When the trends of these
 +curves are inspected closely, it can be observed that
 +they are approximately parallel, which implies that the two
 +methods are very closely correlated.
 +In summary, we have shown two valid methods for
 +assessing non-rigid registration. The methods are
 +correlated in practice, but the principles they build
 +upon are quite distinct, as are their prerequisites, if
 +any. Registration can be evaluated with or
 +without ground-truth annotation, and the behaviour of our
 +measures is consistent across distinct datasets,
 +well-behaved and sensitive. Both methods have been
 +successfully applied to the assessment of non-rigid
 +registration algorithms and both led to the
 +expected conclusions. That aspect of the work,
 +however, is beyond the scope of this paper.