Assessing the Accuracy of Non-Rigid Registration With and Without Ground Truth

R. S. Schestowitz^{1}, W. R. Crum^{2}, V. S. Petrovic^{1}, C. J. Twining^{1}, T. F. Cootes^{1} and C. J. Taylor^{1}

^{1}Imaging Science and Biomedical Engineering, University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PT, United Kingdom

^{2}Centre for Medical Image Computing, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom

We present two methods for assessing the performance of non-rigid registration algorithms. One requires ground-truth solutions, whereas the other needs no ground truth of any form. The former is based on label overlap, which can be computed using Tanimoto's formulation. The method requiring no ground truth exploits the fact that, given a set of non-rigidly registered images, a generative statistical appearance model can be constructed. The quality of the model depends on the quality of the registration, and can be evaluated by comparing images sampled from it with the original image set. We derive indices of model specificity and generalisation, and show that they track the loss of registration as a set of correctly registered images is progressively perturbed. Finally, we compare the two methods of assessment and show that the latter, which requires no ground truth, is in fact the more sensitive of the two.

Over the past few years, non-rigid registration (NRR) has been used increasingly as a basis for medical image analysis. Applications include structural analysis, atlas matching and change analysis. Many different approaches to NRR have been proposed, for registering both pairs and groups of images. These differ in the objective function used to assess the degree of mis-registration, the representation of the spatial deformation fields, and the approach to minimising the mis-registration with respect to the deformations. The problem is highly under-constrained and, given a set of images to be registered, each approach will in general give a different result. This leads to a requirement for methods of assessing the quality of registration.

Here we outline two methods of assessment, one of which requires ground-truth solutions to be provided a priori, while the other does not. We present results confirming that both methods are valid, then proceed to calculate their sensitivities. We find that the method which requires ground-truth solutions is less sensitive than the method which requires nothing but the raw images and the corresponding deformation fields, i.e. the registration itself.

The first of the two methods relies on the existence of ground-truth data, such as the boundaries of image structures, produced by manual markup of distinguishable points. Having registered an image set, the method measures the overlap between the annotated structures, which indicates how good the registration was.

Our second method assesses registration without ground truth of any form. The approach involves the automatic construction of an appearance model from the registered data, then evaluating the quality of that model using images synthesised from it. The quality of the registration is tightly related to the quality of the resulting model; indeed, model construction and image registration are essentially the same task. Both involve identifying corresponding points, known as landmarks in the context of model-building. Put differently, a registration produces a dense set of correspondences, and appearance models are built from the images together with these correspondences.

To put the validity of both methods to the test, we assembled a set of 38 2-D MR images of the brain. Each image was carefully annotated to identify different compartments within the brain. These anatomical compartments can be viewed as simplified labels that faithfully define the structure of the brain. Our first method of assessment uses the Tanimoto overlap measure to calculate the degree to which labels agree across the image set. In that respect, it exploits ground truth, identified by an expert, to reason about registration quality.
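As a concrete illustration, the Tanimoto overlap of two binary label masks is their intersection divided by their union. The sketch below is our own (the function name and toy masks are not from the paper):

```python
import numpy as np

def tanimoto_overlap(label_a, label_b):
    """Tanimoto (Jaccard) overlap of two binary label masks:
    |A intersect B| / |A union B|.  1.0 means perfect agreement."""
    a = np.asarray(label_a, dtype=bool)
    b = np.asarray(label_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: define as perfect agreement
    return np.logical_and(a, b).sum() / union

# Toy example: two 4x4 masks sharing 4 labelled pixels out of 6 in total.
a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True   # 4 pixels
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:4] = True   # 6 pixels
print(tanimoto_overlap(a, b))  # 4/6 = 0.666...
```

In a group-wise setting such as the one described here, this score would be averaged over all label pairs (or all image pairs) in the registered set.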

The second method takes an entirely different approach. It takes the results of a registration algorithm, in which correspondences have been identified, and builds an appearance model from the images and their correspondences. From that model, many synthetic brain images are generated. Vectorising these images allows us to embed them in a high-dimensional space. We can then compare the cloud formed by the synthetic images with the cloud formed by the original image set, the set from which the model was built. Computing the overlap between these clouds gives insight into the quality of the registration. Simply put, it is a model-fit evaluation paradigm: the better the registration, the greater the overlap between the clouds.
To compute the overlap between two clouds of data, we have devised measures that we refer to as Specificity and Generalisability. The former tells how well the model fits the data it was built from, whereas the latter tells how well that data fits the derived model; this reciprocal relationship 'locks' data to its model and vice versa. We calculate Specificity and Generalisability by measuring distances in this space. As we seek a distance measure that is tolerant of slight anatomical differences, we use the shuffle distance, comparing it against the Euclidean distance. The shuffle distance compares each pixel in one image with a larger corresponding region in the other image, taking the best fit, i.e. the pair of pixels whose intensity difference is minimal.
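A minimal sketch of these ideas follows; it is our own simplification, not the paper's implementation. The shuffle distance with radius 0 reduces to a plain per-pixel absolute difference, and Specificity and Generalisability are taken here as average nearest-neighbour distances between the synthetic and original clouds (edge handling via wrap-around shifts is a shortcut):

```python
import numpy as np

def shuffle_distance(img_a, img_b, radius=1):
    """For each pixel of img_a, take the minimum absolute intensity
    difference over a (2*radius+1)^2 window of img_b, then average.
    radius=0 reduces to the mean absolute per-pixel difference.
    (Simplification: shifts wrap around image edges via np.roll.)"""
    best = np.full(img_a.shape, np.inf)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(img_b, (dy, dx), axis=(0, 1))
            best = np.minimum(best, np.abs(img_a - shifted))
    return best.mean()

def specificity_generalisation(synthetic, originals, radius=1):
    """Specificity: mean distance from each model-sampled image to its
    nearest original.  Generalisation: mean distance from each original
    to its nearest model-sampled image.  Lower is better for both."""
    d = np.array([[shuffle_distance(s, t, radius) for t in originals]
                  for s in synthetic])
    return d.min(axis=1).mean(), d.min(axis=0).mean()
```

If the model's samples reproduce the original images exactly, both indices are zero; as registration degrades, the synthetic cloud drifts away from the original cloud and both indices grow.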

Our assessment framework, with which we test both methods, uses non-rigid registration, in which image transformations involve many degrees of freedom. To generate data systematically, over which our hypotheses can be tested, we perturb the brain data using clamped-plate splines, which are diffeomorphic. In the brain data we use, the correspondences among images are taken to be perfect, so they can only ever be degraded. We wish to show that as the degree of perturbation increases, the measures produced by our assessment methods degrade accordingly.
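To make the perturbation step concrete, here is a toy stand-in of our own, not the paper's clamped-plate-spline code: a single smooth Gaussian displacement bump whose amplitude plays the role of the perturbation level.

```python
import numpy as np

def bump_warp(image, centre, amplitude, sigma=3.0):
    """Resample an image through a smooth radial displacement bump.
    amplitude (in pixels) controls how far correspondences are pushed
    from the identity; amplitude=0 returns the image unchanged.
    (Illustrative stand-in for a clamped-plate-spline perturbation.)"""
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    cy, cx = centre
    g = amplitude * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2)
                           / (2.0 * sigma ** 2))
    # Nearest-neighbour resampling at the displaced coordinates
    # (bilinear interpolation would be smoother).
    sy = np.clip(np.round(yy + g).astype(int), 0, h - 1)
    sx = np.clip(np.round(xx + g).astype(int), 0, w - 1)
    return image[sy, sx]
```

Sweeping the amplitude over increasing values yields a family of progressively mis-registered image sets, against which the assessment measures can then be plotted.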

In an extensive batch of experiments, we perturbed the dataset at progressively increasing levels, producing well-understood mis-registrations of the data. We repeated these experiments 10 times to demonstrate that both approaches to assessment are consistent and the results unbiased. Plotting the overlap measure against the extent of perturbation, we see an approximately linear decrease in overlap (Figure X). This means that, as the ground-truth registration is eroded, the overlap-based measure detects the change, and its response is well-behaved, hence meaningful and reliable.

<Graphics file: ./Graphics/1.eps>
<Graphics file: ./Graphics/2.eps>

Figures X&Y. The measured quality of registration as perceived by the overlap-based evaluation (left) and the model-based evaluation (right).

We then undertake another assessment task, this time using the method which does not make use of ground truth. We observe very similar behaviour (Figure Y), which is evidence that this method, too, is a powerful and reliable way of assessing the degree of mis-registration, or conversely, the quality of registration.

As a last step, we compare the two methods, identifying sensitivity as the most important factor. Sensitivity reflects our ability to confidently distinguish a good registration from a worse one: the smaller the difference that can be detected reliably, the more sensitive the method. To calculate sensitivity, we compute the amount of change in terms of mean pixel displacement, that is, the deviation from the correct solution. We then look at the differences in the assessor's value, be it overlap, Specificity or Generalisation. We must also take account of the error bars, as there is both an inter-instantiation error and a measure-specific error; the two must be composed carefully. The derivation of sensitivity can be expressed as follows:

placeholder

where X is... (TODO)
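The formula itself is left as a placeholder above. Purely as an illustration of the quantities the text describes, the sketch below treats sensitivity as the slope of the assessment measure against mean pixel displacement, normalised by the errors composed in quadrature; this form and all names in it are our own assumptions, not the paper's definition.

```python
import numpy as np

def sensitivity(displacements, measure_values, measure_errors):
    """Illustrative sensitivity: magnitude of the least-squares slope of
    the assessment measure against mean pixel displacement, divided by
    the error bars composed in quadrature.  An assumed form only."""
    disp = np.asarray(displacements, dtype=float)
    vals = np.asarray(measure_values, dtype=float)
    errs = np.asarray(measure_errors, dtype=float)
    slope = np.polyfit(disp, vals, 1)[0]           # least-squares slope
    combined_error = np.sqrt(np.mean(errs ** 2))   # quadrature mean
    return abs(slope) / combined_error
```

Under this reading, a steep, low-noise response of the measure to mis-registration yields a high sensitivity.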

<Graphics file: ./Graphics/3.eps>

Figure Z. The sensitivity of registration assessment methods.
Figure Z suggests that, for essentially any choice of shuffle-distance neighbourhood size, the method which does not require ground truth is more sensitive than the method which depends on it. When the trends of the curves are inspected closely, they are approximately parallel, which implies that the two methods are closely correlated.

In summary, we have presented two valid methods for assessing non-rigid registration. The methods are correlated in practice, but the principles they build upon are quite separable, as are their prerequisites, if any. Registration can be evaluated with or without ground-truth annotation, and the behaviour of our measures is consistent across distinct datasets, well-behaved, and sensitive. Both methods have been successfully applied to the assessment of non-rigid registration algorithms, and both led to the expected conclusions; that aspect of the work, however, is beyond the scope of this paper.