Assessing the Accuracy of Non-Rigid Registration With and Without Ground Truth

R. S. Schestowitz^{1}, W. R. Crum^{2}, V. S. Petrovic^{1}, C. J. Twining^{1}, T. F. Cootes^{1} and C. J. Taylor^{1}

^{1}Imaging Science and Biomedical Engineering, University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PT, United Kingdom

^{2}Centre for Medical Image Computing, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom

Non-rigid registration (NRR) has, over the past few years, been used increasingly as a basis for medical image analysis. It has been proposed for registering both pairs and groups of images. The problem is highly under-constrained, and the many available algorithms will, in general, produce different results for the same set of images. We present two methods for assessing the performance of non-rigid registration algorithms. The first measures the overlap of ground-truth labels between the registered images using Tanimoto's formulation. The second assesses registration via the quality of a generative statistical appearance model constructed from the registered images, using the concepts of model specificity and generalisation, and requires no ground truth. We compare the two methods and show that they agree, with the second being the more sensitive.

The first method relies on the existence of ground-truth data, such as the boundaries of image structures produced by manual markup of distinguishable points. Having registered an image set, the method measures the overlap between the annotated structures, which indicates how good the registration was.

Our second method assesses registration without ground truth of any form. The approach automatically constructs an appearance model from the registered data and then evaluates the quality of that model using images it synthesises. The quality of a registration is tightly related to the quality of the resulting model, because the two tasks, model construction and image registration, are innately the same: both involve identifying corresponding points, known as landmarks in the context of model-building. Expressed differently, a registration produces a dense set of correspondences, and appearance models are built from the images together with these correspondences.

To test the validity of both methods, we assembled a set of 38 2-D MR images of the brain. Each image was carefully annotated to identify different compartments within the brain. These anatomical compartments can be viewed as simplified labels that faithfully define the structure of the brain. Our first method of assessment uses the Tanimoto overlap measure to calculate the degree to which labels agree across the image set. In that respect, it exploits ground truth, identified by an expert, to reason about registration quality.

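As a concrete illustration, the Tanimoto (Jaccard) overlap of two binary label masks can be sketched as follows. The function name and the toy masks are ours; the measure used in the paper aggregates overlaps across multiple labels and the whole registered image set:

```python
import numpy as np

def tanimoto_overlap(label_a, label_b):
    """Tanimoto (Jaccard) overlap between two binary label masks:
    |A intersect B| / |A union B|."""
    a = np.asarray(label_a, dtype=bool)
    b = np.asarray(label_b, dtype=bool)
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return intersection / union if union > 0 else 1.0

# Two toy 4x4 masks whose structures partially coincide.
a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True   # 4 pixels
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:4] = True   # 6 pixels
print(tanimoto_overlap(a, b))  # 4 shared pixels over a union of 6
```

A perfect registration of identical labels gives an overlap of 1; disjoint labels give 0.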
The second method takes an entirely different approach. It takes as input the results of a registration algorithm, in which correspondences have been identified, and builds an appearance model from the images and their correspondences. From that model, many synthetic brain images are derived. Vectorising these images allows us to embed them in a high-dimensional space. We can then compare the cloud that these synthetic images form with the cloud formed by the original image set from which the model was built. Computing the overlap between these clouds gives insight into the quality of the registration; simply put, it is a model-fit evaluation paradigm. The better the registration, the greater the overlap between the clouds.

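A minimal sketch of the model-building and synthesis step, assuming a linear (PCA) appearance model over vectorised images. The array shapes, mode count, and random stand-in data are illustrative only, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the registered image set: each row is one vectorised
# (flattened) image. Real data would be shape-normalised intensities;
# here we draw correlated random vectors purely for illustration.
n_images, n_pixels = 38, 100
training = rng.normal(size=(n_images, n_pixels)) @ rng.normal(size=(n_pixels, n_pixels)) * 0.1

# Build a linear (PCA) appearance model from the training cloud.
mean = training.mean(axis=0)
centred = training - mean
u, s, vt = np.linalg.svd(centred, full_matrices=False)
n_modes = 5
modes = vt[:n_modes]                      # principal modes of variation
sd = s[:n_modes] / np.sqrt(n_images - 1)  # per-mode standard deviations

# Synthesise new images by sampling mode weights from the model;
# each row of `synthetic` is one point in the high-dimensional space.
weights = rng.normal(size=(200, n_modes)) * sd
synthetic = mean + weights @ modes
print(synthetic.shape)  # (200, 100)
```

The `synthetic` rows form the model's cloud, which is then compared against the cloud of `training` rows.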
To compute the overlap between two clouds of data, we have devised measures that we refer to as Specificity and Generalisation. The former tells how well the model fits the data it was built from, whereas the latter tells how well the data fits the derived model. It is a reciprocal relationship that 'locks' data to its model and vice versa. We calculate Specificity and Generalisation by measuring distances in the embedding space. Since we seek a distance measure that is tolerant of slight anatomical differences, we use the shuffle distance, comparing it against the Euclidean distance. The shuffle distance compares each point in one image with a larger corresponding region in the other image and adopts the best fit, i.e. it matches the pair of points whose distance is minimal.

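The shuffle distance and the two cloud measures can be sketched as below. The function names are ours, `radius` sets the neighbourhood size (radius 0 reduces to a plain mean absolute difference, our stand-in for the Euclidean comparison), and the wrap-around borders of `np.roll` are a simplification a real implementation would handle properly:

```python
import numpy as np

def shuffle_distance(img_a, img_b, radius=1):
    """Mean, over pixels of img_a, of the minimum absolute intensity
    difference to any img_b pixel within a (2r+1)x(2r+1) window."""
    best = np.full(img_a.shape, np.inf)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(np.roll(img_b, dy, axis=0), dx, axis=1)
            best = np.minimum(best, np.abs(img_a - shifted))
    return best.mean()

def specificity(synthetic, training, dist):
    """Mean distance from each synthetic image to its nearest training image."""
    return np.mean([min(dist(s, t) for t in training) for s in synthetic])

def generalisation(synthetic, training, dist):
    """Mean distance from each training image to its nearest synthetic image."""
    return np.mean([min(dist(t, s) for s in synthetic) for t in training])

rng = np.random.default_rng(1)
train = [rng.random((8, 8)) for _ in range(4)]
synth = [rng.random((8, 8)) for _ in range(6)]
print(specificity(synth, train, shuffle_distance))
print(generalisation(synth, train, shuffle_distance))
```

Smaller values of either measure indicate a tighter fit between the model's cloud and the data's cloud.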
Our assessment framework, with which we test both methods, uses non-rigid registration, in which image transformations involve many degrees of freedom. To systematically generate data over which our hypotheses can be tested, we perturb the brain data using clamped-plate splines, which are diffeomorphic. In the brain data we use, the correspondences among images are taken to be perfect, so perturbation can only ever degrade them. We wish to show that as the degree of perturbation increases, the measures reported by our registration assessment methods degrade accordingly.

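The perturbation step can be illustrated with the sketch below. For simplicity we substitute a single Gaussian-bump displacement field with nearest-neighbour resampling for the paper's clamped-plate splines, so the warp is smooth and localised but only approximately invertible for small strengths; all names and parameters are ours:

```python
import numpy as np

def gaussian_warp(image, centre, strength, sigma):
    """Apply a smooth, localised displacement to `image`: pixels near
    `centre` are pulled by up to `strength` pixels, with the displacement
    decaying as a Gaussian of width `sigma`. A toy stand-in for a
    clamped-plate-spline perturbation; strength 0 is the identity."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    r2 = (ys - centre[0]) ** 2 + (xs - centre[1]) ** 2
    fall_off = np.exp(-r2 / (2 * sigma ** 2))
    # Pull each output pixel from a displaced source location
    # (nearest-neighbour lookup, clipped at the image border).
    src_y = np.clip(np.rint(ys - strength * fall_off), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(xs - strength * fall_off), 0, w - 1).astype(int)
    return image[src_y, src_x]

img = np.arange(64, dtype=float).reshape(8, 8)
warped = gaussian_warp(img, centre=(4, 4), strength=2.0, sigma=2.0)
print(warped.shape)  # (8, 8)
```

Increasing `strength` plays the role of the progressively larger perturbation levels used in the experiments.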
In an extensive batch of experiments, we perturbed the datasets at progressively increasing levels, producing well-understood mis-registrations of the data. We repeated these experiments 10 times to demonstrate that both approaches to assessment are consistent and that the results are unbiased. Plotting the measures of overlap for each perturbation extent, we see a roughly linear decrease in the amount of overlap (Figure X). This means that when a ground-truth-based registration is degraded, the overlap-based measure detects it, and its response is well-behaved, hence meaningful and reliable.

<Graphics file: ./Graphics/1.eps>
<Graphics file: ./Graphics/2.eps>

Figures X & Y. Registration quality as measured by the overlap-based evaluation (left) and the model-based evaluation (right).

We then undertook another assessment task, this time using the method that does not require ground truth. We observe very similar behaviour (Figure Y), which is evidence that this method, too, is a powerful and reliable way of assessing the degree of mis-registration, or, conversely, the quality of registration.

As a last step, we compare the two methods, identifying sensitivity as the most important factor. Sensitivity reflects our ability to confidently tell a good registration apart from a worse one: the slighter the difference that can be detected reliably, the more sensitive the method. To calculate sensitivity, we compute the amount of change in terms of mean pixel displacement, that is, deviation from the correct solution. We then look at the differences in the assessor's value, be it overlap, Specificity, or Generalisation. We must also take account of the error bars, as there is both an inter-instantiation error and a measure-specific error; the two must be composed carefully. The derivation of sensitivity can be expressed as follows:

placeholder

where X is... (TODO)

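One plausible reading of the prose above (the formal definition is left as a placeholder here) is to take the slope of the assessor value against mean pixel displacement and normalise it by the composed error bar. The function name and the toy numbers below are illustrative assumptions, not the paper's definition:

```python
import numpy as np

def sensitivity(displacements, values, errors):
    """Slope of the assessor value against mean pixel displacement,
    normalised by the mean error bar: larger values mean a smaller
    change in registration quality can be detected reliably."""
    slope = np.polyfit(displacements, values, 1)[0]
    return abs(slope) / np.mean(errors)

d = np.array([0.0, 1.0, 2.0, 3.0])       # mean pixel displacement
v = np.array([1.00, 0.90, 0.80, 0.70])   # e.g. mean overlap per level
e = np.array([0.02, 0.02, 0.02, 0.02])   # composed error bars
print(sensitivity(d, v, e))  # roughly 5: a 0.1 drop per pixel against 0.02 error
```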
<Graphics file: ./Graphics/3.eps>

Figure Z. The sensitivity of the registration assessment methods.

Figure Z suggests that, for essentially any choice of shuffle-distance neighbourhood, the method that does not require ground truth is more sensitive than the method that depends on it. When the trends of the curves are inspected closely, they are approximately parallel, which implies that the two methods are closely correlated.

In summary, we have demonstrated two valid methods for assessing non-rigid registration. The methods are correlated in practice, but the principles on which they build are quite separable, as are their prerequisites, if any. Registration can be evaluated with or without ground-truth annotation, and the behaviour of our measures is consistent across distinct datasets, well-behaved, and sensitive. Both methods have been successfully applied to the assessment of non-rigid registration algorithms, and both led to the expected conclusions; that aspect of the work, however, is beyond the scope of this paper.