Multiple Object Detection by Sequential Monte Carlo
and Hierarchical Detection Network (HDN)
This page gives a high-level overview of our research on Hierarchical Detection Network (HDN).
For more details, please refer to our article
published in the CVPR 2010 proceedings.
In this paper, we propose a novel framework for detecting multiple objects in 2D and 3D images. Since a joint multi-object model is difficult to obtain in most practical situations, we focus here on detecting the objects sequentially, one by one. The interdependence of object poses and the strong prior information embedded in our domain of medical images result in better performance than detecting the objects individually. Our approach is based on Sequential Estimation techniques, frequently applied to visual tracking. Unlike in tracking, where the sequential order is naturally determined by the time sequence, the order of detection of multiple objects must be selected, leading to a Hierarchical Detection Network (HDN). We present an algorithm that optimally selects the order based on the probability of states (object poses) within the ground truth region. The posterior distribution of the object pose is approximated at each step by sequential Monte Carlo. The samples are propagated within the sequence across multiple objects and hierarchical levels. We show on 2D ultrasound images of the left atrium that the automatically selected sequential order yields low mean detection error. We also quantitatively evaluate the hierarchical detection of fetal faces and three fetal brain structures in 3D ultrasound images.
Motivation and Intuition
The most challenging aspects
of multi-object detection algorithms are designing detectors that are fast
and robust, modeling the spatial relationships between
objects, and determining the detection order. In this paper, we
propose a multi-object detection system that addresses these
challenges.
The computational speed and robustness of our system are
increased by hierarchical processing. In detection, one major
problem is how to effectively propagate object candidates
across the levels of the hierarchy. This typically involves
defining a search range at a fine level where the candidates
from the coarse level are refined. Incorrect selection of
the search range leads to higher computational cost,
lower accuracy, or drift of the coarse candidates towards
incorrect refinements. The search range in our technique is part
of the model that is learned from the training data. Furthermore, our
detection schedule is designed to minimize the
uncertainty of the detections and to optimally select the schedule of the hierarchical levels.
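The text says the search range is learned from the training data but does not state the rule. As a minimal sketch, assuming the range for each pose parameter is taken as the mean residual between the coarse-level estimate and the fine-level ground truth, plus or minus k standard deviations, the learning and refinement steps could look like this. The names `learn_search_range` and `refine_hypotheses`, the value of `k`, and the grid refinement are all hypothetical illustrations, not the method used in the paper.

```python
import numpy as np


def learn_search_range(coarse_estimates, fine_ground_truth, k=3.0):
    """Learn a per-parameter search interval for refining coarse candidates.

    Hypothetical rule: mean of the residual (fine ground truth minus coarse
    estimate) plus or minus k residual standard deviations.
    Both inputs have shape (n_training_images, n_pose_parameters).
    """
    residuals = np.asarray(fine_ground_truth, float) - np.asarray(coarse_estimates, float)
    mu = residuals.mean(axis=0)
    sigma = residuals.std(axis=0)
    return mu - k * sigma, mu + k * sigma


def refine_hypotheses(candidate, low, high, step):
    """Enumerate fine-level poses on a grid inside the learned range around one coarse candidate."""
    # The small epsilon makes np.arange include the upper bound when it lies on the grid.
    axes = [np.arange(lo, hi + 1e-9, s) for lo, hi, s in zip(low, high, step)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, len(axes))
    return np.asarray(candidate, float) + grid


# Toy usage with two training images and a 2D position as the pose.
coarse = [[0.0, 0.0], [0.0, 0.0]]
truth = [[0.0, 1.0], [2.0, 1.0]]
low, high = learn_search_range(coarse, truth, k=1.0)
print(low, high)  # [0. 1.] [2. 1.]
pts = refine_hypotheses([10.0, 10.0], low, high, step=[1.0, 1.0])
```

A range that is too wide makes `refine_hypotheses` return many more points (higher cost), while one that is too narrow may exclude the correct fine-level pose, which is the trade-off the paragraph describes.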
Our approach to multi-object detection is motivated by Sequential Estimation
techniques, frequently applied to visual tracking.
We sample from a sequence of
probability distributions, but the sequence specifies a spatial
order rather than the time order used in tracking. The posterior
distribution of each object pose (state) is estimated based
on all observations so far. The observations are features
computed from image neighborhoods surrounding the objects.
The likelihood of a hypothesized state that gives rise to
observations is based on a deterministic model learned using a
large annotated database of images. The transition model,
which describes how the poses of the objects are related, is
also learned from the training data.
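The paragraph names the ingredients (posterior over poses, a learned likelihood, a transition model between object poses) but not the sampling step itself. Below is a minimal sketch of one generic sequential importance resampling step from a precursor object to the next object, assuming a Gaussian transition model (next pose = precursor pose plus a Gaussian offset) and a callable `likelihood` that stands in for the trained detector. The names `smc_step`, `posterior_mean`, and the toy likelihood are hypothetical; this is a textbook particle-filter step, not necessarily the exact procedure of the paper.

```python
import numpy as np


def smc_step(parent_samples, parent_weights, offset_mean, offset_cov,
             likelihood, n_samples, rng):
    """Detect the next object given weighted pose samples of its precursor.

    Assumptions (not from the source): the transition model is Gaussian,
    pose_next = pose_parent + N(offset_mean, offset_cov), and `likelihood`
    stands in for a trained detector returning a nonnegative score per pose.
    """
    parent_samples = np.asarray(parent_samples, float)
    # Resample precursor poses in proportion to their posterior weights.
    idx = rng.choice(len(parent_samples), size=n_samples, p=parent_weights)
    # Propagate each sample through the transition model.
    offsets = rng.multivariate_normal(offset_mean, offset_cov, size=n_samples)
    samples = parent_samples[idx] + offsets
    # Importance weights: observation likelihood, normalized to sum to one.
    weights = likelihood(samples)
    weights = weights / weights.sum()
    return samples, weights


def posterior_mean(samples, weights):
    """Weighted mean of the samples, used here as the pose estimate."""
    return weights @ samples


# Toy usage: the precursor is known exactly at (0, 0); the toy detector peaks at (5.5, 5.0).
rng = np.random.default_rng(0)
true_pose = np.array([5.5, 5.0])
toy_likelihood = lambda x: np.exp(-0.5 * np.sum((x - true_pose) ** 2, axis=1))
samples, weights = smc_step([[0.0, 0.0]], [1.0], [5.0, 5.0], 4.0 * np.eye(2),
                            toy_likelihood, n_samples=2000, rng=rng)
estimate = posterior_mean(samples, weights)
```

With a Gaussian prior of variance 4 centered at (5, 5) and a unit-variance likelihood centered at (5.5, 5.0), the exact posterior mean is (5.4, 5.0), and `estimate` approaches it as `n_samples` grows. Because samples are drawn only where the transition model puts mass, far fewer hypotheses are scored than in an exhaustive search.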
Employing the sequential sampling
model allows us to use fewer samples of the object pose and
formally extend this class of algorithms to multiple objects. This
saves computational time and increases accuracy since the samples are taken from the regions of high probability of the
posterior distribution. Many ideas from the Sequential
Sampling literature on visual tracking can likely be extended to
multi-object detection. We demonstrate the
benefit of the sampling when detecting multiple landmarks in
2D images of the left atrium. Unlike in tracking, where
the sequential order is naturally determined by the time
progression, the order in multi-object detection must be
selected. In our algorithm, the order is selected such that the
uncertainty of the detections is minimized. Instead of
using the immediate precursor in the Markov process, the
transition model can therefore be based on any precursor, which is
optimally selected. This leads to a Hierarchical Detection
Network (HDN). The likelihood of a hypothesized pose is
computed using a trained detector. The detection scale is
introduced as another parameter of the likelihood model, and the
hierarchical schedule is determined in the same way as the
detection order.
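The order selection described above, and summarized later as a greedy algorithm that puts the most reliable detections first, can be sketched as follows. This assumes a hypothetical `score(obj, parent)` function that measures how reliably `obj` is detected when conditioned on an already-detected `parent` (for example, posterior probability mass inside the ground-truth region on held-out images, with `parent=None` for the first object). The function name, the score table, and the object labels are illustrative only.

```python
def greedy_detection_order(objects, score):
    """Greedily build a detection schedule.

    score(obj, parent) is assumed to measure how reliably `obj` is detected
    when its search is conditioned on `parent` (parent=None for the first
    object). Higher is better. Returns (object, precursor) pairs in order.
    """
    remaining = list(objects)
    first = max(remaining, key=lambda o: score(o, None))
    order = [(first, None)]
    remaining.remove(first)
    detected = [first]
    while remaining:
        # Any already-detected object may serve as the precursor, not only the last one.
        obj, parent = max(((o, p) for o in remaining for p in detected),
                          key=lambda pair: score(*pair))
        order.append((obj, parent))
        remaining.remove(obj)
        detected.append(obj)
    return order


# Toy scores (made up for illustration, not results from the paper).
toy_scores = {("A", None): 0.9, ("B", None): 0.6, ("C", None): 0.5,
              ("B", "A"): 0.8, ("C", "A"): 0.4, ("C", "B"): 0.7}
order = greedy_detection_order(["A", "B", "C"], lambda o, p: toy_scores.get((o, p), 0.0))
print(order)  # [('A', None), ('B', 'A'), ('C', 'B')]
```

Because each new object may attach to any previously detected object, the result is a tree (a network) rather than a chain. Folding in scale, as the text describes, would amount to treating each (object, scale) pair as a separate node in the same loop; that extension is not shown here.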
The goal is to automatically determine the detection order of five left atrium
landmarks. The landmark detectors are
trained independently using 281 annotated images. A total of 46
annotated images from the testing data set were used to obtain
the detection order. The remaining 90 cases were used for
detection and evaluation.
The final automatically selected detection order. At first, it might seem that landmarks 01 and 17 would be preferred over landmarks 05 and 13 due to the higher distinctiveness of their regions. However, the high appearance variation of landmarks 01 and 17 causes the algorithm to prefer landmarks 05 and 13.
Our next experiment is on detecting three fetal brain structures
in 3D ultrasound data. The output of the system is a visualization
of the plane with correct orientation and centering as well as
a biometric measurement of the anatomy. A total of 589
expert-annotated volumes were used for training and 295 for
testing. The volumes have an average size of 250 × 200 × 150 mm.
We use three resolutions in the hierarchical system shown in the figure below.
Quantitative evaluation is in Table 1 and several examples of
detected structures in Figure 3. The HDN average detection
error of 2.2 mm is lower than the 4.8 mm error of a system
without the hierarchy.
The detection order and the hierarchy of three brain structures: Cerebellum (CER), Cisterna Magna (CM), and Lateral Ventricles (LV).
Scale selection is applied.
Measurement errors of the hierarchical detection
system (top part of the table) compared to an earlier
system without the hierarchy. Mean error, standard
deviation, median error, and maximum error are computed.
The system was trained using the number of volumes specified
in the 6th column and tested on the number of volumes
specified in the 7th column. The average detection error
using the hierarchy is 2.2 mm on data with 1 mm finest
resolution. The average error of the system without the
hierarchy is 4.8 mm.
| Structure | Mean error (mm) | Std (mm) | Median (mm) | Max (mm) | #train | #test |
|---|---|---|---|---|---|---|
| CER (HDN) | 2.289 | 0.884 | 2.213 | 4.197 | 589 | 295 |
| CM (HDN) | 2.149 | 0.807 | 2.075 | 4.019 | 589 | 295 |
| LV (HDN) | 2.245 | 0.817 | 2.154 | 3.891 | 589 | 295 |
| CER (no hierarchy) | 4.961 | 6.767 | 3.422 | 59.607 | 589 | 295 |
| CM (no hierarchy) | 4.989 | 6.832 | 3.519 | 68.679 | 589 | 295 |
| LV (no hierarchy) | 4.565 | 5.023 | 3.097 | 39.176 | 589 | 295 |
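The 2.2 mm and 4.8 mm averages quoted in the text can be reproduced from the table as the unweighted mean of the per-structure mean errors. This small check assumes that is how the averages were formed; the dictionary names are illustrative.

```python
# Per-structure mean errors (mm) copied from the table above.
hdn = {"CER": 2.289, "CM": 2.149, "LV": 2.245}
no_hierarchy = {"CER": 4.961, "CM": 4.989, "LV": 4.565}


def average(errors):
    """Unweighted mean over structures; all three share the same 295 test volumes."""
    return sum(errors.values()) / len(errors)


print(f"{average(hdn):.1f} mm vs {average(no_hierarchy):.1f} mm")  # 2.2 mm vs 4.8 mm
```

The table also shows that the hierarchy mainly removes large failures: the maximum errors drop from roughly 39 to 69 mm to about 4 mm, while the median error improves by a smaller margin.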
Final sequential detection result (cyan) compared to ground truth (red). Notice that the landmarks are accurately detected despite the noise, high appearance and shape variations, and shadowing effects. The landmark detection errors (in pixels) are shown below each image in the left-bottom-right order.
(6.39, 6.91, 4.64, 7.21, 6.26)
(2.42, 6.84, 9.95, 7.33, 8.41)
(7.94, 5.03, 7.03, 8.18, 5.00)
(5.24, 9.83, 6.16, 5.12, 7.71)
We have presented a Sequential Monte Carlo based Hierarchical Detection Network (HDN) for detecting multiple objects. The order of detection is automatically determined by a greedy algorithm that puts the most reliable detections earlier in the detection sequence. The detectors are organized in a multi-scale hierarchy, with the scale parameter included in the order selection process. We have shown the effectiveness of the automatic order selection process on the detection of five left atrium landmarks in 2D ultrasound images. As we demonstrated on the detection of fetal faces and three fetal brain structures in 3D ultrasound images, the multi-scale hierarchical detectors achieve higher detection accuracy than systems based on a single level.
The described framework opens up several possible avenues of future research. One area we are particularly interested in is how to include dependence on multiple objects at each detection stage. This would result in a stronger geometric constraint and could therefore improve performance on objects that are difficult to detect when only the pairwise dependence is exploited.
Publications and Further Reading
Automatic Detection and Measurement of Structures in Fetal Head Ultrasound Volumes Using Sequential Estimation and Integrated Detection Network (IDN)
Michal Sofka, Jingdan Zhang, Sara Good, S. Kevin Zhou, and Dorin Comaniciu
IEEE Transactions on Medical Imaging (TMI), vol. 33, no. 5, pp. 1054-1070, May 2014.
Multiple Object Detection by Sequential Monte Carlo and Hierarchical Detection Network
Michal Sofka, Jingdan Zhang, S. Kevin Zhou, and Dorin Comaniciu
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 2010.
Integrated Detection Network (IDN) for Pose and Boundary Estimation in Medical Images
Michal Sofka, Kristof Ralovich, Neil Birkbeck, Jingdan Zhang, and S. Kevin Zhou
Proceedings of the 8th International Symposium on Biomedical Imaging (ISBI 2011), Chicago, IL, USA, 30 Mar-2 Apr 2011.
Fast Boosting Trees for Classification, Pose Detection, and Boundary Detection on a GPU
Neil Birkbeck, Michal Sofka, and S. Kevin Zhou
Proceedings of the 7th IEEE Workshop on Embedded Computer Vision (in conjunction with IEEE CVPR), Colorado Springs, CO, USA, 20 Jun 2011.