
Multiple Object Detection by Sequential Monte Carlo and Hierarchical Detection Network (HDN)

This page gives a high-level overview of our research on the Hierarchical Detection Network (HDN). For more details, please refer to our article published in the CVPR 2010 proceedings.



In this paper, we propose a novel framework for detecting multiple objects in 2D and 3D images. Since a joint multi-object model is difficult to obtain in most practical situations, we focus here on detecting the objects sequentially, one by one. The interdependence of object poses and the strong prior information embedded in our domain of medical images result in better performance than detecting the objects individually. Our approach is based on Sequential Estimation techniques, frequently applied to visual tracking. Unlike in tracking, where the sequential order is naturally determined by the time sequence, the order of detection of multiple objects must be selected, leading to a Hierarchical Detection Network (HDN). We present an algorithm that optimally selects the order based on the probability of states (object poses) within the ground truth region. The posterior distribution of the object pose is approximated at each step by sequential Monte Carlo. The samples are propagated within the sequence across multiple objects and hierarchical levels. We show on 2D ultrasound images of the left atrium that the automatically selected sequential order yields a low mean detection error. We also quantitatively evaluate the hierarchical detection of fetal faces and three fetal brain structures in 3D ultrasound images.

Motivation and Intuition

The most challenging aspects of multi-object detection algorithms are designing detectors that are fast and robust, modeling the spatial relationships between objects, and determining the detection order. In this paper, we propose a multi-object detection system that addresses these challenges.

The computational speed and robustness of our system are increased by hierarchical processing. In detection, one major problem is how to effectively propagate object candidates across the levels of the hierarchy. This typically involves defining a search range at a fine level where the candidates from the coarse level are refined. Incorrect selection of the search range leads to higher computational cost, lower accuracy, or drift of the coarse candidates towards incorrect refinements. The search range in our technique is part of the model that is learned from the training data. Furthermore, our detection schedule is designed to minimize the uncertainty of the detections and to optimally select the schedule of the hierarchical scales.
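As a rough illustration of what a learned search range could look like, the sketch below derives a per-axis half-width from training offsets between coarse-level and fine-level positions and then enumerates fine-level candidates inside it. The quantile rule, the function names, and the grid step are assumptions made for this example; the text above states only that the search range is learned from training data, not how.

```python
import numpy as np

def learn_search_range(coarse_positions, fine_positions, coverage=0.99):
    """Per-axis half-width covering a fraction `coverage` of training offsets.

    Hypothetical helper: the quantile rule is an assumption of this sketch.
    """
    coarse = np.asarray(coarse_positions, dtype=float)
    fine = np.asarray(fine_positions, dtype=float)
    offsets = np.abs(fine - coarse)          # how far refinement moved each case
    return np.quantile(offsets, coverage, axis=0)

def refine_candidates(coarse_candidate, half_width, step=1.0):
    """Enumerate fine-level positions within the learned range of a candidate."""
    axes = [np.arange(c - h, c + h + step / 2, step)
            for c, h in zip(coarse_candidate, half_width)]
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack([g.ravel() for g in grid], axis=1)
```

A range that is too wide inflates the number of candidates returned by `refine_candidates`, while a range that is too narrow can exclude the true position, which mirrors the trade-off described above.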

Figure 1: Examples of multi-object detection: five landmarks of the left atrium (LA) in the apical two-chamber (A2C) view (left) and a 3D ultrasound volume of the fetal brain with three anatomies (right).


Our approach to multi-object detection is motivated by Sequential Estimation techniques, frequently applied to visual tracking. We sample from a sequence of probability distributions, but the sequence specifies a spatial order rather than a time order as in tracking. The posterior distribution of each object pose (state) is estimated based on all observations so far. The observations are features computed from image neighborhoods surrounding the objects. The likelihood of a hypothesized state that gives rise to observations is based on a deterministic model learned using a large annotated database of images. The transition model that describes the way the poses of objects are related is Gaussian.
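The propagate, weight, and resample cycle described above can be sketched as a single sequential Monte Carlo step from one object to the next. This is a toy illustration under stated assumptions, not the system's implementation: the function name `smc_step`, the 2D position-only pose, the transition parameters, and the `toy_likelihood` standing in for a trained detector are all hypothetical.

```python
import numpy as np

def smc_step(prev_samples, mean_offset, cov, likelihood, rng):
    """One sequential Monte Carlo step from a detected object to the next.

    prev_samples     : (N, D) pose samples of the already-detected object.
    mean_offset, cov : Gaussian transition model (assumed learned offline).
    likelihood       : callable mapping (N, D) poses to non-negative scores;
                       a stand-in for the trained detector.
    """
    n, d = prev_samples.shape
    # Propagate: draw each new pose from the Gaussian transition model.
    noise = rng.multivariate_normal(np.zeros(d), cov, size=n)
    proposed = prev_samples + mean_offset + noise
    # Weight: evaluate the observation likelihood at each proposed pose.
    weights = np.asarray(likelihood(proposed), dtype=float)
    total = weights.sum()
    weights = weights / total if total > 0 else np.full(n, 1.0 / n)
    # Resample: keep poses in proportion to their weights.
    idx = rng.choice(n, size=n, p=weights)
    return proposed[idx], weights

# Toy usage: samples around object A are moved towards object B.
rng = np.random.default_rng(0)
prev = rng.normal([50.0, 50.0], 2.0, size=(200, 2))
target = np.array([70.0, 40.0])  # hypothetical true position of object B

def toy_likelihood(poses):
    # Hypothetical detector response peaking at the target position.
    return np.exp(-0.5 * np.sum((poses - target) ** 2, axis=1) / 4.0)

samples, weights = smc_step(prev, np.array([20.0, -10.0]), 9.0 * np.eye(2),
                            toy_likelihood, rng)
estimate = samples.mean(axis=0)  # point estimate of object B's position
```

Because the resampled poses concentrate where the likelihood is high, later objects in the sequence start from a small set of plausible hypotheses instead of an exhaustive search.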

Employing the sequential sampling model allows us to use fewer samples of the object pose and formally extend this class of algorithms to multiple objects. This saves computational time and increases accuracy since the samples are taken from the regions of high probability of the posterior distribution. Many ideas from the Sequential Sampling literature on visual tracking can likely be extended to multi-object detection. We demonstrate the benefit of the sampling when detecting multiple landmarks in 2D images of the left atrium. Unlike in tracking, where the sequential order is naturally determined by the time progression, the order in multi-object detection must be selected. In our algorithm, the order is selected such that the uncertainty of the detections is minimized. So, instead of using the immediate precursor in the Markov process, the transition model could be based on any precursor, which is optimally selected. This leads to a Hierarchical Detection Network (HDN). The likelihood of a hypothesized pose is computed using a trained detector. The detection scale is introduced as another parameter of the likelihood model and the hierarchical schedule is determined in the same way as the spatial schedule.
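To make the order selection concrete, here is a minimal greedy sketch. It assumes a precomputed score matrix in which higher values mean more reliable (less uncertain) detections; how such scores would be estimated, and the name `greedy_detection_order`, are assumptions of this example rather than details given above. Each new object is paired with whichever already-detected precursor gives the best score, so the precursor need not be the immediately preceding object.

```python
import numpy as np

def greedy_detection_order(score):
    """Greedy schedule from a (K, K) score matrix.

    score[i, j], i != j : reliability of detecting object j when samples are
                          propagated from already-detected object i.
    score[j, j]         : reliability of detecting object j with no precursor.
    Returns (object, precursor) pairs; the first precursor is None.
    """
    score = np.asarray(score, dtype=float)
    k = score.shape[0]
    # Start with the object that is most reliable on its own.
    first = int(np.argmax(np.diag(score)))
    order = [(first, None)]
    detected = [first]
    remaining = [j for j in range(k) if j != first]
    while remaining:
        # Pick the best (precursor, object) pair over all detected precursors.
        pairs = [(i, j) for i in detected for j in remaining]
        i, j = max(pairs, key=lambda p: score[p])
        order.append((j, i))
        detected.append(j)
        remaining.remove(j)
    return order

# Hypothetical scores for three objects.
example_scores = [[0.9, 0.6, 0.3],
                  [0.5, 0.7, 0.8],
                  [0.2, 0.4, 0.5]]
schedule = greedy_detection_order(example_scores)
```

In this toy example object 2 is scheduled last and uses object 1 as its precursor, because propagating from object 1 scores higher than propagating from object 0. In a multi-scale setting, the same loop could run over (object, scale) pairs instead of objects alone.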


The goal is to automatically determine the detection order of five left atrium landmarks. The landmark detectors are trained independently using 281 annotated images. A total of 46 annotated images from the testing data set were used to obtain the detection order. The remaining 90 cases were used for detection and comparative evaluation.

Figure 2: The final automatically selected detection order. At first, it might seem that landmarks 01 and 17 would be preferred over landmarks 05 and 13 due to the higher distinctiveness of the region. However, the high appearance variation of landmarks 01 and 17 causes landmarks 05 and 13 to be preferred.


Our next experiment is on detecting three fetal brain structures in 3D ultrasound data. The output of the system is a visualization of the plane with correct orientation and centering, as well as a biometric measurement of the anatomy. A total of 589 expert-annotated images were used for training and 295 for testing. The volumes have an average size of 250 × 200 × 150 mm. We use three resolutions in a hierarchical system shown in Figure 3. Quantitative evaluation is given in Table 1, and several examples of detected structures are shown in Figure 4. The average detection error with HDN, 2.2 mm, is lower than the 4.8 mm error of a system without HDN.

Figure 3: The detection order and the hierarchy of three brain structures: Cerebellum (CER), Cisterna Magna (CM), and Lateral Ventricles (LV). Scale selection is applied.


Table 1: Measurement errors of the hierarchical detection system (top part of the table) compared to an earlier system without the hierarchy (bottom part). Mean error, standard deviation, median error, and maximum error are computed. The system was trained using the number of volumes specified in the 6th column and tested on the number of volumes specified in the 7th column. The average detection error using the hierarchy is 2.2 mm on data with 1 mm finest resolution. The average error of the system without the hierarchy is 4.8 mm.

Structure   mean (mm)   std (mm)   median (mm)   max (mm)   #train   #test
With hierarchy (HDN)
CER         2.289       0.884      2.213          4.197     589      295
CM          2.149       0.807      2.075          4.019     589      295
LV          2.245       0.817      2.154          3.891     589      295
Without hierarchy (earlier system)
CER         4.961       6.767      3.422         59.607     589      295
CM          4.989       6.832      3.519         68.679     589      295
LV          4.565       5.023      3.097         39.176     589      295
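The four statistics reported per structure can be computed from per-case errors as in the sketch below. It assumes the error is the Euclidean distance in millimeters between a detected and an annotated position, and it uses the population standard deviation; both are assumptions of this example, since the exact error definition is not given here. The function name `error_summary` is hypothetical.

```python
import numpy as np

def error_summary(detected, ground_truth):
    """Mean, std, median, and max of per-case detection errors.

    detected, ground_truth : (N, D) positions in millimeters.
    Assumes Euclidean distance as the error measure (illustrative only).
    """
    detected = np.asarray(detected, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    errors = np.linalg.norm(detected - ground_truth, axis=1)
    return {
        "mean": float(errors.mean()),
        "std": float(errors.std()),        # population standard deviation
        "median": float(np.median(errors)),
        "max": float(errors.max()),
    }
```

Reporting the median and maximum alongside the mean is useful for a comparison like this one: in the rows without the hierarchy, the large maximum errors pull the mean well above the median.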

Figure 3: Final sequential detection result (cyan) compared to ground truth (red). Notice that the landmarks are accurately detected despite the noise, high appearance and shape variations, and shadowing effects. The landmark detection errors (in pixels) are shown below each image in the left-bottom-right order.

LA 1: (6.39, 6.91, 4.64, 7.21, 6.26)
LA 2: (2.42, 6.84, 9.95, 7.33, 8.41)
LA 3: (7.94, 5.03, 7.03, 8.18, 5.00)
LA 4: (5.24, 9.83, 6.16, 5.12, 7.71)

Figure 4: Final hierarchical detection result (cyan) compared to ground truth (red). The last two columns show the agreement of the detection plane in the sagittal and coronal cross section.



We have presented a Sequential Monte Carlo based Hierarchical Detection Network (HDN) for detecting multiple objects. The order of detection is automatically determined by a greedy algorithm that puts the most reliable detections earlier in the detection sequence. The detectors are organized in a multi-scale hierarchy with the scale parameter included in the order selection process. We have shown the effectiveness of the automatic order selection process on the detection of five left atrium landmarks in 2D ultrasound images. The multi-scale hierarchical detectors have higher detection accuracy than systems based on a single level, as we demonstrated on the detection of fetal faces and three fetal brain structures in 3D ultrasound images.

The described framework opens up several possible avenues of future research. One area we are particularly interested in is how to include dependence on multiple objects at each detection stage. This will result in a stronger geometrical constraint and therefore improve performance on objects that are difficult to detect by exploiting only the pairwise dependence.

Publications and Further Reading

Automatic Detection and Measurement of Structures in Fetal Head Ultrasound Volumes Using Sequential Estimation and Integrated Detection Network (IDN)
Michal Sofka, Jingdan Zhang, Sara Good, S. Kevin Zhou, and Dorin Comaniciu
IEEE Transactions on Medical Imaging (TMI), vol. 33, no. 5, pp. 1054-1070, May 2014.

Multiple Object Detection by Sequential Monte Carlo and Hierarchical Detection Network
Michal Sofka, Jingdan Zhang, S. Kevin Zhou, and Dorin Comaniciu
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 2010.

Integrated Detection Network (IDN) for Pose and Boundary Estimation in Medical Images
Michal Sofka, Kristof Ralovich, Neil Birkbeck, Jingdan Zhang, and S. Kevin Zhou
Proceedings of the 8th International Symposium on Biomedical Imaging (ISBI 2011), Chicago, IL, USA, 30 Mar-2 Apr 2011.

Fast Boosting Trees for Classification, Pose Detection, and Boundary Detection on a GPU
Neil Birkbeck, Michal Sofka, and S. Kevin Zhou
Proceedings of the 7th IEEE Workshop on Embedded Computer Vision (in conjunction with IEEE CVPR), Colorado Springs, CO, USA, 20 Jun 2011.


Copyright 2015 Michal Sofka