Connected Component Labelling
Jack Lawrence-Jones, 2nd August 2016
Connected Component Labelling (CCL) is a technique used in Image Processing to identify blobs of pixels in an image. A blob, or connected component, is an area of connected foreground pixels: a single shape made up of a continuous mass of pixels, where from any pixel inside it you can travel to any other pixel inside it, without ever leaving the shape.
CCL is often used as part of the image segmentation step in a Computer Vision pipeline. Once the connected components in an image have been labelled, each one can be further analysed individually, e.g. classified (determining what kind of thing the object is). A common example is Optical Character Recognition (recognising handwritten or typed text in images): each connected component is likely to be an individual letter, so once they have been identified, each can be passed to a character recognition stage (e.g. a neural net).
Some modern image processing pipelines have replaced techniques like CCL with end-to-end deep neural nets. However, techniques like CCL remain key for applications such as live object detection and tracking, and in embedded systems.
Many different CCL algorithms exist, including ones optimised for space complexity, time complexity, or parallel processing. In this article we'll look at the classical algorithm, designed by Rosenfeld and Pfaltz in 1966 using results from graph theory. We scan through the image sequentially from top to bottom and left to right, labelling each pixel based on the labels of its surrounding pixels. A second pass is then needed to correct the labelling inconsistencies that certain component shapes cause (equivalent labels are stored in a "Union-Find" data structure - more on this later). Let's go through the algorithm in more detail - enjoy!
Two-pass Connected Component Labelling with Union-Find
Let's restrict our inputs to binary (black and white) images. Each pixel can either be a foreground (black) pixel, or a background (white) pixel. We will use the counting numbers (positive integers) to label components. Background pixels will be labelled '0'. So, we want our algorithm to do the following operation:
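To make that concrete, here's a minimal hand-worked sketch of the operation: a tiny binary image as a Python list of lists (my own illustrative encoding, not necessarily the one used in the accompanying code), and the labelling we want back:

```python
# A tiny 5x6 binary image: 1 = foreground, 0 = background.
image = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1],
]

# The labelling we want: each blob gets its own positive integer
# label, and background pixels stay 0.
labelled = [
    [1, 1, 0, 0, 0, 2],
    [1, 1, 0, 0, 0, 2],
    [0, 0, 0, 3, 0, 2],
    [0, 0, 3, 3, 0, 2],
    [0, 0, 0, 0, 0, 2],
]
```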
First Pass
In the first pass we scan through the image, pixel by pixel, and look at each pixel's neighbours. A pixel's neighbours are the pixels that immediately surround it. If a pixel is neighbours with a labelled pixel, it is clearly part of the same component, so should be given the same label as its neighbour. Which specific neighbouring pixels we look at - the 'connectivity' - depends on the purpose of our image analysis: we can choose depending on the type of image, what it contains, what we want etc. Two commonly used connectivities are 4-connectivity and 8-connectivity.
4-connectivity includes the neighbours to the North, East, South and West of the current pixel; 8-connectivity additionally includes the North-East, South-East, South-West and North-West neighbours.
For this article we will use 4-connectivity.
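In code, a connectivity is just a set of (row, column) offsets around the current pixel. A minimal sketch (the names here are mine, for illustration):

```python
# Neighbour offsets as (row, col) deltas from the current pixel.
CONNECTIVITY_4 = [(-1, 0), (0, -1), (0, 1), (1, 0)]            # N, W, E, S
CONNECTIVITY_8 = CONNECTIVITY_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]
```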
As we scan through the image, row by row from top to bottom, and within each row left to right, we only need to examine those neighbours above and to the left of the current pixel. This is because of the direction in which we are scanning through the image - pixels to the right of and below the current pixel won't have been processed yet, so obviously won't be labelled. Therefore, our labelling kernel (the shape we are using to scan through the image and get each pixel's neighbours) looks like this:
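As a sketch, retrieving the kernel's neighbours in Python might look like this (assuming `labels` is a list of lists the same size as the image, initialised to all zeros; the function name is my own, not necessarily the accompanying code's):

```python
def prior_neighbour_labels(labels, r, c):
    """Return the labels of the already-scanned 4-connectivity
    neighbours (North and West) of the pixel at row r, column c."""
    neighbours = []
    if r > 0 and labels[r - 1][c] > 0:    # North neighbour
        neighbours.append(labels[r - 1][c])
    if c > 0 and labels[r][c - 1] > 0:    # West neighbour
        neighbours.append(labels[r][c - 1])
    return neighbours
```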
Once the labels from the relevant neighbours have been retrieved, there are three potential scenarios (a sketch of this logic follows the list):
- The pixel has no labelled neighbours (no neighbouring foreground pixels). Therefore, this pixel is the first pixel of a new shape, so should be given a new label. A label counter is used to keep track of the labels that have already been used, so that we make sure a unique new label is given.
- The pixel has one or more neighbours with the same label. The current pixel is therefore part of the same shape, so is given the same label as its neighbour(s).
- The pixel has multiple neighbours with different labels. This still means that the current pixel is part of the same shape as its neighbours; however, a labelling inconsistency has been found: our algorithm didn't 'know' that the two neighbours were part of the same component until now, as the current pixel joins them up. This inconsistency will need to be fixed in the second pass. For now, label the current pixel with the smallest of its neighbours' labels.
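Putting the three scenarios together, one possible first pass looks like the sketch below. It assumes the `prior_neighbour_labels` helper above and a Union-Find object `uf` (introduced in the next section) with `make_set`, `union` and `find` methods - all illustrative names, not necessarily those in the accompanying code:

```python
def first_pass(image, labels, uf):
    """Label every foreground pixel, recording label equivalences
    in the Union-Find object as they are discovered."""
    next_label = 1
    for r in range(len(image)):
        for c in range(len(image[0])):
            if image[r][c] == 0:
                continue  # background pixel: leave its label as 0
            neighbours = prior_neighbour_labels(labels, r, c)
            if not neighbours:
                # Scenario 1: no labelled neighbours -> start a new component.
                labels[r][c] = next_label
                uf.make_set(next_label)
                next_label += 1
            else:
                # Scenarios 2 and 3: take the smallest neighbour label...
                labels[r][c] = min(neighbours)
                # ...and record that all the neighbour labels are equivalent.
                for n in neighbours:
                    uf.union(labels[r][c], n)
```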
Handling Inconsistencies
Certain shapes lead to the labelling inconsistencies encountered in scenario 3 above, resulting in a component containing areas of pixels with different labels - not good. This is due to the direction in which we scan through the image: components with two sections that only join on the right side will 'look' like two separate components until that right joining section is encountered, so will be given different labels.
To fix these labelling errors, we need to record them when we encounter them during the first pass. Once these label equivalences (they are equivalent because they should actually be the same label) have been recorded, we can fix them with a second pass of the image. To store the label equivalences efficiently, we use the Disjoint-Set data structure (a.k.a. Union-Find). This keeps track of which labels are equivalent, and lets us efficiently retrieve the lowest label (the 'representative') in each set of equivalent labels.
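Here's a minimal Disjoint-Set sketch with path compression, merging sets so that the smallest label is always the representative (the class and method names are my own; the accompanying code may differ):

```python
class UnionFind:
    """Minimal disjoint-set: find() returns the lowest label in
    each set of equivalent labels."""

    def __init__(self):
        self.parent = {}

    def make_set(self, label):
        # Each new label starts in its own singleton set.
        self.parent.setdefault(label, label)

    def find(self, label):
        # Walk up the parent pointers to the representative...
        root = label
        while self.parent[root] != root:
            root = self.parent[root]
        # ...then compress the path for near-constant future lookups.
        while self.parent[label] != root:
            self.parent[label], label = root, self.parent[label]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        # Keep the smaller label as the representative.
        if root_a < root_b:
            self.parent[root_b] = root_a
        else:
            self.parent[root_a] = root_b
```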
Therefore, if during the first pass we encounter a pixel whose neighbours carry two different labels, say 1 and 3, we record in our disjoint-set data structure that the labels 1 and 3 are equivalent.
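With the sketch above, that recording looks like this:

```python
uf = UnionFind()
uf.make_set(1)
uf.make_set(3)
uf.union(1, 3)           # record that labels 1 and 3 are equivalent
assert uf.find(3) == 1   # 1 is the representative (lowest) label
```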
Second Pass
Now we use the recorded label equivalences to fix any labelling inconsistencies from the first pass.
Again scanning through the image pixel by pixel, for each labelled pixel we check if we recorded any equivalent labels in our disjoint-set data structure. If we did, then we replace the pixel's label with the lowest label in its equivalence set. Eventually, every label is replaced with the lowest ('representative') label of its set. We're done ;)
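Continuing the sketch, the second pass is a straightforward sweep (again using my illustrative names; the dictionary of labels seen is kept for the optional third pass):

```python
def second_pass(labels, uf):
    """Replace every label with the representative (lowest) label
    of its equivalence set, and record the labels actually used."""
    labels_used = {}
    for r in range(len(labels)):
        for c in range(len(labels[0])):
            if labels[r][c] > 0:
                labels[r][c] = uf.find(labels[r][c])
                labels_used[labels[r][c]] = True
    return labels_used
```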
Optional Third Pass
You'll see in the Python code that there is an optional third pass. In some cases, due to certain component shapes and arrangements, the final labels won't be consecutive integers. The third pass flattens the labels so that they are consecutive.
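A sketch of that flattening, using the dictionary of labels recorded during the second pass:

```python
def third_pass(labels, labels_used):
    """Remap the surviving labels onto the consecutive integers 1..K."""
    flattened = {old: new for new, old in enumerate(sorted(labels_used), 1)}
    for r in range(len(labels)):
        for c in range(len(labels[0])):
            if labels[r][c] > 0:
                labels[r][c] = flattened[labels[r][c]]
```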
Complexity
This section discusses the time and space complexities of our algorithm. If you don't know what big O notation is, it's worth reading a primer on it before continuing.
Initialisation
Let's say the image has N pixels (if it is X pixels wide and Y pixels high, then N = X*Y). We start by computing the height and width of the image to use in our for loops; these are both O(1) using Python's len() function. We also initialise the labelled output image (I chose not to implement in-place labelling), which is O(N), and initialise the Union-Find object and the label counter - both O(1).
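In terms of the sketches above, the initialisation amounts to something like:

```python
height, width = len(image), len(image[0])      # O(1)
labels = [[0] * width for _ in range(height)]  # O(N) output image
uf = UnionFind()                               # O(1)
# (the O(1) label counter lives inside first_pass in the sketch above)
```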
First Pass
In the first pass, for each pixel we look at its previously visited neighbours (2 neighbours for 4-connectivity, 4 neighbours for 8-connectivity). If any Union-Find operations need to be performed then, due to our efficient implementation of the Union-Find data structure (path compression), they have an effectively constant, O(1), amortised cost. Therefore our initial set-up code and first pass have total time cost O(N).
Second Pass
The second pass looks at each of the N pixels and replaces each label with the representative label of its equivalence class, correcting any labelling inconsistencies. The Find() operation used to get the representative labels is effectively O(1). We also record each label used in a Python dictionary, so we can then flatten the labels in the third pass. As this is implemented as a hash table in Python, checking key membership has amortised cost O(1). Therefore the second pass is also O(N).
Third Pass
The third pass looks at each pixel and replaces the label with its flattened label list equivalent. Therefore, the third pass has time complexity O(N).
Total
Adding this all together, and removing constant multipliers, our algorithm has time complexity O(N). It requires additional space O(N + L): O(N) for the output labelled image and O(L) for both the Union-Find object and the label hash table constructed for the third pass (where L is the number of labels created). As the maximum possible number of labels required to label an image is N/2 (a chess board pattern, using 4-connectivity), O(N+L) becomes O(N). So we have O(N) time and O(N) space!