Nov 1, 2010

OpenCV SURF and Qt

Download Source

Due to the fact that I could find no real tutorial on how to use OpenCV's implementation of SURF (Speeded Up Robust Features) online, I decided to write one after I figured out how to use it.  SURF is a feature detection algorithm that finds the correspondence between two images.  This is useful for image stitching (creating wide panorama shots with multiple images) or just finding an object in an image.  The basic algorithm for SURF is as follows:

"First, 'interest points' are selected at distinctive locations in the image, such as corners, blobs, and T-junctions.  The most valuable property of an interest point detector is its repeatability, i.e. whether it reliably finds the same interest points under different viewing conditions. Next, the neighbourhood of every interest point is represented by a feature vector. This descriptor has to be distinctive and, at the same time, robust to noise, detection errors, and geometric and photometric deformations.  Finally, the descriptor vectors are matched between the vectors, e.g. the Mahalanobis or Euclidean distance.  The dimension of this vector has a direct impact on the time this takes, and a lower number of dimensions is therefore desirable." (Bay, Herbert et al. 2006. SURF: Speeded Up Robust Features)

Got that students?  Basically, we find some key points in the images that will uniquely identify them.  Once those key points are found we look at the neighboring pixels and put it in a feature collection.  Then we match the two vectors (match being the operative word here) by comparing their distance (Euiclidean or Mahalanobis).  That's the idea.  Simple but complex.  Well, you don't have to implement any of that goodness because OpenCV has done it for you.  Although it is a little complex to follow their sample code without someone helping you understand what is going on.  In this tutorial I will show you how to use OpenCV to extract the SURF key points and descriptors, compare them, and then stitch two images together based on those key points and descriptors.  My code is based directly on OpenCV's sample code find_obj.cpp where they use the SURF algorithm for feature detection.  

Before we start I wanted to talk about homographies.  A homography (for our purposes today) is a little 3x3 matrix that tells us how to map one image (a set of pixel) onto another image.  OpenCV has a function for finding this matrix once we have the key points and descriptors.  I won't focus on how this is found as that is not the scope of this tutorial.  Just know that we need it to stitch the two images together. If you want to study what a homography is, check out this page.

Here is the basic algorithm:
  1. Use cvExtractSURF to get key points and descriptors from both images.
  2. Find matching key points by comparing the distance between the key points. We will use a naive nearest neighbor approach.
  3. Once the pairs of key points are found, stick them into cvFindHomography to get the homography matrix.
  4. Use the homography to warp one image to the other.
We'll do this in chronological order (makes sense that way to me, anyway).  

Using cvExtractSURF
Given two QImages, I create two grayscale IplImages to work with OpenCV's methods. Don't worry about how to convert between QImage and IplImage.  I will post that code, but for general purposes you can just use an IplImage.

  IplImage* ipl1 = ImageStitcher::QImage2GrayscaleIplImage(scaledImage1);
   IplImage* ipl2 = ImageStitcher::QImage2GrayscaleIplImage(scaledImage2);
   CvMemStorage* memoryBlock = cvCreateMemStorage();
   CvSeq* image1KeyPoints;
   CvSeq* image1Descriptors;
   CvSeq* image2KeyPoints;
   CvSeq* image2Descriptors;
   // Only values with a hessian greater than 500 are considered for keypoints
   CvSURFParams params = cvSURFParams(500, 1);
   cvExtractSURF(ipl1, 0, &image1KeyPoints, &image1Descriptors, memoryBlock, params);
   cvExtractSURF(ipl2, 0, &image2KeyPoints, &image2Descriptors, memoryBlock, params);

Notice the basic idea here is pretty straightforward.  Get a couple of grayscale (has to be grayscale) IplImages.  Create a memory block to store your key points and descriptors.  Throw it into cvExtractSURF and it will populate the key points and the descriptors for you.  Don't worry about cvSURFParams or what a Hessian value is, just know that 300 to 500 is a good first parameter.  If that doesn't satisfy you, look up the documentation on the function and then go find out what a Hessian matrix is.  After you do this, you have got your key points and descriptors from both images.  We are now ready to compare them.

Matching Key Points in Both Images
Here we are going to match the image key points into pairs using a QVector. The first item in the QVector is the list of the first image's matched key points and the second item is the second image's matched key points.  

   // Find matching keypoints in both images
    QVector<QVector<CvPoint2D32f> > keyPointMatches;
    for (int i = 0; i < image1Descriptors->total; i++) {
        const CvSURFPoint* image1KeyPoint = (const CvSURFPoint*) cvGetSeqElem(image1KeyPoints, i);
        const float* image1Descriptor =  (const float*) cvGetSeqElem(image1Descriptors, i);
        int nearestNeighbor =
        if (nearestNeighbor == NO_NEIGHBOR) {
        keyPointMatches[0].append(((CvSURFPoint*) cvGetSeqElem(image1KeyPoints, i))->pt);
        keyPointMatches[1].append(((CvSURFPoint*) cvGetSeqElem(image2KeyPoints, nearestNeighbor))->pt);

So, loop over one of the image's descriptors and extract each of its key points and descriptors.  Now what you're actually extracting is a key point and a feature vector for that key point--its descriptor (the vector will be a good old C array).  Find the nearest neighbor, don't worry we'll get to that, and then match them up in our QVector.

OK, finding the nearest neighbor has several implementations.  This one is taken basically from OpenCV's sample code.

int findNaiveNearestNeighbor(
   const float* image1Descriptor, 
   const CvSURFPoint* image1KeyPoint, 
   CvSeq* image2Descriptors, 
   CvSeq* image2KeyPoints)
    int descriptorsCount = (int)(image2Descriptors->elem_size/sizeof(float));
    double minSquaredDistance = std::numeric_limits<double>::max();
    double lastMinSquaredDistance = std::numeric_limits<double>::max();
    int neighbor;
    for (int i = 0; i < image2Descriptors->total; i++) {
        const CvSURFPoint* image2KeyPoint = (const CvSURFPoint*) cvGetSeqElem(image2KeyPoints, i);
        const float* image2Descriptor = (const float*) cvGetSeqElem(image2Descriptors, i);
        if (image1KeyPoint->laplacian != image2KeyPoint->laplacian)
            continue; // Don't worry about key points unless laplacian signs are equal
        double squaredDistance = 

        if (squaredDistance < minSquaredDistance) {
            neighbor = i;
            lastMinSquaredDistance = minSquaredDistance;
            minSquaredDistance = squaredDistance;
        } else if (squaredDistance < lastMinSquaredDistance) {
            lastMinSquaredDistance = squaredDistance;
    if (minSquaredDistance < 0.7 * lastMinSquaredDistance)
        return neighbor;
    return NO_NEIGHBOR;

So, there is a lot to cover here but we'll do our best to get through it--stick with me.  Remember we are passing into this function an array of features from the first image, the key point associated with that array and then the second image's key points and associated feature vectors.  Now we loop through the second image's descriptors and extract each key point and feature vector.  The descriptorCount there at the top is important.  It tells us how long each feature vector array is.  Don't miss it.  The line about the laplacian is unimportant to grasp for our purposes, just know that key points don't really match if the sign of their laplacian doesn't match.  If your asking yourself, "What the heck is a laplacian?" then you are no different than many of us that first study computer vision.  It represents a spatial change in images and is used for edge detection.  It is a second order derivative for you math nerds that tells us how much a pixel sticks out from the average of its neighbors.  I myself am still trying to grasp the mathematics behind this little guy.  If anyone can correct me here, please do.

Once we are past that step, we get to the juicy goodness of comparing feature vectors.  We are going to compare the descriptors by finding the minimum squared distance between the first image's  feature vector and all of the second image's and descriptors. (Notice I use feature vector and descriptor interchangeably--they're the same thing).  We will do this by using the Euclidean squared distance (squared because it is faster than taking the square root of the thing).  We keep track of the last minimum we found so that we can tie some threshold to how "close" the descriptors in the two images really are.  I use 0.7--its a magic number.  Once you implement it you can play around with this threshold and find one that works best for your images.  Here is the method for comparing descriptors:

double compareSURFDescriptors(
   const float* image1Descriptor,
   const float* image2Descriptor,
   int descriptorsCount,
   float lastMinSquaredDistance)
    double totalCost = 0;
    for (int i = 0; i < descriptorsCount; i += 4) {
        QVector4D descriptor1(image1Descriptor[i+0], image1Descriptor[i+1], image1Descriptor[i+2], image1Descriptor[i+3]);
        QVector4D descriptor2(image2Descriptor[i+0], image2Descriptor[i+1], image2Descriptor[i+2], image2Descriptor[i+3]);
        totalCost += (descriptor2 - descriptor1).lengthSquared();
        if (totalCost > lastMinSquaredDistance)
   return totalCost;

Here we speed things up if our total cost gets higher than the best value we have seen so far.  Basically that is it for comparing and matching descriptors.  Once we have done that we can move on to finding the homography.

Finding the Homography
It's pretty simple folks--at least OpenCV has made it that way. Declare a 1D array of floats and OpenCV will use the matched key points to populate the homography.  Here it is:

CvMat image1Points = cvMat(1, matchingKeyPoints.first().count(), CV_32FC2, matchingKeyPoints.first().data());
   CvMat image2Points = cvMat(1, matchingKeyPoints.last().count(), CV_32FC2, matchingKeyPoints.last().data());

    double h[9];
    CvMat H = cvMat(3, 3, CV_64F, h);
    cvFindHomography(&image1Points, &image2Points, &H, CV_RANSAC, 9);

That little h[9] will now have the homography stored in row major order.  Once we have that we will apply the homography to every pixel in one of the images and this will essentially warp one of the images and then we can stick them on top of each other.  I won't go over the stitching here, but I will post the library that does this magic.  I called it ImageStitcher.  Feel free to take a look at it.

Download Source