Capturing motion from video using the Emgu CV library.

Introduction

OpenCV (Open Source Computer Vision Library) is a freely available software library containing ready-to-use routines to process visual input such as images or videos. The functions and tools the library offers can be accessed via C/C++ and Python and, through wrappers, from .NET programming languages. In this article, I focus on the OpenCV wrapper Emgu CV, whose methods can be called from a C# program. Here I demonstrate how to load and play a movie, how to use frame subtraction (i.e., subtracting the pixel information of successive frames to capture motion), how to detect faces in an image (by using a pre-trained Haar classifier), and how to apply the Farneback algorithm for dense optical flow (to capture motion).

The contents of the article can be read as an introductory guide to some of the commands the Emgu CV library offers, but they are also meant to show that the presented techniques can help answer questions arising in the behavioral sciences. For this reason, the article not only describes code but also comes with a small application, a video and some brief comments on how the described software routines might be used to record data for behavioral analyses.

As I access the Emgu CV wrapper via C#, a solid knowledge of this programming language is required to understand the code samples properly. By contrast, using the included application does not require any programming skills. The application was compiled in Visual Studio 2015 and is based on the .NET Framework 4.5 and Emgu CV 3.0. Several steps are needed to make the Emgu CV functions and classes available in the Visual Studio development environment. First, after choosing an application type (in my case a standard Windows Forms Application), go to the Solution Explorer window, right-click on “References” and choose “Add Reference”. In the window that appears, select “Browse” and navigate to the folder where Emgu CV was installed; the DLL files contained in its “bin” folder need to be added. Finally, using directives for the Emgu CV namespaces (e.g., using Emgu.Util;) in the main program give access to the necessary components (see the Form1 file that can be downloaded via “Download source”). A detailed description of all these steps can be found online.

I do not describe the mathematical background of the procedures presented here; I only describe how to use the tools the library offers. Readers interested in the mathematics should consult other sources.

Absolute Difference between Pixel Information of Successive Frames of a Movie

In the following, I present a simple method to estimate the quantity of motion that occurs between two successive frames of a video. The pixels of a video frame can be converted into greyscale values ranging from 0 (black) to 255 (white), i.e., an 8-bit image. If no motion (or position shift of an object) has occurred between two frames, the absolute difference between their greyscale values (i.e., the difference between pixels at the same position in both images) is 0 for every pixel, so the resulting “difference image” is black. When there is a position shift, however, different shades of grey appear in some areas of the difference image (see picture above). Counting the pixels whose values exceed a certain threshold (e.g., a greyscale value of 100) gives an estimate of the quantity of motion. There are, of course, limitations to this method (e.g., changes in lighting conditions also alter pixel values and are counted as motion), but overall it gives a simple and fairly robust estimate of how much change is going on.
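
The counting step described above can be sketched in plain C# without Emgu CV. The sketch assumes two greyscale frames of equal size stored as byte[,] arrays (row, column); the class and method names are hypothetical and are not part of Emgu CV or of the accompanying application. It computes the per-pixel absolute difference and counts the pixels whose difference lies above the threshold.

```csharp
using System;

// Hypothetical helper: frame subtraction on plain greyscale arrays (0 = black, 255 = white).
public static class FrameDifference
{
    // Returns |prev - curr| for every pixel; both frames must have the same size.
    public static byte[,] AbsDiff(byte[,] prev, byte[,] curr)
    {
        int rows = prev.GetLength(0), cols = prev.GetLength(1);
        if (curr.GetLength(0) != rows || curr.GetLength(1) != cols)
            throw new ArgumentException("Frames must have the same size.");

        var diff = new byte[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                diff[r, c] = (byte)Math.Abs(prev[r, c] - curr[r, c]);
        return diff;
    }

    // Counts pixels whose difference is strictly above the threshold (e.g., 100).
    public static int CountAboveThreshold(byte[,] diff, byte threshold)
    {
        int count = 0;
        foreach (byte value in diff)
            if (value > threshold) count++;
        return count;
    }
}
```

Identical frames give a count of 0. Dividing the count by the number of pixels is one possible way to make the value comparable across videos of different resolution.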

Using the code

The piece of code presented below is only a skeleton version of the method used in the application (see the download sample). It focuses on the most important Emgu CV commands needed to determine the absolute difference between successive frames. The code assumes that some preparatory steps have already been carried out, such as initializing several variables, including an instance of the Capture class. To open a movie, a line such as Capture capture_movie = new Capture(movie_name); is needed (see mnu_LoadMovie_Click in the source code for details). In summary, the code below grabs a frame, converts it to greyscale, computes the absolute difference between its pixel values and those of the previous frame, and shows the result in a window. Note that I mostly work with the Image<> class instead of Mat, because it offers some features the Mat class does not.

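using System;
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;

// Assumed class-level fields (see the source code): Capture capture_movie; Mat frame, prev_frame;
// Image<Gray, Byte> img_abs_diff; int frame_nr; TextBox txt_resize_factor.
public void Abs_Diff_And_Areas_of_Activity()
{
    // The most recently processed frame becomes the previous frame.
    prev_frame = frame;

    // Jump to the requested frame number and grab it (QueryFrame returns null at the end of the movie).
    capture_movie.SetCaptureProperty(CapProp.PosFrames, frame_nr);
    frame = capture_movie.QueryFrame();
    if (frame == null || prev_frame == null)
        return;

    // Shrink both frames by the "Divide Size by" factor to speed up processing.
    int resize_factor = Convert.ToInt32(txt_resize_factor.Text);
    Size n_size = new Size(frame.Width / resize_factor, frame.Height / resize_factor);
    CvInvoke.Resize(frame, frame, n_size);
    CvInvoke.Resize(prev_frame, prev_frame, n_size);
    CvInvoke.Imshow("Movie resized", frame);

    // Convert both frames to 8-bit greyscale images.
    Image<Gray, Byte> curr_grey_img = frame.ToImage<Gray, Byte>();
    Image<Gray, Byte> prev_grey_img = prev_frame.ToImage<Gray, Byte>();

    // The result image must have the size of the resized frames; clear it before use.
    if (img_abs_diff == null || img_abs_diff.Size != n_size)
        img_abs_diff = new Image<Gray, Byte>(n_size);
    img_abs_diff.SetZero();

    // Per-pixel absolute difference: unchanged pixels become 0 (black), changed pixels turn grey or white.
    CvInvoke.AbsDiff(prev_grey_img, curr_grey_img, img_abs_diff);
    CvInvoke.Imshow("Frame Subtraction", img_abs_diff);

    // Release the unmanaged memory of the temporary images.
    curr_grey_img.Dispose();
    prev_grey_img.Dispose();
}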
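
Converting a color frame to greyscale, as ToImage<Gray, Byte>() does in the code above, reduces the three color channels of each pixel to a single brightness value. The OpenCV documentation gives the weights for this conversion as Y = 0.299 R + 0.587 G + 0.114 B. The plain C# sketch below applies these weights to an interleaved B, G, R byte array (the channel order OpenCV uses). The array layout and the names are assumptions for illustration, and OpenCV's optimized implementation may round individual pixels slightly differently.

```csharp
using System;

// Hypothetical helper: BGR (3 bytes per pixel, blue first, as in OpenCV) to 8-bit greyscale.
public static class GreyConversion
{
    public static byte[] BgrToGray(byte[] bgr)
    {
        if (bgr.Length % 3 != 0)
            throw new ArgumentException("Expected 3 bytes (B, G, R) per pixel.");

        var grey = new byte[bgr.Length / 3];
        for (int p = 0; p < grey.Length; p++)
        {
            double b = bgr[3 * p], g = bgr[3 * p + 1], r = bgr[3 * p + 2];
            // Luma weights documented for OpenCV's BGR -> GRAY conversion.
            double y = 0.299 * r + 0.587 * g + 0.114 * b;
            grey[p] = (byte)Math.Min(255, Math.Round(y));
        }
        return grey;
    }
}
```

Because green carries the largest weight, a pure green pixel ends up much brighter in the greyscale image than a pure blue one.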

 

Detecting Faces in Images by using a Haar Classifier

Everybody who uses a modern digital camera or a smartphone camera has come across the automatic face detection feature of these devices. Such object detection can also be done with the tools provided by OpenCV and Emgu CV. These tools are based on machine learning, more precisely on so-called Haar classifiers: cascades of simple classifiers that evaluate Haar-like features (differences between the brightness sums of adjacent rectangular image regions), following the method of Viola and Jones. Haar classifiers are trained with a large number of positive examples (e.g., faces) and negative examples (e.g., images of the same size that do not contain faces). Such a classifier can then be applied to new images in order to find the objects it was trained for. OpenCV offers ready-to-use XML files containing trained classifiers for different kinds of objects (e.g., faces, eyes). The code presented below makes use of such a pre-trained classifier. However, it is also possible to train one’s own classifier and store it in an XML file.
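
To make the idea of Haar-like features more concrete: such a feature is a difference between pixel sums of adjacent rectangles, and an integral image lets any rectangle sum be computed with four look-ups. The plain C# sketch below is an illustration only, with hypothetical names; it is not Emgu CV code and not the cascade OpenCV ships. It builds an integral image and evaluates one two-rectangle feature (left half minus right half of a window). A trained cascade combines many such features with learned thresholds.

```csharp
using System;

// Hypothetical sketch of the building block behind Haar cascades:
// an integral image and one two-rectangle Haar-like feature.
public static class HaarFeature
{
    // ii[r + 1, c + 1] holds the sum of all pixels above and to the left of (r, c), inclusive.
    public static long[,] IntegralImage(byte[,] img)
    {
        int rows = img.GetLength(0), cols = img.GetLength(1);
        var ii = new long[rows + 1, cols + 1];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                ii[r + 1, c + 1] = img[r, c] + ii[r, c + 1] + ii[r + 1, c] - ii[r, c];
        return ii;
    }

    // Sum of the rectangle with top-left corner (x = column, y = row), width w and height h.
    public static long RectSum(long[,] ii, int x, int y, int w, int h) =>
        ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x];

    // Two-rectangle (edge) feature: left half minus right half; w should be even.
    public static long LeftRightFeature(long[,] ii, int x, int y, int w, int h)
    {
        int half = w / 2;
        return RectSum(ii, x, y, half, h) - RectSum(ii, x + half, y, half, h);
    }
}
```

A large positive value means the left half of the window is much brighter than the right half, i.e., the window contains a vertical edge.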

As in the previous example, a movie has to be loaded first (i.e., Capture capture_movie = new Capture(movie_name);) before the commands of the following code piece can be applied. The code contains the basic principles of using a Haar classifier in Emgu CV. A different classifier (e.g., for eyes) would, of course, give different results, but the general principle is the same.

Using the code

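// Uses the same namespaces as above. Assumed class-level fields (see the source code):
// Capture capture_movie; Mat frame; Image<Bgr, Byte> grabbed_image; int frame_nr; TextBox txt_resize_factor.
private void Face_Detect()
{
    // Area and bounding box of the largest face found in this frame.
    double rect_size = 0;
    Rectangle largest_rect = Rectangle.Empty;

    // Pre-trained frontal-face cascade shipped with OpenCV; the XML file must be found at this path.
    // (Loading it once, e.g. when the form is created, would avoid re-reading it for every frame.)
    CascadeClassifier haar = new CascadeClassifier("haarcascade_frontalface_default.xml");

    // Grab the requested frame.
    capture_movie.SetCaptureProperty(CapProp.PosFrames, frame_nr);
    frame = capture_movie.QueryFrame();
    if (frame == null)
    {
        haar.Dispose();
        return;
    }

    // Convert to a color Image<> and shrink it by the "Divide Size by" factor.
    int resize_factor = Convert.ToInt32(txt_resize_factor.Text);
    using (Image<Bgr, Byte> full_size = frame.ToImage<Bgr, Byte>())
    {
        // Image<>.Resize returns a new image of the requested size.
        grabbed_image = full_size.Resize(full_size.Width / resize_factor,
            full_size.Height / resize_factor, Inter.Linear);
    }

    // The cascade is applied to the greyscale version of the image.
    Image<Gray, Byte> grey_img = grabbed_image.Convert<Gray, Byte>();

    // scaleFactor 1.1: the image is reduced by a factor of 1.1 from one scale to the next;
    // minNeighbors 3: how many neighboring candidates a detection needs in order to be kept.
    Rectangle[] rect = haar.DetectMultiScale(grey_img, 1.1, 3);

    // Draw every detected face in blue and remember the largest one.
    foreach (Rectangle ele in rect)
    {
        if (ele.Width * ele.Height > rect_size)
        {
            rect_size = ele.Width * ele.Height;
            largest_rect = ele;
        }
        grabbed_image.Draw(ele, new Bgr(255, 0, 0), 3);
    }

    // Draw the largest face in green, if any face was found.
    if (rect.Length > 0)
        grabbed_image.Draw(largest_rect, new Bgr(0, 255, 0), 3);

    CvInvoke.Imshow("Original Video", grabbed_image);

    // Release unmanaged resources.
    grey_img.Dispose();
    haar.Dispose();
}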
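
In DetectMultiScale(grey_img, 1.1, 3), the second argument is the scale factor and the third is minNeighbors. According to the OpenCV documentation, the scale factor specifies how much the image size is reduced at each image scale, and minNeighbors specifies how many neighbors each candidate rectangle should have to be retained. Shrinking the image by 1.1 at each step is equivalent to enlarging the detection window by 1.1. The plain C# sketch below therefore lists the approximate window sizes that are tried. It is an illustration, not OpenCV's internal loop; the helper name is hypothetical, and the 24-pixel base window is an assumption that should be checked against the size stored in the cascade's XML file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: window sizes visited by a multi-scale detector.
// This approximates, but is not, OpenCV's internal loop (which rescales the image instead).
public static class DetectionScales
{
    public static List<int> WindowSizes(int baseWindow, double scaleFactor, int imageWidth, int imageHeight)
    {
        if (baseWindow <= 0)
            throw new ArgumentException("baseWindow must be positive.");
        if (scaleFactor <= 1.0)
            throw new ArgumentException("scaleFactor must be greater than 1.");

        var sizes = new List<int>();
        int limit = Math.Min(imageWidth, imageHeight);
        // Grow the window by scaleFactor until it no longer fits into the image.
        for (double factor = 1.0; baseWindow * factor <= limit; factor *= scaleFactor)
            sizes.Add((int)Math.Round(baseWindow * factor));
        return sizes;
    }
}
```

For a 320 x 240 image and a 24-pixel base window, a factor of 1.1 yields 25 scales. A larger factor (e.g., 1.3) visits fewer scales and runs faster, at the risk of missing faces whose size falls between two scales. A larger minNeighbors value suppresses false detections but may also drop real faces.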

 

Applying Dense Optical Flow to Capture Pixel Position Shifts Occurring between Successive Frames of a Movie

When the term optical flow was coined (Gibson, 1940), it was mainly used to describe movement patterns caused by the relative motion between an observer and a scene. More precisely, it described the apparent (i.e., in principle non-existent) motion of objects, surfaces, and edges that the eye has to process when people or animals move around in their environment. A more modern, if somewhat hard to digest, definition states that optical flow is the distribution of apparent velocities of movement of brightness patterns in an image.

Like the frame subtraction method presented above, optical flow algorithms process changes in pixel color to detect motion. There are two main categories of algorithms: sparse and dense optical flow. The former computes motion only for a small set of distinctive feature points, whereas the latter computes a motion vector for every pixel. Dense optical flow tends to be more accurate but is computationally more expensive. In the example below, I present code for dense optical flow based on the Gunnar Farneback algorithm because, for the work I do, accuracy is more important than processing speed. The sample code is split into two functions. The first function performs the optical flow procedure; the second shows how to access the results of the procedure and how to draw them onto the screen. More information on the parameters of the Farneback algorithm can be found on the Emgu CV and OpenCV web pages. I do not (and cannot) give information about the internal structure of the algorithm.

Again, as in the other code examples, a movie has to be loaded first (i.e., Capture capture_movie = new Capture(movie_name);) before the commands of the following code piece can be applied. The code skeleton for the Draw_Farneback_flow_map() function focuses only on the lines needed to read the pixel shifts and make them visible. The source code file contains a great deal of extra code (e.g., the sums of all vectors for the left and right sides separately, information about changes in direction, etc.).
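
Dense optical flow yields a displacement (dx, dy) for every pixel; in the code of this section these values are stored in flow_x and flow_y. A common way to summarize such a vector is its length (how many pixels the point moved between the two frames) and its direction. The plain C# sketch below computes both for one displacement. The helper names are hypothetical, and the angle is measured in image coordinates, where the y axis points downward.

```csharp
using System;

// Hypothetical helper: length and direction of one optical-flow vector.
public static class FlowVector
{
    // Length of the displacement in pixels.
    public static double Magnitude(double dx, double dy) => Math.Sqrt(dx * dx + dy * dy);

    // Direction in degrees in [0, 360): 0 = right, 90 = down (image y axis points downward).
    public static double AngleDegrees(double dx, double dy)
    {
        double deg = Math.Atan2(dy, dx) * 180.0 / Math.PI;
        return deg < 0 ? deg + 360.0 : deg;
    }
}
```

Applied to flow_x.Data[i, j, 0] and flow_y.Data[i, j, 0], these two values can be used, for example, to ignore very small shifts or to track changes in the direction of movement.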

Using the code

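// Uses the same namespaces as above. Assumed class-level fields (see the source code):
// Capture capture_movie; Mat frame, prev_frame; int frame_nr, overall_step; TextBox txt_resize_factor.
public void Dense_Optical_Flow()
{
    // The most recently processed frame becomes the previous frame.
    prev_frame = frame;

    // Jump to the requested frame number and grab it.
    capture_movie.SetCaptureProperty(CapProp.PosFrames, frame_nr);
    frame = capture_movie.QueryFrame();
    if (frame == null || prev_frame == null)
        return;

    // Shrink both frames to speed up the (expensive) dense flow computation.
    int resize_factor = Convert.ToInt32(txt_resize_factor.Text);
    Size n_size = new Size(frame.Width / resize_factor, frame.Height / resize_factor);
    CvInvoke.Resize(frame, frame, n_size);
    CvInvoke.Resize(prev_frame, prev_frame, n_size);

    // Farneback works on single-channel 8-bit images.
    Image<Gray, Byte> curr_grey_img = frame.ToImage<Gray, Byte>();
    Image<Gray, Byte> prev_grey_img = prev_frame.ToImage<Gray, Byte>();

    // One float per pixel for the horizontal (x) and the vertical (y) displacement.
    Image<Gray, float> flow_x = new Image<Gray, float>(frame.Width, frame.Height);
    Image<Gray, float> flow_y = new Image<Gray, float>(frame.Width, frame.Height);

    // Arguments: pyrScale 0.5, levels 3, winSize 15, iterations 3, polyN 6, polySigma 1.3, flags 0.
    // (The OpenCV documentation suggests polyN = 5 with polySigma = 1.1, or polyN = 7 with polySigma = 1.5.)
    CvInvoke.CalcOpticalFlowFarneback(prev_grey_img, curr_grey_img, flow_x, flow_y,
        0.5, 3, 15, 3, 6, 1.3, 0);

    // Draw the flow vector of every overall_step-th pixel onto a color copy of the frame.
    using (Image<Bgr, Byte> frame_bgr = frame.ToImage<Bgr, Byte>())
        Draw_Farneback_flow_map(frame_bgr, flow_x, flow_y, overall_step);

    prev_grey_img.Dispose();
    curr_grey_img.Dispose();
    flow_x.Dispose();
    flow_y.Dispose();
}

private void Draw_Farneback_flow_map(Image<Bgr, Byte> img_curr,
    Image<Gray, float> flow_x, Image<Gray, float> flow_y, int step, int shift_that_counts = 0)
{
    // Start and end point of each flow line (shift_that_counts is not used in this skeleton).
    Point from_dot_xy = new Point();
    Point to_dot_xy = new Point();

    // Line color; the components are in B, G, R order (a green tone).
    MCvScalar col;
    col.V0 = 100;
    col.V1 = 255;
    col.V2 = 0;
    col.V3 = 0;

    // Sample the flow field on a grid: every step-th row (i) and column (j).
    for (int i = 0; i < flow_x.Rows; i += step)
    {
        for (int j = 0; j < flow_x.Cols; j += step)
        {
            // Displacement of pixel (row i, column j) between the previous and the current frame.
            int dx = (int)flow_x.Data[i, j, 0];
            int dy = (int)flow_y.Data[i, j, 0];

            // An image point is (x = column, y = row); the line ends where the pixel moved to.
            from_dot_xy.X = j;
            from_dot_xy.Y = i;
            to_dot_xy.X = j + dx;
            to_dot_xy.Y = i + dy;

            CvInvoke.Line(img_curr, from_dot_xy, to_dot_xy, col, 2);
        }
    }

    // Show the frame once, after all lines have been drawn.
    CvInvoke.Imshow("Flow field vectors", img_curr);
}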
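
In the call to CalcOpticalFlowFarneback above, the arguments after flow_y are, in order, pyrScale (0.5), levels (3), winSize (15), iterations (3), polyN (6), polySigma (1.3) and the flags (0). According to the OpenCV documentation, pyrScale is the scale between successive pyramid layers (0.5 means that each layer is half the size of the previous one), and levels is the number of layers including the original image. The plain C# sketch below lists the image sizes such a pyramid would contain. It is an illustration with a hypothetical helper name, and OpenCV's exact rounding may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: sizes of the image pyramid used by a coarse-to-fine flow method.
public static class FlowPyramid
{
    public static List<(int Width, int Height)> LayerSizes(int width, int height, double pyrScale, int levels)
    {
        if (pyrScale <= 0 || pyrScale >= 1)
            throw new ArgumentException("pyrScale must be between 0 and 1.");

        var sizes = new List<(int Width, int Height)>();
        double scale = 1.0;
        // Layer 0 is the original image; each further layer is scaled by pyrScale.
        for (int level = 0; level < levels; level++)
        {
            sizes.Add(((int)Math.Round(width * scale), (int)Math.Round(height * scale)));
            scale *= pyrScale;
        }
        return sizes;
    }
}
```

Coarser layers let the algorithm capture larger movements, because a shift of many pixels in the original image becomes a small shift on a reduced layer.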



Points of Interest

The article comes with a small application that can perform all the analyses described above and more. Since I do research in the field of non-verbal communication, my main interest is in extracting non-verbal cues from human behavior. For this reason, the application contains some extras that are not included in the code samples above. The frame subtraction section contains an additional function that stores all values above a certain threshold and produces an image of the areas where changes in pixel color have occurred. The optical flow functions contain code that calculates a summed direction vector for the left and the right side of the window. They also provide information about changes in the direction of the summed vectors (in an additional window). Moreover, there are code passages that store the information extracted by the routines described here. All of this is intended to support automated analyses of human motion behavior.
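
The summed direction vectors for the left and right halves are not part of the code skeletons above. One possible way to compute them is to add up the displacements of all pixels in each half. The plain C# sketch below does this under the assumption that the flow components are available as float[,] arrays (rows, columns). It is not the code of the accompanying application, and the names are hypothetical.

```csharp
using System;

// Hypothetical sketch: sum the flow vectors of the left and right halves of a frame.
public static class FlowSummary
{
    public static ((double X, double Y) Left, (double X, double Y) Right) SumByHalf(float[,] flowX, float[,] flowY)
    {
        int rows = flowX.GetLength(0), cols = flowX.GetLength(1);
        if (flowY.GetLength(0) != rows || flowY.GetLength(1) != cols)
            throw new ArgumentException("flowX and flowY must have the same size.");

        int middle = cols / 2; // columns [0, middle) count as left, [middle, cols) as right
        double lx = 0, ly = 0, rx = 0, ry = 0;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
            {
                if (c < middle) { lx += flowX[r, c]; ly += flowY[r, c]; }
                else { rx += flowX[r, c]; ry += flowY[r, c]; }
            }
        return ((lx, ly), (rx, ry));
    }
}
```

In Emgu CV 3.0, Image<Gray, float>.Data is a float[,,] array, so the loop could also read flow_x.Data[r, c, 0] and flow_y.Data[r, c, 0] directly instead of copying the values first. Opposite movements within one half cancel out in the sum, which is worth keeping in mind when interpreting the result.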

The user interface of the application shows the total number of frames of a video, the frame rate, and the current frame number. It also shows the greyscale threshold for the frame subtraction procedure (with the default value of 100, only difference values above 100 are used by the frame subtraction routine). “Steps” gives the number of frames by which the video is moved forward (or backward), for instance, when the “Forward” or the “Apply Stepwise” button is used. The “Divide Size by” text field specifies by how much the original video is reduced in size (2 means that the width and the height of the video are halved). Making the video smaller speeds up the processing of image data. The option buttons on the interface, the “Play” button and the options in the “File” menu are, I think, self-explanatory. “Apply Stepwise” applies one of the image processing routines (selected with the option buttons) to a video in a stepwise manner (i.e., to the current frame and then to the current frame plus the number given in “Steps”). Data captured with the software can be saved in text (.txt) files (see menu).
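
Because behavioral analyses usually relate measurements to time, it can help to convert the frame numbers visited when stepping through a video into seconds, using the frame rate shown in the interface. The sketch below assumes a constant frame rate and zero-based frame numbers. The method names are hypothetical and not taken from the application.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: frames visited when stepping through a video, and their time stamps.
public static class Stepping
{
    // Frame numbers start, start + step, start + 2 * step, ... that still lie inside the video.
    public static List<int> VisitedFrames(int startFrame, int step, int totalFrames)
    {
        if (step <= 0) throw new ArgumentException("step must be positive.");
        var frames = new List<int>();
        for (int f = startFrame; f < totalFrames; f += step)
            frames.Add(f);
        return frames;
    }

    // Time in seconds of a zero-based frame number, assuming a constant frame rate.
    public static double FrameToSeconds(int frame, double fps) => frame / fps;
}
```

With 25 frames per second, for example, frame 50 corresponds to 2 seconds. Storing this time stamp next to each measurement makes the saved data easier to align with other recordings.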

To give the program access to the Emgu CV routines, Windows must be able to find the library's DLL files (e.g., by adding the Emgu CV folder to the Windows environment variables). It is also possible to copy all necessary DLLs to the folder that contains the program's .exe file (perhaps not very elegant, but relatively simple). If you are not interested in the code but want to use the software and have trouble with it, please contact me.

There is no guarantee that the samples presented here are free of bugs. Also, the code could certainly be organized in a more straightforward and parsimonious way.

Acknowledgements

This work was supported by the Netherlands Institute for Advanced Study in the Humanities and Social Sciences (NIAS/KNAW), www.nias.knaw.nl, by the EURIAS Fellowship programme, and by the European Commission (Marie Skłodowska-Curie Actions, COFUND Programme, FP7).
