Yusuf Tas

April 10, 2018 / Computer Vision

Using Tensorflow Object Detection API with OpenCV

In this post, I will go over how to use the Tensorflow Object Detection API within OpenCV. To be honest, I hadn't used OpenCV for quite some time, and after recently looking into it again, I realized how good it has become. It now has a dedicated DNN (deep neural network) module, which can even load Caffe- and Tensorflow-trained networks. I am just so happy to see that functionality in OpenCV. Just think about it: you can use your Caffe or Tensorflow trained networks directly within OpenCV.

Alright, enough rambling, let's get back to the topic. In this post, I will use the OpenCV DNN module to load a trained Tensorflow network and apply object detection to a webcam stream. In the end, we will have a window showing the webcam stream, with the frames modified so that detected objects are marked with rectangles. Before we begin, here is the result:

Firstly, let’s look into the network. Technically there are many networks you could use; for this post, I’ve decided to use Mobilenet. You can read the paper here: https://arxiv.org/abs/1704.04861. In summary, the authors reduce network capacity to cut processing time while trying to maintain accuracy. That approach makes it possible to run this network even on mobile phones. Try using the VGG16 network on a device with limited power and memory 🙂 I won’t go into the details of the network structure; if you would like to learn them, I highly suggest reading the paper.
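To make the capacity reduction concrete, here is a quick back-of-the-envelope sketch (my own illustration, not from the post) comparing the parameter count of a standard 3x3 convolution with the depthwise-separable convolution Mobilenet is built on:

```python
def standard_conv_params(k, c_in, c_out):
    # a standard k x k convolution learns one k*k*c_in filter per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise step: one k x k filter per input channel,
    # pointwise step: a 1x1 convolution mixing the channels
    return k * k * c_in + c_in * c_out

# example layer: 3x3 kernel, 256 input channels, 256 output channels
std = standard_conv_params(3, 256, 256)        # 589824 parameters
sep = depthwise_separable_params(3, 256, 256)  # 67840 parameters
print(round(std / sep, 1))                     # roughly an 8.7x reduction
```

The exact savings depend on the layer sizes, but this is the core trick that lets Mobilenet trade a small accuracy drop for a big speedup.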

Before we start coding, I need to mention that everything in this tutorial can be done using only the Tensorflow Object Detection API. But what OpenCV does is take an image processing algorithm and make it very easy to use. So in this tutorial I’m using OpenCV’s DNN module to take care of everything: loading the network, doing a forward pass on each frame, getting bounding box predictions, and so on. You don’t even need to import Tensorflow for this tutorial.

Alright, the first thing is to get the camera stream and display it. Later on, we will extend this code to apply object detection to each frame instead of just displaying it. Displaying the camera stream is pretty straightforward in OpenCV:

Webcam streaming (Python):
import cv2 as cv

cam = cv.VideoCapture(0)        # open the default webcam
 
while True:
    ret_val, img = cam.read()  #read the frame from webcam
 
    cv.imshow('my webcam', img)
 
    if cv.waitKey(1) == 27:     # 27 is the character code for ESC
        break  # esc to quit
 
cam.release()                   # stop webcam stream
cv.destroyAllWindows()

Now that we have the webcam stream working, the next step is to integrate it with object detection. For object detection, I will use Mobilenet as mentioned above. You can find the Mobilenet and also the Inception v2 model in OpenCV’s documentation. You can try the Inception v2 model too, but if you don’t have a good GPU the webcam stream will be laggy, since the processing time of a single frame limits the FPS. I highly recommend Mobilenet if you don’t have a high-end PC.
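To see why per-frame processing time caps the FPS, here is a tiny sketch (the timings and the overhead figure are made-up numbers for illustration, not measurements from the post):

```python
def achievable_fps(inference_ms, overhead_ms=5.0):
    """Upper bound on webcam FPS when each frame costs inference_ms
    to run through the network plus some capture/draw overhead."""
    return 1000.0 / (inference_ms + overhead_ms)

# hypothetical per-frame inference times
print(round(achievable_fps(25)))   # lighter network:  ~33 FPS
print(round(achievable_fps(120)))  # heavier network:  ~8 FPS
```

Since each frame is processed synchronously in the loop below, a slow network directly translates into a choppy stream.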

Now let’s look into the object detection part:

Object detection from webcam stream (Python):
import cv2 as cv

cam = cv.VideoCapture(0)
#pb  = './inceptionv2.pb'
#pbt = './inceptionv2.pbtxt'

pb  = './mobilenet.pb'
pbt = './mobilenet.pbtxt'

cvNet = cv.dnn.readNetFromTensorflow(pb, pbt)   # read the network
 
while True:
    ret_val, img = cam.read()
 
    rows = img.shape[0]
    cols = img.shape[1]
    cvNet.setInput(cv.dnn.blobFromImage(img, 1.0/127.5, (300, 300), (127.5, 127.5, 127.5), swapRB=True, crop=False))
    cvOut = cvNet.forward()
 
    for detection in cvOut[0,0,:,:]:
        score = float(detection[2])
        if score > 0.3:
            left = detection[3] * cols
            top = detection[4] * rows
            right = detection[5] * cols
            bottom = detection[6] * rows
            cv.rectangle(img, (int(left), int(top)), (int(right), int(bottom)), (23, 230, 210), thickness=2)
 
            idx = int(detection[1])   # prediction class index.
            
            # draw the prediction on the frame
            label = "{}: {:.2f}%".format(classes[idx],score * 100)
            y = top - 15 if top - 15 > 15 else top + 15
            cv.putText(img, label, (int(left), int(y)),cv.FONT_HERSHEY_SIMPLEX, 0.5, colors[idx], 2)
            
    cv.imshow('my webcam', img)
 
    if cv.waitKey(1) == 27:
        break  # esc to quit
 
cam.release()
cv.destroyAllWindows()        

I extended the first code snippet to apply object detection. There are several important points in this code that I need to mention:

  1. Line 10. Here we load the trained network file. OpenCV converts it from Tensorflow format into its own representation. It requires two arguments: the frozen inference graph (.pb) and the network description file (.pbtxt). You can download these files from the OpenCV wiki page; if you do, you need to extract the frozen_inference_graph file from the compressed archive.
  2. Line 17. In this line, we convert the frame to a blob and set it as the input to the network. A blob is a 4D matrix representation of your data. Depending on the framework the ordering of the axes changes, but generally it is (batch_size, channels, rows, cols) or something close. Since we use only one frame as input, the batch size in the blob will be one. Channels will be 3, and with swapRB=True we convert the channel ordering from BGR (OpenCV style) to RGB (normal style 🙂 ). And (rows, cols) will be (300, 300). This input size depends on the network you are using.
  3. Line 17. Again in line 17, we apply preprocessing to the input image; OpenCV just combines many things into one single call 🙂 . In this preprocessing, we subtract the mean values (per channel) and rescale the input values according to the network’s specification: here (x - 127.5) * (1/127.5) maps pixel values into roughly [-1, 1]. This depends on the pretrained network: some networks only do mean subtraction, while others do mean subtraction and scaling as well. Whichever network you are using, you should follow its preprocessing procedure.
  4. Line 20. In this line, we iterate over the detection boxes. Each detection box has several values:
    • detection[0] : the index of the image within the batch; since we feed a single frame, it is always 0 here.
    • detection[1] : the classification index, an integer between 0 and class_count.
    • detection[2] : the classification confidence, between 0 and 1. The higher the number, the more confident the network is that the object belongs to class detection[1].
    • detection[3:7] : the bounding box coordinates (left, top, right, bottom), normalized to [0, 1]; we multiply them by the frame’s width and height to get pixels.
  5. Line 27 and 34. In line 27 we draw the rectangle of the detection, and in line 34 we add a text label with the predicted class. Note that this code snippet doesn’t show the classes list; I left it out to reduce clutter. If you would like to see the full code, you can find it on my github page.
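To make the blob conversion concrete, here is a rough numpy-only sketch of what cv.dnn.blobFromImage does with the parameters used above (this is my approximation: the real function also handles resizing/cropping to (300, 300), which is omitted here):

```python
import numpy as np

def blob_from_image(img_bgr, scale=1.0 / 127.5, mean=127.5, swap_rb=True):
    """Approximate cv.dnn.blobFromImage for an already-resized frame:
    mean subtraction, rescaling, BGR->RGB swap, HWC->NCHW reordering."""
    img = img_bgr.astype(np.float32)
    img = (img - mean) * scale          # maps 0..255 roughly into [-1, 1]
    if swap_rb:
        img = img[:, :, ::-1]           # BGR (OpenCV) -> RGB
    return img.transpose(2, 0, 1)[np.newaxis, ...]  # (batch, channels, rows, cols)

frame = np.zeros((300, 300, 3), dtype=np.uint8)  # stand-in for a webcam frame
blob = blob_from_image(frame)
print(blob.shape)  # (1, 3, 300, 300)
```

This is why the batch size is 1 and the channel axis comes before the spatial axes in the blob the network receives.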
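And here is a small sketch, with a made-up detection row, of how the loop turns one detection's normalized coordinates into pixel coordinates:

```python
def detection_to_pixels(detection, rows, cols):
    """Convert one SSD-style output row [batch_id, class_id, score, l, t, r, b]
    with normalized box coordinates into pixel coordinates."""
    left   = int(detection[3] * cols)
    top    = int(detection[4] * rows)
    right  = int(detection[5] * cols)
    bottom = int(detection[6] * rows)
    return left, top, right, bottom

# made-up detection row with confidence 0.9, on a 640x480 frame
fake = [0.0, 1.0, 0.9, 0.25, 0.10, 0.75, 0.90]
print(detection_to_pixels(fake, rows=480, cols=640))  # (160, 48, 480, 432)
```

These pixel coordinates are exactly what cv.rectangle and cv.putText expect in the loop above.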

I think this is all I can add to this post; you can find the full version of the jupyter notebook file on my github page. Leave a comment down below if you have questions, and as always, keep learning.


