YOLO Object Detection — Small Stuffs

Sanjiv Gautam
4 min read · Jun 4, 2020

So you know what object detection is: finding the bounding boxes of objects inside an image. There are a few small things inside object detection that need to be addressed.

Intersection Over Union

Not really a good painter, but it works!

The yellow rectangle is what our model predicted. The red rectangle is the actual bounding box, and the green region is their intersection. What intersection over union (IOU) gives us is a measure of how much the red and yellow rectangles overlap: 1 when they match exactly, 0 when they don't touch at all.

IOU = (area of the intersection between the red and yellow regions) / (area of the union between them). So we usually accept a prediction when its IOU is greater than 0.5 or something!
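The formula above can be sketched in a few lines of Python. This is a minimal sketch, not any library's implementation: the function name `iou` and the corner format (x1, y1, x2, y2) are assumptions made for illustration, since YOLO itself describes boxes by center, height and width.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes.

    Assumption: each box is (x1, y1, x2, y2), the top-left and
    bottom-right corners, with x1 < x2 and y1 < y2.
    """
    # Corners of the overlapping (green) rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width or height goes negative when the boxes do not overlap,
    # so clamp at zero.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


predicted = (1, 1, 3, 3)   # the yellow box
actual = (0, 0, 2, 2)      # the red box
print(iou(predicted, actual))
```

Here the two boxes each have area 4 and share a 1 by 1 square, so the IOU is 1 / (4 + 4 - 1) = 1/7, well below the 0.5 threshold.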

Non Max Suppression

The problem with object detection is that our algorithm can detect the same object multiple times, so we end up with two or three bounding boxes for the same object, and we don't need that. We need at most one detection per object. So we use non max suppression. Each predicted box may think it has found the center of the object (bx,by), which in fact is not true for all of them. An object has only one center. So the algorithm finds several of those (bx,by), and we have to select one of them using non max suppression.

In YOLO, the first value of the predicted output is Pc (Probability of Detection). In the training labels it is 1 or 0: 1 means the grid cell contains the center of an object and 0 means it doesn't. The model's prediction for Pc is a score between 0 and 1 that says how confident it is that it has found one. What we do in non max suppression is:

  1. Select the box with the max Pc.
  2. Discard every other bounding box whose IOU with the selected box is greater than 0.5 or something.
  3. Repeat with the boxes that are left, until every box has been either selected or discarded.

Okay, why do we discard in step 2? Because an IOU greater than 0.5 with the highest-Pc box means the two bounding boxes overlap heavily, so they are most likely detecting the same object. Here is what I mean.

We have two detected boxes. We choose the one with the highest Pc, which is 0.8, but we also have another one with Pc 0.6 that detects the same object. So we check whether the box with Pc 0.6 and the selected box (0.8) have an IOU greater than 0.5. If they do, they are pointing to the same object, so we discard the 0.6 box. If the IOU is less than 0.5, they may be pointing to different objects, so we keep it.
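The procedure above can be sketched as a short greedy loop. This is a minimal sketch under a few assumptions: the names `iou` and `non_max_suppression` are hypothetical, boxes use the corner format (x1, y1, x2, y2), and all detections belong to a single class. The coordinates in the example are made up to mirror the 0.8 and 0.6 case.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x1, y1, x2, y2) corners (assumed format)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep at most one box per object.

    detections: list of (pc, box) pairs for one class.
    Returns the kept pairs, highest Pc first.
    """
    # Highest Pc first, so the front of the list is always the best box left.
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every box that overlaps the selected one too much.
        remaining = [d for d in remaining
                     if iou(best[1], d[1]) <= iou_threshold]
    return kept


detections = [
    (0.8, (0, 0, 10, 10)),    # best box on the first object
    (0.6, (1, 1, 11, 11)),    # overlaps the 0.8 box, same object
    (0.7, (50, 50, 60, 60)),  # far away, a different object
]
for pc, box in non_max_suppression(detections):
    print(pc, box)
```

The 0.6 box has an IOU of 81/119 (about 0.68) with the 0.8 box, so it is discarded, while the 0.7 box does not overlap either of them and is kept.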

Anchor Boxes

Remember the problem we dealt with before? The same object detected by multiple boxes. But what if there are multiple objects in the same grid cell? In our 10*10 grid, what if one cell contains the centers of more than one object? I mean, if you are detecting both a man and a car, their centers might fall in the same cell, right? With a single output per cell, the model has to choose one of the two. So what do we do in this case? We use ANCHOR BOXES!

What we do is choose, say, 2 anchor boxes for each grid cell. So we have 16 outputs per cell instead of 8. What do the 8 contain? [Pc,bx,by,bh,bw,c1,c2,c3], where c1,c2,c3 are for car, pedestrian and traffic light. Each anchor box gets its own set of those 8 values, so with 2 anchor boxes we get 16 values: [Pc,bx,by,bh,bw,c1,c2,c3,Pc,bx,by,bh,bw,c1,c2,c3]. The first 8 are for one object and the last 8 are for the other, so the model can detect up to 2 different objects in the same grid cell. If we increase the number of anchor boxes, it can detect more objects in the same cell.
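To make the 16-value layout concrete, here is a minimal sketch that splits one grid cell's output into its per-anchor parts. The names `FIELDS` and `split_anchors` are hypothetical, the numbers are made up, and the sketch assumes the fourth and fifth values of each group are the box height and width (bh, bw).

```python
# Assumed order of the 8 values that belong to one anchor box.
FIELDS = ("pc", "bx", "by", "bh", "bw", "c1", "c2", "c3")


def split_anchors(cell_output, num_anchors=2):
    """Split a flat per-cell output into one dict per anchor box."""
    n = len(FIELDS)
    if len(cell_output) != n * num_anchors:
        raise ValueError("expected %d values, got %d"
                         % (n * num_anchors, len(cell_output)))
    return [dict(zip(FIELDS, cell_output[i * n:(i + 1) * n]))
            for i in range(num_anchors)]


# Made-up cell: a pedestrian (c2) in the first anchor box
# and a car (c1) in the second.
cell = [1, 0.5, 0.5, 0.8, 0.3, 0, 1, 0,
        1, 0.5, 0.5, 0.3, 0.9, 1, 0, 0]

for anchor in split_anchors(cell):
    print(anchor)
```

With 3 anchor boxes the same cell would carry 24 values, and `split_anchors(cell, num_anchors=3)` would return three dicts.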

So each object is assigned to a specific grid cell and a specific anchor box. Which anchor box is chosen? The one with the highest IOU with the object's bounding box. So an object goes to the grid cell that contains its center, and within that cell to the anchor box with the highest IOU out of all the anchor boxes.

So why do we use anchor boxes? One anchor box corresponds to detecting one object. So what if we have 2 anchor boxes and 3 objects in the same grid cell? That's the downside of YOLO: with 2 anchor boxes it cannot detect 3 objects in the same cell, only 2.
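Picking the anchor box with the highest IOU can be sketched like this. It is only a sketch under an assumption the text does not spell out: the object's box and each anchor box are compared by shape alone, as if they shared the same center, so only widths and heights matter. The names `shape_iou` and `best_anchor` and the anchor sizes are hypothetical.

```python
def shape_iou(wh_a, wh_b):
    """IOU of two boxes given as (width, height), assumed to share a center."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union if union > 0 else 0.0


def best_anchor(object_wh, anchors):
    """Index of the anchor box whose shape matches the object best."""
    scores = [shape_iou(object_wh, anchor) for anchor in anchors]
    return scores.index(max(scores))


# Made-up anchors: a tall one (pedestrian-like) and a wide one (car-like).
anchors = [(1.0, 3.0), (3.0, 1.0)]

print(best_anchor((0.9, 2.5), anchors))  # tall object
print(best_anchor((2.8, 1.2), anchors))  # wide object
```

The tall object scores 0.75 against the tall anchor and much lower against the wide one, so it lands in the first group of 8 values; the wide object lands in the second.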
