Custom Object Detection: Exploring Fundamentals of YOLO and Training on Custom Data | by Günter Röhrich | Jan, 2024


Leveraging Pre-trained Models, Augmenting Images and Bounding Boxes, and Unveiling the Power of Convolutional Neural Networks in Object Detection

Günter Röhrich
Towards Data Science

Deep learning has made huge progress over the last decade. While early models were hard to understand and apply, modern frameworks and tools allow anyone with some coding experience to train their own neural network for computer vision tasks.

In this article, I will demonstrate how to load and augment images together with their bounding boxes, train an object detection algorithm, and finally see how accurately we’re able to detect objects in the test images. While the available toolkits have become much easier to use over time, there are still a few pitfalls you might run into.
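To make the idea of augmenting an image together with its bounding boxes concrete, here is a minimal sketch of a horizontal flip. It is not the pipeline used later in the article: the function name `hflip_with_boxes`, the toy list-of-rows image, and the `(x_min, y_min, x_max, y_max)` pixel format with exclusive maxima are all assumptions made for illustration.

```python
def hflip_with_boxes(image, boxes):
    """Mirror an image left-to-right and move its boxes accordingly.

    image: list of rows, each row a list of pixel values (toy format, assumed).
    boxes: list of (x_min, y_min, x_max, y_max) tuples in pixels,
           with x_max / y_max exclusive (assumed convention).
    """
    width = len(image[0])
    # Reverse every row to flip the picture horizontally.
    flipped = [row[::-1] for row in image]
    # The old right edge becomes the new left edge, and vice versa.
    mirrored = [
        (width - x_max, y_min, width - x_min, y_max)
        for (x_min, y_min, x_max, y_max) in boxes
    ]
    return flipped, mirrored


# Toy 4x6 "image" with one bright 2x2 object at columns 1-2, rows 1-2.
image = [[0] * 6 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        image[y][x] = 255
boxes = [(1, 1, 3, 3)]

flipped, new_boxes = hflip_with_boxes(image, boxes)
print(new_boxes)
```

The point of the sketch is that every geometric change applied to the pixels has to be applied to the box coordinates as well, otherwise the labels no longer match the objects. In practice, an augmentation library would usually handle this bookkeeping.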

Computer vision (CV) is both a very popular and a very broad field of research and application. Advances in deep learning, especially over the last decade, have tremendously accelerated progress in the field and widened its range of practical uses.

Why do we see those advances right now? As François Chollet (the creator of the Keras library) describes it, the computational capabilities of CPUs rose by a factor of roughly 5,000 between 1990 and 2010 alone. Investments in GPUs have pushed research even further.

In general, we see three essential tasks that are related to CV:

  1. Image classification: This is probably the most intuitive task we can think of. Given an image, we want the algorithm to assign either a single class label (e.g. “cat”) or several labels at once, like “cat”, “dog” and “person”, when all of them appear in the same image.
  2. Image segmentation: This task is probably best known in the context of our mobile phones. Whenever we select the “portrait” mode, we can observe the phone separating the main object from the background. If you’re using a virtual background in your company calls, that is also a segmentation task running behind the scenes.
  3. Object detection: This is what y’all came for! We want to find certain objects in an image and draw rectangles around them. Each of those…


