Capstone Project: Comprehensive Computer Vision Application
Duration: 10 min
This module focuses on building a comprehensive computer vision application using advanced techniques such as Convolutional Neural Networks (CNNs), object detection algorithms like YOLO and Faster R-CNN, image segmentation with U-Net, and instance segmentation using Mask R-CNN. Understanding and implementing these techniques will enable you to develop robust and efficient computer vision solutions.
Understanding Convolutional Neural Networks (CNNs)
Convolutional Neural Networks (CNNs) are a class of deep neural networks, most commonly applied to analyzing visual imagery. They are composed of an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. CNNs are particularly effective in image recognition and classification tasks due to their ability to automatically and adaptively learn spatial hierarchies of features from input images.
import tensorflow as tf
from tensorflow.keras import layers, models
# Define a simple CNN model
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])Model: "sequential"
__________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 11, 11, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 3, 3, 64) 36928
_________________________________________________________________
flatten (Flatten) (None, 576) 0
_________________________________________________________________
dens (Dense) (None, 64) 36928
_________________________________________________________________
dens_1 (Dense) (None, 10) 650
=================================================================
Total params: 92,322
Trainable params: 92,322
Non-trainable params: 0
_________________________________________________________________Implementing Object Detection with YOLO
You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. YOLO divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities. YOLO is known for its speed and accuracy, making it suitable for real-time applications.
import cv2
import numpy as np
# Load YOLO
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
layer_names = net.getLayerNames()
output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]
# Load image
img = cv2.imread("image.jpg")
img = cv2.resize(img, None, fx=0.4, fy=0.4)
height, width, channels = img.shape
# Detecting objects
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)
# Showing informations on the screen
class_ids = []
confidences = []
boxes = []
for out in outs:
for detection in out:
scores = detection[5:]
class_id = np.argmax(scores)
confidence = scores[class_id]
if confidence > 0.5:
# Object detected
center_x = int(detection[0] * width)
center_y = int(detection[1] * height)
w = int(detection[2] * width)
h = int(detection[3] * height)
# Rectangle coordinates
x = int(center_x - w / 2)
y = int(center_y - h / 2)
boxes.append([x, y, w, h])
confidences.append(float(confidence))
class_ids.append(class_id)
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
font = cv2.FONT_HERSHEY_PLAIN
colors = np.random.uniform(0, 255, size=(len(class_ids), 3))
for i in range(len(boxes)):
if i in indexes:
x, y, w, h = boxes[i]
label = str(class_ids[i])
color = colors[i]
cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
cv2.putText(img, label, (x, y + 30), font, 3, color, 3)
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()💡 Tip: Ensure that the YOLO weights and configuration files are correctly downloaded and placed in the working directory to avoid runtime errors.
❓ What is the primary advantage of using CNNs in image recognition tasks?
❓ What is the main benefit of using YOLO for object detection?