OpenCV for multi-target tracking

Keywords: Python OpenCV network calculator

  1. Project framework
  2. Code block analysis
  3. Full code and resource download links

Project framework and modules used

The project directory is laid out as follows:

├── mobilenet_ssd
│   ├── MobileNetSSD_deploy.caffemodel
│   └── MobileNetSSD_deploy.prototxt
├── race.mp4
├── race_output_slow.avi
└── race_output_fast.avi

Modules used

Python 3, OpenCV, dlib, and Python's multiprocessing module (the imports below also use NumPy and imutils)

Code block analysis

Import required libraries

from imutils.video import FPS
import multiprocessing
import numpy as np
import argparse
import imutils
import dlib
import cv2

Spawning a new process with Python's Process class (each process is independent of its parent)

Process function definition

With Python multiprocessing, Python creates a new process with its own interpreter and runs the target function in it. As a result, each process that runs a tracker is independent of its parent process. To communicate with the parent (driver) script, we need to use pipes or queues. Both types of objects are thread- and process-safe, implemented with locks and semaphores.

In essence, we are building a simple producer/consumer relationship:

The parent process produces new frames and adds them to the input queue of each object tracker.

The child process consumes frames, applies object tracking, and returns the updated bounding box coordinates.

Inside the function, we first try to get a new frame from inputQueue.

If the frame is not None, we update the object tracker with it and read back the updated bounding box coordinates.

Finally, we put the label and bounding box on outputQueue so that the parent process can use them in the main loop of the script.

The function is defined as follows; after it, we return to the parent process and parse the command-line arguments:

def start_tracker(box, label, rgb, inputQueue, outputQueue):
	"""
	Construct a dlib rectangle from the bounding box coordinates,
	then start the correlation tracker.
	:param box:		Bounding box coordinates returned by the detector
	:param label:	Label of the detected object
	:param rgb:		RGB-ordered image used to initialize the dlib object tracker
	:param inputQueue:	Queue of incoming frames
	:param outputQueue:	Queue of outgoing (label, bounding box) results
	:return: None
	"""
	# t is the tracker object
	t = dlib.correlation_tracker()
	rect = dlib.rectangle(box[0], box[1], box[2], box[3])
	t.start_track(rgb, rect)

	# This function runs as a daemon process, so there is no need to join it
	while True:
		# Get the next frame from the input queue (blocks until one arrives)
		rgb = inputQueue.get()

		# If a frame was received, process it
		if rgb is not None:
			# Update the tracker, then get the position of the tracked object
			t.update(rgb)
			pos = t.get_position()

			# Unpack the object position
			startX = int(pos.left())
			startY = int(pos.top())
			endX = int(pos.right())
			endY = int(pos.bottom())

			# Put the object label and the bounding box tuple on the output queue
			outputQueue.put((label, (startX, startY, endX, endY)))
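
The producer/consumer pattern used by start_tracker can be tried out without dlib. The following is a minimal sketch, not the article's code: `echo_tracker` and `run_demo` are hypothetical names, the "tracking" just shifts the box one pixel per frame, and the worker stops on a `None` sentinel (an assumption added here so the loop can end cleanly, whereas the original function loops forever as a daemon).

```python
import multiprocessing


def echo_tracker(box, label, input_queue, output_queue):
    """Hypothetical stand-in for start_tracker: no dlib, no real tracking."""
    (start_x, start_y, end_x, end_y) = box
    while True:
        # Block until the parent sends the next frame
        frame = input_queue.get()

        # Assumption: a None sentinel tells the worker to stop
        if frame is None:
            break

        # Pretend the object moved one pixel to the right in this frame
        start_x += 1
        end_x += 1
        output_queue.put((label, (start_x, start_y, end_x, end_y)))


def run_demo(num_frames=3):
    """Parent side: feed frames to one worker process and collect its results."""
    iq = multiprocessing.Queue()
    oq = multiprocessing.Queue()
    p = multiprocessing.Process(
        target=echo_tracker, args=((10, 20, 50, 80), "person", iq, oq))
    p.daemon = True
    p.start()

    results = []
    for frame_no in range(num_frames):
        iq.put(frame_no)          # any picklable object can stand in for a frame
        results.append(oq.get())  # blocks until the worker has answered

    iq.put(None)  # ask the worker to exit
    p.join()
    return results
```

Calling `run_demo()` from under an `if __name__ == "__main__":` guard returns one (label, box) tuple per frame. The guard matters on platforms where multiprocessing starts child processes by re-importing the main module.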

Parsing command line parameters

--prototxt: the path to the Caffe "deploy" prototxt file.

--model: the path to the model file that accompanies the prototxt.

--video: the path to the input video file. We will use dlib to perform multi-target tracking on this video.

--output: an optional path to an output video file. If no path is specified, the video is not written to disk. I recommend writing to an .avi or .mp4 file.

--confidence: an optional override for the object detection confidence threshold, which defaults to 0.2. This value is the minimum probability used to filter weak detections from the object detector.

# Construct command line arguments and parse them
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-v", "--video", required=True,
	help="path to input video file")
ap.add_argument("-o", "--output", type=str,
	help="path to optional output video file")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())
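
To see what `args` contains, the same parser can be given an explicit argument list instead of reading the command line. This is a small sketch for illustration: `build_parser` and `example_args` are hypothetical names, and the paths are the files listed in the project directory.

```python
import argparse


def build_parser():
    """Hypothetical helper that rebuilds the parser shown above."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--prototxt", required=True,
                    help="path to Caffe 'deploy' prototxt file")
    ap.add_argument("-m", "--model", required=True,
                    help="path to Caffe pre-trained model")
    ap.add_argument("-v", "--video", required=True,
                    help="path to input video file")
    ap.add_argument("-o", "--output", type=str,
                    help="path to optional output video file")
    ap.add_argument("-c", "--confidence", type=float, default=0.2,
                    help="minimum probability to filter weak detections")
    return ap


# Passing a list to parse_args() bypasses sys.argv, which is handy for testing
example_args = vars(build_parser().parse_args([
    "--prototxt", "mobilenet_ssd/MobileNetSSD_deploy.prototxt",
    "--model", "mobilenet_ssd/MobileNetSSD_deploy.caffemodel",
    "--video", "race.mp4",
]))

print(example_args["output"])      # None, because --output was not given
print(example_args["confidence"])  # 0.2, the default threshold
```

Because `--output` has no default, its value is `None` when omitted, which is what the main loop later checks before creating a video writer.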

Initialize the I/O queues

These queues carry the frames and results for the objects we are tracking. Each spawned process requires two queue objects:

One from which it reads input frames

One to which it writes results

# Initialize the lists of input and output queues, one pair for every object we track
inputQueues = []
outputQueues = []

Initialize everything else

# Initialize the list of class labels that the Caffe network was trained to detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
	"bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
	"dog", "horse", "motorbike", "person", "pottedplant", "sheep",
	"sofa", "train", "tvmonitor"]

# Load the serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# Initialize the video stream and the output video writer
print("[INFO] starting video stream...")
vs = cv2.VideoCapture(args["video"])
writer = None

# Start FPS throughput counter to calculate FPS value of video
fps = FPS().start()

Start reading video stream

# Loop over every frame of the video stream
while True:
	# Grab the next frame
	(grabbed, frame) = vs.read()

	# If no frame was returned, we have reached the end of the video
	if frame is None:
		break

	# Resize the frame for faster processing, then convert it from BGR to RGB ordering for dlib
	frame = imutils.resize(frame, width=600)
	rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

	# If an output path was given, initialize the writer on the first frame
	if args["output"] is not None and writer is None:
		fourcc = cv2.VideoWriter_fourcc(*"MJPG")
		writer = cv2.VideoWriter(args["output"], fourcc, 30,
			(frame.shape[1], frame.shape[0]), True)

Handle the case where there are no input queues

If there are no input queues yet, we know that we need to apply object detection before object tracking.

We run the detector on the frame and loop over the results, reading each detection's confidence and filtering out the weak detections.

If the confidence meets the threshold set by the command-line argument, we consider the detection, but filter further by class label. In this case, we only look for "person" objects.

Assuming we find a "person", we create the queues and spawn a tracking process:

We first compute the bounding box coordinates.

From there we create two new queues, iq and oq, which we append to inputQueues and outputQueues, respectively.

Next we spawn a new start_tracker process, passing it the bounding box, label, RGB image, and the iq/oq pair.

We also draw the bounding rectangle and class label of the detected object.

Otherwise, detection has already been performed, and we instead apply each dlib object tracker to the frame, as covered in the next section. The detection branch is:

	# If there are no input queues, we haven't created the first object tracker yet
	if len(inputQueues) == 0:
		# Get the frame dimensions and convert the frame to a blob (the input format of the Caffe model)
		(h, w) = frame.shape[:2]
		# Normalization: subtract the mean 127.5, then scale by 0.007843 (about 1 / 127.5)
		blob = cv2.dnn.blobFromImage(frame, 0.007843, (w, h), 127.5)

		# Pass the blob through the network to get the detections, a 4-dimensional array:
		# [0, 0, i, 1]:   class label index
		# [0, 0, i, 2]:   confidence
		# [0, 0, i, 3:7]: bounding box coordinates, as fractions of the frame size
		net.setInput(blob)
		detections = net.forward()

		# Loop over the detections
		for i in np.arange(0, detections.shape[2]):
			# Extract the confidence (i.e. the probability) associated with the detection
			confidence = detections[0, 0, i, 2]

			# Filter out weak detections by requiring a minimum confidence
			if confidence > args["confidence"]:
				# Get the class label index of the detection and look up its label
				idx = int(detections[0, 0, i, 1])
				label = CLASSES[idx]

				# We only track people here; skip every other class
				if CLASSES[idx] != "person":
					continue

				# Compute the bounding box coordinates of the object
				box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
				(startX, startY, endX, endY) = box.astype("int")
				bb = (startX, startY, endX, endY)

				# Create input and output queues for the new tracker and register them
				iq = multiprocessing.Queue()
				oq = multiprocessing.Queue()
				inputQueues.append(iq)
				outputQueues.append(oq)

				# Spawn a daemon process for the new object tracker
				p = multiprocessing.Process(
					target=start_tracker,
					args=(bb, label, rgb, iq, oq))
				p.daemon = True
				p.start()

				# Draw the bounding box of the detection (green, 2 pixels thick)
				cv2.rectangle(frame, (startX, startY), (endX, endY),
					(0, 255, 0), 2)
				# Draw the class label above the box
				cv2.putText(frame, label, (startX, startY - 15),
					cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
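
The indexing into `detections` is easier to follow on a small synthetic array. The sketch below assumes the output layout that the code above relies on: shape (1, 1, N, 7), with the class index in column 1, the confidence in column 2, and the box corners in columns 3 to 6 as fractions of the frame size. `decode_detections` and the sample values are made up for illustration, and no network is run.

```python
import numpy as np

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]


def decode_detections(detections, w, h, min_confidence=0.2, wanted="person"):
    """Return pixel-coordinate boxes for confident detections of one class."""
    boxes = []
    for i in range(detections.shape[2]):
        # Column 2 holds the confidence of detection i
        confidence = detections[0, 0, i, 2]
        if confidence <= min_confidence:
            continue

        # Column 1 holds the index into CLASSES
        idx = int(detections[0, 0, i, 1])
        if CLASSES[idx] != wanted:
            continue

        # Columns 3..6 are fractional corners; scale them to pixels
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        boxes.append(tuple(int(v) for v in box))
    return boxes


# Synthetic detections (assumed values, chosen only for the example)
fake = np.array([[[
    [0, 15, 0.90, 0.25, 0.25, 0.50, 0.75],   # confident person: kept
    [0, 15, 0.10, 0.00, 0.00, 0.25, 0.25],   # weak person: below threshold
    [0, 7, 0.95, 0.50, 0.50, 0.75, 0.75],    # confident car: wrong class
]]], dtype="float32")

print(decode_detections(fake, w=600, h=400))  # [(150, 100, 300, 300)]
```

Only the first row survives both filters, and its fractional corners are multiplied by the frame width and height to give pixel coordinates.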

Input queues are not empty

We loop over each input queue and put the RGB frame on it.

We then loop over each output queue to get the bounding box coordinates from each object tracker, and finally draw the bounding box and its class label.

	# Otherwise, detection has already been performed, so we track the objects
	else:
		# Put the RGB frame on every input queue so that each tracker, running in its own process, can update
		for iq in inputQueues:
			iq.put(rgb)

		# Loop over each output queue
		for oq in outputQueues:
			# Get the updated bounding box. get() is a blocking call, so this
			# pauses execution until the tracker process has finished its update
			(label, (startX, startY, endX, endY)) = oq.get()

			# Draw the bounding box and label reported by the tracker
			cv2.rectangle(frame, (startX, startY), (endX, endY),
				(0, 255, 0), 2)
			cv2.putText(frame, label, (startX, startY - 15),
				cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
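
The parent's side of this step (send the frame to every tracker, then wait for every answer) can be isolated in a few lines. In this sketch, threads and `queue.Queue` stand in for processes and `multiprocessing.Queue`, which offer the same `put`/`get` calls; `broadcast_and_collect` and `fake_tracker` are hypothetical names, and the `None` sentinel is an added assumption for a clean shutdown.

```python
import queue
import threading


def broadcast_and_collect(frame, input_queues, output_queues):
    """Send one frame to every tracker, then wait for every tracker's answer."""
    for iq in input_queues:
        iq.put(frame)
    # get() blocks, so this returns only once the slowest tracker has replied
    return [oq.get() for oq in output_queues]


def fake_tracker(label, iq, oq):
    """Hypothetical worker: replies with its label and the frame it received."""
    while True:
        frame = iq.get()
        if frame is None:  # sentinel: stop the worker
            break
        oq.put((label, frame))


# One queue pair and one worker per tracked object
input_queues, output_queues = [], []
for label in ("person-0", "person-1"):
    iq, oq = queue.Queue(), queue.Queue()
    input_queues.append(iq)
    output_queues.append(oq)
    threading.Thread(target=fake_tracker, args=(label, iq, oq),
                     daemon=True).start()

results = broadcast_and_collect("frame-1", input_queues, output_queues)
print(results)  # [('person-0', 'frame-1'), ('person-1', 'frame-1')]

# Ask the workers to exit
for iq in input_queues:
    iq.put(None)
```

Results come back in queue order, one per tracker. Because each `get()` blocks, the time spent on a frame is set by the slowest tracker.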

Finish the loop iteration

If an output path was given, we write the frame to the output video, and then display the frame on screen.

If the "q" key is pressed, we break out of the loop.

Otherwise, the FPS counter is updated and processing continues at the top of the while loop.

Once the loop ends, we stop the counter, print the FPS throughput information, release the video writer and the video stream, and close the windows.

	# Write the frame to disk if video output was requested
	if writer is not None:
		writer.write(frame)

	# Show the processed frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF

	# Press 'q' to exit the loop
	if key == ord("q"):
		break

	# Update the FPS counter
	fps.update()

# Stop the counter and display the FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# Release the video writer
if writer is not None:
	writer.release()

# Clean up: close all windows and release the video stream
cv2.destroyAllWindows()
vs.release()
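
The throughput figures printed at the end come down to frames counted divided by seconds elapsed. The class below is a rough stand-in written for illustration, assuming that is how the FPS helper used in the script arrives at its numbers; `SimpleFPS` is a hypothetical name and is not the imutils implementation.

```python
import time


class SimpleFPS:
    """Hypothetical stand-in for the FPS helper: frames counted / seconds elapsed."""

    def __init__(self):
        self._start = None
        self._end = None
        self._frames = 0

    def start(self):
        self._start = time.perf_counter()
        return self  # allows counter = SimpleFPS().start()

    def update(self):
        # Call once per processed frame
        self._frames += 1

    def stop(self):
        self._end = time.perf_counter()

    def elapsed(self):
        # Seconds between start() and stop()
        return self._end - self._start

    def fps(self):
        return self._frames / self.elapsed()


counter = SimpleFPS().start()
for _ in range(5):
    time.sleep(0.01)  # stands in for processing one frame
    counter.update()
counter.stop()

print("[INFO] elapsed time: {:.2f}".format(counter.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(counter.fps()))
```

Note that `stop()` has to be called before `elapsed()` or `fps()`, since both rely on the recorded end time.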

Complete code, reference website, and downloads

Reference website:

Download address of code model video:

Posted by evilMind on Thu, 11 Jun 2020 21:30:48 -0700