Hands-on Tutorials
Real Time Image Segmentation Using 5 Lines of Code
Perform accurate and fast object segmentation in images and videos with PixelLib

Table of Contents
- Image Segmentation in Computer Vision Applications
- Image Segmentation
- Batch Image Segmentation
- Video Segmentation
- Camera Feeds’ Segmentation
Image Segmentation in Computer Vision Applications
Computer vision is the ability of computers to see and analyze what they see. Image segmentation is an aspect of computer vision that deals with segmenting the contents of an image into different categories for better analysis. A good example of image segmentation is foreground-background separation, the technique of separating objects in an image from their background so that the objects and the background can be analyzed independently. The ability of image segmentation to achieve foreground-background separation makes it an invaluable field for solving many computer vision problems, such as analysis of medical images, background editing, vision in self driving cars and analysis of satellite images.
PixelLib is a library created to allow easy integration of object segmentation in images and videos using a few lines of python code. It supports a lot of amazing features such as:
- Semantic and Instance Segmentation of objects in images and videos.
- Custom training of segmentation models.
- Background editing in images and videos.
- Extraction of objects in images and videos.
Demand for Real Time Image Segmentation Applications
Pytorch Backend PixelLib: One of the greatest challenges in computer vision is balancing accuracy and speed for real time applications. In the field of computer vision there is a long-standing dilemma: a solution is either more accurate but slow, or less accurate but fast. The previous version of PixelLib uses the tensorflow deep learning library as its backend, which employs Mask R-CNN to perform instance segmentation. Mask R-CNN is a great architecture, but it fails to balance accuracy and speed for real time applications. I am so excited😊 to announce that PixelLib now provides support for a pytorch backend to perform faster and more accurate segmentation and extraction of objects in images and videos, using the PointRend segmentation architecture. PointRend, an object segmentation architecture by Alexander Kirillov et al, replaces Mask R-CNN for performing instance segmentation of objects. PointRend is an excellent state of the art neural network for object segmentation. It generates accurate segmentation masks and runs at a high inference speed that matches the increasing demand for accurate, real time computer vision applications.
Multiple OS Supports
PixelLib is a library built to provide support for different operating systems. I integrated PixelLib with the python implementation of PointRend by Detectron2, which supports only Linux OS.
I made modifications to the original Detectron2 PointRend implementation to support Windows OS, and I am happy to share the wonderful news that the PointRend implementation used for PixelLib supports both Linux and Windows OS.
The sample images above show the difference between the segmentation results of PointRend and Mask R-CNN. The PointRend results on the right are clearly better segmentation outputs than the Mask R-CNN results on the left.
Note: This article is based on performing instance segmentation using pytorch and PointRend. If you want to learn how to perform instance segmentation with tensorflow and Mask R-CNN read this article below:
Download Python
PixelLib pytorch version supports python version 3.7 and above. Download a compatible python version.
Install PixelLib and its dependencies
Install Pytorch
PixelLib Pytorch version supports these versions of pytorch: 1.6.0, 1.7.1, 1.8.0 and 1.9.0.
Note: Pytorch 1.7.0 is not supported, and pytorch versions below 1.6.0 will not work. Install a compatible Pytorch version.
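For example, one of the supported versions can be installed with pip (an illustrative command; the exact command depends on your CUDA setup, so check the official Pytorch installation page):
- pip3 install torch==1.8.0 torchvision==0.9.0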
Install Pycocotools
- pip3 install pycocotools
Install PixelLib
- pip3 install pixellib
If PixelLib is already installed, upgrade to the latest version using:
- pip3 install pixellib --upgrade
Image Segmentation
PixelLib uses just five lines of python code to perform object segmentation in images and videos with the PointRend model. Download the PointRend model. This is the code for image segmentation.
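Below is a minimal version of the code, matching the line-by-line walkthrough that follows:
import pixellib
from pixellib.torchbackend.instance import instanceSegmentation
ins = instanceSegmentation()
ins.load_model("pointrend_resnet50.pkl")
ins.segmentImage("image.jpg", show_bboxes=True, output_image_name="output.jpg")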
Line 1–4: The PixelLib package was imported and we also imported the class instanceSegmentation from the module pixellib.torchbackend.instance (importing the instance segmentation class from the pytorch support). We created an instance of the class and finally loaded the PointRend model we downloaded.
Line 5: We called the function segmentImage to perform segmentation of objects in images and added the following parameters to the function:
- image_path: This is the path to the image to be segmented.
- show_bboxes: This is an optional parameter to show the segmented results with bounding boxes.
- output_image_name: This is the name of the saved segmented image.
Sample Image for Segmentation

ins.segmentImage("image.jpg",output_image_name="output.jpg")
Image After Segmentation

The checkpoint state_dict contains keys that are not used by the model: proposal_generator.anchor_generator.cell_anchors.{0, 1, 2, 3, 4}
The log above may appear while you are running the segmentation code. It is not an error and the code will work fine!
Obtain Segmentation Results
results, output = ins.segmentImage("image.jpg", show_bboxes=True, output_image_name="result.jpg")
print(results)
The segmentation results return a dictionary with many values associated with the objects segmented in the image. The results printed will be in the following format:
{'boxes': array([[ 579,  462, 1105,  704],
        [   1,  486,  321,  734],
        [ 321,  371,  423,  742],
        [ 436,  369,  565,  788],
        [ 191,  397,  270,  532],
        [1138,  357, 1197,  482],
        [ 877,  382,  969,  477],
        ...]),
 'class_ids': array([ 2,  2,  0,  0,  0,  0,  0,  2,  0,  0,  0,  0,  2, 24, 24,  2,  2,  2,  0,  0,  0,  0,  0,  0], dtype=int64),
 'class_names': ['car', 'car', 'person', 'person', 'person', 'person', 'person', 'car', 'person', 'person', 'person', 'person', 'car', 'backpack', 'backpack', 'car', 'car', 'car', 'person', 'person', 'person', 'person', 'person', 'person'],
 'object_counts': Counter({'person': 15, 'car': 7, 'backpack': 2}),
 'scores': array([100., 100., 100., 100., 99., 99., 98., 98., 97., 96., 95., 95., 95., 95., 94., 94., 93., 91., 90., 88., 82., 72., 69., 66.], dtype=float32),
 'masks': array([[[False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         ...]]),
 'extracted_objects': []}
Values from The Results Printed Above Include:
{'boxes': array([[5.790e+02, 4.620e+02, 1.105e+03, 7.050e+02],
        [1.000e+00, 4.870e+02, 3.220e+02, 7.340e+02],
        [1.142e+03, 3.640e+02, 1.161e+03, 4.060e+02]], dtype=float32),
boxes: These are the bounding box coordinates of the objects segmented. I did not display all the boxes' coordinates because the list would have been too long.
'class_ids': array([ 2,  2,  0,  0,  0,  0,  0,  2,  0,  0,  0,  0,  2, 24, 24,  2,  2,  2,  0,  0,  0,  0,  0,  0], dtype=int64),
class_ids: These are the class ids of the objects segmented.
'class_names': ['car', 'car', 'person', 'person', 'person', 'person', 'person', 'car', 'person', 'person', 'person', 'person', 'car', 'backpack', 'backpack', 'car', 'car', 'car', 'person', 'person', 'person', 'person', 'person', 'person'],
class_names: These are the class names of the objects segmented.
'object_counts': Counter({'person': 15, 'car': 7, 'backpack': 2}),
object_counts: These are the counts of each of the classes segmented in the image. I used the python inbuilt Counter to count the objects. In this case, there are 15 persons, 7 cars and 2 backpacks segmented in the image.
'scores': array([100., 100., 100., 100., 99., 99., 98., 98., 97., 96., 95., 95., 95., 95., 94., 94., 93., 91., 90., 88., 82., 72., 69., 66.], dtype=float32),
scores: These are the confidence scores for each of the objects segmented.
'masks': array([[[False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
masks: These are the mask values of each of the objects segmented. I did not display all the mask values because the list would have been too long.
Note: The default mask values returned are boolean. It is possible to obtain the masks' coordinates instead by using a new parameter, mask_points_values.
ins.segmentImage("sample.jpg", show_bboxes=True, mask_points_values=True, output_image_name="output.jpg")
The mask_points_values parameter was added to the segmentImage function and set to True, and the new mask values will be:
[[array([[295, 497]])
array([[422, 114],
[421, 115],
[417, 115],
...,
[436, 115],
[433, 115],
[432, 114]])]]
extracted_objects: This is the list containing the values of the extracted objects, if objects are extracted. It is empty here because we did not extract anything. We will discuss later in this article how to extract these segmented objects.
Access Values Rendered by The Segmentation Results
results, output = ins.segmentImage("image.jpg", show_bboxes=True, output_image_name="result.jpg")
Bounding box coordinate values
results["boxes"]
Class ids values
results["class_ids"]
Class names values
results["class_names"]
Object counts values
results["object_counts"]
Mask values
results["masks"]
Detection Threshold
PixelLib makes it possible to set the detection threshold for object segmentation.
ins.load_model("pointrend_resnet50.pkl", confidence = 0.3)
confidence: This is a new parameter introduced in the load_model function, set here to 0.3 to threshold the detections at 30%. The default detection threshold is 0.5 and it can be increased or decreased using the confidence parameter.
Speed Records
PixelLib makes it possible to perform real time object segmentation and adds the ability to adjust the inference speed to suit real time predictions. The default inference speed for processing a single image using an Nvidia GPU with 4GB of memory is about 0.26 seconds.
Speed Adjustments:
PixelLib supports speed adjustments with two speed adjustment modes, fast and rapid:
1 Fast Mode
ins.load_model("pointrend_resnet50.pkl", detection_speed = "fast")
In the load_model function, we added the parameter detection_speed and set the value to fast. The fast mode achieves 0.20 seconds for processing a single image.
Full Code for Fast Mode Detection
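A minimal sketch of the full script, based on the snippets above (assuming the same image and model files):
import pixellib
from pixellib.torchbackend.instance import instanceSegmentation
# create the instance segmentation class and load the model in fast mode
ins = instanceSegmentation()
ins.load_model("pointrend_resnet50.pkl", detection_speed = "fast")
# segment the image and save the output with bounding boxes
ins.segmentImage("image.jpg", show_bboxes=True, output_image_name="output.jpg")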
2 Rapid Mode
ins.load_model("pointrend_resnet50.pkl", detection_speed = "fast")
In the load_model function, we added the parameter detection_speed and set the value to rapid. The rapid mode achieves 0.15 seconds for processing a single image.
Full Code for Rapid Mode Detection
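A minimal sketch of the full script, based on the snippets above (assuming the same image and model files):
import pixellib
from pixellib.torchbackend.instance import instanceSegmentation
# create the instance segmentation class and load the model in rapid mode
ins = instanceSegmentation()
ins.load_model("pointrend_resnet50.pkl", detection_speed = "rapid")
# segment the image and save the output with bounding boxes
ins.segmentImage("image.jpg", show_bboxes=True, output_image_name="output.jpg")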
PointRend Models
There are two PointRend models used for object segmentation: a resnet50 variant and a resnet101 variant. The resnet50 variant is used throughout this article because it is faster and still has good accuracy. The resnet101 variant is more accurate, but it is slower than the resnet50 variant. According to the official reports of the models on Detectron2, the resnet50 variant achieves 38.3 mAP on COCO and the resnet101 variant achieves 40.1 mAP on COCO.
Speed Records for Resnet101: The default speed for segmentation is 0.5 seconds, the fast mode is 0.3 seconds, while the rapid mode is 0.25 seconds.
Code for Resnet101 variant
ins.load_model("pointrend_resnet101.pkl", network_backbone = "resnet101")
The code for performing inference with the resnet101 model is the same, except that we loaded the PointRend resnet101 model in the load_model function and added an extra parameter network_backbone set to resnet101. Download the resnet101 model from here.
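A minimal sketch of the full script for the resnet101 variant (assuming the downloaded model file is named pointrend_resnet101.pkl):
import pixellib
from pixellib.torchbackend.instance import instanceSegmentation
ins = instanceSegmentation()
# load the resnet101 PointRend model with the matching network backbone
ins.load_model("pointrend_resnet101.pkl", network_backbone = "resnet101")
ins.segmentImage("image.jpg", show_bboxes=True, output_image_name="output.jpg")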
Note: If you want to achieve high inference speed with good accuracy, use the PointRend resnet50 variant, but if you are more concerned with accuracy, use the PointRend resnet101 variant. All these inference reports are based on using an Nvidia GPU with 4GB of memory.
Custom Object Detection in Image Segmentation
The PointRend model used is a pretrained COCO model, which supports 80 classes of objects. PixelLib supports custom object detection, which makes it possible to filter detections and ensure the segmentation of target objects. We can choose any of the 80 supported classes to match our target goal. These are the 80 supported classes:
person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic_light, fire_hydrant, stop_sign, parking_meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports_ball, kite, baseball_bat, baseball_glove, skateboard, surfboard, tennis_racket, bottle, wine_glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot_dog, pizza, donut, cake, chair, couch, potted_plant, bed, dining_table, toilet, tv, laptop, mouse, remote, keyboard, cell_phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy_bear, hair_dryer, toothbrush.
We want to filter the detections of our sample image to detect only the people in the image.
target_classes = ins.select_target_classes(person = True)
ins.segmentImage("image.jpg", segment_target_classes = target_classes, output_image_name="output.jpg")
The function select_target_classes was called to select the target objects to be segmented. The segmentImage function got a new parameter segment_target_classes, which filters the detections based on the chosen target classes.

Great!!! PixelLib detects only the people present in the image.
Full Code for Custom Object Detection
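A minimal sketch of the full script (assuming the same sample image as above):
import pixellib
from pixellib.torchbackend.instance import instanceSegmentation
ins = instanceSegmentation()
ins.load_model("pointrend_resnet50.pkl")
# restrict the detections to the person class only
target_classes = ins.select_target_classes(person = True)
ins.segmentImage("image.jpg", show_bboxes=True, segment_target_classes = target_classes, output_image_name="output.jpg")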
Object Extractions in Images
PixelLib makes it possible to extract and analyse objects segmented in an image.
Code for Object Extraction
The code for image segmentation is the same, except that we added extra parameters to the segmentImage function to perform object extraction.
ins.segmentImage("image.jpg", extract_segmented_objects = True, save_extracted_objects = True, output_image_name="output.jpg")
- extract_segmented_objects: This is the parameter that handles the extraction of segmented objects. Access the extracted object values using:
results, output = ins.segmentImage("image.jpg", show_bboxes=True, extract_segmented_objects=True, save_extracted_objects=True, output_image_name="result.jpg")
# access the extracted objects from the results
results["extracted_objects"]
- save_extracted_objects: This is the parameter that saves each extracted object as an image. Each segmented object will be saved as segmented_object_index, e.g. segmented_object_1. The objects are saved in the order in which they are extracted.
segmented_object_1.jpg
segmented_object_2.jpg
segmented_object_3.jpg
segmented_object_4.jpg
segmented_object_5.jpg
segmented_object_6.jpg


Note: All the objects in the image are extracted and I chose to display only two of them.
Extraction from Bounding Box Coordinates
The default extraction method extracts objects from the masks' coordinates. This extraction gives us information only about the object itself and excludes its close surroundings. Consider a problem where we want to analyse both the object and the region in which it is located in the image. PixelLib solves this problem by extracting segmented objects together with their surrounding regions in the image, using their bounding boxes' coordinates.
Modified Code for Extraction
ins.segmentImage("image.jpg", extract_segmented_objects = True, extract_from_box = True, save_extracted_objects = True, output_image_name="output.jpg")
We introduced a new parameter extract_from_box to extract the segmented objects from their bounding box coordinates. Each extracted object will be saved as object_extract_index, e.g. object_extract_1. The objects are saved in the order in which they are extracted.
object_extract1.jpg
object_extract2.jpg
object_extract3.jpg
object_extract4.jpg
object_extract5.jpg
object_extract6.jpg


Full Code for Object Extraction using Bounding Box Coordinates
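A minimal sketch of the full script (assuming the same sample image as above):
import pixellib
from pixellib.torchbackend.instance import instanceSegmentation
ins = instanceSegmentation()
ins.load_model("pointrend_resnet50.pkl")
# extract each object from its bounding box region and save it as an image
ins.segmentImage("image.jpg", show_bboxes=True, extract_segmented_objects=True, extract_from_box=True, save_extracted_objects=True, output_image_name="output.jpg")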
Image Segmentation Output Visualization
PixelLib makes it possible to regulate the visualization of images according to their resolutions.
ins.segmentImage("sample.jpg", show_bboxes=True, output_image_name= "output.jpg")

The visualization is barely visible because the text size and box thickness are too small. We can regulate the text size, text thickness and box thickness to improve the visualization.
Modifications for Better Visualization
ins.segmentImage("sample.jpg", show_bboxes=True, text_size=5, text_thickness=4, box_thickness=10, output_image_name="output.jpg")
The segmentImage function accepted new parameters that regulate the size of the texts and the thickness of the texts and bounding boxes.
- text_size: The default text size is 0.6, which is fine for images with moderate resolutions but too small for images with high resolutions. I increased the value to 5.
- text_thickness: The default text thickness is 1. I increased it to 4 to match the image resolution.
- box_thickness: The default box thickness is 2 and I changed it to 10 to match the image resolution.
Output Image with A Better Visualization

Note: Regulate the parameters according to the resolutions of your images. The values I used for this sample image, whose resolution is 5760 x 3840, might be too large if your image resolution is lower. You can increase the values of the parameters beyond the ones I set in this sample code if your images have very high resolutions. The text_thickness and box_thickness values must be integers, not floating point numbers. The text_size value can be either an integer or a floating point number.
Batch Image Segmentation
PixelLib makes it possible to perform prediction on a batch of images located in the same folder.
Code for Batch Segmentation
ins.segmentBatch("inputfolder", show_bboxes=True, output_folder_name = "outputfolder")
The code for batch segmentation is very similar to single image segmentation, except that we replaced the segmentImage function with the segmentBatch function. We added the following parameters to segmentBatch:
- folder_path: This is the folder containing the images we want to segment.
- output_folder_name: This is the name of the folder in which all the segmented images will be saved.
sample folder structure
--input_folder
    --test1.jpg
    --test2.jpg
    --test3.jpg
--output_folder
    --test1.jpg
    --test2.jpg
    --test3.jpg
Code for Extraction of Objects in Batch Image Segmentation
ins.segmentBatch("inputfolder", show_bboxes=True, extract_segmented_objects=True, save_extracted_objects=True, output_folder_name="outputfolder")
We added the extract_segmented_objects and save_extracted_objects parameters to the segmentBatch function to extract and save the extracted objects respectively. The extracted objects of each image in the input folder will be saved in a separate folder whose name will be imagename_extracts, e.g. if the image name is test1.jpg, the extracted objects will be saved in a folder named test1_extracts.
Note: The folders for the extracted objects are created within the same input folder as the images. Sample folder structure:
--input_folder
    --test1.jpg
    --test1_extracts
    --test2.jpg
    --test2_extracts
    --test3.jpg
    --test3_extracts
--output_folder
    --test1.jpg
    --test2.jpg
    --test3.jpg
Code for Extraction of Objects from Bounding Box Coordinates
ins.segmentBatch("inputfolder", show_bboxes=True, extract_segmented_objects=True, extract_from_box=True,save_extracted_objects=True, output_folder_name="outputfolder")
We added the extract_from_box and save_extracted_objects parameters to the segmentBatch function to extract objects from their bounding box coordinates and save them respectively.
Note: The folders for the objects extracted from bounding boxes' coordinates are also created within the same input folder as the images. Sample folder structure:
--input_folder
    --test1.jpg
    --test1_extracts
    --test2.jpg
    --test2_extracts
    --test3.jpg
    --test3_extracts
--output_folder
    --test1.jpg
    --test2.jpg
    --test3.jpg
Code for Custom Object Segmentation in Batch Image Segmentation
target_classes = ins.select_target_classes(person = True)
ins.segmentBatch("inputfolder", show_bboxes=True, segment_target_classes = target_classes, output_folder_name="outputfolder")
We called the function select_target_classes to select the target objects to be segmented. The segmentBatch function got a new parameter segment_target_classes, which filters the detections based on the chosen target classes.
Code for fast mode detection in Batch Image Segmentation
ins.load_model("pointrend_resnet50.pkl", detection_speed = "fast")
In the load_model function, we added the parameter detection_speed and set the value to fast. The fast mode achieves 0.20 seconds for processing a single image.
Code for rapid mode detection in Batch Image Segmentation
ins.load_model("pointrend_resnet50.pkl", detection_speed = "rapid")
In the load_model function, we added the parameter detection_speed and set the value to rapid. The rapid mode achieves 0.15 seconds for processing a single image.
Note: The code implementation of batch image segmentation for all supported PixelLib features is the same as single image segmentation, except that the segmentImage function is replaced by segmentBatch.
Object Segmentation in Videos and Camera Feeds
PixelLib makes it possible to perform real time object segmentation in live camera feeds and video files.
Video Segmentation
Code for Video Segmentation
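A minimal version of the code, reconstructed to match the line-by-line walkthrough below:
import pixellib
from pixellib.torchbackend.instance import instanceSegmentation
ins = instanceSegmentation()
ins.load_model("pointrend_resnet50.pkl")
ins.process_video("sample_video.mp4", show_bboxes=True, frames_per_second=3, output_video_name="output_video.mp4")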
Line 1–4: The PixelLib package was imported and we also imported the class instanceSegmentation from the module pixellib.torchbackend.instance (importing the instance segmentation class from the pytorch support). We created an instance of the class and finally loaded the PointRend model. If the model is yet to be downloaded, download it from here.
Line 5: We called the function process_video to perform segmentation of objects in videos, and we added the following parameters to the function:
- video_path: This is the path to the video to be segmented.
- show_bboxes: This is an optional parameter to display the segmented objects in the results with bounding boxes.
- frames_per_second: This is the parameter that will set the number of frames per second for the saved video.
- output_video_name: This is the name of the output segmented video.
ins.process_video("sample_video.mp4", show_bboxes=True, frames_per_second=3, output_video_name="output_video.mp4")
Code for Object Extraction in Videos
ins.process_video("sample_video.mp4", show_bboxes=True, extract_segmented_objects=True, save_extracted_objects=True, frames_per_second=5, output_video_name="output_video.mp4")
The process_video function has new parameters extract_segmented_objects and save_extracted_objects to extract and save segmented objects respectively.



Extraction from Bounding Box Coordinates in Videos
Modified Code for Extraction
ins.process_video("sample.mp4", show_bboxes=True, extract_segmented_objects=True, extract_from_box=True, save_extracted_objects=True, frames_per_second=5, output_video_name="output_video.mp4")
- extract_from_box was added to the function to extract the objects segmented from their bounding box coordinates.



Custom Object Segmentation in Videos
PixelLib makes it possible to perform custom object segmentation in videos to filter out unwanted detections and segment target classes.
Code for Custom Detection in Videos
target_classes = ins.select_target_classes(person = True, bicycle = True)
ins.process_video("sample_video.mp4", show_bboxes=True, segment_target_classes = target_classes, frames_per_second=5, output_video_name="output_video.mp4")
The function select_target_classes was called to select the target objects to be segmented. The process_video function got a new parameter segment_target_classes, which filters the detections based on the chosen target classes.
Code for fast Mode Detection in Video Segmentation
ins.load_model("pointrend_resnet50.pkl", detection_speed = "fast")
We set the detection speed for video processing to fast mode.
Code for rapid Mode Detection in Video Segmentation
ins.load_model("pointrend_resnet50.pkl", detection_speed = "rapid")
We set the detection speed for video processing to rapid mode.
Segmentation of Objects in Live Camera Feeds
PixelLib provides excellent support for real time segmentation of live camera feeds.
Code for Segmentation of Live Camera Feeds
import cv2
capture = cv2.VideoCapture(0)
We imported cv2 and included the code to capture the camera's frames.
segment_video.process_camera(capture, show_bboxes=True, frames_per_second=5, check_fps=True, show_frames=True, frame_name="frame", output_video_name="output_video.mp4")
In the code for performing segmentation, we replaced the video's file path with capture, i.e. we are processing a stream of frames captured by the camera. We added extra parameters for the purpose of showing the camera's frames (a complete sketch follows the parameter list below):
- show_frames: This is the parameter that handles the showing of segmented camera’s frames.
- frame_name: This is the name given to the camera’s frames shown.
- check_fps: This is the parameter that will print out the frames per second at the end of the camera feeds processing.
- show_bboxes: This is an optional parameter that shows segmented objects with bounding boxes.
- frames_per_second: This is the parameter that sets the number of frames per second for the saved video file. In this case it is set to 5, i.e. the saved video file will have 5 frames per second.
- output_video_name: This is the name of the saved segmented video.
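Putting the pieces together, a minimal end-to-end sketch (segment_video is the instanceSegmentation instance, and the PointRend model is assumed to be downloaded):
import cv2
import pixellib
from pixellib.torchbackend.instance import instanceSegmentation
# capture frames from the default camera
capture = cv2.VideoCapture(0)
# create the instance segmentation class and load the PointRend model
segment_video = instanceSegmentation()
segment_video.load_model("pointrend_resnet50.pkl")
# process the camera feed, show the frames and save the segmented video
segment_video.process_camera(capture, show_bboxes=True, frames_per_second=5, check_fps=True, show_frames=True, frame_name="frame", output_video_name="output_video.mp4")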
Speed Adjustments for Live Camera Feeds Processing
The default speed mode reaches 4fps, the fast speed mode reaches 6fps and the rapid speed mode reaches 9fps. These reports are based on using an Nvidia GPU with 4GB of memory.
Code for Fast Mode Detection in Camera Feeds
ins.load_model("pointrend_resnet50.pkl", detection_speed = "fast")
We set the detection speed for processing live camera feeds to fast mode and the inference speed will be 6fps.
Code for Rapid Mode Detection
ins.load_model("pointrend_resnet50.pkl", detection_speed = "rapid")
We set the detection speed for processing live camera feeds to rapid mode and inference speed will be 9fps.
Code for Custom Object Segmentation in Live Camera Feeds
target_classes = segment_video.select_target_classes(person=True)
segment_video.process_camera(capture, show_bboxes=True, frames_per_second=5, segment_target_classes=target_classes, show_frames=True, frame_name="frame", output_video_name="output_video.mp4")
The function select_target_classes was called to select the target objects to be segmented. The function process_camera got a new parameter segment_target_classes to choose from the target classes and filter the detections based on them.
Code for Object Extraction in Live Camera Feeds
segment_video.process_camera(capture, show_bboxes=True, frames_per_second=5, extract_segmented_objects=True, save_extracted_objects=True, show_frames=True, frame_name="frame", output_video_name="output_video.mp4")
The process_camera function has new parameters extract_segmented_objects and save_extracted_objects to extract and save segmented objects respectively.
Code for Object Extraction from Box Coordinates in Live Camera Feeds
segment_video.process_camera(capture, show_bboxes=True, frames_per_second=5, extract_segmented_objects=True, extract_from_box=True, save_extracted_objects=True, show_frames=True, frame_name="frame", output_video_name="output_video.mp4")
- extract_from_box was added to the function to extract the objects segmented from their bounding box coordinates.
In this article, we discussed in detail how to perform accurate and fast image segmentation and extraction of objects in images, videos and live camera feeds using the PixelLib pytorch version. We also described the upgrade added to PixelLib with the PointRend network architecture, which makes it possible for the library to match the increasing demand to balance accuracy and speed performance in computer vision.
Note: The PixelLib pytorch version does not yet support custom training; custom training with PointRend will be released soon.
Reach me via:
Email: olafenwaayoola@gmail.com
Linkedin: Ayoola Olafenwa
Twitter: @AyoolaOlafenwa
Check out these articles written on how to make use of PixelLib for semantic segmentation, instance segmentation, extraction of objects and background editing in images and videos.