Book Review: Modern Computer Vision

Explore deep learning concepts and implement over 50 real-world image applications (A Bite-sized review)

Dec 23, 2022

Book Title:

Modern Computer Vision with PyTorch: Explore deep learning concepts and implement over 50 real-world image applications

After I learned the basics of machine learning a few years ago and put them to good use in detecting and staging kidney cancer and in an explainable AI application, my interest swerved a bit toward computer vision.

About two years ago, I decided to work on a small side project, object detection in soccer game scenes, just to get a feel for the field. I picked up a few things along the way, but I quickly realized there was a lot more to learn. So this year I picked up Modern Computer Vision with PyTorch, which became one of my favorite machine learning books of the year (another is GPT-3, which I reviewed recently).

At over 800 pages, it's clearly not a beach read - more of a coffee table book (if we are allowed to swap the fancy photographs for PyTorch code).

The authors covered a large swath of topics, including, but not limited to:

Image classification: categorizing a given image into one of a number of predefined classes. It is a type of supervised learning where a system is provided with a set of labeled images, and the task is to predict the class of new images.

Techniques: convolutional neural networks (CNNs) and transfer learning with the VGG and ResNet architectures.

Image segmentation: ‘dividing’ an image into different regions or segments. It is used to identify objects and boundaries in images.

Techniques: Mask R-CNN and U-Net.

Object detection: finding and identifying objects in an image or video. It can be used to detect objects such as people, cars, buildings, furniture, animals, and more (anything you want to detect, really). It is a major component of many computer vision tasks, such as video surveillance and autonomous driving.

Techniques: R-CNN and YOLO.

Image generation: creating new images from existing ones. This process typically involves applying a set of algorithms and techniques to create synthetic images that resemble the original ones.

Techniques: Autoencoders and generative adversarial networks (GANs).
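To give a flavor of what the image classification material looks like in practice, here is a minimal sketch of a small CNN classifier in PyTorch. It is my own illustration, not code from the book: the `TinyCNN` name, the layer sizes, the 32x32 input resolution, and the ten classes are all assumptions, and the "images" are just random tensors standing in for a real labeled dataset.

```python
import torch
from torch import nn


class TinyCNN(nn.Module):
    """Hypothetical classifier: two conv blocks followed by a linear head."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        # 32 channels * 8 * 8 spatial positions feed the class scores.
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, start_dim=1))


torch.manual_seed(0)
model = TinyCNN(num_classes=10)

# Stand-in batch: 4 random RGB "images" with made-up labels (assumption).
images = torch.randn(4, 3, 32, 32)
labels = torch.tensor([0, 1, 2, 3])

logits = model(images)                               # one score per class
loss = nn.functional.cross_entropy(logits, labels)   # supervised training signal
predicted = logits.argmax(dim=1)                     # predicted class per image
```

In a real project, the random tensors would be replaced by a labeled dataset and the loss would drive an optimizer loop. For transfer learning, the hand-built `features` stack would typically be swapped for a pretrained backbone such as VGG or ResNet.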

On my first pass through the book, I focused on the basics of neural networks, building neural networks with PyTorch, convolutional networks, image segmentation (Mask R-CNN and U-Net), and autoencoders. And I have kept it within arm's reach for any future computer vision projects.

In summary, the book is a great way to learn PyTorch (for deep learning and computer vision). Afterward, you can go as deep (no pun intended) as you want into the more than fifty real-world applications of computer vision built on state-of-the-art deep learning architectures.

episteme Engine
