Nature toiled in the vast laboratories of the Earth for millions of years to perfect human vision. It is no surprise, then, that the best minds around the world have only recently managed to teach computers to see and to begin to understand the world around them [0]. While Nature worked with neurons, rods, cones and the machinery of evolution, computer scientists and engineers work with mathematics, matrices and computational power to give computers the ability to see and understand the visual world.

It is surprising how much we take our visual capabilities for granted. While computers can perform mind-numbing feats of mathematical gymnastics with ease, they struggle to match the performance of three-year-olds when it comes to identifying cats. This is beautifully summed up in Moravec’s paradox: the tasks humans find effortless, such as perception, are often the hardest for machines, while the tasks humans find hard, such as arithmetic, come easily to them. This XKCD comic is based on the paradox:

Title text: In the 60s, Marvin Minsky assigned a couple of undergrads to spend the summer programming a computer to use a camera to identify objects in a scene. He figured they'd have the problem solved by the end of the summer. Half a century later, we're still working on it.

Computer vision (CV) deals with methods that are used to analyze, understand and extract useful information from visual data (images and videos). It shares borders with the important fields of pattern recognition and image processing. While pattern recognition algorithms find regularities or patterns in data, image processing is concerned with operations that transform an image from one form to another. Typical operations on images include sharpening, noise removal and altering the image orientation.

CV applications have made inroads into a number of problems that have a substantial impact on our lives. Optical character recognition, or OCR, is one such problem with immense practical utility. OCR is the conversion of the text we see around us, on signboards or in printed or handwritten form, into its digital counterpart. Solving this problem enabled Google to identify and transcribe house numbers. The same technology can now be used to identify signboards, which can then be translated into different languages. Face recognition is another CV application, used to suggest tags in Facebook photos.

There is more to computer vision than recognizing faces and digitizing printed text. Computers may find it a tad more interesting, and indeed a great honour, to be employed by the enforcers of law to look out for criminals. This is what CV engineers have managed to do by automating video surveillance. One of the objectives in surveillance is to monitor an area of interest, say a crowded market, to detect suspicious activity. Clearly, employing humans to analyze video footage for the rest of their lives is not only preposterous, but also hopelessly boring. How do computer vision experts make a computer do this?

In order to look for suspicious activity, we must first be able to keep track of the objects (people, cars) in the footage. We then need to understand their behaviour: X has been hanging around for ages; Y keeps circling the market, and that seems suspicious! Finally, a decision has to be made as to whether the behaviour is a cause for concern. This is summarized succinctly in the following flow chart:
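As a rough illustration of the three stages (track, interpret, decide), here is a minimal sketch in Python. It assumes that a tracker has already assigned an identity to each object and logged when it was seen. The `Observation` record, the dwell-time measure of "hanging around" and the 600-second limit are all hypothetical choices made for this example, not a description of any particular surveillance system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Observation:
    object_id: str  # identity assigned by a (hypothetical) tracker
    time_s: float   # timestamp of the sighting, in seconds


def dwell_times(observations: List[Observation]) -> Dict[str, float]:
    """Stage 2: summarize behaviour as the time between first and last sighting."""
    first: Dict[str, float] = {}
    last: Dict[str, float] = {}
    for obs in observations:
        first[obs.object_id] = min(first.get(obs.object_id, obs.time_s), obs.time_s)
        last[obs.object_id] = max(last.get(obs.object_id, obs.time_s), obs.time_s)
    return {oid: last[oid] - first[oid] for oid in first}


def flag_suspicious(dwell: Dict[str, float], max_dwell_s: float = 600.0) -> List[str]:
    """Stage 3: decide which objects have lingered longer than the assumed limit."""
    return sorted(oid for oid, seconds in dwell.items() if seconds > max_dwell_s)


# Stage 1 (tracking) is assumed to have produced these observations already.
observations = [
    Observation("X", 0.0),
    Observation("Y", 10.0),
    Observation("Y", 95.0),
    Observation("X", 900.0),
]
suspects = flag_suspicious(dwell_times(observations))
```

Here X is seen over a span of 900 seconds and is flagged, while Y, seen over 85 seconds, is not. A real system would need a far richer notion of behaviour than a single dwell time; spotting someone circling the market, for instance, would need positions as well as timestamps.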

A crowded place like a mall is a whirlwind of activity and motion. If your eyes were to track a single person moving through the mall, they would direct all their energies on that person, blacking out everything that happens in the background. We often find ourselves staring at a fixed point in space, forgetting everything else in our field of vision. This is precisely what computers have been taught to do. When there are people moving in the foreground of a video, the background, which remains more or less static over a period of time, is subtracted, so that the moving objects in the foreground stand out distinctly. The following figure explains the concept involved:

The moving boat is singled out while the relatively motionless waters in the background are removed. Our team at HyperVerge implemented this solution, and here is a small clip showing the performance of our algorithm on a sample video. Note the red box around each person as they move through the video.

In the CV literature, this approach to object detection goes by a quite intuitive name: background subtraction.
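To make the idea concrete, here is a minimal sketch of background subtraction in plain Python. It is an illustration under stated assumptions, not the algorithm used in the clip above: frames are small grayscale grids stored as lists of lists, the background is modelled as a running average, and the `threshold` and `alpha` values are arbitrary. A production system would normally rely on an optimized image library and a more robust background model.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # grayscale image: rows of pixel intensities (0-255)
Mask = List[List[bool]]    # True where a pixel belongs to the foreground


def foreground_mask(background: Frame, frame: Frame, threshold: float = 30.0) -> Mask:
    """Mark pixels that differ from the background by more than the threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) enclosing all foreground pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Blend the new frame into the background model (running average)."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


# Toy example: a flat grey 5x5 scene with a bright 2x2 "object" entering it.
background = [[100.0] * 5 for _ in range(5)]
frame = [row[:] for row in background]
for r in (1, 2):
    for c in (2, 3):
        frame[r][c] = 200.0

mask = foreground_mask(background, frame)          # subtract the static background
box = bounding_box(mask)                           # box around the moving object
background = update_background(background, frame)  # let the model adapt slowly
```

The box returned by `bounding_box` plays the role of the red rectangle drawn around each person. The slow running-average update is one simple way to let the background absorb gradual changes such as shifting light, while a fast-moving object barely affects it. Note that this sketch draws a single box around all foreground pixels; separating several people would require grouping the pixels into connected regions first.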

Computer-vision-based technologies are revolutionizing the world with their novel solutions. Our team at HyperVerge has also drawn deeply from the field's inexhaustible wells to devise beautiful solutions to problems.

In our next post about Computer Vision, we will dive into the deep end of the technology and illustrate a number of applications that have steadily gained popularity. Until then, let the Eye of the Computer watch over you!

Karthik Thiagarajan

Karthik or Tika, as he is more commonly known, is the silent man at HyperVerge. He enjoys reading classic literature, blogging, and taking customary morning and evening walks to think deeply. Generally a man of few words who enjoys peace and solitude, he is known to break into the occasional tune when inspired.