Current and Future Trends in Computer Vision and Their Impact with Expert Isht Dwivedi

Isht Dwivedi

Computer Vision is the branch of Computer Science that enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs. Understanding this information gives computers the ability to perform human-like actions or to make task-related recommendations.

Recent advances in computer hardware and deep learning algorithms have led to breakthroughs in computer vision. According to Allied Market Research, the global computer vision market was valued at $9.45 billion in 2020 and is projected to reach $41.11 billion by 2030. Newer computer vision algorithms made possible by these advances are faster and more reliable, paving the way for large-scale adoption across industries such as automotive, healthcare, logistics, and agriculture.

In healthcare, medical image analysis has revolutionized care through earlier, more affordable, and more reliable detection of conditions such as cancer and lung infections. Computer vision systems can also guide surgeons during complex procedures, such as minimally invasive surgeries, by providing real-time visual information and assistance with navigating instruments.

In agriculture, farmers are able to get ahead of crop infestations, monitor crop growth, and find the most fertile land for planting. The technology is also used to analyze images of crops and soil to identify areas that may need additional fertilization or irrigation, and to guide robotic harvesters so that they can accurately identify and pick ripe crops without damaging the plants.

In the automotive sector, computer vision is being used to develop driver-assistance and autonomous driving systems that reduce road accidents and make vehicles more convenient to use. Isht Dwivedi, a research engineer at Honda and an expert in Computer Science, specializes in this area. He is creating state-of-the-art modules for visual understanding of road scenes, such as identifying potentially risky objects on the road and gaining a 3D understanding of areas around construction zones. Using the outputs of these modules, Isht is developing algorithms that generate human-interpretable descriptions of the scene, which can be communicated to the driver through visual and audible human-machine interfaces. Such technologies can assist drivers with decision-making and navigation, which ultimately improves safety. He is also the inventor of a recently patented state-of-the-art BEV (Bird's Eye View) segmentation algorithm that can reduce reliance on the more prevalent but expensive High Definition maps. High Definition maps are detailed maps of a road scene that are manually annotated, which makes them costly in both time and money. Unlike previous work, Isht's BEV segmentation algorithm can also work around construction zones. Because construction zones do not appear on maps, it is very important for assistive and autonomous driving systems to understand and react to them properly.

In another automotive project, Isht designed a patented state-of-the-art algorithm that predicts the future trajectories of road agents. This algorithm is significantly more compute-efficient than prior work. Compute efficiency matters in the automotive sector because these algorithms must run on the limited hardware resources available in a car.

Beyond driving systems, Isht is also working on improving automotive manufacturing processes. More specifically, he proposed a novel state-of-the-art method for temporal action segmentation of human action videos. Given a video of a person performing a series of actions, the method splits the video into shorter clips, each corresponding to one action from a pre-defined list. Human action understanding is important for cooperative human-robot interaction: the robots that will operate in the factories of the future need to interact with workers to carry out various tasks. It is also important to have analytics on factory workers' posture and ergonomics to reduce the risk of injury and fatigue. A paper based on this work was published at the 2022 IEEE Conference on Computer Vision and Pattern Recognition, and a patent has been filed.
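To make the idea of splitting a video into clips concrete, here is a minimal sketch of the final step only. It assumes some model has already assigned one action label to every frame, and it simply groups consecutive frames with the same label into clips. The function name and the action labels are hypothetical and chosen for illustration; this is not Isht's published method, which concerns how the per-frame labels are predicted in the first place.

```python
from itertools import groupby
from typing import List, Tuple


def labels_to_segments(frame_labels: List[str]) -> List[Tuple[str, int, int]]:
    """Group consecutive identical per-frame labels into clips.

    Returns a list of (action, start_frame, end_frame) tuples,
    where both frame indices are inclusive.
    """
    segments = []
    start = 0
    for action, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of frames in this run
        segments.append((action, start, start + length - 1))
        start += length
    return segments


# Hypothetical per-frame predictions for a six-frame video.
frame_labels = ["reach", "reach", "pick", "pick", "pick", "place"]
for action, first, last in labels_to_segments(frame_labels):
    print(f"{action}: frames {first} to {last}")
```

In practice the per-frame labels would come from a trained segmentation model, and the frame indices would be converted to timestamps using the video's frame rate.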

An early computer vision expert, Isht Dwivedi has been fascinated by task automation and the use of technology to solve problems from a young age. During his undergraduate studies, he chose courses related to artificial intelligence, and through them he learned about computer vision. Since 2015, Isht has dedicated himself to computer vision research aimed at a wide array of problems. He went on to earn a master's degree in Computer Science at Columbia University in New York, which deepened his technical expertise and equipped him to solve bigger, more complex problems using computer vision.

Computer vision is a relatively new technology, but it is already revolutionizing many tasks that people perform on a regular basis. Early on, it became clear to Isht that computers would have an ever greater impact on human endeavors, and he now plays an integral part in that impact. Exactly how computer vision is employed varies by industry, but technologies like this one are the foundations of the future.

This article was first published on January 11, 2023