Giving robots, which currently rely on vision and touch to move around, the power to hear sounds and predict the physical properties of objects around them could be a game-changer, say researchers including two of Indian origin. People rarely use just one sense to understand the world, but robots usually rely only on vision and, increasingly, touch.
The researchers from Carnegie Mellon University (CMU) now say that robot perception could improve markedly by adding another sense: hearing. "A lot of preliminary work in other fields indicated that sound could be useful, but it wasn't clear how useful it would be in robotics," said Lerrel Pinto, who recently earned his PhD in robotics at CMU and will join the faculty of New York University this fall.
The Role of Hearing
He and his colleagues found the performance rate was quite high, with robots that used sound successfully classifying objects 76 percent of the time. The team at CMU's Robotics Institute found that sounds could help a robot differentiate between objects, such as a metal screwdriver and a metal wrench.
Hearing also could help robots determine what type of action caused a sound and help them use sounds to predict the physical properties of new objects. Pinto said that the results were so encouraging that it might prove useful to equip future robots with instrumented canes, enabling them to tap on objects they want to identify.
The researchers presented their findings last month during the virtual Robotics: Science and Systems conference. Other team members included Abhinav Gupta, associate professor of robotics, and Dhiraj Gandhi, a former master's student who is now a research scientist at Facebook Artificial Intelligence Research's Pittsburgh lab.
Usefulness of Sound
To perform their study, the researchers created a large dataset, simultaneously recording video and audio of 60 common objects -- such as toy blocks, hand tools, shoes, apples and tennis balls -- as they slid or rolled around a tray and crashed into its sides.
They have since released this dataset, cataloging 15,000 interactions, for use by other researchers. The team captured these interactions using an experimental apparatus they called Tilt-Bot -- a square tray attached to the arm of a Sawyer robot.
It was an efficient way to build a large dataset; they could place an object in the tray and let Sawyer spend a few hours moving the tray in random directions with varying levels of tilt as cameras and microphones recorded each action. They also collected some data beyond the tray, using Sawyer to push objects on a surface.
Pinto said the usefulness of sound for robots was not in itself surprising, though he and the others were surprised at just how useful it proved to be. "I think what was really exciting was that when it failed, it would fail on things you expect it to fail on," he said. For instance, a robot couldn't use sound to tell the difference between a red block and a green block. "But if it was a different object, such as a block versus a cup, it could figure that out."