Hyderabad: Imagine a robot or an AI assistant giving you a nudge before you accidentally over-salt your food.
This could well be possible, thanks to Ego4D, a project initiated by Facebook AI in collaboration with Facebook Reality Labs Research (FRL Research) and 13 partner institutes and labs from the UK, Italy, India, Japan, Saudi Arabia, Singapore, and the United States.
This November, they will unveil a mammoth and unique dataset comprising over 2,200 hours of first-person video, captured in the wild, of over 700 participants engaged in routine everyday activities.
The International Institute of Information Technology (IIIT) Hyderabad is the only Indian institute in the global consortium contributing to the dataset.
Computer vision is the process through which we try to equip machines with the same capabilities of visual detection and understanding that the human eye possesses. This is typically done via cameras that take photos and videos from a bystander perspective.
What makes the Ego4D project novel and next-generational is the manner in which data has been collected. "These are videos that show the world from the center of the action, rather than the sidelines," said Kristen Grauman, a lead research scientist at Facebook AI.
The footage has been collected via head-mounted devices combined with other egocentric sensors. By recognizing the location, scene of activity, and social relationships, these devices could be trained to not only automatically understand what the wearer is looking at, attending to, or even manipulating, but also the context of the social situation itself.
"Initially, we wanted to have a team that could travel across the country and participate in data collection. But with the pandemic, we had to find multiple local teams and ship cameras as well as data. We had to train people over videos," said Prof. CV Jawahar of the Center for Visual Information Technology at IIITH.
IIITH collected data from over 130 participants spread across 25 locations in the country. At each location, participants spanned a gamut of vocations and activities, from home cooks to carpenters, painters, electricians, and farmers.
"This is not a scripted activity carried out by graduate students. Video footage has been taken as each individual went about his or her daily tasks in a normal setting," Prof. Jawahar said.
While computer vision has always held the potential for assistive technologies that improve quality of life, this dataset could help push the envelope even further. "For instance, while cooking, technology can prod you in the right direction. If you miss a step, it can remind you. If you're doing well, it can encourage you and pat you on the back. Even while conducting surgeries, it can guide and provide additional cues to the surgeon wearing the device," said Prof. Jawahar.