Sunday, January 26, 2020

Scientists build first-person view video database to provide visual data in line with human experience

Reviewed by James Ives, M.Psych. (Editor), Jan 25 2020

To better understand the organization of the brain and human perceptual tendencies, a team of four scientists is recording video from four head-mounted cameras, along with eye-tracking and head-movement data, and assembling a massive database of more than 240 hours of first-person video that researchers everywhere can use.

"The brain is adapted to the world around us, but we don't have good data on what the world actually looks like to human observers. There are no collections of videos that sample the world the way that humans do - Hollywood cinematographers don't zip the cameras around as fast as human eyes move, so movies don't really reflect the way we take in the world."
Mark Lescroart, assistant professor and neuroscientist in the psychology department at the University of Nevada, Reno

The team of neuroscientists and social scientists is setting out to build a visual database that more accurately reflects human activity. They will create a vast gallery of videos that show what people see as they go about their daily activities. Their Visual Experience Database can support current and future research in fields that rely on the analysis and recognition of images, including neuroscience, vision science, cognitive science, artificial intelligence and possibly digital humanities and art.

To gather the videos, the scientists designed a headset/glasses device to use in collecting the data. While early versions look like a prototype for a Borg device, the team is streamlining the system to keep the weight down and improve wearability. It has two cameras facing forward to see the world and two cameras facing the eyes to track eye movement. There will be five headsets for each of the four labs participating in the research.

The subjects wearing the headsets for the visual data gathering will vary in age from 5 to 70 years old. They'll visit a variety of spaces, such as museums and libraries, and engage in many activities, including shopping, commuting, riding bikes and walking. The research team will analyze cues for 3D space perception and how people commonly experience walls, corners, landmarks and other built structures in the world. A sample of what the cameras see shows the eyes moving and the head turning to look at people and objects.

"We wanted to use mini-computers, but they weren't robust enough to handle our needs, so we ended up with a laptop in a backpack, it makes the headset a little more user-friendly so our subjects wouldn't be distracted by the tech," Paul MacNeilage, assistant professor and neuroscientist in the College of Science at the University of Nevada, Reno, said. "We decided to go with a Pupil Labs product for the base and added devices to it. We didn't want it to be too distracting for others, either."

The system must be able to collect GPS data, run four cameras, access software, maintain a decent power supply, record three video streams at once, and use an internal motion sensor and an accelerometer. The technology is much more involved than the stationary devices typically used in the lab for eye-tracking studies, which rely on a chin rest and a display.

"This is definitely not the same type of studies on eye movement that are done in the lab, with chin rest and a display showing pictures of the environment," MacNeilage said. "This is out in the real world with people interacting with their environment."

This is especially important to MacNeilage, who runs the Self Motion Lab at the University of Nevada, Reno, where graduate students are involved in the groundbreaking research.

"The system measures head and body movement through space," he said. "This allows us to reconstruct visual input moment to moment and get insights on sensory-motor control. No existing database includes head motion."

MacNeilage, Lescroart and graduate student Christian Sinnott tried out the headset, taking it outdoors. They got a few weird looks from those they passed on the street.

"This is like research in the wild," Lescroart said. "Walking across campus and looking for objects will let us see how behavior changes when trying to navigate our environment. We'll be able to ask 'how does info we select change depending on tasks?' We'll see what it looks like when people move their heads and walk."

"We're building a first-person view video database to provide visual data that is more in line with human experience, which should help artificial intelligence make human-like decisions," MacNeilage, one of the team members and an assistant professor and neuroscientist in the College of Science at the University of Nevada, Reno, said. "We aim to find out what does it look like when people walk and move their heads to navigate through the world. We use a simple paradigm, find out how people sample the visual environment with their eyes."

Another use for the visual database is informing artificial intelligence. While artificial intelligence is coming to the forefront of technology, it works only as well as the data it is trained on. Instead of systems functioning based only on photos and videos curated from the internet, the team of scientists sees its database as a new source of more accurate data.

"We aim to build our database with biases based on human perception," MacNeilage said. Other visual databases used for AI have a built-in bias. For example, photographers focus on particular objects for certain purposes, whether for a commercial or a documentary video, and security cameras record from fixed positions. "So scientists need more and better databases. Our database will have biases consistent with human behavior, which could be an advantage for AI."

Current artificial intelligence systems that recognize visual content require millions of training examples to achieve good performance. The databases used to train such systems often take photos and videos from the internet, and thus do not represent the content that humans see on a daily basis. This new database will introduce human-centered biases into the AI systems that can have serious, positive implications for AI-based applications such as self-driving cars.

Between all data sources, the team anticipates that there will be approximately 80 GB of data generated per recording hour, for a total of about 20 terabytes of raw data.
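The two figures are consistent with the 240 hours of video mentioned earlier. A minimal back-of-envelope sketch in Python, assuming decimal units (1 TB = 1,000 GB) and that the 240 hours correspond to total recording time (neither assumption is stated in the article; the function and constant names are illustrative only):

```python
# Back-of-envelope check of the storage estimate quoted above.
# Assumptions (not stated in the article): decimal units (1 TB = 1000 GB),
# and that the 240 hours of video correspond to total recording time.

GB_PER_HOUR = 80        # from the article: ~80 GB per recording hour
RECORDING_HOURS = 240   # from the article: at least 240 hours of video
GB_PER_TB = 1000        # assumption: decimal terabytes


def estimate_raw_terabytes(hours: float, gb_per_hour: float) -> float:
    """Return the estimated raw data volume in terabytes."""
    return hours * gb_per_hour / GB_PER_TB


total_tb = estimate_raw_terabytes(RECORDING_HOURS, GB_PER_HOUR)
print(f"Estimated raw data: {total_tb:.1f} TB")
```

Under these assumptions the estimate comes to 19.2 TB, in line with the "about 20 terabytes" the team anticipates.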

All data will be uploaded and stored in a central location on the University of Nevada, Reno's high-performance "Pronghorn" cluster of computer servers, housed in the Tahoe Reno Data Center, a state-of-the-art facility operated by Switch, headquartered in Las Vegas, Nevada, with a data center in Reno.

From the 20 terabytes of raw data, the team will catalog at least 240 hours of the head-centered video, including the head and eye movements. They will make these data freely available to the public, which will enable scientists, historians and even artists to benefit from the rich resource.

The team includes lead researcher Michelle Greene, an assistant professor of neuroscience and computational vision scientist from Bates College in Maine; Lescroart; MacNeilage; and Benjamin Balas, a visual and cognitive neuroscientist and associate professor of psychology at North Dakota State University.

The project's various aims build on each team member's existing research. It will continue MacNeilage's investigations of head movements in natural environments, Lescroart's investigation of structural scene representations, Greene's investigation of the statistics of object location and co-occurrence and visual search, and Balas' investigation of environmental effects on the development of perception. The database as a whole will also provide a highly valuable source of stimuli for ongoing fMRI experiments.

Source:

University of Nevada, Reno