
With remarkable precision, AI reconstructs cinematic videos from brainwaves

Using continuous functional magnetic resonance imaging (fMRI) data from participants' brains, a team of researchers employed artificial intelligence (AI) to reconstruct the videos the participants had watched.

By Najmoos Sakib

The researchers collected data from volunteers who watched videos featuring a range of subjects, including animals, humans, and natural scenery, while undergoing brain scans. Their findings have not yet been peer-reviewed.

The scientists, from the National University of Singapore and The Chinese University of Hong Kong, noted in their study: "Reconstructing human vision from brain recordings, especially using non-invasive technologies like functional magnetic resonance imaging (fMRI), is an intriguing but challenging undertaking." Non-invasive methods, despite being less obtrusive, "capture limited information and are susceptible to various interferences, such as noise."

Because fMRI machines capture a snapshot of brain activity only once every few seconds, it is difficult to reconstruct video (that is, movement) information, such as what someone watched while having their brain scanned. Worse, as the researchers explain:

"Every fMRI scan effectively represents a 'average' of brain activity at that particular moment. A standard video, in comparison, has roughly 30 frames per second (FPS). If an fMRI frame takes 2 seconds, then 60 video frames—possibly comprising different objects, actions, and scene changes—are shown as visual stimuli throughout that time. As a result, it is difficult to decode fMRI and recover films at a frame rate that is higher than the fMRI's temporal resolution.

After training the AI, which they call MinD-Video, to decode the fMRI data, they modified the image-generating AI model Stable Diffusion to reconstruct the input as video. The resulting videos were then evaluated on semantics (whether the AI recognized the input as a cat, a running human, or anything else) and on scene dynamics, or how accurate the visual reconstruction was at the pixel level.
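A rough sketch of that two-stage design, under the assumption that an fMRI encoder produces a conditioning embedding for a video-capable diffusion decoder, might look like the following; all names here (FMRIEncoder, reconstruct_video, diffusion_decoder) are illustrative placeholders, not the authors' actual code.

```python
import torch

# Hypothetical sketch of the two-stage pipeline the article describes:
# (1) a learned fMRI encoder maps brain scans into an embedding space, and
# (2) a video-capable diffusion model is conditioned on those embeddings.
# All names below are illustrative placeholders, not the authors' API.

class FMRIEncoder(torch.nn.Module):
    """Stage 1 placeholder: project flattened fMRI voxels to an embedding."""

    def __init__(self, n_voxels: int, embed_dim: int = 768):
        super().__init__()
        self.proj = torch.nn.Linear(n_voxels, embed_dim)

    def forward(self, fmri: torch.Tensor) -> torch.Tensor:
        # fmri: (batch, n_voxels) -> (batch, embed_dim) conditioning vector
        return self.proj(fmri)

def reconstruct_video(fmri_scan, encoder, diffusion_decoder, n_frames=60):
    """Stage 2 placeholder: condition a diffusion decoder on the embedding."""
    cond = encoder(fmri_scan)                 # brain activity -> embedding
    return diffusion_decoder(cond, n_frames)  # embedding -> video frames
```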

According to the team, their system was 85% accurate in terms of semantics, exceeding the previous best-performing AI model by 45%. The scientists said: "Basic objects, animals, people, and scene types can be well recovered [from brain scan data]." More crucially, scene dynamics, including a close-up of a person, fast-motion scenes, and a long shot of a city view, could also be correctly reconstructed.
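That semantic score can be read as a simple label-agreement rate. The sketch below illustrates the idea; `classify` is a hypothetical stand-in for whatever video classifier the authors used, and the 85% figure comes from their results, not from this code.

```python
# Illustrative sketch of the semantic evaluation described above: run a video
# classifier over matched ground-truth and reconstructed clips and count how
# often the predicted labels agree.

def semantic_accuracy(gt_videos, recon_videos, classify) -> float:
    matches = sum(
        classify(gt) == classify(recon)
        for gt, recon in zip(gt_videos, recon_videos)
    )
    return matches / len(gt_videos)
```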

The researchers believe their work holds promise for the development of brain-computer interfaces, though they warn that regulation will be important to safeguard people's biological data and "avoid any malicious usage of this technology." They have posted further examples of their work on their MinD-Video project website.

Reconstructing human vision from brain activity has been an intriguing effort that has aided our understanding of our cognitive processes. Despite recent successes in recreating static images from non-invasive brain recordings, research on recovering continuous visual experiences in the form of videos remains sparse.

We discovered three gaps between our earlier image reconstruction work and video reconstruction:

• When processing dynamic brain activity, the hemodynamic response introduces a time delay, which can make tracking real-time brain responses to stimuli difficult (a simple lag-compensation sketch follows this list).

• Mind-Vis, our prior work, lacks both pixel-level and semantic-level guidance, an omission that may affect its ability to produce reliable reconstructions.

• Scene dynamics within a single fMRI frame must be preserved while the generation consistency of our pipeline is improved. Accurate and stable reconstruction across a single fMRI time frame depends on this balance.
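As referenced in the first item, a common workaround for the hemodynamic delay is to shift the fMRI time series by an assumed lag before pairing scans with the stimulus frames shown. The sketch below is a hedged illustration of that general idea; the 4-second lag and the function name are assumptions, not values from the paper.

```python
import numpy as np

# Hedged sketch of hemodynamic-delay compensation: the BOLD signal peaks
# several seconds after a stimulus, so one common workaround is to shift the
# fMRI time series back by an assumed lag before pairing each scan with the
# video frames shown at the corresponding time. The 4-second lag and the
# function name are illustrative assumptions, not values from the paper.

def align_fmri_to_stimulus(fmri: np.ndarray, tr_seconds: float = 2.0,
                           lag_seconds: float = 4.0) -> np.ndarray:
    """Drop the first scans so scan i lines up with stimulus time i * TR.

    fmri: (n_scans, n_voxels) array sampled once every `tr_seconds`.
    """
    lag_scans = int(round(lag_seconds / tr_seconds))
    return fmri[lag_scans:]
```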

The research is available on the arXiv pre-print server.

We propose MinD-Video, which progressively learns spatiotemporal information from continuous fMRI data of the cerebral cortex through masked brain modeling, multimodal contrastive learning with spatiotemporal attention, and co-training with an augmented Stable Diffusion model that incorporates network temporal inflation. We show that MinD-Video can reconstruct high-quality videos at arbitrary frame rates using adversarial guidance.
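Of the components named above, the multimodal contrastive learning step admits a compact generic sketch: in the standard CLIP-style (InfoNCE) form, each fMRI embedding is pulled toward the embedding of the stimulus it was recorded against and pushed away from the other samples in the batch. This is the textbook formulation, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of a multimodal contrastive objective in the standard
# CLIP-style (InfoNCE) form: matched brain/stimulus pairs sit on the diagonal
# of the similarity matrix and are treated as the correct "class".
# Generic formulation, not the authors' exact implementation.

def contrastive_loss(brain_emb: torch.Tensor, stim_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    brain = F.normalize(brain_emb, dim=-1)   # (batch, dim), unit-norm
    stim = F.normalize(stim_emb, dim=-1)     # (batch, dim), unit-norm
    logits = brain @ stim.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(len(brain), device=logits.device)
    # Symmetric loss: match brain -> stimulus and stimulus -> brain.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```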

Various semantic and pixel-level metrics were used to assess the reconstructed videos. We outperformed the previous state of the art by 45%, attaining an average accuracy of 85% in semantic classification tasks and 0.19 in structural similarity index (SSIM). We also demonstrate our model's scientific plausibility and interpretability, which reflect known physiological processes.
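For the pixel-level side, SSIM is a standard image-similarity metric with a widely used implementation in scikit-image. The sketch below averages per-frame SSIM over a clip; how the authors aggregated scores across frames and clips is an assumption here, not something the article specifies.

```python
import numpy as np
from skimage.metrics import structural_similarity

# Sketch of a pixel-level SSIM evaluation, averaged over the frames of a
# single clip, using scikit-image's standard SSIM implementation. The
# aggregation scheme is an assumption, not taken from the paper.

def mean_frame_ssim(gt_frames: np.ndarray, recon_frames: np.ndarray) -> float:
    """gt_frames, recon_frames: (n_frames, height, width, 3) uint8 arrays."""
    scores = [
        structural_similarity(gt, recon, channel_axis=-1, data_range=255)
        for gt, recon in zip(gt_frames, recon_frames)
    ]
    return float(np.mean(scores))
```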


