MuVi

MuVi is a dataset for affective multimedia content analysis. The motivation for the paper is the intuition that how we perceive and understand media is shaped by the medium through which we consume it.

Here's a small exercise to illustrate.

Try listening to the opening lines of this song:

Porter Robinson's Shelter.

What emotions do you think the song was trying to convey? Did you interpret the song to be generally more uplifting, or more melancholy? How do you think the emotion of the music changed as the song unfolded?

Keeping those thoughts in mind, now watch the music video that accompanied the song.


I hope the experience was as fascinating to you as it was to me the first time I tried this! More broadly, we can imagine many scenarios in which the same media can be consumed in different configurations:

  • Shortform video (YouTube Shorts, Instagram reels, or TikTok videos) could be consumed with both audio and video, or just silent video (for example, if you’re on public transport without earphones).
  • Podcasts could be consumed as just audio, or together with a video, if the creators had filmed themselves recording the podcast.
  • Music could be consumed as just audio (Spotify, Tidal, etc.) or together with an accompanying video (e.g., music video or live performance recording).

To understand how the auditory and visual modalities contribute to the perceived emotion of media, we presented music videos to participants in three conditions: music 🎵, visual 📷, and audiovisual 🎬. Participants continuously annotated the arousal and valence of the music videos, and also reported the overall emotion conveyed.

The dataset is available on Zenodo here (extracted audio and visual features only) and here (raw music videos).

Our results suggest that perceptions of arousal are influenced primarily by auditory information, while perceptions of valence are more subjective and can be influenced by both visual and auditory information.
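As a rough illustration of this kind of comparison, here is a minimal sketch of correlating continuous annotations from the audiovisual condition with those from the music-only and visual-only conditions. The file names and column names (`time`, `arousal`, `valence`, per-condition CSVs) are assumptions for illustration only, not the actual MuVi file layout or the analysis used in the paper.

```python
# Hypothetical sketch: compare a unimodal condition's continuous ratings
# with the audiovisual condition's ratings for one affect dimension.
import pandas as pd
from scipy.stats import pearsonr

def condition_contribution(av_csv: str, unimodal_csv: str, dimension: str) -> float:
    """Correlate mean per-timestep ratings of a unimodal condition with the
    audiovisual condition for one dimension ('arousal' or 'valence')."""
    # Columns assumed (hypothetical): time, participant, arousal, valence.
    av = pd.read_csv(av_csv)
    uni = pd.read_csv(unimodal_csv)

    # Average across participants at each timestep to get one trajectory per condition.
    av_mean = av.groupby("time")[dimension].mean()
    uni_mean = uni.groupby("time")[dimension].mean()

    # Align the two trajectories on shared timesteps and correlate them.
    joined = pd.concat([av_mean, uni_mean], axis=1, join="inner")
    r, _ = pearsonr(joined.iloc[:, 0], joined.iloc[:, 1])
    return r

# Hypothetical usage: a higher correlation for the music-only condition on arousal
# would be consistent with arousal being driven primarily by auditory information.
# print(condition_contribution("audiovisual.csv", "music_only.csv", "arousal"))
# print(condition_contribution("audiovisual.csv", "visual_only.csv", "arousal"))
```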

Role

For this paper, I:

  • Worked on the data analysis, with input from the rest of the team, and
  • Wrote the first draft of the manuscript.

Prof Kat Agres led the study design, with contributions from Dimos Makris, Dorien Herremans, and Gemma Roig. Material preparation and data collection were performed by Dimos Makris. All authors contributed to the editing and revisions.

Publications

Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses
Under review at Neural Computing and Applications
Phoebe Chua, Dimos Makris, Dorien Herremans, Gemma Roig, Kat Agres