Sound From Spaces

Generative music has always interested me, so I made it the focus of my music technology thesis. For my end-of-year exhibition I created a system in which virtual environments and procedural music were married through emotion. My idea was to build a system of two parts: a simple virtual space that could be classified emotionally, and a procedural music algorithm that could make music for a given emotion. The goal was an emotionally complementary audio-visual experience, as well as a proof of concept for a different VR/gaming sound-design workflow, where music automatically follows the virtual environment.

During the exhibition, people were allowed to alter the space's parameters (as if they were changing in real-time within a game) and experience the music responding to these changes. You can watch a demo video above.

Aesthetically I took inspiration from things like Robert Henke's Lumiere and Moderat's Out of Sight music video. The structure of the system borrowed from a generative music system built at the University of Trento.

The generative music algorithm was programmed in Max/MSP, with additive synthesis being used to dynamically generate timbres from the ground up in response to emotions. I coded the visuals in OpenFrameworks. The space and sound were married through a program called Wekinator, a machine learning tool for artists by Rebecca Fiebrink.

The OpenFrameworks code is on my GitHub. If you want more detail, read below.

Quantifying Emotion

For an algorithm to be able to work with emotions, they need to be quantified somehow. To do this the system uses James Russell's Circumplex Model of Affect (pictured), which breaks emotions down into two components: valence and arousal. Valence refers to how positive an emotion is; arousal refers to how energetic it is. For example, a state of anxiety would mean negative valence (unpleasant) and high arousal (energetic). A state of calm would mean positive valence and low arousal. Using this system, emotions can be plotted as points on a 2D plane.
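To make this concrete, here is a minimal sketch of emotions as points on the valence/arousal plane. The coordinates are illustrative guesses, not values from Russell's paper, and the nearest-point labelling is just one simple way to name an arbitrary point:

```python
import math

# Illustrative (valence, arousal) coordinates in [-1, 1], not from Russell's paper.
EMOTIONS = {
    "calm":      (0.7, -0.6),   # positive valence, low arousal
    "anxious":   (-0.6, 0.7),   # negative valence, high arousal
    "depressed": (-0.7, -0.6),  # negative valence, low arousal
    "alert":     (0.3, 0.8),    # mildly positive valence, high arousal
}

def nearest_emotion(valence, arousal):
    """Label an arbitrary point on the plane with the closest named emotion."""
    return min(EMOTIONS, key=lambda e: math.dist((valence, arousal), EMOTIONS[e]))
```

Any point the system produces can then be read back as a human-nameable emotional state, which is also how the demo video's on-screen display works.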

Defining The Space Emotionally

To define the virtual space emotionally, I first defined it by its physical parameters (in this case colour, lighting and movement) then defined how to transform these attributes into an emotion (i.e. a pair of valence/arousal values). For example, no colour, low lighting and slow movement could correspond to "depressed". This transformation or mapping was created using Wekinator.
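For intuition, here is what one such mapping might look like if written by hand. In the actual system this mapping was learned by Wekinator rather than hard-coded, and the weights below are invented for illustration:

```python
def space_to_emotion(colour_saturation, lighting, movement):
    """Toy hand-written mapping from space parameters (each 0..1)
    to (valence, arousal) in [-1, 1]. Weights are made up: fast
    movement and bright lighting read as energetic; colour and
    light read as pleasant."""
    arousal = 2 * (0.6 * movement + 0.4 * lighting) - 1
    valence = 2 * (0.5 * colour_saturation + 0.5 * lighting) - 1
    return valence, arousal
```

A dark, colourless, near-static space lands in the low-valence, low-arousal quadrant, matching the "depressed" example above.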

Machine Learning

Now for a tiny bit about machine learning and Wekinator. Basically, you give a Wekinator patch a bunch of input data (movement, colour, lighting) and output data (valence, arousal). The patch is then "trained" on this data using machine learning techniques, which work out how input and output are related. Once this is done, the patch can take previously unseen input and generate the appropriate output. It takes some work and a lot of training data to get right, but the result can be very powerful.
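The idea can be sketched in a few lines with a nearest-neighbour regressor, a deliberately simple stand-in for the models Wekinator actually trains. The training pairs here are invented for illustration:

```python
import math

# Invented training pairs: space parameters -> (valence, arousal).
examples = [
    ((0.9, 0.8, 0.2), (0.7, -0.6)),   # colourful, bright, slow -> calm
    ((0.1, 0.2, 0.9), (-0.6, 0.7)),   # drab, dark, fast        -> anxious
    ((0.1, 0.1, 0.1), (-0.7, -0.6)),  # drab, dark, slow        -> depressed
]

def predict(x, k=1):
    """Predict (valence, arousal) for unseen input by averaging the
    k nearest training examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(x, ex[0]))[:k]
    n = len(nearest)
    return (sum(y[0] for _, y in nearest) / n,
            sum(y[1] for _, y in nearest) / n)
```

An input near a training example gets a similar output, and inputs between examples get interpolated, which is exactly the behaviour that lets the music respond smoothly as visitors tweak the space.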

This meant that I had to create lots of pairs of emotion parameters and space parameters. The settings for the "calm" state are pictured above. The same had to be done for the generative music patch, with sets of musical parameters being matched to pairs of valence/arousal values. Max/MSP and OpenFrameworks communicate the parameters to Wekinator via OSC.
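For anyone curious what that OSC traffic looks like on the wire, here is a minimal encoder for an OSC message with float arguments, built from the OSC spec (null-terminated strings padded to 4 bytes, a type-tag string, then big-endian floats). Wekinator's documented default is to listen for /wek/inputs on UDP port 6448:

```python
import struct

def _osc_string(s):
    """OSC string: null-terminated, padded to a multiple of 4 bytes."""
    b = s.encode() + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address, *args):
    """Encode an OSC message whose arguments are all 32-bit floats."""
    typetags = "," + "f" * len(args)
    msg = _osc_string(address) + _osc_string(typetags)
    for a in args:
        msg += struct.pack(">f", a)  # big-endian float32
    return msg

# Usage sketch: send space parameters to Wekinator's default input port.
#   import socket
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   sock.sendto(osc_message("/wek/inputs", 0.5, 0.2, 0.8), ("127.0.0.1", 6448))
```

In the installation, OpenFrameworks and Max/MSP handled this with their own OSC objects; the bytes are the same.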

Emotion-Driven Generative Music

The generative music patch was broken into two parts: emotional timbre generation and composition parameter generation. Basically, the composition patch "plays" the timbre patch. The timbre patch is the instrument, the composition patch is the musician. This meant there were two Wekinator patches responding to emotion, one generating composition parameters and one generating timbre parameters.


Timbres were designed using additive synthesis to evoke different emotions, a process informed by research linking objective timbral parameters to emotional impressions. For example, brighter timbres were created for higher arousal, and dissonant timbres for low valence. The settings for the "alert" system state are pictured above. The parameters used to tailor the timbres were: 128 partial amplitudes (used to alter the even/odd harmonic ratio and spectral centroid), spectral roughness, inharmonicity and noise content.
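The additive approach can be sketched as a sum of sine partials whose amplitude roll-off and tuning depend on emotion. This is a toy version of the idea, not the actual Max/MSP patch (which used 128 partials plus roughness and noise controls); the mapping constants are invented:

```python
import math

def additive_frame(t, f0, arousal, valence, n_partials=16):
    """One output sample at time t (seconds) for fundamental f0 (Hz).
    Higher arousal -> gentler amplitude roll-off across partials,
    i.e. a higher spectral centroid (brighter). Lower valence ->
    more inharmonicity (partials pushed off integer ratios)."""
    rolloff = 2.5 - 1.5 * (arousal + 1) / 2       # invented range: 2.5 (dull) .. 1.0 (bright)
    inharm = 0.02 * max(0.0, -valence)            # detune partials only for negative valence
    sample = 0.0
    for k in range(1, n_partials + 1):
        freq = f0 * k * (1 + inharm * (k - 1))    # stretched partial series
        sample += (1 / k ** rolloff) * math.sin(2 * math.pi * freq * t)
    return sample
```

Sweeping arousal shifts energy toward the upper partials, which is the brightness cue the research associates with energetic states.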


The composition patch sends MIDI notes to the timbre patch to play it. The manner in which these notes are generated is defined by emotion, i.e. by valence and arousal. Composition parameters with well-researched emotional connotations were chosen: tempo, mode, articulation, velocity, note density and register. Some of these connotations are graphed above. For example, low arousal might mean a legato articulation and high arousal a staccato articulation. Negative valence would mean a minor mode, positive valence a major mode. Notes are played randomly from the current mode, the manner in which they are played being defined by the other parameters (tempo, velocity etc.).
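A stripped-down version of that mapping might look like this. The real system learned the mapping with Wekinator, so the ranges and thresholds below are illustrative guesses:

```python
import random

MAJOR = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets within an octave
MINOR = [0, 2, 3, 5, 7, 8, 10]  # natural minor

def composition_params(valence, arousal):
    """Toy mapping from (valence, arousal) in [-1, 1] to composition
    parameters. All numeric ranges are invented for illustration."""
    return {
        "tempo_bpm": 60 + 60 * (arousal + 1) / 2,            # 60..120
        "mode": MAJOR if valence >= 0 else MINOR,            # major = positive
        "articulation": "staccato" if arousal > 0 else "legato",
        "velocity": int(40 + 80 * (arousal + 1) / 2),        # 40..120
        "register": 48 + int(12 * (valence + 1) / 2),        # base MIDI note
    }

def next_note(params, rng=random):
    """Pick a random pitch from the current mode in the current register."""
    return params["register"] + rng.choice(params["mode"])
```

Each generated note then goes out as MIDI to the timbre patch, which renders it with whatever spectrum the current emotion calls for.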

The Result

The resulting system dynamically generates notes and timbres in response to the changing "emotion" of a virtual space. Reverb, tape hiss and vinyl warp were added for character. It could definitely improve with some refining, but the idea is there.

Check out the demo video to see it in action. It demonstrates all the musical parameters changing in response to changes in the environment, and also displays the emotional state of the system on the valence-arousal plane.