Engineering and Music
"Human Supervision and Control in Engineering and Music"

Kia Ng

Music via Motion: Trans-Domain Mapping of Motion and Sound
This paper presents an ongoing research framework for the trans-domain mapping between the physical movements of performers and musical events. Starting with a brief background on the initial prototype, it describes various collaborative projects and future directions that explore the relationship between music and motion.
Motion, gesture and, to a certain extent, body language directly and indirectly influence various aspects of artistic performance (audio-based or otherwise). Few musicians perform without moving their bodies: besides the mechanics of playing the instrument, the performer's body tends to move or sway in an individualistic manner (consciously or unconsciously), which may or may not reflect the music.

With advances in electronic and computing technologies, there has been increasing interest in new musical instrument designs that augment traditional instruments (Schoner, Cooper and Gershenfeld 2000) with new capabilities, for example control of synthesis parameters or visual output, or the triggering of sound samples (Paradiso et al. 2000), as well as in new interface designs that offer better ergonomics and/or simpler instrumental control to a wider range of users. With such systems, the interface modes, sensitivities and reactions (output) are highly flexible and can be configured or personalised, making musical instrument playing more accessible with a shorter learning time.

This project aims to create an intuitive and non-intrusive interactive audio-visual performance interface. It detects and tracks motion and applies the detected movement to influence, trigger or generate musical events, or to control certain parameters (for example volume, pitch or timbre), thereby giving users or performers real-time control of multimedia events through their physical movements.

Music via Motion (MvM)
The MvM framework takes input from a video camera, processes the video frames acquired in real time, detects and tracks visual changes in the scene under inspection, and uses the recognised movements to generate interesting and relevant musical events through an extensible set of mapping functions (Ng 2000). Figure 1 illustrates the main modules.

Figure 1: Main modules.

The acquisition module consists of the data capture system, which interfaces the framework with the real-world environment, including digital video and physical sensor data acquisition. The feature detection and tracking module contains algorithms to locate and follow certain predefined features in the input data, such as motion, shape and colour. The mapping module is made up of an extensible and configurable set of functions that react to the detected features by generating an appropriate output. The output and simulation module is responsible for creating multimedia events, for example playing sound samples or MIDI data.
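
As a rough illustration of how the four modules fit together, the following Python sketch chains them as plain functions. The class name `Pipeline`, the frame and feature types, and the toy detector and mapping in the demo are hypothetical; the source does not describe the internal interfaces of MvM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical data types: a grey-level frame, named feature values, and events as dicts.
Frame = List[List[int]]
Features = Dict[str, float]
Event = Dict[str, object]


@dataclass
class Pipeline:
    """Chains four MvM-style stages: acquisition -> detection -> mapping -> output."""
    acquire: Callable[[], Frame]                       # acquisition module
    detect: Callable[[Frame], Features]                # feature detection and tracking module
    mappings: List[Callable[[Features], List[Event]]]  # extensible set of mapping functions
    output: Callable[[Event], None]                    # output and simulation module

    def step(self) -> List[Event]:
        """Process one frame end to end and return the events that were emitted."""
        features = self.detect(self.acquire())
        events: List[Event] = []
        for mapping in self.mappings:  # every mapping sees the same features
            events.extend(mapping(features))
        for event in events:
            self.output(event)
        return events


if __name__ == "__main__":
    def activity(frame: Frame) -> Features:
        # Fraction of maximum brightness: a stand-in for real motion detection.
        pixels = [v for row in frame for v in row]
        return {"activity": sum(pixels) / (255 * len(pixels))}

    def note_on_activity(features: Features) -> List[Event]:
        return [{"type": "note_on", "note": 60}] if features["activity"] > 0.2 else []

    pipe = Pipeline(acquire=lambda: [[0, 0], [0, 255]], detect=activity,
                    mappings=[note_on_activity], output=print)
    pipe.step()
```

In this sketch, extending the mapping stage only means appending another function to `mappings`; the acquisition, detection and output stages stay unchanged.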

The current prototype is equipped with motion and colour detection modules. The motion detection and tracking sub-modules include standard frame differencing and background subtraction. Pixel-wise colour segmentation in RGB space is straightforward and surprisingly effective, but its performance is sensitive to lighting conditions. To enhance colour segmentation, work in progress includes transforming the colour representation (Raja, McKenna and Gong 1998, Drew, Wei and Li 1998) to minimise the variance of a colour cluster through illumination normalisation.
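
A minimal pure-Python sketch of the two motion-detection techniques named above, operating on small grey-level frames stored as nested lists. The thresholds, the learning rate `alpha` and the class name `BackgroundSubtractor` are assumptions; the source does not say which background model MvM uses, so a running average is swapped in here as one common choice.

```python
from typing import List

Frame = List[List[int]]  # grey-level image, values 0..255
Mask = List[List[bool]]


def frame_difference(prev: Frame, curr: Frame, threshold: int = 30) -> Mask:
    """Mark pixels whose grey level changed by more than `threshold` between consecutive frames."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


class BackgroundSubtractor:
    """Running-average background model; pixels far from the model count as foreground."""

    def __init__(self, first: Frame, alpha: float = 0.05, threshold: int = 30):
        self.background = [[float(v) for v in row] for row in first]
        self.alpha = alpha          # how quickly the model absorbs new frames (assumed)
        self.threshold = threshold  # foreground threshold in grey levels (assumed)

    def apply(self, frame: Frame) -> Mask:
        mask = []
        for brow, frow in zip(self.background, frame):
            mask_row = []
            for x, v in enumerate(frow):
                mask_row.append(abs(v - brow[x]) > self.threshold)
                # Slowly blend the current pixel into the background model.
                brow[x] = (1 - self.alpha) * brow[x] + self.alpha * v
            mask.append(mask_row)
        return mask


def changed_fraction(mask: Mask) -> float:
    """Fraction of pixels flagged as changed: a crude overall activity measure."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0
```

The two techniques behave differently when a performer stops: frame differencing sees nothing once consecutive frames match, whereas background subtraction keeps reporting the still performer until the running average gradually absorbs them.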

Basic mapping functions include a distance-to-MIDI-events mapping, with many configurable parameters, such as scale-type, pitch range and others.  Musical mapping can be enhanced with a database of composed musical phrases and several mapping layers can be overlaid in order to produce multi-layered and polyphonic effects (see Figure 2). 
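
The distance-to-MIDI mapping and its layering can be sketched as follows. The source does not say which distance is measured, so this sketch assumes a scalar such as the position of detected motion relative to the frame edge, normalised by `max_distance`. The scale table, default pitch range (MIDI notes 48 to 72) and function names are illustrative assumptions.

```python
from typing import Dict, List, Sequence

# Scale types as semitone offsets from the root (C); this set is illustrative only.
SCALES: Dict[str, Sequence[int]] = {
    "chromatic": range(12),
    "major": (0, 2, 4, 5, 7, 9, 11),
    "minor_pentatonic": (0, 3, 5, 7, 10),
}


def scale_notes(scale: str, low: int, high: int) -> List[int]:
    """All MIDI note numbers in [low, high] whose pitch class belongs to the scale."""
    steps = set(SCALES[scale])
    return [n for n in range(low, high + 1) if n % 12 in steps]


def distance_to_note(distance: float, max_distance: float,
                     scale: str = "major", low: int = 48, high: int = 72) -> int:
    """Map a distance in [0, max_distance] linearly onto the notes of a scale within a pitch range."""
    notes = scale_notes(scale, low, high)
    t = min(max(distance / max_distance, 0.0), 1.0)  # normalise and clamp to [0, 1]
    return notes[round(t * (len(notes) - 1))]


def layered_notes(distance: float, max_distance: float, layers: List[dict]) -> List[int]:
    """Overlay several mapping layers, each with its own parameters, on the same input."""
    return [distance_to_note(distance, max_distance, **layer) for layer in layers]


if __name__ == "__main__":
    layers = [
        {"scale": "major", "low": 48, "high": 72},
        {"scale": "minor_pentatonic", "low": 60, "high": 84},
    ]
    for d in (0.0, 50.0, 100.0):
        print(d, layered_notes(d, 100.0, layers))
```

With the two layers in the demo, a distance halfway across the range yields notes 60 and 72 together, which is the kind of polyphonic overlay described above.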

Figure 2: Multi-layered mapping.

With MvM, the whole body of the user acts as a musical instrument interface, which determines the tempo, volume and audio generation of the performance.

Coat of Invisible Notes (CoIN)
CoIN is a collaborative project designed to bring together multiple creative domains to build special costumes, music and dance within an interactive audio-visual performance interface provided by MvM. For CoIN performances, MvM is configured to detect and track the colour of the regions where visual changes are detected. The detected colours are used to control the choice of musical sounds and effects; hence, visual changes in the costumes can also be used to control the character of the musical responses.
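
The colour-to-sound selection might look like the sketch below. It averages the colour of changed pixels only and matches it against a palette in normalised rg chromaticity, one simple form of the illumination normalisation mentioned earlier (dividing by R+G+B cancels uniform changes in brightness). The palette values, sound labels and distance threshold are hypothetical and would need calibration for real costumes and lighting.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]

# Hypothetical reference chromaticities (r, g) for three costume colours,
# and the sound each colour selects.
PALETTE: Dict[str, Tuple[float, float]] = {
    "red": (0.70, 0.15),
    "green": (0.20, 0.60),
    "blue": (0.15, 0.20),
}
SOUND_FOR_COLOUR: Dict[str, str] = {"red": "brass", "green": "flute", "blue": "marimba"}


def chromaticity(pixel: RGB) -> Tuple[float, float]:
    """Normalised rg chromaticity; dividing by R+G+B cancels uniform changes in brightness."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return (1 / 3, 1 / 3)  # black: no chromatic information, treat as grey
    return (r / total, g / total)


def nearest_colour(rg: Tuple[float, float], max_dist: float = 0.15) -> Optional[str]:
    """Closest palette colour in chromaticity space, or None if nothing is close enough."""
    best, best_dist = None, max_dist
    for name, (pr, pg) in PALETTE.items():
        dist = math.hypot(rg[0] - pr, rg[1] - pg)
        if dist <= best_dist:
            best, best_dist = name, dist
    return best


def sound_for_changed_region(pixels: Sequence[RGB], changed: Sequence[bool]) -> Optional[str]:
    """Average the chromaticity of changed pixels only and pick the matching sound."""
    rgs = [chromaticity(p) for p, c in zip(pixels, changed) if c]
    if not rgs:
        return None
    mean = (sum(r for r, _ in rgs) / len(rgs), sum(g for _, g in rgs) / len(rgs))
    colour = nearest_colour(mean)
    return SOUND_FOR_COLOUR[colour] if colour else None
```

A brightly lit red pixel (200, 30, 30) and the same fabric in shadow (100, 15, 15) have identical chromaticity here, which is why this representation is less sensitive to lighting than raw RGB thresholds.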

Constant one-to-one direct mapping of movement can also become tiresome and uninspiring. For the CoIN performance, a background layer of music was specially composed to provide a coherent structure, with various timed intervals for MvM to perform its solo cadenza. Basic expressive features are being added to the MvM prototype. These include an accent detector module, which keeps a history of the region size of the detected visual changes, the direction and speed of the motion, and their means. Sudden changes in these parameters are used to control factors in audio generation.
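
The accent detector can be sketched as a sliding history per feature. The source does not define "sudden", so this sketch assumes a change counts as an accent when a new value departs from the recent mean by more than `k` standard deviations. The window length, `k` and `min_std` are assumed values. Only scalar features such as region size and speed are handled; motion direction would need an angular difference and is left out.

```python
import math
from collections import deque
from typing import Deque, Dict, List


class AccentDetector:
    """Flags features whose new value departs sharply from their recent history."""

    def __init__(self, window: int = 10, k: float = 3.0, min_std: float = 1.0):
        self.window = window    # frames of history kept per feature (assumed)
        self.k = k              # deviation, in standard deviations, that counts as sudden (assumed)
        self.min_std = min_std  # stops a perfectly steady history from flagging tiny jitter
        self.history: Dict[str, Deque[float]] = {}

    def update(self, features: Dict[str, float]) -> List[str]:
        """Feed one frame of features; return the names of features showing an accent."""
        accents = []
        for name, value in features.items():
            past = self.history.setdefault(name, deque(maxlen=self.window))
            if len(past) == self.window:  # only judge once the history is full
                mean = sum(past) / len(past)
                std = math.sqrt(sum((v - mean) ** 2 for v in past) / len(past))
                if abs(value - mean) > self.k * max(std, self.min_std):
                    accents.append(name)
            past.append(value)
        return accents
```

The returned feature names could, for instance, trigger a percussive sample or raise the velocity of the next note; because `min_std` is in the units of each feature, it would need tuning separately for region size (pixels) and speed.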

Figure 3: MvM/CoIN performance.

Music Head
It was found that much other research in visual tracking and sensing, as well as existing systems, could be integrated into MvM for further exploration. To provide seamless integration, data communication between the main modules has been enhanced using sockets, enabling cross-platform and distributed processing.
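
Socket-based communication between a tracker and the mapping module could be sketched as below. The source only says that sockets are used; the choice of UDP, the JSON message format and the helper names are assumptions made for illustration.

```python
import json
import socket
from typing import Dict, Tuple

Address = Tuple[str, int]


def send_features(sock: socket.socket, addr: Address, features: Dict[str, float]) -> None:
    """Encode one frame of tracked features as JSON and send it as a single UDP datagram."""
    sock.sendto(json.dumps(features).encode("utf-8"), addr)


def receive_features(sock: socket.socket, bufsize: int = 4096) -> Dict[str, float]:
    """Block (up to the socket's timeout) for one datagram and decode it."""
    data, _sender = sock.recvfrom(bufsize)
    return json.loads(data.decode("utf-8"))


def open_receiver(host: str = "127.0.0.1", port: int = 0, timeout: float = 1.0) -> socket.socket:
    """Mapping-module side: bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.settimeout(timeout)
    return sock


if __name__ == "__main__":
    with open_receiver() as receiver, socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        # The tracker could equally run on another machine; only the address changes.
        send_features(sender, receiver.getsockname(), {"x": 0.4, "y": 0.7, "activity": 0.25})
        print(receive_features(receiver))
```

UDP is a plausible fit for per-frame tracking data because a late frame is usually worthless and is better dropped than retransmitted; TCP would be the alternative if every message must arrive in order. A text format such as JSON also keeps the two ends independent of platform and language.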

The distributed mapping module was first tested in the Interactive Music Head collaborative project, which integrates MvM with a real-time face (and expression) tracker from an ongoing research project that aims to create a synthetic talking head for mediating interaction between humans and machines (Devin and Hogg 2001). Figure 4 illustrates the real-time face tracker system.

Figure 4: Real-time face tracker system with spline curves classifying primary face structures.

Work in progress involves experimenting with various mapping approaches that use the face shape contour. Figure 5 presents a design of the distributed and configurable mapping module.
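
One possible shape for a configurable mapping module is a routing table built from plain configuration data, so that the link between face-contour features and musical parameters can be changed without touching code. This is a sketch under assumptions: the feature names (`mouth_open`, `eyebrow_raise`), the target parameters and the linear scaling are hypothetical and are not taken from the design in Figure 5.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Route:
    """One configurable mapping: an input feature drives one output parameter."""
    feature: str       # hypothetical face-contour feature, e.g. "mouth_open"
    target: str        # hypothetical output parameter, e.g. "volume"
    in_min: float
    in_max: float
    out_min: int
    out_max: int
    invert: bool = False

    def apply(self, value: float) -> int:
        t = (value - self.in_min) / (self.in_max - self.in_min)
        t = min(max(t, 0.0), 1.0)  # clamp out-of-range input
        if self.invert:
            t = 1.0 - t
        return round(self.out_min + t * (self.out_max - self.out_min))


def load_routes(config: List[dict]) -> List[Route]:
    """Build routes from plain data, e.g. parsed from a JSON configuration file."""
    return [Route(**entry) for entry in config]


def map_features(routes: List[Route], features: Dict[str, float]) -> Dict[str, int]:
    """Apply every route whose input feature is present in this frame."""
    return {r.target: r.apply(features[r.feature]) for r in routes if r.feature in features}


if __name__ == "__main__":
    config = [
        {"feature": "mouth_open", "target": "volume", "in_min": 0.0, "in_max": 1.0, "out_min": 0, "out_max": 127},
        {"feature": "eyebrow_raise", "target": "pitch", "in_min": 0.0, "in_max": 1.0, "out_min": 60, "out_max": 72},
    ]
    print(map_features(load_routes(config), {"mouth_open": 0.25, "eyebrow_raise": 1.0}))
```

Because routes skip features that are absent from a frame, the same table can serve several distributed trackers that each report only some of the features.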

Figure 5: Configurable mapping module.

Future Directions
Besides input from multiple cameras, various sensors and imaging technologies (e.g. infra-red, thermal and range imaging) could be used for sensing and data acquisition, depending on the application and the environment of the system or installation.

In addition to the video and sensor tracking of human motion for creative mapping, the data could be used to automatically generate statistical models of typical trajectories and motions (Johnson 1998). With such models, realistic behaviours can be generated and applied to control simulated virtual performers (Volino and Magnenat-Thalmann 1999, Badler et al. 1999), which could interact with human performers and users. Future plans include behaviour modelling (Johnson, Galata and Hogg 1998) and the integration of other motion, gesture and expression trackers (Camurri et al. 2000).
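
To make the idea of a statistical motion model concrete, the sketch below learns a first-order Markov chain over quantised motion directions and samples new sequences from it. This is a deliberately simple stand-in: the cited work uses its own, more elaborate learning methods, and the eight direction bins and class name here are assumptions.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def quantise_directions(track: Sequence[Point], bins: int = 8) -> List[int]:
    """Convert a 2-D trajectory into direction symbols 0..bins-1 (0 = +x direction)."""
    symbols = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        symbols.append(round(angle / (2 * math.pi / bins)) % bins)
    return symbols


class MarkovMotionModel:
    """First-order Markov chain over direction symbols, learnt by counting transitions."""

    def __init__(self) -> None:
        self.counts: Dict[int, Counter] = defaultdict(Counter)

    def train(self, symbols: Sequence[int]) -> None:
        for a, b in zip(symbols, symbols[1:]):
            self.counts[a][b] += 1

    def generate(self, start: int, length: int, rng: random.Random) -> List[int]:
        """Sample a new symbol sequence, e.g. to drive a virtual performer."""
        seq = [start]
        while len(seq) < length:
            successors = self.counts.get(seq[-1])
            if not successors:  # state never seen with a successor: stop early
                break
            states, weights = zip(*successors.items())
            seq.append(rng.choices(states, weights=weights)[0])
        return seq
```

Passing an explicit `random.Random` instance keeps generated behaviours reproducible, which helps when rehearsing an interactive piece.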

This paper has described the MvM framework for mapping and integrating movements and musical events, and has presented examples of applications of such technologies in the performing arts.

There has been increasing interest in systems like MvM from a variety of disciplines (Ng et al. 2000, Siegel and Jacobsen 1998, Siegel 1999, Wanderley and Battier 2000). Beyond the original intention of basic multimedia event triggering, live performers, choreographers, dancers, composers and other artists have found many creative applications for the prototype. There may also be applications in music therapy, where this motion-sensitive system could encourage movement by providing interactivity and creative feedback.

The author would like to thank Jon Scott, Gillan Cash, Vincent E. Devin, Aphrodite Galata and David Hogg for the Interactive Music Head collaboration. The CoIN project was carried out with financial assistance from Yorkshire Arts. Thanks also to the CoIN project team, Eddie Copp, Claire Nicholson and the dancers from Bretton Hall College, for their collaboration and support.

References

Badler, N.I., Bindiganavale, R., Bourne, J., Allbeck, J., Shi, J. and Palmer, M. (1999). Real time virtual humans, in Proceedings of the 4th International Conference on Digital Media Futures, National Museum of Photography, Film & Television, Bradford, UK.

Camurri, A., Hashimoto, S., Ricchetti, M., Ricci, A., Suzuki, K., Trocca, R. and Volpe, G. (2000). EyeWeb: toward gesture and affect recognition in interactive dance and music systems, Computer Music Journal, MIT Press, 24(1): 57–69.

Devin, V. E. and Hogg, D. C. (2001). Reactive memories: an interactive Talking-Head. Research Report Series, School of Computing, University of Leeds, Report 2001.09.

Drew, M. S., Wei, J. and Li, Z-N. (1998). Illumination-invariant color object recognition via compressed chromaticity histograms of normalized images, Sixth International Conference on Computer Vision, Narosa Publishing House, pp. 533–540. 

Johnson, N. (1998). Learning object behaviour models, PhD thesis, School of Computer Studies, The University of Leeds, UK.

Johnson, N., Galata, A. and Hogg, D. (1998). The acquisition and use of interaction behaviour models, in Proceedings IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 866–871.

Ng, K.C. (2000). Music via Motion, in Proceedings of XIII CIM 2000 - Colloquium on Musical Informatics, Italy.

Paradiso, J. A., Hsiao, K-Y., Strickon, J. and Rice, P. (2000). New sensor and music systems for large interactive surfaces, in Proceedings of the International Computer Music Conference (ICMC 2000), Berlin, Germany, pp. 277–280.

Raja, Y., McKenna, S. and Gong, S. (1998). Segmentation and tracking using colour mixture models, Asian Conference on Computer Vision (ACCV), Hong Kong, Lecture Notes in Computer Science 1351, I: 607–614.

Schoner, B., Cooper, C. and Gershenfeld, N. (2000). Cluster-weighted sampling for synthesis and cross-synthesis of violin family instruments, in Proceedings of the International Computer Music Conference (ICMC 2000), Berlin, Germany, pp. 376–379.

Siegel, W. and Jacobsen, J. (1998). The challenges of interactive dance, an overview and case study, Computer Music Journal, 22(4): 29–43.

Siegel, W. (1999). Two compositions for interactive dance, in Proceedings of the International Computer Music Conference (ICMC99), Beijing, China, pp. 56–59.

Volino, P. and Magnenat-Thalmann, N. (1999). 3D fashion design and the virtual catwalk, in Proceedings of the 4th International Conference on Digital Media Futures, National Museum of Photography, Film & Television, Bradford, UK.

Wanderley, M. and Battier, M. (eds.) (2000). Trends in gestural control of music, Ircam - Centre Pompidou.