Engineering and Music
"Human Supervision and Control in Engineering and Music"

Thomas B. Sheridan

Musings on Music Making and Listening: Supervisory Control and Virtual Reality
"Information Theory, Photosynthesis and Religion" was an imaginary title used by the MIT information theorists Peter Elias and David Huffman a half century ago to deprecate efforts to generalize certain mathematical theories to just about everything. Perhaps we are guilty of doing the same, with a symposium tying supervisory control to music.  With that precautionary caveat let us consider the topics at hand: 
1.  Music as Supervisory Control: a Three Level Hierarchy
At first glance the idea that supervisory control and music have any significant relation to one another seems far-fetched.  And I have no idea what connection Prof. Dr. Ing. Johannsen has in mind.  But upon reconsideration, the connection seems natural and rich for consideration.

Supervisory control has been defined as the situation where “One or more human operators are intermittently programming and continually receiving information back from a computer that itself closes an autonomous control loop from its sensors and through its actuators to the controlled process or environment.”

Let me modify that definition just a tiny bit to make it more general:  substitute “information processor, living or artificial” for “computer.” Then we can use the notion of supervisory control to characterize a music-making system as:
(1) a human, acting as composer, conductor or teacher, “programming” another human to play an instrument, or 
(2) a human programming his or her own body to play an instrument, where the conscious brain programs and exercises a lower-than-conscious, semi-automatic nervous system to perform on the instrument, or
(3) (1) and (2) in combination.

Since the combination (3) subsumes (1) and (2), let us consider it in detail. It is depicted in Figure 1. U0, U1 and U2 represent successively higher level command variables, while X0, X1 and X2 represent successively higher state variables (the state of each right-hand element being what the left-hand element is intended to control).

Figure 1. The hierarchical cascade of cause-effect  relationships in music.

First, what might be gained by considering the relationship in these terms? I believe a greater understanding of what music-making is, at least with respect to communication (between participants and with their instruments), biomechanics (the relations of forces and kinematics of the human body), dynamics (the study of forces and movements of the elements of a system in temporal relation to one another), and control.

1.1  The 0th level of control; biomechanics
The 0th (U0, X0) level of control is biomechanical. This is the level at which we know the most with respect to manual control. Depending on the instrument, the musician must move her fingers, hands and arms, and possibly her lips and lungs, in programmed patterns. The combined mass, damping and elastic characteristics of these body parts, as well as of the bow, percussion device, keys (on a piano or wind instrument), etc., constrain the relationship between the driving forces U0 (from muscles) and the body segment displacements X0.

For any physical body in the known universe there holds, at least approximately, the relation otherwise known as Newton’s law:

$$U_0 / X_0 = M D^2 + B D + K \qquad (1)$$

Here $D$ stands for the time derivative (or rate), $D^2$ for the second time derivative (or acceleration), $M$ for mass, $B$ for viscous damping, and $K$ for stiffness. $B$ and $K$ can be nonlinear parameters, so the (linear) differential equation above is only a first approximation. From this equation we can determine that the natural frequency of a simple spring-mass system with negligible damping is

$$\omega_n = (K/M)^{0.5} \qquad (2)$$

This determines the frequency of a vibrating string or a vibrating column of air in an organ pipe or a woodwind instrument (e.g., a tighter string increases frequency, a longer and heavier string decreases it). As damping, the B in equation (1), increases, the frequency of vibration gets lower, though only slightly when the damping is light. Equations (1) and (2) also determine the natural rhythm of human body movement: the pace that the body naturally falls into with minimal forcing, and the temporal pattern that requires the least effort to sustain. In the case of human limbs we have rotational spring-mass-damper systems, meaning the mass elements (limbs, head, etc.) rotate around various joints. But the same rules apply, the muscles serving as both springs and dampers, their coefficient values depending on how loosely or tightly the agonist and antagonist muscle pairs are flexed.
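The relation between equation (1), equation (2) and the effect of damping can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the standard textbook result for an underdamped second-order system, in which the oscillation frequency is the square root of K/M minus (B/2M) squared, and the values of M, K and B are hypothetical, chosen only for illustration.

```python
import math


def natural_frequency_hz(stiffness, mass):
    """Undamped natural frequency from equation (2), converted from rad/s to Hz."""
    return math.sqrt(stiffness / mass) / (2.0 * math.pi)


def damped_frequency_hz(stiffness, mass, damping):
    """Oscillation frequency of M*x'' + B*x' + K*x = 0 (equation (1) with no forcing).

    Uses the standard underdamped result w_d = sqrt(K/M - (B/(2M))**2).
    Returns 0.0 when the system is critically damped or overdamped,
    because it then no longer oscillates.
    """
    radicand = stiffness / mass - (damping / (2.0 * mass)) ** 2
    if radicand <= 0.0:
        return 0.0
    return math.sqrt(radicand) / (2.0 * math.pi)


# Hypothetical parameter values, for illustration only.
M = 1.0   # mass, kg
K = 40.0  # stiffness, N/m
for B in (0.0, 2.0, 6.0):  # viscous damping, N*s/m
    f_n = natural_frequency_hz(K, M)
    f_d = damped_frequency_hz(K, M, B)
    print(f"B = {B:.1f}: undamped {f_n:.3f} Hz, damped {f_d:.3f} Hz")
```

Under these assumptions the damped frequency equals the undamped one when B is zero and falls below it as B grows, while quadrupling K doubles the undamped frequency.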

One can freely (with muscles relaxed) wiggle the wrist, wag the head, swing the arm, swing the leg, walk, run or dance, and very easily discover these natural frequencies. One finds from experiment that the natural frequency of wrist wiggling is slightly faster than that of arm swinging because, for fixed K, the distance L from the associated joint to the equivalent mass concentration point is longer for the arm (the rotational inertia, the equivalent of M in equation (2), is $ML^2$). One also discovers that the range of natural frequencies is quite small, say between 1 Hz (a slow walk) and 4 Hz (wrist wiggling).

If a limb is hanging vertically and freely (muscles are very relaxed, so that both damping and stiffness of the muscles are near zero) the limb forms a pendulum. In this case the force of gravity provides the dominant restoring (spring) torque, pushing the pendulum toward its neutral (vertical) position. Since the restoring torque (rotational equivalent of K) is proportional to M and pendulum length L, and the rotational inertia (rotational equivalent of M) is $ML^2$, the natural frequency of the pendulum is proportional to $(ML / ML^2)^{0.5} = (1/L)^{0.5}$. This means that the range of pendular natural frequencies of wrist, arm and leg pendula is again quite small, perhaps again only between roughly 1 and 4 Hz.
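The square-root dependence on length can be illustrated with the idealized small-angle pendulum. This is a sketch under stated assumptions rather than a model of a real limb: it treats the limb as a point mass on a massless rod, takes g as 9.81 m/s^2, and uses hypothetical effective lengths.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2 (assumed standard value)


def pendulum_frequency_hz(length_m):
    """Small-angle natural frequency of a point-mass pendulum: sqrt(g/L) / (2*pi).

    The mass cancels, as in the text: restoring torque ~ M*L, inertia = M*L**2.
    """
    return math.sqrt(G / length_m) / (2.0 * math.pi)


# Hypothetical effective lengths, for illustration only.
for name, length in (("short segment", 0.10),
                     ("medium segment", 0.30),
                     ("long segment", 0.60)):
    print(f"{name}: L = {length:.2f} m -> {pendulum_frequency_hz(length):.2f} Hz")
```

Because the frequency scales as the square root of 1/L, quadrupling the length only halves the frequency, which is why a wide range of limb lengths maps onto a narrow range of frequencies.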

So is there some significance of this for music? Yes, tremendous significance, in the sense that the biomechanics absolutely constrain the spectrum of rhythmic patterns that we call music. The musician, depending on the instrument, moves his or her limbs, coupled also to the controllable parts of the musical instrument, according to a musical “beat.” The frequency range of this “beat” lies in this same narrow range, roughly between 1 and 4 Hz. The pacing is actively controlled by the muscles, where the damping is not so small that the natural limb resonances “fight” the active control, unless the muscles try to drive the limbs faster than approximately 4 Hz. (Note that the range of musical pitches is governed by the organs of hearing, the most sensitive range of the basilar membrane lying between roughly 50 and 5000 Hz, a factor of 100 between lowest and highest frequency. We know that elephants and other large creatures communicate at much lower frequencies, perhaps for some analogous biomechanical reasons.)

Theoretically music could be made at any other range of rhythm or pace, from one “beat” per hour (or some maximum duration of sitting) up to a frequency that becomes a sound rather than a beat. But musicians do not employ these frequencies, for it would not be “natural.”

1.2  The 1st level of supervisory control: putting the notes together
Referring back to Figure 1, muscle forces U0 drive X0 into conformance with U1 commands, one note at a time, under the above timing constraints. This servomechanism  can be called the lowest level of sensory-motor skill. At this biomechanical level U0 and X0 are essentially below the conscious level, except for the raw beginner instrumentalist. 

The U1 pattern is the output of a mostly conscious higher level process which has X1, the musician’s hearing, as its feedback. What is heard is the guide to what is next ordered by the brain in the form of U1.

We know from early experiments in manual control (McRuer and Jex, 1967) that the low-level brain and neuromuscular system (from U1 through U0 and back to X0 and then to X1) has a round-trip time delay of at least 0.2 seconds. So what is heard from one’s own instrument is delayed 0.2 second or more from the time the action to produce a given note is initiated in the brain. But how can that be, when there is surely no instability or other irregularity in the controlled response? The fact is that U1 anticipates the sound feedback, because there is a succession of musical notes which the musician sees (U2) on the score by “reading ahead.” This has been called preview control.
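The timing argument can be made concrete with a deliberately simplified sketch. It assumes a fixed loop delay of exactly 0.2 s, a hypothetical score with two notes per second, and a musician who either issues each command only when the beat arrives (no preview) or leads each beat by the delay (preview). The function names are illustrative and do not come from the manual control literature.

```python
DELAY_S = 0.2  # round-trip delay cited in the text, taken here as a fixed value


def sounding_times(command_times, delay=DELAY_S):
    """Each note is heard one loop delay after its command is issued."""
    return [t + delay for t in command_times]


def reactive_commands(beat_times):
    """No preview: the command is issued only at the moment the beat is due."""
    return list(beat_times)


def preview_commands(beat_times, delay=DELAY_S):
    """Preview: reading ahead lets each command lead its beat by the delay."""
    return [t - delay for t in beat_times]


# Hypothetical score: eight notes, two per second.
beats = [0.5 * k for k in range(1, 9)]

late = [s - b for s, b in zip(sounding_times(reactive_commands(beats)), beats)]
on_time = [s - b for s, b in zip(sounding_times(preview_commands(beats)), beats)]

print("worst timing error without preview:", max(abs(e) for e in late))
print("worst timing error with preview:", max(abs(e) for e in on_time))
```

In this toy model every note is a full delay late without preview and lands on the beat with it. A real performer's delay varies and is corrected continuously by ear, so the sketch only illustrates why reading ahead removes the apparent paradox.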

1.3  The 2nd level of supervisory control: composing and conducting the synchronization of musicians
The writing of music is essentially the programming of the instructions (U2) to an intelligent musical production process.  The lone musician then performs from reading those instructions (the score) or from memory (after practice).  The writer gets feedback from performing the music himself or from listening to initial performances and revising, given some memory of how it sounded (X2). The memory of musical patterns, especially among those gifted and experienced in music, is prodigious. It is common for writers and soloists to remember a whole musical piece.

With multiple musicians, of course, a conductor sets the pace and brings them into synchrony with his baton, a second meaning of U2. Both the writer and the conductor adjust their renditions to the capabilities of the musicians and their instruments, much as the manual controller quite naturally adapts his control actions to the dynamics of the process being controlled. Conductors are also known for their incredible memories. This writer has had the pleasure of watching Seiji Ozawa conduct the Boston Symphony Orchestra on many occasions using no musical score whatsoever.

We are beginning to understand the neurobiology of how with practice the inter-neuronal connections are established so that hearing and playing one note or pattern of notes automatically triggers in the brain the playing of the ensuing pattern of notes. But full understanding of memory and performance of music or anything else at this level will take many more years. 

2.  Music as Virtual Reality: the Power of Metaphor
The phenomenon of virtual reality has become relevant to the control community and it should become relevant for the music community.
2.1  Telepresence and virtual reality
Those of us interested in human control, particularly remote control, know that these days it is becoming increasingly practical for a person to do a task anywhere without actually being there; this is called teleoperation (Sheridan, 1992). Further, by use of special new interface technology a person can see, hear, and possibly touch an environment that is arbitrarily remote in space — and actually feel present in the remote location; the latter is called telepresence.

With respect to vision, telepresence is accomplished by wearing a head-mounted video display which drives a video camera at the remote location, so that, whatever the position and orientation of the observer’s head in the local space, the video camera assumes the same relative position and orientation in the remote space. Thus the observer sees what he or she would see if present in the remote space. The sense of telepresence is dramatic.

With respect to hearing, telepresence is accomplished by means of two microphones at the remote site which are positioned in correspondence to the positions of the observer’s two ears.  These transmit sounds back to earphones on the observer’s head.  By further use of a head-related transfer function (which duplicates the front-back and up-down spectral filtering of the outer ear), one can locate sound sources with respect not only to left-right but also to front-back and up-down.  This technology is well developed but is still a bit expensive for the commercial market.

With respect to touch, telepresence is accomplished by connecting the hand to a mechanical device that exerts differential force back on the skin, duplicating the forces that are being transmitted and applied to objects in the remote environment. This technology began with force-reflecting master-slave manipulators and is now pushing in the direction of producing more subtle haptic feedback (e.g., data gloves) (Durlach and Mavor, 1995).

Now consider that instead of head, ear and hand position producing sight, sound and tactile images that are generated by events in an actual remote environment, the head, ear and hand positions cause the sight, sound and tactile images to be produced artificially, i.e., by computer.  This, of course, is technology under active development. The experience of using such technology, and the field itself, are commonly referred to as virtual reality (though that being an oxymoron, virtual environment or virtual presence are probably more appropriate terms). 

2.2  Suppression of disbelief; music as metaphoric existence 
The MIT Press journal Presence: Teleoperators and Virtual Environments has seen an active discussion in recent issues on the nature of “presence,” virtual and otherwise (one is reminded of the various other uses of that term, e.g. ontological physical presence, divine presence, stage presence). 

Psychophysical measures have been proposed, ranging from simple subjective rating scales, to measures based on the tendency to make natural bodily responses to virtual events (e.g., closing one’s eyes or “ducking” one’s head if a virtual object is seen to be on a collision course), to the use of visual or auditory masking noise to determine how little noise is required before one can no longer discriminate virtual from real (Sheridan, 1996).

The “presence” research community agrees on this empirical fact: the sense of “presence” in the virtual environment is enhanced by voluntary suppression of disbelief (that a virtual environment is not a real one). This is probably the same phenomenon that allows a person to be hypnotized, or to have a religious experience (Sheridan, 1999). Actually there are many forms of virtual reality in which people participate and can let themselves be “carried away,” and most have been around for centuries: story-telling, theater, photography, television, etc.

And here is where music comes in. I would argue that music (which physically is just  patterned sound), whether live or recorded, engenders an auditory virtual reality. (I am not referring here to the virtual reality of stereo or surround sound recording, though that helps create the virtual presence). Music is music because it stimulates imagination, provokes moods, and puts the listener “in another world.” Of course that is nothing new for musicians, since they intentionally try to capture the sound patterns of running brooks, fluttering birds, or thunder claps, as well as the much harder-to-describe gamut of human emotions (from Pathetique and Blues to Ode to Joy and the 1812 Overture). 

The sound patterns, in other words, strike not only a mechanical  resonance with the natural frequencies of the human body, making us want to dance, but also strike an emotional, even spiritual, resonance through use of musical metaphor.

The technology of virtual reality, now much influenced by control engineering, has truly triggered a new interest in the very old ontological problems: What is real? What does it mean to exist? What is music? Where does it take the listener? 

McRuer, D.T. and Jex, H.R. (1967). A review of quasi-linear pilot models. IEEE Trans. Human Factors in Electronics HFE-8 (3), pp. 231-249.

Sheridan, T.B. (1992). Telerobotics, Automation, and Human Supervisory Control, Cambridge, MA:  MIT Press.

Durlach, N.I. and Mavor, A.S. (Eds.) (1995). Virtual Reality. Washington, DC: National Academy Press.

Sheridan, T.B. (1999). Descartes, Heidegger, Gibson and God: toward an eclectic ontology of presence. Presence 8 (5), pp. 549-557.