Workshop Engineering and Music

"Human Supervision and Control in Engineering and Music"

Alex Kirlik

Entropy Based Measures of Control, Coordination, and Improvisation

Introduction

Interesting and promising analogies can be drawn between the performance of musicians, guided by a written score and possibly under the direction of a conductor, and human-machine performance in complex engineering systems. Many of these connections have been ably outlined by Johannsen in his introductory notes for this workshop on Human Supervision and Control in Engineering and Music. My aim in this paper is to focus on how techniques from the engineering and psychological sciences might enrich our understanding of music theory and performance, and conversely, how studying musical performance might broaden the scientific understanding of control and coordination in complex, human-machine systems. More specifically, I will present a conceptual scheme for understanding control, coordination, and improvisational aspects of human performance in both music and engineering. The fundamental concepts at the heart of this scheme are order and disorder, along with their cognitive counterparts, expectation and uncertainty.

Uncertainty and Order in Music Perception and Performance

Leonard Meyer, in his classic book Emotion and Meaning in Music (1956), sketched a theory of music grounded in the Gestalt theory of perception. Meyer argued that much of what makes music meaningful and emotive arises out of its capacity to prompt, suspend, and sometimes even violate listeners’ expectations (what he called “the inhibition of a tendency”). Meyer concluded his analysis with the following observation:

Finally, it is important to emphasize that a theory of music does not exist in a kind of splendid, irrelevant isolation. If it is to be fruitful, music theory must not only be internally consistent but it must also be consistent with and relevant to concepts and theories in other realms of thought. Thus it is significant that many of the concepts presented in this book have clear counterparts in the theory of games and in information theory. To cite only one instance of this: it seems quite possible to equate the inhibition of a tendency, which of necessity gives rise to uncertainty and awareness of alternative consequents, with the concept of entropy in information theory. (Meyer, 1956, p. 255)

Meyer’s interest in the cybernetic concepts of information and entropy, which allow uncertainty and order to be treated mathematically, arises from the prominent role that listeners’ expectations, as well as the suspensions and violations of these expectations, play in his theory. In particular, Meyer conceives of these expectations, or tendencies, as cognitive-affective adaptations (learned rather than innate) to the statistical structure of music. In his theory, then, the concept of probability plays a crucial role:

A sound or group of sounds (whether simultaneous, successive, or both) that indicate, imply, or lead the listener to expect a more or less probable consequent event are a musical gesture or “sound term” within a particular style system. . . . Ambiguity is important because it gives rise to particularly strong tensions and powerful expectations. . . . There would seem to be various degrees of ambiguity. A sound stimulus becomes a sound term by entering into probability relationships with other sound terms within the style. These probability relationships are of different degrees (Meyer, 1956, pp. 45-52).

Given this conception of music perception, the problem for Meyer then became to explain why music perception appears to be so regular and highly structured in the face of the complexly textured probabilistic tapestry that is the musical “sound stimulus.” And not surprisingly, Meyer turned to a prevailing perceptual theory of his day, Gestalt psychology, for the answer. The three central chapters of his book each describe “Principles of Pattern Perception” in terms of Gestalt “laws” such as good continuation, completion, closure, etc. In this view the listener resolves the uncertainties inherent in the probabilistic stimulation and formulates expectations by structuring the stimulus information in terms of these organizing principles of “good form.” On this issue, however, Meyer (a music scholar rather than psychologist) did not sketch a particularly productive path for the further study of music perception or performance. Gestalt theories have lost much of their luster since Meyer’s day, largely because of the difficulty of making predictive use of the principles of “good” form. These principles seem to explain much, but predict very little, providing an answer to the riddle of how we effectively perceive in uncertain conditions that is more apparent than real.

Ecological Perception

Meyer came tantalizingly close to another solution to the problem of music perception in his 1956 book, in his discussion of where meaning is to be found in music:

Meaning, then, is not in either the stimulus, or what it points to, or the observer. Rather it arises out of what [George] Mead and [Morris] Cohen have called the “triadic” relationship between (1) an object or stimulus; (2) that to which the stimulus points; and (3) the conscious observer. (Meyer, 1956, p. 34)

Interestingly, the same year that saw publication of Meyer’s book also saw this “triadic” model of the encounter of person and environment put forward as an alternative to the Gestalt theory of perception, in the book Perception and the Representative Design of Psychological Experiments by Egon Brunswik (1956). Brunswik’s “ecological” theory of perception and judgment, which offered a different and novel solution to perception in a probabilistically textured environment, is best understood in terms of his “lens model,” depicted in Figure 1.

Figure 1. Brunswik’s Lens Model

In Brunswik’s lens model one can see the three entities that comprise Meyer’s “triadic” theory of meaning in music. Musical perception or judgment appears on the left side of the diagram, corresponding to Meyer’s “conscious observer.” Perception or judgment is of a distal event (“that to which a stimulus points” for Meyer) and this perception or judgment is indirect; it is made on the basis of proximal information or cues (for Brunswik) or the stimulus (for Meyer).
Brunswik advanced his lens model of perception specifically as an alternative to Gestalt theory, as it solves the problem of perceptual attainment quite differently than in terms of cognitive principles of form. Instead, for Brunswik, attainment is to be explained by a detailed study of the probabilistic relationships that mediate between the observer and the proximal cues or stimulus, and between the stimulus and the distal object or event. In the modern psychological literature these relationships are formalized in terms of linear regression (statistical) models that capture both how the cues and distal objects or events are related (the “ecological validities of the cues” in Figure 1) as well as how the cues are related to perceptions or judgments (the “cognitive strategies” in Figure 1). A comprehensive description of research within the Brunswikian framework since the publication of his classic 1956 book can be found in Hammond and Stewart (Eds.), The Essential Brunswik, 2001.
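The two sides of the lens can be illustrated with a minimal sketch. It assumes the simplest correlational reading of the lens model, in which each cue is related to the distal criterion and to the judgment by a single Pearson correlation, rather than the multiple regression models used in the literature. The function names (pearson, lens_model) and all data values are hypothetical and serve only as illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def lens_model(cues, criterion, judgments):
    """Summarize both sides of the lens for a set of trials.

    cues      : dict mapping a cue name to its value on each trial
    criterion : value of the distal variable on each trial
    judgments : the observer's judgment on each trial
    """
    return {
        # Environment side: how well each cue indicates the distal variable.
        "ecological_validity": {k: pearson(v, criterion) for k, v in cues.items()},
        # Observer side: how strongly each cue is reflected in the judgments.
        "cue_utilization": {k: pearson(v, judgments) for k, v in cues.items()},
        # Overall attainment: agreement between judgment and distal variable.
        "achievement": pearson(judgments, criterion),
    }

# Hypothetical data: cue_1 indicates the criterion perfectly,
# but the observer relies entirely on the less valid cue_2.
cues = {"cue_1": [1, 2, 3, 4, 5], "cue_2": [2, 1, 4, 3, 5]}
criterion = [2, 4, 6, 8, 10]
judgments = [2, 1, 4, 3, 5]

result = lens_model(cues, criterion, judgments)
print(result)
```

In this made-up case achievement is limited not by the environment, which contains a perfectly valid cue, but by the observer's reliance on a weaker one.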

Like Meyer, Brunswik too saw a relationship between his model of adaptation to a probabilistic environment and emerging ideas in cybernetics and the theory of information:

The reader will recognize that the vicariousness of psychological cues and means which we have come to acknowledge as the backbone of stabilized achievement may be viewed as a special case of receiving or sending messages through redundant, even though not literally repetitive channels . . . . Even so, there is a long way to go from the rather rudimentary emergence of the concept of cue in cybernetics and in the theory of communication to the more varied and somewhat metaphorical applications that would have to be made to render these considerations really fruitful in psychology. Hitherto most of the efforts to apply these disciplines to psychological problems have been rather literal minded and have considered the organism rather than the ecology as the prime source of noise and uncertainty. Italics my emphasis. (Brunswik, 1956, pp. 142-143).

Early applications of information theory in psychology focused largely on viewing internal perception, cognition, and motor processing in terms of the transmission of information along a noisy communication channel. These applications gave rise to various mathematical human performance laws for relatively simple reaction time and discrete movement behaviors (see Wickens and Hollands, 2000). On the other hand, information theory has to this point been of limited value in understanding more sophisticated and flexible cognitive behaviors, and has fallen largely into disuse in most of psychology. Those of us, however, who are more interested in human interaction with the outside world, with machines, and with other humans, than what is going on solely in the head per se, can find many uses for information theoretic concepts in describing the structure of the environment, and the nature of human-environment interaction. In the remainder of the paper I will sketch a few of these applications in the areas of distributed control, coordination, and improvisation.

Control and Coordination in Musical Performance

Understanding the functioning of a complex and distributed system such as an orchestra or a NASA mission control team requires techniques for describing the causal dependencies between the many entities or agents within the system that result in synchronized performance. Each agent, we might assume, has available a set of proximal information sources or cues, in Brunswik’s sense, and attempts to adapt its own behavior in light of these cues in a manner that is both predictable and desirable. In doing so, each agent potentially becomes a dynamic source of proximal information for others, and for our purposes it does not matter whether these sources consist of the movement of a baton, the playing of a note, or a keypress on a computer console. In some cases control and coordination among the agents might be achieved in a centralized manner: one agent is the source of proximal cues for all the other agents, which adapt their behavior to these information sources. In other cases, control and coordination might be decentralized: multiple agents provide proximal cues to multiple agents. In still other cases group behavior might be not only distributed but also mutual: agents might dynamically adapt their behavior to each other, or sets of agents mutually to other sets, throughout a performance.
Consider the following data collected by Shin Maruyama of the University of Tokyo, to whom I am indebted for making it available for the present use (from Maruyama, 2001). Figure 2 depicts data Maruyama collected while observing the first day of rehearsals of the Tokyo Mozart Players, under the direction of the Japanese conductor Ryusuke Numajiri, for a studio performance of Beethoven’s 5th Symphony. Figure 3, also due to Maruyama, depicts similar data for the same orchestra and conductor in the subsequent recording session (see Figure 3 caption for detailed description).

Figure 2. Coordination Between Conductor and Concertmaster (See Fig 3 caption for details)

What should be noted here is a progression toward more stereotypical patterns of behavior both for the conductor and for the concertmaster between initial rehearsal and final recording, and more interestingly, more stereotypical patterns of interaction in the coordination between these two agents. The conductor has all but dispensed with what one might suppose are distracting or inessential left hand movements, and the concertmaster’s bowing preparation takes on a more consistent duration. Both of these adaptations might be supposed to make the agents more mutually informative, and thus predictable, to one another. The result is the more consistent cause-effect temporal pattern in the timing between the conductor’s gestures and the concertmaster’s responses.

An Entropy Based Approach

Assume we have a set of N variables that describe the state of a dynamic system comprised of at least one agent and a physical environment. The values of some of these variables will be perceptually available to the agent(s), and thus these variables comprise the set of proximal cues or information sources in the Brunswikian sense. The values of other variables can only be inferred on the basis of these proximal cues, and thus these variables comprise the set of distal variables, also in Brunswik’s sense of the term (refer to the lens model). The perceptual status of every variable, with respect to a given agent, is of one of these two types, proximal or distal.

Additionally, the values of some of these variables will be directly manipulable or controllable by a given agent while others will not be. The former can be called proximal action variables while the latter are distal action variables. For the latter, any control the agent may have over their values must be accomplished via manipulating variables over which it does have direct control, causing the values of the distal action variables to be changed by causal links within the external environment. Thus, as was the case with the perceptual status of system variables, each variable, with respect to a given agent, also has an action status of one of two types: proximal or distal. Every variable in the system, then, with respect to a given agent, is a member of one of four possible classes, as shown in Figure 4 below.

Figure 4. Extending Brunswik’s Environmental Model to Interactive Situations

In the figure, [PP,PA] variables are proximal with respect to both perception and action, and as the diagram suggests, they can be thought of as occupying the surface of the environment, with respect to a given agent’s perceptual and action capabilities. [PP,DA] variables can be directly measured by the agent but cannot be directly manipulated. [DP,PA] variables, on the other hand, can be directly manipulated but cannot be directly perceived. Finally, [DP,DA] variables can be neither directly measured nor controlled. See Kirlik (1998) for a more complete presentation.
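The four-way classification can be written down as a small data structure. The following is only a sketch: the names VariableClass and classify, and the example variables, are hypothetical and are not taken from Kirlik (1998).

```python
from enum import Enum

class VariableClass(Enum):
    """The four classes of Figure 4, relative to a single agent."""
    PP_PA = "[PP,PA]"  # directly perceivable and directly manipulable
    PP_DA = "[PP,DA]"  # directly perceivable, not directly manipulable
    DP_PA = "[DP,PA]"  # not directly perceivable, directly manipulable
    DP_DA = "[DP,DA]"  # neither directly perceivable nor manipulable

def classify(perceivable: bool, manipulable: bool) -> VariableClass:
    """Map an agent's perceptual and action access to one of the four classes."""
    if perceivable:
        return VariableClass.PP_PA if manipulable else VariableClass.PP_DA
    return VariableClass.DP_PA if manipulable else VariableClass.DP_DA

# Hypothetical variables for one agent: (perceivable?, manipulable?)
access = {
    "var_a": (True, True),
    "var_b": (True, False),
    "var_c": (False, True),
    "var_d": (False, False),
}
classes = {name: classify(p, m) for name, (p, m) in access.items()}
for name, cls in classes.items():
    print(name, cls.value)
```

Because the classification is relative to an agent, the same variable would in general receive a different class in another agent's table.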

Each set of N such variables at a particular time t represents the state of the dynamic system, that is, a particular value of the state vector. Over time, which will be considered discrete for simplicity of the analysis, one can record a progression of the state variable values and treat these data as one would a contingency table, that is, as samples from a multivariate probability distribution. One can calculate the entropy of the distribution of any particular state variable a, as H(a), using the standard Shannon measure (e.g., Cover and Thomas, 1991):

H(a) = - Σ_i p(a_i) log2 p(a_i),

where p(a_i) is the relative frequency with which the variable a takes its i-th value in the recorded data, and the logarithm is conventionally taken to base 2 so that entropy is measured in bits.
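As a minimal sketch of this calculation, the function below estimates the Shannon entropy of a single variable from its recorded values, using relative frequencies as probability estimates and base-2 logarithms. The function name and the sample data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    # Sum of p * log2(1/p) over the observed values, with p = count / total.
    return sum((c / total) * log2(total / c) for c in counts.values())

# Hypothetical recordings of one state variable over discrete time steps.
constant = ["up", "up", "up", "up"]          # fully predictable
two_way = ["up", "down", "up", "down"]       # two equally frequent values
four_way = ["a", "b", "c", "d"]              # four equally frequent values

print(entropy(constant), entropy(two_way), entropy(four_way))
```

Note that estimates of this kind depend on the amount of data: with short recordings, relative frequencies can be poor estimates of the underlying probabilities.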

Similarly, one can treat the entire state vector in a multivariate sense and calculate the entropy H(S) of the dynamic system as a whole; thus, H(S) = H(a, b, c, …n) (McGill, 1954; Conant, 1976). As Conant (1976) shows, we are now in a position to measure a wide variety of internal dependencies within the system, for example, conditional entropies and information transmission values.
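The multivariate case can be sketched in the same way by treating each recorded state vector, or any subset of its components, as a single symbol. The sketch below also computes the transmission between two groups of variables using the standard identity T(A;B) = H(A) + H(B) - H(A,B). The function names, the column-index convention, and the trajectory are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return sum((c / total) * log2(total / c) for c in counts.values())

def joint_entropy(trajectory, columns):
    """Entropy of the chosen columns of the state vector, taken jointly."""
    return entropy([tuple(row[i] for i in columns) for row in trajectory])

def transmission(trajectory, cols_a, cols_b):
    """T(A;B) = H(A) + H(B) - H(A,B), in bits."""
    return (joint_entropy(trajectory, cols_a)
            + joint_entropy(trajectory, cols_b)
            - joint_entropy(trajectory, list(cols_a) + list(cols_b)))

# Hypothetical trajectory of a three-variable system (a, b, c):
# b always copies a, while c varies independently of a.
trajectory = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]

h_s = joint_entropy(trajectory, [0, 1, 2])   # entropy of the whole system
t_ab = transmission(trajectory, [0], [1])    # a and b share one bit
t_ac = transmission(trajectory, [0], [2])    # a and c share nothing
print(h_s, t_ab, t_ac)
```

Here the whole system carries two bits even though it has three binary variables, because one of them is redundant with another.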

One application of these ideas would be to measure the degree to which a particular subsystem of S corresponding to the (observable) proximal perceptual variables (cues) in S is informative about the complementary subsystem of S composed of the (unobservable) distal variables in S. This would allow us to determine to what degree the knowledge demands imposed by a coordination or control task can be met through real-time perceptual inference rather than the use of stored knowledge. Similarly, one could apply these ideas to measure the degree to which a particular subsystem of S corresponding to the proximal action variables for one agent in S is informative about a subsystem of S composed of another agent’s proximal perceptual variables, thus allowing one to identify to what degree the perceptual environment of one agent is influenced by another agent’s actions.

To make calculations such as these, we use the notion of conditional entropy. The system S can be partitioned into arbitrary subsystems, S1, S2, . . . , SN. In our example, we might partition S into two mutually exclusive and collectively exhaustive subsystems SP and SD, corresponding to the proximal perceptual and distal perceptual variables in S, respectively, for a given agent (or group of agents). The concept of conditional entropy allows us to measure the average amount of uncertainty about a particular subsystem of S which remains for one who knows the values of a complementary subsystem of S. In our case, we can write S = (SP, SD) to indicate that the system S is made up of the subsystems SP and SD. We can write H_SP(SD) as the conditional entropy of SD given SP: that is, the average uncertainty about the variables in SD given knowledge of the variables in SP.

Conveniently, it is known that for any partitioning of S into subsystems S1 and S2,

H_S1(S2) = H(S1,S2) - H(S1) = H(S) - H(S1).

Thus, H_S1(S2) = 0 if the variables in subsystem S2 are completely determined by the variables in subsystem S1. The above result can be generalized to any number of subsystems for an arbitrary partitioning of S using an entropy “chain rule.” For our purposes, we are seeking to measure H_SP(SD) = H(S) - H(SP), which indicates, given a particular trajectory of the variables in the system S over time, the degree to which the surface (proximal) variables are informative about the depth (distal) variables: the smaller the value, the more fully the proximal variables specify the distal ones. Given actual streams of behavioral data corresponding to each of the system variables, and a particular partitioning of system variables into SP and SD, we can readily compute a scalar measure of the degree to which proximal variables specify distal variables.
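The scalar measure can be sketched directly from the identity H_SP(SD) = H(S) - H(SP). The sketch assumes that the recorded trajectory is a list of state vectors and that the proximal perceptual variables are identified by their column positions. The function names and both example trajectories are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return sum((c / total) * log2(total / c) for c in counts.values())

def conditional_entropy(trajectory, proximal_columns):
    """H_SP(SD) = H(S) - H(SP): uncertainty left about the distal variables
    once the proximal variables are known. All columns not listed in
    `proximal_columns` are treated as distal."""
    h_s = entropy([tuple(row) for row in trajectory])
    h_sp = entropy([tuple(row[i] for i in proximal_columns) for row in trajectory])
    return h_s - h_sp

# Column 0 is a proximal cue; column 1 is a distal variable.
# Case 1: the cue fully specifies the distal variable.
specified = [(0, "x"), (1, "y"), (0, "x"), (1, "y")]
# Case 2: the cue says nothing about the distal variable.
unspecified = [(0, "x"), (0, "y"), (1, "x"), (1, "y")]

print(conditional_entropy(specified, [0]))
print(conditional_entropy(unspecified, [0]))
```

In the first case no uncertainty remains and the measure is zero. In the second the full bit of uncertainty about the distal variable remains, so it would have to be resolved by some other means, such as stored knowledge.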

In Kirlik (1998) we used this method to analyze expertise differences in short order cooking at a fast food diner. We showed that the most expert cook was able to offload many of the severe memory demands of the task associated with how well meats should be cooked by taking actions to structure the grill in a user-friendly manner. The expert organized the initial placements of meats and moved them in such a way as to interactively create external sources of information that could be used in lieu of having to use internal memory to perform the same task. It is natural to wonder to what degree group musical performance makes use of similarly self-generated perceptual information in addition to stored information contained in either human memory or the written score. Regardless of the answer to this question, an entropy based analysis would appear to provide the tools needed to quantitatively address a broad range of issues in distributed coordination and control, in a wide variety of domains.

Improvisation

I conclude with a brief discussion of improvisation because I believe that it is a topic where the engineers and psychologists may have much to learn from those in the music domain. A disturbing trend, to my mind at least, in the realm of human-machine system design and training is the increased proceduralization and automation of technological systems to the point where significant levels of improvisation by human “operators” are all but impossible. This trend has at its roots, among other things, a somewhat naïve application of statistical quality control principles and other Tayloristic ideas to the problem of supporting human performance. For purely technological applications, the quality control goal of eliminating unnecessary variance has proven useful and cost effective. As a technique for designing human work, however, it results in brittle human performance and stressful working conditions. And data showing that even the most intelligently designed procedures and instructions are insufficient for adequately supporting human performance are not hard to find (see Vicente, 1999). Still, there does not yet exist a good theory or model of human performance in engineering systems that gives due regard to all the knowledge an operator (pilot, physician, etc.) must have to know when to apply a procedure, when not to, or how to tweak or tailor an existing procedure to meet the demands of a novel situation. We know that this ability is a key contributor to the robust and adaptive operation of technological systems, though we haven’t yet been able to pin down exactly what this ability is, or how to explicitly train it, or how to design to support it. Until we have such a theory, it will continue to prove difficult to defend against the trend toward the ever-increasing use of proceduralization and automation as the dominant intervention for performance support.

While it may be a difficult task to defend the need to allow the human operator latitude for improvisation in many technological systems, improvisation has always played a central role in the theory of music performance. Indeed, most would find it a laughable notion to think that music performance could be enhanced across the board by eliminating opportunities for improvisation. Consider what Meyer has to say on this point:

The musical relationships embodied in a score or handed down in an oral tradition do not fix with rigid and inflexible precision what the performer’s actualization of the score or aural tradition is to be. They are indications, more or less specific, of what the composer intended and what tradition has established. The performer is not a musical automaton or a kind of mechanical medium through which a score or tradition is realized in sound. The performer is a creator who brings to life, through his own sensitivity of feeling and imagination, the relationships presented in the musical score or handed down in the aural tradition which he has learned. (Meyer, 1956, p. 199)

Meyer goes on to devote two entire chapters of his book to the crucial role that “deviations” from the written score and oral tradition play in creating quality musical performances, noting that it hardly speaks well of a performer to say that he or she “merely played the notes” or played “mechanically.”

In human-machine systems, however, far too often design and training proceed as if “doing it by the book” or working “like a machine” were admirable qualities. Experienced operators know otherwise, and in their better moments, so do researchers and practitioners in human-machine analysis and design. A scientific investigation of the constrained liberation underlying musical performance may hold promise for the development of a theory of responsible improvisation that could have significant social value. The entropy based techniques described in this paper allow one to formulate the problem of measuring the individual contributions of the musical score and musical improvisation to resulting musical performance in terms of simple addition and subtraction, once the relevant variables are measured. Whether this approach will yield insights into this important problem, however, remains to be seen.
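One way the "simple addition and subtraction" might be spelled out is sketched below. It is an illustrative reading under stated assumptions, not a formulation given in the sources cited here. Let C denote the variables fixed by the score and P the variables describing the resulting performance (both symbols are introduced only for this sketch). The transmission T(C;P) is then the part of the performance's variability accounted for by the score, and the conditional entropy H_C(P) is the part left open to the performer.

```latex
\begin{align}
H_{C}(P) &= H(C,P) - H(C)        && \text{variability not specified by the score} \\
T(C;P)   &= H(P) - H_{C}(P)      && \text{variability accounted for by the score} \\
H(P)     &= T(C;P) + H_{C}(P)    && \text{the two contributions sum to the total}
\end{align}
```

On this reading, a performance that "merely played the notes" would show H_C(P) close to zero. Whether the remainder reflects improvisation rather than noise or unmeasured variables is an empirical question the measure alone cannot settle.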

References

Brunswik, E. (1956). Perception and the Representative Design of Psychological Experiments. Berkeley, CA: University of California Press.

Conant, R. C. (1976). Laws of information which govern systems. IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-6, No. 4, pp. 240-255.

Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. New York: Wiley.

Hammond, K. R., and Stewart, T. (2001). The Essential Brunswik: Beginnings, Explications, and Applications. Oxford, UK: Oxford University Press.

Kirlik, A. (1998). The ecological expert: Acting to create information to guide action. Proceedings of the Symposium on Human Interaction with Complex Systems. Dayton, OH, USA. IEEE Computer Society Press.

Maruyama, S. (2001). What information is explored and shared in the orchestra performance? Proceedings of the 11th International Conference on Perception and Action. Storrs, CT, USA.

McGill, W. J. (1954). Multivariate information transmission. Psychometrika, Vol. 19, No. 2, pp. 97-116.

Meyer, L. B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.

Vicente, K. J. (1999). Cognitive Work Analysis: Toward Safe, Productive, and Healthy Computer-Based Work. Mahwah, NJ: Lawrence Erlbaum Associates.

Wickens, C. D. and Hollands, J. G. (2000). Engineering Psychology and Human Performance, Third Edition. Upper Saddle River, NJ: Prentice Hall.