Acoustical Society of America
133rd Meeting Lay Language Papers




Perceptual measures of visual and auditory cues in film music

Scott D. Lipscomb - lipscomb@lonestar.utsa.edu
Institute for Music Research
Division of Music
University of Texas at San Antonio
6900 N. Loop 1604 West
San Antonio, TX 78249
Scott's Home Page: http://music/utsa.edu/~lipscomb/

Popular version of paper 5aMU3
Presented Friday morning, June 20, 1997
133rd ASA Meeting, State College, PA
Embargoed until June 20, 1997

Musical sound has become an integral part of our daily existence, whether its presence is in the foreground or the background of a given environment. Music has been utilized successfully in many instances for the purpose of setting a mood and—some would say—to affect human behavior. Film music has proven extremely successful in altering human interpretation of visual images, providing researchers a means of investigating the relationship between auditory and visual cues within the context of an audio-visual (AV) experience. Consider the five AV combinations provided below:

WARNING: These are HUGE files (~6.2 MB each)

AV Composites

#1     #2     #3     #4     #5
MOV    MOV    MOV    MOV    MOV

Does your interpretation of the visual images change when accompanied by the various musical soundtracks? In an experiment, Lipscomb & Kendall (1994) set out to determine whether a group of subjects could reliably select the composer-intended AV combination from among a group of five possible combinations, as above. Which of the audio-visual composites above would you select as the intended combination?

Lipscomb & Kendall (1994) proposed that the appropriateness of an audio-visual combination is based on two implicit decisions made by the observer. First, an association judgment relies on past experience as a basis for determining whether or not the music is stylistically appropriate for a given context (e.g., legato string lines for "romantic" scenes, brass fanfares for a "majestic" quality, or low frequency synthesizer tones for a sense of "foreboding"). Second, an accent structure judgment matches the emphasized points in the visual scene with accented (emphasized) moments in the musical soundtrack. The investigators went on to propose that, if the associations identified with the musical style are judged appropriate and the aural and visual accent structures are consonant, then attentional focus is unlikely to be drawn to either the musical sound or the visual image, remaining rather on the symbiotic composite. The resulting Model of Film Music Perception is provided in Figure 1. Two independent investigations were designed to test this proposed model.

Figure 1. Model of Film Music Perception (Lipscomb & Kendall, 1994).

 

Investigation One - Association Judgment

In a preliminary study, five scenes were selected from the movie Star Trek IV: The Voyage Home. The original audio track was erased and replaced with music edited directly from a compact disc recording of the musical score, so that the resulting AV combinations consisted only of visual image and musical soundtrack. In other words, all ambient sound (dialogue, background noise, and sound effects) was removed, so that subjects made their decision based solely on the appropriateness of the pairing of music with the visual scene. The results are shown in Figure 2, in which the cells on the diagonal (audio and visual excerpts with the same number) represent the composer-intended combinations.

                   VISUAL
             1     2     3     4     5
AUDIO   1   12     1     0     1     0
        2    1    11     0     3     0
        3    0     0     8     0     0
        4    3     4     4     8     3
        5    0     0     4     4    13

Figure 2. Data matrix showing the number of subjects who selected each AV combination as "best fit."
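The matrix in Figure 2 can be summarized as the share of subjects who picked the composer-intended music for each visual scene. The following is a minimal sketch, not the authors' analysis. It assumes that rows are audio tracks and columns are visual scenes (an inference from the fact that each column sums to the sixteen subjects), and the function names are hypothetical.

```python
# Counts transcribed from Figure 2.
# Assumption: rows = audio track 1-5, columns = visual scene 1-5.
counts = [
    [12, 1, 0, 1, 0],
    [1, 11, 0, 3, 0],
    [0, 0, 8, 0, 0],
    [3, 4, 4, 8, 3],
    [0, 0, 4, 4, 13],
]


def intended_share(matrix):
    """For each visual scene, the fraction of subjects who chose the diagonal cell."""
    n = len(matrix)
    shares = []
    for col in range(n):
        column_total = sum(matrix[row][col] for row in range(n))
        shares.append(matrix[col][col] / column_total)
    return shares


def modal_audio(matrix):
    """For each visual scene, the 1-based number of the audio track chosen most often."""
    n = len(matrix)
    return [max(range(n), key=lambda row: matrix[row][col]) + 1 for col in range(n)]


shares = intended_share(counts)
for scene, share in enumerate(shares, start=1):
    print(f"Visual {scene}: {share:.0%} chose the composer-intended music")
print("Most frequently chosen audio per visual:", modal_audio(counts))
```

Reading the result per column rather than per row matters here: each subject made one choice per visual scene, so the column total is the natural denominator.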

As you can see, in every case the composer-intended combination was the one selected most often, chosen by at least half of the sixteen subjects (between 8 and 13 of 16). In order to determine subjects’ listening strategies, a second experiment was carried out using the same AV combinations. An independent group of fifteen subjects was asked to watch the combinations in random order and rate each one on the following ten scales, which were also randomly ordered:

good          ______________________________  bad
beautiful     ______________________________  ugly
interesting   ______________________________  boring
effective     ______________________________  ineffective
strong        ______________________________  weak
heavy         ______________________________  light
tense         ______________________________  relaxed
active        ______________________________  passive
fast          ______________________________  slow
agitated      ______________________________  calm

Table 1. Adjectives used in the second experiment

Past research (Osgood, Suci, & Tannenbaum, 1957) has shown that these adjective pairs separate into the following categories: Evaluative (good/bad, beautiful/ugly, interesting/boring, and effective/ineffective), Potency (strong/weak, heavy/light, and tense/relaxed), and Activity (active/passive, fast/slow, and agitated/calm). Mean ratings (average scores) for each AV combination are shown in the graphs below:
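To make the grouping concrete, the sketch below averages one subject's ten scale ratings into the three categories named above. It is an illustration only: the 0 to 100 scoring range, the example ratings, and the function name are hypothetical, since the paper does not state how marks on the rating lines were converted to numbers.

```python
from statistics import mean

# Scale groupings as listed in the text (after Osgood, Suci, & Tannenbaum, 1957).
DIMENSIONS = {
    "Evaluative": ["good/bad", "beautiful/ugly", "interesting/boring", "effective/ineffective"],
    "Potency": ["strong/weak", "heavy/light", "tense/relaxed"],
    "Activity": ["active/passive", "fast/slow", "agitated/calm"],
}


def dimension_means(ratings):
    """Average one subject's scale ratings within each of the three dimensions."""
    return {
        dimension: mean(ratings[scale] for scale in scales)
        for dimension, scales in DIMENSIONS.items()
    }


# Hypothetical ratings for one AV combination.
# Assumed scoring: 100 = left-hand adjective, 0 = right-hand adjective.
example_ratings = {
    "good/bad": 80, "beautiful/ugly": 60, "interesting/boring": 70, "effective/ineffective": 90,
    "strong/weak": 40, "heavy/light": 30, "tense/relaxed": 50,
    "active/passive": 20, "fast/slow": 30, "agitated/calm": 10,
}

print(dimension_means(example_ratings))
```

Averaging these per-subject values again across subjects would give one mean per dimension for each AV combination.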

[insert graph thumbnails here]

Overall, the results show that music exercises a strong and consistent influence over subject responses to an AV combination, regardless of visual stimulus. In fact, within the context of the present experiment, music exerted a stronger influence over the subject ratings than did the visual image. A careful music-theoretical analysis of the musical soundtracks revealed that several musical parameters consistently influenced subject ratings. These specific musical parameters include:

clarity of tonal center

harmonic complexity

dynamic variation

tempo (speed and consistency)

phrase structure

amount of melodic activity

 

Investigation Two - Accent Structure Alignment

Past film music research (Bolivar, Cohen, & Fentress, 1994; Bullerjahn & Güldenring, 1994; Iwamiya, 1994; Lipscomb & Kendall, 1994; Marshall & Cohen, 1988; Sirius & Clarke, 1994; Tannenbaum, 1956; Thayer & Levenson, 1983; Thompson, Russo, & Sinclair, 1994) has focused almost exclusively on the associational (or referential) relationship between the visual images and musical sound. The second investigation reported herein (Lipscomb, 1995) breaks with this trend, focusing instead on the relationship between emphasized moments in the visual image and their alignment or misalignment with emphasized points in the musical sound. To illustrate this relationship, the following examples are provided:

[insert Quicktime movies]

consonant out-of-phase dissonant

Consonant AV combinations occur when the audio and visual accent structures are perfectly synchronized, i.e., every visual accent is accompanied by a synchronized musical accent. Accent structures that are out-of-phase share a common temporal interval between consecutive points of emphasis but are misaligned by a perceptually salient amount. A dissonant relationship exists when visual and auditory accents occur at different rates. These relationships can be illustrated visually, as in Figure 3.

 

Figure 3. Visual representations of relationships between sources of accent: a) consonant, b) out-of-phase, c) dissonant. In each pair, the upper pulse train represents the musical stratum and the lower pulse train represents the visual stratum.
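The three relationships can also be expressed as a small sketch that builds regular accent pulse trains and labels a pair of them. This is an illustration of the definitions above, not the procedure used in the study. The function names and the tolerance value are hypothetical, standing in for the "perceptually salient amount" that the text does not quantify.

```python
def accent_times(period, offset=0.0, duration=8.0):
    """Onset times (in seconds) of a regular accent pulse train."""
    times = []
    i = 0
    while offset + i * period < duration:
        times.append(offset + i * period)
        i += 1
    return times


def classify_alignment(music_period, visual_period, visual_offset, tolerance=0.05):
    """Label two regular accent streams as consonant, out-of-phase, or dissonant.

    The musical stream is assumed to start at time 0. `tolerance` (seconds) is a
    hypothetical threshold below which a difference is treated as imperceptible.
    """
    if abs(music_period - visual_period) > tolerance:
        return "dissonant"  # accents occur at different rates
    shift = visual_offset % music_period
    shift = min(shift, music_period - shift)  # distance to the nearest musical accent
    if shift <= tolerance:
        return "consonant"  # same rate, accents coincide
    return "out-of-phase"  # same rate, accents misaligned


print(accent_times(1.0, offset=0.5, duration=3.0))
print(classify_alignment(1.0, 1.0, 0.0))
print(classify_alignment(1.0, 1.0, 0.5))
print(classify_alignment(1.0, 1.5, 0.0))
```

The rate comparison comes first because, under the definitions in the text, a difference in rate makes a pair dissonant regardless of where the first accents fall.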

Three independent groups of twenty subjects were asked to rate the synchronization and effectiveness of a collection of AV combinations. In each stimulus set, every combination was designed to be either consonant, out-of-phase, or dissonant. The primary distinction between the stimuli viewed by each group of subjects was the complexity of the audio-visual combinations. The first group rated combinations of simple, single-object animations (created by the author) and isochronous pitch sequences. The second group rated combinations excerpted from experimental animations created by Norman McLaren. The third group rated scenes from Obsession, a Brian De Palma film with a musical score composed by Bernard Herrmann. Once again, all ambient sound was removed, so that the AV composites consisted of visual images and musical sound only. An example of each is provided below:

[insert Quicktime movies]

Across all three stimulus sets, subjects reliably distinguished between the three alignment conditions, providing high synchronization ratings for consonant AV combinations, low ratings for dissonant combinations, and moderate ratings for out-of-phase combinations. Ratings for each of the alignment conditions followed this pattern for all excerpts. Therefore, the subject responses were collapsed across excerpts within each alignment condition, resulting in the mean scores depicted in Figure 4. Notice that as the stimuli became more complex (i.e., actual movie excerpts in Experiment 3 rather than single-object animations in Experiment 1), the mean ratings for consonant combinations dropped, while the ratings for dissonant and out-of-phase combinations reflect a higher degree of tolerance.

Ratings of effectiveness for these same AV combinations reveal a slightly higher degree of tolerance for out-of-phase and dissonant combinations than the synchronization ratings (Figure 5a). In Experiment 3 (incorporating actual movie excerpts), subjects did not distinguish between the consonant and out-of-phase conditions in their effectiveness ratings, suggesting that accent structure alignment plays a less prominent role in the determination of an effective AV combination when considering complex stimuli. The complex stimuli in Experiment 3 also revealed the only statistically significant difference between highly trained musicians (those individuals with more than seven years of private study) and subjects with lower levels of musical training. As shown in Figure 5b, highly trained musicians rated the dissonant combination significantly lower than did the untrained and moderately trained musicians.


Figure 4. Mean synchronization ratings for each alignment condition across all three experiments.

 

Figure 5a. Mean effectiveness ratings for each alignment condition across all three experiments.

Figure 5b. Mean effectiveness ratings for each alignment condition across all three experiments.
This figure is identical to Figure 5a, except for the separation of highly trained musicians from the other subjects on the dissonant rating for Experiment 3.

In conclusion, the investigator revised the Model of Film Music Perception presented by Lipscomb & Kendall (1994). Though both association judgment and accent structure alignment play an important role in the effective combination of visual images and musical sound, the research reported here has shown that the relationship is a dynamic one. With simple stimuli (Experiment 1), accent structure alignment plays a prominent role in the perceived efficacy of the combination. When viewing highly complex stimuli (Experiment 3), however, the significance of accent structure alignment seems to recede so that the association judgment becomes primary. Determination of specific weightings for this dynamic, interactive relationship requires further research.

 

References

Bolivar, V.J., Cohen, A.J., & Fentress, J.C. (1994). Semantic and formal congruency in music and motion pictures: Effects on the interpretation of visual action. Psychomusicology, 13(1), 28-59.

Bullerjahn, C. & Güldenring, M. (1994). An empirical investigation of film music using qualitative content analysis. Psychomusicology, 13(1), 99-118.

Iwamiya, S. (1994). Interaction between auditory and visual processing when listening to music in an audio visual context: 1. Matching 2. Audio quality. Psychomusicology, 13(1), 133-153.

Lipscomb, S.D. (1995). Cognition of musical and visual accent structure alignment in film and animation. Dissertation

Lipscomb, S.D. & Kendall, R.A. (1994). Perceptual judgment of the relationship between musical and visual components in film. Psychomusicology, 13(1), 60-98.

Marshall, S.K. & Cohen, A.J. (1988). Effects of musical soundtracks on attitudes toward animated geometric figures. Music Perception, 6, 95-112.

Osgood, C.E., Suci, G.J., & Tannenbaum, P.H. (1957). The measurement of meaning. Urbana: University of Illinois Press.

Sirius, G. & Clarke, E.F. (1994). The perception of audiovisual relationships: A preliminary study. Psychomusicology, 13(1), 119-132.

Tannenbaum, P.H. (1956). Music background in the judgment of stage and television drama. Audio-Visual Communications Review, 4, 92-101.

Thayer, J.F. & Levenson, R.W. (1983). Effects of music on psychophysiological responses to a stressful film. Psychomusicology, 3, 44-54.

Thompson, W.F., Russo, F.A., & Sinclair, D. (1994). Effects of underscoring on the perception of closure in filmed events. Psychomusicology, 13(1), 9-27.

