From sinedesign
Revision as of 07:32, 8 March 2019 by Rgb (Added stereo section in localization)

Sound is a vibration that propagates as an audible wave of pressure, through a medium such as gas, liquid, or solid.

Humans can only hear sound waves between about 20 Hz and 20 kHz (20,000 Hz). Anything above that range is known as ultrasound, and anything below it is infrasound. This physiological reality means that musicians only need to focus on the audible range of sound. In fact, many audio encoding formats (such as MP3) are limited in the frequency range that they can store, as determined by the audio bitrate and sample rate.
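As a rough illustration of how the sample rate bounds the frequencies a digital format can store, the sketch below applies the Nyquist limit (the highest representable frequency is half the sample rate). The function names are hypothetical, and the additional effect of bitrate in lossy formats is not modeled here.

```python
AUDIBLE_MAX_HZ = 20_000.0  # upper edge of human hearing quoted above


def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sampled signal can represent (half the sample rate)."""
    return sample_rate_hz / 2.0


def covers_audible_range(sample_rate_hz: float) -> bool:
    """True if the Nyquist limit reaches the top of the audible range."""
    return nyquist_limit_hz(sample_rate_hz) >= AUDIBLE_MAX_HZ


if __name__ == "__main__":
    for rate in (22_050, 44_100, 48_000):
        print(rate, nyquist_limit_hz(rate), covers_audible_range(rate))
```

Under this simple model, a 44.1 kHz sample rate reaches just past the audible range, while 22.05 kHz does not.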

The Nature of Sound

Sound Waves

A visual representation of sound waves in air. [1]

Sound is transmitted through the air as a longitudinal (compression) wave. At the most fundamental level, sound is described by two simple elements: pressure and time. These fundamental elements form the basis of all sound waves and can be used to describe, in absolute terms, every sound we hear.

Another helpful way to visualize the physical properties of sound is to think about the shape of a subwoofer and imagine that it is turned on and vibrating back and forth at a certain speed, say 80 times per second (80 Hz). In such a case, the air is physically being pushed out of the area where the cone-shaped speaker is, creating waves of pressure in the air. The pressure pushes against the surrounding air and moves through space in this way.

Next time you're at the club (with hearing protection of course) try putting your hand over the subwoofer and see if you can actually feel the movement of air coming out of the speaker. Sound is a real, physical thing, which means that extra care needs to be taken when mixing for live venues, mimicking the sound of a room, or even setting up your bedroom studio.

Sound Wave Representation

A graphical representation of the pressure created by a sound wave over time.

Despite the physical reality that sound "pushes" outward from its source in all directions, producers will often find that it is much more convenient to represent a sound as a wave that moves up-and-down over time.

In such a representation, the horizontal axis represents time, and the vertical axis represents pressure. If the height of the wave on the chart is above 0, the speaker or instrument is "pushing" sound out. When it is below 0, the speaker or instrument has vibrated back in the other direction, creating an instance of low pressure. Amazingly, every other quality of sound derives from certain patterns in air pressure over time. No matter how complex or moving a piece of music may sound, it shares this fundamental structure with every other sound.
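The pressure-over-time picture can be sketched in code by sampling a sine wave. This is only an illustration: the function name, the default sample rate, and the use of a unitless amplitude in place of real pressure values are all assumptions.

```python
import math


def sine_wave(frequency_hz, duration_s, sample_rate_hz=44_100, amplitude=1.0):
    """Return a list of samples: relative pressure at evenly spaced times."""
    sample_count = int(round(duration_s * sample_rate_hz))
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate_hz)
        for i in range(sample_count)
    ]


if __name__ == "__main__":
    # One full cycle of the 80 Hz subwoofer tone, sampled 8000 times per second.
    cycle = sine_wave(80, 1 / 80, sample_rate_hz=8_000)
    print(len(cycle), max(cycle), min(cycle))
```

Positive samples correspond to the speaker pushing outward (high pressure) and negative samples to it pulling back (low pressure).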

Hearing Sound

If a tree falls in the forest and no one is around to hear it, does it make any sound?

At least in the context of music, there is a practical take on this famous thought experiment.

Musicians and music producers assume that humans will be listening to their music. After all, the enjoyment of a song comes from the subjective experience associated with listening to it, not from the sound waves themselves. A sensor could detect all of the pressure variations associated with music, but does that really mean that the sensor is experiencing it?

So, though a tree falling in the forest would most definitely create pressure waves in the air (as described above), a musician might respond that it doesn't matter unless there is someone around to hear it. If it can't be heard by a human, it isn't really music.

As such, it's important to understand how humans in particular perceive sound.

In addition to the limited range of pitch that humans can hear, some pitches are perceived as being louder than others, even if the variations in air pressure are the same magnitude. Humans developed their ability to hear in order to detect danger, navigate the world, hunt for food, and communicate [citation]. As such, the frequencies that were more important for survival are the ones that are more pronounced in the human range of hearing.

Some producers use the Fletcher-Munson curves to weight the relative loudness of the instruments in their songs during the mixing process. By adjusting the loudness of instruments based on the human ear's sensitivity to certain pitches, the song will sound more balanced to the listener.

Another limitation of the human auditory system is its inability to localize pitches under 80 Hz. Humans use small variations between the sound arriving at their left and right ears to determine where it's coming from. In the case of low-frequency (long-wavelength) sound, the ears are too close together to be able to register perceptible differences.[2]
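To get a feel for the scale involved, the wavelength of a tone can be estimated as the speed of sound divided by its frequency. The sketch below assumes a speed of sound of 343 m/s (air at roughly room temperature) and an ear spacing of about 0.2 m; both values and the function name are illustrative assumptions, not figures from the cited source.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 degrees Celsius
EAR_SPACING_M = 0.2         # assumed: rough distance between the ears


def wavelength_m(frequency_hz):
    """Wavelength in metres of a tone at the given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_SOUND_M_S / frequency_hz


if __name__ == "__main__":
    for freq in (80, 1_000, 4_000):
        ratio = wavelength_m(freq) / EAR_SPACING_M
        print(f"{freq} Hz: {wavelength_m(freq):.3f} m, {ratio:.1f}x ear spacing")
```

At 80 Hz the wavelength is more than twenty times the assumed ear spacing, so both ears sit at nearly the same point of the wave.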

Musicians are often told to "trust their ears", and for the most part this is good advice. However, it would be a mistake not to be aware of the limitations of this approach, such as the limits of human sound perception described above.

Characteristics of Sound

Though physical sound can be fully described by pressure variations over time, the sound that humans hear can be described in many more useful ways. The following characteristics of sound describe the major ways in which humans actually perceive it.

These characteristics are the basic building blocks of sound. And sounds are the basic building blocks for music. As such, an understanding of these concepts is crucial for music producers.

Key Characteristics of Sound
Characteristic | Perception | Corresponding Unit(s)
Pitch | How low or high a sound is | Frequency (Hz), note name (e.g. C4), or MIDI note number (e.g. 67)
Duration | How long or short a sound is | ADSR time (ms)
Loudness | How loud or soft a sound is | Decibels (dB)
Timbre | Quality of the sound (e.g. the sound of a snare drum vs. a piano) | Usually only described verbally (e.g. 'dirty synth' or 'smooth keys')
Localization | Where the sound is coming from | No single measurement, but producers often use panning to create a stereo field


Comparison of a high and low frequency wave. [3]

Pitch is perceived as how "low" or "high" a sound is and represents the cyclic, repetitive nature of the vibrations that make up sound. For simple sounds, pitch relates to the frequency of the slowest vibration in the sound (called the fundamental harmonic). In the case of complex sounds, pitch perception can vary. [4]

In music production, and especially electronic music, pitches are described by the frequency (cycles per second, or hertz) of the corresponding sound wave. The faster the wave is vibrating, the higher the perceived pitch. In order for a sound to have any pitch at all, it must have a frequency that is clear and stable enough to distinguish it from noise. [5]
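The relationship between note numbers and frequency can be sketched as follows. This assumes twelve-tone equal temperament with A4 tuned to 440 Hz and mapped to MIDI note 69, which is a common convention but not the only possible tuning; the function name is hypothetical.

```python
A4_FREQUENCY_HZ = 440.0  # assumed tuning reference
A4_MIDI_NOTE = 69        # MIDI note number conventionally assigned to A4


def midi_to_frequency(midi_note):
    """Frequency in Hz of a MIDI note under 12-tone equal temperament."""
    semitones_from_a4 = midi_note - A4_MIDI_NOTE
    return A4_FREQUENCY_HZ * 2.0 ** (semitones_from_a4 / 12.0)


if __name__ == "__main__":
    for note in (57, 60, 69, 81):
        print(note, round(midi_to_frequency(note), 2))
```

Each step of 12 note numbers (one octave) doubles or halves the frequency, which is why the perceived pitch rises as the wave vibrates faster.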

Pitch is closely related to frequency, but the two are not equivalent. Frequency is an objective, scientific attribute that can be measured. Pitch is each person's subjective perception of a sound wave, which cannot be directly measured. However, this does not necessarily mean that most people won't agree on which notes are higher and lower. [4]

In music theory, related pitches are often grouped into scales or chords in order to facilitate the creation of melodies and harmony, respectively. Scales and chord choices differ greatly between styles of music and cultures. The perception of which music sounds 'Eastern' or 'Western' - or 'Happy' vs. 'Sad' - is largely based on the collection of pitches used in the music's scales and chords.


Duration is perceived as how "long" or "short" a sound is. To a listener, the duration of a sound lasts from the time the sound is first noticed until the sound is identified as having changed or ceased. [6] However, a producer is able to control exactly how long a sound lasts through the manipulation of its ADSR (attack, decay, sustain, release) curve, or by adding fades or volume automation. Though a producer will have exact control over a sound's duration, the same sound may fall out of the listener's perception before it has completely stopped in the song. Therefore, duration can be considered subjective.
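A minimal sketch of an ADSR envelope is shown below, assuming straight-line segments for each stage. Real synthesizers often use curved segments, and the function name and parameters here are hypothetical.

```python
def adsr_level(t, attack, decay, sustain_level, release, note_length):
    """Envelope gain (0.0 to 1.0) at time t; all times are in seconds."""

    def held_level(held_t):
        # Level while the note is still held down.
        if held_t < attack:
            return held_t / attack  # attack: rise from 0 to 1
        if held_t < attack + decay:
            # decay: fall from 1 to the sustain level
            return 1.0 - (1.0 - sustain_level) * (held_t - attack) / decay
        return sustain_level  # sustain: hold steady

    if t < 0:
        return 0.0
    if t < note_length:
        return held_level(t)
    elapsed = t - note_length
    if elapsed >= release:
        return 0.0  # the sound has fully stopped
    # release: fade from the level at note-off down to silence
    return held_level(note_length) * (1.0 - elapsed / release)
```

With a 0.2 s release on a 1 s note, the sound technically lasts 1.2 s, but a listener may stop noticing it earlier in the fade, which is the gap between exact and perceived duration described above.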

Duration is also fundamentally related to important songwriting concepts such as rhythm, meter, and musical form.


Illustration of low vs. high amplitude waves. [7]

Loudness is the perception of how "loud" or "soft" a sound is. It is related to the pressure levels created by a sound wave (which are measured in decibels, or dB), but loudness is actually a subjective experience that depends on the ability of the human ear to detect sound waves and create nerve firings in the brain.

Producers have two key ways to measure loudness in their songs: using their own perception, and measuring the decibel levels of the sound waves. The former is a skill that can be cultivated with practice, but the latter is more straightforward.

Decibel levels are derived from the height, or amplitude, of a sound wave on a logarithmic scale. The larger the amplitude, the higher the decibel level, with each doubling of amplitude adding about 6 dB.
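The link between amplitude and decibels can be sketched with the standard formula of 20 times the base-10 logarithm of an amplitude ratio. The reference amplitude of 1.0 (full scale, as in a typical DAW meter) and the function name are assumptions made for illustration.

```python
import math


def amplitude_to_db(amplitude, reference=1.0):
    """Level in dB of an amplitude relative to a reference amplitude."""
    if amplitude <= 0:
        return float("-inf")  # silence has no finite decibel level
    return 20.0 * math.log10(amplitude / reference)


if __name__ == "__main__":
    for amp in (1.0, 0.5, 0.25, 0.1):
        print(amp, round(amplitude_to_db(amp), 2))
```

Halving the amplitude lowers the level by about 6 dB, and reducing it to a tenth lowers it by 20 dB.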

Producers may have a difficult time balancing the loudness of all of the elements in a song. Though adjusting decibel levels (dB) during the mixing process is a good start, the following physiological properties of loudness should also be taken into account:

  • A more complex sound creates more nerve firings and so sounds louder (for the same wave amplitude) than a simpler sound, such as a sine wave. [8]
  • Humans with normal hearing are most sensitive to sounds around 2–4 kHz, with sensitivity declining to either side of this region. [9]
  • When sensorineural hearing loss (damage to the cochlea or in the brain) is present, the perception of loudness is altered. Sounds at low levels (often perceived by those without hearing loss as relatively quiet) are no longer audible to the hearing impaired, but sounds at high levels often are perceived as having the same loudness as they would for an unimpaired listener. [8]
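One way to approximate the frequency dependence of loudness in code is the standardized A-weighting curve. Note that this is a swapped-in technique: A-weighting is a simplified approximation of the ear's sensitivity at moderate levels, not the Fletcher-Munson curves themselves, and the function name is hypothetical.

```python
import math


def a_weighting_db(frequency_hz):
    """Approximate A-weighting gain in dB (about 0 dB at 1 kHz)."""
    f2 = frequency_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(numerator / denominator) + 2.0


if __name__ == "__main__":
    for freq in (50, 100, 1_000, 3_000, 10_000):
        print(freq, round(a_weighting_db(freq), 1))
```

The curve is strongly negative at low frequencies and peaks slightly above 0 dB in the 2–4 kHz region, in line with the sensitivity range mentioned in the list above.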


Timbre is perceived as the quality of different sounds (e.g. the thud of a fallen rock, the whir of a drill, the tone of a musical instrument or the quality of a voice). [8] In simple terms, timbre is what makes one musical sound differ from another, even when they have the same pitch and loudness. [10]

It is the sign of an experienced producer that they are able to identify sounds based on their timbre. Especially when listening to other music, it is extremely helpful to be able to hear a sound and identify what gives it its unique character. For instance, is the sound primarily composed of sine waves, or square waves? Is there a filter envelope applied to the sound? What audio effects have been applied to it? By being able to analyze sounds in this way, producers can more easily gain inspiration from other tracks and incorporate a variety of musical ideas into their own work. Active listening is a key practice to develop these skills. [citation]

Timbre can be an elusive quality because it is difficult to describe. Some attempts have been made to break timbre down further into its component parts: [11][12]

  • Tonal Character (which pitches, overtones, and harmonics are included in the sound)
  • Noisiness (the presence of noise in the sound)
  • Coloration (the presence of subtle alteration and distortion)
  • Attack (how quickly the sound begins)
  • Release (how quickly the sound ends)
  • Glide (how the pitch or color changes from one note to the next)
  • Bend (small changes in frequency over the duration of a single note)
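The tonal character component can be illustrated with a small additive-synthesis sketch: two sounds share the same fundamental (and so the same pitch) but differ in which harmonics they contain. The harmonic recipes and names below are illustrative assumptions; the "square-like" recipe uses only the first few odd harmonics of an ideal square wave.

```python
import math


def harmonic_sample(fundamental_hz, harmonic_amplitudes, t):
    """Sum of sine harmonics at time t (seconds).

    harmonic_amplitudes maps harmonic number (1 = fundamental) to amplitude.
    """
    return sum(
        amp * math.sin(2 * math.pi * n * fundamental_hz * t)
        for n, amp in harmonic_amplitudes.items()
    )


PURE_SINE = {1: 1.0}
# Odd harmonics at amplitude 1/n approximate a square wave's brighter tone.
SQUARE_LIKE = {n: 1.0 / n for n in (1, 3, 5, 7, 9)}

if __name__ == "__main__":
    quarter_cycle = 1 / (4 * 100)  # a quarter of the way through a 100 Hz cycle
    print(harmonic_sample(100, PURE_SINE, quarter_cycle))
    print(harmonic_sample(100, SQUARE_LIKE, quarter_cycle))
```

Both recipes repeat 100 times per second, but the added overtones change the wave's shape, which is heard as a difference in timbre rather than pitch.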


In the same way that depth perception is formed by processing slightly different images in the left and right eye, sound localization is a result of humans' ability to process slightly different signals in the left and right ears, and to make inferences about the sound based on these differences. Humans also use other cues to assess how "far" or "close" a sound is.

For this reason, music has commonly been recorded and produced in stereo since the 1970s. This means that music is actually a set of two audio channels: left and right. By playing two channels simultaneously through two different speaker sources (e.g. the left and right earbud), music can create the perception of depth and space for the listener. Sounds that are in mono, meaning that there is only one distinct channel, often sound less spacious than their stereo counterparts.
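Placing a mono sound in the stereo field is usually done by panning. The sketch below uses a constant-power (sine/cosine) pan law, which is one common choice among several; the function name and the pan range of -1.0 (left) to 1.0 (right) are assumptions.

```python
import math


def constant_power_pan(sample, pan):
    """Split a mono sample into (left, right) using a sine/cosine pan law.

    pan ranges from -1.0 (hard left) through 0.0 (center) to 1.0 (hard right).
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0..pi/2
    return sample * math.cos(angle), sample * math.sin(angle)


if __name__ == "__main__":
    for pan in (-1.0, -0.5, 0.0, 0.5, 1.0):
        left, right = constant_power_pan(1.0, pan)
        print(pan, round(left, 3), round(right, 3))
```

With this law the combined power of the two channels stays the same at every pan position, so a sound does not seem to get louder or quieter as it moves across the stereo field.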

Key cues for distance

The following cues are associated with sounds being farther away: [13]

  • Loss of amplitude
  • Loss of high frequencies
  • Higher ratio of reverb to direct signal
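The first cue, loss of amplitude, can be roughly estimated with the inverse distance law, under which sound pressure from a small source in open space falls in proportion to distance. This is an idealized assumption: real rooms add reflections, and the function name is hypothetical.

```python
import math


def distance_level_change_db(distance_m, reference_distance_m=1.0):
    """Level change in dB when moving from the reference distance to distance_m."""
    if distance_m <= 0 or reference_distance_m <= 0:
        raise ValueError("distances must be positive")
    return -20.0 * math.log10(distance_m / reference_distance_m)


if __name__ == "__main__":
    for distance in (1.0, 2.0, 4.0, 10.0):
        print(distance, round(distance_level_change_db(distance), 2))
```

Under this model, each doubling of distance lowers the level by about 6 dB, which is why turning a sound down is a simple way to push it "back" in a mix.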

Key cues for lateral direction

To determine the lateral input direction (left, front, right), the auditory system analyzes the following information:

  • Interaural Time Difference (ITD) (sound from the right side reaches the right ear earlier than the left ear)
  • Interaural Intensity Difference (IID) (sound from the right side has a higher level at the right ear than at the left ear)
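The interaural time difference can be estimated with a simplified path-length model: the extra distance to the far ear is the ear spacing times the sine of the source angle. The ear spacing of 0.18 m, the speed of sound of 343 m/s, and the function name are assumptions, and the model ignores the way sound bends around the head.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 degrees Celsius
EAR_SPACING_M = 0.18        # assumed: rough distance between the ears


def interaural_time_difference_s(angle_degrees):
    """Approximate ITD in seconds for a source at the given angle.

    0 degrees is straight ahead; 90 degrees is directly to the right.
    """
    extra_path_m = EAR_SPACING_M * math.sin(math.radians(angle_degrees))
    return extra_path_m / SPEED_OF_SOUND_M_S


if __name__ == "__main__":
    for angle in (0, 15, 45, 90):
        microseconds = interaural_time_difference_s(angle) * 1e6
        print(angle, round(microseconds, 1))
```

Even for a source directly to one side, this model gives a difference of only about half a millisecond, which shows how fine the timing cues used for localization are.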

Limitations of human localization

  • The human auditory system is unable to localize pitches under 80 Hz. In the case of low-frequency (long-wavelength) sound, the ears are too close together to be able to register perceptible differences.[2]
  • The smallest interaural time difference that humans can discern is about 10 microseconds (µs). [14]

  2. 2.0 2.1
  4. 4.0 4.1
  5. Harold S. Powers, "Melody", The Harvard Dictionary of Music, fourth edition, edited by Don Michael Randel, 499–502 (Cambridge: Belknap Press for Harvard University Press, 2003) ISBN 978-0-674-01163-2. "Melody: In the most general case, a coherent succession of pitches. Here pitch means a stretch of sound whose frequency is clear and stable enough to be heard as not noise; succession means that several pitches occur; and coherent means that the succession of pitches is accepted as belonging together" (p. 499).
  6. Jones, S.; Longe, O.; Pato, M.V. (1998). "Auditory evoked potentials to abrupt pitch and timbre change of complex tones: electrophysiological evidence of streaming?". Electroencephalography and Clinical Neurophysiology. 108 (2): 131–142. doi:10.1016/s0168-5597(97)00077-4.
  8. 8.0 8.1 8.2
  9. Olson, Harry (1972). "The Measurement of Loudness". Audio Magazine.
  13. Roads, Curtis. The Computer Music Tutorial. Cambridge, MA: MIT, 2007. Print.
  14. Ian Pitt. "Auditory Perception". Archived from the original on 2010-04-10.