Learn Web Audio API

Web Audio API is one of the newer additions to the web platform, and it significantly expands what web applications can do with sound. It is a powerful tool that will be hard to do without when developing modern games and interactive web applications. The API is fairly high-level, thought through down to the details, self-contained, easy to learn, and integrates especially elegantly into applications that use WebGL and WebRTC.

But the article below covers all of this in more detail ;)

A little history

Long ago, at the dawn of the Web, Internet Explorer made a timid attempt to break the silence reigning in the browser by inventing the <bgsound> tag, which automatically played MIDI files when a site was opened. In response, Netscape's developers added a similar feature using the <embed> tag. Neither solution was ever standardized, and neither was subsequently adopted by other browsers.

A few years passed, and browsers began to make active use of third-party plug-ins. Playing audio became possible with Flash, Silverlight, QuickTime and so on. They all did their job, but plug-ins still have plenty of shortcomings. So the prospect of a tool for working with sound that is backed by web standards has long excited developers. With the mass arrival of mobile browsers that do not support Flash, the problem became even more acute.

The pioneer in the fight against silence without plug-ins was the <audio> element, which appeared in the first HTML5 specification. It lets you play audio files and streams and control playback, buffering and volume. It is also easy to use and understand. It is now supported by all mobile and desktop browsers (including IE9) and works quite well, but the <audio> element is not what we are going to talk about today.
We will talk about the Web Audio API, which is designed for much more interesting, varied and complex tasks.

Web Audio API is NOT the <audio> element, nor a layer on top of it.

To begin with, it is important to understand that the <audio> element and the Web Audio API are practically unrelated. They are two independent, self-contained APIs designed to solve different problems. The only link between them is that an <audio> element can serve as one of the sound sources for the Web Audio API.

Tasks the <audio> element is designed to solve:

  •      A simple audio player
  •      Single-stream background audio
  •      Audio prompts, captchas, etc.

Tasks the Web Audio API is designed to solve:

  •      Surround sound for games and interactive web applications
  •      Audio processing applications
  •      Audio synthesis
  •      Audio visualization and much, much more…

Benefits of Web Audio API

Precisely synchronized audio playback (you can play hundreds of samples at once with millisecond accuracy by scheduling the exact start and end of each one)
Sound processing with dozens of built-in high-level blocks (filters, amplifiers, delay lines, convolution modules, etc.)
Rich possibilities for synthesizing audio-frequency oscillations with different envelope shapes (you can write a simple synthesizer in 10 minutes)
Working with multi-channel audio (according to the specification, the API must support up to 32 audio channels! For reference: stereo is 2 channels, Dolby Digital is 5 channels, the most feature-rich Dolby TrueHD is 8 channels, and to date few home users have a sound card with more than 8 channels 🙂)
Direct access to the time-domain and spectral characteristics of the signal (which makes it possible to visualize and analyze the audio stream)
High-level 3D distribution of audio across channels, depending on the position, direction and speed of the sound source and the listener (especially cool when building three-dimensional WebGL games and applications)
Tight integration with WebRTC (you can use the system microphone as a sound source, or plug in a guitar or a mixer. You can also receive audio from any external stream and, for that matter, send audio to one)
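To give the first point some shape, here is a minimal sketch of scheduled playback. The helper name scheduleRepeats and its parameters are made up for illustration; it assumes you already have a decoded AudioBuffer and relies only on the standard start(when) and stop(when) methods of a buffer source.

```javascript
// Hypothetical helper: play the same decoded AudioBuffer `count` times,
// `interval` seconds apart, starting at `startTime` (in context time, seconds).
function scheduleRepeats(context, buffer, startTime, interval, count) {
  var sources = [];
  for (var i = 0; i < count; i++) {
    // a buffer source can be started only once, so create one per repeat
    var source = context.createBufferSource();
    source.buffer = buffer;
    source.connect(context.destination);
    var when = startTime + i * interval;
    source.start(when);                  // planned start of this repeat
    source.stop(when + buffer.duration); // planned end of this repeat
    sources.push(source);
  }
  return sources;
}

// Possible usage in a browser (kickBuffer is an assumed, already decoded buffer):
// scheduleRepeats(context, kickBuffer, context.currentTime + 0.1, 0.5, 8);
```

All times are given on the context's own clock (context.currentTime), which is why the repeats stay in step with each other regardless of what the rest of the page is doing.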

Diving in: the audio context

One of the fundamental concepts in the Web Audio API is the audio context.

var context = new AudioContext();


While the specification was still a draft, WebKit browsers required the prefixed webkitAudioContext. So use something like this:


var context;
window.addEventListener('load', function () {
  try {
    // fall back to the prefixed constructor in older WebKit browsers
    window.AudioContext = window.AudioContext || window.webkitAudioContext;
    context = new AudioContext();
  } catch (e) {
    alert('Oops... Your browser does not support the Web Audio API');
  }
}, false);


A single document needs only one context. That is enough for the whole range of tasks the Web Audio API solves. A single audio context lets us build arbitrarily complex audio graphs with an unlimited number of sources and recipients of the audio signal. Almost all the methods and constructors for creating audio modules are methods of the audio context.

Possible sources of sound:

  •      AudioBufferSourceNode – audio buffer (discussed below)
  •      MediaElementAudioSourceNode – an <audio> or <video> element
  •      MediaStreamAudioSourceNode – an external audio stream (a microphone or any other audio stream, including a remote one)

Possible recipients of sound:

  •      context.destination – the system's default audio output (typically the speakers).
  •      MediaStreamAudioDestinationNode – an audio stream. This stream can be used in the same way as a stream obtained through getUserMedia() and can, for example, be sent to a remote peer over an RTCPeerConnection using the addStream() method.
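As a rough sketch of how a source and a recipient from these lists fit together, the helper below wraps an existing MediaStream in a source node and routes it into a MediaStreamAudioDestinationNode. The name routeStream is made up for illustration; the context methods it calls are the standard createMediaStreamSource() and createMediaStreamDestination().

```javascript
// Hypothetical helper: pass an incoming MediaStream through the audio graph
// and hand back a new MediaStream that carries the graph's output.
function routeStream(context, inputStream) {
  // source node fed by the external stream (microphone, remote peer, ...)
  var source = context.createMediaStreamSource(inputStream);
  // recipient node that exposes the audio it receives as a MediaStream
  var streamDestination = context.createMediaStreamDestination();
  source.connect(streamDestination);
  return streamDestination.stream;
}

// Possible usage in a browser (assumes the user grants microphone access):
// navigator.mediaDevices.getUserMedia({ audio: true }).then(function (mic) {
//   var processed = routeStream(new AudioContext(), mic);
// });
```

In a real application, processing modules would sit between the source and the stream destination; here they are connected directly to keep the sketch short.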

Building graphs (audio processing schemes)

Any scheme you design can have one or more sources and recipients of sound, as well as modules for processing it (we will look at each of them in detail). The scheme can be feed-forward or contain feedback, and each module can have any number of inputs and outputs. The API takes care of making it all work correctly; your job is to connect everything properly. Let's imagine an abstract scheme, just to see how one is built in code.

The creators of the Web Audio API made building any graph (scheme) elegant and easy to understand. Each module has a .connect(…) method which, in its simplest form, takes a single parameter saying what you want to connect the module to. Building a scheme takes nothing more than a series of such calls.
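As a minimal sketch, assume a hypothetical scheme of one source, a filter, an amplifier and the default output, connected in series. The function name buildChain is made up for illustration, and source stands for any already created source node.

```javascript
// Hypothetical scheme: source -> filter -> gain -> default audio output.
function buildChain(context, source) {
  var filter = context.createBiquadFilter();
  var gain = context.createGain();
  source.connect(filter);            // source feeds the filter
  filter.connect(gain);              // filter feeds the amplifier
  gain.connect(context.destination); // amplifier feeds the speakers
  return { filter: filter, gain: gain };
}
```

Branches are built the same way: calling connect() several times on one module sends its output to several recipients.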




Preloading and audio playback

Let’s look at a very simple but fairly typical example of working with the Web Audio API, where the sound source is a buffer created from an audio file preloaded with XMLHttpRequest (AJAX), and the recipient is the system audio output.


// create the AudioContext
var context = new window.AudioContext();
// variables for the buffer, the source and the destination
var buffer, source, destination;

// load a file into the buffer
var loadSoundFile = function (url) {
  // request the file from the server with XMLHttpRequest (AJAX)
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.responseType = 'arraybuffer'; // important
  xhr.onload = function (e) {
    // decode the binary response
    context.decodeAudioData(this.response,
      function (decodedArrayBuffer) {
        // keep the decoded buffer
        buffer = decodedArrayBuffer;
      }, function (e) {
        console.log('Error decoding file', e);
      });
  };
  xhr.send();
};

// start playback
var play = function () {
  // create the source
  source = context.createBufferSource();
  // attach the buffer to the source
  source.buffer = buffer;
  // the default audio output
  destination = context.destination;
  // connect the source to the destination
  source.connect(destination);
  // play
  source.start(0);
};

// stop playback
var stop = function () {
  source.stop(0);
};




Modules

Web Audio API has dozens of high-level, configurable, ready-to-use modules: amplifiers, delay lines, filters, convolution modules, channel splitters and mergers, 3D panners and so on. You can build sophisticated graphs for processing and synthesizing sound simply by connecting ready-made blocks and configuring them. In ease of use it is a bit like a children’s construction set, except that here you can build some really cool stuff!

Let’s look at the core modules, starting with the simplest.

Gain (Amplifier)

This module lets you change the audio level.

Any Web Audio API module can be created with the corresponding factory method of the context. To get a new gain object, just call context.createGain(). You can then configure the resulting object both before and during playback. The configuration, like the available options and methods, depends on the type of module, but in most cases it comes down to simply setting values on the corresponding fields of the object. Here’s an example of creating a gain module and changing its gain level.


var gainNode = context.createGain();
gainNode.gain.value = 0.4; // 0 is silence, 1 leaves the level unchanged (can be changed during playback)


Now let’s insert the amplifier into the scheme described above, between the source and the destination.
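A minimal sketch of that insertion, assuming source is an already created source node (the helper name connectThroughGain is made up for illustration):

```javascript
// Route a source through a gain node instead of straight to the output.
function connectThroughGain(context, source, level) {
  var gainNode = context.createGain();
  gainNode.gain.value = level;
  source.connect(gainNode);              // source -> amplifier
  gainNode.connect(context.destination); // amplifier -> default audio output
  return gainNode;
}

// Possible usage: var gainNode = connectThroughGain(context, source, 0.4);
```

Keeping a reference to the returned node lets you change gainNode.gain.value later, while the sound is playing.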





Delay (delay line)

This module lets you delay the sound by a specified time.
It is created and configured in the same way as the gain module described above.


var delayNode = context.createDelay(2); // maximum delay in seconds (the default maximum is 1)
delayNode.delayTime.value = 2; // 2 seconds



Let’s reinforce the basics by building a simple scheme with an endless signal loop, using gain to attenuate the signal and delay to delay it. This gives us a simple “echo” effect.

I should say that this is not the best way to make an “echo” effect; it is suitable only as an example. A truly realistic echo can be achieved with the signal convolution module. Let’s look at it in more detail.
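Such a loop could be sketched as follows. The function name createEcho and its parameters are made up for illustration; the sketch assumes source is an existing source node and that the feedback level is kept below 1 so that each repeat is quieter than the previous one.

```javascript
// Hypothetical echo: the delayed signal is fed back into the delay line
// through a gain node, so every repeat comes out quieter than the last.
function createEcho(context, source, delaySeconds, feedbackLevel) {
  var delayNode = context.createDelay(delaySeconds); // argument: maximum delay
  delayNode.delayTime.value = delaySeconds;
  var feedbackGain = context.createGain();
  feedbackGain.gain.value = feedbackLevel; // keep below 1, or the echo never fades

  source.connect(context.destination);    // original ("dry") signal
  source.connect(delayNode);              // signal into the delay line
  delayNode.connect(context.destination); // delayed signal to the output
  delayNode.connect(feedbackGain);        // feedback loop: delay -> gain ...
  feedbackGain.connect(delayNode);        // ... -> back into the delay
  return { delay: delayNode, feedback: feedbackGain };
}

// Possible usage: createEcho(context, source, 0.5, 0.4);
```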


Convolution

In simple terms, convolution is a mathematical operation, just like addition, multiplication or integration. Addition takes two numbers and produces a third; convolution takes two signals and produces a third signal. In the theory of linear systems, convolution is used to describe the relationship between three signals:

  •      input
  •      impulse response
  •      output

In other words, the output signal is the convolution of the input signal with the impulse response of the system.
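To make the operation less abstract, here is a minimal sketch of discrete convolution on plain arrays of samples. The function name convolve is made up for illustration and is not part of the Web Audio API; the convolution module does this work for you.

```javascript
// Discrete convolution: every input sample launches a scaled copy of the
// impulse response, and the overlapping copies are summed into the output.
function convolve(input, impulseResponse) {
  if (input.length === 0 || impulseResponse.length === 0) {
    return [];
  }
  var output = new Array(input.length + impulseResponse.length - 1).fill(0);
  for (var n = 0; n < input.length; n++) {
    for (var k = 0; k < impulseResponse.length; k++) {
      output[n + k] += input[n] * impulseResponse[k];
    }
  }
  return output;
}

// An impulse response of [1, 0.5] means "the sound itself plus one echo
// at half the volume, one sample later":
// convolve([1, 2, 3], [1, 0.5]) -> [1, 2.5, 4, 1.5]
```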

What the input and output signals are seems clear enough. All that remains is to deal with the “scary” term impulse response 🙂

Let’s take a real-life example and everything will become clear.
You have come to a forest and shouted something to your friend. What does he hear? Right! Your voice, only slightly distorted and with a multiple-echo effect. The thing is, the acoustic vibrations produced by your vocal cords and larynx are slightly changed by the surrounding space before they reach your friend’s ear. Refraction and distortion arise, for example, because of the humidity in the forest. Some of the energy of the vibrations is absorbed by the soft covering of moss. The sound is also reflected from hundreds of trees and other objects around you at various distances. One could go on listing these factors for a long time, but let’s figure out what all this has to do with convolution 🙂

You have probably already guessed that in this situation the input signal (the signal source) is your shout, and the output signal is what your friend hears. The forest can be thought of as a linear system that changes the characteristics of the signal according to certain rules, which depend on a huge number of factors. Without going into the theory, this whole set of rules can be represented as the so-called impulse response.

The echo of a cave, the characteristic noise of an old record, the distorted voice of a trolleybus driver grumbling into an old microphone: all these sound effects can be uniquely described by their impulse responses.

Here is a small demo. When you switch effects, all you change is the impulse response, which is the main parameter of the convolution module.

The convolution module is created, connected and configured in the same way as all the other modules.


var convolverNode = context.createConvolver();
convolverNode.buffer = buffer; // impulse response



In practically every case, the impulse response we need is an audio file (usually .wav). Just like the input signal, it has to be preloaded, decoded and written into a buffer.

Where do you find impulse responses for different effects? Search Google for something like “download free impulse response” and you will find plenty.
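A sketch of that step, under the assumption that the bytes of the impulse-response file have already been downloaded into an ArrayBuffer (for example, with XMLHttpRequest and responseType = 'arraybuffer'). The helper name createReverb and its callback parameters are made up for illustration.

```javascript
// Hypothetical helper: decode a downloaded impulse response and hand back
// a convolver node that is ready to be connected into the graph.
function createReverb(context, encodedImpulse, onReady, onError) {
  var convolverNode = context.createConvolver();
  context.decodeAudioData(encodedImpulse, function (impulseBuffer) {
    convolverNode.buffer = impulseBuffer; // the decoded impulse response
    onReady(convolverNode);
  }, onError);
}

// Possible usage (source is an assumed, already created source node):
// createReverb(context, xhr.response, function (reverb) {
//   source.connect(reverb);
//   reverb.connect(context.destination);
// }, function (e) { console.log('Error decoding impulse response', e); });
```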


Filtering

In digital signal processing, filtering most often means frequency filtering. If you know what a signal’s spectrum, the Fourier transform and a filter’s frequency response are, just take a look at the example. If you don’t, and have no time to figure it out, I’ll try to explain in simple terms.

Everyone has used the equalizer in their favorite Winamp, AIMP, iTunes and so on, has surely tried the various presets (bass, disco, vocals) and has certainly dragged the sliders at different frequencies, trying to get the desired sound. An equalizer is a device that can both boost and attenuate particular frequencies (bass, treble, etc.).

So, without going into details:

An equalizer is a frequency filter.
The curve formed by all the sliders is the filter’s frequency response.
In simple terms, the Web Audio API lets you add an “equalizer” (filter) to the signal-processing graph in the form of a module.

Here is the list of filters available out of the box:

  •      lowpass – low-pass filter (cuts everything above the selected frequency)
  •      highpass – high-pass filter (cuts everything below the selected frequency)
  •      bandpass – band-pass filter (passes only a specific frequency band)
  •      lowshelf – low-shelf filter (boosts or attenuates everything below the selected frequency)
  •      highshelf – high-shelf filter (boosts or attenuates everything above the selected frequency)
  •      peaking – narrow-band peaking filter (boosts a specific frequency; popularly known as a “bell” filter)
  •      notch – narrow-band rejection filter (attenuates a specific frequency; popularly known as a “notch” filter)
  •      allpass – a filter that passes all frequencies of the signal with equal gain but changes the phase of the signal, by delaying the frequencies it passes. Such a filter is typically described by a single parameter: the frequency at which the phase shift reaches 90°.

If the abundance of new words scares you, don’t worry!
In practice, everything is much simpler than in theory. Let’s try to work it out with a live example, changing the parameters. I guarantee everything will become much clearer.


var filterNode = context.createBiquadFilter();
filterNode.type = 'highpass'; // filter type: high-pass filter
filterNode.frequency.value = 1000; // base frequency: cutoff at 1 kHz
filterNode.Q.value = 1; // quality factor
// filterNode.gain.value = 1000; // gain (not used by this type of filter)
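To tie this back to the equalizer analogy, here is a rough sketch of a multi-band equalizer assembled from peaking filters connected in series, one filter per “slider”. The function name createEqualizer and the band frequencies in the usage comment are assumptions chosen for illustration.

```javascript
// Hypothetical equalizer: one peaking filter per band, chained in series.
function createEqualizer(context, frequencies) {
  var filters = frequencies.map(function (frequency) {
    var filter = context.createBiquadFilter();
    filter.type = 'peaking';
    filter.frequency.value = frequency; // centre of the band, in Hz
    filter.Q.value = 1;                 // width of the band
    filter.gain.value = 0;              // in dB; 0 leaves the band untouched
    return filter;
  });
  // chain the bands: first -> second -> ... -> last
  for (var i = 0; i < filters.length - 1; i++) {
    filters[i].connect(filters[i + 1]);
  }
  return filters;
}

// Possible usage (source is an assumed, already created source node):
// var bands = createEqualizer(context, [60, 250, 1000, 4000, 12000]);
// source.connect(bands[0]);
// bands[bands.length - 1].connect(context.destination);
// bands[0].gain.value = 6; // "push the bass slider up" by 6 dB
```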



Generator (oscillator)

The generator synthesizes signals of various shapes and frequencies. Everything is controlled by 3 parameters:

  •      type – the waveform ('sine', 'square', 'sawtooth' or 'triangle')
  •      frequency – the frequency of generation
  •      detune – detuning (measured in cents). Each octave is 1200 cents, and each semitone is 100 cents. A detune of 1200 shifts the pitch one octave up, and a detune of -1200 shifts it one octave down.
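The relationship between cents and frequency can be written as a one-line formula: every 1200 cents doubles the frequency. The helper name detunedFrequency is made up for illustration; the oscillator applies this calculation itself when you set detune.

```javascript
// Frequency that results from detuning a base frequency by a number of cents.
function detunedFrequency(frequency, cents) {
  return frequency * Math.pow(2, cents / 1200);
}

// detunedFrequency(440, 1200)  -> 880 (one octave up)
// detunedFrequency(440, -1200) -> 220 (one octave down)
// detunedFrequency(440, 100) is roughly 466.16 (one semitone up)
```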

Now let’s connect the generator to an analyser and see what happens.


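var oscillator = context.createOscillator();
var analyser = context.createAnalyser();
oscillator.connect(analyser); // generator -> analyser
analyser.connect(context.destination); // analyser -> default audio output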
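Once an analyser is in the graph, its data can be read back into a typed array. The sketch below reads the current spectrum and estimates the loudest frequency; the function names readSpectrum and peakFrequency are made up for illustration, while frequencyBinCount, fftSize and getByteFrequencyData() are standard properties and methods of the analyser.

```javascript
// Read the current spectrum: one byte (0..255) per frequency bin.
function readSpectrum(analyser) {
  var data = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(data);
  return data;
}

// Estimate the frequency (in Hz) of the loudest bin.
function peakFrequency(context, analyser) {
  var data = readSpectrum(analyser);
  var peakIndex = 0;
  for (var i = 1; i < data.length; i++) {
    if (data[i] > data[peakIndex]) {
      peakIndex = i;
    }
  }
  // each bin covers sampleRate / fftSize hertz
  return peakIndex * context.sampleRate / analyser.fftSize;
}
```

With a pure sine wave from the generator, the value returned by peakFrequency should land close to the oscillator's frequency, within the width of one bin.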


3D sound

Well, we have reached the coolest thing in the Web Audio API: distributing audio across channels in three-dimensional space. Let’s start with an example:

What do we see? A typical scene from a 3D shooter. There is a hero, whom we see from behind. He makes several sounds (running and shooting); there are plenty of evil creatures, which burst apart and make all sorts of sounds at various distances from the hero; there is background music, there is wind rustling all around, and so on.

To make the sound stage realistic and three-dimensional, the sound has to be distributed across the channels very precisely, depending on the coordinates and speed of each character. On top of that, there may be two channels (stereo), 5 (Dolby Digital), 8 (Dolby TrueHD) or, in principle, any number, depending on the sound card and the system. What’s more, the sounds of moving objects should have a Doppler frequency shift. And the saddest part is that your own position as the listener also changes when you look at the hero from the side.

The question is, how do you calculate all that? And here comes the most important part: the Web Audio API does everything for you, so you don’t have to calculate anything. You only need a few lines of code to describe the coordinates, direction and speed of each sound source and of the listener. That’s it! The API takes on the rest of the dirty work: it distributes the sound across the channels, taking their number into account, adds the Doppler effect where needed and creates stunning 3D sound.

As I’ve said many times, everything is very well thought out. The Web Audio API has a special module called the panner. You can picture it as a speaker flying through space. And there can be as many such speakers as you like.

Each panner is described by its coordinates, the direction of the sound and its velocity.


// create, for example, a panner to represent a running, barking dog
var panner = context.createPanner();
// connect the barking source to the panner (barkSource is a placeholder name)
barkSource.connect(panner);
// connect the dog's panner to the output
panner.connect(context.destination);

panner.setPosition(q.x, q.y, q.z); // where the dog is
panner.setOrientation(vec.x, vec.y, vec.z); // which way it barks
panner.setVelocity(dx/dt, dy/dt, dz/dt); // how fast it runs (draft API, later removed from the specification)

In addition, you as the listener (context.listener) are also described by coordinates, direction and velocity.


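context.listener.setPosition(q.x, q.y, q.z);
// the listener's orientation takes a "front" vector and an "up" vector (up is a placeholder name)
context.listener.setOrientation(vec.x, vec.y, vec.z, up.x, up.y, up.z);
context.listener.setVelocity(dx/dt, dy/dt, dz/dt); // draft API, later removed from the specification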
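Newer versions of the API also expose the panner's coordinates as separate parameters (positionX, positionY, positionZ). The sketch below, with the made-up helper name placePanner, uses them when they are present and falls back to setPosition() otherwise; position is assumed to be a plain object with x, y and z fields.

```javascript
// Hypothetical helper: move a panner to the given point in space.
function placePanner(panner, position) {
  if (panner.positionX) {
    // parameter-based form found in newer implementations
    panner.positionX.value = position.x;
    panner.positionY.value = position.y;
    panner.positionZ.value = position.z;
  } else {
    // method-based form used elsewhere in this article
    panner.setPosition(position.x, position.y, position.z);
  }
}

// Possible usage, e.g. once per animation frame:
// placePanner(panner, { x: dog.x, y: dog.y, z: dog.z });
```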

I think this is very cool!!

What else?

Here are a few more interesting modules that may come in handy:

  •      ChannelSplitterNode – channel splitting
  •      ChannelMergerNode – channel merging
  •      DynamicsCompressorNode – dynamics compressor
  •      WaveShaperNode – harmonic distortion
  •      ScriptProcessorNode – lets you do whatever you want (it receives a buffer of numbers on its input, processes it, and produces a buffer of numbers on the module's output)
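As a small taste of the first two modules, here is a sketch that swaps the left and right channels of a stereo source. The function name swapStereoChannels is made up for illustration; it relies on the extended form of connect(), which also accepts an output index and an input index.

```javascript
// Hypothetical example: split a stereo signal and merge it back
// with the left and right channels exchanged.
function swapStereoChannels(context, source) {
  var splitter = context.createChannelSplitter(2);
  var merger = context.createChannelMerger(2);
  source.connect(splitter);
  splitter.connect(merger, 0, 1); // left output  -> right input
  splitter.connect(merger, 1, 0); // right output -> left input
  merger.connect(context.destination);
  return merger;
}
```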