WebRTC Audio Worklet

Unofficial Proposal Draft

This version:
https://alvestrand.github.io/audio-worklet/
Feedback:
public-webrtc@w3.org with subject line “[mediacapture-audio-worklet] … message topic …” (archives)
Issue Tracking:
GitHub
Editor:
(Google)

Abstract

This API defines a worklet interface for processing the samples of an audio stream.

Status of this document

This document is an unofficial proposal draft. It has no official standing and does not represent the consensus of any working group.

API specification: WebRTC Audio Worklet

1. Introduction

This document constitutes an extension to “Media Capture and Streams”. It specifies an API that allows convenient access to raw audio data for processing purposes.

Unlike [WebAudio], this specification aims to provide a special-purpose API for efficient processing of audio data, imposing minimal overhead beyond what could be achieved by embedding the processing inside the browser.

The targets for this API are functions that need to be implemented efficiently, with minimal additional overhead and minimal required conversions. The audio format is therefore deliberately not constrained to a single format: the platform is free to choose from a wide range of capabilities, and applications are expected to adapt to the platform's choice.

2. Processing model

This API adopts the “worklet” model: the application loads a JavaScript module into a context separate from the main JavaScript application. In that context, a specific function is called for each buffer of audio data. The buffer contains the audio data itself, together with enough information to ascertain its format. An API exists for writing audio data (in the same format as the incoming data), but the processing model also allows applications that do not use it.
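A processor module might look like the following sketch. It is written as plain JavaScript so that the shape of process() can be exercised directly; the class name, the gain operation, and the way a module would be registered in the worklet are illustrative assumptions, not part of this draft.

```javascript
// Hypothetical processor, mirroring the process(input, output, parameters)
// shape defined below. The class name and behavior are illustrative only.
class GainProcessor {
  constructor(platformOptions, userOptions) {
    this.platformOptions = platformOptions; // sampleFormat, channelCount, sampleRate
    this.userOptions = userOptions;         // e.g. { producesOutput: true }
  }

  // Called once per buffer of audio data; returns true to keep processing.
  process(input, output, parameters) {
    if (this.platformOptions.sampleFormat === "float32") {
      const samples = new Float32Array(input);
      const out = output ? new Float32Array(output) : null;
      const n = parameters.sampleCount * this.platformOptions.channelCount;
      for (let i = 0; i < n; i++) {
        if (out) out[i] = samples[i] * 0.5; // apply a fixed -6 dB gain
      }
    }
    return true;
  }
}
```

Note that the processor copies into the output buffer rather than returning a new one, matching the model in which buffers are owned and reused by the platform.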

3. Interface definition

[Exposed=Window, SecureContext]
interface AudioMediaTrackWorklet: Worklet {
};

// This object is created by the application in order to instantiate
// a worklet containing the AudioMediaTrackProcessor.
[Exposed=Window, SecureContext,
Constructor(MediaStreamTrack inputTrack, MediaTrackNodeOptions options)]
interface AudioMediaTrackNode {
  readonly attribute MediaStreamTrack? outputTrack;
};

// These parameters characterize a particular call to process().
interface Parameters {
  readonly attribute unsigned long long currentSample;
  readonly attribute double currentTime;
  readonly attribute unsigned long sampleCount;
};

// Format of samples. TODO: Figure out if there’s a common practice
// that we should refer to rather than defining our own enum.
enum SampleFormat {
  "float32",
  "int32",
};

// These options are given by the platform and cannot be changed by the user.
interface MediaTrackPlatformOptions {
    readonly attribute SampleFormat sampleFormat;
    readonly attribute unsigned long channelCount;
    readonly attribute float sampleRate;
};

// These are specified by the instantiator at node creation time
interface MediaTrackNodeOptions {
    attribute boolean producesOutput;
};


// The processor object is created by the platform when creating
// an AudioMediaTrackNode
[Exposed=AudioMediaTrackWorklet,
Constructor (MediaTrackPlatformOptions platformOptions,
             optional MediaTrackNodeOptions userOptions)]
interface AudioMediaTrackProcessor {
  readonly attribute MessagePort port;
  readonly attribute MediaTrackPlatformOptions platformOptions;
  readonly attribute MediaTrackNodeOptions userOptions;
  boolean process(Buffer input, Buffer? output, Parameters parameters);
};

Unlike the WebAudio API, there is no global clock; currentSample and currentTime are references to be interpreted in the context of this particular MediaStreamTrack.

The Buffer arguments are byte buffers, and must be interpreted by looking at channelCount and sampleFormat. They are allocated by the caller, and may be deallocated or reused after the process() function returns; the processing module must not retain a reference to them.

If the MediaTrackNodeOptions has producesOutput set to true, an output buffer will be passed to process().
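For example, a byte buffer holding interleaved samples can be split into per-channel arrays using channelCount and sampleFormat, roughly as follows. The helper name and the assumption of interleaved layout are illustrative; the draft does not yet pin down the channel layout.

```javascript
// Illustrative helper: split an interleaved byte buffer into per-channel
// sample arrays, using the channelCount and sampleFormat described above.
function deinterleave(buffer, { sampleFormat, channelCount }) {
  const view = sampleFormat === "float32"
    ? new Float32Array(buffer)
    : new Int32Array(buffer);
  const frames = view.length / channelCount;
  const channels = [];
  for (let c = 0; c < channelCount; c++) {
    const out = new view.constructor(frames);
    for (let f = 0; f < frames; f++) out[f] = view[f * channelCount + c];
    channels.push(out);
  }
  return channels;
}
```

Because the buffer may be reused by the platform, any such per-channel copies must be made before process() returns.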

4. Design choices

Sometimes processing needs to be done on multiple tracks. We could pass multiple tracks into the AudioMediaTrackNode constructor and have multiple Buffers passed to process(), but this would require the platform to synchronize the tracks, which works against the goal of minimizing platform processing overhead.

An alternative design for these cases is to have two (or more) worklets writing into a common SharedArrayBuffer, which the trailing processor would then read from in order to produce output.
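The SharedArrayBuffer design can be sketched as follows: two producers write into disjoint regions of a shared buffer, and the trailing processor mixes them without the platform having to synchronize the tracks. The names, region layout, and mixing operation are illustrative assumptions; a real design would also need a synchronization scheme (e.g. Atomics) for the read/write positions.

```javascript
// Two producer worklets each own a region of one SharedArrayBuffer;
// a trailing processor reads both regions to produce output.
const FRAMES = 128;
const shared = new SharedArrayBuffer(2 * FRAMES * Float32Array.BYTES_PER_ELEMENT);
const trackA = new Float32Array(shared, 0, FRAMES);
const trackB = new Float32Array(shared, FRAMES * Float32Array.BYTES_PER_ELEMENT, FRAMES);

// Each producer would fill its own region from its process() call.
function writeTrack(region, samples) {
  region.set(samples.subarray(0, FRAMES));
}

// The trailing processor reads both regions and mixes them.
function mix(output) {
  for (let i = 0; i < FRAMES; i++) output[i] = trackA[i] + trackB[i];
  return output;
}
```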

5. Emulating MediaTrackProcessor on top of WebAudio AudioWorkletNode

This should be possible, and would allow experimentation with the API in parallel with its implementation.
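One piece of such an emulation is converting between WebAudio's planar channel data (one Float32Array per channel, as delivered to AudioWorkletProcessor.process()) and the single interleaved byte buffer used by this API. A hedged sketch of that conversion, with an assumed interleaved layout:

```javascript
// Convert WebAudio's planar channel arrays into one interleaved byte
// buffer of the kind this API's process() would receive. The interleaved
// layout is an assumption; the draft leaves the layout to the platform.
function planarToInterleaved(channels) {
  const frames = channels[0].length;
  const out = new Float32Array(frames * channels.length);
  for (let c = 0; c < channels.length; c++) {
    for (let f = 0; f < frames; f++) {
      out[f * channels.length + c] = channels[c][f];
    }
  }
  return out.buffer; // the byte buffer handed to process()
}
```

An emulation layer would apply this conversion inside an AudioWorkletProcessor subclass, then invoke the AudioMediaTrackProcessor-style process() on the result.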

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WebIDL]
Cameron McCormack; Boris Zbarsky; Tobie Langel. Web IDL. 15 December 2016. ED. URL: https://heycam.github.io/webidl/
[WORKLETS-1]
Ian Kilpatrick. Worklets Level 1. 7 June 2016. WD. URL: https://www.w3.org/TR/worklets-1/

Informative References

[WebAudio]
Paul Adenot; Raymond Toy. Web Audio API. 18 September 2018. CR. URL: https://www.w3.org/TR/webaudio/

IDL Index

[Exposed=Window, SecureContext]
interface AudioMediaTrackWorklet: Worklet {
};

// This object is created by the application in order to instantiate
// a worklet containing the AudioMediaTrackProcessor.
[Exposed=Window, SecureContext,
Constructor(MediaStreamTrack inputTrack, MediaTrackNodeOptions options)]
interface AudioMediaTrackNode {
  readonly attribute MediaStreamTrack? outputTrack;
};

// These parameters characterize a particular call to process().
interface Parameters {
  readonly attribute unsigned long long currentSample;
  readonly attribute double currentTime;
  readonly attribute unsigned long sampleCount;
};

// Format of samples. TODO: Figure out if there’s a common practice
// that we should refer to rather than defining our own enum.
enum SampleFormat {
  "float32",
  "int32",
};

// These options are given by the platform and cannot be changed by the user.
interface MediaTrackPlatformOptions {
    readonly attribute SampleFormat sampleFormat;
    readonly attribute unsigned long channelCount;
    readonly attribute float sampleRate;
};

// These are specified by the instantiator at node creation time
interface MediaTrackNodeOptions {
    attribute boolean producesOutput;
};


// The processor object is created by the platform when creating
// an AudioMediaTrackNode
[Exposed=AudioMediaTrackWorklet,
Constructor (MediaTrackPlatformOptions platformOptions,
             optional MediaTrackNodeOptions userOptions)]
interface AudioMediaTrackProcessor {
  readonly attribute MessagePort port;
  readonly attribute MediaTrackPlatformOptions platformOptions;
  readonly attribute MediaTrackNodeOptions userOptions;
  boolean process(Buffer input, Buffer? output, Parameters parameters);
};