This API defines a worklet for handling the samples in an audio stream.
Status of this document
This will be introduced.
API specification: WebRTC Audio Worklet
1. Introduction
This document constitutes an extension to “Media Capture and Streams”. It specifies an API that allows convenient access to raw audio data for processing purposes.
Unlike [WebAudio], the aim of this specification is to provide a special-purpose API for the efficient processing of audio data, with minimal overhead imposed over what can be achieved by embedding the processing inside the browser.
The target for this API is functions that need to be implemented efficiently, with minimum additional overhead and minimal required conversions. The audio format is therefore deliberately not constrained to a single choice; the platform is free to choose from a wide range of formats, and applications are expected to adapt to the platform's choice.
2. Processing model
This API adopts the “worklet” model: the application loads a JavaScript module, which is executed in a context separate from the main JavaScript application. In this context, a specific function is called for each buffer of audio data.
The buffer contains enough information to ascertain the format of the audio data, and the audio data itself. There exists an API for writing audio data (in the same format as the incoming data), but the processing model allows applications that do not use this API.
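As a non-normative sketch, a loaded module could define its per-buffer function along the following lines. The class-and-registration pattern (mirroring the AudioWorkletProcessor pattern of [WebAudio]) and the registerAudioMediaTrackProcessor() entry point are illustrative assumptions only; the AudioMediaTrackProcessor interface itself is defined in section 3.
// Worklet module (illustrative sketch, not defined by this document).
class PassThroughProcessor extends AudioMediaTrackProcessor {
  process(input, output, parameters) {
    // Called once per buffer. The layout of `input` is described by
    // this.platformOptions (sampleFormat, channelCount, sampleRate).
    return true; // keep receiving buffers
  }
}
// Hypothetical registration entry point, assumed for illustration.
registerAudioMediaTrackProcessor("pass-through", PassThroughProcessor);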
3. Interface definition
[Exposed=Window, SecureContext]
interface AudioMediaTrackWorklet : Worklet {
};
// This object is created by the application in order to instantiate
// a worklet containing the AudioMediaTrackProcessor.
[Exposed=Window, SecureContext,
 Constructor(MediaStreamTrack inputTrack, MediaTrackNodeOptions options)]
interface AudioMediaTrackNode {
  readonly attribute MediaStreamTrack? outputTrack;
};
// These parameters characterize a particular call to process().
interface Parameters {
  readonly attribute unsigned long long currentSample;
  readonly attribute double currentTime;
  readonly attribute unsigned long sampleCount;
};
// Format of samples. TODO: Figure out if there’s a common practice
// that we should refer to rather than defining our own enum.
enum SampleFormat {
  "float32",
  "int32",
};
// These options are given by the platform and cannot be changed by the user.
interface MediaTrackPlatformOptions {
  readonly attribute SampleFormat sampleFormat;
  readonly attribute unsigned long channelCount;
  readonly attribute float sampleRate;
};
// These are specified by the instantiator at node creation time.
interface MediaTrackNodeOptions {
  attribute boolean producesOutput;
};
// The processor object is created by the platform when creating
// an AudioMediaTrackNode.
[Exposed=AudioMediaTrackWorklet,
 Constructor(MediaTrackPlatformOptions platformOptions,
             optional MediaTrackNodeOptions userOptions)]
interface AudioMediaTrackProcessor {
  readonly attribute MessagePort port;
  readonly attribute MediaTrackPlatformOptions platformOptions;
  readonly attribute MediaTrackNodeOptions userOptions;
  boolean process(Buffer input, Buffer? output, Parameters parameters);
};
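As a non-normative illustration of the intended main-thread usage. How the worklet instance is obtained and how it is associated with the node are not defined by this document; the constructor call and module URL below are assumptions.
// Main thread (illustrative sketch only).
const worklet = new AudioMediaTrackWorklet(); // instantiation mechanism assumed
await worklet.addModule("processor-module.js"); // placeholder URL
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [inputTrack] = stream.getAudioTracks();
const node = new AudioMediaTrackNode(inputTrack, { producesOutput: true });
// node.outputTrack carries the processed audio and can be played out
// locally or sent over an RTCPeerConnection.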
Unlike the WebAudio API, there is no global clock; currentSample and currentTime are to be interpreted in the context of this particular MediaStreamTrack.
The Buffer arguments are byte buffers and must be interpreted using channelCount and sampleFormat. They are allocated by the calling process and may be deallocated or reused after the process() function returns; the processing module must not hold on to a reference to them.
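As a non-normative sketch, a processor might interpret the input by constructing a typed-array view over it, choosing the view type from sampleFormat. The assumptions that a Buffer can be viewed as an ArrayBuffer and that channels are interleaved are illustrative only; this document specifies neither.
// Illustrative sketch: compute the peak amplitude of one input buffer.
function peakOf(input, platformOptions, parameters) {
  const View = platformOptions.sampleFormat === "float32" ? Float32Array : Int32Array;
  const samples = new View(input, 0,
      parameters.sampleCount * platformOptions.channelCount);
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  return peak;
}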
If the MediaTrackNodeOptions has producesOutput set to true, an output buffer will be passed to process().
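Under the same interleaving and ArrayBuffer assumptions as the sketch above, a process() implementation that produces output might look roughly like this (non-normative):
// Illustrative sketch: copy input to output with a fixed attenuation.
process(input, output, parameters) {
  const View = this.platformOptions.sampleFormat === "float32" ? Float32Array : Int32Array;
  const length = parameters.sampleCount * this.platformOptions.channelCount;
  const inView = new View(input, 0, length);
  const outView = new View(output, 0, length);
  for (let i = 0; i < length; i++) {
    outView[i] = inView[i] * 0.5; // roughly -6 dB
  }
  return true;
}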
4. Design choices
Sometimes processing needs to be done on multiple tracks. We could pass
multiple tracks into the AudioMediaTrackNode constructor and have multiple
Buffers passed to process(), but this would require the platform to
synchronize the tracks, which works against the goal of minimizing platform
processing overhead.
An alternative design for these cases is to have two (or more) worklets
writing into a common SharedArrayBuffer, which the trailing processor would
then read from in order to produce output.
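A non-normative sketch of the producer side of this alternative follows. The delivery of the SharedArrayBuffer and write index over the processor's MessagePort, and the assumption of float32 interleaved samples, are illustrative only.
// Illustrative sketch: one of several producers writing into a shared ring buffer.
class ProducerProcessor extends AudioMediaTrackProcessor {
  constructor(platformOptions, userOptions) {
    super(platformOptions, userOptions);
    this.port.onmessage = (event) => {
      this.ring = new Float32Array(event.data.samples);   // SharedArrayBuffer
      this.writeIndex = new Int32Array(event.data.index);  // SharedArrayBuffer
    };
  }
  process(input, output, parameters) {
    if (!this.ring) return true; // shared storage not delivered yet
    const samples = new Float32Array(input, 0,
        parameters.sampleCount * this.platformOptions.channelCount);
    let w = Atomics.load(this.writeIndex, 0);
    for (let i = 0; i < samples.length; i++) {
      this.ring[(w + i) % this.ring.length] = samples[i];
    }
    Atomics.store(this.writeIndex, 0, (w + samples.length) % this.ring.length);
    return true;
  }
}
// The trailing processor would read from the shared ring buffers of all
// producers in its own process() calls and write the mixed result to its output.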
5. Emulating AudioMediaTrackProcessor on top of WebAudio AudioWorkletNode
This should be possible, and would allow experimentation with the API in parallel with implementation.
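A non-normative sketch of such an emulation, using only existing [WebAudio] and Media Capture APIs; the module URL and processor name are placeholders.
// Main thread: route a MediaStreamTrack through an AudioWorkletNode.
const context = new AudioContext();
await context.audioWorklet.addModule("emulated-processor.js"); // placeholder URL
const source = context.createMediaStreamSource(new MediaStream([inputTrack]));
const node = new AudioWorkletNode(context, "emulated-processor");
const destination = context.createMediaStreamDestination();
source.connect(node).connect(destination);
const outputTrack = destination.stream.getAudioTracks()[0];
Inside the AudioWorkletProcessor, samples arrive as planar Float32Array channels in blocks of 128 frames (the default render quantum), so the emulation layer would need to repackage them into the buffer-and-options shape defined above.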
Conformance
Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL”
in the normative parts of this document
are to be interpreted as described in RFC 2119.
However, for readability,
these words do not appear in all uppercase letters in this specification.
All of the text of this specification is normative
except sections explicitly marked as non-normative, examples, and notes. [RFC2119]
Examples in this specification are introduced with the words “for example”
or are set apart from the normative text with class="example", like this:
This is an example of an informative example.
Informative notes begin with the word “Note”
and are set apart from the normative text with class="note", like this: