API specification: WebRTC Audio Worklet
1. Introduction
This document constitutes an extension to “Media Capture and Streams”. It specifies an API that allows convenient access to raw audio data for processing purposes.
Unlike [WebAudio], the aim of this specification is to provide a special-purpose API for efficient processing of audio data, imposing minimal overhead over what could be achieved by embedding the processing inside the browser itself.
The target for this API is functions that need to be implemented efficiently, with minimum additional overhead and minimal required conversions. The audio format is therefore deliberately not constrained to a single representation; the platform is free to choose from a wide range of formats, and applications are expected to adapt to whatever they are given.
2. Processing model
This API adopts the “worklet” model: the application provides a JavaScript module, which is loaded into a context separate from the main JavaScript application. In that context, a specific function is called for each buffer of audio data. The buffer contains the audio data itself, together with enough information to ascertain its format. There is an API for writing audio data (in the same format as the incoming data), but the processing model also allows applications that do not use it.
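Concretely, a processing module might be shaped like the sketch below. The registerProcessor() registration hook is a hypothetical placeholder, borrowed by analogy from [WebAudio]'s AudioWorklet; this document does not define how a module names its processor. The AudioMediaTrackProcessor interface is defined in section 3.

    // processor-module.js: loaded into the separate worklet context.
    class MyProcessor extends AudioMediaTrackProcessor {
      process(input, output, parameters) {
        // Called once per buffer of audio data; `input` carries the samples,
        // `parameters` describes this particular call, and `output` (when
        // present) is where processed audio may be written back.
        return true;  // boolean return; its meaning is not pinned down here
      }
    }

    // Hypothetical registration hook, by analogy with AudioWorklet;
    // not defined by this specification.
    registerProcessor('my-processor', MyProcessor);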
3. Interface definition
[Exposed=Window, SecureContext]
interface AudioMediaTrackWorklet : Worklet {
};

// This object is created by the application in order to instantiate
// a worklet containing the AudioMediaTrackProcessor.
[Exposed=Window, SecureContext,
 Constructor(MediaStreamTrack inputTrack, MediaTrackNodeOptions options)]
interface AudioMediaTrackNode {
  readonly attribute MediaStreamTrack? outputTrack;
};

// These parameters characterize a particular call to process().
interface Parameters {
  readonly attribute unsigned long long currentSample;
  readonly attribute double currentTime;
  readonly attribute unsigned long sampleCount;
};

// Format of samples. TODO: Figure out if there’s a common practice
// that we should refer to rather than defining our own enum.
enum SampleFormat {
  "float32",
  "int32",
};

// These options are given by the platform and cannot be changed by the user.
interface MediaTrackPlatformOptions {
  readonly attribute SampleFormat sampleFormat;
  readonly attribute unsigned long channelCount;
  readonly attribute float sampleRate;
};

// These are specified by the instantiator at node creation time.
interface MediaTrackNodeOptions {
  attribute boolean producesOutput;
};

// The processor object is created by the platform when creating
// an AudioMediaTrackNode.
[Exposed=AudioMediaTrackWorklet,
 Constructor(MediaTrackPlatformOptions platformOptions,
             optional MediaTrackNodeOptions userOptions)]
interface AudioMediaTrackProcessor {
  readonly attribute MessagePort port;
  readonly attribute MediaTrackPlatformOptions platformOptions;
  readonly attribute MediaTrackNodeOptions userOptions;
  boolean process(Buffer input,
                  Buffer? output,
                  Parameters parameters);
};
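For illustration, a main-thread caller might wire this up roughly as in the sketch below. The addModule() call is inherited from Worklet; how a node becomes associated with a particular loaded module is not specified above, so that step is elided, and the module URL and the assumption that producesOutput populates outputTrack are illustrative.

    // Main thread: load the processing module and attach a node to a track.
    const worklet = new AudioMediaTrackWorklet();
    await worklet.addModule('processor-module.js');  // inherited from Worklet

    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const [inputTrack] = stream.getAudioTracks();

    // producesOutput: true asks the platform to pass an output buffer to each
    // process() call; the processed audio is assumed to appear as outputTrack.
    const node = new AudioMediaTrackNode(inputTrack, { producesOutput: true });
    const processedTrack = node.outputTrack;  // MediaStreamTrack, possibly null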
Unlike the WebAudio API, there is no global clock; currentSample and currentTime are references to be interpreted in the context of this particular MediaStreamTrack.
The Buffer arguments are byte buffers, and must be interpreted in light of channelCount and sampleFormat. They are allocated by the calling process, and may be deallocated or reused after the process() function returns; the processing module must not hold on to a reference to them.
If producesOutput in the MediaTrackNodeOptions is true, an output buffer will be passed to process().
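For example, a process() implementation might interpret and fill the buffers as sketched below. The sketch assumes the Buffer arguments behave like ArrayBuffers and that multi-channel samples are interleaved; neither detail is pinned down above.

    // A method of an AudioMediaTrackProcessor subclass in the worklet module.
    process(input, output, parameters) {
      const { sampleFormat, channelCount } = this.platformOptions;
      const View = (sampleFormat === 'float32') ? Float32Array : Int32Array;
      const samples = new View(input);  // sampleCount * channelCount entries,
                                        // assuming interleaved channels

      if (output !== null) {            // present when producesOutput was true
        const out = new View(output);
        for (let i = 0; i < samples.length; i++) {
          out[i] = samples[i] / 2;      // e.g. attenuate by roughly 6 dB
        }
      }
      return true;  // input and output must not be retained past this point
    }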
4. Design choices
Sometimes, processing will want to span multiple tracks. We could pass multiple tracks into the AudioMediaTrackNode constructor and have multiple Buffers passed to process(), but this would require the platform to synchronize the tracks, which works against the goal of minimizing platform processing overhead.
An alternative design for these cases is to have two (or more) worklets write into a common SharedArrayBuffer, which a trailing processor then reads from in order to produce output, as sketched below.
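A minimal sketch of that alternative, assuming float32 samples, naive ring-buffer indexing, and that the SharedArrayBuffer reaches each producer worklet over the processor's MessagePort; all three are assumptions rather than anything specified above.

    // Each producer worklet copies its track's samples into a shared ring
    // buffer instead of producing an output track.
    class FanInProducer extends AudioMediaTrackProcessor {
      constructor(platformOptions, userOptions) {
        super(platformOptions, userOptions);
        this.ring = null;  // Float32Array over a SharedArrayBuffer
        // Assumed delivery mechanism: the main thread posts the shared
        // buffer over this processor's MessagePort.
        this.port.onmessage = (e) => { this.ring = new Float32Array(e.data); };
      }

      process(input, output, parameters) {
        if (this.ring !== null) {
          const samples = new Float32Array(input);
          const base = parameters.currentSample % this.ring.length;
          for (let i = 0; i < samples.length; i++) {
            this.ring[(base + i) % this.ring.length] = samples[i];
          }
        }
        return true;
      }
    }
    // The trailing processor reads both rings, aligns them by currentSample,
    // and mixes in application code, taking on the synchronization cost
    // that the platform would otherwise bear.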