WebRTC Frame Event Logging API

Draft Community Group Report,

This version:
http://example.com/url-this-spec-will-live-at
Issue Tracking:
GitHub
Editor:
(Google www.google.com)

Abstract

This specification presents an API for obtaining frame-level information about the time it takes to process video or audio frames through a WebRTC pipeline.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction {#intro}

This document describes an event logging API that can serve as an extension to RTCPeerConnection’s “getStats” API. The chief goals of this extension are:

- Get information about a series of events
- Allow the user to get information about all events with reasonable overhead
- Build an extensible framework for carrying event information

The initial version of this API is an object that can be attached to a MediaStreamTrack (which defines its source) and to one of the track's destinations (since a track can have multiple destinations).

The initial object we record information about is a video frame. Later extensions can cover audio frames as well, with little change to the design.
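For example (non-normative), a page might attach a collection to the send side of a video track as sketched below. The use of an RTCRtpSender as the destination argument and the buffer size chosen are illustrative assumptions, not requirements of this specification.

// Non-normative sketch: attach an event collection to a video track's send side.
// Using an RTCRtpSender as the "destination" is an assumption for illustration.
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [track] = stream.getVideoTracks();

const pc = new RTCPeerConnection();
const sender = pc.addTrack(track, stream);

// Keep up to 100 events between calls to collectEvents() (the default is 50).
const collection = new RTCEventCollection(track, sender, { eventBufferSize: 100 });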

1.1. Use cases

Imagine a remote-control application: a control allows the user to click on the screen locally, which causes changes in the video generated remotely. The app wishes to collect latency information on the time between the click and the user seeing the result.

The click can be timed using existing mechanisms. The click event will then be sent to the remote app, and the remote app will identify the first frame generated after the click, use the recording API to figure out when it was generated (in its own clock), when it was sent out over the network, and what its RTP sequence number was. It then returns this information to the local app.
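For example (non-normative), the remote app's side could be sketched as follows. Here sendToLocalApp() stands in for whatever signalling channel the app already has, and clickAppliedTime is assumed to be the remote-clock time at which the click took effect; both are assumptions of the example.

// Non-normative sketch (remote side). "collection" is an RTCEventCollection
// attached to the sending track; sendToLocalApp() is a hypothetical signalling helper.
async function reportFirstFrameAfterClick(collection, clickAppliedTime) {
  const results = await collection.collectEvents();
  // First frame captured after the click took effect (all times in the remote clock).
  const frame = results.find(r => r.initialTime >= clickAppliedTime);
  if (frame) {
    sendToLocalApp({
      frameIdentifier: frame.frameIdentifier, // RTP timestamp of the frame
      capturedAt: frame.initialTime,
      sentAt: frame.sendEnd
    });
  }
}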

The local app will use the event recording API to record when the frame with the given RTP sequence number arrived, and when it was displayed (in the local clock). It can then measure the click-to-display lag.

It can use the data recorded remotely to figure out whether generation and sending took a long time; it cannot measure the network delay between the parties exactly (since the clocks are unsynchronized), but it can put bounds on where the lag originates. This helps greatly in locating problem spots.
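For example (non-normative), the local app could combine its own click timestamp with the report received from the remote app as sketched below; the message format matches the previous sketch and is an assumption of the example.

// Non-normative sketch (local side). "collection" is attached to the received track;
// "report" is the message from the remote sketch above; clickTime is the locally
// recorded click timestamp.
async function measureClickToDisplay(collection, report, clickTime) {
  const results = await collection.collectEvents();
  const frame = results.find(r =>
      r.frameIdentifier === report.frameIdentifier && r.disposition === "displayed");
  if (frame) {
    const clickToDisplayMs = frame.finalTime - clickTime;         // local clock only
    const remoteProcessingMs = report.sentAt - report.capturedAt; // remote clock only
    console.log({ clickToDisplayMs, remoteProcessingMs });
  }
}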

2. API

[Constructor(MediaStreamTrack source, any destination, optional EventCollectionParameters? parameters)]
interface RTCEventCollection : EventTarget {
    Promise<void> setEventCollection(optional EventCollectionParameters parameters);
    Promise<sequence<EventCollectionResult>> collectEvents();
};

dictionary EventCollectionParameters {
     [EnforceRange] long eventBufferSize = 50;
};

// eventBufferSize is the maximum number of events that will be stored. If more events
// than this accumulate between calls to collectEvents(), the oldest events are discarded.

dictionary EventCollectionResult {
    DOMHighResTimeStamp initialTime;  // Time when frame was recorded or received
    DOMHighResTimeStamp finalTime;    // Time when frame was displayed, stored,
                                      // sent or discarded
};

enum EventCollectionDisposition {
    "displayed",  // Displayed to the user
    "discarded",  // Normal operation, such as a paused , caused discard
    "failed",   // Something bad happened to this frame - corruption, congestion….
    "transmitted", // Gone out over the wire
    "recorded"   // such as by MediaRecorder
};

dictionary EventCollectionVideoFrameResult : EventCollectionResult {
    unsigned long frameIdentifier;  // RTP timestamp value of the frame
    long payloadType;
    long qpValue;
    EventCollectionDisposition disposition;
    // Intermediate events in a frame's lifetime.
    // OPEN ISSUE: We might want to define a dict for "eventtype, start, end" and use
    // a sequence of those instead. If so, it can move to the "generic" framework.
    DOMHighResTimeStamp encodeStart;
    DOMHighResTimeStamp encodeEnd;
    DOMHighResTimeStamp sendStart;
    DOMHighResTimeStamp sendEnd;
    // On the receive side
    DOMHighResTimeStamp receiveStart;
    DOMHighResTimeStamp receiveEnd;
    DOMHighResTimeStamp decodeStart;
    DOMHighResTimeStamp decodeEnd;
    DOMHighResTimeStamp renderStart;
};

These dictionaries and enums can be extended as needed. This provides an interface that meets the goals listed in the introduction: information about a series of events, collected at reasonable overhead, within an extensible framework.
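For example (non-normative), a page could poll collectEvents() periodically and derive per-stage timings from the video frame results; the polling interval and the aggregation shown are illustrative only.

// Non-normative sketch: periodic polling and per-stage timing on the send side.
// "collection" is an RTCEventCollection attached to a sending track.
setInterval(async () => {
  const results = await collection.collectEvents();
  for (const r of results) {
    if (r.disposition === "transmitted") {
      const encodeMs = r.encodeEnd - r.encodeStart;
      const sendMs = r.sendEnd - r.sendStart;
      console.log(`frame ${r.frameIdentifier}: encode ${encodeMs} ms, send ${sendMs} ms`);
    }
  }
}, 1000);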

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.


References

Normative References

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WebIDL-1]
Cameron McCormack; Boris Zbarsky. WebIDL Level 1. 15 September 2016. PR. URL: https://heycam.github.io/webidl/
[WHATWG-DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/

IDL Index

[Constructor(MediaStreamTrack source, any destination, optional EventCollectionParameters? parameters)]
interface RTCEventCollection : EventTarget {
    Promise<void> setEventCollection(optional EventCollectionParameters parameters);
    Promise<sequence<EventCollectionResult>> collectEvents();
};

dictionary EventCollectionParameters {
     [EnforceRange] long eventBufferSize = 50;
};

// eventBufferSize is the maximum number of events that will be stored. If more events
// than this accumulate between calls to collectEvents(), the oldest events are discarded.

dictionary EventCollectionResult {
    DOMHighResTimeStamp initialTime;  // Time when frame was recorded or received
    DOMHighResTimeStamp finalTime;    // Time when frame was displayed, stored,
                                      // sent or discarded
};

enum EventCollectionDisposition {
    "displayed",  // Displayed to the user
    "discarded",  // Normal operation, such as a paused , caused discard
    "failed",   // Something bad happened to this frame - corruption, congestion….
    "transmitted", // Gone out over the wire
    "recorded"   // such as by MediaRecorder
};

dictionary EventCollectionVideoFrameResult : EventCollectionResult {
    unsigned long frameIdentifier;  // RTP timestamp value of the frame
    long payloadType;
    long qpValue;
    EventCollectionDisposition disposition;
    // Intermediate events in a frame's lifetime.
    // OPEN ISSUE: We might want to define a dict for "eventtype, start, end" and use
    // a sequence of those instead. If so, it can move to the "generic" framework.
    DOMHighResTimeStamp encodeStart;
    DOMHighResTimeStamp encodeEnd;
    DOMHighResTimeStamp sendStart;
    DOMHighResTimeStamp sendEnd;
    // On the receive side
    DOMHighResTimeStamp receiveStart;
    DOMHighResTimeStamp receiveEnd;
    DOMHighResTimeStamp decodeStart;
    DOMHighResTimeStamp decodeEnd;
    DOMHighResTimeStamp renderStart;
};