In this blog post, I will summarize my notes taken while learning the WebRTC technology and its implementations. Because I'm a newbie to the concept, the article can be read as a WebRTC tutorial.
Introduction to WebRTC API
What is WebRTC?
WebRTC (Web Real-Time Communication) is a technology that allows your application to send video, voice, and generic data between peers.
In other words, we can build powerful voice and video-communication applications with this open standard.
WebRTC is an open-source project and is supported by the major players of the internet: Apple, Google, Microsoft, and Mozilla.
What are the benefits of using WebRTC?
- It allows audio and video communication to work inside web pages.
- It enables direct peer-to-peer communication.
- It eliminates the need to install plugins and native apps.
What platforms can support WebRTC?
The WebRTC implementation is available as a regular JavaScript API in all modern browsers and native clients for Android and iOS.
The maturity level of WebRTC
Its specifications have been published by the World Wide Web Consortium (W3C). W3C specifications pass through four maturity levels. In ascending order, they are:
- Working draft (WD): A relatively early stage. The standard document may differ significantly from its final form, so a considerable number of changes can still be made.
- Candidate recommendation (CR): At this stage, the significant features of the specifications are mostly decided.
- Proposed recommendation (PR): At this stage, the document is submitted to the W3C Advisory Committee for final approval. This step rarely causes any significant changes to a standard as it passes to the next phase.
- W3C recommendation (REC): This is the most mature stage of development. At this point, the standard has undergone extensive review and testing under both theoretical and practical conditions.
Nearly a decade passed between the first draft of WebRTC and its final recommendation. Some of the milestones for WebRTC are as follows:
- October 2011: the W3C published its first draft for the spec.
- February 2013: first cross-browser video call.
- February 2014: first cross-browser data transfers.
- November 2017: Working Draft to Candidate Recommendation.
- January 2021: Candidate Recommendation to Recommendation.
WebRTC API
WebRTC consists of several interrelated APIs and protocols which work together to achieve this.
It mainly covers two different technologies:
- Media capturing
- Peer-to-peer connection.
Also, it has three JavaScript APIs for significant components of WebRTC.
- MediaStream: Access the input devices (microphone, webcam) and get a stream of media.
- RTCPeerConnection: Connect to another WebRTC endpoint across the internet and send audio and video in real time.
- RTCDataChannel: Send not just audio and video but also arbitrary application data.
These APIs exist for three main tasks, respectively:
- Acquiring audio and video
- Communicating audio and video
- Communicating arbitrary data
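As a rough illustration of how these three tasks map onto browser globals, here is a minimal feature-detection sketch. The function name `detectWebRTCSupport` and the property names it returns are my own and not part of any API; only `navigator.mediaDevices.getUserMedia`, `RTCPeerConnection`, and `createDataChannel` come from WebRTC itself.

```javascript
// Minimal feature-detection sketch. `scope` defaults to the global object;
// the function name and the returned property names are illustrative only.
function detectWebRTCSupport(scope = globalThis) {
  const mediaDevices = scope.navigator && scope.navigator.mediaDevices;
  const hasPeerConnection = typeof scope.RTCPeerConnection === "function";
  return {
    // Acquiring audio and video
    mediaCapture: Boolean(
      mediaDevices && typeof mediaDevices.getUserMedia === "function"
    ),
    // Communicating audio and video
    peerConnection: hasPeerConnection,
    // Communicating arbitrary data (data channels are created from a peer connection)
    dataChannel:
      hasPeerConnection &&
      typeof scope.RTCPeerConnection.prototype.createDataChannel === "function",
  };
}

console.log(detectWebRTCSupport());
```

Running this in the browser console gives a quick overview of what the current browser exposes.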

Simple media stream example
Before digging deeper into the technical details, I want to show you a simple example.
I'm going to make some minor changes and put the JavaScript code in HTML. Then your browser will render a small rectangle with a button.
If you click the button, you'll see a permission request to open your camera. If you give the permission, then your camera is going to open for you. You'll also find the modified code below.
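A minimal sketch of such a page script could look like the following. The element ids `start` and `preview` and the helper name `openCamera` are assumptions made for illustration; the only WebRTC calls used are `getUserMedia()` and the `srcObject` property of the video element.

```javascript
// Sketch only: the element ids "start" and "preview" are assumptions.
// Expected markup: <button id="start">Open camera</button>
//                  <video id="preview" autoplay playsinline></video>
async function openCamera(videoElement, mediaDevices) {
  // Triggers the browser's permission prompt for the camera.
  const stream = await mediaDevices.getUserMedia({ video: true });
  // Show the live camera feed in the <video> element.
  videoElement.srcObject = stream;
  return stream;
}

// Wire up the button only when running in a page that has the markup above.
if (typeof document !== "undefined") {
  const button = document.getElementById("start");
  const video = document.getElementById("preview");
  if (button && video) {
    button.addEventListener("click", () => {
      openCamera(video, navigator.mediaDevices).catch((err) => {
        console.error("Could not open the camera:", err.name);
      });
    });
  }
}
```

Passing `mediaDevices` in as a parameter keeps the helper easy to try out with a stand-in object.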
Now we can proceed to the next phase and get our hands dirty.
Media Devices
The Media Stream API is the top-level API, and it provides interfaces and methods for working with streams. I'm not going to explain every method and property but will focus on the MediaDevices API.
You'll remember the function that requested your permission to open your camera. Let's look at its components and their brief definitions.
According to MDN:
- Navigator object: The Navigator interface represents the state and the identity of the user agent. It allows scripts to query it and to register themselves to carry on some activities. You can retrieve the read-only `navigator` object via `window.navigator`.
- MediaDevices object: The mediaDevices read-only property returns a MediaDevices object, which provides access to connected media input devices like cameras and microphones, as well as screen sharing. You can access it via `navigator.mediaDevices`.
- getUserMedia method: The MediaDevices.getUserMedia() method prompts the user for permission to use a media input which produces a MediaStream with tracks containing the requested types of media.
![[static/img/media-stream-getusermedia2.webp]]
The MediaDevices interface
We are going to use this object to obtain access to any hardware source of media data. The object has 5 methods and also one event listener property:
- ondevicechange (event)
- enumerateDevices()
- getDisplayMedia()
- getUserMedia()
- getSupportedConstraints()
- selectAudioOutput()
Since it is out of scope, I'm not going to cover the selectAudioOutput() method.
1) Listening for device changes (devicechange)
We can also add an event listener to the object to listen to device changes.
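Here is a small sketch of such a listener. The helper name `watchDeviceChanges` is hypothetical; it relies only on the standard `addEventListener`/`removeEventListener` methods and the `devicechange` event.

```javascript
// Sketch: registers a devicechange listener and returns a function
// that removes it again.
function watchDeviceChanges(mediaDevices, onChange) {
  const handler = () => onChange();
  mediaDevices.addEventListener("devicechange", handler);
  return () => mediaDevices.removeEventListener("devicechange", handler);
}

// Only runs where navigator.mediaDevices exists (a browser in a secure context).
if (typeof navigator !== "undefined" && navigator.mediaDevices) {
  watchDeviceChanges(navigator.mediaDevices, () => {
    console.log("A media device was connected or disconnected.");
  });
}
```

The event itself carries no device list, so a handler would typically call `enumerateDevices()` again to see what changed.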
2) Querying media devices (enumerateDevices)
In many cases, we need to check all the connected cameras and microphones to provide useful feedback to users. To do that, we use the `navigator.mediaDevices.enumerateDevices()` method. It returns a promise that resolves to an array of `MediaDeviceInfo` objects.
Each `MediaDeviceInfo` contains a property named `kind` with the value `audioinput`, `audiooutput`, or `videoinput`, indicating what type of media device it is.
You can also inspect your device by executing the below code in your browser console.
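A possible snippet for this, written as a sketch: the helper name `listDevicesByKind` and the `"(label hidden)"` placeholder are my own choices. Note that browsers may return empty labels until the user has granted media permission.

```javascript
// Sketch: groups the connected devices by their `kind` property.
async function listDevicesByKind(mediaDevices) {
  const devices = await mediaDevices.enumerateDevices();
  const grouped = { audioinput: [], audiooutput: [], videoinput: [] };
  for (const device of devices) {
    if (grouped[device.kind]) {
      // The label can be an empty string before permission is granted.
      grouped[device.kind].push(device.label || "(label hidden)");
    }
  }
  return grouped;
}

// Only runs where navigator.mediaDevices exists (a browser in a secure context).
if (typeof navigator !== "undefined" && navigator.mediaDevices) {
  listDevicesByKind(navigator.mediaDevices).then(console.log).catch(console.error);
}
```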
3) Sharing/recording display (getDisplayMedia)
The method is for sharing your display or some part of it.
The method prompts the user to select a display or a portion of it (such as a window) whose contents will be captured, and produces a MediaStream from it. The resulting stream can then be recorded using the MediaStream Recording API.
It is clear that the getDisplayMedia method can be used for nefarious activities against the user. For this reason, browsers take the following precautions:
- The permission acquired from the user is not persisted as it is for the `getUserMedia()` method. Therefore, the user must be prompted every time.
- The call to `getDisplayMedia()` must be made from code running in response to a user action, such as in an event handler.
- The specified constraints can't be used to limit the options available to the user. Instead, they must be applied after the user chooses a source, to generate output that matches the constraints.
- Browsers are encouraged to warn users about sharing displays or windows that contain browsers and keep a close eye on what other content might be getting captured and shown to other users.
The method optionally takes a constraints parameter and returns a promise that resolves to a MediaStream.
The constraints for `getDisplayMedia()` differ from the constraints that are used for regular media input.
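A minimal sketch of starting and stopping a capture follows. The helper names `startScreenCapture` and `stopCapture` and the button id `share` are assumptions for illustration; `getDisplayMedia()`, `getTracks()`, and `stop()` are the standard calls.

```javascript
// Sketch only. Browsers require getDisplayMedia() to be called from a user
// gesture (for example a click handler), and they prompt on every call.
async function startScreenCapture(mediaDevices, options = { video: true, audio: false }) {
  const stream = await mediaDevices.getDisplayMedia(options);
  return stream;
}

// Ends the capture by stopping every track of the stream.
function stopCapture(stream) {
  stream.getTracks().forEach((track) => track.stop());
}

// Hypothetical wiring, assuming a <button id="share"> exists on the page.
if (typeof document !== "undefined") {
  const shareButton = document.getElementById("share");
  if (shareButton) {
    shareButton.addEventListener("click", async () => {
      try {
        const stream = await startScreenCapture(navigator.mediaDevices);
        console.log("Capturing", stream.getVideoTracks().length, "video track(s)");
      } catch (err) {
        console.error("Screen capture failed or was declined:", err.name);
      }
    });
  }
}
```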
4) Acquiring audio and video (getUserMedia)
The `getUserMedia()` function is responsible for acquiring audio and video. The function does two things:
- Prompts the user for permission to use media input.
- Produces a MediaStream.
The other important things to know are:
- It takes only one argument: a `MediaStreamConstraints` object (we will cover it soon).
- When the function runs, it returns a Promise that resolves to a MediaStream with tracks containing the requested media types.
- Because the user is not required to make a choice at all, the returned promise may never be resolved or rejected.
- It is important to note that the `getUserMedia()` function is available only in secure contexts (`HTTPS`, `file:///`, `localhost`). Otherwise, `navigator.mediaDevices` is `undefined`.
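The last two points can be handled defensively. The following is only a sketch of one way to do it, not part of the API: the wrapper name `getUserMediaWithTimeout`, its parameters, and the error messages are all my own. It fails early when `navigator.mediaDevices` is missing and gives up if the prompt is never answered.

```javascript
// Sketch: wraps getUserMedia() with a guard and a timeout.
async function getUserMediaWithTimeout(constraints, timeoutMs, mediaDevices) {
  if (!mediaDevices) {
    throw new Error("navigator.mediaDevices is undefined; is this a secure context?");
  }
  let timer;
  let timedOut = false;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      timedOut = true;
      reject(new Error("The permission prompt was not answered in time."));
    }, timeoutMs);
  });
  const request = mediaDevices.getUserMedia(constraints);
  // If the user answers after we gave up, release the devices again.
  request.then(
    (stream) => {
      if (timedOut) stream.getTracks().forEach((track) => track.stop());
    },
    () => {} // rejections are reported through Promise.race below
  );
  try {
    return await Promise.race([request, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Possible usage in a page:
// getUserMediaWithTimeout({ video: true }, 30000, navigator.mediaDevices)
//   .then((stream) => console.log("Got", stream.getTracks().length, "track(s)"))
//   .catch((err) => console.error(err.message));
```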
I said that the method only takes one parameter. Let's look at this constraints parameter closer.
Constraints
It is a MediaStreamConstraints object that allows us to specify and constrain the requested media. Its audio and video members each accept either a Boolean or a MediaTrackConstraints object that describes the track in more detail.
For instance, minimum video resolution or the camera type we need must be defined in this object. Therefore, there are plenty of constraint properties we need to understand.
Before starting with a simple example, there are essential things to know about the constraints object:
- The constraints object can have audio and video properties. Either or both must be specified.
- If the browser cannot find all media tracks with the specified types that meet the constraints given, then the returned promise is rejected with a `NotFoundError` DOMException.
- Giving `true` to either of those properties means that the corresponding track is required. If it cannot be included, the method results in an error.
- Instead of `true`, a property can be given an object that specifies additional requirements with `min`, `max`, `exact`, and `ideal`.
An Example constraints object
To give an idea, an example constraints object can be:
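The values below are illustrative only and not recommendations; `width`, `height`, and `facingMode` are standard constrainable properties.

```javascript
// Illustrative constraints: audio is required, and the video track should be
// at least 640x480, ideally 1280x720, from the user-facing camera.
const constraints = {
  audio: true,
  video: {
    width: { min: 640, ideal: 1280, max: 1920 },
    height: { min: 480, ideal: 720, max: 1080 },
    facingMode: "user",
  },
};

// In a page this object would be passed straight to getUserMedia():
// navigator.mediaDevices.getUserMedia(constraints).then(...);
console.log(JSON.stringify(constraints, null, 2));
```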
Constraint types
The following constraint types are used to specify a constraint for a property.
- ConstrainBoolean: Its value may either be set to a Boolean (`true` or `false`) or an object containing `exact` or `ideal`.
- ConstrainDouble: Its value may either be set to a double-precision floating-point number or an object containing `max`, `min`, `exact`, or `ideal`.
- ConstrainDOMString: Its value may either be set to a string, an array of strings, or an object containing `exact` or `ideal`.
- ConstrainULong: Its value may either be set to an integer number or an object containing `max`, `min`, `exact`, or `ideal`.
Some examples of constraints
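The sketch below shows one constraints object per constraint type. The key names of `constraintExamples` are my own labels and the numbers are arbitrary; `echoCancellation`, `frameRate`, `facingMode`, and `width`/`height` are standard properties of the respective types.

```javascript
// One illustrative example per constraint type.
const constraintExamples = {
  // Plain Booleans: audio only, no video.
  audioOnly: { audio: true, video: false },
  // ConstrainBoolean: prefer echo cancellation but do not require it.
  cleanAudio: { audio: { echoCancellation: { ideal: true } } },
  // ConstrainDouble: at least 15 fps, ideally 30 fps.
  smoothVideo: { video: { frameRate: { min: 15, ideal: 30 } } },
  // ConstrainDOMString: the rear camera is mandatory.
  rearCameraRequired: { video: { facingMode: { exact: "environment" } } },
  // ConstrainULong: HD resolution is preferred, not required.
  hdPreferred: { video: { width: { ideal: 1280 }, height: { ideal: 720 } } },
};

console.log(Object.keys(constraintExamples));
```

Using `exact` makes a requirement mandatory, so the request fails when no device can satisfy it, whereas `ideal` lets the browser pick the closest match.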
Exceptions
Exception (DOMException) | Details |
---|---|
NotFoundError | Thrown if no media tracks of the type specified were found that satisfy the given constraints. |
NotReadableError | Thrown if, although the user granted permission to use the matching devices, a hardware error occurred at the operating system, browser, or web page level which prevented access to the device. |
OverconstrainedError | Thrown if the specified constraints resulted in no candidate devices. |
SecurityError | Thrown if user media support is disabled on the Document. |
TypeError | Thrown if the list of constraints specified is empty, or has all constraints set to false. Also thrown in a non-secure context. |
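
To show how these names might be used in practice, here is a small sketch that maps the `name` of a rejected promise's error to a user-facing message. The function name `describeMediaError` and the message texts are hypothetical.

```javascript
// Sketch: translate the exception names from the table into readable messages.
function describeMediaError(error) {
  switch (error.name) {
    case "NotFoundError":
      return "No camera or microphone matches the request.";
    case "NotReadableError":
      return "The device exists but could not be read (hardware or OS error).";
    case "OverconstrainedError":
      return "No device satisfies the given constraints.";
    case "SecurityError":
      return "User media support is disabled for this document.";
    case "TypeError":
      return "The constraints are empty or the page is not in a secure context.";
    default:
      return `Unexpected error: ${error.name}`;
  }
}

// Possible usage in a page:
// navigator.mediaDevices.getUserMedia({ video: true })
//   .catch((err) => console.error(describeMediaError(err)));
```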
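5) Constrainable properties of the user agent (getSupportedConstraints)
The method returns an object whose property names are the constraints that the user agent understands. Because only constraints supported by the user agent are included in the list, each of these Boolean properties has the value `true`.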
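As a sketch of how this can be used, the helper below (the name `unsupportedConstraints` is hypothetical) reports which of the constraint names we plan to use are not known to the user agent.

```javascript
// Sketch: returns the constraint names from `wanted` that the user agent
// does not list as supported.
function unsupportedConstraints(mediaDevices, wanted) {
  const supported = mediaDevices.getSupportedConstraints();
  return wanted.filter((name) => supported[name] !== true);
}

// Only runs where navigator.mediaDevices exists (a browser in a secure context).
if (typeof navigator !== "undefined" && navigator.mediaDevices) {
  const missing = unsupportedConstraints(navigator.mediaDevices, [
    "width",
    "height",
    "frameRate",
    "facingMode",
  ]);
  console.log("Unsupported constraints:", missing);
}
```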
The next step will be reviewing peer connections.