In this blog post, I summarize the notes I took while learning WebRTC and its implementations. Because I'm new to the concept myself, the article can also be read as a WebRTC tutorial.
Introduction to WebRTC API
What is WebRTC?
WebRTC (Web Real-Time Communication) is a technology that allows your application to send video, voice, and generic data between peers.
In other words, we can build powerful voice and video-communication applications with this open standard.
WebRTC is an open-source project supported by the major players of the internet: Apple, Google, Microsoft, and Mozilla.
What are the benefits of using WebRTC?
- It allows audio and video communication to work inside web pages.
- It enables direct peer-to-peer communication.
- It eliminates the need to install plugins and native apps.
What platforms can support WebRTC?
The maturity level of WebRTC
- Working draft (WD): Relatively early stage. The standard document may differ significantly from its final form, so a considerable number of changes can still be made.
- Candidate recommendation (CR): At this stage, the significant features of the specifications are mostly decided.
- Proposed recommendation (PR): At this stage, the document is submitted to the W3C Advisory Committee for final approval. A standard rarely changes significantly as it passes to the next phase.
- W3C recommendation (REC): This is the most mature stage of development. At this point, the standard has undergone extensive review and testing under both theoretical and practical conditions.
Nearly a decade has passed since the first draft of WebRTC was published. Some of the milestones for WebRTC are as follows:
- October 2011: the W3C published its first draft for the spec.
- February 2013: first cross-browser video call.
- February 2014: first cross-browser data transfers.
- November 2017: Working Draft to Candidate Recommendation.
- January 2021: Candidate Recommendation to Recommendation.
WebRTC consists of several interrelated APIs and protocols which work together to achieve this.
It mainly covers two different technologies:
- Media capturing
- Peer-to-peer connection
These are exposed through three main APIs:
- MediaStream: Access the input devices (microphone, webcam) and get a stream of media.
- RTCPeerConnection: Connect to another WebRTC endpoint across the internet and send audio and video in real time.
- RTCDataChannel: Do the same not just for audio and video but also for arbitrary application data.
These APIs exist for the following three main tasks, respectively:
- Acquiring audio and video
- Communicating audio and video
- Communicating arbitrary data
Simple media stream example
Before digging deeper into the technical details, I want to show you a simple example.
If you click the button, you'll see a permission request to open your camera. If you grant the permission, your camera turns on. You'll also find the modified code below.
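A minimal sketch of what such a button-driven example might look like. The element IDs (open-camera, preview) and the openCamera helper are hypothetical names chosen for illustration; only getUserMedia() and the srcObject property come from the browser API.

```javascript
// Ask for camera access and show the live stream in a <video> element.
// `mediaDevices` is injectable so the helper can be exercised outside a browser.
async function openCamera(videoElement, mediaDevices = globalThis.navigator?.mediaDevices) {
  if (!mediaDevices) {
    throw new Error("MediaDevices API is not available in this context");
  }
  // This call triggers the browser's permission prompt.
  const stream = await mediaDevices.getUserMedia({ video: true });
  videoElement.srcObject = stream; // attach the stream to the element
  return stream;
}

// Browser wiring with hypothetical element IDs; skipped where there is no DOM.
if (typeof document !== "undefined") {
  const button = document.getElementById("open-camera");
  const video = document.getElementById("preview");
  button?.addEventListener("click", () => {
    openCamera(video).catch((err) => console.error("Could not open camera:", err.name));
  });
}
```

For the picture to actually render, the video element needs the autoplay attribute or an explicit call to play().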
Now we can proceed to the next phase and get our hands dirty.
The Media Stream API is the top-level API, and it provides interfaces and methods for working with streams. I'm not going to explain every method and property; instead, I'll focus on the MediaDevices interface.
You'll remember the function that requested your permission to open your camera. Let's look at its components and their brief definitions.
According to MDN:
- Navigator Object: The Navigator interface represents the state and the identity of the user agent. It allows scripts to query it and to register themselves to carry on some activities. You can retrieve it using the read-only window.navigator property.
- MediaDevices Object: The mediaDevices read-only property returns a MediaDevices object, which provides access to connected media input devices like cameras and microphones, as well as screen sharing. You can access it via navigator.mediaDevices.
- getUserMedia Method: The MediaDevices.getUserMedia() method prompts the user for permission to use a media input which produces a MediaStream with tracks containing the requested types of media.
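The three pieces above form a chain: navigator, then mediaDevices, then getUserMedia(). A small feature check along that chain could look like the sketch below; supportsGetUserMedia is a hypothetical helper name, not part of any API.

```javascript
// Returns true when the navigator -> mediaDevices -> getUserMedia chain is usable.
// `nav` defaults to the global navigator; it can be replaced for testing.
function supportsGetUserMedia(nav = globalThis.navigator) {
  return typeof nav?.mediaDevices?.getUserMedia === "function";
}

// Typical use: decide whether to show camera-related UI at all.
if (!supportsGetUserMedia()) {
  console.warn("getUserMedia() is not available in this environment");
}
```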
The MediaDevices interface
We are going to use this object to obtain access to any hardware source of media data. The object has five methods and one event handler property:
- enumerateDevices()
- getSupportedConstraints()
- getDisplayMedia()
- getUserMedia()
- selectAudioOutput()
- ondevicechange (event)
Since it is out of scope, I'm not going to cover the selectAudioOutput method.
1) Listening for device changes (devicechange)
We can also add an event listener to the object to listen to device changes.
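As a rough sketch, subscribing to the devicechange event might look like this. The watchDeviceChanges helper and its unsubscribe return value are illustrative choices; the event name and addEventListener() are the only parts taken from the API.

```javascript
// Subscribe to the devicechange event and return a function that unsubscribes.
// `mediaDevices` is injectable; it defaults to the browser's object when present.
function watchDeviceChanges(onChange, mediaDevices = globalThis.navigator?.mediaDevices) {
  if (!mediaDevices) {
    throw new Error("MediaDevices API is not available in this context");
  }
  const handler = (event) => onChange(event);
  // Fires when a camera, microphone, or speaker is plugged in or removed.
  mediaDevices.addEventListener("devicechange", handler);
  return () => mediaDevices.removeEventListener("devicechange", handler);
}
```

In a page, you would call watchDeviceChanges(() => console.log("devices changed")) once and keep the returned function around to stop listening later.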
2) Querying media devices (enumerateDevices)
In many cases, we need to check all the connected cameras and microphones to provide useful feedback to users. To do that, we use the navigator.mediaDevices.enumerateDevices() method. It returns a promise that resolves to an array of MediaDeviceInfo objects, one for each known media device. Each MediaDeviceInfo contains a property named kind with the value audioinput, audiooutput, or videoinput, indicating what type of media device it is.
You can also inspect your devices by running the code below in your browser console.
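One possible way to do this is to group the devices by their kind. The listDevicesByKind helper is a hypothetical name for this sketch; it relies only on enumerateDevices() and the kind and label properties of each device.

```javascript
// Group the known media devices by kind (audioinput, audiooutput, videoinput).
async function listDevicesByKind(mediaDevices = globalThis.navigator?.mediaDevices) {
  if (!mediaDevices) {
    throw new Error("MediaDevices API is not available in this context");
  }
  const devices = await mediaDevices.enumerateDevices();
  const byKind = { audioinput: [], audiooutput: [], videoinput: [] };
  for (const device of devices) {
    if (!byKind[device.kind]) byKind[device.kind] = [];
    // Labels are empty strings until the user has granted media permission.
    byKind[device.kind].push(device.label || "(label hidden)");
  }
  return byKind;
}

// In the browser console:
// listDevicesByKind().then((result) => console.log(result));
```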
3) Sharing/recording display (getDisplayMedia)
The method is for sharing your display or some part of it.
The method prompts the user to select a display, or a portion of one (such as a window), whose contents will be captured. It then produces a MediaStream. The resulting stream can be recorded using the MediaStream Recording API.
It is clear that the getDisplayMedia method could be used for nefarious activities against the user. For this reason, browsers take the following precautions:
- The permission acquired from the user is not persisted as it is for the getUserMedia() method. Therefore, the user must be prompted every time.
- The call to getDisplayMedia() must be made from code running in response to a user action, such as in an event handler.
- The specified constraints can't be used to limit the options available to the user. Instead, they must be applied after the user chooses a source, to generate output that matches the constraints.
- Browsers are encouraged to warn users about sharing displays or windows that contain browsers and keep a close eye on what other content might be getting captured and shown to other users.
The function below optionally takes a constraints parameter and returns a promise that resolves to a MediaStream.
The constraints for getDisplayMedia() differ from the constraints that are used for regular media input.
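A minimal sketch of such a function, assuming it is called from a click handler as required above. The helper names startScreenCapture and stopCapture, and the default constraints, are illustrative assumptions.

```javascript
// Prompt the user to pick a screen, window, or tab and return the captured stream.
async function startScreenCapture(
  constraints = { video: true, audio: false },
  mediaDevices = globalThis.navigator?.mediaDevices
) {
  if (typeof mediaDevices?.getDisplayMedia !== "function") {
    throw new Error("Screen capture is not supported in this context");
  }
  // The promise rejects with NotAllowedError if the user cancels the picker.
  return mediaDevices.getDisplayMedia(constraints);
}

// Stop every track so the browser's sharing indicator goes away.
function stopCapture(stream) {
  stream.getTracks().forEach((track) => track.stop());
}
```

Because the permission is never persisted, each call to startScreenCapture() shows the picker again.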
4) Acquiring audio and video (getUserMedia)
The getUserMedia() function is responsible for acquiring audio and video. The function does two things:
- Prompts the user for permission to use media input.
- Produces a MediaStream.
The other important things to know are:
- It takes only one argument: a MediaStreamConstraints object (we will cover it soon).
- When the function runs, it returns a Promise that resolves to a MediaStream with tracks containing the requested media types.
- Because users are not required to make a choice at all, the returned promise may never be resolved or rejected.
- It is important to note that the getUserMedia() function is available only in secure contexts (HTTPS or localhost). Otherwise, navigator.mediaDevices is undefined.
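Two of the points above, the promise that may never settle and the missing navigator.mediaDevices in insecure contexts, can be handled defensively. The sketch below is one possible approach; getUserMediaWithTimeout and the timeout policy are assumptions of this example, not something the API provides.

```javascript
// Request media, but give up if the user ignores the prompt for too long.
function getUserMediaWithTimeout(
  constraints,
  timeoutMs,
  mediaDevices = globalThis.navigator?.mediaDevices
) {
  if (!mediaDevices) {
    // Typically means the page is not served from a secure context.
    return Promise.reject(new Error("navigator.mediaDevices is undefined (insecure context?)"));
  }
  let timer;
  let timedOut = false;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      timedOut = true;
      reject(new Error("Timed out waiting for the user to answer the prompt"));
    }, timeoutMs);
  });
  const request = mediaDevices.getUserMedia(constraints);
  // If the user answers after we gave up, release the devices right away.
  request.then(
    (stream) => { if (timedOut) stream.getTracks().forEach((track) => track.stop()); },
    () => {}
  );
  return Promise.race([request, timeout]).finally(() => clearTimeout(timer));
}
```

A call such as getUserMediaWithTimeout({ audio: true, video: true }, 30000) would then either resolve with a stream or reject after thirty seconds.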
I said that the method takes only one parameter. Let's take a closer look at this constraints parameter.
It is a MediaStreamConstraints object, and it allows us to specify and constrain the media we request.
For instance, minimum video resolution or the camera type we need must be defined in this object. Therefore, there are plenty of constraint properties we need to understand.
Before starting with a simple example, there are essential things to know about the constraints object:
- The constraints object can have audio and video properties. Either or both must be specified.
- If the browser cannot find all media tracks with the specified types that meet the constraints given, then the returned promise is rejected with NotFoundError.
- Setting true for any of those properties means that the corresponding track is required. If one cannot be included, then the method results in an error.
- Instead of true, an object with additional constraints can be given to specify requirements.
An Example constraints object
To give an idea, an example of a constraints object can be:
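A small illustrative object along these lines; the resolution and camera choice are arbitrary values picked for this sketch.

```javascript
// Audio with no special requirements, video with a preferred HD resolution
// from the user-facing (front) camera.
const constraints = {
  audio: true,
  video: {
    width: { ideal: 1280 },
    height: { ideal: 720 },
    facingMode: "user",
  },
};

// It would be passed as the single argument:
// navigator.mediaDevices.getUserMedia(constraints)
```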
The following constraint types are used to specify a constraint for a property.
- ConstrainBoolean: Its value may either be set to a Boolean (true or false) or an object containing (exact, ideal) properties.
- ConstrainDouble: Its value may either be set to a double-precision floating-point number or an object containing (min, max, exact, ideal) properties.
- ConstrainDOMString: Its value may either be set to a string, an array of strings, or an object containing (exact, ideal) properties.
- ConstrainULong: Its value may either be set to an integer number or an object containing (min, max, exact, ideal) properties.
Some examples of constraints
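The sketch below shows one example per constraint type. The property names (echoCancellation, frameRate, width, height, facingMode) are real constrainable properties; the keys of the examples object and the chosen values are only illustrative.

```javascript
const examples = {
  // ConstrainBoolean: a bare value is a preference, { exact } is a requirement.
  echoCancellationPreferred: { audio: { echoCancellation: true } },
  echoCancellationRequired: { audio: { echoCancellation: { exact: true } } },

  // ConstrainDouble: a range with a preferred value.
  smoothVideo: { video: { frameRate: { min: 15, ideal: 30, max: 60 } } },

  // ConstrainULong: at least 1280x720.
  hdOrBetter: { video: { width: { min: 1280 }, height: { min: 720 } } },

  // ConstrainDOMString: a required value, or a list of acceptable values.
  rearCameraRequired: { video: { facingMode: { exact: "environment" } } },
  anyFacingCamera: { video: { facingMode: ["user", "environment"] } },
};
```

Requirements expressed with min, max, or exact can cause the request to fail when no device matches, while ideal values only steer the browser toward the closest match.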
When getUserMedia() fails, the returned promise is rejected with one of the following errors:
|Error|Description|
|---|---|
|NotFoundError|Thrown if no media tracks of the type specified were found that satisfy the given constraints.|
|NotReadableError|Although the user granted permission to use the matching devices, a hardware error occurred at the operating system, browser, or Web page level which prevented access to the device.|
|OverconstrainedError|Thrown if the specified constraints resulted in no candidate devices.|
|SecurityError|Thrown if user media support is disabled on the Document.|
|TypeError|Thrown if the list of constraints specified is empty, or has all constraints set to false.|
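In practice, the error's name property is usually enough to decide what to tell the user. A rough sketch, where describeMediaError and the message texts are made up for illustration:

```javascript
// Map a getUserMedia() rejection to a short, user-facing explanation.
function describeMediaError(err) {
  switch (err?.name) {
    case "NotFoundError":
      return "No camera or microphone matching the request was found.";
    case "NotReadableError":
      return "The device exists but could not be read (it may be in use).";
    case "OverconstrainedError":
      // OverconstrainedError exposes the name of the failing constraint.
      return `No device satisfies the constraint "${err.constraint ?? "unknown"}".`;
    case "SecurityError":
      return "Media access is disabled for this document.";
    case "TypeError":
      return "The constraints were empty or all set to false.";
    default:
      return `Unexpected error: ${err?.name ?? "unknown"}`;
  }
}

// Usage in a page:
// navigator.mediaDevices.getUserMedia({ video: true })
//   .catch((err) => console.error(describeMediaError(err)));
```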
5) Constrainable properties of user agent (getSupportedConstraints)
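The getSupportedConstraints() method returns an object whose properties are the constrainable properties that the user agent understands. Because only constraints supported by the user agent are included in the list, each of these Boolean properties has the value true.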
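One way to use this is to drop constraints the user agent does not know about before making a request. The pickSupported helper below is a hypothetical sketch of that idea; only getSupportedConstraints() itself is part of the API.

```javascript
// Keep only the track constraints that the user agent reports as supported.
function pickSupported(wanted, mediaDevices = globalThis.navigator?.mediaDevices) {
  if (typeof mediaDevices?.getSupportedConstraints !== "function") {
    throw new Error("getSupportedConstraints() is not available in this context");
  }
  const supported = mediaDevices.getSupportedConstraints();
  const result = {};
  for (const [name, value] of Object.entries(wanted)) {
    // Unsupported properties are simply absent from the returned object.
    if (supported[name] === true) result[name] = value;
  }
  return result;
}

// Usage in a page:
// const video = pickSupported({ width: { ideal: 1280 }, frameRate: 30 });
// navigator.mediaDevices.getUserMedia({ video });
```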
The next step will be reviewing peer connections.