George shut his laptop and rubbed his sleep-deprived red eyes. "Customers continue to complain about stream freezing; the new fix package did not help at all! What do I do with this (censored) HLS?" he said.
The browser is not only hypertext, but also a streamer
Browsers have had players for a long time, but video encoding and streaming are a different story. Now almost any up-to-date browser includes modules for encoding, streaming, decoding, and playback. These functions are available through a JavaScript API, and the implementation is called Web Real-Time Communications, or WebRTC. This library built into browsers can do quite a lot: capture video from a built-in, virtual, or USB camera, compress it with the H.264, VP8, or VP9 codec, and send it to the network via the SRTP protocol; in other words, it functions as a software video encoder and streamer. As a result, the browser contains something similar to ffmpeg or gstreamer: it compresses video well, streams over RTP, and plays video streams.
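As a rough illustration of the capture-and-encode path described above, here is a minimal sketch built on the standard browser calls getUserMedia and RTCPeerConnection. The function name captureAndOffer and its two injectable parameters are assumptions made for this example, and signaling (delivering the SDP to the other side) is left out because it depends on the application.

```javascript
// Sketch: capture a camera and prepare a WebRTC offer with standard browser APIs.
// `mediaDevices` and `PeerConnection` are injectable (an assumption of this
// example) so the function can be exercised outside a browser; in a page they
// default to the built-in objects.
async function captureAndOffer(
  mediaDevices = navigator.mediaDevices,
  PeerConnection = RTCPeerConnection
) {
  // Ask the browser for camera and microphone access.
  const stream = await mediaDevices.getUserMedia({ video: true, audio: true });

  // The peer connection encodes the tracks and sends them over SRTP.
  const pc = new PeerConnection();
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);
  }

  // The SDP offer lists the codecs the browser is willing to use.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Delivering offer.sdp to the remote side is application-specific.
  return { pc, offer };
}
```

In a page, calling captureAndOffer() with no arguments uses the browser's own objects; the browser will prompt the user for camera access first.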
WebRTC gives us the freedom to implement a variety of streaming cases in JavaScript:
- stream from the browser to the server for recording and subsequent distribution
- distribute peer-to-peer streams
- play another user’s stream and send one’s own (video chat)
- convert other protocols on the server (RTMP, RTSP, and so on) and play them in the browser as WebRTC
Reduced to the essentials, the stream control code may look like this:
// Launching a broadcast from the browser to the server
session.createStream({name: "mystream"}).publish();
// Playing the broadcast in the browser
session.createStream({name: "mystream"}).play();
HLS works where WebRTC does not work
WebRTC runs in the latest versions of browsers, but two factors limit its reach: 1) Not all users update their browsers in a timely manner, and some may well keep using a three-year-old version of Chrome. 2) Updates and new browsers, WebViews, and other clients and instant messengers that people use to surf the Internet are released almost once a week. Needless to say, not all of them support WebRTC, and where they do, the support can be limited. See how things are now:
Everyone's favorite devices by Apple can be a headache. They began to support WebRTC only recently, and at times their behavior compared to WebKit browsers may seem surprising. Where WebRTC does not work or works poorly, HLS works fine. So we need compatibility: something like a converter that turns WebRTC into HLS and lets us play the stream on almost any device.
HLS was not originally conceived for real-time streams. Indeed, how can we stream real-time video via HTTP? The task of HLS is to cut the video into pieces and deliver them to the player smoothly and unhurriedly, downloading them one by one. An HLS player expects a strictly formed, even video stream. Here we have a conflict: because of its real-time and low-latency requirements, WebRTC can afford to lose packets, and it can have a floating FPS/GOP and a variable bitrate. In terms of predictability and regularity of the stream, it is the exact opposite of HLS.
The obvious approach, WebRTC depacketization (SRTP) followed by conversion to HLS, may not work in the native Apple HLS player, or may work with freezing, which makes it unsuitable for production. By the native player we mean the one used in Apple iOS Safari, Mac OS Safari, and Apple TV.
Therefore, if you notice HLS freezing in the native player, this may be the cause: the source of the stream is WebRTC or another dynamic stream with uneven markup. In addition, the native Apple players have behavior that can only be understood empirically. For example, the server should start sending HLS segments immediately after the m3u8 playlist is returned; a 1-second delay may result in freezing. If the bitstream configuration changes in the process (which is fairly common during WebRTC streaming), there will also be freezing.
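To make the "pieces delivered one by one" idea concrete, here is a minimal sketch of how a live HLS media playlist can be assembled. The function name buildLivePlaylist and the { uri, duration } segment records are assumptions for this example; the tags themselves are standard HLS playlist tags.

```javascript
// Sketch: build a live (sliding-window) HLS media playlist.
// `segments` is a hypothetical array of { uri, duration } records, duration in seconds.
function buildLivePlaylist(segments, firstSequenceNumber) {
  if (segments.length === 0) {
    throw new Error("a playlist needs at least one segment");
  }
  // The target duration must not be smaller than any segment duration.
  const targetDuration = Math.max(...segments.map((s) => Math.ceil(s.duration)));

  const lines = [
    "#EXTM3U",
    "#EXT-X-VERSION:3",
    `#EXT-X-TARGETDURATION:${targetDuration}`,
    `#EXT-X-MEDIA-SEQUENCE:${firstSequenceNumber}`,
  ];
  for (const s of segments) {
    lines.push(`#EXTINF:${s.duration.toFixed(3)},`);
    lines.push(s.uri);
  }
  // No EXT-X-ENDLIST tag: its absence tells the player the stream is still live.
  return lines.join("\n") + "\n";
}
```

The player re-requests this playlist periodically and downloads each newly listed segment, which is why uneven segment durations or late segments show up directly as playback problems.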
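One way to check whether a source stream is "uneven" in the sense used here is to look at frame intervals and GOP sizes before the stream reaches the HLS packager. The sketch below is only an illustration: the function name streamRegularity and the { pts, key } frame records are assumptions, not part of any particular server.

```javascript
// Sketch: measure how uneven a stream is.
// `frames` is a hypothetical list of { pts, key } records:
// pts is the presentation time in milliseconds, key is true for key frames.
function streamRegularity(frames) {
  const intervals = []; // time between consecutive frames, ms
  const gopSizes = [];  // number of frames in each completed GOP
  let framesInGop = 0;
  let seenKey = false;

  frames.forEach((frame, i) => {
    if (i > 0) intervals.push(frame.pts - frames[i - 1].pts);
    if (frame.key) {
      // A new key frame closes the previous GOP, if there was one.
      if (seenKey) gopSizes.push(framesInGop);
      seenKey = true;
      framesInGop = 0;
    }
    if (seenKey) framesInGop += 1;
  });

  return {
    // With fewer than two frames there are no intervals; Math.min/Math.max
    // then return Infinity/-Infinity.
    minInterval: Math.min(...intervals),
    maxInterval: Math.max(...intervals),
    gopSizes,
    constantGop: gopSizes.every((n) => n === gopSizes[0]),
  };
}
```

A large gap between minInterval and maxInterval, or a constantGop value of false, is a hint that the stream has the floating FPS/GOP typical of WebRTC sources.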
Fighting freezing in native players
Thus, plain WebRTC depacketization followed by HLS packetization generally does not work. In the Web Call Server (WCS) streaming video server, we solve the problem in two ways, and we offer the third as an alternative:
1) Transcoding.
This is the most reliable way to align a WebRTC stream with HLS requirements and set the desired GOP, FPS, and so on. However, in some cases transcoding is not a good solution; for example, transcoding 4K streams of VR video is indeed a bad idea. Such heavy streams are very expensive to transcode in terms of CPU time or GPU resources.
2) Adapting and aligning the WebRTC stream on the fly to match HLS requirements.
These are special parsers that analyze the H.264 bitstream and adjust it to match the features/bugs of Apple's native HLS players. Admittedly, non-native players like video.js and hls.js are more tolerant of WebRTC-originated streams with a dynamic bitrate and FPS, and they do not stall where the reference Apple HLS implementation freezes.
3) Using RTMP as the stream source instead of WebRTC.
Although Flash Player is already obsolete, the RTMP protocol is still actively used for streaming; take OBS Studio, for example. We must acknowledge that RTMP encoders generally produce more even streams than WebRTC and therefore cause practically no freezing in HLS; that is, RTMP-to-HLS conversion behaves much better with respect to freezing, including in native HLS players. Therefore, if streaming is done from the desktop with OBS, it is better to use RTMP as the source for conversion to HLS. If the source is the Chrome browser, RTMP cannot be used without installing plugins, and only WebRTC works in this case.
All three methods described above have been tested and work, so you can choose based on the task.
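The second method starts with reading the H.264 bitstream. As an illustration only, and not the WCS implementation, the sketch below scans an Annex B byte stream for NAL units and reports whether it carries the stream configuration (SPS and PPS) and a key frame (IDR slice). The function names are assumptions; the NAL unit type values come from the H.264 specification.

```javascript
// Sketch: list NAL unit types in an H.264 Annex B byte stream.
// NAL unit types used here: 5 = IDR slice (key frame), 7 = SPS, 8 = PPS.
function listNalTypes(bytes) {
  const types = [];
  for (let i = 0; i + 3 < bytes.length; i++) {
    // A NAL unit follows the start code 00 00 01
    // (the 4-byte form 00 00 00 01 ends with the same three bytes).
    if (bytes[i] === 0 && bytes[i + 1] === 0 && bytes[i + 2] === 1) {
      types.push(bytes[i + 3] & 0x1f); // the type is the low 5 bits of the NAL header
      i += 3; // skip past the start code
    }
  }
  return types;
}

// Summarize what one chunk of the bitstream contains.
function describeChunk(bytes) {
  const types = listNalTypes(bytes);
  return {
    hasConfig: types.includes(7) && types.includes(8),
    isKeyFrame: types.includes(5),
  };
}
```

A parser of this kind can notice when new SPS/PPS units appear mid-stream, which is the "bitstream configuration changed" case that native players handle badly.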
WebRTC to HLS on CDN
A distributed system brings its own problems when several WebRTC stream delivery servers sit between the WebRTC stream source and the HLS player, namely a CDN, in our case based on WCS servers. It looks like this: there is the Origin, a server that accepts the WebRTC stream, and there are Edge servers that distribute this stream, including via HLS. There can be many Edge servers, which enables horizontal scaling of the system. For example, 1000 HLS servers can be connected to one Origin server; in this case, system capacity scales 1000 times.
The problem has already been highlighted above; it usually arises in native players: iOS Safari, Mac OS Safari, and Apple TV. By native we mean a player that works when the playlist URL is given directly in the tag, for example <video src="https://host/test.m3u8"/>. As soon as the player requests a playlist (and this is actually the first step in playing an HLS stream), the server must immediately, without delay, begin to send out HLS video segments. If the server does not start sending segments immediately, the player decides that it has been cheated and stops playing. This behavior is typical of Apple’s native HLS players, but we can’t just tell users “please do not use iPhone, Mac, and Apple TV to play HLS streams.”
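On the page, the difference between the native player and a JavaScript player can be sketched as follows. The function name attachHls and the choice to pass the hls.js constructor in as a parameter are assumptions of this example; canPlayType, Hls.isSupported, loadSource, and attachMedia are existing browser and hls.js calls.

```javascript
// Sketch: use the native HLS player where the browser has one,
// otherwise fall back to hls.js.
// `video` is an HTMLVideoElement; `Hls` is the hls.js constructor, passed in
// so the function does not depend on how the library is loaded.
function attachHls(video, url, Hls) {
  // Safari on iOS and macOS reports native support for the HLS MIME type.
  if (video.canPlayType("application/vnd.apple.mpegurl")) {
    video.src = url; // same effect as <video src="...m3u8">
    return "native";
  }
  if (Hls && Hls.isSupported()) {
    const hls = new Hls();
    hls.loadSource(url);
    hls.attachMedia(video);
    return "hls.js";
  }
  return "unsupported";
}
```

The return value makes it easy to log which path a given viewer took, which helps when freezing reports come only from the native branch.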
So, when you try to play an HLS stream on the Edge server, the server should immediately start returning segments, but how is it supposed to do that if it doesn’t have the stream? Indeed, at the moment you try to play it, there is no stream on this server. CDN logic works on the principle of Lazy Loading: it won’t load a stream onto a server until someone requests that stream from that server. This creates the problem of the first connected user: the first one who requests the HLS stream from the Edge server, and is imprudent enough to do so from the default Apple player, will get freezing, because it takes some time to order the stream from the Origin server, receive it on the Edge, and begin HLS slicing. Even if this takes only three seconds, that is still too long. It will freeze.
Here we have two possible solutions: one is OK, and the other is less so. One could abandon the Lazy Loading approach in the CDN and send traffic to all nodes, regardless of whether they have viewers or not. This solution may suit those who are not limited in traffic and computing resources. Origin will send traffic to all Edge servers, so all servers and the network between them will be constantly loaded. Perhaps this scheme would be suitable only for some specific solutions with a small number of incoming streams. When replicating a large number of streams, such a scheme will be clearly inefficient in terms of resources. And if you recall that we are only solving the “problem of the first connected user from the native player,” it becomes clear that it is not worth it.
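The cost of pushing every stream to every Edge is easy to estimate. The sketch below is plain arithmetic with purely illustrative numbers (100 streams, 10 Edge servers, 2 Mbps per stream); none of these figures come from a real deployment, and the function name is made up for the example.

```javascript
// Sketch: outgoing traffic from Origin, in Mbps.
// Each stream is sent once to every Edge that receives it.
function originEgressMbps(streams, edgesPerStream, bitrateMbps) {
  return streams * edgesPerStream * bitrateMbps;
}

// Push to all Edges: every one of the 10 Edges gets every stream.
const pushToAll = originEgressMbps(100, 10, 2);

// Lazy Loading: assume each stream is actually watched on only 2 Edges.
const lazyLoading = originEgressMbps(100, 2, 2);
```

With these assumed numbers, pushing to all Edges costs 2000 Mbps against 400 Mbps for Lazy Loading, and the gap grows with the number of Edge servers.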
The second option is more elegant, but it is also merely a workaround. We give the first connected user a video picture, but it is not yet the stream they want to see; it is a preloader. Since we must give them something immediately, but we don’t have the source stream (it is still being ordered and delivered from Origin), we ask the client to wait a bit and show them a preloader video with moving animation. The user waits a few seconds while the preloader spins, and when the real stream finally arrives, the user starts getting the real stream. As a result, the first user sees the preloader, and those who connect after that see the regular HLS stream right away, coming from the CDN operating on the principle of Lazy Loading. Thus, the engineering problem has been solved.
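The preloader logic on an Edge server can be sketched as a small state machine. This is an illustration under assumptions, not the WCS code: the class EdgeStream, its method names, and the file name preloader.ts are all hypothetical.

```javascript
// Sketch of Edge-side Lazy Loading with a preloader (all names are hypothetical).
class EdgeStream {
  constructor(pullFromOrigin) {
    this.pullFromOrigin = pullFromOrigin; // function that starts the Origin -> Edge pull
    this.state = "idle";                  // idle -> pulling -> ready
    this.segments = [];                   // real HLS segments cut on this Edge
  }

  // Called whenever a player asks this Edge for the next piece of video.
  nextSegment() {
    if (this.state === "idle") {
      this.state = "pulling";
      this.pullFromOrigin(); // Lazy Loading: the first viewer triggers the pull
    }
    if (this.state === "ready") {
      return { type: "stream", uri: this.segments[this.segments.length - 1] };
    }
    // Answer immediately so the native player does not give up.
    return { type: "preloader", uri: "preloader.ts" };
  }

  // Called by the HLS packager when a real segment has been cut.
  onSegmentReady(uri) {
    this.state = "ready";
    this.segments.push(uri);
  }
}
```

The first viewer gets preloader segments until onSegmentReady fires; viewers who arrive later find the stream already in the "ready" state and never see the preloader.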
But not yet fully solved
It would seem that everything works well. The CDN is functioning, the HLS streams are loaded from the Edge servers, and the issue of the first connected user is solved. But here is another pitfall: we give the preloader in a fixed aspect ratio of 16:9, while streams of any format can enter the CDN: 16:9, 4:3, 2:1 (VR video). This is a problem, because if you send a 16:9 preloader to the player and the ordered stream is 4:3, the native player will once again freeze.
Therefore, a new task arises: you need to know the aspect ratio with which the stream enters the CDN and give the preloader the same ratio. A feature of WebRTC streams is that the aspect ratio is preserved when the resolution changes and during transcoding: if the browser decides to lower the resolution, it lowers it in the same ratio, and if the server decides to transcode the stream, it keeps the aspect ratio as well. Therefore, it makes sense that if we want to show the preloader for HLS, we show it in the same aspect ratio in which the stream enters.
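Matching the preloader to the stream comes down to reducing the resolution to its aspect ratio and looking up a prepared file. The sketch below assumes three hypothetical preloader files, one for each ratio mentioned above; the file names and function names are made up for the example.

```javascript
// Greatest common divisor, used to reduce width:height to lowest terms.
function gcd(a, b) {
  return b === 0 ? a : gcd(b, a % b);
}

function aspectRatio(width, height) {
  const d = gcd(width, height);
  return `${width / d}:${height / d}`;
}

// Hypothetical preloader files, one per supported aspect ratio.
const PRELOADERS = {
  "16:9": "preloader_16x9.ts",
  "4:3": "preloader_4x3.ts",
  "2:1": "preloader_2x1.ts",
};

// Returns the matching preloader, or null when no prepared file fits.
function pickPreloader(width, height) {
  return PRELOADERS[aspectRatio(width, height)] ?? null;
}
```

Resolutions that are only approximately 16:9 (854x480, for instance) do not reduce to an exact ratio, so a production version would need a tolerance-based match rather than this exact lookup.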
The CDN works as follows: when traffic enters the Origin server, Origin informs other servers on the network, including Edge servers, about the new stream. The problem is that at this point the resolution of the source stream may not yet be known. The resolution is carried by the H.264 bitstream configs along with the key frame. Therefore, the Edge server may receive information about a stream but not know its resolution and aspect ratio, which prevents it from generating the preloader correctly. For this reason, the presence of the stream in the CDN should be signaled only once a key frame has arrived; this is guaranteed to give the Edge server the size information and allows the correct preloader to be generated, preventing the “first connected viewer issue.”
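The "announce only after a key frame" rule can be sketched like this. The class OriginAnnouncer, the notifyEdges callback, and the frame record shape are assumptions made for illustration; how a real Origin extracts width and height from the bitstream configs is outside the scope of the sketch.

```javascript
// Sketch: Origin announces a stream to the CDN only once a key frame
// has revealed its size. All names here are hypothetical.
class OriginAnnouncer {
  constructor(notifyEdges) {
    this.notifyEdges = notifyEdges; // callback: (streamName, width, height) => void
    this.announced = new Set();     // streams already announced to the Edges
  }

  // `frame` is a hypothetical record: { key: boolean, width?: number, height?: number }.
  // Width and height are present only on key frames, which carry the configs.
  onFrame(streamName, frame) {
    if (this.announced.has(streamName)) return;
    // Delta frames carry no resolution; keep waiting for the key frame.
    if (!frame.key) return;
    this.announced.add(streamName);
    this.notifyEdges(streamName, frame.width, frame.height);
  }
}
```

Because the announcement already contains the size, an Edge can choose a preloader with the right aspect ratio before the first viewer ever arrives.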
Summary
Converting WebRTC to HLS generally results in freezing when played in default Apple players. The problem is solved by analyzing and adjusting the H.264 bitstream to Apple's HLS requirements, by transcoding, or by migrating to the RTMP protocol and encoder as the stream source. In a distributed network with Lazy Loading of streams, there is the problem of the first connected viewer, which is solved using the preloader and determining the resolution on the Origin server side, the entry point of the stream into the CDN.
Links
Web Call Server – WebRTC server
CDN for low latency WebRTC streaming — WCS based CDN
Playing WebRTC and RTMP video streams via HLS — Server functions for converting streams from various sources to HLS