News from Industry

WebRTC vs. MoQ by Use Case

webrtchacks - Tue, 11/04/2025 - 14:05

This might be my first editorial-style post here. Fippo’s “Is everyone switching to MoQ from WebRTC?” started some threads on MoQ vs. WebRTC. I started to respond, but my responses quickly became too long, so I decided to go even deeper with a post here. Fippo’s post shows hard data that Media over QUIC […]

The post WebRTC vs. MoQ by Use Case appeared first on webrtcHacks.

Media Over QUIC (MOQ): How it will redefine realtime media and streaming

bloggeek - Mon, 10/27/2025 - 12:30

Learn about MOQ and its emergence in the media industry. See how it compares to traditional WebRTC technologies.

MOQ. Media Over QUIC. A new standard specification that is becoming quite popular, assuming you trust the many blog posts and YouTube videos floating around about it lately.

What exactly is MOQ, and how will it redefine realtime media and streaming?

This is exactly what I want to try and cover in this article.

As with many of the things I do, I’ll be doing this from the perspective of someone who’s been focused on WebRTC technologies for over a decade now…

Key Takeaways

MOQ, or Media Over QUIC, is emerging as a network protocol tailored for web media streaming
It offers advantages like support for both live and on-demand scenarios, while simplifying integration with existing CDN technology
Current deployments of MOQ are minimal, but interest is growing as companies explore its potential
For streaming use cases, investing in MOQ is recommended; however, caution is advised for conferencing applications due to complexity

Table of contents

Everyone is now talking about MOQ
What is MOQ
Latency vs Complexity and the MOQ solution
Current state of MOQ and the market
MOQ in Streaming
MOQ in Conferencing
Should you invest in MOQ today

Everyone is now talking about MOQ

Looking at my RSS feed for recent months, it seems like people in my industry have started talking about MOQ just as much as they talk about WebRTC.

Here are a few of the posts I’ve read recently on this:

nanocosmos upgrades their service to use MOQ – they even explain why
Cloudflare comes out with a MOQ beta, which I covered in my previous article focused on Cloudflare
Red5 takes the stance that MOQ and WebRTC can live together, which is obvious, considering they offer multi-protocol streaming servers
Tony Hicks shares his thoughts on MOQ standardization, which is a great insightful read
Lorenzo toys around with a MOQ implementation in Janus – giving us a glimpse of some of the technical challenges
Luke Curley left Discord for MOQ

If I missed your banter online – feel free to add your post link in the comments below 😉

What is MOQ

If I had to define MOQ, and it seems like I do, I’d say this:

MOQ is a network protocol suitable for web media streaming, covering both live and on demand scenarios.

It does so while taking into account the modern Internet’s infrastructure and network behavior.

Now that we know what it is (🤣) let’s dive a level or two deeper.

MOQ relies on QUIC (or HTTP/3 or WebTransport) for its transport layer. Unlike most other realtime media delivery protocols, MOQ doesn’t use RTP. This makes it quite different from the other network protocols we have today.

The three main characteristics in MOQ that I find unique?

A clear definition of a server component – the RELAY – which is something that can be “easily” introduced by existing CDN vendors
Support for both live and on-demand scenarios
Piggybacking on HTTP/3, making it transparent to firewalls, NATs and other nasty network policy components

There is a lot more in terms of technical advantages to MOQ and QUIC than what I plan on sharing in this article. Suffice to say that it has solutions for many of the nerdy problems previous technologies and protocols have/had. For now, what I shared here should be good enough.

Latency vs Complexity and the MOQ solution

I already said that MOQ is suitable for both live and on-demand scenarios. If I add it to an older latency diagram that I have, it will likely look something like this:

MOQ fits all latency requirements for streaming protocols, which makes it kinda unique in that regard. As the saying goes:

One MOQ protocol to rule them all, one MOQ protocol to find them, one MOQ protocol to bring them all and in the darkness bind them

The fact that MOQ can do everything in a simple and scalable fashion means that it is going to be quite complex to get it to work properly. Why? Because if we have one protocol to rule them all, then it needs to do many things that are sometimes opposite to each other – be able to buffer and delay while also being able to play back really quickly, for example.

To me, this indicates that we still have a ways to go before there is a solid implementation across browsers and devices that is useful for all use cases without nasty bugs and headaches. That said, I wouldn’t pass on MOQ because of that – just be cautious about it.

Oh – and as always, the rule applies – the lower the latency you’re aiming for – the higher the complexity of your service – MOQ included…

Current state of MOQ and the market

We can view the market from two different angles: deployments and announcements.

On the deployment side, we’re far from seeing anything of real relevance yet. Philipp Hancke wrote about it here: we’re simply below 0.0x% of web pages using WebTransport (which is needed for MOQ)…

In a way, it feels like this is 2013 and the technology is called WebRTC. Wait a few years and things will definitely change for the better.

On the announcement side, we saw earlier the awakening that is taking place. For the time being, it is mostly companies showcasing what they are working on in order to test the waters and get feedback: experimenting, figuring out the boundaries of the technology today, and from there deducing the most appropriate product/MVP to come out with at some future point.

Hey – we even have OpenMOQ – a software consortium with open source MOQ technology, being created by Akamai, CDN77, Cisco, Synamedia and YouTube.

Most are CDNs or video streaming vendors. Cisco? In the conferencing business (and switches) – more on conferencing later.

The standard itself? Close to ready, we’re told. With WebRTC, “close to ready” took a few years. Hopefully, here we will see an RFC by 2026.

MOQ in Streaming

Streaming is where MOQ truly shines. For all intents and purposes, MOQ is built and designed for unidirectional streams (it does bidirectional, but not that well – see below about conferencing). It has two separate message types that distinguish between live and on demand – you can FETCH older media or SUBSCRIBE to live media – or you can do both.

The purpose of all this is to be able to cater for both the low latency and higher latency requirements and in a way that can also manage and optimize the quality of service accordingly (in low latency it is often useful to sacrifice some quality to keep pace with the media stream).

To be frank? Streaming is what MOQ is designed for. You’ve got all the goodness of what HLS gives you with low latency capabilities that are usually reserved for WebRTC and similar protocols. The way this is achieved is by having these two separate “modes” of operation that are specified in the protocol and letting the higher abstraction layers (signaling and application) figure out what they want exactly and pick and choose the specific messages they’ll use based on their latency requirements.
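To make the “pick and choose” idea a bit more concrete, here is a minimal sketch of how an application layer might map its playback needs onto the two message types. Everything in it apart from the FETCH and SUBSCRIBE names is an assumption made for illustration: PlaybackMode, MoqMessage and planRequests are hypothetical and not part of any real MOQ library.

```typescript
// Hypothetical names for illustration only; this is not a real MOQ client API.
type MoqMessage = "FETCH" | "SUBSCRIBE";
type PlaybackMode = "live" | "on-demand" | "catch-up";

// Decide which MOQ messages a player would send for a given playback mode.
function planRequests(mode: PlaybackMode): MoqMessage[] {
  switch (mode) {
    case "live":
      // Follow the live edge only.
      return ["SUBSCRIBE"];
    case "on-demand":
      // Pull media that was already published.
      return ["FETCH"];
    case "catch-up":
      // Pull older media first, then keep following the live edge.
      return ["FETCH", "SUBSCRIBE"];
  }
}

console.log(planRequests("catch-up").join(" + "));
```

The point of the sketch is that the protocol offers the two modes, while the decision about which one to use stays in your application.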

MOQ plays nice with CDN networks by defining relay entities that relay the traffic over QUIC or HTTP/3 – using HTTP/3 as the mechanism translates to the ability to upgrade existing CDNs by adding HTTP/3 to them and a bit more logic on top instead of replacing them with totally new servers and technologies. This is why the likes of Akamai and YouTube are on the bandwagon of MOQ – it allows them to use existing architectures and upgrade them instead of replacing them.

Another aspect of MOQ that is specific to streaming is the fact that publishers and subscribers use the same protocol – MOQ. They don’t need to use different ingest and viewing mechanisms (no one really ingests HLS when creating a media asset for viewing for example). This end-to-end nature of MOQ is also rather useful in simplifying future architecture of streaming infrastructure.

So what do we have as advantages of MOQ in streaming?

Live/low latency and on demand out of the box support
Reliance on classic CDN technology for delivery at scale
Single protocol for publishers and subscribers

While we see WebRTC used for live streaming, I foresee the migration to MOQ for such use cases once MOQ matures a bit. Give it a couple of years and we’re likely there.

MOQ in Conferencing

This is where things can get a bit trickier and murkier.

MOQ in the browser relies on having an implementation of QUIC, WebTransport, WebCodecs and WebAudio. You need to orchestrate all these when creating a MOQ player in your browser.

The problem of doing this for conferencing is that there’s a lot that is left for the developers to fill in and that’s when there’s a solution already – WebRTC.

What’s missing? A lot of the nuances of how you get things done with FEC, retransmissions, comfort noise generation, etc. You can develop all these on top of WebCodecs and stitch them into your MOQ-based application, but that’s going to take a lot of work and effort that was already invested in WebRTC as part of its implementation in the browser – and now you have to do all that in your JS application on top of the browser. It gives you a lot of power, but also a lot of headache and work.

Then there are the benefits of MOQ relays and traditional CDNs. They no longer apply. Why? Because we need SFUs for group conferencing. Relays are going to be too naive in their implementations, whereas SFUs do a lot of decision making for quality optimization, and in the world of MOQ there is no other entity this can be delegated to (unless you implement a specific type of relay and you might as well call it an SFU while you’re at it).
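As one example of the kind of per-subscriber decision an SFU makes and a plain relay doesn’t, here is a minimal sketch of picking which simulcast layer to forward based on a bandwidth estimate. The Layer shape, the selectLayer name and the bitrates are assumptions for illustration; real SFUs weigh many more signals than this.

```typescript
// Hypothetical shape of a simulcast layer, for illustration only.
interface Layer {
  rid: string;
  bitrateKbps: number;
}

// Pick the highest-bitrate layer that fits the subscriber's estimated bandwidth,
// falling back to the lowest layer when nothing fits.
function selectLayer(layers: Layer[], estimatedKbps: number): Layer {
  const sorted = [...layers].sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  const lowest = sorted[0];
  if (lowest === undefined) {
    throw new Error("at least one layer is required");
  }
  let chosen = lowest;
  for (const layer of sorted) {
    if (layer.bitrateKbps <= estimatedKbps) {
      chosen = layer;
    }
  }
  return chosen;
}

const layers: Layer[] = [
  { rid: "q", bitrateKbps: 150 },
  { rid: "h", bitrateKbps: 500 },
  { rid: "f", bitrateKbps: 1500 },
];
console.log(selectLayer(layers, 800).rid);
```

A relay that forwards the same bytes to everyone has no place to run this kind of logic, which is the gap described above.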

Two things here to consider:

I contemplated the migration to QUIC in WebRTC some 6 years ago. I don’t see this happening – too complicated and not enough interest
Zoom tried doing something similar for the web, and ended up in December 2024 just adding support for WebRTC to be done with it. WebTransport+WebCodecs just didn’t work well enough for them at the time, in particular on constrained mobile devices. It might be better now, but I am not sure this is the case (yet)

Should you invest in MOQ today

That’s the big question.

If your use case is in streaming, then yes. Definitely. Experiment with it today. Understand the technology. Figure out how to incorporate it into your mid-term roadmap.

Assuming what you do is conferencing, then I’d suggest waiting on the sidelines for now. There are other headaches to deal with that are more pressing than this adventure.

–

On a totally irrelevant, nitpicking issue, have you noticed that there’s still no logo for MOQ?

How can a technology be promoted without a decent logo?

The post Media Over QUIC (MOQ): How it will redefine realtime media and streaming appeared first on BlogGeek.me.

Cloudflare video services. Why now and what’s next

bloggeek - Mon, 10/13/2025 - 12:30

Discover how Cloudflare grew its video services and WebRTC offerings in 2025 and where they are now in relation to the industry.

Cloudflare is considered by many the 4th IaaS vendor (after Amazon, Microsoft and Google). This year, they made quite a few steps in real time communications that show real interest and investment in the domain of video processing at low latency.

What I want to do in this article is go over the announcements made, and try to figure out together what that means and where the industry is headed.

Key Takeaways

🏃‍➡️ Cloudflare is expanding its video services and WebRTC capabilities, aiming to compete with major IaaS and CPaaS vendors
🆕 They introduced managed TURN servers, low-latency Calls and Streams, and have acquired Dyte to introduce their own client SDK
🔬 Recent innovations include Voice AI offerings and support for the MoQ protocol for live streaming
🦍 Despite strong advancements, Cloudflare lacks public success stories from big customers to establish credibility in the market
🛣️ The distinction between conferencing and streaming services is critical, and Cloudflare separates these offerings strategically

Table of contents

The big 3 and WebRTC
TURN servers
Calls and Streams
The missing client SDK: acquiring Dyte
Voice AI
MoQ
No big customers (yet?)
Conferencing versus Streaming
Is Cloudflare chewing more than they can swallow

The big 3 and WebRTC

Before we dive into Cloudflare, it is important we check what the “other” big IaaS vendors have when it comes to video and real time communications. We will focus on low latencies – below 2 seconds, which leaves things like YouTube out of this.

Amazon

Amazon Chime SDK – Amazon’s CPaaS offering for developers
Amazon Interactive Video Service – a low latency streaming API offering
Amazon Luna – cloud gaming infrastructure from Amazon

Microsoft

Edge – web browser built on top of Google’s Chromium. Not much is done here with regards to WebRTC, as it relies mostly on Google
Microsoft Teams – a full communication service for companies. It makes use of WebRTC since it works in web browsers
Azure Communication Services – the Microsoft Teams CPaaS variant, offering the infrastructure for communication services as an API to integrate with
Xbox Cloud Gaming – cloud gaming infrastructure for Xbox

Google

The creator and maintainer of WebRTC in a way… and not much to show for it

libwebrtc – the main WebRTC client-side implementation out there, that is also integrated into ALL web browsers. Means to an end for Google and nothing more
Chrome – the browser…
Google Meet – video conferencing / meetings service

–

Amazon and Microsoft have an API/developers play with WebRTC. Google not as much.

Cloudflare seems to be taking this to the extreme. Let’s review how exactly.

TURN servers

In 2021, Cloudflare announced their first service related to WebRTC – a managed TURN service.

The main differentiators Cloudflare brought with it?

Lower price point per GB consumed
More regions (300+ today versus fewer than 50 for all other competitors)
Use of Anycast

As a first step, this is a simple one to make for IaaS vendors and one that relies on scale.

The fact that none of the other IaaS vendors have launched anything similar while all of them do have TURN servers deployed globally at scale for their other services is quite interesting. They likely don’t see the monetary value in offering that, which I find weird, considering the 100s of other managed services they do offer for developers already.

Calls and Streams

A year later, in 2022, Cloudflare announced Calls and Streams.

Calls was a kind of a managed distributed cloud SFU that developers could integrate with using “pure” WebRTC directly to build their apps. And Streams (its live variant) was about live streaming services.

The problem with this was the lack of a client side SDK. Developers had to figure out the signaling and how WebRTC was actually implemented on the Cloudflare SFU to get it to work. Things got murkier when the time came for optimizing things – you can’t really optimize a client SDK without optimizing the server – they are closely tied. This is why, for example, when you go use an open source SFU like mediasoup, there’s also a client side SDK along with it that you will be using.

I’ve written about this a few months back in my tip & offer emails.

The missing client SDK: acquiring Dyte

Fast forward 3 years and finally Cloudflare decided to do something about it.

Instead of implementing and launching their own client SDK, they decided to acquihire Dyte.

Dyte was another CPaaS vendor from India, which could be considered a competitor of Cloudflare Calls. They were either using Cloudflare’s SFU to begin with at Dyte, or more likely – post acquisition they have shifted or are shifting their infrastructure to use Cloudflare 👉 keeping their client SDK and UI facade almost as is. Or maybe sunsetting it quietly, now that Cloudflare RealtimeKit has been announced.

Voice AI

There were multiple interesting announcements around Voice AI from Cloudflare this year. It started as part of the Dyte acquisition. In the same announcement, they wrapped in hints around partnerships with ElevenLabs and Hugging Face.

The real deal came only recently when they announced 4 different services/offerings:

Cloudflare Realtime Agents – orchestration for Voice AI pipelines. You can think of it as a simplified (less flexible) alternative to Pipecat Cloud or LiveKit Agents in the cloud
Pipe raw WebRTC audio as PCM in Workers – an Opus-G.711 gateway that sits on Cloudflare’s edge network dedicated for connecting WebRTC from clients to voice services that require the use of G.711 (=PCM). It also converts from WebRTC to WebSocket for the exact same reason
Workers AI WebSocket support – Cloudflare Workers AI is a set of AI algorithms you can run on the Cloudflare network as part of your AI pipeline. They now support WebSocket (and not only HTTPS) because that’s what’s needed for realtime. Interestingly, they made an effort to add Daily’s open source PipeCat SmartTurn v2 for turn detection in conversations, with Daily just releasing SmartTurn v3 (smaller, faster, better). How quickly Cloudflare will integrate it remains to be seen
Deepgram on Workers AI – Deepgram STT and TTS are now integrated into Cloudflare Workers AI as well. Somehow, this could have been made part of the 3rd announcement, but the decision was to split it

What do we have here?

An AI framework that is proprietary and cloud hosted by Cloudflare. This is needed by CPaaS vendors to compete in this market. In Cloudflare’s case, the result is interesting since it also covers IaaS and not only CPaaS (=same functionality and implementation, but a different mindset of the developer looking for a solution)
Utility gateway that is great in reducing headaches when connecting WebRTC clients to online AI services
A library of AI related algorithms that are preloaded on the platform via Cloudflare Workers AI, now including turn detection via PipeCat’s implementation and STT/TTS via Deepgram

In a way, Cloudflare outdid all other IaaS vendors in offering a Voice AI framework targeted directly at developers.

MoQ

MoQ stands for Media Over QUIC. It is a WebTransport based protocol that is being defined for enabling simple live streaming solutions.

The main advantage of MoQ? It is intended to work with current day CDN architectures. So that CDNs can more easily be updated to support MoQ than it is to support something like WebRTC.

Where does MoQ suffer? It doesn’t offer the richness and power of WebRTC when it comes to bidirectional media. Think of echo cancellation as an example.

A week prior to the Voice AI announcements, Cloudflare announced support for MoQ. What they did was launch a MoQ relay service on their global infrastructure for developers to tinker with. This isn’t GA by any means – it is in beta with a protocol that is still being standardized (which is just fine).

In a way, this gives Cloudflare the complete gamut of supported video streaming protocols in Cloudflare Stream:

HLS and DASH along with their low latency variant
WebRTC with WHIP and WHEP interfaces for live streaming (in beta)
And now MoQ for the next generation of what live streaming is

At the same time, most of the IaaS vendors only do HLS and DASH. For the time being.

No big customers (yet?)

What Cloudflare is lacking is success stories with big customers.

I know of no publicly available name of a vendor that uses Cloudflare’s WebRTC infrastructure for a large-scale deployment.

It doesn’t mean that one doesn’t exist, but it means none has come out publicly. And that’s an issue when looking for trust from new customers who can pick and choose solutions from other vendors in a market that is evolving all the time and is highly competitive.

Cloudflare should remedy this quickly. Either by pushing some of its customers – big and small – to agree to success stories that will be published and promoted. Or by getting a big customer to use their service…

Conferencing versus Streaming

There’s a big difference between conferencing solutions and streaming ones.

This is why MoQ isn’t going to replace WebRTC. At least not in the main use cases where WebRTC is being used. WebRTC is most suitable for conferencing while MoQ is best suited for Streaming.

Does it mean you should switch to using MoQ now?

Maybe. Just note that it is supported only on Chrome, so you might as well wait if what you want is coverage across web browsers.
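If you do want to experiment while keeping coverage across browsers, one option is to feature-detect and fall back. Below is a minimal sketch, assuming that the presence of the WebTransport global is the gating factor for a MoQ player; the pickDelivery name and the "moq" / "fallback" labels are hypothetical, and what you fall back to (WebRTC, HLS, something else) is up to you.

```typescript
type Delivery = "moq" | "fallback";

// `scope` defaults to the global object; passing it in explicitly keeps the check testable.
function pickDelivery(scope: object = globalThis): Delivery {
  // Assumption: a MoQ player needs WebTransport, so its absence means we fall back.
  return "WebTransport" in scope ? "moq" : "fallback";
}

console.log(pickDelivery());
```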

In the case of Cloudflare, they are making that distinction with these being two separate products:

Stream, where you can find their streaming products. But not MoQ – that’s handled separately as a playground more than a beta. WebRTC live streaming support is still in beta here
Realtime, which is where conferencing lives. It also wraps into it the Voice AI capabilities, split between RealtimeKit and Agents (both in beta)

There’s a separation of technologies, mindsets and infrastructure implementations here that makes sense.

It is why most vendors who specialize in streaming or even live streaming don’t do conferencing anymore (Dolby did both and shut down their conferencing CPaaS).

Is Cloudflare chewing more than they can swallow

Which brings me to this question:

Is Cloudflare chewing more than they can swallow?

Cloudflare is definitely a large company with resources. It is pouring funding into these spaces. It made an acquihire in this domain.

They are now competing in parallel against multiple vendors each specializing or focusing on one of these two – either Streaming or Conferencing.

Can they be successful in both at the same time?

Conferencing is becoming more competitive again, especially in the domain of Voice AI.

Streams with MoQ requires a paradigm shift in the technologies used and also adds the standardization effort and instabilities that go along with it.

Doing all these things at the same time is bound to be exhausting – both in engineering resources as well as managerial ones.

I do hope Cloudflare succeeds in it all – their angle here is innovative and interesting enough to make a difference and enrich our ecosystem.

The post Cloudflare video services. Why now and what’s next appeared first on BlogGeek.me.

Is everyone switching to MoQ from WebRTC?

webrtchacks - Tue, 10/07/2025 - 14:50

It is time for another edition of “Is everyone switching to…“. Cloudflare recently published a blog post about Media over QUIC (MoQ) which made a number of statements about WebRTC that require some “clarification”. Let us start with that and look at MoQ and WebTransport after that. An odd understanding of WebRTC The blog post […]

The post Is everyone switching to MoQ from WebRTC? appeared first on webrtcHacks.

How OpenAI does WebRTC in the new gpt-realtime

webrtchacks - Tue, 09/23/2025 - 14:15

Earlier this month, OpenAI released the GA version of its realtime API. This includes many capabilities that the Beta didn’t have, including video support. I started out doing an update to The Unofficial Guide to OpenAI’s Realtime WebRTC API I made for the Beta release last November. I discovered there were enough WebRTC updates […]

The post How OpenAI does WebRTC in the new gpt-realtime appeared first on webrtcHacks.

Understand WebRTC metrics with rtcStats

bloggeek - Mon, 09/15/2025 - 12:30

We’re launching rtcStats, a tool to help you debug, troubleshoot and optimize your WebRTC application – all by processing WebRTC stats.

WebRTC came to our lives when the world shifted from one computing paradigm to another.

We’ve moved away from on premise, bare metal computing, proprietary operating systems and proprietary software to cloud computing, virtualization, Linux/Android and open source. Oh – and the web browser as a leading user interface and our window to the world.

How is that related to WebRTC metrics exactly, and where does rtcStats, our newly launched tool, come in? That’s what I want to discuss here.

Table of contents

You are not in control with WebRTC
The need to debug and monitor on the edge
webrtc-internals’ importance
rtcStats for debugging, troubleshooting and optimizing
rtcStats for continuous monitoring
From statistics to Observations

You are not in control with WebRTC

Here’s something you need to understand with/about WebRTC:

You are not in control

There are 4 leading actors in WebRTC:

Your application. You control this part (at least)
Web browsers. Out of your control in 99% of the use cases. Browsers will update without asking permission from you and they change their WebRTC behavior more frequently than that of other technologies they ship
Network – the internet. Nope. You usually don’t control this piece… the users join from wherever, over the open internet most likely
The user’s device and peripherals. Users bring their own devices these days (not in all scenarios but in most)

The need to debug and monitor on the edge

So you’re not in control of everything in your WebRTC application. What does that mean?

When a user complains about something in your service, how do you know if the problem is on your end or his?

I remember talking a few years back with an IT manager in one of the large cloud contact center vendors. He said that 90% of all complaints end up being user issues. It might be the network. It might be the headset. It might be the location of the user in his house. It might be a myriad of other problems.

This leaves us with two things we need to figure out:

Whose problem is it?
What can we do to fix it?

If the problem is on the user’s side, then how are you going to even explain it to him and assist him with resolving the issue? Or are you just going to let him stay disgruntled until he churns?

Now, looking only at data in your infrastructure gives you half the picture. What you lack there is an understanding of the real user experience. What happened on the edge, with the device.

When the industry started adopting WebRTC, it also began looking for logs and metrics from the devices to figure out exactly these issues. Since you’re not in control, the most you can do is collect metrics and use them as hints to solve problems users bump into.

Without it? You’re running blind with your service. Good luck with doing this for long.

webrtc-internals’ importance

The main debugging tool we have for WebRTC is webrtc-internals. A kind of an internal Chrome dashboard that collects and displays WebRTC related events and metrics locally inside the browser.

This tool has been and still is the best friend developers have in understanding what is happening with their ongoing WebRTC sessions. While we contributed a lot of improvements over the years (and will continue to do so), webrtc-internals has its own set of drawbacks:

It is highly technical, useful for the savvy WebRTC developers
webrtc-internals doesn’t do many computations on its own. Simple things like averaging values or understanding outliers and bad behavior are left to the developer
When peer connections are closed, they get garbage collected, and all data related to them gets cleared from webrtc-internals
If you open it after opening a call, you miss a lot of important events that are needed for many debugging sessions
It is part of Chromium so all changes need to be approved by Googlers. And getting a better graph library is basically impossible!
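To illustrate the kind of computation that is left to the developer, here is a minimal sketch that averages a series of samples (say, round trip times in milliseconds) and flags outliers. The summarize name, the Summary shape and the threshold of two standard deviations are all assumptions picked for the example, not something webrtc-internals or rtcStats prescribes.

```typescript
interface Summary {
  mean: number;
  outliers: number[];
}

// Average the samples and flag those more than `k` standard deviations from the mean.
function summarize(samples: number[], k = 2): Summary {
  if (samples.length === 0) {
    return { mean: NaN, outliers: [] };
  }
  const mean = samples.reduce((sum, v) => sum + v, 0) / samples.length;
  const variance =
    samples.reduce((sum, v) => sum + (v - mean) ** 2, 0) / samples.length;
  const std = Math.sqrt(variance);
  // When every sample is identical (std === 0) nothing is considered an outlier.
  const outliers = samples.filter((v) => std > 0 && Math.abs(v - mean) > k * std);
  return { mean, outliers };
}

// Hypothetical RTT samples in milliseconds, with one obvious spike.
const rttMs = [50, 52, 48, 51, 49, 300];
console.log(summarize(rttMs).outliers);
```

Note how a single spike drags the mean far away from the typical value, which is exactly why looking at raw graphs alone can mislead.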

rtcStats for debugging, troubleshooting and optimizing

The fact that webrtc-internals is a powerful tool, coupled with the drawbacks mentioned above, led me to create along with Philipp Hancke and Olivier Anguenot a new tool called rtcStats.

It doesn’t replace any of the existing tools you know out there – free or commercial.

It brings something else, new and fresh into the market.

What we’ve done is take everything found in webrtc-internals and make it stupidly easy to view the data in ways that aren’t available anywhere else.

Think of it as webrtc-internals on steroids.

When you want to look at a webrtc-internals dump file to debug, troubleshoot or even optimize something you’re working on, then you can now just upload it to rtcStats and view it there. Oh, and did I mention you can also share it with others without emailing it to them? And leave comments on it while at it.

You’ll be amazed at how much faster your debugging sessions are going to be from now on.

The best thing? We’ve got a free tier for this tool that is quite useful. Just sign up and upload your first dump file.

👉 Need better ways to review and understand webrtc-internals information when you debug and develop? Use rtcstats.com.

rtcStats for continuous monitoring

There’s another thing you can do with rtcStats, and that’s view rtcstats files. These files are generated by the popular rtcstats SDK. It was traveling between github users and now has a permanent home in our rtcstats account there.

We’ve modernized the code and made it work better than the old rtcstats SDK, which was initially written almost a decade ago and started to show its age. The approach used has stayed relevant over the years and we have refined it a bit, such as significantly reducing the amount of data that is uploaded. Another thing we did was switch to a monorepo approach, where the project on github now has both the rtcstats client SDK as well as rtcstats-server.
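One common way to cut down the amount of stats data that gets uploaded is delta encoding: send only the fields that changed since the previous report. The sketch below illustrates that general technique and is not a description of how the rtcstats SDK is actually implemented; the StatsReport type, the delta function and the sample field names are assumptions for the example.

```typescript
// A flattened stats report, simplified for illustration.
type StatsReport = Record<string, number | string>;

// Return only the fields of `current` whose values differ from `previous`.
function delta(previous: StatsReport, current: StatsReport): StatsReport {
  const changed: StatsReport = {};
  for (const [key, value] of Object.entries(current)) {
    if (previous[key] !== value) {
      changed[key] = value;
    }
  }
  return changed;
}

const before: StatsReport = { packetsSent: 100, codec: "opus", bytesSent: 12000 };
const after: StatsReport = { packetsSent: 150, codec: "opus", bytesSent: 18000 };
console.log(JSON.stringify(delta(before, after)));
```

Fields that rarely change, like codec names, drop out of most reports this way. The receiving side needs the previous report to rebuild the full one.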

The end result?

You can integrate rtcstats client SDK directly with your web application. It will automatically collect and pass the metrics to the rtcstats-server that you can install and host wherever you want.

From there, you can pass the data into your own database (we’re not going to handle long term storage or aggregation analysis – there are enough BI and monitoring tools for that). You can also pass rtcstats files directly from the server to our rtcStats website. The result?

Have the data readily available in rtcStats for analysis
When you upload the files to us, we send back a compact JSON object with all the aggregate top level data and observations we find. You can store this in your database and run trends analysis on it

👉 Want to own your data while enjoying the best debugging capabilities for WebRTC? Use rtcstats-server and pass your data to rtcstats.com for viewing purposes

From statistics to Observations

We’ve just launched rtcStats. There’s a ton we’re going to add here. One of the main things is the way we think about WebRTC statistics. We had quite a few discussions and brainstorming sessions until we got to this point:

What we’re releasing today is the part of rtcStats that covers Foundations, Calculations and Aggregations. We are experimenting a bit with the Observations layer, which is where our focus lies in the next upcoming releases.

We plan on releasing rapidly and frequently, using the time between releases to listen to our users and improve the experience.

The Observations layer and the Deductions layer are going to be part of the paid platform and not part of the free platform (we need to eat).

With the Observations, we’re pouring the best practices and knowhow we have about WebRTC between the three of us, turning it into gold for you to immediately find once you view your WebRTC data. Think of it as your personal WebRTC expert, sitting there next to you, guiding you through the process.

Deductions? That’s what we will tackle once we nail the Observations experience.

What should you do?

Head on to rtcstats.com. Create an account. Upload a dump file. Play with it. Send us feedback.

If you’re working on continuous monitoring, then also be sure to check out and integrate with our rtcstats github repo.

And if you want to talk, you know where and how to find me 😉

The post Understand WebRTC metrics with rtcStats appeared first on BlogGeek.me.

How to impress WebRTC recruiters

bloggeek - Mon, 09/01/2025 - 12:30

Learn what specific WebRTC skills are recruiters looking for in their candidates for companies developing WebRTC applications.

If we dumb it down, I consult companies developing communication services and offer training courses for employees to upskill them in WebRTC. A recent post on Reddit caught my eye, prodding me to write this article:

Table of contents

Why become a WebRTC expert
What recruiters want wrt WebRTC
4 domains of WebRTC expertise
You need to pick 1-2 out of the 4 domains
The other roles
What I look for in a WebRTC developer

Why become a WebRTC expert

Glad you asked…

For me the answer is simple. I love the domain of communications technology. And WebRTC is right at the cutting edge these days. It is also a VERY creative domain to work in.

Some 15 years ago or so I was at a crossroads. The dilemma was should I become a CTO in the business unit at the company I worked at, or go and become the CTO of a new startup. The startup had a nice idea around the disruption of television and internet. The go to market was focusing on Africa and developing countries.

I didn’t know what to do exactly.

So I went to consult my professor at the university. I did my thesis under him, and he came from the industry, after starting a company and selling it. So he knew a thing or two. He was enthusiastic about this for me not because of the idea – but because of Africa. In his mind, going to developing countries meant having less competition in untapped markets.

Today, when everyone wants to be an AI engineer, which boils down to writing LLMs, being a data scientist or writing prompts, doing something else might make more sense. The AI domain is full of demand and supply. It is overhyped, with millions poured on the top engineers.

Most of us aren’t top engineers. We might be better off in a market where there’s a bit less hype, high demand and a lot less supply. Which is WebRTC or communication technologies in general.

Being an expert in the domain of WebRTC practically guarantees you’ll be able to pick and choose your employers, or at the very least won’t be left without a job for long.

If tinkering with AI all day long isn’t your thing (using it is something you’ll need to do no matter which role you are going to pick, but tinkering with the AI box is more than that in my mind) AND if you are interested in communications and realtime, then WebRTC might be a better angle for you.

What recruiters want wrt WebRTC

For the most part? They don’t know. Not really.

In most companies, you don’t need 10s of WebRTC experts. A core team of 2-4 is usually enough. The rest around them should understand WebRTC, but aren’t required to be experts. This also means that figuring out what exactly makes a WebRTC expert fit for a core team isn’t that simple.

Another thing to remember is that WebRTC is many things. It does voice communications really well. Then there’s video. And live broadcast which is different from group conferencing. You can also do cloud gaming with it. Or just data transfer. And in many cases, the expertise you seek in one of these market niches isn’t necessarily transferable or needed in another.

When it comes to recruiters, most will be fine with seeing WebRTC in the CV somewhere and validating that the basics are in place.

Getting from there to understanding who would fit in a core team? That’s a totally different ballgame.

Let’s look at a simple problem – just defining what a WebRTC expert is…

4 domains of WebRTC expertise

There are 4 different domains of WebRTC expertise you can find. Being an expert in one would make you a WebRTC expert, but won’t mean you’re any good at the other areas.

This immediately means for recruiters that they need to decide which areas are important and critical to the company they are recruiting for. It also means for developers who want to become WebRTC experts that they need to pick their area of expertise within WebRTC itself.

Let’s break this into the 4 domains and see how they differ from one another.

1️⃣ WebRTC signaling (and APIs)

Dealing with WebRTC signaling is no easy feat. Especially not when you have to scale a service to millions of users and to handle all the tricky edge cases.

Someone who understands signaling well will know what transport protocols are available in web browsers (and elsewhere), be knowledgeable in one or more signaling protocols, and have the analytical ability to discuss and choose between the tradeoffs of these protocols (both transport and signaling).

Oh, and he needs to know and understand WebRTC’s API surface. Know it really well.

Here in many cases, I’d also wrap things like NAT traversal (STUN and TURN), though some of the relevant work here is better handled by a good DevOps guy who knows his way around networking and WebRTC.

Being able to handle WebRTC signaling well means you know your way with communications AND with WebRTC’s APIs and state machines.

If you are planning on using a CPaaS or Video API vendor, then you may skip excelling at WebRTC signaling. That’s because the one handling it for you would be the vendor. If you are using an open source framework – I suggest still having someone who knows WebRTC signaling well around.

2️⃣ Client side UX

Some would say that Client side UX is similar enough to WebRTC signaling or even that it should be a part of it. I view this as separate.

The Client side UX person is the one who takes care of the frontend. This person understands things like video tags, accessibility, how to handle paging of video elements in large group meetings, etc.

There is also a need here to know and understand the WebRTC APIs really well. Especially the ones that directly affect the UI.

Is this person more of a good UX person or more of a WebRTC person? That’s hard to say.

As with WebRTC signaling, using CPaaS or Video API vendors means you need less of this here. I’d still recommend knowing more in this area – at least if you aren’t planning on using a Prebuilt solution.

3️⃣ Media processing, at scale

Do you think audio and video are interesting? Then maybe media processing is for you.

Here, you will need to do things like understand to some extent how media compression works – for both audio and video. The difference between the various codecs used in WebRTC.

There’s a need to know and understand RTP and RTCP, along with all the relevant extensions they have that are used by WebRTC.

And that’s just for starters…

The real headache is going to be understanding how media behaves over the network for real time communications and all the different mechanisms WebRTC has to deal with issues – things like packet loss concealment, retransmissions and forward error correction.

You will need to understand how bandwidth estimation works and how to optimize and polish it in media servers. You’ll need to know the various architectures (MCU and SFU) as well as how and when to use each.

Then there are things like cascading, geolocation and a lot of other topics that need to be addressed as services scale up.

Here, there’s a lot of effort you’ll need to put into debugging, troubleshooting and monitoring skills. They are likely to be needed often.

4️⃣ Native, mobile and low-level

Embedded and native development of WebRTC is often overlooked. Assuming you plan on running your application solely inside browsers, you are likely to not need such skills in your company.

When taking this route, assume there will be fewer companies who may need your skillset, but those that do – will not have a large pool of candidates to draw from.

Usually, I’d suggest choosing where you want the majority of your skills to be:

Focusing on Android devices. This can also lend itself to things like set top boxes, room systems and other embedded devices. I’ve written about the role of Android as an interoperability enabler recently
If you fancy iOS, then you should be fine as well. That’s because most companies will go for iOS first before tackling Android devices
Then there’s general embedded work, usually done using libWebRTC, Pion or other library implementations of WebRTC. These come in handy in any low level work – it is also the hardest skillset to learn and excel at

For me, native and mobile is more about understanding which libraries to use on which devices. How to handle libWebRTC’s build environment, and figuring out OS-specific issues with the ported libraries.

The low-level part is more nuanced. It is the ability to go through the libWebRTC codebase, assess algorithms and work on optimizations and porting. This is a very rewarding capability – both intellectually and monetarily. It is also the hardest skill to find in the market.

You need to pick 1-2 out of the 4 domains

Finding someone who is proficient in all of these 4 areas is like winning the jackpot or bumping into a unicorn in the middle of the desert. Both are rather unlikely. Being someone that has that kind of skills across all areas? Unlikely. Might happen, but not in the coming year if you’re reading this to decide what to learn.

My suggestion? Focus on 1-2 of these areas. Figure out what you like best and then put your heart and mind in it.

WebRTC isn’t simple, but that’s its beauty. It is satisfying as hell to build something and see it working when it comes to WebRTC. Especially when there aren’t many others around who can do what you can.

The other roles

We’re done looking at engineering – the developers who build applications and services.

There are other roles that can benefit greatly from WebRTC skills. Here, there is no need for the same technical level and experience, but it still makes a lot of sense to know your way with WebRTC conceptually, understand the jargon as well as the limits and quirks of the technology.

The roles?

Support related – being able to assist end users and do some rudimentary troubleshooting and support for issues. You will need to know how to read logs and how to talk to both developers and end users (two different languages)
DevOps people – here, things like how to configure media servers and other WebRTC components to play nice on K8s environments can be a real boon
Product management – that’s where I “roam” most of the time. Understanding the limits and capabilities of the technology, being able to talk and negotiate with developers – this is what I’d look for
QA – knowing what to test and how to test it in WebRTC applications. You need to be someone who is skilled in torturing such apps with a vengeance

If you’re not a developer but want a career with WebRTC – you can still have one. And knowing WebRTC better than the next candidate can get you there faster.

What I look for in a WebRTC developer

When I help companies who are on the lookout for WebRTC talent, the above are the things I have in mind.

Recently, I was asked to join interviews with potential candidates and lead the technical WebRTC part of the interview. Towards that goal, I had a piece of code prepared with a potential bug or two. I also had a nice webrtc-internals dump file. Part of the interview was around seeing how the candidate looks at the code of WebRTC APIs and how he goes about troubleshooting a potential issue.

The other part I look at is what that person did with WebRTC. What WebRTC technology stacks he used and how intimate is his knowledge with these stacks. This is to understand his current experience and skillset as well as trying to figure out his level of curiosity and passion for the use of real time communication technologies.

At the end of the day, hiring for WebRTC talent isn’t an easy task for recruiters. There are ways that you can shine as a candidate in these interviews. Make sure you are well prepared for them.

The post How to impress WebRTC recruiters appeared first on BlogGeek.me.

WebRTC and Android are the great interoperability enablers of our time

bloggeek - Mon, 08/18/2025 - 12:30

Discover the impact of WebRTC and Android on interoperability in communication technology

Interoperability used to be about vendors coming together and making sure their communication equipment talked to each other properly. This is no longer interesting or relevant it seems.

Today, interoperability is mostly about the ability of a service to run from any device and for a service to be reachable by “others”.

How is all that achieved? By relying on two technologies: Android and WebRTC

Table of contents

The good old days of H.323 and SIP interop events
Our brave new world of “interoperability”
Interoperability today is still important
The future of MCP and Generative AI
Where do we go from here

The good old days of H.323 and SIP interop events

Let’s start with a bedtime story – one that will reveal a lot of my “upbringing” in this field and my love and passion for WebRTC and communications.

Tsahi gets acquainted with communication protocols and signaling

When I started working in the communication space, some 25-30 years ago (who’s counting?), my first role was a developer in the H.323 Gatekeeper Toolkit at RADVISION.

H.323 was/is a long lost signaling protocol that started life at relatively the same time SIP did. It was all the rage, and was THE protocol to use for video conferencing.

Anyways, it took about a week until my computer that was ordered late arrived. What do you do with a new engineer in that period of time? Ask him to print and read standard protocols obviously…

My education in that first week consisted of reading H.323, H.245, H.225, ASN.1 and a few other interesting protocols. All now lost art, like COBOL. Once I finished reading these, I asked to read the source code of the gatekeeper toolkit (a signaling server SDK if we simplify it). I had the opportunity of finding a bug on paper 🤣 #truestory

Here’s the thing though. This first week served me well. Through the next 13 years at RADVISION, I worked in and around communication protocols. It included developing them, marketing and selling them, working in standardization bodies as well as going to and hosting interoperability events.

Up until recently, and maybe even today, when you go into a meeting room and see a room system – there is code I’ve written that runs in that room system.

The idea behind communication protocols

Go back to the early 2000s. We need to understand and realize what the market assumptions and limitations were at the time:

Big service providers had millions of users. Unlike today, where we have networks with billions of users in them
Everything was hardware. You couldn’t really use software based solutions on a PC. There were a few, but that was the exception to the rule
No browser support. It wasn’t even thought of as a possibility
The mindset was to copy the telecom system. Global identity system (phone numbers or different), where anyone can reach anyone else on the network
On premise. The cloud didn’t exist. There was no AWS EC2 and S3
Proprietary and closed source. Open source was in its infancy at the time, with most of the industry still mostly ignoring it and a few just figuring out what it is
Fragmented operating systems world. Most used things like VxWorks, Nucleus, pSOS and similar. Embedded Linux was all the rage – innovative and new. Linux was a big no no. Same for Windows. No Mac or iOS to speak of. The dominant operating systems were all “realtime” ones, which frankly, I have no clue even how to explain it in 2025

With these assumptions in mind, came a set of conclusions:

Vendors develop and (mainly) manufacture conferencing devices
Customers don’t want to be vendor locked-in and would like to be able to purchase devices from multiple vendors over a span of a few years or even decades
Hence devices purchased from multiple vendors should be able to “talk” and communicate to each other 🟰 interoperability

Let’s recap real quickly:

Standard protocol is defined
Vendors build products that implement the protocol
Products communicate with each other

Travel the world in the name of Interoperability

We call the concept of products communicating with each other “interoperability”.

Since multiple vendors are reading the standard protocols independently and implementing them independently, the results almost never interoperate immediately. This type of work led to different understanding and implementation of the standard protocols and to varying behaviors that caused devices not to be able to communicate properly.

Companies had large physical labs with products from multiple vendors and a QA team that manually tested against them at all times. It was time consuming and expensive to operate.

To make things easier, there were (and still are) interoperability events. These are prescheduled industry events where companies join in by having their engineers travel to a single physical location (a large hotel, office of a company or a university) with their devices and test them against each other for a period of a few days.

I had my fair share of such events, even hosting two of them in Israel.

Me… years ago, at an interop event

They are great. But they are far from flexible, since they occur once or twice a year at specific times and you don’t control who is there and how much time they will have for you.

Our brave new world of “interoperability”

That was 20 or so years ago. Times changed. Quickly.

Today there are still interoperability events. There were even one or two for WebRTC (🫨). But guess what? The interoperability events for WebRTC were scheduled and conducted between browser vendors in the early days. That was it.

In our new world of cloud computing, there is a lot less need and pressure for interoperability events.

The reasons for that are various. I’d mention 3 of them and then go to the anchors of interoperability we have today:

Cloud computing. Things get deployed in the cloud and at scale. Everything is virtual, and devices themselves have shifted from designed, realtime, proprietary operating systems, to open source, virtualized software solutions. This simplifies development and reduces costs for everyone – from the entrepreneurs to the end users
Shorter hardware lifespan. Hardware has a shorter lifespan. We replace our phones on average every 2-3 years (I know you don’t – we’re talking about the average Joe here). Software is the focus today for the most part, and most companies have less of a vendor lock-in from a physical conferencing device point of view
Open source. Fewer people now write their own opinionated implementations. Most vendors rely on a handful of open source implementations (that get tested against each other more frequently because they are more accessible)

Android is eating the hardware world

The first new champion of interoperability is the Android operating system.

It started as a smartphone OS, but today it is much more than that.

You can find Android today in most TV set top boxes and streamers. You can also find it in most video conferencing room systems.

At the extreme, Microsoft today relies on AOSP – Android Open-Source Project when letting its partners build Microsoft Teams hardware devices (I’ve covered that on the RTC@Scale 2023 summary). Translated to the language of this article – Microsoft relies on Android to provide the interoperability it needs for its Microsoft Teams ecosystem.

If you want to build interoperable communication devices today, the main approach is to build devices that existing service providers will be able to use. Android offers a great operating system for it.

Moreover, one vendor can enable the installation of an Android app of another vendor on his own device. In this way, you can get WebEx running on a Microsoft Teams room system or vice versa. I am not saying that this is how it is done today – only that it is a valid technical possibility that I am sure some of the vendors are utilizing.

WebRTC is eating the software world

Then there’s WebRTC. The great equalizer of real time communications:

WebRTC is an open standard
It is a well maintained open source implementation with a permissive license
All modern web browsers have WebRTC support built-in

This availability makes it the perfect candidate for offering an interoperability piece of software without offering interoperability at all.

What do I mean by this? Here are 3 ways in which companies are using WebRTC to overcome the need for classic, old-school interoperability:

Offer guest access using web browsers and WebRTC. Most video conferencing vendors offer this out of the box today. Hell – you can even send an Apple FaceTime video call invite URL for anyone to join from a web browser
Build third party services into your room systems. The Android devices mentioned earlier? They also have Chrome and WebRTC in them, which can then be used to run the third party apps much the same way as installing full Android apps – just with web technology
Offer DMA interoperability. The Digital Markets Act is an EU regulation that forces large social networks to open up access and offer interoperability. The easiest way to do this for calls is by way of WebRTC (which is what WhatsApp did with its new Business Calling API)

Instead of having interoperability across vendors, what WebRTC enables us to do is to offer interoperability across devices and browsers – all relying on the WebRTC implementation. How do we validate behavior? Against Chrome of course…

Business APIs to the rescue

The third interoperability enabler of our time is the API. Instead of relying and agreeing on a standard, a service can just expose and publish an API layer to access the service. And when it comes to social networks, these are business APIs.

As I’ve mentioned earlier, the DMA interoperability translates at the end of the day into an API layer that social networks expose. And they usually do so via their business API which is geared more towards having businesses communicate with the network’s users than it is about a user of another social network communicating with users on the network (theoretically possible).

The companies making the most of these interfaces today are likely Meta’s WhatsApp and Apple’s iMessage, who both offer business APIs.

Interoperability today is still important

In some industries at least.

It isn’t that the old way of interoperability based on standard protocols isn’t relevant today. It is. Especially in the realm of telephony and cellular networks. For the most part though, in internet communications, it is enough to have standards that are agreed upon and implemented by the browser vendors.

The rest of the industry simply aligns on top of the browsers and these implemented protocols to the best of their ability.

The future of MCP and Generative AI

MCP is something I’ve been thinking about for quite some time. MCP stands for Model Context Protocol. It is a standard proposed by Anthropic for letting generative AI systems (LLMs) interact with third party APIs.

The nice thing about it is that it offers a natural language description of what an API interface does, so that once implemented, an LLM can query the API and figure out on its own what it can do, how to invoke it and what it can expect in return.

The neat part about it? Once every API has its MCP server and description, AI agents can figure out how to interact and interface with them.

This can bring interoperability to a whole new level, in many cases making it a moot point to even think about interoperability at the lower levels of the implementation, leaving it to the MCP layer itself. We see this being discussed for integration purposes in enterprises, but there’s nothing that is barring us from using it for communication services as well.

Where do we go from here

The future is more interesting and integrated than ever before.

Interoperability is here to stay, but its focus and target audience is changing.

For communications, the cloud along with other trends have pushed it from a desired mechanism between device manufacturers to a low level layer across browsers and devices. The interoperability itself is taking place today via Android, WebRTC and APIs. The future? That’s for MCP to figure out.

👉 If you are looking for some guidance with your roadmap and strategy, leave me a note

The post WebRTC and Android are the great interoperability enablers of our time appeared first on BlogGeek.me.

How your WebRTC optimizations are costing you money while killing your business

bloggeek - Mon, 08/04/2025 - 12:30

Optimize your WebRTC applications for better performance. Discover effective tips for superior connectivity and user experience.

WebRTC is a balancing act of resources and requirements. This makes optimizing WebRTC applications a tricky business. One where being over enthusiastic with your optimization can lead to the opposite results.

I wanted to share a few of the examples I bump into over and over again – so you don’t end up making similar mistakes.

Table of contents

Deploying TURN servers in all possible regions
Cramming too many TURN servers to the iceServers configuration
Placing WebRTC signaling servers in every region
Sending video at HD resolutions no matter what
Using simulcast (or SVC) for all of your calls
Aiming to use video hardware acceleration wherever possible
Opting for AV1 video codec for all calls
WebRTC optimizations the right way

Deploying TURN servers in all possible regions

When you deploy TURN servers everywhere it is going to cost you money. Having servers stand there in the sun and rot just because someone at some point might be using them is a waste of money. I’d also venture and dare say that if they aren’t used – then they might not be really working properly and you wouldn’t have any real way of knowing that (besides placing monitoring, which in itself adds more to your cost and complexity).

This isn’t to say don’t put TURN servers in a lot of regions. It is just to say that you should decide in which regions to deploy your TURN servers.

Where would I put them? Not sure, but here’s a suggestion to start from:

Collect the regions your users come from
Place that on a histogram graph, showing how many users/connections come from which region
Make sure that for 90% of your users, you have a TURN server in their region
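
Here’s a rough sketch of that 90% rule in code. It assumes you already have connection counts per region from your own logs; the RegionCount shape, the function name and the sample numbers are all made up for illustration.

```typescript
// Hypothetical input: connection counts per user region, taken from your own logs.
interface RegionCount {
  region: string;
  connections: number;
}

// Returns the busiest regions, largest first, until their combined share
// of connections reaches the target (0.9 means 90% of connections).
function regionsForCoverage(counts: RegionCount[], target: number = 0.9): string[] {
  const total = counts.reduce((sum, c) => sum + c.connections, 0);
  if (total === 0) return [];

  const sorted = counts.slice().sort((a, b) => b.connections - a.connections);
  const selected: string[] = [];
  let covered = 0;
  for (const c of sorted) {
    if (covered >= target * total) break; // target reached, stop adding regions
    selected.push(c.region);
    covered += c.connections;
  }
  return selected;
}

// Made-up numbers, for illustration only.
const turnRegions = regionsForCoverage([
  { region: "us-east", connections: 500 },
  { region: "eu-west", connections: 300 },
  { region: "ap-south", connections: 120 },
  { region: "sa-east", connections: 50 },
  { region: "af-south", connections: 30 },
]);
console.log(turnRegions);
```

With these sample numbers, three regions are enough to cover 90% of the connections. The long tail is where the servers would stand in the sun and rot.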

Not happy with this one? Here’s another suggestion, quite different in its direction:

Measure the RTT for your users
Filter out and only look at the users connecting via your TURN servers (the rest don’t matter for this optimization)
Sort the users based on their RTT – the higher the RTT – the more you should care about improving it
Do you notice any specific user regions that show higher RTT values? That’s where you need to add TURN servers if you don’t have any, or figure out if the data centers you use in that region are good enough (they probably aren’t)
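
The RTT approach can be sketched along these lines. The SessionSample record is hypothetical (you would fill it from whatever statistics you already collect), and grouping by region with a median is my own simplification of “sort and look for regions that stand out”.

```typescript
// Hypothetical per-session record built from the statistics you already collect.
interface SessionSample {
  region: string;
  rttMs: number;
  relayed: boolean; // true when the session went through one of your TURN servers
}

interface RegionRtt {
  region: string;
  medianRttMs: number;
  sessions: number;
}

function medianOf(values: number[]): number {
  const s = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

// Keeps only relayed sessions, groups them by region and returns the
// regions sorted from the highest median RTT to the lowest.
function worstRelayedRegions(samples: SessionSample[]): RegionRtt[] {
  const byRegion: { [region: string]: number[] } = {};
  for (const s of samples) {
    if (!s.relayed) continue; // direct connections don't matter for this optimization
    if (!byRegion[s.region]) byRegion[s.region] = [];
    byRegion[s.region].push(s.rttMs);
  }
  return Object.keys(byRegion)
    .map((region) => ({
      region,
      medianRttMs: medianOf(byRegion[region]),
      sessions: byRegion[region].length,
    }))
    .sort((a, b) => b.medianRttMs - a.medianRttMs);
}
```

The regions at the top of the returned list are the first candidates for a new TURN server or a better data center.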

Cramming too many TURN servers to the iceServers configuration

Let’s call this what it really is – a software bug on your part.

How do I know? Because Firefox will complain to the console that you have too many iceServers if you pass its WebRTC peer connection more than 3 or 4 servers…

Why is that?

Each TURN or STUN server you add to the list means that your WebRTC client will need to gather more ICE candidates. And after collecting these candidates, it will need to conduct more ICE connectivity checks because there are more candidates.

The end result? More messages going over the network, with more resources used on this thing, which is practically useless.

This isn’t the worst of it though…

Let’s say you have TURN servers in 10 regions. This means 30 iceServers to add – UDP, TCP and TLS for each region.

What happens if a user is kinda “in-between” regions? He may end up oscillating between them, replacing the TURN server he is connected through every few seconds or minutes. And yes. I’ve seen this happen in real life.

You don’t want to let WebRTC decide on the region to use with its internal logic – you want to do that via DNS geolocation.
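
As a minimal sketch of what that looks like on the client, assume a single hostname (turn.example.com here, a placeholder) that your DNS provider resolves to the nearest region. The IceServer interface is a local stand-in for the browser’s RTCIceServer so the snippet compiles on its own; the ports and credentials are assumptions too.

```typescript
// Local stand-in for the browser's RTCIceServer dictionary.
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

// Builds one iceServers entry for a geo-resolved hostname, instead of
// one set of entries per region. Ports are common defaults, not a requirement.
function buildIceServers(host: string, username: string, credential: string): IceServer[] {
  return [
    {
      urls: [
        `turn:${host}:3478?transport=udp`,
        `turn:${host}:3478?transport=tcp`,
        `turns:${host}:443?transport=tcp`,
      ],
      username,
      credential,
    },
  ];
}

// Placeholder hostname and credentials, for illustration only.
const iceServers = buildIceServers("turn.example.com", "user", "secret");
console.log(iceServers);
// In a browser you would then pass it along:
// const pc = new RTCPeerConnection({ iceServers });
```

The region decision moves out of the ICE machinery and into DNS, which is where it belongs.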

Read this for some more tips on TURN servers 👉 We TURNed to see a STUNning view of the ICE

Placing WebRTC signaling servers in every region

Just don’t.

It is complex to achieve (you need a distributed database or at the very least a distributed in memory data grid to store the state of all active users).

And the gain? Too small.

Users might end up sending their signals 100ms or more faster, but it won’t affect actual media quality.

So just don’t.

Your deployment is likely not large enough for this one to be worth it.

Sending video at HD resolutions no matter what

HD video is great. It gives high quality video. While at it, one can go 4K at 60fps. Or higher still.

The bigger the better. Right?

Not so fast.

Sending higher video resolutions and frame rates theoretically improves media quality.

But it comes at a cost of CPU and network resources. You don’t have abundance in these two, so whenever you decide to go higher in your resolution, you need to make sure it is worth it.

Things I ask myself when assisting clients in this area:

What’s the resolution and framerate of the camera being sent?
What’s the resolution and framerate of the video being sent (from that camera)?
On the receiving end, what’s the resolution of the window the video is going to be displayed on?
How important is the content of that video to that specific interaction?

My view on this?

Do the minimum that will get you good enough results
The more you protect and care for CPU and network resources, the more stabilized the system is going to be. From there, you can experiment with increasing quality and sacrificing more CPU and network resources for it
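
To make the “minimum that is good enough” idea concrete, here’s a small sketch that picks a send resolution from the camera capability and the size of the window on the receiving end. The resolution ladder and the function name are assumptions of mine, not a recommendation for your service.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical resolution ladder, lowest first.
const RESOLUTION_LADDER: Size[] = [
  { width: 320, height: 180 },
  { width: 640, height: 360 },
  { width: 1280, height: 720 },
  { width: 1920, height: 1080 },
];

// Picks the smallest rung that still covers the display window,
// without ever going above what the camera can capture.
function pickSendResolution(camera: Size, display: Size): Size {
  const usable = RESOLUTION_LADDER.filter((r) => r.height <= camera.height);
  if (usable.length === 0) return camera; // camera is below the lowest rung
  for (const rung of usable) {
    if (rung.height >= display.height) return rung;
  }
  return usable[usable.length - 1]; // display is larger than anything we can send
}
```

A 1080p camera feeding a 480x270 tile ends up sending 640x360 here. Whether you then raise it depends on how important that video is to the interaction.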

Check out these two resources while at it:

👉 Tweaking WebRTC video quality: unpacking bitrate, resolution and frame rates

👉 8 ways to optimize WebRTC performance

Using simulcast (or SVC) for all of your calls

Simulcast and SVC are great technologies. That said, they have their place and uses – they shouldn’t be used at all times.

Why?

Simulcast and SVC in general take up more bitrate than “normal” video compression techniques. Especially simulcast
They both also take more CPU
Oh, and SVC? It might not be supported by a hardware implementation of an encoder, which means you may end up going for a software codec

I care less about the hardware angle – I am more concerned about the extra CPU and bitrate. If at the end of the day they take up more resources, I want to see some value out of it. But in 1:1 calls, there is usually less value (none with Simulcast, some with SVC due to improved resiliency to packet losses – but again, not something I’d go after for most services).

So no. Don’t use these technologies all the time.

In general, don’t use them in 1:1 calls, which are likely the majority of the calls in your WebRTC application.
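
One way to wire this in is to decide on the encodings when the call is set up. The sketch below assumes you know the participant count at that point; the Encoding interface mirrors a few fields of the browser’s RTCRtpEncodingParameters, and the bitrates are illustrative numbers only.

```typescript
// Local stand-in for a subset of the browser's RTCRtpEncodingParameters.
interface Encoding {
  rid?: string;
  scaleResolutionDownBy?: number;
  maxBitrate?: number;
}

// Single encoding for 1:1 calls, three simulcast layers for group calls.
function videoEncodings(participants: number): Encoding[] {
  if (participants <= 2) {
    return [{ maxBitrate: 1500000 }];
  }
  return [
    { rid: "q", scaleResolutionDownBy: 4, maxBitrate: 150000 },
    { rid: "h", scaleResolutionDownBy: 2, maxBitrate: 500000 },
    { rid: "f", scaleResolutionDownBy: 1, maxBitrate: 1500000 },
  ];
}

// In a browser, the result would go into the transceiver init:
// pc.addTransceiver(track, { direction: "sendonly", sendEncodings: videoEncodings(n) });
```

Calls that start as 1:1 and grow into a group are the tricky case, so plan for how you switch between the two modes.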

Aiming to use video hardware acceleration wherever possible

Hardware acceleration is great for video coding. Video compression consumes CPU resources and on mobile devices this also translates to battery life and device heat. When using hardware acceleration for video coding, you suffer less from these because they don’t happen anymore on the CPU – there’s a dedicated DSP chip that takes care of it for you.

Here’s the thing though – you shouldn’t always strive to use video hardware acceleration in WebRTC.

Why? Because of these “minor” inconveniences:

Not all devices have video hardware acceleration for all video codecs you may need
Devices that have hardware acceleration might not have it implemented properly or satisfactorily. Here are some examples from Chrome related WebRTC bugs:
- Crashes on encoding video at very low bitrates
- Poor quality on low resolutions
- Inability to encode or decode specific streams from other devices
- No support for SVC on certain encoders
- Inability to run multiple encoders or decoders concurrently
- Poor optimization for interactive video use cases

What should you do then?

Figure out if hardware acceleration of video codecs is for you
Assume it will require more QA resources and a lot more devices in your lab
Aim for whitelisting devices for hardware acceleration use (versus blacklisting devices once your users bump into them)
Err on the side of being cautious on this one
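The allowlist approach can be sketched as a simple lookup. Everything here is hypothetical (the device names, the table, the function name); how you actually switch between hardware and software codecs depends on your platform and is not shown.

```typescript
// Hypothetical allowlist: device model -> codecs verified in your own lab.
// The entries are made-up examples, not tested recommendations.
const hardwareAllowlist: Record<string, string[]> = {
  "example-phone-a": ["H264", "VP8"],
  "example-phone-b": ["H264"],
};

// Returns true only when this exact device/codec pair was verified.
// Unknown devices fall back to software, which is the cautious default.
function shouldUseHardwareCodec(deviceModel: string, codec: string): boolean {
  const verified = hardwareAllowlist[deviceModel.toLowerCase()] ?? [];
  return verified.indexOf(codec.toUpperCase()) !== -1;
}
```

The point of the design is the default: a device nobody tested gets the software path until QA adds it to the table.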

Opting for AV1 video codec for all calls

AV1 is the best video codec that WebRTC has to offer. It has the latest coding tools and the best compression rates. For a given bitrate, it is likely to produce the highest quality out of all other alternatives.

Why not always use it then? Because using AV1 usually takes up more CPU than the alternatives, so older devices can’t really support it at the resolutions and bitrates you may need.

If we end up choking the CPU to make use of AV1, the result is going to be poor media quality – the machine will be starved and will start throttling – missing packets and frames, heating up and freezing voice and video.

Deciding to use AV1?

Prepare to use multiple video codecs in your application – dynamically
Figure out when to use and when to skip the use of AV1
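One way to picture the dynamic part is a small policy function plus a reordering of the codec list. This is a minimal sketch: the core-count and resolution thresholds are assumptions you would need to calibrate, and `CodecSketch` is a local stand-in for the entries returned by `RTCRtpReceiver.getCapabilities("video").codecs`.

```typescript
// Minimal stand-in for a codec capability entry.
interface CodecSketch {
  mimeType: string;
}

// Hypothetical policy: prefer AV1 only on machines with enough logical cores
// and for modest resolutions. Both thresholds are assumptions to calibrate.
function canAffordAv1(logicalCores: number, frameHeight: number): boolean {
  return logicalCores >= 8 && frameHeight <= 720;
}

// Returns a reordered copy: AV1 first when affordable, last otherwise.
function orderCodecs(codecs: CodecSketch[], preferAv1: boolean): CodecSketch[] {
  const isAv1 = (c: CodecSketch) => c.mimeType.toLowerCase() === "video/av1";
  const av1 = codecs.filter(isAv1);
  const others = codecs.filter((c) => !isAv1(c));
  return preferAv1 ? av1.concat(others) : others.concat(av1);
}
```

In a browser, the reordered list would go to `transceiver.setCodecPreferences(...)` before negotiation, with `navigator.hardwareConcurrency` as one possible input to the policy.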

WebRTC optimizations the right way

Optimizing WebRTC applications is more than just a simple set of rules to follow. It ends up depending on your use case and your implementation.

It is a balancing act between the level of media quality you want to achieve (the highest) versus the available hardware and network resources you have at your disposal (an unknown that changes dynamically from one session to another and also within a session).

If you need help with this, know that this is what I do in many of my consulting projects. Just contact me and we can check together if I can help you out.

The post How your WebRTC optimizations are costing you money while killing your business appeared first on BlogGeek.me.

WebRTC Courses going through a transition (and a price change)

bloggeek - Mon, 07/21/2025 - 12:30

The WebRTC Courses are undergoing a transition, one which will take them to their 4th phase. Here’s what is changing and why.

Table of contents

TL;DR
My WebRTC courses, in phases
Free courses and modules
Additional resources and materials
Upcoming portal
🎁 One more thing…

TL;DR

Here are the changes that have taken place:

WebRTC: The Missing Codelab is now free for all
WebRTC developer courses and the bundle got a price reduction
WebRTC Tooling now includes a module for Built with WebRTC interviews
Supporting WebRTC course now has quizzes in it
All WebRTC eBooks are now part of the ALL INCLUDED developers plan

A few changes in the works:

A free module should be released soon from the WebRTC Security & Privacy Essentials course
I am migrating the Slack courses workspace to my new portal (soon)
Philipp and myself are contemplating a new course. Maybe. We’ll see

🎁 There’s an additional discount, if you’re quick about it 😉 (see below)

My WebRTC courses, in phases

I want to take the time to share with you the path I took with these courses, as this was never really planned. It just… came to be.

If you’re interested in understanding how I got here, to having more than 1,700 paying students and likely 5,000+ non-paying ones, then read below. In a way, this would not have been possible without you.

Phase 1: Inception

The courses started somewhere in 2016. I was in the process of moving to a new apartment with my family and I noticed that the trickle of consulting clients had slowed down somewhat. My focus was elsewhere (in the new place) and with it, the amount of work I had to do.

With the free time that came with it, and the need for a way to feed the family, I decided to start off with a course – I called it Advanced WebRTC Architecture course. It is still the main pillar of my developer courses, undergoing changes on a quarterly basis.

To that, I later added my WebRTC Tooling course, meant to add a bit more depth in areas that just didn’t seem right in the architecture course.

Life was good. Until Philipp decided to knock on my door, making it even better.

Phase 2: Partnership

A few years back, Philipp Hancke approached me, asking if I was interested in doing a codelab for WebRTC together – since the “official one” was lacking.

I said yes, and we set out to create our WebRTC: The missing codelab course. This resulted in a long and fruitful partnership with Philipp that I am very happy about.

We later on expanded this to additional courses as well as to our WebRTC Insights service.

Phase 3: Expansion

Then we expanded the courses to cover more WebRTC topics and for more specific audiences in a way.

The Supporting WebRTC course was born, to assist those dealing with… well… supporting users with their problems.

Then, with Philipp Hancke again, we’ve introduced two protocols courses and a security course, deepening into areas and topics that have never been touched in such a way in any other training we’ve seen out there.

We have a few more ideas of additional courses that we might or might not add in the future.

Phase 4: Openness

Here we are today. This year, we’ve made a decision.

Open up and make available top notch training around WebRTC to more people. We are doing that in two main ways:

We’ve made the Codelab free. It is available both on the courses website and on YouTube directly (the YouTube one is lacking the exercises)
There’s a change in our pricing for the developer courses:

All developer courses are now 25-50% cheaper, with the ALL INCLUDED bundle ~30% cheaper as well.

Oh, and the ALL INCLUDED bundle? It now includes also all WebRTC related eBooks on the courses site.

All courses include 1 year of access. Renewing for an additional year was at 50% of the price list. This is now changing to 25% of the price list. And yes – these courses get updated frequently, along with the changes to the protocol, the browsers and our understanding of how to use these in the real world.

Free courses and modules

There are now 2 full blown free courses, along with a new module that will be released freely in the near future:

WebRTC Basics, which has been a free course for quite some time now
WebRTC: The Missing Codelab, which is now free
We plan to soon release one of the modules in our Security & Privacy Essentials course freely as well

Again, the purpose is for more developers to experience WebRTC and use it. And if you end up liking our free courses, you are bound to love our other available courses here.

Additional resources and materials

In the past year, I’ve made the usual rounds of adding resources and updating course lessons on a regular basis.

On top of that, the following bigger changes took place:

WebRTC Tooling course now includes a new module called “Built with WebRTC”. I am interviewing vendors on specific areas of WebRTC and their own learnings and experiences. There are now 7 such lessons with a new one being added about every 2 months
Supporting WebRTC course now includes quizzes on many of the lessons. This adds to the learning experience and enables you to validate that you understand the materials

Upcoming portal

I am now working on migrating away from the Slack workspace. It has served me well, but I think a better experience is needed.

We recently introduced a new portal for the WebRTC Insights service. It is such a success, that it just made sense to create something along the same lines for the courses – this will be coming soon.

🎁 One more thing…

Ok. Here’s the thing.

There’s a price change taking place, and what would be better than enjoying an additional price cut while at it?

Since we’re in the beginning of summer, and I’ll be off on a short vacation soon as well, here’s the deal:

All courses have an additional 15% discount for their individual plans
What do you need to do? Use coupon code CHANGES2025 on checkout
This is valid until July 31… Here today. Gone tomorrow

What are you waiting for?

The post WebRTC Courses going through a transition (and a price change) appeared first on BlogGeek.me.

Answering ChatGPT questions about WebRTC

bloggeek - Mon, 07/07/2025 - 12:30

Explore the most common WebRTC questions in ChatGPT and get answers to them that a human would give (as opposed to letting ChatGPT both ask and answer…

I am trying to use Generative AI in my own work more and more. I’ve been told (as well as told others) that I won’t be replaced by AI, but I will be replaced by someone who uses Generative AI. So the only thing to do is to replace myself by learning to use AI myself.

It started small, with the Midjourney images on social media and in my articles – these are actually handled by my son most of the time. And now from time to time, I try to have short conversations with different LLM engines that relate to work. Mostly to get ideas.

For my Video Q&A series with Philipp Hancke, which passed 50 (!) videos already, I wanted a few new and fresh questions, so I asked ChatGPT for a few. I picked a couple for some future videos, but decided it was time to write an article, answering these questions – in a way, answering ChatGPT’s questions about WebRTC – and no – I didn’t ask ChatGPT to answer them for me 😉

Table of contents

🔧 General / Introductory
🧑‍💻 For Developers / Technical Use
🔒 Security and Privacy
📈 Performance and Optimization
🌐 Browser and Platform Compatibility
🏗️ Architecture and Deployment
📦 Use Cases and Applications
🤝 Interoperability and Standards
🔍 Debugging and Troubleshooting
Got any questions for me?

🔧 General / Introductory

What is WebRTC?

Well… that should be relatively simple.

WebRTC is a technology for enabling live streaming of voice and video in web browsers (and elsewhere). It is open source and available in all modern browsers today.

If you want a longer answer, with videos, then head on to this article 👉 What is WebRTC and what is it good for?

How does WebRTC work?

WebRTC is actually a set of standard specifications that together make up a sophisticated media engine that is optimized for voice and video conversations. A large part of it means dealing with the devices and the networks that users end up using.

If I had to explain it, it would be something like this:

WebRTC has minimal signaling of its own, and relies on the application on top of it to handle signaling for it
It negotiates the requirements of a session via SDP (which the application is responsible for sending and receiving)
The media connection gets established using the ICE protocol, which uses STUN and TURN servers as the means to get connectivity in as many network conditions as possible
Actual media is sent and received over SRTP, with the voice and video codecs negotiated in advance via SDP

I won’t go over all the details here, but if you want to dive deeper into this, then I can suggest these two free courses that I have:

👉 WebRTC Basics

👉 WebRTC: The Missing Codelab

What are the main components of WebRTC?

This depends how you look at it 🥸

I want to suggest two different approaches:

1. Technology stack

Here, you can find the various protocols that make up WebRTC. The drawing is taken from the great book High Performance Browser Networking.

You can read more about each one of these protocols in the WebRTC Glossary, or you can check out my WebRTC Protocols courses on webrtccourse.com

2. Entities

This is my preferred view of the WebRTC components, as it talks about the entities involved in connecting a session. It also suggests that in many ways, you’re not in control of most of what’s going on, which is sad, but it has its reasons.

Here are two articles to dig in about this angle:

👉 WebRTC Server: What is it exactly?

👉 The lead actors in WebRTC are outside of your control

Is WebRTC free to use?

Yes. No. Maybe.

WebRTC is an open protocol with a high quality, popular, permissive open source implementation.

This makes WebRTC free. Individuals and companies can use WebRTC to their heart’s content in whatever application they want to develop.

The thing is… developing with WebRTC is going to cost you time and engineers – both usually expensive. And running WebRTC applications isn’t free either – there are costs associated with hosting servers and paying for networking traffic – this can get expensive quickly for video that requires high bandwidth use.

Here’s a longer article on this topic 👉 Is WebRTC really free? The costs of running a WebRTC application

Who maintains or owns WebRTC?

Google. Not exactly, but close enough.

WebRTC is defined by the W3C and IETF. These are international standardization organizations that encompass the views of multiple vendors.

The implementation that goes into all modern web browsers today? That was implemented and maintained by Google.

They are by far the largest contributor to that piece of code, which means they control and own the behavior your application would deal with the moment it hits a web browser.

And remember, Google owns Chrome which has a bigger market share than all other browsers combined, and by a wide margin. And all other browsers run the same piece of code, known as libWebRTC.

So yes. WebRTC is maintained by Google. Mostly.

Here’s some more on this topic/question:

👉 With WebRTC, don’t expect Google to be your personal outsourcing vendor

👉 libWebRTC

🧑‍💻 For Developers / Technical Use

How do I get started with WebRTC?

Depends who you are, what you do and what you are aiming to achieve.

For a developer trying to learn WebRTC, I’d go for building a sample application to understand the tech a bit better. You can use our WebRTC: The Missing Codelab training course as a starting point and an explainer for this
Companies who wish to develop a demo or an MVP of something should most likely use a third party managed service for that. There are quite a few vendors and you can find many of them listed on this page of a report of mine: Video API report
Support, QA, product managers, entrepreneurs and other people who need a basic understanding of WebRTC can start from my free WebRTC Basics training course

There are likely more approaches and others who may need help getting started with WebRTC. If I haven’t covered your scenario or need, just leave me a comment on this article and I’ll try to help you out.

How do I establish a peer-to-peer connection?

Using WebRTC of course 🙂

You will need to pass the SDP messages created by the WebRTC API from one peer to another and vice versa. You will likely also need a TURN server (or a STUN server).

Lucky for you, the WebRTC: The Missing Codelab training course is free for all. It explains how to build a Node app that does exactly that – establishing a peer-to-peer connection with WebRTC. It is packed with explanations and rationale, including covering all relevant edge cases while at it.

What signaling server should I use with WebRTC?

Whatever fits your needs.

You need to start from figuring out the signaling protocol you want to use and move your way from there to the actual signaling server.

Here are two resources to guide you through this:

👉 Choosing the best WebRTC signaling protocol for your application

👉 What is a WebRTC Signaling Server and Why You Should NOT Use AppRTC?

How do I handle NAT traversal in WebRTC?

Using STUN and TURN servers and the ICE protocol.

WebRTC runs on technologies that are slightly different from the rest of what we’re used to in web browsers. This includes things like UDP, SRTP and ephemeral, dynamic ports. As such, certain network elements out there might block its traffic (such as NAT devices and firewalls). These shape the networks in ways that might hinder the ability to send what WebRTC needs sending over the network, which is why WebRTC uses STUN (to figure out public IP addresses) and TURN (to relay media). ICE then orchestrates the process to figure out the best path between the peers in the session.
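To make this a bit more concrete, here is a minimal sketch of the `iceServers` list that you hand to `RTCPeerConnection`. The hostname, ports and credentials are placeholders, and `IceServerSketch` is a local mirror of the `RTCIceServer` dictionary so the snippet runs outside a browser.

```typescript
// Local mirror of the RTCIceServer dictionary fields used here.
interface IceServerSketch {
  urls: string[];
  username?: string;
  credential?: string;
}

// Builds the iceServers list for an RTCPeerConnection configuration.
// Host and credentials are placeholders; real TURN credentials are
// usually short-lived and fetched from your own backend.
function buildIceServers(host: string, username: string, credential: string): IceServerSketch[] {
  return [
    // STUN: lets the client learn its public address.
    { urls: ["stun:" + host + ":3478"] },
    // TURN: relays media when direct connectivity fails. Offering UDP, TCP
    // and TLS on 443 covers progressively stricter firewalls.
    {
      urls: [
        "turn:" + host + ":3478?transport=udp",
        "turn:" + host + ":3478?transport=tcp",
        "turns:" + host + ":443?transport=tcp",
      ],
      username,
      credential,
    },
  ];
}
```

In a browser this would be used as `new RTCPeerConnection({ iceServers: buildIceServers(...) })`; ICE then tries the candidates and picks a working path on its own.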

More on this 👉 We TURNed to see a STUNning view of the ICE

What’s the best TURN/STUN server to use?

That depends…

Here are a few thoughts off the top of my head:

coturn is the most common and popular open source alternative. It is used quite a lot and by virtually everyone
STUNner is a rather new and promising alternative to coturn. It has an actual company backing it, which can be seen as an advantage
If you want a managed service, then I’d look at Cloudflare TURN or Twilio NTS before venturing to other avenues

How do I record a WebRTC stream?

There are multiple ways to record WebRTC streams.

I’ll start with a fact you need to first accept – you can’t use TURN to record WebRTC media streams. TURN servers aren’t privy to the encryption keys used…

You can record WebRTC streams on the client side using MediaRecorder API or in media servers (multiple alternatives there).
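For the client side route, one detail that tends to bite is that browsers differ in the formats `MediaRecorder` accepts. A minimal sketch of picking one, assuming a candidate list of my own choosing (adapt it to your needs); the support check is injected so the function can run outside a browser:

```typescript
// Candidate container/codec strings, most preferred first. Which ones a
// browser accepts varies, so this list is an assumption to adapt.
const recordingCandidates = [
  "video/webm;codecs=vp9,opus",
  "video/webm;codecs=vp8,opus",
  "video/webm",
  "video/mp4",
];

// Returns the first supported type, or undefined to let the browser choose.
// In a browser, pass (t) => MediaRecorder.isTypeSupported(t) as the check.
function pickRecordingMimeType(isSupported: (mimeType: string) => boolean): string | undefined {
  for (const candidate of recordingCandidates) {
    if (isSupported(candidate)) return candidate;
  }
  return undefined;
}
```

The chosen value would then go into `new MediaRecorder(stream, { mimeType })`.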

For a deeper dive into recording head here 👉 WebRTC recording challenges and solutions

How can I share my screen with WebRTC?

WebRTC has an API called getDisplayMedia. With it, the user can decide to share a browser tab, a window or the whole screen.

The resulting media stream can then be sent as any other video streams over WebRTC (with some minor but important differences).

WebRTC: The Missing Codelab training course includes a lesson about screen sharing.

How do I implement group calling or multiparty video?

This one will take time and won’t fit here.

Group calling requires media servers. Usually, the SFU kind.

If you are asking, then my suggestion is to use a Video API vendor for this instead of doing it on your own. Assuming you want to build it on your own and be your own boss here, then go for one of the open source SFU media servers.

Start here to learn more about media servers 👉 What exactly is a WebRTC media server?

🔒 Security and Privacy

Is WebRTC secure?

Yes.

And no.

WebRTC is secure. To the point of being the most secure VoIP solution out there.

But you can ruin it all by doing things unintentionally in the application layer.

Here’s where you should continue when it comes to WebRTC security:

👉 Everything you need to know about WebRTC security 🔐

👉 WebRTC Security & Privacy Essentials (paid course)

Does WebRTC leak my IP address?

Yes. And no.

WebRTC needs IP addresses to work. How would anyone know how to reach your machine directly to send you media peer-to-peer otherwise?

While most of what you’ll find about WebRTC leak is FUD, there is truth in it as well. The fact IP addresses are needed can be abused in many creative ways.

You can read more about this here 👉 What is the WebRTC leak test?

How can I prevent WebRTC IP leaks in browsers?

A glitch in the ChatGPT matrix!

This question is too similar to the previous one 🤯

Just go read the answer above.

📈 Performance and Optimization

How do I reduce WebRTC latency?

By placing your media servers closer to the users
Doing the same for your TURN servers
Analyzing the whole media processing pipeline end to end and reducing latency along that pipeline wherever you see the opportunity to do so

Guess what? I even wrote a long form article titled Reducing latency in WebRTC 😎

How can I measure the quality of a WebRTC call?

This one is tricky.

First you’ll need to define quality. Is it related to connectivity? Actual media quality? On which devices? Over what networks and network conditions?

Are you fine eating up more of the device CPU and network for better quality? Does your answer change if the device is a smartphone and the user would rather use it for the whole day and without it heating up in his hand?

One way to measure quality in WebRTC is by way of MOS and VMAF scores. Both are not really objective and have their drawbacks.

In most cases, and for doing measurements at scale, you will end up just looking at network related metrics, such as bitrate, packet loss, jitter and round trip time.

Here’s an ebook that will give you some more information on this 👉 Top 7 WebRTC Video Quality Metrics and KPIs

Oh, and you use WebRTC stats to collect these metrics and make your measurements of quality.
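Since the counters in WebRTC stats are cumulative, the network metrics above come from the difference between two snapshots. Here is a minimal sketch for an incoming stream; the field names match the `inbound-rtp` entries of `getStats()`, while the interfaces and the function name are my own.

```typescript
// Subset of an "inbound-rtp" entry from RTCPeerConnection.getStats().
interface InboundSnapshot {
  timestamp: number; // milliseconds
  bytesReceived: number;
  packetsReceived: number;
  packetsLost: number;
  jitter: number; // seconds
}

interface IntervalMetrics {
  bitrateKbps: number;
  packetLossPercent: number;
  jitterMs: number;
}

// Computes metrics for the interval between two snapshots of the same stream.
function computeIntervalMetrics(prev: InboundSnapshot, curr: InboundSnapshot): IntervalMetrics {
  const seconds = (curr.timestamp - prev.timestamp) / 1000;
  const bytes = curr.bytesReceived - prev.bytesReceived;
  const received = curr.packetsReceived - prev.packetsReceived;
  const lost = curr.packetsLost - prev.packetsLost;
  const expected = received + lost;
  return {
    bitrateKbps: seconds > 0 ? (bytes * 8) / 1000 / seconds : 0,
    packetLossPercent: expected > 0 ? (lost / expected) * 100 : 0,
    jitterMs: curr.jitter * 1000, // jitter is a current estimate, not a counter
  };
}
```

Round trip time is not part of this sketch; it is reported in other stats entries (for example the candidate pair), not in `inbound-rtp`.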

What metrics should I monitor for WebRTC?

Ha! We just got an answer to it above.

I’ll reiterate it here then 👉 Top 7 WebRTC Video Quality Metrics and KPIs

How do I improve audio/video quality in poor networks?

Use better codecs
Reduce bitrate requirements
Incorporate error resiliency techniques such as FEC, retransmissions and packet loss concealment

Here are a few resources to read about this topic:

👉 WebRTC media resilience: the role FEC, RED, PLC, RTX and other acronyms play

👉 Fixing packet loss in WebRTC

👉 8 ways to optimize WebRTC performance

🌐 Browser and Platform Compatibility

Which browsers support WebRTC?

All modern browsers: Chrome, Safari, Edge and Firefox.

There are some differences, but they aren’t too many. Essentially, you’ll need to test on all browsers and fix any issues that crop up.

Does WebRTC work on mobile (Android/iOS)?

Yes.

Both Chrome and Safari on mobile support WebRTC. Again, with some minor limitations and differences, but for the most part they work.

You can also get WebRTC compiled into a native application on Android and iOS, which is quite popular.

How do I make WebRTC work in Safari?

Just like you do for Chrome, but with less debugging and troubleshooting tools and with a bit more of a headache while doing so 😉

🏗️ Architecture and Deployment

Can WebRTC scale for large audiences?

Yes. It requires media servers and effort, but it is doable.

Ignore the FUD around WebRTC being P2P and the need for something different (which usually ends with someone selling you their own WebRTC implementation).

Here are a few resources for you to read on this topic:

👉 What is WebRTC P2P mesh and why it can’t scale?

👉 How Many Users Can Fit in a WebRTC Call?

👉 Different WebRTC server allocation schemes for scaling group calling

What’s the difference between SFU and MCU?

Both are media servers geared towards managing group meetings.

An MCU will mix the media from the participants and generate a single stream going back to the participants.

An SFU routes the media it receives to the participants in the meeting. It doesn’t process media beyond routing it.
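A bit of arithmetic shows what the difference means for each participant. The sketch below counts streams per participant, under the simplifying assumption that everyone sends one stream and wants to see everyone else (no simulcast, no pagination). P2P mesh is included only for contrast.

```typescript
type Topology = "mesh" | "sfu" | "mcu";

interface StreamLoad {
  uplinkStreams: number; // media streams each participant sends
  downlinkStreams: number; // media streams each participant receives
}

// Per-participant stream counts for a call with n participants.
function perParticipantLoad(topology: Topology, n: number): StreamLoad {
  const others = Math.max(n - 1, 0);
  switch (topology) {
    case "mesh":
      // Every participant encodes and sends to every other participant.
      return { uplinkStreams: others, downlinkStreams: others };
    case "sfu":
      // One stream up, the server routes the others down untouched.
      return { uplinkStreams: 1, downlinkStreams: others };
    case "mcu":
      // One stream up, one mixed stream down; the server pays for the mixing.
      return { uplinkStreams: 1, downlinkStreams: 1 };
  }
}
```

With 5 participants, an SFU user sends 1 stream and receives 4, while an MCU user sends 1 and receives 1. The MCU looks cheaper for the client, but the decoding, mixing and re-encoding happen on the server.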

Today? SFUs are a lot more common and popular. They offer flexibility and cost less to operate.

Start here for more information:

👉 WebRTC Multiparty Video Alternatives, and Why SFU is the Winning Model

👉 WebRTC conferences – to mix or to route audio

Should I use a media server with WebRTC?

Yes.

But it depends on your application and use case.

Generally speaking, you will need a media server if you wish to conduct group meetings or broadcasting.

Choosing the best WebRTC signaling protocol for your application

bloggeek - Mon, 06/23/2025 - 12:30

Deciding on WebRTC signaling? Explore standardized and proprietary protocols to find the best fit for your needs.

WebRTC comes without a signaling protocol. This means that you need to choose your own for your application. You can choose a standardized protocol for your WebRTC application. Maybe SIP or XMPP or something else. Or you could go for something proprietary – tailored to your specific custom needs.

Which signaling protocol is best for your WebRTC application? It depends. And it is what we’re going to try and find out today.

Table of contents

TL;DR – when to use what?
WebRTC Signaling 101
- What is signaling and why do we need it for WebRTC?
- There’s signaling and there’s transport…
Standard signaling: SIP over WebSocket
Standard signaling: XMPP
Standard signaling: MQTT
Standard signaling: Matrix
Standard signaling: WHIP and WHEP
Proprietary signaling protocol
Still confused?

TL;DR – when to use what?

Let’s start with a quick answer to satisfy curiosity. Here’s my own set of rules on how to make such a decision:

If your application already uses a “chat” protocol to send messages between users for some communications, then just extend that solution to include WebRTC signaling
- This can be a VoIP product that uses SIP (then you’ll need SIP over WebSockets to get to browsers and WebRTC with it)
- It can be XMPP if you’re more into messaging or MQTT if that’s more of an IOT type application
- Or it can be some other signaling protocol that I am just not aware of. It happens
The application has some kind of a messaging bus that is used for communication with users or between users? Use it
- This can be a simple WebSocket or REST or HTTP protocol (a proprietary one) that has been used before. I always give as an example a dating app that already has a way for people to schedule their blind date
- It can also be a managed cloud messaging service such as Ably, Pubnub, Pusher or others
- Here, you’ll need to introduce new types of messages and have your WebRTC SDP and control logic piggyback on that same signaling solution
Using media servers, most probably an SFU? These come with their own client SDKs and reference apps
- Sometimes, it is easier and better to just adopt these and be done with it
- You will need to extend them as your application evolves, but they do give a simple starting point
Do you send only or receive only? Try using WHIP or WHEP
None of the above? Just create a proprietary signaling protocol to exactly fit your needs
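To illustrate the piggybacking option from the rules above, here is a minimal sketch of adding WebRTC message types next to an existing chat message on the same bus. The message names and fields are hypothetical; only the payloads (SDP and ICE candidates) come from the WebRTC API itself.

```typescript
// Existing application message (hypothetical example).
type AppMessage = { type: "chat"; from: string; text: string };

// New message types added for WebRTC. The names are assumptions.
type RtcMessage =
  | { type: "rtc-offer"; from: string; sdp: string }
  | { type: "rtc-answer"; from: string; sdp: string }
  | { type: "rtc-candidate"; from: string; candidate: string }
  | { type: "rtc-hangup"; from: string };

type BusMessage = AppMessage | RtcMessage;

// Serializes a message for whatever transport the bus already uses.
function encodeMessage(msg: BusMessage): string {
  return JSON.stringify(msg);
}

// Parses a raw payload and says which part of the app should handle it.
function routeMessage(raw: string): "app" | "webrtc" | "ignored" {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return "ignored"; // not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) return "ignored";
  const type = (parsed as { type?: unknown }).type;
  if (typeof type !== "string") return "ignored";
  if (type === "chat") return "app";
  return type.indexOf("rtc-") === 0 ? "webrtc" : "ignored";
}
```

The existing chat path stays untouched; the WebRTC control logic only sees messages with the new prefix.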

WebRTC Signaling 101

WebRTC is a modern and powerful media engine. The thing is, you need to direct it in the right way to get it started.

I have a couple of questions for you:

How exactly do users register to a service?
How do they indicate that they are available?
How can one user search for the status of another?
How can he reach out and dial? Or alternatively, how does one join a virtual meeting room? Or an online live stream?

These questions aren’t answered by WebRTC. They are answered by a signaling protocol.

What is signaling and why do we need it for WebRTC?

A signaling protocol is there to answer the questions above.

It does so in a standardized way (hopefully, written down and well documented so it is easy for others to follow and implement as well).

You’d think it makes sense to have a signaling protocol in WebRTC, and you’d be correct!

But there isn’t…

Here’s what I wrote over 10 years ago about the death of signaling:

The decision not to add signaling to WebRTC might have been an innocent one – I can envision engineers sitting around a table in a Google facility some two years ago, having an interesting conversation:

“Guys, let’s add SIP to what we’re doing with WebRTC”

“But we don’t have anything we developed. We will need to use some of that open source stuff”

“And besides – why not pack XMPP with it? Our own GTalk uses XMPP”

“Go for it. Let’s do XMPP. We’ve got that libjingle lying around here somewhere”

“Never did like it, and there are other XMPP libraries floating around – you remember the one we used for that project back in the day? It is way better than libjingle”

“Hmm… thinking about it, it doesn’t seem like we’re ready for signaling. And besides, what we’re trying to do is open source a media engine for the web – we already have JavaScript XMPP – no need to package it now – it will just slow us down”

WebRTC was “rushed”. Google had an implementation ready to be baked into the browser. Figuring out signaling and making a decision by committee at the standardization organizations would have pushed the actual adoption and use by at least 5 years (and I am optimistic here).

So deciding to use something that existed such as SDP as the API interface layer (because they had it already in the implementation mind you), and just let the developers figure out how to send these messages on the network was the result.

Is SDP good? Yes. It works.

Is it perfect? Hell no. It is horrible.

But it is what we have and it is what we use.

While we’re talking about SDP, there are plans to get rid of SDP munging as an interface in WebRTC. The question isn’t if this will happen but when. Make sure you are ready for it.

Our WebRTC Insights clients already received an action plan to rid themselves of SDP munging in a controlled way. If you want to be ahead of the curve in everything WebRTC, then you may want to check out our service.

There’s signaling and there’s transport…

You can’t just send your signaling message over TCP or UDP. I mean you can – but not if you want this to occur in a web browser. There is no programmable interface that enables that.

What you do is either use HTTPS or a secure WebSocket. Because that’s what’s available in web browsers for you to use. With HTTPS, there’s REST, XHR and SSE – all mechanisms that transform HTTPS from a page fetching mechanism to something that can do “messaging”.

On top of these transport mechanisms, we can place our signaling protocol.

Why the distinction? I am not sure, but here are a couple of reasons that come to mind:

The transport protocol is always standardized, while the signaling protocol can be proprietary
You can use different transport protocols for a signaling protocol. For example, SIP can work over UDP, TCP, TLS and WebSocket
Because with networking, we like thinking in layers

Standard signaling: SIP over WebSocket

One of the most common signaling protocols we have for VoIP is SIP.

Most of the backbone of the telephony companies is based on SIP or a variant of it. For the most part, I regard that world as PSTN – making a phone call to a phone number not using a specific app.

Incidentally, it also uses SDP (not really – it was on purpose but in an opposite way – the media engine used originally as the baseline of Google’s WebRTC implementation had an SDP interface because it was meant to play nice with SIP).

To make sure SIP can work in web browsers, it needed a few minor changes. RFC 7118 is the standard that was created for that purpose – it enables SIP to work on WebSocket as a transport layer and then with WebRTC as its media engine.

The end result? You can use SIP over WebSocket as your signaling in a WebRTC application.

When to use it?

Your app is SIP based and you just need to enable some of the users to connect to your existing network from web browsers
You know and love SIP. And you feel confident in being able to use it in web browsers using JavaScript (this one is less likely)

When NOT to use it?

Your app doesn’t have any connectivity to SIP or PSTN networks. And you’re not a SIP expert
You have connectivity to SIP or PSTN but that’s marginal and not the main focus of your application (if you’re doing a contact center that has standard phones on one end and web browsers on the other, then SIP is most likely for you)

Standard signaling: XMPP

XMPP is the standard originally used for presence and messaging. It was also what Google used for Google Hangouts back in the day before it was rebranded as Google Meet and before WebRTC was even announced.

It is quite the common protocol, so making use of it with WebRTC makes sense. Especially if you want to add voice and video communications to your app.

When to use it?

Similar to SIP, I’d use it if XMPP is already at the core of my application. There’s no point in using yet another signaling protocol next to it
If you know XMPP well, you might as well use it. Assuming you’re comfortable with that decision

When NOT to use it?

If you don’t use XMPP already and don’t know it, I’d skip
Your application doesn’t have a lot of messaging beyond just the pure signaling needed to get WebRTC sessions started

Standard signaling: MQTT

Then there’s MQTT. This is a signaling protocol designed first and foremost for the Internet of Things. Its purpose is to collect telemetry from devices.

Why mention it here? Because Facebook Messenger uses MQTT as its signaling protocol. And Messenger is one of the biggest WebRTC applications out there by usage.

When to use it?

If your application already makes use of MQTT for its messaging
Like XMPP, if you know MQTT, you might as well use it. Assuming you’re comfortable with that decision

When NOT to use it?

In all other cases
I simply don’t know how commonplace this protocol is in our industry, and I’d rather use a well known solution or one I built myself than something that has been around for years, but wasn’t adopted widely by my industry. Not because it isn’t good – but because other solutions seem good enough and more well known

Standard signaling: Matrix

I think it is time I recognize Matrix as a standard signaling solution…

Matrix is rather new and was introduced and built with federated decentralized communications in mind. Big words. I am not going to explain them here.

It comes with an open source implementation of both server and client in multiple programming languages and a managed service on top for those who need it – Element

And yes. It can be used for WebRTC as its signaling protocol.

When to use it?

Think of it as all or nothing. If you use Matrix and its client and server side code for the benefits they offer (messaging, decentralization, etc), then choose it

When NOT to use it?

Don’t pick and choose pieces of it to form a signaling protocol

What I am trying and failing to say here is that you should pick Matrix if the open source app it comes with is very close to your own intended application behavior.

Standard signaling: WHIP and WHEP

Then there are WHIP and WHEP. These ARE WebRTC signaling protocols in the sense that they were designed and defined specifically for WebRTC – they aren’t used for anything else.

They are simple and limited in scope and capability.

When to use it?

For unidirectional streaming, check if WHIP and WHEP are for you
If you plan on having third party devices stream into your service (think about OBS as an example), then WHIP. If you want to support generic third party players in the future, then WHEP (future because it is too early for that today)

When NOT to use it?

What you’re doing is bidirectional in nature
You don’t care about an ecosystem or third parties and adding WHIP or WHEP only complicates things even if only a bit

Proprietary signaling protocol

You decide what you want here.

Sit down and write what type of messages you need to be able to pass. What information these messages convey. Decide on their structure and method of parsing (JSON anyone? Maybe protobuf? Something else?). Figure out what transport you want to use. Document and implement.

Be sure to make it a wee bit extensible with the ability of versioning.
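To make this less abstract, here is a minimal sketch of what such a message envelope could look like if you pick JSON. Everything in it is an assumption made for illustration: the message types, the field names and the versioning rule are hypothetical and not taken from any standard or existing product.

```typescript
// Hypothetical envelope for a home-grown signaling protocol.
// All names are illustrative assumptions, not part of any standard.
type SignalingType = "join" | "offer" | "answer" | "candidate" | "leave";

interface SignalingMessage {
  version: number;      // protocol version, so old and new clients can coexist
  type: SignalingType;  // what kind of message this is
  sessionId: string;    // which session the message belongs to
  payload: unknown;     // type-specific data, e.g. an SDP blob or an ICE candidate
}

const PROTOCOL_VERSION = 1;
const KNOWN_TYPES: string[] = ["join", "offer", "answer", "candidate", "leave"];

// Wraps a payload in the envelope and serializes it for the transport.
function encodeMessage(type: SignalingType, sessionId: string, payload: unknown): string {
  const message: SignalingMessage = { version: PROTOCOL_VERSION, type, sessionId, payload };
  return JSON.stringify(message);
}

// Returns the parsed message, or null when it is malformed,
// of an unknown type, or sent by a newer protocol version.
function decodeMessage(raw: string): SignalingMessage | null {
  let data: any;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  if (typeof data.version !== "number" || data.version > PROTOCOL_VERSION) return null;
  if (KNOWN_TYPES.indexOf(data.type) === -1) return null;
  if (typeof data.sessionId !== "string") return null;
  return { version: data.version, type: data.type, sessionId: data.sessionId, payload: data.payload };
}
```

The version field is the "wee bit extensible" part: a receiver can reject or downgrade messages it doesn't understand instead of failing in odd ways. The transport (WebSocket, HTTP, something else) stays a separate decision.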

When to use it?

If you already have something that can be viewed as signaling in your service. Then you just extend it this way
When you don’t find any reason to use any of the standardized signaling protocols

When NOT to use it?

Only if you lean into a standardized protocol due to reasons I’ve given in the previous sections

For me? A proprietary signaling protocol is likely going to be the way to go for a lot of the use cases that come my way.

Still confused?

I hear you.

Making a decision isn’t always simple and choosing a solid WebRTC signaling protocol for your application is one of these times.

Here’s what I can suggest:

If you picked the proprietary route, then our WebRTC: The Missing Codelab course has just switched from being a paid course to a free course. Enroll to learn more about this as part of that course.

If you want assistance in making the decision, just contact me.

The post Choosing the best WebRTC signaling protocol for your application appeared first on BlogGeek.me.

WebRTC is about reducing friction and barriers of entry

bloggeek - Mon, 06/09/2025 - 12:30

Discover how WebRTC removes the barriers of entry and the challenges associated with real time communication application implementation.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

I want to go back to the basics of WebRTC and when it came to be.

People complain that WebRTC is too complex. I say it is the simplest thing we have.

When WebRTC came out, developing a web meeting service that does video was expensive as hell:

You had to develop your own media framework (or purchase a commercial one, and there weren’t many out there)
You had to integrate your own signaling into it, even if that was SIP
You then had to port it to multiple operating systems (at least Windows, Android and iOS, but usually more)
And then test it. Over different operating systems and hardware configurations. Different Windows machines act differently (surprise – you likely don’t remember that), and you had to purchase them, test on them, deal with complaints from customers

It was a royal mess.

I wouldn’t start such a project without $1-2M investment just to get a first clunky and limited version to show for it. I know, because I’ve done it once or twice where I worked prior to WebRTC’s launch.

–

What did WebRTC bring with it?

The above… in a day. Or a week:

A commercial grade media engine, built into every browser
1. Widely tested across operating systems and devices
2. Running multiple voice and video codecs
3. With all the bells and whistles of network impairment adaptation logic
The notion and reality of royalty free voice and video codecs – Opus, VP8, VP9 and AV1 have all become commonplace, widely accepted and adopted
All that goodness, with a standard API on top

The end result was that a small team or even a single developer can now build a proof of concept in a short timespan, cutting down the initial investment to virtually nothing. This means you can get something out there either on seed funding or on a shoestring budget.

It also changed the nature of the developers. Most developers using WebRTC aren’t the classic VoIP developers. They don’t have that skill set or training when they start off – they simply take some open source project and move on from there, trying to figure things out for themselves. Sometimes it works. Sometimes it doesn’t. But it does bring with it a lot of creativity and out of the box thinking (simply because they don’t know where or what the box even is).

–

Scaling a successful service still is a huge challenge. For that, you do need to understand the technology intimately.

But that first step? And the second? And the third?

A lot easier to do with WebRTC than it ever was before.

The barrier of entry and friction for developers to use such technologies has gone dramatically down.

–

Before, the barrier was having enough money and the necessary training.

That is no longer the case.

You just need a darn good idea, that WebRTC is a viable solution for, with the ability to execute it. Sprinkle on that great timing and luck and you’re good to go.

The barriers needed for your business? They now need to come from elsewhere. WebRTC won’t give them to you.

Need help?

Be sure to follow this blog, as it is the most up to date resource out there about WebRTC

Subscribe to WebRTC Weekly to get a picture of what others are publishing about WebRTC out there

The WebRTC Insights service takes care of a lot of what goes on in the market for you, as well as the progress made by browsers with WebRTC support

I can assist with figuring out what is possible with WebRTC, and where to focus your energies in putting up the moat you need for your business

The post WebRTC is about reducing friction and barriers of entry appeared first on BlogGeek.me.

Using LTE modems under Debian

TXLAB - Sun, 06/08/2025 - 23:07

Back in the day I created a set of scripts for 3G and LTE modems to use under Debian: they used PPP chat scripts and custom udev rules for convenience. That’s all obsolete now.

NetworkManager and ModemManager hide all the modem communication under the hood, and you only need to initialize them properly. The following scenario was tested with Huawei ME906s and Fibocom L850-GL modems:

apt install -y network-manager modemmanager
nmcli connection edit type gsm con-name LTE
save
quit

Here it’s important not to set “connection.interface-name“, so that NetworkManager can pick any interface name of type “gsm”. You may also need to set the APN name if it’s different from “internet“.

The Fibocom L850-GL needs to be set to MBIM mode first:

apt install -y picocom
picocom /dev/ttyACM0
AT+GTUSBMODE?
AT+GTUSBMODE=7
AT+CFUN=15

After that, the NetworkManager will connect automatically to the LTE network if it is available. If an Ethernet connection is present, it will receive a route with a lower metric, so that the LAN path is preferred.

How WebRTC’s NetEQ Jitter Buffer Provides Smooth Audio

webrtchacks - Tue, 06/03/2025 - 14:00

Audio jitter buffers are required 101 introductory material for understanding VoIP. libWebRTC’s audio jitter buffer implementation – the one in Chromium – is known as NetEQ. NetEQ is anything but basic. This is good from a user perspective since real-life network conditions are often challenging. However, this means NetEQ’s esoteric code is complex and difficult […]

The post How WebRTC’s NetEQ Jitter Buffer Provides Smooth Audio appeared first on webrtcHacks.

8 ways to optimize WebRTC performance

bloggeek - Mon, 05/26/2025 - 12:30

Discover effective strategies to optimize WebRTC and enhance the quality of your video and audio streaming services.

In my update to the Video API report this time, I had the chance of reviewing what the vendors have done in the last 12 months or so. Some added new features and capabilities. Others not so much. Many were improving and optimizing their offering – better background replacement, fewer peer connections, more users in a single call, additional devices, …

WebRTC is a marathon and not a sprint. You can’t just write once and forget. You need to work at it. Day in, day out. Improving and optimizing your application.

Part of these optimizations are around WebRTC performance. Here are 8 places to validate the next time you need to optimize your WebRTC application’s performance:

Table of contents

1. Send and receive less bytes
2. Use better video codecs
3. Don’t send all audio streams all the time
4. Use simulcast and SVC only when needed
5. Treat different configurations differently
6. Have more media servers
7. Allocate users to closer servers
8. Collect, measure and monitor your metrics
Final thoughts on optimizing WebRTC performance

1. Send and receive less bytes

Here’s a shocker – if you send and receive less bytes (especially of the video kind), you are going to have higher performance. Your device will use less network and CPU resources (which will make it perform better). The media servers will have less data to route through them.

I know that what we want at the end of the day is the best possible 4K resolution at 60fps in a crisp look. And that’s before you start dreaming of doing VR or 8K.

But here’s the thing – do you really need 4K or even full HD on a smartphone with a 5” or 6” display? Is that 4K from the webcam useful when you’re also sharing your display at the same time and the other participant cares about your display and not your looks?

Why did I switch here from bytes to resolution? Because the higher the bitrate (=bytes) the higher the resolution I can compress at reasonable quality

We call this the resolution ladder – for a given bitrate, we match a suitable resolution, and we go up or down the ladder based on how much bitrate we have. The numbers vary per the video codec, frame rate, type of content and if you’re going up or down the ladder, but that’s for another time.

Oh, and you don’t control where the rungs on the ladder are – that’s a decision left to the browser to make
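If you want to reason about the ladder on your side (say, when deciding what to request from a media server), a lookup like the one below captures the idea. This is only a sketch: the rungs and bitrate thresholds are made-up numbers for illustration, and they are not the ones any browser actually uses.

```typescript
// One rung of a hypothetical resolution ladder.
interface Rung {
  minBitrateKbps: number; // lowest bitrate at which this resolution is used
  width: number;
  height: number;
}

// Illustrative rungs only, ordered from highest to lowest.
const LADDER: Rung[] = [
  { minBitrateKbps: 2500, width: 1920, height: 1080 },
  { minBitrateKbps: 1000, width: 1280, height: 720 },
  { minBitrateKbps: 500, width: 640, height: 360 },
  { minBitrateKbps: 0, width: 320, height: 180 },
];

// Returns the highest rung whose threshold fits in the available bitrate.
function pickRung(availableKbps: number): Rung {
  for (const rung of LADDER) {
    if (availableKbps >= rung.minBitrateKbps) {
      return rung;
    }
  }
  // Fallback for negative or invalid input: the lowest rung.
  return LADDER[LADDER.length - 1];
}
```

A real ladder would also account for codec, frame rate and content type, and would use different thresholds for going up than for going down so it doesn't flap between rungs.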

–

So… first things first.

Go count your pixels. Check your bitrate. See if it is optimal for your use case. Ask yourself if, where and how you can reduce that bitrate. Either on the incoming or the outgoing streams. To think about it in a simpler way, start by focusing on the resolution and framerate and work your way from there towards bitrate and bytes.

2. Use better video codecs

Did I mention that video codecs affect bitrate and quality?

For the same bitrate budget, the quality you get will be something like this for each video codec:

VP8 < VP9 < AV1

AV1 will give better quality than VP9 which in turn offers better quality than VP8 (for the same bitrate).

So yes. Picking a newer video codec means lower bitrate. But it also means higher CPU and memory use. This makes the decision non-trivial…

When you pick a better video codec, there’s another decision to be made – are you going to use the efficiency gain to improve quality at the same bitrate, or will you reduce the bitrate and maintain the same level of quality?

And this isn’t the only question to deal with in a multi video codec environment. You need to pick the video codec that is suitable for the specific scenario you’re in:

AV1 is a great codec to use today. But not on older devices. And not when the resolution and bitrate might be too high
AV1 is also great for text in screen sharing (text legibility at even low bitrates is way better than the other alternatives)
H.264 can be a great codec on the right devices – it comes with hardware acceleration in many cases, which means lower CPU use and having mobile handsets that don’t warm up on long video calls
VP8 is rock solid, available everywhere
HEVC is an Apple thing for Apple devices that might or might not be available
VP9 today is a kind of a transition point between VP8 and AV1

Which. One. Do. You. Use?

It depends.

And we will leave it at that. Just know that optimizing WebRTC for performance means figuring out which video codec to use in which scenario.

3. Don’t send all audio streams all the time

During Covid, I had a customer asking to be able to recreate the experience of a stadium full of people. Hearing the people around you and the crowd cheering together when a goal is scored.

The problem, besides the CPU and/or network required to make that happen, was that the WebRTC implementation from Google at the time (that’s libWebRTC) wasn’t fond of mixing too many audio sources. It simply took the 3 incoming streams with the loudest audio and mixed them – ignoring all others.

The good thing about it? It reduced CPU load. And frankly, if you have more than 3 people speaking in a meeting you have other issues than the WebRTC implementation – likely something you’ll need to settle between the people speaking anyway.

What happened is that Google a year or two ago decided to remove that optimization. It will now mix all incoming audio streams thrown at it. Theoretically, you can now give that stadium audience the vocal experience of everyone cheering. In reality? Your users might be suffering from CPUs that warm up a lot more due to the extra mixing effort.

What should you do?

“3 loudest” approach to audio mixing

Decide on the maximum number of audio streams you wish to mix. If you aren’t sure – just pick the magic number 3.

3 was the magic number libWebRTC used for over a decade. Now there’s no limit in libWebRTC. But… Google Meet still uses 3 as its magic number.

Now that you have that number, make sure your SFU never sends more than the 3 loudest streams towards the listeners. What do you do with the rest? Replace their media with DTX or just don’t send them… up to you and your architecture.
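The selection itself can be sketched in a few lines. This assumes the SFU already tracks a per-participant audio level; the AudioStream shape and the audioLevel scale (0 for silence, 1 for loudest) are assumptions made for this example, not something a specific SFU exposes.

```typescript
// Hypothetical view of an incoming audio stream inside the SFU.
interface AudioStream {
  participantId: string;
  audioLevel: number; // assumed scale: 0 = silence, 1 = loudest
}

// Returns the loudest streams, at most maxStreams of them, loudest first.
// The input array is copied so the caller's ordering is left untouched.
function selectLoudest(streams: AudioStream[], maxStreams: number = 3): AudioStream[] {
  return streams
    .slice()
    .sort((a, b) => b.audioLevel - a.audioLevel)
    .slice(0, maxStreams);
}
```

In practice you would likely smooth the levels over time and add some hysteresis, so that a cough doesn’t kick an active speaker out of the forwarded set.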

That will improve your session’s scale and optimize WebRTC performance for both network and CPU.

4. Use simulcast and SVC only when needed

Simulcast is great! SVC? Even better!

But not every problem is a nail with that hammer you call simulcast (or SVC for that matter).

Let’s take simulcast as an example. We use it to generate multiple video streams in various bitrates so that a group meeting will be able to deal with users on different networks and devices. It improves the average user experience of the meeting for its participants.

But… done in a 1:1 meeting, it is just wasteful.

The sender here is sending too many streams, causing it to waste precious CPU and network resources instead of using the same resources to improve the quality of that meeting with a single video stream.

You need to figure out when to use and when not to use these features…
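As a starting point, that decision can be as simple as looking at the number of participants. The sketch below is an assumption-heavy illustration: the field names mirror the rid, scaleResolutionDownBy and maxBitrate members of WebRTC’s RTCRtpEncodingParameters, but the Encoding type, the participant threshold and the bitrates are all made up for the example.

```typescript
// Local stand-in for the subset of RTCRtpEncodingParameters used here.
interface Encoding {
  rid: string;
  scaleResolutionDownBy: number;
  maxBitrate: number; // bits per second
}

// Picks a single full-resolution layer for 1:1 calls and
// three simulcast layers for group calls. Numbers are illustrative.
function chooseEncodings(participantCount: number): Encoding[] {
  if (participantCount <= 2) {
    // 1:1 meeting: spend the whole budget on one good stream.
    return [{ rid: "f", scaleResolutionDownBy: 1, maxBitrate: 2500000 }];
  }
  // Group meeting: low, medium and full layers for the SFU to choose from.
  return [
    { rid: "q", scaleResolutionDownBy: 4, maxBitrate: 150000 },
    { rid: "h", scaleResolutionDownBy: 2, maxBitrate: 500000 },
    { rid: "f", scaleResolutionDownBy: 1, maxBitrate: 2500000 },
  ];
}
```

The returned array has the shape expected by the sendEncodings option of addTransceiver() in the browser. A real implementation would also need to handle a meeting that grows from 2 to 3 participants mid-call.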

5. Treat different configurations differently

That example around simulcast above? Let’s generalize it a bit, shall we?

Your application will have different configurations for its WebRTC operation. It might be due to the number of users, their locations, the devices used, their network quality or even what it is that they are doing in the meeting itself.

Take all of these different permutations, let’s call them configurations. And now for each, figure out what is the best approach to optimize the performance of your WebRTC stack for it.

Is it worth the effort to optimize in such a way?

Does this configuration happen often enough? To important users/customers?

How complex is it to implement that kind of optimization?

What about switching from one configuration to another – can you smoothly turn on and off the various optimizations you have in place?

This is important. Go do the work.

6. Have more media servers

If you want to optimize a WebRTC application for performance, you might as well throw more media servers at the problem.

Throwing more hardware is great, but the point I want to make here is that these servers need to be CLOSER to the users.

Got all your media servers in a single data center in US East? You need to add another region.

Covered the US and Europe? Time to add Asia.

Etc.

In my Video API report, there’s the whole gamut of deployments:

Everything from a single region, single continent to over 200 regions. And it seems that you’re either happy with 10-30 or you strive for 200+ regions.

Check where your users are from. Populate the data centers around them with your media servers.

Oh – and you don’t really need to overdo it. Many of the bigger vendors (who have high media quality) make do with less than 20 different regions.

7. Allocate users to closer servers

Got your servers sprinkled all over the globe? Great!

Now where do you end up connecting your users? To which location?

Say there’s a meeting between 2 people in the US and 1 in France. In which region do you place the media server that handles this headache of a meeting?

If it is in France… then the two in the US are going to have a poor experience. Especially when they talk to one another in the meeting (their media flows over the Atlantic ocean for no good reason)
If it is in the US… well… that guy in France might suffer from a poor connection over that same ocean and end up with more packet losses and latency than you wish for
You could cascade this and have multiple media servers in multiple regions handle the session. But that takes effort. Make it happen

The point I am trying to make? Media server allocation for group meetings isn’t trivial. Take your time figuring it out and implementing it properly.
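One simple (and far from complete) policy is to pick the region that gives the best experience to the worst-off participant. The sketch below assumes you have a table of round trip times between user locations and server regions; the region names, the table and the policy itself are assumptions for illustration, and it doesn’t cover cascading across regions.

```typescript
// rttMs[userRegion][serverRegion] = measured or estimated round trip time in ms.
type LatencyTable = Record<string, Record<string, number>>;

// Picks the server region that minimizes the worst round trip time
// seen by any participant in the meeting.
function pickRegion(userRegions: string[], serverRegions: string[], rttMs: LatencyTable): string {
  if (serverRegions.length === 0) {
    throw new Error("no server regions available");
  }
  let best = serverRegions[0];
  let bestWorstRtt = Infinity;
  for (const server of serverRegions) {
    let worstRtt = 0;
    for (const user of userRegions) {
      const rtt = rttMs[user][server];
      if (rtt > worstRtt) worstRtt = rtt;
    }
    if (worstRtt < bestWorstRtt) {
      bestWorstRtt = worstRtt;
      best = server;
    }
  }
  return best;
}

// The meeting from the text: 2 users in the US and 1 in France.
const exampleRtt: LatencyTable = {
  us: { "us-east": 30, "eu-west": 110 },
  fr: { "us-east": 100, "eu-west": 20 },
};
const chosenRegion = pickRegion(["us", "us", "fr"], ["us-east", "eu-west"], exampleRtt);
// chosenRegion is "us-east": its worst case is 100 ms versus 110 ms for eu-west.
```

Minimizing the worst case is just one option. You could weigh the sum of latencies instead, or favor the region where most of the talking happens.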

8. Collect, measure and monitor your metrics

If you don’t know what’s wrong and why, there’s no way you’re going to be able to fix things. Or improve. Or optimize.

I started off by saying that WebRTC is a marathon and not a sprint. When it comes to optimizing WebRTC performance, it means that you need to improve your application over time.

Where and what to improve?

What gives the highest ROI for your effort?

Did your changes make a dent and actually improve things?

Answering these questions requires you to collect metrics, measure and analyze the data. And monitor it continuously.
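As a small example of turning raw data into a metric, here is a sketch that computes packet loss over an interval from two snapshots. The packetsReceived and packetsLost names mirror the cumulative counters found in the inbound-rtp reports of getStats(); the InboundSnapshot type and the way snapshots get collected are assumptions for this example.

```typescript
// Two cumulative counters taken from an inbound RTP stats report.
interface InboundSnapshot {
  packetsReceived: number;
  packetsLost: number;
}

// Packet loss percentage between two snapshots of the same stream.
// Returns 0 when no packets were expected in the interval.
function lossPercent(previous: InboundSnapshot, current: InboundSnapshot): number {
  const received = current.packetsReceived - previous.packetsReceived;
  // The lost counter can move backwards (late or duplicate packets), so clamp at 0.
  const lost = Math.max(0, current.packetsLost - previous.packetsLost);
  const expected = received + lost;
  if (expected <= 0) return 0;
  return (lost / expected) * 100;
}
```

Sampling this every few seconds per stream and shipping it to your backend is a reasonable first step. Which thresholds should trigger an alert is something you will only learn from your own data.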

Make that a top priority for you.

Why?

Because the time will come when you will have users complaining. I’ve seen it happen multiple times with the companies I help.

Starting to put these monitoring tools in place at that point in time means you’re working under the urgency of churning customers, which isn’t fun.

Start earlier than that.

Final thoughts on optimizing WebRTC performance

This is what came off the top of my head the other day about optimizing WebRTC performance. There are likely at least 8 more ways to do that – all of them important and useful.

Don’t neglect this part in your WebRTC application development planning.

Optimizing a WebRTC application is great. But what about successfully launching it?

Check out my 3-step WebRTC launch action plan – a free resource that will show you what I do with every consulting project that deals with launching WebRTC applications.


The post 8 ways to optimize WebRTC performance appeared first on BlogGeek.me.

A good WebRTC application is like a great orchestra performance

bloggeek - Mon, 05/12/2025 - 12:30

Learn about the qualities that define an exceptional WebRTC application and why user experience matters.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

There’s something to be said about great WebRTC applications. Something about them is simply better than the rest when you bump into them. We’ve all seen them. For each of us it might even be a different application.

What do they all have in common?

Their user experience works for you (instead of you working for it)
You don’t need to think about calls not connecting (they might not connect, but somehow, you’re going to understand why – and it will happen less often)
Media quality will be good enough (you won’t find yourself comparing it to other experiences you’ve had)

Getting there requires a certain commitment. A need to look at the various parts of the application, the whole design, the implementation. And then to lovingly optimize it over and over again. Iterating in each stage to polish another piece of it.

Somehow, I wanted to compare a good WebRTC application to a great orchestra performance in this quote, but I find myself drawn to another conclusion immediately – the one that says that WebRTC is a marathon and not a sprint.

Table of contents

For Engineers
- Cover all your bases
- An ongoing effort
For Product Managers
For Customer Success and Support
Need help?

For Engineers

Getting WebRTC properly tuned like a great orchestra requires finesse and a lot of understanding of how WebRTC works.

There are a lot of moving parts in WebRTC – clients, browsers, media servers, TURN servers, …

And they all need to work together properly:

Cover all your bases

Just recently I sent out my tip & offer email to my subscribers, where I mentioned that a media server cannot work in a vacuum and needs a client side SDK.

Fast forward a few weeks, and Cloudflare acquires Dyte because its SFU was missing … a client SDK. I’ve written about Cloudflare as part of my previous article – go check it out.

The same is true for the other bits and pieces of WebRTC:

Yes. TURN servers are rather independent and the first thing I’d suggest to my clients is to “outsource” these to third party managed services if possible. But you still need to focus here on where your users are, their types, the need for custom installations at times, etc.
Media servers need to have client SDKs. I mentioned that already above
You need to figure out what the source of truth is in the whole deployment (and whether you even have one). Or do your media servers and application servers each communicate independently and directly with the clients, which pass JWT tokens carrying their permissions?
Scaling has multiple dimensions here: scaling a group call, scaling on the client’s UI, scaling specific server types, scaling a global session across servers, …
How do clients and media servers “negotiate” the dynamic capabilities and limitations of the client’s device?
Where do the UI and UX on the device play a role to “hide” certain limitations of the system – such as the latency, mute signals, low CPU, poor networks, …?

The list here is endless…

An ongoing effort

An orchestra? It has a conductor. His role is to decide in real time what takes place. And for that he looks and listens to the musicians.

With a WebRTC application, we need observability – a way to understand what users feel in real time as it relates to the media being sent and received. And then we need to adapt.

This adaptation is done dynamically. But also as an optimization effort that takes place over time.

For Product Managers

Here are a few immediate insights to draw from this:

It isn’t that simple and obvious what makes up a good application
Good applications require attention to detail
Since WebRTC is built out of many moving parts, you need to orchestrate and tune how they work together to reach the type of an experience you want
This is going to take time. Longer than what your developers or your outsourcing vendor is promising you. And not because they don’t know – but because getting from a WebRTC application to a good WebRTC application isn’t obvious (or even factored in the requirements)

So. Where does that lead you?

Look at WebRTC projects as ongoing investments
Plan for generous “technical debt” time
- 20% of engineering effort around the communications piece should be fine
- Split this between actual technical debt and small tweaks and improvements that are targeted at tuning your WebRTC orchestra
- Have a Product Manager guide and prioritize these tuning initiatives
Compare your application to the market leaders every 6 months or so
- WebRTC moves fast, and so do the leading vendors
- Knowing what they do and “feeling” their apps will give you insights

For Customer Success and Support

An orchestra has lots of different musical instruments. Each giving its own unique sound to the final composition.

With WebRTC applications, we must not forget Customer Success and Support functions.

We may have the best implementation of WebRTC, with the best infrastructure in place. But at the end of the day, what is going to matter is the here and now. The session the user is on, and the experience he is having.

And as I always say, things are out of our control, and some of the reasons for that are the user’s own device and the network.

In such cases, we will need to front user complaints and requests, and be able to handle them properly. This is part of the overall experience. Part of the “orchestra performance” that we’re putting out there in our WebRTC application.

Take care of all your WebRTC instruments – even the non-technical ones.

Need help?

Be sure to follow this blog, as it is the most up to date resource out there about WebRTC

Subscribe to WebRTC Weekly to get a picture of what others are publishing about WebRTC out there

The WebRTC Insights service takes care of a lot of what goes on in the market for you, as well as the progress made by browsers with WebRTC support

I can assist with comparisons to market leading apps, as well as in prioritizing efforts

The post A good WebRTC application is like a great orchestra performance appeared first on BlogGeek.me.

The future of Video APIs is… AI: LiveKit, Daily and Cloudflare this month

bloggeek - Mon, 04/28/2025 - 12:30

Three important news items were published in the past couple of weeks that are shaping the Video API market. And all have an AI aspect to them.

Our Programmable Communications industry (CPaaS) is moving and shifting. And those focused on video are the ones who matter at the moment. In the past, we’ve seen such innovations coming from Twilio, who defined and redefined the CPaaS market. In recent years not much. These days? You need to look at the video players to understand the trends.

Here are 3 big news items that got my attention this month and why they matter:

Daily announced Pipecat Cloud
LiveKit series B funding and… LiveKit Cloud Agents
Cloudflare acquires Dyte and partners with Hugging Face

Let’s see where all this leads us

Table of contents

Why video is leading the way in AI for CPaaS
From an AI interface to an AI framework
Daily and Pipecat Cloud
LiveKit and Cloud Agents
Cloudflare closing its gaps
Upcoming update of the Video API report

Why video is leading the way in AI for CPaaS

CPaaS started off around SMS and voice. The concept around it was to aggregate telecom providers and place a single, sane API on top of them.

The barrier or moat here for vendors was the negotiation of contracts and integrating with interfaces of 100+ telecom providers around the globe. Not fun at all.

That meant a customer could purchase a phone number, send a message and answer calls without the need to think if the underlying provider is Verizon, AT&T or Globe Telecom in the Philippines. And the customer didn’t really care who the underlying provider was, as long as the service was good. And that service was uniform in nature – you want calls to get connected and messages to be delivered at a high rate. Nothing less and nothing more.

Fast forward to today and nothing changed in voice-land.

But AI is different.

When you look at how the voice focused vendors are adding AI, some are doing so by deciding which algorithms/vendors to use and placing an API layer on top of it, taking their sweet time about it. The notion is that the customer doesn’t care much/enough about this anyways and/or that the algorithms/vendors are finite and small in their number. So they can do it all themselves.

The video focused vendors who are looking at this and are at the forefront with their vision are Daily, LiveKit and Agora. They all created AI frameworks making them open source. Gustavo wrote about these already.

The concept behind all these frameworks is simple:

Make it easy to connect the Programmable Communications media stream to the framework
Have the framework flexible enough to deal with a variety of use cases, some of which are still unknown to us
Integrate with as many algorithms/vendors as possible
Make it open source, so that others can integrate more algorithms/vendors (because the world is infinite here and not finite)

And it worked for them. At least based on the engagement numbers we see on git for the relevant projects.

From an AI interface to an AI framework

The naive solution which I was promoting and aiming for was simple. If you are dealing with CPaaS, what you need to offer is a way to extract or inject in real-time audio and video streams to your platform in a backend-to-backend manner.

Such an approach just means that you have a WebSocket, RTP or some other transport mechanism from your media servers that can then be connected to external AI services. Think of STT (Speech-To-Text) for a call as an example. Users connect to your SFU. The developers can take the audio from that meeting and send it towards whatever STT service they want and continue things from there.

That enabler is an AI interface for CPaaS. Some services have had these for years on their voice channels. Those doing video started introducing them more recently. It gives developers the full capabilities, but little else. In a way, it leaves a lot to be desired. Especially now that LLMs are so popular and mostly text based.

What happens is that we usually need a kind of a processing pipeline these days. A way to ship media from the media server through one or more external components and then back into the media server. That requires an AI framework.

Something akin to… well… Daily’s Pipecat and LiveKit Agents.

I believe such frameworks connected to the Video API or being an integral part of them will be critical moving forward.

Daily and Pipecat Cloud

Daily had a hosted solution for AI called Daily Bots. It decided to sunset it and instead introduce Pipecat Cloud. The actual announcement was made by their CEO, Kwindla Hultman Kramer over LinkedIn:

(you should follow Kwindla on LinkedIn – he shares a ton of insights and resources there regularly)

The main change?

Up until now, Daily developers could use one of two approaches:

Adopt Pipecat as their AI framework, build their logic with it, and then deploy it on their own wherever they wanted – just like any other open source component
Use Daily Bots, which was a hosted service by Daily, built on top of Pipecat. It was great but limited in nature (it didn’t allow running custom Python code as part of the bot)

Daily decided to sunset Daily Bots and migrate its customers to a new platform called Pipecat Cloud. This is a managed Pipecat service, where developers build their own Pipecat pipelines in local Docker containers and then upload them to the Pipecat Cloud where they run in production. Daily takes care of scaling, monitoring and everything else.

It was the natural next step:

This widens the moat between Daily and competitors that simply use Pipecat; they would now need to offer a cloud-based service as well to be compelling to begin with
It enables and entices an easy migration path between the Cloud and the open source offering

In a way, Daily took a page from LiveKit’s playbook – starting by offering an open source framework (Pipecat), getting developers hooked on it, and then introducing a paid Cloud service for it. Which is a natural segue to… LiveKit.

LiveKit and Cloud Agents

LiveKit had a big announcement this month, celebrating its new series B funding of $45M. This post is interesting in the way it is written – from the least important to the most important (at least for me):

LiveKit Agents 1.0 is released, in a way, stating this isn’t a beta or an MVP anymore without really saying it
- Workflows are introduced, for better support of a conversation flow with known steps in it (mainly for contact centers)
- Multilingual semantic turn detection, which is neat
- Telephony support, which was there before, but somehow mentioned here for emphasis I believe
Wrapped under Agents 1.0 is also Cloud Agents, which I believe deserves to be mentioned separately
- LiveKit Cloud Agents is the same as Pipecat Cloud – in the sense that you build your own LiveKit Agents logic and code, and then host it on LiveKit’s cloud
- Unlike Pipecat Cloud, Cloud Agents is in closed beta with a Google Form in front of it to access
- This might mean that LiveKit weren’t ready for this announcement, but had to push it through because of Daily’s announcement AND because of their series B funding
Series B funding
- $45M is serious money in 2025, especially in the Video API domain where funding is scarce. It is low compared to what pure AI players raise, but in a way, it shows where the focus is in our industry now – AI (not surprising)
- Total funding LiveKit raised so far is $83M, which is considerable and shows the trust of its investors
- LiveKit plans to use this new funding towards “growing our team and furthering our progress towards offering an all-in-one platform for building AI agents that can see, hear, and speak like we do.”

This was great news for LiveKit and it gives them what they need to push through and grow their offering in ways that are hard to achieve in the current economic climate.

Cloudflare closing its gaps

I must admit. For me, Cloudflare in WebRTC was a bright shining light and a huge disappointment at the same time.

On one hand:

Cloudflare is likely the 4th IaaS vendor after AWS, Azure and GCP
Their spread of 200+ data centers and use of Anycast brought something fresh and new to the WebRTC market
A no frills hosted SFU was again something interesting and new

On the other hand though:

There was no client SDK to speak of
Cloudflare assumed developers would just connect to their SFU and it would magically just work, which is far from the reality. Especially if you want to optimize for media quality
Since the initial announcements, no further news came out of Cloudflare officially

It seems like Cloudflare didn’t lose interest in WebRTC. It just tried to figure out what the next big step should be, and it is trying to close the gaps with two different deals it did, wrapped into a single announcement.

It starts with a new name for the offering. Instead of Cloudflare Calls it is now called Cloudflare Realtime, which now includes 3 products: RealtimeKit (new and in beta), TURN Server (once almost a hidden part under Calls) and Serverless SFU (what was Calls).

Cloudflare acquired Dyte, another Video API vendor from India, and wrapped it into RealtimeKit
- Dyte will be moving its own API and SDKs to use Cloudflare’s infrastructure (IaaS and most likely also TURN and SFU). At some future point, they might just close Dyte as a product/company and have it all under RealtimeKit
- RealtimeKit now serves as the biggest missing part for Cloudflare – client SDKs. The announced platforms that will be supported by these SDKs are Kotlin, React Native, Swift, JavaScript and Flutter
- Recording and Voice AI (in partnership with ElevenLabs) will be part of the platform as well
- As with LiveKit Cloud Agents, the access to RealtimeKit is also in private beta behind a signup form
- There is also a promise that all this comes with a robust AI offering, but that feels more like lip service or a roadmap item than anything else at the moment
Partnership with Hugging Face
- Hugging Face is a large and important player in the generative AI and machine learning domain
- Recently, it launched its own FastRTC framework. FastRTC is all about connecting WebRTC and WebSockets to AI models – essentially what we need to build our media pipelines in Video APIs; and in a way, somewhat similar to Pipecat and LiveKit Agents
- To make sure users of FastRTC end up with Cloudflare’s WebRTC infrastructure and RealtimeKit, the initial step that Cloudflare took was to offer 10GB of free TURN bandwidth each month to Hugging Face users. It sounds like a lot, but it is worth $0.5/month based on Cloudflare’s TURN pricing
- What’s important here is the partnership and the intent. I am sure this is a first step, considering the acquisition of Dyte in parallel to this
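The value of that free TURN allowance is easy to sanity check. The short Python sketch below does the arithmetic; the per-GB rate in it is not a quoted price list, it is simply the rate implied by the figures above ($0.5 for 10GB), and the function name is made up for illustration.

```python
FREE_GB_PER_MONTH = 10

# Rate implied by the text: $0.5 per month for 10GB (an assumption, not a price list).
IMPLIED_USD_PER_GB = 0.5 / FREE_GB_PER_MONTH


def turn_cost_usd(gigabytes: float, usd_per_gb: float = IMPLIED_USD_PER_GB) -> float:
    """Return the TURN bandwidth cost in USD, rounded to cents."""
    return round(gigabytes * usd_per_gb, 2)


if __name__ == "__main__":
    # Value of the monthly free allowance.
    print(turn_cost_usd(FREE_GB_PER_MONTH))
    # The same rate applied to a heavier user relaying 1TB a month.
    print(turn_cost_usd(1000))
```

At that rate, a service would need to relay a terabyte of media a month before the bill reaches $50, which puts the size of the free tier in perspective.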

All in all, a positive announcement for Cloudflare, and one that shows intent to invest further in WebRTC and Video APIs.

Upcoming update of the Video API report

These market changes, along with a few previous ones, made me decide to update my Video APIs report.

It needs a better explanation of the market now that Twilio has decided to keep its Programmable Video service and, in light of the trends mentioned here, a beefed-up section dealing with AI frameworks.

I am again reaching out to the vendors, to see what I missed from the work they put into their platforms this past year, and also looking for vendors who weren’t covered by the report so far and should be there. If you know of one, or work in one, just ping me to let me know.

And if you are interested to learn more about this report, or any of my other services – just reach out to me.

The post The future of Video APIs is… AI: LiveKit, Daily and Cloudflare this month appeared first on BlogGeek.me.

What’s Your SaaS for WebRTC Signaling?

bloggeek - Thu, 04/24/2025 - 12:00

Looking for a signaling solution for WebRTC? Why not ditch the whole protocol discussion and head straight towards a SaaS based approach?

The true meaning of cloud-based signaling

I’ve written about how to select a signaling protocol for WebRTC. This led to a lively discussion both on my blog and on Facebook’s WebRTC group. I learned a new thing that day:

VoIP signaling is a religion. People believe in a specific protocol and worship it. And they tend to fight with the atheists.

I was one of the religious in VoIP, but I now have doubts about the need for it. Call me a signaling protocol atheist.

I was corrected on that post that signaling protocol and network protocol are two separate things that need to be discussed and selected separately. It is true that they are different, but I think that most developers who approach WebRTC today don’t make that distinction any more – they simply don’t care – they are just trying to get their service to work.

On Facebook, Olle E Johansson commented:

Yes, the API is the key. Finding an abstraction level that the web developer understands, not that just exposes protocol operations. The signalling matters when things start growing and you need scalability or interoperability with other systems, but only then.

I guess you care about signaling protocol for interoperability. I just don’t see how it can help with scalability – the web is scalable enough – a lot more than VoIP today – just ask WhatsApp. Why bog it down with SIP? My recommendation still stands: If you don’t need to connect to other networks (or if you do, but only for a small part of your use case) – go for a proprietary signaling protocol.

What I did ignore/miss though, is what happens when you decide to go for a proprietary protocol, but don’t really want to deploy a server at all. What if what you want is to get “signaling as a service” – SaaS?

First question is why would you?

The easy answer here is because you can, and because it has its advantages over building your own. As with any other SaaS or cloud related service, these things come to mind:

Scalability – someone else who does that for a living takes care of it for you
Maintenance – do you really want a DevOps guy to sit all day playing with scripts and monitoring your signaling infrastructure?
Availability – same as above. Just too much work to deal with

The main thing though, is probably deciding what’s core to your business and what is just details. Signaling has migrated into the “details” part, so outsourcing it to a SaaS vendor makes sense.

Here are a few viable options to use for WebRTC signaling in a SaaS model.

Ably

Ably is one of the independent managed messaging/signaling platforms out there that can be used for WebRTC signaling.

I have a soft spot for Ably – at testRTC, years ago, when we needed some signaling solution to create our own simple demos or to integrate into our products, after going through the motion of trying out other alternatives (some listed here below), we ended up with Ably.

Why? Because it was the most straightforward and simple for our developers to integrate with.

What were the exact reasons? I don’t know, and didn’t investigate much at the time. It simply provided the best experience for us in getting things up and running – and that’s our goal anyways.

PubNub

If you’ve been around long enough with WebRTC, you should already be aware of PubNub. As an example, Rebtel are already using them.

PubNub offers a publish/subscribe infrastructure that can be used to develop messaging applications. WebRTC services being one of their targets, they are heavy on marketing their solution in WebRTC events and have gone as far as offering a reference implementation for developing a video calling service with WebRTC and PubNub.

If you are looking for a vendor that cares about showcasing customers that use WebRTC and offers the kind of scaling you will need today for other use cases – PubNub is a good choice.
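For readers who haven’t used publish/subscribe for signaling before, here is a rough Python sketch of the idea: both peers subscribe to the same channel (a “room”), and the offer and answer are just messages published on it. The `InMemoryPubSub` class is a local stand-in written for this example, not PubNub’s (or any other vendor’s) actual API, and the SDP strings are placeholders for what a real `RTCPeerConnection` would generate.

```python
import json
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Handler = Callable[[dict], None]


class InMemoryPubSub:
    """Local stand-in for a hosted pub/sub service (hypothetical, not a vendor API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: dict) -> None:
        # Serialize and parse again to mimic a hop over the network.
        wire = json.dumps(message)
        for handler in list(self._subscribers[channel]):
            handler(json.loads(wire))


class Peer:
    """A WebRTC endpoint reduced to its signaling behavior."""

    def __init__(self, name: str, bus: InMemoryPubSub, room: str) -> None:
        self.name = name
        self.bus = bus
        self.room = room
        self.inbox: List[dict] = []
        bus.subscribe(room, self._on_message)

    def send(self, msg_type: str, payload: str) -> None:
        self.bus.publish(
            self.room, {"from": self.name, "type": msg_type, "payload": payload}
        )

    def _on_message(self, message: dict) -> None:
        if message["from"] == self.name:
            return  # Pub/sub echoes our own messages back; ignore them.
        self.inbox.append(message)
        if message["type"] == "offer":
            # A real client would set the remote description and create an answer here.
            self.send("answer", f"answer-sdp-from-{self.name}")


def demo() -> Tuple[List[dict], List[dict]]:
    bus = InMemoryPubSub()
    alice = Peer("alice", bus, "room-1")
    bob = Peer("bob", bus, "room-1")
    alice.send("offer", "offer-sdp-from-alice")
    return alice.inbox, bob.inbox


if __name__ == "__main__":
    alice_inbox, bob_inbox = demo()
    print(alice_inbox)
    print(bob_inbox)
```

With a SaaS vendor, the only part you would replace is the `InMemoryPubSub` class; scaling, availability and maintenance of the channel become the vendor’s problem.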

Firebase

Google Firebase is one of the BaaS vendors out there (Backend as a Service). Their intent is to enable developers to build frontend apps without having to care at all about the backend.

The main difference from PubNub here is that it synchronizes data and acts as distributed storage/memory for your apps. If you need more than just messaging, I’d suggest you check it out.

Since its acquisition by Google, Firebase has expanded to be the developer backbone solution of a lot of services for developers – especially on Android apps.

I know of a few in the WebRTC community that are using Firebase, so it is a valid option. Firebase might not make a lot of noise about WebRTC, but that’s because it isn’t their main focus (which can be seen also as a downside)

PeerJS

PeerJS isn’t really a SaaS provider, but it is contemplating becoming one – at least from how it looks on their website.

PeerJS is a framework that provides signaling for WebRTC. It operates with a Node.js based server called PeerServer that has a service called PeerServer Cloud. This cloud service offers only free accounts for hacking, but nothing in the form of production support.

It is here because of three reasons:

Many are using it already to build their services, so it made sense to me
They have the potential (and general intent) of offering it in a SaaS model; although this hasn’t properly materialized in over 10 years of their existence
The source is freely available, which means that you have the ability to start with SaaS and migrate to your own data center

A word of caution – from the looks of it – this might not be able to scale easily to the millions. It just seems too… lightweight.

Pusher

If Ably, PubNub and Firebase are here, then Pusher should be as well. It is a messaging SaaS provider. Not much in the WebRTC domain about it, but I guess that it can be used just as well.

Use it if you already know it and like it.

–

I am sure there are others as well that I missed, like XSockets.NET – and those that are still too small like GrimWire. And then there are the likes of Stream, which started with messaging and now has its own video service built on top of WebRTC as well.

–

If you are trying to figure out what to use for your product, you can always contact me about it.

The post What’s Your SaaS for WebRTC Signaling? appeared first on BlogGeek.me.

OpenAI & WebRTC Q&A with Sean DuBois

webrtchacks - Tue, 04/22/2025 - 13:23

OpenAI is utilizing WebRTC for its Realtime API! Even better, webrtcHacks friend and Pion founder Sean DuBois helped to develop it and agreed to a Q&A about the implementation. It is not often a massive WebRTC use case like this emerges so rapidly. In addition, Sean was extremely transparent about his work at OpenAI. In […]

The post OpenAI & WebRTC Q&A with Sean DuBois appeared first on webrtcHacks.
