How Discord Handles Two and Half Million Concurrent Voice Users Using WebRTC

Content

Updating Configurations or Audio Files
Add the Bot to Servers

I have at least 12 servers for WoW alone. I also have a fair amount of different IRL/internet friend groups who all have their own channels. Encouraging integration of their social features into games will allow them to take a healthy bite out of Steam’s pie. I say "potentially" flawed because these assumptions easily hold true for small enough providers and little enough bandwidth.

I've dealt with video compression and live timestamp syncing, and I can say from experience that this is no easy feature. I understand these are audio streams, but the persistent voice server still needs to handle the incoming connections, WebSocket heartbeats, compression (high I/O), and delivery of the streams. Each discord_voice_client connects to one voice channel and derives from a websocket client. Yes and no: there are no packets being passed over the WebRTC connection, but the server maintains a WebSocket connection for state changes.

The problem seems to be that long strings of words all get lumped into a single activation.
It's not hard to imagine that something like a UGC video site might significantly increase that spikiness ratio, if only because of the sheer quantity of data involved.
For ixgbevf, the ubiquitous commercial option, it's been in-tree for the Linux kernel for at least half a decade.
Sometimes it can be hard to know that your setup isn't working as you expected.

It depends on the activity of the server. Generally, you're only receiving message create events and updates to the server. You aren't receiving things like member list updates, presence, or typing events from the server until you focus it initially.

Eh, we raided Mythics with 25 where most people had voice activation. IMO it would be significantly more user-hostile to disallow users choosing whether to use PTT unless the benevolent server owner allows it. Like the fact you're a goddamn mouth breather. Fuck, I refuse to join most Discords because most of the time I have to hear people mouth-breathe into their mics. And then they are not exactly strangers anymore.

Updating Configurations or Audio Files

You just have to build your .so and .jar files. So in this case, bandwidth is cheap; let's use some in an effort to simplify the SFU and make it more CPU efficient. The default audio stream is 64 kbps (or 8 KB/sec) per speaking user. Unfortunately you can't use these APIs to create third-party clients because that violates their ToS; they are meant only for writing bots. If they catch you using these APIs with your full account credentials, they'll ban your account. Forgetting that "the cloud is just somebody else's servers" also led to the delusion that one doesn't have to "worry" about hardware failures in the cloud.
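To put the 64 kbps figure in context, here is a back-of-the-envelope sketch of what forwarding unmixed streams costs in bandwidth. The function names are made up for illustration, and the model is an assumption: it ignores RTP/UDP/IP header overhead and supposes every speaker's stream is forwarded to every other participant.

```python
# Rough SFU bandwidth arithmetic. Assumes 64 kbps per speaking user
# (the figure quoted above) and ignores packet header overhead.

BITRATE_KBPS = 64  # per speaking user


def kbps_to_kilobytes_per_sec(kbps: float) -> float:
    """Convert kilobits per second to kilobytes per second."""
    return kbps / 8


def sfu_egress_kbps(participants: int, speakers: int) -> int:
    """Total server egress when each speaker's stream is forwarded
    unmodified to every other participant in the channel."""
    if not 0 <= speakers <= participants:
        raise ValueError("speakers must be between 0 and participants")
    # Each speaker's stream goes to everyone except that speaker.
    return speakers * max(participants - 1, 0) * BITRATE_KBPS


def listener_ingress_kbps(speakers: int, listener_is_speaking: bool = False) -> int:
    """Download rate for one client: one stream per *other* active speaker."""
    others = speakers - 1 if listener_is_speaking else speakers
    return max(others, 0) * BITRATE_KBPS
```

Under these assumptions, a 10-person channel with 2 people talking costs the server 2 × 9 × 64 = 1152 kbps of egress, while a silent listener downloads 128 kbps. The cost grows with the number of simultaneous speakers, which is the bandwidth being traded for CPU.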

So I'm gathering that Discord's voice servers receive multiple persistent connections, then compress the audio streams for delivery to each end user. THIS part is where I can't imagine the on-the-fly CPU usage. Each client's receiving compression also needs to negate their own audio to prevent an echo effect, but that also means separate compression streams per user. I imagine this helps significantly with I/O in converting live streams into 1 stream per end user.
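To make that concern concrete, here is a minimal sketch of the server-side "mix-minus" approach this comment imagines: every listener gets the sum of everyone else's audio. It is an illustration only, not Discord's implementation (a later reply in this thread says their SFUs deliberately avoid server-side mixing). It assumes already-decoded 16-bit PCM frames of equal length, and `mix_minus` is a hypothetical name.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """For each user, return the sample-wise sum of all *other* users' audio.

    `frames` maps a user id to one decoded frame of 16-bit PCM samples.
    Every frame must have the same length.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    total = [0] * length
    for samples in frames.values():
        if len(samples) != length:
            raise ValueError("all frames must have the same length")
        for i, sample in enumerate(samples):
            total[i] += sample

    mixes: Dict[str, List[int]] = {}
    for user, samples in frames.items():
        # Subtract the user's own audio so they do not hear an echo,
        # then clip to the 16-bit range.
        mixes[user] = [
            max(INT16_MIN, min(INT16_MAX, t - s)) for t, s in zip(total, samples)
        ]
    return mixes
```

Each listener ends up with a different mix, so each mix would then need its own encode pass. That decode, mix, and re-encode cycle per listener is the CPU cost the comment is worried about.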

Add the Bot to Servers

I would think that that SHOULD make it cheaper for clients, but the opposite seems to be the case. If so, I recommend looking into testing this with SR-IOV based NICs and passing through a VF to the guest. Even in regular operation, the latency difference between bare metal and an ixgbevf virtualized NIC all but disappears into levels well below anything that would be meaningful for voice communication. Additionally, you can buy bandwidth for much cheaper from dedicated hosting providers as opposed to cloud providers.

There are two: one to the guild service, which handles assignment of the voice service. Want to count the number of clients connected to a voice server? We would rather waste bandwidth than CPU cycles in this case.

Wouldn't it be cheaper for cloud providers though? They are buying more bandwidth so they can get it cheaper. Also, they are taking advantage of the fact that clients have unused bandwidth so they can overprovision and get cost savings that way as well.

Please do, it would be an actual act of charity! I've left a lot of interesting servers due to the voice chat being clogged. I've been in a couple of channels with several hundred people. The admins handled it by shouting until everyone was quiet, enforcing one person speaking at a time, and swiftly muting, moving, or banning anyone who disobeyed.

I mean if everyone already has a way to contact their friends and say "hey let's play this game together", sharing that link does away with the need for a coordination server. But there's still the need for at least a server, even if it's just one of the clients that knows how to take charge. Zero-config peer to peer is basically impossible without central servers or some large populated list of long-term seed nodes which are basically central servers.

Be sure to read the instructions, however. IIRC there are some free servers you can use, provided by Google and/or Twilio. There are probably some problems with this idea, but why not just use a popular social network or similar as the lobby? For example, if there is a Twitter feed "@MyWebRTCLobby" that everyone follows, clients could just @ a tweet at it with current contact details. I'm not actually using WebRTC yet, but I hope to for a project relatively soon, and the possibility of not requiring a STUN/ICE server is very enticing to me. Sadly I never finished that project, so I don't really have any code to show you, but in theory it should work okay.

Ultimately, though, especially in this case, it seems like virtualization is a solution looking for a problem. With potentially substantial engineering effort, including needing to hire someone with relatively rare expertise, they could eliminate that specific downside of virtualization. You custom-implemented many components of WebRTC; I barely got part of the pats down in my project. Should I quit the Discords that I don't use?

Kudos on silence detection to save overhead. I remember in the past the Discord devs have moved particularly overloaded servers to their own dedicated hardware to improve performance. What they added was voice chat to static groups, whereas before you had to invite everyone to a custom group chat every time. Definitely a QoL improvement, but not exactly "new". SR-IOV is available on basically any and all server-grade NICs, and is quite simple to use. With Azure and AWS it's basically just making sure you have the proper driver installed and flipping a command switch.
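The thread does not say how the silence detection mentioned above actually works. As a rough illustration of the general idea only, here is a simple energy-threshold sketch: frames whose RMS level falls below a threshold are not sent. The threshold value and function names are assumptions, and real voice activity detectors are considerably more sophisticated.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(samples: Sequence[int], threshold: float = 500.0) -> bool:
    """Treat a frame as silence when its RMS level is below `threshold`.

    The default threshold is an arbitrary illustrative value for 16-bit PCM.
    """
    return rms(samples) < threshold


def frames_to_send(frames: Sequence[Sequence[int]], threshold: float = 500.0) -> List[Sequence[int]]:
    """Drop silent frames so they cost no bandwidth or forwarding work."""
    return [frame for frame in frames if not is_silent(frame, threshold)]
```

If a client stops sending while the user is quiet, the server has nothing to forward for that user, which is where the savings come from.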

Because the two peers need to coordinate how to connect to each other before actually connecting. It can be a WebSocket server, REST API, email, QR codes, carrier pigeon, etc. I think their UX is awful, and their client is very slow in channels with thousands of users.
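The coordination step can be sketched without any WebRTC library at all, since the point is only that some out-of-band channel has to carry the first messages. Below, a hypothetical in-memory `SignalingRelay` stands in for that channel (WebSocket server, REST API, email, and so on). The message fields are placeholders, not real SDP.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class SignalingRelay:
    """Hypothetical in-memory mailbox standing in for the signaling channel."""

    def __init__(self) -> None:
        self._inboxes: Dict[str, Deque[Tuple[str, dict]]] = defaultdict(deque)

    def send(self, sender: str, recipient: str, message: dict) -> None:
        """Queue a message for `recipient`, remembering who sent it."""
        self._inboxes[recipient].append((sender, message))

    def receive(self, recipient: str) -> Optional[Tuple[str, dict]]:
        """Pop the oldest pending (sender, message) pair, or None if empty."""
        inbox = self._inboxes[recipient]
        return inbox.popleft() if inbox else None


relay = SignalingRelay()

# Alice describes how she can be reached (placeholder payload, not real SDP).
relay.send("alice", "bob", {"type": "offer", "sdp": "<alice session description>"})

# Bob picks up the offer and replies with his own description.
sender, offer = relay.receive("bob")
relay.send("bob", sender, {"type": "answer", "sdp": "<bob session description>"})

# Alice now has what she needs to attempt the direct connection.
answer_from, answer = relay.receive("alice")
```

Only after this exchange do the peers know enough about each other to try connecting directly, which is why some shared meeting point is hard to avoid.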

Would take way too much CPU time to mux audio streams together server-side, and then recompress. (Means we have to buffer data for each sender, deal with silence, deal with retransmits and packet drops, have a jitter buffer, etc...). No way we'd be able to hit the # of clients we want per core with that overhead. Our SFUs are intentionally very dumb for this reason.
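For contrast with server-side mixing, a "dumb" forwarding unit can be sketched in a few lines: it never decodes a packet, it only copies the opaque bytes to every other member of the channel. This is a toy model under stated assumptions, not Discord's code. `ForwardingUnit` and its methods are hypothetical names, and it returns the deliveries as a list instead of writing to real sockets.

```python
from typing import Dict, List, Set, Tuple


class ForwardingUnit:
    """Toy selective forwarding unit: relays opaque packets, no decoding."""

    def __init__(self) -> None:
        self._channels: Dict[str, Set[str]] = {}

    def join(self, channel: str, client: str) -> None:
        self._channels.setdefault(channel, set()).add(client)

    def leave(self, channel: str, client: str) -> None:
        self._channels.get(channel, set()).discard(client)

    def forward(self, channel: str, sender: str, packet: bytes) -> List[Tuple[str, bytes]]:
        """Return (recipient, packet) pairs for everyone in the channel
        except the sender. The payload is passed through untouched."""
        members = self._channels.get(channel, set())
        if sender not in members:
            return []  # ignore packets from clients that are not in the channel
        # Sorted only to make the output order deterministic in this sketch.
        return [(client, packet) for client in sorted(members) if client != sender]
```

The per-packet work is a membership lookup and a copy per recipient. There is no per-sender buffering, jitter buffer, or re-encode, which is the overhead the comment above wants to avoid.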

Moreover, it's a large quantity of data transfer per user, so even modest user growth would result in huge network use growth. As a sibling comment pointed out, a cloud provider "may not really want that type of client". We host our voice servers on dedicated hardware, not a VPS.
