WebRTC Signaling Concepts
© Muaz Khan
This tutorial is outdated (written in 2013). Please check this tutorial instead: https://codelabs.developers.google.com/codelabs/webrtc-web/#0
The term "signaling" in WebRTC...
Signaling is the process WebRTC applications use to detect peers, exchange session descriptions so that media ports can be set up, and share everything else needed for the initial handshake.
Signaling is used to coordinate communication and send control messages.
Signaling is used to exchange session control messages known as SDP, network configuration in the form of ICE candidates, and media capabilities, which are described in the same session control messages.
Signaling can be done with a copy/paste mechanism, or through a gateway such as WebSocket, Socket.io, XMPP, or the best-known option, the Session Initiation Protocol (SIP).
You can also use the simplest mechanism of all: POST/GET data to and from a database using XHR.
The WebRTC signaling process is based on a standard, JSEP: the JavaScript Session Establishment Protocol. JSEP defines the interfaces an application uses to drive signaling, e.g. to negotiate the local and remote descriptions and addresses.
Because of JSEP, the signaling process is not left entirely to the application developer: the transport is your choice, but the offer/answer calls must be made in a fixed order. We need to follow the order of the code!
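The fixed order can be sketched as three small functions. This is only a minimal sketch, not code from the experiments: `pc` is assumed to be an `RTCPeerConnection`, and `sendToPeer` is a hypothetical function that delivers a message over whatever signaling transport you chose.

```javascript
// Caller: create the offer, apply it locally, then signal it.
async function startCall(pc, sendToPeer) {
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    sendToPeer({ sdp: pc.localDescription });
}

// Callee: apply the remote offer first, then create and signal the answer.
async function answerCall(pc, remoteOffer, sendToPeer) {
    await pc.setRemoteDescription(remoteOffer);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    sendToPeer({ sdp: pc.localDescription });
}

// Caller again: the received answer completes the exchange.
async function finishCall(pc, remoteAnswer) {
    await pc.setRemoteDescription(remoteAnswer);
}
```

Calling these steps out of order (for example, creating an answer before the remote offer has been set) is rejected by the browser, which is why the order matters.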
How is the term "signaling" used in WebRTC Experiments?
Nearly all WebRTC experiments rely on channels. "Channel" is a term used with realtime protocols like WebSocket for making sure data is transmitted privately, to relevant clients only.
Channels are created dynamically for each peer, to make sure SDP/ICE is exchanged only among relevant users.
Just think of a "channel" as a unique private room: the text messages transmitted over that room are accessible inside the room only. Outsiders will NEVER be able to access those messages.
The channel concept is similar to the "namespaces" concept in socket.io. Namespaces allow you to broadcast data over a single namespace instead of transmitting it publicly.
Assume that there are two users, UserA and UserB. WebRTC Experiments will set up a new namespace, channel or room, and use it to exchange SDP/ICE/etc. between them. UserA's offer will be shared with UserB using that signaling room, and vice versa.
In the simplest words, you can consider it a "signaling room": a dynamically opened room used to exchange data between two users. Remember, signaling rooms are unique, so nobody can guess or reach someone else's signaling room until the room's unique identifier is shared with them manually.
The "room unique identifier" is a randomly generated number which is used as the access key for the room.
To share UserA's offer over the room, the room-id is used to send the data. UserB, i.e. the "subscriber", uses the same room-id to receive it.
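Because the room-id acts as the access key, it should be hard to guess. The following is a minimal sketch of generating one, assuming a Node.js environment; `createRoomId` is a hypothetical helper name, and in a browser `crypto.getRandomValues` could play the same role.

```javascript
const { randomBytes } = require('crypto');

// Hypothetical helper: 128 random bits, written as 32 hex characters.
function createRoomId() {
    return randomBytes(16).toString('hex');
}

const roomId = createRoomId();
console.log(roomId.length); // 32
```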
What about WebSockets, which don't have a concept like "channels"?
You just need to use a trick on the server side. Look at this structure:
var global_variable_Channels = {
    'channel-1': [websocket1, websocket2],
    'channel-2': [websocket3, websocket4],
    'channel-3': [websocket5, websocket6]
};
		
On the server side, capture transmitted messages like this:
var global_variable_Channels = {};

// "websocket" is the server-side socket of one connected client; how you
// obtain it depends on your WebSocket server library.
websocket.onmessage = function(e) {
    // messages arrive as text, so parse the JSON sent by the client
    var data = JSON.parse(e.data), room;

    // if someone asked to open/join a room
    if (data.openRoom) {
        room = global_variable_Channels[data.roomid];
        if (room)
            room.push(websocket);
        else
            // store the new room so that later messages can find it
            global_variable_Channels[data.roomid] = [websocket];
    } else {
        // otherwise transmit data over relevant websockets
        var message = data.message;

        // capture relevant signaling room
        room = global_variable_Channels[data.roomid];

        if (room == null) throw new Error('No such signaling-room exists.');

        // iterate over all websockets using same room-id
        for (var i = 0; i < room.length; i++) {
            // transmit data over those websockets
            room[i].send(message);
        }
    }
};
		
On the client side, open or join a room like this:
// WebSocket.send() takes text, so serialize the object as JSON
websocket.send(JSON.stringify({
    openRoom: true,
    roomid: Math.random() * 1000
}));
		
On the client side, you can transmit data like this:
// use the same room-id that was used to open/join the room
websocket.send(JSON.stringify({
    message: 'SDP-or-ICE-etc.',
    roomid: 'room-id'
}));
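One thing the server-side handler does not cover is cleanup: sockets that disconnect stay in their room. A possible variation, sketched here under the assumption that sockets only need a `send` method, keeps each room as a `Set` inside a `Map`. The names `joinRoom`, `leaveRoom` and `broadcast` are hypothetical.

```javascript
// roomid -> Set of sockets currently in that room
const rooms = new Map();

function joinRoom(roomid, socket) {
    if (!rooms.has(roomid)) rooms.set(roomid, new Set());
    rooms.get(roomid).add(socket);
}

// Call this from the socket's "close" handler.
function leaveRoom(roomid, socket) {
    const room = rooms.get(roomid);
    if (!room) return;
    room.delete(socket);
    if (room.size === 0) rooms.delete(roomid); // drop empty rooms
}

// Send a message to everyone in the room except the sender.
function broadcast(roomid, sender, message) {
    const room = rooms.get(roomid);
    if (!room) throw new Error('No such signaling-room exists.');
    for (const socket of room) {
        if (socket !== sender) socket.send(message);
    }
}
```

Skipping the sender is a design choice of this sketch; it keeps a user from receiving their own offer back.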
What about XHR i.e. POST/GET? An Example
Create a table named "RoomTable" with the following columns:
- Room-id
- Owner-id
- Participants-id
- Message
Create another table named "UserTable" with the following column:
- User-id
Now, when someone initiates a WebRTC session, make an XHR request that creates a record in RoomTable with "Owner-id" set to that user's "User-id".
When someone else joins the room, update that record and append their "User-id" to the "Participants-id" column.
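The record handling described above might look roughly like this on the server. It is a sketch under assumptions: an in-memory `Map` stands in for the database, every function name and the routes mentioned in the comments are hypothetical, and messages are kept as a list rather than a single "Message" column so that nothing is overwritten.

```javascript
// In-memory stand-in for "RoomTable"; a real server would use a database.
const roomTable = new Map();

function getRoom(roomId) {
    const room = roomTable.get(roomId);
    if (!room) throw new Error('No such room: ' + roomId);
    return room;
}

// e.g. POST /rooms: the initiator creates the record and becomes its owner.
function createRoom(roomId, ownerId) {
    roomTable.set(roomId, { roomId, ownerId, participantIds: [], messages: [] });
}

// e.g. POST /rooms/:id/join: append the joining user's id.
function addParticipant(roomId, userId) {
    getRoom(roomId).participantIds.push(userId);
}

// e.g. POST /rooms/:id/messages: store an SDP/ICE message for the others.
function storeMessage(roomId, fromUserId, message) {
    getRoom(roomId).messages.push({ from: fromUserId, message });
}

// e.g. GET /rooms/:id/messages: a polling client reads what others sent.
function fetchMessages(roomId, forUserId) {
    return getRoom(roomId).messages
        .filter(m => m.from !== forUserId)
        .map(m => m.message);
}
```

With XHR there is no push from the server, so each client would have to call the GET endpoint repeatedly to pick up new messages.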
What if I don't want to use socket.io "namespaces"?
Follow the process explained for "WebSocket" above.
What if I want to use SIP?
You don't need to worry about the "channels" concept, because you already have a SIP URI. SIP URIs are unique identifiers that can be used to send and receive messages privately between two users.
Can you compare "channels" with other signaling methods?
Channels give us the easiest mechanism for transmitting data privately to relevant users. They use less memory compared to other solutions, and the mechanism can set up any number of channels without interruption.
Non-channel-based signaling solutions usually store rooms in an array, the same way we did earlier for WebSocket, where you can run into memory or stack-overflow issues.
Non-channel-based signaling is a little slower, because we need to iterate over the array to find the relevant sockets to transmit data to.
Some "novice" users try to handle everything on the client side, which is NEVER recommended. Because such users transmit all messages publicly, the approach is easily vulnerable, and there is a high chance of failures caused by buffer-size limits and other issues.
Remember: NEVER send unneeded messages to a user. Be specific, preserve privacy and keep things reliable.
What about server-less signaling?
You can use a copy/paste mechanism to share one user's offer with the other, and vice versa.
Remember, this delays setting the remote session description on the answerer's side, which may cause port setup to fail.
This mechanism also needs a gateway such as email or instant messaging to share the copied text with the other user, who will paste it, create an answer, and so on.
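For copy/paste to work, the session description has to survive being pasted into an email or chat window, where line breaks are easily mangled. One way to handle that, sketched here assuming Node.js (`Buffer`), is to wrap it in a single base64 line; in a browser, `btoa`/`atob` could play the same role. The function names are hypothetical and the SDP string is a shortened placeholder.

```javascript
// Turn a session description into one line of text that is safe to copy.
function encodeDescription(description) {
    const json = JSON.stringify({ type: description.type, sdp: description.sdp });
    return Buffer.from(json, 'utf8').toString('base64');
}

// Reverse step, run by the user who pastes the text.
function decodeDescription(text) {
    const json = Buffer.from(text.trim(), 'base64').toString('utf8');
    return JSON.parse(json);
}

const offer = { type: 'offer', sdp: 'v=0\r\no=- 0 0 IN IP4 127.0.0.1\r\n' };
const pasted = decodeDescription(encodeDescription(offer));
console.log(pasted.type); // offer
```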
What about LAN or intranet?
You always need a signaling gateway, whether it is installed publicly or privately. A gateway can be a copy/paste mechanism or a realtime protocol.
It doesn't matter whether you use custom protocols or topologies for signaling; everything is up to you!
Just get it done!
What is the difference between an ICE server and a signaling server?
Signaling is used to detect peers and to exchange the prerequisites for setting up media connections.
ICE, which stands for Interactive Connectivity Establishment, is a protocol used to capture the addresses at which a user can be reached, including public IP addresses. It lets us know:
- Public IP addresses of the user
- Whether they are IPv4 or IPv6
- Whether UDP is blocked; if it is, ICE falls back to TCP, and failing that to a custom protocol.
Some firewalls allow only ports like 443 and 80.
So, ICE connectivity establishment is the process that captures ports and IP addresses and returns that data to the browser.
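The data ICE returns arrives as candidate lines. To make the list above concrete, here is a minimal sketch that pulls the address, port, protocol and type out of one such line. `parseCandidate` is a hypothetical helper, the example line uses a documentation IP address, and the IPv4/IPv6 check assumes the address is a literal IP rather than a hostname.

```javascript
// Parse the main fields of an ICE candidate line, as found in the
// "candidate" string of an RTCIceCandidate or in an SDP "a=candidate" attribute.
function parseCandidate(line) {
    const parts = line.replace(/^a=/, '').replace(/^candidate:/, '').trim().split(/\s+/);
    const typIndex = parts.indexOf('typ');
    return {
        foundation: parts[0],
        component: Number(parts[1]),        // 1 = RTP, 2 = RTCP
        protocol: parts[2].toLowerCase(),   // "udp" or "tcp"
        priority: Number(parts[3]),
        address: parts[4],
        port: Number(parts[5]),
        type: parts[typIndex + 1],          // host, srflx, prflx or relay
        ipVersion: parts[4].includes(':') ? 6 : 4
    };
}

const example = 'candidate:842163049 1 udp 1677729535 203.0.113.5 46154 typ srflx raddr 10.0.0.2 rport 46154';
console.log(parseCandidate(example));
```

A `srflx` (server reflexive) type means the address was learned from a STUN server, i.e. it is the public address seen from outside the NAT.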
WebRTC media engines use those ports and public IP addresses to open ports and stream RTP/RTCP packets.
RTP carries the media packets, and RTCP carries the information used to control RTP packets. Both run simultaneously, traditionally over two separate ports (with RTCP multiplexing they share a single port).
Differentiating signaling and ICE:
- Signaling is the process used to detect peers' presence, exchange offer/answer between users, and exchange ICE candidates.
- ICE is the process used to capture the public IP addresses and ports of the current user. It NEVER exchanges signaling data between two users.
- Some ICE servers, such as TURN, act as a media gateway when a firewall or NAT hides the user's public IP addresses. RTP/RTCP packets then flow from one browser to the TURN server to the other browser.
- As the third point implies, TURN can act as a media packet exchanger.
A media packet is an RTP packet that contains audio/video/data blobs. Media packets are shared over UDP ports, whereas signaling packets are shared over HTTP and/or TCP protocols.
Remember, media packets, i.e. RTP packets, can be shared over the TCP protocol as well.
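On the application side, the ICE servers are just configuration. The sketch below shows what such a configuration might look like; the server addresses and credentials are hypothetical placeholders, and the second TURN URL illustrates relaying over TCP.

```javascript
// Hypothetical server addresses and credentials; replace with your own.
const configuration = {
    iceServers: [
        // STUN: lets the browser learn its public IP address and port
        { urls: 'stun:stun.example.org:3478' },
        // TURN: relays media when no direct path works, over UDP or TCP
        {
            urls: [
                'turn:turn.example.org:3478?transport=udp',
                'turn:turn.example.org:3478?transport=tcp'
            ],
            username: 'user',
            credential: 'secret'
        }
    ]
};

// In a browser this object is passed to the constructor:
//   const pc = new RTCPeerConnection(configuration);
```

Note that no signaling server appears in this object: signaling stays entirely in application code.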