Voice over IP (VoIP)

Sending telephone calls over the Internet, known as Voice over IP (VoIP), has been growing rapidly, and it has been predicted that most Americans will soon use VoIP in place of the traditional telephone system. The main incentive for people to switch to VoIP from a traditional phone is cost: VoIP is less expensive, particularly for people who make lots of long distance calls.

VoIP requires a high speed (broadband) Internet connection, but most homes have such a connection now.

VoIP typically provides many special features, such as call forwarding, conference calls, and caller ID, because they are easy to implement in software.

Issues:

VoIP requires several components, covered in the sections below. There are a number of different and incompatible protocols for these.

Connecting a phone to the Internet

There are three ways to do this. The most common is an analog telephone adapter (ATA), a device that sits between an ordinary analog telephone and a computer or network connection. It converts the analog signal to a digital signal. The component that does this is a codec (coder/decoder), which samples the analog signal 8000 times per second and converts each sample to an 8-bit value, giving 8000 x 8 = 64,000 bits per second (64 kbit/s).
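The 64 kbit/s figure is simple arithmetic, and the same arithmetic gives the amount of audio carried in each packet. The sketch below assumes a 20 ms packet length purely for illustration; the text does not specify one, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000   # samples per second, from the text
BITS_PER_SAMPLE = 8     # each sample becomes one 8-bit value, from the text


def codec_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bits per second produced by a fixed-rate, uncompressed codec."""
    return sample_rate_hz * bits_per_sample


def payload_bytes(sample_rate_hz: int, bits_per_sample: int, packet_ms: int) -> int:
    """Bytes of audio in one packet that spans packet_ms milliseconds."""
    bits = codec_bit_rate(sample_rate_hz, bits_per_sample) * packet_ms // 1000
    return bits // 8


print(codec_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))     # 64000
# Assumed packet length of 20 ms (not from the text).
print(payload_bytes(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, 20))  # 160
```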

An alternative is an IP phone, a special digital phone that can connect directly to Ethernet.

The third option is software running on your computer (a softphone), which uses a microphone and the computer speakers.

VoIP gateways

Your computer sends signals to your provider. Your provider runs a gateway which, among other things, contains a soft switch. A soft switch maintains a database that maps phone numbers to IP addresses and vice versa, so it has to know the IP address at which each endpoint can currently be reached. If it does not know the address itself, it hands the request off to another soft switch.
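The lookup-and-hand-off behaviour can be sketched as a small class. This is only an illustration of the idea, not how any real soft switch is implemented; the class name, method names, phone numbers, and addresses are all made up for the example.

```python
from typing import Dict, Optional


class SoftSwitch:
    """Hypothetical soft switch: maps phone numbers to IP addresses."""

    def __init__(self, name: str, next_switch: Optional["SoftSwitch"] = None):
        self.name = name
        self.number_to_ip: Dict[str, str] = {}
        self.next_switch = next_switch  # where to hand off unknown numbers

    def register(self, number: str, ip: str) -> None:
        """Record the IP address at which a phone number can be reached."""
        self.number_to_ip[number] = ip

    def resolve(self, number: str) -> Optional[str]:
        """Phone number -> IP address, handing off if not known locally."""
        ip = self.number_to_ip.get(number)
        if ip is not None:
            return ip
        if self.next_switch is not None:
            return self.next_switch.resolve(number)
        return None

    def number_for(self, ip: str) -> Optional[str]:
        """IP address -> phone number (the reverse mapping), local only."""
        for number, known_ip in self.number_to_ip.items():
            if known_ip == ip:
                return number
        return None


remote = SoftSwitch("remote")
remote.register("555-0199", "198.51.100.7")
local = SoftSwitch("local", next_switch=remote)
local.register("555-0100", "192.0.2.10")

print(local.resolve("555-0100"))  # known locally
print(local.resolve("555-0199"))  # handed off to the other switch
print(local.resolve("555-0000"))  # nobody knows this number
```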

In the traditional phone system, the overall process of setting up a connection is called call signalling.

There are several different protocols for performing the signalling function, i.e. the interface between the Internet and the phone system.

The Real Time Transport Protocol (RTP)

Real time media such as voice or video needs to be delivered on a much tighter schedule than ordinary Internet traffic. Packets do not all take the same time to cross the network, and this variation in delay is called jitter. If it shows up as even small gaps or stutters in playback, it is annoying to the listener or viewer. To prevent this, the program which receives the packets buffers them so that it can play the frames or voice packets at exactly the correct speed. Naturally this introduces some delay. That is acceptable for one-way video, but in a VoIP conversation the added delay has to be kept very small.

Note that if packets are delayed so much that they arrive after their scheduled playout time, they are simply discarded; it is pointless to retransmit missing packets.
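The buffering and discard rule can be sketched as follows. This is a simplified model under stated assumptions: the sender emits one packet every 20 ms, the receiver adds a fixed 60 ms of buffering, and each packet's playout time is derived from the arrival time of the first packet. The class, its field names, and both numbers are illustrative choices, not taken from any particular implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Packet:
    seq: int            # position in the sender's stream
    arrival_ms: float   # when the receiver got it
    payload: bytes = b""


@dataclass
class JitterBuffer:
    interval_ms: float = 20.0   # assumed spacing between packets
    delay_ms: float = 60.0      # assumed fixed buffering delay
    base_seq: Optional[int] = None
    base_time: Optional[float] = None
    ready: Dict[int, Packet] = field(default_factory=dict)
    discarded: List[int] = field(default_factory=list)

    def playout_time(self, seq: int) -> float:
        """Scheduled playout time of a packet, relative to the first one."""
        return self.base_time + self.delay_ms + (seq - self.base_seq) * self.interval_ms

    def receive(self, packet: Packet) -> bool:
        """Buffer the packet, or discard it if its playout time has passed."""
        if self.base_seq is None:
            self.base_seq = packet.seq
            self.base_time = packet.arrival_ms
        if packet.arrival_ms > self.playout_time(packet.seq):
            self.discarded.append(packet.seq)   # too late to be useful
            return False
        self.ready[packet.seq] = packet
        return True


jb = JitterBuffer()
# Packet 2 is delayed in the network and arrives after packet 3.
for seq, arrival in [(0, 0.0), (1, 35.0), (3, 70.0), (2, 130.0)]:
    jb.receive(Packet(seq, arrival))

print(sorted(jb.ready))  # [0, 1, 3]
print(jb.discarded)      # [2]
```

Packet 1 arrives 15 ms later than ideal but is still played, because the buffer absorbs that much jitter. Packet 2 misses its 100 ms playout time and is dropped.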

There is a special protocol for real time applications, the Real-time Transport Protocol (RTP). Despite its name, RTP is not really a transport layer protocol; it runs over UDP. Also, RTP does not actually ensure on-time delivery of packets, or even that packets will not be lost. That is up to the software at the receiving end point. However, it does supply two pieces of information which can be helpful to the receiver: every packet has a sequence number and a time stamp.

RTP is used for many different media, and the granularity of the time stamp differs depending on the payload type.
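As a rough illustration of what granularity means here, the time stamp counts ticks of a media clock, so the amount it advances per packet depends on the clock rate. The sketch below assumes an 8000 Hz clock for telephone-quality audio (matching the 8000 samples per second mentioned earlier) and a 90,000 Hz clock for video, which is the rate commonly used for video payloads; the function names are hypothetical.

```python
def timestamp_increment(clock_rate_hz: int, media_ms: float) -> int:
    """Ticks the RTP time stamp advances for media_ms of media."""
    return round(clock_rate_hz * media_ms / 1000)


def ticks_to_ms(ticks: int, clock_rate_hz: int) -> float:
    """Convert a time stamp difference back to milliseconds."""
    return ticks * 1000 / clock_rate_hz


AUDIO_CLOCK_HZ = 8000    # assumed: telephone-quality audio
VIDEO_CLOCK_HZ = 90000   # assumed: common video clock rate

print(timestamp_increment(AUDIO_CLOCK_HZ, 20))         # 160 ticks per 20 ms packet
print(timestamp_increment(VIDEO_CLOCK_HZ, 1000 / 30))  # 3000 ticks per frame at 30 fps
print(ticks_to_ms(160, AUDIO_CLOCK_HZ))                # 20.0
```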

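Here is the RTP header.

               0                   1                   2                   3
               0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Octets 1-4    |V=2|P|X|  CC   |M|     PT      |       Sequence number         |
              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Octets 5-8    |                           Timestamp                           |
              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Octets 9-12   |             Synchronisation source (SSRC) number              |
              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
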
Version
Identifies the version of RTP (currently 2).
Padding
A flag which indicates whether the packet has been appended with padding octets after the payload data.
X (Header extension)
Indicates whether an optional header extension has been added after the fixed RTP header.
CC (CSRC count)
Although not shown on this header diagram, the 12 octet header can optionally be expanded to include a list of up to 15 contributing sources; the 4-bit CC field gives the number present. Contributing sources are added by mixers, and are only relevant for conferencing applications where elements of the data payload have originated from different computers. For point to point communications, CSRCs are not required.
M (Marker)
Allows significant events such as frame boundaries to be marked in the packet stream.
PT (Payload type)
This field identifies the format of the RTP payload and determines its interpretation by the application.
Sequence number
A number which increments by one for each RTP packet sent.  It allows the receiver to detect lost packets and reconstruct the sender's packet sequence.
Timestamp
The sampling time of the first data in this packet, measured in ticks of a clock whose rate depends on the payload type.  This field allows the receiver to buffer and play out the data in a continuous stream.
Synchronisation source (SSRC) number
A randomly chosen number which identifies the source of the data stream.

The RTP header is inserted after the UDP header and before the actual payload.
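To make the field layout concrete, here is a minimal sketch that unpacks the 12 octet header (plus any CSRC list) from the data portion of a UDP datagram. The type and function names are illustrative. The sketch does not interpret a header extension or strip padding, so when the X or P flag is set the returned remainder still contains those octets.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(datagram: bytes) -> Tuple[RtpHeader, bytes]:
    """Split UDP payload bytes into an RTP header and the remaining octets."""
    if len(datagram) < 12:
        raise ValueError("too short for an RTP header")
    # Network byte order: two single octets, 16-bit sequence number,
    # 32-bit time stamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", datagram[:12])
    cc = b0 & 0x0F                  # low 4 bits of octet 1
    end = 12 + 4 * cc               # each CSRC entry is 4 octets
    if len(datagram) < end:
        raise ValueError("CSRC list is truncated")
    csrcs = struct.unpack("!%dI" % cc, datagram[12:end])
    header = RtpHeader(
        version=b0 >> 6,            # top 2 bits
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),     # top bit of octet 2
        payload_type=b1 & 0x7F,     # low 7 bits of octet 2
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
    return header, datagram[end:]


# Build a sample packet by hand: V=2, no flags, PT=0, 160 octets of audio.
packet = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + b"\x00" * 160
header, payload = parse_rtp_header(packet)
print(header.version, header.payload_type, header.sequence_number, len(payload))
# 2 0 1 160
```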

Applications which run RTP also have to run the RTP Control Protocol (RTCP), sometimes called the Real Time Control Protocol. This allows the two end points to provide out-of-band data to each other. The protocol supports various types of messages. For example, the sender periodically sends a sender report containing an absolute timestamp, which allows the receiver or receivers to resynchronize. The receiver periodically sends a receiver report to say how well it is receiving the signal. Other messages are involved in initiating or terminating streams.
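One way a receiver might use a sender report is sketched below. A sender report pairs an absolute (wall-clock) time with the RTP time stamp that corresponds to the same instant, so any later RTP time stamp can be converted to wall-clock time once the clock rate is known. The function name and the sample values are illustrative, and the 8000 Hz clock rate is an assumption.

```python
def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int, sr_wallclock_s: float,
                     clock_rate_hz: int) -> float:
    """Map an RTP time stamp to wall-clock seconds using one sender report.

    sr_rtp_ts and sr_wallclock_s are the RTP time stamp and absolute time
    carried in the most recent sender report.
    """
    # RTP time stamps are 32 bits and wrap around, so take the difference
    # modulo 2**32 and interpret it as a signed value.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_wallclock_s + diff / clock_rate_hz


CLOCK_HZ = 8000  # assumed audio clock rate

# One second of audio after the report.
print(rtp_to_wallclock(168000, 160000, 1000.0, CLOCK_HZ))  # 1001.0
# One second of audio before the report.
print(rtp_to_wallclock(152000, 160000, 1000.0, CLOCK_HZ))  # 999.0
```

Two streams from the same sender, such as audio and video, can be lined up by converting both to wall-clock time in this way.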

It would be nice if the Internet provided some sort of Quality of Service (QoS) guarantee, but at the moment it does not. QoS guarantees are only possible if routers are able to reserve bandwidth for a particular stream. IPv6 is better designed to allow this than IPv4; for example, its header includes a flow label that identifies the packets belonging to one stream.

Protocols have been developed to reserve bandwidth and to guarantee quality of service, and they can be useful on intranets, but for them to be useful across the Internet as a whole, much of its infrastructure would have to be reengineered.