Note that this approach to padding removes any need for a length field in the RTP header, thus serving the goal of keeping the header short; in the common case of no padding, the length is deduced from the lower-layer protocol.
(Figure: Padding of an RTP packet.) The extension (X) bit is used to indicate the presence of an extension header, which would be defined for a specific application and follow the main header. Such headers are rarely used, since it is generally possible to define a payload-specific header as part of the payload format definition for a particular application. The X bit is followed by a 4-bit field that counts the number of contributing sources, if any are included in the header.
Contributing sources are discussed below. We noted above the frequent need for some sort of frame indication; this is provided by the marker bit, which has a profile-specific use. For a voice application, it could be set at the beginning of a talkspurt, for example. The 7-bit payload type field follows; it indicates what type of multimedia data is carried in this packet.
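The header bits described so far can be unpacked with a few shifts and masks. The following is a minimal sketch, not a reference implementation: the names `RtpHeader` and `parse_rtp_header` are hypothetical, and the widths of the fields after the second octet (16-bit sequence number, 32-bit timestamp, 32-bit source identifier) are taken from the standard fixed RTP header layout rather than from the text above.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed header at the start of an RTP packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two single octets, a 16-bit and two 32-bit fields.
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,               # top 2 bits of the first octet
        padding=bool(first & 0x20),       # P bit
        extension=bool(first & 0x10),     # X bit
        csrc_count=first & 0x0F,          # 4-bit contributing-source count
        marker=bool(second & 0x80),       # M bit
        payload_type=second & 0x7F,       # 7-bit payload type
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

Any contributing-source identifiers and the payload follow these 12 bytes; the sketch deliberately stops at the fixed part.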
One possible use of this field would be to enable an application to switch from one coding scheme to another based on information about resource availability in the network or feedback on application quality. The exact usage of the payload type is also determined by the application profile.
Note that the payload type is generally not used as a demultiplexing key to direct data to different applications or to different streams within a single application, such as the audio and video stream for a videoconference. This is because such demultiplexing is typically provided at a lower layer.
The sequence number is used to enable the receiver of an RTP stream to detect missing and misordered packets. The sender simply increments the value by one for each transmitted packet.
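As a rough illustration of how a receiver might use the sequence number, the sketch below compares each arriving number with the one it expected next. The function name `classify_sequence` is hypothetical, the field is assumed to be 16 bits wide (so the counter wraps from 65535 to 0), and the half-range rule for telling a gap from a late packet is a common convention rather than something the text prescribes.

```python
def classify_sequence(expected: int, received: int) -> str:
    """Compare a received 16-bit sequence number with the expected one."""
    # Forward distance from expected to received, modulo 2**16.
    delta = (received - expected) & 0xFFFF
    if delta == 0:
        return "in-order"
    if delta < 0x8000:
        # The packet is ahead of what we expected: `delta` packets are missing.
        return f"gap of {delta} packet(s)"
    # More than half the number space "ahead" is treated as behind.
    return "late or duplicate"
```

For instance, if the receiver expected 65535 and sequence number 1 arrives, the forward distance is 2, meaning packets 65535 and 0 have not (yet) been seen.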
Note that RTP does not do anything when it detects a lost packet, in contrast to TCP, which both corrects for the loss by retransmission and interprets the loss as a congestion indication which may cause it to reduce its window size. Rather, it is left to the application to decide what to do when a packet is lost because this decision is likely to be highly application dependent. For example, a video application might decide that the best thing to do when a packet is lost is to replay the last frame that was correctly received.
Some applications might also decide to modify their coding algorithms to reduce bandwidth needs in response to loss, but this is not a function of RTP.
It would not be sensible for RTP to decide that the sending rate should be reduced, as this might make the application useless. The function of the timestamp field is to enable the receiver to play back samples at the appropriate intervals and to enable different media streams to be synchronized. Because different applications may require different granularities of timing, RTP itself does not specify the units in which time is measured.
The clock granularity is one of the details that is specified in the RTP profile or payload format for an application. The timestamp value in the packet is a number representing the time at which the first sample in the packet was generated. The timestamp is not a reflection of the time of day; only the differences between timestamps are relevant. For example, if the sampling interval is 125 microseconds and the second packet is generated 10 ms after the first, its timestamp is 80 higher than that of the first packet. Note that fewer than 80 samples might have been sent due to compression techniques such as silence detection, and yet the timestamp allows the receiver to play back the samples with the correct temporal relationship.
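Because only timestamp differences matter, a receiver can turn them into playback offsets once it knows the clock rate from the profile or payload format. This is a small sketch under stated assumptions: `playout_offset_seconds` is a hypothetical helper, the timestamp is assumed to be a 32-bit counter that may wrap, and the 8000 Hz rate in the usage note is just an illustrative audio clock.

```python
def playout_offset_seconds(first_ts: int, ts: int, clock_rate_hz: int) -> float:
    """Time between two packets' first samples, derived from RTP timestamps."""
    # Subtract modulo 2**32 so a wrapped counter still yields a small
    # positive difference.
    ticks = (ts - first_ts) & 0xFFFFFFFF
    return ticks / clock_rate_hz
```

With an 8000 Hz clock, a timestamp difference of 80 ticks corresponds to `playout_offset_seconds(1000, 1080, 8000)`, that is, 10 ms, regardless of how many samples were actually carried.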
In a given multimedia conference, each sender picks a random synchronization source (SSRC) identifier and is expected to resolve conflicts in the unlikely event that two sources pick the same value. By making the source identifier something other than the network or transport address of the source, RTP ensures independence from the lower-layer protocol. It also enables a single node with multiple sources (for example, several cameras) to distinguish those sources. When a single node generates different media streams (for example, audio and video), it is not required to use the same SSRC in each stream. A mixer can be used to reduce the bandwidth requirements for a conference by receiving data from many sources and sending it as a single stream.
For example, the audio streams from several concurrent speakers could be decoded and recoded as a single audio stream. In this case, the mixer lists itself as the synchronization source but also lists the contributing sources—the SSRC values of the speakers who contributed to the packet in question.
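To make the mixer case concrete, here is a hedged sketch of how a mixer might lay out its header: its own identifier goes in the SSRC field and the speakers' identifiers follow as the contributing-source list. `build_mixer_header` is a hypothetical name, the version value 2 and the 32-bit identifier width come from the standard RTP header layout, and the marker, padding and extension bits are simply left clear.

```python
import struct
from typing import Sequence


def build_mixer_header(
    mixer_ssrc: int,
    contributing_ssrcs: Sequence[int],
    seq: int,
    timestamp: int,
    payload_type: int,
) -> bytes:
    """Build an RTP header whose SSRC is the mixer and whose CSRC list names the speakers."""
    if len(contributing_ssrcs) > 15:
        # The count field is only 4 bits wide.
        raise ValueError("at most 15 contributing sources fit in the count field")
    first = (2 << 6) | len(contributing_ssrcs)  # version 2, P=0, X=0, count
    second = payload_type & 0x7F                # marker bit left clear
    header = struct.pack(
        "!BBHII", first, second, seq & 0xFFFF, timestamp & 0xFFFFFFFF, mixer_ssrc
    )
    for csrc in contributing_ssrcs:
        header += struct.pack("!I", csrc)       # one 32-bit entry per speaker
    return header
```

Two concurrent speakers therefore add 8 bytes to the 12-byte fixed header.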
RTCP provides a control stream that is associated with a data stream for a multimedia application. This control stream provides three main functions. The first function, feedback on performance, may be useful for detecting and responding to congestion. Some applications are able to operate at different rates and may use performance data to decide to use a more aggressive compression scheme to reduce congestion, for example, or to send a higher-quality stream when there is little congestion.
Performance feedback can also be useful in diagnosing network problems. As already noted, multiple cameras from a single node might have different SSRC values. Furthermore, there is no requirement that an audio and video stream from the same node use the same SSRC. Simply correlating two streams is only part of the problem of intermedia synchronization. However, like the padding value, this is likely to be 0.
RTP allows you to have multiple contributing sources. Whether the marker bit is set is actually up to the underlying protocol.
The payload type is what specifies the contents of the payload. The sequence number starts at a supposedly random value and increments by 1 for each packet sent. Starting from a random value adds an extra unknown part of the packet for anyone trying to break any crypto layered on top to guess. Polycom, however, just starts all theirs at 0. Slow clap. The BackTrack distributions and WildPackets OmniPeek also have the ability to collect packets and play them back.
How does an attacker get access to the RTP stream? The proliferation of wireless networks also leads to the proliferation of wireless endpoints, such as phones. Attacking a wireless network is straightforward: capture the traffic. The same tools that provide a player also have the ability to capture wireless frames. But even without access to the wireless network, or if the wireless network is encrypted, an attacker can sometimes gain access to the RTP streams by attacking infrastructure devices.
Two popular methods are overflowing the source address table on a switch and spoofing a trunk port on a switch. With source address table flooding (also known as MAC address table flooding), the switch memory is constantly filled with MAC addresses such that valid addresses cannot be added to the table. Traffic destined for these valid MAC addresses must be flooded out of all ports. Spoofing a trunk port is an attack in which the target switch is fooled into believing that a trunk line is connected.
Traffic destined for unknown MAC addresses is flooded down trunk ports like broadcast traffic. The attacker can also send traffic to specific destinations by tagging traffic and VLAN hopping. Attacks against hosts can trick them into sending traffic to the attacker or allowing the attacker to act as a man in the middle.
In the face of these challenges, RTP streams must be encrypted in order to protect their privacy. Secure RTP (SRTP) modifies RTP slightly to suit this purpose. Figure depicts the packet structure. We can also see that while the entire packet is authenticated, only a part is encrypted. SRTP fields include an optional field that can be used to provide information about which master key is to be used.
RFC has predefined keys and algorithms, though others are supported for use in a secure deployment. There are two kinds of keys used: master keys and session keys. Endpoints and the call server use the master key to derive the session keys. The session keys are those actually used to encrypt the voice or video data. The RFC does not specify key distribution. This is often handled by the signaling protocol. But endpoints also keep track of rollover counters (which count the reuse of sequence numbers), replay lists, and any salt keys.
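The rollover counter mentioned above can be pictured as the high-order part of a longer packet index. The sketch below shows only the sender-side bookkeeping and is an illustration under assumptions: `advance` and `packet_index` are hypothetical names, the sequence number is taken to be 16 bits, and the index is formed as 65536 times the rollover counter plus the sequence number, which is how the SRTP specification defines it to the best of my understanding. Receiver-side estimation of the counter and wrapping of the counter itself are not covered.

```python
from typing import Tuple


def advance(roc: int, seq: int) -> Tuple[int, int]:
    """Move to the next packet, bumping the rollover counter when seq wraps."""
    seq = (seq + 1) & 0xFFFF
    if seq == 0:
        # The 16-bit sequence number has been reused from the start.
        roc += 1
    return roc, seq


def packet_index(roc: int, seq: int) -> int:
    """Combine rollover counter and sequence number into one increasing index."""
    return (roc << 16) | seq
```

After the sequence number wraps from 65535 to 0, `advance(0, 65535)` yields a counter of 1, so the index keeps growing even though the on-wire sequence number has started over.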
Salting adds extra material to the session generation process in order to make the session key more difficult to derive externally. Again, we can see the shared structure of the packet. The entire message is authenticated, but only the data about the stream is encrypted. Even for those that do, it is often the case that in a mature environment, endpoint devices have varying levels of support for features or encryption requirements.
Lastly, some of these items differ between vendors. If you want to encrypt media transmissions in real-time streams, you should thoroughly examine the planned deployment with an eye toward the SRTP profile. An argument can be made that encryption is not necessary because most VoIP endpoints are wired and internal. This is necessarily a local decision, but the presence of wireless networks, hosted solutions, guest access, telecommuters, or other situations in which the RTP streams may be exposed argue for a close examination of the network specifications.
VoIP signaling protocols handle such items as registration, address signaling, establishing logical channels, and call termination. However, they do not transport voice or video real-time data. RTP provides encapsulation for this data, sequencing, time-stamps, and identification for all of the packets that are part of the real-time stream.
RTCP carries data about information such as timing and packet count between the senders and receivers. Both of these protocols are described in RFC. Profiles allow media streams to provide additional fields to the RTP header that may contain flow-specific parameters.
This RFC provides for the encryption of the real-time data and authentication of the messages. True or false: most communication systems use RTP when transporting voice and video data. This chapter is supported by the book website. So, if the activity lists equipment or software that you do not have, go to the book website for additional content. This can be done via a topology with a call manager at its center or via point-to-point connections using VoIP soft clients.
For example, this book typically uses topologies with a call server but occasionally uses captures done with just a pair of Polycom soft clients (see the figure). Once the topology is built, start a capture on either the soft-client endpoints or the monitor stations watching the VoIP phones. Ensure that the capture obtains the packets necessary for the next couple of activities. How do the sequence numbers advance? Are there contributing sources?
How do the timestamps advance? Do any of the packets have markers? Open each of these packets and identify the fields in each. What are the packets trying to tell you?
Chapter 4. Protocol Description. Basic Operation. Protocol Structure.
Header first octet. Version (V): This is a 2-bit field indicating the protocol variant.
Header second octet. Marker (M): The simple definition of this single-bit field is that the marker allows an important event, such as a frame boundary, to be marked.
RFC provides the following guidance: For applications which send either no packets or occasional comfort-noise packets during silence, the first packet of a talkspurt, that is, the first packet after a silence period during which packets have not been transmitted contiguously, SHOULD be distinguished by setting the marker bit in the RTP data header to one.
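The guidance quoted above amounts to a one-bit state machine on the sender. The following sketch is illustrative only: `talkspurt_markers` is a hypothetical helper, it assumes silent frames are simply not transmitted, and it treats the start of the stream as if it followed a silence period, which is an assumption of this example rather than something the quotation states.

```python
from typing import Iterable, List


def talkspurt_markers(voiced_frames: Iterable[bool]) -> List[bool]:
    """Return the marker bit for each transmitted (voiced) frame.

    Silent frames are not sent and therefore produce no entry.
    """
    markers: List[bool] = []
    after_silence = True  # assumption: stream start counts as after silence
    for voiced in voiced_frames:
        if not voiced:
            after_silence = True
            continue
        # Only the first packet after a silence period carries the marker.
        markers.append(after_silence)
        after_silence = False
    return markers
```

For a frame pattern of two voiced frames, two silent frames and two more voiced frames, four packets are sent and the marker is set on the first and third.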
Packet fields beyond the first two octets. Sequence numbers: This 2-byte field contains the number referencing a particular packet and can help in detecting lost packets and placing the packets in the correct order.
Note: When audio and video are coming from the same node, different synchronization source identifiers are used to prevent confusion between the data formats.
RTP extension header. RTP Control Protocol.
Detailed Operation. SRTP Operation.
While RTP carries the media streams, RTCP carries the associated control information. RTP is originated and received on even port numbers, and the associated RTCP communication uses the next higher odd port number. As its name implies, the design goal for RTP is the end-to-end, real-time streaming of media-related data. RTP includes mechanisms for jitter compensation and for detecting packet loss and out-of-order delivery, issues that are especially common in UDP (User Datagram Protocol) transmissions over IP.
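The even/odd port convention described above is easy to express in code. This is a minimal sketch; `rtcp_port_for` is a hypothetical helper, and the port number in the usage note is only an example value.

```python
def rtcp_port_for(rtp_port: int) -> int:
    """Return the RTCP port paired with an RTP port under the even/odd convention."""
    if not 0 < rtp_port <= 65534:
        raise ValueError("RTP port out of range")
    if rtp_port % 2 != 0:
        raise ValueError("RTP is expected on an even port")
    # RTCP uses the next higher (odd) port number.
    return rtp_port + 1
```

For example, an RTP stream on port 5004 would be paired with RTCP on `rtcp_port_for(5004)`, that is, 5005.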
As RTP enables data transfer to multiple destination endpoints in parallel via IP multicast, it is the primary standard deployed for audio and video transfers over IP networks.
The mechanisms for the associated profile and payload format, referenced in the design of the RTP architecture, are implemented at the application layer instead of the operating system layer.
Applications such as VoIP that need real-time streaming of multimedia data typically require timely delivery of data, with varying tolerance for packet loss.
As an example, audio packet loss in a VoIP application can cause the loss of a few milliseconds of audio data.