- Multimedia network applications are network applications that employ audio or video.
9.1.1 – Properties of Video
- The most salient characteristic of video is its high bit rate.
- Typically 100 Kbps to 3 Mbps.
- Thus video streaming consumes far more bandwidth than, for example, viewing images or streaming music. It can use more than 10 times as much bandwidth as either.
- Cisco predicts that streaming and stored video will be approximately 80% of global consumer Internet traffic by 2019.
- Another characteristic of video is that it can be compressed, thereby trading off video quality against bit rate.
- There are two types of redundancy in videos, both of which can be exploited by video compression.
- Spatial redundancy
- The redundancy within a given image.
- An image that consists of mostly white space has a high degree of redundancy and can be efficiently compressed without significantly sacrificing image quality.
- Temporal redundancy
- Reflects repetition from one image to the next.
- If an image and the subsequent image are exactly the same, there is no reason to re-encode the subsequent image; it is instead more efficient simply to indicate during encoding that the subsequent image is exactly the same.
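The two kinds of redundancy can be illustrated with a toy encoder. This is only a sketch under simplifying assumptions: a frame is a flat list of pixel values, spatial redundancy is exploited with run-length encoding, and temporal redundancy with a "repeat" marker for identical frames. The function names are hypothetical, and real video codecs use far more sophisticated techniques.

```python
def run_length_encode(pixels):
    """Spatial redundancy: collapse runs of identical pixels into [value, count] pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return runs


def encode_frames(frames):
    """Temporal redundancy: emit a 'repeat' marker when a frame equals the previous one."""
    encoded = []
    previous = None
    for frame in frames:
        if previous is not None and frame == previous:
            encoded.append(("repeat",))                      # no need to re-encode
        else:
            encoded.append(("frame", run_length_encode(frame)))
        previous = frame
    return encoded


white_row = [255] * 8                       # a mostly white "image"
frames = [white_row, white_row, [255] * 4 + [0] * 4]
print(encode_frames(frames))
```

The all-white frame shrinks to a single run, and the second frame, being identical to the first, is encoded as a marker only.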
- We can use compression to create multiple versions of the same video, each at a different quality level.
- For example, three versions at 300 Kbps, 1 Mbps, and 3 Mbps.
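One way multiple versions could be used is to pick the best one that fits the available bandwidth. The sketch below assumes a simple selection rule that is not stated in the notes: choose the highest bit rate not exceeding the measured throughput, and fall back to the lowest version otherwise. The name `pick_version` is hypothetical.

```python
VERSIONS_KBPS = [300, 1000, 3000]  # the three example versions: 300 Kbps, 1 Mbps, 3 Mbps


def pick_version(available_kbps, versions=VERSIONS_KBPS):
    """Return the highest bit rate that fits, or the lowest version if none fits."""
    fitting = [v for v in versions if v <= available_kbps]
    return max(fitting) if fitting else min(versions)


for throughput in (250, 800, 1500, 5000):
    print(throughput, "Kbps available ->", pick_version(throughput), "Kbps version")
```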
9.1.2 – Properties of Audio
- Let’s consider how analog audio is converted to a digital signal:
- The analog audio signal is sampled at some fixed rate. The value of each sample will be some real number.
- Each of the samples is rounded to one of a finite number of values. This is referred to as quantization.
- The number of these values, called quantization values, is typically a power of two.
- Each of the quantization values is represented by a fixed number of bits.
- If there are 256 quantization values then each value is represented by one byte.
- The bit representations of all the samples are then concatenated to form the digital representation of the signal.
- For example, if an analog audio signal is sampled at 8000 samples per second and each sample is quantized and represented by 8 bits, the resulting digital signal has a rate of 64000 bits per second.
- By increasing the sample rate and the number of quantization values, the decoded signal can better approximate the original signal.
- The basic encoding technique described above is called pulse code modulation (PCM).
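The sampling and quantization steps can be sketched as follows. This is a minimal illustration, assuming a signal normalized to the range [-1, 1] and uniformly spaced quantization values; the function names and the 440 Hz test tone are illustrative choices, not part of the notes.

```python
import math


def pcm_encode(signal, duration_s, sample_rate, bits_per_sample):
    """Sample signal(t) at a fixed rate and quantize each sample to one of 2**bits codes."""
    levels = 2 ** bits_per_sample              # number of quantization values
    num_samples = round(duration_s * sample_rate)
    codes = []
    for n in range(num_samples):
        t = n / sample_rate                    # sampling instant
        x = max(-1.0, min(1.0, signal(t)))     # clamp to the assumed input range
        code = min(levels - 1, int((x + 1.0) / 2.0 * levels))
        codes.append(code)
    return codes


def pcm_decode(codes, bits_per_sample):
    """Map each code back to the centre of its quantization interval."""
    levels = 2 ** bits_per_sample
    return [(c + 0.5) / levels * 2.0 - 1.0 for c in codes]


def tone(t):
    return math.sin(2 * math.pi * 440 * t)


codes = pcm_encode(tone, duration_s=0.01, sample_rate=8000, bits_per_sample=8)
decoded = pcm_decode(codes, bits_per_sample=8)
worst = max(abs(tone(n / 8000) - d) for n, d in enumerate(decoded))
print(len(codes), "samples, worst-case quantization error", worst)
```

With 8 bits per sample each code fits in one byte, and the decoded value differs from the original sample by at most half a quantization step. Raising the sample rate or the number of bits shrinks that error.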
- Speech encoding often uses PCM with a sampling rate of 8000 samples per second and 8 bits per sample, resulting in 64 Kbps.
- CD audio uses a sample rate of 44100 samples per second with 16 bits per sample, resulting in 705.6 Kbps for mono and 1.411 Mbps for stereo.
- Human speech can be compressed to less than 10 Kbps and still be intelligible.
- A popular compression technique for near CD-quality stereo music is MPEG 1 layer 3 (MP3).
- It can compress to many different rates; 128 Kbps is the most common encoding rate and produces very little sound degradation.
- Advanced Audio Coding (AAC), which was popularized by Apple, is a similar standard.
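The PCM rates quoted in this section follow directly from sample rate × bits per sample × number of channels. A short check of the arithmetic, with a hypothetical helper name:

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


speech = pcm_bit_rate(8000, 8)                    # telephone-quality speech
cd_mono = pcm_bit_rate(44100, 16)                 # one CD channel
cd_stereo = pcm_bit_rate(44100, 16, channels=2)   # both CD channels

print(speech, cd_mono, cd_stereo)
# Ratio between uncompressed CD stereo and a 128 Kbps MP3 stream
print(round(cd_stereo / 128_000, 1))
```

Compared with uncompressed CD stereo, a 128 Kbps MP3 stream is roughly an 11:1 reduction.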
- Streaming stored audio and video
- The underlying medium is prerecorded video, such as a movie or TV show.
- The prerecorded videos are placed on servers, and users send requests to the servers to view the videos on demand.
- Streaming stored video has 3 key distinguishing features:
- Streaming
- The client begins video playout within a few seconds after it begins receiving the video from the server, which means the client plays out from one location in the video while at the same time receiving later parts.
- Avoids having to download the entire video file.
- Interactivity
- The user may pause, reposition forward, reposition backward, fast-forward, and so on through the video content.
- Continuous playout
- Once playout of the video begins, it should proceed according to the original timing of the recording. Therefore, data must be received from the server in time for its playout at the client; otherwise, users experience video frame freezing or frame skipping.
- The most important performance measure for streaming video is average throughput.
- The network must provide an average throughput that is at least as large as the bit rate of the video itself.
- Most video applications host the prerecorded videos on a CDN rather than in a single data center.
- There also exist P2P video streaming applications in which the video is stored on users’ hosts, with different chunks of video arriving from different peers.
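Why average throughput matters for continuous playout can be seen with a small client-buffer simulation. This is a simplified model with assumptions not in the notes: throughput is given per second, playout starts after a fixed start-up delay, and one second of video needs exactly `video_kbps` kilobits. A second in which the buffer holds too little data counts as a freeze.

```python
def simulate_playout(throughput_kbps, video_kbps, startup_delay_s=2):
    """Return the seconds at which playback froze because the buffer ran dry.

    throughput_kbps: kilobits received during each successive second.
    """
    buffered_kb = 0.0
    stalls = []
    for second, received in enumerate(throughput_kbps):
        buffered_kb += received
        if second < startup_delay_s:
            continue                      # still buffering before playout starts
        if buffered_kb >= video_kbps:
            buffered_kb -= video_kbps     # play one second of video
        else:
            stalls.append(second)         # not enough data: frame freeze
    return stalls


print(simulate_playout([1000] * 10, video_kbps=1000))     # steady, matches bit rate
print(simulate_playout([2000, 0] * 5, video_kbps=1000))   # bursty, same average
print(simulate_playout([500] * 10, video_kbps=1000))      # average below bit rate
```

In this model the bursty connection plays without freezing because buffering absorbs the fluctuations, while the connection whose average is below the video bit rate freezes repeatedly.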
- Conversational Voice- and Video-over-IP
- Real-time conversational voice over the Internet is often referred to as Internet telephony. It is also called Voice-over-IP (VoIP).
- Timing considerations and tolerance of data loss are particularly important for conversational voice and video applications.
- Timing considerations are important because audio and video conversation applications are highly delay-sensitive.
- For a conversation with two or more interacting speakers, the delay from when a user speaks or moves until the action is manifested at the other end should be less than a few hundred milliseconds.
- For voice, delays smaller than 150 milliseconds are not perceived by a human listener, delays between 150 and 400 milliseconds can be acceptable, and delays exceeding 400 milliseconds can result in frustrating, if not completely unintelligible, voice conversations.
- Conversational multimedia applications are loss-tolerant.
- Occasional loss only causes occasional glitches in audio/video playback, and these losses can often be partially or fully concealed.
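The delay thresholds for conversational voice can be written down as a small classifier. The function name and the labels are illustrative; the 150 ms and 400 ms boundaries are the ones given in the notes.

```python
def classify_voice_delay(delay_ms):
    """Classify a one-way voice delay using the 150 ms and 400 ms thresholds."""
    if delay_ms < 150:
        return "not perceived"
    if delay_ms <= 400:
        return "acceptable"
    return "frustrating"


for delay in (80, 150, 300, 450):
    print(delay, "ms ->", classify_voice_delay(delay))
```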
- Streaming live audio and video
- Live, broadcast-like applications often have many users who receive the same audio/video program at the same time. This is typically done with CDNs.
- The network must provide each live multimedia flow with an average throughput that is larger than the video consumption rate. Because the event is live, delay can also be an issue, although the timing constraints are much less stringent than those for conversational voice.
- Delays of up to ten seconds or so from when the user chooses to view a live transmission to when playout begins can be tolerated.