Posts Tagged ‘bitrate’

Content Adaptive Stereo Video Coding

There are competing theories about the effects of unequal bit allocation between the left and right video sequences, notably fusion theory and suppression theory. According to fusion theory, the stereo bitrate (and hence the distortion) needs to be allocated equally between the views for the best human perception. In contrast, according to suppression theory, the highest-quality view in a stereo video determines the overall perceived quality. Therefore, the target (right) sequence can be compressed as much as possible to save bits for the reference (left) sequence, so that the overall perceived distortion is the lowest. The proposed content-adaptive stereo encoder (CA-SC) is motivated by suppression theory and adaptively reduces the frame (temporal) rate and spatial resolution of the target (right) sequence according to its content-based features.

Figure 1.0: Stereoscopic encoder

The principle behind content-adaptive video coding is to parse the video into temporal segments. Each temporal segment can be encoded at a different spatial, temporal, and SNR resolution (and hence at a different target bitrate) depending on its low- and/or high-level content-based features. Even though this approach has been used for monoscopic video encoding, there are no such studies in the literature for content-adaptive stereoscopic coding. The proposed CA-SC codec is an extension of the stereo codec (SC), which is based on H.264/AVC. We note that CA-SC can also be developed as an extension of the recently standardized MVC codec. The codec structure is shown in Figure 1.0.

In stereoscopic coding, in the compatible mode, any standard H.264/AVC decoder can decode the sequence as a monoscopic sequence, since the left channel is coded independently of the right channel. To improve coding efficiency without significant loss of perceptual quality, we added three modes to the encoder that down-sample the right view only: the spatial, temporal, and content-adaptive scaling modes.
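The three right-view scaling modes can be illustrated with a minimal Python sketch. This is not the CA-SC implementation: the function names, the thresholds, and the rule that maps the `motion` and `detail` scores to scaling factors are all assumptions made for illustration, and the spatial down-sampling simply drops pixels without the anti-alias filtering a real encoder would apply.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # one grayscale frame as rows of pixel values


def temporal_downsample(frames: Sequence[Frame], factor: int) -> List[Frame]:
    """Temporal mode: keep every `factor`-th frame (factor=1 keeps all)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return list(frames[::factor])


def spatial_downsample(frame: Frame, factor: int) -> Frame:
    """Spatial mode: keep every `factor`-th pixel in both directions."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [row[::factor] for row in frame[::factor]]


def choose_factors(motion: float, detail: float,
                   motion_thresh: float = 0.5,
                   detail_thresh: float = 0.5) -> Tuple[int, int]:
    """Hypothetical content-adaptive rule (an assumption, not from the post):
    low motion -> halve the frame rate, low detail -> halve the resolution."""
    temporal = 2 if motion < motion_thresh else 1
    spatial = 2 if detail < detail_thresh else 1
    return temporal, spatial


def scale_right_view(right: Sequence[Frame], motion: float,
                     detail: float) -> List[Frame]:
    """Content-adaptive mode: scale only the right (target) view.
    The left (reference) view is never touched, so it stays decodable
    by a standard monoscopic decoder."""
    t, s = choose_factors(motion, detail)
    return [spatial_downsample(f, s) for f in temporal_downsample(right, t)]
```

In this sketch a low-motion segment of six right-view frames would be reduced to three frames at full resolution, while a high-motion, low-detail segment would keep all six frames at half resolution in each dimension.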


Video Content Analysis and Adaptive Rate Allocation

In classical rate control, all GoPs are treated equally, and frame-level bit allocation is based on the frame type and a complexity measure, sometimes with multiple passes over the video, but without considering the semantics of the picture content. In the H.264/AVC reference encoder, the GoP borders are determined according to a predefined pattern of frames, and the same target bitrate is used for each GoP given the available channel rate. As a result, the video quality varies from GoP to GoP depending on the video content. The problem with this approach is that in some applications, e.g., wireless video, the total bit budget is not sufficient to encode the entire content at an acceptable quality. Video segments with high motion and/or small details may become unacceptable when all GoPs are encoded at the same low rate. Inter-GoP rate control schemes, that is, schemes that vary the target bitrate from GoP to GoP, have been proposed to offer uniform video quality over the entire video. For example, an optimal solution has been formulated for the buffer-constrained adaptive quantization problem. The rate-distortion characteristics of the encoded video are used to find the frame rate and quantization parameters that provide the minimum distortion under rate constraints. The minimization is done iteratively, so that at each step the measured distortion is smaller than in the previous iteration. However, these methods do not consider the semantics of the video content either in GoP definition or in GoP target bitrate allocation.
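The difference between equal per-GoP targets and inter-GoP rate control can be sketched in a few lines of Python. The proportional rule below is an illustrative assumption, not the buffer-constrained optimization mentioned above, and the `complexities` values stand in for whatever per-GoP complexity measure an encoder might use.

```python
from typing import List, Sequence


def equal_allocation(total_bits: float, num_gops: int) -> List[float]:
    """Classical scheme: every GoP gets the same target."""
    if num_gops < 1:
        raise ValueError("need at least one GoP")
    return [total_bits / num_gops] * num_gops


def proportional_allocation(total_bits: float,
                            complexities: Sequence[float]) -> List[float]:
    """Illustrative inter-GoP scheme: split the budget in proportion to a
    per-GoP complexity score, so harder GoPs receive more bits."""
    total = sum(complexities)
    if total <= 0:
        raise ValueError("complexities must sum to a positive value")
    return [total_bits * c / total for c in complexities]
```

With a budget of 1000 bits and two GoPs whose complexity scores are 1 and 3, the equal scheme assigns 500 bits to each, while the proportional scheme assigns 250 and 750.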

As the computing power available at the encoders increases, so does the level of sophistication of the encoders and their associated control techniques. By using appropriate content analysis, it is now possible to define GoPs according to shot boundaries, and to allocate a target bitrate to each GoP based on the shot type, considering the “relevance” or “semantics” of each type of shot. Such a rate control scheme will be called “content-adaptive rate control.” In content-adaptive rate control, video is encoded according to a pre-specified or user-defined relevance-distortion policy. In effect, we accept a priori that some losses are going to occur due to the high compression ratios needed, and we force these losses to occur in the less relevant parts of the video content. We note that “relevance of the content” is highly context (domain) dependent. For example, in the context of a soccer game, the temporal video segments showing a goal event and the spatial segments around the ball are definitely more important than any other part of the video. There are a variety of other domains, such as other sports videos and broadcast news, where the relevance of the content can easily be classified. In content-adaptive video coding, the temporal segmentation policy used has a major effect on the overall efficiency and on the rate distribution among temporal segments. There exist techniques for automatically locating such content. A summary of the available multimedia access technologies that support Universal Multimedia Access (UMA) has been presented. Segmentation and summarization of audio-video content are discussed in detail, and transcoding techniques for such content are demonstrated.
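A relevance-distortion policy of the kind described above can be sketched as a table of weights per shot type. Everything specific here is hypothetical: the `Shot` structure, the shot type names, and the weights are chosen only to illustrate how a budget might be steered toward the more relevant shots.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Shot:
    start: int  # first frame of the shot (inclusive)
    end: int    # last frame of the shot (exclusive)
    kind: str   # shot type from content analysis, e.g. "goal"


def allocate_by_relevance(total_bits: float, shots: Sequence[Shot],
                          policy: Dict[str, float]) -> List[float]:
    """Split the budget over shot-aligned GoPs, weighting each shot by
    its length in frames times the relevance weight of its type."""
    weights = [(s.end - s.start) * policy[s.kind] for s in shots]
    total = sum(weights)
    if total <= 0:
        raise ValueError("policy and shots must yield a positive weight")
    return [total_bits * w / total for w in weights]


# Hypothetical policy for a soccer broadcast.
SOCCER_POLICY = {"goal": 3.0, "play": 1.0, "crowd": 0.5}
```

Under this example policy, a 100-frame goal shot carries the same total weight as a 300-frame shot of ordinary play, so each frame of the goal receives three times as many bits.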

Content-adaptive rate allocation ideas have been introduced in the literature before. The input video is segmented and encoded as two streams for different relevance levels with “predetermined bitrates,” namely the high target bitrate (highly relevant) and the low target bitrate (less relevant) streams. The less relevant shots are then encoded such that they are shown as still images at the receiving side, and the more important shots are encoded at full quality. In this pioneering work, the decision to restrict the number of relevance levels to two and the determination of the relative bit allocations are made in an ad hoc manner. Quality of Service (QoS) is required to guarantee continuous playback, and the low and high rates are determined by the client buffer size and the channel bandwidth. The required server buffer size is set afterwards, which effectively determines the pre-roll delay.

There are also techniques that divide the input video into segments by considering various statistics along these segments that affect the ease of coding, without taking any relevance issues into account. For example, MPEG-7 metadata are used for video transcoding for home networks. Concepts like “difficulty hints” and “motion hints” are described. Difficulty hints are a kind of metadata that denotes the encoding difficulty of the given content. The motion hints describe the motion un-compensability metadata, which contains information about the GoP structure, frame rate, and bitrate control, and also the search range metadata, which reduces the complexity of the transcoding process. In this work, the boundaries of the temporal segments of the content are determined by the points where the motion un-compensability metadata peaks, and the video is then transcoded using the difficulty hints. Here, the GoP size is varied according to the motion un-compensability metadata. A hybrid scaling algorithm using a quality metric based on the features of the human visual system has also been introduced; it tries to make full use of the communication channel by scaling video in either the temporal or the spatial dimension. In this work, the frame rate of the encoded video is reduced at scenes where motion jitter is low (high temporal resolution), and all the frames are kept for scenes with high motion at the expense of reduced spatial resolution.
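Placing segment boundaries at peaks of a motion un-compensability signal can be sketched as a simple local-maximum search. This is only an illustration of the idea: the per-frame `values` list, the `threshold` parameter, and the peak test are assumptions, and they do not reproduce the MPEG-7 descriptor or the cited transcoder.

```python
from typing import List, Sequence, Tuple


def find_peaks(values: Sequence[float], threshold: float) -> List[int]:
    """Return frame indices where the signal is a local maximum
    at or above `threshold`."""
    peaks = []
    for i in range(1, len(values) - 1):
        if (values[i] >= threshold
                and values[i] > values[i - 1]
                and values[i] >= values[i + 1]):
            peaks.append(i)
    return peaks


def split_segments(num_frames: int,
                   boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn boundary indices into (start, end) segments, end exclusive.
    Each segment would become one variable-length GoP."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [num_frames]
    return [(s, e) for s, e in zip(starts, ends) if e > s]
```

Because each peak starts a new segment, the GoP length follows the signal: frequent peaks give short GoPs, and long stretches without peaks give long ones.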
