Several theories describe the effects of unequal bit allocation between the left and right video sequences, such as fusion theory and suppression theory. According to fusion theory, the stereo bitrate (and hence the distortion) must be allocated equally between the views for the best perceived quality. According to suppression theory, by contrast, the higher-quality view of a stereo video determines the overall perceived quality. The target (right) sequence can therefore be compressed as much as possible to save bits for the reference (left) sequence, so that the overall perceived distortion is minimized. The proposed content-adaptive stereo encoder (CA-SC) is motivated by suppression theory: it adaptively reduces the frame (temporal) rate and spatial resolution of the target (right) sequence according to its content-based features.
Figure 1.0: Stereoscopic encoder
The principle behind content-adaptive video coding is to parse the video into temporal segments. Each temporal segment can be encoded at a different spatial, temporal, and SNR resolution (and hence at a different target bitrate), depending on its low-level and/or high-level content-based features. Although this approach has been used for monoscopic video encoding, no such studies exist in the literature for content-adaptive stereoscopic coding. The proposed CA-SC codec is an extension of the stereo codec (SC), which is based on H.264/AVC. We note that CA-SC could also be developed as an extension of the recently standardized MVC codec. The codec structure is shown in Figure 1.0.
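As an illustration of the per-segment idea, the following minimal sketch maps content features of a temporal segment to scaling factors. The feature names (`motion`, `detail`), the thresholds, and the decision rule itself are hypothetical assumptions made for this example; the text does not specify how CA-SC derives its decisions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SegmentFeatures:
    """Hypothetical content features of one temporal segment, in [0, 1]."""
    motion: float  # assumed measure of motion activity
    detail: float  # assumed measure of spatial detail


@dataclass(frozen=True)
class CodingParams:
    """Scaling factors applied to a segment (1 means no reduction)."""
    spatial_factor: int   # 2 = resolution halved in each dimension
    temporal_factor: int  # 2 = every other frame is kept


def choose_params(features: SegmentFeatures,
                  motion_thr: float = 0.5,
                  detail_thr: float = 0.5) -> CodingParams:
    """Assumed rule: reduce frame rate when motion is low, and reduce
    spatial resolution when spatial detail is low."""
    temporal = 2 if features.motion < motion_thr else 1
    spatial = 2 if features.detail < detail_thr else 1
    return CodingParams(spatial_factor=spatial, temporal_factor=temporal)


def plan_segments(segments: List[SegmentFeatures]) -> List[CodingParams]:
    """Choose scaling factors independently for each temporal segment."""
    return [choose_params(f) for f in segments]
```

With these assumed thresholds, a segment with little motion but fine texture would keep its full resolution at a halved frame rate, while a fast-moving, smooth segment would keep its frame rate at a reduced resolution.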
In the compatible mode of stereoscopic coding, any standard H.264/AVC decoder can decode the sequence as a monoscopic sequence, since the left channel is coded independently of the right channel. To improve coding efficiency without significant loss of perceptual quality, we added three encoder modes that down-sample the right view only: the spatial, temporal, and content-adaptive scaling modes.
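The down-sampling applied to the right view can be sketched as follows. This is a simplified illustration, not the encoder's actual filter: frames are assumed to be 2-D lists of luma samples, spatial reduction is done by plain block averaging, temporal reduction by frame dropping, and all function names are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # assumed representation: rows of luma samples


def spatial_downsample(frame: Frame, factor: int = 2) -> Frame:
    """Average non-overlapping factor x factor blocks.
    Rows/columns that do not fill a whole block are cropped."""
    height = len(frame) - len(frame) % factor
    width = len(frame[0]) - len(frame[0]) % factor
    out: Frame = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [frame[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def temporal_downsample(frames: List[Frame], factor: int = 2) -> List[Frame]:
    """Keep every factor-th frame, starting with the first."""
    return frames[::factor]


def scale_segment(frames: List[Frame],
                  spatial_factor: int = 1,
                  temporal_factor: int = 1) -> List[Frame]:
    """Apply temporal, then spatial, reduction to one right-view segment."""
    kept = temporal_downsample(frames, temporal_factor)
    if spatial_factor == 1:
        return kept
    return [spatial_downsample(f, spatial_factor) for f in kept]
```

Under these assumptions, the spatial mode corresponds to `scale_segment(frames, 2, 1)` and the temporal mode to `scale_segment(frames, 1, 2)` for the whole sequence, while the content-adaptive mode would call `scale_segment` with factors chosen separately for each temporal segment. The left view is left untouched in all three cases.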