<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
<title>Ogg Vorbis Documentation</title>

<style type="text/css">
body {
  margin: 0 18px 0 18px;
  padding-bottom: 30px;
  font-family: Verdana, Arial, Helvetica, sans-serif;
  color: #333333;
  font-size: .8em;
}

a {
  color: #3366cc;
}

img {
  border: 0;
}

#xiphlogo {
  margin: 30px 0 16px 0;
}

#content p {
  line-height: 1.4;
}

h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {
  font-weight: bold;
  color: #ff9900;
  margin: 1.3em 0 8px 0;
}

h1 {
  font-size: 1.3em;
}

h2 {
  font-size: 1.2em;
}

h3 {
  font-size: 1.1em;
}

li {
  line-height: 1.4;
}

#copyright {
  margin-top: 30px;
  line-height: 1.5em;
  text-align: center;
  font-size: .8em;
  color: #888888;
  clear: both;
}
</style>

</head>

<body>

<div id="xiphlogo">
  <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
</div>

<h1>Ogg Vorbis stereo-specific channel coupling discussion</h1>

<h2>Abstract</h2>

<p>The Vorbis audio CODEC provides channel coupling
mechanisms designed to reduce effective bitrate both by eliminating
interchannel redundancy and by eliminating stereo image information
labeled inaudible or undesirable according to spatial psychoacoustic
models. This document describes both the mechanical coupling
mechanisms available within the Vorbis specification and the
specific stereo coupling models used by the reference
<tt>libvorbis</tt> codec provided by xiph.org.</p>

<h2>Mechanisms</h2>

<p>In encoder release beta 4 and earlier, Vorbis supported multiple
channel encoding, but the channels were encoded entirely separately
with no cross-analysis or redundancy elimination between channels.
This multichannel strategy is very similar to mp3's <em>dual
stereo</em> mode, and Vorbis uses the same name for its analogous
uncoupled multichannel modes.</p>

<p>However, the Vorbis spec provides for, and Vorbis release 1.0 rc1 and
later implement, a coupled channel strategy. Vorbis has two specific
mechanisms that may be used alone or in conjunction to implement
channel coupling. The first is <em>channel interleaving</em> via
residue backend type 2, and the second is <em>square polar
mapping</em>. These two general mechanisms are particularly well
suited to coupling due to the structure of Vorbis encoding, as we'll
explore below. Using both, we can implement totally
<em>lossless stereo image coupling</em> [bit-for-bit decode-identical
to uncoupled modes], as well as various lossy models that seek to
eliminate inaudible or unimportant aspects of the stereo image in
order to reduce bitrate. The exact coupling implementation is
generalized to allow the encoder a great deal of flexibility in
implementing a stereo or surround model without requiring any
significant complexity increase over the combinatorially simpler
mid/side joint stereo of mp3 and other current audio codecs.</p>

<p>A particular Vorbis bitstream may apply channel coupling directly to
more than a pair of channels; polar mapping is hierarchical, so
polar coupling may be extended to an arbitrary number of channels
and is not restricted to stereo, quadraphonics, ambisonics or 5.1
surround. However, this document restricts itself to the
stereo coupling case.</p>

<h3>Square Polar Mapping</h3>

<h4>maximal correlation</h4>

<p>Recall that, in the basic structure of a Vorbis I stream, the encoder
first generates from the input audio a spectral 'floor' function that
serves as an MDCT-domain whitening filter. This floor is meant to
represent the rough envelope of the frequency spectrum, using whatever
metric the encoder cares to define. This floor is subtracted from the
log frequency spectrum, effectively normalizing the spectrum by
frequency. Each input channel is associated with a unique floor
function.</p>

<p>The basic idea behind any stereo coupling is that the left and right
channels usually correlate. This correlation is even stronger if one
first accounts for energy differences in any given frequency band
across left and right; think for example of individual instruments
mixed into different portions of the stereo image, or a stereo
recording with a dominant feature not perfectly in the center. The
floor functions, each specific to a channel, provide the perfect means
of normalizing left and right energies across the spectrum to maximize
correlation before coupling. This feature of the Vorbis format is not
a convenient accident.</p>
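
<p>A minimal numerical sketch of this normalization, in C: a source panned toward the left appears in both channels with different gains, and dividing each channel by its own floor (the linear-domain equivalent of subtracting the floor from the log spectrum) leaves two identical residues. The sample values, the idealized floors that exactly match the channel gains, and the helper <tt>normalize_by_floor()</tt> are assumptions for illustration, not libvorbis code.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: divide one channel's spectrum by its floor, bin by
   bin. Dividing in the linear domain is equivalent to subtracting the floor
   from the log spectrum. */
static void normalize_by_floor(const double *spectrum,
                               const double *floor_curve,
                               double *residue, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(floor_curve[i] != 0.0);
        residue[i] = spectrum[i] / floor_curve[i];
    }
}
```

<p>With a left gain of 2.0 and a right gain of 0.5, both normalized residues come out equal to the original source values, which is the maximally correlated case the coupling stage is designed to exploit.</p>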

<p>Because we strive to maximally correlate the left and right channels
and generally succeed in doing so, left and right residue is typically
nearly identical. We could use channel interleaving (discussed below)
alone to efficiently remove the redundancy between the left and right
channels as a side effect of entropy encoding, but a polar
representation gives additional benefits when left/right correlation is
strong.</p>

<h4>point and diffuse imaging</h4>

<p>The first advantage of a polar representation is that it effectively
separates the spatial audio information into a 'point image'
(magnitude) at a given frequency and located somewhere in the sound
field, and a 'diffuse image' (angle) that fills a large amount of
space simultaneously. Even if we preserve only the magnitude (point)
data, a detailed and carefully chosen floor function in each channel
provides us with a free, fine-grained, frequency-relative intensity
stereo*. Angle information represents diffuse sound fields, such as
reverberation that fills the entire space simultaneously.</p>

<p>*<em>Because the Vorbis model supports a number of different possible
stereo models and these models may be mixed, we do not use the term
'intensity stereo' when talking about Vorbis; instead we use the terms
'point stereo', 'phase stereo' and subcategories of each.</em></p>

<p>The majority of a stereo image is representable by polar magnitude
alone, as strong sounds tend to be produced at near-point sources;
even non-diffuse, fast, sharp echoes track very accurately using
magnitude representation almost alone (for those experimenting with
Vorbis tuning, this strategy works much better with the precise,
piecewise control of floor 1; the continuous approximation of floor 0
results in unstable imaging). Reverberation and diffuse sounds tend
to contain less energy and be psychoacoustically dominated by the
point sources embedded in them. Thus, we again tend to concentrate
more represented energy into a predictably smaller set of values.
Separating representation of point and diffuse imaging also allows us
to model and manipulate point and diffuse qualities separately.</p>

<h4>controlling bit leakage and symbol crosstalk</h4>

<p>Because polar representation concentrates represented energy into
fewer large values, we reduce bit 'leakage' during cascading
(multistage VQ encoding) as a secondary benefit. A single large,
monolithic VQ codebook is more efficient than a cascaded book due to
entropy 'crosstalk' among symbols between different stages of a
multistage cascade. Polar representation is a way of further
concentrating entropy into predictable locations so that codebook
design can take steps to improve multistage codebook efficiency. It
also allows us to cascade various elements of the stereo image
independently.</p>

<h4>eliminating trigonometry and rounding</h4>

<p>Rounding and computational complexity are potential problems with a
polar representation. As our encoding process involves quantization,
mixing a polar representation and quantization makes it potentially
impossible, depending on implementation, to construct a coupled stereo
mechanism that results in bit-identical decompressed output compared
to an uncoupled encoding should the encoder desire it.</p>

<p>Vorbis uses a mapping that preserves the most useful qualities of
polar representation, relies only on addition/subtraction (during
decode; high quality encoding still requires some trig), and makes it
trivial before or after quantization to represent an angle/magnitude
through a one-to-one mapping from possible left/right value
permutations. We do this by basing our polar representation on the
unit square rather than the unit circle.</p>

<p>Given a magnitude and angle, we recover left and right using the
following function (note that A/B may be left/right or right/left
depending on the coupling definition used by the encoder):</p>

<pre>
      /* inverse square polar mapping: (magnitude, angle) to (A, B) */
      if(magnitude>0){
        if(angle>0){
          A=magnitude;
          B=magnitude-angle;
        }else{
          B=magnitude;
          A=magnitude+angle;
        }
      }else{
        if(angle>0){
          A=magnitude;
          B=magnitude+angle;
        }else{
          B=magnitude;
          A=magnitude-angle;
        }
      }
</pre>

<p>The function is antisymmetric for positive and negative magnitudes in
order to eliminate a redundant value when quantizing. For example, if
we're quantizing to integer values, we can visualize a magnitude of 5
and an angle of -2 (which decodes to A=3, B=5) as follows:</p>

<p><img src="squarepolar.png" alt="square polar"/></p>

<p>This representation loses or replicates no values; if A and B each
range over the integers -5 through 5, the number of possible Cartesian
permutations is 11 &times; 11 = 121. Represented in square polar
notation, the possible (magnitude, angle) values are:</p>

<pre>
 0, 0

-1,-2  -1,-1  -1, 0  -1, 1

 1,-2   1,-1   1, 0   1, 1

-2,-4  -2,-3  -2,-2  -2,-1  -2, 0  -2, 1  -2, 2  -2, 3

 2,-4   2,-3   ... following the pattern ...

 ...   5, 1   5, 2   5, 3   5, 4   5, 5   5, 6   5, 7   5, 8   5, 9
</pre>

<p>...for a grand total of 121 possible values, the same number as in
Cartesian representation: each nonzero magnitude <em>m</em> pairs with
the 4|<em>m</em>| angles from -2|<em>m</em>| through 2|<em>m</em>|-1,
and 0,0 stands alone, so the total is 1 + 2(4 + 8 + 12 + 16 + 20) = 121.
Note that, for example, <tt>5,-10</tt> is the same as <tt>-5,10</tt>,
so there's no reason to represent both, and <tt>2,10</tt> cannot
happen, so there's no reason to account for it. It's also obvious that
this mapping is exactly reversible.</p>
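
<p>The reversibility claim can be checked mechanically. The C sketch below pairs the inverse mapping given earlier with a forward mapping and confirms, for A and B in -5 through 5, that every pair survives a round trip and that exactly 121 distinct (magnitude, angle) pairs occur. The forward rule (the component with the larger absolute value becomes the magnitude, ties go to B, and the angle is A-B for a positive magnitude and B-A otherwise) is an assumption chosen to reproduce the table above, not code quoted from libvorbis, and all function names are hypothetical.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed forward mapping (A, B) -> (magnitude, angle): the component with
   the larger absolute value becomes the magnitude; ties go to B. */
static void square_polar_encode(int a, int b, int *mag, int *ang)
{
    *mag = (abs(a) > abs(b)) ? a : b;
    *ang = (*mag > 0) ? a - b : b - a;
}

/* Inverse mapping, transcribed from the decode function above. */
static void square_polar_decode(int mag, int ang, int *a, int *b)
{
    if (mag > 0) {
        if (ang > 0) { *a = mag; *b = mag - ang; }
        else         { *b = mag; *a = mag + ang; }
    } else {
        if (ang > 0) { *a = mag; *b = mag + ang; }
        else         { *b = mag; *a = mag - ang; }
    }
}

/* Encode every (A, B) with -range <= A, B <= range, check that each pair
   decodes back exactly, and return the number of distinct (mag, ang) pairs. */
static int count_distinct_pairs(int range)
{
    int side = 2 * range + 1;   /* possible magnitudes: -range..range */
    int aside = 4 * range + 1;  /* possible angles: -2*range..2*range */
    char *seen = calloc((size_t)side * (size_t)aside, 1);
    int distinct = 0;
    assert(seen != NULL);
    for (int a = -range; a <= range; a++) {
        for (int b = -range; b <= range; b++) {
            int mag, ang, a2, b2;
            square_polar_encode(a, b, &mag, &ang);
            square_polar_decode(mag, ang, &a2, &b2);
            assert(a2 == a && b2 == b);      /* exact round trip */
            assert(abs(ang) <= 2 * range);   /* angle bounded by 2*|mag| */
            char *slot = &seen[(mag + range) * aside + (ang + 2 * range)];
            if (!*slot) {
                *slot = 1;
                distinct++;
            }
        }
    }
    free(seen);
    return distinct;
}
```

<p>Calling <tt>count_distinct_pairs(5)</tt> returns 121, and the exact round trip holds for every pair, which is what makes lossless coupling possible with integer addition and subtraction alone.</p>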

<h3>Channel interleaving</h3>

<p>We can remap an A/B vector using polar mapping into a magnitude/angle
vector, and it's clear that, in general, this concentrates energy in
the magnitude vector and reduces the amount of information to encode
in the angle vector. Encoding these vectors independently with
residue backend #0 or residue backend #1 will result in bitrate
savings. However, there are still implicit correlations between the
magnitude and angle vectors. The most obvious is that the amplitude
of the angle is bounded by its corresponding magnitude value (in the
integer mapping above, |angle| never exceeds 2|magnitude|).</p>
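
<p>A small sketch of the energy-concentration argument: the C code below maps a pair of nearly identical integer residue vectors into square polar form and compares the energy (sum of squares) in the magnitude and angle vectors. The sample vectors are made up, and the forward rule is an assumption (the component with the larger absolute value becomes the magnitude, ties go to B, and the angle is A-B for a positive magnitude and B-A otherwise, which inverts the decode function given earlier); the helper names are hypothetical.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed forward square polar mapping: larger |value| becomes the
   magnitude (ties go to B); angle is A-B for positive magnitude, else B-A. */
static void square_polar_encode(int a, int b, int *mag, int *ang)
{
    *mag = (abs(a) > abs(b)) ? a : b;
    *ang = (*mag > 0) ? a - b : b - a;
}

/* Map n coupled residue pairs into magnitude/angle form and report the
   energy (sum of squares) that lands in each vector. */
static void polar_energy(const int *left, const int *right, size_t n,
                         long *mag_energy, long *ang_energy)
{
    assert(left != NULL && right != NULL);
    *mag_energy = 0;
    *ang_energy = 0;
    for (size_t i = 0; i < n; i++) {
        int mag, ang;
        square_polar_encode(left[i], right[i], &mag, &ang);
        *mag_energy += (long)mag * mag;
        *ang_energy += (long)ang * ang;
    }
}
```

<p>For the left vector {5, -3, 2, 0, 7} and the right vector {5, -2, 2, 1, 6}, the magnitude vector carries an energy of 88 and the angle vector only 3, so nearly all of the information to be coded sits in the magnitudes.</p>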

<p>Entropy coding the results, then, further benefits from the entropy
model being able to compress magnitude and angle simultaneously. For
this reason, Vorbis implements residue backend #2, which pre-interleaves
a number of input vectors (in the stereo case, two, A and B) into a
single output vector (with the elements in the order of
A_0, B_0, A_1, B_1, A_2 ... A_n-1, B_n-1) before entropy encoding. Thus
each vector to be coded by the vector quantization backend consists of
matching magnitude and angle values.</p>
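
<p>The interleaving step can be sketched in a few lines of C. This illustrates the element order described above (A_0, B_0, A_1, B_1, ...), generalized to any number of channels; the function names and the flat integer buffers are assumptions, not the libvorbis residue API.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Interleave `channels` input vectors of length n into one output vector of
   length channels * n, ordered ch0[0], ch1[0], ..., ch0[1], ch1[1], ...
   (for stereo: A_0, B_0, A_1, B_1, ..., A_n-1, B_n-1). */
static void interleave_channels(const int *const *in, size_t channels,
                                size_t n, int *out)
{
    assert(channels > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t c = 0; c < channels; c++)
            out[i * channels + c] = in[c][i];
}

/* Inverse: split an interleaved vector back into per-channel vectors. */
static void deinterleave_channels(const int *in, size_t channels,
                                  size_t n, int *const *out)
{
    assert(channels > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t c = 0; c < channels; c++)
            out[c][i] = in[i * channels + c];
}
```

<p>For stereo, passing the magnitude and angle vectors as the two inputs yields exactly the A_0, B_0, A_1, B_1, ... order, so consecutive elements of each vector handed to the VQ backend are matching magnitude/angle pairs.</p>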

<p>The astute reader, at this point, will notice that in the theoretical
case in which we can use monolithic codebooks of arbitrarily large
size, we can directly interleave and encode left and right without
polar mapping; the polar mapping does not appear to lend any
benefit whatsoever to the efficiency of the entropy coding. In fact,
it is perfectly possible and reasonable to build a Vorbis encoder that
dispenses with polar mapping entirely and merely interleaves the
channels. Libvorbis-based encoders may configure such an encoding and
it will work as intended.</p>

<p>However, when we leave the ideal/theoretical domain, we notice that
polar mapping does give additional practical benefits, as discussed in
the above section on polar mapping and summarized again here:</p>

<ul>
<li>Polar mapping aids in controlling entropy 'leakage' between stages
of a cascaded codebook.</li>
<li>Polar mapping separates the stereo image
into point and diffuse components which may be analyzed and handled
differently.</li>
</ul>

<h2>Stereo Models</h2>

<h3>Dual Stereo</h3>

<p>Dual stereo refers to stereo encoding where the channels are entirely
separate; they are analyzed and encoded as entirely distinct entities.
This terminology is familiar from mp3.</p>

<h3>Lossless Stereo</h3>

<p>Using polar mapping and/or channel interleaving, it's possible to
couple Vorbis channels losslessly, that is, to construct a stereo
coupling encoding that both saves space and decodes
bit-identically to dual stereo. OggEnc 1.0 and later use this
mode in all high-bitrate encoding.</p>

<p>Overall, this stereo mode is overkill; however, it offers a safe
alternative to users concerned about the slightest possible
degradation to the stereo image or wanting archival-quality audio.</p>

<h3>Phase Stereo</h3>

<p>Phase stereo is the least aggressive means of gracefully dropping
resolution from the stereo image; it affects only diffuse imaging.</p>

<p>It's often quoted that the human ear is deaf to signal phase above
about 4kHz; this is nearly true and a passable rule of thumb, but it
can be demonstrated that even an average user can tell the difference
between high frequency in-phase and out-of-phase noise. Obviously
then, the statement is not entirely true. However, it's also the case
that one must resort to nearly such an extreme demonstration before
finding the counterexample.</p>

<p>'Phase stereo' is simply a more aggressive quantization of the polar
angle vector; above 4kHz it's generally quite safe to quantize noise
and noisy elements to only a handful of allowed phases, or to thin the
phase with respect to the magnitude. The phases of high amplitude
pure tones may or may not be preserved more carefully (they are
relatively rare and L/R tend to be in phase, so there is generally
little reason not to spend a few more bits on them).</p>

<h4>example: eight phase stereo</h4>

<p>Vorbis may implement phase stereo coupling by preserving the entirety
of the magnitude vector (essential to fine amplitude and energy
resolution overall) and quantizing the angle vector to one of only
four possible values. Given that the magnitude vector may be positive
or negative, this results in left and right phase having eight
possible permutations, thus 'eight phase stereo':</p>

<p><img src="eightphase.png" alt="eight phase"/></p>

<p>Left and right may be in phase (positive or negative), the most common
case by far, or out of phase by 90 or 180 degrees.</p>
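
<p>One way to picture eight phase stereo in code: keep the magnitude untouched and snap each angle to one of four values. In the integer square polar mapping described earlier, with k = |magnitude|, the angles 0, k, -k and -2k decode to (A, B) proportional to (1, 1), (1, 0), (0, 1) and (-1, 1) for a positive magnitude, and to the negated patterns for a negative one, giving eight phase states. That choice of four angle values, the nearest-value rounding, and the wrap-around rule for large positive angles are assumptions of this sketch rather than a description of any released encoder; the function names are hypothetical.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Inverse square polar mapping (same as the decode function above). */
static void square_polar_decode(int mag, int ang, int *a, int *b)
{
    if (mag > 0) {
        if (ang > 0) { *a = mag; *b = mag - ang; }
        else         { *b = mag; *a = mag + ang; }
    } else {
        if (ang > 0) { *a = mag; *b = mag + ang; }
        else         { *b = mag; *a = mag - ang; }
    }
}

/* Hypothetical eight phase quantizer: keep the magnitude and snap the angle
   to the nearest of {-2k, -k, 0, k}, where k = |mag|. Angles above 1.5k lie
   next to the (k, -k) or (-k, k) corner, which this mapping stores with the
   opposite magnitude sign and angle -2k, so they wrap there. */
static void eight_phase_quantize(int *mag, int *ang)
{
    assert(mag != NULL && ang != NULL);
    int k = abs(*mag);
    int twice = 2 * *ang;   /* compare 2*angle to avoid half-step fractions */
    if (twice > 3 * k) {
        *mag = -*mag;
        *ang = -2 * k;
    } else if (twice > k) {
        *ang = k;
    } else if (twice >= -k) {
        *ang = 0;
    } else if (twice >= -3 * k) {
        *ang = -k;
    } else {
        *ang = -2 * k;
    }
}
```

<p>For example, magnitude 5 with angle -2 (A=3, B=5) snaps to angle 0 and decodes in phase as A=B=5, while magnitude 5 with angle 9 (A=5, B=-4) wraps to magnitude -5, angle -10, which decodes to A=5, B=-5, fully out of phase.</p>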

<h4>example: four phase stereo</h4>

<p>Similarly, four phase stereo takes the quantization one step further;
it allows only in-phase and 180 degree out-of-phase signals:</p>

<p><img src="fourphase.png" alt="four phase"/></p>
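
<p>Four phase stereo can be sketched directly on the integer square polar mapping described earlier: keep the magnitude and allow only angle 0 (A and B in phase) or angle -2k (A and B 180 degrees apart), where k = |magnitude|. The midpoint thresholds, the wrap-around for large positive angles, and the function name are assumptions for illustration, not a description of any released encoder.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical four phase quantizer: keep the magnitude and allow only
   angle 0 (in phase) or angle -2k (180 degrees apart), where k = |mag|.
   Angles above k sit nearer the (k, -k) or (-k, k) corner, which the square
   polar mapping stores with the opposite magnitude sign and angle -2k. */
static void four_phase_quantize(int *mag, int *ang)
{
    assert(mag != NULL && ang != NULL);
    int k = abs(*mag);
    if (*ang > k) {
        *mag = -*mag;       /* wrap to the opposite-sign representation */
        *ang = -2 * k;
    } else if (*ang >= -k) {
        *ang = 0;           /* nearer A == B */
    } else {
        *ang = -2 * k;      /* nearer A == -B */
    }
}
```

<p>For example, magnitude 5 with angle -7 (A=-2, B=5) becomes angle -10, which the decode function above turns into A=-5, B=5, while magnitude 5 with angle -2 (A=3, B=5) snaps to angle 0 and decodes as A=B=5.</p>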

<h3>Point Stereo</h3>

<p>Point stereo eliminates the possibility of out-of-phase signal
entirely. Any diffuse quality to a sound source tends to collapse
inward to a point somewhere within the stereo image. A practical
example would be balanced reverberations within a large, live space;
normally the sound is diffuse and soft, giving a sonic impression of
volume. In point stereo, the reverberations would still exist, but
sound fairly firmly centered within the image (assuming the
reverberation was centered overall; if the reverberation is stronger
to the left, then the point of localization in point stereo would be
to the left). This effect is most noticeable at low and mid
frequencies and using headphones (which grant perfect stereo
separation). Point stereo is a graceful but generally easily detected
degradation to the sound quality and is thus used in frequency
ranges where it is least noticeable.</p>
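
<p>In square polar terms, point stereo amounts to discarding the angle vector: with every angle set to 0, the decode function gives A = B = magnitude, so the decoded channels differ only by their own floors, which is the frequency-relative intensity stereo described earlier. The C sketch below illustrates this, assuming the floor is applied as a positive per-bin linear gain; the floor values and the helper name are made up for illustration.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical point stereo reconstruction. The angle vector is treated as
   all zeros, so the square polar decode gives A = B = magnitude in every
   bin; the only remaining difference between the channels is each
   channel's floor, applied here as a linear gain. */
static void point_stereo_decode(const double *mag, const double *floor_a,
                                const double *floor_b, size_t n,
                                double *out_a, double *out_b)
{
    assert(mag != NULL && out_a != NULL && out_b != NULL);
    for (size_t i = 0; i < n; i++) {
        /* angle == 0: both branches of the decode function give A = B = mag */
        out_a[i] = floor_a[i] * mag[i];
        out_b[i] = floor_b[i] * mag[i];
    }
}
```

<p>With positive floors the two outputs always share a sign in each bin, so no out-of-phase content can survive; the floors alone decide where in the image each bin lands.</p>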

<h3>Mixed Stereo</h3>

<p>Mixed stereo is the simultaneous use of more than one of the above
stereo encoding models, generally using more aggressive modes in
higher frequencies, lower amplitudes or 'nearly' in-phase sound.</p>

<p>It is also the case that near-DC frequencies should be encoded using
lossless coupling to avoid frame blocking artifacts.</p>
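
<p>A mixed stereo encoder has to choose a coupling model per frequency range. The C sketch below shows the shape of such a decision, following the ordering described above: lossless coupling at low frequencies (including near DC), phase stereo in a middle band, and point stereo at the top. The band edges are parameters supplied by the caller; the enum, the function name and any particular cutoff values are hypothetical and are not libvorbis tuning.</p>

```c
#include <assert.h>

/* Hypothetical per-band coupling models for a mixed stereo encoder. */
enum coupling_model {
    COUPLE_LOSSLESS,  /* exact square polar coupling */
    COUPLE_PHASE,     /* quantized angle, e.g. eight or four phase */
    COUPLE_POINT      /* angle discarded entirely */
};

/* Pick a model for a spectral bin centred at freq_hz. More aggressive
   models are used at higher frequencies; everything below
   lossless_below_hz, including near-DC bins, stays lossless. */
static enum coupling_model choose_coupling(double freq_hz,
                                           double lossless_below_hz,
                                           double point_above_hz)
{
    assert(lossless_below_hz <= point_above_hz);
    if (freq_hz < lossless_below_hz)
        return COUPLE_LOSSLESS;
    if (freq_hz < point_above_hz)
        return COUPLE_PHASE;
    return COUPLE_POINT;
}
```

<p>Setting both cutoffs to the same value removes the phase band entirely, leaving a mix built only from lossless and point stereo.</p>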

<h3>Vorbis Stereo Modes</h3>

<p>Vorbis, as of 1.0, uses lossless stereo and a number of mixed modes
constructed out of lossless and point stereo. Phase stereo was used
in the rc2 encoder, but is not currently used for simplicity's sake. It
will likely be re-added to the stereo model in the future.</p>

<div id="copyright">
  The Xiph Fish Logo is a
  trademark (&trade;) of Xiph.Org.<br/>

  These pages &copy; 1994 - 2005 Xiph.Org. All rights reserved.
</div>

</body>
</html>