<?xml version="1.0" standalone="no"?>
<!DOCTYPE appendix PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [

]>

<appendix id="vorbis-over-ogg">
<appendixinfo>
<releaseinfo>
$Id: a1-encapsulation_ogg.xml 7186 2004-07-20 07:19:25Z xiphmont $
</releaseinfo>
</appendixinfo>
<title>Embedding Vorbis into an Ogg stream</title>

<section>
<title>Overview</title>

<para>
This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.</para>

<para>
The <xref linkend="vorbis-spec-intro"/> provides an overview of the construction
of Vorbis audio packets.</para>

<para>
The <ulink url="oggstream.html">Ogg
bitstream overview</ulink> and <ulink url="framing.html">Ogg logical
bitstream and framing spec</ulink> provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named background
documents. Please read them first.</para>

<section><title>Restrictions</title>

<para>
The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

<itemizedlist>
<listitem><simpara>
A meta-headerless Ogg file encapsulates the Vorbis I packets
</simpara></listitem>
<listitem><simpara>
The Ogg stream may be chained, i.e. contain multiple, contiguous logical streams (links).
</simpara></listitem>
<listitem><simpara>
The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link)
</simpara></listitem>
</itemizedlist>
</para>

<para>
This does not mean that Vorbis cannot currently be multiplexed
with other media types into a multi-stream Ogg file. At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DivX video and Vorbis audio.
However, a 'Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream. A compliant 'Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).
</para>

</section>

<section><title>MIME type</title>

<para>
The correct MIME type of any Ogg file is <literal>application/ogg</literal>.
However, if a file is a Vorbis I audio file (which implies a
degenerate Ogg stream including only unmultiplexed Vorbis audio), the
MIME type <literal>audio/x-vorbis</literal> is also allowed.</para>

</section>

</section>

<section>
<title>Encapsulation</title>

<para>
Ogg encapsulation of a Vorbis packet stream is straightforward.</para>

<itemizedlist>

<listitem><simpara>
The first Vorbis packet (the identification header), which
uniquely identifies a stream as Vorbis audio, is placed alone in the
first page of the logical Ogg stream. This results in a first Ogg
page of exactly 58 bytes at the very beginning of the logical stream.
</simpara></listitem>

<listitem><simpara>
This first page is marked 'beginning of stream' in the page flags.
</simpara></listitem>

<listitem><simpara>
The second and third Vorbis packets (comment and setup
headers) may span one or more pages beginning on the second page of
the logical stream. However many pages they span, the third header
packet finishes the page on which it ends. The next (first audio) packet
must begin on a fresh page.
</simpara></listitem>

<listitem><simpara>
The granule position of these first pages containing only headers is zero.
</simpara></listitem>

<listitem><simpara>
The first audio packet of the logical stream begins a fresh Ogg page.
</simpara></listitem>

<listitem><simpara>
Packets are placed into Ogg pages in order until the end of stream.
</simpara></listitem>

<listitem><simpara>
The last page is marked 'end of stream' in the page flags.
</simpara></listitem>

<listitem><simpara>
Vorbis packets may span page boundaries.
</simpara></listitem>

<listitem><simpara>
The granule position of pages containing Vorbis audio is in units
of PCM audio samples (per channel; a stereo stream's granule position
does not increment at twice the speed of a mono stream).
</simpara></listitem>

<listitem><simpara>
The granule position of a page represents the end PCM sample
position of the last packet <emphasis>completed</emphasis> on that page.
A page that is entirely spanned by a single packet (that completes on a
subsequent page) has no granule position, and the granule position is
set to '-1'.
</simpara></listitem>

<listitem>
<simpara>
The granule (PCM) position of the first page need not indicate
that the stream started at position zero.
Although the granule
position belongs to the last completed packet on the page and a
valid granule position must be positive, by
inference it may indicate that the PCM position of the beginning
of audio is positive or negative.
</simpara>

<itemizedlist>
<listitem><simpara>
A positive starting value simply indicates that this stream begins at
some positive time offset, potentially within a larger
program. This is a common case when connecting to the middle
of a broadcast stream.
</simpara></listitem>
<listitem><simpara>
A negative value indicates that
output samples preceding time zero should be discarded during
decoding; this technique is used to allow sample-granularity
editing of the stream start time of already-encoded Vorbis
streams. The number of samples to be discarded must not exceed
the overlap-add span of the first two audio packets.
</simpara></listitem>
</itemizedlist>

<simpara>
In both of these cases in which the initial audio PCM starting
offset is nonzero, the second finished audio packet must flush the
page on which it appears and the third packet must begin a fresh page.
This allows the decoder to always be able to perform PCM position
adjustments before needing to return any PCM data from synthesis,
resulting in correct positioning information without any additional
seeking logic.
</simpara>

<note><simpara>
Failure to do so should, at worst, cause a
decoder implementation to return incorrect positioning information
for seeking operations at the very beginning of the stream.
</simpara></note>
</listitem>

<listitem><simpara>
A granule position on the final page in a stream that indicates
less audio data than the final packet would normally return is used to
end the stream at a point other than a frame boundary. The difference
between the actual available data returned and the declared amount
indicates how many trailing samples to discard from the decoding
process.
</simpara></listitem>
</itemizedlist>

</section>

</appendix>

<!-- end appendix on Vorbis encapsulation in Ogg -->
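<!--
Illustrative sketch, not part of the specification. The granule position
arithmetic described in the last list items of the Encapsulation section
(inferring the stream start position and trimming leading or trailing
samples) can be sketched as below. This is a minimal sketch that assumes
the caller already knows how many PCM samples per channel the packets
completed on a page decode to. All function and parameter names are
hypothetical; they are not part of libogg or libvorbis.

```python
def stream_start_position(first_page_granule, samples_decoded):
    """Infer the PCM position of the first decoded sample.

    first_page_granule: granule position of the first audio page, i.e. the
        end PCM position of the last packet completed on that page.
    samples_decoded: samples per channel produced by the packets completed
        on that page (assumed to be known to the caller).
    """
    return first_page_granule - samples_decoded


def leading_discard(first_page_granule, samples_decoded):
    """Number of output samples that precede time zero and must be dropped."""
    start = stream_start_position(first_page_granule, samples_decoded)
    # A negative start means that many samples lie before position zero.
    return max(0, -start)


def trailing_discard(previous_granule, final_granule, samples_decoded):
    """Number of trailing samples to drop at the end of the stream.

    previous_granule: last valid granule position before the final page.
    final_granule: granule position declared on the final page.
    samples_decoded: samples per channel produced by the packets completed
        on the final page (assumed to be known to the caller).
    """
    decoded_end = previous_granule + samples_decoded
    # The final page may declare less audio than decoding would return.
    return max(0, decoded_end - final_granule)
```

For example, under these assumptions a first audio page with granule
position 1000 whose packets decode to 1024 samples implies a start
position of minus 24, so 24 leading samples would be discarded.
-->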