You are currently browsing the monthly archive for March, 2008.

get the encoding charts

Growing concern for the quality of my own library of recordings prompted me to take the opportunity provided by a class assignment to research the properties of popular audio encoders and compressors. Being by nature short of cash, I decided to focus on free software, unencumbered (more or less) by patents and readily available over the internet. I undertook to compare program output with nothing but my own two ears, a little peace and quiet, a set of average computer speakers and a pair of Beyerdynamic DT 770 studio headphones. The results are published below, following a short introduction to present-day audio encoding methods and to the actual encoders used.

The MP3 (MPEG-1, Layer 3) standard, developed from various technological advances at the German University of Erlangen and the Fraunhofer Institute,[1] became quite a profitable enterprise in the 1990s. Most audio/video encoding and compression technology presently in use either consists of or is built on the concepts first embodied in the MP3 standard. That being the case, I thought it best to begin with a brief review of MP3 audio encoding.

Audio data, typically recorded in a PCM (pulse code modulation) format, takes up a good deal of digital space. CD-quality audio is recorded at a sample rate of 44.1 kHz and a bit depth of 16 bits, so the math is simple. 44,100 samples a second at 16 bits a sample is 705,600 bits a second per channel, or 1,411,200 bits a second for stereo; in other words, about 10 MB for a minute of data. Hence, the average CD, which is designed to hold 700-800 MB of audio data, can handle about 74 minutes of music. For the purposes of radio broadcasting and internet transmission, 10 MB a minute sounded even more impractical in the 1990s than it does today. The development of the MP3 standard served both as a solution to this storage and bandwidth problem in the business world and as a catalyst for the modern internet media explosion. MP3 also created a host of new problems for the recording industry.
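The arithmetic can be checked with a short Python sketch. It assumes two channels (stereo), which is what brings the per-channel figure of 705,600 bits a second up to roughly 10 MB a minute; the function name is only illustrative.

```python
def pcm_bits_per_second(sample_rate_hz, bit_depth, channels):
    """Raw (uncompressed) PCM data rate in bits per second."""
    return sample_rate_hz * bit_depth * channels


# CD audio: 44,100 samples a second, 16 bits a sample.
mono = pcm_bits_per_second(44_100, 16, 1)    # one channel
stereo = pcm_bits_per_second(44_100, 16, 2)  # left + right (assumed stereo)

# Convert bits per second to bytes per minute.
bytes_per_minute = stereo // 8 * 60

print(mono, stereo, bytes_per_minute / 1_000_000)
```

At this rate a 74-minute disc works out to a little under 800 MB of raw audio, which is consistent with the capacity quoted above.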

Lossless data compression was old news in the 1990s, but it hadn’t solved the audio problem. The solution turned out to be some simple math over a century old. Basically stated, and at the risk of drastic oversimplification, the math (courtesy of Fourier) allows an encoder to take certain reliable properties of sound (e.g., sine waves) for granted and reference them as constants rather than take up valuable bits encoding them over and over again. As a result, music is mapped to 32 even subdivisions, or sub-bands, in the “frequency” domain (as opposed to the “time” domain) and dropped into sequential frames which are individually analyzed and operated upon. One method applied to frames is windowing, where the frequency mapping is slightly adjusted depending on what comes before and after (“long” windows for continuous sounds, “short” windows for transients).[2] Even more data is removed from the original signal through perceptual coding, the use of well-studied human psychoacoustic characteristics to knock off inaudible frequencies: for example, those above practical hearing range (somewhere from 16 kHz to 20 kHz), or those which will be “masked” to our ears as a result of the amplitude of a competing frequency. “Excess” data can also be removed through a few types of stereo encoding or a combination of them. In joint stereo mode, an encoder may use middle/side stereo processing, which outputs the sum and the difference of the left and right channels in place of the channels themselves, making later compression easier, or it may use intensity stereo processing, which sums signals in the high frequency bands and puts stereo information in “intensity points”.[3] The latter method more often results in subtle inconsistencies, because stereo information that was only present in one channel will be present to some degree in both, but the inconsistency is not always detectable.
Many MP3 encoders use some combination of the above methods to encode stereo, or contain other options (for example, separating the channels and encoding them as two distinct files). Near the end of the process, a complex lossless compression algorithm (Huffman coding) is applied to the “transformed” data.
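To make the middle/side idea concrete, here is a minimal Python sketch. It is an illustration only: it works on plain integer samples in the time domain, whereas a real MP3 encoder applies the transform to frequency-domain data with its own scaling, and the function names and sample values are invented for the example.

```python
def ms_encode(left, right):
    """Return (mid, side): the sum and the difference of the two channels."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left and right; (m + s) and (m - s) are always even here."""
    left = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right


# Hypothetical samples from two nearly identical channels.
left = [100, 120, -50, 7]
right = [98, 121, -50, 5]

mid, side = ms_encode(left, right)
print(side)  # small values when the channels are similar
restored = ms_decode(mid, side)
```

When the two channels are close to each other, the side signal stays near zero, which is why it takes fewer bits to encode than a second full channel.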

In addition to the above information, it should be noted that MP3 encoders can use the bits available to them in a few different ways. In CBR (constant bit rate) mode, the same bit rate is used throughout, regardless of the simplicity or complexity of the incoming data. This has been standard practice (particularly at 128 kbps) but is being (has been?) superseded by two alternatives. With VBR (variable bit rate), the encoder attempts to intelligently use more bits where the information is more complex and fewer where it is simpler. ABR (average bit rate) is a sort of compromise between the other two methods: a target bit rate is set, but the actual rate varies around it according to the encoder’s algorithm (implementations vary). As might be expected, the size of the output file increases or decreases in proportion to the bit rate.
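The proportional relationship between bit rate and file size is easy to sketch for the CBR case. The snippet below is a rough estimate only: it assumes 1 kbps means 1,000 bits a second and ignores tags and frame headers, and the function name is made up for the example.

```python
def cbr_file_size_bytes(bitrate_kbps, duration_seconds):
    """Approximate size of a constant-bit-rate stream in bytes."""
    return bitrate_kbps * 1000 * duration_seconds // 8


one_minute_128 = cbr_file_size_bytes(128, 60)
one_minute_192 = cbr_file_size_bytes(192, 60)

# Raising the bit rate by half raises the size by half.
print(one_minute_128, one_minute_192, one_minute_192 / one_minute_128)
```

For VBR and ABR files the same formula applies only to the average bit rate over the whole track.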

For the MP3 portion of the experiment in question, I have elected to use the LAME encoder (http://lame.sourceforge.net) as it is now free of the original code snippets borrowed from the ISO (standard-makers) implementation and is non-proprietary, licensed under the LGPL. Perhaps not surprisingly, LAME is recognized as one of the best MP3 encoders currently available. Having decided to weigh the MP3 standard against a couple of more recently developed standards, AAC (Advanced Audio Coding) and Ogg Vorbis, I thought it would be fitting to outline some of the differences that separate these three formats from one another.

MPEG-4 AAC (MP4) is built on MPEG-2 AAC, which, though derived from the MP3 standard, is not backward-compatible with it. Two goals of MP4 were the removal of some of the limitations of the MP3 standard and the addition of functional improvements. For example, MP3 can only handle sample rates up to 48 kHz, whereas MP4 encoders will read up to 96 kHz. MP4 is also equipped to encode up to 48 audio channels and 16 low-frequency enhancement channels, compared to MP3’s two channels in MPEG-1 mode, or 5 audio and 1 low-frequency enhancement channel (the present Dolby 5.1 layout) in MPEG-2 mode. Regarding the encoding process itself, MP4 uses longer “long windows” and shorter “short windows” than the MP3 standard, allowing for better handling of transients and better frequency resolution, and encourages manipulation of the stereo encoding process (middle/side or intensity) at the sub-band level rather than only at the whole-file level.[4] MP4 also includes tools to reduce quantization noise and enhance compressibility of data through prediction. The MP4 format is even reported to perform better at low bit rates than MP3. On the whole, MP4 is supposed to be a major improvement on MP3 in terms of overall compression and sound quality. The people behind DRM and iTunes, among others, have already made AAC the standard of choice for their mass audio distribution.

In order to test the MP4 codec, in keeping with the theme of “free” software, I will use two different encoders. The first, FAAC (http://www.audiocoding.com), is open-source free software and is still in development. The developers are not yet ready to claim that their encoder is on a par with the proprietary MP4 encoders, so I am ready to take the FAAC results with a grain of salt. The second is a proprietary (but free for use) encoder offered by Nero (http://www.nero.com/eng/nero-aac-codec.html). NeroAAC is advertised as state-of-the-art.

Finally, Ogg Vorbis is a free bitstream container and audio codec with a funny name. The listening tests on their website (http://www.xiph.org/vorbis/listen.html) intrigued me, and since I was already on the path to free software in my research, I decided to add Vorbis to the experiment. The Vorbis audio codec was designed to be both a free and unencumbered open-source competitor to MP3, MP4, WMA, etc., and an especially effective codec for use at lower bit rates, which are necessary in most internet streaming media applications. This is one area where most other codecs have arguably failed. Many design differences with other codecs arise as a result of the Vorbis philosophy (read more at http://xiph.org/vorbis/doc), but one sticks out because of the name alone. “Ogg” is not an audio codec, but an adaptable bitstream container which can serve as a transport layer for all sorts of streaming data. “Vorbis” is an audio specification that can be plugged into an Ogg stream, but also into many other packet-handling streams (e.g., RTP). The tools designed by the developers of Ogg, Vorbis and the like are “forward-adaptive,” a phrase they frequently employ in their documentation. They are modular tools and were created for as many conceivable uses as possible. One of the distinctive features of Vorbis encoding is its unique VBR algorithm (though there are ABR and CBR options), which doesn’t have a one-to-one correspondence in its methodology with any other lossy audio compressor.[5] The Vorbis codec has a high degree of configurability and a number of sensitive encoding tweaks available at the command line.

For the present task I have chosen OggEnc2 (http://rarewares.org/ogg-oggenc.php) with libvorbis as well as an additional third-party library (aoTuVb5) which is said to improve Vorbis performance at low bit rates.

In addition, all test tracks have also been compressed with the free lossless audio codec, FLAC (http://flac.sourceforge.net), so that I have a reference point from which to compare lossless compression results with lossy encoding results.

 

I chose three tracks for the testing in an attempt to get some variety in the type of sounds to be encoded. The tracks, from my disc collection, were ripped uncompressed to my computer using EAC (Exact Audio Copy—http://www.exactaudiocopy.de/) in order to get the most faithful data representation possible. The tracks are as follows:

 

Jellyfish—Sebrina, Paste and Plato from Spilt Milk (Rock / Power Pop)

Dave Brubeck—Bicycle Built for Two from Quiet as the Moon (Jazz)

Olivier Messiaen—Intermède from Quatuor pour la fin du Temps (Classical)

 

Here are the versions and pertinent information for the encoders used in the test (please note that these programs are command-line utilities, which I used for convenience of timing the test runs among other things, but that there are GUI tools available—see Links section):

 

LAME v. 3.97 (32 bit)—Windows Binary

http://www.rarewares.org/dancer/dancer.php?f=lame-current

FAAC v. 1.24.1 (unstable)—Windows Binary

http://pessoal.onda.com.br/rjamorim/faac.zip

Nero Digital Audio Reference MPEG-4 & 3GPP Audio Encoder v. 1.1.34.2—Windows Binary

http://www.nero.com/eng/down-ndaudio.php

OggEnc v. 2.84 (libvorbis aoTuVb5)—Windows Binary

http://rarewares.org/dancer/dancer.php?f=167

 

Also important to the analysis are the specs of my machine and the outputs:

 

Windows XP SP2

Celeron 1.7 GHz (P4-based, Willamette)

1152 MB DDR-SDRAM

Intel i845E Chipset

AC’97 Sound Controller

Line out to Boston Speakers / Sub (average)

Beyerdynamic DT 770 studio headphones

 

All encoding was done with the CPU under a minimal load (just typical WinXP services running, Firefox and OpenOffice loaded). There are virtually endless options to be toyed with in the software, but I tried to run the programs with a reasonable cross-section of the simpler ones; options that the average person who downloads the encoder would probably use.

 

FLAC—run with “-6”

FAAC, NeroAAC, Ogg—run with “-q n” where “n” is the last number in the filename

LAME—run with options at end of filename in most cases:

*.std128.mp3 = run with no options (default)

*.h192.mp3 = run with “-h -b 192”

*.hv5.mp3 = run with “-h -V5”

*.best.mp3 = run with “-V0 --vbr-new -q0 --lowpass 19.7”
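For anyone who wants to repeat the LAME runs, the naming scheme above can be sketched as a small Python helper that builds the command lines. This is only an illustration: it assumes the encoder is invoked as “lame” with the usual “lame [options] infile outfile” argument order, the long options are written with two hyphens, and the helper name and “track.wav” are invented for the example. The commands are built but not executed.

```python
# Option sets keyed by the label used in the output filename.
LAME_OPTIONS = {
    "std128": [],                                   # encoder defaults
    "h192": ["-h", "-b", "192"],
    "hv5": ["-h", "-V5"],
    "best": ["-V0", "--vbr-new", "-q0", "--lowpass", "19.7"],
}


def lame_command(wav_path, label):
    """Build the argument list for one test encoding (hypothetical helper)."""
    base = wav_path.rsplit(".", 1)[0]        # strip the .wav extension
    out_path = f"{base}.{label}.mp3"         # e.g. track.h192.mp3
    return ["lame", *LAME_OPTIONS[label], wav_path, out_path]


for label in LAME_OPTIONS:
    print(" ".join(lame_command("track.wav", label)))
```

A list like this can be handed to subprocess.run and wrapped in a timer to reproduce the timing measurements.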

 

For each test encoding, I measured the following parameters: Size, Compression Ratio (ratio), Time to Encode (mm:ss:00), and Average Bit Rate (abr). Unfortunately, NeroAAC doesn’t report the bit rate by default, so those fields are blank. The source .wav entry on the charts contains the full values for applicable parameters (i.e., 100% ratio, time=length of track). Upon listening, both through the phones and in the speakers, comparing each “take” with the source .wav, I gave each file a Roderick Rating (rr), which is a sophisticated way of saying that I rated the audio files from one to ten. In the Roderick Rating scale, 10 means “incredibly hard to tell any subjective difference from the original” and 1 means “did I input an audio file or a device driver?” Comments offered for each track should help the reader with interpretation. I have indicated “(def)” in that field where the options run are the software defaults.
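The two derived figures, compression ratio and average bit rate, follow directly from file size and track length. The Python sketch below shows one plausible way to compute them, with the ratio expressed as a percentage of the source .wav size (so the source itself is 100%). The function names are illustrative and the sample numbers are made up; they are not results from the charts.

```python
def compression_ratio_percent(encoded_bytes, source_bytes):
    """Encoded size as a percentage of the source size."""
    return 100.0 * encoded_bytes / source_bytes


def average_bitrate_kbps(encoded_bytes, duration_seconds):
    """Average bit rate over the whole track, in kbps (1 kbps = 1,000 bits/s)."""
    return encoded_bytes * 8 / duration_seconds / 1000


# Hypothetical three-minute track: stereo CD audio is 176,400 bytes a second.
duration = 180
source_size = 176_400 * duration
encoded_size = 2_880_000

ratio = compression_ratio_percent(encoded_size, source_size)
abr = average_bitrate_kbps(encoded_size, duration)
print(round(ratio, 2), abr)
```

Computing the bit rate this way also offers a way to estimate it for encoders, such as NeroAAC, that do not report it themselves.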

Based on the values and observations obtained, I judged the fastest (and slowest) compression, the best compression ratio (with respect to sound quality), and the highest quality at an average bit rate under ~96k. The results are detailed on the first three charts. Not surprisingly, Ogg Vorbis does better at lower bit rates than the others. I was surprised, however, that Vorbis won two out of three in compression ratio/sound quality. I know what format I’ll be using for my library in the future.

As a final touch, I wanted to get a sense of what each encoder would do with a full-length album, given the defaults, so I pulled out my copy of Abbey Road, ripped it uncompressed with EAC, and began timing encoding loops. The results are detailed on the fourth chart. For the full album encode, only default settings for the encoders were used, with the exception of “lame best” (as described above). The charts are here.

 

 

Links

 

MPEG specifications http://www.mpeg.org/

MP3 technical info http://www.mp3-tech.org/

MP3 at Fraunhofer http://www.iis.fraunhofer.de/EN/bf/amm/mp3history/index.jsp

LAME source http://lame.sourceforge.net/

LAME GUI frontend http://www.dors.de/razorlame/

 

FAAC (MP4) source http://www.audiocoding.com/

Nero AAC codec http://www.nero.com/eng/nero-aac-codec.html

 

Ogg Vorbis software http://xiph.org/, http://vorbis.com/

 

FLAC source / frontend http://flac.sourceforge.net/

 

Lots of encoder binaries http://www.rarewares.org/

Vintage MP3 enc. guide http://mp3.radified.com/

Audio codec comparison http://en.wikipedia.org/wiki/Comparison_of_audio_codecs

MP3 encoder comparison http://arstechnica.com/wankerdesk/1q00/mp3/mp3-1.html

 

EAC CD copier http://exactaudiocopy.de/

Winamp media player http://www.winamp.com/

VLC media player http://www.videolan.org/vlc/

HydrogenAudio forums http://www.hydrogenaudio.org/forums/index.php

 

Sources Consulted

 

Hacker, Scot. MP3: The Definitive Guide. O’Reilly, 2000.

 

Pan, Davis. “A Tutorial on MPEG/Audio Compression” [paper online]. Available from http://www.digital-audio.net/res/docs/pdf/mpegaud.pdf ; Accessed 18 March 2008.

 

Raissi, Rassol. “The Theory Behind MP3” [paper online]. Available from http://www.mp3-tech.org/programmer/docs/mp3_theory.pdf ; Accessed 18 March 2008.

 

“The Story of MP3,” Fraunhofer IIS. Available from http://www.iis.fraunhofer.de/EN/bf/amm/mp3history/mp3history01.jsp ; Accessed 18 March 2008.

 

MPEG-2 / MPEG-4 AAC. Available from http://www.mp3-tech.org/aac.html ; Accessed 18 March 2008.

 

Ogg Vorbis FAQ. Available from http://vorbis.com/faq/#quality ; Accessed 18 March 2008.

(NOTES)

[1] “The Story of MP3,” Fraunhofer IIS; available from http://www.iis.fraunhofer.de/EN/bf/amm/mp3history/mp3history01.jsp; Internet; accessed 18 March 2008.

[2] Rassol Raissi, “The Theory Behind MP3,” 19-22 [paper online]; available from http://www.mp3-tech.org/programmer/docs/mp3_theory.pdf; Internet; accessed 18 March 2008. A good flow-chart of the encoding process is available in this document.

[3] Davis Pan, “A Tutorial on MPEG/Audio Compression,” 9 [paper online]; available from http://www.digital-audio.net/res/docs/pdf/mpegaud.pdf; Internet; accessed 18 March 2008.

[4] MPEG-2 / MPEG-4 AAC, available from http://www.mp3-tech.org/aac.html; Internet; accessed 18 March 2008.

[5] Ogg Vorbis FAQ, available from http://vorbis.com/faq/#quality; Internet; accessed 18 March 2008. More info available at http://xiph.org/vorbis/doc/Vorbis_I_spec.html.