Preamble: All observations here are practice-based, not theory-based. In other words, I didn't start from the settings that are *theoretically* the highest quality for MP3s and work backwards. I based everything here on observation: finding out what worked best according to my ears, and then working back from that to find the theory behind why it worked. I work in a music studio and I do quite a lot of mastering. So this isn't self-fooling or confirmation bias, it's just what I've observed. You can trust my ears and my equipment, I believe. I've done ABX tests on myself for this sort of thing.
We all have to use MP3 at some point. Regardless of whether we like it, or prefer lossless formats like WavPack or FLAC, it's necessary for delivering music to punters, sending it to colleagues, or previewing finished works. Sometimes we need to fit music on our low-capacity portable audio players. In these circumstances, it pays to know a thing or two about MP3 encoding if we're strict about our audio standards and what we will put up with. This guide is dedicated to those working in the audio profession and to everyone who likes high-quality audio in general. I haven't found the guides on other sites such as Hydrogenaudio revealing or strictly accurate, so I wrote my own.
No, it's not an insult towards a particular MP3 encoder: the best encoder out there is called 'LAME' (a recursive acronym for 'LAME Ain't an MP3 Encoder'). You can download binary builds for it here, and a front-end for batch-encoding multiple MP3s here. To get the highest quality output from it, you want to use the "-q 0" flag (highest quality but slowest encode). Some people say you can't tell the difference between -q 0 and -q 3. I beg to differ, and CPU speed being what it is nowadays, I don't care.
MP3s have a maximum bitrate of 320kbps, which will in all cases produce the maximum possible quality from your signal, at the expense of filesize. CBR = constant bitrate and VBR = variable bitrate. A constant bitrate MP3 will use the same amount of data to represent each second of an audio signal, for example 320 kilobits. A variable bitrate MP3 will vary the amount of data used to represent the signal based on the complexity of the incoming audio, and on what the MP3 encoder decides is 'more noticeable' and 'less noticeable' information. Both methods reduce the amount of information in an audio signal by removing components which are less easily noticed (for example, stereo separation is more difficult for the human ear to perceive in bass frequencies than it is in high frequencies).
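To put rough numbers on the filesize cost of CBR, here is a minimal sketch in POSIX shell. The function name `cbr_size_bytes` is my own, and the result ignores tags and any other metadata, so treat it as an estimate rather than an exact figure. It relies only on the fact that MP3 bitrates count 1kbps as 1000 bits per second.

```shell
# Approximate size in bytes of a CBR stream:
#   bitrate (kbps) x 1000 x duration (seconds) / 8 bits per byte.
# cbr_size_bytes is a hypothetical helper, not part of LAME.
cbr_size_bytes() {
    kbps="$1"
    seconds="$2"
    echo $(( kbps * 1000 * seconds / 8 ))
}

cbr_size_bytes 320 240   # 4-minute track at 320kbps: prints 9600000
cbr_size_bytes 128 240   # same track at 128kbps:     prints 3840000
```

So a four-minute track comes out at roughly 9.6MB at 320kbps, against about 3.8MB at 128kbps.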
The theory is that VBR can produce quality equal to that of a maximum-bitrate CBR encoding because it reduces the amount of data assigned to less-complex audio signals. My observation is that this is not the case. The reason is that while a computer does not adjust its receptivity based on contrast, the human brain does. If you are in a dark room, your brain will immediately pick up on the smallest details that are visible in order to make sense of its environment. If you are in a bright room, your brain doesn't notice the detail so much, because it becomes irrelevant for survival. Your brain does the same thing with what your ears hear. In a 'complex' passage of audio, there are a lot of different things happening at once, and your brain doesn't tend to notice the smaller details because of the greater profusion of larger details. Once you get to a quieter, simpler, 'less complex' passage, you will automatically hear and pick out those smaller details more. In fact, you hear the details far more keenly and with greater precision than if there was more happening in the signal.
For those reasons VBR algorithms fail for high-quality encoding. When a VBR algorithm comes across a simpler piece of audio, it will automatically reduce the bitrate for that piece, failing to take into account the human brain's adjustment for detail under those circumstances. While this is more audibly noticeable on lower-bitrate signals, the same principle is in action at higher bitrates. Hence, CBR is the only high-quality choice for MP3 encoding.
There is a very good and true reason for the removal of high-frequency information from a signal as part of the process of MP3 encoding: most of us can't hear it. Certainly anyone who's damaged their hearing won't hear it. Most people under 16 can hear frequencies up to 18kHz, and beyond this point, recognition of higher frequencies tends to decay rapidly. However, we still perceive the removal of these signals, even if we can't recognisably stand beside our dogs and wag our tails when a 21kHz frequency is emitted, unless it's at very high volume. Just as it has been proven that sub-audible low frequencies have an effect on the human brain, it's also known that ultrasonic frequencies have an effect, despite not being consciously 'heard'. This is something I've noticed in the studio. It could be a result of the necessarily imperfect nature of low-pass filtering (the term for an audio filter which removes frequencies above a certain point and keeps those below it), but the implementation of a low-pass filter at 20kHz almost always removes the sense of a signal being 'live', i.e. actually happening. This is understandable psychologically, as almost all signals we hear in real life have audio components that go above what we can immediately identify. Sometimes you want this loss of information in a studio situation, but you never want it in your encoding process if high quality is your goal. And I can hear the same thing in MP3 encoding. When a signal is encoded with a lowpass filter at 20kHz or below, the signal stops sounding like what you're hearing is actually there.
On lower-quality equipment and with low-quality speakers, obviously you're not going to notice this stuff, but that's not what this guide is about. Now, the problem with not removing high-frequency information when creating a lossy-format MP3 with a limited bitrate is that suddenly you've got a whole lot of extra audio information which most people can't hear that well (or at least, don't notice that they hear) taking valuable filespace away from all the lower frequencies which people definitely can hear. So for all the higher-frequency information you retain, you will take some detail away from the lower frequencies in your finished product. From my testing, I found that at 320kbps, the extra sense of higher-frequency detail and the sense of the audio signal sounding 'alive' was more than enough to compensate for the slight drop in detail in the lower frequencies. YMMV, but this is my recommendation. In addition, there is a way to reduce this effect with the LAME encoder, which we will go into later.
So for a high-quality MP3, we want 320kbps, or as close to it as possible; we want CBR encoding; and we want no lowpass or highpass filtering on the signal. Fly in the ointment: LAME removed the ability to disable lowpass filtering (-k flag) on CBR encodes at version 3.99, and although you can change the cutoff frequency of the lowpass filter, you can't increase it above 20kHz. However, they also made the CBR and VBR encoding algorithms identical at that point, and the -V 0 level of VBR encoding (the highest quality level of VBR) disables lowpass filtering. You thinking what I'm thinking? That's right, we're going to use a VBR mode to encode in CBR. On the command line, the syntax looks like this:
lame -q 0 -b 320 -B 320 -Y -V 0 [input filename] [output filename]
The '-q 0' makes the encoder use the highest possible encoding settings, and -V 0 makes it use the highest possible VBR encoding settings and disable lowpass filtering, while -b 320 and -B 320 define the lowest possible and highest possible bitrates used. So when you set them both to 320kbps, you end up with, effectively, a CBR signal. One additional advantage of this technique is that when a signal is genuinely so simple that it doesn't require 320kbps to encode (i.e. when the most accurate MP3 representation of the signal for any given second is achievable with less than 320 kilobits, for example a signal of pure silence), it will in fact drop down to a lower bitrate for that second. So you do tend to end up with a 317kbps average signal, but with no actual loss of audible quality in the signal compared to 320kbps CBR.
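If you want to check what average bitrate one of your own encodes ended up at, the arithmetic is just filesize over duration. A minimal sketch in POSIX shell follows; `avg_kbps` is a name I made up, the byte count and duration in the example are illustrative rather than measured, and tags in the file will nudge the result slightly upwards.

```shell
# Average bitrate in kbps from file size (bytes) and duration (seconds),
# rounded down. avg_kbps is a hypothetical helper, not part of LAME.
avg_kbps() {
    bytes="$1"
    seconds="$2"
    echo $(( bytes * 8 / seconds / 1000 ))
}

# Illustrative numbers only: a 240-second file of 9510000 bytes.
avg_kbps 9510000 240   # prints 317
```

A true 320kbps CBR file of the same length would be about 9600000 bytes, so the gap between the two figures is the data saved on the genuinely simple passages.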
Meanwhile, the '-Y' flag tells the encoder that if the signal above 16kHz (the most that the majority of humans can audibly 'hear') is so complex that a typical high-quality encoding of it would significantly reduce the space available for a high-quality encoding of the lower frequencies, it should reduce the number of bits allocated to the frequencies above 16kHz in that part of the signal. The nice thing about this is that rather than removing the upper frequencies, it retains them, along with the sense of the signal feeling 'live', without much impact on the lower, more audible frequencies. And since we can't really hear the detail of the upper frequencies, it doesn't noticeably affect the quality of those either, from an audible perspective.
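If you'd rather stay on the command line than use a front-end, the command above can be wrapped in a small loop. This is a minimal sketch for sh or bash, not a polished tool: the names `mp3_name`, `encode_all` and `DRY_RUN` are my own, and it assumes `lame` is on your PATH and that every input file has an extension to swap for .mp3.

```shell
# Flags taken from the command above.
LAME_FLAGS="-q 0 -b 320 -B 320 -Y -V 0"

# Output name for an input file: swap the extension for .mp3
# (assumed naming convention; the input must have an extension).
mp3_name() {
    printf '%s\n' "${1%.*}.mp3"
}

# Encode each argument in turn, stopping at the first failure.
# With DRY_RUN=1, only print the commands that would be run.
encode_all() {
    for f in "$@"; do
        out=$(mp3_name "$f")
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'lame %s "%s" "%s"\n' "$LAME_FLAGS" "$f" "$out"
        else
            # LAME_FLAGS is deliberately unquoted so it splits into separate flags.
            lame $LAME_FLAGS "$f" "$out" || return 1
        fi
    done
}
```

Run `DRY_RUN=1 encode_all *.wav` first to preview the commands, then `encode_all *.wav` to do the actual encoding.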
To achieve the settings above in RazorLame (the batch-encoding front-end mentioned earlier), simply turn on VBR encoding, set your CBR and VBR bitrates to 320, set your Q and VBR values to 0 and add '-Y' to the extra command-line options. Leave your stereo mode on 'default'.
Well, that's it - hope you enjoyed it. Even if I can't hear the difference in some instances, I'm sure there are people who can. Also, where I *can* hear the difference, I'm equally sure there are plenty of people who can't, either due to their hearing or their equipment. That's what makes this field so subjective, and interesting. But as a caution: just because you can (or can't) hear something, never assume someone else can (or can't).
All advice given without guarantee - use your brain - if anything dies/fries/stops/explodes, see a doctor (but don't talk to me).