[Audacity-devel] Mp3 decoding with the MAD library: We’ve all been doing it wrong

Discussion:

Steve the Fiddle

2016-11-27 14:22:55 UTC

Interesting blog post from Chris Cannam ("Sonic Visualiser"):
https://thebreakfastpost.com/2016/11/26/mp3-decoding-with-the-mad-library-weve-all-been-doing-it-wrong/

He's sent a pull request to fix one of the problems:
https://github.com/audacity/audacity/pull/175
and I've invited him to introduce himself to this list

I'll be logging these issues on bugzilla tomorrow unless anyone beats me to
it.

Steve

Vaughan Johnson

2016-11-29 23:07:23 UTC

Permalink

Steve: "I've invited him to introduce himself to this list"

Thx. Some of us know him already, from prev contribs to Audacity. :-)

- V

Post by Steve the Fiddle
https://thebreakfastpost.com/2016/11/26/mp3-decoding-with-
the-mad-library-weve-all-been-doing-it-wrong/
https://github.com/audacity/audacity/pull/175
and I've invited him to introduce himself to this list
I'll be logging these issues on bugzilla tomorrow unless anyone beats me
to it.
Steve
------------------------------------------------------------
------------------
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel

Chris Cannam

2016-11-30 09:13:49 UTC

Permalink

Hello all -- so the problem here is as described in the link at
https://thebreakfastpost.com/2016/11/26/mp3-decoding-with-the-mad-library-weve-all-been-doing-it-wrong/,
but there are at least three levels of "things that could be done about
it":

1. Fix the 1152-sample truncation from the end of imported mp3 files.
This is what my patch in the PR at
https://github.com/audacity/audacity/pull/175 is intended to do. As I
mention in the PR, even this single fix is surprisingly tricky and
could use careful review (so I completely agree it shouldn't be merged
in a hurry) but this one probably is necessary.

2. Stop passing Xing/LAME metadata frames through to the mp3 stream
decoder, thus removing 1152 samples of redundant padding at the start
(for mp3 files that have this frame). This is relatively easy and safe
to do, involving adding a filter callback to the MAD decoder and
checking the frame magic number in it, but it does involve an aspect
of policy: are you prepared to change the start offset when an mp3
file is loaded? That's maybe a harder question for an annotation
program like SV than it is for Audacity.

3. Instead of just discarding the Xing/LAME metadata frame, actually
parse out the encoder delay and end padding values from it and use
these to trim the silence properly. This is what an mp3 player's
"gapless" mode does -- this way, PCM data encoded with LAME and then
decoded in Audacity would start and end at the same samples as the
original PCM.

(It's ironic that the main use of LAME metadata these days is to enable
gapless playback, but we've all been "using" it to make the gaps worse
instead. I expect that, like me, you never had any idea this metadata
wasn't handled at the decoder layer.)

You can see in this file how I've been handling this in this class from
the Sonic Visualiser code, which I've now got to point 3 above, i.e.
doing proper gapless loading:

https://code.soundsoftware.ac.uk/projects/svcore/repository/entry/data/fileio/MP3FileReader.cpp?utf8=%E2%9C%93&rev=3.0-integration

The main thing that addresses points 2 and 3 above is the use of a
filter callback, provided to mad_decoder_init at line 327. For our
purposes, this callback only needs to do anything if we are looking at
the very first mp3 frame, and it can either (for point 2 above) check
the frame's magic number (see line 416) and skip the frame if "Xing" or
"Info" is found, or (point 3) read on until it finds the delay and
padding values and make use of those to trim the ends of the audio (line
458, but the trimming itself is done elsewhere).

You can also refer to the madplay source code, which handles points 1
and 2 but doesn't have gapless support. For the gapless logic and magic
numbers I referred to the mpg123 library.

I'd be happy to prepare a PR to cover point 2 (identifying and dropping
the Xing/LAME frame) but I think I probably don't have the time to
address point 3 (actual gapless mode) for Audacity. Note again that the
only fix I think is absolutely essential is the first one, the other two
are just nice to have.

Chris

Post by Vaughan Johnson
Steve: "I've invited him to introduce himself to this list"
Thx. Some of us know him already, from prev contribs to Audacity. :-)
- V
On Sun, Nov 27, 2016 at 6:22 AM, Steve the Fiddle

------------------------------------------------------------------------------
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel

------------------------------------------------------------------------------

Gale Andrews

2016-11-30 13:43:06 UTC

Permalink

Thanks, Chris for the further input. 3) might be nice as an option, maybe
even default.

Given mpg123 already supports (3), and is still maintained, should we
move to it?

This might be a concern (high CPU use on Linux):
https://sourceforge.net/p/mpg123/bugs/137/ .

Gale

Post by Chris Cannam
Hello all -- so the problem here is as described in the link at
https://thebreakfastpost.com/2016/11/26/mp3-decoding-with-the-mad-library-weve-all-been-doing-it-wrong/,
but there are at least three levels of "things that could be done about
1. Fix the 1152-sample truncation from the end of imported mp3 files.
This is what my patch in the PR at
https://github.com/audacity/audacity/pull/175 is intended to do. As I
mention in the PR, even this single fix is surprisingly tricky and
could use careful review (so I completely agree it shouldn't be merged
in a hurry) but this one probably is necessary.
2. Stop passing Xing/LAME metadata frames through to the mp3 stream
decoder, thus removing 1152 samples of redundant padding at the start
(for mp3 files that have this frame). This is relatively easy and safe
to do, involving adding a filter callback to the MAD decoder and
checking the frame magic number in it, but it does involve an aspect
of policy: are you prepared to change the start offset when an mp3
file is loaded? That's maybe a harder question for an annotation
program like SV than it is for Audacity.
3. Instead of just discarding the Xing/LAME metadata frame, actually
parse out the encoder delay and end padding values from it and use
these to trim the silence properly. This is what an mp3 player's
"gapless" mode does -- this way, PCM data encoded with LAME and then
decoded in Audacity would start and end at the same samples as the
original PCM.
(It's ironic that the main use of LAME metadata these days is to enable
gapless playback, but we've all been "using" it to make the gaps worse
instead. I expect that, like me, you never had any idea this metadata
wasn't handled at the decoder layer.)
You can see in this file how I've been handling this in this class from
the Sonic Visualiser code, which I've now got to point 3 above, i.e.
https://code.soundsoftware.ac.uk/projects/svcore/repository/entry/data/fileio/MP3FileReader.cpp?utf8=%E2%9C%93&rev=3.0-integration
The main thing that addresses points 2 and 3 above is the use of a
filter callback, provided to mad_decoder_init at line 327. For our
purposes, this callback only needs to do anything if we are looking at
the very first mp3 frame, and it can either (for point 2 above) check
the frame's magic number (see line 416) and skip the frame if "Xing" or
"Info" is found, or (point 3) read on until it finds the delay and
padding values and make use of those to trim the ends of the audio (line
458, but the trimming itself is done elsewhere).
You can also refer to the madplay source code, which handles points 1
and 2 but doesn't have gapless support. For the gapless logic and magic
numbers I referred to the mpg123 library.
I'd be happy to prepare a PR to cover point 2 (identifying and dropping
the Xing/LAME frame) but I think I probably don't have the time to
address point 3 (actual gapless mode) for Audacity. Note again that the
only fix I think is absolutely essential is the first one, the other two
are just nice to have.
Chris

------------------------------------------------------------------------------

Chris Cannam

2016-11-30 13:54:27 UTC

Permalink

Post by Gale Andrews
Given mpg123 already supports (3), and is still maintained, should we
move to it?

Steve the Fiddle

2016-11-30 19:51:28 UTC

Permalink

Thanks Chris for the work that you've done so far.
As I wrote on GitHub, I can confirm that the problem exists in Audacity and
that the PR appears to work as intended (fixing part 1), so imo we need
this, and I hope that someone more experienced than myself will review the
PR soon after 2.1.3 is released.

I'm sure that "part 3" ("gapless" import) would be a popular feature with
users. I'm thinking that if we had this, it may be useful to expose it in
the GUI as an option (rather than an "invisible", behind the scenes,
feature that just happens).

Currently we see a fair number of support requests along the lines of "why
does my carefully prepared audio have a gap when looped in <name of
application>?" If Audacity automatically and invisibly removed the "gap",
then that would probably increase the confusion for such users "It loops
perfectly in Audacity, but not in <name of application>. What's wrong with
Audacity's export?"

On the other hand, if "gapless import" was presented as a feature in the
GUI, then that could help to increase understanding, and so reduce support
requests, in addition to the immediate benefit to users.

Steve

Post by Chris Cannam

Post by Gale Andrews
Given mpg123 already supports (3), and is still maintained, should we
move to it?

I considered that for Sonic Visualiser, but took the view that
introducing another library, with all that it implied in terms of build
systems and potential for cross-platform bugs and regressions, was
probably more painful than adding logic manually to handle these cases
for mp3 import in the application.
MAD does still appear to be good code that has had relatively few
problems despite not having been updated in a while. The difficulty here
is not a bug in MAD (considered as a stream decoder) but is the result
of a lack of support at "the next level up" for file attributes like
encoder delay.
That said, I could understand if someone else felt the priorities worked
out differently.
Chris
------------------------------------------------------------
------------------
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel

Leon Müller

2016-12-17 10:05:40 UTC

Permalink

Post by Steve the Fiddle
Thanks Chris for the work that you've done so far.
As I wrote on GitHub, I can confirm that the problem exists in
Audacity and that the PR appears to work as intended (fixing part 1),
so imo we need this, and I hope that someone more experienced than
myself will review the PR soon after 2.1.3 is released.
I'm sure that "part 3" ("gapless" import) would be a popular feature
with users. I'm thinking that if we had this, it may be useful to
expose it in the GUI as an option (rather than an "invisible", behind
the scenes, feature that just happens).
Currently we see a fair number of support requests along the lines of
"why does my carefully prepared audio have a gap when looped in <name
of application>?" If Audacity automatically and invisibly removed the
"gap", then that would probably increase the confusion for such users
"It loops perfectly in Audacity, but not in <name of application>.
What's wrong with Audacity's export?"
On the other hand, if "gapless import" was presented as a feature in
the GUI, then that could help to increase understanding, and so reduce
support requests, in addition to the immediate benefit to users.
Steve
On 30 November 2016 at 13:54, Chris Cannam

Post by Gale Andrews
Given mpg123 already supports (3), and is still maintained,

should we

Post by Gale Andrews
move to it?

I considered that for Sonic Visualiser, but took the view that
introducing another library, with all that it implied in terms of build
systems and potential for cross-platform bugs and regressions, was
probably more painful than adding logic manually to handle these cases
for mp3 import in the application.
MAD does still appear to be good code that has had relatively few
problems despite not having been updated in a while. The
difficulty here
is not a bug in MAD (considered as a stream decoder) but is the result
of a lack of support at "the next level up" for file attributes like
encoder delay.
That said, I could understand if someone else felt the priorities worked
out differently.
Chris
------------------------------------------------------------------------------
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel
<https://lists.sourceforge.net/lists/listinfo/audacity-devel>
------------------------------------------------------------------------------
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel

---
Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
https://www.avast.com/antivirus