Discussion:
[Audacity-devel] Why is sampleCount 64 bits wide and signed?
Paul Licameli
2016-08-16 23:35:49 UTC
Permalink
typedef sampleCount is a signed 64 bit quantity on Windows, and long long
on others. I can tell that on Mac that is the same width because this
compiles without error (using the new C++11 keyword static_assert):

static_assert(sizeof(long long) == 8, "");

Someone tell me if that compiles on Linux too.
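
A minimal sketch of a somewhat stricter compile-time check, assuming a stand-in alias named SampleCount64 (hypothetical, not the project's actual declaration). It counts bits rather than bytes and also checks signedness:

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the typedef under discussion.
using SampleCount64 = long long;

// The C++ standard only guarantees that long long has at least 64 bits,
// so an exact-width check still says something on each platform.
static_assert(sizeof(SampleCount64) * CHAR_BIT == 64, "expected exactly 64 bits");
static_assert(std::is_signed<SampleCount64>::value, "expected a signed type");

// Width in bits of any integer type, reusable in further checks.
template <typename T>
constexpr int bitWidth() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}
```

If either static_assert fails, the translation unit does not compile, which is the same signal the one-line check above gives.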

Who recalls why it was decided that the type should be this wide? You need
over 27.05 hours of audio, at 44.1 kHz sampling rate, to overflow an
unsigned 32 bit integer, or half of that for a signed integer.
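
The arithmetic behind those figures can be sketched as follows. The function names are made up for illustration, and the only assumption is that one sample position is consumed per sample period:

```cpp
#include <cstdint>

// Number of non-negative sample positions an integer of the given width can
// hold: 2^bits if unsigned, 2^(bits-1) if signed. Valid for bits <= 63.
constexpr std::uint64_t positionCount(int bits, bool isSigned) {
    return std::uint64_t{1} << (isSigned ? bits - 1 : bits);
}

// Hours of audio at rateHz before a sample position no longer fits.
constexpr double hoursToOverflow(int bits, bool isSigned, double rateHz) {
    return static_cast<double>(positionCount(bits, isSigned)) / rateHz / 3600.0;
}
```

For example, hoursToOverflow(32, false, 44100.0) is a little over 27.05, and hoursToOverflow(32, true, 44100.0) is half of that.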

Who ever makes a track that long?

Well maybe it is sometimes done, because I answered a question from someone
on the Audacity Google group, who was trying to export more than 13.5 hours
of audio at once. The re-imported audio was truncated to less than that
length, but I had reason to believe it was because of the limitations of
the .wav format and not the fault of our own code.

There is also the question whether existing uses of sampleCount should be
signed or unsigned. Sometimes it really is meant to be a count, as of a
buffer size, so unsigned would be appropriate, but then the width is
probably excessive for the purpose. (The width of size_t is only 4 bytes
on my Mac.)

But sometimes it is meant as a quantized time identifying a place in an
audio track, and negative times are not impossible (see what time shift can
do), so it should be signed, and also very wide, if we really mean to
support days-long tracks and the possible very big positive values.
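
To illustrate why a position type wants to be both signed and wide, here is a small sketch. The helper timeToSample is hypothetical and the rounding policy is an assumption, not something taken from the code base:

```cpp
#include <cstdint>

// Hypothetical helper: nearest sample index for a time in seconds.
// Rounds half away from zero; the result is negative for times before zero.
constexpr std::int64_t timeToSample(double seconds, double rateHz) {
    return static_cast<std::int64_t>(seconds * rateHz + (seconds < 0 ? -0.5 : 0.5));
}
```

A clip shifted to start 1.5 seconds before zero gives a negative index at 44100 Hz, and a position 100000 seconds (about 27.8 hours) into a track gives 4410000000, which does not fit in 32 bits.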

PRL
Paul Licameli
2016-08-16 23:46:07 UTC
Permalink
384000 is the maximum choice for sample rate in the track drop-down menu,
if you don't enter a custom rate which could presumably be even higher.

At that rate a mere 3.1 hours plus a tad would overflow 32 bits unsigned,
or half that, signed.

Do any of our scientist friends study dolphin and bat voices for that sort
of duration? (I am not being facetious. It says here that dolphin voices
go up to 150 kHz, so 300 kHz sampling rate isn't crazy for that purpose.
https://en.wikipedia.org/wiki/Whale_vocalization And here it says bat
voices even go to 212 kHz, so 384 kHz is even deficient
https://en.wikipedia.org/wiki/Animal_echolocation)

PRL
Gale Andrews
2016-08-17 00:49:49 UTC
Permalink
Due to a bug, 13.5 hours at 44100 Hz was the practical limit per clip
for reopened projects, until Audacity 2.0.6 fixed that:
http://legacywiki.audacityteam.org/wiki/Recording_length#long .

WAV (and AIFF) are 4 GB maximum size. RF64 does not have that
limitation.
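
A rough sketch of what the 4 GB ceiling means in hours of uncompressed PCM. The names are invented for illustration, and header overhead is ignored, so the results are slight overestimates:

```cpp
#include <cstdint>

// Classic RIFF/WAV stores chunk sizes in 32-bit fields, hence the 4 GB ceiling.
constexpr std::uint64_t kMaxWavBytes = std::uint64_t{1} << 32;

// Approximate hours of audio that fit under the ceiling for a given layout.
constexpr double maxWavHours(double rateHz, int channels, int bytesPerSample) {
    return static_cast<double>(kMaxWavBytes) / (rateHz * channels * bytesPerSample) / 3600.0;
}
```

At 44100 Hz this gives roughly 6.76 hours for 16 bit stereo and roughly 13.5 hours for 16 bit mono. Whether that coincidence explains any particular truncation report is not something this sketch can establish.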


Gale
------------------------------------------------------------------------------
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel
------------------------------------------------------------------------------
Paul Licameli
2016-08-17 01:01:51 UTC
Permalink
Post by Gale Andrews
Due to a bug, 13.5 hours at 44100 Hz was the practical limit per clip
http://legacywiki.audacityteam.org/wiki/Recording_length#long .
I read there

"Here, samples are stored as 64-bit values"

Shouldn't that say, "sample counts are stored as 64-bit values"?

Also the cautions in the red box are only meant for older versions of the
program, right?

PRL
Steve the Fiddle
2016-08-17 07:43:48 UTC
Permalink
Post by Paul Licameli
Post by Gale Andrews
Due to a bug, 13.5 hours at 44100 Hz was the practical limit per clip
http://legacywiki.audacityteam.org/wiki/Recording_length#long .
I read there
"Here, samples are stored as 64-bit values"
Shouldn't that say, "sample counts are stored as 64-bit values"?
Also the cautions in the red box are only meant for older versions of the
program, right?
That's the "legacy wiki", so yes it was written about old versions of
Audacity and nothing in that wiki should be taken as fact for versions
later than 2.0.6.

Yes some people do want to make recordings longer than 13.5 hours (as
witnessed by a steady stream of problems reported on the forum about
very long recordings).

These days we see fewer problems regarding very long recordings because
Audacity is generally better behaved with long recordings than old
versions, but we do still see problems.

Steve
Paul Licameli
2016-08-17 11:14:15 UTC
Permalink
I think that when sample counts were changed to 64 bit values, not all of
the very many uses of that type were reexamined for correctness. There may
still be many narrowing conversions where a sample count is assigned to a
32 bit int, with loss of information. The int and long types are each only
4 bytes wide in the Mac compiler.
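
One way to make such narrowing conversions explicit is a range check before the cast. This is only a sketch with hypothetical helper names, and the saturating policy is an assumption about what a caller might want:

```cpp
#include <cstdint>
#include <limits>

// True when a 64-bit sample count survives conversion to a 32-bit int unchanged.
constexpr bool fitsInInt32(std::int64_t value) {
    return value >= std::numeric_limits<std::int32_t>::min() &&
           value <= std::numeric_limits<std::int32_t>::max();
}

// Saturating alternative to a silent narrowing cast.
constexpr std::int32_t clampToInt32(std::int64_t value) {
    return value > std::numeric_limits<std::int32_t>::max()
               ? std::numeric_limits<std::int32_t>::max()
           : value < std::numeric_limits<std::int32_t>::min()
               ? std::numeric_limits<std::int32_t>::min()
               : static_cast<std::int32_t>(value);
}
```

Saturating hides the overflow rather than reporting it, so in real code a failed fitsInInt32 check would probably deserve an error path instead.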

But also, as I say, the wide type is used unnecessarily in places for the
size of a buffer of samples in memory, rather than indicating a position in
a sound file. In such cases size_t ought to be used. We still compile as
a 32 bit executable with a 4GB address space where size_t is four bytes
wide. Also size_t is an unsigned type but sampleCount is signed.
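
The split between the two roles can be sketched like this: positions stay signed and 64 bits wide, while anything that sizes an in-memory buffer is a size_t. The helper name samplesThisPass is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>

// How many samples to process in the next pass, given an in-memory buffer
// capacity (size_t) and a signed 64-bit count of samples still remaining.
// The result never exceeds bufferSize, so it is always safe as a size_t.
constexpr std::size_t samplesThisPass(std::size_t bufferSize, std::int64_t remaining) {
    return remaining <= 0 ? 0
         : static_cast<std::uint64_t>(remaining) < bufferSize
               ? static_cast<std::size_t>(remaining)
               : bufferSize;
}
```

The only narrowing cast happens on the branch where the value is already known to be smaller than the buffer size, so no information is lost even when size_t is four bytes wide.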

PRL
Paul Licameli
2016-08-18 13:00:58 UTC
Permalink
Just a curiosity, but I think the 64 bit offset gives us a practical limit
of 101 years at 44100 Hz sampling and 16 bit PCM format -- not the really
astronomical numbers you might imagine.

Why?

Whenever a WaveClip is in memory, we store a contiguous array of structures
called SeqBlock describing the block files.
Each SeqBlock structure occupies 16 bytes (2^4).
The outer limit of addressable memory is 2^32 bytes, so long as we compile
a 32 bit executable.
Therefore, 2^28 SeqBlocks is an upper limit for the SeqBlocks we can have
in memory at a time.

Each block file is now limited to 1 MB (2 ^ 20 bytes) of sample data,
whatever the format.
The smallest format is 16 bit (2-byte) PCM.
Therefore a block has at most 2^19 samples.

Therefore a project is limited to 2^47 samples.

At a 44100 Hz sampling rate, that is 101.1291 years. (At 384000 Hz, you get
11.614 years.)

Ample for foreseeable usage of the program. Still a far cry from the 13.26
million years you might naively calculate assuming a project of 2^64
samples, ignoring the other machine limitations.
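
The chain of limits above can be written out as constants. The names are invented for this sketch, and the length of a year (365.2425 days, the mean Gregorian year) is an assumption chosen because it reproduces the quoted figures:

```cpp
#include <cstdint>

constexpr std::uint64_t kAddressSpaceBytes = std::uint64_t{1} << 32; // 32 bit executable
constexpr std::uint64_t kSeqBlockBytes     = 16;                     // size quoted above
constexpr std::uint64_t kBlockFileBytes    = std::uint64_t{1} << 20; // 1 MB of sample data
constexpr std::uint64_t kSmallestSample    = 2;                      // 16 bit PCM

// 2^28 block descriptors, each covering at most 2^19 samples.
constexpr std::uint64_t kMaxBlocks  = kAddressSpaceBytes / kSeqBlockBytes;
constexpr std::uint64_t kMaxSamples = kMaxBlocks * (kBlockFileBytes / kSmallestSample);

// Assumed year length in seconds (365.2425 days).
constexpr double kSecondsPerYear = 365.2425 * 24 * 3600;

// Years of audio represented by kMaxSamples at the given sample rate.
constexpr double maxYears(double rateHz) {
    return static_cast<double>(kMaxSamples) / rateHz / kSecondsPerYear;
}
```

This is an upper bound only: it pretends the whole address space holds nothing but the descriptor array.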

PRL
Roger Dannenberg
2016-08-18 15:07:54 UTC
Permalink
Only 101 years? I've been robbed! Can I have my money back?
Adrian Wadey
2016-08-18 22:24:06 UTC
Permalink
When you run out.
Vaughan Johnson
2016-08-19 07:28:20 UTC
Permalink



I'd love to have a laptop that lasts more than 2 yrs!
Steve the Fiddle
2016-08-19 08:13:32 UTC
Permalink
Post by Vaughan Johnson
http://youtu.be/t8HoP9JHWTc
I'd love to have a laptop that lasts more than 2 yrs!
I think mine is coming up for its 8th birthday, but it's not been
recording all that time ;-)
I started working out how big the hard drive would need to be for 101
years, but it made me feel dizzy so I gave up.

A 'performance' of John Cage's "Organ2/ASLSP" (As SLow aS Possible)
began on 5th September 2001. It is due to end in 2640. A recent
snippet of the performance is posted on the official website for the
performance: http://www.aslsp.org/de/ (description in English:
http://universes-in-universe.org/eng/magazine/articles/2012/john_cage_organ_project_halberstadt)
I hope they are not trying to record it with Audacity.

Steve
Vaughan Johnson
2016-08-19 09:30:16 UTC
Permalink
2640! LOL. I hope to still be around then. And have time to listen to
the whole thing, after. Taking my supplements.

Talking about "As SLow As Possible" and dizzy, I heard this interview w/
Dizzy Gillespie on Marian McPartland's radio show where he described
playing with Ray Charles, in a big band. So Ray says "One!" to count the
tempo. Diz walks across the stage, talks a little with the bass player,
then walks back across the stage, and Ray says "two!" nearly a minute
later.

-- V
Federico Miyara
2016-08-19 19:04:36 UTC
Permalink
Steve,
Post by Steve the Fiddle
I started working out how big the hard drive would need to be for 101
years, but it made me feel dizzy so I gave up.
The answer is simple, at 44100 Hz, 16 bit, stereo:

101*365.25*24*60*60*44100*2*2 = 5.6224282464e+14 bytes
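
The same product can be checked in integer arithmetic. This is a sketch with a made-up function name, and it uses a 365.25-day year to match the figure above:

```cpp
#include <cstdint>

// Bytes of uncompressed PCM for a recording lasting the given number of years.
// 31557600 is the number of seconds in a 365.25-day year.
constexpr std::uint64_t pcmBytes(std::uint64_t years, std::uint64_t rateHz,
                                 std::uint64_t channels, std::uint64_t bytesPerSample) {
    return years * 31557600ULL * rateHz * channels * bytesPerSample;
}
```

pcmBytes(101, 44100, 2, 2) evaluates to 562242824640000, that is, about 562 terabytes.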
Post by Steve the Fiddle
A 'performance' of John Cage's "Organ2/ASLSP" (As SLow aS Possible)
began on 5th September 2001. It is due to end in 2640. A recent
snippet of the performance is posted on the official website for the
performance: http://www.aslsp.org/de/ (description in English:
http://universes-in-universe.org/eng/magazine/articles/2012/john_cage_organ_project_halberstadt)
I hope they are not trying to record it with Audacity.
This would be theoretically possible if it could go on recording in the
background while a new version is installed, which then takes control,
even if a new operating system is installed. This is a bit more
difficult but not impossible.

The long-term preservation of digital information is an intriguing and
fascinating topic.

Regards,

Federico



------------------------------------------------------------------------------
Federico Miyara
2016-08-18 22:39:37 UTC
Permalink
Paul

Nice calculation... I wish some hard disk could last at least 10
years... let alone 101.1291 years

Regards,

Federico
Post by Paul Licameli
Just a curiosity, but I think the 64 bit offset gives us a practical
limit of 101 years at 44100 kHz sampling and 16 bit PCM format -- not
the really astronomical numbers you might imagine.
Why?
Whenever a WaveClip is in memory, we store a contiguous array of
structures called SeqBlock describing the block files.
Each SeqBlock structure occupies 16 bytes (2^4).
The outer limit of addressable memory is 2^32 bytes, so long as we
compile a 32 bit executable.
Therefore, 2^28 SeqBlocks is an upper limit for the SeqBlocks we can
have in memory at a time.
Each block file is now limited to 1 MB (2 ^ 20 bytes) of sample data,
whatever the format.
The smallest format is 16 bit (2-byte) PCM.
Therefore a block has at most 2^19 samples.
Therefore a project is limited to 2^47 samples.
At 44100 sampling rate, that is 101.1291 years. (At 384000, you get
11.614 years.)
Ample for foreseeable usage of the program. Still a far cry from the
13.26 million years you might naively calculate assuming a project of
2^64 samples, ignoring the other machine limitations.
PRL
On Wed, Aug 17, 2016 at 7:14 AM, Paul Licameli
I think that when sample counts were changed to 64 bit values, not
all of the very many uses of that type were reexamined for
correctness. There may still be many narrowing conversions where
a sample count is assigned to a 32 bit int, with loss of
information. The int and long types are each only 4 bytes wide in
the Mac compiler.
But also, as I say, the wide type is used unnecessarily in places
for the size of a buffer of samples in memory, rather than
indicating a position in a sound file. In such cases size_t ought
to be used. We still compile as a 32 bit executable with a 4GB
address space where size_t is four bytes wide. Also size_t is an
unsigned type but sampleCount is signed.
PRL
On Wed, Aug 17, 2016 at 3:43 AM, Steve the Fiddle
On 17 August 2016 at 02:01, Paul Licameli
On Tue, Aug 16, 2016 at 8:49 PM, Gale Andrews
Post by Gale Andrews
Due to a bug, 13.5 hours at 44100 Hz was the practical
limit per clip
http://legacywiki.audacityteam.org/wiki/Recording_length#long
<http://legacywiki.audacityteam.org/wiki/Recording_length#long> .
I read there
"Here, samples are stored as 64-bit values"
Shouldn't thay say, "sample counts are stored as 64-bit values"
Also the cautions in the red box are only meant for older
versions of the
program, right?
That's the "legacy wiki", so yes it was written about old versions of
Audacity and nothing in that wiki should be taken as fact for versions
later than 2.0.6.
Yes some people do want to make recordings longer than 13.5 hours (as
witnessed by a steady stream of problems reported on the forum about
very long recordings).
These days we see less problems regarding very long recordings because
Audacity is generally better behaved with long recordings than old
versions, but we do still see problems.
Steve
PRL
Post by Gale Andrews
WAV (and AIFF) are 4 GB maximum size. RF64 does not have that
limitation.
Gale
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel
Paul Licameli
2016-08-18 22:48:19 UTC
Permalink
But I wonder what the total length is of all original audio recordings on
all types of analog and digital media in the world. It is surely growing
rapidly. How many orders of magnitude do you think that is? Could it
already be thousands of years? Where might it be at this century's end?

I am not very practiced with these order-of-magnitude brain teasers, like,
how many piano tuners are there in Chicago.

PRL
Post by Federico Miyara
Paul
Nice calculation... I wish some hard disk could last at least 10 years...
let alone 101.1291 years
Regards,
Federico
Post by Paul Licameli
Just a curiosity, but I think the 64 bit offset gives us a practical limit
of 101 years at 44100 Hz sampling and 16 bit PCM format -- not the really
astronomical numbers you might imagine.
Why?
Whenever a WaveClip is in memory, we store a contiguous array of
structures called SeqBlock describing the block files.
Each SeqBlock structure occupies 16 bytes (2^4).
The outer limit of addressable memory is 2^32 bytes, so long as we compile
a 32 bit executable.
Therefore, 2^28 SeqBlocks is an upper limit for the SeqBlocks we can have
in memory at a time.
Each block file is now limited to 1 MB (2^20 bytes) of sample data,
whatever the format.
The smallest format is 16 bit (2-byte) PCM.
Therefore a block has at most 2^19 samples.
Therefore a project is limited to 2^47 samples.
At 44100 sampling rate, that is 101.1291 years. (At 384000, you get
11.614 years.)
Ample for foreseeable usage of the program. Still a far cry from the
13.26 million years you might naively calculate assuming a project of 2^64
samples, ignoring the other machine limitations.
PRL
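
The arithmetic in the quoted calculation can be checked with a short C++ sketch. The constant and function names below are illustrative only, and the year length (a mean Gregorian year of 365.2425 days) is an assumption, chosen because it appears to reproduce the quoted figures.

```cpp
#include <cstdint>

// Illustrative names; only the figures themselves come from the thread.
constexpr unsigned kAddressSpaceLog2 = 32; // 32-bit executable: 2^32 bytes
constexpr unsigned kSeqBlockLog2     = 4;  // 16 bytes per SeqBlock
constexpr unsigned kBlockBytesLog2   = 20; // 1 MB of sample data per block file
constexpr unsigned kSampleBytesLog2  = 1;  // 16-bit PCM: 2 bytes per sample

// 2^28 SeqBlocks, each describing at most 2^19 samples, gives 2^47 samples.
constexpr std::uint64_t kMaxSamples =
    std::uint64_t{1} << ((kAddressSpaceLog2 - kSeqBlockLog2) +
                         (kBlockBytesLog2 - kSampleBytesLog2));

// Assumption: mean Gregorian year of 365.2425 days.
constexpr double kSecondsPerYear = 365.2425 * 24 * 60 * 60;

constexpr double yearsOfAudio(double samples, double sampleRate)
{
    return samples / sampleRate / kSecondsPerYear;
}

static_assert(kMaxSamples == (std::uint64_t{1} << 47), "2^28 blocks * 2^19 samples");

// About 101.13 years at 44100 Hz and about 11.61 years at 384000 Hz.
constexpr double kYearsAt44100  = yearsOfAudio(static_cast<double>(kMaxSamples), 44100.0);
constexpr double kYearsAt384000 = yearsOfAudio(static_cast<double>(kMaxSamples), 384000.0);
// About 13.26 million years for a naive 2^64 samples at 44100 Hz.
constexpr double kNaiveYears    = yearsOfAudio(18446744073709551616.0, 44100.0);
```

The limit comes from the memory needed for the SeqBlock array, not from the width of the offset type, which is why it scales with the per-block sample count and not with 2^64.
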
Post by Paul Licameli
I think that when sample counts were changed to 64 bit values, not all of
the very many uses of that type were reexamined for correctness. There may
still be many narrowing conversions where a sample count is assigned to a
32 bit int, with loss of information. The int and long types are each only
4 bytes wide in the Mac compiler.
But also, as I say, the wide type is used unnecessarily in places for the
size of a buffer of samples in memory, rather than indicating a position in
a sound file. In such cases size_t ought to be used. We still compile as
a 32 bit executable with a 4GB address space where size_t is four bytes
wide. Also size_t is an unsigned type but sampleCount is signed.
PRL
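
A minimal C++ sketch of the narrowing hazard described above. The alias below only stands in for the real typedef (the thread says it is a signed 64-bit type), and fitsInInt32 is a hypothetical helper, not something in the Audacity sources.

```cpp
#include <cstdint>
#include <limits>

// Stand-in for the real typedef: a signed 64-bit type, per the thread.
using sampleCount = std::int64_t;

// Hypothetical check: true when n can be stored in a 32-bit int unchanged.
constexpr bool fitsInInt32(sampleCount n)
{
    return n >= std::numeric_limits<std::int32_t>::min() &&
           n <= std::numeric_limits<std::int32_t>::max();
}

// 13 hours at 44100 Hz is 2,063,880,000 samples: still below 2^31 - 1.
constexpr sampleCount kThirteenHours = sampleCount{13} * 3600 * 44100;
// 14 hours at 44100 Hz is 2,222,640,000 samples: too big for a 32-bit int.
constexpr sampleCount kFourteenHours = sampleCount{14} * 3600 * 44100;

static_assert(fitsInInt32(kThirteenHours), "13 hours still fits");
static_assert(!fitsInInt32(kFourteenHours), "14 hours would lose information");
```

GCC and Clang can flag many such assignments when built with -Wconversion, which may be a quicker way to find them than reading every use of the type.
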
Vaughan Johnson
2016-08-19 07:30:05 UTC
Permalink
The number of piano tuners must be decreasing, because you can get pianos
for *free* these days.
Vaughan Johnson
2016-08-19 07:41:34 UTC
Permalink
to wit:

http://sfist.com/2016/08/17/drunk_san_francisco_man_builds_webs.php

http://sfist.com/2013/06/28/why_is_there_a_piano_on_top_of_bern.php

https://pianopound.com/about/


I just need to find time & space to get 3 or 4!


===

All my cassette tapes are failing, turning into just noise. Brownian
motion?

===


-- Vaughan
Martyn Shaw
2016-08-23 23:50:33 UTC
Permalink
Hi

Has this question been answered satisfactorily?

Is it still an open question?

TTFN
Martyn
Paul Licameli
2016-08-24 13:38:02 UTC
Permalink
I think I have got satisfactory answers.

Audio files over 13.5 hours long are unusual but likely enough that we
ought to accommodate them, therefore we use this wide integral type, called
sampleCount.

My examination of the code, though, convinces me that this type is also
used unnecessarily in many places.

Where we need an offset into an audio file or into a WaveTrack or WaveClip
or Sequence -- use sampleCount. Where we need the difference of two such
values, also use sampleCount.

But where describing the number of samples that fit into a buffer in memory
or in a block file, I think size_t should be used instead.

sampleCount should not be used just whenever counting samples. Perhaps the
type name was a misnomer and sampleOffset would have been better.
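
A minimal C++ sketch of the split described above, assuming a signed 64-bit alias for positions and plain size_t for in-memory counts. The helper samplesToRead is hypothetical and only illustrates where the two types meet; it is not taken from the Audacity sources.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for the real typedef: signed and wide, for a position in a
// track or the difference of two positions.
using sampleCount = std::int64_t;

// Hypothetical helper: how many samples to read next, given the capacity
// of an in-memory buffer (size_t) and the distance to the end of the
// track (sampleCount, which may be huge or, after a time shift, negative).
constexpr std::size_t samplesToRead(std::size_t capacity, sampleCount remaining)
{
    return remaining <= 0
               ? std::size_t{0}
               : (static_cast<std::uint64_t>(remaining) < capacity
                      ? static_cast<std::size_t>(remaining)
                      : capacity);
}

static_assert(samplesToRead(4096, 10) == 10, "short tail fits in the buffer");
static_assert(samplesToRead(4096, sampleCount{1} << 40) == 4096, "clamped to capacity");
static_assert(samplesToRead(4096, -5) == 0, "nothing to read before the start");
```

The conversion from the wide signed type to size_t happens in exactly one place, after the value is known to be non-negative and smaller than the buffer, so it cannot lose information even where size_t is four bytes wide.
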

PRL
Roger Dannenberg
2016-08-24 14:58:29 UTC
Permalink
My take on this is that the overhead of using 64-bit integers is quite
small in the scheme of things. Furthermore, word, pointer, and integer
sizes are a constant source of pain and portability problems in C and
C++. Looking at this historically, the mindset when C and C++ were
developed was that we needed a high-level way to express the low-level
stuff we wanted to do: manipulate bits, use memory efficiently, etc. C
tries to have it both ways: if you want an integer, you write int and
let the compiler pick the best (fastest?) implementation, but if you
want a 32-bit int, you use conditional compilation or system-dependent
definitions to get it. In practice, all this depends on being very
careful with assumptions and definitions. I think one of the reasons
Java and Python became popular is that they handle portability at the
virtual machine level, so you don't have int meaning different things on
different systems.
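
As a small illustration of the point about integer widths, here is a C++ sketch that states the guarantees as compile-time checks, in the same spirit as the static_assert at the start of the thread. The helper bitsOf is a made-up name; the fixed-width aliases come from the standard <cstdint> header, available since C++11.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Made-up helper: the storage width of a type in bits.
template <typename T>
constexpr std::size_t bitsOf()
{
    return sizeof(T) * CHAR_BIT;
}

// The built-in types only promise minimum widths, so these vary by platform.
static_assert(bitsOf<int>() >= 16, "int is at least 16 bits");
static_assert(bitsOf<long>() >= 32, "long is at least 32 bits");
static_assert(bitsOf<long long>() >= 64, "long long is at least 64 bits");

// The fixed-width aliases, where the platform provides them, are exact.
static_assert(bitsOf<std::int32_t>() == 32, "exactly 32 bits");
static_assert(bitsOf<std::int64_t>() == 64, "exactly 64 bits");
```

Writing the assumption down this way turns a silent portability problem into a build failure on any platform where it does not hold.
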

Furthermore, it is very difficult to anticipate the implications and
interactions with different integer sizes -- programmers just think in
terms of ideal integers most of the time, and thinking about how
implementations can diverge from that perfect model is a huge burden,
especially in systems like Audacity that are both large and worked on by
many people. (Cf.
https://www.researchgate.net/publication/221542289_As-If_Infinitely_Ranged_Integer_Model
for some serious integer-related bug discovery.)

If we spend any time optimizing, it should be based on profiling. Since
cache lines are longer than 64 bits, I bet the additional overhead of
64-bit ints is very small compared to 32-bit ints, and the additional
overhead of arithmetic on 64-bit values in the cache is swamped by
memory loads and stores.

To summarize, my approach would be: don't sweat the 64-bit ints, and if
there's any time for optimization, use profiling to spend optimization
efforts wisely.

-Roger
Paul Licameli
2016-08-24 15:14:22 UTC
Permalink
Good thoughts, Roger.

But my motivation is not so much optimization as the appropriate use of
size_t, which is the unsigned type describing sizes of things that are
certain to fit in memory.

I see too much indiscriminate use of sampleCount expressions (which might
be large, or negative) as subscripts or as sizes passed to the array
version of operator new. I got the bug in my brain now to inspect all of
those conversions for correctness. That's not so hard as it seems, if you
use the type system the right way to help the C++ compiler help you. I can
redefine types in my build so that compilation fails unless I do something
explicit when sampleCount values need to be narrowed, and find a proof in
those places that the narrowing is not losing any nonzero bits.
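
A minimal sketch of that idea, assuming a wrapper class stands in for the plain typedef during the audit. The name SampleOffset and its members are hypothetical, not the actual Audacity code, and the runtime check here is only a stand-in for the by-hand proof described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical wrapper, for illustration only.
class SampleOffset {
public:
    explicit SampleOffset(std::int64_t value) : mValue(value) {}

    std::int64_t value() const { return mValue; }

    // Narrowing has to be requested by name; nothing happens implicitly.
    std::size_t as_size_t() const {
        if (mValue < 0)
            throw std::range_error("negative sample offset used as a size");
        if (static_cast<std::uint64_t>(mValue) >
            std::numeric_limits<std::size_t>::max())
            throw std::range_error("sample offset does not fit in size_t");
        return static_cast<std::size_t>(mValue);
    }

    // Differences of offsets stay wide and signed.
    friend SampleOffset operator-(SampleOffset a, SampleOffset b) {
        return SampleOffset(a.mValue - b.mValue);
    }

private:
    std::int64_t mValue;
};

// Using a SampleOffset directly as a subscript or array size will not compile.
static_assert(!std::is_convertible<SampleOffset, std::size_t>::value,
              "narrowing must go through as_size_t()");
```

With such a type, every place that needs a size_t has to call as_size_t() explicitly, so a search for that name lists exactly the conversions that need reviewing.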

PRL
Paul Licameli
2016-08-24 15:17:32 UTC
Permalink
... And let me add that in commit 1189cfd62a2d30f5e61c00183a387c3d767ed698
I have commented the few such conversions that I have decided are the
really problematic ones.

PRL
Steve the Fiddle
2016-08-24 15:49:41 UTC
Permalink
The other 'benefit' of using sampleCount when counting samples, even
in cases where size_t is big enough, is that it can be a reminder that
we are dealing with a count of samples and not some other integer
quantity, and so avoid silly mistakes such as multiplying a buffer
size by a loop count and ending up with a duration in samples as
size_t. We only need one such bug to creep in to make the time spent
changing occurrences of sampleCount to size_t counter-productive. I'm
not keen on fixing things that aren't broken, especially if they make
the meaning/intent less clear.
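
To make the kind of mistake concrete, here is a small sketch. It assumes std::uint32_t as a stand-in for size_t on a 32-bit build, so the wraparound is reproducible on any platform; the function names are hypothetical.

```cpp
#include <cstdint>

// Stand-in for size_t on a 32-bit build (an assumption for illustration).
using size32_t = std::uint32_t;

// Duration computed in the narrow unsigned type: wraps silently modulo 2^32.
constexpr size32_t total_narrow(size32_t bufferSize, size32_t loops) {
    return bufferSize * loops;
}

// Same calculation, widened to a signed 64-bit type before multiplying.
constexpr std::int64_t total_wide(size32_t bufferSize, size32_t loops) {
    return static_cast<std::int64_t>(bufferSize) * loops;
}
```

For a buffer of 1048576 samples processed 5000 times, the wide result is 5242880000 samples, while the narrow one wraps to 947912704 with no diagnostic.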

Steve
Steve the Fiddle
2016-08-24 16:10:53 UTC
Permalink
In commit 1189cfd62, is this safe on all platforms?

// Protect Nyquist from selections greater than 2^31 samples (bug 439)
#define NYQ_MAX_LEN (std::numeric_limits<long>::max())
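
For context on the question: long is 32 bits wide on 64-bit Windows but 64 bits wide on 64-bit Linux and macOS, so std::numeric_limits<long>::max() is not 2^31 - 1 everywhere. Below is a sketch of a limit that does not depend on the width of long, assuming the intent really is 2^31 - 1 samples as the comment suggests. The names kNyqMaxLenSketch and hours_until are hypothetical and not part of the commit.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical constant: the same value on every platform (2^31 - 1).
constexpr std::int64_t kNyqMaxLenSketch =
    std::numeric_limits<std::int32_t>::max();

// Hours of audio at sampleRate (in Hz) needed to reach maxSamples samples.
constexpr double hours_until(std::int64_t maxSamples, double sampleRate) {
    return static_cast<double>(maxSamples) / sampleRate / 3600.0;
}
```

At 44.1 kHz, hours_until(kNyqMaxLenSketch, 44100.0) comes to a little over 13.5 hours, which matches the signed 32-bit figure quoted at the start of this thread.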

Steve
James Crook
2016-08-24 16:28:23 UTC
Permalink
I think you (Steve), Paul and Roger are all correct. And I still
approve of what Paul is doing.

We had a recent CVE, fixed in Audacity 2.1.2, which was related to
integer overflow in a library we used. Paul's work is helping ensure we
don't have similar issues with our own code. Our existing code is an
unholy mix of integer and floating types. I see what Paul is doing less
as optimisation (our code is often already partially optimised) and more
as giving us more guarantees of correctness on the conversions. "If it
ain't broke, don't fix it" is good. However, I'm fairly sure there are
pieces of our code that are broken without our knowing it, and Paul is
finding and fixing some of those in his integer conversion review.

--James.
Post by Steve the Fiddle
The other 'benefit' of using sampleCount when counting samples, even
in cases where size_t is big enough, is that it can be a reminder that
we are dealing with a count of samples and not some other integer
quantity, and so avoid silly mistakes such as multiplying a buffer
size by a loop count and ending up with a duration in samples as
size_t. We only need one such bug to creep in to make the time spent
changing occurrences of sampleCount to size_t counter-productive. I'm
not keen on fixing things that aren't broken, especially if they make
the meaning/intent less clear.
Steve
... And let me add that in commit 1189cfd62a2d30f5e61c00183a387c3d767ed698 I
have commented the few such conversions that I have decided are the really
problematic ones.
PRL
Post by Paul Licameli
Good thoughts, Roger.
But my motivation is not so much about the optimization, as instead the
appropriate use of size_t which is the unsigned type describing sizes of
things that are certain to fit inside of memory.
I see too much indiscriminate use of sampleCount expressions (which might
be large, or negative) as subscripts or as sizes passed to the array version
of operator new. I got the bug in my brain now to inspect all of those
conversions for correctness. That's not so hard as it seems, if you use the
type system the right way to help the C++ compiler help you. I can redefine
types in my build so that compilation fails unless I do something explicit
when sampleCount values need to be narrowed, and find a proof in those
places that the narrowing is not losing any nonzero bits.
PRL
Post by Roger Dannenberg
My take on this is that the overhead of using 64-bit integers is quite
small in the scheme of things. Furthermore, word, pointer, and integer sizes
are a constant source of pain and portability problems in C and C++. Looking
at this historically, the mindset when C and C++ were developed was that we
if you want an integer, you write int and let the compiler pick the best
(fastest?) implementation, but if you want a 32-bit int, you use conditional
compilation or system-dependent definitions to get it. In practice, all this
depends on being very careful with assumptions and definitions. I think one
of the reasons Java and Python became popular is that they handle
portability at the virtual machine level, so you don't have int meaning
different things on different systems.
Furthermore, it is very difficult to anticipate the implications and
interactions with different integer sizes -- programmers just think in terms
of ideal integers most of the time, and thinking about how implementations
can diverge from that perfect model is a huge burden, especially in systems
like Audacity that are both large and worked on by many people. (Cf.
https://www.researchgate.net/publication/221542289_As-If_Infinitely_Ranged_Integer_Model
for some serious integer-related bug discovery.)
If we spend any time optimizing, it should be based on profiling - since
cache lines are longer than 64 bits, I bet the additional overhead of 64-bit
ints is very small compared to 32-bit ones, and the additional overhead of
arithmetic on 64-bit values in the cache is swamped by memory loads and
stores.
To summarize, my approach would be don't sweat the 64-bit ints and if
there's any time for optimization, use profiling to spend optimization
efforts wisely.
-Roger
I think I have got satisfactory answers.
Audio files over 13.5 hours long are unusual but likely enough that we
ought to accommodate them, therefore we use this wide integral type, called
sampleCount.
My examination of the code, though, convinces me that this type is also
used unnecessarily in many places.
Where we need an offset into an audio file or into a WaveTrack or
WaveClip or Sequence -- use sampleCount. Where we need the difference of
two such values, also use sampleCount.
But where describing the number of samples that fit into a buffer in
memory or in a block file, I think size_t should be used instead.
sampleCount should not be used just whenever counting samples. Perhaps
the type name was a misnomer and sampleOffset would have been better.
PRL
Post by Martyn Shaw
Hi
Has this question been answered satisfactorily?
Is it still an open question?
TTFN
Martyn
Post by Paul Licameli
typedef sampleCount is a signed 64 bit quantity on Windows, and long
long on others. I can tell that on Mac that is the same width because
this compiles without error (using the new C++11 keyword static_assert):
static_assert(sizeof(long long) == 8, "");
Someone tell me if that compiles on Linux too.
Who recalls why it was decided that the type should be this wide? You
need over 27.05 hours of audio, at 44.1 kHz sampling rate, to overflow
an unsigned 32 bit integer, or half of that for a signed integer.
Who ever makes a track that long?
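The arithmetic behind those figures can be checked with a short sketch. The helper name hoursToOverflow is made up for illustration and is not part of Audacity; it only divides the largest value a counter type can hold by the sample rate.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hours of audio at the given sample rate before a sample counter of
// integer type T runs out of range.
// hoursToOverflow is a hypothetical helper, not an Audacity function.
template <typename T>
double hoursToOverflow(double sampleRate)
{
    const double maxCount = static_cast<double>(std::numeric_limits<T>::max());
    const double seconds = maxCount / sampleRate;
    return seconds / 3600.0;
}
```

At 44.1 kHz this gives roughly 27.05 hours for a 32-bit unsigned counter and roughly 13.53 hours for a signed one, which matches the numbers above. At 384000 Hz the unsigned limit drops to a little over 3.1 hours.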
Well maybe it is sometimes done, because I answered a question from
someone on the Audacity Google group, who was trying to export more
than 13.5 hours of audio at once. The re-imported audio was truncated
to less than that length, but I had reason to believe it was because
of the limitations of the .wav format and not the fault of our own code.
There is also the question whether existing uses of sampleCount should
be signed or unsigned. Sometimes it really is meant to be a count, as
of a buffer size, so unsigned would be appropriate, but then the width
is probably excessive for the purpose. (The width of size_t is only 4
bytes on my Mac.)
But sometimes it is meant as a quantized time identifying a place in
an audio track, and negative times are not impossible (see what time
shift can do), so it should be signed, and also very wide, if we
really mean to support days-long tracks and the possible very big
positive values.
PRL
------------------------------------------------------------------------------
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel
------------------------------------------------------------------------------
Paul Licameli
2016-08-24 18:36:19 UTC
Permalink
Thank you for supporting this, James.

Really this thing began as a big detour from the project of eliminating
naked news and deletes. I have finished that for the scalar case. But not
for the array new[] and delete[]. I wanted to wrap those in a class and
require the size arguments to be of type size_t. Then I found I was
casting sampleCount to size_t far too often. So I was led to reexamine all
the uses of sampleCount.
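
To illustrate the kind of wrapper meant here, a minimal sketch follows. The class name OwnedArray and its members are assumptions made for this example, not the class actually used in the Audacity source. The deleted constructor template is one possible way to make the compiler reject any size argument that is not already a size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical owning array wrapper (illustration only). The array can be
// allocated only from a size_t count, so a 64-bit signed sample position
// cannot be passed without the caller converting it explicitly first.
template <typename T>
class OwnedArray
{
public:
    explicit OwnedArray(std::size_t count)
        : mData(new T[count]()), mCount(count) {}

    // Any other integer type selects this deleted overload, so the call
    // fails to compile instead of converting silently.
    template <typename Integer>
    explicit OwnedArray(Integer) = delete;

    T &operator[](std::size_t index) { return mData[index]; }
    const T &operator[](std::size_t index) const { return mData[index]; }
    std::size_t size() const { return mCount; }

private:
    std::unique_ptr<T[]> mData; // releases the storage with delete[]
    std::size_t mCount;
};
```

With this sketch, OwnedArray<float> buffer(someSampleCount) does not compile when someSampleCount is a long long, which is exactly the situation that forces each cast to be looked at.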

PRL
Post by James Crook
I think you (Steve), Paul and Roger are all correct. And I still
approve of what Paul is doing.
We had a recent CVE, fixed in Audacity 2.1.2, which was related to
integer overflow in a library we used. Paul's work is helping ensure we
don't have similar issues with our own code. Our existing code is an
unholy mix of integer and floating types. I see what Paul is doing less
as optimisation (our code often is already partially optimised) and more
as giving us more guarantees of correctness on the conversions. "If it
ain't broke, don't fix it" is good. However, I'm fairly sure there are
pieces of our code that are broke, and we don't know it, and Paul is
finding/fixing some of those in his integer conversion review.
--James.
Post by Steve the Fiddle
The other 'benefit' of using sampleCount when counting samples, even
in cases where size_t is big enough, is that it can be a reminder that
we are dealing with a count of samples and not some other integer
quantity, and so avoid silly mistakes such as multiplying a buffer
size by a loop count and ending up with a duration in samples as
size_t. We only need one such bug to creep in to make the time spent
changing occurrences of sampleCount to size_t counter-productive. I'm
not keen on fixing things that aren't broken, especially if they make
the meaning/intent less clear.
Steve
------------------------------------------------------------------------------
James Crook
2016-08-24 19:01:39 UTC
Permalink
Post by Paul Licameli
Thank you for supporting this, James.
Very easy to support :-)
Post by Paul Licameli
Really this thing began as a big detour from the project of eliminating
naked news and deletes. I have finished that for the scalar case. But not
for the array new[] and delete[]. I wanted to wrap those in a class and
require the size arguments to be of type size_t. Then I found I was
casting sampleCount to size_t far too often. So I was led to reexamine all
the uses of sampleCount.
Such is the way. Audacity is very tangled. Lessons to be learned from
why, and interesting to think about alternatives - not just in the
int-sizing context.
Post by Paul Licameli
PRL
Steve (in an offlist) is terrified of something being broken by the
extensive changes. I think if you find tangible examples of where
narrowing potentially cost us dearly - in the same way that the CVE did
- that would help shift the emotion. The 'risk' (which is actually
small when looked at right) is more than offset by the tangible gain.
Steve doesn't see 'automatically checked' code as such a tangible gain,
as the idioms are unfamiliar to him, and he isn't expecting to get used
to them quickly. The idioms aren't familiar to me either, but I do
expect to get used to them quickly.


--James.





------------------------------------------------------------------------------
Steve the Fiddle
2016-08-24 20:35:44 UTC
Permalink
Post by James Crook
Post by Paul Licameli
Thank you for supporting this, James.
Very easy to support :-)
Post by Paul Licameli
Really this thing began as a big detour from the project of eliminating
naked news and deletes. I have finished that for the scalar case. But not
for the array new[] and delete[]. I wanted to wrap those in a class and
require the size arguments to be of type size_t. Then I found I was
casting sampleCount to size_t far too often. So I was led to reexamine all
the uses of sampleCount.
Such is the way. Audacity is very tangled. Lessons to be learned from
why, and interesting to think about alternatives - not just in the
int-sizing context.
Post by Paul Licameli
PRL
Steve (in an offlist) is terrified of something being broken by the
extensive changes.
Somewhat over-egged, James. I'm "concerned", which, considering the vast
number of changes and the few tangible benefits, I don't think is
unreasonable.

I can see that code review has benefits for "quality assurance", but
I'm less enthused by changes that appear (to me) to be cosmetic. James
has kindly explained to me that there are possibly marginal benefits
to some of the changes that I saw to be cosmetic, so I am partially
reassured by James taking some responsibility in supporting these
changes.

Steve
Post by James Crook
I think if you find tangible examples of where
narrowing potentially cost us dearly - in the same way that the CVE did
- that would help shift the emotion. The 'risk' (which is actually
small when looked at right) is more than offset by the tangible gain.
Steve doesn't see 'automatically checked' code as such a tangible gain,
as the idioms are unfamiliar to him, and he isn't expecting to get used
to them quickly. The idioms aren't familiar to me either, but I do
expect to get used to them quickly.
--James.
------------------------------------------------------------------------------
Paul Licameli
2016-08-24 20:36:51 UTC
Permalink
Post by James Crook
Post by Paul Licameli
Thank you for supporting this, James.
Very easy to support :-)
Post by Paul Licameli
Really this thing began as a big detour from the project of eliminating
naked news and deletes. I have finished that for the scalar case. But not
for the array new[] and delete[]. I wanted to wrap those in a class and
require the size arguments to be of type size_t. Then I found I was
casting sampleCount to size_t far too often. So I was led to reexamine all
the uses of sampleCount.
Such is the way. Audacity is very tangled. Lessons to be learned from
why, and interesting to think about alternatives - not just in the
int-sizing context.
Post by Paul Licameli
PRL
Steve (in an offlist) is terrified of something being broken by the
extensive changes. I think if you find tangible examples of where
narrowing potentially cost us dearly - in the same way that the CVE did
- that would help shift the emotion. The 'risk' (which is actually
Remind me, what is CVE?

PRL
Post by James Crook
small when looked at right) is more than offset by the tangible gain.
Steve doesn't see 'automatically checked' code as such a tangible gain,
as the idioms are unfamiliar to him, and he isn't expecting to get used
to them quickly. The idioms aren't familiar to me either, but I do
expect to get used to them quickly.
--James.
------------------------------------------------------------------------------
James Crook
2016-08-24 22:05:56 UTC
Permalink
Post by Paul Licameli
Remind me, what is CVE?
http://www.cvedetails.com/cve/CVE-2009-0490/

--James.

------------------------------------------------------------------------------
Richard Ash
2016-08-29 21:14:18 UTC
Permalink
On Wed, 24 Aug 2016 14:36:19 -0400
Post by Paul Licameli
Thank you for supporting this, James.
Really this thing began as a big detour from the project of
eliminating naked news and deletes. I have finished that for the
scalar case. But not for the array new[] and delete[]. I wanted to
wrap those in a class
This is something I can thoroughly support.
Post by Paul Licameli
and require the size arguments to be of type
size_t. Then I found I was casting sampleCount to size_t far too
often. So I was led to reexamine all the uses of sampleCount.
This doesn't make any sense to me however. If you need casts to make
your own class work, then it's almost certainly designed wrong.

I would expect your class to have an overload which accepts sampleCount
arguments (with a warning pragma on the method if you really feel it is
undesirable to use), which then does a safe (i.e. runtime range checked)
conversion to size_t internally. This has the property of minimising
the scope of changes, whilst still enabling and encouraging better error
checking (which is a very desirable aim).

Note also that the definition of size_t varies between platforms, so
you will need to be careful about that conversion (another good reason
for doing it in an overload method, not by casting).
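
A rough sketch of what such an overload could look like is below. It assumes sampleCount is a plain 64-bit signed alias, and the names checkedSize and makeBuffer are invented for the example. The range check compares against the limits of size_t on whatever platform is being compiled, so it does not depend on size_t having a particular width.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <stdexcept>

// Assumption for this sketch: sampleCount is a 64-bit signed alias.
using sampleCount = long long;

// Hypothetical helper: convert to size_t, throwing if the value is
// negative or does not fit in this platform's size_t.
inline std::size_t checkedSize(sampleCount value)
{
    if (value < 0)
        throw std::range_error("negative sample count");
    const unsigned long long wide = static_cast<unsigned long long>(value);
    if (wide > std::numeric_limits<std::size_t>::max())
        throw std::range_error("sample count exceeds size_t");
    return static_cast<std::size_t>(wide);
}

// The preferred entry point takes size_t directly.
inline std::unique_ptr<float[]> makeBuffer(std::size_t count)
{
    return std::unique_ptr<float[]>(new float[count]());
}

// The convenience overload accepts sampleCount and does the
// range-checked conversion in one place instead of at every call site.
inline std::unique_ptr<float[]> makeBuffer(sampleCount count)
{
    return makeBuffer(checkedSize(count));
}
```

One trade-off of the overload pair: a call with a plain int literal is ambiguous between the two, so callers have to pass a value that already has one of the two types.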

Richard

------------------------------------------------------------------------------
Paul Licameli
2016-08-24 16:28:48 UTC
Permalink
Post by Steve the Fiddle
The other 'benefit' of using sampleCount when counting samples, even
in cases where size_t is big enough, is that it can be a reminder that
we are dealing with a count of samples and not some other integer
quantity, and so avoid silly mistakes such as multiplying a buffer
size by a loop count and ending up with a duration in samples as
size_t. We only need one such bug to creep in to make the time spent
changing occurrences of sampleCount to size_t counter-productive. I'm
not keen on fixing things that aren't broken, especially if they make
the meaning/intent less clear.
Steve
I don't understand the hypothetical. But it is true that there are places
where a sampleCount variable accumulates a sum of buffer sizes, and the sum
may need to grow beyond the bounds of size_t.

That's widening, which doesn't bother me so much. It's incorrect narrowing
of sampleCount that bothers me.
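
As a sketch of the distinction, under the assumption that sampleCount is a 64-bit signed alias and with a made-up function name, a block-processing loop might keep the block length in size_t while the running position stays in sampleCount:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumption for this sketch: sampleCount is a 64-bit signed alias.
using sampleCount = long long;

// Walk a long track in fixed-size blocks (countProcessed is hypothetical).
// Block lengths fit in memory, so they are size_t. The running position can
// exceed a 32-bit size_t, so it is accumulated in the wider sampleCount.
inline sampleCount countProcessed(sampleCount trackLength, std::size_t blockSize)
{
    if (blockSize == 0)
        return 0; // nothing can be processed; avoid looping forever

    sampleCount position = 0;
    while (position < trackLength) {
        const sampleCount remaining = trackLength - position;
        // Narrowing is safe here: the value is positive and at most blockSize.
        const std::size_t thisBlock = static_cast<std::size_t>(
            std::min<sampleCount>(remaining, static_cast<sampleCount>(blockSize)));
        // ... process thisBlock samples ...
        position += static_cast<sampleCount>(thisBlock); // widening
    }
    return position;
}
```

The only narrowing in the loop is the one bounded by blockSize, and that is the kind of place where a short proof of safety can be written in a comment.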

And in fact there are a few places where a sampleCount-typed expression
is wrongly assigned to a long or int variable. It would be useful to
redefine sampleCount not simply as a type alias, but as a class with
explicit conversion operators, so that such errors would simply fail
compilation.
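
A minimal sketch of that idea, not the definition that is in the source tree: the class name SampleCount (capitalised to keep it apart from the existing typedef) and its members are assumptions for illustration. Widening into the class is implicit, while getting a narrower type back out needs an explicit cast.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

// Hypothetical class-based replacement for the sampleCount alias.
class SampleCount
{
public:
    SampleCount() : mValue(0) {}
    SampleCount(long long value) : mValue(value) {} // widening is implicit

    // Conversions out must be spelled with a cast at the call site.
    explicit operator long long() const { return mValue; }

    explicit operator std::size_t() const
    {
        // Debug-build check that no nonzero bits are lost.
        assert(mValue >= 0);
        assert(static_cast<unsigned long long>(mValue) <=
               std::numeric_limits<std::size_t>::max());
        return static_cast<std::size_t>(mValue);
    }

    friend SampleCount operator+(SampleCount a, SampleCount b)
    { return SampleCount(a.mValue + b.mValue); }
    friend SampleCount operator-(SampleCount a, SampleCount b)
    { return SampleCount(a.mValue - b.mValue); }
    friend bool operator<(SampleCount a, SampleCount b)
    { return a.mValue < b.mValue; }

private:
    long long mValue;
};

// Implicit narrowing such as `int n = count;` is now a compile error.
static_assert(!std::is_convertible<SampleCount, int>::value,
              "no implicit conversion to int");
static_assert(!std::is_convertible<SampleCount, long>::value,
              "no implicit conversion to long");
static_assert(!std::is_convertible<SampleCount, std::size_t>::value,
              "no implicit conversion to size_t");
```

Every remaining static_cast<std::size_t>(count) then marks a spot that can be searched for and reviewed.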

PRL
------------------------------------------------------------
------------------
Post by Paul Licameli
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel
------------------------------------------------------------
------------------
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel
Roger Dannenberg
2016-08-24 16:05:30 UTC
Permalink
I guess it all depends on being clever enough to "use the type system
the right way" -- type systems are still a very active area of research,
which to me indicates both that the C++ type system leaves a lot to be
desired and that a lot of really smart people have decided that type
systems are still a worthy area to pursue. My own experience
trying to use C's type system has had mixed results, but I'm all in
favor of your plan to verify that narrowing is safe. -Roger
Post by Paul Licameli
Good thoughts, Roger.
But my motivation is not so much about the optimization, as instead
the appropriate use of size_t which is the unsigned type describing
sizes of things that are certain to fit inside of memory.
I see too much indiscriminate use of sampleCount expressions (which
might be large, or negative) as subscripts or as sizes passed to the
array version of operator new. I got the bug in my brain now to
inspect all of those conversions for correctness. That's not so hard
as it seems, if you use the type system the right way to help the C++
compiler help you. I can redefine types in my build so that
compilation fails unless I do something explicit when sampleCount
values need to be narrowed, and find a proof in those places that the
narrowing is not losing any nonzero bits.
PRL
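
The plan quoted above (make compilation fail unless narrowing is explicit) could be sketched roughly as follows. This is only an illustration under assumptions, not Audacity's code: the class name SampleOffset and the member asSizeT are hypothetical, and the run-time check stands in for the "proof" that no nonzero bits are lost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for sampleCount: a signed 64-bit value that
// offers no implicit conversion to any narrower or unsigned type.
class SampleOffset {
public:
    explicit SampleOffset(std::int64_t v) : value_(v) {}

    std::int64_t get() const { return value_; }

    // The only route to size_t: explicit at the call site, checked at run time.
    std::size_t asSizeT() const {
        if (value_ < 0)
            throw std::range_error("negative offset used as a size");
        if (static_cast<std::uint64_t>(value_) >
            std::numeric_limits<std::size_t>::max())
            throw std::range_error("offset too large for size_t");
        return static_cast<std::size_t>(value_);
    }

    // Differences of offsets stay in the wide signed type.
    friend SampleOffset operator-(SampleOffset a, SampleOffset b) {
        return SampleOffset(a.value_ - b.value_);
    }

private:
    std::int64_t value_;
};
```

With a type like this, an expression such as new float[offset] or buffer[offset] no longer compiles, so every place that needs a size has to call asSizeT() and can be inspected one by one.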
My take on this is that the overhead of using 64-bit integers is
quite small in the scheme of things. Furthermore, word, pointer,
and integer sizes are a constant source of pain and portability
problems in C and C++. Looking at this historically, the mindset
when C and C++ were developed was that we needed a high-level way
to express the low-level stuff we wanted to do: manipulate bits,
use memory efficiently, etc. C tries to have it both ways: if you
want an integer, you write int and let the compiler pick the best
(fastest?) implementation, but if you want a 32-bit int, you use
conditional compilation or system-dependent definitions to get it.
In practice, all this depends on being very careful with
assumptions and definitions. I think one of the reasons Java and
Python became popular is that they handle portability at the
virtual machine level, so you don't have int meaning different
things on different systems.
Furthermore, it is very difficult to anticipate the implications
and interactions with different integer sizes -- programmers just
think in terms of ideal integers most of the time, and thinking
about how implementations can diverge from that perfect model is a
huge burden, especially in systems like Audacity that are both
large and worked on by many people. (Cf.
https://www.researchgate.net/publication/221542289_As-If_Infinitely_Ranged_Integer_Model
for some serious integer-related bug discovery.)
If we spend any time optimizing, it should be based on profiling -
since cache lines are longer than 64 bits, I bet the additional
overhead of 64-bit ints is very small compared to 32 bits, and the
additional overhead of arithmetic on 64-bit values in the cache is
swamped by memory loads and stores.
To summarize, my approach would be: don't sweat the 64-bit ints, and
if there's any time for optimization, use profiling to spend
optimization efforts wisely.
-Roger
Post by Paul Licameli
I think I have got satisfactory answers.
Audio files over 13.5 hours long are unusual but likely enough
that we ought to accommodate them, therefore we use this wide
integral type, called sampleCount.
My examination of the code, though, convinces me that this type
is also used unnecessarily in many places.
Where we need an offset into an audio file or into a WaveTrack or
WaveClip or Sequence -- use sampleCount. Where we need the
difference of two such values, also use sampleCount.
But where describing the number of samples that fit into a buffer
in memory or in a block file, I think size_t should be used instead.
sampleCount should not be used just whenever counting samples.
Perhaps the type name was a misnomer and sampleOffset would have
been better.
PRL
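
As a rough illustration of the division of labor described above (wide signed values for positions in a track, size_t for what fits in a memory buffer), here is a minimal sketch. The alias SamplePos and the function nextChunkSize are hypothetical names chosen for this example, not identifiers from the Audacity source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical alias: a position in a track, wide and signed.
using SamplePos = std::int64_t;

// How many samples to process in the next pass: never more than the
// buffer holds, never more than remain in the track, never negative.
std::size_t nextChunkSize(std::size_t bufferSize, SamplePos pos, SamplePos end) {
    const SamplePos remaining = end - pos;
    if (remaining <= 0)
        return 0;
    // Compare in a wide unsigned type; the result is at most bufferSize,
    // so the final narrowing to size_t cannot lose bits.
    const std::uint64_t r = static_cast<std::uint64_t>(remaining);
    return static_cast<std::size_t>(std::min<std::uint64_t>(r, bufferSize));
}
```

A loop over a very long track would keep pos and end in the 64-bit type and pass only the returned size_t to buffer allocation and indexing.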
On Tue, Aug 23, 2016 at 7:50 PM, Martyn Shaw
Hi
Has this question been answered satisfactorily?
Is it still an open question?
TTFN
Martyn
Federico Miyara
2016-08-24 19:46:30 UTC
Permalink
Dear Friends,

To clarify what seemed just a divertimento last week, it is very
unlikely that anybody will record for 101 or 639 or whatever large
number of years, but it is not unlikely that someone wants to record for
a week. People who study and analyze soundscapes may be willing to record
for relatively long periods of time.

Back in the 90s (or the turn of the 2000s) there was a guy (Greg Kunkel,
http://www.bio.umass.edu/biology/kunkel/gjk/homepage.htm) who recorded
thousands of bird chirps using an unattended system... based on an AT386
computer! As he couldn't record continuously for several days because of
storage limitations, he had devised a spectrum-based trigger that caused
the computer to start recording for a given period of time each time
some spectral indicator (that revealed the presence of a bird sound) was
present. Nowadays he might be willing to get a complete record from
which to extract the interesting sounds.

Regards,

Federico
James Crook
2016-08-24 20:08:02 UTC
Permalink
From our code:
t1 = 1000000000.0; // record for a long, long time (tens of years)

It's actually 31.7 years, so not even 101.
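
The arithmetic behind that figure, and behind the 32-bit overflow times quoted earlier in the thread, can be checked with a small sketch like this one. The function names are made up for illustration, and a 365.25-day year is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Seconds to years, assuming a 365.25-day year.
double secondsToYears(double seconds) {
    return seconds / (365.25 * 24.0 * 3600.0);
}

// Hours of audio at `rate` samples per second before a counter with
// `bits` value bits overflows (32 for unsigned, 31 for signed 32-bit).
double hoursToOverflow(int bits, double rate) {
    return std::ldexp(1.0, bits) / rate / 3600.0;
}
```

secondsToYears(1000000000.0) comes to about 31.7, while hoursToOverflow(32, 44100.0) and hoursToOverflow(31, 44100.0) come to about 27.05 and 13.5 hours respectively.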

Some of our users in the past have used Audacity for recording nuisance
noises during the week, and asked about the possibility of a date-based time
ruler. This was before auto-save, which I believe is what made longer
recordings impractical for us. Way back before auto-save I remember
debugging a 4-hour recording crash issue, and Audacity was fast at
working with the audio, whereas now on more powerful computers it is
slow. That is something we should revisit, as the speed of working with
the audio is a practical limit that hits us sooner than the number sizes.

--James.