[Audacity-devel] Why is sampleCount 64 bits wide and signed?

384000 is the maximum choice for sample rate in the track drop-down menu,
if you don't enter a custom rate which could presumably be even higher.

At that rate a mere 3.1 hours plus a tad would overflow 32 bits unsigned,
or half that, signed.
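
The overflow arithmetic can be checked with a short sketch. The helper name hoursToOverflow and the two constants are made up for illustration and do not come from the Audacity sources; the sketch only divides the capacity of a 32-bit counter by the sample rate.

```cpp
#include <cstdint>

// Hours of audio at `rate` samples per second that fit in a counter
// able to hold at most `maxCount` samples. Illustrative name only.
constexpr double hoursToOverflow(std::uint64_t maxCount, double rate)
{
    return static_cast<double>(maxCount) / rate / 3600.0;
}

// Capacity of 32-bit counters: 2^32 values unsigned, 2^31 non-negative values signed.
constexpr std::uint64_t kUnsigned32 = std::uint64_t{1} << 32;
constexpr std::uint64_t kSigned32   = std::uint64_t{1} << 31;

// At 384000 Hz an unsigned 32-bit counter lasts a little over 3.1 hours.
static_assert(hoursToOverflow(kUnsigned32, 384000.0) > 3.10, "unsigned, 384 kHz");
static_assert(hoursToOverflow(kUnsigned32, 384000.0) < 3.11, "unsigned, 384 kHz");
// The same arithmetic at 44100 Hz gives a little over 27.05 hours.
static_assert(hoursToOverflow(kUnsigned32, 44100.0) > 27.05, "unsigned, 44.1 kHz");
```

Passing kSigned32 instead halves each figure, which is the signed case.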

Do any of our scientist friends study dolphin and bat voices for that sort
of duration? (I am not being facetious. It says here that dolphin voices
go up to 150 kHz, so 300 kHz sampling rate isn't crazy for that purpose.
https://en.wikipedia.org/wiki/Whale_vocalization And here it says bat
voices even go to 212 kHz, so 384 kHz is even deficient
https://en.wikipedia.org/wiki/Animal_echolocation)

PRL

Post by Paul Licameli
typedef sampleCount is a signed 64 bit quantity on Windows, and long long
on others. I can tell that on Mac that is the same width because this
compiles:
static_assert(sizeof(long long) == 8, "");
Someone tell me if that compiles on Linux too.
Who recalls why it was decided that the type should be this wide? You
need over 27.05 hours of audio, at 44.1 kHz sampling rate, to overflow an
unsigned 32 bit integer, or half of that for a signed integer.
Who ever makes a track that long?
Well maybe it is sometimes done, because I answered a question from
someone on the Audacity Google group, who was trying to export more than
13.5 hours of audio at once. The re-imported audio was truncated to less
than that length, but I had reason to believe it was because of the
limitations of the .wav format and not the fault of our own code.
There is also the question whether existing uses of sampleCount should be
signed or unsigned. Sometimes it really is meant to be a count, as of a
buffer size, so unsigned would be appropriate, but then the width is
probably excessive for the purpose. (The width of size_t is only 4 bytes
on my Mac.)
But sometimes it is meant as a quantized time identifying a place in an
audio track, and negative times are not impossible (see what time shift can
do), so it should be signed, and also very wide, if we really mean to
support days-long tracks and the possible very big positive values.
PRL
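
A small sketch of the width and signedness points in the post above. The alias SamplePos and the helper timeToSamples are hypothetical names, not Audacity's definitions. The C++ standard requires long long to hold at least 64 bits, so the width assertion is expected to hold on mainstream Linux toolchains as well, though that is an expectation rather than something tested here.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical alias for a position in a track, measured in samples.
using SamplePos = long long;

// Width check from the post; expected to hold on mainstream compilers.
static_assert(sizeof(SamplePos) == 8, "SamplePos should be 64 bits wide");
// A position must be signed so that it can lie before time zero.
static_assert(std::is_signed<SamplePos>::value, "SamplePos should be signed");

// Convert a time in seconds to a sample position (truncating toward zero).
// Negative times give negative positions.
constexpr SamplePos timeToSamples(double seconds, double rate)
{
    return static_cast<SamplePos>(seconds * rate);
}

// Two days of audio at 384 kHz exceeds what any 32-bit integer can hold.
static_assert(timeToSamples(2.0 * 86400.0, 384000.0) > UINT32_MAX, "needs 64 bits");
// A clip shifted to start half a second before zero has a negative position.
static_assert(timeToSamples(-0.5, 44100.0) == -22050, "negative position");
```

A buffer length, by contrast, never needs the negative half of the range, which is the case for treating the two uses differently.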

Gale Andrews

2016-08-17 00:49:49 UTC

Due to a bug, 13.5 hours at 44100 Hz was the practical limit per clip
for reopened projects, until Audacity 2.0.6 fixed that:
http://legacywiki.audacityteam.org/wiki/Recording_length#long .

WAV (and AIFF) are 4 GB maximum size. RF64 does not have that
limitation.
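
As a rough illustration of what a 4 GB ceiling means in hours, here is a sketch that treats the limit as exactly 2^32 bytes of sample data. That is an assumption: header overhead is ignored, so the real limit is slightly lower. The name wavMaxHours is made up for this example.

```cpp
// Approximate ceiling on sample data in a classic WAV file: 2^32 bytes.
// Assumption: header overhead is ignored, so the true limit is slightly lower.
constexpr double kWavDataBytes = 4294967296.0;

// Hours of uncompressed PCM audio that fit under that ceiling.
constexpr double wavMaxHours(double rate, int bytesPerSample, int channels)
{
    return kWavDataBytes / (rate * bytesPerSample * channels) / 3600.0;
}

// 16-bit mono at 44100 Hz: roughly 13.5 hours.
static_assert(wavMaxHours(44100.0, 2, 1) > 13.5 && wavMaxHours(44100.0, 2, 1) < 13.6, "mono");
// 16-bit stereo at 44100 Hz: roughly half of that.
static_assert(wavMaxHours(44100.0, 2, 2) > 6.7 && wavMaxHours(44100.0, 2, 2) < 6.8, "stereo");
```

Under these assumptions the 16-bit mono figure lands close to 13.5 hours, the same length at which a signed 32-bit sample count runs out at 44100 Hz.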

Gale

------------------------------------------------------------------------------
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel

------------------------------------------------------------------------------

Paul Licameli

2016-08-17 01:01:51 UTC

Post by Gale Andrews
Due to a bug, 13.5 hours at 44100 Hz was the practical limit per clip
http://legacywiki.audacityteam.org/wiki/Recording_length#long .

I read there

"Here, samples are stored as 64-bit values"

Shouldn't that say, "sample counts are stored as 64-bit values"?

Also the cautions in the red box are only meant for older versions of the
program, right?

PRL


Steve the Fiddle

2016-08-17 07:43:48 UTC

Post by Gale Andrews
Due to a bug, 13.5 hours at 44100 Hz was the practical limit per clip
http://legacywiki.audacityteam.org/wiki/Recording_length#long .

I read there
"Here, samples are stored as 64-bit values"
Shouldn't that say, "sample counts are stored as 64-bit values"?
Also the cautions in the red box are only meant for older versions of the
program, right?

That's the "legacy wiki", so yes it was written about old versions of
Audacity and nothing in that wiki should be taken as fact for versions
later than 2.0.6.

Yes some people do want to make recordings longer than 13.5 hours (as
witnessed by a steady stream of problems reported on the forum about
very long recordings).

These days we see fewer problems regarding very long recordings because
Audacity is generally better behaved with long recordings than old
versions, but we do still see problems.

Steve


------------------------------------------------------------------------------

Paul Licameli

2016-08-17 11:14:15 UTC

I think that when sample counts were changed to 64 bit values, not all of
the very many uses of that type were reexamined for correctness. There may
still be many narrowing conversions where a sample count is assigned to a
32 bit int, with loss of information. The int and long types are each only
4 bytes wide in the Mac compiler.

But also, as I say, the wide type is used unnecessarily in places for the
size of a buffer of samples in memory, rather than indicating a position in
a sound file. In such cases size_t ought to be used. We still compile as
a 32 bit executable with a 4GB address space where size_t is four bytes
wide. Also size_t is an unsigned type but sampleCount is signed.
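
One way to make such narrowing conversions visible is to test the value before assigning it. The following is only a sketch: SampleCount64, fitsInInt32 and fitsInSizeT are hypothetical names introduced here, not functions from the Audacity code base.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a signed 64-bit sample count.
using SampleCount64 = std::int64_t;

// True if `n` can be stored in a 32-bit int without loss of information.
constexpr bool fitsInInt32(SampleCount64 n)
{
    return n >= std::numeric_limits<std::int32_t>::min()
        && n <= std::numeric_limits<std::int32_t>::max();
}

// True if `n` is usable as an in-memory buffer length: non-negative and
// no larger than size_t can express on this platform.
constexpr bool fitsInSizeT(SampleCount64 n)
{
    return n >= 0
        && static_cast<std::uint64_t>(n) <= std::numeric_limits<std::size_t>::max();
}

// 14 hours at 44100 Hz is past 2^31 samples, so assigning it to int would truncate.
static_assert(!fitsInInt32(SampleCount64{14} * 3600 * 44100), "would lose information");
static_assert(fitsInInt32(44100), "one second fits easily");
// A negative position is never a valid buffer size.
static_assert(!fitsInSizeT(-1), "negative is not a size");
```

In a 32-bit build fitsInSizeT also rejects counts above 2^32 - 1; in a 64-bit build only the sign test can fail.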

PRL


Paul Licameli

2016-08-18 13:00:58 UTC

Just a curiosity, but I think the 64 bit offset gives us a practical limit
of 101 years at 44100 Hz sampling and 16 bit PCM format -- not the really
astronomical numbers you might imagine.

Why?

Whenever a WaveClip is in memory, we store a contiguous array of structures
called SeqBlock describing the block files.
Each SeqBlock structure occupies 16 bytes (2^4).
The outer limit of addressable memory is 2^32 bytes, so long as we compile
a 32 bit executable.
Therefore, 2^28 SeqBlocks is an upper limit for the SeqBlocks we can have
in memory at a time.

Each block file is now limited to 1 MB (2 ^ 20 bytes) of sample data,
whatever the format.
The smallest format is 16 bit (2-byte) PCM.
Therefore a block has at most 2^19 samples.

Therefore a project is limited to 2^47 samples.

At 44100 sampling rate, that is 101.1291 years. (At 384000, you get 11.614
years.)

Ample for foreseeable usage of the program. Still a far cry from the 13.26
million years you might naively calculate assuming a project of 2^64
samples, ignoring the other machine limitations.
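
The chain of powers of two above can be written out as compile-time arithmetic. The constant names are illustrative, and the year length of 365.2425 days is an assumption chosen because it reproduces the quoted 101.1291; the message does not say which year length was used.

```cpp
#include <cstdint>

// Figures taken from the message above; the constant names are illustrative.
constexpr std::uint64_t kAddressSpace   = std::uint64_t{1} << 32; // 32-bit process
constexpr std::uint64_t kSeqBlockBytes  = 16;                     // one SeqBlock
constexpr std::uint64_t kBlockFileBytes = std::uint64_t{1} << 20; // 1 MB of samples
constexpr std::uint64_t kBytesPerSample = 2;                      // 16-bit PCM

constexpr std::uint64_t kMaxBlocks          = kAddressSpace / kSeqBlockBytes;    // 2^28
constexpr std::uint64_t kMaxSamplesPerBlock = kBlockFileBytes / kBytesPerSample; // 2^19
constexpr std::uint64_t kMaxSamples         = kMaxBlocks * kMaxSamplesPerBlock;  // 2^47

static_assert(kMaxSamples == (std::uint64_t{1} << 47), "2^28 * 2^19 = 2^47");

// Assumption: a year of 365.2425 days.
constexpr double kSecondsPerYear = 365.2425 * 86400.0;

// Years of audio represented by `samples` at `rate` samples per second.
constexpr double yearsOfAudio(double samples, double rate)
{
    return samples / rate / kSecondsPerYear;
}

static_assert(yearsOfAudio(static_cast<double>(kMaxSamples), 44100.0) > 101.12
           && yearsOfAudio(static_cast<double>(kMaxSamples), 44100.0) < 101.14,
              "about 101 years at 44100 Hz");
```

The same function gives the 384000 Hz and 2^64 figures when called with those arguments.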

PRL


Roger Dannenberg

2016-08-18 15:07:54 UTC

Only 101 years? I've been robbed! Can I have my money back?

------------------------------------------------------------------------------

Adrian Wadey

2016-08-18 22:24:06 UTC

When you run out.

Post by Roger Dannenberg
Only 101 years? I've been robbed! Can I have my money back?

Vaughan Johnson

2016-08-19 07:28:20 UTC

I'd love to have a laptop that lasts more than 2 yrs!

Post by Roger Dannenberg
Only 101 years? I've been robbed! Can I have my money back?

Steve the Fiddle

2016-08-19 08:13:32 UTC

Post by Vaughan Johnson
http://youtu.be/t8HoP9JHWTc
I'd love to have a laptop that lasts more than 2 yrs!

I think mine is coming up for its 8th birthday, but it's not been
recording all that time ;-)
I started working out how big the hard drive would need to be for 101
years, but it made me feel dizzy so I gave up.

A 'performance' of John Cage's "Organ2/ASLSP" (As SLow aS Possible)
began on 5th September 2001. It is due to end in 2640. A recent
snippet of the performance is posted on the official website for the
performance: http://www.aslsp.org/de/ (description in English:
http://universes-in-universe.org/eng/magazine/articles/2012/john_cage_organ_project_halberstadt)
I hope they are not trying to record it with Audacity.

Steve


------------------------------------------------------------------------------

Vaughan Johnson

2016-08-19 09:30:16 UTC

2640! LOL. I hope to still be around then. And have time to listen to
the whole thing, after. Taking my supplements.

Talking about "As SLow As Possible" and dizzy, I heard this interview w/
Dizzy Gillespie on Marian McPartland's radio show where he described
playing with Ray Charles, in a big band. So Ray says "One!" to count the
tempo. Diz walks across the stage, talks a little with the bass player,
then walks back across the stage, and Ray says "two!" nearly a minute
later.

-- V


Federico Miyara

2016-08-19 19:04:36 UTC

Steve,

Post by Steve the Fiddle
I started working out how big the hard drive would need to be for 101
years, but it made me feel dizzy so I gave up.

The answer is simple, at 44100 Hz, 16 bit, stereo:

101*365.25*24*60*60*44100*2*2 = 5.6224282464e+14 bytes
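
For anyone who wants to vary the parameters, the same product can be sketched in integer arithmetic. The function name pcmBytes is made up for this example, and the result is for uncompressed PCM only.

```cpp
#include <cstdint>

// Bytes needed for uncompressed PCM: seconds * rate * bytes per sample * channels.
constexpr std::uint64_t pcmBytes(std::uint64_t seconds, std::uint64_t rate,
                                 std::uint64_t bytesPerSample, std::uint64_t channels)
{
    return seconds * rate * bytesPerSample * channels;
}

// 101 years of 365.25 days each, expressed in whole seconds.
constexpr std::uint64_t kSeconds101Years = 101ULL * 36525 * 24 * 60 * 60 / 100;

// 44100 Hz, 16-bit (2 bytes), stereo: about 5.62e14 bytes, roughly 562 TB.
static_assert(pcmBytes(kSeconds101Years, 44100, 2, 2) == 562242824640000ULL,
              "101 years of 16-bit stereo at 44100 Hz");
```

The sample rate has to be one of the factors, since the byte count scales with it.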

Post by Steve the Fiddle
A 'performance' of John Cage's "Organ2/ASLSP" (As SLow aS Possible)
began on 5th September 2001. It is due to end in 2640. A recent
snippet of the performance is posted on the official website for the
performance: http://www.aslsp.org/de/ (description in English:
http://universes-in-universe.org/eng/magazine/articles/2012/john_cage_organ_project_halberstadt)
I hope they are not trying to record it with Audacity.

This would be theoretically possible if it could go on recording in the
background while a new version is installed, which then takes control,
even if a new operating system is installed. This is a bit more
difficult but not impossible.

The long-term preservation of digital information is an intriguing and
fascinating topic.

Regards,

Federico

------------------------------------------------------------------------------

Federico Miyara

2016-08-18 22:39:37 UTC

Paul

Nice calculation... I wish some hard disk could last at least 10
years... let alone 101.1291 years

Regards,

Federico


Paul Licameli

2016-08-18 22:48:19 UTC

But I wonder what is the total length of all original audio recordings on
all types of analog and digital media in the world. It is surely growing
rapidly. How many orders of magnitude do you think that is? Could it
already be thousands of years? Where might it be at this century's end?

I am not very practiced with these order-of-magnitude brain teasers, like,
how many piano tuners are there in Chicago.

PRL


Vaughan Johnson

2016-08-19 07:30:05 UTC

The number of piano tuners must be decreasing, because you can get pianos
for *free* these days.

Post by Paul Licameli
But I wonder what is the length of all original audio recording on all
types of analog and digital media in the world. It is surely growing
rapidly. How many orders of magnitude to you think that is? Could it
already be thousands of years? Where might it be at this century's end?
I am not very practiced with these order-of-magnitude brain teasers, like,
how many piano tuners are there in Chicago.
PRL

Paul
Nice calculation... I wish some hard disk could last at least 10 years...
let alone 101.1291 years
Regards,
Federico
Just a curiosity, but I think the 64 bit offset gives us a practical
limit of 101 years at 44100 kHz sampling and 16 bit PCM format -- not the
really astronomical numbers you might imagine.
Why?
Whenever a WaveClip is in memory, we store a contiguous array of
structures called SeqBlock describing the block files.
Each SeqBlock structure occupies 16 bytes (2^4).
The outer limit of addressable memory is 2^32 bytes, so long as we
compile a 32 bit executable.
Therefore, 2^28 SeqBlocks is an upper limit for the SeqBlocks we can have
in memory at a time.
Each block file is now limited to 1 MB (2 ^ 20 bytes) of sample data,
whatever the format.
The smallest format is 16 bit (2-byte) PCM.
Therefore a block has at most 2^19 samples.
Therefore a project is limited to 2^47 samples.
At 44100 sampling rate, that is 101.1291 years. (At 384000, you get
11.614 years.)
Ample for foreseeable usage of the program. Still a far cry from the
13.26 million years you might naively calculate assuming a project of 2^64
samples, ignoring the other machine limitations.
PRL

Post by Paul Licameli
I think that when sample counts were changed to 64 bit values, not all
of the very many uses of that type were reexamined for correctness. There
may still be many narrowing conversions where a sample count is assigned to
a 32 bit int, with loss of information. The int and long types are each
only 4 bytes wide in the Mac compiler.
But also, as I say, the wide type is used unnecessarily in places for
the size of a buffer of samples in memory, rather than indicating a
position in a sound file. In such cases size_t ought to be used. We still
compile as a 32 bit executable with a 4GB address space where size_t is
four bytes wide. Also size_t is an unsigned type but sampleCount is signed.
PRL
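
As a rough illustration of the narrowing problem described above, the following sketch checks whether a 64 bit sample count still fits in a 32 bit int. The alias and helper names are hypothetical stand-ins, not Audacity's actual declarations.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the 64 bit signed type discussed in the thread.
using sampleCount64 = long long;
static_assert(sizeof(sampleCount64) == 8, "expected a 64 bit type");

// True when n can be assigned to a 32 bit int without losing information.
constexpr bool fitsInInt32(sampleCount64 n) {
    return n >= std::numeric_limits<std::int32_t>::min()
        && n <= std::numeric_limits<std::int32_t>::max();
}

// Number of samples in a whole number of hours at the given rate.
constexpr sampleCount64 samplesFor(sampleCount64 hours, sampleCount64 rateHz) {
    return hours * 3600 * rateHz;
}
```

At 44100 Hz, samplesFor(13, 44100) still fits in 32 bits but samplesFor(14, 44100) does not, which is consistent with the 13.5 hour figure mentioned elsewhere in the thread.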
On Wed, Aug 17, 2016 at 3:43 AM, Steve the Fiddle <

Post by Gale Andrews
Due to a bug, 13.5 hours at 44100 Hz was the practical limit per clip
http://legacywiki.audacityteam.org/wiki/Recording_length#long .

Post by Paul Licameli
I read there
"Here, samples are stored as 64-bit values"
Shouldn't that say, "sample counts are stored as 64-bit values"?
Also the cautions in the red box are only meant for older versions of the
program, right?

That's the "legacy wiki", so yes it was written about old versions of
Audacity and nothing in that wiki should be taken as fact for versions
later than 2.0.6.
Yes, some people do want to make recordings longer than 13.5 hours (as
witnessed by a steady stream of problems reported on the forum about
very long recordings).
These days we see fewer problems regarding very long recordings because
Audacity is generally better behaved with long recordings than old
versions, but we do still see problems.
Steve

Post by Gale Andrews
WAV (and AIFF) are 4 GB maximum size. RF64 does not have that
limitation.
Gale


Vaughan Johnson

2016-08-19 07:41:34 UTC

to wit:

http://sfist.com/2016/08/17/drunk_san_francisco_man_builds_webs.php

http://sfist.com/2013/06/28/why_is_there_a_piano_on_top_of_bern.php

https://pianopound.com/about/

i just need to find time & space to get 3 or 4!

===

all my cassette tapes are failing, turning into just noise. Brownian
motion?

===

-- Vaughan


Martyn Shaw

2016-08-23 23:50:33 UTC

Hi

Has this question been answered satisfactorily?

Is it still an open question?

TTFN
Martyn

Post by Paul Licameli
typedef sampleCount is a signed 64 bit quantity on Windows, and long
long on others. I can tell that on Mac that is the same width because
this compiles without error (using the new C++11 keyword static_assert):
static_assert(sizeof(long long) == 8, "");
Someone tell me if that compiles on Linux too.
Who recalls why it was decided that the type should be this wide? You
need over 27.05 hours of audio, at 44.1 kHz sampling rate, to overflow
an unsigned 32 bit integer, or half of that for a signed integer.
Who ever makes a track that long?
Well maybe it is sometimes done, because I answered a question from
someone on the Audacity Google group, who was trying to export more
than 13.5 hours of audio at once. The re-imported audio was truncated
to less than that length, but I had reason to believe it was because
of the limitations of the .wav format and not the fault of our own code.
There is also the question whether existing uses of sampleCount should
be signed or unsigned. Sometimes it really is meant to be a count, as
of a buffer size, so unsigned would be appropriate, but then the width
is probably excessive for the purpose. (The width of size_t is only 4
bytes on my Mac.)
But sometimes it is meant as a quantized time identifying a place in
an audio track, and negative times are not impossible (see what time
shift can do), so it should be signed, and also very wide, if we
really mean to support days-long tracks and the possible very big
positive values.
PRL

Paul Licameli

2016-08-24 13:38:02 UTC

I think I have got satisfactory answers.

Audio files over 13.5 hours long are unusual but likely enough that we
ought to accommodate them, therefore we use this wide integral type, called
sampleCount.

My examination of the code, though, convinces me that this type is also
used unnecessarily in many places.

Where we need an offset into an audio file or into a WaveTrack or WaveClip
or Sequence -- use sampleCount. Where we need the difference of two such
values, also use sampleCount.

But where describing the number of samples that fit into a buffer in memory
or in a block file, I think size_t should be used instead.

sampleCount should not be used just whenever counting samples. Perhaps the
type name was a misnomer and sampleOffset would have been better.

PRL

Post by Martyn Shaw
Hi
Has this question been answered satisfactorily?
Is it still an open question?
TTFN
Martyn


Roger Dannenberg

2016-08-24 14:58:29 UTC

My take on this is that the overhead of using 64-bit integers is quite
small in the scheme of things. Furthermore, word, pointer, and integer
sizes are a constant source of pain and portability problems in C and
C++. Looking at this historically, the mindset when C and C++ were
developed was that we needed a high-level way to express the low-level
stuff we wanted to do: manipulate bits, use memory efficiently, etc. C
tries to have it both ways: if you want an integer, you write int and
let the compiler pick the best (fastest?) implementation, but if you
want a 32-bit int, you use conditional compilation or system-dependent
definitions to get it. In practice, all this depends on being very
careful with assumptions and definitions. I think one of the reasons
Java and Python became popular is that they handle portability at the
virtual machine level, so you don't have int meaning different things on
different systems.

Furthermore, it is very difficult to anticipate the implications and
interactions with different integer sizes -- programmers just think in
terms of ideal integers most of the time, and thinking about how
implementations can diverge from that perfect model is a huge burden,
especially in systems like Audacity that are both large and worked on by
many people. (Cf.
https://www.researchgate.net/publication/221542289_As-If_Infinitely_Ranged_Integer_Model
for some serious integer-related bug discovery.)

If we spend any time optimizing, it should be based on profiling. Since
cache lines are longer than 64 bits, I bet the additional overhead of
64-bit ints is very small compared to 32-bit ints, and the additional
overhead of arithmetic on 64-bit values in the cache is swamped by
memory loads and stores.

To summarize, my approach would be: don't sweat the 64-bit ints, and if
there's any time for optimization, use profiling to spend optimization
efforts wisely.

-Roger


Paul Licameli

2016-08-24 15:14:22 UTC

Good thoughts, Roger.

But my motivation is not so much about the optimization, as instead the
appropriate use of size_t which is the unsigned type describing sizes of
things that are certain to fit inside of memory.

I see too much indiscriminate use of sampleCount expressions (which might
be large, or negative) as subscripts or as sizes passed to the array
version of operator new. I got the bug in my brain now to inspect all of
those conversions for correctness. That's not so hard as it seems, if you
use the type system the right way to help the C++ compiler help you. I can
redefine types in my build so that compilation fails unless I do something
explicit when sampleCount values need to be narrowed, and find a proof in
those places that the narrowing is not losing any nonzero bits.

PRL
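
One way to get the compiler to flag every narrowing, sketched here under assumptions: wrap the 64 bit value in a small class with no implicit conversion back to an integer, so each place that needs a size_t must call a named accessor. The class name, accessor names and the limitToBuffer helper are hypothetical and only illustrate the idea; they are not a description of any existing Audacity code.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical wrapper for a position in (or difference within) an audio track.
class SampleOffset {
public:
    SampleOffset() : value_(0) {}
    SampleOffset(long long v) : value_(v) {}   // widening in is always safe

    // No implicit conversion out: every narrowing is a visible call.
    long long as_long_long() const { return value_; }

    // Narrow to size_t; the asserts document the proof obligation.
    std::size_t as_size_t() const {
        assert(value_ >= 0);
        assert(static_cast<unsigned long long>(value_)
               <= std::numeric_limits<std::size_t>::max());
        return static_cast<std::size_t>(value_);
    }

    friend SampleOffset operator+(SampleOffset a, SampleOffset b) {
        return SampleOffset(a.value_ + b.value_);
    }
    friend SampleOffset operator-(SampleOffset a, SampleOffset b) {
        return SampleOffset(a.value_ - b.value_);
    }
    friend bool operator<(SampleOffset a, SampleOffset b) {
        return a.value_ < b.value_;
    }

private:
    long long value_;
};

// Clamp a possibly huge or negative remaining length to a buffer capacity,
// so that the result is known to fit in memory.
inline std::size_t limitToBuffer(SampleOffset remaining, std::size_t capacity) {
    if (remaining.as_long_long() <= 0)
        return 0;
    if (static_cast<unsigned long long>(remaining.as_long_long()) >= capacity)
        return capacity;
    return remaining.as_size_t();
}
```

With such a wrapper, code like `new float[offset]` or `buffer[offset]` stops compiling until the author writes an explicit accessor call, which is where the argument for safety has to be made.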


Paul Licameli

2016-08-24 15:17:32 UTC

... And let me add that in commit 1189cfd62a2d30f5e61c00183a387c3d767ed698
I have commented the few such conversions that I have decided are the
really problematic ones.

PRL


Steve the Fiddle

2016-08-24 15:49:41 UTC

The other 'benefit' of using sampleCount when counting samples, even
in cases where size_t is big enough, is that it can be a reminder that
we are dealing with a count of samples and not some other integer
quantity, and so avoid silly mistakes such as multiplying a buffer
size by a loop count and ending up with a duration in samples as
size_t. We only need one such bug to creep in to make the time spent
changing occurrences of sampleCount to size_t counter-productive. I'm
not keen on fixing things that aren't broken, especially if they make
the meaning/intent less clear.

Steve
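
The kind of mistake described above can be sketched as follows. Here uint32_t stands in for size_t on a 32 bit build, which is an assumption made so the wraparound shows up on any platform; the function names are hypothetical.

```cpp
#include <cstdint>

// Duration computed in a 32 bit unsigned type: the product wraps modulo 2^32.
constexpr std::uint32_t durationNarrow(std::uint32_t bufferSize, std::uint32_t loops) {
    return bufferSize * loops;
}

// Same product computed in a 64 bit signed type: no information is lost.
constexpr long long durationWide(std::uint32_t bufferSize, std::uint32_t loops) {
    return static_cast<long long>(bufferSize) * loops;
}
```

For a buffer of 2^19 samples processed 10000 times, durationWide gives 5242880000 samples, while durationNarrow silently wraps to 947912704.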


Steve the Fiddle

2016-08-24 16:10:53 UTC

In commit 1189cfd62, is this safe on all platforms?

// Protect Nyquist from selections greater than 2^31 samples (bug 439)
#define NYQ_MAX_LEN (std::numeric_limits<long>::max())

Steve
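
For context on the question above: the width of long is platform dependent (4 bytes on Windows and on 32 bit builds, 8 bytes on 64 bit Linux and macOS), so std::numeric_limits<long>::max() is only 2^31 - 1 where long is 32 bits wide. The sketch below shows a fixed-width alternative, assuming the intent of the comment really is a 2^31 limit; whether that is what Nyquist needs is not settled in this thread, and the names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical fixed-width limit: 2^31 - 1 on every platform.
constexpr long long kNyqMaxLenFixed = std::numeric_limits<std::int32_t>::max();

// The limit as written in the quoted macro, which follows the width of long.
constexpr long long nyqMaxLenFromLong() {
    return std::numeric_limits<long>::max();
}

// True when long is 32 bits wide on the platform being compiled.
constexpr bool longIs32Bit() { return sizeof(long) == 4; }
```

The two limits agree exactly when longIs32Bit() is true; otherwise the macro allows selections far beyond 2^31 samples.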

Post by Steve the Fiddle
The other 'benefit' of using sampleCount when counting samples, even
in cases where size_t is big enough, is that it can be a reminder that
we are dealing with a count of samples and not some other integer
quantity, and so avoid silly mistakes such as multiplying a buffer
size by a loop count and ending up with a duration in samples as
size_t. We only need one such bug to creep in to make the time spent
changing occurrences of sampleCount to size_t counter-productive. I'm
not keen on fixing things that aren't broken, especially if they make
the meaning/intent less clear.
Steve

... And let me add that in commit 1189cfd62a2d30f5e61c00183a387c3d767ed698 I
have commented the few such conversions that I have decided are the really
problematic ones.
PRL

Post by Roger Dannenberg
My take on this is that the overhead of using 64-bit integers is quite
small in the scheme of things. Furthermore, word, pointer, and integer sizes
are a constant source of pain and portability problems in C and C++. Looking
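at this historically, the mindset when C and C++ were developed was that we
needed a high-level way to express the low-level stuff we wanted to do:
manipulate bits, use memory efficiently, etc. C tries to have it both ways:
if you want an integer, you write int and let the compiler pick the best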
(fastest?) implementation, but if you want a 32-bit int, you use conditional
compilation or system-dependent definitions to get it. In practice, all this
depends on being very careful with assumptions and definitions. I think one
of the reasons Java and Python became popular is that they handle
portability at the virtual machine level, so you don't have int meaning
different things on different systems.
Furthermore, it is very difficult to anticipate the implications and
interactions with different integer sizes -- programmers just think in terms
of ideal integers most of the time, and thinking about how implementations
can diverge from that perfect model is a huge burden, especially in systems
like Audacity that are both large and worked on by many people. (C.f.
https://www.researchgate.net/publication/221542289_As-If_Infinitely_Ranged_Integer_Model
for some serious integer-related bug discovery.)
If we spend any time optimizing, it should be based on profiling - since
cache lines are longer than 64-bits, I bet the additional overhead of 64-bit
ints is very small compared to 32-bits and the additional overhead of
arithmetic on 64-bit values in the cache is swamped by memory loads and
stores.
To summarize, my approach would be don't sweat the 64-bit ints and if
there's any time for optimization, use profiling to spend optimization
efforts wisely.
-Roger

Post by Paul Licameli
I think I have got satisfactory answers.
Audio files over 13.5 hours long are unusual but likely enough that we
ought to accommodate them, therefore we use this wide integral type, called
sampleCount.
My examination of the code, though, convinces me that this type is also
used unnecessarily in many places.
Where we need an offset into an audio file or into a WaveTrack or
WaveClip or Sequence -- use sampleCount. Where we need the difference of
two such values, also use sampleCount.
But where describing the number of samples that fit into a buffer in
memory or in a block file, I think size_t should be used instead.
sampleCount should not be used just whenever counting samples. Perhaps
the type name was a misnomer and sampleOffset would have been better.
PRL
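
The division of labour described above (wide signed positions, size_t for anything that fits in a buffer) might look like the following sketch. The alias mirrors the typedef discussed in this thread; the helper names limitBufferSize and countChunks are hypothetical and only illustrate the idea.

```cpp
#include <cstddef>

// Per the thread: a signed 64-bit type used for positions in a track.
using sampleCount = long long;

// Hypothetical helper: how many samples to process next. The comparison is
// done in a wide unsigned type; the result never exceeds bufferSize, so the
// narrowing to size_t is provably safe.
inline std::size_t limitBufferSize(std::size_t bufferSize,
                                   sampleCount remaining) {
    if (remaining <= 0)
        return 0;
    const auto wide = static_cast<unsigned long long>(remaining);
    return wide < bufferSize ? static_cast<std::size_t>(wide) : bufferSize;
}

// Walk the range [start, end) in buffer-sized pieces. Positions and their
// differences stay in sampleCount; only the per-chunk length is a size_t.
inline sampleCount countChunks(sampleCount start, sampleCount end,
                               std::size_t bufferSize) {
    sampleCount chunks = 0;
    for (sampleCount pos = start; pos < end;) {
        const std::size_t len = limitBufferSize(bufferSize, end - pos);
        if (len == 0)
            break; // only happens when bufferSize is zero
        pos += static_cast<sampleCount>(len);
        ++chunks;
    }
    return chunks;
}
```

Note that start may be negative, as with a time-shifted clip, which is one reason the position type stays signed.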

Post by Martyn Shaw
Hi
Has this question been answered satisfactorily?
Is it still an open question?
TTFN
Martyn

------------------------------------------------------------------------------

James Crook

2016-08-24 16:28:23 UTC

I think you (Steve), Paul and Roger are all correct. And I still
approve of what Paul is doing.

We had a recent CVE, fixed in Audacity 2.1.2, which was related to
integer overflow in a library we used. Paul's work is helping ensure we
don't have similar issues with our own code. Our existing code is an
unholy mix of integer and floating types. I see what Paul is doing less
as optimisation (our code often is already partially optimised) and more
as giving us more guarantees of correctness on the conversions. 'If it
ain't broke, don't fix it' is good. However, I'm fairly sure there are
pieces of our code that are broke, and we don't know it, and Paul is
finding/fixing some of those in his integer conversion review.

--James.

------------------------------------------------------------------------------

Paul Licameli

2016-08-24 18:36:19 UTC

Thank you for supporting this, James.

Really this thing began as a big detour from the project of eliminating
naked news and deletes. I have finished that for the scalar case. But not
for the array new[] and delete[]. I wanted to wrap those in a class and
require the size arguments to be of type size_t. Then I found I was
casting sampleCount to size_t far too often. So I was led to reexamine all
the uses of sampleCount.

PRL
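
A wrapper of the kind described here could be sketched as below. This is an assumption-laden illustration, not the class from the Audacity source: the name OwnedArray is invented, and the deleted constructor template is just one way to make the compiler reject sizes that are not already size_t.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical owning array wrapper: replaces naked new[] / delete[] and
// accepts its size only as std::size_t.
template <typename T>
class OwnedArray {
public:
    explicit OwnedArray(std::size_t count)
        : data_(std::make_unique<T[]>(count)), size_(count) {}

    // Any other integral type (int, long long, ...) selects this deleted
    // overload, so the caller must convert to size_t explicitly first.
    template <typename I,
              typename = std::enable_if_t<
                  std::is_integral<I>::value &&
                  !std::is_same<I, std::size_t>::value>>
    explicit OwnedArray(I) = delete;

    T &operator[](std::size_t i) { return data_[i]; }
    const T &operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::unique_ptr<T[]> data_; // releases the storage automatically
    std::size_t size_;
};
```

With this in place, OwnedArray<float> buf(someSampleCount) fails to compile when someSampleCount is a long long, which is how each place that narrows a sampleCount would surface.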

On Wed, Aug 24, 2016 at 11:14 AM, Paul Licameli wrote:
Good thoughts, Roger.
But my motivation is not so much about the optimization, as instead the
appropriate use of size_t which is the unsigned type describing sizes of
things that are certain to fit inside of memory.
I see too much indiscriminate use of sampleCount expressions (which might
be large, or negative) as subscripts or as sizes passed to the array version
of operator new. I got the bug in my brain now to inspect all of those
conversions for correctness. That's not so hard as it seems, if you use the
type system the right way to help the C++ compiler help you. I can redefine
types in my build so that compilation fails unless I do something explicit
when sampleCount values need to be narrowed, and find a proof in those
places that the narrowing is not losing any nonzero bits.
PRL

------------------------------------------------------------------------------

James Crook

2016-08-24 19:01:39 UTC

Post by Paul Licameli
Thank you for supporting this, James.

Very easy to support :-)

Post by Paul Licameli
Really this thing began as a big detour from the project of eliminating
naked news and deletes. I have finished that for the scalar case. But not
for the array new[] and delete[]. I wanted to wrap those in a class and
require the size arguments to be of type size_t. Then I found I was
casting sampleCount to size_t far too often. So I was led to reexamine all
the uses of sampleCount.

Such is the way. Audacity is very tangled. Lessons to be learned from
why, and interesting to think about alternatives - not just in the
int-sizing context.

Steve (in an offlist) is terrified of something being broken by the
extensive changes. I think if you find tangible examples of where
narrowing potentially cost us dearly - in the same way that the CVE did
- that would help shift the emotion. The 'risk' (which is actually
small when looked at right) is more than offset by the tangible gain.
Steve doesn't see 'automatically checked' code as such a tangible gain,
as the idioms are unfamiliar to him, and he isn't expecting to get used
to them quickly. The idioms aren't familiar to me either, but I do
expect to get used to them quickly.

--James.

------------------------------------------------------------------------------

Steve the Fiddle

2016-08-24 20:35:44 UTC

Post by James Crook

Post by Paul Licameli
Thank you for supporting this, James.

Very easy to support :-)

Such is the way. Audacity is very tangled. Lessons to be learned from
why, and interesting to think about alternatives - not just in the
int-sizing context.

Steve (in an offlist) is terrified of something being broken by the
extensive changes.

Somewhat over-egged James. I'm "concerned", which considering the vast
number of changes and the few tangible benefits I don't think is
unreasonable.

I can see that code review has benefits for "quality assurance", but
I'm less enthused by changes that appear (to me) to be cosmetic. James
has kindly explained to me that there are possibly marginal benefits
to some of the changes that I saw to be cosmetic, so I am partially
reassured by James taking some responsibility in supporting these
changes.

Steve

------------------------------------------------------------------------------

Paul Licameli

2016-08-24 20:36:51 UTC

Remind me, what is CVE?

PRL

------------------------------------------------------------------------------

James Crook

2016-08-24 22:05:56 UTC

Post by Paul Licameli
Remind me, what is CVE?

http://www.cvedetails.com/cve/CVE-2009-0490/

--James.

------------------------------------------------------------------------------

Richard Ash

2016-08-29 21:14:18 UTC

On Wed, 24 Aug 2016 14:36:19 -0400

Post by Paul Licameli
Thank you for supporting this, James.
Really this thing began as a big detour from the project of
eliminating naked news and deletes. I have finished that for the
scalar case. But not for the array new[] and delete[]. I wanted to
wrap those in a class

This is something I can thoroughly support.

Post by Paul Licameli
and require the size arguments to be of type
size_t. Then I found I was casting sampleCount to size_t far too
often. So I was led to reexamine all the uses of sampleCount.

This doesn't make any sense to me however. If you need casts to make
your own class work, then it's almost certainly designed wrong.

I would expect your class to have an overload which accepts sampleCount
arguments (with a warning pragma on the method if you really feel it is
undesirable to use), which then does a safe (i.e. runtime range checked)
conversion to size_t internally. This has the property of minimising
the scope of changes, whilst still enabling and encouraging better error
checking (which is a very desirable aim).

Note also that the definition of size_t varies between platforms, so
you will need to be careful about that conversion (another good reason
for doing it in an overload method, not by casting).

Richard
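
The runtime range-checked conversion suggested above could be as small as the sketch below. The function name checkedToSize and the choice of std::range_error are assumptions made for illustration; the alias mirrors the typedef discussed in this thread.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Per the thread: a signed 64-bit type.
using sampleCount = long long;

// Hypothetical checked narrowing: throws instead of silently truncating.
inline std::size_t checkedToSize(sampleCount n) {
    if (n < 0)
        throw std::range_error("sampleCount is negative");
    // Compare as unsigned so the test is also meaningful where size_t is
    // only 32 bits wide.
    if (static_cast<unsigned long long>(n) >
        std::numeric_limits<std::size_t>::max())
        throw std::range_error("sampleCount does not fit in size_t");
    return static_cast<std::size_t>(n);
}
```

Because the upper bound comes from std::numeric_limits<std::size_t>, the same source does the right thing on platforms where size_t is 4 bytes and on those where it is 8.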

------------------------------------------------------------------------------

Paul Licameli

2016-08-24 16:28:48 UTC

I don't understand the hypothetical. But it is true that there are places
where a sampleCount variable accumulates a sum of buffer sizes, and the sum
may need to grow beyond the bounds of size_t.

That's widening, which doesn't bother me so much. It's incorrect narrowing
of sampleCount that bothers me.

And in fact there are a few places where a sampleCount - typed expression
is assigned wrongly to a long or int variable. It would be useful to
redefine sampleCount not simply as a type alias, but as a class with
explicit conversion operators, so that such errors would simply fail
compilation.

PRL
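
To make the idea concrete, here is a minimal sketch of a sample-count class whose narrowing conversions must be spelled out. It is an illustration under assumptions, not the eventual Audacity definition: the class name SampleCount and the member as_size_t are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

class SampleCount {
public:
    SampleCount() = default;
    // Implicit construction is allowed: a buffer size can always be widened.
    SampleCount(long long v) : value_(v) {}

    // Getting the raw value back requires a visible static_cast.
    explicit operator long long() const { return value_; }

    // Named narrowing; the asserts document what must be true at the call.
    std::size_t as_size_t() const {
        assert(value_ >= 0);
        assert(static_cast<unsigned long long>(value_) <=
               std::numeric_limits<std::size_t>::max());
        return static_cast<std::size_t>(value_);
    }

    SampleCount &operator+=(SampleCount rhs) {
        value_ += rhs.value_;
        return *this;
    }

private:
    long long value_ = 0;
};

// Assigning a SampleCount to an int or long variable no longer compiles.
static_assert(!std::is_convertible<SampleCount, int>::value, "");
static_assert(!std::is_convertible<SampleCount, long>::value, "");
// Widening from a buffer size is still implicit.
static_assert(std::is_convertible<std::size_t, SampleCount>::value, "");
```

Accumulating buffer sizes into a running total, the widening case mentioned above, keeps working unchanged; only the narrowing direction needs an explicit call.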

------------------------------------------------------------------------------

Roger Dannenberg

2016-08-24 16:05:30 UTC

I guess it all depends on being clever enough to "use the type system
the right way" -- type systems are still a very active area of research,
which to me indicates both that the C++ type system leaves a lot to be
desired and that a lot of really smart people have decided that exploring
type systems is still a worthy area to pursue. I guess my experience
trying to use C's type system has had mixed results, but I'm all in
favor of your plan to verify that narrowing is safe. -Roger

Post by Paul Licameli
Good thoughts, Roger.
But my motivation is not so much about the optimization, as instead
the appropriate use of size_t which is the unsigned type describing
sizes of things that are certain to fit inside of memory.
I see too much indiscriminate use of sampleCount expressions (which
might be large, or negative) as subscripts or as sizes passed to the
array version of operator new. I got the bug in my brain now to
inspect all of those conversions for correctness. That's not so hard
as it seems, if you use the type system the right way to help the C++
compiler help you. I can redefine types in my build so that
compilation fails unless I do something explicit when sampleCount
values need to be narrowed, and find a proof in those places that the
narrowing is not losing any nonzero bits.
PRL

Post by Roger Dannenberg
My take on this is that the overhead of using 64-bit integers is
quite small in the scheme of things. Furthermore, word, pointer,
and integer sizes are a constant source of pain and portability
problems in C and C++. Looking at this historically, the mindset
when C and C++ were developed was that we needed a high-level way
to express the low-level stuff we wanted to do: manipulate bits,
use memory efficiently, etc. C tries to have it both ways: if you
want an integer, you write int and let the compiler pick the best
(fastest?) implementation, but if you want a 32-bit int, you use
conditional compilation or system-dependent definitions to get it.
In practice, all this depends on being very careful with
assumptions and definitions. I think one of the reasons Java and
Python became popular is that they handle portability at the
virtual machine level, so you don't have int meaning different
things on different systems.
Furthermore, it is very difficult to anticipate the implications
and interactions with different integer sizes -- programmers just
think in terms of ideal integers most of the time, and thinking
about how implementations can diverge from that perfect model is a
huge burden, especially in systems like Audacity that are both
large and worked on by many people. (C.f.
https://www.researchgate.net/publication/221542289_As-If_Infinitely_Ranged_Integer_Model
for some serious integer-related bug discovery.)
If we spend any time optimizing, it should be based on profiling -
since cache lines are longer than 64-bits, I bet the additional
overhead of 64-bit ints is very small compared to 32-bits and the
additional overhead of arithmetic on 64-bit values in the cache is
swamped by memory loads and stores.
To summarize, my approach would be don't sweat the 64-bit ints and
if there's any time for optimization, use profiling to spend
optimization efforts wisely.
-Roger

Post by Paul Licameli
I think I have got satisfactory answers.
Audio files over 13.5 hours long are unusual but likely enough
that we ought to accommodate them, therefore we use this wide
integral type, called sampleCount.
My examination of the code, though, convinces me that this type
is also used unnecessarily in many places.
Where we need an offset into an audio file or into a WaveTrack or
WaveClip or Sequence -- use sampleCount. Where we need the
difference of two such values, also use sampleCount.
But where describing the number of samples that fit into a buffer
in memory or in a block file, I think size_t should be used instead.
sampleCount should not be used just whenever counting samples.
Perhaps the type name was a misnomer and sampleOffset would have
been better.
PRL
On Tue, Aug 23, 2016 at 7:50 PM, Martyn Shaw
Hi
Has this question been answered satisfactorily?
Is it still an open question?
TTFN
Martyn

------------------------------------------------------------------------------

Federico Miyara

2016-08-24 19:46:30 UTC

Dear Friends,

To clarify what seemed just a divertimento last week, it is very
unlikely that anybody will record for 101 or 639 or whatever large
number of years, but it is not unlikely that someone wants to record for
a week. People who study and analyze soundscapes may want to record
over relatively long periods of time.
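
As a rough check of the week-long case, here is a small sketch of the arithmetic, assuming a single channel at the 44.1 kHz and 384 kHz rates mentioned earlier in the thread. The function and constant names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Total samples in a recording of the given length at the given rate.
constexpr std::int64_t SamplesIn(std::int64_t seconds, std::int64_t rate)
{
    return seconds * rate;
}

// Whole seconds of audio that fit before an unsigned 32-bit counter wraps.
constexpr std::int64_t SecondsUntilUnsigned32Wraps(std::int64_t rate)
{
    return (std::int64_t{1} << 32) / rate;
}

constexpr std::int64_t kSecondsPerWeek = 7 * 24 * 60 * 60;  // 604800
constexpr std::int64_t kWeekAt44100 = SamplesIn(kSecondsPerWeek, 44100);

// One week at 44.1 kHz is 26,671,680,000 samples, about six times more
// than an unsigned 32-bit integer can count.
static_assert(kWeekAt44100 == 26671680000LL, "one week at 44.1 kHz");
static_assert(kWeekAt44100 > std::numeric_limits<std::uint32_t>::max(),
              "a week does not fit in 32 bits");
static_assert(kWeekAt44100 < std::numeric_limits<std::int64_t>::max(),
              "a week fits easily in signed 64 bits");
```

SecondsUntilUnsigned32Wraps(44100) gives 97391 seconds, about 27.05 hours, and SecondsUntilUnsigned32Wraps(384000) gives 11184 seconds, about 3.1 hours, which agrees with the figures quoted at the start of the thread.
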

Back in the 90s (or the turn of the 2000s) there was a guy (Greg Kunkel,
http://www.bio.umass.edu/biology/kunkel/gjk/homepage.htm) who recorded
thousands of bird chirps using an unattended system... based on an AT386
computer! As he couldn't record continuously for several days because of
storage limitations, he had devised a spectrum-based trigger that caused
the computer to start recording for a given period of time each time
some spectral indicator (that revealed the presence of a bird sound) was
present. Nowadays he might want to make a complete recording from
which to extract the interesting sounds.

Regards,

Federico

James Crook

2016-08-24 20:08:02 UTC