http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19870014670.pdf
Yes I've known about dynamic programming since about then. Good work, James
-- I like your trick.
Post by James Crook
Sorry for the delay in getting back to you on this thread.
If you do use a dynamic programming approach, there is a neat trick I
invented (in context of DNA sequence matching) that caters for different
kinds of matching. The trick is to run two 'match matrices' at the same
time, and have a penalty for switching between them. This is excellent
where there is a mix of signal and noise, as in your test examples. For
aligning noise you want a fairly sloppy, not very precisely discriminating
comparison that picks up broad characteristics. What's great about
running two match matrices is that the algorithm naturally switches to
the best kind of matching for each section.
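The two-matrix trick can be sketched as a small dynamic program. This is a hypothetical reconstruction from the description above, not James's actual code; the function name, the DTW-style moves, and the constant switch penalty are all my assumptions:

```python
def two_matrix_dp(x, y, precise, sloppy, switch_penalty=2.0):
    """Minimal alignment cost when each matched pair may be scored by
    either distance function, paying switch_penalty to change matrices."""
    INF = float("inf")
    n, m = len(x), len(y)
    # P / S: best cost of a path ending at cell (i, j) whose last step
    # was scored with the precise / sloppy distance respectively.
    P = [[INF] * (m + 1) for _ in range(n + 1)]
    S = [[INF] * (m + 1) for _ in range(n + 1)]
    P[0][0] = S[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_p = best_s = INF
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                # stay in the same matrix, or switch and pay the penalty
                best_p = min(best_p, P[pi][pj], S[pi][pj] + switch_penalty)
                best_s = min(best_s, S[pi][pj], P[pi][pj] + switch_penalty)
            P[i][j] = best_p + precise(x[i - 1], y[j - 1])
            S[i][j] = best_s + sloppy(x[i - 1], y[j - 1])
    return min(P[n][m], S[n][m])
```

With a flat "sloppy" cost, clean identical sections are matched cheaply in the precise matrix, while a badly mismatched section becomes cheaper to cross in the sloppy matrix, which is the switching behaviour described above.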
On storage requirements, these can be reduced dramatically relative to
MATCH, even allowing large time shifts, by a divide and conquer approach.
Instead of allocating space of length x max-shift, you sample evenly and
only allocate space of k x max-shift for some small value of k, such as 100.
The cost is that you have to repeat the analysis log(length-of-sequence)
times, where log is to base k. So aligning to the nearest 10 ms on two 1-hour
sequences with a shift of up to 20 minutes would take 50 MB of storage (with
one match matrix) or 100 MB (with two in parallel), and the analysis would be
repeated 3 times. Because you stay in cache during the analysis and write
much less to external memory, it's a big net win in both storage and speed
over a single-pass approach.
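The storage figures quoted above can be checked directly. The 4-byte cell size is my assumption; everything else follows the text (10 ms resolution, k = 100, 1-hour sequences, up to 20 minutes of shift):

```python
# Checking the divide-and-conquer storage estimate from the text.
import math

frames_per_second = 100                  # 10 ms alignment resolution
length = 3600 * frames_per_second        # 1 hour -> 360,000 frames
max_shift = 20 * 60 * frames_per_second  # 20 min -> 120,000 frames
k = 100                                  # evenly sampled rows kept

cells = k * max_shift                    # cells in one match matrix
bytes_per_cell = 4                       # assumed 4-byte entries
mb_one_matrix = cells * bytes_per_cell / 1e6

# each pass refines the sampling by a factor of k
passes = math.ceil(math.log(length, k))

print(mb_one_matrix, passes)             # ~48 MB per matrix, 3 passes
```

That gives about 48 MB per matrix, consistent with the "50 MB / 100 MB, repeated 3 times" figures in the text.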
I haven't written versions for sound. This is extrapolating from the late
'80s, when I was analysing DNA and protein sequences on a PC with a
fraction of the power and storage of modern machines. You had to be
inventive to get any decent performance at all. This kind of trick can
still pay off in a big way today.
I can spell this out in more detail if you do go down the dynamic
programming route, as I realise my description here has been a bit
abbreviated!
--James.
Thanks for the information.
I did some testing of the MATCH Vamp plugin, running it via
sonic-analyzer, which already integrates it.
First of all, the algorithm is pretty expensive, and its runtime seems
linear in the maximum time shift allowed. Aligning two 1-hour tracks with a
maximum allowed time shift of 60s takes 6 minutes on a recent processor
(Intel i5-5200U) and about 8 GB of RAM. Using it for larger time shifts
such as 10 minutes will be quite expensive...
I also tested the quality of the results, to the extent sonic-analyzer
allowed me - it can only report graphical results of the alignment analysis,
but does not actually align the tracks.
(1) 2 identical audio tracks of a recorded concert, with a time-shift of
about 15s between them.
Alignment seems perfect.
(2) 2 identical audio tracks of a recorded concert, except for a 30s hole
filled with pink noise, with a time-shift of about 15s between them.
There are 1-2 second zones at the boundaries of the hole where the audio
is wrongly aligned. This will be quite problematic when building a feature
that allows mixing and matching different versions of each passage.
(3) 2 audio tracks recorded from the same concert (left right channels
from same device), except for a 30s hole filled with pink noise, with a
time-shift of about 15s between them.
Same issues as (2), no new ones.
(4) 2 audio tracks of the same concert, recorded with 2 different devices.
Throughout the match, it finds tempo ratios as divergent as <0.8 or >1.2
for a significant fraction of the time. This is pretty bad, since a correct
match should find a tempo ratio of 1 throughout the recording. Things can
be improved using non-default parameters (lowering the cost of the diagonal
to 1.5 and enabling the "path smoothing" feature), but the tempo ratio
still routinely hovers around 0.9 - 1.1.
(5) 2 recordings of two performances of the same composition, a time shift
of about 15s, and a hole of about 30s.
Default parameters lead to big issues at the boundaries around the hole
(10s and 30s of incorrect matches).
However, using a non-default cost for the diagonal again significantly
improves the match, mostly fixing the boundaries around the hole. There is
still a small issue with the first 0.5s of the performance, which remains
incorrectly matched.
I cannot really evaluate the match more than that, because sonic-analyzer
just produces the graphs, but does not actually match the tracks.
My conclusion is that the MATCH plugin cannot be used that easily, even
for the simple case of 2 recordings of the same event, because of accuracy
and performance. The former could be fixed by imposing stronger regularity
on the path (e.g. piecewise linear); the latter might be harder.
I propose to start working on an algorithm and feature specific to the
case of 2 recordings of the same event, which is an easier case to start
with both in terms of algorithm and UI.
I also agree that we won't be able to align perfectly, in particular
because of stereo. All we can do is a best effort given the sources. I will
allow for piecewise-linear ratios between the two recordings' clock
frequencies (with additional regularity restrictions), to account for
varying clock drift.
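A piecewise-linear warp of this kind could be represented as a sorted list of (time-in-track-A, time-in-track-B) breakpoints and evaluated by interpolation. This is only my own sketch of the idea, not an agreed design:

```python
# Sketch: map a time in one track to the corresponding time in the
# other through a piecewise-linear warp. Breakpoint format is assumed.
from bisect import bisect_right

def warp(t, knots):
    """Map time t through the piecewise-linear warp defined by sorted
    (t_in, t_out) breakpoints, extrapolating with the end segments."""
    xs = [x for x, _ in knots]
    i = bisect_right(xs, t) - 1
    i = max(0, min(i, len(knots) - 2))   # clamp to a valid segment
    (x0, y0), (x1, y1) = knots[i], knots[i + 1]
    return y0 + (t - x0) * (y1 - y0) / (x1 - x0)
```

The regularity restrictions would then amount to bounding each segment's slope (the local clock ratio) close to 1.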
Cheers,
--
Raphaël
Post by Robert Hänggi
Hi
Incidentally, I've just stumbled over a real-life example where this
alignment would really be of great use to me.
I'm modelling a CD4 demodulation plug-in.
http://forum.audacityteam.org/viewtopic.php?p=307553#p307553
There are also two test (calibration) recordings in this specific post.
In essence, four tracks are embedded in a single stereo track.
The aim is to reverse-engineer what is in a hardware phono demodulator.
I can demodulate the signal; however, there are some difficulties:
Base left = LFront + LBack (for normal stereo playback)
FM left = LFront - LBack
(ditto for right)
Thus, I can't simply align them until they cancel.
What's more, the frequencies do not match exactly because we have RIAA
in combination with a noise reduction expander, a delay caused by the
low/high pass filter etc.
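In the ideal, already-aligned and gain-matched case, the channels would fall out of the two demodulated signals by simple sum and difference. A toy sketch (function and variable names are mine) of why alignment matters here:

```python
# Ideal-case recovery from base = front + back and fm = front - back.
# Any residual misalignment or gain mismatch between the two inputs
# leaks each channel into the other, which is Robert's difficulty.

def recover(base, fm):
    """Recover (l_front, l_back) from sample-aligned base and fm signals."""
    l_front = [(b + f) / 2 for b, f in zip(base, fm)]
    l_back = [(b - f) / 2 for b, f in zip(base, fm)]
    return l_front, l_back
```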
In summary, the alignment has to be very exact but at the same time
insensitive to noise, phase and amplitude deviations, and so on...
For the moment, I will use cross-correlation and least square fitting
for certain "anchor" points.
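The cross-correlation step for one anchor region can be sketched in a few lines. This is a brute-force, unnormalized version (a real implementation would window and normalize the signals, and likely use FFT-based correlation for speed):

```python
def best_lag(a, b, max_lag):
    """Return the integer lag maximizing the raw correlation of a with
    b shifted right by that lag (b[i - lag] lines up with a[i])."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        lo = max(0, lag)                 # overlap region of the two
        hi = min(len(a), len(b) + lag)   # signals at this lag
        score = sum(a[i] * b[i - lag] for i in range(lo, hi))
        if score > best_score:
            best, best_score = lag, score
    return best
```

Fitting a line (or piecewise line) through the per-anchor lags would then give the least-squares drift estimate Robert mentions.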
I look forward to seeing the aligning feature someday implemented in
Audacity. Good luck.
Cheers
Robert
Post by Roger Dannenberg
Excellent point. Also, aligning anything to a stereo track will generate
similar problems. I would suggest that if you're recording with multiple
microphones and devices, you're guaranteed to hit phase and multiple
source problems. In the spirit of the "principle of least surprise" I
would expect an alignment effect to just do a reasonable job given the
sources. E.g. if acoustic sources are spread over 10 meters (~30ms at
the speed of sound), I'd hope individual sources would be aligned within
30ms. If there were a single source, I'd hope for much better.
Another possibility is aligning to multiple tracks representing the same
collection of sound sources recorded from different locations. It's
subtly different from aligning to a single track.
-Roger
Post by James Crook
Something else to think about is what happens if you attempt to align
two mono tracks that happen actually to be left and right audio of a
stereo track.
_______________________________________________
audacity-devel mailing list
https://lists.sourceforge.net/lists/listinfo/audacity-devel