[Kdenlive-devel] Feature request: Sync External Audio Automatically

Wed Feb 8 18:39:36 UTC 2012

On Wed, Feb 8, 2012 at 12:47 AM, Simon A. Eugster <simon.eu at gmail.com> wrote:
> On 02/08/2012 09:08 AM, Alexandre Prokoudine wrote:
>> On Wed, Feb 8, 2012 at 11:32 AM, Simon A. Eugster wrote:
>>
>>>> http://jeff.ecchi.ca/blog/2011/07/25/automated-multicamera-clip-syncing/
>>>
>>> I have no idea how you manage to have a link to a solution for nearly
>>> every problem. Thanks for the link!
>>
>> YW :)
>>
>>> How accurate can we position audio streams? Just by full frames, or is
>>> it possible to have a finer granularity? When I synced audio/video I
>>> often had the problem that the audio was too early and after moving it
>>> by one frame it was too late.

Then your sense of timing is more acute than most humans'.

>>
>> Admittedly, I haven't had a chance to test it myself yet. However
>> http://bemasc.net/wordpress/2011/07/26/an-auto-aligner-for-pitivi/
>> states:
>
> Already read! :)
> I rather meant kdenlive/MLT here. Can we move an audio clip by just a
> few samples or only by full frames?

At the framework level, only by whole frames, but finer precision can be
achieved with an audio filter, e.g. sox.delay.
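To illustrate the frame/sub-frame split, here is a minimal sketch (a hypothetical helper, not part of MLT) that breaks a measured offset into whole frames, which could be applied at the edit level, plus a non-negative remainder in samples that a delay filter would have to absorb. It assumes the offset is given in seconds and that a positive value means the audio has to move later. How sox.delay itself is parameterized is not shown here.

```python
import math


def split_shift(shift_seconds, fps, sample_rate):
    """Split an offset into (whole_frames, remainder_samples).

    Hypothetical helper: the floor keeps the remainder non-negative, since a
    delay filter can only push audio later, never earlier.
    """
    total_samples = round(shift_seconds * sample_rate)
    samples_per_frame = sample_rate / fps
    whole_frames = math.floor(total_samples / samples_per_frame)
    remainder_samples = total_samples - round(whole_frames * samples_per_frame)
    return whole_frames, remainder_samples


print(split_shift(0.093, 25, 48000))   # 2 frames, then 624 samples of delay
print(split_shift(-0.013, 25, 48000))  # 1 frame earlier, then 1296 samples of delay
```

At 25 fps and 48 kHz one frame is 1920 samples, so a 93 ms offset becomes two frames plus 624 samples, and a small negative offset becomes one frame earlier plus a positive delay.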

>
> Simon
>
>> "The algorithm I settled on resembles the method a human uses when
>> looking at the waveform view. First, it breaks each input audio stream
>> into 40 ms blocks and computes the mean absolute value of each block.

40 ms is the duration of one frame at 25 fps, so this value can simply be
computed per mlt_frame.
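A minimal sketch of the "volume envelope" step quoted above, assuming one channel of audio as a list of floats. At 25 fps each frame carries exactly one 40 ms block, so calling this once per frame's audio would yield one envelope value per mlt_frame. The function name is illustrative only.

```python
def volume_envelope(samples, sample_rate, block_ms=40):
    """Mean absolute value of each block_ms block of `samples`.

    A trailing partial block is dropped. Illustrative helper, not an MLT API.
    """
    block = int(sample_rate * block_ms / 1000)  # 1920 samples at 48 kHz
    n_blocks = len(samples) // block
    return [
        sum(abs(s) for s in samples[i * block:(i + 1) * block]) / block
        for i in range(n_blocks)
    ]


# Tiny example: 100 Hz "audio", so each 40 ms block is 4 samples.
print(volume_envelope([1, -1, 1, -1, 0.5, -0.5, 0.5, -0.5, 9], 100))
```

With 40 ms blocks the envelope is sampled at 25 Hz, which is why the quoted text calls it a 25 Hz signal.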

>> The resulting 25 Hz signal is the “volume envelope”. The code
>> subtracts the mean volume from each track’s envelope, then performs a
>> cross-correlation between tracks and looks for the peak, which
>> identifies the relative shift.

This can be implemented as a passive transition that computes the shift
and reports it through a property. An application can then make a
frame-level adjustment at the edit level and apply the sox.delay or
frei0r.delay0r filter for sub-frame accuracy. Or the transition can be
dual-pass and perform the sub-frame adjustment itself on the second
pass.
Alternatively, kdenlive is already getting all of the audio in a
consumer-frame-show event, so it could just do all of this analysis in
its own code and use existing filters for sub-frame accuracy.
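Whichever side does the analysis, the quoted core step (subtract each envelope's mean, cross-correlate, take the peak) might look roughly like this. It is a plain-Python sketch under stated assumptions: envelopes are equal-rate lists of floats, the search is limited to +/- max_lag blocks, and the sums are not normalized by overlap length. None of these names come from MLT or Kdenlive.

```python
def zero_mean(envelope):
    """Subtract the envelope's mean so overall loudness does not dominate."""
    mean = sum(envelope) / len(envelope)
    return [v - mean for v in envelope]


def cross_correlation(reference, other, max_lag):
    """corr[lag + max_lag] = sum_i ref[i] * other[i + lag], lag in [-max_lag, max_lag].

    A positive peak lag means the same sound occurs `lag` blocks later in
    `other` than in `reference`. Raw sums, not normalized by overlap.
    """
    ref = zero_mean(reference)
    oth = zero_mean(other)
    corr = []
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(oth):
                total += r * oth[j]
        corr.append(total)
    return corr


def peak_lag(corr, max_lag):
    """Integer lag (in blocks) at which the correlation is largest."""
    best_index = max(range(len(corr)), key=corr.__getitem__)
    return best_index - max_lag


reference = [0, 0, 0, 5, 1, 0, 0, 0, 0, 0]
other = [0, 0, 0, 0, 0, 5, 1, 0, 0, 0]  # same transient, 2 blocks later
print(peak_lag(cross_correlation(reference, other, 4), 4))
```

This direct form costs O(n * lags); for long clips an FFT-based correlation would be the usual faster choice.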

>> To avoid performing N^2
>> cross-correlations, one clip is selected as the fixed reference, and
>> all others are compared to it. The peak position is quantized to the
>> block duration (creating an error of +/- 20ms), so to improve accuracy
>> a parabolic fit is used to interpolate the true maximum. I don’t know
>> the exact residual error, but I expect it’s typically less than 5 ms,
>> which should be plenty good enough, seeing as sound travels about 1
>> foot per ms."
>>
>> Alexandre Prokoudine
>> http://libregraphicsworld.org
>>
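The last quoted paragraph compares every clip against one fixed reference (N - 1 correlations instead of N^2) and refines the block-quantized peak with a parabolic fit. A minimal sketch of the refinement, assuming `corr[i]` holds the correlation at lag `i - max_lag` blocks and 40 ms blocks; it uses the standard three-point parabola vertex formula, which may or may not match the PiTiVi code exactly.

```python
def parabolic_peak(corr, k):
    """Refine integer peak index k using a parabola through k-1, k, k+1."""
    if k <= 0 or k >= len(corr) - 1:
        return float(k)  # missing neighbor: keep the quantized peak
    a, b, c = corr[k - 1], corr[k], corr[k + 1]
    denom = a - 2 * b + c
    if denom == 0:
        return float(k)  # flat top: parabola vertex undefined
    return k + 0.5 * (a - c) / denom


def refined_offset_seconds(corr, max_lag, block_seconds=0.040):
    """Sub-block offset of a clip relative to the reference, in seconds."""
    k = max(range(len(corr)), key=corr.__getitem__)
    return (parabolic_peak(corr, k) - max_lag) * block_seconds


# Correlation samples of a parabola peaking at index 2.3 (lag 0.3 blocks).
corr = [-(i - 2.3) ** 2 for i in range(5)]
print(refined_offset_seconds(corr, max_lag=2))  # about 0.012 s
```

Without the fit the answer would snap to lag 0, i.e. an error of up to half a 40 ms block, which is the +/- 20 ms mentioned above.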