GSOC Analyzer Support

Ian Monroe ian.monroe at gmail.com
Thu Feb 26 19:43:58 UTC 2009


On Thu, Feb 26, 2009 at 12:34 PM, Ricard Marxer Piñón
<email at ricardmarxer.com> wrote:
> Thanks for the quick answer!
>
> See below for some thoughts.
>
> On Thu, Feb 26, 2009 at 1:44 AM, Ian Monroe <ian.monroe at gmail.com> wrote:
>>
>> On Wed, Feb 25, 2009 at 2:19 PM, Ricard Marxer Piñón
>> <email at ricardmarxer.com> wrote:
>> > Hi,
>> >
>> > I'm Ricard Marxer and I'm planning to apply for the GSOC Analyzer
>> > Support idea presented by Ian Monroe:
>> >
>> >
>> > http://techbase.kde.org/Projects/Summer_of_Code/2009/Ideas#Project:_Analyzer_Support
>> >
>> > Ideas extending GSOC Analyzer Support proposal
>> > -----------------------------------------------------------------
>> >
>> > Data:
>> > I have seen that Phonon offers an experimental AudioDataOutput class
>> > made for the purpose of visualization and analysis.  This is the
>> > obvious entry point for accessing the audio frames to perform further
>> > processing.
>>
>> Yep! Whoever ends up mentoring, you'll probably be getting some
>> direction from Kretz (the creator of Phonon) or maybe one of the
>> Trolls especially on API issues.
>>
>> > Processing:
>> > As for the library to be used for the mathematical processing of the
>> > audio (FFT, MFCC, onset detection, pitch estimation...), I would like
>> > to use Eigen, since it makes the code very readable and clear and it
>> > is highly optimized.  I would of course use some external libraries
>> > for some specific algorithms, such as FFTW.
>>
>> I'm not sure how much processing is needed and how much is already
>> done by Xine and/or GStreamer. Eigen is pretty nifty though, yeah. :)
>>
>> Here's an example of a Xine plugin used to give Amarok analyzer
>> information from Amarok 1.4:
>> http://kollide.net:8060/browse/Amarok/src/engine/xine/xine-scope.c?r=10660
>> And GstEngine::scope() from Amarok 1.4:
>> http://kollide.net:8060/browse/Amarok/src/engine/gst/gstengine.cpp?r=2375
>>
>> In both cases the code is probably utterly worthless, just showing you
>> the sort of stuff you might end up doing on the backend side.
>
> I'm still wondering what processing should be done inside the backend
> and what outside.
>
> Phonon already provides an experimental class which gives access to the
> audio frames as they arrive.  The class emits signals of the type
> dataReady(QMap<channel, vector of samples>).
>
> The simplest processing would be to apply a windowing function, an FFT  and
> some weighting, scaling and band processing (for a simple EQ
> visualization).  If we do this outside of Phonon (e.g. using FFTW and Eigen)
> we don't have to duplicate the work for several backends.
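The window + FFT + weighting pipeline described in the paragraph above can be sketched without any Phonon or FFTW code. This is a minimal, illustrative C++ version (function names are made up, not from any library); a real implementation would hand the windowed frame to FFTW rather than use the naive O(n^2) DFT shown here for clarity:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

static const float kTwoPi = 6.28318530717958647692f;

// Hann window, applied before the transform to reduce spectral leakage.
std::vector<float> hannWindow(const std::vector<float> &frame)
{
    const std::size_t n = frame.size();
    if (n < 2)
        return frame;
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = frame[i] * 0.5f * (1.0f - std::cos(kTwoPi * i / (n - 1)));
    return out;
}

// Magnitude of each frequency bin up to Nyquist.
// Naive DFT for readability; FFTW would replace this inner loop.
std::vector<float> magnitudeSpectrum(const std::vector<float> &frame)
{
    const std::size_t n = frame.size();
    std::vector<float> mags(n / 2 + 1, 0.0f);
    if (n == 0)
        return mags;
    for (std::size_t k = 0; k <= n / 2; ++k) {
        float re = 0.0f, im = 0.0f;
        for (std::size_t i = 0; i < n; ++i) {
            const float phase = kTwoPi * k * i / n;
            re += frame[i] * std::cos(phase);
            im -= frame[i] * std::sin(phase);
        }
        mags[k] = std::sqrt(re * re + im * im);
    }
    return mags;
}
```

Chaining `magnitudeSpectrum(hannWindow(frame))` on each frame Phonon delivers gives the raw per-bin data a simple EQ visualization would then scale and group into bands.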
>
> However, some more complex processing/analysis methods (such as some
> onset detection and pitch estimation methods) require buffering, since
> they may act on several overlapping frames.  In these cases the backends
> can be very helpful.  It might even be that some of these backends
> already have this kind of method implemented; I'll have to check.
>
> For really fast visualization, such as smoothly scrolling spectrograms,
> we will also require some sort of buffering, but I'm not so sure how
> Phonon could expose an interface for such specific techniques, or how
> many of the backends would make implementing them easy.  This is
> something I will have to research a bit.
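The overlapping-frame buffering mentioned above does not strictly need backend support; it can be done on the application side by accumulating incoming chunks and emitting a full frame every hop. A minimal sketch (class and member names are invented for illustration, with the assumption that hopSize <= frameSize):

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Accumulates incoming sample chunks and emits fixed-size frames that
// overlap by (frameSize - hopSize) samples, as onset detection or a
// scrolling spectrogram would need.
class OverlapBuffer
{
public:
    OverlapBuffer(std::size_t frameSize, std::size_t hopSize)
        : m_frameSize(frameSize), m_hopSize(hopSize) {}

    // Feed one chunk of decoded samples; returns every complete frame
    // that became available after appending it.
    std::vector<std::vector<float> > feed(const std::vector<float> &chunk)
    {
        m_pending.insert(m_pending.end(), chunk.begin(), chunk.end());
        std::vector<std::vector<float> > frames;
        while (m_pending.size() >= m_frameSize) {
            frames.push_back(std::vector<float>(
                m_pending.begin(), m_pending.begin() + m_frameSize));
            // Advance by the hop, keeping the overlapping tail around.
            m_pending.erase(m_pending.begin(), m_pending.begin() + m_hopSize);
        }
        return frames;
    }

private:
    std::size_t m_frameSize;
    std::size_t m_hopSize;
    std::deque<float> m_pending;
};
```

With frameSize 1024 and hopSize 512, each analysis frame shares half its samples with the previous one, which is a common setup for spectrogram-style processing.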
>
> Next, I will contact Matthias to get some input from him about all this.

You might just email phonon-backends; Kretz reads it, I think. He is
also a PhD student btw, so you can commiserate with him. :)

Anyway, Phonon is supposed to be an API that makes things simple. An API
to make doing analyzers and visualizations easy (e.g. just a list of
10-15 integers giving the levels of various frequency bands, sent out
every 100 ms or so) would make sense to me. Also, as you can see in the
above source, such info may be available fairly easily from the
multimedia libraries.

Raw data access for other things is really cool too of course. Perhaps
both should be done.
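The "list of 10-15 integers" idea above could amount to something as small as a helper that collapses a magnitude spectrum into a few coarse levels. This is purely illustrative, not a proposed Phonon signature:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Collapse a magnitude spectrum into `bands` integer levels in [0, 255],
// the kind of coarse data a simple analyzer API could emit every ~100 ms.
// Leftover bins past bands * perBand are dropped for simplicity.
std::vector<int> bandLevels(const std::vector<float> &spectrum,
                            std::size_t bands)
{
    std::vector<int> levels(bands, 0);
    if (spectrum.empty() || bands == 0)
        return levels;
    const std::size_t perBand =
        std::max<std::size_t>(1, spectrum.size() / bands);
    for (std::size_t b = 0; b < bands; ++b) {
        const std::size_t begin = b * perBand;
        const std::size_t end = std::min(spectrum.size(), begin + perBand);
        float sum = 0.0f;
        for (std::size_t i = begin; i < end; ++i)
            sum += spectrum[i];
        const float avg = (end > begin) ? sum / (end - begin) : 0.0f;
        // Assumes magnitudes are pre-normalized to roughly [0, 1].
        levels[b] = std::min(255, static_cast<int>(avg * 255.0f + 0.5f));
    }
    return levels;
}
```

An analyzer widget would then just draw one bar per returned level, leaving all the signal processing behind the API.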

>>
>> Given your experience using audio analysis techniques I'm sitting here
>> wondering if perhaps we could expand the project to give applications
>> access to the raw decoded audio. This would be useful for some other
>> things we want to do in Amarok, and it would let you do more advanced
>> analysis stuff. (The first priority is still the basic Codeine/Amarok
>> 1.4 analyzer though).
>>
>> > Visualization:
>> > Here is where I would like to ask for some help about what would be
>> > the best choice.  I think to start with, the simplest thing would be
>> > to first hook the output of the processing directly to a Graphics
>> > View.
>> > If you guys think it is a good idea, it might be nice to have the
>> > Graphics View inside a Plasma applet (which could then fit in
>> > Amarok's context view).
>>
>> Actually some of the analyzers in Amarok 1.4 were pretty cool and
>> could probably be ported. Some used QPainter and others used OpenGL to
>> do 3D stuff. QGV would also make sense.
>>
>> Probably for the bigger plasmoid-sized visualizations (as opposed to
>> analyzers) we would want to use http://projectm.sourceforge.net/. Like
>> the 'giving access to raw audio' idea above we might be getting out of
>> scope for the project depending on what you want to do.
>
> I didn't know about ProjectM; thanks for the link, it looks really
> good.  And since Phonon already exposes the audio frames, I think it
> shouldn't be too much work to connect these up and create a ProjectM
> Plasma applet as part of the GSOC.

That would be cool indeed. If we could rig up some dbus and have it be
out-of-process... a desktop plasmoid for music playing in Amarok would
be really sweet. :)


