On Thu, Feb 26, 2009 at 1:44 AM, Ian Monroe <ian.monroe at> wrote:

> On Wed, Feb 25, 2009 at 2:19 PM, Ricard Marxer Piñón
> <email at> wrote:
> > I'm Ricard Marxer and I'm planning to apply for the GSOC Analyzer Support
> > Idea presented by Ian Moore:
> > Ideas extending GSOC Analyzer Support proposal
> > Data:
> > I have seen that Phonon offers an experimental AudioDataOutput class made
> > for the purpose of visualization and analysis.  This is the obvious entry
> > point for accessing the audio frames to perform further processing.
> Yep! Whoever ends up mentoring, you'll probably be getting some
> direction from Kretz (the creator of Phonon) or maybe one of the
> Trolls especially on API issues.
> > Processing:
> > As for the library to be used for the mathematical processing of the audio
> > (FFT, MFCC, Onset Detection, Pitch Estimation...) I would like to use Eigen,
> > since it makes the code very readable and clear and it is highly optimized.
> > I would of course use some external libraries for some specific algos such
> > as FFTW.
> I'm not sure how much processing is needed and how much is already
> done by Xine and/or Gstreamer. Eigen is pretty nifty though yea. :)
> Here's an example of a Xine plugin used to give Amarok analyzer
> information from Amarok 1.4:
> And GstEngine::scope() from Amarok 1.4:
> In both cases the code is probably utterly worthless, just showing you
> the sort of stuff you might end up doing on the backend side.

I'm still wondering which processing should be done inside the backend and
which outside of it.

Phonon already provides an experimental class which gives access to the
audio frames as they arrive.  The class emits signals of the type
dataReady(QMap<channel, vector of samples>).

The simplest processing would be to apply a windowing function, an FFT, and
some weighting, scaling and band processing (for a simple EQ
visualization).  If we do this outside of Phonon (e.g. using FFTW and Eigen)
we don't have to duplicate the work for several backends.
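
To make that chain concrete, here is a rough sketch in plain Python (only to
illustrate the steps, not Phonon code).  The function names are made up for
this example, the DFT is a naive one standing in for FFTW, and the equal-width
bands and dB scaling are just one assumed choice of "band processing":

```python
import cmath
import math


def hann_window(n):
    """Return an n-point Hann window (assumes n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2.

    A real implementation would call an FFT library here instead.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def band_levels(frame, num_bands):
    """Window one frame, take its spectrum, and reduce it to band levels in dB.

    Assumes num_bands <= len(frame) // 2 so that every band gets at least one bin.
    """
    window = hann_window(len(frame))
    windowed = [s * w for s, w in zip(frame, window)]
    mags = magnitude_spectrum(windowed)[1:]  # drop the DC bin
    per_band = len(mags) // num_bands
    levels = []
    for b in range(num_bands):
        chunk = mags[b * per_band:(b + 1) * per_band]
        mean = sum(chunk) / len(chunk)
        levels.append(20.0 * math.log10(mean + 1e-12))  # epsilon avoids log(0)
    return levels
```

Each frame delivered for one channel would go through band_levels() and the
resulting list would drive the bars of the EQ display.  None of this depends on
the backend, which is the point of doing it outside of Phonon.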

However, some more complex processing/analysis (such as some onset detection
and pitch estimation methods) requires buffering, since it may act on
several overlapping frames.  In these cases backends can be very helpful.
It might even be that some of these backends already have these kinds of
methods implemented; I will have to check.
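
As an illustration of the kind of buffering I mean, here is a minimal Python
sketch.  The OverlapBuffer class and its frame_size/hop_size parameters are
hypothetical names for this example, not anything Phonon or the backends
provide.  It collects sample blocks of arbitrary size and hands back
fixed-size frames that overlap by frame_size - hop_size samples:

```python
from collections import deque
from itertools import islice


class OverlapBuffer:
    """Collects incoming sample blocks and returns overlapping fixed-size frames."""

    def __init__(self, frame_size, hop_size):
        if not 0 < hop_size <= frame_size:
            raise ValueError("hop_size must be between 1 and frame_size")
        self.frame_size = frame_size
        self.hop_size = hop_size
        self._samples = deque()

    def push(self, block):
        """Append a block of samples and return every complete frame now available."""
        self._samples.extend(block)
        frames = []
        while len(self._samples) >= self.frame_size:
            # Copy the oldest frame_size samples without removing them.
            frames.append(list(islice(self._samples, self.frame_size)))
            # Advance by hop_size so the next frame overlaps this one.
            for _ in range(self.hop_size):
                self._samples.popleft()
        return frames
```

With frame_size=4 and hop_size=2, pushing [0, 1, 2] returns nothing yet, and
pushing [3, 4, 5] afterwards returns the frames [0, 1, 2, 3] and [2, 3, 4, 5].
Whether this lives in the application or in a backend is exactly the open
question.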

For really fast visualizations such as smoothly scrolling spectrograms we will
also require some sort of buffering, but I'm not sure how Phonon could
expose an interface for such specific techniques, or how many of the
backends would make them easier to implement.  This is something I will have
to research a bit.

Next, I will contact Matthias to get his input on all this.

> Given your experience using audio analysis techniques I'm sitting here
> wondering if perhaps we could expand the project to give applications
> access to the raw decoded audio. This would be useful for some other
> things we want to do in Amarok, and it would let you do more advanced
> analysis stuff. (The first priority is still the basic Codeine/Amarok
> 1.4 analyzer though).
> > Visualization:
> > Here is where I would like to ask for some help about what would be the best
> > choice.  I think to start with, the simplest thing would be to first hook
> > the output of the processing directly to a Graphics View.
> > If you guys think it is a good idea it might be nice to have the Graphics
> > View inside a plasma applet (which could then fit in Amarok's context view).
> Actually some of the analyzers in Amarok 1.4 were pretty cool and
> could probably be ported. Some used QPainter and others used OpenGL to
> do 3D stuff. QGV would also make sense.
> Probably for the bigger plasmoid-sized visualizations (as opposed to
> analyzers) we would want to use Like
> the 'giving access to raw audio' idea above we might be getting out of
> scope for the project depending on what you want to do.

I didn't know about ProjectM, thanks for the link; it looks really good.  And
since Phonon already exposes the audio frames, I think it shouldn't be too
much work to connect these to ProjectM and create a ProjectM Plasma applet as
part of the GSOC.

> > Of course the main goal is to have the lowest possible hit in CPU and still
> > keep beautiful visualization.  Also it should be possible to completely turn
> > off the audio processing and visualization when in power saving mode.
> > Anyway, this is just to create some discussion about the directions the idea
> > could take.  What do you all think?
> I'm really glad that someone is taking an interest in this. As you can
> see there's some flexibility on where you want to make the emphasis of
> your proposal.
> A little inside baseball (sorry don't know a more international term,
> lol): this project has a decent chance of being selected as
> kdelibs-related projects (and Phonon at least used to be in kdelibs)
> typically aren't proposed so often but have a lot of people voting for
> them since they help out many parts of KDE. Making this clear in your
> proposal would be good politics.

Yes, you are right.  I'm going to have to narrow down the exact goals and
select them smartly so that they benefit as much of KDE as possible.  I think
creating a good base for applications wanting to analyze audio (whether it
is for visualization or content-based organization) could be a good direction.

Well, everything is still a bit rough, but I will try to contact Matthias
Kretz to get things clearer.

