GSoC application review - reimplementing personal metadata importers
Konrad Zemek
konrad.zemek at gmail.com
Wed Apr 24 02:33:13 UTC 2013
On 22.04.2013 14:56, Matěj Laitl wrote:
> On 21. 4. 2013 Konrad Zemek wrote:
>> 2013/4/20 Matěj Laitl <matej at laitl.cz>:
>>> On 20. 4. 2013 Konrad Zemek wrote:
>>> I think there are no importer tests, please check it and then be more
>>> certain in the proposal.
>> I did actually check that, but the lack of tests is a bit unusual for
>> me, so I figured they may be just well-hidden somewhere. I've
>> conducted more extensive research, and there are indeed none.
> Yeah, we've been more serious about test-driven-development only very
> recently. :) One reason is that our "business logic" code is unfortunately
> often interwined with GUI code, which makes testing unpleasant.
>
>> What I meant were integration tests with importer implementation as a
>> black box. I'm sure they can be written so that they bypass any
>> interactivity, and they would help me a lot with refactoring (as well
>> as providing a "done" condition).
> Okay, should be doable. Be warned that some of our code is not testing-
> friendly - which is another opportunity for you to make it so and be of even
> greater benefit to Amarok. ;)
Well, it wouldn't be fun if it were too easy. ;-)
>>> Nice. I should be able to review & merge your review request soon, but
>>> please still make a distinction between already merged and yet-in-review
>>> patches. Having submitted/merged one more little patch would be a small
>>> bonus, too.
>> I understand "contribute" as "write and submit code, hoping it will be
>> helpful", not necessarily that it's merged. Although again, I very
>> well recognize how it can be understood the other way (and probably my
>> understanding here is different than commonly accepted definition). I
>> didn't change that in hopes that it will be true in both senses by the
>> time of application; I will rewrite it if not.
> Well, unfortunately we've already seen review requests with just random (and
> actually harmful) changes -- I don't think we should consider these
> "contributions". This is certainly not the case for your well-done and non-
> trivial patch, which I should be able to merge soon.
>
>> And as much as I'd love to submit a lot more patches right now, I
>> can't really do that this week. I have an exam coming in the next few
>> days, as well as some mid-term tests.
> I see, good luck with your exams! The deadline for proposals is May 3, so
> maybe you'll manage to at least submit a small change by then.
>
I've just submitted a small patch which just "happened" as I was working
on some other things. What I'm worried about is that every GSoC resource
states in bold letters that application should be submitted early rather
than late.
>> To alleviate this, I intend to add Rhythmbox as a new target for
>> synchronization. It stores following personal track metadata compatible with
>> Amarok:
>> * user rating
>> * playcount
>> * last played time
>> * time when the song was first added to collection
> Good. Note that "added to collection" is currently not part of
> synchronization. (you could add it, but maybe it isn't worth the effort)
I removed it from the description. Rating, playcount and last time
played are already enough, and I think "time added to collection" is
actually a bit of a shady thing to synchronize (i.e. I'm not sure if
it's even correct to synchronize it).
> Also note that current iTunes import matches tracks by just filenames. I
> secretly hope that one can get metadata (at least: artist, album, title;
> better also: track number, disc number, album artist) out of iTunes db, but
> please check - inability to do so would be a show-stopper for iTunes provider.
>
Are you certain about that? I reviewed iTunes importer, and it reads
information from an XML file. iTunes maintains an "iTunes Library.XML"
file for the purpose of compatibility with other applications, and
although I couldn't find an official specification for it, I found some
example databases which contained all of the metadata you mention (and
some more).
>> Implementation Details:
>> (...)
>> On the high level, the StatSyncing framework consists of the
>> following elements
>> (...)
>> * Track, which is an abstraction of a track and contains relevant
>> metadata used for trackmatching
> ...and methods to set the metadata.
Right.
>> * Process, which is responsible for matching the tracks between
>> collection and synchronizing them
> "collections"
>
>> The project can be broken into three parts; reimplementation of
>> existing importers (read-only), introduction of new importers
>> (read-only) and finally an optional goal of making the
>> synchronization work in read-write mode (so that foreign statistics
>> can be updated as well as local ones). This last goal is optional
>> (soft goal), and will involve renaming importers to reflect their
>> new capabilities.
> Sounds good, except for the renaming. StatSyncing::ITunesProvider sounds well
> for both cases.
I actually had a conception to leave importers where they are, or in a
directory called simply importers. After implementing two-way
synchronization, importing would be only one of capabilities (the other
being of course exporting), so that name would no longer be appropriate.
:-) Of course all of that is still in the phase of a "fuzzy idea".
I removed the mention about renaming because it's too insignificant to
even mention there, and especially considering how much explanation is
needed for it to make sense.
>
>> Tentative Timeline:
>> MIDTERM EVALUATION (deadline: August 02)
>> The hard goal for midterm evaluation is to have both
>> FastForward and iTunes importers reimplemented, and if possible (soft goal)
>> unit-tested.
> Good.
>
>> FINAL EVALUATION (deadline: September 27)
>> The hard goal for final evaluation is to have all importers
>> implemented and working. The slightly softer goal is to have
>> them unit-tested. The soft goal is to implement two-way synchronization.
> This looks well too.
>
>> June 17 - 23:
>> Drafting up test cases for FastForward and iTunes importers
>> June 24 - 30:
>> Implementing the initial test suite for importers (exam
>> week) July 01 - 07:
>> Reimplementing FastForward importer, cleaning up
>> data-reading module, introducing StatSyncing framework
> Ehm, I think I know what you mean, but "introducing StatSyncing framework"
> sounds like it didn't existed before.
>
>> July 08 - 14:
>> Reimplementing FastForward importer, implementing unit tests
>> July 15 - 21:
>> Reimplementing iTunes importer, cleaning up data-reading
>> module July 22 - 28:
>> Reimplementing iTunes importer, implementing unit tests
>> July 29 - August 04: MIDTERM EVALUATION
>> Ironing out last issues with reimplemented importers,
>> submitting current work for midterm evaluation
>> August 05 - 11:
>> In-depth research of Rhythmbox metadata format, implementing
>> Rhythmbox importer
>> August 12 - 18:
>> Implementing Rhythmbox importer, implementing unit tests
>> August 19 - 25:
>> Implementing Amarok 2.x importer
>> August 26 - September 01:
>> Implementing Amarok 2.x importer, unit tests
>> September 02 - 08:
>> Investigating and implementing two-way synchronization
>> September 09 - 15:
>> Implementing two-way synchronization
>> September 16 - 22:
>> Ironing out bugs, cushion period in case of delays
>> September 23 - 29: FINAL EVALUATION
>> Ironing out bugs, cushion period in case of delays,
>> submitting work for final evaluation
> Looks better than the original timeline. OTOH the general remark (not
> necessarily mine, I just predict what the reviewers could say) to me more
> detailed remains. Perhaps you can mention class names, what would be working
> at the time, what will be NOT working yet...
>
>> I'm a big fan of open source, and KDE in general. I've used Amarok
>> a lot and not only read the source,
> This sounds like you don't use it anymore. perhaps "I've been using ..."?
>
>> I am also familiar with Qt framework and have used it to prototype
>> several small applications. I'm an early adopter, and my fiddling
>> with Qt 5 beta has led me to uncover an important bug [4].
> Nice.
>
>> I'm very skilled in C++. Other than that I program mainly in Scala,
>> and Java when required (and my university requires that a lot).
>> I'm experienced with test driven development, continuous
>> integration, code reviews and other agile techniques.
> Do you have some experience with git and/or other VCSes? (self-assessment:
> would you be able to explain difference between `git rebase` and `git merge`?)
> But don't actually explain it in the proposal. :-)
I can! ;-)
>
>> If my application is accepted, I am going to commit my full time to
>> complete the project, including putting my job on hold for the
>> duration of GSoC. I expect to be working 40 hours a week.
> Nice. What are your plans after the GSoC has ended? Do they involve Amarok in
> any way?
>
> [the questions above are rather hints for the proposal than something I'd like
> immediately answered]
>
> This starts to look really promising,
> Matěj
As always, many thanks for your help! I'm including the next iteration
below (with diff in the attachment). There may be some errors due to the
late hour - the day does not have quite enough hours lately to squeeze
in everything I need to do.
Konrad
Name: Konrad Zemek
Email Address: konrad.zemek at gmail.com
Freenode IRC Nick: kzemek
IM Service and Username: Skype, handle: konrad-kun
Location: Kraków, Poland. GMT+2 during the Summer.
Proposal Title:
Reimplement Amarok 1.4 (FastForward)/iTunes import on top of
Statistics Synchronization, and new Amarok 2.x and Rhythmbox
as synchronization targets.
Motivation for Proposal / Goal:
Currently, Amarok has the ability to import personal track metadata,
like playcounts, ratings and last played time, from Amarok 1.4 and
iTunes databases. Although this code still fulfills its purpose,
the StatSyncing framework has since been added to the codebase and
serves as a perfect base for a rewrite of importers. Perhaps the
most significant advantage that would be obtained by implementing
importers on top of StatSyncing is the ability to resynchronize
information and to work with collections other than local.
Still, synchronization with Amarok 1.4 and iTunes covers only
a small portion of users that may be willing to use Amarok 2.x,
but are tied to their current music player because of the
consequence of losing their personal metadata. To alleviate this,
I intend to add Rhythmbox as a new target for synchronization. It
stores following personal track metadata compatible with Amarok:
* user rating
* playcount
* last played time
Rhythmbox is the default GNOME's music player, and since
Ubuntu 12.04, the music player shipped by default in Ubuntu. I feel
that this is an important target for metadata synchronization,
easing not only the transition between music players, but between
whole desktop environments - GNOME and KDE.
To complete the set, synchronization with Amarok 2.x will be added,
so that user can synchronize his personal track metadata between
separate Amarok databases.
Implementation Details:
This project breaks up in a few parts with well-defined deliverables
(details about timeline are in Tentative Timeline section).
On the high level, the StatSyncing framework consists of the
following elements
* Controller, which registers providers and starts synchronization
* Config, used to store preferences for statistics synchronization
and can be set through UI
* Provider, which works with the backend and provides
StatSyncing::Tracks
* Track, which is an abstraction of a track and contains relevant
metadata used for trackmatching, as well as methods to modify it
* Process, which is responsible for matching the tracks between
collections and synchronizing them
* ScrobblingService, which is used for scrobbling capabilities
(transferring information about played tracks; see [1])
The project can be broken into three parts; reimplementation of
existing importers (read-only), introduction of new importers
(read-only) and finally an optional goal of making the
synchronization work in read-write mode (so that foreign statistics
can be updated as well as local ones). This last goal is optional
(soft goal).
* Importer tests
There are no tests for importing in the current trunk. Since
a big part of this project consists of refactoring, I will have
to prepare a quality suite of tests to make sure that
I introduce no regressions.
- Implement whole test suite using QTest and gmock best
practices
- Test cases would involve using a dummy input, invoking
importer and verifying collection.
- The actual logic behind importing tracks will be changed, so
these tests must treat importer as a black box (black box
integration tests)
* Reimplementing importers
- Reimplement importers to use StatSyncing
- Data-reading modules, would be cleaned up and factored out
into their own classes that will be used by a subclass of
StatSyncing::Provider
- Current services using StatSyncing framework will be used as
a reference (primarily last.fm service)
- At each step where the code is functional (at least between
reimplementing FastForwardImporter and ITunesImporter),
supplement the tests so that new capabilities are accounted
for (e.g. make sure that foreign playcounts are not simply
stacked on locally saved playcounts with each synchronization)
- Additionally, unit tests can now be written to confirm
correct implementation
- The importer code may prove to be repeatable (in the extreme
case the only thing differing these importers will be their
respective parsers); this code will be deduplicated. If it
fits its responsibilities, a parent class of importers should
contain the duplicated logic, otherwise I will introduce new
helper classes.
* Implement Rhythmbox and Amarok 2.x importers
Rhythmbox stores its personal metadata in an XML file, which
should be easy to read through QXml SAX parser. When in doubt
about the format, I will consult Rhythmbox source code.
- Rhythmbox parser shall be robust, that is it shouldn't depend
on the order of information (this is obvious, but important)
- At this point both Amarok 1.4 and iTunes importers are
rewritten, so their implementations will be used as
a reference for Amarok 2.x and Rhythmbox importers.
- The subgoal is to make sure that adding additional import
targets is straightforward and easy, and if possible doesn't
introduce code duplication (this may involve modifying
StatSyncing framework).
- As with reimplementing importers, every piece of working code
should be tested to ensure that it is performing correctly,
and so that regressions don't occur in the future.
Tentative Timeline:
MIDTERM EVALUATION (deadline: August 02)
The hard goal for midterm evaluation is to have both FastForward
and iTunes importers reimplemented, and if possible (soft goal)
unit-tested.
FINAL EVALUATION (deadline: September 27)
The hard goal for final evaluation is to have all importers
implemented and working. The slightly softer goal is to have
them unit-tested. The soft goal is to implement two-way
synchronization.
June 17 - 23:
Drafting up test cases for FastForward and iTunes importers.
June 24 - 30:
Implementing the initial test suite for importers (exam week).
July 01 - 07:
Reimplementing FastForward importer, cleaning up data-reading
module. At the end of the week I expect to have working
implementation of FastForwardProvider.
July 08 - 14:
Reimplementing FastForward importer. At the end of the week I
expect to have a functional importer, possibly with some quirks
to iron out. Implementing unit tests to assert correctness.
July 15 - 21:
Reimplementing iTunes importer, cleaning up data-reading
module. As with FastForward, at the end of the week the
ITunesProvider should be working.
July 22 - 28:
Reimplementing iTunes importer. As with FastForward importer,
after this period the importer should be functional.
Implementing unit tests.
July 29 - August 04: MIDTERM EVALUATION
Ironing out last issues with reimplemented importers, submitting
current work for midterm evaluation.
August 05 - 11:
In-depth research of Rhythmbox metadata format, implementing
Rhythmbox importer. I do not expect the RhythmboxProvider to
be fully functional at the end of the week.
August 12 - 18:
Implementing Rhythmbox importer. RhythmboxProvider should be
working at the start of the week, whole synchronization should
work at the end of the week. Implementing unit tests.
August 19 - 25:
Implementing Amarok 2.x importer. Like FastForward and ITunes,
at the end of the week the AmarokProvider should be
functional.
August 26 - September 01:
Implementing Amarok 2.x importer. Ironing out any problems
with AmarokProvider. At the end of the week one-way import
should be working. Implementing unit tests.
September 02 - 08:
Investigating and implementing two-way synchronization.
I expect one working implementation of two-way synchronization
at the end of the week.
September 09 - 15:
Implementing two-way synchronization for remaining importers.
September 16 - 22:
Ironing out bugs, writing remaining tests. Cushion period in
case of delays.
September 23 - 29: FINAL EVALUATION
Ironing out bugs, writing remaining tests. Cushion period in
case of delays. Submitting work for final evaluation
Obligations from late May to early August:
I have school until the very early July, with exams starting the
last week of June. I will be putting my work on hold for the whole
period of Google Summer of Code, but up until early June I may still
be required to commit at least a few hours a week to my current job.
So from early June to late September I have no commitments except
for school, and from early July to late September no commitments
at all.
About me:
I'm a student at AGH University of Science and Technology, Kraków,
Poland. I study Computer Science, and I'm currently in my second
year. At the moment I'm employed as a part time C++ programmer at
X-Formation [2], Poland, where my work ranges from conducting
interviews with potential employees, through refactoring and
bugfixing to implementing new features for our product.
I'm a big fan of open source, and KDE in general. I've been using
Amarok a lot and have not only read the source, but contributed to
it - patch [3] is awaiting final evaluation and merge, and patch
[4] has recently been submitted for review. I'm currently working
on more changes to Amarok's codebase.
I am also familiar with Qt framework and have used it to prototype
several small applications. I'm an early adopter, and my fiddling
with Qt 5 beta has led me to uncover an important bug [5]. I also
have source-level experience with Boost libraries.
I'm very skilled in C++. Other than that I program mainly in Scala,
and Java when required (and my university requires that a lot).
I'm experienced with test driven development, continuous
integration, code reviews and other agile techniques.
I have work experience with both SVN and git, with the latter
being my VCS of choice for personal projects.
If my application is accepted, I am going to commit my full time to
complete the project, including putting my job on hold for the
duration of GSoC. I expect to be working 40 hours a week.I am
comfortable with working independently under a supervisor living
on the other side of the globe. Although I have not have worked
in this style before, I foresee no problems with this and can even
change my working hours if needed. I think that code reviews
and email contact are the best way of tracking progress in this
case. I am fluent in English, so communication shouldn't be
a problem.
Of course, I would work as well with a local mentor with whom
I could discuss issues over a coffee.
During the bonding period I expect to complete some more tasks for
Amarok, possibly continuing what I started with [3] (in review
comments I mentioned streamlining track sorting and making it
consistent across the source code). I will also use this time to
thoroughly prepare for the coding period.
I'm seeing GSoC as a way to kickstart me into the open source
development at large. After the end of Google Summer of Code program
I hope to remain in Amarok community as a frequent contributor.
Junior job link:
[3]
References:
[1] http://www.last.fm/help/faq?category=Scrobbling
[2] http://www.x-formation.com/
[3] https://git.reviewboard.kde.org/r/110070/
[4] https://git.reviewboard.kde.org/r/110139/
[5] https://bugreports.qt-project.org/browse/QTBUG-26832
-------------- next part --------------
A non-text attachment was scrubbed...
Name: application.diff
Type: text/x-patch
Size: 7385 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/amarok-devel/attachments/20130424/20503ea5/attachment-0001.diff>
More information about the Amarok-devel
mailing list