GSoC application review - reimplementing personal metadata importers

Konrad Zemek konrad.zemek at gmail.com
Wed Apr 24 02:33:13 UTC 2013


On 22.04.2013 14:56, Matěj Laitl wrote:
> On 21. 4. 2013 Konrad Zemek wrote:
>> 2013/4/20 Matěj Laitl <matej at laitl.cz>:
>>> On 20. 4. 2013 Konrad Zemek wrote:
>>> I think there are no importer tests, please check it and then be more
>>> certain in the proposal.
>> I did actually check that, but the lack of tests is a bit unusual for
>> me, so I figured they may be just well-hidden somewhere. I've
>> conducted more extensive research, and there are indeed none.
> Yeah, we've been more serious about test-driven-development only very
> recently. :) One reason is that our "business logic" code is unfortunately
> often interwined with GUI code, which makes testing unpleasant.
>
>> What I meant were integration tests with importer implementation as a
>> black box. I'm sure they can be written so that they bypass any
>> interactivity, and they would help me a lot with refactoring (as well
>> as providing a "done" condition).
> Okay, should be doable. Be warned that some of our code is not testing-
> friendly - which is another opportunity for you to make it so and be of even
> greater benefit to Amarok. ;)

Well, it wouldn't be fun if it were too easy. ;-)

>>> Nice. I should be able to review & merge your review request soon, but
>>> please still make a distinction between already merged and yet-in-review
>>> patches. Having submitted/merged one more little patch would be a small
>>> bonus, too.
>> I understand "contribute" as "write and submit code, hoping it will be
>> helpful", not necessarily that it's merged. Although again, I very
>> well recognize how it can be understood the other way (and probably my
>> understanding here is different than commonly accepted definition). I
>> didn't change that in hopes that it will be true in both senses by the
>> time of application; I will rewrite it if not.
> Well, unfortunately we've already seen review requests with just random (and
> actually harmful) changes -- I don't think we should consider these
> "contributions". This is certainly not the case for your well-done and non-
> trivial patch, which I should be able to merge soon.
>
>> And as much as I'd love to submit a lot more patches right now, I
>> can't really do that this week. I have an exam coming in the next few
>> days, as well as some mid-term tests.
> I see, good luck with your exams! The deadline for proposals is May 3, so
> maybe you'll manage to at least submit a small change by then.
>

I've just submitted a small patch which just "happened" as I was working 
on some other things. What I'm worried about is that every GSoC resource 
states in bold letters that application should be submitted early rather 
than late.

>> To alleviate this, I intend to add Rhythmbox as a new target for
>> synchronization. It stores following personal track metadata compatible with
>> Amarok:
>>          * user rating
>>          * playcount
>>          * last played time
>>          * time when the song was first added to collection
> Good. Note that "added to collection" is currently not part of
> synchronization. (you could add it, but maybe it isn't worth the effort)

I removed it from the description. Rating, playcount and last time 
played are already enough, and I think "time added to collection" is 
actually a bit of a shady thing to synchronize (i.e. I'm not sure if 
it's even correct to synchronize it).

> Also note that current iTunes import matches tracks by just filenames. I
> secretly hope that one can get metadata (at least: artist, album, title;
> better also: track number, disc number, album artist) out of iTunes db, but
> please check - inability to do so would be a show-stopper for iTunes provider.
>

Are you certain about that? I reviewed iTunes importer, and it reads 
information from an XML file. iTunes maintains an "iTunes Library.XML" 
file for the purpose of compatibility with other applications, and 
although I couldn't find an official specification for it, I found some 
example databases which contained all of the metadata you mention (and 
some more).

>> Implementation Details:
>>    (...)
>>          On the high level, the StatSyncing framework consists of the
>>          following elements
>>            (...)
>>          * Track, which is an abstraction of a track and contains relevant
>>            metadata used for trackmatching
> ...and methods to set the metadata.

Right.

>>          * Process, which is responsible for matching the tracks between
>>            collection and synchronizing them
> "collections"
>
>>          The project can be broken into three parts; reimplementation of
>>          existing importers (read-only), introduction of new importers
>>          (read-only) and finally an optional goal of making the
>>          synchronization work in read-write mode (so that foreign statistics
>>          can be updated as well as local ones). This last goal is optional
>>          (soft goal), and will involve renaming importers to reflect their
>>          new capabilities.
> Sounds good, except for the renaming. StatSyncing::ITunesProvider sounds well
> for both cases.

I actually had a conception to leave importers where they are, or in a 
directory called simply importers. After implementing two-way 
synchronization, importing would be only one of capabilities (the other 
being of course exporting), so that name would no longer be appropriate. 
:-) Of course all of that is still in the phase of a "fuzzy idea".
I removed the mention about renaming because it's too insignificant to 
even mention there, and especially considering how much explanation is 
needed for it to make sense.

>
>> Tentative Timeline:
>>          MIDTERM EVALUATION (deadline: August 02)
>>                  The hard goal for midterm evaluation is to have both
>> FastForward and iTunes importers reimplemented, and if possible (soft goal)
>> unit-tested.
> Good.
>
>>          FINAL EVALUATION (deadline: September 27)
>>                  The hard goal for final evaluation is to have all importers
>>                  implemented and working. The slightly softer goal is to have
>> them unit-tested. The soft goal is to implement two-way synchronization.
> This looks well too.
>
>>          June 17 - 23:
>>                  Drafting up test cases for FastForward and iTunes importers
>>          June 24 - 30:
>>                  Implementing the initial test suite for importers (exam
>> week) July 01 - 07:
>>                  Reimplementing FastForward importer, cleaning up
>> data-reading module, introducing StatSyncing framework
> Ehm, I think I know what you mean, but "introducing StatSyncing framework"
> sounds like it didn't existed before.
>
>>          July 08 - 14:
>>                  Reimplementing FastForward importer, implementing unit tests
>> July 15 - 21:
>>                  Reimplementing iTunes importer, cleaning up data-reading
>> module July 22 - 28:
>>                  Reimplementing iTunes importer, implementing unit tests
>>          July 29 - August 04: MIDTERM EVALUATION
>>                  Ironing out last issues with reimplemented importers,
>> submitting current work for midterm evaluation
>>          August 05 - 11:
>>                  In-depth research of Rhythmbox metadata format, implementing
>> Rhythmbox importer
>>          August 12 - 18:
>>                  Implementing Rhythmbox importer, implementing unit tests
>>          August 19 - 25:
>>                  Implementing Amarok 2.x importer
>>          August 26 - September 01:
>>                  Implementing Amarok 2.x importer, unit tests
>>          September 02 - 08:
>>                  Investigating and implementing two-way synchronization
>>          September 09 - 15:
>>                  Implementing two-way synchronization
>>          September 16 - 22:
>>                  Ironing out bugs, cushion period in case of delays
>>          September 23 - 29: FINAL EVALUATION
>>                  Ironing out bugs, cushion period in case of delays,
>> submitting work for final evaluation
> Looks better than the original timeline. OTOH the general remark (not
> necessarily mine, I just predict what the reviewers could say) to me more
> detailed remains. Perhaps you can mention class names, what would be working
> at the time, what will be NOT working yet...
>
>>          I'm a big fan of open source, and KDE in general. I've used Amarok
>>          a lot and not only read the source,
> This sounds like you don't use it anymore. perhaps "I've been using ..."?
>
>>          I am also familiar with Qt framework and have used it to prototype
>>          several small applications. I'm an early adopter, and my fiddling
>>          with Qt 5 beta has led me to uncover an important bug [4].
> Nice.
>
>>          I'm very skilled in C++. Other than that I program mainly in Scala,
>>          and Java when required (and my university requires that a lot).
>>          I'm experienced with test driven development, continuous
>>          integration, code reviews and other agile techniques.
> Do you have some experience with git and/or other VCSes? (self-assessment:
> would you be able to explain difference between `git rebase` and `git merge`?)
> But don't actually explain it in the proposal. :-)

I can! ;-)

>
>>          If my application is accepted, I am going to commit my full time to
>>          complete the project, including putting my job on hold for the
>>          duration of GSoC. I expect to be working 40 hours a week.
> Nice. What are your plans after the GSoC has ended? Do they involve Amarok in
> any way?
>
> [the questions above are rather hints for the proposal than something I'd like
> immediately answered]
>
> This starts to look really promising,
> 		Matěj

As always, many thanks for your help! I'm including the next iteration 
below (with diff in the attachment). There may be some errors due to the 
late hour - the day does not have quite enough hours lately to squeeze 
in everything I need to do.

         Konrad

Name: Konrad Zemek
Email Address: konrad.zemek at gmail.com
Freenode IRC Nick: kzemek
IM Service and Username: Skype, handle: konrad-kun
Location: Kraków, Poland. GMT+2 during the Summer.

Proposal Title:
     Reimplement Amarok 1.4 (FastForward)/iTunes import on top of
     Statistics Synchronization, and new Amarok 2.x and Rhythmbox
     as synchronization targets.

Motivation for Proposal / Goal:
     Currently, Amarok has the ability to import personal track metadata,
     like playcounts, ratings and last played time, from Amarok 1.4 and
     iTunes databases. Although this code still fulfills its purpose,
     the StatSyncing framework has since been added to the codebase and
     serves as a perfect base for a rewrite of importers. Perhaps the
     most significant advantage that would be obtained by implementing
     importers on top of StatSyncing is the ability to resynchronize
     information and to work with collections other than local.
     Still, synchronization with Amarok 1.4 and iTunes covers only
     a small portion of users that may be willing to use Amarok 2.x,
     but are tied to their current music player because of the
     consequence of losing their personal metadata. To alleviate this,
     I intend to add Rhythmbox as a new target for synchronization. It
     stores following personal track metadata compatible with Amarok:
     * user rating
     * playcount
     * last played time

     Rhythmbox is the default GNOME's music player, and since
     Ubuntu 12.04, the music player shipped by default in Ubuntu. I feel
     that this is an important target for metadata synchronization,
     easing not only the transition between music players, but between
     whole desktop environments - GNOME and KDE.
     To complete the set, synchronization with Amarok 2.x will be added,
     so that user can synchronize his personal track metadata between
     separate Amarok databases.

Implementation Details:
     This project breaks up in a few parts with well-defined deliverables
     (details about timeline are in Tentative Timeline section).

     On the high level, the StatSyncing framework consists of the
     following elements
     * Controller, which registers providers and starts synchronization
     * Config, used to store preferences for statistics synchronization
       and can be set through UI
     * Provider, which works with the backend and provides
       StatSyncing::Tracks
     * Track, which is an abstraction of a track and contains relevant
       metadata used for trackmatching, as well as methods to modify it
     * Process, which is responsible for matching the tracks between
       collections and synchronizing them
     * ScrobblingService, which is used for scrobbling capabilities
       (transferring information about played tracks; see [1])

     The project can be broken into three parts; reimplementation of
     existing importers (read-only), introduction of new importers
     (read-only) and finally an optional goal of making the
     synchronization work in read-write mode (so that foreign statistics
     can be updated as well as local ones). This last goal is optional
     (soft goal).

     * Importer tests
         There are no tests for importing in the current trunk. Since
         a big part of this project consists of refactoring, I will have
         to prepare a quality suite of tests to make sure that
         I introduce no regressions.

         - Implement whole test suite using QTest and gmock best
           practices
         - Test cases would involve using a dummy input, invoking
           importer and verifying collection.
         - The actual logic behind importing tracks will be changed, so
           these tests must treat importer as a black box (black box
           integration tests)

     * Reimplementing importers
         - Reimplement importers to use StatSyncing
         - Data-reading modules, would be cleaned up and factored out
           into their own classes that will be used by a subclass of
           StatSyncing::Provider
         - Current services using StatSyncing framework will be used as
           a reference (primarily last.fm service)
         - At each step where the code is functional (at least between
           reimplementing FastForwardImporter and ITunesImporter),
           supplement the tests so that new capabilities are accounted
           for (e.g. make sure that foreign playcounts are not simply
           stacked on locally saved playcounts with each synchronization)
         - Additionally, unit tests can now be written to confirm
           correct implementation
         - The importer code may prove to be repeatable (in the extreme
           case the only thing differing these importers will be their
           respective parsers); this code will be deduplicated. If it
           fits its responsibilities, a parent class of importers should
           contain the duplicated logic, otherwise I will introduce new
           helper classes.

     * Implement Rhythmbox and Amarok 2.x importers
         Rhythmbox stores its personal metadata in an XML file, which
         should be easy to read through QXml SAX parser. When in doubt
         about the format, I will consult Rhythmbox source code.

         - Rhythmbox parser shall be robust, that is it shouldn't depend
           on the order of information (this is obvious, but important)
         - At this point both Amarok 1.4 and iTunes importers are
           rewritten, so their implementations will be used as
           a reference for Amarok 2.x and Rhythmbox importers.
         - The subgoal is to make sure that adding additional import
           targets is straightforward and easy, and if possible doesn't
           introduce code duplication (this may involve modifying
           StatSyncing framework).
         - As with reimplementing importers, every piece of working code
           should be tested to ensure that it is performing correctly,
           and so that regressions don't occur in the future.

Tentative Timeline:
     MIDTERM EVALUATION (deadline: August 02)
         The hard goal for midterm evaluation is to have both FastForward
         and iTunes importers reimplemented, and if possible (soft goal)
         unit-tested.

     FINAL EVALUATION (deadline: September 27)
         The hard goal for final evaluation is to have all importers
         implemented and working. The slightly softer goal is to have
         them unit-tested. The soft goal is to implement two-way
         synchronization.

     June 17 - 23:
         Drafting up test cases for FastForward and iTunes importers.
     June 24 - 30:
         Implementing the initial test suite for importers (exam week).
     July 01 - 07:
         Reimplementing FastForward importer, cleaning up data-reading
         module. At the end of the week I expect to have working
         implementation of FastForwardProvider.
     July 08 - 14:
         Reimplementing FastForward importer. At the end of the week I
         expect to have a functional importer, possibly with some quirks
         to iron out. Implementing unit tests to assert correctness.
     July 15 - 21:
         Reimplementing iTunes importer, cleaning up data-reading
         module. As with FastForward, at the end of the week the
         ITunesProvider should be working.
     July 22 - 28:
         Reimplementing iTunes importer. As with FastForward importer,
         after this period the importer should be functional.
         Implementing unit tests.
     July 29 - August 04: MIDTERM EVALUATION
         Ironing out last issues with reimplemented importers, submitting
         current work for midterm evaluation.
     August 05 - 11:
         In-depth research of Rhythmbox metadata format, implementing
         Rhythmbox importer. I do not expect the RhythmboxProvider to
         be fully functional at the end of the week.
     August 12 - 18:
         Implementing Rhythmbox importer. RhythmboxProvider should be
         working at the start of the week, whole synchronization should
         work at the end of the week. Implementing unit tests.
     August 19 - 25:
         Implementing Amarok 2.x importer. Like FastForward and ITunes,
         at the end of the week the AmarokProvider should be
         functional.
     August 26 - September 01:
         Implementing Amarok 2.x importer. Ironing out any problems
         with AmarokProvider. At the end of the week one-way import
         should be working. Implementing unit tests.
     September 02 - 08:
         Investigating and implementing two-way synchronization.
         I expect one working implementation of two-way synchronization
         at the end of the week.
     September 09 - 15:
         Implementing two-way synchronization for remaining importers.
     September 16 - 22:
         Ironing out bugs, writing remaining tests. Cushion period in
         case of delays.
     September 23 - 29: FINAL EVALUATION
         Ironing out bugs, writing remaining tests. Cushion period in
         case of delays. Submitting work for final evaluation

Obligations from late May to early August:
     I have school until the very early July, with exams starting the
     last week of June. I will be putting my work on hold for the whole
     period of Google Summer of Code, but up until early June I may still
     be required to commit at least a few hours a week to my current job.
     So from early June to late September I have no commitments except
     for school, and from early July to late September no commitments
     at all.

About me:
     I'm a student at AGH University of Science and Technology, Kraków,
     Poland. I study Computer Science, and I'm currently in my second
     year. At the moment I'm employed as a part time C++ programmer at
     X-Formation [2], Poland, where my work ranges from conducting
     interviews with potential employees, through refactoring and
     bugfixing to implementing new features for our product.

     I'm a big fan of open source, and KDE in general. I've been using
     Amarok a lot and have not only read the source, but contributed to
     it - patch [3] is awaiting final evaluation and merge, and patch
     [4] has recently been submitted for review. I'm currently working
     on more changes to Amarok's codebase.
     I am also familiar with Qt framework and have used it to prototype
     several small applications. I'm an early adopter, and my fiddling
     with Qt 5 beta has led me to uncover an important bug [5]. I also
     have source-level experience with Boost libraries.

     I'm very skilled in C++. Other than that I program mainly in Scala,
     and Java when required (and my university requires that a lot).
     I'm experienced with test driven development, continuous
     integration, code reviews and other agile techniques.
     I have work experience with both SVN and git, with the latter
     being my VCS of choice for personal projects.

     If my application is accepted, I am going to commit my full time to
     complete the project, including putting my job on hold for the
     duration of GSoC. I expect to be working 40 hours a week.I am
     comfortable with working independently under a supervisor living
     on the other side of the globe. Although I have not have worked
     in this style before, I foresee no problems with this and can even
     change my working hours if needed. I think that code reviews
     and email contact are the best way of tracking progress in this
     case. I am fluent in English, so communication shouldn't be
     a problem.
     Of course, I would work as well with a local mentor with whom
     I could discuss issues over a coffee.

     During the bonding period I expect to complete some more tasks for
     Amarok, possibly continuing what I started with [3] (in review
     comments I mentioned streamlining track sorting and making it
     consistent across the source code). I will also use this time to
     thoroughly prepare for the coding period.

     I'm seeing GSoC as a way to kickstart me into the open source
     development at large. After the end of Google Summer of Code program
     I hope to remain in Amarok community as a frequent contributor.

Junior job link:
     [3]

References:
[1] http://www.last.fm/help/faq?category=Scrobbling
[2] http://www.x-formation.com/
[3] https://git.reviewboard.kde.org/r/110070/
[4] https://git.reviewboard.kde.org/r/110139/
[5] https://bugreports.qt-project.org/browse/QTBUG-26832

-------------- next part --------------
A non-text attachment was scrubbed...
Name: application.diff
Type: text/x-patch
Size: 7385 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/amarok-devel/attachments/20130424/20503ea5/attachment-0001.diff>


More information about the Amarok-devel mailing list