GSoC application review - reimplementing personal metadata importers

Konrad Zemek konrad.zemek at gmail.com
Wed Apr 24 17:46:43 UTC 2013


On 24.04.2013 13:19, Matěj Laitl wrote:
>>> Sounds good, except for the renaming. StatSyncing::ITunesProvider sounds
>>> well for both cases.
>> I actually had a conception to leave importers where they are, or in a
>> directory called simply importers.
> Well, for class names I'd stick to nomenclature of the "parent" framework - we
> loosely name implementations after ABCs, like Collection -> MtpCollection,
> Track -> SqlTrack etc. Notice that the statsyncing framework supports equally
> read-write providers (CollectionProvider) and read-only ones (Last.fm).
>
> For code location I'd stick to src/statsyncing/* and set fire on
> src/databaseimporter - no need to keep historic cruft.

Right. Naming shouldn't be an issue.

>> I removed the mention about renaming because it's too insignificant to
>> even mention there, and especially considering how much explanation is
>> needed for it to make sense.
> Right, take the above comments as my suggestions for actual implementation
> rather than the proposal.

Your suggestions have been immensely helpful. :-) I feel that this
application is swiftly approaching its final form; apart from possibly
changing "in review" -> "merged" and maybe adding links to some more
tasks I'll do for Amarok, the only changes left should be cosmetic. That
said, I wanted to (and did) include two more things. These are not
implementation-specific anymore, but I felt I should submit them here
for completeness.

First, I added that my application has been discussed on this
mailing list. I know that several projects strongly favor students who
have engaged with the community about their project. Second, I mentioned
that I hold the title of finalist of the Polish Olympiad in Informatics.
I'm not sure why I didn't add that before, as it's the highlight of my
career so far; it is quite an important contest in Poland.

Please let me know if you feel something is off, and again - many thanks 
for your help!

> Cheers,
> 	Matěj

     Konrad


Name: Konrad Zemek
Email Address: konrad.zemek at gmail.com
Freenode IRC Nick: kzemek
IM Service and Username: Skype, handle: konrad-kun
Location: Kraków, Poland. GMT+2 during the summer.

Proposal Title:
     Reimplement Amarok 1.4 (FastForward)/iTunes import on top of
     Statistics Synchronization, and add Amarok 2.x and Rhythmbox
     as new synchronization targets.

Motivation for Proposal / Goal:
     Currently, Amarok has the ability to import personal track metadata,
     such as playcounts, ratings and last played time, from Amarok 1.4 and
     iTunes databases. Although this code still fulfills its purpose,
     the StatSyncing framework has since been added to the codebase and
     serves as a perfect base for a rewrite of the importers. Perhaps the
     most significant advantage of implementing the importers on top of
     StatSyncing is the ability to resynchronize information and to work
     with collections other than the local one.
     Still, synchronization with Amarok 1.4 and iTunes covers only
     a small portion of the users who may be willing to use Amarok 2.x
     but are tied to their current music player because switching would
     mean losing their personal metadata. To alleviate this, I intend to
     add Rhythmbox as a new target for synchronization. It stores the
     following personal track metadata compatible with Amarok:
     * user rating
     * playcount
     * last played time

     Rhythmbox is GNOME's default music player and, since Ubuntu 12.04,
     the music player shipped by default in Ubuntu. I feel that this is
     an important target for metadata synchronization, easing the
     transition not only between music players, but between whole
     desktop environments - GNOME and KDE.
     To complete the set, synchronization with Amarok 2.x will be added,
     so that users can synchronize their personal track metadata between
     separate Amarok databases.

Implementation Details:
     This project breaks down into a few parts with well-defined
     deliverables (details about the timeline are in the Tentative
     Timeline section).

     At a high level, the StatSyncing framework consists of the
     following elements:
     * Controller, which registers providers and starts synchronization
     * Config, which stores preferences for statistics synchronization
       and can be set through the UI
     * Provider, which works with the backend and provides
       StatSyncing::Tracks
     * Track, which is an abstraction of a track and contains relevant
       metadata used for track matching, as well as methods to modify it
     * Process, which is responsible for matching the tracks between
       collections and synchronizing them
     * ScrobblingService, which is used for scrobbling capabilities
       (transferring information about played tracks; see [1])
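
     To make the division of responsibilities concrete, here is
     a minimal Python sketch of how Provider, Track and Process could
     relate to each other. It is not Amarok's C++ API: every class,
     method and field name below is a hypothetical stand-in, and the
     matching rule (case-insensitive artist/album/title) is an
     assumption chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical stand-in for StatSyncing::Track."""
    artist: str
    album: str
    title: str
    play_count: int = 0
    rating: int = 0

    def match_key(self):
        # Assumed matching rule: case-insensitive artist/album/title.
        return (self.artist.lower(), self.album.lower(), self.title.lower())


class Provider:
    """Hypothetical stand-in for StatSyncing::Provider."""

    def __init__(self, name, tracks):
        self.name = name
        self._tracks = list(tracks)

    def tracks(self):
        return list(self._tracks)


def match_tracks(providers):
    """Group tracks from all providers by match key (the 'Process' role)."""
    groups = {}
    for provider in providers:
        for track in provider.tracks():
            groups.setdefault(track.match_key(), {})[provider.name] = track
    return groups
```

     Each group maps a provider name to that provider's copy of the
     track, which is the information a synchronization step would need
     in order to compare and merge statistics.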

     The project can be broken into three parts: reimplementation of
     existing importers (read-only), introduction of new importers
     (read-only) and finally an optional soft goal of making the
     synchronization work in read-write mode (so that foreign statistics
     can be updated as well as local ones).

     * Importer tests
         There are no tests for importing in the current trunk. Since
         a big part of this project consists of refactoring, I will have
         to prepare a quality suite of tests to make sure that
         I introduce no regressions.

         - Implement the whole test suite following QTest and gmock best
           practices
         - Test cases would involve feeding in a dummy input, invoking
           the importer and verifying the resulting collection.
         - The actual logic behind importing tracks will be changed, so
           these tests must treat the importer as a black box (black-box
           integration tests)
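
     The shape of such a black-box test can be sketched as follows.
     This is Python's unittest standing in for QTest, and the importer
     function, its semicolon-separated input format and the dictionary
     used as a collection are all hypothetical; the point is only that
     the test looks at input and resulting collection, never at the
     importer's internals.

```python
import unittest


def import_statistics(dummy_input, collection):
    """Hypothetical importer under test: 'url;playcount;rating' per line."""
    for line in dummy_input.splitlines():
        if not line.strip():
            continue
        url, play_count, rating = line.split(";")
        if url in collection:  # only tracks already in the collection
            collection[url] = {"play_count": int(play_count),
                               "rating": int(rating)}


class ImporterBlackBoxTest(unittest.TestCase):
    def test_known_track_is_updated_and_unknown_ignored(self):
        # Arrange: a collection with one track and a dummy input file.
        collection = {"file:///a.mp3": {"play_count": 0, "rating": 0}}
        dummy_input = "file:///a.mp3;7;4\nfile:///missing.mp3;1;1\n"

        # Act: invoke the importer without looking inside it.
        import_statistics(dummy_input, collection)

        # Assert: verify only the resulting collection.
        self.assertEqual(
            collection,
            {"file:///a.mp3": {"play_count": 7, "rating": 4}},
        )
```

     Because the assertions depend only on the observable result, the
     same test keeps its value when the importer is rewritten.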

     * Reimplementing importers
         - Reimplement importers to use StatSyncing
         - Data-reading modules would be cleaned up and factored out
           into their own classes that will be used by a subclass of
           StatSyncing::Provider
         - Current services using the StatSyncing framework will be used
           as a reference (primarily the Last.fm service)
         - At each step where the code is functional (at least between
           reimplementing FastForwardImporter and ITunesImporter),
           supplement the tests so that new capabilities are accounted
           for (e.g. make sure that foreign playcounts are not simply
           stacked on locally saved playcounts with each synchronization)
         - Additionally, unit tests can now be written to confirm
           correct implementation
         - The importer code may prove to be repetitive (in the extreme
           case the only difference between the importers will be their
           respective parsers); this code will be deduplicated. If it
           fits its responsibilities, a parent class of the importers
           should contain the shared logic; otherwise I will introduce new
           helper classes.
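
     Two of the points above, shared logic in a parent class and
     playcounts that do not stack on repeated synchronization, can be
     illustrated with a short Python sketch. The class names and the
     toy input format are hypothetical, and merging by taking the
     maximum is an assumption made for the example; it is not
     necessarily the rule the StatSyncing framework applies.

```python
class ImporterBase:
    """Hypothetical parent class holding logic shared by all importers."""

    def parse(self, raw):
        # Only this step differs between importers; returns {url: play_count}.
        raise NotImplementedError

    def synchronize(self, raw, local_counts):
        for url, foreign_count in self.parse(raw).items():
            if url in local_counts:
                # Take the maximum instead of adding, so that repeating the
                # synchronization never inflates the local play count.
                local_counts[url] = max(local_counts[url], foreign_count)


class CsvImporter(ImporterBase):
    """Toy parser: one 'url,play_count' pair per line."""

    def parse(self, raw):
        pairs = (line.split(",") for line in raw.splitlines() if line.strip())
        return {url: int(count) for url, count in pairs}
```

     Running synchronize() twice with the same input leaves the local
     counts unchanged after the first run, which is exactly the
     property a regression test for stacking would check.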

     * Implement Rhythmbox and Amarok 2.x importers
         Rhythmbox stores its personal metadata in an XML file, which
         should be easy to read with the QXml SAX parser. When in doubt
         about the format, I will consult the Rhythmbox source code.

         - The Rhythmbox parser shall be robust; that is, it shouldn't
           depend on the order of information (this is obvious, but
           important)
         - At this point both the Amarok 1.4 and iTunes importers are
           rewritten, so their implementations will be used as
           a reference for the Amarok 2.x and Rhythmbox importers.
         - The subgoal is to make sure that adding further import
           targets is straightforward and, if possible, doesn't
           introduce code duplication (this may involve modifying
           the StatSyncing framework).
         - As with reimplementing importers, every piece of working code
           should be tested to ensure that it is performing correctly,
           and so that regressions don't occur in the future.
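
     An order-independent SAX handler of the kind described above
     might look like the following sketch. It uses Python's xml.sax in
     place of the Qt SAX classes, and the element names (entry with
     type="song", location, play-count, rating, last-played) are
     assumptions about the Rhythmbox database layout that would have
     to be verified against the Rhythmbox source code.

```python
import xml.sax

STAT_FIELDS = {"play-count", "rating", "last-played"}


class RhythmboxHandler(xml.sax.ContentHandler):
    """Collects statistics per <entry>, regardless of child element order."""

    def __init__(self):
        super().__init__()
        self.tracks = {}      # location -> {field: int}
        self._entry = None    # fields of the entry being read, or None
        self._text = []       # character data of the current element

    def startElement(self, name, attrs):
        if name == "entry" and attrs.get("type") == "song":
            self._entry = {}
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if self._entry is None:
            return  # outside a song entry: nothing to record
        value = "".join(self._text).strip()
        if name == "location":
            self._entry["location"] = value
        elif name in STAT_FIELDS:
            # int(float(...)) tolerates both "4" and "4.0".
            self._entry[name] = int(float(value))
        elif name == "entry":
            # Commit only once the whole entry has been seen, so the
            # order of the child elements does not matter.
            location = self._entry.pop("location", None)
            if location is not None:
                self.tracks[location] = self._entry
            self._entry = None


def parse_rhythmdb(xml_text):
    handler = RhythmboxHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.tracks
```

     Buffering the fields and committing them at the closing entry tag
     is what removes the dependency on element order; entries of other
     types and unknown elements are simply skipped.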

Tentative Timeline:
     MIDTERM EVALUATION (deadline: August 02)
         The hard goal for midterm evaluation is to have both FastForward
         and iTunes importers reimplemented, and if possible (soft goal)
         unit-tested.

     FINAL EVALUATION (deadline: September 27)
         The hard goal for final evaluation is to have all importers
         implemented and working. The slightly softer goal is to have
         them unit-tested. The soft goal is to implement two-way
         synchronization.

     June 17 - 23:
         Drafting up test cases for FastForward and iTunes importers.
     June 24 - 30:
         Implementing the initial test suite for importers (exam week).
     July 01 - 07:
         Reimplementing FastForward importer, cleaning up data-reading
         module. At the end of the week I expect to have a working
         implementation of FastForwardProvider.
     July 08 - 14:
         Reimplementing FastForward importer. At the end of the week I
         expect to have a functional importer, possibly with some quirks
         to iron out. Implementing unit tests to assert correctness.
     July 15 - 21:
         Reimplementing iTunes importer, cleaning up data-reading
         module. As with FastForward, at the end of the week the
         ITunesProvider should be working.
     July 22 - 28:
         Reimplementing iTunes importer. As with FastForward importer,
         after this period the importer should be functional.
         Implementing unit tests.
     July 29 - August 04: MIDTERM EVALUATION
         Ironing out last issues with reimplemented importers, submitting
         current work for midterm evaluation.
     August 05 - 11:
         In-depth research of Rhythmbox metadata format, implementing
         Rhythmbox importer. I do not expect the RhythmboxProvider to
         be fully functional at the end of the week.
     August 12 - 18:
         Implementing Rhythmbox importer. RhythmboxProvider should be
         working at the start of the week, the whole synchronization
         should work at the end of the week. Implementing unit tests.
     August 19 - 25:
         Implementing Amarok 2.x importer. Like FastForward and iTunes,
         at the end of the week the AmarokProvider should be
         functional.
     August 26 - September 01:
         Implementing Amarok 2.x importer. Ironing out any problems
         with AmarokProvider. At the end of the week one-way import
         should be working. Implementing unit tests.
     September 02 - 08:
         Investigating and implementing two-way synchronization.
         I expect one working implementation of two-way synchronization
         at the end of the week.
     September 09 - 15:
         Implementing two-way synchronization for remaining importers.
     September 16 - 22:
         Ironing out bugs, writing remaining tests. Cushion period in
         case of delays.
     September 23 - 29: FINAL EVALUATION
         Ironing out bugs, writing remaining tests. Cushion period in
         case of delays. Submitting work for final evaluation.

Obligations from late May to early August:
     I have school until very early July, with exams starting in the
     last week of June. I will be putting my job on hold for the whole
     period of Google Summer of Code, but up until early June I may still
     be required to commit at least a few hours a week to it.
     So from early June to late September I have no commitments except
     for school, and from early July to late September no commitments
     at all.

About me:
     I'm a student at AGH University of Science and Technology, Kraków,
     Poland. I study Computer Science, and I'm currently in my second
     year. At the moment I'm employed as a part-time C++ programmer at
     X-Formation [2], Poland, where my work ranges from conducting
     interviews with potential employees, through refactoring and
     bugfixing to implementing new features for our product.

     I'm a big fan of open source in general, and of KDE in particular.
     I've been using Amarok a lot and have not only read the source, but
     also contributed to it - patch [3] is awaiting final evaluation and
     merge, and patch [4] has recently been submitted for review. I'm
     currently working on more changes to Amarok's codebase.
     I am also familiar with the Qt framework and have used it to
     prototype several small applications. I'm an early adopter, and my
     fiddling with the Qt 5 beta led me to uncover an important bug [5].
     I also have source-level experience with the Boost libraries.

     I'm very skilled in C++. Other than that I program mainly in Scala,
     and Java when required (and my university requires that a lot).
     I'm experienced with test-driven development, continuous
     integration, code reviews and other agile techniques.
     I have work experience with both SVN and git, with the latter
     being my VCS of choice for personal projects.
     I also hold the title of finalist of the Polish Olympiad in
     Informatics, a prestigious annual programming competition for
     Polish high school students [6] (about the Olympiad; Google
     translate).

     If my application is accepted, I am going to commit my full time to
     completing the project, including putting my job on hold for the
     duration of GSoC. I expect to be working 40 hours a week. I am
     comfortable with working independently under a supervisor living
     on the other side of the globe. Although I have not worked
     in this style before, I foresee no problems with this and can even
     change my working hours if needed. I think that code reviews
     and email contact are the best way of tracking progress in this
     case. I am fluent in English, so communication shouldn't be
     a problem.
     Of course, I would work just as well with a local mentor with whom
     I could discuss issues over a coffee.

     During the bonding period I expect to complete some more tasks for
     Amarok, possibly continuing what I started with [3] (in review
     comments I mentioned streamlining track sorting and making it
     consistent across the source code). I will also use this time to
     thoroughly prepare for the coding period.

     I see GSoC as a way to kickstart my involvement in open source
     development at large. After the Google Summer of Code program ends,
     I hope to remain in the Amarok community as a frequent contributor.

     This proposal has been discussed on the Amarok-devel mailing
     list. [7]

Junior job link:
     [3]

References:
[1] http://www.last.fm/help/faq?category=Scrobbling
[2] http://www.x-formation.com/
[3] https://git.reviewboard.kde.org/r/110070/
[4] https://git.reviewboard.kde.org/r/110139/
[5] https://bugreports.qt-project.org/browse/QTBUG-26832
[6] http://translate.google.com/translate?js=n&sl=pl&tl=en&u=http://www.oi.edu.pl/l/35/
[7] http://mail.kde.org/pipermail/amarok-devel/2013-April/011946.html

-------------- next part --------------
A non-text attachment was scrubbed...
Name: application.diff
Type: text/x-patch
Size: 1629 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/amarok-devel/attachments/20130424/78d19fbe/attachment-0001.diff>

