GSoC application review - reimplementing personal metadata importers

Konrad Zemek konrad.zemek at gmail.com
Sun Apr 21 14:02:20 UTC 2013


2013/4/20 Matěj Laitl <matej at laitl.cz>:
> On 20. 4. 2013 Konrad Zemek wrote:
>> Hi fellow developers,
>
> Hi,

Hi Matěj,
Thanks for the review. You do a lot of things around here. ;-)  I
incorporated all of your suggestions, but I wanted to reply on some
points.

>> For example I have experience with Boost, and
>> submitted a bug report recently with a proposed fix, which was
>> promptly integrated into upstream. On the one hand that's experience
>> with developing and open source, on the other hand Amarok has nothing
>> to do with Boost. Should I add that?
>
> Perhaps just a sentence.
>
>> Motivation for Proposal / Goal:
>>       Currently, Amarok has the ability to sync personal track metadata
>>       with Amarok 1.4 and iTunes databases.
>
> Not true, just one-way import. You should probably mention what these
> "personal data" are, it may not be clear to someone new to Amarok.

I meant sync as a one-way import, although I see how this can mean
both-way as well; it's clarified now.

>>       Although this code still
>>       fulfills its purpose, the StatSyncing framework has since been added
>>       to the codebase, and serves as a perfect base for a rewrite of
>>       importers. Perhaps the most significant advantage implementation
>>       on top of StatSyncing offers over the current one, is the ability
>>       to resynchronize information and to work with collections other
>>       than local.
>
> English probably uses less commas than our Slavic languages; *of* an
> implementation?
>
>>       Still, synchronization with Amarok 1.4 and iTunes covers only
>>       a small portion of users that may be willing to use Amarok 2.x,
>>       but because of the consequences of losing their personal metadata,
>>       are tied to their current music player. To alleviate this, I intend
>>       to add Rhythmbox as a new target for synchronization.
>
> What type of statistics that would be compatible with Amarok Rhythmbox
> maintains? Please research and mention it in the proposal.
>
>>       Rhythmbox
>>       it the default GNOME's music player, and since Ubuntu 12.04, the
>>       music player shipped by default in Ubuntu. I feel that this is an
>>       important target for metadata synchronization, easing not only the
>>       transition between music players, but between whole desktop
>>       environments - GNOME and KDE.
>
> Right.
>
>>       To complete the set, synchronization with Amarok 2.x will be added,
>>       so that user can synchronize his personal track metadata between
>>       separate Amarok databases.
>
> Good.
>
>> Implementation Details:
>>       This project breaks up in a few parts with well-defined deliverables
>>       (details about timeline are in Tentative Timeline section).
>>
>>       * Importer tests
>>               There are either no tests for importing in the current trunk,
>>               or they are in a wrong place. Regardless, since a big part of
>>               this project consists of refactoring, I will have to prepare
>>               a quality suite of tests to make sure that I introduce no
>>               regressions.
>
> I think there are no importer tests, please check it and then be more certain
> in the proposal.

I did actually check that, but the lack of tests is a bit unusual for
me, so I figured they may be just well-hidden somewhere. I've
conducted more extensive research, and there are indeed none.

>>               - Review current importer tests (if any)
>
> Better to check before submitting the proposal.
>
>>               - Implement whole test suite using QTest and gmock best
>>               practices
>>               - The main test cases would involve using a dummy input,
>>               invoking importer and verifying collection.
>>               - For best test results, at least the collection class may
>>               be mocked
>
> I think you don't know yet what you'll have to mock, so better be more
> general.
>
> I think there could be 2 types of tests: unit-test that would check whether
> you implement the trackprovider API correctly and return tracks with right
> metadata (when given a pre-filled database to import), and integration tests
> (perhaps optional if time permits?), which would actually run the
> synchronization and check the result. (problem: interactivity)

What I meant were integration tests with importer implementation as a
black box. I'm sure they can be written so that they bypass any
interactivity, and they would help me a lot with refactoring (as well
as providing a "done" condition).

>>       * Reimplementing importers
>>               - Reimplement importers to use StatSyncing
>>               - Data-reading modules, after needed changes, would become track
>>               providers
>
> Using fully qualified class name would clear any doubt from what you mean by
> "track providers". (because we have multiple in Amarok)
>
>>               - Current services using StatSyncing framework will be used
>>               as a reference (primarily last.fm service)
>>               - Clean-up the code responsible for data-reading
>
> I think you rather mean factor out data-reading code from importers to newly
> written trackproviders.
>
>>               - At each step where the code is functional (at least between
>>               reimplementing FastForwardImporter and ITunesImporter),
>>               supplement the tests so that new capabilities are accounted for
>>               (e.g. make sure that foreign playcounts are not simply stacked
>>               on locally saved playcounts with each synchronization)
>
> The example would be actually testing the statsyncing framework, but is still
> beneficial.
>
>>               - The importer code may prove to be repeatable (in the extreme
>>               case the only thing differing these importers will be their
>>               respective parsers); this code will be deduplicated. If it fits
>>               DatabaseImporter (a parent class for importers) responsibilities
>>               then the code should land in DatabaseImporter implementation,
>>               otherwise I will introduce new helper classes.
>
> Hmm, I think DatabaseImporter serves as a common superclass to the 2
> importers, now the track provider will serve the role. OTOH yes, it is
> important not to duplicate code, perhaps using mentioned helper classes and/or
> more elaborate class hierarchy.
>
> You should also mention whether the newly implemented provider would be read-
> only or whether you plan to make some of them read-write (perhaps optionally
> if time permits?).
>
> To prove that you understand the StatSyncing framework, you could also provide
> high-level *brief* overview how it currently works, what are the involved
> classes and what's their purpose. This is the rare part of Amarok that is
> actually documented, so it shouldn't be hard.

I included an overview of what I think to be the main elements. Nice
job on that by the way! It's really nicely documented.

>>       * Implement Rhythmbox and Amarok 2.x importers
>>               Rhythmbox stores its personal metadata in an XML file, which
>>               should be easy to read through QXml SAX parser. When in doubt
>>               about the format, I will consult Rhythmbox source code.
>>
>>               - Rhythmbox parser shall be robust, that is it shouldn't depend
>>               on the order of information (this is obvious, but important)
>>               - At this point both Amarok 1.4 and iTunes importers are
>>               rewritten, so their implementations will be used as a reference
>>               for Amarok 2.x and Rhythmbox importers.
>>               - The subgoal is to make sure that adding additional import
>>               targets is straightforward and easy, and if possible doesn't
>>               introduce code duplication.
>
> Yeah, nice point. That may actually involve touching the StatSyncing framework
> itself however.
>
>>               - As with reimplementing importers, every piece of working code
>>               should be tested to ensure that it is performing correctly,
>>               and so that regressions don't occur in the future.
>
> Right. The implementation part looks good otherwise from my PoV.
>
>> Tentative Timeline:
>>       June 17 - 23:
>>               Investigating tests and drafting up test cases
>>       June 24 - 30:
>>               Investigating tests and drafting up test cases (exam week)
>>       July 01 - 07:
>>               Preparing the test suite
>>       July 08 - 14:
>>               Implementing the test cases
>>       July 15 - 21:
>>               Reimplementing FastForward importer
>>       July 22 - 28:
>>               Reimplementing FastForward importer / iTunes importer
>>       July 29 - August 04:
>>               Reimplementing iTunes importer
>>       August 05 - 11:
>>               Implementing test cases for new functionality
>>       August 12 - 18:
>>               In-depth research of Rhythmbox metadata format
>>       August 19 - 25:
>>               Implementing Rhythmbox importer
>>       August 26 - September 01:
>>               Implementing Rhythmbox importer / Amarok 2.x importer
>>       September 02 - 08:
>>               Implementing Amarok 2.x importer
>>       September 09 - 15:
>>               Implementing test cases for new importers
>>       September 16 - 22:
>>               fix bugs, QA
>>       September 23 - 29:
>>               fix bugs, QA
>
> Hmmm, thinking more about the "fist tests, then reimplementation" approach I
> think it would least to a fair deal of wasted work. I agree that this is the
> right approach for "pure" reimplementation, but here you are actually
> reimplementing *and* changing the function. I think the tests you would write
> for old importers would require so significant changes to be adapted to
> StatSyncing that it would equal writing a brand new test suite.
>
> Therefore I think it would be best if you started with FastForward importer
> and then implement tests right after it starts to work, similarly for other
> importers. Writing tests right with the implementation might also be less
> monotonous. I also think you'll need more time for FF and iTunes iporters than
> 3 weeks.
>
> It may be wise to mention where the mid-term evaluation is and to provide
> criteria for us to judge your work. Perhaps your hard goal would be
> reimplemented FF and iTunes importers and soft goal to have them tested? What
> should be the criteria for final evaluation?
>
> Other general remark could be that the timeline could be more detailed.
>
>>       I'm a big fan of open source, and KDE in general. I've used Amarok
>>       a lot and not only read the source, but contributed to it [2].
>>       I am also familiar with Qt framework and have used it to prototype
>>       several small applications. I'm an early adopter, and my fiddling
>>       with Qt 5 beta has led me to uncover an important bug [3].
>
> Nice. I should be able to review & merge your review request soon, but please
> still make a distinction between already merged and yet-in-review patches.
> Having submitted/merged one more little patch would be a small bonus, too.

I understand "contribute" as "write and submit code, hoping it will be
helpful", not necessarily that it's merged. Although again, I very
well recognize how it can be understood the other way (and probably my
understanding here is different than commonly accepted definition). I
didn't change that in hopes that it will be true in both senses by the
time of application; I will rewrite it if not.
And as much as I'd love to submit a lot more patches right now, I
can't really do that this week. I have an exam coming in the next few
days, as well as some mid-term tests.

>>       If my application is accepted, I am going to commit my full time to
>>       complete the project, including putting my job on hold for the
>>       duration of GSoC. I expect to be working 40 hours a week. I am
>>       comfortable with working independently under a supervisor living
>>       on the other side of the globe. Although I have not have worked
>>       in this style before, I foresee no problems with this and can even
>>       change my working hours if needed. I think that code reviews
>>       and email contact are the best way of tracking progress in this
>>       case. I am fluent in English, so communication should be a problem.
>
> "shouldn't"? ;-) Otherwise looks well.

Oh, right. ;-)

>>       Of course I would work as well with a local mentor, with whom
>>       I could discuss issues over a coffee.
>>
>>       During the bonding period I expect to complete some more tasks for
>>       Amarok, possibly continuing what I started with [2] (in review
>>       comments I mentioned streamlining track sorting and making it
>>       consistent across the source code).
>
> If you have time, we would expect you'd also do more research on how you'll
> proceed with the implementation and perhaps some high-level sketching of the
> design.
>
> Regards,
>                 Matěj

Below is the new version of the application:

Name: Konrad Zemek
Email Address: konrad.zemek at gmail.com
Freenode IRC Nick: kzemek
IM Service and Username: Skype, handle: konrad-kun
Location: Kraków, Poland. GMT+2 during the Summer.

Proposal Title:
        Reimplement Amarok 1.4 (FastForward)/iTunes import on top of
        Statistics Synchronization, and new Amarok 2.x and Rhythmbox
        as synchronization targets.

Motivation for Proposal / Goal:
        Currently, Amarok has the ability to import personal track metadata,
        like playcounts, ratings and last played time, from Amarok 1.4 and
        iTunes databases. Although this code still fulfills its purpose,
        the StatSyncing framework has since been added to the codebase and
        serves as a perfect base for a rewrite of importers. Perhaps the
        most significant advantage that would be obtained by implementing
        importers on top of StatSyncing is the ability to resynchronize
        information and to work with collections other than local.
        Still, synchronization with Amarok 1.4 and iTunes covers only
        a small portion of users that may be willing to use Amarok 2.x,
        but are tied to their current music player because of the
        consequence of losing their personal metadata. To alleviate this,
        I intend to add Rhythmbox as a new target for synchronization. It
        stores following personal track metadata compatible with Amarok:
        * user rating
        * playcount
        * last played time
        * time when the song was first added to collection

        Rhythmbox is the default GNOME's music player, and since
        Ubuntu 12.04, the music player shipped by default in Ubuntu. I feel
        that this is an important target for metadata synchronization,
        easing not only the transition between music players, but between
        whole desktop environments - GNOME and KDE.
        To complete the set, synchronization with Amarok 2.x will be added,
        so that user can synchronize his personal track metadata between
        separate Amarok databases.

Implementation Details:
        This project breaks up in a few parts with well-defined deliverables
        (details about timeline are in Tentative Timeline section).

        On the high level, the StatSyncing framework consists of the
        following elements
        * Controller, which registers providers and starts synchronization
        * Config, used to store preferences for statistics synchronization
          and can be set through UI
        * Provider, which works with the backend and provides
          StatSyncing::Tracks
        * Track, which is an abstraction of a track and contains relevant
          metadata used for trackmatching
        * Process, which is responsible for matching the tracks between
          collection and synchronizing them
        * ScrobblingService, which is used for scrobbling capabilities
          (transferring information about played tracks; see [1])

        The project can be broken into three parts; reimplementation of
        existing importers (read-only), introduction of new importers
        (read-only) and finally an optional goal of making the
        synchronization work in read-write mode (so that foreign statistics
        can be updated as well as local ones). This last goal is optional
        (soft goal), and will involve renaming importers to reflect their
        new capabilities.

        * Importer tests
                There are no tests for importing in the current trunk. Since
                a big part of this project consists of refactoring, I will have
                to prepare a quality suite of tests to make sure that
                I introduce no regressions.

                - Implement whole test suite using QTest and gmock best
                  practices
                - Test cases would involve using a dummy input, invoking
                  importer and verifying collection.
                - The actual logic behind importing tracks will be changed, so
                  these tests must treat importer as a black box (black box
                  integration tests)

        * Reimplementing importers
                - Reimplement importers to use StatSyncing
                - Data-reading modules, would be cleaned up and factored out
                  into their own classes that will be used by a subclass of
                  StatSyncing::Provider
                - Current services using StatSyncing framework will be used as
                  a reference (primarily last.fm service)
                - At each step where the code is functional (at least between
                  reimplementing FastForwardImporter and ITunesImporter),
                  supplement the tests so that new capabilities are accounted
                  for (e.g. make sure that foreign playcounts are not simply
                  stacked on locally saved playcounts with each synchronization)
                - Additionally, unit tests can now be written to confirm
                  correct implementation
                - The importer code may prove to be repeatable (in the extreme
                  case the only thing differing these importers will be their
                  respective parsers); this code will be deduplicated. If it
                  fits its responsibilities, a parent class of importers should
                  contain the duplicated logic, otherwise I will introduce new
                  helper classes.

        * Implement Rhythmbox and Amarok 2.x importers
                Rhythmbox stores its personal metadata in an XML file, which
                should be easy to read through QXml SAX parser. When in doubt
                about the format, I will consult Rhythmbox source code.

                - Rhythmbox parser shall be robust, that is it shouldn't depend
                  on the order of information (this is obvious, but important)
                - At this point both Amarok 1.4 and iTunes importers are
                  rewritten, so their implementations will be used as
                  a reference for Amarok 2.x and Rhythmbox importers.
                - The subgoal is to make sure that adding additional import
                  targets is straightforward and easy, and if possible doesn't
                  introduce code duplication (this may involve modifying
                  StatSyncing framework).
                - As with reimplementing importers, every piece of working code
                  should be tested to ensure that it is performing correctly,
                  and so that regressions don't occur in the future.

Tentative Timeline:
        MIDTERM EVALUATION (deadline: August 02)
                The hard goal for midterm evaluation is to have both FastForward
                and iTunes importers reimplemented, and if possible (soft goal)
                unit-tested.

        FINAL EVALUATION (deadline: September 27)
                The hard goal for final evaluation is to have all importers
                implemented and working. The slightly softer goal is to have
                them unit-tested. The soft goal is to implement two-way
                synchronization.

        June 17 - 23:
                Drafting up test cases for FastForward and iTunes importers
        June 24 - 30:
                Implementing the initial test suite for importers (exam week)
        July 01 - 07:
                Reimplementing FastForward importer, cleaning up data-reading
                module, introducing StatSyncing framework
        July 08 - 14:
                Reimplementing FastForward importer, implementing unit tests
        July 15 - 21:
                Reimplementing iTunes importer, cleaning up data-reading module
        July 22 - 28:
                Reimplementing iTunes importer, implementing unit tests
        July 29 - August 04: MIDTERM EVALUATION
                Ironing out last issues with reimplemented importers, submitting
                current work for midterm evaluation
        August 05 - 11:
                In-depth research of Rhythmbox metadata format, implementing
                Rhythmbox importer
        August 12 - 18:
                Implementing Rhythmbox importer, implementing unit tests
        August 19 - 25:
                Implementing Amarok 2.x importer
        August 26 - September 01:
                Implementing Amarok 2.x importer, unit tests
        September 02 - 08:
                Investigating and implementing two-way synchronization
        September 09 - 15:
                Implementing two-way synchronization
        September 16 - 22:
                Ironing out bugs, cushion period in case of delays
        September 23 - 29: FINAL EVALUATION
                Ironing out bugs, cushion period in case of delays, submitting
                work for final evaluation

Obligations from late May to early August:
        I have school until the very early July, with exams starting the
        last week of June. I will be putting my work on hold for the whole
        period of Google Summer of Code, but up until early June I may still
        be required to commit at least a few hours a week to my current job.
        So from early June to late September I have no commitments except
        for school, and from early July to late September no commitments
        at all.

About me:
        I'm a student at AGH University of Science and Technology, Kraków,
        Poland. I study Computer Science, and I'm currently in my second
        year. At the moment I'm employed as a part time C++ programmer at
        X-Formation [2], Poland, where my work ranges from conducting
        interviews with potential employees, through refactoring and
        bugfixing to implementing new features for our product.

        I'm a big fan of open source, and KDE in general. I've used Amarok
        a lot and not only read the source, but contributed to it [3].
        I am also familiar with Qt framework and have used it to prototype
        several small applications. I'm an early adopter, and my fiddling
        with Qt 5 beta has led me to uncover an important bug [4]. I also
        have source-level experience with Boost libraries.

        I'm very skilled in C++. Other than that I program mainly in Scala,
        and Java when required (and my university requires that a lot).
        I'm experienced with test driven development, continuous
        integration, code reviews and other agile techniques.

        If my application is accepted, I am going to commit my full time to
        complete the project, including putting my job on hold for the
        duration of GSoC. I expect to be working 40 hours a week. I am
        comfortable with working independently under a supervisor living
        on the other side of the globe. Although I have not have worked
        in this style before, I foresee no problems with this and can even
        change my working hours if needed. I think that code reviews
        and email contact are the best way of tracking progress in this
        case. I am fluent in English, so communication shouldn't be
        a problem.
        Of course, I would work as well with a local mentor with whom
        I could discuss issues over a coffee.

        During the bonding period I expect to complete some more tasks for
        Amarok, possibly continuing what I started with [3] (in review
        comments I mentioned streamlining track sorting and making it
        consistent across the source code). I will also use this time to
        thoroughly prepare for the coding period.

Junior job link:
        [3]

References:
[1] http://www.last.fm/help/faq?category=Scrobbling
[2] http://www.x-formation.com/
[3] https://git.reviewboard.kde.org/r/110070/
[4] https://bugreports.qt-project.org/browse/QTBUG-26832
-------------- next part --------------
A non-text attachment was scrubbed...
Name: application.diff
Type: application/octet-stream
Size: 14349 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/amarok-devel/attachments/20130421/97810d53/attachment-0001.obj>


More information about the Amarok-devel mailing list