[Nepomuk] Nepomuk and lyrics stuff

Ignacio Serantes kde at aynoa.net
Wed Mar 20 18:08:04 UTC 2013


On Wed, Mar 20, 2013 at 6:28 PM, <phreedom at yandex.ru> wrote:

> On Wednesday 20 March 2013 17:11:24 Ignacio Serantes wrote:
> > There are several places to obtain lyrics in a predictable format with an
> > API, for example LyricWiki, which even supports multiple languages. One
> > example with kana, rōmaji and English versions is here: 宇多田ヒカル ー 光
> > <http://lyrics.wikia.com/%E5%AE%87%E5%A4%9A%E7%94%B0%E3%83%92%E3%82%AB%E3%83%AB_(Hikaru_Utada):%E5%85%89>.
>
> Wow... that's quite a nice format, and it contains some features which we
> definitely didn't think of when nmm was drafted. Makes me feel good about
> not prematurely standardizing on lyrics metadata.
>
> > There is also a file format for storing synchronized lyrics,
> > http://en.wikipedia.org/wiki/LRC_(file_format), and several servers
> > offer lyrics in this format.
>
> > The problem here is not obtaining lyrics but how reliable they are,
> > because these databases are basically created by people, like Wikipedia.
>
> Of course, but if there's a critical mass, it's only a matter of time
> before it becomes good enough. Is there a critical mass for synchronized
> lyrics? Human-made lyrics translation is a very appealing feature, but are
> these also available in a synchronized format?
>

For some titles yes, mostly English translations. In my case, though, my
lyrics fetcher has a method to translate lyrics into any language using
Google Translate, so there are both manual and automatic translations.
The same goes, of course, for romanization, which is easier to automate
but not reliable: for example, fully automatic Japanese romanization is
impossible, so it always needs manual correction.

Lyrics are like movie subtitles. They have an original language and zero
or more translations, and for certain languages, like Japanese or Korean,
there is a romanized form too. The main difference is that most of the time
lyrics have no timestamps, except in LRC format.
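
A minimal Python sketch of that model, just to make the idea concrete. The
class and field names are hypothetical (they are not taken from nmm or from
my fetcher), and romanization is assumed to apply only to the original text:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Lyrics:
    """Hypothetical container: one original text plus optional variants."""
    language: str                      # language code of the original, e.g. "ja"
    original: str                      # lyrics in the original language
    romanized: Optional[str] = None    # only meaningful for some languages
    translations: Dict[str, str] = field(default_factory=dict)
    synchronized: bool = False         # True when the text carries timestamps

    def text_for(self, language: str) -> str:
        """Return the text in `language`, falling back to the original."""
        if language == self.language:
            return self.original
        return self.translations.get(language, self.original)


song = Lyrics(language="ja", original="<original text>",
              romanized="<romanized text>",
              translations={"en": "<English text>"})
print(song.text_for("en"))
print(song.text_for("es"))  # no Spanish translation, so the original is used
```

Whether each variant becomes its own property or its own resource in the
ontology is exactly the open question; the sketch only lists what would
need to be representable.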


>
> And the million dollar question, if we standardize on LRC which looks quite
> backwards-compatible with plaintext: should we store it as-is in a single
> property?
>
> Any nasty corner cases we might not like?
>
> A good starting point for the discussion seems to be nmm:lyrics in LRC
> format + plaintext dump into nie:plainTextContent.
>

If both LRC and non-LRC lyrics are supported, that seems good to me, but
maybe we could use [00:00:00] when there are no timestamps.
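
A rough Python sketch of how that could work, under some assumptions: the
timestamps are in the usual LRC shape [mm:ss.xx], so the placeholder is
written [00:00.00] here, and the function names are made up for
illustration. Lines without a timestamp get time zero, and the plain text
for nie:plainTextContent is obtained by dropping the timestamps:

```python
import re

# Assumed LRC timestamp shape: [mm:ss.xx], fractional part optional.
TIMESTAMP = re.compile(r"\[(\d+):(\d{2})(?:\.(\d{1,3}))?\]")
# ID tags such as [ar:Artist] or [ti:Title] carry metadata, not lyrics.
ID_TAG = re.compile(r"^\[[A-Za-z]+:[^\]]*\]$")


def parse_lyrics(text):
    """Return (seconds, line) pairs; lines without a timestamp get 0.0."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ID_TAG.match(line):
            continue
        stamps = []
        # A line may start with several timestamps (repeated lines).
        while (m := TIMESTAMP.match(line)):
            minutes, seconds, frac = m.groups()
            stamps.append(int(minutes) * 60 + int(seconds)
                          + float("0." + (frac or "0")))
            line = line[m.end():]
        for t in stamps or [0.0]:
            entries.append((t, line.strip()))
    entries.sort(key=lambda e: e[0])  # stable: untimed lines keep their order
    return entries


def to_lrc(entries):
    """Serialize entries as LRC; untimed lines come out as [00:00.00]."""
    out = []
    for t, line in entries:
        minutes, rest = divmod(round(t * 100), 6000)
        seconds, centis = divmod(rest, 100)
        out.append(f"[{minutes:02d}:{seconds:02d}.{centis:02d}]{line}")
    return "\n".join(out)


def plain_text(entries):
    """Text dump without timestamps, e.g. for full-text indexing."""
    return "\n".join(line for _, line in entries)
```

One corner case this makes visible: with a zero placeholder, a reader
cannot tell "no timing information" from "this line starts at zero", so a
player would need some rule such as "all timestamps are zero means
unsynchronized".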


>
> > Finally, there are a bunch of lyrics fetchers because they are easy to
> > implement. I even wrote two: a deprecated one for Amarok 2 written in
> > jscript, and the one I use on a daily basis, written in Python.
> >
> > On Wed, Mar 20, 2013 at 4:52 PM, <phreedom at yandex.ru> wrote:
> > > On Wednesday 20 March 2013 16:06:39 Ignacio Serantes wrote:
> > > > Extracted from ontology documentation:
> > > >
> > > > Plain-text representation of the content of a InformationElement
> > > > with all markup removed. The main purpose of this property is
> > > > full-text indexing and search. Its exact content is considered
> > > > application-specific. The user can make no assumptions about what
> > > > is and what is not contained within. *Applications should use more
> > > > specific properties wherever possible*.
> > >
> > > *wherever possible*. The rationale for not adding a specific property
> > > like nmm:lyrics was that such a property might be underspecified and
> > > effectively useless. Also, this would mean lots of content types would
> > > get their own "nicely named plain-text version of the data without any
> > > strict serialization requirements" properties without any useful
> > > result either.
> > >
> > > To put "The user can make no assumptions about what is and what is not
> > > contained within" into musical context: typical data ripped off a
> > > lyrics site might contain lyrics only, or lyrics prepended with band,
> > > title or who knows what else, format can be quite "flexible" too, even
> > > worse if you use several lyrics sources.
> > >
> > > So, the user who knows what nmm:MusicPiece is, also knows that you can
> > > get a somewhat useful, but not machine-readable text dump in
> > > nie:plainTextContent which is likely to also contain lyrics, and that's
> > > exactly what you get from a typical lyrics site.
> > >
> > > Properly implemented lyrics need a rather clean feed, and who knows,
> > > maybe they shouldn't even be implemented as a single text property.
> > > Maybe a subtitle-like approach, "time-stamped text", is a better idea?
> > >
> > > Or, maybe I missed some important development and there's a very good
> > > authoritative lyrics DB with a predictable format and we should get
> > > started on defining nmm:lyrics? I don't monitor this actively...
> > >
> > > > When documentation informs you that other ontologies should be used
> > > > I have doubts.
> > > >
> > > > On Wed, Mar 20, 2013 at 2:54 PM, <phreedom at yandex.ru> wrote:
> > > > > On Tuesday 19 March 2013 20:22:14 Ignacio Serantes wrote:
> > > > > > Hi list,
> > > > > >
> > > > > > As a first step to add music lyrics to Nepomuk I will add support
> > > > > > for lyrics frames in audio files in taglibextractor, and this data
> > > > > > will be stored in nie:plainTextContent
> > > > > > <http://www.semanticdesktop.org/ontologies/nie/#plainTextContent>
> > > > > > because there is no better place to store this information.
> > > > >
> > > > > This is the proper place to store lyrics.
>



-- 
Best wishes,
Ignacio