[Nepomuk] Nepomuk and lyrics stuff

phreedom at yandex.ru phreedom at yandex.ru
Wed Mar 20 18:50:23 UTC 2013


On Wednesday 20 March 2013 19:08:04 Ignacio Serantes wrote:
> On Wed, Mar 20, 2013 at 6:28 PM, <phreedom at yandex.ru> wrote:
> > On Wednesday 20 March 2013 17:11:24 Ignacio Serantes wrote:
> > > There are several places to obtain lyrics in a predictable format
> > > with an API, for example LyricWiki, even with support for multiple
> > > languages. One example with kana, rōmaji and English versions here:
> > > 宇多田ヒカル ー 光
> > > <http://lyrics.wikia.com/%E5%AE%87%E5%A4%9A%E7%94%B0%E3%83%92%E3%82%AB%E3%83%AB_(Hikaru_Utada):%E5%85%89>.
> > 
> > Wow... that's quite a nice format, and it contains some features which
> > we definitely didn't think of when nmm was drafted. Makes me feel good
> > about not prematurely standardizing on lyrics metadata.
> > 
> > > There is also a file format for storing synchronized lyrics,
> > > http://en.wikipedia.org/wiki/LRC_(file_format), and several
> > > servers offer lyrics in this format.
> > > 
> > > The problem here is not obtaining lyrics but how reliable the
> > > lyrics are, because these databases are basically created by
> > > people, like Wikipedia.
> > 
> > Of course, but if there's a critical mass, it's only a matter of time
> > before it becomes good enough. Is there a critical mass for
> > synchronized lyrics? Human-made lyrics translation is a very appealing
> > feature, but are these also available in a synchronized format?
> 
> For some titles yes, basically English translations. But, in my case, my
> lyrics fetcher has a method to translate lyrics to any language using
> Google Translate. So there is manual translation and automatic
> translation. And, of course, the same goes for romanization, which is
> easier to automate but not reliable: for example, Japanese romanization
> cannot be done reliably by machine, so it always needs manual correction.
> 
> Lyrics are like movie subtitles. They have an original language and zero
> or more translations, and for certain languages, like Japanese or Korean,
> there is a romanized form too. The main difference is that most of the
> time there are no timestamps, except in the LRC format.

I don't yet know what the best approach to handling multiple languages is. N3
in theory lets you do tricks like {<uri> <property> "literal"@language .}, but
is this supported in Nepomuk-KDE? Is there a language code for romanized
Japanese (most likely not :( )?
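
To make the language-tagged literal idea concrete, here is a minimal Python
sketch that serializes several lyrics variants as N3/Turtle triples. Note the
assumptions: nmm:lyrics is only the property proposed in this thread and does
not exist yet, the resource URI and sample strings are made up, and nothing
here checks whether Nepomuk-KDE actually accepts such literals. BCP 47 tags
allow a script subtag, so "ja-Latn" is one plausible tag for romanized
Japanese.

```python
def n3_literal(text, lang):
    """Serialize text as an N3/Turtle literal with a language tag."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}'


def lyrics_triples(uri, prop, variants):
    """Return one triple per language variant, sorted by language tag."""
    return [
        f"<{uri}> {prop} {n3_literal(text, lang)} ."
        for lang, text in sorted(variants.items())
    ]


# Hypothetical data: the URI and the nmm:lyrics property are assumptions.
variants = {"ja": "光", "ja-Latn": "Hikari", "en": "Light"}
triples = lyrics_triples("urn:example:track1", "nmm:lyrics", variants)
for triple in triples:
    print(triple)
```

With this layout a consumer could pick the variant matching the user's locale
and fall back to the original-language literal otherwise.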

> > And the million dollar question, if we standardize on LRC which looks
> > quite backwards-compatible with plaintext: should we store it as-is in
> > a single property?
> > 
> > Any nasty corner cases we might not like?
> > 
> > A good starting point for the discussion seems to be nmm:lyrics in LRC
> > format + plaintext dump into nie:plainTextContent.
> 
> If both LRC and non-LRC lyrics are supported, that seems good to me, but
> maybe we could use [00:00:00] when there are no timestamps.

Non-LRC is already supported using nie:plainTextContent. There's simply no 
point in introducing a dedicated property without also placing some useful 
restrictions on it. It's hard for me to tell how broken the [00:00:00] 
approach is.
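
One way to probe the corner cases is a small parser. The sketch below is an
assumption-laden illustration, not a reference LRC implementation: it accepts
[mm:ss], [mm:ss.xx] and [mm:ss:xx] time tags, allows several tags on one
line, skips ID tags such as [ar:...], and keeps untimed lines with an empty
timestamp list instead of a [00:00:00] placeholder. The same parsed data also
yields the plaintext dump for nie:plainTextContent.

```python
import re

# Time tag: [mm:ss], [mm:ss.xx] or [mm:ss:xx]
TIME_TAG = re.compile(r"\[(\d+):(\d{2})(?:[.:](\d{1,3}))?\]")
# ID tag such as [ar:Artist]; a lyric line shaped like this is skipped too
ID_TAG = re.compile(r"^\[[A-Za-z]+:.*\]$")


def parse_lrc(text):
    """Return a list of (timestamps_in_seconds, line_text) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ID_TAG.match(line):
            continue
        stamps = []
        pos = 0
        while True:
            match = TIME_TAG.match(line, pos)
            if match is None:
                break
            minutes, seconds, frac = match.groups()
            stamp = int(minutes) * 60 + int(seconds)
            if frac:
                stamp += int(frac) / 10 ** len(frac)
            stamps.append(stamp)
            pos = match.end()
        # Untimed lines simply end up with an empty timestamp list.
        entries.append((stamps, line[pos:].strip()))
    return entries


def plain_text(entries):
    """Drop all timing information, keeping one lyric line per row."""
    return "\n".join(body for _, body in entries)


sample = (
    "[ar:Example Artist]\n"
    "[00:12.50]First line\n"
    "[00:15.00][01:02.00]Chorus\n"
    "Untimed line\n"
)
entries = parse_lrc(sample)
print(entries)
print(plain_text(entries))
```

Under these assumptions plain text parses as a degenerate case of LRC, which
is the backwards compatibility discussed above.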

But to make the right decision, we need to know for sure which formats have 
the critical mass... this is something you probably know better than me.

> > > Finally, there are a bunch of lyrics fetchers because they are easy
> > > to implement; I even wrote two: one for Amarok 2, written in
> > > jscript and now deprecated, and the one I use on a daily basis,
> > > written in python.
> > > 
> > > On Wed, Mar 20, 2013 at 4:52 PM, <phreedom at yandex.ru> wrote:
> > > > On Wednesday 20 March 2013 16:06:39 Ignacio Serantes wrote:
> > > > > Extracted from ontology documentation:
> > > > > 
> > > > > Plain-text representation of the content of a
> > > > > InformationElement with all markup removed. The main purpose
> > > > > of this property is full-text indexing and search. Its exact
> > > > > content is considered application-specific. The user can make
> > > > > no assumptions about what is and what is not contained within.
> > > > > *Applications should use more specific properties wherever
> > > > > possible*.
> > > > 
> > > > *wherever possible*. The rationale for not adding a specific
> > > > property like nmm:lyrics was that such a property might be
> > > > underspecified and effectively useless. Also, this would mean
> > > > lots of content types would get their own "nicely named
> > > > plain-text version of the data without any strict serialization
> > > > requirements" properties without any useful result either.
> > > > 
> > > > To put "The user can make no assumptions about what is and what
> > > > is not contained within" into musical context: typical data
> > > > ripped off a lyrics site might contain lyrics only, or lyrics
> > > > prepended with band, title or who knows what else. The format can
> > > > be quite "flexible" too, and it gets even worse if you use
> > > > several lyrics sources.
> > > > 
> > > > So, the user who knows what nmm:MusicPiece is, also knows that
> > > > you can get a somewhat useful, but not machine-readable text dump
> > > > in nie:plainTextContent which is likely to also contain lyrics,
> > > > and that's exactly what you get from a typical lyrics site.
> > > > 
> > > > Properly implemented lyrics need a rather clean feed and, who
> > > > knows, maybe they shouldn't even be implemented as a single text
> > > > property. Maybe a subtitle-like "time-stamped text" approach is a
> > > > better idea?
> > > > 
> > > > Or, maybe I missed some important development and there's a very
> > > > good authoritative lyrics DB with a predictable format and we
> > > > should get started on defining nmm:lyrics? I don't monitor this
> > > > actively...
> > > > 
> > > > > When documentation informs you that other ontologies should
> > > > > be used I have doubts.
> > > > > 
> > > > > On Wed, Mar 20, 2013 at 2:54 PM, <phreedom at yandex.ru> wrote:
> > > > > > On Tuesday 19 March 2013 20:22:14 Ignacio Serantes wrote:
> > > > > > > Hi list,
> > > > > > > 
> > > > > > > As a first step to add music lyrics to Nepomuk I will add
> > > > > > > support for lyrics frames in audio files in
> > > > > > > taglibextractor and this data will be stored in
> > > > > > > nie:plainTextContent
> > > > > > > <http://www.semanticdesktop.org/ontologies/nie/#plainTextContent>
> > > > > > > because there is no better place to store this
> > > > > > > information.
> > > > > > 
> > > > > > This is the proper place to store lyrics.

