[Nepomuk] Nepomuk and lyrics stuff

phreedom at yandex.ru phreedom at yandex.ru
Thu Mar 21 13:08:04 UTC 2013


On Thursday 21 March 2013 11:48:14 Ignacio Serantes wrote:
> On Wed, Mar 20, 2013 at 7:50 PM, <phreedom at yandex.ru> wrote:
> > On Wednesday 20 March 2013 19:08:04 Ignacio Serantes wrote:
> > > On Wed, Mar 20, 2013 at 6:28 PM, <phreedom at yandex.ru> wrote:
> > > > On Wednesday 20 March 2013 17:11:24 Ignacio Serantes wrote:
> > > > > There are several places to obtain lyrics in a predictable format
> > > > > with an API, for example LyricWiki, even with support for multiple
> > > > > languages. One example with kana, rōmaji and English versions here:
> > > > > 宇多田ヒカル ー 光
> > > > > <http://lyrics.wikia.com/%E5%AE%87%E5%A4%9A%E7%94%B0%E3%83%92%E3%82%AB%E3%83%AB_(Hikaru_Utada):%E5%85%89>.
> > > > 
> > > > Wow... that's quite a nice format, and it contains some features
> > > > which we definitely didn't think of when nmm was drafted. Makes me
> > > > feel good about not prematurely standardizing on lyrics metadata.
> > > > 
> > > > > There is also a file format to store synchronized lyrics,
> > > > > http://en.wikipedia.org/wiki/LRC_(file_format), and several servers
> > > > > offer lyrics in this format.
> > > > >
> > > > > The problem here is not obtaining lyrics but how reliable the
> > > > > subtitles are, because these databases are basically created by
> > > > > people, like Wikipedia.
> > > > 
> > > > Of course, but if there's a critical mass, it's only a matter of time
> > > > before it becomes good enough. Is there a critical mass for
> > > > synchronized lyrics? Human-made lyrics translation is a very
> > > > appealing feature, but are these also available in a synchronized
> > > > format?
> > > 
> > > For some titles yes, basically English translations. But in my case,
> > > my lyrics fetcher has a method to translate lyrics into any language
> > > using Google Translate. So there is both manual and automatic
> > > translation. And, of course, the same goes for romanization, which is
> > > more easily automated but not reliable because, for example, Japanese
> > > romanization cannot be done fully automatically, so it always needs
> > > manual correction.
> > > 
> > > Lyrics are like movie subtitles. They have an original language and
> > > zero or more translations, and for certain languages, like Japanese or
> > > Korean, there is a romanized form too. The main difference is that most
> > > of the time there are no timestamps, except in the LRC format.
> > 
> > I don't yet know what the best approach to handling multiple languages
> > is. N3 in theory lets you do tricks like
> > {<uri> <property> "literal"@language . }, but is this supported in
> > Nepomuk-KDE? Is there a language code for romanized Japanese (most
> > likely not :( )?
> 
> As far as I know no for both.
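For what it's worth, here is a minimal sketch of what such a language-tagged statement looks like when serialized as N-Triples (a line-based subset of N3). It assumes BCP 47 language tags, which do allow a script subtag, so `ja-Latn` is a well-formed tag for Japanese written in Latin script. Whether Nepomuk-KDE stores or queries such tags is exactly the open question above and is not shown here; the helper name and the example.org IRIs are hypothetical.

```python
def ntriples_lang_literal(subject: str, predicate: str, text: str, lang: str) -> str:
    """Format one N-Triples statement whose object is a language-tagged literal.

    Hypothetical helper, not a Nepomuk API. subject/predicate are full IRIs;
    lang is assumed to be a BCP 47 tag such as "ja", "en" or "ja-Latn".
    """
    # Escape the backslash first, then quotes and line breaks, so the literal stays on one line.
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'<{subject}> <{predicate}> "{escaped}"@{lang} .'
```

The same property could then carry one literal per language, e.g. one tagged `ja` and one tagged `ja-Latn`, instead of inventing a separate property per script.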
> 
> > > > And the million-dollar question: if we standardize on LRC, which looks
> > > > quite backwards-compatible with plaintext, should we store it as-is
> > > > in a single property?
> > > >
> > > > Any nasty corner cases we might not like?
> > > >
> > > > A good starting point for the discussion seems to be nmm:lyrics in
> > > > LRC format + a plaintext dump into nie:plainTextContent.
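A rough sketch of how the plaintext dump for nie:plainTextContent could be derived from an LRC value: drop the time tags and the whole-line ID tags such as [ar:...] and [ti:...], and keep the text. The regular expressions assume the usual [mm:ss.xx] tag form; `lrc_to_plain_text` is a hypothetical helper, not part of any Nepomuk or taglib API.

```python
import re

# Assumed LRC time tag form: [mm:ss] or [mm:ss.xx] (some files use ':' before the fraction).
TIME_TAG = re.compile(r"\[\d{1,3}:\d{2}(?:[.:]\d{1,3})?\]")
# Whole-line ID tags such as [ar:Artist] or [ti:Title]; these are metadata, not lyrics.
ID_TAG = re.compile(r"^\[[a-zA-Z]+:[^\]]*\]$")


def lrc_to_plain_text(lrc: str) -> str:
    """Strip LRC markup, keeping one plain line per lyrics line (hypothetical helper)."""
    out = []
    for raw in lrc.splitlines():
        line = raw.strip()
        if ID_TAG.match(line):
            continue  # skip [ar:...], [ti:...] and similar header lines
        # Remove every time tag, including several stacked on one line.
        out.append(TIME_TAG.sub("", line).strip())
    # Trim leading/trailing empty lines left over from tag-only lines.
    return "\n".join(out).strip()
```

Lines carrying several time tags (a repeated chorus) come out only once, which seems acceptable for full-text indexing; plain, untagged text passes through unchanged.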
> > > 
> > > If both LRC and non-LRC lyrics are supported, it seems good to me, but
> > > maybe we could use [00:00:00] when there are no timestamps.
> > 
> > Non-LRC is already supported using nie:plainTextContent. There's simply
> > no point in introducing a dedicated property without also placing some
> > useful restrictions on it. It's hard for me to tell how broken the
> > [00:00:00] approach is.
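A minimal sketch of the zero-timestamp idea, assuming the standard LRC [mm:ss.xx] tag form, so the tag is written [00:00.00] rather than [00:00:00]. `plain_to_untimed_lrc` is a hypothetical helper.

```python
def plain_to_untimed_lrc(text: str, stamp: str = "[00:00.00]") -> str:
    """Prefix every line of untimed lyrics with the same zero time tag.

    Hypothetical helper; assumes the [mm:ss.xx] tag form used by LRC files.
    """
    return "\n".join(stamp + line for line in text.splitlines())
```

Because every line carries the same tag, a player that only clears the screen when a later tag is reached would show the whole text at once; a plain-text reader, on the other hand, now has to strip the tags, which is one way this approach could turn out "broken".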
> > 
> > But to make the right decision, we need to know for sure which formats
> > have critical mass... this is something you probably know better than
> > me.
>
> I don't understand what you mean about the formats. LRC is widely used,
> and for plain text there are several APIs available.

OK, so it is THE standard. Good to know.

> About LRC, the format is simple: the displayed text must be erased when
> there is another timestamp, so the following example doesn't break the
> LRC format:
> [00:00.00]Line one
> [00:00.00]Line two
> [00:00.00]Line three
> [99.60.60]
> 
> This will display three lines for the whole song duration, but this will
> require support for the LRC format.

Ok.
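To make the display rule concrete, here is a sketch (hypothetical helper names, assuming [mm:ss.xx] time tags) that groups lines sharing a time tag and returns what should be on screen at time t: the group with the latest tag not after t, so each new tag erases the previous text. The closing tag in the example, [99.60.60], does not follow the mm:ss.xx form, so the test uses [99:59.99] as a stand-in end marker.

```python
import bisect
import re
from typing import Dict, List, Tuple

# Assumed time tag form [mm:ss] or [mm:ss.xx]; ID tags like [ar:...] never match.
TIME_TAG = re.compile(r"\[(\d+):(\d{2}(?:\.\d+)?)\]")


def parse_lrc(lrc: str) -> List[Tuple[float, List[str]]]:
    """Return (start_seconds, lines) groups sorted by start time.

    Lines sharing a time tag end up in the same group, so they are shown together.
    """
    groups: Dict[float, List[str]] = {}
    for raw in lrc.splitlines():
        tags = TIME_TAG.findall(raw)
        if not tags:
            continue  # untagged lines and ID tags are ignored in this sketch
        text = TIME_TAG.sub("", raw).strip()
        for minutes, seconds in tags:
            start = int(minutes) * 60 + float(seconds)
            groups.setdefault(start, []).append(text)
    return sorted(groups.items())


def lines_at(groups: List[Tuple[float, List[str]]], t: float) -> List[str]:
    """Text on screen at time t: the latest group starting at or before t.

    A later tag replaces the previous text; a tag with no text clears the screen.
    """
    starts = [start for start, _ in groups]
    i = bisect.bisect_right(starts, t) - 1
    if i < 0:
        return []  # nothing is shown before the first tag
    return [line for line in groups[i][1] if line]
```

With the three [00:00.00] lines from the example plus an empty end tag, `lines_at` returns all three lines for any time before the end tag and nothing afterwards.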

> So in brief and for sure:
> 1) nie:plainTextContent for plain lyrics

Yes, for now.

> For future development
> 1) A new ontology property, nmm:lyrics, for the LRC format.
> 2) Add support for transliteration/romanization.
> 3) Add support for multiple languages.

Sounds almost like a generic subtitle ontology. We should give it a shot. Now 
if only someone familiar with subtitles could provide some input...
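A rough, hypothetical data model for such a subtitle-like structure, covering the points listed above: an original text, zero or more translations, an optional romanization, manual vs. automatic sources, and cues whose timestamps may be missing. The class names, the `kind`/`source` labels and the use of BCP 47 tags are assumptions for illustration, not a proposed nmm schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cue:
    text: str
    start: Optional[float] = None  # seconds; None when the source has no timestamps


@dataclass
class TextTrack:
    lang: str  # assumed BCP 47 tag, e.g. "ja", "en" or "ja-Latn"
    kind: str  # hypothetical labels: "original", "translation" or "romanization"
    cues: List[Cue] = field(default_factory=list)
    source: str = "manual"  # "manual" or "automatic" (e.g. machine translation)


@dataclass
class Lyrics:
    tracks: List[TextTrack] = field(default_factory=list)

    def original(self) -> Optional[TextTrack]:
        return next((t for t in self.tracks if t.kind == "original"), None)

    def translations(self) -> List[TextTrack]:
        return [t for t in self.tracks if t.kind == "translation"]

    def is_synchronized(self, lang: str) -> bool:
        """True if a non-empty track in `lang` exists and every cue has a start time."""
        track = next((t for t in self.tracks if t.lang == lang), None)
        return (
            track is not None
            and bool(track.cues)
            and all(c.start is not None for c in track.cues)
        )
```

Keeping timestamps optional per cue lets plain lyrics and LRC lyrics share one structure, which is roughly what a generic subtitle ontology would also need.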

> > > > > Finally, there are a bunch of lyrics fetchers because they are easy
> > > > > to implement, and I even wrote two: a deprecated one for Amarok 2
> > > > > written in JavaScript, and the one I use daily, written in Python.
> > > > > 
> > > > > On Wed, Mar 20, 2013 at 4:52 PM, <phreedom at yandex.ru> wrote:
> > > > > > On Wednesday 20 March 2013 16:06:39 Ignacio Serantes wrote:
> > > > > > > Extracted from the ontology documentation:
> > > > > > >
> > > > > > > Plain-text representation of the content of a InformationElement
> > > > > > > with all markup removed. The main purpose of this property is
> > > > > > > full-text indexing and search. Its exact content is considered
> > > > > > > application-specific. The user can make no assumptions about what
> > > > > > > is and what is not contained within. *Applications should use more
> > > > > > > specific properties wherever possible*.
> > > > > > 
> > > > > > *wherever possible*. The rationale for not adding a specific
> > > > > > property like nmm:lyrics was that such a property might be
> > > > > > underspecified and effectively useless. Also, this would mean lots
> > > > > > of content types would get their own "nicely named plain-text
> > > > > > version of the data without any strict serialization requirements"
> > > > > > properties without any useful result either.
> > > > > > 
> > > > > > To put "The user can make no assumptions about what is and what is
> > > > > > not contained within" into musical context: typical data ripped off
> > > > > > a lyrics site might contain lyrics only, or lyrics prepended with
> > > > > > band, title or who knows what else. The format can be quite
> > > > > > "flexible" too, even worse if you use several lyrics sources.
> > > > > > 
> > > > > > So the user who knows what nmm:MusicPiece is also knows that you
> > > > > > can get a somewhat useful, but not machine-readable, text dump in
> > > > > > nie:plainTextContent which is likely to also contain lyrics, and
> > > > > > that's exactly what you get from a typical lyrics site.
> > > > > > 
> > > > > > Properly implemented lyrics need a rather clean feed, and who knows,
> > > > > > maybe they shouldn't even be implemented as a single text property.
> > > > > > Maybe a subtitle-like "time-stamped text" approach is a better idea?
> > > > > > 
> > > > > > Or maybe I missed some important development and there's a very
> > > > > > good authoritative lyrics DB with a predictable format, and we
> > > > > > should get started on defining nmm:lyrics? I don't monitor this
> > > > > > actively...
> > > > > > 
> > > > > > > When the documentation tells you that other ontologies should be
> > > > > > > used, I have doubts.
> > > > > > > 
> > > > > > > On Wed, Mar 20, 2013 at 2:54 PM, <phreedom at yandex.ru> wrote:
> > > > > > > > On Tuesday 19 March 2013 20:22:14 Ignacio Serantes wrote:
> > > > > > > > > Hi list,
> > > > > > > > >
> > > > > > > > > As a first step to add music lyrics to Nepomuk, I will add
> > > > > > > > > support for lyrics frames in audio files in taglibextractor,
> > > > > > > > > and this data will be stored in nie:plainTextContent
> > > > > > > > > <http://www.semanticdesktop.org/ontologies/nie/#plainTextContent>
> > > > > > > > > because there is no better place to store this information.
> > > > > > > > 
> > > > > > > > This is the proper place to store lyrics.

