Podcast Support

Sat Nov 21 19:58:43 CET 2009

Long list. Hope I can give you some clarity.

Answers and comments inline.

On Fri, Nov 20, 2009 at 23:55, Mathias Panzenböck
<grosser.meister.morti at gmx.net> wrote:
> Hi.
>
> Working on the podcast support I encountered a few things. It is not so much
> questions that I have (except for a few) as notes on how I implemented it, or
> thinking out loud about the podcast support. Still, answers to the few questions
> would be helpful. :)
>
> == Multiple Enclosures ==
> All podcast feed formats support multiple enclosures. Currently this is not
> handled so the 2nd overwrites the first. This is not nice. But how to handle
> this? Create a 2nd podcast entry? An episode is not really a track, it's a set
> of tracks, maybe a playlist itself? That would mean changing a lot. Just create
> a 2nd episode that is a copy except for the file? Well, it would have the same
> ID, so this might be kind of a problem. I think *usually* when there are several
> enclosures in one episode they are alternative versions of the same thing
> (different formats like mp3 vs. ogg vs. wma, or different mirrors including a
> bittorrent file as one of the "mirrors"). Maybe just pick the version that's
> appropriate for Amarok? How do I find out which extensions are supported by Amarok?
>
> Question: What should be done here?

I've never really come across a feed with multiple enclosures per
item. Can you give me some examples?
For the case of alternative formats we have the MultiSourceCapability.
Or at least I think we can use it for that.
The other one, I'm not sure. Depends on what the additional enclosures
are used for and whether we want to support this.
What I suggest: we should do mimetype filtering on enclosures to get
the playable content, assume it's alternative encodings of the same
episode and use the MultiSourceCap.
In any case, since this is not common practice in podcasting and, in my
experience, purely theoretical, it's very low on the TODO list.
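
A rough sketch of the mimetype filtering I have in mind, in plain C++ without
any Amarok or Qt types. The Enclosure struct and both function names are made
up for illustration, and treating only audio/* and video/* as playable is an
assumption; the real check would have to ask the engine what it supports.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one <enclosure> of a feed item.
struct Enclosure {
    std::string url;
    std::string mimeType;   // value of the enclosure's type attribute
};

// True if the MIME type looks like playable media.
// Assumption: only the top-level type (audio/ or video/) is checked.
bool isPlayable(const std::string &mimeType)
{
    return mimeType.compare(0, 6, "audio/") == 0
        || mimeType.compare(0, 6, "video/") == 0;
}

// Keep only the enclosures that pass the MIME type check, in feed order.
std::vector<Enclosure> playableEnclosures(const std::vector<Enclosure> &all)
{
    std::vector<Enclosure> result;
    for (const Enclosure &e : all) {
        if (isPlayable(e.mimeType))
            result.push_back(e);
    }
    return result;
}
```

Whatever survives the filter would then be treated as alternative sources of
the same episode; a .torrent "mirror" simply drops out.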

>
> == Text Format ==
> The description and other text fields of podcast feeds *might* be html, xhtml or
> just plain text. In atom feeds this is actually marked by the type="..."
> attribute to each element containing text (can be "text", "html" or "xhtml").
> Amarok does not provide a way to save this information so I made the assumption,
> that the title, author etc. fields are plain text (because they are used that
> way in Amarok) and that only the description field is html (because the code I
> wrote to display it in the info applet makes that assumption and it seems not to
> be used anywhere else).

Everything except description is indeed clear text.

Here is what I've been planning to do since integrating your
PodcastReader rewrite:
Add an additional field to PodcastMetaCommon, enum DescriptionType{
HtmlBody, HtmlDescription, ItunesSummery, ClearTextDescription }. The
listed order is also the priority. We only save one "description"
type.
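
Roughly like this, as a sketch only. The enum values and their order are the
ones listed above; the NoDescription sentinel, the StoredDescription struct
and the offer() function are hypothetical helpers to show the "highest
priority wins" rule, not existing Amarok code.

```cpp
#include <string>

// Names and order as listed above; a lower value means higher priority.
enum DescriptionType {
    HtmlBody,
    HtmlDescription,
    ItunesSummery,
    ClearTextDescription,
    NoDescription   // hypothetical sentinel: nothing stored yet
};

// Hypothetical holder for the single description we keep per item.
struct StoredDescription {
    DescriptionType type = NoDescription;
    std::string text;

    // Store the candidate only if it is non-empty and has a higher
    // priority than what is already stored. Returns true if it was stored.
    bool offer(DescriptionType candidateType, const std::string &candidateText)
    {
        if (candidateText.empty() || candidateType >= type)
            return false;
        type = candidateType;
        text = candidateText;
        return true;
    }
};
```

The reader can then call offer() for every candidate element it meets, in
whatever order the feed has them, and ends up with the best one.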

This was your idea and I'm convinced this is a very good and already
proven solution. No need to second guess yourself.

I guess a more general name is in order to prevent confusion with the
RSS element. How about shownotes?

>
> So I have to convert whatever I get from the podcast feed to either plain text
> or html. The latter is no problem, but converting html to plain text is. First
> I'd have to strip all tags and then I'd have to resolve all the >250 predefined
> html entities (like &auml; etc.). And when the Atom type attribute is
> "html", the content is actually CDATA that makes up HTML and not XHTML, so I
> cannot parse it with an XML parser (there might be <br> instead of <br/>,
> missing </p> etc.).
>
> Is there already a function for converting html to plain text in Qt/KDE? I know
> that in Qt there already is a table of all html entities in some inaccessible
> internal part (and a second time in WebKit when it is compiled in). Sadly these
> tables are not accessible. If I'd write such a function myself I would embed a
> html entity table a 2nd (or 3rd) time. Kinda waste of memory (well, not much).
> It would also bloat the PodcastReader code a bit.
>
> Question: Should I include the entity table and do the resolving and tag
> stripping by myself (won't be a problem for me)?

There is no need to convert from clear-text to HTML. We save the
(cleaned up) HTML to database and use it directly in the info widget
if possible.

>
> Another option would be to add more attributes to PodcastEpisode (and/or Track?)
> that stores the information on what type the corresponding field actually has.
> But that would involve changes to database tables (additional fields and that
> would break existing databases?) and if fields other than the description would
> get this, I guess this would involve changes to lots of parts of Amarok in order
> to handle it right. So I guess not an option at all.

Don't be afraid to add a database field. I've done this a couple of
times already. We have a version string in the admin table and can
write update functions for that. Just take a look at
SqlPodcastProvider::updateDatabase().
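
The general shape of such an update step, as a standalone sketch and not the
actual updateDatabase() code. The version numbers, table names, column names
and the component key below are all placeholders.

```cpp
#include <string>
#include <vector>

// Hypothetical: the schema version this build expects.
const int CURRENT_SCHEMA_VERSION = 2;

// Returns the SQL needed to bring a database at storedVersion up to
// CURRENT_SCHEMA_VERSION. Empty if the database is already current.
std::vector<std::string> upgradeStatements(int storedVersion)
{
    std::vector<std::string> sql;
    if (storedVersion < 2) {
        // Placeholder table and column for the new description type field.
        sql.push_back("ALTER TABLE podcastepisodes "
                      "ADD COLUMN descriptiontype INTEGER DEFAULT 0;");
    }
    if (!sql.empty()) {
        // Record the new version so the upgrade runs only once.
        sql.push_back("UPDATE admin SET version = "
                      + std::to_string(CURRENT_SCHEMA_VERSION)
                      + " WHERE component = 'PODCAST_SCHEMA';");
    }
    return sql;
}
```

Existing rows just get the default value, so old databases keep working.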

>
> Apropos: For feeds that do not support a type attribute (RSS 1.0/2.0), I found
> out there is already a function in Qt to guess whether the text is (or might
> be) html or not:
> http://doc.trolltech.com/4.5/qt.html#mightBeRichText
> Haven't used it yet, though.

We'll start to use it and see if it works.

>
> == Fields ==
> There are some fields in PodcastMetaCommon that seem not to be used and were
> not even read: summary, subtitle and I think author wasn't read either (or was
> it?). I do read them from the feed. In RSS 1.0/2.0 I do some guessing about
> this, because there actually is only the <description> element in the standard
> but there are often other elements used. I decide what to use this way:

I guess this is leftover from the 1.4 porting or perhaps I just added
them since these are tags in the feed. In any case, doesn't seem we
are using them, yet.

>
> If only the description std element is there:
> description=description
>
> If itunes:summary is there:
> summary=description, description=itunes:summary
> (Hm, maybe not that good a guess on this one, but usually description is
> shorter than itunes:summary.)

I say compare lengths and keep the longest as description.
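
Something along these lines. The Texts struct and assignByLength() are
hypothetical names, and putting the shorter text into summary is an
assumption based on your mapping above. Lengths are compared in bytes here;
with QString it would be characters.

```cpp
#include <string>

// Hypothetical pair of fields as they would end up in PodcastMetaCommon.
struct Texts {
    std::string description;
    std::string summary;
};

// The longer text becomes the description, the shorter one the summary.
// On equal length the RSS <description> is kept as description.
Texts assignByLength(const std::string &rssDescription,
                     const std::string &itunesSummary)
{
    Texts t;
    if (itunesSummary.size() > rssDescription.size()) {
        t.description = itunesSummary;
        t.summary = rssDescription;
    } else {
        t.description = rssDescription;
        t.summary = itunesSummary;
    }
    return t;
}
```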

>
> If itunes:summary and body are there:
> subtitle=description, summary=itunes:summary, description=body
>
> In Atom there is no guessing:
> subtitle=subtitle, summary=summary, description=content
>
> However, subtitle and summary seem not to be used anywhere yet, or did I
> overlook something?

Let's consider how and where we'll use them in the future then.
Subtitle: I would like to have it always visible, slightly desaturated
(grey), underneath the episode title in the podcast browser. Makes
sense, doesn't it ;)

Summary: this is just an alternative to description in my mind. Apple
says so as well, see [1].
It's not supposed to contain HTML, so in case anything is wrong with
the HtmlDescription we save in the description field, we can fall back
to this.

>
> You see that Atom seems to be an awesome format that already thinks about a lot
> of cases that aren't covered by RSS 1.0/2.0. However, one thing it's missing is
> some kind of <description> for *feeds*. The summary and content elements are for
> episodes only, the feed element only has a subtitle child, so I guess users will
> likely use a provided RSS feed instead (where we have to guess the content type
> of the <description>).

In my experience atom feeds are not available for podcasts unless
auto-generated by a CMS. Since iTunes doesn't support atom it can be
considered irrelevant for podcasting. We have had users request
support though and sometimes the atom feed is just easier to find.
The only reason not to support it is no developer interest. But you
fixed that :) As long as the code is not causing bugs we can't fix, or
you stick around, there will be Atom support in Amarok.

>
> But I have yet to find anything to put in the keywords field of
> PodcastMetaCommon. Maybe <category>?

Some feed authoring tools (or CMS's) have a special keywords field.
But I think the itunes:category is indeed very suited to be added to
the list regardless of these special tags.

Bart

[1] http://www.apple.com/itunes/podcasts/specs.html#summary