D19109: [Extractor] Add metadata to extractors

Tue Feb 19 22:16:37 GMT 2019

bruns added a comment.

  In D19109#415710 <https://phabricator.kde.org/D19109#415710>, @astippich wrote:

  > In D19109#414968 <https://phabricator.kde.org/D19109#414968>, @bruns wrote:
  >
  > > In D19109#414758 <https://phabricator.kde.org/D19109#414758>, @astippich wrote:
  > >
  > > > A few general remarks:
  > > >
  > > > - I really do not like that there are two lists of supported mimetypes now which have to be kept in sync
  > >
  > >
  > > I think this is trivial enough. Also this is covered by the unit test.
  >
  >
  > My fear is that it is easily forgotten, but I did not see the autotest. Still, do you think it is feasible to generate the mimetype stringlist from the JSON data to remove the duplication?

  These are not complete duplicates: e.g. the officeextractor (pre-2007) uses runtime detection of some binary helpers. If these are not found, the list returned by the plugin is empty. The plugin has no direct access to its metadata, as the metadata is only available from the loader and there is no way to pass it back, so the plugin cannot default to it.
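
  A minimal sketch of why the two lists can differ, in Python for brevity rather than the framework's C++. The helper names, the mimetype mapping, and the function name are all hypothetical and are not the actual officeextractor code; the point is only that the runtime list is a subset of the static one and may be empty.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Hypothetical static list, standing in for what the JSON metadata declares.
STATIC_METADATA_MIMETYPES: List[str] = [
    "application/msword",
    "application/vnd.ms-excel",
]

# Hypothetical mapping from helper binary to the mimetypes it enables.
HELPER_MIMETYPES: Dict[str, List[str]] = {
    "helper-doc": ["application/msword"],
    "helper-xls": ["application/vnd.ms-excel"],
}


def runtime_mimetypes(
    find_helper: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return only the mimetypes whose helper binary is found at runtime."""
    supported: List[str] = []
    for helper, mimetypes in HELPER_MIMETYPES.items():
        # shutil.which returns None when the binary is not on PATH.
        if find_helper(helper) is not None:
            supported.extend(mimetypes)
    return supported
```

  With no helpers installed the runtime list is empty even though the static list is not, so a test can only require the runtime list to be a subset of the metadata, not equal to it.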

  >>> - Do we really need versioning per mimetype? IMHO it is sufficient to have a version number per extractor. From my experience, fixing an extractor usually impacts all its supported mimetypes, and rarely affects only one mimetype.
  >> 
  >> Past experience says otherwise. There have been feature extensions and bugfixes for specific mimetypes, just look at your own commits:
  >> 
  >> - "fix ape disc number extraction"
  >> - "implement more tags for asf metadata"
  >> - ...
  >> 
  >>   I want to reduce reindexing as much as possible.
  > 
  > And I can give you examples where this was not the case :).

  ... which does not **prohibit** bumping the version for **all** affected encoders. Also, nothing prevents skipping versions: e.g. if "foo/bar" is at 2.1 and "foo/baz" is at 1.3, and both get a major bump, both can be set to 3.0.
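
  The "foo/bar" / "foo/baz" case can be sketched as follows, in Python for brevity. The function names and the (major, minor) tuple representation are assumptions made for illustration, not the format used by the patch, and treating any version increase as a reason to reindex is likewise an assumption.

```python
from typing import Dict, Iterable, Tuple

Version = Tuple[int, int]  # (major, minor); hypothetical representation


def bump_major_together(
    versions: Dict[str, Version], affected: Iterable[str]
) -> Dict[str, Version]:
    """Give all affected mimetypes the same next major version.

    Mimetypes that were behind simply skip the versions in between.
    """
    affected = list(affected)
    next_major = max(versions[m][0] for m in affected) + 1
    updated = dict(versions)  # leave the input untouched
    for mimetype in affected:
        updated[mimetype] = (next_major, 0)
    return updated


def needs_reindex(indexed: Version, current: Version) -> bool:
    """Assumption: a file is re-extracted when its mimetype's version grew."""
    return indexed < current


versions = {"foo/bar": (2, 1), "foo/baz": (1, 3), "foo/qux": (1, 0)}
updated = bump_major_together(versions, ["foo/bar", "foo/baz"])
```

  Here "foo/bar" and "foo/baz" both end up at (3, 0), while the untouched "foo/qux" keeps (1, 0) and therefore causes no reindexing.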

  This is also only the case because TagLibExtractor was stupidly written (which D18826 <https://phabricator.kde.org/D18826> fixes). The other extractors do not have that many special code paths.

  > Well, I find it cumbersome to implement this fine-grained control, but otherwise people will probably yell because of high cpu usage...
  >  At least, I would like to group duplicated mimetypes such as audio/wav and audio/x-wav, but that is not possible with JSON, is it?

  You can reorder the entries so that aliasing mimetypes sit next to each other.

  Another question is, why do we have "audio/wav" and "audio/x-wav" in the first place? Are there really files where one type is reported for some files, and the other for others? Wouldn't it be better to just have the canonical type? At least on my computer, shared-mime-info only has audio/x-wav, listing audio/wav and audio/vnd.wave as aliases. Aliases should never be returned by QMimeDatabase.
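
  A small sketch of reducing a mimetype list to canonical names, in Python for brevity. The alias table is hand-written from the shared-mime-info entries mentioned above and may differ on other systems; the function names are hypothetical, and this is not how QMimeDatabase resolves aliases internally.

```python
from typing import Dict, Iterable, List

# Alias -> canonical name, as reported above for one machine (assumption:
# other shared-mime-info versions may list different aliases).
ALIASES: Dict[str, str] = {
    "audio/wav": "audio/x-wav",
    "audio/vnd.wave": "audio/x-wav",
}


def canonical(mimetype: str) -> str:
    """Map an alias to its canonical name; canonical names pass through."""
    return ALIASES.get(mimetype, mimetype)


def canonical_list(mimetypes: Iterable[str]) -> List[str]:
    """Canonicalize and drop duplicates while keeping first-seen order."""
    seen = set()
    result: List[str] = []
    for mimetype in mimetypes:
        name = canonical(mimetype)
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result
```

  Applied to a list containing audio/wav, audio/x-wav and audio/vnd.wave, only audio/x-wav remains, which is the single entry the extractor would need if only canonical types are ever reported.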

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D19109

To: bruns, #baloo, #frameworks, ngraham, astippich, poboiko
Cc: kde-frameworks-devel, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams