Review Request: Amarok can incorrectly scan non-ascii tags
Commit Hook
null at kde.org
Sat Mar 17 21:42:20 UTC 2012
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/104310/#review11510
-----------------------------------------------------------
This review has been submitted with commit 711b585e6284e1346c8661d60e6bf04c0223fd8c by Sam Lade to branch master.
- Commit Hook
On March 17, 2012, 9 a.m., Alexey Neyman wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/104310/
> -----------------------------------------------------------
>
> (Updated March 17, 2012, 9 a.m.)
>
>
> Review request for Amarok.
>
>
> Description
> -------
>
> Amarok incorrectly scans files with non-ascii characters in tags. The symptom is that
> some of the files have two "invalid UTF character" symbols instead of a single non-ascii
> character (looks like <?><?>, question mark inside a black circle). Most visible effect
> of this issue is that some albums end up in Various Artists because one of the tracks
> had artist name corrupted in this way. It is not limited to artist name, though -
> there are tracks with corrupted album names or titles.
>
> The reason for this issue is as follows. When Amarok invokes collection scanner
> process, it receives the results from the amarokcollectionscanner over a pipe. Here is
> a snippet of code from src/core-impl/collections/db/ScanManager.cpp:
>
> void
> ScannerJob::getScannerOutput()
> {
> m_incompleteTagBuffer += m_scanner->readAll();
> }
>
> The m_incompleteTagBuffer is declared in src/core-impl/collections/db/ScanManager.h:
>
> QString m_incompleteTagBuffer
>
> However, m_scanner->readAll() returns QByteArray, not QString. This is okay for ASCII
> characters (which are 1 byte in UTF8), but breaks in case of multibyte sequences. If
> readAll() method returns a block which terminates in a middle of the multibyte sequence,
> conversion to QString in ScannerJob::getScannerOutput replaces the last character with
> "invalid UTF character" symbol. When the next block is read, it starts in the middle of
> UTF8 multibyte sequence - so it gets replaced with one more "invalid UTF character"
> symbol. Thus, a single multibyte UTF8 character is replaced with two "invalid character"
> symbols.
>
> The solution implemented by the attached patch is to store incomplete information as
> QByteArray and search for partial ("</directory>") or full ("</scanner>") elements in the
> byte stream, before conversion to QString. Complete blocks can be safely converted to
> QString, as the multibyte characters are inside the XML tags.
>
>
> Diffs
> -----
>
> src/core-impl/collections/db/ScanManager.h 5f0d153
> src/core-impl/collections/db/ScanManager.cpp 97d0b1c
>
> Diff: http://git.reviewboard.kde.org/r/104310/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Alexey Neyman
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/amarok-devel/attachments/20120317/82ebc283/attachment.html>
More information about the Amarok-devel
mailing list