Encoding of Generated Patches and Pastebin Plugin

Mon Dec 19 08:43:50 UTC 2011

On 18.12.11 22:53:20, David Narvaez wrote:
> Before I start, I'd like to say I hate encoding issues...
> 
> I was trying out the Pastebin plugin for the patches I sent recently
> and I (very) accidentally noticed it has encoding issues. I tracked
> the issue down to the following patch, which solves my particular test
> case:
> 
> -    QByteArray bytearray = "paste_code="+QUrl::toPercentEncoding(urlToData(source->file()), "/");
> +    QByteArray bytearray = "paste_code="+QUrl::toPercentEncoding(QString::fromUtf8(urlToData(source->file())), "/");
> 
> that assumes the source of the patch review is in UTF-8, but I guess
> we can't assure that, right?

Right.

> Does anybody know if there's any way to
> guarantee all files created for patch review are stored UTF-8 (or any
> other encoding as long as it is uniform)?

No, there is none. If you read files in from disk, you always need to
let the user decide what the encoding is, at least optionally. The code
can try to detect the encoding (there are classes in kdelibs for that),
but that can go wrong, and then the user needs to dictate the encoding.
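
To illustrate why detection is only a guess, here is a minimal sketch of
one common heuristic: check whether the bytes are well-formed UTF-8. This
is not the kdelibs code; the function name looksLikeUtf8 is made up for
this example, it uses only the standard library, and the check is
simplified (it does not reject overlong forms or surrogates).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `data` is structurally valid UTF-8.
// Simplified on purpose: overlong encodings and surrogates are not rejected.
bool looksLikeUtf8(const std::string &data)
{
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        std::size_t extra = 0;              // number of continuation bytes expected
        if (c < 0x80) {
            extra = 0;                      // plain ASCII
        } else if (c >= 0xC2 && c <= 0xDF) {
            extra = 1;                      // lead byte of a 2-byte sequence
        } else if (c >= 0xE0 && c <= 0xEF) {
            extra = 2;                      // lead byte of a 3-byte sequence
        } else if (c >= 0xF0 && c <= 0xF4) {
            extra = 3;                      // lead byte of a 4-byte sequence
        } else {
            return false;                   // stray continuation or invalid lead byte
        }
        if (extra > n - i - 1) {
            return false;                   // sequence is cut off at the end
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(data[i + k]);
            if ((cc & 0xC0) != 0x80) {
                return false;               // not a continuation byte (10xxxxxx)
            }
        }
        i += extra + 1;
    }
    return true;
}
```

Note the false positives: pure ASCII passes, and so does a Latin-1 file
that happens to contain the bytes 0xC3 0xA9 ("Ã©"), which is also valid
UTF-8 for "é". That ambiguity is why the user needs the final say.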

> Could that be something governed by user settings which we can query?

The user might be getting the data from elsewhere sometimes, so this
could be a default, but a per-file decision still needs to be possible.

However, since we can assume this is 'binary data', it's not necessary
to decode the file contents at all. Simply write a conversion function
which works on the binary data. That is, assume plain ASCII for the
content and encode based on that. Reading parts of
http://en.wikipedia.org/wiki/Percent-encoding I'd assume that's what
the browser does when you upload a file to pastebin directly.
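
A minimal sketch of such a conversion function, assuming the file content
is already available as a raw byte string. The name percentEncodeBytes
and its exclude parameter are made up for this example (the latter
mirrors the "/" argument in the patch); it uses only the standard
library. In Qt, QByteArray::toPercentEncoding() also works on raw bytes
and may be the more natural choice in the plugin.

```cpp
#include <string>

// Hypothetical helper: percent-encode raw bytes without ever interpreting
// them as text in some encoding. Bytes listed in `exclude` are passed through.
std::string percentEncodeBytes(const std::string &data,
                               const std::string &exclude = "")
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(data.size() * 3);           // worst case: every byte escaped
    for (unsigned char c : data) {
        // Unreserved characters (letters, digits, '-', '.', '_', '~')
        // are copied as-is.
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved ||
            exclude.find(static_cast<char>(c)) != std::string::npos) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];             // high nibble
            out += hex[c & 0x0F];           // low nibble
        }
    }
    return out;
}
```

Because every non-ASCII byte is escaped individually, a UTF-8 "é" goes
out as %C3%A9 and a Latin-1 "é" as %E9; the bytes reach the server
unchanged and no encoding guess is needed on our side.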

Andreas