Review Request 115547: Bug fix : Allows further support of special character in project name

Andreas Pakulat apaku at gmx.de
Sun Feb 9 23:07:07 UTC 2014



> On Feb. 8, 2014, 12:16 a.m., Sven Brauch wrote:
> > Besides the fact that your implementation is ... curious and very hard to understand, I do not think this is the correct way to solve the problem, honestly. The whole idea of percent-encoding application the project name is imo flawed. Percent signs have their own share of problems in the various places where the project name is used.
> > 
> > Instead, I think we should just have a list of allowed characters (I'd in fact go with just 0-9, a-Z and _ and disallow starting with a number, then you can use it everywhere without fear of ending up with anything broken) and check against that on the UI side, aka disallow even creating projects with names not matching that by not allowing to confirm the dialog with an invalid project name.
> > 
> > If people really want projects with names which contain plus signs, they can rename the stuff in the template themselves in my opinion.
> 
> Michael Ferris wrote:
>     I most certainly see how this code is harder to understand, I was too focus on making performant code than readable. I can very easily change that.
>     However, before I make any changes, I would like to know what we actually expect from this.
>     I understand your feeling about the percent encoding and I thought it was curious myself.
>     What I could do, as you suggested, is simply enforce the use of letters digits and underscores which I think would be best.
>     
>     Also, from what I can see the actual location of the project and its name are currently linked together.
>     What could be done is to accept all characters for the project name, but having restrictions for the location.
>
> 
> Sven Brauch wrote:
>     Performant code is very important in many places, but not here. This function is called once per day ™ for a length-20-ish input, which will take something in the microsecond range no matter how you implement it. So the focus should clearly be on readability instead of performance ;)
>     If there's a choice between two equally readable options of which one is faster, the faster one is of course preferrable, but it is not a good idea to sacrifice readability for irrelevant performance gain.
>     
>     In my opinion the fix for this issue would be to remove the whole percent encoding stuff, and go to the UI dialog itself where you enter the name, and just disable the "Next" button (and display a message in a label, the label is already there even, it displays "Empty project name" in the beginning) when the text field contains invalid characters.
>     No clever performance manevuer is required there either, e.g. using QRegExp with \w* sounds like a good plan ;)
>
> 
> Andreas Pakulat wrote:
>     I've been out of touch with this code for quite some time, but I assume the need for percent-encoding as well as allowing 'more characters' is that people can use arbitrary strings in the project name. If thats correct, I disagree with disallowing using any character for the projects name. Its a user-defined name for his project, there should not be arbitrary limitations on what he can use. Especially not for purely technical reasons that the application can fix itself.
>     
>     I suggest to solve this without sacrificing readability of the filesystem directory name if the project name contains something outside the allowed character range, in particular I'd do a simple 's/[^a-zA-Z0-9_- .]/_/g'. That is replace anything which is neither a letter, number, space or simple punctuation with an underscore. And show this to the user before the project gets created so he is aware. Maybe the 'letter' and 'number' checks shouldn't be done via a regex but rather through QChar::isLetter&Co to allow non-ascii letters for people living in non-english locales. There's no reason to disallow german umlauts or french accent-characters in the filesystem these days afaik.
> 
> Sven Brauch wrote:
>     Look at for example the Qt -> Terminal -> C++ Template. It uses the project name ("Foo") I enter like this:
>     
>     #ifndef foo_H
>     #define foo_H
>     
>     #include <QtCore/QObject>
>     
>     class foo : public QObject
>     {
>     Q_OBJECT
>     public:
>         foo();
>         virtual ~foo();
>     };
>     
>     #endif // foo_H
>     
>     There's no way to make that work as expected when you allow non-[a-Z0-9_] characters in the project name.
>     And entering umlauts here gives you broken (uncompilable) template project right now (it gets it right in the cpp file but uses variable names with umlauts in cmake). Entering spaces crashes KDevelop :D
>     When you fix the invalid uses of weird characters in code, then the next thing which explodes at you is moc:
>     [ 20%] Generating ????????.moc
>     moc: Cannot open options file specified with @
>     (let's see if reviewboard can handle this character)
>     Forging all this in a way such that it works for arbitrary project names is just too much hassle to be worth the feature in my opinion.
>     
>     So, to allow what you suggest, we'd either have to "fix" all templates to not use the project name at all (loss of functionality), provide an encoded version of the project name to the templates (potentially unexpected behaviour as described in the bug) or let the user enter two names, one technical and one non-technical (bad UI in my opinion).
>     Replacing characters only for the non-filesystem parts doesn't work as easily, since especially in C++ templates, code will refer to filesystem names eventually.
>     
>     Thus, the only solution which fixes the problem once and for all without weird side effects and acceptable loss of "functionality" (functionality which has not ever been working, mind you) is just checking the name when the user enters it and rejecting names which are potentially invalid right from the start.
>     
>     Greetings,
>     Sven
> 
> Milian Wolff wrote:
>     If easily possible, we can also add a transliteration run which converts "ä" into "ae" - but only if theres some existing Qt api for that.
>     
>     Otherwise, lets keep it simple and just forbid special chars.
> 
> Andreas Pakulat wrote:
>     I don't see the problem with that either. If you want to use the project name as base for language identifiers it has to be 'fixed' first as well. The replacement needed there would be different of course as there are much less characters allowed as identifier than in a filesystem. Some languages (iirc including C++11) do allow non-ascii letters as identifiers though. So this should ideally be solved on the language level, but even a simple replacement should do the job (maybe including the removal of duplicated '_').
>     
>     The templates already have special variables for replacement that hint that a given string should be a valid identifier based on the application name, so whats the probem with a simple replace before assigning the value?
> 
> Sven Brauch wrote:
>     The problem I see is that e.g. äää.h is a valid file name, but #include "äää.h" possibly does not work (or did I miss anything and C++ now has a well-defined encoding for their source files ;). Same goes for cmake. Of course it would be possible to work around this by introducing several variables, but in my opinion the odds (and harm done if it happens) that template authors will screw this up and users will end up with unusable template results are quite big for a quite minimal gain.
>     
>     I'm with Milian -- if we can easily do transliteration, let's do that, otherwise screw special characters.
> 
> Sven Brauch wrote:
>     Look, not even reviewboard managed to handle my special character properly (nor does e.g. bugzilla) -- that's how good real-world support for unciode is in applications today ;)

Did you fix up that comment? Because I don't see any error when viewing those umlauts in reviewboard.

And yes, you're right that encodings can be a problem in C++. So apparently filesystem names should also use ascii-only to avoid such traps (though frankly I doubt you'd hit this particular case in the real world very often, there are only very very few systems where the filesystem encoding is different from the users local 8 bit encoding - across the 3 major platforms).

How about disabling anything but ascii letters, numbers and a few select punctuation characters ([.-_] should be safe right?) but then allow users to change the projects name afterwards through the UI? Currently kdevelop does not have any way to do this and I've had to do this on quite some projects manually in the past, but I also know where to look for the name in the first place.


- Andreas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/115547/#review49232
-----------------------------------------------------------


On Feb. 8, 2014, 12:04 a.m., Michael Ferris wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/115547/
> -----------------------------------------------------------
> 
> (Updated Feb. 8, 2014, 12:04 a.m.)
> 
> 
> Review request for KDevelop.
> 
> 
> Bugs: 282783
>     http://bugs.kde.org/show_bug.cgi?id=282783
> 
> 
> Repository: kdevplatform
> 
> 
> Description
> -------
> 
> Allows the location of the project to have more character then only digits, letters, spaces and %. Still checks for unsupported character and insert the percent encoded value.
> 
> 
> Diffs
> -----
> 
>   plugins/appwizard/projectselectionpage.cpp 6270869 
> 
> Diff: https://git.reviewboard.kde.org/r/115547/diff/
> 
> 
> Testing
> -------
> 
> Create a new project with test:<>*?/\\|!@#$%?&%2A()_+ and the location was test%3A%3C%3E%2A%3F%2F%5C%5C%7C!@#$%^&%2A()_+
> Here we can see all unsupported character have been percent encoded and the rest stayed the same.
> 
> 
> Thanks,
> 
> Michael Ferris
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kdevelop-devel/attachments/20140209/aa521139/attachment-0001.html>


More information about the KDevelop-devel mailing list