robots.txt in quickgit.kde.org

Ben Cooksley bcooksley at kde.org
Tue Jan 5 21:30:52 UTC 2016


On Wed, Jan 6, 2016 at 3:17 AM, Kevin Funk <kfunk at kde.org> wrote:
> On Wednesday, December 30, 2015 12:57:23 PM Ben Cooksley wrote:
>> On Tue, Dec 29, 2015 at 11:16 PM, Kevin Funk <kfunk at kde.org> wrote:
>> > On Tuesday, December 29, 2015 10:39:01 PM Ben Cooksley wrote:
>> >> On Tue, Dec 29, 2015 at 7:59 AM, Lydia Pintscher <lydia at kde.org> wrote:
>> >> > On Sun, Dec 27, 2015 at 12:35 PM, Ben Cooksley <bcooksley at kde.org> wrote:
>> >> >>> Is there some place where search engines can easily index our source
>> >> >>> code or are we shooting ourselves in the foot here?
>> >> >>
>> >> >> We could probably make it available by publishing the source trees
>> >> >> used by LXR / EBN.
>> >> >> This would obviously only have the main branches rather than
>> >> >> everything, though.
>> >> >>
>> >> >> I haven't checked, but LXR may already make its copy of the code
>> >> >> accessible...
>> >> >
>> >> > I think making our source code available to search engines is pretty
>> >> > important for the reasons already mentioned by others. Do you need
>> >> > help with it? If you write down what's needed I can help find someone
>> >> > to do it.
>> >>
>> >> I've now provisioned https://sources.kde.org/
>> >
>> > I'm not sure this is super useful, to be honest (as mentioned in
>> > #kde-sysadmins already).
>> >
>> > This is really just plain file serving, with no cross-references to either
>> > LXR or apidocs. This is basically a dead end when you follow a result
>> > on Google.
>> >
>> > Wouldn't it be possible to let robots index https://lxr.kde.org/source/
>> > instead? We have the infrastructure...
>>
>> We'll give it a shot.
>
> Just to stress again this would be *really* useful to have.

????

>
> I answered a post on SO:
>   http://stackoverflow.com/a/34612692/592636
>
> Tried to link kwallet's FindGpgme.cmake into the answer; and there's *no*
> easy way to quickly get a link to KDE infrastructure serving the file via
> Google (not even api.kde.org).
>
> Try googling for "kwallet findgpgme.cmake" (very specific search after all):
>   https://www.google.de/search?q=kwallet+findgpgme.cmake
>
> -> First result: GitHub..., rest: mildly interesting
>
>
> Different issue I just noticed: There's no way to get the plain-text (raw)
> representation of a given file on LXR, is there? Would be useful as well.

There isn't a link in our templates, but my Google-fu (and subsequent
tests) confirm that adding the parameter "_raw=1" to an LXR source view
URL will return the file without any HTML around it.
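
As a minimal sketch of the above, assuming an LXR source view URL of the
usual form: the helper name lxr_raw_url and the file path in the example
are made up for illustration, and only the "_raw=1" parameter comes from
the tests mentioned here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def lxr_raw_url(source_url: str) -> str:
    """Return source_url with "_raw=1" added to its query string.

    Hypothetical helper: any existing "_raw" value is replaced, and all
    other query parameters are kept in their original order.
    """
    parts = urlsplit(source_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "_raw"
    ]
    query.append(("_raw", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# The path below is a placeholder, not a verified location on lxr.kde.org.
print(lxr_raw_url("https://lxr.kde.org/source/some/module/FindExample.cmake"))
```

Going through urlsplit/urlencode rather than appending "?_raw=1" by hand
keeps the result valid when the URL already carries other parameters.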

>
> Cheers,
> Kevin

Regards,
Ben

>
>> > Of course we need to block robots from all the pages that allow
>> > actively *searching* LXR, in order to avoid abuse.
>>
>> Note that despite robots.txt, many spiders (including Google, Yahoo
>> and Bing) will actively disregard the instructions in there.
>> While they may not return the results - or may omit snippets of the page
>> content - they have all been guilty (at least in the past) of
>> disregarding our restrictions, resulting in downtime (which has in
>> some cases necessitated full host reboots to fix) for numerous KDE.org
>> subsites.
>>
>> This is why QuickGit and WebSVN have extremely restrictive robots.txt
>> policies, in addition to blacklist rules within our web server
>> configurations.
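
For illustration only, here is a sketch of the kind of policy discussed
above: source pages open to robots, search pages closed. The robots.txt
content and the /search and /ident paths are assumptions, not the actual
LXR URL layout or KDE's configuration. The check uses Python's standard
urllib.robotparser, which only tells you what a well-behaved crawler
would do; as noted, it does not stop spiders that ignore the file, so
server-side rules are still needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: the /search and /ident paths are assumed names for
# search-style pages and are not taken from the real lxr.kde.org setup.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /ident
Allow: /source/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(HYPOTHETICAL_ROBOTS_TXT)

# "ExampleBot" is a placeholder user agent; the paths are made-up samples.
for path in ("/source/some/file.cmake", "/search?_string=foo", "/ident?_i=bar"):
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "blocked"
    print(path, verdict)
```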
>>
>> > Cheers,
>> > Kevin
>>
>> Regards,
>> Ben
>>
>> >> > Cheers
>> >> > Lydia
>> >>
>> >> Regards,
>> >> Ben
>> >>
>> >> > --
>> >> > Lydia Pintscher - http://about.me/lydia.pintscher
>> >> > KDE e.V. Board of Directors / KDE Community Working Group
>> >> > http://kde.org - http://open-advice.org
>> >> >
>> >
>> > --
>> > Kevin Funk | kfunk at kde.org | http://kfunk.org
>
> --
> Kevin Funk | kfunk at kde.org | http://kfunk.org


More information about the Plasma-devel mailing list