robots.txt in quickgit.kde.org
Ben Cooksley
bcooksley at kde.org
Tue Dec 29 23:57:23 UTC 2015
On Tue, Dec 29, 2015 at 11:16 PM, Kevin Funk <kfunk at kde.org> wrote:
> On Tuesday, December 29, 2015 10:39:01 PM Ben Cooksley wrote:
>> On Tue, Dec 29, 2015 at 7:59 AM, Lydia Pintscher <lydia at kde.org> wrote:
>> > On Sun, Dec 27, 2015 at 12:35 PM, Ben Cooksley <bcooksley at kde.org> wrote:
>> >>> Is there some place where search engines can easily index our source
>> >>> code or are we shooting ourselves in the foot here?
>> >>
>> >> We could probably make it available by publishing the source trees
>> >> used by LXR / EBN.
>> >> This would only have the main branches obviously rather than everything
>> >> though.
>> >>
>> >> I haven't checked, but LXR may already make its copy of the code
>> >> accessible...
>> >
>> > I think making our sourcecode available to search engines is pretty
>> > important for the reasons already mentioned by others. Do you need
>> > help for it? If you write down what's needed I can help find someone
>> > to do it.
>>
>> I've now provisioned https://sources.kde.org/
>
> I'm not sure this is super useful, to be honest (as mentioned in
> #kde-sysadmins already).
>
> This is really just plain file serving, with no cross-references to either LXR
> (or apidocs). This is basically a dead-end when you follow a result on Google.
>
> Wouldn't it be possible to let robots index https://lxr.kde.org/source/
> instead? We have the infrastructure...
We'll give it a shot.
>
> Of course we need to block robots from all the pages that allow actively
> *searching* LXR, in order to avoid abuse.
Note that despite robots.txt, many spiders (including Google, Yahoo
and Bing) will actively disregard the instructions in there.
While they may not return the results - or may omit snippets of the page
content - they have all been guilty (at least in the past) of
disregarding our restrictions, resulting in downtime (which has in
some cases necessitated full host reboots to fix) for numerous KDE.org
subsites.
This is why QuickGit and WebSVN have extremely restrictive robots.txt
policies, in addition to blacklist rules within our web server
configurations.
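
As a rough illustration of the kind of policy discussed above (browsable
source pages open to robots, search pages closed), here is a minimal sketch
using Python's standard urllib.robotparser. The /search and /ident paths and
the "ExampleBot" user agent are assumptions made up for the example, not the
actual LXR or KDE configuration; only /source/ comes from the URL mentioned
in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /search and /ident paths are assumptions
# for illustration only; /source/ is the path mentioned in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /ident
Allow: /source/
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)


def allowed(url, agent="ExampleBot"):
    """Return True if a *compliant* crawler may fetch the given URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://lxr.kde.org/source/",
        "https://lxr.kde.org/search?q=foo",
    ):
        print(url, "->", "allowed" if allowed(url) else "blocked")
```

This only shows what a well-behaved crawler would conclude from the file.
robots.txt is advisory, so it does nothing against spiders that ignore it,
which is why server-side blacklist rules are still needed alongside it.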
>
> Cheers,
> Kevin
Regards,
Ben
>
>> > Cheers
>> > Lydia
>>
>> Regards,
>> Ben
>>
>> > --
>> > Lydia Pintscher - http://about.me/lydia.pintscher
>> > KDE e.V. Board of Directors / KDE Community Working Group
>> > http://kde.org - http://open-advice.org
>> >
>> >>> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to
>> >>> unsubscribe
>
> --
> Kevin Funk | kfunk at kde.org | http://kfunk.org
More information about the Plasma-devel mailing list