robots.txt in quickgit.kde.org

Kevin Funk kfunk at kde.org
Tue Jan 5 22:05:04 UTC 2016


On Wednesday, January 06, 2016 10:30:52 AM Ben Cooksley wrote:
> On Wed, Jan 6, 2016 at 3:17 AM, Kevin Funk <kfunk at kde.org> wrote:
> > On Wednesday, December 30, 2015 12:57:23 PM Ben Cooksley wrote:
> >> On Tue, Dec 29, 2015 at 11:16 PM, Kevin Funk <kfunk at kde.org> wrote:
> >> > On Tuesday, December 29, 2015 10:39:01 PM Ben Cooksley wrote:
> >> >> On Tue, Dec 29, 2015 at 7:59 AM, Lydia Pintscher <lydia at kde.org> wrote:
> >> >> > On Sun, Dec 27, 2015 at 12:35 PM, Ben Cooksley <bcooksley at kde.org> wrote:
> >> >> >>> Is there some place where search engines can easily index our
> >> >> >>> source
> >> >> >>> code or are we shooting ourselves in the foot here?
> >> >> >> 
> >> >> >> We could probably make it available by publishing the source trees
> >> >> >> used by LXR / EBN.
> >> >> >> This would only have the main branches obviously rather than
> >> >> >> everything
> >> >> >> though.
> >> >> >> 
> >> >> >> I haven't checked, but LXR may already make its copy of the code
> >> >> >> accessible...
> >> >> > 
> >> >> > I think making our sourcecode available to search engines is pretty
> >> >> > important for the reasons already mentioned by others. Do you need
> >> >> > help for it? If you write down what's needed I can help find someone
> >> >> > to do it.
> >> >> 
> >> >> I've now provisioned https://sources.kde.org/
> >> > 
> >> > I'm not sure this is super useful, to be honest (as mentioned in
> >> > #kde-sysadmins already).
> >> > 
> >> > This is really just plain file serving, with no cross-references to
> >> > either LXR or apidocs. This is basically a dead end when you follow a
> >> > result on Google.
> >> > 
> >> > Wouldn't it be possible to let robots index https://lxr.kde.org/source/
> >> > instead? We have the infrastructure...
> >> 
> >> We'll give it a shot.
> > 
> > Just to stress again this would be *really* useful to have.
> 
> ????

Ah, I see robots.txt on lxr.kde.org allows crawling source/ now.

Thanks!
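
For anyone who wants to double-check such a rule set offline, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content below is an assumption for illustration only (the real file on lxr.kde.org may differ), and the /search, /ident and /diff/ paths are my guess at the LXR pages that trigger active searches.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for illustration; the real file may differ.
# /search, /ident and /diff/ are hypothetical names for the search-type pages.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Allow: /source/
Disallow: /search
Disallow: /ident
Disallow: /diff/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given (hypothetical) agent may fetch the URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(ASSUMED_ROBOTS_TXT)
source_ok = is_crawlable(parser, "https://lxr.kde.org/source/")
search_ok = is_crawlable(parser, "https://lxr.kde.org/search")
```

With these assumed rules, source_ok should come out true and search_ok false, which is the split we want: browsable source pages are indexed, search pages are not.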
 
> > I answered a post on SO:
> >   http://stackoverflow.com/a/34612692/592636
> > 
> > Tried to link kwallet's FindGpgme.cmake into the answer, and there's *no*
> > easy way to quickly get a link to KDE infrastructure serving the file via
> > Google (not even api.kde.org).
> > 
> > Try googling for "kwallet findgpgme.cmake" (very specific search after all):
> >   https://www.google.de/search?q=kwallet+findgpgme.cmake
> > 
> > -> First result: Github..., rest: mildly interesting
> > 
> > 
> > Different issue I just noticed: There's no way to get the plain-text (raw)
> > representation of a given file on LXR, is there? Would be useful as well.
> 
> There isn't a link in our templates, but my Google-fu (and subsequent
> tests) confirm that adding the parameter "_raw=1" to an LXR source view
> URL will return the file without any HTML around it.
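
A small sketch of how one might build such a raw link programmatically, using only urllib.parse from the standard library. The only thing taken from the thread is the "_raw=1" parameter; the helper name raw_url and the example path are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def raw_url(source_view_url: str) -> str:
    """Return the URL with "_raw=1" added, keeping any existing parameters."""
    parts = urlsplit(source_view_url)
    # Drop any existing "_raw" value so calling this twice does not duplicate it.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "_raw"
    ]
    query.append(("_raw", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical file path, for illustration only.
example = raw_url("https://lxr.kde.org/source/path/to/file.cmake")
```

Here example ends in "file.cmake?_raw=1"; a URL that already carries query parameters keeps them and gets "_raw=1" appended.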
> 
> > Cheers,
> > Kevin
> 
> Regards,
> Ben
> 
> >> > Of course we need to block robots from all the pages that allow
> >> > actively *searching* LXR, in order to avoid abuse.
> >> 
> >> Note that despite robots.txt, many spiders (including Google, Yahoo
> >> and Bing) will actively disregard the instructions in there.
> >> While they may not return the results - or omit snippets of the page
> >> content - they have all been guilty (at least in the past) of
> >> disregarding our restrictions, resulting in downtime (which has in
> >> some cases necessitated full host reboots to fix) for numerous KDE.org
> >> subsites.
> >> 
> >> This is why QuickGit and WebSVN have extremely restrictive robots.txt
> >> policies, in addition to blacklist rules within our web server
> >> configurations.
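
Since robots.txt is only advisory, the enforcement has to happen on the server side. The KDE setup does this in the web server configuration, which is not shown in this thread; the following is just a rough Python sketch of the same idea as a WSGI middleware that rejects requests by User-Agent. The blocked substrings are made-up placeholders, not real crawler names from our configs.

```python
from typing import Callable, Iterable

# Hypothetical User-Agent substrings; placeholders, not an actual blacklist.
BLOCKED_AGENT_SUBSTRINGS = ("examplebot", "badspider")


def block_crawlers(
    app: Callable, blocked: Iterable[str] = BLOCKED_AGENT_SUBSTRINGS
) -> Callable:
    """Wrap a WSGI app so that requests from listed crawlers get a 403."""
    needles = tuple(item.lower() for item in blocked)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(needle in agent for needle in needles):
            body = b"Crawling of this host is not permitted.\n"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("403 Forbidden", headers)
            return [body]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Matching on User-Agent only stops crawlers that identify themselves honestly, so this complements rather than replaces a restrictive robots.txt.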
> >> 
> >> > Cheers,
> >> > Kevin
> >> 
> >> Regards,
> >> Ben
> >> 
> >> >> > Cheers
> >> >> > Lydia
> >> >> 
> >> >> Regards,
> >> >> Ben
> >> >> 
> >> >> > --
> >> >> > Lydia Pintscher - http://about.me/lydia.pintscher
> >> >> > KDE e.V. Board of Directors / KDE Community Working Group
> >> >> > http://kde.org - http://open-advice.org
> >> >> > 
> >> >> >>> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to
> >> >> >>> unsubscribe <<>>

-- 
Kevin Funk | kfunk at kde.org | http://kfunk.org

More information about the Plasma-devel mailing list