robots.txt in quickgit.kde.org

Kevin Funk kfunk at kde.org
Tue Jan 5 14:17:12 UTC 2016


On Wednesday, December 30, 2015 12:57:23 PM Ben Cooksley wrote:
> On Tue, Dec 29, 2015 at 11:16 PM, Kevin Funk <kfunk at kde.org> wrote:
> > On Tuesday, December 29, 2015 10:39:01 PM Ben Cooksley wrote:
> >> On Tue, Dec 29, 2015 at 7:59 AM, Lydia Pintscher <lydia at kde.org> wrote:
> >> > On Sun, Dec 27, 2015 at 12:35 PM, Ben Cooksley <bcooksley at kde.org> wrote:
> >> >>> Is there some place where search engines can easily index our source
> >> >>> code or are we shooting ourselves in the foot here?
> >> >> 
> >> >> We could probably make it available by publishing the source trees
> >> >> used by LXR / EBN.
> >> >> This would only have the main branches obviously rather than
> >> >> everything
> >> >> though.
> >> >> 
> >> >> I haven't checked, but LXR may already make its copy of the code
> >> >> accessible...
> >> > 
> >> > I think making our sourcecode available to search engines is pretty
> >> > important for the reasons already mentioned by others. Do you need
> >> > help for it? If you write down what's needed I can help find someone
> >> > to do it.
> >> 
> >> I've now provisioned https://sources.kde.org/
> > 
> > I'm not sure this is super useful, to be honest (as mentioned in
> > #kde-sysadmins already).
> > 
> > This is really just plain file serving, with no cross-references to either
> > LXR (or apidocs). This is basically a dead-end when you follow a result
> > on Google.
> > 
> > Wouldn't it be possible to let robots index https://lxr.kde.org/source/
> > instead? We have the infrastructure...
> 
> We'll give it a shot.

Just to stress again this would be *really* useful to have.

I answered a post on SO:
  http://stackoverflow.com/a/34612692/592636

I tried to link kwallet's FindGpgme.cmake into the answer, and there's *no*
easy way to quickly get a link to KDE infrastructure serving the file via
Google (not even api.kde.org).

Try googling for "kwallet findgpgme.cmake" (very specific search after all):
  https://www.google.de/search?q=kwallet+findgpgme.cmake

-> First result: GitHub; the rest are only mildly interesting.


Different issue I just noticed: There's no way to get the plain-text (raw) 
representation of a given file on LXR, is there? Would be useful as well.

Cheers,
Kevin

> > Of course we need to block robots from all the pages that allow actively
> > *searching* LXR, in order to avoid abuse.
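
A rough sketch of what such a policy could look like, checked with Python's
standard urllib.robotparser. The Disallow paths (/search, /ident) are
assumptions made for illustration, not a verified description of the URL
layout on lxr.kde.org; only https://lxr.kde.org/source/ comes from this
thread. It also only describes what well-behaved crawlers would do, so
server-side rules would still be needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an LXR host. The Disallow paths are
# assumptions for illustration, not the real lxr.kde.org layout.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /ident
Allow: /source/
"""


def build_parser(text):
    """Parse robots.txt content from a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, url, agent="Googlebot"):
    """Return True if a compliant crawler may fetch the given URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for url in (
        "https://lxr.kde.org/source/",
        "https://lxr.kde.org/search?_string=foo",
    ):
        print(url, "->", "allowed" if is_allowed(parser, url) else "blocked")
```

Running it lists which of the two URLs a compliant crawler would be allowed
to fetch under this policy.
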
> 
> Note that despite robots.txt, many spiders (including Google, Yahoo
> and Bing) will actively disregard the instructions in there.
> While they may not return the results - or omit snippets of the page
> content - they have all been guilty (at least in the past) of
> disregarding our restrictions, resulting in downtime (which has in
> some cases necessitated full host reboots to fix) for numerous KDE.org
> subsites.
> 
> This is why QuickGit and WebSVN have extremely restrictive robots.txt
> policies, in addition to blacklist rules within our web server
> configurations.
> 
> > Cheers,
> > Kevin
> 
> Regards,
> Ben
> 
> >> > Cheers
> >> > Lydia
> >> 
> >> Regards,
> >> Ben
> >> 
> >> > --
> >> > Lydia Pintscher - http://about.me/lydia.pintscher
> >> > KDE e.V. Board of Directors / KDE Community Working Group
> >> > http://kde.org - http://open-advice.org
> >> > 
> >> >>> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to
> >> >>> unsubscribe
> > 
> > --
> > Kevin Funk | kfunk at kde.org | http://kfunk.org

-- 
Kevin Funk | kfunk at kde.org | http://kfunk.org


More information about the Plasma-devel mailing list