kde forum data
gleban at gmail.com
Fri Aug 24 19:14:50 UTC 2012
Thank you all for helping with this.
The way I tried to access all the posts was by repeatedly requesting
an RSS feed, each time with a different start offset.
I start with
After processing this feed I increase start to 100, then 200, etc. In
this way I should, in theory, be able to get the whole history from 2004
to now. When I was trying to download the posts yesterday I noticed that
phpBB limits the start parameter: I can only go up to start =
20,000. After that I get an error page. Since this is an issue only for
offline mode (when we need to import past data), I solved the problem
by passing the t parameter to search.php. With this parameter I can
get an RSS feed of all posts in a particular topic. Topic ids currently go
from 0 to 100,000, so I just needed to make 100,000 URL calls :) It's done
now; I've downloaded the whole history and won't need to do it again. Since
I searched by topic ids I did get the whole history (even the posts with id
Thanks again for looking into this.
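For anyone repeating this import, here is a minimal sketch of the two crawling strategies described above: paging the feed with start in steps of 100 up to the 20,000 cap, and falling back to one request per topic id via t. Only the start and t parameters, the step of 100 and the 20,000 limit come from the message. BASE_URL, the helper names and the injected fetch function are hypothetical, and the real feed URL probably carried more query parameters than shown here.

```python
from typing import Callable, Iterator, List, Optional
from urllib.parse import urlencode

# Hypothetical endpoint: the real feed URL is not preserved in this archive.
BASE_URL = "https://forum.example.org/search.php"

PAGE_SIZE = 100     # offset step described above (0, 100, 200, ...)
MAX_START = 20_000  # highest start value phpBB accepted, per the message

# Hypothetical fetcher: takes a URL, returns the feed body or None on an error page.
Fetcher = Callable[[str], Optional[str]]


def offset_urls(max_start: int = MAX_START, page_size: int = PAGE_SIZE) -> Iterator[str]:
    """Yield feed URLs for start = 0, page_size, 2*page_size, ... up to max_start."""
    for start in range(0, max_start + 1, page_size):
        yield f"{BASE_URL}?{urlencode({'start': start})}"


def topic_urls(max_topic_id: int) -> Iterator[str]:
    """Yield one feed URL per topic id, using t instead of start."""
    for topic_id in range(max_topic_id + 1):
        yield f"{BASE_URL}?{urlencode({'t': topic_id})}"


def crawl(urls: Iterator[str], fetch: Fetcher) -> List[str]:
    """Fetch every URL in turn, keeping non-empty bodies and skipping failures."""
    feeds: List[str] = []
    for url in urls:
        body = fetch(url)
        if body:
            feeds.append(body)
    return feeds
```

Paging by offset needs only 201 requests but stops at the cap. Enumerating topics costs one request per topic id (about 100,000 here), which is acceptable for a one-off import but not for live polling, where the offset feed remains the natural fit.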
On Fri, Aug 24, 2012 at 4:11 PM, Stuart Jarvis <jarvis at kde.org> wrote:
> On Friday 24 Aug 2012 22:41:05 Ben Cooksley wrote:
> > On Fri, Aug 24, 2012 at 2:51 AM, Stuart Jarvis <jarvis at kde.org> wrote:
> > > Hi everyone,
> > Hi Stuart,
> > > I guess kde-www is the right list to ask this. Please see the query
> > > from one of our partners in the ALERT project*
> > >
> > > Any ideas why the RSS would be limited to post 89447?
> > Can't think of any particular reason off the top of my head - there is
> > certainly no deliberate constraint on getting RSS feeds of older
> > material.
> > However I can't say it was specifically designed to return older content.
> Thanks for getting back to me. It is an unusual use case. The idea is to
> provide a non-invasive way for the ALERT system to collect the archives of
> project with the added benefit that the same parser can be used for live
> > It is likely they are striking a limit on the number of search results
> > returned (as the RSS feed is powered by our Sphinx search backend).
> > Could we have some details on how they are conducting the RSS feed
> > retrieval so I can debug why this is happening?
> Some more details from Gregor Leban (copied in):
> Yes, it is strange.
> Here, for example, is the feed that I have in ascending time order for
> the whole history:
> As you can see, the oldest post is from Fri, 21 May 2004 03:02:50 GMT and
> this url:
> and the p argument in the URL is the post id. Do you know when KDE started
> using forums: in 2004 or earlier?
> So in this case, the number of results is already limited to 100. Could
> be an issue with changes to forum software in the past (my memory on the
> history is a bit hazy)
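Since each post is identified by the p argument of its link, one quick way to see how far back a given feed reaches (for example, whether it stops at post 89447) is to read the smallest p value among its items. This is a sketch assuming an RSS 2.0 layout (rss/channel/item/link) and phpBB-style viewtopic.php links; the host and the sample links are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


def post_id(link: str) -> Optional[int]:
    """Return the integer value of the p query argument, or None if absent."""
    values = parse_qs(urlparse(link).query).get("p")
    return int(values[0]) if values else None


def oldest_post_id(rss_xml: str) -> Optional[int]:
    """Smallest post id among the item links of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    # Only item links: the channel-level <link> points at the forum itself.
    ids: List[int] = []
    for link in root.findall("./channel/item/link"):
        pid = post_id(link.text or "")
        if pid is not None:
            ids.append(pid)
    return min(ids) if ids else None
```

Running this over each downloaded feed and comparing the results would show whether the cutoff comes from the feed itself or from older posts simply not being in the search index.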
More information about the kde-www