kde forum data
bcooksley at kde.org
Fri Aug 24 23:07:10 UTC 2012
On Sat, Aug 25, 2012 at 7:14 AM, Gregor Leban <gleban at gmail.com> wrote:
> thank you all for helping with this.
> I tried to access all the posts by repeatedly requesting an RSS feed, each
> time with a different start offset.
> I start with
> After processing this feed I increase the start to 100, then 200, etc. In
> this way, theoretically, I should be able to get the whole history from 2004
> to now. When I was trying to download the posts yesterday I noticed that
> phpBB has a limit on the start parameter. I can only go up to start =
> 20,000. After that I get an error page. Since this is an issue only for the
Okay, you probably ran into a limit on the number of results.
> offline mode (when we need to import the past data) I solved the problem
> by passing the parameter t to search.php. Using this parameter I can
> get an RSS feed of all posts in a particular topic. Topic ids currently go
> from 0 to 100,000 so I just needed to make 100,000 url calls :) It's done now
> - I've downloaded the whole history and I won't need to do it again. Since I
> searched by topic ids I did get the whole history (even the posts with id <
Note, however, that the forum currently has 107490 topics (topic 107490
being the newest), so a crawl that stopped at topic id 100,000 would have
missed the most recent topics.
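
For reference, the two retrieval strategies discussed in this thread can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the script that was actually used: the base URL is a
placeholder (the real feed URL is not shown in this thread), any extra query
parameters the feed may need are left out, and the function names are made up.
Only the start and t parameters, the step of 100, the 20,000 cap and the topic
count of 107490 come from the messages here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Placeholder endpoint (assumption): the real feed URL is not given in the thread.
BASE_URL = "https://forum.example.org/search.php"


def offset_feed_urls(base_url, page_size=100, max_start=20000):
    """Yield feed URLs for start = 0, page_size, ... up to max_start inclusive.

    max_start mirrors the reported cap on the start parameter; requests
    beyond it were reported to return an error page.
    """
    for start in range(0, max_start + 1, page_size):
        yield f"{base_url}?{urlencode({'start': start})}"


def topic_feed_urls(base_url, first_topic, last_topic):
    """Yield one feed URL per topic id, using the t parameter."""
    for topic_id in range(first_topic, last_topic + 1):
        yield f"{base_url}?{urlencode({'t': topic_id})}"


def parse_items(rss_text):
    """Return (title, link, pubDate) for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"), item.findtext("pubDate"))
        for item in root.iter("item")
    ]
```

With these helpers, topic_feed_urls(BASE_URL, 0, 107490) covers every topic id
up to the newest one mentioned above, whereas the offset-based variant stops
at the start cap regardless of how many posts exist.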
> Thanks again for looking into this.
> On Fri, Aug 24, 2012 at 4:11 PM, Stuart Jarvis <jarvis at kde.org> wrote:
>> On Friday 24 Aug 2012 22:41:05 Ben Cooksley wrote:
>> > On Fri, Aug 24, 2012 at 2:51 AM, Stuart Jarvis <jarvis at kde.org> wrote:
>> > > Hi everyone,
>> > Hi Stuart,
>> > > I guess kde-ww is the right list to ask this. Please see the query
>> > > below
>> > > from one of our partners in the ALERT project*
>> > >
>> > > Any ideas why the RSS would be limited to post 89447?
>> > Can't think of any particular reason off the top of my head - there is
>> > certainly no deliberate constraint on getting RSS feeds of older
>> > material.
>> > However I can't say it was specifically designed to return older
>> > content.
>> Thanks for getting back to me. It is an unusual use case. The idea is to
>> provide a non-invasive way for the ALERT system to collect the archives of
>> the project, with the added benefit that the same parser can be used for live
>> > It is likely they are striking a limit on the number of search results
>> > returned (as the RSS feed is powered by our Sphinx search backend).
>> > Could we have some details on how they are conducting the RSS feed
>> > retrieval so I can debug why this is happening?
>> Some more details from Gregor Leban (copied in):
>> Yes, it is strange.
>> Here is, for example, the feed that I have in ascending time order for
>> the whole history:
>> As you can see, the oldest post is from Fri, 21 May 2004 03:02:50 GMT and
>> this url:
>> and the p argument in the url is the post id. Do you know when KDE started
>> using forums - in 2004 or earlier?
>> So in this case, the number of results is already limited to 100. Could
>> be an issue with changes to forum software in the past (my memory on the
>> history is a bit hazy)
> kde-www mailing list
> kde-www at kde.org