kde forum data

Ben Cooksley bcooksley at kde.org
Fri Aug 24 23:07:10 UTC 2012


On Sat, Aug 25, 2012 at 7:14 AM, Gregor Leban <gleban at gmail.com> wrote:
> Hi,

Hi Gregor,

> thank you all for helping with this.
> I tried to access all the posts by repeatedly requesting
> an RSS feed, each time with a different start offset.
> I start with
> https://forum.kde.org/search.php?keywords=&terms=all&author=&tags=&sv=0&sc=1&sf=all&sk=t&sd=a&feed_type=RSS2.0&feed_style=BASIC&st=0&submit=Search&countlimit=100&start=0
>
> After processing this feed, I increase start to 100, then 200, etc. In
> this way I should, in theory, be able to get the whole history from 2004
> to now. When I was trying to download the posts yesterday, I noticed that
> phpBB has a limit on the start parameter: I can only go up to start =
> 20.000. After that I get an error page. Since this is an issue only for the

Okay, you probably ran into a limit on the number of results.

> offline mode (when we need to import the past data), I solved the problem
> by passing the t parameter to search.php. With this parameter I can
> get an RSS feed of all posts in a particular topic. Topic ids currently go
> from 0 to 100.000, so I just needed to make 100.000 URL calls :) It's done now:
> I've downloaded the whole history and won't need to do it again. Since I
> searched by topic ids, I did get the whole history (even the posts with id <
> 90.000).

Okay.
However, the forum currently has 107490 topics (topic 107490 being the newest), so topic ids above 100.000 do exist.
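The two retrieval strategies described above can be sketched in Python as URL generators. This is a minimal sketch, not Gregor's actual code: the helper names `offset_feed_urls` and `topic_feed_urls` are hypothetical, the query parameters are copied from the feed URL quoted earlier in the thread, the start cap is the 20.000 limit he reported, the upper topic id is the 107490 mentioned above, and reusing the same search parameters together with `t` is an assumption. It only builds URLs; fetching and parsing are left out.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://forum.kde.org/search.php"

# Query parameters copied from the feed URL quoted in this thread.
FEED_PARAMS = {
    "keywords": "", "terms": "all", "author": "", "tags": "",
    "sv": "0", "sc": "1", "sf": "all", "sk": "t", "sd": "a",
    "feed_type": "RSS2.0", "feed_style": "BASIC", "st": "0",
    "submit": "Search", "countlimit": "100",
}

PAGE_SIZE = 100         # same as countlimit=100
MAX_START = 20_000      # largest start offset that worked, per the thread
NEWEST_TOPIC = 107_490  # newest topic id, per the thread


def offset_feed_urls(max_start=MAX_START, page_size=PAGE_SIZE):
    """Strategy 1: page through the global feed with start=0, 100, 200, ..."""
    for start in range(0, max_start + 1, page_size):
        yield f"{SEARCH_URL}?{urlencode({**FEED_PARAMS, 'start': start})}"


def topic_feed_urls(first_topic=0, last_topic=NEWEST_TOPIC):
    """Strategy 2: one feed per topic id via the `t` parameter
    (assumed to combine with the same search parameters)."""
    for topic_id in range(first_topic, last_topic + 1):
        yield f"{SEARCH_URL}?{urlencode({**FEED_PARAMS, 't': topic_id})}"


if __name__ == "__main__":
    pages = list(offset_feed_urls())
    print(len(pages), "offset pages, last:", pages[-1])
    print(sum(1 for _ in topic_feed_urls()), "topic feeds")
```

Capping start at 20000 limits the first strategy to 201 pages of at most 100 posts each, which fits the result limit suspected in this thread; the per-topic strategy avoids that cap, but only covers every topic if its upper bound reaches the newest topic id.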

>
> Thanks again for looking into this.
> Best,
> gregor

Regards,
Ben

>
>
> On Fri, Aug 24, 2012 at 4:11 PM, Stuart Jarvis <jarvis at kde.org> wrote:
>>
>> On Friday 24 Aug 2012 22:41:05 Ben Cooksley wrote:
>> > On Fri, Aug 24, 2012 at 2:51 AM, Stuart Jarvis <jarvis at kde.org> wrote:
>> > > Hi everyone,
>> >
>> > Hi Stuart,
>> >
>> > > I guess kde-www is the right list to ask this. Please see the query
>> > > below from one of our partners in the ALERT project*
>> > >
>> > > Any ideas why the RSS would be limited to post 89447?
>> >
>> > I can't think of any particular reason off the top of my head; there is
>> > certainly no deliberate constraint on getting RSS feeds of older
>> > material.
>> > However, I can't say it was specifically designed to return older
>> > content.
>>
>> Thanks for getting back to me. It is an unusual use case. The idea is to
>> provide a non-invasive way for the ALERT system to collect the archives of
>> a project with the added benefit that the same parser can be used for live
>> updates.
>>
>> > It is likely they are striking a limit on the number of search results
>> > returned (as the RSS feed is powered by our Sphinx search backend).
>> > Could we have some details on how they are conducting the RSS feed
>> > retrieval so I can debug why this is happening?
>>
>> Some more details from Gregor Leban (copied in):
>>
>> ---
>> Yes, it is strange.
>> Here, for example, is the feed I have for the whole history, in ascending
>> time order:
>>
>> https://forum.kde.org/search.php?keywords=&terms=all&author=&tags=&sv=0&sc=1&sf=all&sk=t&sd=a&st=0&feed_type=RSS2.0&feed_style=HTML&countlimit=100&submit=Search
>>
>>
>> As you can see, the oldest post is from Fri, 21 May 2004 03:02:50 GMT and
>> has this URL:
>> https://forum.kde.org/viewtopic.php?f=119&t=66734&p=89443#p89443
>>
>> The p argument in the URL is the post id. Do you know when KDE started
>> using forums: in 2004 or earlier?
>> ---
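Pulling the post id out of each feed item can be sketched with the Python standard library. This assumes each RSS 2.0 `<item>` carries a `<link>` of the `viewtopic.php?...&p=<post id>` form quoted above; `post_id_from_link`, `post_ids_from_rss` and the one-item `SAMPLE` feed are hypothetical illustrations, not the forum's real output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# Hypothetical one-item feed; "&" must be escaped as "&amp;" inside XML.
SAMPLE = """<rss version="2.0"><channel><title>sample</title>
<item><title>oldest post</title>
<link>https://forum.kde.org/viewtopic.php?f=119&amp;t=66734&amp;p=89443#p89443</link>
</item>
</channel></rss>"""


def post_id_from_link(link):
    """Return the post id (the `p` query parameter) of a viewtopic.php URL."""
    query = parse_qs(urlsplit(link).query)  # the #p... fragment is split off
    return int(query["p"][0])


def post_ids_from_rss(rss_text):
    """Collect post ids from the <link> of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    return [post_id_from_link(item.findtext("link")) for item in root.iter("item")]


if __name__ == "__main__":
    print(post_ids_from_rss(SAMPLE))
```

Applying `min()` to the ids collected from every downloaded page would show the lowest post id a given retrieval strategy actually reaches.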
>>
>> So in this case, the number of results is already limited to 100. Could
>> this be an issue with changes to the forum software in the past? (My
>> memory of the forum history is a bit hazy.)
>>
>> Cheers,
>> Stu
>
>
>
> _______________________________________________
> kde-www mailing list
> kde-www at kde.org
> https://mail.kde.org/mailman/listinfo/kde-www
>


More information about the kde-www mailing list