[akregator] [Bug 408079] New: Fetching feeds causes duplicated items

Daniel Roschka bugzilla_noreply at kde.org
Wed May 29 19:20:00 BST 2019


https://bugs.kde.org/show_bug.cgi?id=408079

            Bug ID: 408079
           Summary: Fetching feeds causes duplicated items
           Product: akregator
           Version: GIT (master)
          Platform: Debian unstable
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: feed parser
          Assignee: kdepim-bugs at kde.org
          Reporter: danielroschka at phoenitydawn.de
  Target Milestone: ---

When a feed is fetched multiple times, akregator duplicates an existing item
whenever the content of the fetched item differs from the content of the same
item already stored locally. I have been suffering from this bug for 10+ years
and would like to see it finally gone.

Here is my theory of why it happens:

Instead of using only the guid to compare two items for equality, Akregator
also builds a hash over title, description, content, link and author
(https://github.com/KDE/akregator/blob/0d588dcbfb9cc93dec5b6bcbf3b01336ca1d09ce/src/feed/feed.cpp#L581-L585
and
https://github.com/KDE/akregator/blob/0d588dcbfb9cc93dec5b6bcbf3b01336ca1d09ce/src/article.cpp#L189)
and checks that as well, unless the guid starts with "hash:". I believe this
does not follow the specification, which states:

> guid stands for globally unique identifier. It's a string that uniquely identifies the item.
> When present, an aggregator may choose to use this string to determine if an item is new.
> 
> <guid>http://some.server.com/weblogItem3207</guid>
> 
> There are no rules for the syntax of a guid. Aggregators must view them as a string. It's up to
> the source of the feed to establish the uniqueness of the string.

http://www.rssboard.org/rss-specification#ltguidgtSubelementOfLtitemgt

The current behavior produces duplicate items when authors fix typos in their
posts or when software inserts random bits into the data, e.g. into JavaScript
included in the markup. Podlove Publisher is known for that
(https://github.com/podlove/podlove-publisher/blob/192a2710b6ad3d0f5eff67f4daacb5d6dac6ab4a/lib/modules/subscribe_button/button.php#L88).
The latter case is particularly annoying as it produces a new item every single
time akregator fetches the feed.
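
To illustrate the theory, here is a minimal sketch in Python. It is not
Akregator's actual code: the names Item, content_hash, merge_by_guid_and_hash
and merge_by_guid are made up for this example, SHA-256 is only a stand-in for
whatever hash Akregator uses, and the "hash:" special case is left out. It
contrasts the suspected comparison (guid plus content hash) with a comparison
on the guid alone.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical, simplified feed item with the fields named above."""
    guid: str
    title: str
    description: str = ""
    content: str = ""
    link: str = ""
    author: str = ""


def content_hash(item: Item) -> str:
    # Stand-in for the hash over title, description, content, link and author.
    joined = "\x00".join(
        [item.title, item.description, item.content, item.link, item.author]
    )
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def merge_by_guid_and_hash(stored, fetched):
    """Suspected behavior: an item is only 'known' if guid AND hash match."""
    known = {(i.guid, content_hash(i)) for i in stored}
    result = list(stored)
    for item in fetched:
        key = (item.guid, content_hash(item))
        if key not in known:
            known.add(key)
            result.append(item)  # same guid, new content -> duplicate
    return result


def merge_by_guid(stored, fetched):
    """Guid-only behavior: changed content replaces the stored item."""
    by_guid = {i.guid: i for i in stored}
    for item in fetched:
        by_guid[item.guid] = item  # update in place, never duplicate
    return list(by_guid.values())


# An author fixes a typo; the guid stays the same.
old = Item(guid="http://some.server.com/weblogItem3207", title="Helo world")
new = Item(guid="http://some.server.com/weblogItem3207", title="Hello world")

duplicated = merge_by_guid_and_hash([old], [new])  # two items, same guid
updated = merge_by_guid([old], [new])              # one item, new content
```

With random bits in the content, the first variant would add one more item on
every fetch, while the second would keep exactly one item per guid.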

I'd be happy to provide additional information if necessary.

-- 
You are receiving this mail because:
You are the assignee for the bug.

