.oO(jsteinmann)
>for example, look at an rss feed, new items get added, but the old
>ones are still there.
Only in your RSS aggregator, which usually keeps old entries. The feed
itself only contains the most recent items.
>If I just parse the xml and insert it into the database, i'll end
>up with duplicates.
Then you should check for dupes before you insert them into the DB.
Depending on the data, the DB might be able to handle this itself. In
MySQL, for example, an INSERT ... ON DUPLICATE KEY UPDATE statement
does that, provided the table has a unique key on the identifying
column.
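A minimal sketch of the check-before-insert approach, in Python with an
in-memory SQLite database standing in for the real DB. The table layout,
the guid column and the store_item() helper are all made up for
illustration; your feed and schema will differ.

```python
import sqlite3

# Hypothetical schema: one row per feed item, keyed by a unique identifier.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (guid TEXT PRIMARY KEY, title TEXT, body TEXT)"
)


def store_item(conn, guid, title, body):
    """Insert a new item, or update the stored one if guid already exists.

    Returns "inserted" or "updated".
    """
    row = conn.execute(
        "SELECT 1 FROM items WHERE guid = ?", (guid,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO items (guid, title, body) VALUES (?, ?, ?)",
            (guid, title, body),
        )
        result = "inserted"
    else:
        # Same identifier seen before: refresh the stored copy instead
        # of adding a second row.
        conn.execute(
            "UPDATE items SET title = ?, body = ? WHERE guid = ?",
            (title, body, guid),
        )
        result = "updated"
    conn.commit()
    return result


# Parsing the same feed twice no longer creates duplicates.
store_item(conn, "item-1", "First post", "original text")
store_item(conn, "item-1", "First post", "edited text")
```

With MySQL the SELECT/INSERT/UPDATE sequence could collapse into the
single INSERT ... ON DUPLICATE KEY UPDATE statement, as long as the
identifying column carries a PRIMARY KEY or UNIQUE index.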
>Let's say I only insert ones with todays date, ok that will work with
>avoiding duplicates, but if they modify or remove something, that would
>also need to be done in the database as well, and you can't do that
>unless you're attaching something that can identify it in the database.
Correct. And usually there is something that can be used to identify a
particular record, maybe the name of the source, a title and the date
when it was first released or something like that. This information
could be combined into an MD5 hash, for example, to get an almost
unique record identifier.
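A rough sketch of such a hash in Python. The record_id() helper and the
three fields passed to it are assumptions for illustration; use whatever
fields stay stable in your feed.

```python
import hashlib


def record_id(source, title, first_released):
    """Build an MD5-based identifier from fields that should not change."""
    # Join with a separator that is unlikely to appear in the fields, so
    # ("ab", "c") and ("a", "bc") do not collapse into the same string.
    key = "\x1f".join([source, title, first_released])
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Hypothetical sample values.
rid = record_id("example-feed", "Some headline", "2007-03-01")
```

The result is a 32-character hex string that can go into a column with a
unique key. Note that if one of the hashed fields is edited in the feed
(a corrected title, say), the hash changes and the item will look like a
new record, so only hash fields you expect to stay the same.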
Micha