owimmer
Participant
January 22, 2018
Answered

Finding duplicates after HDD failure w/ Lightroom, using Metadata + Histogram

  • January 22, 2018
  • 4 replies
  • 511 views

Hello,

I had a mechanical HDD failure, but was able to recover my files.  Good news is that all files are here and Metadata (capture date/time) is still available.  Problem is that most pictures appear 3-5 times after I import them into Lightroom.  The 'avoid duplicates' switch was ON during import.  Usually works well, but not in this case.

When I browse through the photos, they a) show exactly the same capture date/time (down to the second), b) are optically the same and c) have ALMOST identical histograms.  Some histograms are an exact match, but others vary marginally (say 3-5% difference).  Variations are very minor, such as an on-screen shift of 1-2mm to the right/left, but the histogram distribution trends match.  Another minor difference can be that the histogram for one picture shows a smooth chart, but the histogram for the suspected duplicate shows a non-smooth line. By non-smooth I mean that the chart resembles a very dense column chart more than a smooth line chart (as if there were fewer data points available).

Does anybody have a better option to delete suspected duplicates than a manual check based on a) capture time and b) histogram? 3rd party duplicate check software does not work in this case either.  I am wondering whether there is any reason why 2 identical files would show a very minor (less than 5%) histogram difference in Lightroom.

Thank you,

Oliver

    This topic has been closed for replies.
    Correct answer john beardsworth

    Have you tried this? Lightroom Plugins - Duplicate Finder for Lightroom 

    I don't think there's a clever way using LR's built-in features - you always have to use your eyes.

    4 replies

    owimmer (Author)
    Participant
    January 22, 2018

    Thank you, this Lightroom plug-in worked for me.  I had previously tried Easy Duplicate Finder (3rd party software) as well as Lightroom's 'avoid duplicates upon import' function, neither of which recognised the files as duplicates.  The plug-in you suggested actually does recognise the correct files as duplicates.

    What really puzzles me though - and Adobe support couldn't clarify this either - is why these duplicate files show marginally different histograms.  I always thought that relying on a combination of a) time shot + b) histogram would be a definitive way of spotting duplicate photos.
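To put a number on "marginally different", here is a minimal sketch of one way to compare two histograms. It does not reproduce how Lightroom computes or draws its histogram; the function name `histogram_overlap` and the bin counts are made up for illustration, and it assumes you already have the per-bin pixel counts for each image.

```python
def histogram_overlap(hist_a, hist_b):
    """Return the overlap (0.0 to 1.0) of two histograms given as bin counts."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("histograms must contain at least one pixel")
    # Normalise each bin so images with different pixel counts are comparable,
    # then add up the share of pixels that both histograms have in common.
    return sum(min(a / total_a, b / total_b) for a, b in zip(hist_a, hist_b))


# Hypothetical bin counts, for illustration only.
original = [10, 40, 30, 20]
half_size = [5, 20, 15, 10]   # same distribution, half as many pixels
different = [0, 10, 40, 50]   # tones shifted towards the highlights

print(histogram_overlap(original, half_size))
print(histogram_overlap(original, different))
```

In this toy data the half-size copy overlaps the original completely even though it is clearly a different file, which is why a matching capture time plus a matching histogram cannot be a definitive duplicate test on their own.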

    dj_paige
    Legend
    January 22, 2018
    owimmer wrote

    What really puzzles me though - and Adobe support couldn't clarify this either - is why these duplicate files show marginally different histograms.  I always thought that relying on a combination of a) time shot + b) histogram would be a definitive way of spotting duplicate photos.

    These are not the same photos. There has been some modification to them (obviously, if the size is different), so of course the histograms will not match exactly. Perhaps the problem is the word "duplicate": in your case the photos have the same file names but are not exact duplicates.
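One way to check whether two copies really are the same file, byte for byte, is to compare content hashes outside Lightroom. The following is only a sketch using Python's standard library; the helper names `file_digest` and `exact_duplicates` and the folder name `recovered_photos` are placeholders, not anything Lightroom or the plug-in provides.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicates(root):
    """Group the files under root by content; return only groups with 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    # "recovered_photos" is a hypothetical folder name; point it at your own files.
    for group in exact_duplicates("recovered_photos"):
        print([str(path) for path in group])
```

Copies that land in the same group are exact duplicates. Copies that share a name and capture time but hash differently have been changed in some way (for example resized, re-saved, or damaged), which would be consistent with the slightly different histograms.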

    owimmer (Author)
    Participant
    January 22, 2018

    I have worked through 4k files now and noticed that the very small differences in histogram data may actually be a bug in Lightroom.  When I move quickly between 3 duplicate photos and check the respective histograms on the right, they initially all appear smooth.  But once I go back and forth between the same pictures multiple times, after a few moments some of those previously smooth histograms change to non-smooth, bar-chart-like ones. Perhaps the data is a bit much for Lightroom to handle (the catalogue is 70k on a Thunderbolt-connected external HDD, and the current import is nearly 5k too).

    dj_paige
    Legend
    January 22, 2018
    owimmer wrote

    The 'avoid duplicates' switch was ON during import.  Usually works well, but not in this case.

    Duplicates are defined as same file name, same capture time and same file size. Obviously, some of your photos that you think are duplicates do not match on all three criteria (probably file size).

    owimmer wrote

    Another minor difference can be that the histogram for one picture shows a smooth chart, but the histogram for the suspected duplicate shows a non-smooth line.

    Yes, this again indicates that some of your "duplicates" have different file sizes. Thus you may be able to use your operating system to sort everything by file name and file size; the smaller files would probably be the ones you don't want to import.
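If sorting in the file browser gets unwieldy with thousands of files, the same idea can be scripted. This is a rough sketch, not a tested workflow: `copies_by_name` and the folder name `recovered_photos` are made-up names, and it only compares file name and size (it does not read the capture time from the metadata).

```python
from collections import defaultdict
from pathlib import Path


def copies_by_name(root):
    """Map each repeated file name to its copies as (size_in_bytes, path), largest first."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Lower-case the name so IMG_1.JPG and img_1.jpg are grouped together.
            groups[path.name.lower()].append((path.stat().st_size, path))
    return {
        name: sorted(copies, key=lambda copy: (-copy[0], str(copy[1])))
        for name, copies in groups.items()
        if len(copies) > 1
    }


if __name__ == "__main__":
    # "recovered_photos" is a hypothetical folder name; point it at your own files.
    for name, copies in sorted(copies_by_name("recovered_photos").items()):
        print(name)
        for size, path in copies:
            print(f"  {size:>12} bytes  {path}")
```

The largest copy of each name is listed first, so the smaller ones below it are the candidates to review before importing. Nothing is deleted by the script.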

    Also, it seems to me that the failure of the HDD is irrelevant here. If these are all photos imported after the failure of the HDD, I don't see why you mentioned it.

    dj_paige
    Legend
    January 22, 2018

    owimmer  wrote

    Does anybody have a better option to delete suspected duplicates than a manual check based on a) capture time and b) histogram?

    Restore a backup of your catalog file from just before the crash?

    john beardsworth
    Community Expert
    January 22, 2018

    Good point.
