I am using PSE 2020 on a Windows 10 PC and this question arises from a wider discussion about how to tidy up a messy folder structure - https://community.adobe.com/t5/photoshop-elements-discussions/tools-and-tips-for-tidying-messy-file-...
I now have a clean catalog on an external hard drive and a messy set of old folders on a PC. Before deleting any old files on the PC, I want to positively confirm that the new catalog is complete and includes everything that I want from the PC.
@MichelBParis suggested importing the old folders into the new catalog, on the basis that, if the files are not present, they will be imported and clearly visible in the folder view. If the files are present, PSE will just skip them.
Unfortunately, the way PSE identifies duplicates seems to be complicated and some files that look like duplicates to me are not being skipped - they are imported and now look like duplicates in the new catalog.
So, I am trying to understand the criteria for skipping files and trying to analyse my files to identify a pattern of difference.
As an example, I did a "Get... from files and folders" from an old folder with 97 files. All 97 files already seemed to be in the catalog, in a similar folder on the new drive, but only 66 were skipped and 31 were imported. I cannot see a) why the 31 files were not seen as duplicates and b) what the difference is between the group of 31 and the 66 that were skipped. (There are details of the file properties in the discussion above that I am not including here, to keep it as brief as possible.)
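As a sanity check outside PSE, comparing the two folders by content hash would show whether the 31 files are byte-identical to the copies already on the new drive. A minimal sketch in Python (the folder paths are placeholders):

```python
import hashlib
from pathlib import Path

def hash_tree(root):
    """Map each file under root (relative path) to a SHA-256 of its contents."""
    root = Path(root)
    return {str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in root.rglob("*") if p.is_file()}

def reconcile(old_root, new_root):
    """Split files in old_root into byte-identical vs changed, compared
    against any file in new_root with the same contents."""
    old, new_hashes = hash_tree(old_root), set(hash_tree(new_root).values())
    identical = sorted(p for p, h in old.items() if h in new_hashes)
    changed = sorted(p for p in old if p not in set(identical))
    return identical, changed
```

If the 31 re-imported files turn up under `changed`, the bytes themselves differ (for example, some tool re-saved the metadata); if they are under `identical`, the Organizer is keying on something other than file content.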
I have read a few articles about this:
https://johnrellis.com/psedbtool/photoshop-elements-faq.htm#_All_the_different
https://johnrellis.com/psedbtool/photoshop-elements-faq.htm#_Unknown_month,_day,
These suggest that setting date/time fields to "unknown" can cause problems, and I am looking into this, but I don't think it is likely that these files were edited like that.
This article https://johnrellis.com/psedbtool/photoshop-elements-faq.htm#_Find_and_delete
suggests a way to identify duplicates in the existing catalog, which isn't really my situation. It might shed some light to use this method to see if it picks up the 31 duplicates that I have just created, but it doesn't seem like a very promising line to follow.
There have been a few other posts I have read that don't seem to directly answer my situation:
I am still digesting this one to see if it has advice that would help me.
Exiftool has been recommended to analyse the files and I am going to look into that.
In the meantime, I would be glad of any advice about:
a) how to reconcile the new catalog against the old messy file structure
b) understanding why importing files identifies some as already present and skips them, but imports other which also seem to be duplicates.
Thanks in advance,
Billy.
Hi Billy,
First, about exiftool. It's indeed the industry reference for metadata management. Most users don't use it directly, since it's a command-line tool, but you'll find many programs that integrate its features in a handy graphical interface (such as ExifToolGUI).
As suggested in the post by John R Ellis, the problem is mainly that there are far too many interpretations and implementations of metadata management across different programs. I am convinced that the solution to your problem lies in a difference in how the Explorer and the Organizer handle the real date_taken in various circumstances. That topic has been discussed widely for 20 years. John R Ellis left PSE after PSE 8 and is now the expert on those matters for Lightroom Classic. I am sure that a closer check of the non-matching duplicates in exiftool or similar will reveal the difference when compared with the data in the Organizer catalog. The harder problem will be explaining how the different workflows created that change. In all cases, there have been metadata changes when saving or copying files between:
- different softwares (including the Explorer)
- different backups
- different catalogs
Somewhere in the process, some specific metadata is missing, and the Organizer tries to find an equivalent metadata field. A number of real bugs have been reported, and differing interpretations of the EXIF standards are still too frequent.
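As a purely hypothetical illustration of that fallback (not the Organizer's actual, undocumented algorithm): if duplicates were keyed on file name plus the best available date, a copy that has lost its date_taken would silently stop matching:

```python
def duplicate_key(record):
    # Fall back to the filesystem-modified date when date_taken is missing --
    # exactly the point where two copies of one photo can stop matching.
    return (record["name"], record.get("date_taken") or record["date_modified"])

in_catalog = {"name": "IMG_0001.jpg", "date_taken": "2015:06:01 10:00:00",
              "date_modified": "2020:03:01 09:00:00"}
on_disk = {"name": "IMG_0001.jpg", "date_taken": None,   # stripped on a re-save
           "date_modified": "2019:01:15 18:30:00"}

# Same photo, but under this rule it is no longer detected as a duplicate:
is_duplicate = duplicate_key(in_catalog) == duplicate_key(on_disk)
```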
Now, coming back to your particular case, we may hope to find the real reason for the failure to catch 100% of the duplicates. But it will be linked to the particular history of the files on your computer(s); only you can know that.
Since I can't give you the precise explanation, I want to share my own experience since the start of the Organizer 20 years ago:
- I have had duplicate issues like everybody else, but duplicates in a catalog without explanation are very rare and not significant.
- I don't use watched folders because I prefer to re-import some folders manually, and I have never had significant issues with the duplicates filtering.
So, my earlier suggestion did work for me, especially for clearing duplicates on the computer but not in the catalog. In some instances I was able to find a large number of duplicates created by the import or the copy/restore of batches between different computers or catalogs.
From experience I can say that working on a single catalog, and sharing my catalog + library on an external drive across different computers, really does protect against problematic copies.
Thanks @MichelBParis .
I'll go away and look at this systematically and in depth.
My first step was to copy the two folders in question into a separate test area and then import them into a new catalog. When I did that, PSE identified all 97 as duplicates, which seems in line with your belief that the root cause is differences in how File Explorer and the Organizer manage dates.
It's going to take me a while to get to know the tools but I am going to look through the differences between the files and I expect that I'll see a pattern (eventually).
Thanks again for your advice.
We are focusing on the ability of the downloader to catch duplicates, but there have been real bugs/issues with geotagging (rounding errors...) and time zones/DST, and more recently with downloading duplicates already present on smartphones or certain cameras. Something I have not seriously tested: what if you want to import a file which is a strict copy of another but with different tags? Same date_taken, but possibly nearly the same size? For me, they are not duplicates, and you would risk skipping the good one with the tagging.
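To make the "strict copy with different tags" case concrete: under a hypothetical rule that keys only on file name and date_taken (again, not PSE's documented behaviour), the retagged copy collides with the untagged original and could be skipped on import:

```python
def would_skip(existing, incoming):
    # Hypothetical duplicate rule: same file name and same date_taken.
    return (existing["name"] == incoming["name"]
            and existing["date_taken"] == incoming["date_taken"])

original = {"name": "IMG_0042.jpg", "date_taken": "2018:07:14 12:00:00",
            "size": 3_100_200, "keywords": []}
tagged = {"name": "IMG_0042.jpg", "date_taken": "2018:07:14 12:00:00",
          "size": 3_101_050, "keywords": ["holiday", "beach"]}  # retagged copy

# The copy carrying the wanted keywords would be skipped,
# even though its bytes (and size) differ slightly:
skipped = would_skip(original, tagged)
```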
Another interesting fact: while the Organizer is built to avoid duplicates, it offers a command to create them deliberately, Menu File > Duplicate. The duplicate is created in the same folder with a ' copy' suffix. You can move it elsewhere if you want. Both the original and the duplicate are imported into a new catalog!
I made a batch file to use exiftool to extract metadata from corresponding files and ran the output through fc, but I wasn't able to see differences in any field that would cause one sort of file to be skipped and the others to be imported. (I haven't exhausted this approach yet, though.) I also created a spreadsheet to compare the tags on the files and didn't see a pattern there either.
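A more structured alternative to fc is to compare exiftool's JSON output (`exiftool -json file.jpg`) tag by tag, ignoring tags that legitimately differ between any two copies. A sketch assuming the two JSON records have already been loaded into dicts:

```python
def diff_tags(meta_a, meta_b,
              ignore=("SourceFile", "FileName", "Directory",
                      "FileModifyDate", "FileAccessDate", "FileInodeChangeDate")):
    """Return {tag: (value_a, value_b)} for every tag that differs,
    skipping path and filesystem-date tags that always differ between copies."""
    keys = set(meta_a) | set(meta_b)
    return {k: (meta_a.get(k), meta_b.get(k))
            for k in sorted(keys)
            if k not in ignore and meta_a.get(k) != meta_b.get(k)}
```

Running this over a skipped pair and over an imported pair, then comparing the two result sets, might surface the field the Organizer is reacting to.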
What I have noticed is that all edited files are being imported (or at least, 79 out of the 79 edited files in the folders I have tested importing from, which seems like a strong correlation).
This may be confusing, so let me clarify what I mean. I have NOT edited the files since I backed up and restored the catalog. The files were edited years ago, as part of the workflow when they were first imported. They were then backed up and restored to the new catalog, but when I re-import them they are not being skipped as duplicates. They are re-imported, even though they should be duplicates of the versions created by the restore.
This doesn't account for all of the cases, so there is more than one issue to track down, but it covers a sizeable portion of the problem files, so this is some progress.
Now I'm going to go away and chew over what to do about it.