
How to fix folders with a large number of exact duplicate files before importing to PSE 2019

Community Beginner, Jul 11, 2019

Hello,

I'm posting on this forum on my mother's behalf.  I have to admit, I am pulling my hair out after spending the last month building my mother, a digital photographer, a brand new computer.  Here's the situation:

  • My mother has 19 years' worth of photos (~85,000).
  • She is not good with computers. I have tried my best to be supportive, but I am at the end of my rope!
  • Her previous computer's extra internal hard drive died and she hadn't performed a backup onto the external drive for some time.  She ended up losing some photos.
  • I decided it was time for me to build her a brand new, powerful desktop computer, which I successfully did.  The computer has multiple internal hard drives and a number of backup schemes so we'll never lose another file.
  • After building the new computer, we had to upgrade her to Photoshop Elements 2019, because PSE 8-10 was no longer supported on Windows 10.
  • After installing PSE 2019, I tried importing her ~85,000 photo files using PSE Organizer's import feature.
  • And then... after an hour or two... PSE was done and reported that ~9,000 of her ~85,000 photos were duplicates (date/size exact... not "similar").
  • I examined a few files and to my horror, my mother's method for organizing files did in fact produce many exact duplicates (a result of her copying and pasting duplicates all over her folders).  She would make these duplicates in Windows Explorer (not through the PSE catalog interface).
  • I did some research to determine if there's a setting that allows duplicates to be imported and discovered somewhere in the Adobe forum that this particular setting was removed in more recent versions of PSE.  The reason for this is that PSE Organizer should be considered a database and there shouldn't be duplicate entries in a database.
    • As a software engineer, who meticulously manages thousands of files every day, I agree with this philosophy - duplicates are horrible.
    • However, for a 62 year old mother who doesn't have this experience or perspective (and has spent her life putting her family first at her expense), this concept of avoiding duplicates just doesn't make sense.  (I'll skip the heated argument part.)
  • So I am trying to figure out a few things:
    • Is there truly no way to have PSE 2019 allow exact duplicates to be imported into the catalog?  I would have figured different file paths would have allowed this to be possible.
    • If not... then I'm stuck trying to eliminate the duplicate files before re-importing the catalog from scratch.  I know there are tools that do this, but they present the duplicates on a photo-by-photo basis, instead of identifying and grouping the photos in sub-folders.  Example below...

Here's an example of what my mom has going on.  Folders are shown in <CAPS>, whereas individual files are shown with lowercase names and the .jpeg extension.  I've bolded all the files that are considered duplicates.  From my mother's perspective, she would copy (duplicate) a file if it also fit a sub-category.

  1. She would start by uploading all the photos she took at Christmas time in 2018 into the <2018>\<CHRISTMAS> folder.
  2. Then she would want to further sub-categorize the photos based on family members in the photo, so she would:
    1. Create sub-folders like: <PETS>, <SMITH_FAMILY>, and <WILSON_FAMILY>.
    2. And then, instead of moving (cutting) the files from the CHRISTMAS folder into the sub-folders, she would COPY and PASTE (duplicate) these photos into the sub-category folders.
  3. Then she would make a folder to organize photos to make a photo book on Snapfish.  She'd then copy photos from all over her other folders into the <MY_2018_PHOTOBOOK> folder (creating more duplicates).
  4. (See the folder structure below as a simplified example...)

  • <2016>
    • image_100.jpeg
    • ...
    • image_123.jpeg
  • <2017>
    • image_200.jpeg
    • ...
    • image_234.jpeg
  • <2018>
    • <CHRISTMAS>
      • <PETS>
        • image_03.jpeg
      • <SMITH_FAMILY>
        • image_01.jpeg
        • image_03.jpeg
      • <WILSON_FAMILY>
        • image_04.jpeg
        • image_02.jpeg
      • image_01.jpeg
      • image_02.jpeg
      • image_03.jpeg
      • image_04.jpeg
      • image_05.jpeg
      • image_06.jpeg
  • <MY_2018_PHOTOBOOK>
    • image_02.jpeg
    • image_03.jpeg
    • image_04.jpeg
    • image_123.jpeg
    • image_200.jpeg

In the end, I thought I could simply import the parent folder containing the folders <2016>, <2017>, <2018>, <MY_2018_PHOTOBOOK>.  And this is where I encountered the message that ~9,000 photos were skipped because they were duplicates.  And I can't figure out the order in which PSE Organizer imports files and how it chooses which files (which technically have duplicates) will be imported and which it will exclude.  For example, will it import image_03.jpeg from the <PETS> folder and then skip the image_03.jpeg from the <SMITH_FAMILY> and <2018> folders?  Or will it take image_03.jpeg from the <2018> folder and then skip it in any lower sub-folder?  And then will it find image_03.jpeg in the <MY_2018_PHOTOBOOK> folder before the <2018> folder, or will it find it in the <2018> folder first, because numbers have higher path priority than letters?

Is there a tool I can use that will easily show me duplicate files grouped by folder?  Meaning, it would present me some prompt indicating "Hey, you have 5 photos in your <MY_2018_PHOTOBOOK> folder that are duplicates of files in these other folders.  Would you like to delete the duplicates in the other folders all at once, or in the <MY_2018_PHOTOBOOK> folder all at once?"
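
For readers who are comfortable running a script, the folder-by-folder grouping asked for here can be roughed out in Python. This is only a sketch under stated assumptions: it treats two files as exact duplicates when their contents have the same SHA-256 digest (my choice, not how PSE Organizer or any particular tool decides), and the function names `file_digest`, `find_duplicates`, `duplicates_by_folder`, and `print_report` are made up for illustration. It only reports; it never deletes anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map content digest -> sorted paths, keeping only content seen more than once."""
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have an exact duplicate
        for path in paths:
            by_digest[file_digest(path)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}


def duplicates_by_folder(root):
    """Map each folder -> {file name: [other locations holding the same content]}."""
    report = defaultdict(dict)
    for paths in find_duplicates(root).values():
        for path in paths:
            folder, name = os.path.split(path)
            report[folder][name] = [p for p in paths if p != path]
    return dict(report)


def print_report(root):
    """Print, per folder, which of its files also exist somewhere else."""
    for folder, files in sorted(duplicates_by_folder(root).items()):
        print(f"{folder}: {len(files)} file(s) also exist elsewhere")
        for name, others in sorted(files.items()):
            print(f"  {name} -> {', '.join(others)}")
```

Calling `print_report` on the parent folder that holds <2016>, <2017>, <2018>, and <MY_2018_PHOTOBOOK> would list, for each folder, the files whose content also lives elsewhere. Comparing sizes first keeps the hashing work down on a library of ~85,000 files.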

So I'm begging for help because I'm about to fake my own death and run away from this mess. I hope this is making sense and that someone out there has been through this and can give some guidance.

Thanks in advance...

Phil

Community Expert, Jul 11, 2019

Yikes!!!  I feel your pain, Phil. 

So, first, a couple of questions.  Did your mother use the Organizer on her earlier versions, and if so, do the earlier catalogs still exist?  (Do a search for catalog.pse*db, where *=the PSE version number, to find the location of the catalog folder.)

Has your mother used any of the database features of the catalog to tag her files?  As you obviously understand, the Organizer has many database features that totally eliminate the need for duplicate files.  So, I guess my real question is, why does your mother need the duplicates?

I had always thought that the Organizer treated files on a different path as non-identical.  But after reading your post, I performed some small experiments and confirmed what you say.

So, here is a possible workaround that may make your task easier if your mother insists on keeping the duplicates.  Batch rename the duplicate files.  For example, rename the images in each folder with the folder name as a prefix, e.g., rename Image 03.jpg to Christmas.Pets.Image 03.jpg.  (I have confirmed that files with different names are not considered duplicates.)
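
If a script is acceptable, the folder-name-as-prefix rename described above could look roughly like this in Python. It is a sketch, not a tested tool: the helper names `prefixed_name`, `plan_renames`, and `apply_renames` are hypothetical, the "." separator simply mirrors the Christmas.Pets.Image 03.jpg example, and files directly in the chosen root folder are left alone because they have no sub-folder to use as a prefix.

```python
import os


def prefixed_name(root, path, sep="."):
    """Build a new file name from the folders between root and the file."""
    rel_folder = os.path.relpath(os.path.dirname(path), root)
    parts = [] if rel_folder == os.curdir else rel_folder.split(os.sep)
    return sep.join(parts + [os.path.basename(path)])


def plan_renames(root):
    """Return (old_path, new_path) pairs; nothing is renamed at this stage."""
    plan = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            old = os.path.join(folder, name)
            new = os.path.join(folder, prefixed_name(root, old))
            if new != old:
                plan.append((old, new))
    return plan


def apply_renames(plan):
    """Rename each file in the plan, refusing to overwrite an existing file."""
    for old, new in plan:
        if os.path.exists(new):
            raise FileExistsError(new)
        os.rename(old, new)
```

Building the plan first and applying it separately makes it possible to review the list before any file is touched. Note that running it a second time would add the prefix again, so it is meant to be run once, on a backed-up copy.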

I may have some other suggestions when you report back with answers to my other questions.

Added: After re-reading your post, I am wondering why you want to identify (and delete?) the duplicates before importing?  The importing process takes care of removing the duplicates.

Community Expert, Jul 12, 2019

Before going to details, I think the real question is to know if your purpose is to please your mother... or yourself (or anyone wanting to take advantage of her present library).

In my experience, there are two main 'frames of mind' to organize information and structure one's way to memorize things. I'll call them the cabinet/drawer/folder hierarchical mind and the keywords 'Google search' or 'Word' mind. I am not sure if there is a unique frame of mind for database users... even if they all understand the drawbacks of duplicates.

The need for hierarchy makes a personal organization nearly useless for any other users. Just think about bird lovers: what help will the hierarchical scheme be if you don't already know the hierarchy of species/subspecies?

On the other hand, just look at teenagers using Google: they'll immediately find, by meaning and associations, the few words they need to find what they want. That's typical of family search. If you strictly think in family trees, you'll have difficulty finding people by married names, first names, or nicknames. If you assign a set of keywords for what you think is relevant for a search, you can combine them and find anything.

Community Beginner, Jul 22, 2019

Hi Michel... Thanks for your response.  I think your first question is irrelevant to solving the problem... but the answer is "both".

I definitely see what you are saying about the "Google mind", and I agree with you on that.  I haven't been able to convince her that's the right way to go, yet. 

For example, she would import her photos from her phone once a month.  To do this, she would create a new folder each month with the name pattern YYYY-MM and then store the imported photos in that folder.  Her reasoning for this is that she could think back to a time frame and find the photo by going to the correct YYYY-MM folder.  For example, she might want to go back to the 2017 beach trip.  "Well... the beach is in the summer of 2017, so I'll look between 2017-05 and 2017-09."

The solution for organizing her phone photos is simple, honestly.  She can use the Amazon Photo app to upload the photos at full res to the cloud and then have them synced with her desktop computer.  The beautiful thing about this is that Amazon Photo's app renames the images to a format of YYYY-MM-DD-HH-MM-SS.jpg (where the date is the time the photo was taken). So it's really simple to sort through them!  You could keep them all in the same folder and not worry about filename number rollover.
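
To illustrate why timestamp file names make the monthly folders less necessary, here is a tiny Python sketch. It assumes the capture time is already available as a `datetime` (reading it out of a photo's metadata is not shown), and `timestamp_name` and `month_folder` are illustrative names, not part of any app.

```python
from datetime import datetime


def timestamp_name(taken, extension=".jpg"):
    """Format a capture time as YYYY-MM-DD-HH-MM-SS plus the file extension."""
    return taken.strftime("%Y-%m-%d-%H-%M-%S") + extension


def month_folder(taken):
    """Format a capture time as the YYYY-MM folder name used for monthly imports."""
    return taken.strftime("%Y-%m")


beach = datetime(2017, 7, 4, 15, 30, 5)
print(timestamp_name(beach))  # 2017-07-04-15-30-05.jpg
print(month_folder(beach))    # 2017-07
```

Because every timestamp name begins with its YYYY-MM prefix, sorting one big folder by name puts the photos in the same order as browsing the monthly folders one after another.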

At the end of the day, I couldn't convince her to do that... she wants folders because that's how her mind works.  Sigh...

Community Beginner, Jul 22, 2019

Hi Greg... Thanks for the response!

Unfortunately, her old computer tanked and so we didn't have time to export the Organizer's catalog.  And as for using tags instead of creating duplicates: Well, I know that's a good idea, but she does not want to deal with that and thinks it to be a bad way to manage files.  So we'll agree to disagree.


Thanks for giving it a test!  We did talk about possibly batch renaming the files.  But here's the kicker... if you then use a tool like CCleaner or Duplicate Photo Finder to find duplicate files, they still can detect the file content as a match, even if the filenames are different.  The good news is that I trained my mother on how to use CCleaner's duplicate file finding tool in order to clean up her folders containing all her photos.  This has been a good exercise for her, and she has come around and agrees that having duplicate files is not good.  I think she is feeling good about cleaning up duplicate files.

To address your last question, the importing process does prevent importing duplicates, but as I mentioned in my post, there's no deterministic way of knowing which file it keeps and which it rejects.  (I explain this in the paragraphs right below my example folder tree.)  Hope that makes more sense.
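
One way around that uncertainty is to decide which copy survives before importing, using a rule you write down yourself. The Python sketch below shows one possible rule, purely as an assumption of mine and not a description of what PSE Organizer does: copies inside a preferred folder win, then copies closer to the top of the tree, then alphabetical order as a tie-breaker. The names `keeper_key` and `split_keep_and_remove` are hypothetical, and the code only returns a decision; it does not delete files.

```python
import os


def keeper_key(path, preferred=()):
    """Sort key: preferred folders first, then shallower paths, then alphabetical."""
    parts = os.path.normpath(path).split(os.sep)
    in_preferred = any(name in parts[:-1] for name in preferred)
    return (not in_preferred, len(parts), path.lower())


def split_keep_and_remove(duplicate_paths, preferred=()):
    """Return (path_to_keep, paths_to_remove) for one group of identical files."""
    ordered = sorted(duplicate_paths, key=lambda p: keeper_key(p, preferred))
    return ordered[0], ordered[1:]
```

For the image_03.jpeg example, passing `preferred=("CHRISTMAS",)` would keep the copy directly under <CHRISTMAS> and mark the <PETS>, <SMITH_FAMILY>, and <MY_2018_PHOTOBOOK> copies for removal. With no preferred folder, the <MY_2018_PHOTOBOOK> copy would be kept because it sits closest to the top of the tree.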

Thanks for the quick response on this.  If you have a recommendation on how to batch rename files, I'd be interested in that.  I'm sure there's a script somewhere, but was wondering if there's a GUI tool you use that I could get for my mother to use; she's not going to like the idea of using the command line 🙂

Community Expert, Jul 22, 2019

"If you have a recommendation on how to batch rename files, I'd be interested in that.  I'm sure there's a script somewhere, but was wondering if there's a GUI tool you use that I could get for my mother to use; she's not going to like the idea of using the command line 🙂"

I'm not sure why you need to batch rename files anymore if your mother has decided to delete the duplicates.  But here is an easy way:

In Windows File Explorer:

1.  Open the File Folder in Details View and sort the files by date order (if that is how you want the order to be).

2.  Select all the files (Ctrl+A).

3.  Right-click on the top file and choose Rename.  (The original file name will be selected.)

4.  Give the file your desired base name.

5.  Hit Enter.  The files will be renamed with the base name followed by a sequential number in parentheses, e.g. Name (1).jpg, Name (2).jpg.


An alternative batch renaming can be performed in Elements.  But that will only make sense if you already have the files imported into a catalog in the Organizer.  I'll await further word from you before explaining those options.  Hopefully, the Windows OS method will serve your purpose before you import the files into Elements.
