Inspiring
March 7, 2024
Answered

Detect duplicate GIF files without OCR

  • March 7, 2024
  • 3 replies
  • 2728 views

I have a large number of GIF files in a folder; each GIF contains a year number like the following: 

 

But among the GIFs in that folder, each year may be repeated hundreds of times. 

Now I want to detect duplicate GIF files with Photoshop, without OCR. 

Are there any ideas on how to do this? 

For example: open each GIF file in Photoshop, select the whole year number, save specific characteristics or coordinates of the selected area to a text file, then compare all of those characteristics or coordinates and find the duplicate GIFs. 

 

Please note: don't ask why I want to do this, what the source of these GIF files is, or why I don't want to use OCR. 

OCR is very weak for roughly 13,000 GIF files; it may not be accurate and may leave some GIFs unscanned! 

 

I've attached 10 GIF files to this post for testing. 

 

This topic has been closed for replies.
Correct answer Pubg32486011zfgs

The trouble is the line in your images. They won't create the same hash.


Now I've written the following script, which compares each pair of adjacent GIF files in a folder very quickly! 

import cv2
import os
from concurrent.futures import ProcessPoolExecutor, as_completed
import keyboard  # third-party; may require elevated privileges on Linux

def calculate_similarity_percentage(file1, file2):
    # Read the first frame of each GIF, then release the captures
    gif1 = cv2.VideoCapture(file1)
    gif2 = cv2.VideoCapture(file2)
    ok1, frame1 = gif1.read()
    ok2, frame2 = gif2.read()
    gif1.release()
    gif2.release()

    # Treat an unreadable file as completely different
    if not ok1 or not ok2 or frame1 is None or frame2 is None:
        return 0.0

    # Resize frames to a common size
    frame1 = cv2.resize(frame1, (640, 480))
    frame2 = cv2.resize(frame2, (640, 480))

    # Convert frames to grayscale
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # Apply Gaussian blur to suppress minor pixel noise
    blurred1 = cv2.GaussianBlur(gray1, (15, 15), 0)
    blurred2 = cv2.GaussianBlur(gray2, (15, 15), 0)

    # Find the absolute difference between the two blurred frames
    diff = cv2.absdiff(blurred1, blurred2)

    # Threshold the difference image
    _, thresholded = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)

    # Calculate the percentage of pixels that differ
    total_pixels = thresholded.size
    non_zero_pixels = cv2.countNonZero(thresholded)
    difference_percentage = (non_zero_pixels / total_pixels) * 100

    # Similarity is the complement of the difference
    return 100 - difference_percentage

def compare_files_chunk(files_chunk, directory):
    results = []
    for i in range(1, len(files_chunk)):
        file_path1 = os.path.join(directory, files_chunk[i - 1])
        file_path2 = os.path.join(directory, files_chunk[i])
        similarity_percentage = calculate_similarity_percentage(file_path1, file_path2)
        results.append(f"Comparison between {files_chunk[i - 1]} and {files_chunk[i]}: {similarity_percentage:.2f}%")

    return results

def compare_all_files(directory, output_file):
    files = sorted([f for f in os.listdir(directory) if f.lower().endswith('.gif')])
    num_files = len(files)
    chunk_size = min(num_files, os.cpu_count() * 4)  # Adjust based on your system's capabilities

    with open(output_file, 'w') as f_out, ProcessPoolExecutor() as executor:
        futures = []

        for i in range(0, num_files, chunk_size):
            # Overlap chunks by one file so the pair that spans a
            # chunk boundary is not skipped
            files_chunk = files[i:i + chunk_size + 1]
            future = executor.submit(compare_files_chunk, files_chunk, directory)
            futures.append(future)

        for future in as_completed(futures):
            results = future.result()
            f_out.write('\n'.join(results) + '\n')

            # Check for 'F6' key press to stop the process
            if keyboard.is_pressed('F6'):
                print("Process stopped by user.")
                return

def main():
    directory = r'E:\Desktop\Armies\L1816_2'
    output_file = r'E:\Desktop\Armies\comparison_results.txt'

    try:
        compare_all_files(directory, output_file)
        print("Comparison completed. Results saved in:", output_file)

    except Exception as e:
        print(f"Error: {str(e)}")

if __name__ == "__main__":
    main()

3 replies

Abambo
Community Expert
March 7, 2024

How many different years are there? If you have only several years, you could create a mask for each year, you mask out the lines, save the resulting file as a PNG. Those with the same size are probably the same.

ABAMBO | Hard- and Software Engineer | Photographer
Inspiring
March 7, 2024

from 1816 to 2023

Abambo
Community Expert
March 8, 2024

It would be faster to recreate those files. 

ABAMBO | Hard- and Software Engineer | Photographer
Abambo
Community Expert
March 7, 2024
quote

why I don't want to use OCR

OCR is very weak for roughly 13,000 GIF files; it may not be accurate and may leave some GIFs unscanned!

By @Pubg32486011zfgs

Well, if you want a really accurate way to do this without OCR, I would say you will need to check each file and note the year in an Excel file. That is the only accurate method. I assume your file names are sequential, and your files are not always byte-identical (the line varies). That rules out the very practical option of calculating a checksum.

 

With 13,000 files and a minute per file (which would be a lot), you could do it in about a month (less if you have more people working for you).

 

 

ABAMBO | Hard- and Software Engineer | Photographer
Stephen Marsh
Community Expert
Community Expert
March 7, 2024

Is there only one duplicate number? 1817?

 

If that is the target number, then perhaps a script could layer each file over a single target master file in Difference blend mode and then compare the histogram mean or standard-deviation value; that may identify significant differences between numbers such as 1819 and 1817. Duplicate images would have a smaller value than non-duplicates.


A script could then log the value to a .csv file with the filename for examination in a spreadsheet.

 

If there are multiple duplicate sets, then I don't think that this would work.

Inspiring
March 7, 2024

There are multiple duplicate sets.

Stephen Marsh
Community Expert
March 7, 2024

Have you looked into 3rd party software designed to find duplicate files (I realise that this will likely use OCR or ML/AI etc)?