Every day I receive about 500 to 1,000 photos of charts or graphs, and I have to remove every component of each photo except its numerical values.
This is an example of my photos:
I select the graph bars using the Magic Wand Tool in Photoshop:
Then I move the selection into the space between the logos and the numerical values:
Finally, after removing the other components of the image, only the numerical values remain:
I have to do these steps with a Photoshop action because I receive so many photos every day.
An action can do these steps automatically if the distance between the edge of the graph bars and the first digit of the numerical values is the same in every image. The main problem is that this distance is not the same in the images I receive each day.
For example, in this photo the numerical values are very close to the graph bars:
If I apply the same action I used on the previous image to this one, some of the numerical values will certainly be removed.
How can I delete every component of my daily images except the numerical values using a Photoshop action?
Can a Photoshop script move the selection exactly to just before the numerical values?
Note:
I have been struggling with this problem for about three months, and I hope I can find a solution here.
Sorry for any spelling errors; my English is very poor, and I wrote this text with the help of Google Translate.
Use an OCR tool on this sample image to see what I mean:
The example you posted in your last post lacks the commas in the numbers – why?
You seem to constantly add new variabilities to the task.
As for OCR:
Acrobat’s OCR gets me this rtf:
That the negative numbers are ignored altogether seems a bit unexpected to me, but the »2« in line 2 appears distinguishable by size.
Of course I have no idea how to incorporate Acrobat etc. in a Scripted process.
I work in image editing after all.
I can use the OCR tool in only one case: there must be scannable icons between the numerical values and the logos:
Or some other, better way to prevent OCR from merging the numerical values with numbers in the logos.
I hope this example makes my limitations with the OCR tool clear.
Do you have any influence on how the files are created?
By @c.pfaffenbichler
The values in the daily photos vary. Today's photos might contain eight-digit values with commas, while tomorrow's might contain single-digit decimal values without commas.
I am not involved in creating the photos I receive. I edit the photos I post here to illustrate my descriptions.
In another forum, someone managed to remove even the »g« logo in the sample image below, using many alternating steps of Surface Blur, Gaussian Blur, Unsharp Mask, and Levels adjustments.
I asked him to send me the action file he used. I will post it here if I get it.
The values in the photos I receive daily may look like this:
1/23
1.23
123,123,123
123$
123123123
And maybe some other types of values that I don't remember.
In my opinion, the best approach is to select the graph bars, move the selection to just before the numerical values, and delete the selected content. The only problem with this idea is that the space between the graph bars and the numerical values differs from photo to photo. If a script can move the selection accurately to just before the numerical values in each photo, removing the extra content will be easy.
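A minimal sketch of that idea, written in Python rather than Photoshop's own scripting language, assuming each chart row has already been cut out as a binary band (a list of rows, 1 = ink, 0 = white background). Instead of shifting a selection by a fixed distance, it scans each band from the right and treats the first blank gap at least `min_gap` columns wide as the boundary in front of the value. `value_start` and `keep_value_only` are hypothetical helper names, and `min_gap` is an assumed parameter: it must be wider than any spacing inside a value (between digits, or before a unit such as »$«) and narrower than the smallest bar-to-value distance.

```python
def column_profile(band):
    """For each column, True if any pixel of the row band is ink (1)."""
    return [any(row[x] for row in band) for x in range(len(band[0]))]


def value_start(band, min_gap):
    """Hypothetical helper: column where the numerical value begins, or None.

    Scans leftwards from the right edge. Blank gaps narrower than min_gap are
    treated as spacing inside the value; the first gap at least min_gap wide
    is assumed to separate the value from the bar or logo on its left.
    """
    profile = column_profile(band)
    x = len(profile) - 1
    while x >= 0 and not profile[x]:  # skip the blank right margin
        x -= 1
    if x < 0:
        return None  # no ink in this band
    start, gap = x, 0
    while x >= 0:
        if profile[x]:
            start, gap = x, 0  # still inside the value
        else:
            gap += 1
            if gap >= min_gap:
                break  # wide gap reached: the value starts at `start`
        x -= 1
    return start


def keep_value_only(band, min_gap):
    """Clear every pixel left of the value (like deleting the shifted selection)."""
    start = value_start(band, min_gap)
    if start is None:
        return [row[:] for row in band]
    return [[px if x >= start else 0 for x, px in enumerate(row)] for row in band]


band = [[1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0]]  # bar, wide gap, "1", narrow gap, "2"
print(keep_value_only(band, min_gap=3))
```

Because the boundary is measured separately for every row, a bar-to-value distance that shrinks from image to image does not matter, as long as it stays wider than `min_gap`.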
Why do you call these images »photos« by the way?
Are they actually taken with a camera or created all-digitally (somehow)?
By @c.pfaffenbichler
My English is very poor.
I post in this forum using a translator.
I use both »photos« and »images« for the files I edit in Photoshop😁
I have managed to remove 90% of the extra content from the images. How can I select the main content by its height and remove the remaining 10%? Can Photoshop deselect selected content that does not have the desired height?
I for one do not understand what you mean.
So please post meaningful screenshots or sketches to clarify what you are talking about.
In my opinion, the best way for this project is to select the graph bars and move the selection to before the numerical values and delete the selected content.
By @abolfazl28627254vbil
Isn't it faster to select the entire zone with numbers using a polygon lasso or a path, and then apply a mask?
By @r-bin
Ah, but this is in the context of scripting many thousands of variable source images that vary greatly from one to the next...
If Photoshop had a feature to set a minimum and maximum diameter for a solor range selection, maybe the problem would be solved.
solor
By @abolfazl28627254vbil
solor = color
By @abolfazl28627254vbil
You seem to expect vector/text data capabilities from pixel data.
I think you should accept what Photoshop is and does. (At least at current, what the future may bring – who knows?)
Also: In the sample images you provided in various posts the numbers were quite different in height, so it may be hard to avoid a Scripting approach.
By @c.pfaffenbichler
So the only help Photoshop can give me right now is to increase the transparent space between objects toward the right, so that I can use an OCR tool afterwards.
If this can be done without converting the selected objects into separate layers, editing will be faster. For example, select the transparent space with the Magic Wand Tool and expand it in some way. That way, numbers in the logos will not be merged with the numerical values during OCR.
By @abolfazl28627254vbil
expand = distance
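Widening the gaps can be sketched in Python on the same kind of binary row band (1 = ink, 0 = background). The hypothetical `widen_gaps` inserts `extra` blank columns into every interior gap that is at least `min_gap` columns wide, so an OCR pass is less likely to read a logo and the value as one token. Note that any gap that wide is widened, including spacing inside a logo or between a value and its unit, so `min_gap` is an assumption that would need tuning for each day's images.

```python
def widen_gaps(band, min_gap, extra):
    """Insert `extra` blank columns into every interior gap >= min_gap wide.

    `band` is a list of equal-length rows (1 = ink, 0 = background). Gaps at
    the left or right edge are left alone; narrower gaps (e.g. between digits)
    are copied unchanged.
    """
    width = len(band[0])
    blank = [not any(row[x] for row in band) for x in range(width)]
    out = [[] for _ in band]
    x = 0
    while x < width:
        if not blank[x]:
            for new_row, row in zip(out, band):
                new_row.append(row[x])  # ink column: copy as-is
            x += 1
            continue
        end = x
        while end < width and blank[end]:
            end += 1  # find the end of this run of blank columns
        interior = x > 0 and end < width
        pad = extra if interior and end - x >= min_gap else 0
        for new_row, row in zip(out, band):
            new_row.extend(row[x:end] + [0] * pad)
        x = end
    return out


print(widen_gaps([[1, 0, 0, 1, 0, 1]], min_gap=2, extra=3))
```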
So the only thing that can help me now is this idea. If you know a script that can do this, please provide it.
I see no point in pursuing this as »the transparent space« could also include the space between numbers, between numbers and units of measurement, between the letters in or elements of the logos, …
How could this be meaningfully automated when it appears to come down to the same problem as before?
By @c.pfaffenbichler
By selecting the white background and inverting the selection, we can select the different components of an image. Then, by expanding the selection accurately, the bars, logos, and numerical values can be separated cleanly.
One point I have not mentioned before: the font of the numerical values and their unit are the same in all the images I receive on a given day. The distance between the numerical values and the graph bars is also consistent within each day's images, but it can shrink to just a few millimeters as the number of bars in the graph increases.
For example, this is one of the thousands of images that I received today:
In all the images I received today, the font and the unit of the numerical values are the same. The only thing that differs between today's images is the distance between the numerical values and the graph bars. For example, if the number of bars increases to 6, that distance shrinks by a few millimeters.
Within each day, the unit and the font of the numerical values stay the same, although they can change from one day to the next. The only difference between the photos of a given day is the distance between the numerical values and the graph bars, because when the number of graph bars increases, the image components have to be scaled down.
The main problem is that this shrinking distance cannot be defined in a Photoshop action for the daily images.
You might think I reuse today's selection and expansion settings for tomorrow's images. In fact, I choose the selection and expansion settings for each day based on that day's images and their fonts.
Maybe I should calculate the minimum distance between the graph bars and the numerical values in each day's images, and then drag the graph bars to the right by that minimum distance to cover the logos. For this, I need to find the chart with the most bars among thousands of images. That is not possible manually: I cannot count the bars one by one in the thousands of images I receive every day. Maybe a script can quickly find the image with the most bars among thousands of images.
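Finding the chart with the most bars can be sketched in Python, assuming each image has already been turned into a binary mask (1 = ink) and that all bars start at the same left axis, so that a single column `sample_x` just to the right of the axis crosses every bar exactly once and nothing else. `count_bars` and `most_bars` are hypothetical names, and the dict-of-masks input is an assumed format. A zero-value row with no bar drawn is simply not counted.

```python
def count_bars(mask, sample_x):
    """Count vertical runs of ink in column sample_x (one run per bar).

    Assumes every bar starts at the same left axis, so this column crosses
    each bar exactly once and crosses nothing else.
    """
    count, inside = 0, False
    for row in mask:
        ink = bool(row[sample_x])
        if ink and not inside:
            count += 1  # entering a new bar
        inside = ink
    return count


def most_bars(images, sample_x):
    """Return (name, bar_count) for the image with the most bars.

    `images` maps a file name to its binary mask (hypothetical input format).
    """
    return max(
        ((name, count_bars(mask, sample_x)) for name, mask in images.items()),
        key=lambda item: item[1],
    )
```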
Another solution that could help me is to group the photos by the number of bars in their graphs. For example, all photos with 5 bars would go into one folder, all photos with 6 bars into another, and so on for every bar count. Then I could run a different action for each group of photos.
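Sorting files into folders by bar count could look like this in Python. `count_bars_in_file` is a hypothetical callable you would have to supply (for example, one that loads the image and samples a column next to the axis); the `*.png` pattern and folder names such as `5_bars` are assumptions, not anything Photoshop produces.

```python
import shutil
from pathlib import Path


def group_by_bar_count(src_dir, dst_dir, count_bars_in_file, pattern="*.png"):
    """Move every matching image into dst_dir/<N>_bars/ and return {name: N}.

    count_bars_in_file is a hypothetical callable: Path -> int.
    """
    dst = Path(dst_dir)
    moved = {}
    for path in sorted(Path(src_dir).glob(pattern)):
        n = count_bars_in_file(path)
        folder = dst / f"{n}_bars"
        folder.mkdir(parents=True, exist_ok=True)  # create the group folder once
        shutil.move(str(path), str(folder / path.name))
        moved[path.name] = n
    return moved
```

Each resulting folder can then be processed with the action recorded for that bar count.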
I provided a Script that worked for the sample images; maybe you need to start doing your own trouble-shooting?
I guess the easiest solution would be wrapping the operation in a try-clause – that way the files that don’t yield the intended results would not be processed but at least they should not stop the process for the other files.
By @c.pfaffenbichler
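The try-clause pattern is language-independent; in a Photoshop Script it would be a `try`/`catch` block in ExtendScript. A minimal Python sketch of the same pattern, where `process_one` is a hypothetical per-file function that raises an exception whenever it cannot isolate the values:

```python
def process_all(paths, process_one):
    """Run process_one on every path; failures are collected, not fatal."""
    done, failed = [], []
    for path in paths:
        try:
            process_one(path)
        except Exception as exc:  # broad on purpose: one bad image must not stop the batch
            failed.append((path, str(exc)))
        else:
            done.append(path)
    return done, failed
```

The `failed` list doubles as the to-do list for manual clean-up of the outliers.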
A script could do this with the following steps:
1) Select the white background using the Magic Wand Tool.
2) Invert the selection.
3) Expand the selection enough that every logo is fully enclosed, but not so much that the selections of the numerical values, bars, and logos merge. (The expand value could be set each time in the script.)
4) If the script can determine the left-to-right order of the selections, it should first clear from the first bar to the last bar, then go to the first logo and clear from the first logo to the last logo, leaving the remaining selections unchanged.
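The four steps can be sketched in Python on a grayscale image stored as a 2-D list (0 = black, 255 = white). This is not Photoshop's Magic Wand: step 1 is approximated by a simple brightness threshold (`white_threshold`, an assumed value), step 3 by a square dilation of `radius` pixels, and step 4 by keeping, within each vertical band, only the right-most expanded region, which is where the value sits in the sample charts. All function names are hypothetical. Too large a `radius` merges neighbouring rows or a logo with its value; too small a one splits a value into pieces, and this keep-the-right-most rule would then discard the left piece.

```python
from collections import deque


def ink_mask(gray, white_threshold=240):
    """Steps 1-2: select white background and invert, i.e. mark darker pixels as ink (1)."""
    return [[1 if v < white_threshold else 0 for v in row] for row in gray]


def dilate(mask, radius):
    """Step 3: expand the selection by `radius` pixels (square neighbourhood)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def regions(mask):
    """Split the expanded selection into 4-connected regions of (y, x) pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append(pixels)
    return found


def keep_rightmost_per_row(gray, radius, white_threshold=240):
    """Step 4 (sketch): clear every region that has another region to its
    right in the same vertical band; keep the right-most one (the value)."""
    ink = ink_mask(gray, white_threshold)
    boxes = []
    for pixels in regions(dilate(ink, radius)):
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        boxes.append((min(ys), max(ys), min(xs), max(xs), pixels))
    out = [[255] * len(row) for row in gray]  # start from a white canvas
    for top, bottom, _left, right, pixels in boxes:
        has_neighbour_right = any(
            t <= bottom and top <= b and l > right for t, b, l, _r, _p in boxes
        )
        if has_neighbour_right:
            continue  # a bar or logo: leave it cleared
        for y, x in pixels:
            if ink[y][x]:
                out[y][x] = gray[y][x]  # copy the original, un-expanded pixels
    return out
```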
I noticed that your script does »keep only right row«. Instead, we should remove the two left selected rows, because a number cannot always be selected automatically as one piece: the selection of a number may be split in two. The selections of logos and bars, however, are never split in two.
@abolfazl28627254vbil wrote:
I noticed that your script "keep only right row". we must remove two left selected row Because it is impossible to select a number automatically and the selection of a number may be divided into two categories. But the selection of logos and blocks is not two-devide
What is that supposed to mean?
Did you test your method on these actual sample images? Did it also work for these three photos? I don't understand how you use a Work Path to remove the bars and logos. If possible, please record a video of your steps and post it here.
You can't choose based on the height because the height of some logos may be exactly the same as the height of the numbers!
We can check on the height and width, or area etc.
Even if 5% are missed, this has to be better than nothing. The time saved on the bulk can be used on the outliers.
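Checking height, width, and area can be sketched as connected-component labeling followed by a bounding-box test (Python, binary mask with 1 = ink). The limits `min_h`, `max_h`, `max_w`, and `min_area` are assumptions that would have to be measured on each day's images. As noted above, height alone cannot separate a logo that is exactly as tall as the digits, which is why width and area are checked as well; small punctuation such as commas and decimal points would also fail a strict height test, so the limits need care.

```python
from collections import deque


def components(mask):
    """Label 8-connected ink regions of a binary mask (1 = ink)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            found.append(pixels)
    return found


def filter_by_size(mask, min_h, max_h, max_w=None, min_area=1):
    """Keep components whose bounding box fits the given limits; drop the rest."""
    out = [[0] * len(row) for row in mask]
    for pixels in components(mask):
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        height = max(ys) - min(ys) + 1
        width = max(xs) - min(xs) + 1
        if not (min_h <= height <= max_h):
            continue  # too short (dots, thin lines) or too tall
        if max_w is not None and width > max_w:
            continue  # too wide to be a single character, e.g. a bar
        if len(pixels) < min_area:
            continue  # too few pixels: noise
        for y, x in pixels:
            out[y][x] = 1
    return out
```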
It seems a Script could handle the sample images (at least those that I downloaded, I might have missed some), but I guess it would just be a question of time until the next image with different parameters is presented …
By @c.pfaffenbichler
Wow, it looks great.
Please check the following with your script:
1) Chart images whose numerical values have different units, such as $ or KM:
2) A graph where one of the values is zero, so no bar is drawn for it:
3) Does the font of the numerical values affect your script? And if some charts have no logos, only graph bars and numerical values, will that affect your script?
The only thing the script needs to do for all these images is keep the numerical values together with their units.