Participant
February 5, 2009
Question

How to extract text and image information from a PostScript file

  • 5 replies
  • 5067 views
I want to write a program in Java that extracts text and image information from a PostScript file. Is this possible? If so, how do I do the extraction?

Thanks!
This topic has been closed for replies.


_gplqy98_Author
Participant
February 7, 2009
To Dov: Thank you very much. I will read the PDF file in detail. Thank you!
_gplqy98_Author
Participant
February 6, 2009
To Helge: Thank you very much! I would also like to know how Ghostscript does these things. Is there any documentation that describes Ghostscript's internal methods and principles?
_gplqy98_Author
Participant
February 6, 2009
To Dov: Thank you very much! This is my first time dealing with PostScript files, and I still have some questions. So far I have opened the PostScript file as a text file, and I intend to scan it and extract the text and graphics information according to the rules of the PostScript language, but I'm not sure that this will work. You say that PostScript streams provide procedures with these functions; could you explain that in more detail? And is my idea of scanning the text file according to PostScript's rules right?
Dov Isaacs
Legend
February 6, 2009
First of all, PostScript is not a "text" file. It can and often does contain binary data. Since PostScript streams often contain nested procedures, unless you process the procedure definitions and can "execute" them, you cannot simply "scan" a file to get what you want. No, I can't talk about this in detail since it is quite complex. But Adobe does have the PostScript Language Reference Manual on-line for download at . Look that over and you will have a fairly healthy respect as to the task involved.

- Dov
- Dov Isaacs, former Adobe Principal Scientist (April 30, 1990 - May 30, 2021)
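
To illustrate the point about procedures, here is a minimal Java sketch (not from the thread) of the naive "scan the file" idea. The class name `NaiveShowScanner`, the regular expression, and both PostScript fragments are made up for this illustration. The scanner looks only for a string literal directly followed by `show`, so it finds the text in the first fragment and misses the same text in the second, where a user-defined procedure `S` does the drawing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a deliberately naive scanner, used only to show its limits.
final class NaiveShowScanner {
    // Matches a simple string literal immediately followed by "show", e.g. "(Hello) show".
    // It ignores nested or escaped parentheses, hex strings, and every other text operator.
    private static final Pattern LITERAL_SHOW =
            Pattern.compile("\\(([^()\\\\]*)\\)\\s+show\\b");

    static List<String> scan(String postScript) {
        List<String> found = new ArrayList<>();
        Matcher m = LITERAL_SHOW.matcher(postScript);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Text painted by calling the show operator directly.
        String direct =
                "/Helvetica findfont 12 scalefont setfont\n"
              + "72 720 moveto\n"
              + "(Hello) show\n"
              + "showpage\n";
        // The same page, but the text goes through a user-defined procedure S.
        String viaProcedure =
                "/S { moveto show } bind def\n"
              + "/Helvetica findfont 12 scalefont setfont\n"
              + "(Hello) 72 720 S\n"
              + "showpage\n";

        System.out.println(scan(direct));        // the literal is found
        System.out.println(scan(viaProcedure));  // nothing is found
    }
}
```

Real PostScript produced by applications typically wraps its drawing operators in procedures like this, often with generated names, which is why the file has to be interpreted rather than pattern-matched.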
Dov Isaacs
Legend
February 5, 2009
PostScript itself is a programming language. Although there are operators to express text, vector graphics, and raster images, most PostScript streams provide procedures with these functions embedded and as such, you cannot actually determine such data by simply scanning the PostScript file. You need to fully interpret the PostScript file or add procedures to such a PostScript file to isolate such text and graphics data when the PostScript program is run.

- Dov
- Dov Isaacs, former Adobe Principal Scientist (April 30, 1990 - May 30, 2021)
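
As a rough illustration of "adding procedures to such a PostScript file", the following Java sketch inserts a one-line PostScript prologue that redefines `show` so that each string is written to standard output before the original operator paints it. The class name `ShowTracer` and the prologue text are assumptions of this sketch, not something given in the thread. It covers only `show` (not `ashow`, `widthshow`, `xshow`, and so on), prints raw character codes without regard to font encoding, and treats the file as a Java `String`, which is only safe if the file is read with a byte-preserving charset such as ISO-8859-1.

```java
// Hypothetical helper: injects a tracing prologue into a PostScript program.
final class ShowTracer {
    // Assumed prologue: "dup =" prints a copy of the string followed by a newline,
    // then the original show operator is fetched from systemdict and executed.
    static final String PROLOGUE =
            "/show { dup = systemdict /show get exec } bind def\n";

    static String addPrologue(String postScript) {
        // Keep a leading "%!" header comment as the first line when one is present.
        if (postScript.startsWith("%!")) {
            int eol = postScript.indexOf('\n');
            if (eol >= 0) {
                return postScript.substring(0, eol + 1)
                        + PROLOGUE
                        + postScript.substring(eol + 1);
            }
        }
        // No header line (or no LF line ending found): put the prologue first.
        return PROLOGUE + postScript;
    }

    public static void main(String[] args) {
        String page = "%!PS\n"
                + "/Helvetica findfont 12 scalefont setfont\n"
                + "72 720 moveto (Hello) show\n"
                + "showpage\n";
        System.out.print(addPrologue(page));
    }
}
```

The modified program still has to be run through a PostScript interpreter; the strings appear on the interpreter's standard output as the page is executed.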
Participating Frequently
February 5, 2009
I'd rather use a PostScript interpreter to do this. Ghostscript is a good choice; look at the ps2ascii script that comes with Ghostscript to see how to do things like that.

Helge
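
Since the original question asks about Java, one way to follow this advice is to run Ghostscript as an external process. The sketch below is an assumption-laden example, not something from the thread: it assumes a Ghostscript build that includes the `txtwrite` output device (added in releases later than this thread) and an executable named `gs` on the PATH (on Windows the console executable is usually `gswin64c`). The class name `GhostscriptText` is hypothetical. It extracts text only; images are not handled.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: runs Ghostscript to extract the text of a PostScript file.
final class GhostscriptText {
    // Builds the command line; kept separate so it can be inspected without running gs.
    static List<String> command(String gsExecutable, Path input, Path output) {
        return List.of(
                gsExecutable,
                "-q",                 // suppress startup messages
                "-dSAFER",            // restrict file access from the PostScript program
                "-dBATCH",            // exit when the input has been processed
                "-dNOPAUSE",          // do not prompt between pages
                "-sDEVICE=txtwrite",  // assumed to be available in the installed build
                "-sOutputFile=" + output,
                input.toString());
    }

    // Runs Ghostscript and returns its exit code (0 normally means success).
    static int extract(String gsExecutable, Path input, Path output)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(gsExecutable, input, output))
                .inheritIO()          // show Ghostscript's own messages and errors
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java GhostscriptText.java input.ps output.txt");
            return;
        }
        int exit = extract("gs", Path.of(args[0]), Path.of(args[1]));
        System.out.println("Ghostscript exit code: " + exit);
    }
}
```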