Question

How to extract an image from PDF document and save on disk

May 27, 2019
3 replies
4883 views

I am exploring the SDK samples and found sample code that extracts image info using the PDDocEnumResources API, which calls a callback procedure with a Cos object. As the sample code shows, it is easy to extract the image info of an XObject (as in this screenshot), but how do I extract the image stream from the CosObj?


Test Screen Name

Legend

You must read streams linearly (from the start towards the end). You can call ASStmRead as often as you need, in a loop, each time returning the number of bytes read.
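A minimal sketch of that read loop in plain C. It uses fread as a stand-in for ASStmRead, since both take (buffer, item size, item count, stream) and return the number of items read, so it runs without the Acrobat SDK. The name read_all and the 4096-byte chunk size are illustrative assumptions, not SDK names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read up to `want` bytes from `fp`, front to back, in chunks.
   fread stands in for ASStmRead here (an assumption made so the
   sketch runs outside Acrobat). Returns the total number of bytes
   read, which is less than `want` if the stream ends early. */
static size_t read_all(FILE *fp, unsigned char *dst, size_t want)
{
    size_t total = 0;
    assert(fp != NULL && dst != NULL);
    while (total < want) {
        size_t chunk = want - total;
        size_t got;
        if (chunk > 4096) chunk = 4096;   /* arbitrary chunk size */
        got = fread(dst + total, 1, chunk, fp);
        if (got == 0) break;              /* end of stream or error */
        total += got;
    }
    return total;
}
```

In a plug-in, the same loop shape would presumably call ASStmRead on the ASStm returned by CosStreamOpenStm, accumulating the returned counts until the stream is exhausted.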

For an image stream, in any case, you need every byte of the data. You also need to open the stream with cosOpenFiltered, unless the data is DCTDecode and you want to treat it as a JPEG file.

Have you read the PDF Reference to understand the different image sample formats (1, 2, 4, 8 or 16 bits per component) and colour spaces you might encounter? This is not a small project. Rendering the page is an alternative, but this uses difficult APIs, and in decades of using the Acrobat SDK I have avoided them.
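To give a sense of what parsing the decoded sample data involves, here is a minimal sketch in plain C for the 1, 2, 4 and 8 bit cases. It assumes samples are packed high-order bit first, a single grayscale component, and the default Decode range. The names get_sample and scale_to_8bit are hypothetical helpers, not Acrobat SDK functions, and colour spaces, 16-bit samples and row padding are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Extract sample number `index` from a packed row in which every
   sample is `bits` wide (1, 2, 4 or 8), most significant bits first. */
static unsigned get_sample(const unsigned char *row, size_t index, unsigned bits)
{
    size_t bit_pos;
    unsigned byte, shift;
    assert(bits == 1 || bits == 2 || bits == 4 || bits == 8);
    bit_pos = index * bits;
    byte = row[bit_pos / 8];
    shift = 8 - bits - (unsigned)(bit_pos % 8);
    return (byte >> shift) & ((1u << bits) - 1u);
}

/* Scale a `bits`-wide sample to the 0..255 range, rounding to nearest. */
static unsigned char scale_to_8bit(unsigned sample, unsigned bits)
{
    unsigned max = (1u << bits) - 1u;
    return (unsigned char)((sample * 255u + max / 2u) / max);
}
```

Each image row starts on a byte boundary, so a real converter would call get_sample per row rather than treating the whole stream as one long run of samples.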

Test Screen Name

Legend

This MIGHT be the right object (a pretty small chance), but the usual approach is to start with the page and navigate recursively through the XObject and other resources to find images. A PDF contains streams for countless purposes.

As I noted though, it is not a JPEG nor any image file. You need to parse the image data and convert to the required format.

Thom Parker

Community Expert

The easy way is to purchase PDF CanOpener, which you'll need anyway if you are writing plug-ins.

COS Level Editor for PDF

You can extract the raw byte data from the stream with the CosStream functions.

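CosObj cosStmIn = ... your cos stream object...

// Length of the encoded (raw) stream data, in bytes
ASInt32 nEncodeLen = CosStreamLength(cosStmIn);

char* pBuff = (char*)ASmalloc(nEncodeLen);

// cosOpenRaw returns the bytes as stored, with no filters applied
ASStm stm = CosStreamOpenStm(cosStmIn,cosOpenRaw);

// Returns the number of bytes actually read
ASInt32 nRead = ASStmRead(pBuff,1,nEncodeLen,stm);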

ASStmClose(stm);

// save data to file

ASfree(pBuff);

This gets you the raw (encoded) data. Note that the encoding is "FlateDecode". This means it's basically a JPEG, so you can save the raw data with the ".jpg" extension and it should work.

Thom Parker - Software Developer at PDFScripting

Use the Acrobat JavaScript Reference early and often

Test Screen Name

Legend

Actually, I think you mean that DCTDecode is basically a JPEG. All the other formats are not directly usable; you have to decode them. A PDF doesn't just contain a bunch of convenient image files ready for use.
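One way to act on that distinction before writing a ".jpg" file is to check that the raw bytes actually start like a JPEG. The sketch below is plain C under stated assumptions: looks_like_jpeg and save_bytes are hypothetical helper names, and the check only tests for the JPEG start-of-image marker (FF D8) followed by another marker byte. It does not inspect the stream's Filter entry, which would be the more direct test inside a plug-in.

```c
#include <assert.h>
#include <stdio.h>

/* True if the buffer begins with the JPEG start-of-image marker
   (FF D8) followed by the FF that introduces the next marker. */
static int looks_like_jpeg(const unsigned char *data, size_t len)
{
    return len >= 3 && data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF;
}

/* Write `len` bytes to `path` in binary mode.
   Returns 0 on success, -1 on any failure. */
static int save_bytes(const char *path, const unsigned char *data, size_t len)
{
    FILE *fp;
    size_t written;
    assert(path != NULL && data != NULL);
    fp = fopen(path, "wb");
    if (fp == NULL) return -1;
    written = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0 || written != len) return -1;
    return 0;
}
```

A caller might save the raw buffer as a ".jpg" only when looks_like_jpeg returns true, and otherwise read the stream filtered and convert the samples. Even a DCTDecode stream saved this way may not display as expected, since the PDF's colour space and Decode settings are not carried into the file.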
