_Moveon21
Inspiring
May 27, 2019
Question

How to extract an image from PDF document and save on disk


I am exploring the SDK samples and found sample code that extracts image info using the PDDocEnumResources API, which calls a callback procedure with a CosObj. With the sample code it is easy to extract the image info of an XObject, as shown in this screenshot, but how do I extract the image stream from the CosObj?


3 replies

Legend
May 29, 2019

You must read streams linearly (from the start towards the end). You can call ASStmRead as often as you need, in a loop, each time returning the number of bytes read.

For an image stream, in any case, you need every byte of the data. You also need to open the stream with cosOpenFiltered, unless the image uses DCTDecode and you want the raw data so you can treat it as a JPEG file.
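The read-until-done loop described above can be sketched in plain C. This is only an illustration of the pattern, not SDK code: `FILE *` and `fread` stand in for `ASStm` and `ASStmRead`, and `read_all` is a hypothetical helper name. A growing buffer is used because the decoded size of a filtered stream is generally not known before reading it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a stream from its current position to the end
 * in chunks. FILE* and fread stand in for ASStm and ASStmRead.
 * Returns a malloc'd buffer (caller frees) and stores its length in *out_len.
 * Returns NULL on allocation failure or read error. */
unsigned char *read_all(FILE *stm, size_t *out_len)
{
    assert(stm != NULL && out_len != NULL);

    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Grow the buffer when it is full, so the next read has room. */
        if (len == cap) {
            unsigned char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        /* Each call returns the number of bytes it actually read. */
        size_t got = fread(buf + len, 1, cap - len, stm);
        len += got;
        if (got == 0)
            break; /* end of stream, or an error checked below */
    }

    if (ferror(stm)) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

In a plug-in the same shape would apply with the stream opened via CosStreamOpenStm, each read done with ASStmRead, and the loop ending when a read returns 0.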

Have you read the PDF Reference to understand the different image sample formats (1, 2, 4, 8, or 16 bits per component) and colour spaces you might encounter? This is not a small project. Rendering the image is an alternative, but that uses difficult APIs, and in decades of using the Acrobat SDK I have avoided them.
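To give a feel for what handling the sample formats involves, here is a minimal sketch that expands one row of packed samples to one byte per sample. It assumes the decoded data packs samples high-order bit first with each row starting on a byte boundary, as the PDF Reference describes, and it covers only the 1, 2, 4 and 8 bit cases. `unpack_row` is a hypothetical helper name. The values it produces are raw samples; what they mean still depends on the image's ColorSpace and Decode entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expands one row of packed image samples so that each
 * sample occupies one byte in out. Samples are read high-order bit first.
 * out must have room for n_samples bytes. */
void unpack_row(const uint8_t *row, size_t n_samples, int bits, uint8_t *out)
{
    assert(bits == 1 || bits == 2 || bits == 4 || bits == 8);

    unsigned mask = (1u << bits) - 1u;
    for (size_t i = 0; i < n_samples; i++) {
        size_t bit_pos = i * (size_t)bits;           /* offset of sample in row */
        unsigned shift = 8u - (unsigned)bits - (unsigned)(bit_pos % 8);
        out[i] = (uint8_t)((row[bit_pos / 8] >> shift) & mask);
    }
}
```

For example, the byte 0xA5 (binary 1010 0101) unpacks to the samples 2, 2, 1, 1 at 2 bits per sample and to 10, 5 at 4 bits per sample.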

Legend
May 28, 2019

This MIGHT be the right object (there is a pretty small chance), but the usual approach is to start with the page and navigate recursively through its XObject and other resources to find images. A PDF contains streams for countless purposes.

As I noted though, it is neither a JPEG nor any other image file. You need to parse the image data and convert it to the required format.

Thom Parker
Community Expert
May 27, 2019

The easy way is to purchase PDF CanOpener, which you'll need anyway if you are writing plug-ins.

COS Level Editor for PDF

You can extract the raw byte data from the stream with the CosStream functions.

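CosObj cosStmIn = ... your cos stream object...
// CosStreamLength gives the encoded (raw) length of the stream
ASInt32 nEncodeLen = CosStreamLength(cosStmIn);
char* pBuff = (char*)ASmalloc(nEncodeLen);
ASInt32 nTotal = 0, nLen;
ASStm stm = CosStreamOpenStm(cosStmIn, cosOpenRaw);
// ASStmRead may return fewer bytes than requested, so read in a loop
while (nTotal < nEncodeLen && (nLen = ASStmRead(pBuff + nTotal, 1, nEncodeLen - nTotal, stm)) > 0)
    nTotal += nLen;
ASStmClose(stm);
// save the nTotal bytes in pBuff to a file
ASfree(pBuff);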

This gets you the raw (encoded) data. Note that the encoding is "FlateDecode". This means it's basically a JPEG, so you can save the raw data with the ".jpg" extension and it should work.

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
Legend
May 27, 2019

Actually, I think you mean that DCTDecode is basically a JPEG. All the other formats are not directly usable; you have to decode them. A PDF doesn't just contain a bunch of convenient image files ready for use.
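One way to guard against saving non-JPEG data under a ".jpg" name is to check the first bytes of the raw stream before writing it. The sketch below is plain C rather than SDK code, and `looks_like_jpeg` and `save_if_jpeg` are hypothetical helper names. It relies only on the fact that a JPEG file begins with the start-of-image marker FF D8 followed by another marker byte FF. Checking the stream's Filter entry for DCTDecode would be the more direct test inside a plug-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if the buffer starts with the JPEG start-of-image marker
 * (FF D8) followed by the first byte of the next marker (FF), else 0. */
int looks_like_jpeg(const unsigned char *data, size_t len)
{
    return len >= 3 && data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF;
}

/* Hypothetical helper: writes the buffer to path only if it looks like a
 * JPEG. Returns 0 on success, -1 if the data is not a JPEG or writing fails. */
int save_if_jpeg(const unsigned char *data, size_t len, const char *path)
{
    assert(data != NULL && path != NULL);

    if (!looks_like_jpeg(data, len))
        return -1;

    FILE *fp = fopen(path, "wb"); /* binary mode: no newline translation */
    if (fp == NULL)
        return -1;

    size_t written = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0 || written != len)
        return -1;
    return 0;
}
```

Raw data from a FlateDecode stream typically starts with a zlib header instead, so it fails this check and is not written.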