How to extract an image from PDF document and save on disk

Participant ,
May 27, 2019

I am exploring the SDK samples and found sample code that extracts image info using the PDDocEnumResources API, which calls a callback procedure with a Cos object. As the sample code shows, it is easy to read the image info of an XObject (as in this screenshot), but how do I extract the image stream itself from the CosObj?

TOPICS
Acrobat SDK and JavaScript

Adobe Community Professional ,
May 27, 2019

The easy way is to purchase PDF CanOpener, which you'll need anyway if you are writing plug-ins.

COS Level Editor for PDF

You can extract the raw byte data from the stream with the CosStream functions.

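CosObj cosStmIn = ... your cos stream object...

// Length of the encoded (raw) stream data
ASInt32 nEncodeLen = CosStreamLength(cosStmIn);
char* pBuff = (char*)ASmalloc(nEncodeLen);
ASInt32 nTotal = 0;

// Open the stream without decoding and read the encoded bytes
ASStm stm = CosStreamOpenStm(cosStmIn, cosOpenRaw);
nTotal = ASStmRead(pBuff, 1, nEncodeLen, stm);   // number of bytes actually read
ASStmClose(stm);

// save nTotal bytes of data to file

ASfree(pBuff);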

This gets you the raw (encoded) data. Note that the encoding is "FlateDecode". This means it's basically a JPEG, so you can save the raw data with the ".jpg" extension and it should work.

Most Valuable Participant ,
May 27, 2019

Actually, I think you mean that DCTDecode is basically a JPEG. All the other formats are not directly usable; you have to decode them. A PDF doesn't just contain a bunch of convenient image files ready for use.
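
Before saving raw stream bytes with a .jpg extension, it may help to check that they actually look like JPEG data. Here is a minimal sketch in standard C, assuming the raw bytes are already in a buffer. `looks_like_jpeg` is a hypothetical helper, not an Acrobat SDK call; it relies only on the fact that JPEG data starts with the start-of-image marker bytes FF D8.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the Acrobat SDK): returns 1 if the
   buffer starts with the JPEG start-of-image marker 0xFF 0xD8, else 0. */
static int looks_like_jpeg(const unsigned char *data, size_t len)
{
    return len >= 2 && data[0] == 0xFF && data[1] == 0xD8;
}
```

If the check fails for a stream read with cosOpenRaw, the data is probably not DCTDecode and will not open as a .jpg file.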

Participant ,
May 28, 2019

Thanks for your reply,

I used the code snippet you provided and I am able to save the raw binary stream to a file on disk with the .jpg extension, but none of the image previewers can open the file; it appears to be corrupt. When I extract the same image with the custom extract-image tool, the size is completely different: 75 bytes vs 24 KB. It seems I am still doing something wrong. My code is below.

As you can see, I am using the PDDocEnumResources API to enumerate all XObjects in the PDF:

PDDocEnumResources (pdDoc, 0, 0, ASAtomFromString("XObject"), cosEnumProcCB, pdDoc);

I think I am using the wrong stream. Should I have fetched the image stream instead, which should be available under the CosObj as a key-value pair like the other attributes?

static ACCB1 ASBool ACCB2 CosEnumProc (CosObj obj, CosObj value, void* clientData)
{
   if ((CosObjGetType(obj) != CosStream) || (!CosDictKnownKeyString(CosStreamDict(obj), "Subtype")))
      E_RETURN(true);

   ASInt32 nEncodeLen = CosStreamLength(obj);
   char * pBuff = (char * ) ASmalloc(nEncodeLen);
   ASInt32 nTotal = 0, nLen;

   ASStm stm = CosStreamOpenStm(obj, cosOpenRaw);
   ASStmRead(pBuff, 1, nEncodeLen, stm);
   ASStmClose(stm);

   // save data to file
   ASFile asFile;
   char str[500] = "/Users/moveon/DATA/Shared/t/test.jpg";
   ASPathName asPathName = ASFileSysCreatePathFromPOSIXPath(NULL, str);
   ASInt32 iRet = ASFileSysOpenFile64(ASGetDefaultFileSys(), asPathName,
                                      ASFILE_CREATE | ASFILE_WRITE, & asFile);

   if (iRet != 0) {
      ASFileSysReleasePath(NULL, asPathName);
   }

   if (ASFileWrite(asFile, pBuff, strlen(pBuff)) != strlen(pBuff)) {
      ASFileSysReleasePath(NULL, asPathName);
      ASFileClose(asFile);
      ASRaise(PDDocErrorAlways(pdErrUnableToWrite));
   }

   ASfree(pBuff);
}
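
One thing that may explain a file much smaller than expected (an assumption, not a confirmed diagnosis): the write call above sizes the data with strlen(pBuff), and strlen stops at the first zero byte, which binary image data usually contains. A minimal standard C sketch of the difference follows; `length_as_c_string` is a hypothetical helper for illustration and the sample bytes are made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the length strlen() reports for a binary buffer.
   It counts bytes up to the first zero byte, so the buffer must contain one. */
static size_t length_as_c_string(const unsigned char *buf)
{
    return strlen((const char *)buf);
}

/* Made-up sample data: six bytes with a zero byte at index 2. */
static const unsigned char sample[6] = { 0xFF, 0xD8, 0x00, 0x10, 0x4A, 0x00 };
```

Here sizeof sample is 6 but length_as_c_string(sample) is 2, so a write sized with strlen would store only the first two bytes. Passing the byte count returned by ASStmRead to ASFileWrite avoids that.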

Adobe Community Professional ,
May 28, 2019

So Mr Test is correct. It's DCTDecode that is JPEG encoding; FlateDecode is different, which makes the issue very different.

There are no direct conversion functions for images in the Acrobat SDK, and with color space variations it's not easy even to extract the pixel data. Extracting the stream data will not help you. What you'll need to do is render the stream in a local graphics context; then you can save it as a bitmap using the regular system functions.

Participant ,
May 29, 2019

Thanks, Thom Parker & Test screen, for your replies.

Can you provide a code snippet or a reference link that shows how to render the stream in a local graphics context and save it as a bitmap?

In the code snippet that I shared above, I have found that

ASStmRead(pBuff, 1, nEncodeLen, stm);

reads only 75 bytes from the stream. Is there any way to seek in a stream? I ask because I have found this method:

CosStreamPos: Gets the byte offset of the start of a Cos stream's data in the PDF file (which is the byte offset of the beginning of the line following the stream token). Use this method to obtain the file location of any private data in a stream that you need to read directly rather than letting it pass through the normal Cos mechanisms. For example, this could apply to a QuickTime video embedded in a PDF file.

Do I need to seek to this offset before reading the actual raw data? If so, how do I seek in the stream?

Thanks in advance

Most Valuable Participant ,
May 28, 2019

This MIGHT be the right object (pretty small chance), but the usual thing is to start with the page and navigate recursively through the XObject and other resources to find images. A PDF contains streams for countless purposes.

As I noted though, it is not a JPEG nor any image file. You need to parse the image data and convert to the required format.

Most Valuable Participant ,
May 29, 2019

You must read streams linearly (from the start towards the end). You can call ASStmRead as often as you need, in a loop, each time returning the number of bytes read.
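
The loop pattern can be sketched in standard C. This is only an illustration under stated assumptions: `MemStream`, `mem_read` and `read_all` are hypothetical names, and the in-memory stream merely stands in for an ASStm by returning a limited number of bytes per call, the way a real stream may return short reads.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream standing in for an ASStm: each read
   returns at most max_per_read bytes. */
typedef struct {
    const unsigned char *data;
    size_t size;
    size_t pos;
    size_t max_per_read;
} MemStream;

/* Reads up to `want` bytes; returns the number actually read (0 at end). */
static size_t mem_read(MemStream *s, unsigned char *dst, size_t want)
{
    size_t left = s->size - s->pos;
    size_t n = want < left ? want : left;
    if (n > s->max_per_read)
        n = s->max_per_read;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Keeps reading until `capacity` bytes are collected or the stream ends.
   Returns the total number of bytes read. */
static size_t read_all(MemStream *s, unsigned char *dst, size_t capacity)
{
    size_t total = 0;
    size_t n;
    while (total < capacity &&
           (n = mem_read(s, dst + total, capacity - total)) > 0)
        total += n;
    return total;
}
```

With the SDK, the same shape would apply with ASStmRead in place of mem_read: advance the destination pointer by the bytes read so far and stop when a call returns 0.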

For an image stream, in any case, you need every byte of the data. You also need to use cosOpenFiltered unless you are reading DCTDecode to treat as a JPEG file.

Have you read the PDF Reference to understand the different image sample formats (1, 2, 4, 8 or 16 bits per component) and colour spaces you might encounter? This is not a small project. Rendering is an alternative, but it uses difficult APIs, and in decades of using the Acrobat SDK I have avoided them.
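
To give a feel for what parsing the decoded samples involves, here is a minimal standard C sketch. It assumes the data has already been decoded (for example, read with cosOpenFiltered) and uses one of the sample sizes the PDF Reference allows for images (1, 2, 4, 8 or 16 bits per component), packed most significant bit first with each row starting on a byte boundary. `row_stride` and `get_sample` are hypothetical helpers, and colour space handling is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per image row: rows are padded up to a whole number of bytes. */
static size_t row_stride(size_t width, unsigned components, unsigned bpc)
{
    return (width * components * bpc + 7) / 8;
}

/* Returns sample number `index` from one row of packed samples.
   Samples are stored most significant bit first; 16-bit samples are
   stored high byte first. */
static unsigned get_sample(const unsigned char *row, size_t index, unsigned bpc)
{
    assert(bpc == 1 || bpc == 2 || bpc == 4 || bpc == 8 || bpc == 16);
    if (bpc == 16)
        return ((unsigned)row[2 * index] << 8) | row[2 * index + 1];
    if (bpc == 8)
        return row[index];

    /* 1, 2 and 4 bit samples never straddle a byte boundary. */
    size_t bit = index * bpc;
    unsigned shift = 8 - bpc - (unsigned)(bit % 8);
    return (row[bit / 8] >> shift) & ((1u << bpc) - 1u);
}
```

For example, a 10-pixel-wide 1-bit grayscale row occupies row_stride(10, 1, 1) = 2 bytes, and the byte 0xA5 read at 4 bits per component yields the samples 0xA and 0x5.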
