Participant
June 2, 2025
Question

Problem with turning an OCR document into a PDF using


Hi! I am trying to test turning an OCR document into a searchable PDF. This works fine when I use the online tool with the same PDF, but I am trying to get it to work with the PDF Services API, and there the end result is an empty zip file. I think I have used the sample code verbatim, except that I replaced the keys with my own and an outside method passes in the file location instead of hard-coding it to "resources/ocrInput.pdf". It does not seem to matter whether I use OCR params or not; the result is the same: an empty zip file.

As a side note, I have tested another method that extracts text from a PDF, and it works; only the OCR job fails. I mention that because, as far as I can tell, the input and output file locations are not the problem. Below is my code. Any suggestions on where I can check or what I might be doing wrong? Is there logging in the API itself? Other suggestions? Many thanks in advance!
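On the "where can I check" question, one quick diagnostic that does not depend on the API's logging is to look at the first bytes of the saved output. PDF files begin with the ASCII header `%PDF-`, and ZIP archives begin with `PK` followed by `0x03 0x04` (or `0x05 0x06` for an archive with no entries), so the real content can be identified regardless of the file's extension. The sketch below assumes Node.js with TypeScript; `sniffFormat` and `sniffFile` are hypothetical helper names, not part of the PDF Services SDK.

```typescript
import * as fs from "fs";

// Formats we care about when inspecting PDF Services output (sketch only).
type OutputFormat = "pdf" | "zip" | "empty" | "unknown";

// Classifies content by its leading "magic" bytes, ignoring the file extension.
function sniffFormat(bytes: Buffer): OutputFormat {
    if (bytes.length === 0) return "empty";
    // Every PDF starts with the ASCII header "%PDF-".
    if (bytes.subarray(0, 5).toString("latin1") === "%PDF-") return "pdf";
    // ZIP archives start with "PK" then 0x03 0x04 (local file header)
    // or 0x05 0x06 (end-of-central-directory record of an empty archive).
    if (bytes.length >= 4 && bytes[0] === 0x50 && bytes[1] === 0x4b) {
        const isLocalHeader = bytes[2] === 0x03 && bytes[3] === 0x04;
        const isEmptyArchive = bytes[2] === 0x05 && bytes[3] === 0x06;
        if (isLocalHeader || isEmptyArchive) return "zip";
    }
    return "unknown";
}

// Reads only the first few bytes of a saved file and classifies them.
function sniffFile(filePath: string): OutputFormat {
    const fd = fs.openSync(filePath, "r");
    try {
        const header = Buffer.alloc(8);
        const bytesRead = fs.readSync(fd, header, 0, header.length, 0);
        return sniffFormat(header.subarray(0, bytesRead));
    } finally {
        fs.closeSync(fd);
    }
}
```

Calling `console.log(sniffFile(outputFilePath));` after the write has finished would show whether the "empty zip" is really a zero-byte file, a ZIP archive, or a PDF saved under a `.zip` name.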

// Imports at the top of the file
import * as fs from "fs";
import {
    ServicePrincipalCredentials,
    PDFServices,
    MimeType,
    OCRJob,
    OCRParams,
    OCRSupportedLocale,
    OCRSupportedType,
    OCRResult,
    SDKError,
    ServiceUsageError,
    ServiceApiError
} from "@adobe/pdfservices-node-sdk";

// Method inside my class (createOutputFilePath() is my own helper, not shown)
async ocrIntoPDF(filePath: string) {
    let readStream: fs.ReadStream | undefined;
    try {
        // Initial setup, create credentials instance
        const credentials = new ServicePrincipalCredentials({
            clientId: "xxxxx",
            clientSecret: "yyyyy"
        });

        // Creates a PDF Services instance
        const pdfServices = new PDFServices({credentials});

        // Creates an asset(s) from source file(s) and upload
        readStream = fs.createReadStream(filePath);
        const inputAsset = await pdfServices.upload({
            readStream,
            mimeType: MimeType.PDF
        });

        // Create parameters for the job
        const params = new OCRParams({
            ocrLocale: OCRSupportedLocale.EN_US,
            ocrType: OCRSupportedType.SEARCHABLE_IMAGE_EXACT
        });

        // Creates a new job instance
        const job = new OCRJob({inputAsset, params});

        // Submit the job and get the job result
        const pollingURL = await pdfServices.submit({job});
        const pdfServicesResponse = await pdfServices.getJobResult({
            pollingURL,
            resultType: OCRResult
        });

        // Get content from the resulting asset(s)
        const resultAsset = pdfServicesResponse.result.asset;
        const streamAsset = await pdfServices.getContent({asset: resultAsset});

        // Creates a write stream and copy stream asset's content to it
        const outputFilePath = this.createOutputFilePath();
        console.log(`Saving asset at ${outputFilePath}`);

        const writeStream = fs.createWriteStream(outputFilePath);
        streamAsset.readStream.pipe(writeStream);
    } catch (err) {
        // SDK-specific errors and any other error are logged the same way
        if (err instanceof SDKError || err instanceof ServiceUsageError || err instanceof ServiceApiError) {
            console.log("Exception encountered while executing operation", err);
        } else {
            console.log("Exception encountered while executing operation", err);
        }
    } finally {
        // Close the upload stream whether or not the job succeeded
        readStream?.destroy();
    }
}
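Separately from the output path, `streamAsset.readStream.pipe(writeStream)` in the code above does not wait for the copy to finish: `ocrIntoPDF` resolves while data may still be flowing into the file, and a write error would not reach the `catch` block. This is not necessarily the cause of the empty file here, but a minimal sketch of awaiting the copy with Node's `pipeline()` from `stream/promises` (Node 15+) looks like this; `saveStream` is a hypothetical helper name.

```typescript
import * as fs from "fs";
import { pipeline } from "stream/promises";

// Copies a readable stream to a file and resolves only after the file has
// been fully written and closed. pipeline() also destroys both streams and
// rejects if either side fails, so errors propagate to the caller's try/catch.
async function saveStream(source: NodeJS.ReadableStream, outputFilePath: string): Promise<void> {
    await pipeline(source, fs.createWriteStream(outputFilePath));
}
```

Assuming the SDK's `streamAsset.readStream` is an ordinary Node readable stream, the last two lines of the `try` block would become `await saveStream(streamAsset.readStream, outputFilePath);`.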


    Participant
    June 3, 2025

    I have this working now. The problem was that the samples' code for creating the output file differs between the text-extraction sample and the OCR sample. It appears that the Extract Text job always returns a zip file, while the OCR job returns a plain PDF.

    I found this by running the sample code directly, seeing that it worked, and then investigating how it differed from my code.
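    The fix amounts to giving the output file an extension that matches what each job returns: `.pdf` for OCR and `.zip` for Extract. A minimal sketch, assuming Node.js with TypeScript: `JobKind`, the `output/<kind>/` directory layout, and this `createOutputFilePath(kind)` signature are hypothetical, in the spirit of the `createOutputFilePath()` helper the question's code calls rather than the samples' exact code.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical job kinds for this sketch only.
type JobKind = "ocr" | "extract";

// Extension matching each job's result asset: the OCR job returns a PDF
// directly, while the Extract job returns a ZIP archive.
const OUTPUT_EXTENSION: Record<JobKind, string> = {
    ocr: ".pdf",
    extract: ".zip",
};

// Builds a timestamped path such as output/ocr/ocr<timestamp>.pdf and makes
// sure the target directory exists before a write stream opens the file.
function createOutputFilePath(kind: JobKind, baseDir: string = "output"): string {
    const dir = path.join(baseDir, kind);
    fs.mkdirSync(dir, { recursive: true });
    // ISO timestamp with ":" and "." replaced, since ":" is not valid in Windows file names.
    const stamp = new Date().toISOString().replace(/[:.]/g, "-");
    return path.join(dir, `${kind}${stamp}${OUTPUT_EXTENSION[kind]}`);
}
```

    In the OCR method the call would then be `const outputFilePath = createOutputFilePath("ocr");`, so the PDF the service returns is no longer saved under a `.zip` name.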