Chinese OCR failed - Node.js

Report · Dec 21, 2020

I tried to use the OCR operation of the PDF Tools API to recognize text in a Chinese PDF, but it failed.

I set the OCR locale to ZH_CN, following the 'OCR with explicit language' section of the documentation, like this:

const options = new PDFToolsSdk.OCR.options.OCROptions.Builder()
        .withOcrType(PDFToolsSdk.OCR.options.OCRSupportedType.SEARCHABLE_IMAGE_EXACT)
        .withOcrLang(PDFToolsSdk.OCR.options.OCRSupportedLocale.ZH_CN)
        .build();

When I run the code, the following error occurs:

2020-12-22T10:40:11.252:[DEBUG]: Resolved request uri : https://senseicore-ue1.adobe.io/services/v2/status/owFg2AYgawPrJCHU54YF7e4lWUj2teJV
Exception encountered while executing operation ServiceApiError: vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
    at E:\project\test\adobe-dc-pdf-tools-sdk-node-samples\node_modules\@adobe\documentservices-pdftools-node-sdk\src\internal\api\cpf-api.js:170:20
    at IncomingForm.<anonymous> (E:\project\test\adobe-dc-pdf-tools-sdk-node-samples\node_modules\formidable\lib\incoming_form.js:107:9)   
    at IncomingForm.emit (events.js:310:20)
    at IncomingForm._maybeEnd (E:\project\test\adobe-dc-pdf-tools-sdk-node-samples\node_modules\formidable\lib\incoming_form.js:557:8)     
    at JSONParser.parser.onEnd (E:\project\test\adobe-dc-pdf-tools-sdk-node-samples\node_modules\formidable\lib\incoming_form.js:532:10)   
    at JSONParser.end (E:\project\test\adobe-dc-pdf-tools-sdk-node-samples\node_modules\formidable\lib\json_parser.js:29:8)
    at IncomingMessage.<anonymous> (E:\project\test\adobe-dc-pdf-tools-sdk-node-samples\node_modules\formidable\lib\incoming_form.js:132:30)
    at IncomingMessage.emit (events.js:322:22)
    at endReadableNT (_stream_readable.js:1187:12)
    at processTicksAndRejections (internal/process/task_queues.js:84:21) {
  requestTrackingId: 'owFg2AYgawPrJCHU54YF7e4lWUj2teJV',
  statusCode: 500
}

The same error also occurs with OCRSupportedLocale.ZH_HK.

Please help me solve this problem. Thank you.

Report · Dec 22, 2020

Thanks for sharing the details. This is a known issue caused by the OCR language mapping in the SDK. We will update this thread once the issue is fixed. We appreciate your patience in the meantime.
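
For anyone else hitting the same failure while the fix is pending, it may help to log the two fields that appear in the error dump in this thread, requestTrackingId and statusCode, so they can be quoted in a report. The following is only a minimal sketch: describeOcrFailure is a hypothetical helper, not part of the SDK, and sampleError is a hand-built object that merely mimics the shape of the error shown above.

```javascript
// Hypothetical helper (NOT part of the PDF Tools SDK): builds a one-line
// summary from the fields visible in the error dump in this thread.
function describeOcrFailure(err, locale) {
  // Fall back to 'unknown' when a field is missing on the error object.
  const status = err && err.statusCode !== undefined ? err.statusCode : 'unknown';
  const trackingId = (err && err.requestTrackingId) || 'unknown';
  const message = (err && err.message) || String(err);
  return `OCR failed for locale ${locale}: ${message} ` +
         `(statusCode=${status}, requestTrackingId=${trackingId})`;
}

// Assumption: a hand-built stand-in shaped like the ServiceApiError above.
const sampleError = Object.assign(
  new Error('vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)'),
  { requestTrackingId: 'owFg2AYgawPrJCHU54YF7e4lWUj2teJV', statusCode: 500 }
);

console.log(describeOcrFailure(sampleError, 'ZH_CN'));
```

Assuming the OCR call returns a promise, the helper could be called from its rejection handler with the caught error and the locale that was requested.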
