Participant
January 18, 2024
Question

Very long execution time - PDF Extract API


Hi All

 

I am new to the PDF Extract API, and today I set up my first PHP script to convert PDF files. The API returns the necessary assetID, documentURI and jobID without errors in any of the cURL requests, and I get the 200 and 201 status codes I expect. However, the script never returns the JSON result file. I have written a loop with a sleep to monitor the status of the job, and the status of the extract job is written to a .log file like below. The job has been running for two hours now, which I guess is not normal. Has anyone had the same experience? If anyone could point me in the right direction, that would be great.

Here is the code used to check the status of the job:

while(!$jobCompleted) {
    // Step 4: Poll the job status
    $ch4 = curl_init($jobStatusUrl);
    curl_setopt($ch4, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch4, CURLOPT_HTTPHEADER, array(
        'Authorization: Bearer ' . $accessToken,
        'x-api-key: ' . $apiKey
    ));
    
    $response4 = curl_exec($ch4);
    $info4 = curl_getinfo($ch4);
    
    $logMessage = date('Y-m-d H:i:s') . " - Checking job status... ";  // Prepare log message
    
    if(curl_errno($ch4)) {
        $logMessage .= 'Curl error (Step 4): ' . curl_error($ch4) . "\n";
        file_put_contents($logFile, $logMessage, FILE_APPEND);  // Log the message
        break;
    } else {
        // Check HTTP status code
        if ($info4['http_code'] == 200) {
            // Parse the response
            $responseData4 = json_decode($response4, true);
            
            // Check if job is completed
            if (isset($responseData4['status']) && $responseData4['status'] == 'SUCCEEDED') {
                $jobCompleted = true;
                $downloadUrl = $responseData4['output']['href'];  // The API should provide the download URL in the response when the job is succeeded.
                $logMessage .= "Job completed. Download URL: " . $downloadUrl . "\n";
                file_put_contents($logFile, $logMessage, FILE_APPEND);  // Log the message
                break;
            } else {
                $logMessage .= "Job is still processing...\n";
                file_put_contents($logFile, $logMessage, FILE_APPEND);  // Log the message
            }
        } else {
            $logMessage .= "Failed to get the job status. HTTP Code: " . $info4['http_code'] . "\n";
            file_put_contents($logFile, $logMessage, FILE_APPEND);  // Log the message
            break;
        }
    }
    
    curl_close($ch4);
    
    // If job not completed, wait for a while before retrying
    if (!$jobCompleted) {
        sleep($retryAfterSeconds);
    }
}
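For reference, a loop like the one above only stops when the status matches one specific string, so it can run for hours if the API reports something else. Below is a minimal sketch of a bounded version, not a definitive implementation. The names `pollJob` and `$fetchStatus` are hypothetical, and the terminal status strings are passed in as a parameter because nothing in this thread shows which values the API actually returns.

```php
<?php
/**
 * Poll a job until it reports a terminal status or the attempt budget runs out.
 *
 * $fetchStatus is a hypothetical callable that performs ONE status request
 * and returns the decoded JSON body as an array (or null on failure).
 * $terminalStatuses lists the status strings that should end the loop;
 * the real values must come from the API's actual responses.
 */
function pollJob(callable $fetchStatus, array $terminalStatuses, int $maxAttempts = 60, int $delaySeconds = 1): array
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = $fetchStatus();
        $status = is_array($body) ? ($body['status'] ?? null) : null;

        // Stop as soon as any terminal status is seen, success or failure.
        if ($status !== null && in_array($status, $terminalStatuses, true)) {
            return ['attempts' => $attempt, 'status' => $status, 'body' => $body];
        }

        // Wait before the next attempt, but not after the last one.
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }

    throw new RuntimeException("Job did not reach a terminal status after {$maxAttempts} attempts");
}
```

The cURL request from the loop above would go inside the `$fetchStatus` callable, with `curl_close()` called before it returns so the handle is released on every path.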

1 reply

Raymond Camden
Community Manager
January 18, 2024

It should not take more than 5 or so seconds. When you look at the result of the job call, what do you see? Just share an example.

Participant
January 18, 2024

Thank you for getting back to me on this. Below you'll find data from my log file showing the cURL responses. I have replaced any keys and tokens with the value <dummy-value> in the logs.

 

2024-01-18 14:31:54 - Step 1 Response: HTTP Code: 200
2024-01-18 14:31:54 - Step 1 Response Body: {"uploadUri":"https://dcplatformstorageservice-prod-us-east-1.s3-accelerate.amazonaws.com/<dummy-value>%40techacct.adobe.com/<dummy-value>?X-Amz-Security-Token=<dummy-value>&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240118T143154Z&X-Amz-SignedHeaders=content-type%3Bhost&X-Amz-Expires=3600&X-Amz-Credential=ASIAWD2N7EVPKNW3NNPO%2F20240118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=<dummy-value>","assetID":"urn:<dummy-value>"}
2024-01-18 14:31:54 - Step 2 Response: HTTP Code: 200
2024-01-18 14:31:54 - Step 2 Response Info: Array
(
[url] => https://dcplatformstorageservice-prod-us-east-1.s3-accelerate.amazonaws.com/<dummy-value>
[content_type] =>
[http_code] => 200
[header_size] => 898
[request_size] => 1028
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.52912
[namelookup_time] => 0.013479
[connect_time] => 0.022402
[pretransfer_time] => 0.039754
[size_upload] => 337982
[size_download] => 0
[speed_download] => 0
[speed_upload] => 638907
[download_content_length] => 0
[upload_content_length] => 337982
[starttransfer_time] => 0.096888
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => 108.157.213.183
[certinfo] => Array
(
)

[primary_port] => 443
[local_ip] => 10.7.71.100
[local_port] => 39656
)

2024-01-18 14:31:55 - Step 3 Response: HTTP Code: 201
2024-01-18 14:31:55 - Step 3 Response Info: Array
(
[url] => https://pdf-services.adobe.io/operation/extractpdf
[content_type] =>
[http_code] => 201
[header_size] => 611
[request_size] => 1417
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.776256
[namelookup_time] => 1.7E-5
[connect_time] => 0.093126
[pretransfer_time] => 0.192063
[size_upload] => 219
[size_download] => 0
[speed_download] => 0
[speed_upload] => 282
[download_content_length] => -1
[upload_content_length] => 219
[starttransfer_time] => 0.77618
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => 44.198.86.118
[certinfo] => Array
(
)

[primary_port] => 443
[local_ip] => 10.7.71.100
[local_port] => 53172
)

2024-01-18 14:31:55 - Step 3 Response Headers: Array
(
[HTTP/1.1 201 Created] =>
[Server] => openresty
[Date] => Thu, 18 Jan 2024 14:31:55 GMT
[Transfer-Encoding] => chunked
[Connection] => keep-alive
[x-request-id] => <dummy-value>
[location] => https://pdf-services-ue1.adobe.io/operation/extractpdf/<dummy-value>/status
[retry-after] => 1
[Access-Control-Allow-Origin] => *
[Access-Control-Allow-Credentials] => true
[Access-Control-Expose-Headers] => *
[Access-Control-Max-Age] => 60
[Access-Control-Allow-Methods] => GET, POST, PUT, DELETE, OPTIONS
[Access-Control-Allow-Headers] => Authorization,Content-Type,X-Api-Key,User-Agent,If-Modified-Since,x-api-app-info
)

Raymond Camden
Community Manager
January 18, 2024

When you hit the job URL, you should be checking the JSON body of the result. Can you show _that_?
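One way to capture what is being asked for here is to write the raw body of each status response to the log, rather than only a "still processing" message. The helper below is a rough sketch under assumptions: `formatStatusLogLine` is a hypothetical name, not part of any Adobe SDK, and it makes no assumption about which status values the API returns. It only reports whatever `status` key, if any, is present in the body.

```php
<?php
/**
 * Build a log line that shows exactly what a status response contained.
 * Hypothetical helper for illustration only.
 *
 * $rawBody is the return value of curl_exec(), which may be false on failure.
 */
function formatStatusLogLine(int $httpCode, $rawBody): string
{
    $line = date('Y-m-d H:i:s') . " - HTTP {$httpCode} - ";

    if (!is_string($rawBody) || $rawBody === '') {
        return $line . "empty body\n";
    }

    $decoded = json_decode($rawBody, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // Keep the raw text so an HTML error page or truncated body is visible.
        return $line . 'non-JSON body: ' . $rawBody . "\n";
    }

    // Show the status value exactly as received, quotes included.
    $status = is_array($decoded) && isset($decoded['status'])
        ? var_export($decoded['status'], true)
        : '(no "status" key)';

    return $line . 'status=' . $status . ' body=' . $rawBody . "\n";
}
```

Inside the polling loop from the question, this could be called as `file_put_contents($logFile, formatStatusLogLine($info4['http_code'], $response4), FILE_APPEND);`. Any tokens or signed URLs in the logged body should be redacted before it is shared.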