Hi,
I'm trying to extract data from a PDF using the code in extract_txt_table_info_with_figure_tables_rendition_from_pdf.py as a template.
It works fine and manages to download the file to /tmp/sdk_result/, but then breaks on result.save_as(output_file_path) with the following error:
INFO:adobe.pdfservices.operation.internal.io.file_ref_impl:Moving file at /tmp/sdk_result/dbe74a20abb911ed8aaf57ef4efd14e7.zip to target /mnt/batch/tasks/shared/LS_root/mounts/clusters/<pathinfo>/1990965948.zip
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Input In [19], in <cell line: 28>()
     79 output_file_name = name[:-3] + "zip"
     80 output_file_path = output_path + output_file_name
---> 81 result.save_as(output_file_path)
     83 except (ServiceApiException, ServiceUsageException, SdkException):
     84 logging.exception("Exception encountered while executing operation")
File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/adobe/pdfservices/operation/internal/io/file_ref_impl.py:48, in FileRefImpl.save_as(self, destination_file_path)
     46 os.mkdir(dir)
     47 if not os.path.exists(abs_path):
---> 48 os.rename(self._file_path, abs_path)
     49 return
     50 raise SdkException("Output file {file} exists".format(file=destination_file_path))
OSError: [Errno 18] Invalid cross-device link: '/tmp/sdk_result/dbe74a20abb911ed8aaf57ef4efd14e7.zip' -> '/mnt/batch/tasks/shared/LS_root/mounts/clusters/<pathinfo>/1990965948.zip'
It looks to me as though the /tmp/sdk_result/ folder is not on the same filesystem as the user folders, so os.rename throws an error (it cannot move a file from one mounted device to another, which is what Errno 18 reports). If you search the forum, several other people have encountered the same problem in different environments but, other than running everything on local C: (not an option!), there appears to be no resolution.
My workaround has been to set the output path to '/tmp/sdk_result/filename.zip' and then use shutil.copyfile to copy the file to where I want it. This is obviously not ideal, though, and it's not unusual for temporary storage to sit on a different volume from the data.
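For anyone hitting the same thing, the workaround can be sketched roughly as below. This is only a minimal sketch: move_result is a hypothetical helper name of my own, not part of the SDK, and it uses shutil.move rather than shutil.copyfile so that the temporary copy is cleaned up as well. The SDK call itself is only shown in a comment.

```python
import os
import shutil


def move_result(tmp_path: str, final_path: str) -> str:
    """Move a file whose target may be on a different filesystem.

    move_result is a hypothetical helper, not an SDK function.
    """
    # Make sure the destination directory exists before moving.
    os.makedirs(os.path.dirname(os.path.abspath(final_path)), exist_ok=True)
    # shutil.move tries os.rename first and falls back to copying the file
    # and deleting the source when the two paths are on different filesystems.
    return shutil.move(tmp_path, final_path)


# Intended use (assumption: save_as succeeds when the target is inside
# the same temporary folder the SDK downloaded to):
#   result.save_as("/tmp/sdk_result/filename.zip")
#   move_result("/tmp/sdk_result/filename.zip", output_file_path)
```

The only point of the helper is that the cross-device move happens in user code, where shutil can handle it.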
Please can you suggest a better workaround (or change the library so that it works in cloud environments)? Simply being able to specify where the temporary data are stored would be sufficient (or avoiding using os.rename in the library at all).
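To illustrate the second suggestion, here is a rough sketch of how a move could fall back to copying when os.rename fails across devices. It is not the library's code and I am not assuming anything about how the maintainers would implement it; rename_or_copy is a hypothetical name used only for this example.

```python
import errno
import os
import shutil


def rename_or_copy(src: str, dst: str) -> None:
    """Rename src to dst, copying instead if they are on different devices.

    Hypothetical helper for illustration only.
    """
    try:
        os.rename(src, dst)
    except OSError as exc:
        # EXDEV is the "Invalid cross-device link" error (Errno 18 on Linux).
        if exc.errno != errno.EXDEV:
            raise
        # Copy contents and metadata, then remove the original.
        shutil.copy2(src, dst)
        os.remove(src)
```

On a single filesystem this behaves like a plain os.rename; the copy path is only taken when the rename is rejected with EXDEV.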
Thanks, James.