Hi,
Not sure if anybody uses this community forum anymore, as most posts are quite old. I'll try to get help anyway.
For setup, I followed these steps:
1) Downloaded my credentials under the Adobe PDF Extract API Free Tier
2) Added separate user environment variables for client_id, client_secret, and organization_id
3) Created and activated a virtual environment with Python 3.8
4) Installed pdfservices_sdk with pip install, and all other requirements with pip install -r requirements.txt
5) Tried running the following script:
import os
from pdfservices_sdk.pdfservices import PDFServices
from pdfservices_sdk.auth import Credentials
# Access environment variables
client_id = os.environ.get('adobe_pdf_extract_client_id')
client_secret = os.environ.get('adobe_pdf_extract_client_secret')
organization_id = os.environ.get('adobe_pdf_extract_organization_id')
# Initialize SDK with credentials from environment variables
credentials = Credentials.service_account_credentials_builder()\
    .with_client_id(client_id)\
    .with_client_secret(client_secret)\
    .with_organization_id(organization_id)\
    .build()
pdf_services = PDFServices(credentials=credentials)
# Directory containing PDF files
pdf_dir = "C:\\Users\\hamza\\Downloads\\CODING\\Adobe_pdf_extract\\adobe-dc-pdf-services-sdk-python\\ALL"
output_dir = "C:\\Users\\hamza\\Downloads\\CODING\\Adobe_pdf_extract\\adobe-dc-pdf-services-sdk-python\\ALL\\retrieved_text_from_pdfs"
# Ensure output directory exists
os.makedirs(output_dir, exist_ok=True)
# Process each PDF in the directory
for pdf_file in os.listdir(pdf_dir):
    if pdf_file.endswith(".pdf"):
        input_pdf_path = os.path.join(pdf_dir, pdf_file)
        output_json_path = os.path.join(output_dir, pdf_file.replace(".pdf", "_text.json"))
        # Call the Extract API
        extract_options = pdf_services.extract_pdf_options_builder().add_element_to_extract("text").build()
        operation = pdf_services.create_extract_pdf_operation(extract_options)
        operation.add_input(input_pdf_path)
        # Execute the operation and save the output
        result = operation.execute()
        with open(output_json_path, "wb") as f:
            f.write(result)
        print(f"Text extracted from {pdf_file} and saved to {output_json_path}")
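One thing worth checking in the script regardless of the import error: os.environ.get returns None when a variable is missing, so an unset credential would only show up later as a confusing authentication failure. Below is a minimal sketch that fails fast instead, assuming the three variable names used in the script above. read_required_env and REQUIRED_VARS are hypothetical helpers, not part of the SDK. On Windows, user variables added through System Properties are generally only visible to terminals or IDEs started after the change.

```python
import os

# Variable names as used in the script above (step 2 of the setup).
REQUIRED_VARS = (
    "adobe_pdf_extract_client_id",
    "adobe_pdf_extract_client_secret",
    "adobe_pdf_extract_organization_id",
)


def read_required_env(names, environ=os.environ):
    """Hypothetical helper: return the requested variables, raising if any are missing or empty."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

Calling read_required_env(REQUIRED_VARS) near the top of the script would stop with a clear message before any SDK call is made.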
However, I consistently get ModuleNotFoundError: No module named 'pdfservices_sdk'. I tried reinstalling it and switching to Python 3.11, but I still get the same error.
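A ModuleNotFoundError after a seemingly successful pip install often means pip installed the package for a different interpreter than the one running the script (for example, an IDE still pointed at the system Python rather than the virtual environment). The diagnostic sketch below can be run the same way as the script (same terminal or same IDE run button). interpreter_report is a hypothetical helper, and checking the name adobe is an assumption based on Adobe's Python samples, which import the SDK from an adobe.pdfservices namespace rather than pdfservices_sdk; treat it as something to verify.

```python
import importlib.util
import sys


def interpreter_report(module_name):
    """Hypothetical helper: describe the running interpreter and whether it can find a top-level module."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix points at the base install.
    in_venv = sys.prefix != sys.base_prefix
    # For a top-level name, find_spec returns None when the module cannot be located.
    found = importlib.util.find_spec(module_name) is not None
    return {
        "executable": sys.executable,
        "in_venv": in_venv,
        "module": module_name,
        "found": found,
    }


for name in ("pdfservices_sdk", "adobe"):
    print(interpreter_report(name))
```

If executable is not the venv's python, or in_venv is False, running python -m pip install pdfservices-sdk with that exact interpreter keeps pip and the script pointed at the same environment.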
Can someone tell me what I am missing here?
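The pip package name and the importable module name do not have to match (for example, pip install beautifulsoup4 is imported as bs4). The sketch below asks the installed distribution which top-level names it actually ships, using only the standard library importlib.metadata (available since Python 3.8). import_names_for is a hypothetical helper, and "pdfservices-sdk" assumes that is the name pip show reports for the package. It reads top_level.txt when the wheel records one, otherwise it falls back to the installed file list.

```python
import sys
from importlib import metadata


def import_names_for(dist_name):
    """Hypothetical helper: return the top-level importable names shipped by an installed distribution."""
    # Raises metadata.PackageNotFoundError if this interpreter has no such distribution.
    dist = metadata.distribution(dist_name)

    # Many setuptools-built wheels list their top-level packages in top_level.txt.
    top_level = dist.read_text("top_level.txt")
    if top_level:
        return sorted({line.strip() for line in top_level.splitlines() if line.strip()})

    # Fallback: use the first path component of each installed .py file.
    names = set()
    for path in dist.files or []:
        if path.suffix != ".py":
            continue
        top = path.parts[0]
        # Skip files installed outside site-packages (e.g. scripts) and metadata directories.
        if top == ".." or top.endswith(".dist-info"):
            continue
        names.add(top[:-3] if top.endswith(".py") else top)
    return sorted(names)


try:
    print(import_names_for("pdfservices-sdk"))
except metadata.PackageNotFoundError:
    print("pdfservices-sdk is not installed for", sys.executable)
```

Whatever names this prints are what the from ... import ... lines in the script need to start with.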