yevheniiy81221888
Known Participant
November 12, 2019
Answered

Acrobat/Reader: CosStringValue returns garbage when the file name contains Russian letters


Hello, I am trying to retrieve an attachment's name, but when the name contains Russian letters, the Adobe functions return garbage.

Usage:

ACCB1 ASBool ACCB2 AdobePluginHelper::GetEmbeddedFiles(CosObj obj, CosObj value, void *clientData)
{
    auto adobe = CAdobePlugin::GetAdobeMethods();
    auto attachedFiles = (vector<string>*)clientData;

    PDFileAttachment fileAttachment = adobe->PdFileAttachmentFromCosObj(value);

    // Grab the file name from the Cos dictionary using the file specification string key "F".
    ASTCount len = 0;
    std::string sFileName(adobe->CosStringValueFromCosObject(adobe->CosDictObjGet(value, adobe->AtomFromString("F")), &len));

    if (!sFileName.empty())
    {
        attachedFiles->push_back(sFileName);
    }

    return true;
}

void AdobePluginHelper::GetAttachedFiles(PDDoc document, vector<string>& attachedFiles)
{
    auto adobe = CAdobePlugin::GetAdobeMethods();

    PDNameTree nameTree = adobe->PdDocGetNameTree(document, adobe->AtomFromString("EmbeddedFiles"));
    
    if (adobe->PdNameTreeIsValid(nameTree))
    {
        //Apply the enum function to the nametree so it can iterate through, extracting the attachments.
        adobe->PdNameTreeEnum(nameTree, &GetEmbeddedFiles, &attachedFiles);
    }
}

char* Implementation::Adobe::CosStringValueFromCosObject(CosObj obj, ASTCount* nBytes)
{
    return CosStringValue(obj, nBytes);
}

CosObj Implementation::Adobe::CosDictObjGet(CosObj dict, ASAtom key)
{
    return CosDictGet(dict, key);
}

Original:

In other cases it works fine.

Correct answer by Karl Heinz Kremer

Why are you not using PDFileAttachmentGetFileName() on the PDFileAttachment object? It returns an ASText object, which you can then convert using the correct encoding.

2 replies

Legend
November 12, 2019

Also: a Cos string will contain text in PDFDocEncoding, which has no Cyrillic in its character table, so it would be impossible to see Cyrillic that way. More likely the string is escaped UCS-2 Unicode, as described under “PDF strings” in the PDF Reference. But I agree: where there is an ASText API, that will be easier.
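
To illustrate the point about Unicode text strings, here is a minimal sketch that does not use the SDK at all. It assumes the raw bytes follow the usual PDF text string convention: a leading FE FF byte-order mark means UTF-16BE, anything else is PDFDocEncoding. The helper names `AppendUtf8` and `PdfTextStringToUtf8` are hypothetical, and the fallback branch simply treats bytes as Latin-1, which only approximates PDFDocEncoding (the two differ for some code points, for example in the 0x80-0x9F range).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to a UTF-8 encoded string.
static void AppendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert the raw bytes of a PDF text string to UTF-8.
// 'data' and 'len' stand for the pointer and byte count returned for the string.
std::string PdfTextStringToUtf8(const char* data, std::size_t len)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
    std::string out;

    if (len >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        // UTF-16BE after the byte-order mark: two bytes per code unit.
        for (std::size_t i = 2; i + 1 < len; i += 2) {
            std::uint32_t cp = (static_cast<std::uint32_t>(p[i]) << 8) | p[i + 1];
            // Combine a high surrogate with the following low surrogate, if present.
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 3 < len) {
                std::uint32_t low = (static_cast<std::uint32_t>(p[i + 2]) << 8) | p[i + 3];
                if (low >= 0xDC00 && low <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                    i += 2;
                }
            }
            AppendUtf8(out, cp);
        }
        return out;
    }

    // Assumption: no BOM means PDFDocEncoding; approximated here as Latin-1.
    for (std::size_t i = 0; i < len; ++i) {
        AppendUtf8(out, p[i]);
    }
    return out;
}
```

One detail worth noting: constructing a `std::string` from a bare `char*` stops at the first zero byte, and UTF-16BE text contains zero bytes for every ASCII character (such as the dot in a file extension). If the raw bytes are used at all, the byte count should be passed along, as in `PdfTextStringToUtf8(ptr, len)`.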


yevheniiy81221888
Known Participant
November 12, 2019

Thanks, I didn't know about that function.

Karl Heinz Kremer
Community Expert
November 12, 2019

Take a look at the sample code that comes with the SDK to see how to convert an ASText object, using one of the encodings that Acrobat supports, to a string that you can then actually use. The Acrobat plug-in API is pretty complex; one reason for that is that there are several abstraction levels in the SDK. You don't have to work at the Cos level in most cases, because the SDK provides a higher-level interface (e.g. the PDFileAttachment object), which is usually much easier to use. Good luck!