
Adobe Reader XI & iFilter indexing

Participant, Oct 22, 2012

In Adobe Reader X, Adobe removed the ability for the Windows Indexing Service to index PDF files through the iFilter. To get around this, you had to install Adobe Reader 9.x on 32-bit systems or the iFilter 9 on 64-bit systems.

Does anyone know if Adobe Reader XI follows the same stance of not exposing its iFilters to the Windows Indexing Service? I've done some searching, but I have not been able to find an answer.

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines

1 Correct answer: New Here, Jan 09, 2013

Adobe Employee, Oct 23, 2012

Hi,

IFilters for the Windows Indexing Service have been added back in Reader XI. IFilters are also available with the latest updates of Reader X (they were added back in 10.1). These are 32-bit IFilters and only work on 32-bit platforms.

Thanks

Manish

Participant, Oct 23, 2012

I know there were issues with the iFilter being exposed to the Windows Search Indexer in 10.1; have those been corrected in 11? We've been relying on 9.x for PDF indexing because of the 10 fiasco.

New Here, Jan 09, 2013

Participant, Jan 09, 2013

Wonderful, thank you very much!

New Here, Jan 10, 2013

Leith, have you installed this filter? I just did (we are having serious problems with PDF search yet again in Document Exchange), and it seemed to make matters worse, not better.

Kate

Participant, Jan 10, 2013

Kate Carrillo wrote:

Leith, have you installed this filter? I just did (we are having serious problems with PDF search yet again in Document Exchange), and it seemed to make matters worse, not better.

Kate

No, I have not. The older process is still working for me, so I've continued to use it. I was not going to upgrade the filter driver until we moved to a new system. Our Document Exchange systems are still using either the iFilter 9 on 64-bit systems or Reader 9.5.3 on 32-bit systems.

New Here, Jan 14, 2013

Hi Leith,

Just proceed with caution and a lot of testing. I installed the latest version and it actually made search with Document Exchange worse; we had to roll back.

Kate

Community Beginner, Jan 14, 2013

Hi Kate,

What is the configuration of your Document Exchange server?

Can you provide the flavor of Windows OS and the version of the Document Exchange server?

Thanks and regards,

Abhijit

New Here, Jan 16, 2013

Hi Abhijit,

Windows Server 2008 R2

Document Exchange 6.1.1

Regards,

Kate

Community Beginner, Jan 17, 2013

Hi Kate

PDF indexing on Document Exchange 6.1.1 is not supported by PDF iFilter 64 11.0.01 (please refer to the System Requirements section at http://www.adobe.com/support/downloads/detail.jsp?ftpID=5542).

Thanks and regards

Abhijit

Participant, Jan 17, 2013

What changed between v9 and v11 in how indexing is done and how iFilters are exposed that makes it unsupported?

Document Exchange is a DotNetNuke module that uses the Lucene indexing service to hook into iFilters to then aggregate data.  I wrote a useful How To regarding it and PDF indexing on their website.

http://www.bring2mind.net/Support/Forums/tabid/143/aff/21/aft/8578/afv/topic/Default.aspx

Maybe a better question is this.  How are we supposed to be indexing PDF files then?

Community Beginner, Jan 17, 2013

Hi Leith

This is a known issue, which was introduced as part of IFilter sandboxing in Adobe Reader and Acrobat X.

Microsoft defines two types of interfaces that a standard IFilter should implement: one is a file-based interface and the other is a stream-based interface. The file-based interface loads our IFilter in a non-sandboxed mode, which is a security issue, so we stopped supporting the file-based interface as part of IFilter sandboxing in X. Windows Search uses the stream-based interface and runs in a sandboxed mode, which we support.

Thanks and regards

Abhijit

New Here, Mar 01, 2013

How to use the Acrobat Reader iFilter:

1. Assign the process to a job (aka sandbox it):

     hProc = GetCurrentProcess();
     hJob = CreateJobObjectW(NULL, L"filterProc");
     AssignProcessToJobObject(hJob, hProc);

2. Look up the CLSID for the Acrobat Reader iFilter ("PDF Filter"). You need to search the registry for the correct CLSID.

3. Create an instance of the CLSID using IID_IFilter: CoCreateInstance(CLSID_IPDF, NULL, CLSCTX_INPROC_SERVER, IID_IFilter, (LPVOID*) &iFilter);

4. Open Interface for IPersiststream: iFilter->QueryInterface(IID_IPersistFile, (void **) &iPersistStream)

5. Open the PDF file using a stream: SHCreateStreamOnFile(szPDFFileName, STGM_READ, &iStream);

6. Load the stream into the IPersistStream interface: iPersistStream->Load(iStream);

7. Initialize the IFilter:

     dwFlags = IFILTER_FLAGS_OLE_PROPERTIES;
     hr = iFilter->Init(IFILTER_INIT_CANON_SPACES |
                        IFILTER_INIT_SEARCH_LINKS |
                        IFILTER_INIT_INDEXING_ONLY |
                        IFILTER_INIT_APPLY_INDEX_ATTRIBUTES |
                        IFILTER_INIT_APPLY_OTHER_ATTRIBUTES,
                        0,
                        NULL,
                        &dwFlags);

8. Proceed with the standard process to get the data out (i.e. iFilter->GetChunk(), iFilter->GetText(), iFilter->GetValue(), etc.)
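For step 8, the usual pattern is an outer GetChunk() loop with an inner GetText() loop. Below is a minimal C++ sketch of that pattern, not Adobe's reference code. ExtractText, FakeStat, FakeFilter and the k-prefixed constants are names made up for this illustration. The constants are assumed to mirror FILTER_E_END_OF_CHUNKS, FILTER_E_NO_MORE_TEXT, FILTER_S_LAST_TEXT and CHUNK_TEXT from the Windows SDK headers, so check them against your SDK. The in-memory FakeFilter exists only so the loop can be tried without Windows or the Adobe filter.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Stand-ins (assumed values) for codes that the Windows SDK defines.
constexpr std::int32_t kOk          = 0;                                      // S_OK
constexpr std::int32_t kEndOfChunks = static_cast<std::int32_t>(0x80041700);  // FILTER_E_END_OF_CHUNKS
constexpr std::int32_t kNoMoreText  = static_cast<std::int32_t>(0x80041701);  // FILTER_E_NO_MORE_TEXT
constexpr std::int32_t kLastText    = 0x00041709;                             // FILTER_S_LAST_TEXT
constexpr unsigned     kChunkText   = 0x1;                                    // CHUNK_TEXT

// Collects the text of every text chunk, one line per chunk.
// Stat needs a `flags` member; Filter needs GetChunk(Stat*) and
// GetText(unsigned long*, wchar_t*), which is the shape IFilter has.
template <class Stat, class Filter>
std::wstring ExtractText(Filter& filter) {
    std::wstring out;
    std::vector<wchar_t> buf(4096);
    for (;;) {
        Stat stat{};
        const auto chunkHr = static_cast<std::int32_t>(filter.GetChunk(&stat));
        if (chunkHr < 0) break;                        // end of chunks, or an error
        if ((stat.flags & kChunkText) == 0) continue;  // value chunk: skipped here
        for (;;) {
            unsigned long count = static_cast<unsigned long>(buf.size());
            const auto textHr =
                static_cast<std::int32_t>(filter.GetText(&count, buf.data()));
            if (textHr < 0) break;                     // no more text in this chunk
            out.append(buf.data(), count);
            if (textHr == kLastText) break;            // filter flagged the final piece
        }
        out.push_back(L'\n');
    }
    return out;
}

// In-memory stand-in, used only to exercise the loop.
struct FakeStat { unsigned flags = 0; };

struct FakeFilter {
    std::vector<std::pair<unsigned, std::wstring>> chunks;  // (flags, text)
    std::size_t next = 0;    // index of the next chunk to hand out
    std::size_t offset = 0;  // read position inside the current chunk

    std::int32_t GetChunk(FakeStat* stat) {
        if (next >= chunks.size()) return kEndOfChunks;
        stat->flags = chunks[next++].first;
        offset = 0;
        return kOk;
    }
    std::int32_t GetText(unsigned long* count, wchar_t* buffer) {
        const std::wstring& text = chunks[next - 1].second;
        if (offset >= text.size()) return kNoMoreText;
        const std::size_t n = std::min<std::size_t>(*count, text.size() - offset);
        std::copy_n(text.data() + offset, n, buffer);
        offset += n;
        *count = static_cast<unsigned long>(n);
        return kOk;
    }
};
```

On Windows the same template should accept the real interface, for example ExtractText<STAT_CHUNK>(*iFilter) once Init() has succeeded. Value chunks would additionally need GetValue(), and a stricter caller may want to tell a plain end of chunks apart from a real error instead of stopping on both.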

Guest, Jul 23, 2013

I tried this but didn't manage to get it to work. I've removed the result checks for clarity.

    HRESULT hr;
    if ( FAILED( ::CoInitialize( NULL ) ) )
    {
        return -1;
    }

    HANDLE hProc = GetCurrentProcess();
    HANDLE hJob = CreateJobObjectW(NULL, L"filterProc");
    BOOL bAssigned = AssignProcessToJobObject(hJob, hProc);

    CComQIPtr<IFilter> pFilter;
    LPWSTR guidstr = L"{E8978DA6-047F-4E3D-9C78-CDBE46041603}";
    GUID guid;
    hr = CLSIDFromString(guidstr, (LPCLSID)&guid);
    hr = CoCreateInstance(guid, NULL, CLSCTX_INPROC_SERVER, IID_IFilter, (LPVOID*) &pFilter);

    IPersistStream *pPersistStream;
    hr = pFilter->QueryInterface(IID_IPersistFile, (void **) &pPersistStream);

    IStream *pStream;
    hr = SHCreateStreamOnFile(L"c:\\SVNWORK\\moo.pdf", STGM_READ, &pStream);
    hr = pPersistStream->Load(pStream);

The error occurs when calling Load on pPersistStream:

Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call.  This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.

LEGEND, Jul 23, 2013

Since this problem is before the iFilter is used, you might have more luck in a Microsoft dev forum.

Guest, Jul 23, 2013

This seems to be related to the iFilter's Load(IStream) implementation. Load(IPersistFile) doesn't work, and the method described above doesn't work either. I think it would be invaluable to everyone to get a simple working example of how to properly (from Adobe's point of view) open and read text from a PDF programmatically.

It doesn't have to be C++.

Guest, Jul 23, 2013

I think I have found the problem.

The original instructions say:

4. Open Interface for IPersiststream: iFilter->QueryInterface(IID_IPersistFile, (void **) &iPersistStream)

However, this is wrong. It needs to be:

iFilter->QueryInterface(IID_IPersistStream, (void **) &iPersistStream)

LEGEND, Jul 23, 2013

I stand corrected.

Guest, Jul 24, 2013

OK, so an update on this.

With my change this works fine on Win 7.

However, the Adobe 11 iFilter class refuses to instantiate on Windows XP even if the process is assigned to a job. The only way I managed to instantiate the class on WinXP was to rename my exe to filtdump.exe (then everything works fine), which seems to imply that the filter has filtdump.exe hardcoded in.

Nasty.

New Here, May 04, 2014

There is an excellent article on working with IFilters in C#: "Implementing a TextReader to extract various files contents using IFilter" on CodeProject.

Because the Adobe Reader XI IFilter no longer supports IPersistFile.Load(..), owing to Microsoft's requirement for IFilters to work with stream data instead of file data, it is a problem for custom file search engines such as those built with Lucene.net.


The main point here is that you need to use IPersistStream for Adobe PDF IFilter 11.x instead of IPersistFile.

Here's a relevant discussion of a related topic on StackOverflow:

http://stackoverflow.com/questions/7313828/using-ifilter-in-c-sharp-and-retrieving-file-from-databas...

I've used these two sets of recommendations to rewrite the way the IFilter is obtained for PDF files. This solution obviously isn't the best one (one could just provide bytes to FilterLoader instead), but at least it works (and doesn't break the logic).

There is also no check for whether the filter is Adobe's or not, but such a check should be done, as other PDF IFilters might not support IPersistStream. One should also check its version (e.g., Adobe PDF IFilter 9.x supports IPersistFile without any problems).
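The vendor and version check mentioned above could look roughly like the following C++ sketch. It only covers the decision itself. FilterInfo, LoadPath and ChooseLoadPath are hypothetical names, and how the vendor and major version are discovered (registry, DLL version resource) is left out. The version cut-off at 10 is taken from the earlier replies in this thread, which say the file-based interface was dropped with sandboxing in X.

```cpp
#include <string>

// Which persistence interface to ask the filter for.
enum class LoadPath { Stream, File };

// Hypothetical description of an installed PDF filter.
struct FilterInfo {
    std::string vendor;  // e.g. "Adobe"
    int majorVersion;    // e.g. 9 or 11
};

inline LoadPath ChooseLoadPath(const FilterInfo& info) {
    if (info.vendor == "Adobe" && info.majorVersion >= 10) {
        // Sandboxed Adobe filters (X and later) only load from a stream.
        return LoadPath::Stream;
    }
    // Adobe 9.x, and other vendors whose filters might not support
    // IPersistStream, keep using the file-based path.
    return LoadPath::File;
}
```

A caller would branch on the result where the code below currently branches on the ".pdf" extension alone.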

Here's the updated code:

    public static IFilter LoadIFilterFromIPersistFile(string path, string extension)
    {
        var fileExt = System.IO.Path.GetExtension(path);

        // Obtain the IFilter first
        IFilter filter = LoadIFilter(extension);
        if (null == filter) return null;

        // Custom case for PDF: load through IPersistStream
        if (string.Equals(fileExt, ".pdf", StringComparison.OrdinalIgnoreCase))
        {
            // Read the whole file, then copy the content to global memory
            byte[] buffer = System.IO.File.ReadAllBytes(path);
            IntPtr nativePtr = Marshal.AllocHGlobal(buffer.Length);
            Marshal.Copy(buffer, 0, nativePtr, buffer.Length);

            // Create a COM stream; passing true lets the stream free the memory on release
            System.Runtime.InteropServices.ComTypes.IStream comStream;
            CreateStreamOnHGlobal(nativePtr, true, out comStream);

            // Load the contents into the IFilter using the IPersistStream interface.
            // "as" yields null when the interface is missing; a direct cast would throw instead.
            var persistStream = filter as IPersistStream;
            if (null == persistStream)
                throw new Exception("IPersistStream is not implemented by the current interface");

            persistStream.Load(comStream);
            return InitIFilterForPdf(filter);
        }
        else
        {
            var persistFile = filter as IPersistFile;
            if (null == persistFile)
                throw new Exception("IPersistFile is not implemented by the current interface");

            persistFile.Load(path, 0);
            return InitIFilter(filter);
        }
    }
