Adobe Reader XI & iFilter indexing

Report · Oct 22, 2012

Adobe Reader X removed the ability for the Windows Indexing Service to index PDF files through its iFilter. To work around this, you had to install Adobe Reader 9.x on 32-bit systems or the PDF iFilter 9 on 64-bit systems.

Does anyone know whether Adobe Reader XI also declines to expose its iFilter to the Windows Indexing Service? I've done some searching, but I can't find an answer anywhere yet.

Report · Oct 23, 2012

Hi,

The IFilter for the Windows Indexing Service has been added back in Reader XI. IFilters are also available in the latest updates of Reader X (they were added back in 10.1). These are 32-bit IFilters and only work on 32-bit platforms.

Thanks

Manish

Report · Oct 23, 2012

I know there were issues with the iFilter being exposed to the Windows Search Indexer in 10.1; have those been corrected in 11? We've been relying on 9.x for PDF indexing because of the 10 fiasco.

Report · Jan 09, 2013

Use the XI version of the iFilter:

ftp://ftp.adobe.com/pub/adobe/acrobat/win/11.x/PDFFilter64Setup.msi

Report · Jan 09, 2013

Wonderful, thank you very much!

Report · Jan 10, 2013

Leith, have you installed this filter? I just did (we are having serious problems with PDF search yet again in Document Exchange), and it seemed to make matters worse, not better.

Kate

Report · Jan 10, 2013

Kate Carrillo wrote:
Leith, have you installed this filter? I just did (we are having serious problems with PDF search yet again in Document Exchange), and it seemed to make matters worse, not better.
Kate

No I have not. The older process for me is still working so I've continued to use that. I was not going to upgrade the filter driver until we moved to a new system. Our Document Exchange systems are still using either the iFilter 9 for 64bit systems or Reader 9.5.3 for 32bit systems.

Report · Jan 14, 2013

Hi Leith,

Just proceed with caution and a lot of testing. I installed the latest version and it actually made search in Document Exchange worse; we had to roll back.

Kate

Report · Jan 14, 2013

Hi Kate,

What is the configuration of the Document Exchange server?

Can you provide the flavor of Windows OS and the version of the Document Exchange server?

Thanks and regards

Abhijit

Report · Jan 16, 2013

Hi Abhijit,

Windows Server 2008 R2

Document Exchange 6.1.1

Regards,

Kate

Report · Jan 17, 2013

Hi Kate

PDF indexing on Document Exchange 6.1.1 is not supported by PDF iFilter 64 11.0.01 (please refer to the system requirements section: http://www.adobe.com/support/downloads/detail.jsp?ftpID=5542).

Thanks and regards

Abhijit

Report · Jan 17, 2013

What changed between v9 and v11 in how indexing is done and how iFilters are exposed that makes it unsupported?

Document Exchange is a DotNetNuke module that uses the Lucene indexing service to hook into iFilters to then aggregate data. I wrote a useful How To regarding it and PDF indexing on their website.

http://www.bring2mind.net/Support/Forums/tabid/143/aff/21/aft/8578/afv/topic/Default.aspx

Maybe a better question is this: how are we supposed to index PDF files, then?

Report · Jan 17, 2013

Hi Leith

This is a known issue, which was introduced as part of IFilter sandboxing in Adobe Reader and Acrobat X.

Microsoft defines two types of interface that a standard IFilter can implement: one is a file-based interface and the other is a stream-based interface. The file-based interface loads our IFilter in a non-sandboxed mode, which is a security issue, so we stopped supporting the file-based interface as part of IFilter sandboxing in X. Windows Search uses the stream-based interface and runs in a sandboxed mode, which we support.

Thanks and regards

Abhijit

Report · Mar 01, 2013

How to use the Acrobat Reader iFilter:

1. Assign the Process to a Job (aka sandbox it):

hProc = GetCurrentProcess();
hJob = CreateJobObjectW(NULL, L"filterProc");
AssignProcessToJobObject(hJob,hProc);

2. Look up the CLSID for the Acrobat Reader iFilter ("PDF Filter"). You need to search the registry for the correct CLSID.

3. Open instance to the CLSID using IID_IFilter: CoCreateInstance(CLSID_IPDF,NULL,CLSCTX_INPROC_SERVER,IID_IFilter, (LPVOID*) &iFilter);

4. Open Interface for IPersiststream: iFilter->QueryInterface(IID_IPersistFile, (void **) &iPersistStream)

5. Open the PDF file using a stream: SHCreateStreamOnFile(szPDFFileName, STGM_READ, &iStream);

6. Load the Stream into the IPersistStream interface: iPersistStream->Load(iStream);

7. Initialize the IFilter:

dwFlags = IFILTER_FLAGS_OLE_PROPERTIES;
hr = iFilter->Init(IFILTER_INIT_CANON_SPACES |
                   IFILTER_INIT_SEARCH_LINKS |
                   IFILTER_INIT_INDEXING_ONLY |
                   IFILTER_INIT_APPLY_INDEX_ATTRIBUTES |
                   IFILTER_INIT_APPLY_OTHER_ATTRIBUTES,
                   0,
                   NULL,
                   &dwFlags);

8. Proceed with the standard process to get the data out (i.e. iFilter->GetChunk(), iFilter->GetText(), iFilter->GetValue(), etc.)
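For step 8, a minimal sketch of the usual GetChunk()/GetText() read loop might look like the following. It is only an illustration, not Adobe's reference code: ReadAllText is a hypothetical helper name, the 512-character buffer is arbitrary, and the function is written as a template so it accepts either a real IFilter or a stand-in object with the same two methods. The non-Windows branch contains stand-in types and status values that are assumed to mirror the Windows SDK headers; on Windows the real headers are used instead.

```cpp
#include <cstdint>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <filter.h>   // IFilter, STAT_CHUNK, CHUNK_TEXT
#include <filterr.h>  // FILTER_E_* / FILTER_S_* status codes
#else
// Stand-ins (assumption: values mirror the Windows SDK headers) so the
// loop can be compiled and exercised without Windows.
typedef std::int32_t SCODE;
typedef std::uint32_t ULONG;
typedef wchar_t WCHAR;
enum CHUNKSTATE { CHUNK_TEXT = 0x1, CHUNK_VALUE = 0x2 };
struct STAT_CHUNK { ULONG idChunk; CHUNKSTATE flags; };  // other fields omitted
const SCODE FILTER_E_END_OF_CHUNKS = static_cast<SCODE>(0x80041700u);
const SCODE FILTER_E_NO_MORE_TEXT  = static_cast<SCODE>(0x80041701u);
const SCODE FILTER_S_LAST_TEXT     = static_cast<SCODE>(0x00041709u);
#define FAILED(hr) ((hr) < 0)
#endif

// Hypothetical helper: collects the text of every text chunk of an
// already loaded and initialised filter (steps 1-7 done beforehand).
template <class Filter>
std::wstring ReadAllText(Filter& filter) {
    std::wstring text;
    STAT_CHUNK stat = {};
    for (;;) {
        SCODE sc = filter.GetChunk(&stat);
        if (sc == FILTER_E_END_OF_CHUNKS) break;    // no more chunks
        if (FAILED(sc)) break;                      // simplification: stop on any other error
        if (!(stat.flags & CHUNK_TEXT)) continue;   // skip value (property) chunks

        for (;;) {
            WCHAR buf[512];
            ULONG len = 512;                        // in: capacity, out: characters written
            sc = filter.GetText(&len, buf);
            if (sc == FILTER_E_NO_MORE_TEXT || FAILED(sc)) break;
            text.append(buf, len);
            if (sc == FILTER_S_LAST_TEXT) break;    // filter says this was the last piece
        }
        text.push_back(L' ');                       // simplification: ignores stat.breakType
    }
    return text;
}
```

With a real filter this would be called as ReadAllText(*iFilter) after Init() succeeds. Value chunks are skipped here; reading them would need GetValue() and PROPVARIANT handling, which is left out to keep the sketch short.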

Report · Jul 23, 2013

I tried this but didn't manage to get it to work. I've removed the result checks for clarity.

HRESULT hr;
if ( FAILED( ::CoInitialize( NULL ) ) )
{
    return -1;
}

HANDLE hProc = GetCurrentProcess();
HANDLE hJob = CreateJobObjectW(NULL, L"filterProc");
BOOL bAssigned = AssignProcessToJobObject(hJob,hProc);

CComQIPtr<IFilter> pFilter;
LPWSTR guidstr = L"{E8978DA6-047F-4E3D-9C78-CDBE46041603}";
GUID guid;
hr = CLSIDFromString(guidstr, (LPCLSID)&guid);
hr = CoCreateInstance(guid, NULL,CLSCTX_INPROC_SERVER,IID_IFilter, (LPVOID*) &pFilter);

IPersistStream *pPersistStream;
hr = pFilter->QueryInterface(IID_IPersistFile, (void **) &pPersistStream);

IStream *pStream;
hr = SHCreateStreamOnFile(L"c:\\SVNWORK\\moo.pdf", STGM_READ, &pStream);
hr = pPersistStream->Load(pStream);

The error occurs when calling Load on pPersistStream:

Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.

Report · Jul 23, 2013

Since this problem is before the iFilter is used, you might have more luck in a Microsoft dev forum.

Report · Jul 23, 2013

This seems to be related to the iFilter Load(IStream) implementation. Load(IPersistFile) doesn't work, and the method described above doesn't work either. I think it would be invaluable to everyone to get a simple working example of how to properly (from Adobe's point of view) open and read text from a PDF programmatically.

It doesn't have to be C++.

Report · Jul 23, 2013

I think I have the problem.

The original instructions say:

4. Open Interface for IPersiststream: iFilter->QueryInterface(IID_IPersistFile, (void **) &iPersistStream)

However, this is wrong. It needs to be:

iFilter->QueryInterface(IID_IPersistStream, (void **) &iPersistStream)

Report · Jul 23, 2013

I stand corrected.

Report · Jul 24, 2013

OK, so an update on this.

With my change this works fine on Win 7.

However, the Adobe 11 iFilter class refuses to instantiate on Windows XP even if the process is assigned to a job. The only way I managed to instantiate the class on Windows XP was to rename my exe to filtdump.exe (then everything works fine), which seems to imply that the filter has filtdump.exe hardcoded in.

Nasty.

Report · May 04, 2014

There is an excellent article on working with IFilters in C#: Implementing a TextReader to extract various files contents using IFilter - CodeProject

The Adobe Reader XI IFilter no longer supports IPersistFile.Load(..) because of Microsoft's requirement for IFilters to work with stream data instead of file data, which is a problem for custom file search engines like those built using Lucene.net.

The main point here is that you need to use IPersistStream for Adobe PDF IFilter 11.x instead of IPersistFile.

Here's a relevant discussion on the related topic at StackOverflow:

http://stackoverflow.com/questions/7313828/using-ifilter-in-c-sharp-and-retrieving-file-from-databas...

I've used these two sets of recommendations to rewrite how the IFilter is obtained for PDF files. Obviously this solution isn't the best one (one could just provide bytes to FilterLoader instead), but at least it works (and doesn't break the logic).

There is also no check for whether the filter is Adobe's, but such a check should be done, as other PDF IFilters might not support IPersistStream. One should also check its version (e.g., Adobe PDF IFilter 9.x supports IPersistFile without any problems).
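The vendor and version check mentioned above could be reduced to a small decision helper. The following is only a sketch in C++ (the logic carries over to C# directly): LoadVia, MajorVersion and ChooseLoadInterface are hypothetical names, how the vendor flag and version string are obtained is not shown, and the threshold of major version 10 is an assumption drawn from this thread (9.x loads through IPersistFile, while X and XI expect IPersistStream).

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result type: which COM interface to load the document through.
enum class LoadVia { PersistFile, PersistStream };

// Leading integer of a dotted version string such as "11.0.01".
// std::atoi stops at the first non-digit and yields 0 when there are no digits.
int MajorVersion(const std::string& version) {
    return std::atoi(version.c_str());
}

// Assumption from the thread: Adobe PDF IFilter 10 and later only load through
// IPersistStream; 9.x and non-Adobe filters are tried through IPersistFile.
LoadVia ChooseLoadInterface(bool isAdobeFilter, const std::string& version) {
    if (isAdobeFilter && MajorVersion(version) >= 10) {
        return LoadVia::PersistStream;
    }
    return LoadVia::PersistFile;
}
```

A caller would branch on the result instead of on the ".pdf" extension, so that a machine still running the 9.x filter keeps using the file-based path.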

Here's the updated code:

public static IFilter LoadIFilterFromIPersistFile(string path, string extension)
{
    // Obtain the IFilter registered for this extension first
    IFilter filter = LoadIFilter(extension);
    if (null == filter) return null;

    // Custom case for PDF: Adobe PDF IFilter 11.x has to be loaded through IPersistStream
    var fileExt = System.IO.Path.GetExtension(path);
    if (string.Equals(fileExt, ".pdf", StringComparison.OrdinalIgnoreCase))
    {
        // Read the whole file first (opens it read-only and reads every byte)
        byte[] buffer = System.IO.File.ReadAllBytes(path);

        // Copy the content to global memory
        IntPtr nativePtr = Marshal.AllocHGlobal(buffer.Length);
        Marshal.Copy(buffer, 0, nativePtr, buffer.Length);

        // Create a COM stream over that memory
        System.Runtime.InteropServices.ComTypes.IStream comStream;
        CreateStreamOnHGlobal(nativePtr, true, out comStream);

        // Load the contents into the IFilter using the IPersistStream interface.
        // "as" yields null when the interface is missing, so the check below can fire.
        var persistStream = filter as IPersistStream;
        if (null == persistStream)
            throw new Exception("IPersistStream is not implemented by the current interface");

        persistStream.Load(comStream);
        return InitIFilterForPdf(filter);
    }
    else
    {
        var persistFile = filter as IPersistFile;
        if (null == persistFile)
            throw new Exception("IPersistFile is not implemented by the current interface");

        persistFile.Load(path, 0);
        return InitIFilter(filter);
    }
}
