Adobe Reader X removed the ability for the Windows Indexing Service to index PDF files through the Reader iFilter. To get around this you had to install Adobe Reader 9.x on 32-bit systems or the iFilter 9 on 64-bit systems.
Does anyone know if Adobe Reader XI follows the same stance of not exposing its iFilters to the Windows Indexing Service? I've done some searching but have not been able to find an answer.
Hi,
The IFilters for the Windows Indexing Service were added back in Reader XI. They are also available with the latest updates of Reader X (they were added back in 10.1). These are 32-bit IFilters and only work on 32-bit platforms.
Thanks
Manish
I know there were issues with how the iFilter was exposed to the Windows Search indexer in 10.1; have those been corrected in 11? We've been relying on 9.x for PDF indexing because of the 10 fiasco.
Take the XI Version of iFilter
ftp://ftp.adobe.com/pub/adobe/acrobat/win/11.x/PDFFilter64Setup.msi
Wonderful, thank you very much!
Leith, have you installed this filter? I just did (we are having serious problems with PDF search yet again in Document Exchange), and it seemed to make matters worse, not better.
Kate
Kate Carrillo wrote:
Leith, have you installed this filter? I just did (we are having serious problems with PDF search yet again in Document Exchange), and it seemed to make matters worse, not better.
Kate
No I have not. The older process for me is still working so I've continued to use that. I was not going to upgrade the filter driver until we moved to a new system. Our Document Exchange systems are still using either the iFilter 9 for 64bit systems or Reader 9.5.3 for 32bit systems.
Hi Leith,
Just proceed with caution and a lot of testing. I installed the latest version and it actually made search in Document Exchange worse; we had to roll back.
Kate
Hi Kate,
What is the configuration of your Document Exchange server?
Can you provide the flavor of Windows OS and the version of Document Exchange server?
Thanks and regards
Abhijit
Hi Abhijit,
Windows Server 2008 R2
Document Exchange 6.1.1
Regards,
Kate
Hi Kate
PDF indexing on Document Exchange 6.1.1 is not supported by PDF iFilter 64 11.0.01
(please refer to the system requirement section http://www.adobe.com/support/downloads/detail.jsp?ftpID=5542).
Thanks and regards
Abhijit
What changed between v9 and v11 in how indexing is done and how iFilters are exposed that makes it unsupported?
Document Exchange is a DotNetNuke module that uses the Lucene indexing service to hook into iFilters to then aggregate data. I wrote a useful How To regarding it and PDF indexing on their website.
http://www.bring2mind.net/Support/Forums/tabid/143/aff/21/aft/8578/afv/topic/Default.aspx
Maybe a better question is this. How are we supposed to be indexing PDF files then?
Hi Leith
This is a known issue, which was introduced as part of IFilter sandboxing in Adobe Reader and Acrobat X.
Microsoft defines two types of interface that a standard IFilter can implement: one is file based and the other is stream based. The file-based interface loads our IFilter in non-sandboxed mode, which is a security issue, so we stopped supporting it as part of IFilter sandboxing in X. Windows Search uses the stream-based interface and runs the filter in sandboxed mode, which we support.
Thanks and regards
Abhijit
How to use the Acrobat Reader iFilter:
1. Assign the Process to a Job (aka sandbox it):
hProc = GetCurrentProcess();
hJob = CreateJobObjectW(NULL, L"filterProc");
AssignProcessToJobObject(hJob,hProc);
2. Look up the CLSID for the Acrobat Reader iFilter ("PDF Filter"). You need to search the registry for the correct CLSID.
3. Open instance to the CLSID using IID_IFilter: CoCreateInstance(CLSID_IPDF,NULL,CLSCTX_INPROC_SERVER,IID_IFilter, (LPVOID*) &iFilter);
4. Open Interface for IPersiststream: iFilter->QueryInterface(IID_IPersistFile, (void **) &iPersistStream)
5. Open the PDF file using a stream: SHCreateStreamOnFile(szPDFFileName, STGM_READ, &iStream);
6. Load the Stream into the IPersistStream interface: iPersistStream->Load(iStream);
7. Initialize the IFilter:
dwFlags = IFILTER_FLAGS_OLE_PROPERTIES;
hr = iFilter->Init(IFILTER_INIT_CANON_SPACES |
                   IFILTER_INIT_SEARCH_LINKS |
                   IFILTER_INIT_INDEXING_ONLY |
                   IFILTER_INIT_APPLY_INDEX_ATTRIBUTES |
                   IFILTER_INIT_APPLY_OTHER_ATTRIBUTES,
                   0,
                   NULL,
                   &dwFlags);
8. Proceed with the standard process to get the data out (i.e. iFilter->GetChunk(), iFilter->GetText(), iFilter->GetValue(), etc.)
I tried this but didn't manage to get it to work. I've removed the result checks for clarity.
HRESULT hr;
if ( FAILED( ::CoInitialize( NULL ) ) )
{
return -1;
}
HANDLE hProc = GetCurrentProcess();
HANDLE hJob = CreateJobObjectW(NULL, L"filterProc");
BOOL bAssigned = AssignProcessToJobObject(hJob,hProc);
CComQIPtr<IFilter> pFilter;
LPWSTR guidstr = L"{E8978DA6-047F-4E3D-9C78-CDBE46041603}";
GUID guid;
hr = CLSIDFromString(guidstr, (LPCLSID)&guid);
hr = CoCreateInstance(guid, NULL,CLSCTX_INPROC_SERVER,IID_IFilter, (LPVOID*) &pFilter);
IPersistStream *pPersistStream;
hr = pFilter->QueryInterface(IID_IPersistFile, (void **) &pPersistStream);
IStream *pStream;
hr = SHCreateStreamOnFile(L"c:\\SVNWORK\\moo.pdf", STGM_READ, &pStream);
hr = pPersistStream->Load(pStream);
The error is when calling Load on pPersistStream ....
Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
Since this problem is before the iFilter is used, you might have more luck in a Microsoft dev forum.
This seems to be related to the iFilter's Load(IStream) implementation. Load(IPersistFile) doesn't work, and the method described above doesn't work either. I think it would be invaluable to everyone to get a simple working example of how to properly (from Adobe's point of view) open and read text from a PDF programmatically.
It doesn't have to be C++.
I think I have found the problem.
The original instructions say:
4. Open Interface for IPersiststream: iFilter->QueryInterface(IID_IPersistFile, (void **) &iPersistStream)
However, this is wrong. It needs to be:
iFilter->QueryInterface(IID_IPersistStream, (void **) &iPersistStream)
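For anyone hitting the same ESP check failure, here is my understanding of why it happens. QueryInterface writes through a void **, so the compiler cannot notice that the IID and the pointer type disagree. With IID_IPersistFile the returned pointer presumably refers to an IPersistFile, whose Load takes two arguments (a file name and a mode) and occupies the same vtable slot as the one-argument IPersistStream::Load, so on a 32-bit build the callee cleans up more stack than the caller pushed. In real Win32 code the IID_PPV_ARGS macro derives the IID from the pointer type and avoids this class of mistake. The sketch below shows the same idea with mock types only: MockFilter, Iid and query are made-up names for illustration, not the Adobe or Windows API, so it compiles on any platform.

```cpp
#include <cassert>
#include <string>

// Mock interface IDs standing in for IID_IPersistFile / IID_IPersistStream.
enum class Iid { PersistFile, PersistStream };

// Mock of a file-based loader: Load takes a path and a mode (two arguments).
struct MockPersistFile {
    static constexpr Iid iid = Iid::PersistFile;
    virtual bool Load(const std::string& path, unsigned mode) = 0;
    virtual ~MockPersistFile() = default;
};

// Mock of a stream-based loader: Load takes the data only (one argument).
struct MockPersistStream {
    static constexpr Iid iid = Iid::PersistStream;
    virtual bool Load(const std::string& bytes) = 0;
    virtual ~MockPersistStream() = default;
};

// A hypothetical filter that only offers stream loading.
class MockFilter : public MockPersistStream {
public:
    // Untyped, COM-style lookup: the caller must keep iid and *out in agreement.
    bool QueryInterface(Iid iid, void** out) {
        assert(out != nullptr);
        if (iid == Iid::PersistStream) {
            *out = static_cast<MockPersistStream*>(this);
            return true;
        }
        *out = nullptr;
        return false;
    }

    bool Load(const std::string& bytes) override {
        loaded_ = bytes;
        return true;
    }

    const std::string& loaded() const { return loaded_; }

private:
    std::string loaded_;
};

// Typed wrapper: the IID is taken from T, so it cannot disagree with the
// pointer type that comes back. Returns nullptr if T is not supported.
template <typename T>
T* query(MockFilter& filter) {
    void* raw = nullptr;
    return filter.QueryInterface(T::iid, &raw) ? static_cast<T*>(raw) : nullptr;
}
```

With this shape, query<MockPersistStream>(filter) gives a usable pointer and query<MockPersistFile>(filter) gives nullptr, and there is no way to ask for one interface and store it in a pointer of the other type.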
I stand corrected.
OK, so an update on this.
With my change this works fine on Windows 7.
However, the Adobe 11 iFilter class refuses to instantiate on Windows XP even if the process is assigned to a job. The only way I managed to instantiate the class on Windows XP was to rename my exe to filtdump.exe (then everything works fine), which seems to imply that the filter has filtdump.exe hardcoded in.
Nasty.
There is an excellent article on working with IFilters in C#: "Implementing a TextReader to extract various files contents using IFilter" on CodeProject.
Because the Adobe Reader XI IFilter no longer supports IPersistFile.Load(..), following Microsoft's requirement for IFilters to work with stream data instead of file data, it is a problem for custom file search engines like those built using Lucene.net.
The main point here is that you need to use IPersistStream for Adobe PDF IFilter 11.x instead of IPersistFile.
Here's a relevant discussion on the related topic at StackOverflow:
http://stackoverflow.com/questions/7313828/using-ifilter-in-c-sharp-and-retrieving-file-from-databas...
I've used these two sets of recommendations to rewrite the way the IFilter is obtained for PDF files. Obviously this solution isn't the best one (one could just provide bytes to FilterLoader instead), but at least it works (and doesn't break the logic).
There is also no check for whether the filter is Adobe's or not, but such a check should be done, as other PDF IFilters might not support IPersistStream. One should also check its version (e.g., Adobe PDF IFilter 9.x supports IPersistFile without any problems).
Here's the updated code:
public static IFilter LoadIFilterFromIPersistFile(string path, string extension)
{
    var fileExt = System.IO.Path.GetExtension(path);
    // Obtain the IFilter registered for this extension first
    IFilter filter = LoadIFilter(extension);
    if (null == filter) return null;
    // Custom case for PDF (compare case-insensitively so ".PDF" also matches)
    if (string.Equals(fileExt, ".pdf", System.StringComparison.OrdinalIgnoreCase))
    {
        // Read the whole file first; a single Stream.Read call is not
        // guaranteed to fill the buffer
        byte[] buffer = System.IO.File.ReadAllBytes(path);
        // Copy the content to global memory
        IntPtr nativePtr = Marshal.AllocHGlobal(buffer.Length);
        Marshal.Copy(buffer, 0, nativePtr, buffer.Length);
        // Create a COM stream; 'true' asks the stream to free the memory on release
        System.Runtime.InteropServices.ComTypes.IStream comStream;
        CreateStreamOnHGlobal(nativePtr, true, out comStream);
        // Load the contents to the iFilter using the IPersistStream interface.
        // 'as' yields null if the interface is missing; a direct cast would throw
        // before the null check below could run.
        var persistStream = filter as IPersistStream;
        if (null == persistStream)
            throw new Exception("IPersistStream is not implemented by the current interface");
        // loading
        persistStream.Load(comStream);
        return InitIFilterForPdf(filter);
    }
    else
    {
        var persistFile = (filter as IPersistFile);
        if (null == persistFile)
            throw new Exception("IPersistFile is not implemented by the current interface");
        persistFile.Load(path, 0);
        return InitIFilter(filter);
    }
}
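On the missing check mentioned above: instead of testing for the vendor and version of the filter, one option might be to probe which loading interfaces the filter exposes and prefer the stream-based one, falling back to the file-based one only when no stream interface is available. The following is a minimal sketch of that decision only, written in C++ to match the earlier snippets in this thread. FilterCaps, LoadMode and chooseLoadMode are hypothetical names, and in real code the two flags would come from the results of QueryInterface (or the `as` casts in the C# above).

```cpp
// Which loading interfaces a filter turned out to expose (hypothetical type).
struct FilterCaps {
    bool persistStream;  // filter answered the IPersistStream query
    bool persistFile;    // filter answered the IPersistFile query
};

enum class LoadMode { Stream, File, Unsupported };

// Prefer stream loading, since that is the path the sandboxed filter supports;
// use file loading only when no stream interface is offered.
constexpr LoadMode chooseLoadMode(const FilterCaps& caps) {
    return caps.persistStream ? LoadMode::Stream
         : caps.persistFile   ? LoadMode::File
                              : LoadMode::Unsupported;
}
```

This keeps the stream path for a filter that exposes both interfaces, and an older filter that only exposes IPersistFile still goes through the file path without a hardcoded extension or version test.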