February 13, 2017
Question

Does iFilter look into text within 'table' in the document?


There is a table within the PDF document, and the iFilter is not providing the text contained within the table. Is this a known issue?

This topic has been closed for replies.

5 replies

Legend
February 15, 2017

OK, thanks for that. It does look as if you've found a specific problem with iFilter. (Most people find problems with their text instead.) Are you using iFilter directly, to get a text stream, or are you using it with an indexing engine? If it's a text stream, search to see if Mattresses or the letters in it appear elsewhere; ordering issues are not unusual.
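
As a rough illustration of that search, here is a minimal Python sketch. It assumes the iFilter output has already been captured as a plain string; the function name `find_word_variants` is hypothetical and not part of iFilter or any Adobe API. It looks for the word as-is and also for its letters in order but split by spaces or line breaks, which is one way an extraction problem can hide a word.

```python
import re

def find_word_variants(text: str, word: str) -> dict:
    """Report where `word` appears in extracted text, intact or broken up.

    Hypothetical helper for illustration; `text` is assumed to be the
    string already produced by the extraction step.
    """
    # Intact occurrences, ignoring case.
    exact = [m.start() for m in re.finditer(re.escape(word), text, re.IGNORECASE)]

    # The same letters in order, allowing whitespace (spaces, line breaks)
    # between any two of them, e.g. "Matt\nresses" or "M a t t r e s s e s".
    spaced_pattern = r"\s*".join(re.escape(ch) for ch in word)
    spaced = [m.start() for m in re.finditer(spaced_pattern, text, re.IGNORECASE)]

    # The spaced pattern also matches intact occurrences, so drop those.
    broken = [pos for pos in spaced if pos not in exact]
    return {"exact": exact, "broken": broken}


if __name__ == "__main__":
    sample = "Matches\n- Wood Group A plastics\nMatt\nresses\n- Standard (box spring) Class III"
    print(find_word_variants(sample, "Mattresses"))
```

If both lists come back empty, the letters are not in the stream in that order at all, which points away from a simple ordering or line-break issue.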

February 15, 2017

We have iFilter installed on Windows Server, and we are also using code to extract the text (TextReader).

Legend
February 15, 2017

Please don't make me analyse all that. I wanted just the single word you stated as a problem, Mattresses I think it was. Does it copy/paste correctly to another app or not?

February 15, 2017

Yes, when I copied and pasted into Notepad, I was able to see the word 'Mattresses'.

Legend
February 14, 2017

Please complete the test. Select some problem text, copy it, and paste to another app. What do you see?

February 14, 2017

I am able to 'select' the text from the PDF and paste it into Notepad++. I am pasting the results here:

Chewing Gum

Packaged, cartoned Class III

Chocolate

Packaged, cartoned Class III

Cloth

Cartoned and not cartoned

- Natural fiber, viscose Class III

- Synthetic5 Class IV

Cocoa Products

Packaged, cartoned Class III

Coffee

- Canned, cartoned Class I

- Packaged, cartoned Class III

Coffee Beans

Bagged Class III

Cotton

Packaged, cartoned Class III

Diapers

- Cotton, linen Class III

- Disposable with plastics and

nonwoven fabric (in cartons)

Class IV

- Disposable with plastics and

nonwoven fabric (uncartoned),

plastic wrapped

Group A plastics

Dried Foods

Packaged, cartoned Class III

Fertilizers

Bagged

- Phosphates Class I

- Nitrates Class II

Fiberglass Insulation

- Paper-backed rolls, bagged or

unbagged

Class IV

File Cabinets

Metal

- Cardboard box or shroud Class I

Fish or Fish Products

Frozen

- Nonwaxed, nonplastic packaging Class I

- Waxed-paper containers, cartoned Class II

- Boxed or barreled Class II

- Plastic trays, cartoned Class III

Canned

- Cartoned Class I

Frozen Foods

Nonwaxed, nonplastic packaging Class I

- Waxed-paper containers, cartoned Class II

- Plastic trays Class III

Table A-2-2.3 Alphabetized Listing of Commodity

Classes (Continued)

Commodity

Commodity

Class

Fruit

Fresh

- Nonplastic trays or containers Class I

- With wood spacers Class I

Furniture

Wood

- No plastic coverings or foam

plastic cushioning

Class III

- With plastic coverings Class IV

- With foam plastic cushioning Group A plastics

Grains — Packaged in Cartons

- Barley Class III

- Rice Class III

- Oats Class III

Ice Cream Class I

Leather Goods Class III

Leather Hides

Baled Class II

Light Fixtures

Nonplastic

- Cartoned Class II

Lighters

Butane

- Blister-packed, cartoned Group A plastics

- Loose in large containers

(Level 3 aerosol)

Outside of scope

Liquor

100 proof or less, 1 gal (3.8 L) or

less, cartoned

- Glass (palletized)6 Class IV

- Plastic bottles Class IV

Marble

Artificial sinks, countertops

- Cartoned, crated Class II

Margarine

- Up to 50 percent oil (in paper or

plastic containers)

Class III

- Between 50 percent and 80

percent oil (in any packaging)

Group A plastics

Matches

Packaged, cartoned

- Paper Class IV

- Wood Group A plastics

Mattresses

- Standard (box spring) Class III

- Foam (in finished form) Group A plastics

Meat, Meat Products

- Bulk Class I

- Canned, cartoned Class I

Legend
February 14, 2017

A screen shot may show a mixture of useful text, useless text, graphics and vectors. Please try the test I suggested and report the result.

February 14, 2017

I am unable to 'select' the text from both sides of the page at once; I am able to 'select' just one side at a time:

Fruit

Fresh

- Nonplastic trays or containers Class I

- With wood spacers Class I

Furniture

Wood

- No plastic coverings or foam

plastic cushioning

Class III

- With plastic coverings Class IV

- With foam plastic cushioning Group A plastics

Grains — Packaged in Cartons

- Barley Class III

- Rice Class III

- Oats Class III

Ice Cream Class I

Leather Goods Class III

Leather Hides

Baled Class II

Light Fixtures

Nonplastic

- Cartoned Class II

Lighters

Butane

- Blister-packed, cartoned Group A plastics

- Loose in large containers

(Level 3 aerosol)

Outside of scope

Liquor

100 proof or less, 1 gal (3.8 L) or

less, cartoned

- Glass (palletized)6 Class IV

- Plastic bottles Class IV

Marble

Artificial sinks, countertops

- Cartoned, crated Class II

Margarine

- Up to 50 percent oil (in paper or

plastic containers)

Class III

- Between 50 percent and 80

percent oil (in any packaging)

Group A plastics

Matches

Packaged, cartoned

- Paper Class IV

- Wood Group A plastics

Mattresses

- Standard (box spring) Class III

- Foam (in finished form) Group A plastics

Meat, Meat Products

- Bulk Class I

- Canned, cartoned Class I

Legend
February 14, 2017

There isn't even such a thing as a table in a PDF. It's just text, arranged in a regular way, that might have lines around it. Nothing to inform the iFilter that something is a table, page header etc. (in most cases). So your text extraction issue isn't specifically table related. Try doing a copy/paste of the table text in Acrobat Reader to see what happens.
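
To make that concrete, here is a small Python sketch of why extraction order can surprise you. It is not how iFilter works internally (that is not known here); it only assumes that a page boils down to text runs with positions. The `Fragment` type, the coordinates and the `split_x` column boundary are all made up for illustration; the strings are taken from the table in this thread.

```python
from typing import NamedTuple

class Fragment(NamedTuple):
    x: float   # horizontal position of the text run (made-up units)
    y: float   # vertical position; larger means higher on the page
    text: str

# Hypothetical runs from a two-column page. Nothing marks them as a table.
fragments = [
    Fragment(50, 700, "Coffee"),
    Fragment(300, 700, "Matches"),
    Fragment(50, 680, "- Canned, cartoned Class I"),
    Fragment(300, 680, "- Paper Class IV"),
    Fragment(50, 660, "Coffee Beans"),
    Fragment(300, 660, "Mattresses"),
]

def row_major(frags):
    """Top to bottom, then left to right: the two columns get interleaved."""
    return [f.text for f in sorted(frags, key=lambda f: (-f.y, f.x))]

def column_major(frags, split_x):
    """Everything left of split_x top to bottom, then everything right of it."""
    return [f.text for f in sorted(frags, key=lambda f: (f.x >= split_x, -f.y))]

if __name__ == "__main__":
    print(row_major(fragments))
    print(column_major(fragments, split_x=200))
```

Both orderings contain every run, just in a different sequence, so a word that seems to be missing may simply sit somewhere unexpected in the stream.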

February 14, 2017

The following is a screenshot of the PDF document; iFilter does not extract the text 'Mattresses'.