
Parsing simple HTML

New Here, Dec 27, 2010

Hi, I'm new to InDesign scripting and only wish I had discovered it sooner! I absolutely love it! However, I'm working on a script that is about 80% complete, and I have ground to a halt. I now need to parse a simple HTML string and format it in ID.

I'm working with JavaScript on CS5.

The input string would look something like:

<p>Here is a sample <b>bold</b> format, <i>italic</i> string.</p>

I'd need to support the tags p, b, i, dl, dd, and dt.

I'm guessing converting the p tags to \r is not an issue, but I'm unsure how to apply a style to part of a paragraph. Also, would I need to create a style for each style tag that I want to support?

I've been reading about XML support, but I'm confused as to how I'd use it for this problem. Could I use it at all, given that the input is neither a complete HTML document nor a complete XML string?

Any tips / code would be greatly appreciated, thanks!

TOPICS
Scripting
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
LEGEND, Dec 28, 2010

If you don't care about nesting, you should be able to do a simple parse with a few GREPs. If you need to take nesting into account, your best bet is probably to use a state machine to parse it...

Google state machines for more info...
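For the un-nested case, the "few GREPs" idea can be sketched in plain JavaScript regular expressions. This is only an illustration, not a tested InDesign script: the function name parseFlatHtml and the shape of the returned object are made up for this example, and it assumes b and i tags never nest and carry no attributes.

```javascript
// Hypothetical helper: flattens un-nested <p>, <b>, <i> markup into plain
// text plus a list of character ranges that need formatting.
function parseFlatHtml(html) {
    // Paragraphs: drop the opening <p>, turn the closing </p> into a return.
    var src = html.replace(/<p>/gi, "").replace(/<\/p>/gi, "\r");
    // One inline element with no tags inside it (no nesting supported).
    var re = /<(b|i)>([^<]*)<\/\1>/i;
    var text = "";
    var runs = [];
    var m;
    while ((m = re.exec(src)) !== null) {
        // Copy everything before the tag, then record where its content lands.
        text += src.substring(0, m.index);
        runs.push({
            tag: m[1].toLowerCase(),
            start: text.length,               // first character of the run
            end: text.length + m[2].length    // one past the last character
        });
        text += m[2];
        // Continue with the remainder of the input.
        src = src.substring(m.index + m[0].length);
    }
    text += src;
    return { text: text, runs: runs };
}

var sample = "<p>Here is a sample <b>bold</b> format, <i>italic</i> string.</p>";
var flat = parseFlatHtml(sample);
// flat.text holds the plain text; flat.runs lists the ranges to format.
```

In InDesign you would then set the story's contents to the plain text and apply a character style to each range, probably via something like characters.itemByRange(run.start, run.end - 1) and its appliedCharacterStyle property. Treat that part as an assumption to check against the scripting reference for your version.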

Harbs

Advocate, Dec 28, 2010

Yeah, that's what I did when experimenting with HTML input/output: I wrote a state machine. It doesn't need to be very complicated either. If you can be (quite) sure your HTML is properly formatted (ideally well-formed XHTML), all you have to do is "push" each opening tag, then process the most recent one when you "pop" it at its closing tag.

Since InDesign text works per paragraph, I think you'd best scan back for an "open P" tag (or, in the case of improperly formatted HTML, any other block element), then insert all of the text and apply the formatting.

It's quite fun to write this -- up to a certain point, anyway.
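The push/pop idea might look roughly like the sketch below. It is not the state machine described above, just a minimal stack-based version under stated assumptions: the input is well-formed (every tag closed, in order), the name parseNestedHtml and the returned object are invented for illustration, and p, dt and dd are treated as the paragraph-level tags that end with a return.

```javascript
// Hypothetical helper: handles nested tags by pushing each opening tag on a
// stack and emitting a formatting range when the matching closing tag pops it.
function parseNestedHtml(html) {
    // A token is either a tag (group 1 = "/" if closing, group 2 = name)
    // or a stretch of text (group 3).
    var tokenRe = /<(\/?)([a-z][a-z0-9]*)[^>]*>|([^<]+)/gi;
    var blockTags = { p: true, dt: true, dd: true }; // each ends a paragraph
    var stack = [];
    var runs = [];
    var text = "";
    var m;
    while ((m = tokenRe.exec(html)) !== null) {
        if (m[3]) {                 // plain text: just accumulate it
            text += m[3];
            continue;
        }
        var name = m[2].toLowerCase();
        if (!m[1]) {                // opening tag: remember where it started
            stack.push({ tag: name, start: text.length });
        } else {                    // closing tag: must match the last opened
            var open = stack.pop();
            if (!open || open.tag !== name) {
                throw new Error("Mismatched closing tag </" + name + ">");
            }
            runs.push({ tag: name, start: open.start, end: text.length });
            if (blockTags[name]) {
                text += "\r";       // paragraph-level tags end with a return
            }
        }
    }
    if (stack.length > 0) {
        throw new Error("Unclosed tag <" + stack[stack.length - 1].tag + ">");
    }
    return { text: text, runs: runs };
}

var nested = parseNestedHtml("<p>A <b>bold <i>both</i></b> end.</p>");
// nested.runs lists the innermost range first: i, then b, then p.
```

Because inner tags close first, the ranges come out innermost first. Note that whitespace between tags (for example line breaks between dl and dt) is kept as text here, so real input may need trimming first.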

Advocate, Jan 01, 2011

A while ago I wrote this script:

http://www.ixta.com/InDesign/scripts/html2charstyle.html

It does not support the mentioned block-level tags, though.

http://www.w3.org/TR/CSS2/visuren.html#block-boxes

For those you would add paragraph styles and hard returns, and don't forget to support the class attributes.
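As a rough illustration of the block-level part (this is not taken from the html2charstyle script linked above), the sketch below splits the input into paragraph-level blocks and derives a paragraph style name from the tag and its first class. The function name splitBlocks and the "tag.class" naming scheme are assumptions made up for this example, and it assumes p, dt and dd blocks do not nest inside one another.

```javascript
// Hypothetical helper: one entry per paragraph-level element, with a
// paragraph style name derived from the tag and its first class (if any).
function splitBlocks(html) {
    // Group 1 = tag name, group 2 = raw attributes, group 3 = inner HTML.
    var blockRe = /<(p|dt|dd)\b([^>]*)>([\s\S]*?)<\/\1>/gi;
    var classRe = /\bclass\s*=\s*["']([^"']*)["']/i;
    var blocks = [];
    var m;
    while ((m = blockRe.exec(html)) !== null) {
        var styleName = m[1].toLowerCase();
        var cls = classRe.exec(m[2]);
        if (cls) {
            // Use only the first class name, e.g. class="intro lead" -> "intro".
            var first = /\S+/.exec(cls[1]);
            if (first) {
                styleName += "." + first[0];
            }
        }
        // The inner HTML still contains inline tags such as <b> and <i>.
        blocks.push({ styleName: styleName, html: m[3] });
    }
    return blocks;
}

var blocks = splitBlocks(
    '<dl><dt class="term">Term</dt><dd>Definition</dd></dl>' +
    '<p class="intro lead">Hi</p>'
);
// blocks[0].styleName is "dt.term", blocks[1].styleName is "dd",
// blocks[2].styleName is "p.intro".
```

Each block would then become one InDesign paragraph (its text followed by a hard return) with the matching paragraph style applied. Looking the style up by name, for instance through the document's paragraphStyles collection, is left out here and should be checked against the scripting reference.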

Dirk
