1. Overview

Pdx takes the basic syntax of Perl's Pod, adds much of the functionality of Sdf, and implements it in a maintainable set of Python modules and objects.

The goal is a simple markup language which:

Is usable for Pod-style markup embedded in code.
Is usable as a standalone tool for automated document generation.
Can be extended and customized by the user with no knowledge of Python.
Can be extended with new drivers (e.g., XML and groff) and new features by a developer who is not a Python guru.

2. Project Architecture

The directory tree is:

---the actual code---
Pdx/
  __init__.py
  Base.py
  Docbook.py
  Html.py
  Latex.py

---scripts for the bin dir---
scripts/
  pdx2docbook.py
  pdx2html.py
  pdx2latex.py

---documentation---
doc/
  go
  default_cfg.pdx           settings for the documents
  article_style.pdx         tuned for this project
  deshist.pdx               design history (includes the perl Pdx notes)
  devguide.pdx              developer's guide
  devguide_base.pdx         general material about Base.py
  devguide_html.pdx         specific to Html.py
  manual.pdx                user's manual

---testing---
test/
  all_tests.py              pyunit master module
  basic_tests.py            the tests
  go_test                   run the tests
  go_doc                    build documentation
  go_dist                   build distribution tarball
  go_web                    install tarball on website
  view_latex nn             convert testnn's results for xdvi view
  view_docbok nn            convert testnn's results for xdvi view 
  testdata/
    default_cfg.pdx         default settings
    article_style.pdx       local stylesheet
    dummy.pdx               used for includes
    dummy_html.pdx          used for html includes
    dummy_latex.pdx         used for latex includes
    dummy_docbook.pdx       used for docbook includes
    testnn.pdx              tests (nn=01,02,...)
    oraclenn.html,tex,sgml  oracles or "known good" results (nn=01,02,...)
  
---stylesheets---
doc/styles/
  default_cfg_.pdx          generic defaults file
  article_style.pdx         generic stylesheet
  docbook_article.pdx       docbook "article"
  html_article.pdx          html web page, with sidebars and header/footer
  latex_article.pdx         "article" 
  latex_foils.pdx           use foils.sty
  latex_seminar.pdx         use seminar.sty

Modules (per pydoc):

3. Testing

3.2. Adding tests

In test/testdata, copy test01.pdx to a new testnn.pdx (e.g., test11.pdx) and edit it to exercise your new feature. Remember to update the title = ... and desc = ... settings in its cfg section.

In basic_tests.py, add a method for the new test number for each driver, and add each of these methods to the list of "cases" in the suite definition.

Run the go_test script. Since there is no oracle ("known good" output) yet, the test will fail.

Open the resulting oracle (e.g., oracle11.html) in the driver-specific viewer and check it. For Html, use a web browser. For latex, run latex and view the dvi with viewtex; for example, to view test11.tex, run viewtex 11.

If the result is wrong, debug your code and try again. When all is well, update the oracle with kill_html nn:

  kill_html 11

This removes the old oracle and installs your new one.

Repeat these steps for each of the other drivers.

4. Module architecture

Base.py does most of the work and calls the driver modules when it needs to produce output. The driver classes inherit from Base.Base, and each driver's __init__ can add its own startup features, such as additional flags.
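A minimal sketch of a driver extending the base startup, assuming a simplified stand-in for Base.Base. The default values and the sidebar_p flag are hypothetical, chosen only to show the shape: the driver calls the base __init__ first, then adds its own flags.

```python
class Base:
    """Stand-in for Base.Base: sets the defaults every driver shares."""

    def __init__(self):
        self.toc_p = 1      # switches end in _p (values here are illustrative)
        self.expand_p = 1


class Html(Base):
    """Hypothetical driver showing how __init__ can add startup features."""

    def __init__(self):
        super().__init__()      # load the shared defaults first
        self.sidebar_p = 1      # hypothetical Html-only flag
```

Calling the base __init__ first means a driver can also override a shared default simply by assigning it afterward.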

4.1. Base

The __init__ method creates the object and loads default values. The parse method reads and applies the input parameters, then opens and reads the input file.

4.1.1. Parameters

Command-line options all use the long form ("--..."). Typically they toggle switches on and off, e.g., whether to generate a table of contents. The internal data members for these switches end in _p, following the Lisp naming convention for predicates.

Input parameters with a leading underscore ("_") cannot be set or reset by the end user via =define or =cfg commands, because they are critical to the internal integrity of the system.

In general, defaults are set first, command-line parameters override them, and =cfg or =define commands override both. Individual drivers, or even individual methods, can add data members of their own.

For example, the table-of-contents switch can be set at each of these levels:

  Source         Value
  data member    toc_p (table of contents)
  default        toc_p = 1
  command line   --toc or --notoc
  =cfg           toc_p = 1 (or toc_p = 0)
  =def           =def toc_p=1 (or =def toc_p=0)
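The layering above can be sketched as three successive overrides. This is an illustration under assumptions, not Pdx's parser: the one-line "=cfg name = value" / "=def name=value" syntax, the _version parameter, and the choice to raise an error (rather than silently ignore) an attempt to set an underscore parameter are all hypothetical.

```python
import re

# Illustrative defaults; _version stands for any internal, underscore-prefixed parameter.
DEFAULTS = {"toc_p": 1, "_version": "1.0"}

DIRECTIVE = re.compile(r"=(?:cfg|def|define)\s+(\w+)\s*=\s*(.*)$")


def apply_commandline(settings, argv):
    """Apply long-form toggles: --toc sets toc_p = 1, --notoc sets toc_p = 0."""
    for arg in argv:
        name = arg[2:] if arg.startswith("--") else ""
        if name + "_p" in settings:
            settings[name + "_p"] = 1
        elif name.startswith("no") and name[2:] + "_p" in settings:
            settings[name[2:] + "_p"] = 0
    return settings


def apply_directive(settings, line):
    """Apply a one-line '=cfg name = value' or '=def name=value' (syntax assumed)."""
    m = DIRECTIVE.match(line)
    if not m:
        return settings
    name, value = m.group(1), m.group(2).strip()
    if name.startswith("_"):
        # Internal parameters are off limits to the end user.
        raise ValueError("%s is internal and cannot be set by the user" % name)
    settings[name] = int(value) if value.isdigit() else value
    return settings


def build_settings(argv, directives):
    """Defaults, then command line, then =cfg/=def lines, each overriding the last."""
    settings = dict(DEFAULTS)
    apply_commandline(settings, argv)
    for line in directives:
        apply_directive(settings, line)
    return settings
```

For example, build_settings(["--notoc"], ["=cfg toc_p = 1"]) ends with toc_p == 1, because the document's own setting wins over the command line.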

4.1.2. Input

The input is read into an array called self.lines, which is then processed twice to resolve forward references. Reading from the array is controlled by self.morelines() and self.nextline(). Typically:

    while self.morelines():
        line = self.nextline()
        # ... process the line ...

self.morelines() checks self.ndx, the current index into the array. To back up one line, decrement that index:

  self.ndx = self.ndx - 1

self.nextline() increments ndx, gets the line, and does macro expansion on "@...@" terms when expand_p=1.
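The reading protocol can be sketched as a small class. The names lines, ndx, morelines, nextline, and expand_p come from the text; the Reader class, the macros dictionary, and the rule that unknown @name@ terms are left unchanged are assumptions. This sketch reads self.lines[self.ndx] and then advances ndx, so that an index starting at 0 yields the first line; the real method's ordering may differ.

```python
import re


class Reader:
    """Hypothetical sketch of the self.lines / self.ndx reading protocol."""

    def __init__(self, lines, macros=None, expand_p=1):
        self.lines = lines
        self.ndx = 0                  # index of the next line to read
        self.macros = macros or {}    # assumed: macro name -> value
        self.expand_p = expand_p

    def morelines(self):
        return self.ndx < len(self.lines)

    def nextline(self):
        line = self.lines[self.ndx]
        self.ndx += 1
        if self.expand_p:
            # Replace each @name@ with its value; unknown names are left alone.
            line = re.sub(r"@(\w+)@",
                          lambda m: str(self.macros.get(m.group(1), m.group(0))),
                          line)
        return line
```

With this ordering, decrementing self.ndx after a read makes the next nextline() return the same line again, which is what "backing up" needs.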

The raw array is modified by the do_include method, which reads an external file and splices its lines into the array, so later processing treats them as if they had been there all along. An include loop (a file that directly or indirectly includes itself) is therefore possible, and there is no check for it yet, so be careful.
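Splicing can be sketched with Python slice assignment. The "=include name" syntax, the expand_includes function, and the files dictionary standing in for the file system are hypothetical; like the text describes, the sketch has no loop check.

```python
import re


def expand_includes(lines, files):
    """Read lines by index, splicing each '=include name' target in place.

    files maps names to lists of lines (a stand-in for reading from disk).
    There is no include-loop check, so a self-including file never finishes.
    """
    ndx = 0
    out = []
    while ndx < len(lines):
        line = lines[ndx]
        ndx += 1
        m = re.match(r"=include\s+(\S+)", line)
        if m:
            # Insert the included lines right after the directive, so they
            # are the next ones read, as if they were there all along.
            lines[ndx:ndx] = files[m.group(1)]
            continue
        out.append(line)
    return out
```

Because the spliced lines go back through the same loop, nested includes are handled without any extra code.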

4.1.3. Errors

If processing can safely continue, an error found on the first pass is reported and processing goes on, but the second pass is skipped. A fatal error stops processing immediately:

  self.err("this is bad, but we can go on")
  self.fatal("we're dead right now")
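One way to get this behavior is sketched below. The ErrorReporter class, the errcount counter, ok_for_second_pass, and the FatalError exception are all hypothetical: err() counts and continues, while fatal() raises so the caller (or a test) can stop at once. The real methods may print and exit instead.

```python
import sys


class FatalError(Exception):
    """Hypothetical exception raised by fatal() to stop processing."""


class ErrorReporter:
    def __init__(self):
        self.errcount = 0

    def err(self, msg):
        # Report and keep going; any error suppresses the second pass.
        self.errcount += 1
        sys.stderr.write("error: %s\n" % msg)

    def fatal(self, msg):
        # Report and stop immediately.
        sys.stderr.write("fatal: %s\n" % msg)
        raise FatalError(msg)

    def ok_for_second_pass(self):
        return self.errcount == 0
```

Raising an exception rather than exiting lets parse decide how to clean up, and lets the test suite check that a bad input is rejected.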

4.1.4. Passes

The two-pass loop in parse resets some data members (e.g., counters) at the start of each pass; in particular, it resets self.ndx to 0. It then calls self.process_file.

The second pass prints the data collected on the first. Among other things, it writes a preamble and a postamble around the main content.
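The structure of the two passes can be sketched as follows. TwoPassSketch, the "=head1" command, the "(i of n)" output, and the PREAMBLE/POSTAMBLE placeholders are hypothetical; the point is that pass 1 only collects (here, the number of sections), pass 2 emits output that uses what pass 1 learned, and errors on pass 1 skip pass 2.

```python
class TwoPassSketch:
    """Hypothetical sketch of the two-pass structure in parse."""

    def __init__(self, lines):
        self.lines = lines
        self.nsections = 0     # learned on pass 1, used on pass 2
        self.errcount = 0
        self.out = []

    def parse(self):
        for passno in (1, 2):
            if passno == 2 and self.errcount:
                break                      # errors on pass 1: no output pass
            self.passno = passno
            self.ndx = 0                   # rewind the read index
            self.section = 0               # reset per-pass counters
            if passno == 2:
                self.out.append("PREAMBLE")
            self.process_file()
            if passno == 2:
                self.out.append("POSTAMBLE")
        return self.out

    def process_file(self):
        while self.ndx < len(self.lines):
            line = self.lines[self.ndx]
            self.ndx += 1
            if line.startswith("=head1 "):
                self.section += 1
                if self.passno == 1:
                    self.nsections = self.section      # collect only
                else:
                    # Forward reference: the total is known only after pass 1.
                    title = line[len("=head1 "):]
                    self.out.append("%s (%d of %d)" % (title, self.section, self.nsections))
            elif self.passno == 2:
                self.out.append(line)
```

Without the per-pass reset of section, the second pass would number the headings 3 and 4 instead of 1 and 2.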

4.1.5. Processing

self.process_file is a loop that tries a long series of regular-expression matches on each line. When a match succeeds, the appropriate method is called and the loop continues with the next line.

The matches are roughly ordered by expected frequency. A possible improvement would be a single generic pattern such as "^\s*=([a-z1-9_]+)", followed by a table lookup of the method for the captured command name.

Each do_ method handles as much as it can itself and delegates to driver_ methods as needed. This division of labor will likely change as new functions and drivers are added.
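The suggested improvement (one generic match plus a lookup table) might look like this. It illustrates the idea only and is not current Pdx code: DispatchSketch, process_line, the table entries, and the handling of unknown commands are hypothetical, and a real version would call self.err() for an unknown command.

```python
import re

# The generic pattern suggested in the text: capture the command name.
COMMAND = re.compile(r"^\s*=([a-z1-9_]+)")


class DispatchSketch:
    def __init__(self):
        self.handled = []
        # Lookup table from command name to handler; entries are examples only.
        self.table = {
            "head1": self.do_head,
            "include": self.do_include,
        }

    def do_head(self, line):
        self.handled.append(("head", line))

    def do_include(self, line):
        self.handled.append(("include", line))

    def do_text(self, line):
        self.handled.append(("text", line))

    def process_line(self, line):
        m = COMMAND.match(line)
        if not m:
            self.do_text(line)                      # ordinary text
        elif m.group(1) in self.table:
            self.table[m.group(1)](line)            # one lookup, no regexp chain
        else:
            self.handled.append(("unknown", m.group(1)))   # would be self.err(...)
```

A side benefit is that adding a command means adding one table entry, and the ordering-by-frequency question disappears.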
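The do_/driver_ split can be sketched with a template-method pattern. The do_head/driver_head names follow the text's naming convention, but the classes, the heads list, and the output formats are assumptions: the base method does the driver-independent bookkeeping, and each driver supplies only its output format.

```python
class Base:
    def __init__(self):
        self.out = []
        self.heads = []      # shared bookkeeping lives in the base

    def do_head(self, level, title):
        # Driver-independent work: remember the header for later use.
        self.heads.append((level, title))
        # Output format is the driver's job.
        self.driver_head(level, title)

    def driver_head(self, level, title):
        raise NotImplementedError("each driver must supply driver_head")


class Html(Base):
    def driver_head(self, level, title):
        self.out.append("<h%d>%s</h%d>" % (level, title, level))


class Latex(Base):
    def driver_head(self, level, title):
        cmd = ["section", "subsection", "subsubsection"][level - 1]
        self.out.append("\\%s{%s}" % (cmd, title))
```

Raising NotImplementedError in the base makes a missing driver_ method fail loudly instead of silently dropping output.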

4.2. Html

Html.py is the initial driver for the system and should be considered the template for other drivers. Considerations for developers include:

Verbatim text must be escaped (e.g., "<" must become "&lt;").
Html.py must keep track of header levels and build the table of contents itself.
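Both duties can be sketched with the standard html module. HtmlOutput, its anchor scheme, and the numbered headings are hypothetical; the escaping uses html.escape, and the header counters show one way to track levels so the table of contents can be built. Since the TOC is printed before the body, its entries would in practice be collected on the first pass.

```python
import html


class HtmlOutput:
    """Sketch of two Html driver duties: escaping verbatims, building the TOC."""

    def __init__(self):
        self.body = []
        self.toc = []              # (level, anchor, text), in document order
        self.counts = [0] * 6      # current section number at each header level

    def driver_verbatim(self, lines):
        # Escape &, < and > so code such as "a < b && c" is shown, not parsed.
        text = html.escape("\n".join(lines), quote=False)
        self.body.append("<pre>%s</pre>" % text)

    def driver_head(self, level, title):
        # Bump this level's counter and reset every deeper level.
        self.counts[level - 1] += 1
        for i in range(level, len(self.counts)):
            self.counts[i] = 0
        number = ".".join(str(n) for n in self.counts[:level])
        anchor = "sec" + number.replace(".", "_")
        text = "%s. %s" % (number, title)
        self.toc.append((level, anchor, text))
        self.body.append('<h%d id="%s">%s</h%d>' % (level, anchor, html.escape(text), level))

    def render_toc(self):
        # Indent each entry according to its header level.
        items = ['<li style="margin-left:%dem"><a href="#%s">%s</a></li>'
                 % (2 * (level - 1), anchor, html.escape(text))
                 for level, anchor, text in self.toc]
        return "<ul>\n%s\n</ul>" % "\n".join(items)
```

Resetting the deeper counters is what makes a new level-1 header after "1.1" come out as "2" rather than "2.1".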

Creator: Harry George
Updated/Created: 2002-06-24