Pdx

Pdx Design History

 

Table of Contents

1999-08-14

Experimented with pod and with sdf. I like the syntax of pod except for the over...back mechanism. I like the functionality of sdf but I think it got out of control. In particular, I want to allow user-definition of name-value pairs. That generalizes my earlier efforts at a configuration section.

1999-08-15

Began project. Spent a lot of time learning modules (a la h2xs), references, and objects. Gradually reworked the architecture to make it easy to add functionality.

Added:

  • font style ( bold, italic, code)
  • heads and indented toc
  • verbatim

1999-08-16

Bootstrapping the design history.

Nested Lists

Added nested lists, with various styles:
Unordered
Begin with =list *
  • item 1
  • item 2
Numbered
Begin with =list 1
  1. item 1
  2. item 2
Alphabetic
Begin with =list A
  A. item 1
  B. item 2
Begin with =list a
  a. item 1
  b. item 2
Descriptive
Begin with =list []
item 1
Is the first item
item 2
Is the second item
Altogether:
Descriptions, with
  • Bullets, with
    1. Numbers, with
      A. Upper-case alphas, with
        a. Lower-case alphas
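As a rough illustration of the styles above, the marker for each item could be computed from the =list argument. This is a minimal Python sketch, not the Pdx code; the function name item_marker and the restriction of alphabetic markers to A-Z are assumptions.

```python
def item_marker(style, n, term=None):
    """Marker for the n-th item (1-based) of a list opened with '=list STYLE'.

    Hypothetical helper: alphabetic styles only cover A-Z / a-z here.
    """
    if style == "*":
        return "\u2022"                      # bullet
    if style == "1":
        return f"{n}."
    if style == "A":
        return chr(ord("A") + n - 1) + "."   # A., B., C., ...
    if style == "a":
        return chr(ord("a") + n - 1) + "."   # a., b., c., ...
    if style == "[]":
        return term                          # descriptive lists show the term
    raise ValueError(f"unknown list style {style!r}")
```

Nesting then only changes the indentation; each level keeps its own counter and style.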

=cut

Added treatment of =cut. Eventually, would like to do full-blown literate programming.

Cleaner parsing

For a while I had trouble detecting lines beginning with spaces. I tried to order the regex checks in process_file by frequency of occurrence. This means that non-special (i.e., non-"=") lines should go high in the list. I tried:

  elsif ($line =~ /^\s*[^ =]/) {$self->out($line);}
However, this matched indented "=" lines, resulting in failed parses. Finally I found that some lines were indented with tabs (a leading tab satisfies [^ =]), so I actually needed:
  elsif ($line =~ /^\s*[^ \t=]/) {$self->out($line);}
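The tab problem is easy to reproduce. In this Python sketch (Python's re backtracks the same way Perl does here), \s* gives back the leading tab so that the tab itself satisfies [^ =]. Using [^\s=] is a slightly broader fix than adding \t, since it excludes every whitespace character; note it also stops whitespace-only lines from counting as text, which may or may not be wanted. The helper name is_plain_text is hypothetical.

```python
import re

# First attempt: excludes only space and "=", so \s* can back off and let a
# leading tab itself satisfy [^ =].
naive = re.compile(r"^\s*[^ =]")
# Excluding all whitespace (\s) as well as "=" closes that gap.
fixed = re.compile(r"^\s*[^\s=]")

def is_plain_text(line, pattern):
    """True if the line should be passed through as ordinary text."""
    return pattern.match(line) is not None

for line in ["plain text", "\t=head1 Title", "   =list *", "\tindented text"]:
    print(repr(line), is_plain_text(line, naive), is_plain_text(line, fixed))
```

Only the tab-indented "=head1" line differs: the naive pattern calls it plain text, the fixed one does not.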

Include

Next, we do include_verbatim:

dummy.pdx

line 1: This is I<included> from the dummy.pdx file.
line 2: An embedded include, to another dir:
=include ../doc/dummy2.pdx
line 4
line 5
line 6: final line

Next we do a pdx include, which may be nested. So far, we don't check for circular includes. We want to include on the first pass and just use the results on the second pass. We'll use a splice command, using ndx as the offset. We mark before and after with "|".

dummy.pdx| line 1: This is included from the dummy.pdx file. line 2: An embedded include, to another dir: line 1: This is included from the dummy.pdx file. line 2: An embedded include, to another dir:

line 1: This is I<included> from the dummy.pdx file.
line 2: An embedded include, to another dir:
=include_verbatim dummy2.pdx
line 4
line 5
line 6: final line

line 4 line 5 line 6: final line

line 4 line 5 line 6: final line

|
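The splice idea can be sketched in Python. This is an illustration under assumptions, not the original Perl: read_file is a caller-supplied function returning a list of lines, and the circular-include guard (the active tuple) is a hypothetical addition, since the text notes there was no such check yet.

```python
def expand_includes(lines, read_file, active=()):
    """Return a copy of lines with each '=include NAME' line replaced by the
    (recursively expanded) lines of NAME. '=include_verbatim' is not touched."""
    out = list(lines)
    ndx = 0
    while ndx < len(out):
        line = out[ndx]
        if line.startswith("=include "):
            name = line.split(None, 1)[1].strip()
            if name in active:   # hypothetical guard against circular includes
                raise ValueError(f"circular include of {name}")
            body = expand_includes(read_file(name), read_file, active + (name,))
            out[ndx:ndx + 1] = body   # splice, using ndx as the offset
            ndx += len(body)          # continue after the inserted lines
        else:
            ndx += 1
    return out
```

Because nested files are expanded before being spliced in, the second pass sees one flat list of lines.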

1999-08-17

Changed the list marker for descriptions from "d" to "[]", comparable to LaTeX. This allows us to later support setting starting values for the enumeration markers -- not that I ever expect to do that.

Changed the input args from a string to a list, thus allowing binaries like

pdx2html:
  #!/usr/bin/perl
  use strict;
  use warnings;
  use Pdx;
  my $pdx = Pdx->new;
  $pdx->parse('--2html', @ARGV);

Time to redo the architecture. I'd wanted to:

  1. Allow several outputs in the same run
  2. Keep it all in one file

But that means the various output drivers compete for the same namespace. I'll try going with one output at a time, and use "require" to load the driver.
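For comparison, the same "load only the driver you need" idea maps onto importlib in the later Python layout (Pdx/Html.py, Pdx/Latex.py, Pdx/Docbook.py). This is a sketch; the assumption that each driver module defines a class with the same name as the module is not confirmed by the text.

```python
import importlib

def load_driver(fmt, package="Pdx"):
    """Import only the driver needed for this run and return its class.

    Assumes one module per format (Pdx/Html.py, Pdx/Latex.py, ...) that
    defines a class named like the module; both are assumptions.
    """
    name = fmt.capitalize()        # 'html' -> 'Html', 'docbook' -> 'Docbook'
    module = importlib.import_module(f"{package}.{name}")
    return getattr(module, name)
```

Since only one driver module is imported per run, drivers never share a namespace.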

1999-08-18

Moved the format-specific code to separate files. It took quite a while to find the right combination of idioms.

  1. Made a dir called PdxD (Pdx Drivers), patterned after DBI's DBD.
  2. Considered writing full .pm modules, but went with .pl files, because these are just required in the sense of an include.
  3. Do the require when the format is determined. Because it is runtime:
      can't use the $self->{...} notation, 
      so use $$self{...}.
    
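  4. Took a long time to figure out the file handle deref. Finally found:
    This works:
      print {$$self{_HTMLFILE}} "$txt";
    But this does not:
      my $output = {$$self{_HTMLFILE}};
      print $output "$txt"; 
    (In the assignment, the braces build an anonymous hash rather than a block, so $output is not a filehandle. Plain "my $output = $$self{_HTMLFILE};" works.)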
    

Still not right for references back from Html.pl to Pdx.pm.

1999-08-20

Restructure

Completely redid the architecture (again). Now structured as:
  Pdx/
    AUTHORS
    Base/
      Base.pm
      Changes
      MANIFEST
      Makefile.PL
      RCS/
    COPYING
    Changes
    Drivers/
      Changes
      Html.pm
      MANIFEST
      Makefile.PL
      RCS/
    MANIFEST
    Makefile.PL
    RCS/
    README
    bin/
      pdx2html
    doc/
      deshist.pdx
    go                  rebuild design history (deshist.pdx)
    test.pl

The inheritance works like this:

  Pdx/Base/Base.pm
    package Pdx::Base::Base;
    ...
    @ISA = qw(Exporter AutoLoader);

  Pdx/Drivers/Html.pm
    package Pdx::Drivers::Html;
    ...    
    @ISA = qw(Pdx::Base::Base Exporter AutoLoader);
    ...
    sub new {
        my $classname = shift @_;
        my $self = $classname->SUPER::new(@_);
        return $self;
    }
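The same Base/driver split carries over to Python, where super() plays the part of SUPER::new. This is a minimal sketch; the attribute names (args, data, extension) and the head method are hypothetical and only illustrate where shared versus driver-specific code lives.

```python
class Base:
    """Format-independent parsing, standing in for Pdx::Base::Base."""

    def __init__(self, *args):
        self.args = list(args)     # options, as passed to parse(@ARGV)
        self.data = {}             # hypothetical store for =def/=cfg values


class Html(Base):
    """HTML driver: reuses everything in Base and adds format-specific bits."""

    def __init__(self, *args):
        super().__init__(*args)    # plays the role of $classname->SUPER::new(@_)
        self.extension = ".html"   # hypothetical driver-specific attribute

    def head(self, level, text):
        # Hypothetical driver method; each driver supplies its own markup.
        return f"<H{level}>{text}</H{level}>"
```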

More fonts

Added underlined text.

More includes

Fixed =include_verbatim.

Fixed =include, which inserts directly into the input stream: line 1: This is included from the dummy.pdx file. line 2: An embedded include, to another dir: line 1: This is included from the dummy.pdx file. line 2: An embedded include, to another dir:

line 1: This is I<included> from the dummy.pdx file.
line 2: An embedded include, to another dir:
=include_verbatim dummy2.pdx
line 4
line 5
line 6: final line

line 4 line 5 line 6: final line

line 4 line 5 line 6: final line

Added legacy pod over... back:

item 1
This is item 1.
item 2
This is item 2

Tables

my caption
  Feature                 Example
  multi-line rows         this is a long, long, long, long, long, long, long, long,long, long, long, long, long, long, long, long, line.
                          with a break
  embedded font markups   This is italics

Links

External links, e.g., README.

Internal links, e.g., Table of Contents

Index entry

Driver-specific code

This is a single line of raw HTML. This is a block of raw HTML, with multiple lines.

It's time for the LaTeX module. After coding up Latex.pm, I found I had to rework some of Html.pm. Fortunately, it all got simpler and smaller as I went.

Filters

To allow pdx to be used as a filter, it needs to be able to read from STDIN and write to STDOUT. That is done via:

  pdx2html --stdin --stdout
Of course, these options can be used independently as well.
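A filter front end mostly comes down to choosing the input and output streams. This Python sketch uses argparse with the --stdin and --stdout flags named above; deriving the output name from the input file and the "stdin.html" fallback are assumptions for illustration.

```python
import argparse
import os
import sys

def choose_streams(argv):
    """Return (source, destination) for a pdx2html-style run.

    --stdin and --stdout may be combined (pure filter) or used independently.
    File names are returned as strings; sys.stdin/sys.stdout are returned
    when the corresponding flag is given.
    """
    parser = argparse.ArgumentParser(prog="pdx2html")
    parser.add_argument("--stdin", action="store_true", help="read pdx from STDIN")
    parser.add_argument("--stdout", action="store_true", help="write output to STDOUT")
    parser.add_argument("infile", nargs="?", help="input .pdx file")
    args = parser.parse_args(argv)

    if not args.stdin and args.infile is None:
        parser.error("give an input file or --stdin")
    src = sys.stdin if args.stdin else args.infile
    if args.stdout:
        dst = sys.stdout
    elif args.infile:
        dst = os.path.splitext(args.infile)[0] + ".html"
    else:
        dst = "stdin.html"          # hypothetical default when reading STDIN
    return src, dst
```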

Groff

Took a shot at it. The groff and tbl documentation were inadequate for my needs. For example, which input option is used to shift output from ps to dvi? How are table rows formatted?

It seemed to generate a valid ps file, but the formatting was pitiful.

1999-08-21

Ok, it's clean enough to use. I'll start building test scripts instead of working from this design history itself.

Things to consider:

  • Driver-unique, user-controlled pre- and postamble. This allows, e.g., generation of foils or memos. Maybe also allow appending stuff to the existing preamble (e.g., javascripts).
  • =include_for, to have driver-unique includes. The included file would usually be an autogenerated block inside a =begin driver block.
  • Write the developer's guide.
  • Write the user's guide.

More Includes

Here is include_for html: This is a block of pdx to be included only in html files. Thus it could be used for autogenerated chunks of html, generated elsewhere and placed here in a begin block. E.g.:

This is raw HTML

Here is include_for latex:

Tables

Faced with Latex table layout, I changed the =table command to:
  =table [widths]; grid; caption
  where:
     widths is a ','-delimited list of percentages, e.g., [25,75]
     grid is 'grid' or 'nogrid' (actually, 'grid' or anything else)
     caption is the caption of the table
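Parsing that argument is a small string exercise. This Python sketch follows the three fields described above; the function name parse_table_spec is hypothetical, and limiting the split to two semicolons (so a caption may itself contain ";") is a design choice, not something the text specifies.

```python
def parse_table_spec(spec):
    """Split '[25,75]; grid; my caption' into (widths, grid, caption).

    Hypothetical helper. Anything other than 'grid' means no grid, as above.
    """
    widths_part, grid_part, caption = (s.strip() for s in spec.split(";", 2))
    widths = [int(w) for w in widths_part.strip("[]").split(",")]
    grid = grid_part == "grid"
    return widths, grid, caption
```

For example, the argument "[10,40];grid;." used later in this history gives ([10, 40], True, ".").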

This was easy to add to Html.pm, but I stumbled over Latex.pm. I needed to combine the text width with the column-width percentages to get values for the tabular column format string, e.g.:

  {|p{1.25in}|p{3.8in}|}

After an hour of looking around in the tex and latex docs, I couldn't find the commands to compute the widths directly in tex, so I just put a default text width in Latex.pm's new method and tweaked it until the table lined up approximately. I tried tabular* for a while, but its width argument was fighting with the p values, so I went back to plain tabular. Anyway, it looks respectable now, and the long line wraps nicely. Need a fancier table? Build it in a driver-specific begin block.

...Found the magic incantation:

  p{0.25\textwidth}
So I removed the hack approximation of textwidth. Then worked on providing breaks in a table cell. First, we'll use an HTML-style break tag, and convert that to linebreaks in various drivers:
  <BR>  --> \\
We then set each cell inside a minipage so breaks will take effect.
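Both steps, percentages to p{...\textwidth} columns and <BR> to \\ inside a minipage, can be sketched in Python. The function names and the top-aligned [t] minipage option are assumptions; the output format follows the {|p{...}|p{...}|} example above.

```python
import re

def latex_colspec(widths, grid=True):
    """Turn percentage widths like [25, 75] into a tabular column spec such
    as {|p{0.25\\textwidth}|p{0.75\\textwidth}|}."""
    sep = "|" if grid else ""
    cols = sep.join(f"p{{{w / 100:g}\\textwidth}}" for w in widths)
    return "{" + sep + cols + sep + "}"

def latex_cell(text, width_pct):
    """Convert HTML-style <BR> breaks to LaTeX \\\\ and wrap the cell in a
    top-aligned minipage so the breaks take effect."""
    body = re.sub(r"<BR\s*>", lambda m: "\\\\ ", text, flags=re.IGNORECASE)
    return (f"\\begin{{minipage}}[t]{{{width_pct / 100:g}\\textwidth}}"
            f"{body}\\end{{minipage}}")
```

The lambda replacement sidesteps the backslash-escaping rules of re.sub replacement strings.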

I've found that a simple break is adequate to get crude formatting inside a table.

=cfg

This is a convenient way to set a lot of =define statements. Actually, I'm putting it in to be compatible with my current version of pdx2html. The idea is a collection of name=value pairs:

  =cfg
  creator_name = Harry George
  =end cfg
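Reading such a block is straightforward. This Python sketch collects the pairs into a dict; the function name parse_cfg is hypothetical, and splitting on the first "=" only (so values may contain "=") is an assumption.

```python
def parse_cfg(lines):
    """Collect 'name = value' pairs found between '=cfg' and '=end cfg'."""
    data = {}
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped == "=cfg":
            inside = True
        elif stripped == "=end cfg":
            inside = False
        elif inside and "=" in stripped:
            name, value = stripped.split("=", 1)   # first '=' only
            data[name.strip()] = value.strip()
    return data
```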

Preparing the distribution

Even though I started these modules with h2xs, cleanup was a struggle. Finally found the magic:

 perl -MExtUtils::MakeMaker -e "WriteMakefile()"

That allowed several mods under the Pdx dir, without actually having a Pdx module.

1999-08-22

Next I found that the "Pdx::Html" notation works only for a fully installed module, or for the make-generated blib/lib/Pdx/Html.pm, which mimics the final layout. Otherwise, I was stumbling around with stuff like "Pdx::Html::Html". Finally I put symbolic links to the .pm files into the Pdx dir, and added the parent of the Pdx dir to my PERL5LIB env var. Finally my scripts would run under development conditions.

I'll have to take out the symbolic links when it comes time to do make dist.

Next I tidied up the Html/test.pl so it was fairly driver-independent. Then actually began the tests. Html went fine. After copying to Latex, the Latex tests went fine until xrefs. Working on that now. Seems to be use of underscore in a ref.

...Yep, that was it. Underscores (even escaped with backslash) are not allowed. Also, misspelled "makeindex" in go2, so my index wasn't showing. All is well now.

Tried running deshist with and without table of contents. Decided to stay with table of contents.

Ahhh. The sun is out, so no more computing for a while.

1999-08-23

Users asked for font control (esp. color), numbered sections in HTML, and lists inside table cells.

Color: Use, e.g.:

  =for html <FONT COLOR = "RED">
  some red text
  =
  =for html <FONT COLOR = "BLACK">
  some black text
some red text some black text

But FONT is deprecated, so one should really use style sheets. That means providing user-defined preambles. And, since the default preamble may be ok, we also provide preamble_append. These are driver-unique, so we get:

  =preamble_for html 
  (block of preamble, totally replacing defaults)

  =preamble_append_for html
  (add to default after defaults but prior to </HEAD>)

  =postamble_for html
  (no known need yet, but I put it in)

Thus a user can hard code a style sheet into the document, or can =include another file (e.g., a site standard) which has a preamble_for statement. Tested the hard-coded approach in test08.

Numbered sections: This is primarily an HTML or text driver problem, since Latex already does numbering. I looked through the specifications for HTML and CSS, and didn't see a treatment of numbered sections (headings). So I'll do one myself.

...Just completed it. Provided an option --numbered_heads, which can be set from the command line or from the cfg section (or from a define, of course). In Base.pm, that splits into the driver_head and driver_head_numbered methods. In Latex, it is just a matter of a "*" (as in section vs. section*). In HTML, I coded up a stack for the form:

  level:num.num.num...
e.g.
  1.
  1.1.
  1.2.
  2.
  2.1.
  2.1.1.
  3.
becomes:
  1:1.
  2:1.1.
  2:1.2.
  1:2.
  2:2.1.
  3:2.1.1.
  1:3.

I pop the stack until at the right level, then make a new "1", or increment the existing last num as appropriate. Then join the resulting number to the given name, and save that for the toc.
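The stack step can be sketched in a few lines of Python. This is an illustration, not the Html.pm code: the function name next_number is hypothetical, and padding skipped levels with 1 (e.g. a level-3 head directly under a level-1 head) is an assumed policy.

```python
def next_number(stack, level):
    """Update the section-number stack for a heading at `level` (1-based)
    and return its label, e.g. '2.1.'."""
    del stack[level:]                 # pop until no deeper than this level
    if len(stack) == level:
        stack[-1] += 1                # sibling: increment the last number
    else:
        stack.extend([1] * (level - len(stack)))   # new level starts at 1
    return ".".join(str(n) for n in stack) + "."
```

Feeding it the levels 1, 2, 2, 1, 2, 3, 1 from the example reproduces the labels 1., 1.1., 1.2., 2., 2.1., 2.1.1., 3.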

1999-08-25

Reread material on pod, sdf, sgmltools, and docbook. Next driver is probably docbook -- that gives easy translation to many other formats. Even then, the pdx syntax is terser than the others, giving faster development productivity.

Speaking of which, I changed the syntax of =define xyz 123 to =def xyz=123 (with optional spaces around the equal sign).

Now, what tagging for value substitution? Std templates use "@", so we'll try:

  =def xyz = 123
  ...
  There were @@xyz@@ ways to solve the problem.
Allow only [a-z0-9_] in the define name, to cut the chance of accidental matches.

Also, turn expansion on and off via expand_p. Default is off. Put it in nextline, so it is always picked up. We'll have to turn it off and on surrounding verbatims. But we'll leave it active (at user's discretion) for includes, since they may be used to load templates.
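Here is a minimal Python sketch of the =def / @@name@@ scheme, assuming the "=def name = value" syntax above. A single expand_p argument stands in for the switch that Pdx toggles around verbatims, and leaving undefined names untouched is an assumption.

```python
import re

DEF_PAT = re.compile(r"^=def\s+([a-z0-9_]+)\s*=\s*(.*)$")   # =def name = value
REF_PAT = re.compile(r"@@([a-z0-9_]+)@@")                    # @@name@@ in text

def process(lines, expand_p=False):
    """Record =def lines; when expand_p is on, substitute @@name@@ elsewhere."""
    defs, out = {}, []
    for line in lines:
        m = DEF_PAT.match(line)
        if m:
            defs[m.group(1)] = m.group(2).strip()
        elif expand_p:
            # Unknown names are left as-is rather than blanked out.
            out.append(REF_PAT.sub(lambda r: defs.get(r.group(1), r.group(0)), line))
        else:
            out.append(line)
    return out
```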

1999-08-27

After several hours of debugging yesterday and today, I found an extraneous backtick (`) in the Base.pm. Only in perl would a program try to compile and run with random chars. So I installed python and took a look. I like the modula-3 aspects. I don't like the enforced indents -- even though I'd probably do them anyway. Also downloaded python-based Zope. Maybe I'll convert Pdx to python.

Next I found and resolved a normal bug: The first line of a pdx-style list item was being printed without a space or newline. As a result it was effectively concatenated with the next line. Put in a newline and all is well.

Ok, back to the user's guide. ...Added a "_force_p" to force an update even if the timestamp is up to date. Tweaked Html.driver_table_row to make cells top-aligned.

...Ok, ok, I need to do escape chars. I'd been holding out for ... being an emphasize mode, but I'm getting killed by trying to describe markups for Pdx itself. I'll escape lt, gt, amp, quot, and the pound sign.

...Did the escapes.
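For the HTML driver, the escapes can be done in a single pass with a translation table. This Python sketch is an assumption on two counts: that "the pound sign" means '#', and that it is written as the numeric reference &#35;. Mapping each character exactly once means the "&" inside the replacements is never escaped a second time.

```python
# Hypothetical HTML escape table for the characters listed above.
HTML_ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "#": "&#35;",   # assumption: "pound sign" is '#', as a numeric reference
})

def escape_html(text):
    """Escape markup-significant characters in ordinary text."""
    return text.translate(HTML_ESCAPES)
```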

...When I got to graphics, realized I needed a "center" environment, so I added it. Also noted the break function using <BR>.

...Completed the user's guide.

1999-08-28

Built the release process. Had to add a mkdist script, MANIFEST.SKIP, and put the pdx2html and pdx2latex scripts into their respective EXE_FILES.

The mkdist script is:

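  #!/bin/sh
  # Build a release tarball with a real (not symlinked) copy of the test data.
  rm -f *~
  rm -f testdata                  # remove the development symlink
  mkdir testdata
  cp ../testdata/*.pdx testdata
  cp ../testdata/oracle??.html testdata
  perl Makefile.PL verbose
  make test
  make distcheck
  make dist
  mv *.tar.gz ..
  rm -rf testdata
  ln -s ../testdata testdata      # restore the development symlink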

The idea is to use the linked testdata dir for normal development, and then hardcode it locally during the distribution process. I do several tests, then move the .gz file to the Pdx level. All the dirs have basically the same mkdist (with adjustments for tests).

Had to restructure the dirs:

  Pdx/
    Base/
      AUTHORS
      COPYING
      MANIFEST
      MANIFEST.SKIP
      README
      Base.pm
      Changes
      Makefile.PL
      doc/
        deshist.pdx
        devguide.pdx
        userguide.pdx
        (other files)
        go                  script to build html's for docs
      mkdist
      RCS/
    HTML/                   (same structure for all drivers)
      AUTHORS
      COPYING
      MANIFEST
      MANIFEST.SKIP
      README
      Html.pm
      Makefile.PL
      pdx2html
      test.pl
      testdata/             usually a link to ../testdata
                            but rebuilt during mkdist
      go                    "make test"
      killtest              remove an oracle
      mkdist
      RCS

For a while I was copying to my working proj/perl dir and installing there via PREFIX. Once that was working, I did the real thing -- installing as root into the official perl dirs.

Then tweaked doc's go script to use the official pdx2html and pdx2latex. We are now bootstrapped to the installed Pdx series. Of course, the test scripts still point to the local Pdx sources. Hmmm, maybe I should generate the html and pdf files prior to installation, in case folks can't build them. Makes for a big tar file for Base, but let's try it. ...Did it. It comes to 250K.

...Made a mkdriver script to clone Html for new drivers. Made a mkperlproj to help build mkdriver. Ready to get going with Docbook.

1999-09-02

Started Docbook but had to detour to do homework. Need a way to do TeX math, so let's put in a =texmath...=end marker. Non-TeX drivers can generate a graphics file and include that. See test10.

...Adding it to Latex was of course trivial. For Html, I used latex2html's pstogif and made a script "math2gif", based on my earlier "mathtoepsi". Looked around for a ps2png, but haven't found anything so far.

Found that once I officially installed Pdx, I had trouble finding the in-development version. I have the PERLLIB env var set ok, but it was finding the official one first. So I just moved the official Pdx to Pdx.save for the duration.

1999-09-14

Have been using the package for a while. Decided to drop the hardcoded pre/post ambles in favor of stylesheets. Cleaner code and more powerful.

2000-01-12

Completely rebuilt it in python. Generally easier to read and maintain. I took out the pod-style verbatim, which was an odd duck for the parser anyway. The new dirtree is also simpler:

RCS

AUTHORS
COPYING
INSTALL
MANIFEST
Pdx/
  RCS
  Base.py
  Docbook.py
  Html.py
  Latex.py
  __init__.py
  go1           run Base tests
  go2           run Html tests
  go3           run Latex tests
  go4           run Docbook tests
  testdata/
    (testxx.pdx and oraclexx.html/tex/sgml) 
README
TODO
VERSION
bin
doc
go               run mkdist
go1              run setup.py build
go2              run setup.py test
go3              run setup.py install
mkdist.py        build the distribution and tar/gzip it
mkdocs.py        build the docs, using the installed system
pdx2docbook.py   script for docbook output
pdx2html.py      script for html output
pdx2latex.py     script for latex output
setup.py         similar in intent to distutils, but not yet using it        
styles/
  (stylesheets)



New functionality for the =cfg and =def:
  Code   Example        Description
  ==     xyz == foo     use foo as-is
  +==    xyz +== foo    append foo to existing xyz as-is
  =      xyz = foo      use foo after processing in-line markups
  +=     xyz += foo     append foo to existing xyz after processing in-line markups
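The four operators in the table above can be sketched in Python. Assumptions: the markup argument is a hypothetical stand-in for the in-line markup processor, and appended values are joined with no separator because the table does not say otherwise. Listing the longest operators first in the regex keeps "+==" from being read as "+=" followed by "=".

```python
import re

# Longest operators first so '+==' is not mistaken for '+=' plus '='.
OP_PAT = re.compile(r"^([a-z0-9_]+)\s*(\+==|==|\+=|=)\s*(.*)$")

def apply_def(defs, text, markup):
    """Apply one def body such as 'xyz += foo' to the defs dict.

    `markup` is a hypothetical hook standing in for in-line markup processing.
    """
    m = OP_PAT.match(text.strip())
    if m is None:
        raise ValueError(f"not a def: {text!r}")
    name, op, value = m.groups()
    if op in ("=", "+="):        # single-'=' forms process in-line markups
        value = markup(value)
    if op.startswith("+"):       # '+' forms append to the existing value
        value = defs.get(name, "") + value
    defs[name] = value
    return defs
```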

I also took out the --force mechanism. It was failing to handle changes to "included" files. If you want to avoid reprocessing a file, do up a script of your own.

2000-02-26

Again rebuilt the dirtree. This time to support the mypythonproj and setup.py approach.

---boilerplate---
AUTHORS
COPYING
INSTALL
MANIFEST
README
TODO
VERSION
setup.py

---the actual code---
RCS/
__init__.py
Base.py
Docbook.py
Html.py
Latex.py

---scripts for the bin dir---
pdx2docbook.py
pdx2html.py
pdx2latex.py

---documentation---
doc/
  go
  pdx.gif                   for the web page banner
  default_cfg.pdx           settings for the documents
  article_style.pdx         tuned for this project
  deshist.pdx               design history (includes the perl Pdx notes)
  devguide.pdx              developer's guide
  manual.pdx                user's manual
  user_install.pdx          included in manual.pdx
  user_tutorial.pdx         included in manual.pdx
  hello.pdx                 example document
  mktexdocs                 generate documents in TeX
  pdxpix.fig,eps,jpg        example picture
  userguideR23.gif          example math
  userguideR24.gif          example math

---testing---
  
---stylesheets---
Pdx/styles/
  default_cfg_.pdx          generic defaults file
  article_style.pdx         generic stylesheet

2000-02-27

Lots of work on the documentation. In the process I discovered that self.data (the dictionary for macros and defs) expects string values, so I need to do "str(value)" to make sure every value is a string. Fixing that took care of some trouble with turning command-line flags on and off.

2000-04-17

Converted to distutils "setup.py" last week, and published to web. Per a comment by F. Lundh in c.l.p, used "readlines(BUFFERSIZE)", with BUFFERSIZE set to 100000.
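The readlines(BUFFERSIZE) idiom reads the file in batches of lines rather than all at once or one line at a time. This is a Python sketch of that loop; the generator name iter_lines is hypothetical. In current Python, iterating over the file object directly is already buffered, so this mainly reflects the idiom of the time.

```python
import io

BUFFERSIZE = 100000   # the value mentioned above

def iter_lines(f, bufsize=BUFFERSIZE):
    """Yield lines from f, fetching them in batches via readlines(hint)."""
    while True:
        batch = f.readlines(bufsize)   # stops once about bufsize chars are read
        if not batch:
            break
        yield from batch

# Usage with an in-memory file; a real file object works the same way.
sample = io.StringIO("=head1 Title\nSome text.\n")
print(list(iter_lines(sample)))
```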

The process_file collection of regular expressions is a candidate for a better algorithm. Took a look at something like:

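  import re

  tag_pat = re.compile(r'^\s*=([a-z_]+)')
  head_pat = re.compile(r'[1-5]\s+(.+)')
  ...
  # tag name -> (unbound handler method, pattern for the rest of the line)
  tag_map = {'head': (Base.do_head, head_pat), ...}
  ...
  m = tag_pat.search(line)
  if m:
      tag = m.group(1)
      if tag in tag_map:                # has_key() is gone in Python 3
          funct, tag_re = tag_map[tag]
          funct(self, line)             # not self.funct, which looks up an attribute named "funct"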

Didn't get it completely working, so stayed with current brute force approach. Look at it later.

2000-05-20

Worked on styles. Have Latex seminar.sty and foils.sty. Have a cleaned-up HTML article, with sidebars et al. Then tidied up the test process --- for html anyway; need to do it for the others. I looked at Guido's regression test mechanism. Either I didn't understand it or it isn't doing what I need, so I'll stay with my own for a while longer.

DocBook XML DTD v 4.0 came out. I downloaded it and will work on generating it properly for the docbook driver.

Next worked on redirection of includes. The problem is this: Assume file1.pdx is a chunk of documentation which includes file1a from another dir. Next, file2 (in yet another dir) wants to include file1. How does pdx find the right file1a? I run into this when file1 is a viewfoil presentation with includes of its own, but I want to also include file1 in class notes.

We need a way to redirect the basedir of an include. This should operate recursively. However, I've done a quick hack:

  =def savedir=@includedir@
  =def includedir=xyz
  =include file1a
  =def includedir=@savedir@

The include function looks at the home dir of the original pdx file (i.e. file2), and then at xyz trying to find file1. That is, the main file's path is the default source for includes, with 1 alternative path.

Rethinking, I really should be looking at the algorithm for cpp's #include. But I'll try this: As each include is started, push the abspath onto a stack of paths, and pop off the stack at the end. For each new include, compute the normed path of the concatenation of all the paths on the stack.
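Here is a minimal Python sketch of the stack idea, under one interpretation: each pushed directory was itself resolved relative to the one below it, so joining a relative include name onto the top of the stack has the same effect as concatenating the whole chain. The class and method names are hypothetical, and the example paths in the usage assume POSIX-style paths.

```python
import os

class IncludeResolver:
    """Resolve include names relative to the file that contains them, using a
    stack of directories (push on entering an include, pop on leaving)."""

    def __init__(self, main_file):
        self.stack = [os.path.dirname(os.path.abspath(main_file))]

    def resolve(self, name):
        # Relative names are taken relative to the innermost including file;
        # absolute names pass through os.path.join unchanged.
        return os.path.normpath(os.path.join(self.stack[-1], name))

    def enter(self, path):
        self.stack.append(os.path.dirname(path))

    def leave(self):
        self.stack.pop()
```

In the file1/file1a scenario, file2 resolves file1 relative to its own directory, and while file1 is being read, file1a resolves relative to file1's directory rather than file2's.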

 
Creator: Harry George
Updated/Created: 2001-09-03