Tinderbox v9.2.0 Help in PDF form - and a future challenge

mwra · March 29, 2022, 8:14pm

The challenge is not just ‘pdf’ as a format but whether the format is usable (text formatting is correct, links work). The Shortcuts option seems bafflingly light on documentation. Has anyone here used it?

What wkhtmltopdf did well was:

Make a page-numbered PDF
Running headers/footers
~~Make a Table of Contents linked to content~~ Done from Tinderbox
ToC links to page numbers.
Bookmarks for all headings
Destinations for all headings & TOC items (albeit with cryptic name). Destinations are potentially a big win as it enables deep linking into a PDF—very few (free/affordable) PDF tools can do destinations…

mwra · March 30, 2022, 4:12pm

I’d forgotten the moving part in this and realised I’d set some of the $AnchorName values by hand rather than calculate them from the read-only $HTMLExportPath.

This should fix some broken in-PDF links in the previous versions. Download links:

I’ve also spent rather longer than I thought properly documenting the process in the Tinderbox Help doc, as a precursor to making a more generic model to share with the community. The above solution is ‘easy’ as there are no aliases in the exported data. Exporting data with aliases is a new challenge…

mitchelln · March 31, 2022, 10:46am

Hi Mark,
What is your wkhtmltopdf command for this conversion, if you would?
Nic

mwra · March 31, 2022, 1:38pm

Yes, no problem—with the proviso that this technique is not recommended for Tinderbox beginners, there being a lot of advanced techniques involved. So, the sed to process the HTML links to other notes (i.e. normally other HTML pages) is:

sed -E 's:href=\"help[^\.]*\/([^\.]+)\.html\":href=\"#x\1\":g' print-export-source.html > print-export-source-proc.html
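To see what that expression does to a single link, here is a minimal sketch that feeds it one line on stdin. The sample paths (help/concepts/aliases.html and the example.com URL) and the function name rewrite_links are made up for illustration; only the sed expression itself comes from the command above.

```shell
# Hypothetical wrapper: the same sed expression as above, reading stdin instead of a file.
rewrite_links() {
  sed -E 's:href=\"help[^\.]*\/([^\.]+)\.html\":href=\"#x\1\":g'
}

# A link to another exported note (made-up path) becomes an in-page anchor.
printf '%s\n' '<a href="help/concepts/aliases.html">Aliases</a>' | rewrite_links
# prints: <a href="#xaliases">Aliases</a>

# A link that does not start with "help" is left untouched.
printf '%s\n' '<a href="https://example.com/page.html">External</a>' | rewrite_links
```

The folder part of the path is discarded and only the bare filename survives, prefixed with ‘x’, which is what the id values written by the per-item template have to match.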

Then, the wkhtmltopdf command line is (noting two inline version-specific mentions):

wkhtmltopdf --enable-local-file-access --outline --page-size "A4" --footer-spacing 4 --print-media-type --footer-center "[page] of [topage]" --footer-font-name "Helvetica Neue" --footer-font-size 11 --footer-line --footer-spacing 5 --header-spacing 5 --header-line --header-center "Tinderbox v9.2.0 Manual" --header-font-name "Helvetica Neue" --enable-toc-back-links toc --enable-toc-back-links --toc-header-text "Tinderbox v9.2.0 Manual - Table of Contents" --toc-text-size-shrink 1 --toc-level-indentation 4em --enable-local-file-access "print-export-source-proc.html" "Tinderbox v9.2.0 Manual.pdf"

I can also happily report that wkhtmltopdf v0.12.6 (the current, and likely last, Mac version) does work on my M1 MB Pro under macOS 12.3. Best guess is the earlier failure was because I was running the above code from the wrong working directory.

For those wanting to use this method in their work, a big benefit of the wkhtmltopdf approach is that it creates PDF destinations. So? PDF ‘destinations’ are little documented/discussed and very few affordable/free tools support their creation. A problem with PDF bookmarks is that they are not persistent. Mac Preview even stores them outside the document (without informing the user), so they don’t help for a distributable document. Bookmarks link to the pagination of the PDF (PDF is predicated on pre-digital, print-era concepts). Delete a page and all bookmarks will be off by one page.

By comparison, a PDF destination points to a specific location in the document and is thus resistant to editing of the document. A further, and unintended, usability win is that PDF destinations are addressable in a web browser in the same way as an HTML in-page anchor, using a # prefix. Admittedly, the wkhtmltopdf PDF destination names are ad hoc codes (and can’t be user-set). But, for those who have the need, the ability to URL-link to a particular heading in a PDF document cannot be overlooked.

To get to using the above, some more set-up is needed. Note the method can cope with a max of one alias per original note. Why? Tinderbox’s design concept (cromulent in c.2000) is to export each note to a discrete HTML page, with the filename based on the note’s $Name (calculated on-the-fly during export). Thus aliases reuse the original’s export filename. If an alias exports in the same folder as the original, Tinderbox is smart enough to suffix a 1 to the filename. But exporting all notes to one page doesn’t work so well: Tinderbox will increment filenames but, in the moment, we don’t know the correct name at both ends of the link. So, for now, to use this technique don’t use aliases in your exportable data. Possible untested workaround: use an empty note whose only $Text is an ^include()^ for what was previously the alias.

Wrapper template:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>^title^</title>
<link rel="stylesheet" href="^root^css/screen.css" type="text/css"> 
<link rel="stylesheet" href="^root^css/manual.css" type="text/css">
<link rel="stylesheet" href="^root^css/print.css" type="text/css" media="print">
</head>
<body id="manual">
<div id="main">
<h^value($OutlineDepth)^>^title^</h^value($OutlineDepth)^>
^text^
^children("/Templates/PDF-source-item")^</div>
</body>
</html>

Recursing per-item template:

^if($Name!="Recent Changes")^<div class="header^value($OutlineDepth-1)^" id="x^value($AnchorName)^^if($IsAlias)^x^endIf^">
<h^value($OutlineDepth-1)^>^title(this)^</h^value($OutlineDepth-1)^>
^text^

^if(children)^^children("/Templates/PDF-source-item")^^endIf^
</div>^endIf^

Before use, make an agent finding all exportable notes (in the single page export) and give it this agent action:

$AnchorName =$HTMLExportPath.split("/").at(-1).replace(".html","");

$HTMLExportPath gives the exported filename, including its path within the root export folder. The action strips the path and file extension and stores the bare (exported HTML) filename as a string in $AnchorName. This is then used as an HTML id attribute value at the start of each note as it is added to the single page export.
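For anyone more at home in Terminal than in action code, roughly the same string manipulation can be sketched in shell. The path below is a hypothetical example, not a real $HTMLExportPath value, and anchor_name is just an illustrative function name. One difference to note: basename only strips a trailing ".html", whereas the action’s .replace() acts on any occurrence in the string.

```shell
# Hypothetical helper mirroring the agent action:
# keep the last path segment, then drop the ".html" extension.
anchor_name() {
  basename "$1" .html
}

# Made-up export path for illustration.
anchor_name "help/concepts/aliases.html"   # prints: aliases
```

With the ‘x’ prefix added by the per-item template this gives id="xaliases", which is the target the sed step rewrites links to.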

I hope this isn’t too complex to follow. (It took some figuring out when I first designed the code!)

The export is done via a root-level note called ‘print-export-source’ whose $Text is:

^include("/Help","/Templates/PDF-source-page")^

and which uses an export template whose code is:

^text^

Exporting that note creates ‘print-export-source.html’ upon which the two command lines act. Important: the two command lines assume the Terminal’s pwd is the folder holding the exported single-page HTML file.
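Since a wrong working directory is an easy mistake to make, a small guard run before the two commands can catch it early. This is only a sketch: check_source is a hypothetical helper name, and the default filename is the one used in this thread.

```shell
# Hypothetical helper: succeed only if the exported source file
# is present in the current working directory.
check_source() {
  src="${1:-print-export-source.html}"
  if [ ! -f "$src" ]; then
    echo "Not found in $(pwd): $src" >&2
    return 1
  fi
}

# Usage: run this in Terminal before the sed and wkhtmltopdf steps.
if check_source; then
  echo "Source found; the sed and wkhtmltopdf commands can be run from here."
fi
```

If the check fails, cd to the export folder first rather than editing paths into the two command lines.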

CSS. This doesn’t have to be ‘in’ the HTML file but must be locally accessible to it, e.g. same folder or sub folders.

Images. This method does not support images embedded in notes. Why? We then don’t know the image filename or local path without a full export. I use ^do()^ macros in text to insert appropriate HTML <img ... tag data into the HTML.

I think that’s the gist of it. Only simple, once set-up. There is a fair amount of work to set up a document for this but once done, it is as simple as exporting the single-page HTML.

Footnotes:

the sed was originally run as an $HTMLExportCommand action but, long story short, it didn’t work and we never figured out why. As you still need to run the wkhtmltopdf command it’s no big deal. Plus having the pre-sed HTML page helps with debugging link issues.
pay careful attention to use of templates. Ideally avoid pages that export their children via ^children^ if you want the children in the ToC and addressable via a link. For mature docs that may mean (from tiresome experience) re-designing a doc’s export templates if, as with TBX Help, the document is exported both as individual pages (for app HTML Help) and as single-page HTML for this task.
beyond the command line, use of wkhtmltopdf settings is a challenge for this user. In true command line style, the ‘manual’ is terse, incomplete and leaves all sorts of unanswered questions. IOW, you may encounter a lot of trial-and-error before your customised CL works. I’m unable to provide tech support on wkhtmltopdf use. There is a light-traffic Google newsgroup you can try but I think it’s pretty much abandoned now. Also see: https://wkhtmltopdf.org.
why not use $ID instead of $AnchorName? Honestly, I don’t recall but I think it’s because I first designed this method back in v4.x/v5.x before $ID was introduced (in v5.6.0).
why the ‘x’ prefix on $AnchorName in the HTML id value? A valid HTML ID needs to start with a letter (not a number), after which only letters, numbers and a small range of other characters are allowed. The ‘x’ prefix thus sanitises anchors for $Name values like ‘9.2.0’ (9_2_0) or ‘.bold’ (_bold). Yes, a fair degree of HTML knowledge, as well as Tinderbox, is needed here to avoid head-scratching silent failures (i.e. no output and no explanation as to why).
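To make the last footnote concrete, here is a small sketch that checks whether a name starts with a letter and shows the effect of the ‘x’ prefix. The helper starts_with_letter is hypothetical and only tests the first character; it is not a full HTML id validator. The sample names are the two from the footnote plus one made-up ordinary name.

```shell
# Hypothetical check: does the string start with an ASCII letter?
starts_with_letter() {
  case "$1" in
    [A-Za-z]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare each bare name with its 'x'-prefixed form.
for name in 9_2_0 _bold aliases; do
  id="x$name"
  if starts_with_letter "$name"; then bare=ok; else bare=invalid; fi
  echo "$name -> bare: $bare, prefixed id: $id"
done
```

The prefixed form always starts with a letter, so the same rule can be applied blindly to every note without checking names one by one.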

Apart from those few small points, it’s really quite simple