Links and Single Page HTML Export

Hello,

I’m returning to Tinderbox after some time away and need help exporting a container of notes as a single HTML page while preserving links between them. I’m using the default HTML template, but I’m running into two issues:

  1. The <a> tags try to link to separate HTML files (which don’t exist since I want a single-page export).

  2. The <h2> headers (note titles) don’t have anchor tags, making it impossible to link to them within the same page.

The reason I want this is for interactive fiction—I need everything in one file so I can process it with an LLM. I’ve tried exporting in Ink or Twee format, but my custom templates didn’t work out. I believe HTML would work fine if I could get the linking right.

Does this approach make sense? Any advice on structuring the export template to keep everything in one file with working links?

Here’s an example of the current output:

<!DOCTYPE html>
<html>
<head>
	<meta http-equiv="content-type" content="text/html; charset=utf-8">
	<title>Container</title>
</head>
<body>

<h1>Container</h1>
This is the container.

<h2>Second Link</h2>
<p>This is the second link.</p>

<h2>First Link</h2>
<p>This is the first link.</p>

<h2>Start node.</h2>
<p>This is the start node. It links to two other nodes.</p>

<p><a href="Container/First Link.html">First Link</a></p>
<p><a href="Container/Second Link.html">Second Link</a></p>

</body>
</html>

Looking forward to any suggestions!

Thanks!

So as a workaround, I’m exporting the notes as individual files, then using a bash script (thanks ChatGPT!) to merge them into a single HTML file with all the links converted to in-page anchor links.
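For readers who want to try the same idea, here is a minimal sketch of such a merge step in Python. It is not the script mentioned above; the function names (`slug`, `merge_notes`) and the hyphenated anchor style are assumptions made for illustration. It takes each note's exported HTML fragment, gives its heading an `id`, and turns relative `.html` links into in-page links.

```python
import re
from html import escape

def slug(name: str) -> str:
    """Make an id-safe anchor: runs of other characters become single hyphens."""
    return re.sub(r"[^A-Za-z0-9_]+", "-", name).strip("-")

def merge_notes(notes: dict[str, str]) -> str:
    """Join per-note HTML fragments (title -> body) into one page body."""
    def to_anchor(match: re.Match) -> str:
        # "Container/First Link" -> "#First-Link" (folder part dropped)
        stem = match.group(1).rsplit("/", 1)[-1]
        return f'href="#{slug(stem)}"'

    sections = []
    for title, body in notes.items():
        # Only relative links ending in .html are touched; URLs containing
        # ':' or '#' (absolute or already in-page) are left alone.
        body = re.sub(r'href="([^"#:]+)\.html"', to_anchor, body)
        sections.append(f'<h2 id="{slug(title)}">{escape(title)}</h2>\n{body}')
    return "\n\n".join(sections)
```

Because the heading id and the link target both pass through the same `slug` function, they stay in step as long as note titles and export filenames match.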

So this will work, though I’d love a native solution in Tinderbox :slight_smile:

In short, there is no native solution. Also, if you export a document to multiple pages, including some composed of multiple notes, links from single-note pages to the embedded notes will not work either. The limit is inherent in the HTML export design (which dates back 25 years). But…

There is a workaround, which I developed to allow ‘printing’ TBXs to PDF (actually it’s a multi-step process, but this gives the picture). Rather than write an essay on this, download the aTbRef TBX zip file. Unpack it and locate the /Templates container (it is in the root of the outline, towards the bottom). In that container locate two templates:

  • PDF-source-page. This is used for the root note of the multi-item page. It calls…
  • PDF-source-item. This recursing template exports all descendants of the above.

Importantly, there is one further step to take after the exported HTML page is created. Assuming your multiple note page exports to ‘myfile-raw.html’ then you need to run this command line:

sed -E 's:href=\"index[^\.]*\/([^\.]+)\.html\":href=\"#\1\":g' myfile-raw.html > myfile.html

Note: if the HTML filename you actually use includes spaces, enclose either/both filenames in the command line in straight quotes.
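To make the pattern easier to read, here is a rough Python equivalent of the sed substitution, offered as a sketch rather than a replacement for it. The function name `fix_links` and the `folder` parameter are assumptions added for illustration; the regex itself follows the sed call, where `index` is the leading folder name in the links to be rewritten.

```python
import re

def fix_links(html: str, folder: str = "index") -> str:
    """Rewrite href="index.../Name.html" to href="#Name", as the sed call does."""
    # [^.]*/ greedily consumes any sub-folders up to the last slash before
    # the filename; ([^.]+) then captures the filename without its extension.
    pattern = re.compile(r'href="' + re.escape(folder) + r'[^.]*/([^.]+)\.html"')
    return pattern.sub(r'href="#\1"', html)
```

Links that do not start with the folder name, such as full web URLs, are left untouched. So is a link to a top-level `index.html`, because the pattern requires a slash after the folder name.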

In theory you can run the sed part of the command line during export via $HTMLExportCommand but it proved problematic. That said the latter was in v5.x days—ten years back!— and Macs, Tinderbox and the OS are all now faster. However, the above does work, as the occasional PDF versions of the app’s Help (which happens to be created from a TBX file) and of aTbRef can attest.

That’s it. As noted, this is a workaround rather than a solution, but I don’t know of another solution nor of any plans to allow multiple-notes-to-one-page export that preserves inter-note links.

†. If you are confident with command line use and get $HTMLExportCommand to work, do report back. My solution is for one (large!) HTML file so the extra step isn’t a big deal. If I had tens or hundreds scattered across a doc-wide export, I think an inline solution would be better. Re-trying $HTMLExportCommand is on my spike somewhere but I just don’t have the spare time to do an open-ended set of tests. Thus any feedback on this would be welcome. :slight_smile:

‡. Part of the issue is that Tinderbox links are note to note (or alias), i.e. one to one. Including one note inside another presents a further possible challenge. Let’s say note A links to note B. Exporting B as part of A’s HTML page is covered by the above. But A exported alone cannot link to an included B, as A has no way to know B is now part of some other page. Or if B is included multiple times in one or several pages, to which one does A’s link point? Recall: links are one-to-one, not one-to-many. This is part of the reason composite export is more complex than imagined. Websites produced out of a single database can track this sort of thing, but then you lose all the power of Tinderbox.

Some more background on $HTMLExportCommand, having burrowed into my email archive. It turns out, I tried re-integrating the above external solution as recently as 2022. It failed—silently. Best guess—after discussion with Eastgate—is the problem of escaping. The CL needs both single quotes for the sed argument and within that (CL-)escaped double quotes which are a necessary part of the regex needed for the replacement task:

sed -E 's:href=\"index[^\.]*\/([^\.]+)\.html\":href=\"#\1\":g'

So we’ve got escaping in the command line and a further layer of escaping for storage in the XML of the TBX file. Somewhere in that nesting of escapes something is going awry.

When last discussed in 2022, the suggested solution was:

Rather than dealing with multiple levels of escapes, I’d suggest saving the sed command as a shell script. Make it executable with chmod, and then use that shell script in runCommand.

Essentially, I’m doing this in aTbRef (method described above) without the overhead of requiring the knowledge to set up the call to the shell. The point I observe is that most Mac users are not confident/regular users of the shell. So the suggestion, whilst valid and well-intended, is IMO simply not practical in the deployment context.

I note all this here as some CL-adept folk among us might use an alternative approach. For instance one using less escaping such as seems to derail use of $HTMLExportCommand.

Essentially, my script looks for URLs in href values in the HTML like index[something].html and changes them to #(something). IOW, it changes a link to a file into a jump-link to an in-page anchor named for the normally exported filename of the included page. Note: the export templates (up-thread) deal with setting up those anchors in the HTML of the exported composite page.

I lay this out here in case someone in the community with CL smarts can suggest a way an ordinary/casual user with no CL experience can do this task without messing around under the hood of macOS (a scary proposition for many).

The difficulty of this also explains why aTbRef has so many small articles. It allows better addressability and avoids the problem of inclusion breaking links.

One further workaround of sorts, and one which I feel scales badly, is to make links in the TBX as usual but, for those known to point to embedded notes, replace them with web links pointing to an anchor (which you’ll also construct in your export template) in the URL of the exported note that forms the parent/base of the composite page. Not simple, I grant, but if the need is there, that is the route to take.

By way of recap, there are two edge-case export issues here:

  • Tinderbox has no method to ‘auto-translate’ exported intra-TBX links where the target note is exported as an include to another note.
  • Tinderbox links are one-to-one. If the target embedded note is used multiple times, Tinderbox has no way to know at which one of those to point the exported link.

Thus, unless your export process demands composites, consider planning on using discrete files. Or, don’t use intra-note links, but that rather defeats the utility of a hypertext!

†. Oops, a point I missed for @LittleHouse75 earlier. aTbRef exports a root index.html and all other site files are thus within an /index/ sub-folder of the aTbRef site. In my sed call I use ‘index’ as a filter to ensure I don’t ‘fix’ web links pointing outside the aTbRef site. So you may want to adjust the sed regex accordingly for your use.

Interesting! Some quick notes:

  1. Why didn’t your templates work out? I would expect that would be the most straightforward approach. Way back in the day — pre-Twine — I ran up an export template for TiddlyWiki. That ought to be far easier nowadays.

  2. In terms of processing a bunch of documents with an LLM, I would think that LLMs ought to be able to understand sites! But you could always concatenate a set of exported files into one big file.

  3. Your templates could easily include anchors:

<h2 id="^title^">^title^</h2>

  4. Instead of using text links, you could write a macro ^do(link,destination,anchor)^ where “destination” is the name of the destination note (i.e. its ^title^) and “anchor” is the anchor text.

The keys to consolidating lots of notes in one export file are ^include(), which includes a designated note, and ^children, which includes all the children of ‘this’ note.

I know this is cryptic — I’m in a frightful rush — but with luck it will make some sense and others can elaborate and improve.

I wonder if $HTMLExportCommand evaluates eval()? If so the attribute could store:

eval("sed -E 's:" + 'href=\"index[^\.]*\/([^\.]+)\.html\":href=\"#\1\":' + "g'")

then perhaps the call might work.

Why not runCommand?

Does that work in the context of export and $HTMLExportCommand? The aim is that the necessary find/replace happens during export so the exported file is ready to use. If the CL is stored outside the TBX, the file is not portable and so fails the starting use case.

In my own case (up-thread) I am post-processing a single file, but my hunch is the OP isn’t using just one composite output file, so manually post-processing each might not warrant the effort.

I have a solution that works; see attached. Fun! Without seeing @eastgate’s response, I had a similar idea: use replace to create the anchor links.

Here is the solution.

5Cs_Becker_InternalAnchorsTemplate.tbx (308.0 KB)

You put an action in the header of the text, something like this:

^action(
  var:string vNoteName=' id="'+$Name.replace(" ","")+'"';
  var:string vText=exportedString(this,"tRender");
  var:list vList=vText.extractAll('href="(?!#)(.*).html"');
  var:string vLink="";
  var:string vLink2="";
  vList.each(x){
    vLink=x.extract('href="(?!#)(.*)"');
    vLink2="#"+vLink.replace(" ","").replace(".html","");
    vText=vText.replace(vLink,vLink2);
    vLink="";
    vLink2="";
  };
)^

Note: I could not get extractAll() to only extract the back reference, so I needed to take a two step process. @eastgate or @mwra, perhaps you have a better idea.
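For comparison only, here is how the same rewrite looks in Python, where a substitution callback receives the capture group directly, so the second extraction step is not needed. This is a sketch outside Tinderbox and does not answer whether extractAll() can return just the back reference; the names `HREF`, `to_in_page` and `localise_links` are made up for the example.

```python
import re

# Relative links ending in .html, skipping any that already start with '#'.
HREF = re.compile(r'href="(?!#)([^"]*)\.html"')

def to_in_page(match: re.Match) -> str:
    # Group 1 is the path without ".html"; spaces are removed to match
    # anchors built as the note name with spaces stripped.
    return 'href="#' + match.group(1).replace(" ", "") + '"'

def localise_links(html: str) -> str:
    """Turn every matching file link into an in-page link in one pass."""
    return HREF.sub(to_in_page, html)
```

Like the action code, this keeps any folder part of the path in the anchor, so it assumes links are to files in the same folder.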

Now, all you need to do is select “Export as Selected Note” from the file menu, and it works.

I developed a bonus idea: see the “CreateDraft” stamp. If you want versioned drafts when using Export Selected Note, you’ll want to use this. This simple stamp creates a draft of the “report” and versions it in a drafts folder. One reason I do this is that when you choose Export Selected Note and try to save the selected note under a different name, Tinderbox does not appear to save it. It only saves the file if the name in the dialog and the note name are the same (in my test I tried to save the note as Report2.html, and it was not saved). @eastgate, am I doing something wrong?

I think this can be done with less stress using plain old templates.

Not sure what you mean? I am using templates. What would be the different approach?

This is by design; the exported note is always exported using its own name, in whatever file folder you selected in the dialog.

Yes, I see what you mean and it makes sense. From a usability point of view, however, one needs to either change the note name in Tinderbox before exporting or create another version. One is not able to change the name of the exported file in the Finder save dialog. This is not a big deal, as there is an easy workaround by using a Draft versioning stamp. :slight_smile:

FWIW, root-level notes exported via ‘Export selected note’ do ask for a name before saving, regardless of whether or not the note has been exported previously. In my big projects, reports get a root-level stub note using a template of just ^text^ and an include. Thus in the aTbRef examples above, root note ‘print-export-source’ has this $Text:

^include("/A Tinderbox Reference File","PDF-source-page")^

You can see it in the aTbRef TBX. Exporting the latter produces a single HTML file of the whole of aTbRef’s content (bar the sitemap, search page, etc.).

Using either my technique or @satikusala’s it is worth noting these caveats:

  • These techniques were developed for making large composite reports, i.e. a ‘site’ as a single (HTML) document, and in @satikusala’s case for post-processing via pandoc to get a Word document (as that’s all that ordinary executive folk understand; the tools we use here are alien to most folk). That means the expectation is that:
    • inter-note links all point within the exported page and so their HTML needs fixing to point intra-doc, or …
    • are formal web links to the web at large (or to local drives), i.e. content outside the source TBX. These should not be altered.
  • The expectation is that images used by notes’ text are stored outside Tinderbox. Images inline in $Text are exported alongside the per-note page in the hierarchy of exported folders. If using a composite, the relative path from the HTML to the image differs.
  • For portability of the HTML, whether used as such or post-processed, it is a good idea to include all CSS inline in the <head> of the document.
  • The techniques were not developed with complex JavaScript use in mind, so additional consideration may be needed for such use.

I’ve not sufficient time to go over this in detail but would note a couple of points:

I add a per-note HTML anchor (normally set as the id attribute of a <div> wrapper around the note’s entire included content). For the names of the anchors I found it safest to use the source note’s export filename:

if(!$HTMLDontExport){
   if($HTMLExportFileName){
      $AnchorName=$HTMLExportFileName;
   }else{
      $AnchorName=$HTMLExportPath.split("/").at(-1).replace(".html","");
   };
};
// Assumption: export settings do not allow spaces in export filenames
// Assumption: there are no duplicate note names in the included notes

(Note: I only just added those code comments into the aTbRef master code as a reminder to future self!)

Until I did this, I found that as my notes often have non-alphanumeric characters I’d hit issues with the anchors not working as expected. Using the app’s HTML export name essentially ‘sanitises’ the anchor name and makes it HTML compliant. IIRC, older HTML required anchors to start with a letter and use only a limited character set; HTML5 is looser (any characters except whitespace), but a name built from A-Za-z0-9 and underscore that does not start with a number character remains the safe choice.
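The logic of that action code can be restated in Python for anyone who wants to reproduce the anchor naming in an external script. This is a sketch only; the function name `anchor_name` and its parameters are stand-ins for the $HTMLDontExport, $HTMLExportFileName and $HTMLExportPath attributes used above.

```python
from typing import Optional

def anchor_name(export_file_name: str, export_path: str,
                dont_export: bool = False) -> Optional[str]:
    """Return the anchor for a note, or None if the note is not exported."""
    if dont_export:
        return None
    if export_file_name:
        # An explicitly set export filename wins.
        return export_file_name
    # Otherwise use the last path segment without its .html extension.
    return export_path.split("/")[-1].replace(".html", "")
```

The same two assumptions noted in the code comments above apply here: no spaces in export filenames and no duplicate note names.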

Don’t ignore the issue of duplicate filenames generating identical anchors. If this occurs, an in-page link will go to the first of the anchors of that name by page code order, i.e. not the place you imagined. If you need duplicate note names, then some additional pre-processing is required. I would generate the $AnchorNames, then check for duplicate names: if dupes are found, set the second (and subsequent) same-named notes to use an explicit $HTMLExportFileName (i.e. not one generated on the fly). Thus ‘My Note’ #1 → auto-generated ‘MyNote.html’. For the first dupe we manually set the export name to ‘MyNote2.html’, then ‘MyNote3.html’, etc. This won’t upset normal HTML use of full-file export.
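The duplicate check described above can be sketched in Python as follows. It only illustrates the numbering scheme (MyNote, MyNote2, MyNote3); the function name `unique_anchors` is hypothetical, and in Tinderbox the result would still need to be written back as explicit export filenames.

```python
def unique_anchors(names: list[str]) -> list[str]:
    """Keep the first use of each name; suffix later duplicates with 2, 3, ..."""
    used = set()
    result = []
    for name in names:
        candidate, n = name, 1
        # Bump the numeric suffix until the candidate is not already taken.
        while candidate in used:
            n += 1
            candidate = f"{name}{n}"
        used.add(candidate)
        result.append(candidate)
    return result
```

For example, `unique_anchors(["MyNote", "Other", "MyNote", "MyNote"])` gives `["MyNote", "Other", "MyNote2", "MyNote3"]`, keeping outline order so the first note retains its auto-generated name.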

A possible problem here is that it will process all links, including valid links to web targets outside the document. In context, this is another early lesson from writing aTbRef: friends don’t let friends export main content at root level. As found with Map use, flat content at root has unwanted complications. My root exports the ‘home’ page of the HTML (content) and other root exports are restricted to extra-content material such as CSS/JS/sitemaps/single reports, etc.

Thus, apart from the home page (‘index.html’ for aTbRef), all content pages are in or descended from one exported folder (‘index/’ for aTbRef). This allows me to seed the folder name into my replacement regex (see up-thread), ensuring I only edit links that should have in-page targets, as their URLs will start “index/”.

Darn it, I just thought of possible edge case if an included note links to the top level note as its URL excludes ‘index’ … and sure enough such links do still point outside the ‘fixed’ page. :frowning: Luckily the aTbRef content only has three such but it is three too many!

Aside, although I often forget when making web links in aTbRef, I do periodically check for such and ensure all such out-of-TBX links have a specific link type (in my case ‘web reference’). I think I may need to look at leveraging that for better link triage.

I do think the newer approach of processing the HTML before it is added to the output file data makes sense. My old approach worked round problems encountered (reasons up-thread) with $HTMLExportCommand. The trick the newer method uses is for a note to … export itself!

The latter involves the note template calling the same note via exportedString() using an ^action()^ call at the top of the template. The exportedString() call returns the HTML-encoded content of the note. But as we are still ‘in’ action code we can use action code text tools to edit URLs (assuming we can find the right ones!). The ‘fixed’ HTML is stored in a variable and then becomes the template output when inserted via ^value()^ (it must come after the ^action()^ element).

The upside here is that we are only doing the extra processing one note at a time rather than on the whole generated page (aTbRef makes an 8.6MB HTML file). Both @satikusala and I are on M4 Macs with lots of RAM but I suspect anything capable of running the current Tinderbox app should cope with the latter method better than the former.


Sorry for all the long posts here. It won’t interest most, if only for going too deep. However, the outcome—if fixed—should be useful to lots in the community who can then ‘just’ use the finished code/model. Working it out here does allow anyone else to spot errors or suggest better approaches.

BTW, I modified my code to address links with an https as well as # by modifying the RegEx to href="(?!#|https://)(.*)".

Also, in a conversation with @mwra I realized a missed edge case in my implementation: I’m not addressing notes with the same name. I’ll be updating my file to address this edge case, as well as to make the implementation work for those who are working in Markdown, as it currently does not work with Markdown.

Although we’re all encouraged to use ‘https://’ protocol rather than ‘http://’, not everyone does (not least there is a cost). But if you make the s in the regex s* (i.e. zero or more s characters) then you cover both bases. So:

href="(?!#|https*://)(.*)"

An assumption that also surfaced in a conversation with @satikusala is that all in-page links are relative URLs, and as such are not full URLs that start with a protocol. As the :// is the protocol marker in HTML, any opening text (protocol or IP number) followed by :// could be used as the marker of URLs to leave alone, as their target lies outside the current document.
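That rule of thumb can be expressed as a small check, sketched here in Python. The function name `is_external` is hypothetical, and the pattern for what counts as a protocol name (a letter followed by letters, digits, `+`, `.` or `-`) is an assumption based on the usual URL scheme syntax rather than anything taken from the thread.

```python
import re

# Opening text that looks like a URL scheme, followed by "://".
SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://')

def is_external(url: str) -> bool:
    """True if the URL starts with a protocol marker and so should be left alone."""
    return bool(SCHEME.match(url))
```

This covers http and https without listing them, and also other protocols such as ftp. Links without the `//` part, such as mailto: links, are not caught by this check and would need their own test.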

I also recall that since my $AnchorName method I’d moved to a more flexible approach of using $ID and idEncode(dataStr) for the HTML ID.

What an amazing community! I left my initial post, I was hit by covid, I come back and – wow! What a thread! Thank you @mwra , @eastgate , @satikusala.

I’m slow, and still recovering, but beginning to digest all that is here. This is really going to help me build my Tinderbox toolkit.

I’ll report back in a few more days how I’m progressing with this.
