TB2Word (Export from Tinderbox via Markdown and pandoc to Word with citations and footnotes!)

TB2Word

I wanted to write a quick note on exporting a Tinderbox note written in Markdown to a Word .docx file using pandoc, a BibTeX bibliography automatically generated by Bookends, and a CSL template for Chicago-style citations.

I am an anthropologist, writing in the humanities, and for a long time, I’ve wanted to be able to create .docx files from Tinderbox, formatted in a very specific, but simple, way: Times New Roman, 12-point font, 1-inch margins, double spacing, footnotes, and citations in Chicago style. It’s the standard in the corner of academia where I write.

I’ve used Bookends for a decade and a half. For a long time, I could not get Pandoc to export proper footnotes, and I relied on scanning the document in Bookends. Generating a Word file took a lot of fiddling. It slowed things down, and basically meant I’d start drafting a project in Tinderbox, and then move to Scrivener at some point. Now, I don’t have to.

I’ve cracked the nut of exporting to Word, inspired by the Tinderbox meetup with the author of Bookends, and by Bernardo Vasconcelos and Michael Becker’s discussion of pandoc export.

I wanted to share my results.

TBXConfig.zip (467.2 KB)

The ZIP file contains a folder, TBXConfig, which should be placed in your home folder. You will then have to modify the paths in the document: look for /Users/dtubb and replace it with your own home folder, e.g. in the code of this stamp:

Pandoc:Convert with Pandoc
var theExportedFile;
var theText;
var theConvertedFile;

theExportedFile=$exportDir("TBXConfigNote")+$ExportFileName+$fileType;

theConvertedFile=$exportDir("TBXConfigNote")+$ExportFileName+$convertToType;

$theConversionCommand=$pandocDir("TBXConfigNote")+"pandoc --reference-doc /Users/dtubb/TBXConfig/pandoctemplates/timesnewroman-doblespaced.docx -f markdown+simple_tables+table_captions+yaml_metadata_block+smart --bibliography /Users/dtubb/TBXConfig/bibliographies/bookendsSyncedBibTeXFile.bib --citeproc --csl=/Users/dtubb/TBXConfig/csl/chicago-note-bibliography.csl -s "+theExportedFile+" -o "+ theConvertedFile;

theText=exportedString(this,"tMarkdown");

runCommand("pbcopy ",theText);
runCommand("touch " + theExportedFile); 
runCommand("pbpaste > " + theExportedFile);
runCommand($theConversionCommand);

You will also have to install pandoc. (I use Homebrew, from the terminal.)

brew install pandoc

It will probably take some fiddling to get it set up right. I could have abstracted the commands into attributes more, but I haven’t.
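One way to keep the hard-coded paths in a single place is to build the command from a base folder. The following is a minimal shell sketch of that idea, not the stamp itself: the function name `build_pandoc_cmd` and the variable `TBX_CONFIG` are assumptions made for illustration, while the folder layout and flags mirror the command in the stamp. It only prints the command, so it can be inspected before being run.

```shell
# Sketch: assemble the pandoc command from one base folder.
# TBX_CONFIG and build_pandoc_cmd are illustrative names, not part of TBXConfig.zip.
TBX_CONFIG="${TBX_CONFIG:-$HOME/TBXConfig}"

build_pandoc_cmd() {
  # $1 = exported Markdown file, $2 = Word file to create
  printf '%s ' \
    pandoc \
    "--reference-doc=$TBX_CONFIG/pandoctemplates/timesnewroman-doblespaced.docx" \
    -f markdown+simple_tables+table_captions+yaml_metadata_block+smart \
    "--bibliography=$TBX_CONFIG/bibliographies/bookendsSyncedBibTeXFile.bib" \
    --citeproc \
    "--csl=$TBX_CONFIG/csl/chicago-note-bibliography.csl" \
    -s "$1" -o "$2"
  printf '\n'
}

# Print (do not run) the command for the default export files.
build_pandoc_cmd "$TBX_CONFIG/exports/export.md" "$TBX_CONFIG/exports/export.docx"
```

The printed line is not quoted, so a home folder containing spaces would need quoting before the command is pasted into a terminal.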

TB2Word

At a high level, what is happening is a TBX stamp calls lightly modified code written by Becker that:

  • Exports a note on the fly to an .md file.

  • Uses Pandoc to convert that export.md file to an export.docx file

    • Using a Word reference file that I crafted.

    • Using a CSL bibliography template that I downloaded.

    • Using a BibTeX file generated by Bookends.

The stamp code is mostly Becker’s. It generates the following $theConversionCommand:

/usr/local/bin/pandoc --reference-doc /Users/dtubb/TBXConfig/pandoctemplates/timesnewroman-doblespaced.docx -f markdown+simple_tables+table_captions+yaml_metadata_block+smart --bibliography /Users/dtubb/TBXConfig/SlipBox.bib --citeproc --csl=/Users/dtubb/TBXConfig/csl/chicago-note-bibliography.csl -s /Users/dtubb/TBXConfig/exports/export.md -o /Users/dtubb/TBXConfig/exports/export.docx

The result is a perfectly formatted Word document (export.docx) with Times New Roman, 12-point font, 1-inch margins, and double spacing, with citations properly rendered in footnotes.

The trick was modifying a Word reference file, timesnewroman-doblespaced.docx, to serve as a template that matches what I was looking for. For Word, this is a .docx that has all the styles that pandoc uses. It can be edited in Word, in the style editor.

For a bonus, I also got Marked preview working.

Marked Preview

As a bonus, this HTMLPreviewCommand lets a note be previewed in Marked with a custom .css style, academic.css. In Marked, I set the style to that custom CSS file. To make that work, you’ll need to install it using Marked’s settings, and then set up the custom processor in the Advanced tab:

$HTMLPreviewCommand:

/usr/local/bin/pandoc -f markdown+simple_tables+table_captions+yaml_metadata_block+smart -t html --bibliography /Users/dtubb/TBXConfig/bibliographies/bookendsSyncedBibTeXFile.bib --citeproc --csl=/Users/dtubb/TBXConfig/csl/chicago-note-bibliography.csl

In Marked, you will need to enter the processor path and its arguments in the Advanced tab:

/usr/local/bin/pandoc
-f markdown+simple_tables+table_captions+yaml_metadata_block+smart --bibliography /Users/dtubb/TBXConfig/SlipBox.bib --citeproc --csl=/Users/dtubb/TBXConfig/csl/chicago-note-bibliography.csl

You’ll also have to check “automatically enable for new windows”, enable the custom processor, and update permissions.

Conclusion

All in all, with a lot of help from the forum, I can now write in Tinderbox, preview in Marked, and export to Word, with citations and footnotes. It’s a goal I’ve tried to achieve, intermittently, for a long time. It’s something I think scholars in the humanities might find useful.

It’s exciting for my evolving SlipBox.[1]


References

Ahrens, Sönke. How to Take Smart Notes: One Simple Technique to Boost Writing, Learning and Thinking – for Students, Academics and Nonfiction Book Writers. CreateSpace Independent Publishing Platform, 2017.


  1. Ahrens, How to Take Smart Notes.


Nicely done! It’s super exciting and liberating once you finally start to get this working, isn’t it?

Here are a few other points I’ve learned along the way:

  • Apple Silicon: for those using Apple Silicon, your pandoc paths will probably be different, e.g., /opt/homebrew vs. /usr/local/bin. See: Notice for pandoc and Homebrew users upgrading to Apple M1 - steps for addressing installation error
  • Retaining headings in Word: it took me months to find the solution for this. Word, with the pandoc to Word export, does not immediately recognize the internal Word styles. If you have an H1 or # styling in Tinderbox, this style will not be properly associated with the “Heading 1” format in Word, or with any other custom Word formats you might have created in your Word template, e.g., Author. To get around this, you need to wrap your headers with a div, e.g., <div custom-style="Heading 1"><h1>$Name</h1></div> or <div custom-style="Author"><p class="author">$Name</p></div>. My method is a bit more complicated, as I have the heading numbers dynamically generated, e.g., <div custom-style="Heading ^value($HeadingDepth)^">, to reduce the complexity of the file. This can be a bit tricky to set up, but once it is set up you don’t need to touch it again.
  • Markdown vs. HTML: personally, I’ve found mixing Markdown and HTML to be the way to go. Remember, Markdown is just shorthand for HTML. In TBX templates HTML is easier, especially if you’re working with images and links. In the body of text Markdown is easier to use, especially when you’re trying to format headings and bullets.
  • Dynamically numbered/formatted headings: remember, you can use action code and templates to have Tinderbox use the container structure to dynamically manage your heading 1, heading 2, heading 3 outline structure (there are several videos and meetups that cover this).
  • Citation tools: Bookends works great for this process, as do other citation tools, like Zotero (you have many options).
  • Citation export: if you’re using a multi-note strategy to construct your file, Pandoc will generate the references and bibliography on each note and NOT add a consolidated bibliography at the end. To accommodate this, I’ve developed several “generator draft” strategies that create an interim note that consolidates multiple notes into one before performing the Pandoc export. There are several approaches to consider, especially if you’re exporting chapter sections rather than a whole book.
  • Glossaries: don’t forget that recursive includes are great for dynamically formatting sections of an export, e.g., glossary tables vs. bulleted lists vs. how captions are set on images based on the type of citation style you’re using (all of this can be dynamically handled in templates). We’ve covered this in past meetups.
  • Tables: warning, no matter how hard you try, Word WILL NOT let you pass formatted tables from TBX through Pandoc to Word. Word strips out the HTML. The only way I’ve found to automate this is to mess with updating the Word XML post export, which is an unreliable hack. I’d love to try to solve this.
  • PowerPoint: in addition to exporting Word docs, you can also use this pandoc method to export PowerPoint files. It takes a bit more setup in TBX, but it works.
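For the Apple Silicon point above, a small shell sketch can find out which pandoc path applies on a given Mac. The helper name `find_pandoc` is hypothetical. The two candidate paths are the usual Homebrew binary locations on Apple Silicon and Intel machines, which is an assumption about your install.

```shell
# Sketch: print the first pandoc binary that exists among the candidates.
# find_pandoc is an illustrative helper, not part of pandoc or Homebrew.
find_pandoc() {
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # Fall back to whatever is on PATH, if anything.
  command -v pandoc
}

find_pandoc /opt/homebrew/bin/pandoc /usr/local/bin/pandoc \
  || echo "pandoc not found; try: brew install pandoc" >&2
```

Whatever path it prints is the one to use in the stamp and in Marked’s custom processor setting.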

It is exciting. It’s been a problem I’ve been trying to solve forever.

That’s interesting. For me, Markdown headings do work in Word, e.g. the Markdown in the note, # Heading 1, becomes properly styled as Heading 1. (I think I had to turn off the Markup Text flag in the action inspector.) I don’t generally use that many headings, and never want them dynamically generated. But the Markdown is working, I think. I’ll look at your demos, and see if I can add that.

On tables, Markdown’s table support does seem to work for me:

| Item         | Price     | # In stock |
|--------------|-----------|------------|
| Juicy Apples | 1.99      | *7*        |
| Bananas      | **1.89**  | 5234       |

The trick is that the Compact style in Word needs to be single spaced, no indent, etc. The version I sent earlier doesn’t have that done properly.

I think the markdown+simple_tables+table_captions+yaml_metadata_block+smart is important.

PowerPoint export, using your example, is going to seriously change my life as a professor. I might actually use PowerPoint in the classroom. I’ve not done that in ages.

Citation export, glossaries, and doing multiple notes are things I will explore.
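For the multi-note case, one option outside Tinderbox is to hand pandoc several exported Markdown files in a single run. Pandoc concatenates multiple input files before converting, so --citeproc should then produce one reference list for the whole document. A minimal sketch, where the function name `combined_cmd`, the chapter file names, and refs.bib are all hypothetical, and the command is only printed:

```shell
# Sketch: build one pandoc command for several notes so that the
# bibliography is generated once, for the combined document.
# combined_cmd, refs.bib, and the chapter file names are made up.
combined_cmd() {
  output="$1"
  shift
  printf 'pandoc --citeproc --bibliography=refs.bib -s'
  for note in "$@"; do
    printf ' %s' "$note"
  done
  printf ' -o %s\n' "$output"
}

combined_cmd book.docx 01-intro.md 02-methods.md 03-findings.md
```

This is an alternative to building an interim consolidated note inside Tinderbox, not a description of the “generator draft” approach mentioned above.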

Thanks for the inspiration here, in your simple pandoc example from 2021, and the Bookends talk!

All, if you try this demo and it does not work for you, here is one reason: the paths to the Word template, bibliography, and CSL file have been hard coded. You’ll want to edit these to match your own paths. Best case scenario would be to update this demo and pull the paths in as variables from TBXConfig.

Also, it looks like an “=” is needed after --reference-doc and there should be a --metadata-file= after -f. I also removed the -f flag as it is not needed in this context.


@dtubb in your HTMLPreview command can you share what you’re doing with this markdown+simple_tables+table_captions+yaml_metadata_block+smart?

Ok, this is cool. The primary thing I’ve not been able to figure out is how to get the Word file to maintain the border formatting that I have in the templates.

Yes, the standard markdown will work, you’re right. But, if you’re passing HTML or CSS through this process, it will not, ergo the need for the div tags.
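To make the div approach concrete, here is a small sketch that writes a test note using pandoc’s custom-style attribute, once as a fenced div and once as an HTML div. The style name “Author” has to exist in your own reference .docx. The file names, the sample text, and reference.docx are assumptions for illustration, and the pandoc command is only printed.

```shell
# Sketch: write a Markdown file that tags blocks with Word style names.
# "Author" must already exist as a style in your reference .docx;
# styled.md, reference.docx, and the sample text are made up.
sample="$(mktemp -d)/styled.md"
cat > "$sample" <<'EOF'
::: {custom-style="Author"}
Jane Example
:::

<div custom-style="Heading 1">
A heading passed through as HTML
</div>
EOF

# Print the command you would run against your own reference document.
echo "pandoc --reference-doc=reference.docx -s $sample -o styled.docx"
```

Opening the resulting .docx and clicking into each paragraph should show which Word style it received.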

Here’s what I think is going on (the extension names should be written without any asterisks):

  • markdown+simple_tables+table_captions: Enables simple tables and table captions in Markdown. This allows creating tables with headers and captions in the Markdown document that will be properly formatted in the Word output; see the Pandoc help.

  • +yaml_metadata_block: Allows adding YAML metadata blocks in Markdown, e.g. to specify the document title, author, etc., which Pandoc can use when generating the Word document. It can probably be removed for our purposes, as I don’t use it. I aspire to.

  • +smart: Enables smart punctuation conversions, like converting straight quotes to curly quotes, dashes to em-dashes, etc.

I think I picked it all up online somewhere a few years ago, and it seemed to work in order to get things to render in Marked with footnotes!

In short, it enables various Markdown extensions.
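A quick way to see what each extension does is to run a tiny note through pandoc. The sketch below writes such a note, with a YAML metadata block, a simple table with a caption, and punctuation for smart to convert. The file name and the citation key @ahrens2017 are hypothetical, and the key would have to match an entry in your own .bib file. As far as I know, recent pandoc versions already enable these extensions for its default markdown format, so spelling them out mostly documents intent.

```shell
# Sketch: a note that exercises the extensions discussed above.
# extensions-demo.md and the key @ahrens2017 are made up for illustration.
note="$(mktemp -d)/extensions-demo.md"
cat > "$note" <<'EOF'
---
title: "Extension demo"
author: "Your Name"
---

"Smart" turns straight quotes curly -- and dashes too [@ahrens2017].

Item            In stock
-------------- ---------
Juicy Apples           7
Bananas             5234

Table: Fruit inventory.
EOF

# Print the command; add your own --bibliography, --citeproc and --csl options.
echo "pandoc -f markdown+simple_tables+table_captions+yaml_metadata_block+smart -s $note -o demo.docx"
```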


Michael, I wonder if you might try first creating a LibreOffice file or a LaTeX file from TB using pandoc, and then converting that file to Word? Apparently, this can be done from the command line for LibreOffice. I came across a post recently mentioning this intermediary technique to get better Word files.
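A rough sketch of that two-step route, assuming LibreOffice is installed in the usual macOS location: pandoc writes an .odt file, and LibreOffice’s headless mode converts it to .docx. The helper name `two_step_cmds` and the file names are hypothetical, and the commands are only printed. Whether this actually preserves table borders better is untested here.

```shell
# Sketch: Markdown -> ODT with pandoc, then ODT -> DOCX with LibreOffice.
# The soffice path is an assumption about a standard macOS install;
# two_step_cmds and export.md are made-up names.
SOFFICE="${SOFFICE:-/Applications/LibreOffice.app/Contents/MacOS/soffice}"

two_step_cmds() {
  base="${1%.md}"   # strip the .md extension
  echo "pandoc -s $1 -o $base.odt"
  echo "$SOFFICE --headless --convert-to docx $base.odt"
}

two_step_cmds export.md
```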

That would be something to try, however, the only issue I’ve struggled with, with the current setup, is the formatting of table borders. Based on my cost-time analysis this is something I’ve learned to live with, for now.

I’m sorry if I don’t understand your request correctly. I don’t use Tinderbox directly to export my YAML-formatted documents. I copy and paste my file into RStudio. The solution you built is great, and I think it must be possible to do it very well in Tinderbox provided, I suppose, that you insert some LaTeX code in your YAML, where you put your header-includes. For instance:

header-includes:
  - \usepackage{multirow}