Export individual HTML files from agent results

From everything I’ve read an M1 Mac is a great place to start!

If you’ve got Pandoc working via the command line then it shouldn’t be that hard to get it working via script. What is the complete command you are using from the command line that works?

It looks like this:

Mac-mini-Arek:pandoc shijianhui$ pwd
/Users/arkadiuszszlaga/pandoc
Mac-mini-Arek:pandoc shijianhui$ pandoc --version
pandoc 2.13
Compiled with pandoc-types 1.22, texmath 0.12.2, skylighting 0.10.5,
citeproc 0.3.0.9, ipynb 0.1.0.1
User data directory: /Users/shijianhui/.local/share/pandoc
Copyright (C) 2006-2021 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
Mac-mini-Arek:pandoc shijianhui$ ls
test.html
Mac-mini-Arek:pandoc shijianhui$ pandoc test.html -f html -t markdown -s -o test.md
Mac-mini-Arek:pandoc shijianhui$ ls
test.html test.md

Thanks again for help!

And what Pandoc command did you use in the script that didn’t work (including path)?

I’ve tried twice with different paths:

-- Have an export folder ready. Select Tinderbox notes and run. 
-- Assumes Pandoc is installed at /usr/local/bin/. See https://pandoc.org/installing.html
-- NB: overwrites any like-named .md files in that folder

set prefix to "@" -- character(s) used to distinguish internal links from external ones

set pandocCmd to "/Users/shijianhui/.local/share/pandoc -f html -t markdown_mmd" -- html to MultiMarkdown
set sedCmd to "sed -E 's/" & prefix & "(\\[.+\\]).+\\)/[\\1]/g;t'" -- grab anchor, surround by [[ ]]
set cmdStr to pandocCmd & " | " & sedCmd -- assemble the "pipe"

set theFolder to (choose folder with prompt "Choose a folder to receive the exported MD files")

tell front document of application "Tinderbox 8"
	repeat with aNote in selections
		tell aNote
			set theFilePath to POSIX path of theFolder & (value of attribute "Name") & ".md" -- name file after note
			set theHTML to evaluate with "exportedString(this,$HTMLExportTemplate)"
			set theMMD to do shell script "echo " & quoted form of theHTML & " | " & cmdStr
			do shell script "touch " & quoted form of theFilePath -- create file if doesn't exist
			do shell script "echo " & quoted form of theMMD & "> " & quoted form of theFilePath -- write to file
		end tell
	end repeat
end tell

and

-- Have an export folder ready. Select Tinderbox notes and run. 
-- Assumes Pandoc is installed at /usr/local/bin/. See https://pandoc.org/installing.html
-- NB: overwrites any like-named .md files in that folder

set prefix to "@" -- character(s) used to distinguish internal links from external ones

set pandocCmd to "/Users/shijianhui/pandoc -f html -t markdown_mmd" -- html to MultiMarkdown
set sedCmd to "sed -E 's/" & prefix & "(\\[.+\\]).+\\)/[\\1]/g;t'" -- grab anchor, surround by [[ ]]
set cmdStr to pandocCmd & " | " & sedCmd -- assemble the "pipe"

set theFolder to (choose folder with prompt "Choose a folder to receive the exported MD files")

tell front document of application "Tinderbox 8"
	repeat with aNote in selections
		tell aNote
			set theFilePath to POSIX path of theFolder & (value of attribute "Name") & ".md" -- name file after note
			set theHTML to evaluate with "exportedString(this,$HTMLExportTemplate)"
			set theMMD to do shell script "echo " & quoted form of theHTML & " | " & cmdStr
			do shell script "touch " & quoted form of theFilePath -- create file if doesn't exist
			do shell script "echo " & quoted form of theMMD & "> " & quoted form of theFilePath -- write to file
		end tell
	end repeat
end tell

I’m still learning AppleScript so there is an extremely high probability that I have made a silly mistake and don’t see it.

The trick is to have the path point to where Pandoc is installed on your machine, which may vary depending on the method of installation.

Pre-M1, if Pandoc is installed via Homebrew, then the path used in the original AppleScript should work.

set pandocCmd to "/usr/local/bin/pandoc -f html -t markdown_mmd" -- html to MultiMarkdown

Sometimes the $PATH variable needs editing. When you enter
echo "$PATH"
(with the quotes around $PATH) in the command line, what do you see? Does it include /usr/local/bin?
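
For later readers, here is a rough sketch of that check as a script. It reports whether the usual Homebrew directories are on `$PATH` and where `pandoc` resolves. The helper name `path_contains` is made up for illustration. `/usr/local/bin` is the default Homebrew location on Intel Macs and `/opt/homebrew/bin` on Apple Silicon (M1) Macs; other install methods may put Pandoc elsewhere.

```shell
#!/bin/sh
# path_contains DIR: succeed if DIR is one of the colon-separated entries in $PATH.
# (Hypothetical helper name, for illustration only.)
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check the two usual Homebrew locations.
for dir in /usr/local/bin /opt/homebrew/bin; do
    if path_contains "$dir"; then
        echo "$dir is on PATH"
    else
        echo "$dir is NOT on PATH"
    fi
done

# Print the full path of the pandoc the shell would run, if there is one.
if pandoc_path=$(command -v pandoc); then
    echo "pandoc found at: $pandoc_path"
else
    echo "pandoc not found on PATH"
fi
```

Whatever full path this prints is the one to put in `pandocCmd`. As far as I know, AppleScript's `do shell script` does not read your Terminal startup files, so its PATH is usually shorter than the one Terminal shows. That is why the script spells out the full path instead of a bare `pandoc`.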

If you’re still stuck then maybe @Bernard-0 or someone else who knows more about the command line can help. It shouldn’t be too hard, and is probably worth it, because Pandoc does some useful things.

Pandoc doesn’t mind filenames with spaces at all :wink: When you have a path with a blank space in it in the CL you need to escape it or add quotes:

cd ~/Dropbox/Michael\ Becker/ 
cd ~/"Dropbox/Michael Becker"

Otherwise, it would read the empty space as the end of the path.
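
A small sketch you can paste into a script to see the difference. It works in a scratch directory made with `mktemp -d` (my assumption, so that nothing in a real Dropbox folder is touched), and the folder name is just the one from the example above.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
mkdir -p "$scratch/Michael Becker"

# Escaped space: the backslash makes the space part of the path.
(cd "$scratch"/Michael\ Becker && pwd)

# Quoted: everything between the quotes is one argument.
(cd "$scratch/Michael Becker" && pwd)

# Neither escaped nor quoted: the path stops at the space, so cd is handed
# ".../Michael" plus a stray "Becker" and fails.
if (cd $scratch/Michael Becker) 2>/dev/null; then
    echo "unexpectedly succeeded"
else
    echo "cd failed: the space ended the path early"
fi

# A tilde is only expanded to your home folder when it sits outside the quotes.
echo ~      # prints the home directory
echo "~"    # prints a literal ~

rm -r "$scratch"
```

The last two lines are why a path that starts with `~` is best written with the tilde outside the quotes, or as `"$HOME/..."`.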

It’s the same in the M1 Mac :wink:

I have reinstalled homebrew and pandoc and everything is working now as it should, thanks for your help!!!

I’d add an ‘amen’ here. For those of us who use the Command Line less often this might seem like ‘dumb design’. But, spaces matter on the Command Line.


To expand for later readers who aren’t familiar with the Mac’s Unix Command Line…

A very simplistic analogy is that Tinderbox stores a list as one long string, so it needs the user (or your code) to insert semicolons so the app can process the string into a set of list items. In that context, the spaces in your command line code act as delimiters. In the CL this is normally how the computer separates the program called (which usually (always?) comes first) from input parameters. Consider this CL - exactly what it does isn’t important, just the layout of the code:

sort -k1,1 -k2,2 test.txt

Look for spaces and we see the sort operator getting three discrete inputs: -k1,1, -k2,2, and ‘test.txt’. So we run our test and need to try a new one, for which we provide a file ‘test 1.txt’. So, unthinking, we type in:

sort -k1,1 -k2,2 test 1.txt

Now the sort gets four discrete inputs: -k1,1, -k2,2, test, and ‘1.txt’. Not our intention! We need to tell the CL that ‘test 1.txt’ is all one input, so:

sort -k1,1 -k2,2 'test 1.txt'

So if doing lots of CL work I tend to avoid that by not using spaces in the names, e.g.:

sort -k1,1 -k2,2 test_1.txt

You could still use quotes there, if it’s a quasi-rule you’ve learned:

sort -k1,1 -k2,2 'test_1.txt'

The quotes aren’t needed, but, whatever. :slight_smile: For the same reasons, if making OS folders where I’ll be using full or partial OS paths (e.g. partial to sub-folders) I tend to use all lowercase a-z0-9_ characters only. It saves lots of head-scratching.

Sorry, bit of a long answer, but coming to CL work via Tinderbox ‘just’ to do something in Tinderbox, I never got to sit in Unix 101 classes, so the above sort of issues tripped me up. Plus no one tells you (until you screw up) as they fall in the “too obvious to warrant mention” category.

Oh, and CL experts feel free to correct me if the above is wrong. I do realise '' and "" quotes get treated slightly differently but I don’t think that nuance matters in this context.
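
To make the splitting visible, here is a minimal sketch that only counts the inputs rather than sorting anything. The function name `count_args` is invented for this illustration. The last few lines show the one place where '' and "" quotes differ.

```shell
#!/bin/sh
# count_args: print how many separate arguments the shell passed in.
# (Hypothetical helper, for illustration only.)
count_args() {
    echo "$#"
}

count_args -k1,1 -k2,2 test.txt        # 3
count_args -k1,1 -k2,2 test 1.txt      # 4: the space splits the filename
count_args -k1,1 -k2,2 'test 1.txt'    # 3: quotes keep it together
count_args -k1,1 -k2,2 test\ 1.txt     # 3: so does a backslash

# Single vs double quotes: variables are expanded inside double quotes only.
name='test 1.txt'
count_args "$name"    # 1: expanded, and kept as one argument
count_args $name      # 2 in sh and bash (zsh does not split here)
echo '$name'          # prints the literal text $name
```

So for a fixed filename either kind of quote does the job. The difference only shows up once the name lives in a variable, where double quotes are the ones you want.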

Yes.

The RTF part is a little more problematic. If you use ^value($value)^ in your markdown export template, TBX will convert RTF bullets to markdown bullets. As for the rest, it will pass it through as straight text.

Just tested! Works great. Thanks, I did not know that.

@sumnerg, per your comment on this thread about being able to add commandline: Updating one note's attribure value with the attribute value's of another note that has the same name - #15 by sumnerg.

How would one go about merging your export individual HTML files script with your add “_” to spaces script (both above), and then triggering a pandoc command line to take each HTML file and convert it to a .docx file, e.g. “pandoc -s test_1.html -o test_1.docx”? The test_1 would be the name of each individual exported note.

@satikusala Do you want the result to be one .docx file including the contents of the selected notes or an individual .docx file for each note?

I can get the one integrated file natively within Tinderbox: select the container, then export the selected note. Tinderbox will merge all the files into one doc and the pandoc command line will convert it to a docx.

What I’m missing is the ability to export a batch of individual files and convert them to docx. So, I think all of your existing code will work, I just need to figure out how to add the pandoc command line to it.

Feeding individual files to pandoc would require scripting Finder.

If your goal is one .docx file with the contents of a Tinderbox container then wouldn’t it be easier just to add the pandoc command to a script that exports that container? (That is easily done, I think, if that is your goal).

Or are you trying to combine files exported from different TBX or something like that?

EDIT. I think I see. To convert to .docx it seems you have to write to a file first.(?)

Here’s a revised script that converts the selected notes to individual .docx after exporting them to .html files.

For scripting simplicity the .docx extension is just tacked on the end after .html. That can be made more cosmetic if needed by adding some more lines.

-- Have an export folder ready. Select Tinderbox notes and run. 
-- NB. Will overwrite existing files of same name.

set text item delimiters to "_"

set theFolder to (choose folder with prompt "Choose a folder to receive the exported HTML files")

tell front document of application "Tinderbox 8"
	repeat with aNote in selections
		tell aNote
			set theFilePath to POSIX path of theFolder & (words of (value of attribute "Name" as text) as text) & ".html" -- name file after note
			set theHTML to evaluate with "exportedString(this,$HTMLExportTemplate)"
			do shell script "echo " & quoted form of theHTML & " | " & "/usr/local/bin/pandoc -s -o " & quoted form of theFilePath -- quote the path in case the folder name contains spaces
		end tell
	end repeat
end tell

tell application "Finder" to repeat with aFile in (files of theFolder as alias list)
	set fileName to POSIX path of aFile
	if name extension of aFile is "html" then do shell script "/usr/local/bin/pandoc -s " & quoted form of fileName & " -o " & quoted form of (fileName & ".docx") -- quote both paths in case of spaces
end repeat
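
For anyone who wants the more cosmetic names (test_1.docx rather than test_1.html.docx), here is one possible sketch of the conversion step done in the shell instead of Finder. The function names `docx_name` and `convert_folder` and the `~/export` folder are my own placeholders, and it assumes `pandoc` is on the PATH of whatever shell runs it.

```shell
#!/bin/sh
# docx_name FILE: swap a trailing ".html" for ".docx".
# (Hypothetical helper name, for illustration only.)
docx_name() {
    printf '%s\n' "${1%.html}.docx"
}

# convert_folder DIR: convert every .html file in DIR to .docx with pandoc.
# (Also a hypothetical name; assumes pandoc can be found on PATH.)
convert_folder() {
    for f in "$1"/*.html; do
        [ -e "$f" ] || continue    # no .html files: skip the unexpanded pattern
        pandoc -s "$f" -o "$(docx_name "$f")"
    done
}

# Example call (the folder is an assumption, use your own export folder):
# convert_folder "$HOME/export"
```

Because every path is wrapped in double quotes, this should cope with spaces in folder or file names as well.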

@sumnerg, you are AWESOME! Looks good. I see what you mean by the .HTML extension being added along with the .docx extension. Pandoc, however, is converting the file perfectly, which is great!

RE the extra lines. I think I get what you mean, just need to tweak the AppleScript a bit.

I’m amazed with how cool this is. You’re right, it is really neat how you can use AppleScript to manipulate TBX attributes.

Thank you, Sumner, for your work with .docx. How would I export .md files instead?
Unfortunately, I do not know AppleScript and am a newbie at Pandoc.

Thanks in advance,
Tom

Hi Tom. I’ve had good luck with the script behind the first disclosure triangle (‘Script that exports multiple selected notes to a folder’) in this post. It assumes you are writing “as normal” (e.g. styled text, or “rich text”) in Tinderbox and want to export to (Multi)Markdown. Just post if that is not what you want or you can’t get the script to work. Sumner

Awesome. It worked like a charm. Thanks Sumner!
Tom

Great job guys, I have a lot to learn from you!