Exporting rich text as plain text (for JSON export)

Hi! I was asked by someone to share some data I’d previously collected in a Tinderbox document, and I wanted to export it all as JSON for maximum shareability.

So I sat down with my export hat on and tried to figure out how to do this. I made a simple template:

{"name":"^value(attributeEncode($Name))^","eventrefnumber":"^value(attributeEncode($eventrefnumber))^","text":"^text^"}

There are more attributes I’m going to have to export, but this is enough of a showcase. For one exported note, I get the following:

{"name":"Taliban Compound Struck","eventrefnumber":"2009-11-CA-056","text":"<p>Dec. 2: Taliban Compound Struck</p>

<p>NEWS RELEASE ISAF Joint Command - Afghanistan  2009-11-CA-056 For Immediate Release  KABUL, Afghanistan (Dec. 02) - International forces conducted an air strike against a Taliban commander in a remote area of eastern Afghanistan yesterday.  The Taliban commander was the target of the precision strike in Kunar province's Dara Noor district, which occurred  in an open area away from civilian compounds or infrastructure.  Assessment of the strike continues.</p>"}

Actually when I view this file in VS Code it gives me a warning and shows me the file as such:

[Screenshot: CleanShot 2024-03-23 at 12.19.05]

So I can see that there are a bunch of invisible characters that Tinderbox’s rich text window has created. This file is also not ‘good’ JSON any more because of the newline characters in there, etc. I tried to open the file using jq and it complained that:

$ jq '.' /myfile.json

jq: parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 3, column 486

So I had a couple of questions:

  1. How can I do whatever escaping, conversion or urlencoding(?) I need so that I can export these arbitrary sections of text as part of the JSON export? I looked here but couldn’t quite figure out which of those was appropriate. Note that it’s probably OK to just replace all these errant characters with a single space or some kind of delimiter. In the end, the person just wants the text. Formatting isn’t so crucial.
  2. For my understanding: why, in the export template, do we use ^text^ and not ^value(attributeEncode($Text))^? I saw in some places in the docs that ^value($SomeAttribute)^ is the way to get those attributes, but it wasn’t clear to me why that pattern didn’t apply to the text attribute.

Any pointers to docs or suggestions on these two points would be much appreciated! Thanks!

I think, browsing a bit more in the forums, I found the answer to my problem:

{"name":"^value(attributeEncode($Name))^","eventrefnumber":"^value(attributeEncode($eventrefnumber))^","text":"^value(jsonEncode($Text))^"}

So $Text is the way to get the text. Not sure what that ^text^ thing is, but no matter :)

And then the jsonEncode() function seems to do exactly what I need!

My code editor still complains about some of those hidden characters, but I’m able to read the data with jq no problem.
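For anyone wondering what the escaping actually amounts to, here is a minimal Python sketch (not Tinderbox code, and the sample text is made up). It shows why a raw newline or tab inside a string breaks a JSON parser, and what an escaped version looks like. Python's `json.dumps` is only a stand-in for the kind of escaping jsonEncode() appears to do; I have not checked that the two produce identical output.

```python
import json

# Hypothetical note text with a paragraph break and a tab, as rich text often has.
note_text = "Dec. 2: Taliban Compound Struck\n\nNEWS RELEASE\tISAF Joint Command"

# Naive templating: the raw control characters end up inside the JSON string.
naive = '{"text":"' + note_text + '"}'

# json.dumps adds the surrounding quotes and escapes control characters
# (newline becomes \n, tab becomes \t).
escaped = '{"text":' + json.dumps(note_text) + '}'

def parses(s):
    """Return True if s is valid JSON, False otherwise."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False

print(parses(naive))    # False: unescaped control characters in a string
print(parses(escaped))  # True
print(escaped)
```

The escaped version round-trips: parsing it gives back the original text, line breaks included.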

My next practical question would be: how do I update a whole bunch of notes so that they all use my new JSON template and not the default HTML one? I think I’ll also have to change the file extension for when they’re exported.

Is this a job for a stamp? I’m a bit worried about that as there are thousands of these notes…

Is there any reason you are not using String.jsonEncode() or jsonEncode()?

No reason whatsoever! I didn’t realise there was a function like that. I just saw you comment about using jsonEncode so I tried it out and it worked.

Not quite sure how to trigger a full export. I made an agent which collected the 4835 notes together in one place, then selected them all and used the File menu to ‘export selected notes’, but only 1321 files were exported. I suspect that some of them have the same names (e.g. 1.json), so my question is: how do I specify the filename of the files being exported? (Or alternatively, tell Tinderbox not to overwrite files but rather add some increment value on the end, etc.)

Answering my own question, it seems that HTMLExportFileName is what I’m looking for!

[My apologies for terse post earlier, as I spotted your question just as we were sitting down to lunch]

The other thing to watch for making JSON is the ‘trailing comma’ bug. Formally, JSON does not allow this:

[{
  "id": 12345,
  "text": "foo"
},
{
  "id": 12345,
  "text": "bar"
}, // <- trailing comma
]

Realising that loop-based output is liable to create such a trailing comma, many JSON-based processes simply allow for the mistake. Others don’t, q.v. fuse.js fuzzy search, as trialled here. The search uses the whole of aTbRef content exported as JSON data. I had all sorts of problems until @jackbaty kindly reported I’d left a hanging comma in my JSON that gave a silent [sic] fail in fuse.js.
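To illustrate the trailing comma problem outside Tinderbox, here is a small Python sketch. The records are invented, and Python's strict `json.loads` simply stands in for any parser that follows the JSON grammar to the letter. It contrasts loop-style output (a comma after every item) with join-style output (a comma only between items).

```python
import json

# Made-up records standing in for exported notes.
records = [{"id": 1, "text": "foo"}, {"id": 2, "text": "bar"}]

# Loop-style output: a comma after every item leaves one hanging at the end.
with_trailing = "[" + "".join(json.dumps(r) + "," for r in records) + "]"

# Join-style output: the separator goes only between items.
joined = "[" + ",".join(json.dumps(r) for r in records) + "]"

def is_valid_json(s):
    """Return True if a strict JSON parser accepts s."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(with_trailing))  # False
print(is_valid_json(joined))         # True
```

In a template-driven export the equivalent is to emit the comma conditionally, so the last item in each list gets none.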

I think the issue with attributeEncode() in this context is that the line breaks in $Text were not being turned into the \n suitable for JavaScript-based JSON; same for tabs, etc.

I’ve recently had cause (requests from users of the data) to export several large projects to JSON. Luckily, they called for a flat JSON file with no nested objects, though Tinderbox can do that too: the current aTbRef TBX zip includes templates that show how I made the JSON for the new search feature. But note that if exporting your Tinderbox outline as nested JSON objects you’ve a lot of ‘hanging commas’ to avoid (nor is this something the app can necessarily guess for you: you need to add ^export^ based checks accordingly).

Luckily no nested objects in my notes at all!

Still struggling a bit with the export and with making sure that all notes are indeed exported. If I trigger File → export HTML, it seems to do a full export, and the JSON template is respected so the notes are exported as .json files. (I set that export template on the note itself.) But I seem to be getting different numbers of files out. A quick check on the number of exported JSON files:

find "/path/to/dir/" -type f -name "*.json" | wc -l

gives me 2214 files. But the agent I have set up to find all notes with that prototype finds 4835 children. Not sure how to explain the difference. Could the agent be finding duplicates/aliased notes as well?

From what I can see, the bit in the docs that states:

“To avoid filename naming collisions, Tinderbox has to check if the intended export name already exists in the currently exported-to folder. If a duplicate name might arise, a suffix will be added.”

might not actually be happening? I can see / count that I have 59 items in one section, but on export, only 43 files are inside that folder. And there are no files with suffixes even though I can see a few notes with the same note name…
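One way to test the same-name theory is to list the intended export names and count the repeats. Below is a rough Python sketch with a made-up list of names; how you get the real list out of Tinderbox is left open. The comparison is case-insensitive on the assumption that the export folder sits on a typical case-insensitive macOS volume, which may not hold for your setup.

```python
from collections import Counter

def collisions(filenames):
    """Return {name: count} for names that occur more than once.

    Names are compared case-insensitively (an assumption about the target
    file system, see the note above).
    """
    counts = Counter(name.lower() for name in filenames)
    return {name: n for name, n in counts.items() if n > 1}

# Hypothetical export names for one container of notes.
names = ["1.json", "2.json", "1.json", "Strike.json", "strike.json", "3.json"]

dupes = collisions(names)
distinct = len({n.lower() for n in names})

print(dupes)
print(len(names), "notes ->", distinct, "distinct file names")
```

If the number of distinct names matches the number of files actually found in the folder, overwriting is the likely explanation for the shortfall.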

^text^ processes the styled text of the note through HTML markup and an export template, adding tags (such as <p>) to denote paragraph breaks and such.

^text(plain)^ and ^value($Text)^ give you the unstyled text.

Agents are designed to only alias each note once. If both original and alias(es) are in-scope of the query, the original is aliased in the agent. If only alias(es) of a note are in-scope the agent aliases the first alias by (source) $Outline order. So no dupes.

I would check the settings of $HTMLDontExport (does this note export) and $HTMLExportChildren (do children of this container export). The latter overrides the former. So if doing a whole-doc export, you might have a container with 10 exporting notes, but if the container is set not to export its children, none of the children will export. However, if selected and exported using ‘Export Selected Note(s)’ they would export.

When doing a whole-doc export, notes are only (re-)exported if they have changes (Text and Displayed Attributes, IIRC, not any attribute). So, a good idea if testing whole-doc export is to delete any pre-existing files inside the target folder.

As you are counting ‘.json’ files with your command line, check that all files exporting to JSON have the correct file extension set.
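A quick way to check the extensions is to tally every file in the export folder by extension rather than counting only ‘.json’. Here is a minimal Python sketch; `count_by_extension` is just an illustrative helper name, and the demo runs against a throwaway temporary folder rather than a real export.

```python
import tempfile
from collections import Counter
from pathlib import Path

def count_by_extension(root):
    """Count regular files under root (recursively), grouped by lower-cased extension."""
    return Counter(p.suffix.lower() for p in Path(root).rglob("*") if p.is_file())

# Demo on a throwaway folder standing in for the export folder.
with tempfile.TemporaryDirectory() as export_dir:
    root = Path(export_dir)
    (root / "section").mkdir()
    (root / "section" / "note1.json").write_text("{}")
    (root / "section" / "note2.html").write_text("<p></p>")
    (root / "index.html").write_text("<p></p>")
    demo_counts = count_by_extension(root)

print(demo_counts)
```

Pointing the helper at the real export folder would show at a glance whether some of the missing notes went out as .html (or with no extension) instead of .json.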

Notes are unaware of edits to their template (note). So to re-export a doc after a template-only code change you need to delete previously exported notes first.

Beyond that it might be necessary to take a look at the file. Likely the data may not be for public sharing so feel free to DM me if you’d like me to take a look in private.

Thanks. I’ve def been deleting the whole exported data set in between exports just to be sure.
