TEI and oXygen Editor for Tinderbox Documents

andreas · March 15, 2018, 12:08am

Is it possible to use the TEI framework within a Tinderbox document, which – if I’m not completely mistaken – is basically an XML document?

Is it, hence, possible to work with Tinderbox-Documents within XML-Editors such as oXygen? Where do I have to be careful?

mwra · March 15, 2018, 8:43am

What is ‘TEI’ as distinct from general XML? A TBX file is valid XML, which I’ve documented lightly here. Note that the text ($Text) of a note is stored in Rich Text ‘RTFD’ form, so if reading/writing $Text your XML editor will need to be able to translate RTFD.

Are you looking to edit existing XML or to add custom elements to it? Although, over Tinderbox’s life, some people have created TBX files outside Tinderbox, I don’t believe that mechanism was ever a design intent but rather an affordance of the fact that the file format is XML. IOW, I don’t think XML editing of TBXs is heavily tested and I’m unaware of any documentation of it beyond the link above.

I think you would do better to contact Eastgate directly as your conversation is likely to move beyond the scope of normal Tinderbox use. HTH!

andreas · March 15, 2018, 11:55am

Thanks @mwra for looking into this.

I just came across TEI as a supposedly widespread approach to working with XML, mainly in the Digital Humanities and social sciences, for representing text in digital form according to a specific set of guidelines: the TEI Guidelines published by the TEI Consortium.

It would be interesting to see if Tinderbox could accommodate the TEI Guidelines (basically a set of specific tags) as a specific set of attributes.

The idea then could be to group those TEI attributes (like the built-in Scrivener or Simplenote attributes) through a Prototype in Tinderbox as KeyAttributes which need to be filled in to produce marked-up text (= XML) within Tinderbox that could later easily be displayed outside Tinderbox without needing to export the data using HTML export. Think of Tinderbox, in this use case, as an explicit XML editor just like the oXygen XML Editor.

Does that make sense?

Anyone familiar with oXygen-XML-Editor?

mwra · March 15, 2018, 12:30pm

I think this is potentially divergent from where Tinderbox design has headed of late. Those expressing an opinion on $Text generally seem to be pushing it further from plain text suitable for holding embedded mark-up and into more of an RTF/word-processor-like space.

From what I could ascertain from a brief visit to the TEI (which lacks any small, clear examples of TEI format), this is a method for marking up plain text so as to capture all kinds of aspects of the text such as might arise when doing (humanities) analysis of text - especially non-digital sources, e.g. capturing manuscript marginalia, etc.

Looking at the TEI XML code samples I found, I think TEI would need quite a lot of attributes defined for mark-up (I couldn’t find an exact list of TEI tags).

I see the benefits, though it seems quite a large engineering commitment (i.e. cost) to Eastgate for some niche functionality. I’d suggest taking this idea directly to Eastgate with a bit more detail as to the changes you propose to the app - it seems the issue is less the XML format of TBX files than how to make Tinderbox insert customised TEI ‘alternative’ $Text into a TBX.

andreas · March 15, 2018, 12:45pm

That makes sense to me @mwra. Thank you!

I will contact @eastgate and ask what they think about it.

eastgate · March 15, 2018, 3:46pm

I’ve been generally familiar with TEI for a long time, thanks to my association with hypertext pioneer Elli Mylonas (Brown University). It’s an important and rigorous approach to encoding texts – to marking them up in sensible and meaningful ways.

The goals of TEI are orthogonal to Tinderbox. TEI is about representing texts and manuscripts – about capturing all the text’s interesting facets in a format amenable to data processing. Tinderbox, in this context, is about commentary on the text – about analysis and coding.

One thing that might interest you, though, is the following exercise:

Take some modest, interesting piece of text that you’d like to encode. Say, a Marvell poem.
Split it into the highest-level structural unit you want to mark up: perhaps, stanzas.
Split again into the next highest-level structure: perhaps, lines.
Now, write export templates to export this as validly-encoded TEI.

This should be straightforward. Once you’ve mastered this, you can increase detail and find more ways to generalize the approach.
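As a rough picture of what the output of that exercise might look like, here is a minimal Python sketch (not a Tinderbox export template) that splits a text into stanzas and lines and wraps them in the TEI `<lg>` (line group) and `<l>` (line) elements. The function name `poem_to_tei` and the blank-line stanza convention are assumptions made for illustration, the sample lines are grouped arbitrarily, and the result is only a body fragment: a complete TEI document would also need a `<TEI>` root and a `<teiHeader>`.

```python
from xml.sax.saxutils import escape


def poem_to_tei(poem: str) -> str:
    """Wrap each stanza in <lg type="stanza"> and each line in <l>.

    Assumes stanzas are separated by a blank line (an illustrative convention).
    """
    stanzas = [s for s in poem.strip().split("\n\n") if s.strip()]
    out = []
    for n, stanza in enumerate(stanzas, start=1):
        out.append(f'<lg type="stanza" n="{n}">')
        for line in stanza.splitlines():
            if line.strip():
                # Escape &, < and > so the fragment stays well-formed XML.
                out.append(f"  <l>{escape(line.strip())}</l>")
        out.append("</lg>")
    return "\n".join(out)


# Opening lines of Marvell's "To His Coy Mistress", grouped arbitrarily.
sample = """Had we but world enough and time,
This coyness, lady, were no crime.

We would sit down, and think which way
To walk, and pass our long love's day."""

tei_fragment = poem_to_tei(sample)
print(tei_fragment)
```

In Tinderbox itself the same nesting would presumably come from one template per level (container note for the stanza, child notes for the lines), which is the part the exercise asks you to work out.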

jmm · March 15, 2018, 11:10pm

Hi to all. I haven’t had the time to keep looking into the technical aspects of Tinderbox lately, but I would like to have a say on this, as it is no exaggeration to say that the RTF path will progressively exclude Tinderbox from my workflow.

Web links are my main problem with rtf. I use them in Tinderbox for inline referencing because it makes no sense to me to keep notes so short that it is feasible to write their source as Tinderbox attributes -otherwise great for classification purposes.

The XML format you are discussing is important for preservation, and perhaps for quick and risky editing. Let’s look at what happens with web links backstage in TB. Here are the three takes on them I can think of:

This is what gets encoded in the test.tbx file:

For the web link in the first paragraph, there is no immediate way to view whether it has a source:

<text >At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.</text >

<link name="*untitled" sourceid="1521157429" … URL="x-devonthink-item://F904B108-33C1-4B3C-A24B-4D2771D7BE8F" />

For the second paragraph, I can see that the paragraph has a source, but there is no immediate way to identify its link because there are many untitled links in the .tbx file:

<text >Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet [@notitle2010, 203]. </text >

<link name="*untitled" sourceid="1521157429" … URL="x-devonthink-item://B4FEE38B-E303-499F-A206-D65AA61BACA2" />

Only for the third paragraph both the source and its link could easily be identified:

<text >Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet [@title2015, 408]. </text>

<link name="[@title2015, 408]" sourceid="1521157429" … URL="x-devonthink-item://7E780413-4FD4-4C09-9146-0089874A3801" />

Therefore, it is important for me to follow the third procedure: write titles for web links. The problem with web links in Tinderbox is that they seem to be placed at a specific location in the pane, not integrated in the text. For me this has proved to be an insurmountable obstacle to using Tinderbox effectively.

This problem would be perfectly solved with markdown, because the link would be in the text, written as [@markdown2015, 408](x-devonthink-item://3ADA3328-585F-404A-9CF9-EC60447B25F1), automatically rendered as @markdown2015, 408.

I understand Tinderbox as an app centered on content. It has great export options to other file formats, so that the file can be further processed with specific tools, either its content or its formatting for final publishing on the web or as PDF. The customizable import and export options are a great strength because TB does well what it is designed for, and then it can be integrated in a wider processing workflow.

Both for content and format processing I find markdown superior to RTF. As it happens now, I cannot even rework the text content of annotations in Tinderbox because their references get displaced. For the same reason, text cannot be transferred back and forth to Sublime Text for heavier text manipulation tasks.

If I may give my opinion as a user, I find it important for TB to follow the more modern markdown path. Admittedly, I don’t know how hard it would be to implement. There are many inspiring open source markdown editors, both for the desktop and mobile: Leanote, Turtl, Markdown Edit, Meemo, Dillinger… and my favourite, because of its live preview: Typora, although not open source.

PaulWalters · March 15, 2018, 11:43pm

Tinderbox 7 already supports markdown, both for preview in your document and for export. There is no need to be limited to either RTF or plain text. You have the choice of each.

jmm · March 16, 2018, 12:02am

Yes, TB supports previews and exports for markdown, but not other crucial features. The convenience of live preview for rtf in the text pane is not available for markdown: one would have to constantly switch back and forth to the preview pane (the DevonThink links above in markdown are very distracting to work with without live preview); links between notes inside the text of markdown notes are not possible, don’t render as links among notes in map view, etc.

PaulWalters · March 16, 2018, 9:10am

Wouldn’t this require a rewrite of most of the text, views, export, link, and map functionality of Tinderbox? Rather large. And potentially break a lot of folks’ existing documents?

mwra · March 16, 2018, 11:21am

@jmm Forcing people to use Markdown or similar for $Text would suit only a few users and pose a nigh-insurmountable barrier to use for many Tinderbox users who have no knowledge of or interest in ‘code’ (to whom mark-up code is the same as programming). We need to be careful in assuming the needs of the general Tinderbox user.

I think the purpose of link ‘names’ is misunderstood above. The name parameter in the stored XML <link> (here) actually relates to the link’s link type: see here and here. Tinderbox’s link storage method reflects pre-Web hypertext. The links are stored in a discrete link table outside text. For text or web links (i.e. based on selections in $Text) the link anchor text is not stored as text but as offsets to the plain text representation of $Text. Aside: I presume this is why, in the XML, each note has a <text> element as well as the <rtfd> tag representing the RTF form of the $Text. The link table is stored as a discrete table external to any individual note, the connection to notes being done by way of the $ID value.
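To make the offset idea concrete, here is a toy Python model of a link table whose anchors are stored as character offsets into a note's plain text rather than as the anchor text itself. It is a sketch for illustration only: the `Link` class, its field names and the example URLs are hypothetical, and nothing here is taken from Tinderbox's actual implementation or file format. It shows why any edit made to the text without adjusting the stored offsets leaves the anchors pointing at the wrong characters.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One row of a hypothetical link table kept outside the note text."""
    start: int   # offset of the anchor's first character in the plain text
    length: int  # number of characters covered by the anchor
    url: str     # link destination (placeholder values below)


def anchor_text(text: str, link: Link) -> str:
    """Recover the anchor text by slicing the plain text at the stored offsets."""
    return text[link.start:link.start + link.length]


text = "See HERE for details [@smith2016, 340]."
links = [
    Link(text.index("HERE"), len("HERE"), "x-devonthink-item://EXAMPLE-1"),
    Link(text.index("[@smith2016, 340]"), len("[@smith2016, 340]"),
         "x-devonthink-item://EXAMPLE-2"),
]
before = [anchor_text(text, link) for link in links]

# Edit the text but leave the link table untouched: every anchor after the
# edit point now covers different characters.
edited = text.replace("See", "Please see")
after = [anchor_text(edited, link) for link in links]

print(before)
print(after)
```

A model like this only stays consistent if every text operation also shifts the offsets of the links that follow the edit point; whether and where Tinderbox does that is a question for Eastgate.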

@eastgate can confirm, but my assumption, based on the default link types in TBXs, is that the link types are a nod to early hypertext notions of using links to represent argument or process and to facilitate computation across the hypertext (although initially Tinderbox macros had no facility for the latter; that came with advances in action code c. v5). As link types are actually seen most often in Maps they also get used as visual link labels and form the only real user-definable metadata on the link (a link not being an object itself).

A further current complication is that the move to Apple frameworks in v6+ brought in framework features like smart quotes and smart links. I believe the aim was that ‘smart’ links (i.e. those web links auto-detected/generated in RTF text) would be ‘adopted’ as true Tinderbox links, but initial results showed this to be problematic. For now smart links exist only in the RTF text (and thus only the stored <rtfd> data) and are not normal Tinderbox links. I believe the hope is at some point to reunite the two; again, Eastgate would be better placed to comment.

All that background aside, it does seem there is a possible emergent feature request for the (plain?) anchor text of a note to be recorded as part of (or accessible via) the link both in the underlying XML and in export (in the general ‘HTML’, i.e. marked-up export sense). IOW, being able to pass to another rendering/reader system the plain text forming the anchor to the link.

jmm · March 19, 2018, 12:02am

I need to pass links within text (web links in TB’s terms) to other apps. But HTML tags do not allow for the text content included in the format to be further processed. Therefore, my feature request would be to get those links as text.

Moreover, within TB the automatic processing of text is not possible without losing all linked references beyond recovery. In the following image one web link has been lost (originally in “HERE”) and the other (originally in [@smith2016, 340]) misplaced after a simple $Text.replace.

Additionally, it shows how I used to title links. And by the way I don’t get what Target, Title, Class are for.

I still think markdown would be a solution, and incredibly easy for those prepared to learn to use Tinderbox. But I could write my notes (that is, reference and rewrite them) if this behaviour of web links is corrected.

This is if I am not missing some other approach to referencing and rewriting annotations in Tinderbox.

PaulWalters · March 19, 2018, 12:18am

http://www.acrobatfaq.com/atbref7/index/Dialogs/CreateLinkpop-over.html

Perhaps there’s a misunderstanding, because it is not the case that web links are “not integrated in the text”. A web link can be added to any text in the text pane, and Tinderbox creates the proper HTML tags to export that. Any Markdown processor does exactly the same thing – converting the markdown tags to the same HTML that Tinderbox produces.

For example this:

which includes two web links, including one in DEVONthink format, is rendered by Tinderbox HTML export as this

If you were to examine HTML produced by a Markdown processor such as Marked, you would get the same results.

So, this note using Markdown

produces Markdown that is rendered by Marked as

<p>This is a <a href="http://www.google.com">link to a web page</a></p>

<p>This is a <a href="x-devonthink-item://91DFE88C-C78A-4F79-BD74-7A40A8749468">link to a DEVONthink document</a></p>

Same thing that Tinderbox produces. (Markdown aficionados tend to forget that the point of Markdown is to render it as rich text, eventually. In this case, rich text produced by HTML.)
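The equivalence can be illustrated with a small Python sketch that turns Markdown inline links of the form `[label](url)` into HTML `<a>` elements. This is a deliberately simplified regular-expression conversion written for this illustration, not how Marked or any full Markdown processor works: it handles inline links only, wraps nothing in `<p>`, and does not escape the surrounding text. The function name `md_links_to_html` is hypothetical.

```python
import re
from html import escape

# [label](url): label may not contain "]", url may not contain ")" or spaces.
INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def md_links_to_html(text: str) -> str:
    """Replace each Markdown inline link with an HTML anchor element."""
    def to_anchor(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'

    return INLINE_LINK.sub(to_anchor, text)


web = md_links_to_html("This is a [link to a web page](http://www.google.com)")
dt = md_links_to_html(
    "This is a [link to a DEVONthink document]"
    "(x-devonthink-item://91DFE88C-C78A-4F79-BD74-7A40A8749468)"
)
print(web)
print(dt)
```

Either way the link ends up as an `<a href="…">` element, which is why the choice between RTF with Tinderbox links and Markdown source mostly affects editing, not the exported HTML.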

Sorry to be dense, but this thread is very difficult to follow. It is not clear what the complaint is.

mwra · March 19, 2018, 12:08pm

From your screen-grab, I can see you are setting the link type as your ‘title’. Link types are only used internally, to differentiate different sorts of links, and they are (by default) shown as link labels within a map view. Regardless, the original design role of the link type is as a means of classifying links into discrete groups.

As explained in the article indicated by @PaulWalters, above, when generating web links—which by original design are HTML links—the ‘title’ box in the link creation dialog populates the HTML title attribute of the <a> link element. Here it is in the HTML 5.2 spec: title, along with target and class for CSS/DOM use - (see v5.2 spec <a> element). There is no means to set an id attribute for the exported link.

A known limitation (feature enhancements have been suggested) is that these link elements cannot yet be read/set via either action or export code. Thus setting any of title, target or class (or exported id) for a link needs to be done manually, which makes it an inefficient task at scale. It might be useful, if this aspect of link/export control is improved, to have the option of the link anchor text (for text or web links) being used as the link ‘title’.

Separately, I’m not sure about the effects of replace actions within $Text or the design intent in this regard.

I think the way to look at the overall issue is not to class these problems as failures against a user workflow for which the app was never explicitly designed but rather to look at feature requests that would remove or alleviate these issues without affecting other user activities. Removing RTF text spaces for others in favour of forcing them to use Markdown to facilitate XML work would seem a retrograde approach. FWIW, I’m certainly amongst those who’ve made previous suggestions in the export context so I’m not hostile to the use case we’re trying to resolve here.

jmm · June 6, 2018, 9:03am

I’ve tested other approaches before replying to your knowledgeable comments. I now agree that markdown is not a must.

Tinderbox stores text in XML and supports markdown and HTML for preview and for export. The technical separation between the XML and the RTF text editing pane escapes me. But I keep having the problem that after using Split, Explode, $Text.replace, etc., web links are displaced in Text, Preview and, as HTML tags, in the HTML pane.

I have resorted to writing manually the full HTML (no more markdown) code for web links in text, as <a href="x-devonthink-item://C24D39F1-6429-4932-AD95-8BD43141789F">[@Smith2016, p. 34]</a>. This avoids the automatic displacement of weblinks, but it doesn’t come without problems of its own. I still wonder if a feature request to @eastgate like the one quoted above could solve the problem.

andreas · October 3, 2018, 10:53pm

I have to bring this up again as I was trying to use the xml-Importer of Airtable to import a Tinderbox-file.

I therefore duplicated the Tinderbox-file, added the extension .xml and thus seemed to be good to go.

As an XPath I tried many different options:

//item
//attrib
tinderbox/item
tinderbox//item

and so forth.

But the only thing I managed to map was (using XPath //item): text (=$Text). Anything else like $Subtitle or $Name I couldn’t map.
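One way to explore this outside Airtable is with Python's standard `xml.etree.ElementTree`. The sketch below runs on a hand-written, heavily simplified fragment, not a real TBX file: it assumes that each note is an `<item>` element, that locally set values are child `<attribute name="…">` elements, and that the body is a child `<text>` element. Check those assumptions against your own file before relying on them. If they hold, a path such as `//item/attribute[@name='Subtitle']` may be worth trying in an XPath-based importer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a TBX file (assumed structure).
SAMPLE = """<tinderbox>
  <item ID="1">
    <attribute name="Name">First note</attribute>
    <attribute name="Subtitle">A subtitle</attribute>
    <text>Body of the first note</text>
    <item ID="2">
      <attribute name="Name">Child note</attribute>
      <text>Body of the child note</text>
    </item>
  </item>
</tinderbox>"""

root = ET.fromstring(SAMPLE)
rows = []
for item in root.iter("item"):  # every <item>, at any depth, in document order
    # Only direct <attribute> children, so a parent does not pick up its
    # children's values.
    attrs = {a.get("name"): (a.text or "") for a in item.findall("attribute")}
    rows.append({
        "Name": attrs.get("Name", ""),
        "Subtitle": attrs.get("Subtitle", ""),  # empty when not stored on the note
        "Text": item.findtext("text", default=""),
    })

for row in rows:
    print(row)
```

A value that a note merely inherits would not appear as a child element in a structure like this, so an empty field here does not necessarily mean the note has no value inside Tinderbox.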

Any ideas, @eastgate, @mwra, @PaulWalters, @jmm?

PaulWalters · October 3, 2018, 11:58pm

Tinderbox XML is very dense with a lot of structure and a lot of tags – and probably the majority of the data in the XML would be of no use to your Airtable database. In other words, if you try to XPath the Tinderbox XML you’re going to do a lot of trial-and-error testing that can be avoided.

If I were doing this (and I haven’t attempted it) I would export from Tinderbox to plain text with an export template that creates XML specifically designed just to have the data I want to import to Airtable. That way you have total control over the process.

So you’d basically create an intermediate XML file and then import that to Airtable.

Even simpler is to export from Tinderbox to CSV (covered in several threads here already) and import the CSV into Airtable.
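For reference, the target shape of such a CSV is sketched below in Python. This is not the Tinderbox export-template route described above; it only shows, under assumed column names (Name, Subtitle, Text) and made-up note data, what a file Airtable could import might contain, including how a comma inside a field gets quoted. The function name `notes_to_csv` is hypothetical.

```python
import csv
import io

# Made-up records standing in for exported notes.
notes = [
    {"Name": "First note", "Subtitle": "A subtitle", "Text": "Body, with a comma"},
    {"Name": "Second note", "Subtitle": "", "Text": "Plain body"},
]


def notes_to_csv(rows, fields=("Name", "Subtitle", "Text")) -> str:
    """Serialise note records to CSV text with a header row."""
    buf = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.DictWriter(buf, fieldnames=list(fields))
    writer.writeheader()
    writer.writerows(rows)  # fields containing commas or quotes are quoted
    return buf.getvalue()


csv_text = notes_to_csv(notes)
print(csv_text)
```

An export template that emits one such line per note, with the same quoting rules, should give an equivalent file.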

mwra · October 4, 2018, 6:40am

I’d concur with the last.

If you haven’t already, do read the section of aTbRef that describes the Tinderbox XML file structure. Bear in mind that a note’s inherited attribute values (default or via prototypes) are not stored as these are derived from source once the file is loaded.

andreas · October 4, 2018, 7:35am

Thank you @PaulWalters and @mwra.

Searching the several CSV threads … I found that this one was straightforward and worked as advertised – thanks to @mwra.