DEVONthink Replicates vs. Tinderbox Aliases

john999 · November 19, 2019, 7:45am

Another scenario showcasing the importance of DEVONthink-like replicates: the WordNet use case. The synset music#5, medicine#4, which branches out to mutually exclusive hypernyms within a related synset hierarchy, cannot be represented in Tbx because aliases are limited to one level. WordNet is the most accurate representation of the English lexicon and a powerful tool for cross-referencing hierarchical relationships. Relationships in Tbx are visually flat in both the outline and the attribute browser. Hyperbolic view will loose its purpose on large data sets. Most features in Tbx are also geared towards body text content rather than container text and, in my opinion, this is a tremendous limitation, especially for those who parse and derive relationships from chunks of broken-up body text: paragraphs, sentences, etc. Hierarchical visualization still rules… the WordNet lexical database proves exactly that.

mwra · November 19, 2019, 10:04am

‘loose’ or ‘lose’? I certainly don’t believe hyperbolic view is necessarily constrained by node count. Indeed, it is upon my larger documents that I’ve been playing with its use.

Anyway, hyperbolic view doesn’t care about the (stored) outline of the notes, as it only plots linked items. In an outline parent/child relationships are assumed but not formalised as Tinderbox links (though one may add such if desired).

$Text (body text) vs $Name (container text)? I would say quite the opposite. Indeed, Tinderbox’s original design envisaged notes with short texts and little, if any, styling. Contrast that with the long runs of styled (rich) text with embeds/inclusions that some would now prefer. In my current large Tinderbox project most notes have no $Text at all, and the work is not suffering because of it. There’s nothing wrong with $Text, but this is a project leaning more on the link structure of the hypertext, and the ‘text’ is essentially all metadata (attribute values). Using $Text instead of metadata would make the project much harder, as most queries would need to regex-mine $Text instead of doing easier, less computationally expensive look-ups and tests of attribute values.

Other than the full Unicode support and expanded visual styling coming on the back of the new v6+ app rebuild, I don’t think the feature balance is tilted overly towards note $Text.

As a long-term forum mod/admin in this and previous Tinderbox user forums, my experience is that text-related ($Text) issues feature disproportionately in users’ questions. But we are a broad church: people discuss the issues that interest or bother them, and we try to answer them. Tinderbox has a lot of features. Even after 14 years of writing and curating aTbRef I still find things I’ve missed, misunderstood or simply forgotten.

I do see the point of replicates, though they seem a functional sticking plaster for outlines. If we need replicates, it might imply we’re holding too tightly to a particular view/organising style. For instance, if you link items appropriately, you could use hyperbolic view to traverse the ‘outline’ without needing to bother about the storage (outline). Doing DAM work a while back, I saw this outline constraint run through attempts to support early photo metadata. Apps used to have complex (outline) trees of keywords but, apart from where strict vocabularies were used, outlines fell from favour due to the weakness described above.

Perhaps we all need to explore and help improve the hyperbolic view (still very new) as a better way to understand note relationships.

john999 · November 19, 2019, 10:52am

Hey “know”=now

john999 · November 19, 2019, 10:56am

Concept mapping is what Hyperbolic view is to Tbx, in theory. And hyperbolic view on an entire controlled vocabulary is not of much use when you have over 8000 concepts with relationships between them, but that’s my view on it.

john999 · November 19, 2019, 11:16am

And that is exactly my point! Why can’t $Name also be metadata (attribute values)? Every piece of textual information can be grouped in some way, to its exhaustive limit. A paragraph of textual body content can be parsed to create an exhaustive taxonomy, and that can only be created by grouping narrower concepts. Once that’s accomplished, every piece of information becomes its own “block”, which in turn becomes mutually exclusive to any other concept. You can find all the relationships in all the text you want, but you will never “unstick” multiple paragraph content from multiple containers to fit the specificity of the narrowest of concepts. Anyhow, I grasp that the majority of people don’t think in terms of conceptual blocks, and prose is the norm.

john999 · November 19, 2019, 11:19am

I concur! … (extra dots required for reply minimum character length )

john999 · November 19, 2019, 11:33am

You can see all you want, but the information will still remain static. In my view, containers are not used for a predetermined scheme. They are used as sticking points once information and relationships emerge. It’s a complicated subject that requires some understanding of linguistic concepts such as lexical semantics and a few others in the field of linguistics, and the courage to let go of the idea of textual matter. Hierarchies are more powerful than you’d think…

mwra · November 19, 2019, 12:50pm

For sure. One discussion to have, as the view matures, is how to scope the view in terms of how far out you plot. Unless you want to explore the whole, you might want to look just N links out from the focus note, which inherently reduces the number of notes/links to plot. In this regard, I think hypertext tools in general are lacking.
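Scoping a plot to “N links out from focus” is, in effect, a depth-limited breadth-first search. The following is a minimal Python sketch of that idea, not Tinderbox code. It assumes the links are available as a plain dictionary mapping note names to the names they link to (a hypothetical stand-in for Tinderbox’s link data), and it treats links as undirected so that inbound links also count as neighbours.

```python
from collections import deque


def within_n_links(links: dict[str, set[str]], focus: str, n: int) -> set[str]:
    """Return every note reachable from `focus` in at most `n` link hops.

    `links` maps a note name to the names it links to. Links are treated
    as undirected here, which is an assumption made for this sketch.
    """
    # Build an undirected neighbour map so inbound links are followed too.
    neighbours: dict[str, set[str]] = {}
    for src, dsts in links.items():
        for dst in dsts:
            neighbours.setdefault(src, set()).add(dst)
            neighbours.setdefault(dst, set()).add(src)

    seen = {focus}
    queue = deque([(focus, 0)])
    while queue:
        note, depth = queue.popleft()
        if depth == n:
            continue  # at the scope limit: do not expand further
        for nxt in neighbours.get(note, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


# Hypothetical toy link map, purely for illustration.
links = {"A": {"B"}, "B": {"C"}, "C": {"D"}, "E": {"A"}}
print(sorted(within_n_links(links, "A", 1)))  # ['A', 'B', 'E']
```

Each extra hop of scope can grow the plotted set quickly on a densely linked corpus. That growth is why a small N keeps the plot manageable even when the whole document holds thousands of notes.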

Back in v5, I built a TBX for a client that was an 8-level outline with c.35,000 items, IIRC. The performance issue is long outlines. IOW, it is the whole outline as currently expanded, which usually extends beyond the part of it seen on screen and which we generally want to be able to scroll up and down at speed.

Where there is some nesting (e.g. aTbRef’s source doc), I find keeping branches folded helps, as the overall expanded outline remains smaller. For big projects, I work out of a root-level container, i.e. such that all the ‘content’ (as opposed to templates, prototypes, general notes, etc.) can be ‘folded’ away in one place. Indeed, hoisting that container (Focus view) in outline view inherently reduces the immediate maximum scope (outline size) of the piece upon which you are working. Sometimes it works better (in performance terms) to use several tabs scoped to different parts of the outline instead of scrolling around the whole thing. I’ll admit the latter is my natural behaviour, but I’m learning the former so as not to overload the view visualisation.

What are you doing with the WordNet corpus that is specific to Tinderbox? I ask out of genuine interest. Tinderbox isn’t a formal concept mapping tool: it lets its user map out relationships but doesn’t intend to provide the formalisms of CM that might be expected in a formal CM tool.

I’ve found Tinderbox is best for discovering structure. Once that structure is discovered, it can be that, especially for large datasets, some more specialised tool is a better place to build out the full data structure.

It is. I wonder if our terminology/frame of reference is aligned? The $Name attribute is the ‘title’ of a note. It is used to title items in the view pane (more strictly, $DisplayName is used) and it is used as the title at the top of the text pane ($DisplayName is not used here, as sometimes you need to be able to see the actual $Name). So $Name is metadata. If in doubt, add $Name to a note’s key attributes table.

$Name has a different role when it comes to linking within Tinderbox. Although, under the hood, a link uses note $ID values, generally we either drag links in the UI or use action code. Action code uses a note’s path, i.e. $Path, or its $Name as a proxy (IDs were invisible to the user until c.v5+). An implicit assumption is that $Name is unique within the current TBX; otherwise the full $Path is needed. A confounding factor is that the linking code doesn’t allow for escaping† certain common non-alphanumeric characters such as parentheses and semicolons, as these are misconstrued as code rather than content. That’s not a problem if planning your novel, but more of a problem if the note title ($Name) is a reference to something like a scientific paper. You can bowdlerise the $Name to aid automated linking, but then you need to ensure you retain the true title (normally in a separate user attribute) for all other purposes. Of course, if a note becomes a container, its $Name is now part of its children’s $Path, so the same care needs to be taken.

† Sadly, I’ve yet to ascertain and document the true full set of ‘bad’ characters for $Name/$Path values with respect to links.
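The bowdlerise-and-keep-the-true-title approach can be sketched in a few lines. This is Python modelling the idea, not Tinderbox action code. Parentheses and semicolons are the only problem characters named in the post, so the character set below is a deliberately minimal assumption. “FullTitle” is a hypothetical user attribute name. The duplicate check reflects the point that name-based linking assumes each $Name is unique.

```python
import re
from collections import Counter

# ASSUMPTION: the full set of link-unsafe characters is undocumented; only
# parentheses and semicolons are named, so this set is deliberately minimal.
UNSAFE_CHARS = "();"


def link_safe_name(title: str) -> str:
    """Remove the assumed-unsafe characters and collapse leftover whitespace."""
    cleaned = title.translate(str.maketrans("", "", UNSAFE_CHARS))
    return re.sub(r"\s+", " ", cleaned).strip()


def make_note(title: str) -> dict[str, str]:
    """Model a note as a link-safe Name plus the true title kept separately.

    'FullTitle' is a hypothetical user attribute name, for illustration only.
    """
    return {"Name": link_safe_name(title), "FullTitle": title}


def duplicate_names(notes: list[dict[str, str]]) -> list[str]:
    """List Names occurring more than once (name-based linking assumes uniqueness)."""
    counts = Counter(note["Name"] for note in notes)
    return sorted(name for name, n in counts.items() if n > 1)


notes = [make_note("Smith (2019)"), make_note("Smith 2019"), make_note("Jones; 2020")]
print(duplicate_names(notes))  # ['Smith 2019']
```

Bowdlerising can itself create collisions. In the example, two distinct titles reduce to the same Name, and that is exactly the situation where the full $Path would be needed.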

Most won’t trip across this issue of titles and link-safe characters, but when you do (as in my own personal research work) it is a bit of a headache. But as an edge case, it’s not what the majority of users face, so it can’t be presumed to be high up on a fix list (as I suspect it involves code very much in the guts of the app, and so not an area for light surgery).

Amen. I didn’t mean to appear to disparage them, not least as I work primarily in outline for my personal research work (due mainly to the size of the datasets).

john999 · November 19, 2019, 1:19pm

Hmmm…very good tip. Never thought of that in terms of performance.

john999 · November 19, 2019, 2:17pm

Our thinking is aligned… Tbx’s features are geared primarily to work with $Text rather than $Name. For instance, you can create #tags in your text pane, create a stamp, and then automatically extract the textual #tags into attribute tags (kudos to @JFallows). A paragraph can be in the form of prose or in the form of a list (a list paragraph), etc. Since each paragraph has a main idea, each main idea can become a container for that paragraph, and each sentence can turn into multiple containers itself, e.g. a sentence that explains, reduces, and clarifies. Those verbs can become containers for the main topic of your paragraph. Hence the idea of an exhaustive taxonomy: the paragraph is exhaustively parsed down to its most specific semantic relationship.

Now, extracting #tags from text into attribute tags using a stamp is no longer viable in this circumstance, because most of my main concepts have turned into containers. It would still be of great use if I could create a stamp and extract the #tags from the container’s $Name, but that is no longer possible. This is obviously very specific to my case, even though Tbx encourages small chunks of body text… sense the irony here? What remains of most of my $Text is a description of the most specific concept/topic/object, and therefore the $Text has become 80% $Name.

Semantically, the $Name is no different from $Text. I’ve simply arranged the text (just as one can create a list paragraph from a sequential item list in a running prose paragraph), and then I’ve structured the text into containers (as sections contain paragraphs, so can main ideas contain sentences)… of course Tbx cannot differentiate between my choices, because it can only think in terms of $Text. The point is that $Text means very little in the way I use it, but most features in Tbx favor $Text. I don’t know if I managed to explain it well enough to make sense… I hope I did.

john999 · November 19, 2019, 2:29pm

Also think Codes and Quotes in Atlas.ti! The tool understands that paragraphs are meaningless to a user and that’s why you can code quotes and arrange them hierarchically. Again, it allows breakdown of information to its most specific concept/topic…

john999 · November 19, 2019, 2:58pm

I parse content based on semantic relationships, and WordNet groups words in terms of synonyms (called synsets) and senses (all the senses a word might have). In the English lexicon, these synsets cannot all be mutually exclusive. If I’m not mistaken, WordNet has a hierarchical depth of up to 16. Some of these synsets are mutually inclusive across different domains/concepts and at different hierarchical levels. I constantly run into a need to classify in different containers based on WordNet, and with thousands of notes I can’t worry about manually duplicating repeating concepts in different containers. Tbx allows at most one level of cloning/replicating/aliasing of containers. Considering WordNet comprises a system 16 levels deep, it becomes difficult to manage a concept that needs to live multiple levels deep and across multiple groupings.
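The structural problem can be made concrete. In a strict outline each note has exactly one parent, but a synset with several hypernyms sits at the end of several root-to-node paths. Each path beyond the first is another place the concept would need a replicate or alias. The Python sketch below enumerates those paths over a hypothetical toy hierarchy; it is not real WordNet data. For real data, NLTK’s WordNet interface offers a comparable `hypernym_paths()` method on its synset objects.

```python
def hypernym_paths(parents: dict[str, list[str]], node: str) -> list[list[str]]:
    """Return every path from a root down to `node` in a multi-parent hierarchy."""
    if not parents.get(node):
        return [[node]]  # a root: it has no hypernyms
    paths = []
    for parent in parents[node]:
        for path in hypernym_paths(parents, parent):
            paths.append(path + [node])
    return paths


# Hypothetical toy hierarchy (not real WordNet data): one concept, two hypernyms.
parents = {
    "concept_x": ["branch_a", "branch_b"],
    "branch_a": ["root"],
    "branch_b": ["mid_b"],
    "mid_b": ["root"],
    "root": [],
}

paths = hypernym_paths(parents, "concept_x")
for p in paths:
    print(" > ".join(p))
# One placement is the original; every other path needs a copy/replicate.
print("extra placements needed:", len(paths) - 1)
```

With a hierarchy up to 16 levels deep, the number of such paths per concept, and so the number of extra placements, can grow well beyond what one-level aliasing handles comfortably.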

mwra · November 19, 2019, 2:58pm

TBH, neither had I until I did it by mistake. One wrinkle though: if doing linking, all tabs share the same link park, but separate document windows do not. There seems to be no gain in the latter, and I’ve made a feature request that all windows (i.e. all tabs) share their link park. But for now, the preceding is the status quo.

mwra · November 19, 2019, 3:11pm

Thanks - and interesting to hear about the task.

I may be misunderstanding, but the description doesn’t imply a need for an alias to support descendant context as, if I’ve understood correctly, you are testing whether a term is ‘in’† a particular container, so its descendants are moot. I can see it’s a pain to add lots of aliases across the corpus, but it’s actually less content than discrete notes.

† Be aware that, for legacy reasons, the inside() query is more expansive than you might initially assume. It’s true not only if the original note is in that container but also if an alias of that note is there. By comparison, the same is not true of descendants and descendedFrom(). To save asking, there is already a feature request for a more restrictive version of the inside() query that only counts originals as a valid match.
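The difference between the two queries can be modelled outside Tinderbox. The Python sketch below is an illustration, not Tinderbox’s implementation. It follows the behaviour described in the post: inside() matches if the original or any alias sits in the container, while descendedFrom() follows only the original’s own ancestry. It also assumes that inside() tests direct children only and descendedFrom() tests any depth; check aTbRef before relying on that. `inside_original_only` is a hypothetical model of the requested restrictive variant.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Note:
    name: str
    parent: Note | None = None
    aliases: list[Note] = field(default_factory=list)  # where aliases are placed


def add_alias(original: Note, container: Note) -> Note:
    """Place an alias of `original` inside `container`."""
    alias = Note(original.name, parent=container)
    original.aliases.append(alias)
    return alias


def inside_model(note: Note, container: Note) -> bool:
    """Model of inside(): original OR any alias is a direct child (assumed)."""
    return note.parent is container or any(a.parent is container for a in note.aliases)


def inside_original_only(note: Note, container: Note) -> bool:
    """Model of the requested restrictive variant: only the original counts."""
    return note.parent is container


def descended_from_model(note: Note, container: Note) -> bool:
    """Model of descendedFrom(): walk only the original's ancestry, any depth."""
    p = note.parent
    while p is not None:
        if p is container:
            return True
        p = p.parent
    return False


other, topic = Note("Other"), Note("Topic X")
term = Note("term", parent=other)
add_alias(term, topic)
print(inside_model(term, topic), descended_from_model(term, topic))  # True False
```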

In Tinderbox vernacular the tags are in $Tags; this represents the general notion in many apps of a single grab-bag of terms (or ‘keywords’) relating to an item. So for a note titled “Some Note”, you can get a list of the note’s tags like so:

$MyList=$Tags("Some Note");

$MyList now holds a list of those tags.

mwra · November 19, 2019, 3:39pm

So let’s assume the container is titled “Topic X”. Your stamp, used on its descendants, would be:

$Tags = $Tags("Topic X");

In fact there are a host of different ways, some more automatic, via which this can be done. Prototypes/Inheritance is another method for approaching this.

Where I may be missing something is the meaning of the term #tags. I think this refers to the method, used especially in some simple note tools without discrete metadata fields, of marking a word (or underscore_joined_phrase) as a keyword/tag. If I’m wrong, can you set me straight as to your meaning? But I’m unclear as to what the ‘tags’ in a name are. I guess you have a note called “Some term #tagA #tag_B”. Is this so?
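If names do take that form, the parsing itself is small. The following Python sketch shows the extraction logic only; it is not Tinderbox action code. It assumes a tag is `#` followed by word characters (letters, digits, underscore). It returns the cleaned title and the tag list, and the tags are joined with semicolons because Tinderbox list/set values are semicolon-delimited.

```python
import re

# ASSUMPTION: a tag is '#' followed by word characters (letters, digits, '_').
HASHTAG = re.compile(r"#(\w+)")


def split_name(name: str) -> tuple[str, list[str]]:
    """Split a name such as 'Some term #tagA #tag_B' into its title and its tags."""
    tags = HASHTAG.findall(name)
    title = re.sub(r"\s+", " ", HASHTAG.sub("", name)).strip()
    return title, tags


title, tags = split_name("Some term #tagA #tag_B")
print(title)           # Some term
print(tags)            # ['tagA', 'tag_B']
print(";".join(tags))  # tagA;tag_B
```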

Not for Tinderbox. When you refer to these in the app, the $Name is the title of the note and the $Text is the note’s body copy, though there’s no reason the title ($Name) string can’t be repeated in the $Text as the first paragraph (or any part of $Text).

I don’t believe Tinderbox has ever had, as a design intent, a decomposition of $Text. You can do various tests and manipulations on $Text. To Tinderbox, a paragraph is, to my understanding, any substring delimited by one or more consecutive hard line returns. There is no defined concept of a sentence (you’d need to regex-parse for all possible grammatically allowed‡ sentence-end markers).

‡ Remembering things like BrE/AmE differences on closing quotes before/after closing punctuation.
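Both rules can be sketched in Python to show why sentences are the harder case. This is an illustration, not Tinderbox behaviour. Paragraphs follow the hard-return definition given in the post. The sentence splitter is deliberately naive: it breaks on whitespace after `.`, `!` or `?`, with or without a closing quote, so it covers both closing-quote placements mentioned in the footnote. It will still mis-split abbreviations such as “e.g.”.

```python
import re


def paragraphs(text: str) -> list[str]:
    """Paragraphs as substrings delimited by one or more hard line returns."""
    return [p for p in re.split(r"\n+", text) if p.strip()]


# Naive rule: split on whitespace preceded by . ! or ?, or by one of those
# followed by a closing quote (covers both '"Stop."' and '"Stop".' styles).
# Abbreviations such as "e.g." will still be mis-split.
SENTENCE_BREAK = re.compile(r'(?:(?<=[.!?])|(?<=[.!?]["\u201d]))\s+')


def sentences(paragraph: str) -> list[str]:
    """Split one paragraph into sentences using the naive rule above."""
    return [s for s in SENTENCE_BREAK.split(paragraph.strip()) if s]


text = 'It works. Does it? Yes!\n\nShe said "Stop." Then left. He said "Go". Fine.'
for para in paragraphs(text):
    print(sentences(para))
```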

I don’t understand what these ‘choices’ are. Could you explain a bit more?

All interesting stuff. I’d agree that an 8k-plus item outline is going to require careful use to remain responsive, but I’m not seeing anything impossible yet.

If it helps us better understand the issue, by all means post a small TBX with enough info to explain the sorts of tasks you’re trying to do and can’t, or, if easier, post a before (“What I have”) and after (“What I want”) file. If doing so, perhaps start a new thread to avoid undue topic drift here (conscious we’ve already wandered some way from replicates).

My hunch is overall size is impeding normal initial-stage process formalisation (in terms of use of Tinderbox features to best advantage), quite naturally leading to frustration and lack of process. Anyway, if we fellow users can help, just ask away!