Two more questions

GaiusScotius · May 24, 2021, 8:33am

First, thanks to everyone who posted answers to my previous questions, I greatly appreciate your help and would have jumped into the Zoom meeting on Saturday had it not been an Important Person’s birthday!

Two more questions if I may.

First, I get the impression that TBX documents are standalone, or at the very least intended to be standalone. Does this preclude creating links between notes in different documents? Let me give a use case. The UK Office for National Statistics publishes a Standard Occupational Classification, basically an extensive list of jobs. By extensive, I mean a structured set of about 30,000 of them. I have in mind importing this into a TBX document to create a structured, navigable and queryable view of what is otherwise just a vast spreadsheet. Each occupational description – each line in the ONS spreadsheet – would become a separate TBX note. I then want to be able to reference this classification from notes in other TBX documents (i.e. create a link to a note in the SOC document from a note in another document). Is this possible?

Second question. Action Script is fine for manipulating notes – that’s what it was designed for – but I’d like to be able to run statistical analyses of information held in collections of notes, which is much better done in a specialist language like R. Is it possible to call out from Action Script to functions or scripts in another language (what I’m really asking is, does Action Script have an FFI?). Languages of choice would be R, Lua and Python. I’d rather not have to write out intermediate files if that can be avoided.

As an aside to Q1, I’d like to make the SOC document read-only so that it can’t be accidentally mangled. I presume TBX would be happy with that?

Also, there are clearly many, many ontologies of potential benefit to users across the world. I for one am an avid user of Princeton’s WordNet. Is there a community resource for sharing such things?

Thanks again for your help.

mwra · May 24, 2021, 1:44pm

No. TBXs are self-contained. Apart from data, that means customisations like link types, user attributes, etc., are local to a single doc (there are ways to migrate settings, but that’s a separate topic). Tinderbox can open more than one document at a time and you can create as many TBX files as you want. IIRC, there are about 600 TBXs on my system, but note that’s a combination of private work, consulting, and all my Tinderbox community support work (people’s problem files, demos I’ve made, etc.)

A link from a note to a note is possible via the tinderbox:// pseudo-protocol (as also used by apps like DEVONthink, Bookends, etc.) which effectively works as a local URL pointing inside an app’s data: see more.

Tinderbox doesn’t have fixed limits, but it does have practical ones. A low-end CPU with 8GB RAM may handle a big doc less well than a fast one with 16 or 32GB RAM. If you have a lot of notes, keep them small and clean (avoid embedded images in text, complex text formatting) and don’t try to run lots of regex-based queries across all notes all the time. In the latter case, IOW, some tasks scale badly, and obviously so after the fact.

In short, yes, but don’t expect everything to be instantaneous. I think a better way to go is to export data tables from Tinderbox (unlike some apps, no names, export is very flexible). For instance you might do best to surface the strands of connection in Tinderbox and export to a large data table, either Tidyverse-ready, or manipulated in R. Then you can create R scripts Tinderbox can call and/or go R → R/Bookdown (print) or R → R/Shiny (interactive, web) to play with the data analytically.

I’d avoid the common hubristic assumption of can’t the computer just … in one app. Generally no, and when you look at what you’re expecting, the hubris of the assumption shines forth. Given this, what you want is that the app sits in its part of the data chain in a productive way. In that context, Tinderbox is good and impressive. Most of my early-stage exploratory research would have been much more difficult—if not impossible—without Tinderbox (or a tool like it) giving me support for incremental formalism arising from emergent structure.

If you are just going to dump the ONS SOC d/b into a TBX file, I think you are right to then link it to richer data in discrete files. With 30k source notes you’ll want to keep that file lean. What’s not clear is what sort of queries you want to run as, ironically, the emergent relationships, whether captured as metadata (i.e. Tinderbox attribute values) or via links/link types, could be pertinent to giving you a better queryable dataset.

I believe so. See $ReadOnly.

Nothing as such, but this community is very ‘sharing’ and you can upload or link to data files in the forum, e.g. in the File Exchange sub-forum.

satikusala · May 24, 2021, 3:44pm

@GaiusScotius, when asking for ReadOnly are you asking at the file level or at the note level? I’m not sure you can “lock” a file like you can a PDF and make it read-only. I believe the reference by @mwra refers to a boolean that sets the $Text to read-only, which is great if you don’t want to accidentally edit your data, but it can be turned off.

GaiusScotius · May 24, 2021, 4:17pm

@satikusala Thanks, I hadn’t come across the $ReadOnly attribute.

What I’d meant by making the SOC file read-only was setting its file system access permissions to -r--r--r--. I then thought, oops, what if Tinderbox needs to write to the file to update something like a time-last-accessed stamp? If TBX expected to open document files read-write, a read-only file might upset it, hence the question.
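For anyone wanting to try this, here is a minimal sketch of setting those permissions from Python on a POSIX system. It runs against a throwaway temporary file (the `.tbx` suffix is only for illustration) and says nothing about how Tinderbox itself reacts to a read-only document; that still needs testing on a copy.

```python
import os
import stat
import tempfile


def make_read_only(path: str) -> str:
    """Set owner/group/other to read-only and return the resulting mode string."""
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # same as 0o444
    return stat.filemode(os.stat(path).st_mode)


# Demonstrate on a throwaway file rather than a real Tinderbox document.
with tempfile.NamedTemporaryFile(suffix=".tbx", delete=False) as handle:
    demo_path = handle.name

mode_string = make_read_only(demo_path)
print(mode_string)

# Restore write permission so the temporary file can be removed cleanly.
os.chmod(demo_path, stat.S_IRUSR | stat.S_IWUSR)
os.remove(demo_path)
```

The equivalent one-liner in a shell is `chmod 444` on the file; the Python form is shown only because it is easy to fold into a larger import script.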

mwra · May 24, 2021, 4:18pm

Sorry, yes. I don’t know if it does apply (undocumented) to other note attributes in addition. Or that might be an option.

However, other than by actions such as a rule, I can’t force some note to always be, for example, the second child of the third root-level container. At this point, I think that is an as-yet un-made feature request (though I may be wrong).

GaiusScotius · May 24, 2021, 10:03pm

@mwra Thank you for the explanation, I had got the strong impression that TBX files were self contained and that creating navigable links between separate .tbx files would not be possible; it seems I was right.

Having discovered Tinderbox via the Hook app website (it’s listed as a well supported note-taking app) I knew I could always connect notes and other files “externally”, but I’m not sure that will work for what I have in mind. The use case is largely to do with scoping searches, so perhaps another example would help (with the proviso that it is chosen as something with which more people may have a degree of familiarity rather than it being the actual classifications/ontologies I have in mind).

Take the Linnaean system. This divides all living (and extinct) life into a seven-level structure: Kingdom - Phylum - Class - Order - Family - Genus - Species. Describing this in TBX would seem to be as simple as creating nested containers of notes, the leaves being Species. K, P, C, O, F, G and S would be prototypes.

Say I’m a palaeontologist examining fossil dinosaurs. I have notes on fossils in a museum collection identified by accession number (1, 2, 3 … n). I can classify each specimen by linking its note to the appropriate Species in the Linnaean classification. If I have two specimens of Tyrannosaurus rex (#3 and #42), their notes can be linked to the same note in the Linnaean classification (the note for Tyrannosaurus rex). To find the set of all T. rex specimens I navigate to that note and look for backlinks to notes about specimens. (Tyrannosaurus rex links back to #3 and #42.)

So far so easy, but it might also be useful to look for notes on related accessions, on other species of Tyrannosaur or, indeed, other genera completely. For this I need an expanding nearest neighbour search of the classification. For example, I start by following the link from my specimen to Tyrannosaurus rex, from there step up a level to the Genus container and look for notes in it. This gives me other Species of the Genus Tyrannosaurus – Nanotyrannus lancensis for example (the best fossil of which may be a juvenile T. rex, so you can see why it could be of interest). From these the backlinks lead to notes on other specimens, say accession #87. Equally, I can step up a further container to the Family Tyrannosauridae and so on. Again, Action Script makes these types of query easy.
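To make the shape of that search concrete, here is a minimal sketch in Python rather than Tinderbox action code. The `Taxon` class, the function names and the extra genus with accession #101 are all invented for illustration; the point is only the "step up one container, collect newly reachable backlinks" loop.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Taxon:
    """One container or leaf in the classification outline (hypothetical model)."""
    name: str
    parent: Optional["Taxon"] = None
    children: List["Taxon"] = field(default_factory=list)
    backlinks: List[str] = field(default_factory=list)  # specimen notes linking here

    def add(self, name: str) -> "Taxon":
        child = Taxon(name, parent=self)
        self.children.append(child)
        return child


def specimens_under(taxon: Taxon) -> List[str]:
    """Collect backlinks from this taxon and everything nested inside it."""
    found = list(taxon.backlinks)
    for child in taxon.children:
        found.extend(specimens_under(child))
    return found


def expanding_search(start: Taxon, levels_up: int) -> Dict[str, List[str]]:
    """Step up one container at a time, keeping only specimens not seen at a lower level."""
    seen = set()
    rings: Dict[str, List[str]] = {}
    node: Optional[Taxon] = start
    for _ in range(levels_up + 1):
        if node is None:  # ran off the top of the outline
            break
        fresh = [s for s in specimens_under(node) if s not in seen]
        seen.update(fresh)
        rings[node.name] = fresh
        node = node.parent
    return rings


# Invented sample data loosely following the example above.
family = Taxon("Tyrannosauridae")
genus = family.add("Tyrannosaurus")
rex = genus.add("Tyrannosaurus rex")
rex.backlinks += ["#3", "#42"]
sibling = genus.add("second Tyrannosaurus species (placeholder)")
sibling.backlinks.append("#87")
other_genus = family.add("Daspletosaurus")
other_genus.add("Daspletosaurus torosus").backlinks.append("#101")

rings = expanding_search(rex, levels_up=2)
for level, specimens in rings.items():
    print(level, specimens)
```

Each ring lists only the accessions that became reachable at that level, so the nearest relatives come first and the search can stop as soon as enough notes have been found.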

The trouble is (and please note this is from the perspective of one thinking of buying Tinderbox as a tool for more than mere note taking) that many standard classifications are BIG and complex and we shouldn’t copy them wholesale into every document that we’d like to use them in. I doubt it’s possible to put even a fraction of the number of living species (some estimates go as high as 8.7 million) into a single document, so in reality you’d need a cascade of linked classification documents just to store them all.

Beyond this, most useful ontologies are also somewhat stable (we don’t add new species all that often). The current ONS Standard Occupational Classification was published in 2020, the previous version in 2010 and the most recent International Standard Classification of Occupations in 2008. When you think about it, you actually want standard classifications to be, well, standard for a reasonable period!

To me, therefore, classification schemes are candidates for storing in separate (perhaps, often linked) TBX documents, isolated from the much more rapidly changing notes that reference them. That also makes them shareable, publishable and generally more useful to a wider pool of users.

So that’s the problem. I can link a note in a “working” document to an entry in a separate ontology document using an “external” link (a URL or a Hook link) but I don’t see how I can then query the ontology’s graph to assemble a collection of notes of interest (equivalent to the other Species in my example) and follow their backlinks to give me the collection of my notes in my working document.

If it’s possible to have Action Script invoke a script outside of the current document (as was implied by @mwra in response to my question on calling out to R or Python, thank you again for your detailed exposition), then perhaps it’s possible to invoke an Agent in the classification document to walk the graph, have it call out to Hook to get the backlinks and return them for further processing. I shall investigate further, but it sounds cumbersome and completely crashes the concept of separation of concerns; the ontology document should not need to “know” about how I want to query it; that should be an Agent in the working document.

I was — indeed still am — on the cusp of buying a licence, so this is a bit of a blow. Any other ideas would be most welcome.

Once again my thanks to those who have given so freely of their time in answering my novice questions; I am overwhelmed by the generosity of people to a complete stranger.

satikusala · May 24, 2021, 11:16pm

No, you cannot have action script run outside of the document; however, you could use AppleScript, Keyboard Maestro, or runCommand to trigger an external process you may be working with.
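As a rough illustration of the external end of such a call, here is a minimal Python sketch that reads newline-separated numbers on standard input and prints summary statistics on standard output, so no intermediate file is needed. It assumes the calling side (for example Tinderbox's command-line operator) can pipe a string in and capture the output; check the current documentation for the exact calling convention. The function name and output format are my own invention.

```python
import statistics
import sys
from typing import Dict, Iterable


def summarise(lines: Iterable[str]) -> Dict[str, float]:
    """Parse one number per line (blank lines ignored) and return basic statistics."""
    values = [float(line) for line in lines if line.strip()]
    if not values:
        return {"n": 0}
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def main() -> None:
    summary = summarise(sys.stdin)
    # Tab-separated key=value pairs are easy to split again on the calling side.
    print("\t".join(f"{key}={value:g}" for key, value in summary.items()))


# Only read stdin when something is actually piped in.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

The same pattern would work for R or Lua: the external script only needs to read text in and write text out, and the heavy analysis stays outside the document.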

satikusala · May 24, 2021, 11:19pm

This may be the case, but just because they’re in your TBX file it does not mean that they need to be touched or edited. You could stick them off in a container, build a dictionary around them and have action script call and reference them, integrating their various attributes into your changing notes and reports. At first pass, this seems to be a perfect use for Tinderbox, barring any unforeseen computing or file structure limitation best suited for @eastgate to address.

eastgate · May 25, 2021, 12:59am

I’m disengaging here for various reasons, but in passing I’ll note that:

1. Tinderbox is not a tool for building vast taxonomies;
2. The construction of vast taxonomies was a failed enterprise of the turn of the last century;
3. Notwithstanding (1) and (2), Tinderbox is probably the best tool available for building big taxonomies.

GaiusScotius · May 25, 2021, 6:29am

Point three is exactly what drew my attention to Tinderbox; it seems by far the best tool I’ve ever seen for exploring and evolving classifications. I would hesitate to say that construction of vast taxonomies was a failed enterprise. At a corporate level perhaps, but at an industry or governmental level I disagree; perhaps they’re best seen as a necessary evil.

sumnerg · May 25, 2021, 7:19am

@GaiusScotius I confess I don’t fully follow what you are hoping to do, but leave you with this thought.

Tinderbox has robust external scripting support. An external AppleScript or JavaScript for Automation script (which can be placed in the menu or attached to a keyboard shortcut) can not only act directly on Tinderbox documents and pass data between them, but it can also easily run Tinderbox action code within one or more Tinderbox documents that are open.

An external script can also easily retrieve data from different Tinderbox xml files that are not open in Tinderbox, using XQuery.

All of this gives mind-boggling capabilities that I suspect are mostly untapped at this point.
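As a small illustration of reading a TBX file from outside Tinderbox, here is a sketch using Python's standard `xml.etree.ElementTree` (its limited path support, not XQuery). The sample XML is a simplified stand-in I wrote by hand: it assumes notes are nested `item` elements whose values sit in `attribute` children keyed by `name`, and the `Rank` attribute is hypothetical. Check the element names against a real file before relying on this.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hand-written, simplified stand-in for a TBX document (assumed layout).
SAMPLE_TBX = """
<tinderbox version="2">
  <item ID="1">
    <attribute name="Name">Tyrannosaurus</attribute>
    <attribute name="Rank">Genus</attribute>
    <item ID="2">
      <attribute name="Name">Tyrannosaurus rex</attribute>
      <attribute name="Rank">Species</attribute>
    </item>
  </item>
</tinderbox>
"""


def notes_by_attribute(root: ET.Element, name: str, value: str) -> List[Dict[str, str]]:
    """Return every note (at any depth) whose attribute `name` equals `value`."""
    matches = []
    for item in root.iter("item"):
        # findall() only looks at direct children, so nested notes are not mixed in.
        attrs = {a.get("name"): (a.text or "") for a in item.findall("attribute")}
        if attrs.get(name) == value:
            matches.append({"ID": item.get("ID", ""), **attrs})
    return matches


root = ET.fromstring(SAMPLE_TBX.strip())
species = notes_by_attribute(root, "Rank", "Species")
print(species)
```

For a file on disk, `ET.parse(path).getroot()` would replace the `fromstring` call; reading this way never modifies the document.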

GaiusScotius · May 25, 2021, 7:48am

Thanks everyone, my mind is made up and that partly because of this forum. @eastgate before I part with my money, should I wait for what appears to be the imminent release of v9, or if I buy today will that come as a free upgrade? Or isn’t v9 imminent?

mwra · May 25, 2021, 10:16am

For say < 50,000 notes, I think this can be done†. Beyond that you might want to link between documents. However the basic model is one I’m used to from exploratory work in my recent doctoral studies.

Note that the ‘link’ between items (notes) need not be an explicit link, as a note can (or both ends of the link can) store metadata (i.e. Tinderbox attribute values) indicating the relationship. Indeed if only using the links for querying, the latter might be easier and tidier, plus the links can easily be reconstituted if lost; the metadata less so (i.e. bar going to old back-ups).

Taxonomies/Ontologies can get large, but as you point out we don’t need to look at the whole system all the time. It might help at this point to know the real subject area (if not sensitive/secret—we’re not prying) to get a better handle on the sizes and the likely discrete selections one could make within them.

One other point is that even if at some point sheer size demands a bigger [tool/program] to handle the data, Tinderbox’s flexibility in support of exploring structure is invaluable. In other words, by the time you get to needing a bigger bucket you will have a much clearer idea of the structure needed, given that large databases/graphs aren’t necessarily designed for significant change without the task being a do-over.

Welcome, imminently, to the Tinderbox world. A good point is to assume no one else is doing what you do (even if that seems counter-intuitive) with this toolbox, but at the same time they will be using some of the common toolset in the same way within their (different) work. Framing questions from that standpoint often helps more people help the questioner. We can thus share patterns rather than exact processes.

v9 is not far off, I think, and @eastgate can answer that. I think it is OK, as licences generally include free updates to the current released version (major or minor) for a calendar year. After that you can purchase an upgrade to continue access to new releases, or continue on your current version with no upgrades. The cool part is you can purchase an upgrade, get the now-current release and another year’s free updates. This is a very flexible model and is also kind to those with modest budgets for updates. To get a flavour of updates see v8e and older versions.

TBX documents are well-formed XML so it can all be extracted and is not a prisoner of Tinderbox use.

v8 → v9 is incremental change and, of course there are new features/tools. So, any work done in 8 is not a do-over in 9 simply due to a new version.

Do consider dropping into one of the Saturday meet-ups. They are announced in the forum plus there are links to recordings of past events. We normally start with new folk and try to help dissect any problems/challenges/misunderstandings; there are no wrong questions. The meet-up is 6PM W. Europe, 5PM UK, midday US East, breakfast time US West (i.e. beer for some, cereal for others!)

HTH

†. There are no fixed limits in Tinderbox. Rather, the limits are the combination of the amount of data overall (lots of smaller notes vs. fewer big notes) and how heavily you use scripting in the document. The bigger the document, the more thought you want to give to not running always-on processes that are fine in a 100 note document but less so in a 10,000 note document. The key is to experiment and be flexible; work with (apparent) constraints and not against them.

P.S. Tinderbox’s internal macro system, which is referred to as ‘Action’ code, is not related to the JavaScript-like ActionScript used in Adobe Flash. Just in case there might be confusion.

GaiusScotius · May 25, 2021, 10:45pm

@mwra No problem with sharing “the real subject area”, it’s not a secret just a tad esoteric.

The backstory is that I have a background in both engineering and law. For many years I ran a software company writing drilling engineering software. I sold my shares c. 2000, mucked around for a few years, then took a law degree and qualified as a solicitor. Shock, horror! The legal profession is years (decades?) behind much of the world, particularly in terms of how they run their businesses. To be fair, this is partly because of the nature of partnerships, which can’t raise investment by issuing shares, but there are easy things that pay back quickly and that low hanging fruit is not being picked.

When I ran out of steam as a lawyer, I retired and turned my mind to whether there was a business to be made applying engineering practices to the business of law. Buzz words like “design thinking” get thrown around with gay abandon, but nobody much seems to have actually practiced design in the sense of “if we design this oil well wrong it’s going to blow out, kill a lot of people and pollute half the North Sea”. When you’re writing the CAE software that people are using to design that well how you do it is, shall we say, an important factor in sleeping at night.

If you’re an engineer you’ll appreciate that a fundamental reason why oil wells don’t often go bang, bridges don’t fall down, trains pretty much never crash and aircraft rarely spontaneously disassemble in mid air is process systemisation. That, and a decent understanding of physics, is largely what engineering is. Things such as trains and 'planes and bridges (and, indeed, oil wells) are not artisanal creations; they are the end product of repeatable design processes, constructed and assured using defined, testable systems, and operated and maintained according to defined systems. Even when the artefact itself is bespoke it is inevitably a variation on a common theme: a taller bridge, a longer tunnel or a deeper well. The process by which it is designed and built remains systematic.

So I thought I’d try throwing (relatively) common engineering practices at legal processes to see what stuck.

First up, how about stochastic cost and time estimation? If you go to the typical firm of solicitors and ask what fee they’ll charge for some (reasonably complex) transaction — like selling a company or settling a large estate — you’ll be lucky to get a fixed price quote. And that’s for the type of work that solicitors undertake repeatedly and understand well. Why? Because although they know what to do and they have systems (for “matter management”) to support them, their processes aren’t systematised and, because of that, they can’t be instrumented to record to the level of detail needed to generate accurate quotes. c.f. almost any other profession: of course you can get a fixed price for an audit, of course you can get a fixed price for building your website, of course your dentist can tell you the price of a root canal …
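For readers who have not met stochastic estimation, here is a minimal Monte Carlo sketch in Python. Every figure in it is invented: the task list, the three-point hour estimates and the hourly rate are placeholders, and triangular distributions are just one simple modelling choice. The idea is to sample each task's duration many times and read a quote off the resulting distribution rather than guessing a single number.

```python
import random
from typing import Dict, List, Tuple

# (optimistic, most likely, pessimistic) hours per task: invented figures.
TASKS: Dict[str, Tuple[float, float, float]] = {
    "due diligence": (10.0, 16.0, 40.0),
    "drafting": (6.0, 9.0, 20.0),
    "negotiation": (4.0, 8.0, 30.0),
}


def simulate_fee(tasks: Dict[str, Tuple[float, float, float]],
                 hourly_rate: float, runs: int = 10_000, seed: int = 1) -> List[float]:
    """Return the sorted total fee from `runs` simulated transactions."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    totals = []
    for _ in range(runs):
        # Note the argument order: triangular(low, high, mode).
        hours = sum(rng.triangular(low, high, mode) for low, mode, high in tasks.values())
        totals.append(hours * hourly_rate)
    return sorted(totals)


def percentile(sorted_values: List[float], fraction: float) -> float:
    """Value below which roughly `fraction` of the simulated outcomes fall."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


fees = simulate_fee(TASKS, hourly_rate=250.0)
for label, fraction in (("P50", 0.50), ("P80", 0.80), ("P95", 0.95)):
    print(f"{label}: {percentile(fees, fraction):,.0f}")
```

A firm might quote at P80 and know it will overrun about one time in five. The hard part, as the post goes on to say, is getting task-level records good enough to supply the three-point estimates in the first place.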

Problem number one. “We’ve got all the stuff you’ll need in our matter management system, it’s got all the details of every transaction we’ve done in it.” Well, yeeess, but… I can see that Janet the solicitor spent three hours 12 minutes “drafting”, and that John the trainee was occupied on “document review” for two hours last Friday afternoon. You’ve got lots and lots of that data ’cos you bill by the hour and you need it to know how much to charge. BUT it would be handy to know what dear Janet was drafting and WHY. “Sorry, we don’t record that.” Never mind, can I get a copy of your database? “Oh, we don’t have a database, it’s all safe in XYZ vendor’s cloud (latest buzz word), I’m sure you can speak to them”. Please Sir, can I get access to ABC’s database, and can you tell me how it’s structured so I can query it? “What! You want us to disclose our super sophisticated, highly optimised, proprietary schema? Heaven forfend!” Which probably means grief, no, it wasn’t that good to start with and we’ve kludged so much stuff into the backend to support the latest spiffy new features demanded by marketing that the technical debt is about to capsize the ship. I know, I understand, I’ve been there, it happens to everyone.

So set aside any idea of stochastic cost estimation, set aside systemisation and process improvement, there’s no point in trying to sell anything that can’t meet the demand of “will it work with what I presently use?”.

Cue much pondering on long walks with the dogs and digging the garden, followed by extensive Googling on data exchange standards (virtually none, it would seem in the legal field) and how client server computing works today (as opposed to twenty years ago). Much the same it seems, but they’ve changed the names and learnt (yet again) that wire protocols should be flexible. REST? Seemingly a database query in disguise loosely married to a remote procedure call. GraphQL? Oh, a treewalk to save round trips to the database. Been there, done that on a '486 running at 500Mhz over an ISDN telephone line. SQL? Surely by now someone, anyone must have come up with a less incomprehensible language! Sorry. Excuse me while I go into Victor Meldrew mode.

The real issue I finally surmised (the pervasive, underlying issue) is one of data sovereignty. Law firms have largely lost it. More accurately, having dug themselves massive holes trying to make disparate off-the-shelf software play together happily (yes, I’m looking at you Exchange and you too, SharePoint. Just stop bickering, the pair of you), they ceded it to various “legal systems” vendors in the vain hope of being thrown a rope. Information is the lifeblood of business and ceding sovereignty over it (allowing control over the structure, storage and access to your own information to slip through your fingers) is to hamstring your business’ capacity to change. Give up sovereignty and you are, almost literally, in the hands of your vendors. Sure they gave you a rope, it just wasn’t tied off at the top.

Hence project number two. Design and publish a flexible and extensible data model capable of holding all business information pertinent to legal practice. Then build a reference implementation for a datastore that is based entirely on open source software.

This is where Tinderbox comes in. TBX use case #1. I have what I consider to be a reasonably good core model (prosaically called the Legal Practice Model, LPM), but it needs documenting and, ideally, published to the web. Tinderbox can do this.

TBX use case #2. The structure of the LPM is similar to that used by Tinderbox, namely it is a spin on the Entity - Attribute - Value pattern. The Note - Attribute - Link approach should facilitate exploration and prototyping of data patterns implementable in the LPM.
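For anyone unfamiliar with the Entity - Attribute - Value pattern, here is a minimal in-memory sketch in Python. It is not the LPM, whose schema is not described in this thread; the `TripleStore` class, the identifiers and the sample records are all invented. It only shows why the pattern is extensible: a new kind of fact is a new attribute name in a dictionary, not a new table or column.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Triple(NamedTuple):
    entity: str     # what the fact is about, e.g. a matter or a time record
    attribute: str  # a term that would come from a dictionary of attributes
    value: str      # a literal, or the identifier of another entity


class TripleStore:
    """Tiny illustrative EAV store; a real one would sit on a database."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def add(self, entity: str, attribute: str, value: str) -> None:
        self._triples.append(Triple(entity, attribute, value))

    def attributes_of(self, entity: str) -> Dict[str, List[str]]:
        """All attribute values recorded for one entity."""
        result: Dict[str, List[str]] = defaultdict(list)
        for triple in self._triples:
            if triple.entity == entity:
                result[triple.attribute].append(triple.value)
        return dict(result)

    def entities_where(self, attribute: str, value: str) -> Set[str]:
        """Every entity holding the given attribute/value pair."""
        return {t.entity for t in self._triples
                if t.attribute == attribute and t.value == value}


store = TripleStore()
store.add("matter-001", "type", "share sale")
store.add("time-0001", "matter", "matter-001")   # value refers to another entity
store.add("time-0001", "fee_earner", "Janet")
store.add("time-0001", "activity", "drafting")
store.add("time-0001", "hours", "3.2")
store.add("time-0001", "document", "share purchase agreement")  # the missing "what"

print(store.entities_where("activity", "drafting"))
print(store.attributes_of("time-0001"))
```

In Tinderbox terms the entity corresponds roughly to a note, the attribute to a user attribute and an entity-valued attribute to a link, which is why prototyping such patterns in a TBX document looks plausible.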

TBX use case #3. The LPM, like all extensible models, relies heavily on dictionaries. Populating these dictionaries constitutes by far the greatest design effort. Tinderbox seems an ideal tool for building, by iterative refinement, classifications from sets of instance data (time records, documents etc. etc.) and delivering the classification in a form amenable to uploading into a database.

Finally, TBX use case #4. A tool for prototyping applications. There looks to be enough flexibility in the user interface and sufficient data manipulation capability in Action Script to build more than mere toy applications, and way quicker than building a web or Electron app.

All in all a voyage of exploration into little charted seas. With luck we shall avoid the Corryvreckan, be spat out by the Kraken as a distasteful little morsel and arrive unscathed. If anyone — particularly any practicing or academic lawyers (maybe those teaching legal tech courses?) or anyone with backend development skills — would like to board, do please contact me.

pmaheshwari · May 26, 2021, 7:29am

Thanks for your post above! It gave me good insights!

mwra · May 26, 2021, 8:41am

Interesting post with lots to chew on. I recognise the systematisation, to which I’d add as ex-Navy “Have you actually tried your procedure using the people it is intended for”. IME, most orgs fail heavily at that point as such steps are a ‘someone else’ job.

You will be happy to know that the Tinderbox community includes lawyers, engineers and some of each who have some coding interest/expertise. They may not all be forum regulars but likely word of mouth may find them and point them here.

A regular area of Tinderbox work is exploratory systematisation. You’ll want to look into its inheritance model and prototypes. The app is also useful for exploring/mapping process. I recall the diagram a user showed at a long-ago Tinderbox meet-up showing how he had mapped the info/knowledge flow in part of a FTSE100 engineering firm, which had little or no overlap with the under-used multi-£M KMS system that management consultants had recommended. The actual map showed users’ needs, the KMS structure not so much!

For #1-#4, I’d not be at all surprised if there isn’t some prior Tinderbox work in this area, even if not directly in the same subject. Hopefully some folk may chime in. Note: if we go diving into discrete areas, it might make sense to split out into new threads with relevant titles and link back to this thread.

Absolutely, and this is where the issue of size comes in. You don’t really want to try and put 1M items in a TBX file, but you certainly can investigate, prototype and design the structure for a formal database that is optimised for large datasets. Diving straight into the latter often involves premature formalisation (guesswork!), whereas time spent investigating the problem space and task in Tinderbox would repay well.

HTH

jbmanos · May 28, 2021, 10:31pm

Chemical engineer and attorney here as well. Just buy Tinderbox. It’s good stuff.

GaiusScotius · May 29, 2021, 7:14am

@jbmanos thanks for introducing yourself, I have and am slowly getting to grips with some of TBX’s features and idiosyncrasies.

mwra · May 29, 2021, 9:46am

Indeed, Tinderbox’s designer started out in Chemistry (Ph.D. (Chemistry)).