Natural Language Processing

Jasonromney · June 10, 2018, 9:39am

Dear Tinderbox community,
I am excited to note that the latest version of Tinderbox now scans notes to extract information that agents can use automatically, with names, organisations, and locations from note text placed in special attributes based on continuous automated parsing of notes. I am looking forward to examples of how people are going to use this feature to achieve various goals in their day to day workflow - particularly what will presumably be a new chapter in one of Mark’s books. This feature was one of the most fascinating aspects of Lotus Agenda that some folks from this forum will remember from the golden age of PIMs.

I have two questions for the group please. The first is about whether the new feature means Tinderbox may now share Agenda’s original propensity to (potentially) corrupt data files linked to the program’s background processing. In Agenda, I believe because the program was constantly embarking on automated processes under the cover, if the program was quit unexpectedly the database would sometimes corrupt. Is Tinderbox, with this revolutionary new feature, immune to this problem or has the (potential) Agenda weakness now been brought into the Tinderbox environment?

The second question is broadly about how people will use the new NLP. In Agenda, the automated scanning assigned an item to a category where the categories were user defined and part of a taxonomical tree of relevant terms and concepts. The attribute browser of Tinderbox is analogous to the “views” that could be created in Agenda, providing a custom cut of the user’s information in a relevant and useful manner. Programs such as The Brain and MaxQDA (with MaxMaps) all allow this kind of custom visualisation (each in their own very original manner), giving the user a persistently retrievable (ie saved) view of their data that is spatially helpful in some way.
Will Tinderbox allow notes to be assigned with a flick-like keyboard shortcut or by typing just a few characters of a relevant category, in the way Agenda used to? Does Tinderbox have a set of rules about how it recognises names in the new NLP function that is as nuanced as the way Agenda used to approach this function (eg realising that a name needs to start with capitals and that “victor” does not necessarily mean cap V “Victor” as in the person)? And will Tinderbox be extending the NLP feature from names, organisations and locations, to dates (dates being one of the central features of Agenda’s original background parsing of user information)? Anyway, I am confident this feature points to some really creative possibilities for triggers (pattern recognition etc), inferences and resultant actions and I look forward to exploring more detail of how this has been implemented in Tinderbox in coming weeks. Well done Mark!

jmm · June 10, 2018, 11:22am

I take this opportunity to say that Named-entity recognition is working fine not only for English but for Spanish as well. What seems not to work properly for languages other than English is the repetition window.

mwra · June 10, 2018, 2:33pm

Two questions? I think there are more. I don’t know the answers but let me help later readers by parsing out all the questions:

1. Will Tinderbox share Agenda’s propensity to file corruption if the TBX file is closed while NLP is being used?
2. How will people use the new NLP?
3. Will Tinderbox allow notes to be assigned with a flick-like keyboard shortcut or by typing just a few characters of a relevant category, in the way Agenda used to? (From this, I’d assume not - as at v7.5.2. But, this is a new feature and yet to be fully developed in Tinderbox.)
4. Does Tinderbox have a set of rules about how it recognises names in the new NLP function that is as nuanced as the way Agenda used to approach this function (eg realising that a name needs to start with capitals and that “victor” does not necessarily mean cap V “Victor” as in the person)?
5. Will Tinderbox be extending the NLP feature from names, organisations and locations, to dates (dates being one of the central features of Agenda’s original background parsing of user information)? (See note to #3.)

eastgate · June 10, 2018, 8:06pm

Don’t worry about the background processing: Tinderbox has become rather good at this.

Remember: Agenda was written 25 years ago, and ran on machines that were orders of magnitude smaller and slower. We’ve learned a lot since 1992.

For easy, manual classification of notes, try Stamps and smart adornments. Even better, lots of the time, you can automatically classify notes with agents.

Tinderbox already does a pretty good job with dates – in fact, its date processing was originally inspired by Agenda.

Jasonromney · June 11, 2018, 2:47am

Thanks Mark. What is the nature of the “look up table” used by the NLP function that determines what things it does and does not recognise in the note text, and which NL attribute those note entries should be assigned to? I have noticed some minor anomalies where more obscure places or organisations are misclassified (although for the vast majority of cases, the feature works perfectly). Is it possible for a user to finetune and/or customise how the attributions are made? Also, I’m wondering why the NL capability would not just apply to any and all attribute sets that might be typed into a note? I suppose where I’m going with this line of thinking is the idea that, if desired by the user, it makes sense that a given attribute (any attribute) can be “empowered” (so to speak) with the natural language processing’s note auto-scan capability. Is that the roadmap plan here? Excitingly, that would obviate the need for manually typing in an attribute relevant to a note or manually deploying the stamp and smart adornment features you mention to achieve the desired attribution. I suspect this is the kind of “ghost in the machine” magic that will truly take Tinderbox to the next level as you build it out through the program’s various functions…

JFallows · June 11, 2018, 1:42pm

I was (and am) one of the world’s biggest Lotus Agenda admirers, so am glad to know of another member of the tribe.

I think Tinderbox has evolved to include all of the most important and useful features of Agenda. The Attribute Browser, itself, is essentially Agenda-in-a-can. (I say this admiringly.)

Echoing what Mark Bernstein says below, in response to your question on processing burden: it’s easy to forget that Agenda ran on MS-DOS, in an era when effective throughput was maybe 1% of what it is now, at most. I wrote an article about it in the Atlantic back in 1992 – more than a quarter-century ago, when cell phones were barely invented. (On a long airplane flight recently, I rewatched The Big Lebowski, which is set in 1991. One of the plot elements is the non-existence of cell phones, except a giant “luggable” model.)

Will go into some of your specific points another time. But main point is: if you loved Agenda, as I did, you will find Tinderbox able to match all of its strong points, and to bring a slew of other benefits.

On file corruption: through the long saga of TB betas, I’ve encountered some that are more-and-less crash-prone. But I am aware of only one case in the past decade of actual file corruption, which was recoverable through the automatic backups the system makes.

eastgate · June 11, 2018, 6:24pm

Recognition of $NLNames, $NLPlaces, and $NLOrganizations is done by a neural network.

This has advantages and disadvantages, as you may know. Neural nets are fast and flexible, and this one is surprisingly powerful; it will, for example, understand that good old, reliable Nathan Detroit is a name and not a suburb of the Motor City. But neural nets are opaque; it’s hard to understand what they’re doing or to tweak them.

I agree that there’s a lot that can be done here.

ChemBob · June 16, 2018, 5:01pm

Hi. I’ve been using TB for a few months now and have progressed in making good use of many of its features, but I’m still a novice with regard to all its depth and capabilities. Where could I find out more about how to use its Natural Language Processing and are there some examples? Is this primarily built in simply for searching and, if so, do you have to establish the items for which you might search within the framework somehow or customize a search with arguments other than the search term?

eastgate · June 18, 2018, 3:31pm

It’s new and experimental, and exactly how you might use it depends on what you are doing. Currently, natural language processing is limited to finding (probable) references to names, organizations, and places; that, of course, is chiefly useful if you’re interested in those things.

As always, you can search for words or regular expressions, which can be anything of interest.
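
To make the regular-expression option concrete, here is a minimal sketch in Python rather than in Tinderbox's own query language, so the syntax should not be read as Tinderbox's. The note titles, texts, and the `matching_notes` helper are hypothetical, invented for illustration only. It shows how a case-sensitive pattern separates “Victor” the person from “victor” the common noun, the distinction raised earlier in the thread.

```python
import re

# Hypothetical note titles and texts, invented for this illustration.
notes = {
    "Meeting": "Lunch with Victor in Boston next week.",
    "Chess": "The victor of the match takes the trophy.",
}

# Case-sensitive pattern: the capitalised word on word boundaries.
pattern = re.compile(r"\bVictor\b")


def matching_notes(notes, pattern):
    """Return the titles of notes whose text matches the pattern."""
    return [title for title, text in notes.items() if pattern.search(text)]


matches = matching_notes(notes, pattern)
print(matches)  # only the note that mentions the capitalised name
```

Passing `re.compile(r"victor", re.IGNORECASE)` instead would match both notes, which is roughly the ambiguity that a plain case-insensitive word search cannot resolve.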

Jasonromney · April 22, 2019, 1:40pm

How does one configure Tinderbox please to analyse the title of the note (not merely the note’s body text) to find NLNames, NLOrganizations and NLPlaces? Regards, J

mwra · April 22, 2019, 1:54pm

This is not a user setting. NLP processing is described here, so you’re asking for a new feature. If you need title NLP processing, please email a feature request to Eastgate explaining your use case. Please don’t read that as dismissive; it’s simply that it is more helpful if those with the actual need explain directly to the developer so any nuanced points arising can be discussed. Here in the forum, apart from an answer of ‘no’, there’s little more we can practically do in this sort of instance.

Jasonromney · September 13, 2019, 10:34pm

Folks, unless I’m mistaken there is a newish NLP tag, NLTags. This is explained in the FAQ thus: “New to v8.0.5, a set-type attribute to hold annotations automatically generated using Natural Language Processing. The first such annotation adds the tag ‘plan’ to notes that Tinderbox believes might represent a planning note, such as “remember to deposit the cheque” or “remind the freshers to begin planning their module essays”.” I would be grateful to read anyone’s suggestions as to how this can be exploited by Tinderbox users eg the kind of benefits people are enjoying in different use case contexts from NLTags. I am also looking for tips please on: a) how to use NLP generally in Tinderbox; b) how to increase the likelihood that the TB NLP will recognise notes with, say, a “planning” flavour, reliably; and c) what other annotation categories are expected to be recognised according to the TB NLP future feature roadmap. How Eastgate’s engineers interact with the engineering team of the NLP engine it uses (eg the nature of any collaboration or co-design activity) would also be an interesting story to read one day, perhaps in the next edition of Mark’s book. Cheers, Jason

eastgate · September 15, 2019, 7:55pm

Note that there’s a lot of ferment coming here. (It’s tricky, and Apple has been changing the underlying system.)