Natural Language Processing


(Jason Romney) #1

Dear Tinderbox community,
I am excited to note that the latest version of Tinderbox now scans notes to extract information that agents can use automatically, with names, organisations, and locations from note text placed in special attributes based on continuous automated parsing of notes. I am looking forward to examples of how people are going to use this feature to achieve various goals in their day-to-day workflow - particularly what will presumably be a new chapter in one of Mark’s books. This feature was one of the most fascinating aspects of Lotus Agenda that some folks from this forum will remember from the golden age of PIMs.

I have two questions for the group please. The first is about whether the new feature means Tinderbox may now share Agenda’s original propensity to (potentially) corrupt data files linked to the program’s background processing. In Agenda, I believe because the program was constantly embarking on automated processes under the cover, if the program was quit unexpectedly the database would sometimes corrupt. Is Tinderbox, with this revolutionary new feature, immune to this problem or has the (potential) Agenda weakness now been brought into the Tinderbox environment?

The second question is broadly about how people will use the new NLP. In Agenda, the automated scanning assigned an item to a category where the categories were user defined and part of a taxonomical tree of relevant terms and concepts. The attribute browser of Tinderbox is analogous to the “views” that could be created in Agenda, providing a custom cut of the user’s information in a relevant and useful manner. Programs such as The Brain and MaxQDA (with MaxMaps) all allow this kind of custom visualisation (each in their own very original manner), giving the user a persistently retrievable (i.e. saved) view of their data that is spatially helpful in some way.

Will Tinderbox allow notes to be assigned with a flick-like keyboard shortcut or by typing just a few characters of a relevant category, in the way Agenda used to? Does Tinderbox have a set of rules about how it recognises names in the new NLP function that is as nuanced as the way Agenda used to approach this function (e.g. realising that a name needs to start with capitals and that “victor” does not necessarily mean cap V “Victor” as in the person)? And will Tinderbox be extending the NLP feature from names, organisations and locations to dates (dates being one of the central features of Agenda’s original background parsing of user information)?

Anyway, I am confident this feature points to some really creative possibilities for triggers (pattern recognition etc), inferences and resultant actions and I look forward to exploring more detail of how this has been implemented in Tinderbox in coming weeks. Well done Mark!


(jmm) #2

I take this opportunity to say that Named-entity recognition is working fine not only for English but for Spanish as well. What seems not to work properly for languages other than English is the repetition window.


(Mark Anderson) #3

Two questions? I think there are more. I don’t know the answers but let me help later readers by parsing out all the questions:

  1. Will Tinderbox share Agenda’s propensity for file corruption if the TBX file is closed while NLP is being used?
  2. How will people use the new NLP?
  3. Will Tinderbox allow notes to be assigned with a flick-like keyboard shortcut or by typing just a few characters of a relevant category, in the way Agenda used to? (From this, I’d assume not - as at v7.5.2. But, this is a new feature and yet to be fully developed in Tinderbox)
  4. Does Tinderbox have a set of rules about how it recognises names in the new NLP function that is as nuanced as the way Agenda used to approach this function (e.g. realising that a name needs to start with capitals and that “victor” does not necessarily mean cap V “Victor” as in the person)?
  5. Will Tinderbox be extending the NLP feature from names, organisations and locations to dates (dates being one of the central features of Agenda’s original background parsing of user information)? See note to #3.

(eastgate) #4

Don’t worry about the background processing: Tinderbox has become rather good at this.

Remember: Agenda was written 25 years ago, and ran on machines that were orders of magnitude smaller and slower. We’ve learned a lot since 1992.

For easy, manual classification of notes, try Stamps and smart adornments. Even better, lots of the time, you can automatically classify notes with agents.

Tinderbox already does a pretty good job with dates – in fact, its date processing was originally inspired by Agenda.


(Jason Romney) #5

Thanks Mark. What is the nature of the “lookup table” used by the NLP function that determines what things it does and does not recognise in the note text, and which NL attribute those note entries should be assigned to? I have noticed some minor anomalies where more obscure places or organisations are misclassified (although for the vast majority of cases, the feature works perfectly). Is it possible for a user to fine-tune and/or customise how the attributions are made? Also, I’m wondering why the NL capability would not just apply to any and all attribute sets that might be typed into a note? I suppose where I’m going with this line of thinking is the idea that, if desired by the user, it makes sense that a given attribute (any attribute) can be “empowered” (so to speak) with the natural language processing’s note auto-scan capability. Is that the roadmap plan here? Excitingly, that would obviate the need for manually typing in an attribute relevant to a note or manually deploying the stamp and smart adornment features you mention to achieve the desired attribution. I suspect this is the kind of “ghost in the machine” magic that will truly take Tinderbox to the next level as you build it out through the program’s various functions…


(James Fallows) #6

I was (and am) one of the world’s biggest Lotus Agenda admirers, so am glad to know of another member of the tribe.

I think Tinderbox has evolved to include all of the most important and useful features of Agenda. The Attribute Browser, itself, is essentially Agenda-in-a-can. (I say this admiringly.)

Echoing what Mark Bernstein says below, in response to your question on processing burden: it’s easy to forget that Agenda ran on MS-DOS, in an era when effective throughput was maybe 1% of what it is now, at most. I wrote an article about it in the Atlantic back in 1992 – more than a quarter-century ago, when cell phones were barely invented. (On a long airplane flight recently, I rewatched The Big Lebowski, which is set in 1991. One of the plot elements is the non-existence of cell phones, except a giant “luggable” model.)

Will go into some of your specific points another time. But main point is: if you loved Agenda, as I did, you will find Tinderbox able to match all of its strong points, and to bring a slew of other benefits.

On file corruption: through the long saga of TB betas, I’ve encountered some that are more crash-prone and some that are less. But I am aware of only one case in the past decade of actual file corruption, and that was recoverable through the automatic backups the system makes.


(eastgate) #7

Recognition of $NLNames, $NLPlaces, and $NLOrganizations is done by a neural network.

This has advantages and disadvantages, as you may know. Neural nets are fast and flexible, and this one is surprisingly powerful; it will, for example, understand that good old, reliable Nathan Detroit is a name and not a suburb of the Motor City. But neural nets are opaque; it’s hard to understand what they’re doing or to tweak them.
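
To make the contrast concrete, here is a minimal Python sketch of the plain lookup-table approach asked about earlier in the thread. It is not how Tinderbox works; the table `KNOWN_PLACES` and the function `classify_by_lookup` are hypothetical names invented for illustration. It shows why a simple table would file the surname in "Nathan Detroit" under places, the kind of mistake a context-aware model can avoid.

```python
import re

# Hypothetical gazetteer; a real one would hold many thousands of entries.
KNOWN_PLACES = {"Detroit", "Boston", "Madrid"}

def classify_by_lookup(text):
    """Return (names, places) found by a naive capitalised-word lookup."""
    names, places = [], []
    # Only capitalised words are candidates, so lowercase "victor" is ignored.
    for word in re.findall(r"\b[A-Z][a-z]+\b", text):
        if word in KNOWN_PLACES:
            places.append(word)   # table hit: treated as a place, whatever the context
        else:
            names.append(word)    # everything else is guessed to be a name
    return names, places

names, places = classify_by_lookup("good old reliable Nathan Detroit")
print(names, places)
```

The lookup splits the name in two: "Nathan" is guessed to be a name and "Detroit" is filed as a place, because the table has no notion of the words around it.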

I agree that there’s a lot that can be done here.


(Robert Powell) #8

Hi. I’ve been using TB for a few months now and have progressed in making good use of many of its features, but I’m still a novice with regard to all its depth and capabilities. Where could I find out more about how to use its Natural Language Processing and are there some examples? Is this primarily built in simply for searching and, if so, do you have to establish the items for which you might search within the framework somehow or customize a search with arguments other than the search term?


(eastgate) #9

It’s new and experimental, and exactly how you might use it depends on what you are doing. Currently, natural language processing is limited to finding (probable) references to names, organizations, and places; that, of course, is chiefly useful if you’re interested in those things.

As always, you can search for words or regular expressions, which can be anything of interest.
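
As a rough illustration of what a regular-expression search adds over a plain word search, here is a small Python sketch. It uses Python's `re` module, not Tinderbox's own search, whose regular-expression dialect may differ; the `notes` dictionary and the `search_notes` function are hypothetical stand-ins for a document's notes.

```python
import re

def search_notes(notes, pattern):
    """Return the titles of notes whose text matches the regular expression."""
    regex = re.compile(pattern)
    return [title for title, text in notes.items() if regex.search(text)]

# Hypothetical sample notes: title -> text.
notes = {
    "Meeting": "Lunch with Victor in Boston.",
    "Sports": "The victor takes the trophy.",
}

# Case-sensitive whole-word match: finds the person, not the common noun.
print(search_notes(notes, r"\bVictor\b"))
```

Here only the "Meeting" note matches, since the pattern requires a capital V and word boundaries on both sides.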