How can I optimize NL scanning?

Good afternoon. Today’s question is about the NL scanning feature! I really love it. I believe I can add names to the NLNames note, places to the NLPlaces one, and so on, and (eventually) get everything grabbed and placed where I want. If this assumption is wrong, please tell me!

Now:

What I’d like to know is… how it works. My initial assumption was that it was scanning the text, looking for words used like People Names or Place Names and homing in on the ones it recognized. In the screencap above, I figured it skipped Venree and Danrial because I made those names up, although it did grab Venree from another note successfully. That’s probably fine, as long as adding something to NLNames helps it (again, eventually) recognize those are meant to be names.

However, my assumptions are all shaken because, you see how it lists ‘Bythos’ and ‘Meza’ in the NLNames attribute above? ‘Meza’ is the contents of the Full Name attribute on the character note titled ‘The Summoner’. This suggests it’s… reading and learning from my notes? Or…? But it’s not at all reliable, since, again, it only sometimes recognizes Venree (and Danrial not at all) despite those names being in the Full Name attribute of their Character notes. Are there any tips for making it more reliable when pulling data from other notes?

Bonus question: is the NL scanning case sensitive? Can I add ‘Society’ to NLOrganizations and trust it will not gather up references to ordinary society?

ETA: I see that ‘Meza’ is the very title I assigned, which I suppose could explain it, except that I saw in earlier forum posts that it didn’t scan note titles for keywords? Very confusing. I’ll experiment more.

I’m not sure if you’ve seen the aTbRef articles on Taggers and Natural Language Processing? The four ‘NL’ taggers use macOS NLP under the hood, so (to my understanding) they cannot be user-modified in terms of intent.

I note you mention you have made-up names. Have you tried making a user-defined tagger (see the linked article)?

I can and have added the made-up names (OK, all the names, just to be sure) to the NLNames document and it’s working pretty well. I see one can actually make additional custom taggers, although I can’t even imagine what I’d use those for at the moment. Good to know, though; I’m sure uses will come up.

Tests demonstrate case insensitivity (as your article reports), which is OK; I’ll just have to be specific at least once per document for the Society. Still no idea where it pulled Meza from: the note’s own title, I guess. I’m trying to set up a test to see if the NL Processing relates to the Mac’s spellcheck dictionary, but that’s proving a little tough. But! Oh yes! It DOES seem to read the note’s title.

It’s very simple. The system-installed NL-prefixed taggers (except NLTags) are the only ones to use macOS NLP features to scan $Text for matches the NLP thinks appropriate for that category. The reasoning used is not accessible to the user.
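
For context (and this is my assumption, not anything Eastgate have documented), the obvious candidate for that under-the-hood machinery is Apple’s NaturalLanguage framework. A minimal Swift sketch of what that kind of classification looks like outside Tinderbox:

```swift
import NaturalLanguage

// Stand-in for a note's $Text; the names are taken from this thread, purely illustrative.
let text = "Bythos and Meza travelled to Providence to petition the Society."

// Ask the system tagger to classify tokens as personal names, place names or organisation names.
let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text

let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let wanted: [NLTag] = [.personalName, .placeName, .organizationName]

tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                     unit: .word,
                     scheme: .nameType,
                     options: options) { tag, range in
    if let tag = tag, wanted.contains(tag) {
        print("\(text[range]) -> \(tag.rawValue)")
    }
    return true // keep enumerating
}
```

Note there is nowhere in that API to tell the framework that ‘Venree’ is a name: whether a token gets tagged is entirely the model’s call, which is why the intent of the NLP pass itself can’t be steered; user-added terms are a separate matter (see below).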

My understanding of the 3 NLP-powered taggers is that they are experimental, suggesting things. As NLP infrastructure improves, so should the taggers. To see how far from accurate the process is at present (a limit of the NLP, not Tinderbox), here are just some of the ‘organisations’ detected in aTbRef by NLOrganizations:

Cmd
Coding And Queries
Collapse All Containers
Columns
Commence Dictation
Continental
CovidNearMe.org
Cow
Creative Commons
Cross
Data
Days
Defines
Delicious Monster
Delta
Development Peekhole
Dictionary
Dictionary.keys
Displayed Attributes value

So, not something I’d rely on.

By comparison, in a user tagger the tagger file syntax absolutely defines what is and is not detected as a match.
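
To make that contrast concrete, here is a small Swift sketch of the principle. It is emphatically not Tinderbox’s code, and I’m not claiming the real tagger shares these exact case rules; it just shows that a literal/regex pattern test is deterministic in a way the NLP guesswork above is not:

```swift
import Foundation

// Purely illustrative: a user-tagger-style match is just a pattern test, so the outcome
// is fully determined by the pattern written, not by a language model's judgement.
let pattern = "\\bSociety\\b"   // whole-word match for 'Society'
let regex = try! NSRegularExpression(pattern: pattern)   // case-sensitive by default in this sketch

func matchesSociety(_ text: String) -> Bool {
    let range = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

print(matchesSociety("She petitioned the Society for entry."))  // true
print(matchesSociety("Polite society frowned on the affair."))  // false: lower-case 'society' fails this pattern
```

Swap the pattern, or add a case-insensitive option, and the behaviour changes, but always predictably; that predictability is the point of a user tagger versus the NLP ones.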

The process only looks at $Text. $NLNames should only contain the value ‘Meza’ for notes where that string occurs in the text. As documented, if that string is removed from such a note, it should also be removed from the note’s $NLNames. If that isn’t occurring, I suspect tech support would like to know, as it would indicate some gremlin had crept in; I say tech support as they will likely need to see the whole TBX.

Yes. Tinderbox’s Help is unclear as to what is searched, and the run of its text implies only $Text is looked at. In fact, changes to $Text are used as a trigger to re-scan the note. Trawling through release notes‡, buried in those for b502 I find this:

Taggers now operate on both the note name and its text.

Taggers originally ran on $Text alone and the above change never filtered through to the Help docs§. I assume that changes to either $Text or $Name now trigger tagger re-processing of the note.

Further notes:

  • NLP tagging is essentially experimental, especially the 3 taggers using Apple NLP.
  • It’s not at all clear, but my understanding was that users are not expected to add content to the 3 NLP-using taggers’ notes in Hints. I may well be misinformed, but it might also explain why some users are getting unexpected results.
  • Re $NLTags: I believe the original idea was that it would look for a (never documented) set of things and tag them. One such possibility would be detecting a possible planning event and adding the $NLTags value “plan”. If that was tried, I think it no longer works, and as at v9.5.2 NLTags can be thought of as a predefined empty user tagger (i.e. the ‘NL’ prefix is slightly misleading). IOW, it does nothing unless you add some syntax, as you might for a user-defined tagger. Of course, as NLP improves, this might change.
  • NLP-related taggers are experimental, so unexpected outcomes are to be expected. If they do occur, they are better reported direct to tinderbox@eastgate.com rather than here, as (a) users can’t see this in-app process and (b) users don’t know what the correct outcome is. It’s the problem of AI: it can give you an answer but it struggles to tell you why/how it came to that conclusion!
  • Outside the 3 NLP-using taggers, tag definition is regex-based. Enlightened self-interest would suggest users don’t use really complex regex in tagger definitions. There ought never to be a need for such complexity, but we do love experimenting, so I think the caution is worth noting.

†. Why? See here.

‡. These do, in re-summarised form, get added to Tinderbox Help per release (though the change to taggers is not recorded, AFAICT). The Backstage program and beta testers have access to the source TBX recording actual RNs, which is usually a bit behind the current beta but still has much additional/more recent info than Help. I used the latter to find the above.

§. As a result, aTbRef currently has this wrong. I’m seeking clarification (via a different channel) on a number of ambiguities re taggers and will update the pages when I know. But it is the case that both $Text and $Name are scanned by taggers. I’ve already made a temporary change to the main tagger article (see the new heading “Scope of analysis”) on this pending a proper review of the tagger articles.

To its credit, a number of the bad results for Organizations are not actually terrible!

CMD: ticker symbol for Cantel Medical, a NYSE-listed firm
Collapse All Containers: and Down With All Dictators!
Commence Dictation: not a company, but it would be a great name for one
Continental: could be the defunct airline, or Continental AG
CovidNearMe.org: Actually is a (tiny) organization
Creative Commons: actually is an organization, I believe
Cross: Makers of fancy pens since 1846, originally based in Providence. Who knew?
Delta: Large US dental insurance firm
Delicious Monster: this IS an organization (and a successful software developer)

Agreed, NLP is a wonder both ways round: truly wondrous at what it is able to do … and a wonder as to how it got to some of its matches. You are right that some of these are fair/correct calls even if unexpected in context. For instance, I know of Cross fountain pens (I used to have one!) but reflect that, in the context of its host TBX above, it is (to the human PoV) a false positive. Then again, the OS NLP is presumably leveraging a wide Apple-curated LLM rather than just the text in the proximate TBX document.

The moral here is not to use the NLP extractions sight-unseen, as hilarity may ensue. Human tendency to laziness means we might forget that extra step. I don’t, however, argue against this feature, as it’s a useful ongoing experiment to get a feel for how such OS NLP underpinnings are (or aren’t!) improving. :slight_smile:

The NLPlaces note in Hints actually has instructions on how to add specific locations to the file; I took the same approach in NLNames and NLOrganizations and it’s working very reliably. My $Name scanning test involved sticking a place name’s alternative reference (as listed in NLPlaces) in a note title, not mentioning either place name in the $Text, and observing that $NLPlaces acquired the assigned place name.

NL-assigned tags do update correctly when the source text changes, so no worries there.

Thanks. Based on the default Hints-installed NLPlaces, the NL Places/Orgs/Names taggers may take additional input using the same regex-based syntax as used for custom-installed taggers.

Assume the stoplist, if used (I’m still confirming this), is the one installed by Hints; this is because it trumps stoplists placed in other (legacy) locations.