Exclude terms from Taggers?

I’m using four notes to collect and present the terms highlighted by the inbuilt taggers, and finding the taggers to be a little more enthusiastic and inclusive than I’d prefer.

Is there any way to exclude terms from the taggers?

I don’t think there is.

Can you give us an example or two of excess tagger enthusiasm?

My current working document is tagging (among others) “Alzheimer (technically correct, but not contextually relevant, and never used without the 's), Artifact, Balm, Boy, Cobble, Gates…”

There’s also a smattering of “names” like “John copy” and “Someone truly”.

My capture notes are often incomplete sentences with the first letter capitalized, and that may be a factor. NLNames seems to be capturing ~2000 words as possible names.

NLOrganizations has given me my new favorite band name: the “Constant doppelgangers”, plus “GPS, Gravity, Hearts, Idea of unification…”

This isn’t impacting performance, but I’ve been experimenting with using tagger functionality to forward new tags that are worth tracking. I can reasonably scroll through a few dozen tags, but a few thousand gets prohibitive.

Ah. There’s no way to exclude things from NLNames and NLOrganizations. Both use an Apple-trained neural net which was superb for (say) 2020 but sadly out-of-date today.

I expect this will improve shortly; we’re sure to hear something about that at WWDC next week.

In the interim, you could define $ScreenedNames and $ScreenedOrganizations, and set these using a rule or edict to remove unwanted hits. For example:

var:list exclude=[Alzheimer;John copy;Yours truly];
$ScreenedNames=$NLNames-exclude;

In a large document, you might want to keep the exclusion list in a configuration note to make it easier to add elements.
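
Something along these lines might work as a sketch of the configuration-note idea. It assumes a note at the hypothetical path /Config/Tagger Exclusions with two user attributes of type Set, $ExcludedNames and $ExcludedOrganizations. None of these are built in; like $ScreenedNames and $ScreenedOrganizations, you would need to create them yourself.

```
// Rule or edict on the notes whose tagger results you want screened.
// Assumption: "/Config/Tagger Exclusions" is a note you create, and
// $ExcludedNames / $ExcludedOrganizations are user attributes of type Set.
$ScreenedNames = $NLNames - $ExcludedNames("/Config/Tagger Exclusions");
$ScreenedOrganizations = $NLOrganizations - $ExcludedOrganizations("/Config/Tagger Exclusions");
```

Adding a new exclusion then means editing one attribute on the configuration note rather than every rule that uses the list.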

I was already drafting that approach.

Thanks!

Thinking this through, is anything likely to break if I just declare a tagger line like
`excluded_names:Name1;Name2;NameN;`

?

You could do that, but it’s probably better to remove the excluded names from $NLNames — that’s just list manipulation — than to scan all the text. But for most purposes, whatever is easiest is probably best.