NLOrganizations giving surprising results

ccrayton · December 21, 2022, 7:03pm

So today I decided that the NLOrganizations tagger would be great for a study that I am doing about FCC regulations. So I edited the note for NLOrganizations in Hints with this:

fcc: fcc;federal communications commission
faa: faa;federal aviation administration
ntia: ntia;national telecommunications and information administration
ieee: ieee;institute of electrical and electronics engineers
iata: iata;international air transport association
itu: itu;: international telecommunication union
rtca: rtca;radio technical commission for aeronautics

I have one note where the value of $NLOrganizations was calculated as MHz;Page which makes no sense. Those tags are not in the NLOrganizations note, and not even in the note that was tagged this way. Is that a side effect of the neural net, that completely unrelated tags can appear?

ccrayton · December 21, 2022, 7:06pm

So I see an error now that I am looking at this note in a monospaced font: there is a stray colon after itu in the second to last line. Think I will need to change the font for those notes. It doesn’t seem to matter; I’ve closed and reopened the document several times and the note still has a weird NLOrganizations tag. But at least one error fixed.
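
A stray separator like this is easy to miss in a proportional font, so a quick check outside Tinderbox may help. The Python sketch below assumes the hint format is exactly what the post above shows (a tag, a colon, then aliases separated by semicolons); that format is inferred from this thread, and the function name `parse_hints` is hypothetical. Whether Tinderbox itself treats a colon inside an alias as an error is not known here; the sketch only flags it as suspicious.

```python
def parse_hints(text):
    """Parse lines of the assumed form 'tag: alias;alias'.

    Returns (entries, problems), where problems is a list of
    (line_number, message) tuples for lines that look malformed.
    """
    entries = {}
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only: everything before it is the tag.
        tag, sep, rest = line.partition(":")
        if not sep:
            problems.append((number, "missing ':' after tag"))
            continue
        aliases = [alias.strip() for alias in rest.split(";")]
        for alias in aliases:
            # An empty alias or a second colon usually signals a typo.
            if not alias or ":" in alias:
                problems.append((number, f"suspicious alias {alias!r}"))
        entries[tag.strip()] = aliases
    return entries, problems


hints = """fcc: fcc;federal communications commission
itu: itu;: international telecommunication union"""

entries, problems = parse_hints(hints)
for number, message in problems:
    print(f"line {number}: {message}")
```

Run against the two sample lines, the check reports a problem only on the second one, where the alias begins with the stray colon.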

eastgate · December 21, 2022, 9:06pm

In addition to the entries you designate in Hints, NLOrganizations is derived from a trained neural net. I wonder if MHZ is a stock ticker symbol? Looks like it’s an oil company.

ccrayton · December 22, 2022, 3:40am

I’ve been looking at the research around whether 5G will actually disrupt airline travel. MHz is megahertz (radio frequency), and it appears in the note’s title but not the text. If MHZ is also a stock ticker, causing the net to make a surprising association, then I am delighted. It may not be helpful for my actual use case, but it adds a little bit of randomness that I find comes in handy.

Which leads me to a slightly different question. I’ve enabled sentiments as a primary key on some note prototypes just for fun, and I noticed that a blank note often has a slightly negative sentiment. For example, on my system a blank note with the title “happy” has a sentiment of about -0.6. Just wondering, is the sentiment based solely on the $Text of the note, or some of the attributes too? Does it depend to some degree on the macOS version?

Thanks for the reply on NLOrganizations and the neural net. I was around in the late 80’s / early 90’s looking at languages like Prolog and Expert Systems, it’s fascinating to see where things are now.

eastgate · December 22, 2022, 6:28pm

Text and title.

For what it’s worth, my own dummy text about shortwave radio does not list MHz in NLOrganizations. Interestingly, the system knows that Hallicrafters is an organization, though it was folded into Northrop fifty years ago. It might be interesting to take a look at the specific note in question.

Yes, different versions of macOS might have differently-trained neural nets. This was certainly the case in Tinderbox 9.3, where we supported some quite early versions of the operating system. I think right now we may all be using the same data.

ccrayton · December 23, 2022, 4:56am

Here are the contents of the note. It’s a random first take of a report I read, so it doesn’t make a lot of sense yet. There are no custom attributes on the note.

Title: Helicopter test of 3400–3800 MHz in France

Text: This is a test in France. Check power levels, I am interested in knowing if they are similar.

Page 17 of this report seems to show the interference is likely. Page 18 seems to say that when looking at 4200 MHz overload interference was possible in a spot location 16,000 meters from the touchdown point, and desensitization possible up to 1,850 meters for a spot location.

It looks like this report is attempting to determine interference all the way up to devices operating at 4200 MHz

====

Looking at this now, maybe the neural net saw “3800 MHz in France” and thought that MHz was an organization in France? That could possibly explain why MHz showed up in NLOrganizations. The only mystery would be why did “Page” show up too?

eastgate · December 23, 2022, 5:14pm

Remember that the neural network is, essentially, observing each word and its neighbors and performing a statistical assessment, weighing two questions:

Does this look like an organization?
Is the context one in which organizations are frequently named?

Experimentation with the text indicates that the sentence confusing Tinderbox is:

Page 18 seems to say that when looking at 4200 MHz overload interference was possible in a spot location.

Keeping in mind that the training set used here has a healthy dose of the Wall Street Journal, this might fit the template:

Morgan Stanley suggests that the tech sector will recover briskly

Buckingham Palace said it would not comment on the matter.
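
As a toy illustration of the two questions above and of the newspaper template, the Python sketch below scores each capitalized word by whether a reporting verb follows within two tokens. This is not how Tinderbox or the macOS tagger works internally: the verb list, the window size, and the weights are all invented for the example, and a real tagger learns such cues statistically from its training data.

```python
import re

# Hypothetical list of reporting verbs, chosen only for this illustration.
REPORTING_VERBS = {"said", "says", "suggests", "seems", "reported", "announced"}


def tokenize(sentence):
    """Split a sentence into alphabetic words and runs of digits."""
    return re.findall(r"[A-Za-z]+|\d+", sentence)


def organization_scores(sentence, window=2):
    """Score capitalized tokens with two made-up cues.

    1 point: the token is capitalized ("does this look like an organization?")
    2 points: a reporting verb appears among the next `window` tokens
              ("is this a context in which organizations are named?")
    """
    tokens = tokenize(sentence)
    scores = {}
    for i, token in enumerate(tokens):
        if not token[0].isupper():
            continue
        following = [t.lower() for t in tokens[i + 1 : i + 1 + window]]
        context_points = 2 if any(t in REPORTING_VERBS for t in following) else 0
        scores[token] = 1 + context_points
    return scores


for sentence in (
    "Morgan Stanley suggests that the tech sector will recover briskly",
    "Page 18 seems to say that when looking at 4200 MHz overload "
    "interference was possible in a spot location.",
):
    print(organization_scores(sentence))
```

With these made-up weights, "Page" in the confusing sentence receives the same score as "Morgan" or "Stanley" in the newspaper-style example, because "seems" follows it closely. The sketch makes no attempt to explain why MHz was tagged as well.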

Interestingly, if we insert commas to set off the prepositional phrase:

Page 18 seems to say that, when looking at 4200 MHz overload, interference was possible in a spot location

We no longer have either mistaken parsing. Eats, shoots, and leaves!