How to transform duplicated notes into aliases?

jprint714 · April 6, 2021, 7:54pm

I’m importing a set of notes, in an OPML file format, into Tinderbox. The OPML file contains notes that have been organized in folders, arranged by categories and their respective values (or hashtags). Unfortunately, the OPML file contains many duplicate notes.

Is there a way to transform these duplicate notes into aliases (in a batch function), or is there some other way to achieve this, e.g., by creating aliases of the duplicated notes and then deleting the original duplicates?

Quick background… I first create these notes in another app, import them into OmniOutliner, and run a script (AppleScript) in OmniOutliner that re-organizes and re-formats my notes: it creates folders for my research categories, and then populates those folders with notes whose hashtags are paired with their assigned categories (e.g., the folder for B.RESEARCH: would contain all notes with the #Review_incident tags, the folder for 2.BACKGROUND: would contain all notes with #key_historical_event tags, etc.).

This process helps me prime complex note organization for Tinderbox, thereby freeing me up to use its other features more effectively. (I’m currently refining this process so that there’s better inter-app workflow between OmniOutliner and Tinderbox.)

BUT…unfortunately, OmniOutliner doesn’t yet support aliases. So, I’m trying to figure out a solution to transform my duplicated notes into aliases for my OPML -> Tinderbox file – through a script, another app, or any other approach.

Any ideas / suggestions on how I can achieve this?

Thanks very much for your help…

jprint714 · April 6, 2021, 9:07pm

Quick addendum to this query…

I just heard from someone who’s helping me refine the OmniOutliner - AppleScript, and he had the following idea that might help us figure out how to transform duplicated notes from the OPML file into aliases, in Tinderbox. Here’s what he said…

We could add a unique ID to the original instance of each entry. When things get re-organized by tags and the duplicates are created, there would then be a column storing this id, which would make it easier to identify the duplicates.

There might then be some way in Tinderbox to script things, so it does the following:

• Starts looping through all entries
• Gets the id on an entry
• Checks if the id has been “seen” already
• If the id hasn’t been “seen” yet:
- Identify this entry as the “actual” entry for that id
- Record the id as having been “seen”
• If the id has been “seen”:
- Remove this entry
- Replace it with an alias of the “actual” entry
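
The loop described above can be sketched in plain Python, outside Tinderbox, just to check the logic. This is only a model under assumptions: each entry is a dictionary with a hypothetical `id` field (the unique ID the script would add) and a hypothetical `path` field, and an "alias" is represented as a record pointing back at the path of the "actual" entry. It does not create real Tinderbox aliases.

```python
def dedupe_by_id(entries):
    """Keep the first entry per id; turn later entries into alias records."""
    seen = {}    # id -> path of the "actual" entry for that id
    result = []
    for entry in entries:
        entry_id = entry["id"]
        if entry_id not in seen:
            # First time this id appears: this is the "actual" entry.
            seen[entry_id] = entry["path"]
            result.append({"kind": "note", "id": entry_id, "path": entry["path"]})
        else:
            # Id already seen: replace the entry with an alias record.
            result.append({
                "kind": "alias",
                "id": entry_id,
                "path": entry["path"],
                "alias_of": seen[entry_id],
            })
    return result


# Made-up sample data; the paths are illustrative only.
entries = [
    {"id": "n1", "path": "/B.RESEARCH/Note one"},
    {"id": "n2", "path": "/B.RESEARCH/Note two"},
    {"id": "n1", "path": "/2.BACKGROUND/Note one"},
]
deduped = dedupe_by_id(entries)
```

Here the third entry shares an id with the first, so it comes out as an alias record whose `alias_of` is the first entry's path.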

What do you think?

Thanks!

mwra · April 6, 2021, 10:06pm

Neither action code nor Tinderbox AppleScript directly allows alias creation. However, AppleScript can send keypresses to simulate using the ⌘+L shortcut for making an alias.

jprint714 · April 6, 2021, 10:17pm

Thanks, Mark…

So, is it possible to integrate that function with the steps I outlined in my last post, in which the OmniOutliner script-maker offered to add unique IDs to the original notes, and then incorporate any of the other steps that he suggested?

Thank you, again…

rtalexander · April 7, 2021, 2:21am

Perhaps this should be a feature request? I can imagine situations in which being able to create an alias of a note would be quite handy.

mwra · April 7, 2021, 11:21am

For other readers, it is useful to give a bit more background. The data here originates from an annotation app called MarginNote. It’s great at its primary annotation task but poor as a data source. It has no structured (user) metadata, just a ‘tag’ bucket, no scripting, and really weak export configuration. Reflecting the lack of MarginNote metadata functionality, the source data in this context included user-generated syntax in the free-text annotation fields, to be recovered downstream. The most useful of the exports offered is OPML but, whilst Tinderbox can ingest this, the notes created were not ideal for the user and required post-processing which pushed the OP’s Tinderbox expertise. A compromise was to put the data into OmniOutliner which, unlike Tinderbox (at the time; this is a long-running problem), has a rich AppleScript dictionary.

I think the problem now is that the OmniOutliner manipulation is repeating every ‘note’ (i.e. OO row) for every discrete tag. Thus, importing the OmniOutliner-exported OPML into Tinderbox we get lots of duplicate notes, rather than (as OP desires) one note and aliases for all the others. This better explains the desire for ‘just’ changing the dupes into aliases of a remaining original (easy in the mind’s eye, less clear functionally).

All that said, and after 2 years (?) working at this, I wonder if we are approaching this wrong. Placing aliases under (per-tag-value) containers seems one step past having an agent for each value of interest, because that’s the easiest starting place. Both of these scale badly, IMO. Not because of the app, but because of self-inflicted duplicated structures. Attribute Browser solves most (if not all) of the problems of reviewing tag/note relationships and finding notes matching a given tag, without lots of alias-based structural duplication. Perhaps add a reconfigurable agent that allows listing a discrete tag’s matches (i.e. changing an agent attribute alters the target tag in the agent query).

All that said…

I’d suggest the following. First some assumptions:

• the ‘original’ for any duped ID is the first item by occurrence; all others are to become aliases.
• the OmniOutliner script sets a boolean for the first occurrence of any duped row.
• if that is not possible in OmniOutliner, then the ‘original’ for any duped ID is the first item by $OutlineOrder, and all others are to become aliases (the script pattern below assumes originals are marked in OmniOutliner).

Now in the TBX with the imported OPML, we use AppleScript (in order to pass keystrokes to the UI) and for easier looping:

make a list of all notes with the ‘original’ boolean marker
iterate that list:
- store a reference to the ‘original’ note
- for each item in the list, collect all the notes with the same name (excluding the original).
- iterate that list of dupes. For each dupe:
  ** note the path of the dupe.
  ** select the original note
  ** use the shortcut keys to create an alias of the original
  ** select the alias and set its path to that of the dupe (i.e. move it)
  ** select the dupe and delete it
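
To make the order of operations above concrete, here is a rough Python sketch that only plans the steps; it does not drive Tinderbox or send any keystrokes. It assumes each note is a dictionary with hypothetical `name`, `path` and `is_original` fields (the last standing in for the boolean marker set in OmniOutliner), and it models "set its path to that of the dupe" as moving the alias into the dupe's container. The step names are invented for illustration.

```python
import posixpath


def plan_alias_steps(notes):
    """Return the ordered UI steps needed to swap each dupe for an alias."""
    originals = [n for n in notes if n["is_original"]]
    steps = []
    for original in originals:
        # All other notes with the same name are treated as dupes.
        dupes = [
            n for n in notes
            if n["name"] == original["name"] and not n["is_original"]
        ]
        for dupe in dupes:
            container = posixpath.dirname(dupe["path"])
            steps.append(("make_alias", original["path"]))
            steps.append(("move_alias_to", container))
            steps.append(("delete", dupe["path"]))
    return steps


# Made-up sample notes; names and paths are illustrative only.
notes = [
    {"name": "Note one", "path": "/B.RESEARCH/Note one", "is_original": True},
    {"name": "Note one", "path": "/2.BACKGROUND/Note one", "is_original": False},
    {"name": "Note two", "path": "/B.RESEARCH/Note two", "is_original": True},
]
steps = plan_alias_steps(notes)
```

Matching on name alone would merge unrelated notes that happen to share a title, so matching on a unique ID, if one is available, is probably safer.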

There may be a better way (there’s rarely a single ‘right’ way) to do this but this should leave you with the desired result. Given all the improvements since discussion of this pipeline I still think the above is probably unnecessary as AB view is a better way to view tag co-occurrence than lots of per-tag/whatever containers full of aliases. Other opinions are still valid!

I’ve not the spare time to do the AppleScripting here but hopefully someone may be able to take a crack at this. I’m aware the OP’s data is research and may have sensitive elements, so it might be useful to make an OPML source doc that can be shared publicly so anyone can have a go at the Tinderbox AppleScripting.

jprint714 · April 7, 2021, 6:11pm

Thank you so much for your reply, Mark! I’ve often thanked you for your incredibly thoughtful input and generous help. The Tinderbox community owes you a tremendous debt of gratitude for all that you do for users like me… It’s hard to overstate this, really.

I think you’ve summed up the process and my desired outcomes pretty well re: how / why I’m trying to change duplicate notes to aliases. Just to be clear, in your last post you wrote…

Yes, that’s a fair summation.

The OmniOutliner script does a terrific job w/ batch sorting complex notes based on categories (labels) and their respective values (hashtags), but creates duplicate notes in doing so.

For me, it’s useful to see these notes organized in Tinderbox containers by Category / value, when I’m solely focusing on certain research elements (i.e., per container) – and also where they intersect, via the Attribute Browser. But then there are obvious problems with finding & editing duplicate notes, as opposed to alias notes.

Two quick thoughts…

The OmniOutliner AppleScript developer is happy to consider suggested changes he can make on his end to help with an inter-app workflow that might help solve this problem – and generally create better parity with notes from OmniOutliner → Tinderbox. So, please let me know about any other ideas or recommendations!
I’m certainly open to other ways of doing things in Tinderbox, or as part of a larger process. As @mwra pointed out, the data that I’m porting over originates from MarginNote annotations. Even though the app has many shortcomings, I think it does the best job (currently available) of rendering discrete notes w/ configurable metadata from annotated PDF files. (MarginNote also produces URLs that link directly back to the original annotated text in a file, which is also quite valuable.)

@mwra , I’m interested to hear more about your suggested approach with the Attribute Browser and reconfigurable agents - and would be happy to discuss that here or via DM.

Thanks so much again!

jprint714 · April 7, 2021, 6:26pm

Agreed!

In general, I think this feature would be quite useful for Tinderbox users.

Thanks for the suggestion!

mwra · April 8, 2021, 10:37am

This explanation uses my starter file so it is quite generic. So, we’ve a set of primary notes (‘ant’, ‘bee’, etc.) in a common folder:

These notes also have metadata in various fields. Thus, $MyString holds colour names and $MyList a set of tags. In your current model, to examine variants of $MyString, we’d make a container for each value and place an alias of our source note in that container. Leaving aside the work to do this, this scales badly. Let’s say you’ve 100 source notes and 10 metadata fields. That’s potentially 900 aliases, or 1,000 if the original notes stay in the source folder.

How does AB view help? Well, let’s open an AB view, set the ‘container’ to ‘Test cell’^† (#1) and we will set $MyString as the target attribute (#2), with a count (#3) for instances of each discrete value of that attribute (a.k.a. ‘categories’) and sorted (#4) from count high to low and with notes listed in $Name sort order (#5) within categories. As shown here:

We can see that there is a banner for each category (#6), i.e. each discrete $MyString value^‡, with a count (#7) of in-scope occurrences at right. N.B., if some notes have no value for the target attribute, they are listed under a “[no value]” category. Under each category banner, notes using that value are listed (#8) in the set sort order (#5). The number of notes listed is reported top right (#9); here it is 7 notes.

The selected note (#10) is highlighted in the view and displayed in the text pane. Although it is possible to select multiple items in the view pane, the text pane still shows the note data for the first selected item (unlike other views—this difference may be a glitch).

Anyway, take a look. You can also filter the ‘container’ contents via an agent, e.g. you might want to view $MyString data but only for notes where $MyList includes the value ‘frogs’. So, we apply an agent query: click the query button and in the pop-up’s code box put $MyList.contains("frogs") (#11). Doing so, we find there are now only 3 notes listed (#12), as only these notes have the desired $MyList value.
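
For anyone who finds it easier to reason about this as code, the grouping AB view performs can be roughly modelled in Python. This is only an analogy, not how Tinderbox implements it: notes are plain dictionaries, the sample values are made up, the optional `query` function stands in for the agent query, and breaking ties between equal counts by category name is my own assumption.

```python
from collections import defaultdict


def browse(notes, attribute, query=None):
    """Group in-scope notes by an attribute value, AB-view style."""
    in_scope = [n for n in notes if query is None or query(n)]
    categories = defaultdict(list)
    for note in in_scope:
        # Notes with an empty value go under a "[no value]" category.
        value = note.get(attribute) or "[no value]"
        categories[value].append(note["Name"])
    # Categories by count, high to low (ties by category name, an assumption);
    # note names sorted within each category.
    ordered = sorted(categories.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(value, len(names), sorted(names)) for value, names in ordered]


# Made-up sample notes, loosely echoing the starter file's attribute names.
notes = [
    {"Name": "ant", "MyString": "red", "MyList": ["frogs", "insects"]},
    {"Name": "bee", "MyString": "yellow", "MyList": ["insects"]},
    {"Name": "cow", "MyString": "red", "MyList": ["frogs"]},
    {"Name": "dog", "MyString": "", "MyList": []},
]
all_rows = browse(notes, "MyString")
# Rough stand-in for the query $MyList.contains("frogs")
frog_rows = browse(notes, "MyString", query=lambda n: "frogs" in n["MyList"])
```

The point of the model is that the per-value listing is computed on demand from the attribute values, so nothing extra has to be stored in the outline.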

There’s much more in AB view. But as I hope you can see, by altering the controls (and using optional column view) you can rapidly review lots of data without a maze of aliases in containers.

†. This is akin to an agent scoping query of descendedFrom("Test cell")

‡. Essentially the same list of values as you’d get from values("MyString") if scoped to the contents of ‘Test cell’.

jprint714 · April 8, 2021, 5:45pm

Thank you very much for this, Mark… It’s fascinating, and I need to review and try it out a bit to see how to make it work.

As far as I know, Tinderbox hasn’t yet been able to save Attribute Browser searches / views, correct? (I believe that was one of the parts of using Attribute Browser that I found challenging when you were helping me set them up.)

Also, I understand and appreciate the points you raised re: scale of notes and aliases. Again, I’m thinking through how your Attribute Browser approach might mitigate that problem (along w/ your idea about possibly adding a reconfigurable agent that allows listing a discrete tag’s matches).

Putting that aside for the moment, the benefit of the MarginNote->OmniOutliner script is how well it forms organizational structure for notes, based on the category - values (hashtag) relationships. Since reading your last post, I’m wondering if / how there might be a way to still use that output, de-dupe notes, and then use some of your suggested approach…perhaps w/ the reconfigurable agent to create alias notes based on discrete tag matches…? Just spitballing.

Anyway, I’d still love to solve the duplicate-notes-to-aliases query – while also experimenting with your AB view & reconfigurable agent approach as another possible solution. Thanks again!

mwra · April 8, 2021, 7:16pm

But you don’t need to save the AB settings. If your source notes are in one container (or hierarchy of containers), all you need to do to change the target attribute is use one pop-up. So just leave the AB view tab (or several) open. When not in use they aren’t ‘working’ in the background.

What is it that you need to save in the view?

jprint714 · April 8, 2021, 8:25pm

I guess I need to better see how this works, esp. with the target attributes. It seems this process requires having several AB view tabs open in lieu of saving them, re-opening them, etc. - correct? And would one need to set them up anew each time one quits and re-launches a TB database?

mwra · April 8, 2021, 8:50pm

Not really, unless you want to quickly tab-switch between two contexts. I think you’re slightly missing the point that AB view is essentially doing all the fixed structure your alias method involves, which in turn represents an understandable misconception that if it isn’t in the outline somewhere you can’t ‘see’ it. But pause to consider that what you are doing with all the laborious aliasing is mimicking note metadata (i.e. attribute values) as part of the outline listing.

This suggests you’ve not yet tried this. Rather than type an essay about what might happen, better if you try it out in one of your files, with AB view’s selection set to the container holding your primary notes. Then report back on what happened. It’s much easier to help after you’ve actually tried it out.

You might take a little longer the first few times but, despite the seemingly large number of controls in AB view it is quick and easy to set up. Remember - don’t fixate on how many views might be needed. Instead, just get one to work, then change the same tab to show you a different attribute’s value. Do that and you’ve all the major building blocks needed and a smaller more manageable file.

Data · April 9, 2021, 12:47pm

mwra:

These notes also have metadata in various fields. Thus, $MyString holds colour names and $MyList a set of tags. In your current model, to examine variants of $MyString, we’d make a container for each value and place an alias of our source note in that container. Leaving aside the work to do this, this scales badly. Let’s say you’ve 100 source notes and 10 metadata fields. That’s potentially 900 aliases, or 1,000 if the original notes stay in the source folder.

How does AB view help? Well, let’s open an AB view, set the ‘container’ to ‘Test cell’† (#1) and we will set $MyString as the target attribute (#2), with a count (#3) for instances of each discrete value of that attribute (a.k.a. ‘categories’) and sorted (#4) from count high to low and with notes listed in $Name sort order (#5) within categories. As shown here:

Can an agent create another agent named after, for example, a tag in a note, and put all the notes that have that tag in it?

mwra · April 9, 2021, 12:52pm

In short, No!

Plus, doing so would create the mess of alias-filled containers away from which we are trying to steer. Having a (permanent) container/agent just to list matches to any given attribute/attribute-value scales badly. You soon end up with an enormous outline. An example of incremental formalisation is learning to use tools like AB view to create on-demand listings (albeit with a few pop-up selections to make, hardly onerous) and so avoiding massive nests of aliases.

jprint714 · April 9, 2021, 7:56pm

I’ve been following the steps that @mwra outlined for using the Attribute Browser, and understand and appreciate its utility. Thank you @mwra for your help with breaking all of this down for me with your explanation and screenshot. That was quite helpful.

I’m not yet sure it’s the solution I’m seeking, though I’ll continue to work with this process so I can become more proficient with it. @mwra, it might be better to DM you quick questions about this, since they’re a bit off topic for this post.

Quick questions that relate to the original post…

If I’ve already assigned categories to values in OmniOutliner, and use them to set up Attributes - Value relationships in a Tinderbox file (with prototypes, etc.), could I then import notes into this file, make a stamp and/or agent action that could:
(a) create containers based on either the imported OPML notes’ categories or existing Tinderbox Attributes, and
(b) assign notes to those Attribute - containers, based on the Attributes - Value relationships (creating aliases, as needed) ?
Would it be possible to create a script that could find and delete duplicate notes based on the unique ID of the imported notes?
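
On the second question, one option is to drop the duplicates from the OPML file itself before it ever reaches Tinderbox. Below is a minimal Python sketch of that idea, using only the standard library. It assumes the OmniOutliner script writes the unique ID as an attribute on each `<outline>` element; the attribute name `noteID` and the sample outline are hypothetical. It only removes later duplicates and does not create aliases.

```python
import xml.etree.ElementTree as ET

# Made-up sample OPML; "noteID" is a hypothetical attribute name.
OPML = """<opml version="2.0">
  <head><title>sample</title></head>
  <body>
    <outline text="B.RESEARCH">
      <outline text="Note one" noteID="n1"/>
      <outline text="Note two" noteID="n2"/>
    </outline>
    <outline text="2.BACKGROUND">
      <outline text="Note one" noteID="n1"/>
    </outline>
  </body>
</opml>"""


def remove_duplicates(root, id_attr="noteID"):
    """Delete every outline whose id was already seen; return what was removed."""
    seen = set()
    removed = []
    # Snapshot the elements first so removals don't disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            note_id = child.get(id_attr)
            if note_id is None:
                continue  # containers and other elements without an id
            if note_id in seen:
                parent.remove(child)
                removed.append((parent.get("text"), child.get("text")))
            else:
                seen.add(note_id)
    return removed


root = ET.fromstring(OPML)
removed = remove_duplicates(root)
remaining = root.findall(".//outline[@noteID]")
```

The `removed` list records which container each duplicate came from, which is the information that would be needed later if aliases were to be placed there by hand or by script.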

Thank you…

mwra · April 9, 2021, 8:51pm

If I understand correctly, you want to make a container for each discrete value of every one of N different attributes, so for 10 attributes each with 10 discrete values in use you’d make 100 containers. Then you want to place an alias of your source notes in every container whose value matches the container attribute/value combo?

I’d suggest not trying, for the reasons already stated at length multiple times before. Your reply suggests you’ve not really tried out AB view. Go, give it a try. In the way above, madness lies.

jprint714 · April 9, 2021, 8:59pm

No. Just assign imported notes to Attribute - containers, based on the Attributes - Value relationships that I’ve set up.

mwra · April 9, 2021, 9:11pm

Yes, but I’m familiar with your data structure from the many months we’ve worked on it. All those months of iterations have shown it was not a good, scalable approach, but we keep going around in circles. It’s not clear from your answer that you’ve actually tried AB view. I think if you’d only try AB view just once, you’d see that it’s a simpler, cleaner approach, not worth dismissing without trying it out.

I’m aware your data can’t be shared as it’s sensitive. Happy to help via DM, but only if we start by trying AB view at least once before dismissing it; the alias approach is flawed, I’ve seen and worked on the data for long enough to know. Sometimes a new approach is better, especially as the OmniOutliner is now adding new complications.

For the months spent on this, you’d have done better to have just copy/pasted data from MarginNote to Tinderbox (given the former’s lack of useful output).