Stuck on simple search

This works:
(?=.*vancouver)(?=.*doctor)
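
For anyone who wants to check the pattern outside Tinderbox, here is a minimal Python sketch. It assumes Python's `re` engine treats these lookaheads the same way Tinderbox's regex engine does, which I have not verified; `has_both` and the sample strings are made up for illustration.

```python
import re

# The pattern from the post: two lookaheads tested at the same position,
# so both words must appear somewhere after it, in either order.
PATTERN = re.compile(r"(?=.*vancouver)(?=.*doctor)")


def has_both(text: str) -> bool:
    """Return True if a single line of text contains both words (case-sensitive)."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    print(has_both("a doctor in vancouver"))
    print(has_both("vancouver has a doctor"))
    print(has_both("vancouver only"))
```

Note that `.` does not cross line breaks by default in Python, so for multi-line text both words would have to sit on the same line unless the DOTALL flag is set.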

A short-term hack could involve a TextExpander snippet, but it also seems reasonable to have search use this approach in general, perhaps with a toggle in the Find popup?

Here’s a TextExpander AppleScript snippet that wraps each query word in regular expression syntax, so the search matches all of the words in any order. To use it: press cmd-F in Tinderbox, type ‘findwords’ (or whatever you rename the snippet) in the query box, type your words in the TextExpander fill-in, press Enter to replace the query text, then press Enter again to search.

It’s a hack until this behavior could be made the default, which I am also in favor of.

-- Script to split string and wrap in regex to search for matches of all words in any order

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

--set theResponse to the text returned of (display dialog "query?" default answer "")
return split("%filltext:name=field 1%", space) as string

to split(someText, delimiter)
	set AppleScript's text item delimiters to delimiter
	set someText to someText's text items
	set AppleScript's text item delimiters to {""} --> restore delimiters to default value
	set someText to replaceItemInList(someText)
	return someText
end split

on replaceItemInList(theList)
	repeat with a from 1 to the count of theList
		set thisItem to item a of theList
		set thisItem to "(?=.*" & thisItem & ")"
		set item a of theList to thisItem
	end repeat
	return theList
end replaceItemInList
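
In case it helps anyone without TextExpander, the same wrapping step can be sketched in Python. This is a stand-in, not a port that runs inside Tinderbox. It differs from the AppleScript in two ways: it splits on any run of whitespace (the AppleScript splits on single spaces, so a double space produces an empty `(?=.*)` item), and it escapes regex metacharacters. The name `all_words_pattern` is hypothetical.

```python
import re


def all_words_pattern(query: str) -> str:
    """Wrap each whitespace-separated word in a lookahead, like the snippet above."""
    # str.split() with no argument drops empty items, so double spaces are harmless.
    # re.escape keeps characters such as "." or "(" from being read as regex syntax.
    return "".join(f"(?=.*{re.escape(word)})" for word in query.split())


if __name__ == "__main__":
    # Paste the printed pattern into the Find box.
    print(all_words_pattern("vancouver doctor"))
```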

In full agreement with this. More passing ones: Scrivener, Workflowy, Zotero, Bookends…

How about an operator

words("doctor Vancouver")

which would return true for any note in which both “doctor” and “Vancouver” appear in either the $Name or the $Text? (I’d welcome better suggestions for the name of the operator.)
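
To pin down what such an operator might mean, here is a rough Python sketch of the semantics as I read the proposal. It is not Tinderbox action code, and it makes some assumptions: each word may be found in either the name or the text independently, matching is by substring rather than whole word, and case is ignored by default. The `ignore_case` parameter is my own invention.

```python
def words(query: str, name: str, text: str, ignore_case: bool = True) -> bool:
    """Return True if every word of the query occurs in the note's name or text."""
    haystack = f"{name}\n{text}"
    if ignore_case:
        haystack = haystack.lower()
        query = query.lower()
    terms = query.split()
    # An empty query matches nothing instead of matching everything.
    return bool(terms) and all(term in haystack for term in terms)


if __name__ == "__main__":
    print(words("doctor Vancouver", "Vancouver trip", "saw the doctor"))
    print(words("doctor Vancouver", "Vancouver trip", "ate dinner"))
```

Word order does not matter here, which is the point of the proposal.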

Mark, I’m curious about why this isn’t just the default search assumption. Are there use cases you’re thinking of here where simple search is best? I feel like I’ve been trained by the apps mentioned above to assume that when I’m searching across multiple blobs of text (emails, documents, notes), the search works like the operator search you describe above.

If I’m searching for “Mark Bernstein”, I typically don’t want to retrieve

Leonard Bernstein made his mark in 1943 when called on to replace an aging Bruno Walter.

This is of particular concern for agents, which you want to retrieve specifically the things you expect them to.

This is useful, I can see how that search wouldn’t work. I guess my perspective, and maybe Pat’s?, is that I have different expectations when doing a global search using cmd-f on the document vs. creating an agent. I want all that flexibility in the agent, but when doing a “quick” search, I’m ok with more “noisy” hits rather than fewer relevant hits.

My feelings are:

  1. words("doctor Vancouver") would be generally useful, and an improvement over the current state of things – the words should be order independent, and either it’s case-insensitive or there’s a corresponding iwords
  2. I think writing action code for a quick command-f is too high a burden for new users – but that’s not my primary concern here, so okay
  3. To search for the exact string “Mark Bernstein” I would propose "Mark Bernstein" – using quotes – as opposed to Mark Bernstein which matches “Leonard Bernstein made his mark…”
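
Point 3 can be sketched as follows in Python (again a stand-in, not how Tinderbox parses anything). It assumes straight ASCII double quotes mark a phrase, that bare words are independent terms, and that matching is a case-insensitive substring test. `parse_terms` and `matches` are hypothetical names.

```python
import re

# Either a double-quoted phrase or a bare run of non-space characters.
TERM = re.compile(r'"([^"]*)"|(\S+)')


def parse_terms(query: str) -> list:
    """Split a query into quoted phrases and bare words."""
    terms = []
    for phrase, word in TERM.findall(query):
        term = phrase or word
        if term:
            terms.append(term)
    return terms


def matches(query: str, text: str) -> bool:
    """True if every term (phrase or word) occurs in the text, ignoring case."""
    haystack = text.lower()
    terms = parse_terms(query)
    return bool(terms) and all(term.lower() in haystack for term in terms)


if __name__ == "__main__":
    sentence = "Leonard Bernstein made his mark in 1943."
    print(matches("Mark Bernstein", sentence))
    print(matches('"Mark Bernstein"', sentence))
```

The unquoted query hits the Leonard Bernstein sentence, while the quoted one does not, which is the distinction being asked for.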

Anyway, for me personally, yes the words() action code gets the job done just fine! And can be used elsewhere in action code.

But this wouldn’t work in agents, where the search string is already in quotes. We could further separate search and agents, but that too has its costs.

Now that you have explained it, I fully understand why search works like it does in the cmd-f window. Not many other applications have as many ways to perform a search, so their search boxes are easier to code for a particular behaviour. However, as a data point, I had no idea that the window wasn’t assuming an AND or accepting quotes for phrases until @pat brought it up. For what it’s worth, some kludging to mimic the Google/DEVONthink etc. methodology would be a plus in my view, although I obviously don’t know what that would entail under the hood.

From a beginner’s standpoint (after more than a year), it remains ‘trippy’ to me that this one issue, performing a simple search as explored above, has been such a hurdle to using Tinderbox effectively. Not being able to do a quick search the way I do in many other applications I use daily feels like a peculiarity, a quirk, a kink. I’m sure this one ‘feature’ would speed adoption of Tinderbox for many others.

Two things I’d like to add to this old discussion:

  1. I think there is a core to the frustration that has not yet been formulated here: if one wants a very quick and dirty, broad and coarse query, one does not want to write a very detailed and specific query.
    In line with that, Google search, for example, works by “the more precisely you know what you want, the more instructions you add”, whereas the current implementation in Tinderbox has the logic of “the fuzzier you want your search, the more instructions you have to add”.
    I think the problem with the latter approach is that opening up a query is a lot less straightforward than narrowing one down: see how complex the query @pat wrote is, and compare that with the effort of narrowing down a Google search to an exact phrase using " ".

  2. An “Exact Phrase” option that can be toggled is how, for example, the Preview app in macOS deals with this issue. It is hidden behind the little dropdown arrow next to the search icon.

  • Pro: You get to choose gears
  • Con: You might forget what gear you’ve set.
    Failing to notice that search does not behave the way you expect is a non-trivial problem. See the users above reporting that they always assumed their input would be treated with an implicit AND operator.
  • Resolution: A visual indicator showing which search mode is active. Imagine, for example, quotation marks around the loupe icon.

Curious to hear if there have been changes regarding this. From fiddling around and quickly searching the documentation, it did not seem like it to me. Finding an elegant way to navigate these issues, especially my point 1, seems like a very tinderboxy thing to me, which is why I thought digging out this old thread was worth it.

As it is, to search for a note with two words in it, I have to create action code with four actions, three boolean operators, and two sets of parens.

No. To search for a note that contains Vancouver and also contains doctors, ⌘-F Vancouver&doctors.

I’m willing to think about this again. But (hint hint) the way to convince me is to show at least one of the following:

  1. Your alternative approach makes possible things that aren’t possible at present.

  2. Your alternative approach makes things easy which are now possible only with difficulty, and the things it makes easy are important things to lots of Tinderbox users.

  3. Your approach makes some things easier and some things more difficult, but you can present hard evidence that the things it makes easier are more important or more common.

  4. You can’t present hard evidence, but you can show how hard evidence could be obtained.

Let’s think about the Google query Vancouver doctors in the Tinderbox context. (The Google context, all the text in the world, isn’t a lot like the Tinderbox context.)

  1. In a small document — say, notes on The 34th Intl. Conference On Time Travel — neither term will be very common. If you’re trying to find your notes on that interesting keynote Thursday about Dr. Who and the Vancouver Grizzlies, searching for either term will find what you’re looking for.

  2. In a big document about a specific topic — your notes for a doctoral dissertation on Emergency Room Medicine In The 19th Century: Direct Observations — one of the terms may be mere confirmation. Hundreds of your notes will mention “doctor” and only a few will mention “Vancouver”. So, search for Vancouver and then do a text search (or visual scan) to skip over your receipts for that dinner at Joe Fortes’s.

  3. In a big, unfocused document — notes on all the books you’ve read in the past decade — it might be worth searching “Vancouver|doctor” because neither term is very common. (Quick: have you read a mystery set in Vancouver? Have you read two? What’s the last mystery you read in which a doctor was prominent?).

These all contemplate a core Tinderbox task: having accumulated hundreds or thousands of notes on a topic, you want to locate a half-remembered note. That chore is the reason ⌘-F exists.

  1. In a large and active Tinderbox document, we have an agent that gathers notes on a topic of particular interest. Perhaps we have a daily review of the most recent notes on this topic. This agent works best if it is precise, that is, if it doesn’t list many notes that aren’t relevant. For example, we might want to highlight the most recent notes about “Vancouver doctors” even though there are lots of receipts for tasty Vancouver restaurants, and also lots of receipts for Doctor Whatsis, your psychotherapist in Sheboygan. In this case, we really might want all the notes that mention Vancouver and also mention doctors. The agent query $Text.contains("Vancouver") & $Text.icontains("doctor") explains precisely what you’re looking for.

Incidentally, I think functions help encapsulate queries in an interesting way. If we want fairly complicated logic for a query, it may be better to write a simple function wantsDailyReview(var:string theNote) than to mess around with nested conditionals.
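
As an illustration of that encapsulation idea, here is a Python stand-in. The body of wantsDailyReview is not given above, so this version just wraps the agent query from the previous paragraph. It assumes .contains is a case-sensitive regex match and .icontains a case-insensitive one, and the Python function name and parameter are hypothetical.

```python
import re


def wants_daily_review(note_text: str) -> bool:
    """Python stand-in for a query function: Vancouver AND doctor in the note text."""
    # Case-sensitive, mirroring $Text.contains("Vancouver").
    mentions_vancouver = re.search(r"Vancouver", note_text) is not None
    # Case-insensitive, mirroring $Text.icontains("doctor").
    mentions_doctor = re.search(r"doctor", note_text, re.IGNORECASE) is not None
    return mentions_vancouver and mentions_doctor


if __name__ == "__main__":
    print(wants_daily_review("Doctor Lee moved to Vancouver"))
    print(wants_daily_review("Dinner receipt, Vancouver"))
```

The caller only sees one named test, and any extra conditions can be added inside the function without touching the agent query.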

Thanks for elaborating @eastgate ! This lines up well with what I suspected people fall back on: just querying for single words to keep the query broad.

However, I think there is quite a point to be made for how “the google way” could make things significantly easier.
Google is pretty smart, because it processes your query adaptively – depending on the results. It essentially does automatically what you’d do manually in your 3 cases.
Going through the cases you supplied in reverse order works well to illustrate that:

  • Big unfocused document (3.): This is the most like an everyday web search using Google. There are lots of hits for both terms, and thus Google will show you pages containing both terms. (I am unsure, but I think I remember finding out that they also prioritize results where the word order matches.)

  • Big document about specific topic (2.): no difference here, with the google way you also have to discard search terms giving you too many results.

  • Small document (1.): This resembles a web search on a very exotic topic, where there is not that much content to be found. In this case, google will adapt to that and start showing you results that only contain one or the other search term.

So to wrap up the benefit of the google way, as I see it:

  • You get the results you want for both case 1. and 3. using the same, dead simple query and just starting one Find.
  • The benefit is actually biggest for the case where you thought there definitely was something with “Vancouver&doctor”, but it turns out there are no results for either “Vancouver” or “doctor”. With the current Find you’d end up running 3 queries, whereas Google would directly shift into searching for either of the terms and tell you there is nothing for “Vancouver&doctor”, “Vancouver” or “doctor”.
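
To make the fallback idea concrete, here is a small Python sketch of the behaviour as I describe it above. I am not claiming Google or Tinderbox is implemented this way. It assumes notes are plain strings and matching is a case-insensitive substring test, and `adaptive_find` is a hypothetical name.

```python
def adaptive_find(query: str, notes: list) -> tuple:
    """Try an all-terms match first, then fall back to an any-term match."""
    terms = [term.lower() for term in query.split()]
    if not terms:
        return ("none", [])
    # First pass: notes containing every term (implicit AND).
    all_hits = [n for n in notes if all(t in n.lower() for t in terms)]
    if all_hits:
        return ("all", all_hits)
    # Second pass: notes containing at least one term (implicit OR).
    any_hits = [n for n in notes if any(t in n.lower() for t in terms)]
    if any_hits:
        return ("any", any_hits)
    return ("none", [])


if __name__ == "__main__":
    notes = ["Vancouver restaurant receipt", "Doctor Who keynote"]
    mode, hits = adaptive_find("Vancouver doctor", notes)
    print(mode, hits)
```

Returning the mode alongside the hits is what would let the interface say which kind of match was used, so the user is not surprised by looser results.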

I would not be surprised, however, if you can come up with an example where the manual approach lets you do something you can’t do the Google way.

But the view pane find (Cmd+F) does that by default. If you type Library of Congress in the box, the matches are to the exact phrase, not instances of any individual word. IOW, congress on its own does not match.

Noting the defaults of the two options in the Find bar’s pop-up menu:

  • case sensitive (off by default)
  • regex (on by default)

…then typing Library of Congress in the find box is the same as the agent query:

$Text.icontains("Library of Congress") | $Name.icontains("Library of Congress")

(you can optionally pick to also OR-include a single user attribute as the source, or use any one or two of those three possible sources)

So, to my understanding, Find view already defaults to the Google notion that the quoted term must be in the answer. TBH though, Google doesn’t play by that rule and offers you other irrelevant matches without the term; I suspect this happens when the algorithm worries it has too few matches.

So I’m unclear as to what problem you are solving. What do you mean by a “broad and coarse query”? One that you can’t define but which gives the desired result? I don’t mean the last in a snarky sense, but search result effectiveness can be highly subjective and is as much a matter of correct user input. I generally find Google needs more prodding to get close to answer links worth clicking, partly as its indexing tends to default to too wide a match. Also, is there a difference, in your experience in the app, between using Find with single words as opposed to phrases? If so, in what way? It seems the OP’s point was about searching a list of words (implicit: also phrases) co-occurring in a searched attribute, but it seems you want something different.

Hi there, I’m curious, what am I doing wrong? When I hit Enter with “Vancouver&doctors” in the search box, the search result dialog does not pop up. If I enter one or the other word, it does.

You’re right. I invented some regex syntax that doesn’t exist.

Yes, Find only searches on one term, across a minimum of one and a maximum of three attributes. The search carried out against the attributes is the same as the query:

$Attr.icontains("My Search Term")

Of the two drop-down options, ticking ‘case sensitive’ changes the query to:

$Attr.contains("My Search Term")

Dropping the regex option makes the query—without case sensitivity:

$Attr ==  "My Search Term".lowercase();

or no regex with case sensitivity:

$Attr == "My Search Term";

It may not be what people want, but hopefully it gives a better handle on what the Find box search is doing. 🙂

Gotcha, a multi-term find or a “near” operation could be pretty cool. Perhaps we could bring the discussion to the backstage, including the idea of filtering search to a specific container.