Tinderbox Forum

Regex extraction from text - how to populate a set

I have a regex that extracts ‘Key Messages’ from the $Text of my meeting notes (to populate a set-type KA, ‘$KeyMessages’, for later reports etc.): “KM:[\s\w’-]+$”

I start such Key messages with KM: Blah blah blah … Blah
This works ( and thanks for the BBEdit tip in last week’s meetup )

How can I get all the results to populate a list/set?

$KeyMessages=$Text.extract(“KM:[\s\w’-]+$”); is easy enough and works but only picks up the last entry?

Missing here is a realistic example of the actual key message values you wish to process. That’s not snark, but regexes are incredibly precise, so guessing the structure of the $Text isn’t regex-construction-friendly!

For instance, is there only one key message per $Text or several? Does the message label always start a line/paragraph, or can it be inline as sentence(s) in other body copy in $Text? Does the proposed workflow work at single-note scope, or are the messages being stored across a wider scope?

Note that bolding is fine for the eye but RTF styling is not ‘seen’ by string processing tools such as regex.

$Text.extract(regex) returns the first and only the first match to the regex argument (as String-type data). However, the $Text.extractAll(regex) operator returns every match to the regex argument (as List-type data).

See .extract() and .extractAll().
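
The first-match versus every-match distinction can be sketched outside Tinderbox. The following is only an analogy in Python, whose `re` module is not Tinderbox's regex engine: `re.search` stands in for `.extract()` and `re.findall` for `.extractAll()`. The sample notes are invented for illustration, and the pattern is an adapted form of the one in the question (straight apostrophe, and a plain space instead of `\s` so that a match cannot run across a line break in Python).

```python
import re

# Hypothetical meeting notes; each key message starts a line with "KM:".
text = """Intro chat about the agenda.
KM: Budget sign-off is due Friday
Some discussion that is not a key message.
KM: Pilot starts in week 12
"""

# Adapted pattern (an assumption, not the poster's exact regex).
# re.MULTILINE lets "$" match at the end of each line, not just the end of the text.
pattern = re.compile(r"KM:[ \w'-]+$", re.MULTILINE)

# Analogue of .extract(): a single string, the first match only.
first_match = pattern.search(text).group(0)

# Analogue of .extractAll(): a list holding every match, in document order.
all_matches = pattern.findall(text)

print(first_match)
print(all_matches)
```

A single-match call can only ever yield one value, however many labelled lines the text holds, which is why a list or set needs the every-match form.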

†. ‘regex’ is shorthand terminology that I’m in the process of formalising across (aTbRef) documentation. An action code operator’s input arguments that allow regex expect:

  • a quote-enclosed string that is a literal string, i.e. verbatim text as read on the page
    • by convention the enclosure uses straight double-quotes, but single quotes may be used, especially if the regex string contains a straight double quote. Start and end quote types must be the same!
  • or a quote-enclosed regular expression (regex) code
  • or a quote-enclosed mix of both of the above

The point is that ‘regex’ arguments are parsed on the assumption they might contain regex, regardless of whether they do or not.
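
Why that matters can be shown with a string that looks like plain text but contains a regex metacharacter. This is again a rough Python sketch, not Tinderbox action code: the sample strings are made up, and `re.escape` is Python's own way of forcing a literal reading, not something the thread says Tinderbox offers.

```python
import re

# A string with no metacharacters behaves the same whether it is
# read as a literal or as a regex.
plain = "KM:"
plain_hit = re.search(plain, "KM: Ship it") is not None

# This one looks literal, but "." means "any single character"
# once the string is parsed as a regex.
literal_looking = "v1.5"
loose_hit = re.search(literal_looking, "v125") is not None

# Escaping the metacharacter restores the verbatim reading.
strict_hit = re.search(re.escape(literal_looking), "v125") is not None
exact_hit = re.search(re.escape(literal_looking), "v1.5") is not None

print(plain_hit, loose_hit, strict_hit, exact_hit)
```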

I’m working on improving the documentation of action code operators and which input arguments allow regex use but 20 years of ad hoc additions mean there are lots of confusions (and edge cases through which to wade). Current terminology is inconsistent and evidently confusing to the newer/less technically based user.

Thanks Mark

That’s worked nicely

I tend to write copious notes during Zoom meetings but only need to pick out a few salient points for my wider team. I just preface a sentence with KM: and I can now populate a list object and use a template to display them etc. It’s a bit clunky and ugly, but a little incremental formalisation I felt the need for.

Best wishes


Great, glad that’s a fix!
