Extract a block of text from $Text

satikusala · March 13, 2021, 12:17am

Is there a way to extract a multi-line block of text from $Text that follows a specific character (e.g. >), or that is nested between two such characters (> ... >)? I’m looking for a built-in operator, not regex.

e.g.

Text> Ipusm Lorum Ipusm Ipusm Lorum Ipusm Ipusm Lorum Ipusm
Ipusm Lorum Ipusm Ipusm Lorum Ipusm
Ipusm Lorum Ipusm
Ipusm Lorum Ipusm>

pmaheshwari · March 13, 2021, 5:12am

Interested to know the solution to this.

mwra · March 13, 2021, 10:15am

Well, .contains(">([^>]+)") should be able to do it (with $1 referencing the text within the markers), but it does so using a regex, so I guess that’s out.

Why the aversion to regex? The issue with this sort of scenario is that we over-focus on what we want and so overlook the attendant problem of avoiding accidental detection of things we don’t want.
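
To make the capture and the accidental-detection concern concrete, here is a minimal sketch in Python rather than Tinderbox action code. It assumes the same pattern, `>([^>]+)`, and that the regex engine lets `[^>]` span line breaks, as Python's does. The function name `first_block` and the sample strings are hypothetical.

```python
import re

# Hypothetical sample modelled on the question: a marker, a multi-line block, a closing marker.
SAMPLE = "Text> line one\nline two\nline three>"

def first_block(text):
    """Return the text captured after the first '>' (up to the next '>'), or None."""
    match = re.search(r">([^>]+)", text)
    return match.group(1) if match else None

# The captured group plays the role of $1 in the Tinderbox expression.
block = first_block(SAMPLE)

# A stray '>' earlier in the text is matched first: the accidental detection described above.
stray = first_block("a > b, then Text> real block>")
```

In the second call the capture is " b, then Text" rather than the intended block, which is why the marker needs to be something that cannot occur in ordinary text.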

satikusala · March 13, 2021, 2:32pm

Actually, RegEx is fine, I just did not want to rely on GREP or AWK. I’m close to getting what I want to do. Maybe you can help me in today’s session.

mdavidson · March 13, 2021, 2:33pm

You could try the following:

$MyString=$Text.split(">").at(1);

This assumes that there is some text before the block of interest (.at(1) is the second item of the split list, as list positions are zero-based), which is something you might want to test for. It also assumes only one block of interest per $Text. Otherwise, you need to loop over all of the split results, testing each one individually.
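
The "loop over all of the split results" idea can be sketched as follows. This is a Python analogue, not Tinderbox action code. It assumes every block of interest is both opened and closed by the marker, so the first and last pieces of the split lie outside any block. The function name `blocks_between` is hypothetical.

```python
def blocks_between(text, marker=">"):
    """Return the non-empty pieces that sit between consecutive markers."""
    parts = text.split(marker)
    # parts[0] precedes the first marker and parts[-1] follows the last one,
    # so only the pieces in between are candidates.
    candidates = parts[1:-1]
    # Test each piece individually: here the test is simply "not blank".
    return [piece.strip() for piece in candidates if piece.strip()]

blocks = blocks_between("Text> first block > second block >")
```

With fewer than two markers in the text there is no enclosed piece, so the function returns an empty list.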

satikusala · March 13, 2021, 2:37pm

I’ll try that.

mwra · March 14, 2021, 6:34pm

During the meet-up yesterday, it became apparent that the real task is to lose all current $Text up to the first ># marker.

This appears to work:

$Text=$Text.replace(">#$","").replace("[^>]+>#","");

The first .replace removes the end marker which otherwise confuses the main regex. See Fix text.tbx (82.3 KB)
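
Why the first replacement matters can be checked outside Tinderbox. The sketch below is a Python analogue using the same two patterns. It assumes replacement applies to every match, as Python's `re.sub` does, and that the text ends with the `>#` end marker. The function name `strip_preamble` and the sample text are hypothetical.

```python
import re

# Hypothetical sample: a preamble, a ">#" start marker, the wanted text, a ">#" end marker.
SAMPLE = "intro text\nmore>#keep this\nand this>#"

def strip_preamble(text):
    """Drop the trailing '>#' marker, then everything up to and including the first '>#'."""
    without_end = re.sub(r">#$", "", text)          # step 1: remove the end marker
    return re.sub(r"[^>]+>#", "", without_end)      # step 2: remove the preamble and start marker

kept = strip_preamble(SAMPLE)

# Skipping step 1 lets the main pattern match a second time, on the wanted text itself.
without_first_step = re.sub(r"[^>]+>#", "", SAMPLE)
```

Here `kept` holds only the wanted text, while `without_first_step` comes out empty because the second match swallows the wanted text along with the end marker.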