Extract text from the following standard format

Just spent some time with @mwra and we devised a way to get the backreference with a stamp.

$MyString=$Text.paragraph(0);
$Text=$Text.replace("^.+\)\n\n","");
$Text=$Text.replace("> ","");
$Text=$Text.replace("\n"," ");
if($MyString.contains("^\[Page (\d+)")){$PageNo=$1};
if($MyString.contains("(highlights:[^\)]+)")){$URL=$1};
$URL=$URL.replace("%20","");
$MyString="";

The above works.

For context: I have a string with a predefined, fixed structure. What I want is to extract a specific, identifiable piece of text with a regex, e.g. “14” or the URL.

I find the .replace actions a bit confusing because they are counter-intuitive for this job. I just want to extract the part I match. Instead, .replace makes me match everything I don’t want in order to be left with what I do want. Moreover, until talking to Mark, it would never have occurred to me to use an if statement with .contains to capture the backreference ($1) and use it to populate an attribute. NOTE: I don’t want to use an agent for this because I don’t want the background overhead; these are one-time, on-demand actions.

I think it would be easier if we could have a String.extract(pattern) operator to enable a user to pull a specific piece of the content.
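
As a rough illustration of the requested behaviour, here is a small sketch in Python (not Tinderbox action code). The function name `extract`, its signature, and the sample string are all assumptions made up for this example; the two patterns are the ones used in the stamp above.

```python
import re


def extract(text: str, pattern: str) -> str:
    """Return the first capture group of the first match, or "" if no match.

    Hypothetical helper: it sketches what a String.extract(pattern)
    operator might return. If the pattern has no capture group, the
    whole match is returned instead.
    """
    compiled = re.compile(pattern)
    match = compiled.search(text)
    if match is None:
        return ""
    # Use group 1 when the pattern defines one, otherwise the full match.
    return match.group(1) if compiled.groups else match.group(0)


# Hypothetical sample line, shaped only to fit the patterns in the stamp.
sample = "[Page 14](highlights://Some%20Book#page=14)"

page_no = extract(sample, r"^\[Page (\d+)")
url = extract(sample, r"(highlights:[^\)]+)")
```

The point of the sketch is that each value is pulled out by naming what should be matched, with no need to strip away the rest of the string first and no if statement around the match.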

I understand that, from an engineering perspective, this might seem like syntactic sugar, but given Bruce’s comments today, this more straightforward approach may help users accomplish their goals.
