Parse $Text for hashtags and populate $Tags

If you had this in $Text and wanted to parse all the hashtags out of it, leaving the text untouched, and put them in $Tags as a semicolon-separated set, how would you go about it?

```
Lorem ipsum dolor sit amet, consectetur #interaction adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint #impact occaecat cupidatat non proident, sunt in culpa qui officia #insight deserunt mollit anim id est laborum.

#personaldata #identity #marketassessment #selfsovereignidentity #ssi #authenticdata #verifiablecredentials #personaldatastores #personalinformationmanagementsystems
```

I’ve tried various .following stamps (I don’t want to use replace because I want to keep the text untouched).

This might be a start, but you need to create $MyList2 first:

$MyList=$Text.split(" ");
$MyList.each(x){
if(x.contains("#")){
$MyList2=$MyList2+x+";";}
};

Result: #impact;#insight;laborum.
#personaldata;#identity;#marketassessment;#selfsovereignidentity;#ssi;#authenticdata;#verifiablecredentials;#personaldatastores;#personalinformationmanagementsystems

Unfortunately, that did not work for me. Also, I want the # to be removed. The .split is a nice idea. I’ll explore that.

Slightly easier:

$MyList=$Text.split(" ");
$MySet=$MyList.collect_if(x,  x.beginsWith("#"), x.substr(1) );

$Text.wordList() would be better than using split(), but it strips out the "#".
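
One caveat with splitting on a single space: the result quoted earlier shows `laborum.` glued to the hashtags from the second paragraph, which suggests that line breaks are not treated as separators. Here is a minimal, untested sketch of one way around that. It assumes `.replace()` accepts `"\n"` as a pattern and returns a new string, so $Text itself is never changed. $MyString is a user attribute made up for this example, like $MyList. Drop the `//` lines if your version of Tinderbox does not accept comments in action code.

```
// Copy the text with line breaks turned into spaces ($Text is only read, not assigned)
$MyString=$Text.replace("\n"," ");
// Split the copy into space-separated tokens
$MyList=$MyString.split(" ");
// Keep tokens that start with "#", dropping the "#" itself
$Tags=$MyList.collect_if(x, x.beginsWith("#"), x.substr(1));
```

Since $Tags is a set, duplicates should collapse on assignment. A hashtag followed directly by punctuation (for example `#impact,`) would keep the punctuation, so treat this as a starting point only.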

I think this might be a good topic to explore in the meetup.

It might also be a good candidate for a simpler action, like the `.extract()` operator we discussed a few days back. In a way, this is the same problem, except that here you do want to capture all matches, not just the first.


This worked perfectly!!! Thanks!