How to grab values from text and insert them in $Attributes

Hi,

I am a beginner to Tinderbox and would like to start my first project.

I have notes that look like this (as text):


Treffer 1 von 76
[ID: 19-275569]
19. Wahlperiode
Vorgangstyp:
Antrag

Biersteuer in betroffenen Bundesländern nach Möglichkeit senken

Initiative:
Fraktion der FDP

Aktueller Stand:
Überwiesen

Wichtige Drucksachen:
BT-Drucksache 19/27815 (Antrag)

Sachgebiete:
Öffentliche Finanzen, Steuern und Abgaben

Schlagwörter:
Biersteuer; COVID-19; Länder der Bundesrepublik Deutschland; Steuersenkung


I don't have any experience in this field, but my understanding is this:

I can find for example the ID with regex.

\[ID:\s(\d\d)-(\d+)\]

and set

$Wahlperiode=$1

$Nummer=$2

$ID=$1"-"$2 {<-not sure if correct syntax}

How do I execute that?
As action code I believe.

Any hint on where I can find information about how to pull values out of text and insert them as attributes would be very much appreciated. Thank you very much in advance.

I am totally stuck.


You’ve very nearly got it!

Here’s one solution. Extracting some data.tbx (111.3 KB)

  1. We have an agent that looks for notes that contain the letters "ID: ", followed by a two-digit number, a hyphen, and another number.
$Text.contains("ID:\s(\d\d)-(\d+)")
  2. When the agent finds a note with this pattern, it stores the first sub-expression (the two-digit number) in $Period, a numeric user attribute. The second number is stored in $MyNumber, a built-in numeric attribute.
  3. We now synthesize the entire ID string and store it in $MyString. (You don’t want to store it in $ID, which Tinderbox uses for something else.) This is easy enough:
$MyString=$Period+"—"+$MyNumber

Just to be careful, I used an em-dash instead of a hyphen here, because in some cases Tinderbox might mistake the hyphen for a minus sign. Don’t worry about this; it’s an esoteric detail.
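For readers who want to check the pattern outside Tinderbox, here is a minimal Python sketch of the same capture-group idea. It is not Tinderbox action code: the function name `extract_id` and the dictionary keys are made up for illustration, and it joins the two numbers with a plain hyphen because Python has no minus-sign ambiguity inside a string.

```python
import re

# Sample text copied from the note in the question (shortened).
SAMPLE = """Treffer 1 von 76
[ID: 19-275569]
19. Wahlperiode
"""

# Same pattern as the agent query: "ID:", one whitespace character,
# a two-digit number, a hyphen, and another number.
ID_PATTERN = re.compile(r"ID:\s(\d\d)-(\d+)")


def extract_id(text):
    """Return the two captured numbers and the rebuilt ID, or None if absent."""
    match = ID_PATTERN.search(text)
    if match is None:
        return None
    period = int(match.group(1))   # first sub-expression, like $1
    number = int(match.group(2))   # second sub-expression, like $2
    return {
        "period": period,
        "number": number,
        "id_string": f"{period}-{number}",
    }


if __name__ == "__main__":
    print(extract_id(SAMPLE))
```

Converting the groups to integers mirrors storing them in numeric attributes; a period with a leading zero would lose that zero in the rebuilt string.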


Thank you very much! I think I’m starting to get the mechanics now! (Yippie!)

However, I am seeing some strange behaviour in TB:

When I play around with your file by adding my original text file and then exploding it, your agent doesn’t pick up the resulting notes.

If I create a new note and manually type some text that matches the pattern, your agent picks it up immediately.
But if I copy and paste text from the exploded notes into a new note, your agent doesn’t recognise it. (This seems very odd, imho.)

But if I create my own agent with a new query like

$Text.contains("Vorgangstyp:\n(.*)");

the exploded notes are recognised. I will play around with it and keep testing.
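To illustrate what that second pattern captures, here is a small Python sketch (not Tinderbox code; the function name `extract_vorgangstyp` is hypothetical). In Python's regex engine `.` does not match a newline by default, so `(.*)` stops at the end of the line after the label. I am assuming, not asserting, that Tinderbox's regex engine treats this the same way.

```python
import re

# Fragment of the sample note from the first post.
TEXT = (
    "Vorgangstyp:\n"
    "Antrag\n"
    "\n"
    "Biersteuer in betroffenen Bundesländern nach Möglichkeit senken\n"
)


def extract_vorgangstyp(text):
    """Return the line that follows the 'Vorgangstyp:' label, or None."""
    match = re.search(r"Vorgangstyp:\n(.*)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_vorgangstyp(TEXT))
```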

May I ask a question to better understand the TB concept?

Is it also possible to perform such a task not via agents but by implementing it as a Rule in the note itself?

As a complete newcomer, my best guess was to create a Stamp or a Prototype with these rules included and then apply it to the exploded notes. That way every note would “calculate” its values by itself.

Since there are potentially around 28k of these index cards at the moment, the agents would constantly be cycling through all of them.

As somebody who is just getting started with TB, I wonder whether this could lead to performance issues.

My instinct told me to put it into the Rule of each note like this:

if ($Text.contains("Vorgangstyp:\n(.*)")){$Vorgangstyp=$1;};

Does this make a difference? Cycling through 30k notes vs. 30k single rules?

You’re actually better off with 30,000 notes and agents than with 30,000 rules.

This is especially true if you mark the note as “processed” after you’ve dealt with it, so that the agents don’t need to examine it again.

30,000 notes is going to be a challenge. But we’ll cross that bridge when we come to it.
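The idea of marking notes as “processed” can be modelled roughly like this in Python. This is only a sketch of the bookkeeping, not of how Tinderbox agents actually work internally: the `Note` class, the `processed` field (standing in for a user attribute you would create yourself), and `run_agent` are all hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

ID_PATTERN = re.compile(r"ID:\s(\d\d)-(\d+)")


@dataclass
class Note:
    text: str
    period: Optional[int] = None
    number: Optional[int] = None
    processed: bool = False  # hypothetical flag, set once values are extracted


def run_agent(notes: List[Note]) -> int:
    """Examine only unprocessed notes; return how many were examined."""
    examined = 0
    for note in notes:
        if note.processed:
            continue  # already handled, so skip the regex work entirely
        examined += 1
        match = ID_PATTERN.search(note.text)
        if match:
            note.period = int(match.group(1))
            note.number = int(match.group(2))
            note.processed = True
    return examined


if __name__ == "__main__":
    notes = [Note("[ID: 19-275569]"), Note("no identifier yet")]
    print(run_agent(notes))  # first pass looks at every note
    print(run_agent(notes))  # later passes skip the processed one
```

Each pass only does regex work on notes that have not been handled yet, which is the point of the flag.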