Advice on importing text with attributes


(David) #1

I am new to TB and preparing to begin using it for large-scale academic note keeping. I’ve done the help example document project (with books and films) and now I want to move on to a small-scale preliminary project, albeit with genuine value and data.

I’d like advice on how to get a large block of text into notes with some user attributes set. The plaintext data looks like this:

Parker 61 Rage Red, (F nib), Deluxe lustralloy with gold clip cap*
eBay £45

Parker 61 Grey (F or M nib?) Nickel and Silver Legacy cap (nickel in the “lines”)
eBay $75+$7

Parker 61 Vista Blue (Stub nib?), Heirloom, green and orange/rose*
(loose arrow) pre-1962
eBay £28+£3 postage

I would like to have each of those three pens become a note and I would like to populate some user attributes with values gleaned from the text, e.g. brand, nib, price, which I can extract using regular expressions. Any residue (i.e. text not extracted into attributes) I’d like to leave in the note text (although I don’t mind if all the text stays too).

I understand that I could paste all the text into one note; then explode that note using the paragraph boundaries to get one note per pen. I recall that I can use the explode settings to set an OnAdd that will give all the notes a prototype, and that the prototype can have the user attributes I want my notes to have.

How, though, do I then extract the values from the note text and assign them to user attributes? Should I do this with an Agent that munges the notes, or a Rule inherited from the prototype? Should I (can I) try to do all the extraction and attribute setting in one Agent/Rule script, or do one attribute, then change the script, until done? What is the general form of a script that searches the note text to extract a value into a key attribute?

I am willing to massage the data in a text editor beforehand using regular expressions and the like. If so, is there something I can do to make this much easier or more reliable (e.g. prepend some delimiter)? For example, I could probably munge it all into CSV or TSV, but will that help with my attribute setting?
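To make the pre-processing idea concrete, here is a minimal Python sketch of the kind of regular-expression extraction described above, run outside Tinderbox. It is only an illustration: the two patterns and the field names Brand, Nib and Price are assumptions tuned to the three sample records, and real data would almost certainly need more cases.

```python
import re

SAMPLE = """\
Parker 61 Rage Red, (F nib), Deluxe lustralloy with gold clip cap*
eBay £45

Parker 61 Grey (F or M nib?) Nickel and Silver Legacy cap (nickel in the “lines”)
eBay $75+$7

Parker 61 Vista Blue (Stub nib?), Heirloom, green and orange/rose*
(loose arrow) pre-1962
eBay £28+£3 postage
"""

# Hypothetical patterns, written only against the three sample records.
NIB_RE = re.compile(r"\(([^()]*?)\s*nib\??\)")      # "(F nib)", "(Stub nib?)"
PRICE_RE = re.compile(r"[£$]\d+(?:\+[£$]\d+)?")     # "£45", "$75+$7"


def parse_record(block):
    """Turn one blank-line-delimited record into a dict of field values."""
    lines = block.strip().splitlines()
    first = lines[0]
    nib = NIB_RE.search(first)
    price = PRICE_RE.search(block)
    return {
        "Name": first,                       # first line becomes the note name
        "Brand": first.split()[0],           # assumes the brand is the first word
        "Nib": nib.group(1) if nib else "",
        "Price": price.group(0) if price else "",
        "Text": " ".join(lines[1:]),         # remaining lines stay as note text
    }


def parse_all(text):
    """Split the pasted text on blank lines and parse each record."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [parse_record(b) for b in blocks]


if __name__ == "__main__":
    for record in parse_all(SAMPLE):
        print(record)
```

Records that match no pattern simply get an empty value, so nothing is lost: the untouched lines remain in the Text field.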

Thank you for any advice you can lend me.

David.


(Paul Walters) #2

I do this sort of thing frequently. My preferred method is CSV, and my preferred source is a Numbers** worksheet. Set up your sheet with a header row containing a column called “Name” (usually the first column), an optional column called “Text” (if you are adding to the text body), and other columns whose names correspond to attributes that already exist or that you want Tinderbox to create on import. Just be sure to follow the Tinderbox guidelines for attribute naming.

Here are the steps I use:

  1. In a text editor or word processor, edit the text into tab-delimited form. The method depends on the text, but I believe you understand the point.
  2. Select the text and paste it into Numbers.
  3. Do whatever post-processing you need to do in Numbers to add the header and column titles as mentioned above.
  4. Either export the range from Numbers to CSV, or copy the range + headers.
  5. Either import the CSV to Tinderbox or paste the copied range.

You should end up with nicely-formed notes, with the attributes you want, and values. If your import caused Tinderbox to create new attributes, you’ll possibly need to use the User Attribute inspector to adjust the attribute types. Tinderbox makes its best guess on type, but cleanup is sometimes needed. You can also assign prototypes to these notes, of course.
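If you would rather script the table than build it by hand in Numbers, a small Python sketch along these lines could produce the same layout: a header row with Name, an optional Text column, and one column per attribute. The attribute names Brand, Nib and Price and the sample values are assumptions taken from the pens in the question, and I have not verified how Tinderbox treats every delimiter, so try it on a toy file first.

```python
import csv
import io

# Name first, then the optional Text column, then attribute columns.
# Brand, Nib and Price are hypothetical user attribute names.
COLUMNS = ["Name", "Text", "Brand", "Nib", "Price"]

# Sample rows, hand-derived from the pens listed in the question.
ROWS = [
    {"Name": "Parker 61 Rage Red", "Text": "Deluxe lustralloy with gold clip cap",
     "Brand": "Parker", "Nib": "F", "Price": "£45"},
    {"Name": "Parker 61 Grey", "Text": "Nickel and Silver Legacy cap",
     "Brand": "Parker", "Nib": "F or M", "Price": "$75+$7"},
    {"Name": "Parker 61 Vista Blue", "Text": "Heirloom, green and orange/rose",
     "Brand": "Parker", "Nib": "Stub", "Price": "£28+£3"},
]


def to_table(rows, delimiter="\t"):
    """Return the rows as delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Tab-delimited output, similar to a range copied from a spreadsheet.
    print(to_table(ROWS), end="")
    # Pass delimiter="," for CSV; fields containing commas are then quoted.
```

Using the csv module rather than joining strings by hand means fields that contain the delimiter get quoted automatically, which matters once the Text column holds free-form notes.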

By the way, if your data originates on a web page and is displayed there in a tabular format, it is frequently possible to copy the table from that page and paste it into Tinderbox as notes. Or paste the table into Numbers, clean it up, and then proceed from step 4, above.

Here is more info on this and other import methods.


**CSV exports and/or selected & copied ranges from Excel are not behaving well in Tinderbox. YMMV, but this seems to be an Excel bug in recent versions of that application.


(David) #3

Thanks Paul, that is very helpful. I have Numbers too and could munge things into CSV.

However, the Import page on aTBref is not clear about whether or how text is assigned to the note itself. If I were to set a column name to Text would those fields become the $Text of the note? Or is the note’s text set to the whole “row” of the incoming data? Or is there no way to set the note’s text by this method?

Thanks again for the speedy and detailed reply.

David.


(Paul Walters) #4

Pictures == 1000 words

Source

Result: when copied and pasted


(David) #5

Right. Nice one, I should have just done a toy test.

Thanks for that, looks like I have what I need.

Thank you.
David.