I have a 1.2 MB XML file -- how can I get it into TBX?

(Jake Bernstein) #1

Yea, fairly straightforward question. Publicly available XML file that I want to get into TBX somehow:

XML NIST SP 800-53 Controls (Appendix F and G)

Any ideas?

(Paul Walters) #2

Is the tab-delimited version at this page the same content?


Tinderbox imports tab delimited content.

(Jake Bernstein) #3

Yes it’s the same content. Just drag 'n drop then?

(Mark Anderson) #4

In the majority of cases Tinderbox import involves drag-drop or blind-pasting into the view pane. This can be initially confusing, as there is no menu command indicating how to do it. File -> Open is generally intended for opening TBX files. To start a new file based on existing external data, make a new Tinderbox file and then drag-drop the data file into it.

For more on importing Tab-delimited (or CSV) data, including considerations about attribute auto-detection and naming, see here and here.

(Paul Walters) #5

Not directly. I worked with that file you said was the correct content and found:

  1. Right click the download link for “tab delimited” file and save the file to downloads.
  2. It will be created as a .txt file, so you need to change the extension to .tsv
  3. Open that file in Excel – it is a longish file (1,600+ rows) and needs some data cleansing before importing to Tinderbox. The columns “FAMILY”, “TITLE”, “PRIORITY” and “BASELINE-IMPACT” are blank on most rows; a value appears only on the row where it changes. (Just look at the file – it’s obvious what happens.) When the document is imported, Tinderbox will not have a clue what to do with the missing values, so you need to pre-process the file to copy the missing values into the correct cells. (For what it’s worth, the XML has the same “issue” because of the way it handles parent-child relationships – it’s not a bug, it’s just the way the data were extracted from NIST’s data store.) Fixing 1,600 rows is a minor annoyance and doesn’t take all that long.
  4. After cleansing the data, and possibly changing the column names if you wish, select all the applicable rows/columns, copy, and paste into a Tinderbox file. Tinderbox will create attributes based on the column headers, and a note for each row with Key Attributes (KAs) for each of the new attributes. You might need to go into the user attribute inspector and fix some of Tinderbox’s guesses for attribute types for the attributes it created.
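
If you would rather not fill the blanks by hand in Excel, step 3 can be scripted. The following is only a rough sketch in Python, not something tested against the real NIST file: the four column names are the ones mentioned above, while the function names and the tiny `SAMPLE` table are made up for illustration. It copies the last non-blank value down each listed column. Be careful: if a blank is a legitimate value for some rows, a blind fill like this will put the wrong value there, so check the result before importing.

```python
import csv
import io

# Column names taken from the post above; adjust to match the real header row.
FILL_COLUMNS = ["FAMILY", "TITLE", "PRIORITY", "BASELINE-IMPACT"]

# Invented sample data, only to show the shape of the problem.
SAMPLE = (
    "FAMILY\tNAME\tTITLE\n"
    "ACCESS CONTROL\tAC-1\tPOLICY\n"
    "\tAC-1a\t\n"
    "AUDIT\tAU-1\tAUDIT POLICY\n"
    "\tAU-1a\t\n"
)


def forward_fill(rows, columns):
    """Yield a copy of each row, filling blanks in `columns` with the last value seen."""
    last = {}
    for row in rows:
        filled = dict(row)
        for col in columns:
            value = (filled.get(col) or "").strip()
            if value:
                last[col] = value                 # remember the newest non-blank value
            else:
                filled[col] = last.get(col, "")   # copy it down into the blank cell
        yield filled


def fill_tsv(text, columns=FILL_COLUMNS):
    """Return tab-delimited `text` with the given columns forward-filled."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # skip stray extra cells instead of raising
    )
    writer.writeheader()
    writer.writerows(forward_fill(reader, columns))
    return out.getvalue()
```

Calling `fill_tsv(SAMPLE, ["FAMILY", "TITLE"])` returns the same table with the blank FAMILY and TITLE cells filled in; for a real file you would read the text from disk, write the result back out, and then paste or drag that into Tinderbox as in step 4.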

My point is that it is strongly recommended when working with internet data sources that you first examine the data (e.g., in Excel or Numbers, etc.) to see if it needs cleansing before importing. It is frequently the case that it is a lot easier to fix data in Excel before a bulk import into Tinderbox than it is to slog through thousands of notes determining what’s wrong. Here, if you had imported the file directly you would have had hundreds of notes with missing values for attributes.

(Mark Anderson) #6

This ^^^^^^^ - yes!

Some data cleaning is often required. Actually, unless you are making the data yourself and understand the workflow & tools, it’s usually required. To those of us used to such annoying but necessary steps, it’s easily overlooked.

Note to self, check aTbRef notes on TSV/CSV import and properly reflect this aspect for those less used to it. CSV, less so TSV, is a very variable ‘standard’. Lots of tools/apps do this structure differently, i.e. dealing with (or not) commas/tabs/line returns within ‘cell’ data within the overall TSV/CSV table.
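
To make the "tabs/line returns within cell data" point concrete, here is a small Python illustration. It only shows how Python's standard `csv` module treats such a cell, and says nothing about how Tinderbox or any other app parses the same text; the sample row is invented. A cell containing a tab and a line break survives a quote-aware reader, but is broken apart by a naive split on newlines and tabs.

```python
import csv
import io

# Invented sample: the second cell contains both a line break and a tab.
rows = [["Name", "Text"], ["AC-1", "line one\nline two\twith a tab"]]

# Write as tab-delimited text; the csv module quotes the awkward cell.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
encoded = buf.getvalue()

# Naive parsing: split on line breaks, then on tabs. The quoted cell is torn apart.
naive = [line.split("\t") for line in encoded.splitlines()]

# Quote-aware parsing: the cell comes back in one piece.
parsed = list(csv.reader(io.StringIO(encoded, newline=""), delimiter="\t"))

print(len(naive), "rows from the naive split")
print(len(parsed), "rows from csv.reader")
```

Tools that write or read the table without this kind of quoting are one reason the same "TSV" can import cleanly in one app and fall apart in another.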

(Jake Bernstein) #7

So great, Paul, thank you!!

(Andreas Grimm) #8

Just to make sure I get it: xml-files can’t be imported by drag and drop!?

One has to do some steps in between … which, to be honest, I still don’t understand according to what I read here.

Any suggestion for easy digestion?

(Paul Walters) #9

Drag a Tinderbox .tbx file into another Tinderbox document. See what happens? You just dragged an XML file into Tinderbox. Tinderbox did nothing except create a note with the name of the file and no content.

Open an XML file (or .tbx file) with TextEdit. You’ll see a structure that is easily read in plain text, but that also relies on either the application that created it, or on a custom routine to parse it. Either way, apart from opening its own .tbx files in the normal way, Tinderbox has no clue what to do with someone else’s XML when it is dragged in.

(eastgate) #10

XML files can represent all sorts of things: bank transfers, places on a Google Map, entries in Books In Print, software configurations, ancient manuscripts. Tinderbox needs to know how you’d like to map a specific XML file onto Tinderbox’s own constructs.

Some specific XML formats – OPML, Tinderbox color schemes – are imported to Tinderbox often, and Tinderbox handles them more or less automatically. In other cases, you’ll need to transform the XML into a Tinderbox file, or into another familiar format.
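
As a rough idea of what "transform the XML into another familiar format" can look like, here is a minimal Python sketch that flattens a simple XML file into tab-delimited text of the kind discussed earlier in this thread. Everything specific in it is an assumption for illustration: the `SAMPLE_XML` structure, the element names (`control`, `number`, `family`, `title`) and the function name are hypothetical and are not the real NIST schema, which among other things uses namespaces and nested elements that would need extra handling.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; a real file's element names and nesting will differ.
SAMPLE_XML = """<controls>
  <control>
    <family>ACCESS CONTROL</family>
    <number>AC-1</number>
    <title>POLICY AND PROCEDURES</title>
  </control>
  <control>
    <family>AUDIT AND ACCOUNTABILITY</family>
    <number>AU-1</number>
    <title>AUDIT POLICY</title>
  </control>
</controls>"""

# Child elements to pull out of each record; these become the column headers.
FIELDS = ["number", "family", "title"]


def xml_to_tsv(xml_text, record_tag="control", fields=FIELDS):
    """Return one tab-delimited row per `record_tag` element, with a header row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for record in root.iter(record_tag):
        # A missing child element becomes an empty cell rather than an error.
        writer.writerow([(record.findtext(f, default="") or "").strip() for f in fields])
    return out.getvalue()


print(xml_to_tsv(SAMPLE_XML))
```

The point of the sketch is the mapping decision: you choose which element is "one note" and which child elements become columns, which is exactly the information Tinderbox cannot guess from an arbitrary XML file.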

Of course, if your workflow depends on an XML format that you think has general interest, we’d love to know about it; we may well be able to support it directly.