File-crunching, Importing, Notes-crunching


(Andrew McDowell) #1

I am very interested in Tinderbox but need to determine whether it can do a couple of things I need before I go with it. I have 25 years of notes/journals in Word, .txt and Vim formats in files ranging from just a few KB up to about a MB or two in size. No huge files. A file may have just a few items/notes up to a few hundred. Individual items/notes are from one paragraph to a few pages. Most of the Word files are outlines with inline text (Word can do this but I understand TB cannot?).

My project: I want to build a ThoughtBase from the material in these files (maybe a few hundred files altogether with many thousands of individual items/notes/thoughts). TinderBox may be the best way to go (opinions welcome), but here are my questions:

I want to import this material into TB, breaking it up on the fly into individual items, each inserted per my header specs into TB’s system. I don’t know if TB (or any other software) can do this. It would have to read and parse a file, separate out items, initiate separate TB items complete with headers, tags, etc. To do this, I anticipate laboriously going through the files, creating headers for each item (or some such process), perhaps inserting separator flags of some sort, then running an agent trained to deal with such a file. Preferably the agent could tackle an entire directory of a few hundred such files in one pass. Maybe I’d need to convert all these files to .txt format, but if TB can import a Word outline file and retain the outline format that would be a plus (not a deal breaker if it cannot).

Can Tinderbox do all this? Or if TB cannot (by itself) is there any way I can make it happen with the help of other software? Alternatively, if TB can simply import the contents of these files en masse (hopefully a directory load of them in one go), then maybe I could do all this separating and massaging within TB. Possibly that would be just as good or better? TB users would have a much better idea than I on this. Maybe there are tools in TB that could speed this painstaking process?

My second question: Once this material is in TB, I’ll be slowly classing, linking, sorting, updating, combining, etc, including in ways I have yet to learn about, to integrate all these items into a usable ThoughtBase. I’ll need effective search methods of course, but I also want to be able to ‘collect’ a set of items (by way of search or other TB magic), and ‘stack’ or otherwise assemble them, then manipulate, combine, split, order and reorder, edit, etc, to form one or more new items, update or otherwise massage. What are TB’s capabilities along this line? Ability to effectively collect, organize, manipulate and massage all this material (thousands of notes written over 25 years) is key to this project.

Hmm, it looks like BBEdit might offer some helpful capabilities for working on this project, perhaps prepping files for TB to import. Looks like it’s a FILE cruncher/editor (as opposed to a mere TEXT editor), which is something I’ve been wanting to find! Any comments? Or any other software I should know about?

By the way I don’t have Mac OS yet (never have). I’m Hackintoshing my computer for this purpose. Main reason: The most promising-looking software for my peculiar style of thinking/journaling/note-taking seems to mostly be Mac software. Also, Microsoft hijacked my computer and installed Win10 without my permission. I’m a little peeved about this kind of thing and sort of want to jump ship.

TIA for any help you can give.
Andy


(eastgate) #2

It’s a tall order. I think, reading between the lines, that you understand this, and that you’re prepared for a certain amount of work classifying your notes.

Given that, yes, I’m pretty sure Tinderbox can manage.


(Mark) #3

Hi, you’ve had a succinct yet informative answer from ‘he who knows’ - my $0.02 would be to trust TB to handle emergent structural needs - I have found adding and subtracting (or simply ignoring) key attributes so intuitive that I avoid thinking too much about structure when starting projects.


(Mark Anderson) #4

I’d concur with the previous posts, especially having done and assisted with a number of into-Tinderbox transfers of text generated elsewhere. A few observations arising…

I’m uncertain as to your concept of ‘inline text’ in this context. In Word’s outline view each paragraph is a discrete outline item (regardless of level). There is no discrimination between the object’s title and any ‘inline’ text except in the eye of the reader. Thus you might presume the object’s title to be the first sentence and the remaining text, if any, to be inline ‘text’ but I don’t think that’s Word structure per se. This is relevant when it comes to the transfer process. You need to think about the actual structure (indentation, styling, etc.) of the text as opposed to simply how it looks in the Word UI as that structure is what you have to use during the transfer process. I suspect this rather than the overall volume of notes will be the part requiring some trial and error. Once figured, scaling up is simply applying automation where possible to the process.

Tinderbox’s outline view will show a note’s title ($Name) and wrap it if necessary to fit the view pane (left pane of a document window). The inline text would, in normal Tinderbox practice, form the body text ($Text) of the note, which is displayed, for the selected note, in the right-hand pane of the Tinderbox document window. Whilst you can use long titles (i.e. so more is displayed in the outline view), I find it overloads the view, making it busy and robbing it of clarity (of course that’s only a personal view - others may differ).

Tinderbox can ingest a Word (docx) document and retain most formatting (I think this support is in v7+, so older documentation may not describe this). Tinderbox notes’ text is an RTF text space allowing for styled text, rulers, etc. However, depending on the degree of formatting (desired for retention) you may do better transferring plain text, which allows more scope for moving data in bulk and doing some re-organisation en route. And yes, BBEdit is a very useful tool in that workflow (I use it a lot). In terms of your question #1, I think you have a choice. Assuming the Word docs aren’t complex you probably can drag them in a batch at a time but I do wonder if that’s best. Sometimes the look/layout of the text simply impedes comprehension. My gut feeling is that just importing the existing Word docs into Tinderbox notes will obfuscate the real task of interlinking and analysing the data. To give a better idea I’d prefer to see source docs indicating the style/amount of the source (Word) content.

Question #2. “What are TB’s capabilities along this line?” Many and varied. Tinderbox is essentially a toolbox for notes (thoughts), supporting a variety of styles of work and thinking. Few if any will use every view type or feature in the app. Instead, people tend to use a subset of features that fit their style of thought, or the task at hand (the feature set needed may vary by task). Rather than ingest everything and then try to figure out structure, I’d import a small set of representative data and experiment with ways you can organise and link the data to fit your needs. Doing this will also help inform how you import the data, as you may wish to use that process to attach extra metadata to the notes at the time of import (put another way, adding extra ‘fields’ of data).

Besides BBEdit, if you don’t directly import Word docs you may also find yourself using VBA (on Windows) or AppleScript (Mac) to automate text export from your source word documents.

HTH.


(Dominique Renauld) #5

I don’t know if my experience can be useful for your project, but I’ve been taking notes and reading notes with Tinderbox for years now and it works very well when I need to search for specific notes, whether with cmd + F or using the Attribute Browser. As you can see in those two screenshots, I can search across several files I have created since 2011 and select which container I want to look in.


(Paul Walters) #6

Perfect project for Tinderbox – be prepared to invest considerable time. Not because of Tinderbox – but because of the nature of these efforts.

Seems like you have two phases in mind: “the grind” and “discovery”. The grind is the grunt work of ingesting, breaking down, creating rough groupings of notes, etc. This is all in prep for the discovery phase where you go deep into finding relationships and mining your history for gems and insights. You might consider getting a trial of DEVONthink Pro Office, which is not cheap but offers lengthy trials before deciding if the investment is worthwhile. DEVONthink can accept all your files, convert Word to RTF or plain text when needed, and help you aggregate / disaggregate your notes into groups. If you’re into tags, you can tag your notes.

From there you can drag your notes into Tinderbox for the exciting work (which DEVONthink is not good for, IMO) of discovery. If you used tags in DEVONthink they will carry over to Tinderbox if you drag note files into Tinderbox.

That is how I do projects like yours.

(I think I would take a subset of the external note documents – maybe a six-month segment, or a topic-oriented segment – and experiment first. Twenty-five years of data can be daunting unless the project is sliced into chunks.)


(James Vornov) #7

I would agree with Paul that collecting all of the files in DEVONthink is a useful intermediate step. The question I’m always asking myself as I accumulate references, notes, writings and drafts is whether the material is reference material or current work.

I try hard these days to keep an active TBX map as current work, pushing old notes into archive containers where they are easily found. All of my reference is in DEVONthink and it’s easy to link back to those materials in my TBX notes. DEVONthink is nice because it will reference the files on disk and I can leave the originals alone, sorting and organizing in DEVONthink itself.

It probably has to do with the way I work and my tendency to get bogged down with old drafts and outlines when I’m trying to produce new work. I like clean sheets of paper where I can look at references, but then think and write with a fresh point of view.


(Andrew McDowell) #8

Word outline view provides for a ‘text’ level, which appears in italics I think (it’s been a while). You can have a text entry (as many paragraphs as desired) under any heading. They can be folded under so you don’t see them. You can raise the level of a text entry to that of a heading, or lower the level of a heading to that of a text entry. I don’t know that Word calls it inline text, but that’s what I mean by it, perhaps erroneously (as far as terminology goes).
It’s a nice feature for a single-pane outliner, and not many outliners have it, but the outline thing will not be a deal-breaker. I’ll play with it and see what happens. Thanks for the comment on BBEdit. Think I’ll try it. I know I’ve got a big learning curve ahead of me, and a big work load once I get the system put together. But I don’t have a deadline (except that I’m aging!) so I’ll peck away at it.


(Andrew McDowell) #9

I’m going to go for it. I’ll likely need detailed help along the way, but this community is clearly accessible and helpful!


(Mark Anderson) #10

BBEdit used, until very recently, to have a free sibling called TextWrangler. The two are now combined as BBEdit, with some extra features being only available in the paid-up licensed version. I mention this as many articles around online may refer to ‘TextWrangler’, in which case now just read that as ‘unlicensed BBEdit’.

I pay for the app as I use it a lot, although I mainly use the features available for free. IOW, you should be able to get by with the (non-expiring) BBEdit demo for your data transfer until you know if you want/need to licence it.


(Andrew McDowell) #11

From your comment and jjvornov’s and others I’ve seen, it appears DEVONthink partners well with Tinderbox on projects involving a lot of files. One concern I have about going to a proprietary file format is just that. I’d really like all my text data backed up in a standard and eternal file format like plain text, which will never be obsolete and will always be readable by almost any editor. Would there be a way to create external file backups for all my text data in Tinderbox? Don’t know if DEVONthink might come into play with such a system.
Also, BBEdit. Clearly a very useful, capable tool (is it free?). What does it do that DEVONthink doesn’t (or does better than DT)?
Thanks, Andy


(Douglas Johnson) #12

Building on Mark A’s thoughts, I found BBEdit invaluable. First, the material to be moved into TBX was put into a series of text files. Then, BBEdit facilitated “cleaning up” or “preparing” the content so it imported smoothly with a minimum of fuss in TBX. Cleanups included line breaks, indents, paragraph breaks, URLs, etc. It took a bit of thought to examine the patterns in my text and create small scripts and saved “Find & Replace” operations, but it was far easier than I was led to expect. For example, working with the Import capabilities in TBX, I could easily control where separate notes were going to be created. Best of all, the content is now in TBX and 100% usable. The project converted files from the now-defunct Notebook application.

I, too, found BBEdit valuable enough to obtain a license. Don’t be surprised, however: it is powerful enough that you will need to get up to speed with it. The support materials are terrific.


(Paul Walters) #13

DEVONthink does not use any “proprietary formats” – it stores documents internally in a database (or “indexes” external files) in those files’ native formats. Word as Word, Excel as Excel, and so on.

Tinderbox files are XML – you can store Tinderbox files inside DEVONthink if you want. (I always keep my Tinderbox project files in their related DEVONthink database.)

You can export data from Tinderbox in a large variety of formats – plain text, HTML, OPML, RTF, DOC – and you can invent your own export formats if you wish.

DEVONthink and BBEdit are different categories – DEVONthink is for document management and discovery; BBEdit is for plain text editing and manipulation. It is very much NOT free (nor is DEVONthink). Sorry – really great Mac apps have a price of admission which is sometimes high because they are really great apps.


(Mark Anderson) #14

Concur. BBEdit, for full use of all features does require a licence. However, since the unification of BBEdit & TextWrangler (c. BBEdit v11?), most features are available in the demo mode (as it now fills the old ‘free’ TextWrangler role); the demo doesn’t time-expire. Of course if you do use the app and find it worthwhile you should purchase a licence.

As the BBEdit/TextWrangler pairing has been around for a long time, many references on this are out of date (Note to self: check aTbRef is updated on this).


(eastgate) #15

Tinderbox files are XML, and are quite easy to parse or even to edit by hand. XML is robust; short of nuclear apocalypse (alas not nearly as unthinkable now as in recent years!) you’ll be fine.

DEVONthink uses Core Data, which is deeply baked into macOS, as well as a variety of standard formats (RTF, text, pdf, jpg) for data it contains. Again, you might be vulnerable to apocalypse or to a generation-length shift in environment, but all of your Macintosh work (and everyone else’s) shares this vulnerability.


(Andrew McDowell) #16

Thanks again, glad to ditch my worry about proprietary file formats. Notwithstanding my current almost complete ignorance of all things Mac, let alone Tinderbox, DEVONthink and BBEdit, I feel very inclined to go with those three for this project. I don’t want to lean too much on the community here before I’ve even got Mac OS up and running, but I would appreciate some overview concepts on how DT and TBx best work together, how giving the role of document manager to DT aids the TBx project, etc, and any deeper comments about the uses of BBEdit, which I gather is a powerhouse of a file editor.
With my project, it seems I can either do my splitting (dividing the text in ‘large’ files into numerous text items) in TBx, or before importing. Which I do may make a significant difference in labor as the project goes on, so any thoughts on ‘best practice’ for this part of the project would be welcome.
Ok, one more question: Does TBx read/import Vim files? I’ve taken to doing all my journaling in Vim. Like it a lot.
Thanks, Andy


(Paul Walters) #17

It sounds like @andyjim has a lot of files that need breaking down into pieces. BBEdit’s “Text Factory” is excellent for that type of work. Unlike DEVONthink, which has several user-contributed scripts for exploding files, Text Factory is an actual purpose-built and developer-supported text manipulation tool.


(Andrew McDowell) #18

So it sounds like BBEdit may be the optimum tool for dicing the files into individual text notes and perhaps adding the headings required by TBx. I expect these files will need to be prepped manually with separator characters (and anything else?). Then maybe insertion of header fields and any other characters required by TBx can be automated/scripted with BBEdit (btw I’ve never done scripting either, but that’s ok)? Then at that point I would go through the files again, manually entering the header content for each item/note. And maybe at this point the file is ready for import to TBx and TBx can break it out into individual notes decked out with the tags, etc that I want? Is there anything else that it would pay to do prior to import? Anything that once I get into TBx I might wish I’d done at this point?

And one more pre-flight double check: Is doing this prep work with BBEdit really a more efficient way to go about this than using tools available in TBx (of which I know nil) if I were to just import the raw data files as is and then do the same work within TBx?

Ok, assuming the item delineating and headering done (whether in BBE and/or TBx), does DEVONthink still play a very useful role in source material (files) discovery/viewing/management/archiving and also in TBx file storage/management? Maybe it will considerably facilitate the files side of the administration and workflow of the project? I’m sure there are probably project administration and workflow tools in TBx too, but I’m not quite ready to dig into that yet. I have other questions about files too, but I’ll hold them for now.

I’m getting excited. Planning to install Mac OS asap to get this under way. The learning curve looms steep, the work load large, but I’ll just go a step at a time, a day at a time. It may take years to complete the project, which has to be relegated somewhat to ‘spare time’ (what’s that?). And of course completing the data entry part is just the first phase. Really I’ve been wanting and needing to do this for many years, but never found the tools. I’m beginning to think I am finally finding them.
Andy


(Mark Anderson) #19

[Disclaimer. There are URL references to aTbRef here, a free online resource. I (@mwra) am its author.]

There are probably 3 ways to pull your data into Tinderbox:

  1. Drag/drop Word files onto the Tinderbox view pane (v6.6.0+). This makes a note with the DOCX’s filename as the note title $Name and an RTF rendition of the file’s text (some more exotic formats may not transfer & I’ve not tested embedded image ingest). As discussed above this probably is the worst route, even if it seems the simplest.
  2. Tab-delimited or CSV ‘spreadsheet’ type import (see here and here).
  3. Plain [sic] Text drag-drop import and Explode.

Of these I think #3 is probably of most use, implying you may want to use some form of Word-based scripting to export the plain text for import. I say ‘probably’ on the assumption you may be wanting to split out elements of each Word doc into >1 Tinderbox note. (We can get into the mechanics of how later, though reading up on Tinderbox’s explode feature will help).
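To make the splitting idea concrete, here is a minimal Python sketch of breaking one plain-text file into separate notes at a separator line. This is not how Tinderbox's Explode is implemented; it only illustrates the kind of pre-import split being discussed. The separator `=====` and the function name `split_notes` are assumptions made up for this example, as is the convention that the first non-blank line of each chunk becomes the note title.

```python
from typing import List, Tuple

DELIMITER = "====="  # hypothetical separator line, chosen only for this sketch


def split_notes(text: str, delimiter: str = DELIMITER) -> List[Tuple[str, str]]:
    """Split text into (title, body) pairs at lines that contain only the delimiter."""
    notes: List[Tuple[str, str]] = []
    chunk: List[str] = []
    # A trailing delimiter is appended so the final chunk is flushed too.
    for line in text.splitlines() + [delimiter]:
        if line.strip() == delimiter:
            # Trim blank lines at both ends of the chunk.
            while chunk and not chunk[0].strip():
                chunk.pop(0)
            while chunk and not chunk[-1].strip():
                chunk.pop()
            if chunk:
                title = chunk[0].strip()            # first non-blank line as title
                body = "\n".join(chunk[1:]).strip()  # the rest as body text
                notes.append((title, body))
            chunk = []
        else:
            chunk.append(line)
    return notes


sample = "First thought\nSome body text.\n=====\nSecond thought\n\nLine A\nLine B\n"
for title, body in split_notes(sample):
    print(repr(title), repr(body))
```

Empty chunks (for instance two separators in a row) are simply skipped, which seems the safer default when the separators are typed by hand.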

If you’ve lots of metadata (or data ‘fields’) relating to your Word docs you may want to generate some tabular data. It’s not possible to merge imported data to existing notes but by seeding your data with some form of unique ID you can import data using methods #2 and #3 then copy data via method #2 to notes generated via #3 before deleting the #2 method generated notes.
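As a rough illustration of seeding a unique ID, the Python sketch below builds tab-delimited text with one row per note. The column names `NoteID`, `SourceFile` and `Year`, the ID scheme (file stem plus a counter) and both function names are hypothetical, invented for this example. It also assumes that the first row of a tabular import supplies the attribute names; check the import documentation before relying on that.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column names; only "Name" is meant to correspond to a note title.
FIELDS = ["NoteID", "Name", "SourceFile", "Year"]


def make_rows(source_file: str, titles: List[str], year: int) -> List[Dict[str, object]]:
    """Build one metadata row per note, with an ID derived from the file name."""
    stem = source_file.rsplit(".", 1)[0]  # file name without its extension
    return [
        {"NoteID": f"{stem}-{i:04d}", "Name": title, "SourceFile": source_file, "Year": year}
        for i, title in enumerate(titles, start=1)
    ]


def to_tsv(rows: List[Dict[str, object]]) -> str:
    """Serialise rows as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = make_rows("journal1998.txt", ["First thought", "Second thought"], 1998)
print(to_tsv(rows))
```

If the same ID is also written into each note's text before the plain-text import, the two sets of notes can be matched up afterwards.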

What’s being discussed here falls in what in common parlance is ‘power user’ behaviour, if only due to the scale (number of source docs). The ordinary user who only has 10s or 100s of notes doesn’t need to get too concerned with this thread.

Tinderbox and/vs DEVONthink. Different tools for different purposes. DEVONthink (of whom there are far more expert users here in the forum) is best thought of as an ‘everything bucket’. I’d say like Evernote but unlike that app DEVONthink is not a roach motel for data - it can both enter and leave. So, rather than try to pull, for example, a PDF into Tinderbox it is better to store it(s location) in DEVONthink and then link to that DEVONthink asset in your Tinderbox note. In your note you write about that asset and using a DEVONthink link you can access the original PDF if needs be. Tinderbox is best at emergent structure/incremental formalisation of text, i.e. finding relationships hitherto unseen. You can put images and things like RTF tables into a note’s text for display/remembrance purposes, but I think (and please correct me fellow forum users) Tinderbox is primarily a textual analysis tool.


(Paul Walters) #20

It depends on your data, which we have not seen but from the initial discussion we might assume is sometimes complex, and certainly evolved over the 25 years of your data collection phase.

As @mwra points out, Tinderbox’s “explode” feature can deal with a lot of tasks involving breaking larger texts into smaller ones. (There are features to go the other direction and aggregate notes, also.) I suggested BBEdit’s Text Factory because it can be very sophisticated when dealing with modifying or breaking down complex documents.

Along the way you might want to pick up some rudimentary skills in regular expressions (RegEx) which can be helpful in the kind of work you are contemplating.
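For a taste of what a regular expression can do here, a small Python sketch follows. It assumes, purely for illustration, that journal entries begin with a line starting with an ISO-style date such as 1998-03-14; your files may well use a different convention, and the separator `=====` and the function name `mark_boundaries` are likewise made up for the example. The pattern finds each such line and inserts a separator line before it.

```python
import re

# Matches a whole line that starts with a date like 1998-03-14.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
DATE_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2})\b.*$", re.MULTILINE)


def mark_boundaries(text: str, separator: str = "=====") -> str:
    """Insert a separator line before every line that starts with a date."""
    return DATE_LINE.sub(lambda m: f"{separator}\n{m.group(0)}", text)


journal = "1998-03-14 Saturday\nWalked.\n1998-03-15\nRain.\n"
print(mark_boundaries(journal))
print(DATE_LINE.findall(journal))  # the captured dates only
```

The same pattern syntax should carry over largely unchanged to the grep-style find and replace in a text editor, though the details of replacement syntax differ between tools.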

I’d insert again a comment: start small. You’re dealing with a repository of 25 years work and you might very well spend the next five years steadily working through your project. You’re also learning a new OS, new software, and, above all, new and occasionally sophisticated techniques. So experiment and get your wind up before marching into the deep woods.