Tinderbox Meetup Video- Saturday, July 22, 2023: A Review of Bookends--Citation and Reference Management--with Jonathan Ashwell

Level Intermediate
Published Date 7/16/23
Type Meetup
Tags 5Cs, 5Cs Learning and Knowledge Management, BibTex, Citation Management, Mendeley, Papers, Reference Management, Research, Tinderbox, Zotero
Video Length 01:38:49
Video URL - YouTube
TBX Version 9.5
Instructor Michael Becker

In this Tinderbox Meetup, we welcomed Jonathan Ashwell, the founder and developer of Bookends at @Sonny Software.

Jonathan and the Sonny Software team have been developing Bookends and leading the field of reference and citation management since 1983 (yes, forty years!). As a result, Bookends is one of the world’s leading citation and reference management software solutions. It is designed for both casual users and the most advanced scholars and professionals striving to gain mastery over the provenance and lineage of knowledge that contributes to their ideation and output. Bookends is a lot like Tinderbox: it has an incredible amount of depth.

In today’s session, Jonathan kicks off his discussion with the history of citation and reference management software, which also includes solutions like BibDesk (BibTeX) (2002), Sente (2004-2016), Zotero (2006~), Papers (2007~2016), and Mendeley (2007~). Once he had covered this history, and what brought him to the field of citation and reference management software, Jonathan dove deep into the power of Bookends. He shows us how to use field flags to granularly control the formatting and output of references in Bookends. We discuss methods and strategies for integrating Bookends in authoring workflows, including using citation keys and processing references in word processing software like Mellel. In addition, some unique Bookends features that Jonathan calls out are its ability to create citation keys, integration with library databases to support search, the Bookends floating citations menu bar that allows for access to Bookends alongside any application to insert citations, the Bookends browser, and more. Jonathan also gave us a glimpse into some soon-to-be-released features, including the ability to highlight and annotate PDFs and other source material. Jonathan also makes some helpful distinctions between privately developed software like Bookends vs. open-source solutions like Zotero. Specifically, he makes some helpful comparisons on breadth vs. depth, quality vs. quantity, and capabilities. We also discussed how to combine the best that Zotero offers with the power of Bookends through the use of Bookends watch folders.

References

The following include references introduced by the participants during the discussion and in the meeting chat.

Please comment

Please help with the development of future sessions by answering the three questions below.

  • What were your top 2~3 key takeaways from this lesson?
  • What do you want to learn next? Learn more about?
  • What exercises would help reinforce your learning?
5 Likes

I think you have the wrong video link here

Ah, yes, you’re right. Fixed. Thx.

For those interested in such things, and just to add a bit of “historical” perspective on citation managers, an overview of the “Bibliographic Marketplace” for scientists in 2002 is found at “Managing references the easy way: software aids reference organization and bibliography creation” (Lab Consumer), available via Gale Academic OneFile.

1 Like

I’m a user of both Tinderbox and Bookends and was reminded that what’s true of Tinderbox is true of Bookends:

  • Don’t look for a magic ‘answer’ button. Decide what you need to do and your process constraints/choices and then pick accordingly. Otherwise you’re drowning in choice whilst assuming your perspective is that of any other user. Tinderbox shows, counter-intuitively, that the latter is less often the case than imagined. We do the same broad over-arching task but in idiosyncratically different ways. Once one flips from complaining the smart kids are hiding the answers to figuring out one’s actual need, enlightenment follows.
  • You don’t have to use every feature in the app. Many of the darker corners are for long-term users with different or legacy needs. Tinderbox is long-lived by software standards and Bookends is near twice its age. The upside is that when you find the basics don’t work for you, you’ll find a wealth of hidden features (as I can attest). If you can’t find them, a quick call to Bookends support will set you straight.
  • I think we over-focus on the ingest/output and forget that a reference manager also assists one to generate clean, i.e. accurate, citation info. Even citation data pulled direct from original publishers can be incomplete or have errors (elisions, typos, encoding errors, etc.). Thus my Bookends file has 20+ (Smart Group) queries running just to catch various types of errors I’ve noticed consistently in ingested citation data.
  • It is easy to conflate clipping (URLs/bookmarks) with ingesting intentional citation data, even with the same outcome in mind—a resolved correct citation. But, speed of action does not necessarily mean accuracy of action.
  • Did I mention Jon Ashwell’s excellent support of his tool? I need it less often now but it’s always fast and helpful. In the world of big-box stores and online retailers, it’s a joy to find tools like Bookends, Tinderbox, and the like where individuals or small teams produce tools that eclipse corporate offerings.

Good citation (i.e. accurate info) is enlightened self-interest for anyone publishing papers or reports. Doing that without a decent Reference Manager is significantly harder (other RM apps are available).

†. I checked my paper copy of the Whole Earth Software Catalog (1984) and I guess Bookends (b.1983 for Apple IIe) probably just missed the survey period for the book. Interestingly, if people recoil at the term ‘everything bucket’ for useful apps like DEVONthink, the Software Catalog (p.90) lists such apps in the category ‘Garbage Bags’, so we’re a bit kinder these days (?).

‡. Wikipedia’s Comparison of reference management software page.

7 Likes

Citation Style Language (https://citationstyles.org) strikes me as the wrong solution to an outdated problem.

Publishers, aka gate-keepers, like ‘their’ house style but this is only the render of the underlying data, for which no agreed, maintained standard exists. RIS is a form of standard but AFAICT has no formal steward, as the vendor who initiated the format is now gone and there is no RFC or ISO, etc. for the format (a bit like CSV, a standard in name only). BibTeX is similarly lacking in rigour and, given that some parts of the LaTeX community are both doggedly pre-Unicode and focused on (paper) typography, this doesn’t bode well.

Good citation would be best served by data references that are machine readable and resolved to legacy [sic] style only at render for (faux) paper formats like PDF. But, to do that we have to break a lot of rice bowls. I don’t actually think that affects Reference Manager tools as they are already (under the hood) databases. IOW, they’ve been ahead of the curve for years even if publishers haven’t noticed.

†. See RIS (file format) - Wikipedia.

1 Like

@mwra, I wholeheartedly agree with all your comments.

One thing I wrestle with is efficiency. There is a balance to be had between efficiency and time-consuming accuracy. For instance, I think, in this meetup, we had a good conversation about the efficiency of the Zotero web clipper compared with the methods used by Bookends. My takeaway was that Jonathan did not disagree with the value of the Zotero clipper; it is just that his team does not have the scale or resources to compete with the open-source community to build and maintain the clipper. This does not make Zotero better or worse than Bookends or Bookends better or worse than Zotero. The tools have their pros and cons.

As discussed in the webinar, I think there is a workflow that can be created to leverage both Zotero and Bookends. For some, this could have far-reaching benefits as one strives for the balance between ease/efficiency vs. quality/depth of the different tools. Each can play their role for different jobs, as needed. As long as one is open to letting the data be free and possibly residing in both places a lot of value can be created, but it very well may come as a result of cognitive cost and time as one figures it out.

Jonathan and I have agreed to meet up and explore the two tools alongside and with Tinderbox to see how they can work together.

I hope to have Jonathan back soon, as I would have loved to see more examples of the use cases and workflows of Tinderbox and Bookends working together, whether natively or through tools like Pandoc. We’ll report back once we’ve made progress.

3 Likes

Great topic! I have Bookends but have not explored the edges of the envelope. I had to work while listening the first time through. It’s worth rewinding and listening again to glean tips.

2 Likes

Having discovered that past ingest had somehow put ISSN data into ISBN fields (spotted via LaTeX error messages in a paper submission), my Bookends just got yet another quality control query using a ‘Smart Group’, in this case an SQL one as I needed regex. As finding the right target field involved a call to support, I’ll share the query here, as seen in the UI when configuring the group:

Code:

user6 REGEX '^\d{4}-\d{4}$'

‘user6’ being the underlying fieldname of the Bookends database that’s normally labelled ‘ISBN’.

The regex looks for 4 digits, a hyphen, then 4 digits. As that can occur within a hyphen-punctuated ISBN, the ^ and $ start and end markers ensure exactly 9 characters form the whole value, which is not valid for an ISBN, so there is no mis-identification.
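For anyone wanting to sanity-check the pattern outside Bookends, here is a minimal Python sketch. It assumes Python's `re` module treats this simple pattern the same way as Bookends' REGEX operator, and the sample values are made up for illustration rather than taken from a real library.

```python
import re

# Same pattern as the Smart Group query: exactly 4 digits, a hyphen, 4 digits.
ISSN_LIKE = re.compile(r"^\d{4}-\d{4}$")


def looks_like_issn(value: str) -> bool:
    """Return True when the whole field value has the 4-hyphen-4 digit shape."""
    return ISSN_LIKE.match(value) is not None


# Hypothetical field values: one ISSN-shaped value and three that are not.
samples = ["1234-5678", "978-0-262-51087-5", "0-262-51087-2", "12345-6789"]
flagged = [s for s in samples if looks_like_issn(s)]
print(flagged)
```

One caveat: a real ISSN may end in the check character X, which this pattern (like the query above) does not flag; a character class such as `[\dX]` for the last position would be one way to widen it.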

So, that cleaned 25 misconfigured records, otherwise hard to easily find amidst >2k references in my Bookends file and it reacquainted me with import filters and the fact you can edit/customise them.

HTH

†. FWIW, for lost historical records the ‘user6’/‘ISBN’ mapping was hard-to-find and the part where I had to dial in for help, as the Fields pop-up in the UI above lists ‘user6’ not ‘ISBN’.

3 Likes

Watching the conversation, I was surprised how fast @satikusala catches up with Jon’s points even though he is not familiar with the software. Yes, I have been a user of Bookends for over 10 years. It is the most sophisticated reference manager on the market. It is more than capable; and as @mwra noted, the developer is a wonderful human being. I have suggested many features over the years, and almost all of them are now part of this great software. That pleases me a lot.

But, @satikusala, you have a point. BE cannot compete with the Translators of Zotero (you call them clippers, but I think translator is the formal name of the tools in Zotero). That is why I am using Zotero as my inbox, to collect references from thousands of websites. Repositories that share pre-print articles are becoming very popular these days as a way to tackle the inaccessibility of published articles. If you want to get a reference from one, it is a matter of days, if not hours, for somebody to write a translator for that repository. Having a scraper for smaller sites like that in Bookends is difficult.

Zotero excels in getting reference data (of various quality) from the web. It is also getting better in a lot of ways. A good PDF reading and annotation capability was added in version 6. That is really great.

Still, BE is miles ahead in many ways. Mark has mentioned the regex search; like him, I use it to catch faulty entries in my references, and for many other things. The Format Manager is another great power that is hard to match in other reference managers. The Format Manager can be used to output anything, in any way you like.

  • BE also has a feature called Term List, which is used to check the consistency of your entries. Some entries might have an author named as Anderson, Mark C., while in other entries he might appear simply as Anderson, Mark. The same goes for journals, book series, etc. You catch those inconsistencies using the Terms menu under Windows.
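To make the kind of inconsistency concrete, here is a rough Python sketch of the idea. It is not how Bookends' Term List works internally; it simply assumes names are stored as 'Last, First' and groups them by surname plus first initial, and the function names and sample data are hypothetical.

```python
from collections import defaultdict


def name_key(author: str) -> tuple[str, str]:
    """Reduce 'Last, First M.' to (lowercase surname, lowercase first initial)."""
    last, _, rest = author.partition(",")
    given = rest.strip()
    initial = given[0].lower() if given else ""
    return (last.strip().lower(), initial)


def find_variants(authors: list[str]) -> dict[tuple[str, str], list[str]]:
    """Return only the keys under which more than one distinct spelling appears."""
    groups = defaultdict(set)
    for author in authors:
        groups[name_key(author)].add(author)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical author list with one inconsistently entered name.
authors = ["Anderson, Mark C.", "Anderson, Mark", "Becker, Michael"]
variants = find_variants(authors)
print(variants)
```

Grouping on surname and initial is deliberately loose, so two different people who share both would also be reported; the output is a list of candidates to review by eye, not a list of confirmed errors.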

All the tools under global change are the true hallmark of BE. I was first convinced to commit to BE because of those tools. They are so powerful. To do those kinds of manipulations in Zotero, you need to know JavaScript. Otherwise, you are out of luck.

You can manage multiple libraries in BE, running them side by side. You can move references from one library to the other easily. If you have a lot of references, that is very important. BE is much faster than Zotero as well.

On my old Mac (MacBook, from 2011; upgraded drive and RAM), loading 4000 references takes longer in Zotero than loading 10000 references in BE. I don’t know if the modernization of the underlying code in Zotero 7 brings a lot of changes. But, in Zotero 6, running over 10,000 references is nearly impossible on slower machines (especially with a lot of add-ons). BE is incredibly fast.

Another great feature of BE, which I think the Tinderbox team could probably learn from, is its exhaustive and clear user manual. I haven’t seen a user manual as well written as BE’s in any other software. I know that Mark’s aTbRef is exhaustive, and that Tinderbox’s approach is a bit more complex to explain. But a bit more work to make things clearer, as in the BE manual, could benefit a lot of users (especially beginners).

Overall, this was a very interesting discussion. I enjoyed watching it.

6 Likes

So glad you’re happy. I’ll be meeting with Jonathan over the next couple of weeks to see if we can come up with a possible Zotero/BE workflow. It should be fun. We’ll report back.

3 Likes

Yes, man. Thank you for making this possible.

That’s excellent, Michael! Thanks for taking that initiative - I look forward to the results!

3 Likes

The way you use Bookends for quality control is very impressive, Mark!
Perhaps it’s a little bit off-topic, but could you share other smart groups/regex searches that you use in Bookends? (if it is not a secret)

2 Likes

Since you ask, I have (checks …) 20 Bookends smart groups addressing reference quality. In context, I always use BibTeX data, where offered, for ingesting data. Pragmatically, I’ve found it tends to have more (field) data and is less prone to format coercion errors.

So:

  1. Citekey errors:

  2. (1.1) No citekey. As I generally write papers in LaTeX, BibTeX export needs a citekey.

  3. (1.2) Uppercase in citekey. My key format is name:year:ad-hoc, all lower case, no diacriticals in ‘name’. Ad-hoc section is normally initial letters of the (short) title—essentially enough to ensure the cite is unique in the database.

  4. (1.3) Period+semi-colon in citekey. I think this was from fixing a bulk legacy import issue.

  5. (1.4) No colon in citekey—see above.

  6. (1.5) No colon\d{4}:\d pattern in the cite key. Actually also addresses the last above, so the preceding test could be weeded.

  7. ‘Times quoted’ in Notes. Drek arising from (mainly, IIRC) pulling info from Google Scholar/Books.
    The rubbish comes along for free: I do not need/want it, as it is out of date as soon as stored (and I have trust issues about over-valuing relative citation counts).

  8. ‘Price in Notes’. Same issue of import source dumping unwanted marketing info in the new record’s Notes.

  9. ‘Proceedings in title’. Avoiding backstory, BibTeX import for some sources dumps the ‘journal’ title onto the end of the article title. This needs to be reviewed and the Proceedings’ volume title info needs cut/paste to the correct field.

  10. Record of ‘InProceedings’ type but no Volume field info. A version of the fail arising in the item above.

  11. Keywords field has ‘DOI’ or ampersand character string. A sign of badly structured import data.

  12. Commas in Keywords field. Bookends expects one keyword per input line.

  13. Escaped Ampersands. Indication the publisher last updated their website in the 1990s.<sigh>. Basically all fields need close inspection for un-needed escaped/mal-encoded characters.

  14. A wider search derived from last above. This could now cover both, but sometimes finer grained checks help to flag publishers who just don’t care how bad their citation data is. IOW, if you see [publisher] is the source you know it can’t be trusted.

  15. minus in page ranges (should be en-dash). I know this shouldn’t matter, but the eye can’t unsee such inattention to detail if let slip into output. Sometimes this is publishers not properly parsing BibTeX, where the LaTeX -- (which renders in LaTeX as an en-dash) is passed verbatim into non-LaTeX data.

  16. Lower case after hyphen. Grammatically correct normally, but in academic publishing in most cases a colon is a proxy ‘title : sub-title’ divider. Again, lazy publishers who don’t update their software are still doing things like storing titles in all-lowercase so their ancient software can do case-insensitive search. Often, you need to source the actual paper to recover the correct title casing. That’s difficult as English is a prime academic publishing language and for many authors it isn’t their first language, so knowledge of things like capitalisation is sketchy. No matter: I try to use the author’s title capitalisation, not whatever bowdlerised version comes out of the publisher’s steam-powered software.

  17. ‘In proceedings’ with no DOI. Mainly me picking up DL.ACM cites where for some reason the DOI didn’t get captured. I feel in the 21st Century there is simply no excuse to not provide a DOI if one is available. TBH, if there is a link on a citation, that’s what most readers will, in the first instance, follow to check a citation. A to-do here is to find all references without a DOI/publisher’s listing that could/should have one. DOI for citing a lost ancient text? No. Citing a scientific paper published last year? No such excuse.

  18. Missing abstract. Not always available, but I occasionally review the list to back fill some I don’t have.

  19. No keywords. Perfectly possible—in terms of author’s own keywords. Who knows what Cicero might have chosen. But for current works they are often available but not passed on.

  20. Caps in keywords. Bookends’s keyword list separates different cases of the same word, which is unhelpful for counts of matches. So, item #11 above notwithstanding, I use all lowercase keys for more accurate (<sigh>) recall.

  21. ISSN in ISBN field. Just, ‘no thanks’. Again, rubbish ingest from poorly configured source export of data.
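A few of the checks listed above can be sketched in Python for anyone who wants to try the idea on exported data. This is only an illustration under stated assumptions: the record is a plain dict, the field names (`citekey`, `pages`, `keywords`, `isbn`) are hypothetical rather than Bookends' internal names, and the citekey pattern is one guess at the name:year:ad-hoc shape described in (1.2).

```python
import re

# Assumed patterns; tune them to your own conventions.
CITEKEY_OK = re.compile(r"^[a-z]+:\d{4}:[a-z0-9]+$")  # name:year:ad-hoc, all lower case
HYPHEN_RANGE = re.compile(r"\d-\d")                    # hyphen-minus between digits
ESCAPED_AMP = re.compile(r"&amp;|\\&")                 # HTML- or LaTeX-escaped ampersand
ISSN_SHAPE = re.compile(r"^\d{4}-\d{4}$")              # 4 digits, hyphen, 4 digits


def audit(record: dict) -> list[str]:
    """Return a label for every quality problem found in one reference record."""
    problems = []
    citekey = record.get("citekey", "")
    if not citekey:
        problems.append("no citekey")
    elif not CITEKEY_OK.match(citekey):
        problems.append("citekey not in name:year:ad-hoc form")
    if HYPHEN_RANGE.search(record.get("pages", "")):
        problems.append("hyphen-minus in page range")
    if "," in record.get("keywords", ""):
        problems.append("commas in keywords")
    if any(ESCAPED_AMP.search(str(value)) for value in record.values()):
        problems.append("escaped ampersand")
    if ISSN_SHAPE.match(record.get("isbn", "")):
        problems.append("ISSN in ISBN field")
    return problems


# Hypothetical record that trips several checks at once.
record = {
    "citekey": "Anderson:2023:abc",
    "title": "Notes &amp; Queries",
    "pages": "12-34",
    "keywords": "hypertext, citation",
    "isbn": "1234-5678",
}
print(audit(record))
```

Keeping each check as its own small pattern mirrors the one-Smart-Group-per-problem approach: a record can show up under several labels, and each label points at a specific fix.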

A weakness Bookends does have is mis-parsing author names passed via BibTeX where, annoyingly, there are two different-but-accepted methods (‘Last, First’ or ‘First Last’). Bookends gets the latter wrong, for reasons I don’t fully accept. This is harder to test/fix via automation. I might spot, if looking, that “Anderson Mark” is probably a transposed name. But a name from a nationality I don’t know? Much harder. Part of the reason, it seems, is that a core/original community for Bookends is in medicine, where they either don’t use BibTeX or use it less, so this mis-parsing problem isn’t pressing. However, I notice all my DL.ACM-sourced names come in correctly and all those from IEEE are transposed, needing manual review and correction.

Do I have to do all the above? No. But the last issue caused me to automate spotting badness where I could. It’s never nice to see a citation with authors’ names reversed because the paper’s author was too lazy to check their reference data. So, I choose to do the above in order that what I cite for others is at least not making already ropey citation data any worse. We otherwise validate the low standards we encounter without comment/correction.

I’m sure there are other tests I could add, but HTH. (Sorry for the long post, but I was asked to ‘share’ my searches).

3 Likes

This is very helpful, Mark! Many thanks - will steal some of this once I’m back from holiday! :wink:

Mark, thank you so much for such a detailed answer! Your use cases are truly remarkable!

I don’t use BibTeX or LaTeX myself. But it’s still interesting to learn about the methods of using Bookends to solve specific problems.

I wonder how you get rid of unnecessary imported data (7 and 8 in your list)? Do you delete it manually?

It would also be interesting to know how you get rid of unnecessary symbols in keywords? (I also recently encountered a similar problem).

Unfortunately, I can’t easily compose queries for SQL/Regex searches. The only one I do is a query to find keywords without line breaks (Bookends uses such breaks as a separator between keywords).
I apologize for the offtopic, but the subject seemed very important. Thanks Mark!

Manually, until I learned to use global edits. But, generally, I’m cautious. There’s scope for doing lots of damage through over-zealousness.

As I generally review often, manual review and editing suffices, and I’ve only c.2.5k records at present. I started using these groups once I realised I couldn’t ‘just’ trust primary sources like publishers to provide clean/complete citation data. The process was realising my records were full of errors, then figuring out how to spot the ones that needed attention, then looking at the size of each task and deciding whether to group replace or review and edit manually. Overall the process was good as it made me think about the import process and start using some of the less obvious features. For instance, you can edit import filters (or make your own) as well as customise output filters.

I don’t use LaTeX (for academic writing) because I like it, but simply because using Word is a bigger waste of time as it has so many bugs once you dig in. For basic writing I’ll use Tinderbox or Scrivener or Nisus, etc.

† I currently use texstudio (https://www.texstudio.org, free) - but there are plenty of others. I also use Overleaf as well if collaborating on a paper.

1 Like

Thanks again Mark! very impressive! There is a lot to learn.

1 Like

I use this regex to catch the nonstandard names (Mark Anderson kind)

authors REGEX '(?m)^(?:(?!,).)*$'

And, then run Standardize Names… in the global toolset to correct them.
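For those less used to regex, here is a small Python sketch of what that pattern does. It assumes the authors field holds one name per line (which the multi-line `(?m)` flag suggests) and that Python's `re` module handles the pattern the same way as Bookends; the sample field content is made up.

```python
import re

# (?m) makes ^ and $ match at every line; the body accepts any run of
# characters that contains no comma, so only comma-free lines match.
NO_COMMA_LINE = re.compile(r"(?m)^(?:(?!,).)*$")


def names_without_comma(authors_field: str) -> list[str]:
    """Return the non-empty lines of an authors field that contain no comma."""
    return [
        match.group(0)
        for match in NO_COMMA_LINE.finditer(authors_field)
        if match.group(0).strip()
    ]


# Hypothetical authors field: the middle name is in 'First Last' form.
authors_field = "Anderson, Mark\nMark Anderson\nBecker, Michael"
suspects = names_without_comma(authors_field)
print(suspects)
```

A name with no comma is not always wrong: institutional authors entered as a single name would also match, so the result is best treated as a review list before running the name standardisation.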

1 Like