Tinderbox Meetup- Meetup 15OCT22 Parsing Highlights in SKIM

Level Intermediate
Published Date 10/15/22
Revision 1
Tags 5CKMEl, 5Cs of Knowledge Management and Exchange, BetterBibTex, Explode, Highlights, PDF, Skim, Stamp, Stream, Zotero
Video Length 01:33:53
Video URL Tinderbox Meetup- Meetup 15OCT22 Parsing Highlights in SKIM - YouTube
Example File TBX L - Meetup 15OCT22 Parsing Highlights in SKIM.tbx (1.1 MB)
TBX Version 9.3
Instructor Michael Becker

For this meetup, we set out to review Michael Becker's method of parsing notes in Tinderbox that were exported from the Skim PDF reader. We accomplished this and more.

We reviewed:

  1. Exporting notes: reviewed highlighting and exporting notes from Skim.†
  2. Importing notes: dragging the exported Skim notes into Tinderbox and parsing them.
     a. Exploding notes: explained the process to explode notes and the applications of RegEx (the optimal RegEx for the explode is ^# .)
     b. Parsing notes: reviewed a parsing stamp produced by Becker. We discussed how to work with the sRGB colorspaces coming out of Skim and how to handle these colors so that TBX can understand them. Note: if you want to use this code in another TBX file you'll need to add $CitationKey, $ColorRGB, and $Type attributes to your file.
  3. Associating notes with a reference: Becker explained his process for using a parse reference function to parse citations dragged into a TBX file from Zotero.‡ (Learn more: see his two videos on this topic: Video 1, Video 2.)
  4. Stamp optimization: we then optimized Becker's stamp. As you'll see in the meetup video, we optimized Becker's original text to remove the need for $ColorRGB.
  5. Stream operator: we then took it even further and created a stream operator that automatically explodes and parses all the Skim notes with one stamp.
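For readers who want to see the explode and color steps outside Tinderbox, here is a minimal Python sketch. It is not Becker's stamp. It assumes each exported note starts on a line beginning with "# " (my reading of the ^# delimiter above) and that Skim reports colors as sRGB components between 0 and 1; both are assumptions, and the function names `explode_notes` and `srgb_to_hex` are hypothetical.

```python
import re


def explode_notes(exported: str) -> list[str]:
    """Split exported text into one chunk per note.

    Assumption: every note begins on a line that starts with "# ".
    """
    # (?m) makes ^ match at the start of every line, not just the string.
    chunks = re.split(r"(?m)^# ", exported)
    # Drop empty chunks (e.g. the text before the first delimiter).
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def srgb_to_hex(red: float, green: float, blue: float) -> str:
    """Convert sRGB components in the range 0..1 to a #rrggbb string.

    Assumption: the components arrive as floats between 0 and 1.
    """
    channels = []
    for component in (red, green, blue):
        clamped = max(0.0, min(1.0, component))  # guard against stray values
        channels.append(round(clamped * 255))
    return "#{:02x}{:02x}{:02x}".format(*channels)


if __name__ == "__main__":
    sample = "# first highlight\nbody text\n# second highlight\nmore text\n"
    for note in explode_notes(sample):
        print(repr(note))
    print(srgb_to_hex(1.0, 1.0, 0.0))
```

If your export uses a different delimiter or color format, adjust the pattern and the scaling accordingly.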

† You'll want to use the Skim export template that Becker has produced. The code is in Skim Template. See the Skim Wiki to learn how to modify Skim templates. This template needs to be saved in the Skim Application Support folder in your macOS Library, e.g. /Users/USERNAME/Library/Application Support/Skim/Templates/TBXnotesTemplate.txt.

‡ For Michael's reference parser to work you must ensure that your file has all the necessary attributes, that your Zotero is using BetterBibTex, and that your Zotero has Michael's customized Zotero export parsers installed. Reach out to Michael via the forum if you need help with this.
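The Templates folder may not exist until you create it, and the user Library folder is hidden in Finder by default. A minimal Python sketch for creating it, assuming the path given in the footnote; the function names are hypothetical and the `home` parameter exists only to make the sketch easy to try in a scratch directory:

```python
from pathlib import Path
from typing import Optional


def skim_templates_dir(home: Optional[Path] = None) -> Path:
    """Return the Skim Templates folder under the given home directory."""
    base = home if home is not None else Path.home()
    return base / "Library" / "Application Support" / "Skim" / "Templates"


def ensure_skim_templates_dir(home: Optional[Path] = None) -> Path:
    """Create the Templates folder (and any missing parents) if needed."""
    target = skim_templates_dir(home)
    # exist_ok=True makes this safe to run when the folder is already there.
    target.mkdir(parents=True, exist_ok=True)
    return target


if __name__ == "__main__":
    print(ensure_skim_templates_dir())
```

Once the folder exists, copy TBXnotesTemplate.txt into it.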

Your Support will Be Most Appreciated

Support Michael Becker:

PayPal Donation, show your support by making a one-time or ongoing donation; or,
Become Becker's Patron, Becker holds a monthly call for his Patrons and offers private Tinderbox support and consulting.

Support Mark Anderson:

Mark Anderson is the author of the Tinderbox reference file (aTbRef) and an avid supporter of the community. A donation to Mark via PayPal to help him continue to develop and maintain aTbRef would be most welcome.

Thanks for this, Michael. I know this video was done a while ago, but that's what they mean about the "long tail" of information, I guess.

I had a little trouble using the instructions with Bookends/Skim import (and was too lazy to go back and watch the video all over again). I put together a mini-TBX with just the Bookends/Skim bits and a step-by-step process in written form.

Key points:
* The Skim export file needs a blank line between the collection array tag and the custom note separator, otherwise the highlight notes run into the next note separator and explode does unintended things
* I modified the note title construction stamp to grab the first 6 words, rather than a given number of characters. Incomplete words in the title are like an itch I can't scratch, LOL.
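To illustrate the second point outside Tinderbox, here is a small Python sketch of the same idea: build a title from the first N whole words instead of a fixed character count. It is not the stamp from the attached file, and the function name `title_from_text` is hypothetical.

```python
def title_from_text(text: str, word_count: int = 6) -> str:
    """Return the first `word_count` whitespace-separated words of `text`."""
    # split() with no arguments collapses runs of spaces and newlines,
    # so the title never ends in the middle of a word.
    return " ".join(text.split()[:word_count])


if __name__ == "__main__":
    print(title_from_text("Incomplete words in the title are like an itch"))
```

Text shorter than six words is returned whole.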

In hopes this will help some other newbie…

BookendsImportTest.tbx (425.2 KB)

Sweet!!! Nicely done. :slight_smile:

I've been adapting Becker's workflow for use with PDF Expert, which I chose because it is available on iPad, iPhone, and MacBook.

I attempted to use the Skim template developed by Becker, but I can't seem to locate or successfully create the directory ~/Library/Application Support/Skim/Templates on my MacBook. Any pointers would be appreciated, as this method was superior in getting the links to point to the exact highlight in the PDF.

The Python script I have shared works but includes unhighlighted or non-underlined text surrounding the intended content. Everything in the frame, I suppose.

Be sure to rename the notes that result from the explode, because of this behaviour: it seems there are specific characters or sequences that, when they appear in a note's $Name or $Text, cause the TBX file to become corrupted and refuse to reopen after saving and closing. For example, this issue occurs when the following sequence is included in the $Text of a note:

$Name=$Text.replace("\[|\{|\}|\\|\]|\||“|”|‘|’|<|>|&|\/|:|=|\||\$|%|ļææ|~|`|#|\^|\*|\(|\)|;|\?|!","")
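For anyone cleaning names before the text reaches Tinderbox (for example, in a Python export script), here is a rough equivalent of that replace. It is only a sketch: it covers the printable characters from the list above, the name `safe_name` is hypothetical, and I have not confirmed that stripping these characters is what prevents the corruption.

```python
import re

# Character class built from the printable characters in the replace above.
UNSAFE_CHARS = re.compile(r"[\[\]{}\\|“”‘’<>&/:=$%~`#^*();?!]")


def safe_name(text: str) -> str:
    """Remove the listed characters and trim surrounding whitespace."""
    return UNSAFE_CHARS.sub("", text).strip()


if __name__ == "__main__":
    print(safe_name("HFpEF (stage B): <50%?"))
```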

The same issue occurred when I exploded and parsed the following JSON, saved it, and then closed the file. This happened even when I changed the names of the resultant notes, such as removing {, for example. Notably, the Json within $Text of a note does not corrupt the file.

[
    {
        "page": 1,
        "text": "the absence",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=1"
    },
    {
        "page": 1,
        "text": "of noninvasive gold standard diagnostic tests for",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=1"
    },
    {
        "page": 1,
        "text": "HFpEF is more apparent than ever.",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=1"
    },
    {
        "page": 1,
        "text": "port protein-2 inhibitor (SGLT2i) trials and their widespread therapeutic implications,2,3 the absence",
        "type": "Highlight",
        "color": "Green",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=1"
    },
    {
        "page": 1,
        "text": "de\ufb01nition, anchored clinically to the syndrome of heart failure (HF) caused by structural and/or func- tional cardiac abnormalities, with HFpEF de\ufb01ned as a left ventricular ejection fraction (LVEF) $50%",
        "type": "Highlight",
        "color": "Green",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=1"
    },
    {
        "page": 1,
        "text": "elevated natriuretic",
        "type": "Highlight",
        "color": "Green",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=1"
    },
    {
        "page": 1,
        "text": "peptide (NP) levels or other evidence of congestion.4",
        "type": "Highlight",
        "color": "Green",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=1"
    },
    {
        "page": 1,
        "text": "prevalence. Relying on easily measured biomarkers of congestion, such as the NPs, misses approximately one-third of all affected patients and may dispropor- tionately affect patients with obesity or African ancestry.5,6 Indeed, HFpEF remains underdetected in",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=1"
    },
    {
        "page": 1,
        "text": "with obesity,",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=1"
    },
    {
        "page": 2,
        "text": "decreasing, the incidence of HFpEF speci\ufb01cally continues to rise (Figure 1).9,10 Across 4 community-",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "w27 cases per 10,000 person-years.11",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "risk of HFpEF at age 45 years is >10% in both men and women.14 Taken together, these data suggest",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "future, affecting approximately 1 in 10 adults during their lifetime. Therefore, cli-",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "(Figure 1).15,16 In one study, women out- numbered men 2:1 with respect to HFpEF",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "lifetime risk estimates of HF: the lifetime risk of HFpEF is nearly double that of HFrEF among women (10.7% vs 5.8%),",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "that average event rates for \ufb01rst HFpEF hos- pitalization were highest among Black women (7.4 per 1,000 person-years [95% CI:",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "6.7-8.1 per 1,000 person-years]) when compared with Black men (6.2 per 1,000 person-years [95% CI: 5.5-",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "White women",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "groups and was particularly pronounced among Black women.17 These racial disparities in HFpEF",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "lower NP levels in Black individuals compared with other race/ethnic groups, which likely leads to underdiagnosis.6",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "for both HFpEF and HFrEF, including older age, hy- pertension, and ischemic heart disease.18 However, it",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "obesity,",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "is important to note that obesity, metabolic dysfunction, and physical inactivity appear to spe- ci\ufb01cally predispose to HFpEF more so than HFrEF",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "A number of speci\ufb01c diseases cause the clinical syndrome of HF in tandem with a normal LVEF, but have their own unique pathophysiology, natural his- tory, and treatments (Table 2). These etiologies",
        "type": "Highlight",
        "color": "Yellow",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "tory, and treatments (Table 2). These etiologies should not be considered to represent true \u201cgarden variety\u201d HFpEF because of their distinct features and treatments, and the present text does not apply to these \u201cmasqueraders.\u201d In this JACC Scienti\ufb01c State- ment, we examine the epidemiology, pathophysi- ology, diagnosis, and treatment of HFpEF in the context of these recognized limitations and the available evidence.",
        "type": "Highlight",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 2,
        "text": "dence rate in 2000-2009 vs 1990-1999. Similarly, the prevalence of HFpEF is increasing and is expected to exceed that of heart failure with reduced ejection fraction (HFrEF) in the near future.12 Speci\ufb01cally,",
        "type": "Highlight",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=2"
    },
    {
        "page": 3,
        "text": "differentially associated with future HFpEF, particu- larly among women vs men.19 In addition, physical",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=3"
    },
    {
        "page": 3,
        "text": "larly among women vs men.19 In addition, physical inactivity was associated with higher risk of HFpEF compared with HFrEF in a dose-dependent manner.20",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=3"
    },
    {
        "page": 3,
        "text": "HFpEF (STAGE B). How individuals progress from risk factors (stage A) to cardiac remodeling and preclinical HFpEF (stage B) and eventual HFpEF (stages C and D) remains incompletely understood. In contrast to stage B HFrEF, which is easily recognized as asymptomatic left ventricular (LV) systolic dysfunction, readily prompting a change in clinical management, stage B HFpEF remains nebulous. The most recent consensus document de\ufb01ned patients with stage B HF as individuals free of HF symptoms, with evidence of structural heart disease (eg, LV hypertrophy, chamber enlargement), abnormal cardiac function (eg, elevated \ufb01lling pressures or diastolic dysfunction), or elevated NP or cardiac troponin levels.4",
        "type": "Highlight",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=3"
    },
    {
        "page": 3,
        "text": "Evolving studies will need to more clearly de\ufb01ne how to apply these criteria to preclinical HFpEF. The diagnosis of stage B HF requires establishing the absence of HF symptoms; however, clinicians may be less likely to rigorously test for symptoms such as ex- ercise intolerance in patients with abnormal cardiac structure/function associated with HFpEF (eg, LV hy- pertrophy, left atrial enlargement, diastolic dysfunc- tion) compared with patients with asymptomatic LV systolic dysfunction. If exercise intolerance is present,",
        "type": "Highlight",
        "color": "Orange",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=3"
    },
    {
        "page": 3,
        "text": "systolic dysfunction. If exercise intolerance is present, determining whether it is due to cardiac vs extrac- ardiac abnormalities can also be challenging.",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=3"
    },
    {
        "page": 3,
        "text": "Furthermore, it is known that NP concentrations are lower among individuals with overt HFpEF vs HFrEF.1",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=3"
    },
    {
        "page": 3,
        "text": "Whether the same cutpoints in preclinical HFpEF vs preclinical HFrEF adequately capture risk remains to be seen, but seems unlikely as roughly one-third of patients with stage C HFpEF have NP levels below typical thresholds used for HF diagnosis.5 A recent",
        "type": "Underline",
        "color": "Orange",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=3"
    },
    {
        "page": 3,
        "text": "typical thresholds used for HF diagnosis.5 A recent study has shown that even among patients where HFpEF has been excluded, an increasing burden of HFpEF risk factors and functional abnormalities based on echocardiography are strongly correlated with he- modynamic and aerobic limitations typical of (but less severe than) those observed in patients with overt, stage C HFpEF.21 As we consider the role of potential",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=3"
    },
    {
        "page": 3,
        "text": "stage C HFpEF.21 As we consider the role of potential preventive therapies for HFpEF, clearly de\ufb01ning preclinical HFpEF will be paramount.",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=3"
    },
    {
        "page": 4,
        "text": "disease to multimorbid patients with obesity, dia- betes, and metabolic syndrome,8,33 focusing greater",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "systemic in\ufb02ammation,",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "endothelial",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "dysfunction,",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "altered myocardial energetics,",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "abnormalities in skeletal muscle.34-37",
        "type": "Underline",
        "color": "Red",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "factors in\ufb02uencing the extracel-",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "lular matrix",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "those intrinsic to the cardiomyocyte",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "itself.37",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "Myocardial \ufb01brosis",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "total collagen volume",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "tissue.38-41 Both collagen type I and type III expres- sion and tissue staining are elevated in HFpEF and",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "reduced collagenase,",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "metalloproteinase-1,",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "metalloproteinase-1, but increased tissue inhibitor of metalloproteinase expression, which may further",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "altering matrix",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "turnover,",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "turnover, cross-linking of collagen including the formation of advanced glycation end products con- tributes to \ufb01brosis and stiffening. Potential mecha-",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "in\ufb02ammation,",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "diabetes,",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "in\ufb02ammation, diabetes, and neurohumoral activa- tion. Although increases in passive stiffness are",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "\ufb01brosis was only present in the minority (27%) of patients.38 Alterations in isotype expression and",
        "type": "Underline",
        "color": "Blue",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "isotype expression",
        "type": "Underline",
        "color": "Orange",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 4,
        "text": "phosphorylation of sarcomeric proteins such as titin",
        "type": "Underline",
        "color": "Orange",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=4"
    },
    {
        "page": 5,
        "text": "passive LV chamber stiffness in patients with HFpEF.34,37,42,43 Heightened pericardial constraint",
        "type": "Underline",
        "color": "Orange",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=5"
    },
    {
        "page": 5,
        "text": "HFpEF.34,37,42,43 Heightened pericardial constraint (eg, due to increased epicardial and pericardial fat in",
        "type": "Underline",
        "color": "Orange",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=5"
    },
    {
        "page": 5,
        "text": "OBESITY-CARDIOMETABOLIC STRESS. Myocardial stiffness estimated based on echocardiography is",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=5"
    },
    {
        "page": 7,
        "text": "Blood and plasma volumes increase with greater body weight,47 a relationship that is steeper in women than",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=7"
    },
    {
        "page": 7,
        "text": "weight,47 a relationship that is steeper in women than men,47 and patients with obesity-related HFpEF",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=7"
    },
    {
        "page": 7,
        "text": "men,47 and patients with obesity-related HFpEF display higher blood volume and greater sensitivity of \ufb01lling pressures on plasma volume.44,48 Even in the",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=7"
    },
    {
        "page": 7,
        "text": "absence of frank volume overload, abnormal distri- bution of blood volume (increased stressed blood volume) due to impaired venous capacitance is pre- sent in HFpEF, and notably is associated with increasing BMI (Figure 2), resulting in even greater",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=7"
    },
    {
        "page": 7,
        "text": "from a contemporary obese HFpEF cohort, car- diomyocyte hypertrophy and \ufb01brosis were common in HFpEF, although the severity of each was mild to",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=7"
    },
    {
        "page": 7,
        "text": "exhibited substantially reduced right ventricular (RV) systolic sarcomere function, but less passive car-",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=7"
    },
    {
        "page": 8,
        "text": "Myocardial work increases and ef\ufb01ciency de- creases with increasing BMI and insulin resistance in",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=8"
    },
    {
        "page": 8,
        "text": "women without HF,51 which may relate to greater myocardial reliance on fat vs carbohydrate oxida-",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=8"
    },
    {
        "page": 8,
        "text": "myocardial energetics are abnormal in HFpEF52",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=8"
    },
    {
        "page": 9,
        "text": "HFrEF and controls, with uniquely up-regulated genes in HFpEF enriched for mitochondrial adeno- sine triphosphate synthesis/electron transport, path-",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "endoplasmic reticulum stress, autophagy, and angiogenesis pathways.",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "HFpEF down-regulated genes included",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "obesity, diabetes, chronic lung",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "disease and hypertension,",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "induce a proin\ufb02ammatory",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "state",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "coronary microvascular endothelial",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "dysfunction develops,",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "leading to downstream reduc-",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "tion in nitric oxide (NO) bioavailability,",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "tion in nitric oxide (NO) bioavailability, cyclic gua- nosine monophosphate (cGMP), and eventual protein kinase G activity in cardiomyocytes. Studies of LV",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "increased oxidative stress",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "depressed NO signaling resulting in in\ufb02ammation",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 9,
        "text": "maladaptive changes",
        "type": "Underline",
        "color": "Purple",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=9"
    },
    {
        "page": 13,
        "text": "dalities. Alternative provocative maneuvers have been tested, including saline infusion and passive leg raise. An increase in PCWP $18 mm Hg with saline or $19 mm Hg with leg raise may also help to",
        "type": "Squiggly",
        "color": "Yellow",
        "link": "file:///Users/stephen/Desktop/borlaug-et-al-2023-heart-failure-with-preserved-ejection-fraction 4.pdf#page=13"
    }
]

Are there specific characters or sequences known to corrupt TBX or XML files?
ExtractingPdfAnnotations.tbx (540.3 KB)


If the Skim folder is not created by default, you need to create it yourself. I manually created the Skim/Templates and Skim/Scripts folders under Application Support so that Skim picks them up automatically.

I remember reading this years ago, then realized (thanks to your prompting) that I needed to do this on my new Mac, and it worked like a charm.
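For anyone who would rather script the folder creation than use Finder, here is a minimal sketch in Python. The function name `ensure_skim_folders` is made up for illustration, and the per-user Application Support location is assumed from the path quoted elsewhere in this thread. It has not been tested against the Finder warning reported below.

```python
from pathlib import Path


def ensure_skim_folders(app_support: Path) -> list[Path]:
    """Create Skim/Templates and Skim/Scripts under the given folder.

    `app_support` is expected to be the user's Application Support folder.
    Existing folders are left untouched.
    """
    created = []
    for name in ("Templates", "Scripts"):
        folder = app_support / "Skim" / name
        folder.mkdir(parents=True, exist_ok=True)  # no error if already present
        created.append(folder)
    return created


# Typical call on macOS (assumed location, not run here):
# ensure_skim_folders(Path.home() / "Library" / "Application Support")
```

The base folder is passed in as a parameter so the function can be tried on a scratch directory first.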

Tom

The action code example looks like the $Text of a stamp note? What is the reason for this regex (which, incidentally, doesn’t properly escape all regex special characters)? Oddest is the ‘replacement character’ (Unicode U+FFFD). If you have that in your input stream, I’d argue you need to do proper sanitisation of your source text before it hits Tinderbox. I normally use BBEdit for such tasks, as ‘manual’ triage/inspection is often needed.

The JSON also looks weird, e.g. line #32 which has this:

"text": "de\ufb01nition, ......

which is one of 35 such issues (all various Unicode encodings). Some decades of data-munging experience suggest the source of this is mis-detection/encoding of ligatures in a PDF. The point to bear in mind is there is no deliberate error here. (Presuming a Skim annotation is the source) the first misstep is trying to encode ligatures (\ufb01 is the ‘fi’ ligature) into a plain text form, presumably because the app’s annotation display font doesn’t support ligatures. The app is trying to help but in an unhelpful way, as the next consumer doesn’t have the same assumptions.

Pragmatically, I’d take the JSON and use a BBEdit text factory to ‘decode’ the ligatures back into usable plain text characters, e.g. replacing \ufb01 with fi, etc.
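The same ‘decode’ step could also be done in Python before the text reaches Tinderbox. The sketch below is only an illustration, not the BBEdit text factory itself: `LIGATURES` and `expand_ligatures` are names invented here, and the table covers just the five common Latin ligatures (U+FB00 to U+FB04), which is an assumption about what the PDFs contain.

```python
# Common Latin ligatures from Unicode's Alphabetic Presentation Forms block,
# mapped to the plain letters they stand for.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace each ligature character with its constituent letters."""
    return text.translate(_TABLE)


print(expand_ligatures("de\ufb01nition"))     # definition
print(expand_ligatures("in\ufb02ammation"))   # inflammation
```

An explicit table leaves every other character alone, so symbols such as µ or ™ pass through unchanged.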

Tip: If you really want the ligatures, Iā€™d do the interchange in an RTF format.

If you simply copy and paste (Cmd+V) text from [other app] into Tinderbox, all sorts of cruft may come along for the ride. Unless you need the source formatting (e.g. a web page’s flaky CSS styling), Paste-and-match-style (Cmd+Opt+Shift+V) avoids a lot of problems.

But if the source text already has things encoded in the text stream that are mis-encoded or shouldn’t be there, then you’ve work to do.

Summary, as tips:

  1. If you hit these sorts of text encoding issues, don’t rush to assume the problem lies in the app you are using.
  2. If you encounter weird text errors, your best approach is to investigate in a text editor (such as BBEdit; other editors are available!). Not least, such tools give you ways to expose the text encoding, non-printing characters in the text stream, etc. This also lets you work out whether the issues are tractable within Tinderbox or whether pre-processing is needed.
  3. The fact that most times text copy/paste ‘just works’ hides a host of assumptions apps are forced to make. When these get out of alignment (no one user’s or app’s fault) weirdness happens. If that does occur, take a moment to think beyond just hammering on the current app UI.

Tinderbox files are always written in the UTF-8 Unicode text encoding. XML requires that the text be valid; there exists a Unicode code point that represents “invalid character”, and so a note which includes that character in its name or $Text cannot be opened.

In anything approximating normal use, there’s no way to get that character into either a name or the text.

I have concerns about your ā€œļææā€.

But in either case, divide-and-conquer will rapidly isolate the problem point.
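As a complement to divide-and-conquer, a short scan can report where suspect characters sit in a piece of text before it is imported. This is a sketch under stated assumptions: `find_suspects` and `is_xml10_char` are hypothetical helper names. U+FFFD is flagged because this thread reports it as the character that stops the file opening. The second check follows the character ranges allowed by the XML 1.0 specification.

```python
import unicodedata

REPLACEMENT = "\ufffd"  # Unicode replacement character


def is_xml10_char(ch: str) -> bool:
    """True if the character falls in the ranges XML 1.0 allows."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_suspects(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each suspect character."""
    hits = []
    for i, ch in enumerate(text):
        if ch == REPLACEMENT or not is_xml10_char(ch):
            name = unicodedata.name(ch, "<unnamed>")  # control chars have no name
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits


for hit in find_suspects("state\ufffd of the art\x0b"):
    print(hit)
```

Running this over each exported "text" value would point to the exact note and position to inspect in BBEdit.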

Thank you all for the feedback.

I am not a sophisticated TBX user and have only rudimentary coding knowledge (which is why I did not already know about illegal characters in XML). However, I use TBX every day, often to accomplish powerful tasks that save me a lot of time. Over the past three years, I have learned how to wield the power of TBX by following Becker’s videos and meetup recordings. I have had TBX for even longer (I learned about it from Beck Tench’s videos) but was paralysed by awe.

Let me explain how these characters inadvertently ended up in the TBX file.

I used a Python script via the run command to extract PDF annotations. The PDF in question, whose extract appears in the JSON, is a JACC paper from 2023 - relatively recent. The Python script extracted it with those characters.

I was running the Python script from within TBX, with the results automatically imported into TBX. The workflow was set up so that I did not need to review the output before exploding the data. However, every time I exploded and saved the file, it would fail to open. It then occurred to me that the issue might lie in the names. I developed a regex to remove any special characters from the names, as I had heard in meetups that such characters could cause problems. Despite this, the problem persisted.

I opened the corrupted file in BBEdit (which I have been using since learning about it in this forum three years ago) and copied portions of the XML into different TBX files. Through this process, I was able to isolate the issue: the hypothesis being that special characters were causing the problem.

As you can see in my shared working TBX file, I was able to write a simpler regex to remove any non-letter or non-number characters from the output.

I shared it here in case someone else is following a similar workflow in the future and encounters the same issue. I had several corrupted files before resolving it.

Thanks, Tom. I tried doing the same, but when I attempt to create a file named template in the application support folder, it fails with the warning that files should be downloaded into this folder.

Humm… worked like a charm over here. No warnings at all.
~/Library/Application Support/Skim/Templates is my path
I think Skim has very specific requirements for the templates to work. I remember it taking me quite a while to get one working. It has been years since I revisited this topic.

Here is the wiki page that has the documentation:
https://sourceforge.net/p/skim-app/wiki/Templates/

Here is a GitHub page where someone built a template → HTML 15 years ago. I have not tried this one.

Have you looked at the security settings for that computer?

Tom

Great. Let me study these.

Meanwhile, here is the exact message I get when I try to create a folder named Skim in the Application Support folder: The operation can't be completed because "Skim" needs to be downloaded.

The Skim app itself has already been installed and is working well.

I am using a MacBook Pro (2019) running macOS Sequoia 15.1.1.

As you note, you’re inexpert in this area (no critique in that) but unfortunately you’ve been given a script that isn’t fit for the purpose for which you need to use it.

So, the problematic characters in the JSON are caused by assumptions made by the Python script. Therefore, it might be worth checking with its creator/maintainer to see if there is a flag for ‘unpacking’ ligatures into discrete characters and/or avoiding Unicode symbol codes being included in the output.
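One such flag exists in Python’s standard `json` module, although whether the script in question uses that module is an assumption, as the thread does not show its source. By default `json.dumps` writes every non-ASCII character as a `\uXXXX` escape, and `ensure_ascii=False` writes the character itself. Either way, the parsed value holds a real ligature character, so the escape is a spelling of the character rather than corruption.

```python
import json

# The text as it appears in the exported JSON: backslash, 'u', four hex digits.
raw = '{"text": "de\\ufb01nition"}'

record = json.loads(raw)
# After parsing, the escape has become one real character, U+FB01.
print(len(record["text"]))           # 9, not 14
print(record["text"][2] == "\ufb01")  # True

escaped = json.dumps(record)                      # default: ensure_ascii=True
literal = json.dumps(record, ensure_ascii=False)  # writes the character itself

print(escaped)  # {"text": "de\ufb01nition"}
print("\\ufb01" in literal)  # False
```

Changing the flag only changes how the ligature is written to disk. Turning it into plain letters still needs a separate replacement step.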

The uncomfortable lesson: ‘text’ is just ‘text’ until it isn’t. I’d repeat there’s no deliberate error by anyone or any app here, but it’s actually a good example/lesson in how solutions don’t always travel well.

But, if text misbehaves, look further than assuming it is simply a bug/misconfiguration in the context (app) where the text ‘corruption’ occurs. Here, the problem is the Python program and the assumptions made by the person who coded it. It is likely their use didn’t encounter this issue, so they were none the wiser about the Unicode issue or simply weren’t bothered. Sometimes it can be easier to get almost-good text and clean it after. Good to hear you’ve got BBEdit in your toolbox.

Tips for BBEdit & finding problem items:

  • Text ▸ Zap Gremlins can sanitise a text by changing problematic characters into a (user-chosen) character or simply deleting them. It might affect the text but it shouldn’t break import.
  • In the lower border of BBEdit windows is a pop-up that lets you set the assumed format of the file (XML, TXT, JSON, etc.) as well as the encoding (UTF-8, etc.). The format selection invokes some code-colouring and sometimes odd colour patterns help see where corruption has broken the format.
  • Upside-down question marks indicate characters the current font can’t show. Once found, use Window ▸ Palettes ▸ Character Info to find out what the character is.

Here is an example of the last tip above, when trying to figure out the issue with the regex:

Thank you. Actually, I was already familiar with the BBEdit process, as I’ve been reading your posts and following the meetups for a while now. (Thank you for aTbRef; I often use it to implement what I learn in the videos.)

I was not able to improve the disparaged Python script. However, it works acceptably well for my needs. As a result of the discussion, I’ve refined my stamp (applied at explode) to allow only characters that would reasonably appear in the literature I am reading:

$Text = $Text.replace("[^a-zA-Z0-9\\s,.:;!?()\\[\\]{}“”‘’…\\-–—+/=*%<>±°µ§¶©®™|αβγδεζηθλμνξπρστυφχψω≤≥#^~`]", " ");

With this in place, TBX can help me accomplish the work I need to do.

I’ll still try to figure out Skim templates, but I do almost all my reading and highlighting on the iPad and use the computer only for processing the notes.


I’m glad this is satisfactory, but I wonder whether this is using a sledgehammer where a gentle tap is enough.

It’s fine to have Unicode characters in the name or the text of a note. The only thing to beware of is that not every sequence of bytes is valid Unicode. The Unicode replacement character \uFFFD, in particular, means “whatever generated this text couldn’t do that, and gave up”:

The replacement character ( ) (often displayed as a black rhombus with a white question mark) is a symbol found in the Unicode standard at code point U+FFFD in the Specials table. It is used for problems when something is unable to render a stream of data to a correct symbol. ā€“ Wikipedia

Deleting the ligatures (such as de\ufb01nition) is going to leave you with gaps like definition ➛ deinition, which is unpleasant.
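To make the difference concrete, here is a small Python comparison of the two approaches. It is a sketch only: `ALLOWED` is a much-reduced, hypothetical stand-in for the allow-list in the stamp above, and both function names are invented for illustration. Because the stamp substitutes a space, the gap shows up as “de nition” rather than “deinition”.

```python
import re

# Hypothetical, much-reduced version of the stamp's allow-list.
ALLOWED = re.compile(r"[^a-zA-Z0-9\s,.:;!?()\-]")


def allow_list_only(text: str) -> str:
    """Sledgehammer: anything not on the list becomes a space."""
    return ALLOWED.sub(" ", text)


def targeted(text: str) -> str:
    """Gentle tap: expand two ligatures, drop only U+FFFD, keep the rest."""
    text = text.replace("\ufb01", "fi").replace("\ufb02", "fl")
    return text.replace("\ufffd", "")


sample = "de\ufb01nition of in\ufb02ammation\ufffd"
print(allow_list_only(sample))  # de nition of in ammation (plus a trailing space)
print(targeted(sample))         # definition of inflammation
```

The targeted version changes only the characters known to cause trouble, so the remaining Unicode text reaches the note intact.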

I don’t anticipate this interrupting my work too much.

If a file fails to open, it would cost me significantly more time, as I’d need to reconstruct it from scratch. Now that I understand the likely culprits, I can exclude them during the explode process.

While cleaning the extract in BBEdit wouldn’t take long, it’s a step that TBX can help me avoid.

Once the text is exploded and notes are coloured and marked with badges, I’ll be able to identify any gaps or issues as I focus on the notes of immediate interest. Since I annotate the PDFs myself and am familiar with the material, it will be straightforward to spot what’s missing and add it as I interact with the notes.

Admittedly, this approach may not work for everyone.

Utopia for me would be an iPad version of Skim, or for PDF Expert, Bookends, or DEVONthink Pro to include the annotation colours in the annotation extract. I have learned today that Zotero annotation extracts include the colours, but I have already invested money and time in the other apps.