How do I automatically keep track of my daily writing progress?

I would like to have an automatic count of the number and percentage of words written each day.

In the screenshot below, the effect at the bottom is what I want.

But I’m confused about how to do that.

Automatic word count and percentage statistics.tbx (106.4 KB)

Here is one solution (example file at end of post):

Constituent parts of solution:

  • The completion data is put in the title using a Display Expression
    • As the Display Expression involves some calculation, and because you might run a number of these, it is more efficient to calculate the numbers, build the overall string and store it, then have the Display Expression just report that stored value.
    • Here we do the calculations in a rule, but if running a lot of these, you could move the rule code to an edict instead (edicts can be force-refreshed on demand and otherwise run on creation, on document open and then once an hour).
  • Each note stores its word count in $WordCount (auto-calculated)
  • A container can fetch the total word count of its child notes via sum(), e.g. sum(children,$WordCount). This isn't used here, but to sum all descendants of the container use sum(descendants,$WordCount).
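To make the children/descendants difference concrete, here is a small Python sketch (not Tinderbox action code). It models notes with a hypothetical Note class and mirrors what sum(children,$WordCount) and sum(descendants,$WordCount) would add up; the class and function names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """Hypothetical stand-in for a Tinderbox note (not a real Tinderbox API)."""
    name: str
    word_count: int = 0
    children: list["Note"] = field(default_factory=list)


def sum_children(note: Note) -> int:
    """Like sum(children,$WordCount): immediate children only."""
    return sum(child.word_count for child in note.children)


def sum_descendants(note: Note) -> int:
    """Like sum(descendants,$WordCount): children, grandchildren and so on."""
    total = 0
    for child in note.children:
        total += child.word_count + sum_descendants(child)
    return total


chapter = Note("Chapter", 100, [Note("Scene", 50)])
container = Note("5000 Words", 0, [chapter, Note("Notes", 200)])
print(sum_children(container))     # 300
print(sum_descendants(container))  # 350
```

In both cases the container's own word count is left out; only the grandchild "Scene" makes the two totals differ.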

I used your sample TBX for the text, first using a ‘lipsum’ generator to make the desired number of words for each child note so the $WordCount is correct. To prove this, look at the Display Expression for each child note, which is $Name+" "+$WordCount.

I then made 2 user attributes via the Inspector:

  • A Number-type TargetNumber. This holds the target word count you desire. This will be set in the container note where we do the calculation, but could be stored in, and referenced from, any note if you prefer.
  • A String-type ReportString. This holds the pre-calculated title that is used as the Display Expression.
    • Thus our example container’s true $Name is “5000 Words” but due to the Display Expression we see “5000 Words (5000/3922) 78.44”

The rule is this:

// make variable to hold aggregate child word count
var:number vProgress = sum(children,$WordCount);
// make variable to hold that count as a % with 2 DPs
var:string vPct = ((vProgress/$TargetNumber)*100).format(2);
// start building ReportString value to use in Display Expression
// take existing title
$ReportString = $Name;
// add target number and actual count
$ReportString += " ("+ $TargetNumber+"/"+vProgress+") ";
// add percentage
$ReportString += vPct;

(N.B. the code in the demo file is not commented; the comments are only here for explanatory purposes.)
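For checking the arithmetic outside Tinderbox, here is the same rule logic as a Python sketch. The function name report_string is hypothetical, and Python's :.2f formatting is assumed to round the same way as Tinderbox's .format(2).

```python
def report_string(name: str, target: int, child_word_counts: list[int]) -> str:
    """Build the title the rule stores in $ReportString: name (target/progress) pct."""
    progress = sum(child_word_counts)        # sum(children,$WordCount)
    pct = f"{progress / target * 100:.2f}"   # ((vProgress/$TargetNumber)*100).format(2)
    return f"{name} ({target}/{progress}) {pct}"


print(report_string("5000 Words", 5000, [1000, 1500, 1422]))
# 5000 Words (5000/3922) 78.44
```

With a target of 0 the Python version raises ZeroDivisionError, so a real rule may want to guard against an unset $TargetNumber.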

Lastly, set the container note’s Display Expression to $ReportString in the Text Inspector:

As we’ve already calculated % progress, we can throw in a map view progress bar for free:

This is done by adding this code to the end of the container’s rule:

$Pattern = "bar("+vPct+")";

I added a map view tab to the test doc so you can easily see this. Setting a pattern on the container does mean the container’s sub-map viewport is not shown, so if you need the latter, don’t bother with the extra line of code.

Your example TBX with the above changes: Automatic word count and percentage statistics-demo.tbx (172.2 KB)


Thanks to @mwra for providing the TBX file, which allows a novice idiot like me to stand on the shoulders of a giant and understand how the code works. Following another post, I made some adjustments: counting the number of words in the titles of my notes, and adding a percentage symbol.

However, when the note title is in Chinese, it is not counted correctly: only English is counted, not Chinese. In the screenshot below, there are four Chinese characters.

var:number vProgress = sum(children,$WordCount + $Name.split(" ").count);
var:string vPct = ((vProgress/$TargetNumber)*100).format(2) + '%';
$ReportString = $Name;
$ReportString += " ("+ $TargetNumber+"/"+vProgress+") ";
$ReportString += vPct;
$Pattern = "bar("+vPct+")";

I tried changing split to length, but it failed. How can I make the code count the words in Chinese titles?

Automatic word count and percentage statistics-demo.tbx (99.3 KB)

Why is there $WordCount for text content and no $NameCount for text titles?

To be honest, as a fellow user, I’m not sure. I don’t know if the word count is Eastgate code, code in the underlying Apple frameworks, or both. For instance, the count may be based on spaces in the $Text or use additional heuristics, and unless there is language-specific support, any heuristics are likely to reflect English/Roman-script word-boundary conventions.†

I tested in a note using some copied Chinese text (note: I can’t speak/read Chinese) and it looks like my (UK locale) Tinderbox v9.5.1 doesn’t recognise words in Chinese:

Oddly, in the $Text shown above there are 24 non-space characters and 5 spaces. So, I’m guessing $WordCount uses more than word breaks (dictionary look-up, perhaps?).

More generally, where there is locale-based support (e.g. currency symbols, date formats, decimal delimiters), it follows the locale of the host OS. It is possible to set a different locale in Tinderbox; see locale().‡

Most likely it is simply because no one has ever expressed the need for it—which doesn’t mean that you don’t now! However, I’ve been supporting the Tinderbox community forums since 2004 (2 years after Tinderbox launched) and I can’t recall a title word count ever cropping up. It turns out to be quite simple if we use a space as a word boundary:

$MyNumber = $Name.split(" ").count;

So:

But on a title copied from your example file I get 3 words:

But why? It turns out§ that you aren’t using a comma-and-space between the two Chinese words, but an uncommon (to me!) Unicode character, ‘FULLWIDTH COMMA’:

In the grab I’ve selected the whole character: although it is only one Unicode character, on screen it looks like two (a comma followed by a space)! OK, how to fix it? Happily, String.split() can use a regex to split the source string, so we’ll change the code above to split on either a space or a FULLWIDTH COMMA character:

$MyNumber = $Name.split("[ |,]").count;

Result:

I assume the ‘FULLWIDTH COMMA’ might be used when writing Chinese in European horizontal style, and there may be other characters you need to add to the .split() regex for use in your documents. To add to the existing two items, in the code, before the closing ] type a pipe (| vertical bar) and then the desired character. As an example, below I’ll add the hash character # as a third split option:

$MyNumber = $Name.split("[ |,|#]").count;

If you need more detail, read up on regular expressions (the web has numerous tutorials for all sorts of levels of expertise - pick one that you can follow!).
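For experimenting outside Tinderbox, here is the same splitting idea as a Python sketch (Python's re module, not Tinderbox's regex engine). It illustrates two points. First, the comma in the brackets has to be the FULLWIDTH COMMA itself (U+FF0C); an ASCII comma will not match it. Second, in common regex dialects, including Python's, a | inside [...] is just another literal character, so "[ |,]" also splits on a vertical bar: the pipes are harmless but not required. I assume Tinderbox's dialect behaves the same, but that is untested. The function name title_word_count is hypothetical.

```python
import re

FULLWIDTH_COMMA = "\uff0c"  # '，' U+FF0C, a different character from the ASCII comma ','


def title_word_count(title: str, separators: str = " " + FULLWIDTH_COMMA) -> int:
    """Count the pieces left after splitting a title on any one separator character."""
    # Inside [...] each character is a literal alternative, so no '|' is needed;
    # re.escape protects characters such as '#' or ']' that are special in a regex.
    pattern = "[" + re.escape(separators) + "]"
    # This sketch also drops empty pieces (e.g. from two spaces in a row),
    # which Tinderbox's .split().count may or may not do.
    return len([piece for piece in re.split(pattern, title) if piece])


print(title_word_count("5000 Words"))                             # 2
print(title_word_count("你好" + FULLWIDTH_COMMA + "你好"))          # 2
print(title_word_count("你好" + FULLWIDTH_COMMA + "你好", " ,"))    # 1: ',' does not match U+FF0C
print(title_word_count("a#b c", " " + FULLWIDTH_COMMA + "#"))     # 3
```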

Now that’s sorted: if you truly want, for each child, the note’s $Text word count plus its $Name word count, then you want:

var:number vProgress = sum(children,$WordCount + $Name.split("[ |,]").count);

Does that help? Meanwhile, for deeper info about support for Chinese word count (and possible detection of Chinese numbers) I’d suggest contacting tech support at tinderbox@eastgate.com.

†. Decades back I remember re-checking software-derived word counts by eye, and they were clearly using word breaks (spaces), so they were invariably slightly off (by ± a few): not enough to matter much, but often not 100% correct. You only have to add unusual acronyms (or, these days, brand names) and it can get confusing, even for a human: is that a word or not? I suspect a locale-based dictionary may also come into play.

‡. But as you are using a mix of English and Chinese in the document this may just give you a different problem!

§. I selected the note title’s text, pasted it into BBEdit, looked at the character count and saw I was one short of the expected number. Using BBEdit’s Character Inspector palette, I realised the comma+space was the single Unicode character U+FF0C. I then used the UnicodeChecker app to check (as illustrated above). Both apps can be used for free: BBEdit has limited features if not registered (but still most you might need) and UnicodeChecker is free to use.

A very quick thought, after all the above. Leaving aside the issue of word count on Chinese script, don’t overlook simply inserting the title’s text at the beginning of $Text. That way it is in the scope of $WordCount.

I can see scenarios where it might not work, but I think it is worth considering if word counts are really important for your work. HTH.

Quite right: $WordCount isn’t language aware, and it won’t work well with languages that use word-spacing conventions drastically unlike English. (I think, a couple of years back, we did patch $WordCount to handle French end-of-sentence punctuation, which was confusing it.)

We might be able to handle Chinese word count; investigation proceeds.


Reflecting recent findings, I’ve just updated my article on $WordCount.

TL;DR: English-language users should be fine, but for other languages ‘your mileage may vary’. To be fair, though, I don’t think this is something Tinderbox can or should shoulder alone. Hopefully, advances in the OS and AI may improve the word boundary algorithms for a wider range of languages.

You’ll be fine in Romance languages, and also in Russian and other Slavonic languages. I’ll look into Chinese.


Dear Dr. Mark Anderson,
Thank you for showing us the example sentence
from the beginning of the Thousand Character Classic (Senjimon in Japanese).
In the example, the characters are correctly arranged in groups of four.
It is a verse consisting of 250 short phrases, each of four characters.
I also practised it as a calligraphy model several years ago.
I will add the meaning for your reference.
天地玄黃 The sky is dark and the earth is yellow.
宇宙洪荒 The universe is infinitely wide.
日月盈昃 The sun rises and dips to the west, and the moon waxes and wanes.
辰宿列張 The stars spread out in constellations.
寒來暑往 When the cold comes, the heat leaves.
秋收冬藏 Harvest in autumn and store in winter.
Respectfully, WAKAMATSU
PS
Senjimon-Chou-Su-Gou

My favorite Chinese calligrapher.
趙子昂
(In Japanese reading, it is pronounced as Chou-Su-Gou.)


I tried this code:

var:number vProgress = sum(children,$WordCount + $Name.split("[ |,]").count);
var:string vPct = ((vProgress/$TargetNumber)*100).format(2);
$ReportString = $Name;
$ReportString += " ("+ $TargetNumber+"/"+vProgress+") ";
$ReportString += vPct;
$Pattern = "bar("+vPct+")";

Automatic word count and percentage statistics-demo (1).tbx (108.7 KB)

In the screenshot below, the content of $Name is “你好,你好”, which is 4 words, but the code splits on the separator symbol, so it actually counts 2 words, which is inaccurate.
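As a point of comparison, here is a Python sketch (not Tinderbox code) of the count being asked for: each Chinese character in a title counts as one unit, which gives 4 for the title above. It rests on two assumptions: that "one character = one word" is acceptable (it is not true of Chinese in general), and that checking a few CJK Unicode block ranges is good enough. The helper names are hypothetical.

```python
import re

# Code point ranges for the main CJK ideograph blocks (not exhaustive:
# later extension blocks are left out to keep the sketch short).
CJK_RANGES = [
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]


def is_cjk_ideograph(ch: str) -> bool:
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in CJK_RANGES)


def count_cjk(title: str) -> int:
    """Count each Chinese character as one unit; punctuation such as '，' is ignored."""
    return sum(1 for ch in title if is_cjk_ideograph(ch))


def mixed_title_count(title: str) -> int:
    """Chinese characters count one each; runs of ASCII letters/digits count as words."""
    latin_words = re.findall(r"[A-Za-z0-9]+", title)  # simplistic: "don't" gives 2
    return count_cjk(title) + len(latin_words)


print(count_cjk("你好\uff0c你好"))          # 4
print(mixed_title_count("Chapter 1 你好"))  # 4
```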

I asked ChatGPT, which gave several pieces of code, but none of them could count the words in $Name.

In this first attempt, the code fails because Tinderbox does not support a length method:

var:number vProgress = sum(children,$WordCount + $Name.length(" ").count);
var:string vPct = ((vProgress/$TargetNumber)*100).format(1) + '%';
$ReportString = $Name;
$ReportString += " ("+ $TargetNumber+"/"+vProgress+") ";
$ReportString += vPct;
$Pattern = "bar("+vPct+")";

In this one, it tries to use a match function to count the words in $Name, but that doesn’t work either:

var:number vProgress = sum(children,$WordCount);
var:number vName = sum(children,match($Name, "[\\u4e00-\\u9fa5]").count);
var:string vPct = ((vProgress+vName/$TargetNumber)*100).format(1) + '%';
$ReportString = $Name;
$ReportString += " ("+ $TargetNumber+"/"+vProgress+") ";
$ReportString += vPct;
$Pattern = "bar("+vPct+")";

This one tries to use JavaScript, but it fails:

var:number vChineseCount = javascript("$Name.match(/[\u4e00-\u9fa5]/g).length;");
var:number vProgress = sum(children,$WordCount + vChineseCount);
var:string vPct = ((vProgress/$TargetNumber)*100).format(1) + '%';
$ReportString = $Name;
$ReportString += " ("+ $TargetNumber+"/"+vProgress+") ";
$ReportString += vPct;
$Pattern = "bar("+vPct+")";

This one tries to use a regular expression, but it fails:

var:number vProgress = sum(children,$WordCount + $Name.replace(/[^\x00-\xff]/g, "xx").length/2);
var:string vPct = ((vProgress/$TargetNumber)*100).format(2) + '%';
$ReportString = $Name;
$ReportString += " ("+ $TargetNumber+"/"+vProgress+") ";
$ReportString += vPct;
$Pattern = "bar("+vPct+")";

This one tries to split $Name into an array of characters and count them, but that fails too:

var:array vNameArray = $Name.toArray();
var:number vCharCount = 0;
for each (vChar in vNameArray) {
  vCharCount++;
}
var:number vProgress = $WordCount + vCharCount;
var:string vPct = ((vProgress/$TargetNumber)*100).format(1) + '%';
$ReportString = $Name;
$ReportString += " ("+ $TargetNumber+"/"+vProgress+") ";
$ReportString += vPct;
$Pattern = "bar("+vPct+")";

It assumed Tinderbox has a built-in $Chars attribute that returns the number of characters, but that doesn’t work either:

var:number vProgress = sum(children,$Chars);
var:string vPct = ((vProgress/$TargetNumber)*100).format(2) + '%';
$ReportString = $Name;
$ReportString += " ("+ $TargetNumber+"/"+vProgress+") ";
$ReportString += vPct;
$Pattern = "bar("+vPct+")";

And with custom functions:

var vProgress = 0;
var children = children(this);
for (var i = 0; i < children.length; i++) {
  vProgress += children[i].getAttribute("WordCount");
  var name = children[i].getAttribute("Name");
  for (var j = 0; j < name.length; j++) {
    if (name.charCodeAt(j) > 255) {
      vProgress += 2;
    } else {
      vProgress += 1;
    }
  }
}
var vPct = ((vProgress/TargetNumber)*100).toFixed(2) + '%';
$ReportString = $Name;
$ReportString += " ("+ TargetNumber+"/"+vProgress+") ";
$ReportString += vPct;
$Pattern = "bar("+vPct+")";

And this:

var title = "$Name";
var titleLength = 0;
for (var i = 0; i < title.length; i++) {
  if (title.charCodeAt(i) >= 0 && title.charCodeAt(i) <= 128) {
    titleLength += 1;
  } else {
    titleLength += 2;
  }
}

var vProgress = sum(children, $WordCount) + titleLength;
var vPct = ((vProgress / $TargetNumber) * 100).toFixed(2) + '%';
$ReportString = $Name;
$ReportString += " (" + $TargetNumber + "/" + vProgress + ") ";
$ReportString += vPct;
$Pattern = "bar(" + vPct + ")";

It seems there is no effective way to count the number of Chinese words in $Name.

TL;DR: Any progress here is going to need a new release (written as of v9.5.1).

OK, as of v9.5.1, I don’t believe there is a fix. Testing is in (private) progress, but from doing some reading, there aren’t fixed rules for detecting word breaks in Chinese text (in short: Chinese words are one character except when they are not). So it is not entirely clear that there is an accurate method of doing a Chinese word count, let alone a mixed-script one.

As well as $WordCount, there is $TextLength which is all characters including spaces, punctuation and line breaks. But that doesn’t help.

I also discovered (and reported) that using $String.size to get the number of characters in a string overcounts for (some) double-byte Unicode characters (e.g. Chinese).
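Since we can’t see inside Tinderbox, this is not a diagnosis of the $String.size overcount. It is just a Python sketch showing that a string’s "length" depends on what is counted: UTF-8 bytes triple the figure for common Chinese characters, and characters beyond the Basic Multilingual Plane (such as U+20000) take two UTF-16 units. Which measure, if any, Tinderbox uses is unknown; the function name is hypothetical.

```python
def length_measures(s: str) -> dict[str, int]:
    """Three different answers to 'how long is this string?'."""
    return {
        "code_points": len(s),                           # Unicode characters (code points)
        "utf16_units": len(s.encode("utf-16-le")) // 2,  # what NSString's length counts
        "utf8_bytes": len(s.encode("utf-8")),            # bytes when stored as UTF-8
    }


print(length_measures("你好"))        # {'code_points': 2, 'utf16_units': 2, 'utf8_bytes': 6}
print(length_measures("\U00020000"))  # {'code_points': 1, 'utf16_units': 2, 'utf8_bytes': 4}
```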

If you want to extract only the Chinese characters from a string (in this case $Text) this may work:

if($Text.replace("[^\p{han}]","").contains("\p{han}+")){$Text = $0;};

I say ‘may’ because there are differing input methods for Chinese text (pinyin, etc.) and I’m unclear what \p{han} matches.

Closing note: given that I can’t read/understand Chinese and am only a fellow user (so can’t ‘see’ inside the app), I’m somewhat limited in the depth to which I can check things.


The current backstage release has a new $WordCount algorithm that appears to handle Chinese correctly. (We will do word count in the dominant language of the note, so mixed texts may report incorrect word counts.)


My main requirement is to count the number of Chinese characters in $Name; $WordCount can already handle the statistics for $Text.

Looking forward to the next version of Tinderbox.