Function takes a lot of CPU load for a long time

So, this works:

var:list vHolidays = [2023-12-25;2023-12-26;2024-01-01;2024-03-29];

$StartDate = date("21 Dec 2023");
$EndDate = date("3 Jan 2024");

var:number vCount=0;
vHolidays.each(aDay){
  if(date(aDay)>date($StartDate-"1 day") & date(aDay)<date($EndDate+"1 day")){vCount-=1;};
};
$Text = vCount;

as $Text is now “-3” and there are indeed 3 (UK) public holidays (as defined overall in vHolidays) between the test’s start and end dates.

Why am I subtracting a day from the start and adding 1 at the end? Whilst == and != ignore the time element of the date, the greater-/less-than/or equal to operators (< > <= >=) do allow for time. So, to ensure a match to holidays on the start/end dates, we set one day outside to get a true count.
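To illustrate the time-of-day point outside Tinderbox, here is a minimal Python sketch (an analogue, not action code). It assumes holidays are stored at midnight while the start and end values carry a time of day. The function names are hypothetical and only serve the illustration.

```python
from datetime import date, datetime

# Holidays held as plain calendar days (no time element).
HOLIDAYS = [date(2023, 12, 25), date(2023, 12, 26), date(2024, 1, 1), date(2024, 3, 29)]

def naive_count(start: datetime, end: datetime) -> int:
    """Compare full date-times: a holiday at 00:00 falls 'before' a start at 09:30."""
    midnights = [datetime(d.year, d.month, d.day) for d in HOLIDAYS]
    return sum(1 for m in midnights if start <= m <= end)

def day_only_count(start: datetime, end: datetime) -> int:
    """Drop the time element first, so holidays on the boundary days are matched."""
    first, last = start.date(), end.date()
    return sum(1 for d in HOLIDAYS if first <= d <= last)

start = datetime(2023, 12, 25, 9, 30)  # starts on a holiday, mid-morning
end = datetime(2024, 1, 1, 8, 0)       # ends on a holiday

print(naive_count(start, end), day_only_count(start, end))
```

Widening the range by a day on each side, as in the action code above, is another way to get the same effect as comparing calendar days only.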

Yes - I’ve not forgotten the fact revealed by @eastgate (up-thread, who knew!) that adding an Interval of 1 day is faster/more efficient than adding a “+1 day” string inside date(), but that would be more code :slight_smile: Still, that approach could be done.


Separately, in this

I don’t think the $Doorlooptijd =... line needs to be in the loop. Why? The first line of the while() loop increments vSetDagen, so we only need to read it and count the weekdays into $Doorlooptijd once the while is done.

However, that means my holiday test for your code would need to store the holiday count in aggregate for the while loop and then, outside the loop, decrement $Doorlooptijd.

An interesting end-of-week puzzle!

Have we understood at this point why the original function was taking so long to complete? Or is that still a mystery?

No, but a contributory factor is likely this:

Every loop involved two date to string conversions. By comparison the later code posted at msg #

var:num vDagen = days(vStart,vEnd); //Count the total number of days between start and enddate
...
var:num vTeller = 0; //counter used for the loop
...
	while(vTeller<vDagen){
...

compares two numbers.
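As a rough Python analogue of the two loop styles (not Tinderbox action code, and with hypothetical function names), the sketch below steps through the same range twice. The first version formats both dates to strings on every pass, like the original while() test. The second works out the number of days once and then only compares two numbers.

```python
from datetime import date, timedelta

def weekdays_by_string_compare(start: date, end: date) -> list:
    """Format both dates on every pass, as the original loop condition did."""
    seen = []
    current = start
    while current.strftime("%d-%m-%y") != end.strftime("%d-%m-%y"):
        seen.append(current.isoweekday())
        current += timedelta(days=1)
    return seen

def weekdays_by_counter(start: date, end: date) -> list:
    """Compute the day count once, then compare two numbers per pass."""
    total_days = (end - start).days
    seen = []
    counter = 0
    while counter < total_days:
        seen.append((start + timedelta(days=counter)).isoweekday())
        counter += 1
    return seen

start, end = date(2023, 12, 21), date(2024, 1, 3)
print(weekdays_by_string_compare(start, end) == weekdays_by_counter(start, end))
```

Both versions return the same weekday list. Any speed difference would need measuring (for example with Python's timeit module), and Python timings say nothing definite about Tinderbox's own performance.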

The code improvements in-thread have been discussed without a test file, so there’s no way to tell if performance has improved even if the code has. The code here would suggest that the changes discussed have reduced/removed the slow performance in the OP’s source file.


As a developer who’s done a fair amount of date processing - that’s one of the worst conversions to inflict unnecessarily on code. So yeah, I’d expect @mwra’s numeric comparison to have completely eliminated the performance problem.

People often dramatically underestimate how long conversion to and from strings can take when done en masse. I did a lot of work with geophysical data with up to hundreds of thousands of floating point numbers, read from XML. It wasn’t until a scientist at Geoscience Australia did a formal timed proof that most of those in the project accepted that it was the number parsing that was slow, not the use of XML.

(My developer perspective comment obviously not aimed at @eastgate but it’s surprising how many experienced programmers don’t understand parsing costs.)


That mystery has not yet been solved other than the comment from @mwra.

In fact, I’m confused because today my original function only takes 23 seconds at startup to make the calculations. No idea why. I didn’t restart my Mac or make any other changes.

Nevertheless I think the function has improved significantly.

I am not a developer but this makes sense.

I think a part of the function is missing here?

    // start end are same day => whole weeks apart 
    vWorkWeekDays = fGetWorkDaysOnly(iStart,iEnd);
    // no remainder work days, so return and exit
    return vWorkWeekDays;
  };

is referring to a sub function “fGetWorkDaysOnly” which I couldn’t find in the file.

I want to further investigate what is causing the changed performance of my original function and will post my findings -if any- here.

Many thanks @mwra. I will adopt your version.

Sadly, yes. That line should read:

floor(days(vOffsetStartDate,iEnd)/7)*5

Fixed in this version of the file: ElaspedWorkDays3.tbx (186.4 KB)

I’ll update the link upthread too.
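For readers following the arithmetic of that corrected line, here is a small Python sketch of the same idea: whole weeks between two dates, times five working days. The function name is hypothetical, and it assumes days() counts whole days between its two arguments and that both dates fall on the same weekday, as the comment in the quoted function states.

```python
from datetime import date

def work_days_in_whole_weeks(offset_start: date, end: date) -> int:
    """Whole weeks between the two dates, multiplied by 5 working days per week."""
    whole_weeks = (end - offset_start).days // 7  # floor division, like floor(days/7)
    return whole_weeks * 5

print(work_days_in_whole_weeks(date(2024, 1, 1), date(2024, 1, 15)))
```

Two Mondays a fortnight apart give 14 days, so 2 whole weeks and 10 working days. Public holidays are not considered here.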

Interesting! I think active/trained programmers are a minority (even if well represented mote). A wider group are ‘scripters’ (myself included): able to repurpose code but lacking formal training in the underlying issues, such as the into/out of string conversions you raise.

For non (pro) coders the pinch point is understanding scale. Either more/longer code, or more use of code (e.g. 100s of rules/edicts vs 10s). The power of modern Macs means (unintentionally) inefficient code’s performance is hidden when working with small files. It is only as the code, and the number of notes using it, grows that issues tend to creep in. Plus, those least likely to understand the cause are inevitably those most at risk from unexpected slowdowns.

I’m open to writing some notes for aTbRef on sub-par approaches to avoid, at least when scaling up from first use to a mature document.

With @AndyDentPerth’s comments in mind, I took my earlier public holiday test code:

var:list vHolidays = [2023-12-25;2023-12-26;2024-01-01;2024-03-29];
//Set these dates via Displayed Attributes instead
//$StartDate = date("2023-12-21");
//$EndDate = date("2024-01-03");
var:number vCount=0;
vHolidays.each(aDay){
  if(date(aDay)>date($StartDate-"1 day") & date(aDay)<date($EndDate+"1 day")){vCount-=1;};
};
$Text = vCount;

and changed it to:

var:list vHolidays = [date("2023-12-25");date("2023-12-26");date("2024-01-01");date("2024-03-29")];
//Set these dates via Displayed Attributes instead
//$StartDate = date("2023-12-21");
//$EndDate = date("2024-01-03");
var:number vCount=0;
var:interval vInterval = interval("1 day");
vHolidays.each(aDay){
  if(aDay>($StartDate-vInterval) & aDay<($EndDate+vInterval)){vCount-=1;};
};
$Text = vCount;

What changed, and why?

Minimising string to date coercion. This affects the vHolidays List-type variable holding the public holidays. The issue is not where they are stored (here, in a config note, etc.) but that every list item is coerced from String to Date type, every time the .each() is called. Here the loop is called once, but in integrated use it might be called for each of 100s of notes having its workday duration assessed. So, I stored the holiday dates as date() calls (I don’t think I can ‘manually’ store a Date-type’s value other than as a string). This way, once vHolidays is read, all further use iterates over Date-type data, instead of calling date() per list item, and again each time the .each() loop is re-run.

Using Interval type data for in/decrementing dates. Upthread, @eastgate made a passing observation that adding an interval with a value of 1 day is more efficient (at least, if working at scale) than the more normal adding of the string ‘1 day’. See the code above, in the .each() loop, to see the difference. Here we are only using an interval of 1 day, so we can store that once and re-use it, doing so outside the loop so the variable isn’t remade on every pass.

FWIW, the two tests can be reviewed in this test doc: Holiday-test1.tbx (104.1 KB). The $Edict of note ‘Test 1’ is the original code, ‘Test 2’ has the optimised(?) code.
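The same two ideas (parse the holiday dates once, and build the one-day interval once outside the loop) can be sketched in Python. This is an analogue only, not action code; the names are hypothetical, and the count is returned as a positive number rather than the negative offset used in the tests.

```python
from datetime import date, timedelta

# Parsed once, when the module loads; later calls only iterate date objects.
HOLIDAYS = [date.fromisoformat(s) for s in
            ("2023-12-25", "2023-12-26", "2024-01-01", "2024-03-29")]

# Built once and re-used, rather than re-created on every loop pass.
ONE_DAY = timedelta(days=1)

def count_holidays(start: date, end: date) -> int:
    """Count holidays on or between start and end, using widened strict bounds."""
    lower = start - ONE_DAY
    upper = end + ONE_DAY
    return sum(1 for day in HOLIDAYS if lower < day < upper)

print(count_holidays(date(2023, 12, 21), date(2024, 1, 3)))
```

With the test dates used above (21 Dec 2023 to 3 Jan 2024) this counts the same three holidays.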

An alternate to:

var:list vHolidays = [date("2023-12-25");date("2023-12-26");date("2024-01-01");date("2024-03-29")];

could be:

var:list vHolidays = list(date("2023-12-25"),date("2023-12-26"),date("2024-01-01"),date("2024-03-29"));

I’ve no idea if there is any functional/performance difference.

Yes, converting strings to dates can be slow, but I suspect that the real problem may be deeper. In the original, we have the loop:

while(vLoopDate.format("D-M0-y")!=vEnd.format("D-M0-y")){
   vSetDagen+=vLoopDate.weekday+";";
   vLoopDate = date(vLoopDate + "1 day");
};

If the original vLoopDate for one of the notes was never, this would proceed to step through each day in turn from (I think) March 25, 1CE to vEndDate. That’s roughly 365.25×2024= 739,000 trips through the loop, each of which involves at least two date-to-string conversions and one string-to-date conversion. So that’s 2 million conversions in one direction or the other.
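A quick Python check of that estimate, plus one possible guard, is sketched below. It is not Tinderbox code. The "never" date is modelled as a date in year 1 purely for the arithmetic (an assumption following the guess above), the end date is an arbitrary early-2024 date, and the guard models an unset date as None. All names are hypothetical.

```python
from datetime import date

def loop_passes(start: date, end: date) -> int:
    """Passes made by a loop that steps one day at a time from start up to end."""
    return (end - start).days

# Assumption: an unset ("never") date behaves like a date early in year 1.
never_like = date(1, 3, 25)
end = date(2024, 1, 5)

passes = loop_passes(never_like, end)
conversions = passes * 3  # two date-to-string and one string-to-date per pass
print(passes, conversions)

def safe_elapsed_days(start, end):
    """Return None instead of looping when either date is unset (modelled as None)."""
    if start is None or end is None:
        return None
    return max((end - start).days, 0)
```

Checking for an unset start date before entering the loop, as safe_elapsed_days does, avoids the runaway case entirely.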


That’s exactly what the problem was. Starting the Tinderbox document with the original function took over 12 minutes. I forgot that I had corrected some notes where vLoopDate ($DatumIn) was “never”. That’s why I couldn’t reproduce the issue. So…
I tested all three versions of the function with my document, changing one note to one with a vLoopDate of “never”. The document contains 346 notes that use the function.
My initial function now took over 20 minutes and then I forced Tinderbox to stop (I didn’t take screenshots :frowning:). Then I tested my new ‘improved’ version: I stopped it at 14 minutes. Memory was at 14 GB and swap almost at 10 GB.

Last but not least I tested the version of @mwra, which completed in 21 seconds.


Memory was almost at 9 GB and swap at 3.4 GB.

Lesson learned, at least for me, is to improve my testing of functions, especially if a loop is included.

Thanks for all the help.