Thinking About Crosstabs

eastgate · January 31, 2020, 7:06pm

Resuming the interesting discussion of cross tabs here: A new opportunity for cross-tabulations within Tinderbox - #36 by mdavidson

Question: We could imagine wanting crosstabs for many kinds of attributes. Strings and numbers, certainly, but also sets. Perhaps for dates, too. Each of these requires different infrastructure. Which have the greatest priority? For which do people have immediate need?

Question: Consider a numeric attribute, $OvenWidth, which specifies the external width of various ovens. Ovens come in discrete widths — 24", 36", 48" and 60" — and so even a large collection will have only a few values. We’d want a cross tabs row for each value.

On the other hand, $Latitude varies continuously; we’d want to sort our latitudes into ranges. How many “bins” do we want? (Yes, we could make it a preference, but it’s good to know how to decide!)

PaulWalters · January 31, 2020, 7:13pm

I have used Excel extensively for cross tabulation for years – it’s 90% about numbers, 10% about strings, 0% about dates (except occasionally for headers).

Just my 2 cents, but without a huge investment, would never match Excel’s facility with cross tabulation. Personally, I think there are more important priorities.

eastgate · January 31, 2020, 9:14pm

I agree that we don’t want to replace Excel here. I suspect there’s a potential synergy in having even a crude cross tabs facility in Tinderbox if it allows access to the underlying data.

For example, suppose we have a class of 18 students, who have test scores for a midterm and a final exam. As a general rule, we’d expect that students who did well on one would do well on the other. It would be interesting to have a quick way to see which students did much better or much worse on the final exam than their midterm might have suggested,

We could do this with an agent, of course, if we knew how to define “much better”. But if we don’t know much about the score distribution, that would be tricky.

mdavidson · January 31, 2020, 9:15pm

A useful starting point is provided by the Attribute Browser view and the way it works. As I understand the AB view:

The summary horizontal rows (groupings) are determined automatically from the Attribute chosen from the AB menu above, the range of values of that attribute and the attribute type. For a string attribute you have one summary row for each unique string + [No Value], for numbers you use bins (automatically calculated ?), for sets and lists the default is to use each combination of the list elements.
The vertical columns are chosen by the user by hand and correspond to individual attributes. Unlike rows there is no corresponding automatic generation of columns based on the range of values.

In terms of the types of attributes, my preference would be for simply mono-valued attributes such as strings which carry information such as names of people, countries, cities, satellites, authors etc…, booleans (true, false) and numbers (especially integer numbers which can represent years or months, eventually floats). Using lists and sets seems more complex and requires some thinking and usage cases to drive the implementation.
Lastly the actual computation of the statistic for each cell needs to be addressed. Expressions such as count apply to most if not all situations, average only makes sense with numerical values.

mdavidson · January 31, 2020, 9:19pm

I also agree that we don’t want to replace Excel let alone some of the modern data analytic software. However, basic summarising capabilities using cross-tabulation within TB would help making sense of larger collections of notes, provide insight into trends which would otherwise be invisible or require a series of carefully tailored agents.

mdavidson · January 31, 2020, 9:34pm

Here an easy sample. For every meeting I have at work I usually write down some details (actions, the issues, outcomes, ideas and so on). I probably generate 20+ notes per week.

I’d like to track the notes that I have with each of my colleagues as an indication. Perhaps I should meet more often with the one or other. Perhaps baed on the table I’d like to compare the notes from January with those in March to chart progress. I’d expect that an overview such as the one below would help.

As was pointed out, having access through action code to the underlying notes for each cell and access to the table results through action codes would all be very helpful indeed.

KELSERJ · March 13, 2020, 12:10am

Sum, Count and Average for numerical fields would be good start. Currently the cross tab will provide a count and percent as a style of the container. Adding the Sum, Count and Average for the Display would be helpful. My current use case is a selection of sales records that show, by sales bucket/stage (cross tab intersection), the count of records and I would like to see the total value as well. The current display shows me values (I have display set to a numeric field) as I scroll over them which is useful but not complete. If I am not hoovering over a bucket I would like to see the Sum or Average or Count of the display field.

Also, when I hoover over the box (cross tab intersection) and scroll it would be nice to choose the HooverExpression so it can change along with what is displayed.