FWIW: Breaking a FullName attribute into constituent parts

As a learning exercise, I created a person-type attribute and wrote a stamp with action code that breaks the FullName attribute into first name, middle name, last name, and suffix attributes. I had to figure out a way to account for punctuation and Roman numerals. The result is a bit prolix, but it gets the job done.

Demo of FullName disaggregation.tbx (169.6 KB)


Really nice and thoughtful Paul! Thank you for sharing.

A great example, because Full Name is common and is complicated by its variety, especially with prefixes and suffixes. What if we compared vName.at(0) with known_prefixes and vName.at(-1) with known_suffixes?
Then grow the lists as needed.

known_prefixes = ["Mr.";"Mrs.";"Dr.";"Ms.";"Prof."]
known_suffixes = ["Jr.";"Sr.";"III";"IV";"Ph.D."]
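
For readers who want to see the comparison spelled out, here is a minimal sketch of the idea. It is written in Python rather than Tinderbox action code, purely to illustrate the logic; the function name `split_affixes` is hypothetical, and the two lists are simply the ones proposed above.

```python
# Illustrative sketch only (Python, not Tinderbox action code).
# split_affixes is a hypothetical helper name, not part of any library.
KNOWN_PREFIXES = ["Mr.", "Mrs.", "Dr.", "Ms.", "Prof."]
KNOWN_SUFFIXES = ["Jr.", "Sr.", "III", "IV", "Ph.D."]


def split_affixes(full_name):
    """Return (prefix, core_words, suffix) for a whitespace-separated name."""
    words = full_name.split()
    prefix = ""
    suffix = ""
    # Compare the first word against the known prefixes.
    if words and words[0] in KNOWN_PREFIXES:
        prefix = words.pop(0)
    # Compare the last remaining word against the known suffixes.
    if words and words[-1] in KNOWN_SUFFIXES:
        suffix = words.pop()
    return prefix, words, suffix


print(split_affixes("Dr. Martin Luther King Jr."))
```

Because the match is exact, "Dr" without a period would not be recognised; that is the price of keeping the lists simple and growing them only as needed.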

Question (with respect, and probably from further behind in programming experience): why would you not use the list syntax vName[0], etc., instead of the current vName.at(0)?

Great job Paul.

Tom

Interesting. This might also be of passing interest: Myths Hackers Believe About Names. Some are purely programming concerns but some of the early ones are pertinent.

It is unsafe to assume people have only zero or one middle name. Last names can be long, like “De Sales La Terriere”, “Thurn und Taxis” or “Temple Nugent Brydges Chandos Grenville” (note: not all ‘multi-barrel’ names are hyphenated). In some countries, like Spain, a person may use two last names (one from each parent), e.g. José Luis [Rodríguez Zapatero]. A prefix can be multi-word, like ‘von der’ or ‘van der’, and depending on locale such a prefix is part of the last name and yet may or may not be used in sorting names. In China, the ‘last name’ is written first. And so on.


In truth the internet age and the digital primacy of (American) English conventions in software code and design mean many of the above are falling away as software can’t cope with ‘normal’ names from other locales. So much for the global village. :slight_smile:

I don’t think the above invalidates your approach, but it does mean you need to review the output for mis-parsing. Recently completing a database of about 2.9k international names gave me pause for thought about my starting assumptions, as mis-naming, even innocently done, can cause upset.
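
One way to make that review step less tedious is to flag the names most likely to be mis-parsed. The sketch below is Python, not Tinderbox action code, and everything in it is an assumption made for illustration: the helper name `needs_review`, the small particle list drawn from the examples above, and the "more than three words" threshold.

```python
# Illustrative sketch only (Python, not Tinderbox action code).
# The particle list and the word-count threshold are assumptions,
# taken loosely from the examples in the post above.
PARTICLES = {"von", "van", "der", "de", "la", "und"}


def needs_review(full_name):
    """Return True when a simple first/middle/last split is risky."""
    words = full_name.split()
    # Fewer than two or more than three words: the simple model does not fit.
    if len(words) < 2 or len(words) > 3:
        return True
    # A name particle after the first word suggests a multi-word surname.
    return any(word.lower() in PARTICLES for word in words[1:])


for name in ["Jane Doe", "Vincent van Gogh", "José Luis Rodríguez Zapatero"]:
    print(name, needs_review(name))
```

A flag like this does not fix anything by itself; it only narrows down which records deserve a human look.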

Nicely done.

I have a script that attempts this as well—it does not cover as much as yours, but it works.

var:string vFullName=$Name;
$ShortTitle=$Name;
$FullName=$Name;
$FirstName=vFullName.extract("([^\s]+)");
$LastName=vFullName.extract("([^\s]+)$");
$MiddleName=vFullName.extract("(?<=\s).*(?=\s)");

([^\s]+)

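This regex matches a run of characters containing no whitespace. With no anchor, the first such run in the string is the first word. Tinderbox saves the captured group as $1.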

([^\s]+)$

This regex matches the last word: the $ anchors the run of non-whitespace characters to the end of the string.

(?<=\s).*(?=\s)

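This regex captures everything in the middle, i.e., the text between the first and the last whitespace. A two-word name has only one space, so nothing matches and the middle name stays empty.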
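
To see the three patterns at work outside Tinderbox, here is a small sketch using Python's `re` module. It is not action code, and regex flavours can differ in detail, so treat it as an illustration of what the patterns do rather than a statement about Tinderbox's engine. The function name `parse_name` is hypothetical.

```python
import re


# Illustrative sketch only (Python's re module, not Tinderbox action code).
def parse_name(full_name):
    """Apply the three patterns from the post; return (first, middle, last)."""
    first = re.search(r"([^\s]+)", full_name)          # first run of non-whitespace
    last = re.search(r"([^\s]+)$", full_name)          # run anchored at the end
    middle = re.search(r"(?<=\s).*(?=\s)", full_name)  # between first and last whitespace
    return (
        first.group(1) if first else "",
        middle.group(0) if middle else "",
        last.group(1) if last else "",
    )


print(parse_name("John Quincy Adams"))
print(parse_name("Jane Doe"))
```

With these inputs the first call yields "Quincy" as the middle name and the second yields an empty middle name, matching the description above.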

I realize that this does not address issues like people with two last names or two first names, nor does it capture pre-nominals or post-nominals. However, in my use case, I have some control over whether they are in the name or not. It might be fun to build some of the conditions to capture those.

  1. Don’t forget the lastWord( ) and firstWord( ) operators.
  2. split( ) is useful, too.
  3. Doing this with the streaming operators makes an interesting exercise. Though I was skeptical at first, I do believe it can be done without undue contortions.
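
As a rough illustration of the split-based route, here is a sketch in Python. It uses Python's own `str.split` as a stand-in and makes no claim about the exact signatures of the Tinderbox operators mentioned above; the function name `split_name` is hypothetical, and treating a one-word name as a first name only is an arbitrary choice.

```python
# Illustrative sketch only (Python, not Tinderbox action code).
def split_name(full_name):
    """Split on whitespace; return (first, middle, last)."""
    words = full_name.split()
    if not words:
        return "", "", ""
    if len(words) == 1:
        # Assumption: a single word is treated as a first name only.
        return words[0], "", ""
    # First word, everything in between, last word.
    return words[0], " ".join(words[1:-1]), words[-1]


print(split_name("John Quincy Adams"))
```

Working from a list of words also makes it easy to peel off a prefix or suffix before assigning the remaining parts.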

Some links for the latter:


How do you handle these two variable situations that occur in some names and not others:
1. Names +/- a prefix: “Mr., Mr, Mrs., Mrs, Ms, Dr.”
2. Names +/- a suffix: “Jr, II, MD, PhD”

Tom

I’ve not played with it. If/when I do, I’d put in some if conditions, e.g., if the first word has a “.” then X, and/or if it has fewer than x characters, if the name has more than y words, etc.
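
A minimal sketch of such conditions, in Python rather than Tinderbox action code, might look like the following. It ignores periods, trailing commas, and case when comparing, so "Mr." and "Mr" are treated alike, and it only strips an affix when more than one word remains. The helper names and the two sets (built from the examples in the question) are assumptions for illustration.

```python
# Illustrative sketch only (Python, not Tinderbox action code).
# The sets come from the examples in the question; extend as needed.
PREFIXES = {"mr", "mrs", "ms", "dr"}
SUFFIXES = {"jr", "ii", "md", "phd"}


def normalize(word):
    """Drop periods and a trailing comma, then lowercase, for comparison."""
    return word.replace(".", "").rstrip(",").lower()


def strip_optional_affixes(full_name):
    """Return (prefix, core_name, suffix); affixes may be absent."""
    words = full_name.split()
    prefix = ""
    suffix = ""
    # Only strip when something would be left over as the actual name.
    if len(words) > 1 and normalize(words[0]) in PREFIXES:
        prefix = words.pop(0)
    if len(words) > 1 and normalize(words[-1]) in SUFFIXES:
        suffix = words.pop()
    return prefix, " ".join(words), suffix


print(strip_optional_affixes("Mr John Smith Jr"))
print(strip_optional_affixes("Mrs. Ada Lovelace"))
```

The core name that comes back can then go through whatever first/middle/last split is already in use.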