What’s in a Name? AI Meets the Sociology of Naming
Introduction: Context of this blog
My first encounter with this topic occurred while working on clustering financial time series to identify relative value opportunities in credit default swaps and develop alternative risk models. Around the same time, I was exploring open data sources and came across yearly datasets released by INSEE, which detailed the number of first names given to people born each year in France. These datasets allowed for the reconstruction of time series over the past century, enabling analyses such as measuring correlations between first names, clustering based on attribution patterns, and identifying similarities and trends.
This became an excellent use case for testing the flexibility of the DataGrapple platform, a tool designed to visualize CDS spread time series and hierarchical clustering from my research. Collaborating with Philippe Donnat and his team at the time, we integrated the first-name time series and hierarchical clusters into the platform, showcasing its visualization capabilities. Alongside this technical work, I developed an interest in the sociological aspects of naming and purchased Baptiste Coulmont’s book, which delves into the sociology of first names in France.
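As a rough illustration of the analysis described above (correlations between first-name time series, followed by hierarchical clustering), here is a minimal sketch in plain Python. The names and yearly counts are synthetic placeholders, not INSEE figures, and the naive average-linkage procedure is an assumption chosen for readability, not the method used on the DataGrapple platform.

```python
from math import sqrt

# Synthetic yearly birth counts (illustrative only, NOT INSEE figures).
counts = {
    "NameA": [10, 20, 40, 80, 40, 20],
    "NameB": [12, 25, 45, 85, 42, 18],
    "NameC": [80, 60, 40, 20, 10, 5],
    "NameD": [90, 55, 35, 25, 12, 6],
}


def pearson(x, y):
    """Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def distance(a, b):
    """Correlation distance: 0 for perfectly correlated names, 2 for anti-correlated ones."""
    return 1.0 - pearson(counts[a], counts[b])


def average_linkage(c1, c2):
    """Mean pairwise distance between two clusters of names."""
    return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster(names, n_clusters):
    """Naive agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


groups = cluster(sorted(counts), 2)
print(groups)
```

With these toy series, the two names that peak mid-period end up in one cluster and the two declining names in the other. On real data, a library implementation of hierarchical clustering would be the more practical choice.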
Fast-forward 10 years: I encountered a tangentially related problem while working with my colleagues Abdulla and Khaled, who are pursuing PhDs in Data Science. As part of their research, they explored a dataset containing names (of both individuals and companies) but few other accompanying features. Recognizing that names often encapsulate rich information about the entity they represent (see the field of onomastics), we investigated using Large Language Models (LLMs) to extract characteristics from names and enrich the original dataset.
Although the research is still in its early stages, Abdulla and Khaled have already summarized some of their findings in two preprints:
- Enriching Datasets with Demographics through Large Language Models: What’s in a Name?
- Mapping Hong Kong’s Financial Ecosystem: A Network Analysis of the SFC’s Licensed Professionals and Institutions
Interestingly, these same tools have the potential to advance historical research, particularly in fields like prosopography, where patterns across groups of individuals—such as their names, origins, and societal roles—reveal broader historical dynamics. To better appreciate the possibilities for combining AI with such research, it’s worth delving into the historical development of socio-onomastics, a discipline that has long sought to understand the cultural and social significance of names.
Introduction: Brief history of socio-onomastics
The sociology of first names and surnames is not a new subject. Studies on the topic date back at least to 1681, with the Traité de l’origine des noms et des surnoms (apparently available on Amazon).
This field, known as socio-onomastics, emerged as a sub-discipline of sociology in the late 1970s.
In France, sociologist Baptiste Coulmont has studied this topic for over two decades. He authored Sociologie des prénoms (2011, with a third edition published in 2022) and hosts a website illustrating the relationship between first names and results on the French national exam.
In France, a person’s full name (first name + last name) can reveal a surprising amount of information, such as gender, year of birth, social status, and even geographic origin, often pinpointing the region and sometimes even the specific village. This phenomenon is particularly pronounced in France, which boasts more than 900,000 different surnames. However, similar patterns can be observed to varying degrees in other parts of the world.
Key takeaways from socio-onomastics research
- Cultural Influence: Media, religion, and globalization shape naming trends.
  - “Kevin” in France became popular in the 1990s due to American pop culture, including actors like Kevin Costner.
  - “Chloé” gained traction in part due to its association with the French luxury fashion brand.
  - “Marie” remained dominant in earlier centuries because of its connection to the Virgin Mary and Catholic traditions.
- Social Stratification: Certain names are linked to class or community perceptions.
  - “Éléonore” or “Hugo” are often associated with upper-class or more academically inclined families in France.
  - “Jordan” or “Dylan” are sometimes linked to working-class families, reflecting aspirational naming trends inspired by celebrities or sports figures.
  - “Kevin” became stigmatized over time in France, being stereotypically linked to lower social classes.
- Regional Identity: Names can pinpoint geographic and cultural origins.
  - “Jean-Baptiste” is more common in rural, Catholic regions of France, reflecting traditional values.
  - In Brittany, names like “Gwenaël” or “Maël” are tied to Celtic heritage and local identity.
  - Surnames like “Le Goff”: This name is prevalent in Brittany, a region in western France, and reflects the area’s Celtic linguistic heritage.
  - Surnames like “Barthez”: Common in Occitan-speaking regions such as southern France, particularly in Languedoc.
  - Surnames like “Noël”: Frequently found in northeastern France, particularly in Lorraine, where it often reflects historical Catholic traditions associated with Christmas.
  - Surnames like “Dubois”: While common across France, it historically originates from regions with significant woodland areas, such as Normandy or Picardy, as it means “of the woods.”
  - Surnames like “Schmitt”: This surname is common in Alsace and reflects the region’s Germanic influence, as Alsace has historically shifted between French and German control. “Schmitt” is derived from the German word Schmied, meaning “smith.”
- Generational Shifts: Popularity cycles reflect evolving societal values.
  - “Jacques,” once a staple name for boys in older generations, has largely fallen out of favor in modern France.
  - “Emma” surged in the 2000s as a modern, elegant alternative to older names like “Marie.”
  - Names like “Théo” and “Léo” are increasingly popular among younger parents, reflecting contemporary naming preferences.
- Ethnic Identity: Names often preserve cultural roots within diverse societies.
  - “Mohamed” remains a popular choice among families of North African origin in France, reflecting cultural and religious ties.
  - “Giuseppe” or “Enzo” among Italian immigrant communities highlights the retention of cultural heritage.
  - “Wei” or “Xiao” are common names among Chinese immigrants, signaling strong ties to their linguistic and cultural origins.
Case Study: Gautier
What does ChatGPT 4o know about “Gautier”?
Prompt:
What can you tell about the first name “Gautier” in this context?
ChatGPT 4o:
The first name Gautier offers an interesting lens for socio-onomastics in the French context, as it reflects cultural, historical, and social influences:
Historical and Cultural Context
- Origins: The name Gautier is derived from the Germanic Waldhari (wald- meaning “rule” and -hari meaning “army”). It has deep medieval roots and is closely tied to the Frankish heritage in France, making it a historically noble or prestigious name.
- Popularity: Gautier has experienced revivals during different periods. In the 19th and early 20th centuries, it was relatively rare, often associated with historical or literary connotations, such as the poet Théophile Gautier. Its popularity increased in the late 20th century as part of a broader trend of reviving older, traditional French names.
Social and Regional Patterns
- Social Class: In modern times, Gautier is often associated with upper-middle-class families or parents seeking a name that blends tradition and sophistication.
- Spelling Variants: The choice of spelling can also reveal social nuances:
  - Gautier (traditional and French spelling) tends to appear among families with an appreciation for history or classic French culture.
  - Gauthier (with an “h”) is equally traditional but sometimes perceived as slightly more formal or tied to surnames that have become first names.
Regional Relevance
Geographical Association: The name may be more common in regions of France with strong medieval legacies or historical ties to Frankish culture, such as the Loire Valley or Burgundy. However, it is not strongly tied to any one region, making it relatively “neutral” geographically.
Abstract Pattern
Generational Shift: Like many older French names, Gautier has benefited from a revival trend among parents looking for names that are both distinctive and rooted in French tradition. This reflects broader societal interest in blending heritage with modern identity.
In summary, Gautier serves as a good example of a name that is culturally rich, socially nuanced, and representative of broader trends in French naming practices.
Prompt:
If you were to provide an educated guess, what is the most likely year of birth of “Gautier”?
ChatGPT 4o:
Based on naming trends in France, an educated guess for the most likely year of birth for someone named Gautier would fall in the 1990s to early 2000s. Here’s why:
- Revival of Traditional Names: During the late 20th century, there was a resurgence of interest in older, classic French names, including medieval and aristocratic names like Gautier. This aligns with a broader cultural trend toward reviving names with historical significance.
- Distinctive yet Timeless Appeal: Names like Gautier became attractive to parents who wanted something traditional yet distinctive, without being overly old-fashioned.
- Peak Popularity Data: While exact statistics for Gautier may not always be available, names in the same category (like Arthur or Thibault) typically saw a rise in the 1990s and early 2000s.
If we were to pick a single most likely year, 1995 would be a reasonable estimate, as this was a peak period for names of this type in France. However, it would still be commonly used throughout the decade surrounding this time.
Note: Data sourced from INSEE, showing the annual number of individuals named ‘Gautier’ born in France:
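A guess like the one above can be checked against yearly counts of this kind. The sketch below turns a table of births per year into a distribution over birth years for a given name and picks the peak. The counts are made up for illustration and are not the INSEE figures for “Gautier”. The sketch also assumes that everyone counted is equally likely to be encountered today, ignoring mortality and migration.

```python
def birth_year_distribution(counts_by_year):
    """Share of all recorded bearers of a name born in each year."""
    total = sum(counts_by_year.values())
    return {year: n / total for year, n in counts_by_year.items()}


def most_likely_year(counts_by_year):
    """Year with the highest number of births for the name."""
    return max(counts_by_year, key=counts_by_year.get)


# Made-up counts for a hypothetical name (NOT real INSEE data).
toy_counts = {1980: 50, 1985: 200, 1990: 400, 1995: 250, 2000: 100}

dist = birth_year_distribution(toy_counts)
print(most_likely_year(toy_counts))  # 1990
print(dist[1990])                    # 0.4
```

Looking at the whole distribution rather than only the peak also shows how confident such a guess can be: a flat distribution means the name carries little information about age.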
Prompt:
If you were to provide an educated guess, what is the most likely average grade for “Gautier” at the French Baccalauréat (French national exam)?
ChatGPT 4o:
Analyzing the performance of students named “Gautier” in the French Baccalauréat reveals notable trends:
- High Success Rate: In 2023, 100% of candidates named “Gautier” passed the Baccalauréat.
- Mentions Très Bien: The name “Gautier” is among those whose bearers frequently achieve the highest honors, indicating strong academic performance.
These patterns suggest that students named “Gautier” often excel in their studies, achieving commendable results in the Baccalauréat.
Note: Data sourced from Baptiste Coulmont’s website, illustrating performance of ‘Gautier’ vs. the national average results in the French Baccalauréat:
Let’s now combine the first name “Gautier” with the surname “Marti” to analyze the sociological insights a full name can reveal.
Prompt:
Based solely on the combination of the first name “Gautier” and the surname “Marti,” and without referencing any specific individual, what can be inferred about the person’s likely cultural background, regional origin, and other sociological aspects?
ChatGPT 4o:
Analysis:
The name “Gautier Marti” offers insights into cultural and regional origins:
- First Name: Gautier
  - Origin: “Gautier” is the French form of “Walter,” derived from the Germanic elements “wald” (rule) and “hari” (army), meaning “ruler of the army.”
  - Usage: Predominantly used in France, “Gautier” is a masculine name with historical significance, often associated with medieval French nobility.
- Surname: Marti
  - Origin: “Marti” is a surname found in various cultures:
    - Catalan: Derived from the personal name “Martí,” a form of “Martin,” meaning “of Mars” or “warlike.”
    - Swiss German and South German: A diminutive of “Martin.”
    - Italian: Possibly from the medieval Greek “Martios,” meaning “March,” indicating a connection to the month.
  - Usage: Common in Catalonia (Spain), parts of France, Switzerland, and Italy.
Note: Interestingly, I briefly considered the army as a potential career path, and went through the competitive exams for officer positions in the French Air Force and the French Navy.
Sociological Inferences:
- Cultural Background: The combination suggests a French cultural background, given the French first name “Gautier” and the presence of the surname “Marti” in southern France.
- Regional Origin: The surname “Marti” is prevalent in southern France, particularly in regions like Languedoc and Provence.
- Historical Context: The name “Gautier” has medieval French associations, while “Marti” may indicate Catalan or southern French heritage.
“Gautier Marti” likely denotes a male individual of French nationality, possibly from southern France, with a name reflecting both medieval French and regional influences.
Overall Hypothesis:
Gautier Marti is likely a male individual of French nationality, born in the 1990s, with a mixed northern and southern French heritage. The name suggests an upper-middle-class background and possibly a family with a strong sense of cultural or historical appreciation.
Note: ChatGPT 4o is accurate in its assessment of me, based on first name + last name only!
Modern Tools and AI in Onomastics
As demonstrated in the anecdotal “Gautier” case study, Large Language Models (LLMs) can infer a surprising amount of background information about a person based solely on their first name—and even more accurately when a last name is included. These models, trained on vast datasets, excel at identifying cultural, historical, and regional naming patterns, providing insights that were once the exclusive domain of sociological expertise.
Together with Khaled and Mathieu, we aimed to provide a more comprehensive evaluation of LLMs’ capabilities, extending beyond the relatively “easy” case of French nationals. Our focus was on assessing performance across datasets with different geographic emphases: an American-focused dataset (Florida Voters Registration), a global dataset (Wikipedia People, albeit with a strong American bias), and an Asia-focused dataset (the SFC register of Hong Kong finance professionals, with a strong emphasis on Chinese names).
The results are detailed in Khaled’s paper (Enriching Datasets with Demographics through Large Language Models: What’s in a Name?). In summary, we found that these models are highly accurate in predicting gender, ethnicity, and nationality but face greater challenges when estimating age or date of birth. Some countries lack clear trends in naming patterns over time. For instance, in countries where first names are tied to religion, names tend to be timeless and remain in consistent use across generations. In others, like China, unisex names present difficulties in determining gender. Additionally, some Asian countries have a very limited pool of surnames: for example, South Korea, where Kim accounts for 21.5% of the population, or Vietnam, where Nguyễn is used by 39% of the population, reflecting historical practices of adopting the family names of prominent rulers.
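To make the enrichment idea concrete, here is a minimal sketch of such a pipeline: build a prompt from a name, parse a structured answer, and score predictions against known labels. Everything in it is an assumption made for illustration. The attribute list, the prompt wording, and the `stub_llm` function (a canned stand-in for a real model call) are hypothetical and are not taken from the paper.

```python
import json

# Hypothetical attribute set; the paper's exact schema may differ.
ATTRIBUTES = ("gender", "ethnicity", "nationality", "birth_decade")


def build_prompt(full_name):
    """Ask for a JSON answer so the reply can be parsed mechanically."""
    return (
        "Given only the name below, guess the following attributes and answer "
        f"with a JSON object with keys {list(ATTRIBUTES)}. "
        'Use "unknown" when the name gives no signal.\n'
        f"Name: {full_name}"
    )


def parse_response(text):
    """Parse the model's reply, falling back to 'unknown' for malformed or missing fields."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {key: str(data.get(key, "unknown")) for key in ATTRIBUTES}


def accuracy(predictions, truths, attribute):
    """Fraction of records where the predicted attribute matches the label."""
    hits = sum(p[attribute] == t[attribute] for p, t in zip(predictions, truths))
    return hits / len(predictions)


def stub_llm(prompt):
    """Canned reply standing in for a real LLM call (hypothetical, for illustration only)."""
    return '{"gender": "male", "nationality": "French"}'


record = parse_response(stub_llm(build_prompt("Example Name")))
print(record)
```

Reporting accuracy per attribute, rather than one overall score, matches the observation above that gender, ethnicity, and nationality are much easier to predict than age.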
Some results of the empirical study:
Conclusion: Names as windows into society
This brief exploration of socio-onomastics, combined with modern tools like Large Language Models (LLMs), demonstrates how much information can be inferred from something as seemingly simple as a name. By leveraging contemporary AI tools, we can uncover patterns tied to cultural, historical, and demographic factors.
While the results are promising, this work also underscores the inherent challenges and limitations of using AI for such tasks. Difficulties in handling unisex names, the timeless nature of religious naming traditions, and the prevalence of shared surnames in certain cultures highlight the complexity of this field. These challenges serve as a reminder that names are deeply intertwined with societal and historical contexts, which cannot always be fully captured by even the most advanced models.
Nonetheless, the integration of LLMs into this area of research presents exciting opportunities. In contexts where breadth of analysis is prioritized over depth, LLMs provide an effective means to scale investigations across vast corpora of text. Their ability to process large datasets efficiently enables broad analyses that complement traditional approaches. This synergy could open new avenues for insights in fields such as historical research through prosopography and demographic enrichment.
Ultimately, this blog represents a small step in understanding the role of names in society. While the findings suggest that names continue to offer valuable perspectives on identity and culture, we hope this work inspires further exploration at the intersection of sociology and AI research. Interestingly, this direction was not our original focus; we initially stumbled upon this topic while seeking to enrich a dataset with additional features to solve a practical prediction task.