Originally published in Spanish: “De los apellidos de los catalanes”. Maia Güell and Sevi Rodríguez-Mora. Nada es gratis.
22nd Sept 2015
The existence of an ethnic dimension to the Catalan conflict is often discussed, generally using the phonetics of surnames as an indicator of ethnicity. Given the bitter political climate, it seems useful to analyse the data carefully to discern how much truth there is to this. To this end, we draw on a recent research paper in which we propose a methodology that uses surnames to measure the evolution of intergenerational social mobility (i.e. the probability that the children of the poor will end up rich, and vice versa), taking into account ethnic and socio-economic factors. All of this with Catalan data.
Measuring the evolution of intergenerational social mobility is difficult, as there are rarely data connecting the members of several generations for a statistically representative set of families. Our work overcomes this obstacle by using surnames as a marker for each family’s history.
Our method measures social mobility –an intrinsically dynamic reality– by looking only at a still photograph of society. We can do this because dynamic effects leave recognisable traces in the photo, just as a picture of a car looks blurrier the faster the car is moving.
Our work rests on the notion that the larger the incidence of inherited things on an individual’s well-being (i.e. the smaller social mobility is), the greater the importance of the things a family’s members have in common (their inheritance), as opposed to life’s vicissitudes and the unexpected. Thus, the smaller social mobility is, the greater the similarities in income and educational attainment are among family members, as compared to any set of individuals selected at random. If we knew the degree of kinship between members of a population, it would be easy to determine mobility levels. We do not, in fact, have information on their family relationships, but we do know their surnames.
It’s not that having a specific surname makes you rich or poor, but your surname is inherited together with other family traits that are indeed very important to children’s future well-being, such as parents’ genes, wealth, educational attainment, ethnicity or assets. In our paper, we show that the distribution of surnames in a given population provides information on degrees of kinship, establishing a direct relationship between the informational content of surnames (how much more similar the education and/or income of people who share a surname are than those of people chosen at random) and the degree of intergenerational mobility: the smaller the latter, the more information a surname provides regarding the profile of the person bearing it.
Some surnames are very common, and the people who share them are thus unlikely to be related; we can’t obtain any information from these. However, there are a great many very rare surnames. Two people sharing one such surname have a very high likelihood of being relatives. The greater the incidence of hereditary variables (lower social mobility), the more similar the educational attainment of individuals sharing an uncommon surname will be, compared to the degree of similarity between the educational attainment of two people selected at random: hence the amount of information provided by the surname will be large.
We define the informational content of surnames (ICS) as the R^2 of a regression of the educational attainment of each individual on a dummy for each surname. It’s a measure of how much more we know about a person by simply knowing their surname.
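To make the definition concrete, here is a minimal Python sketch, not the code used in the paper. It relies on the fact that when the only regressors are surname dummies, each person's fitted value is the mean of their surname group, so the R^2 equals the between-surname sum of squares divided by the total sum of squares. The function name `surname_r2` and the sample records are invented for illustration, and the measure used in the paper involves further details that this sketch ignores.

```python
from collections import defaultdict


def surname_r2(records):
    """R^2 from regressing an outcome on one dummy per surname.

    records: iterable of (surname, outcome) pairs.
    With only group dummies as regressors, the fitted value for each person
    is the mean outcome of their surname, so R^2 is the between-surname sum
    of squares divided by the total sum of squares.
    """
    records = list(records)
    by_surname = defaultdict(list)
    for surname, outcome in records:
        by_surname[surname].append(outcome)

    values = [outcome for _, outcome in records]
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # no variation in the outcome, nothing to explain

    between_ss = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in by_surname.values()
    )
    return between_ss / total_ss


# Made-up (surname, years of education) pairs, for illustration only.
sample = [
    ("Surname A", 16), ("Surname A", 15),
    ("Surname B", 8), ("Surname B", 9),
    ("Surname C", 12), ("Surname C", 12),
]
ics = surname_r2(sample)
```

In this made-up sample the R^2 is 0.98, because people who share a surname have almost identical schooling. With real data the value would be far lower, and very common surnames would contribute almost nothing.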
We then develop a model in which the endogenous variable of interest is the joint distribution of income and surnames, and in which there are two exogenous processes: (1) a process whereby outcomes (income, education or whatever) are generated, partly determined by the importance of inheritance; and (2) a process whereby surnames are generated and destroyed (destruction happens when the last male bearing the surname dies without male descendants; creation takes place through random mutations).
The main methodological result of the paper is to show that the greater the importance of inheritance in the generation of income, the greater the ICS we’ll be able to observe in a given economy. Hence examining the evolution in time of the ICS in a given society tells us whether the importance of hereditary things has decreased or increased throughout the years.
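The direction of this result can be illustrated with a toy simulation. This is a deliberately crude sketch under strong assumptions, not the model in the paper: every surname has exactly three bearers, each bearer's outcome is `inheritance` times a shared ancestor's outcome plus independent noise, and surnames are never created or destroyed. All names and parameter values are hypothetical.

```python
import random
from collections import defaultdict


def surname_r2(records):
    """Between-surname share of outcome variance (R^2 on surname dummies)."""
    groups = defaultdict(list)
    for surname, outcome in records:
        groups[surname].append(outcome)
    values = [outcome for _, outcome in records]
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    between_ss = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss


def simulate(inheritance, n_surnames=2000, bearers=3, seed=0):
    """Toy population: each surname's bearers share one ancestor.

    inheritance (between 0 and 1) is the weight of the ancestor's outcome
    in each descendant's outcome; the rest is independent noise.
    """
    rng = random.Random(seed)
    noise_sd = (1 - inheritance ** 2) ** 0.5  # keeps outcome variance near 1
    records = []
    for surname in range(n_surnames):
        ancestor = rng.gauss(0, 1)
        for _ in range(bearers):
            outcome = inheritance * ancestor + rng.gauss(0, noise_sd)
            records.append((surname, outcome))
    return records


low = surname_r2(simulate(inheritance=0.3))   # high mobility
high = surname_r2(simulate(inheritance=0.8))  # low mobility
```

In this toy setting `high` comes out well above `low`: the more of the outcome is inherited, the more a surname tells us. Only the comparison is meaningful, though. With three bearers per surname, the raw R^2 would be roughly one third even with no inheritance at all, simply because the groups are so small.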
Your surname contains information not only on your family’s educational attainment, but also on your ethnicity. If, for whatever reason, ethnicity were to have a direct effect on people’s educational attainment (regardless of their parents’ education), we would want to separate the two effects. Fortunately, the use of surnames makes this easy even if we lack direct information on people’s ethnicity.
We define the “Catalan-ness” of a surname as the share of Spaniards bearing that surname who live in Catalonia.
A surname as common in Catalonia as in the rest of Spain has a value of 0.16 (the share of Spain’s population living in Catalonia). A surname which is relatively rare in Catalonia has a smaller value, and a surname more common in Catalonia (relative to the rest of Spain) has a greater value. If every Spaniard with a given surname lives in Catalonia, that surname’s value is 1.
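As a small worked example, the definition amounts to a single division. The function name `catalan_ness` and all the counts below are hypothetical, chosen only to reproduce the three cases just described.

```python
def catalan_ness(bearers_in_catalonia, bearers_in_spain):
    """Share of a surname's bearers in Spain who live in Catalonia."""
    if bearers_in_spain <= 0:
        raise ValueError("the surname needs at least one bearer in Spain")
    if not 0 <= bearers_in_catalonia <= bearers_in_spain:
        raise ValueError("Catalonia count must lie between 0 and the Spain count")
    return bearers_in_catalonia / bearers_in_spain


# Hypothetical counts, for illustration only.
evenly_spread = catalan_ness(160, 1000)     # as common in Catalonia as elsewhere
rare_in_catalonia = catalan_ness(20, 1000)  # mostly found outside Catalonia
only_in_catalonia = catalan_ness(50, 50)    # every bearer lives in Catalonia
```

The three values are 0.16, 0.02 and 1.0 respectively.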
The census tells us a person’s province of birth and level of Catalan. In our paper, we show that a surname’s Catalan-ness is an excellent predictor of both variables, which suggests it’s a good proxy for ethnicity.
For example, panel (a) in the following graph shows how a surname’s Catalan-ness correlates with the probability of responding that one speaks Catalan well for the cohort born in the Barcelona region between 1945 and 1950; and (b) depicts the likelihood of having been born in Catalonia for the cohort born between 1935 and 1940.
Ethnicity, like income, is partially hereditary (indeed the degree to which it’s inherited depends on the level of assortative mating, more on this later), thus affecting both inheritance and social mobility.
Our aim is to first measure the global evolution of social mobility in Catalonia (regardless of how much is due to parents’ educational attainment and how much to their ethnicity) and then separately measure the evolution of the effects of paternal education and ethnicity.
Evolution of social mobility and its components in Catalonia
We first measure the global evolution of ICS, with no distinction between family and ethnicity effects. Indeed this is what social mobility really is, since both ethnicity and family position are inheritable. We find that Catalonia has seen an increase in the amount of information provided by surnames in the last few generations, which is highly suggestive of a drop in intergenerational mobility over time.
The following graph shows how much information about a person’s educational attainment can be gleaned from their first surname (ICS) for a “moving average” of every cohort in the year 2000. The left-most point is the ICS of those aged 75-100, the next of those aged 70-95, and so on.
Knowing a person’s surname provides more (much more) information about a young person’s educational attainment than about an old person’s (relative to their respective cohorts). This is because inheritable things are more important in determining the income of the young than of the old. Social mobility has dropped dramatically in Catalonia.
However, we still don’t know whether this is because your father’s education has become more important or because the language you spoke at home has. What is clear is that your ethno-linguistic group matters. Catalan-speaking families are on average substantially better educated. A rise of one standard deviation in Catalan degree is associated with six more months of education on average before controlling for family income, and four more months after this control.
We can then separately measure the evolution of the ethnic component (Catalan family origin, measured through a surname’s Catalan degree) and the family component (two families with the same ethnicity but different levels of income or education).
We can see that the incidence of both has increased in time. In other words, it’s not just that the type of family you come from has become more important (regardless of ethnicity), it’s also that the value of having a “more Catalan” ethnicity is greater today than it was one or two generations ago.
Panel (a) plots the ICS value of somebody’s first surname conditional on the Catalan degree of their second for all cohorts. It shows that your family’s education has grown more and more important in determining your own, regardless of what language you speak at home.
For instance, we can compare, on one hand, two young people with the same Catalan degree and, on the other, two older people (again with the same Catalan degree). The relative position of the two young people will be much more affected by their parents’ education than that of the two older individuals.
Panel (b) shows the value of the parameter determining the incidence of Catalan degree conditional on family education. Having a more Catalan surname “gives” a child more years of education than it “gave” an older person. Let’s consider two families with the same level of education, one “more Catalan” than the other. If they are young, the difference between the levels of education of each family’s children will be larger than if they are old.
Up until now we haven’t established why mobility has decreased, just that it has, and this is generally quite difficult to establish. Yet the Spanish surname convention (the wonderful idea of keeping both dad’s and mom’s for life) allows us to observe an event capable of explaining everything we’ve described.
We must first note that one of the things that can reduce intergenerational mobility is an increase in assortative mating. For this to be the case, all we need is for a child to be influenced by both mom and dad. If mating is very assortative, the top couples have both mom and dad influencing their kids very positively, and the bottom couples much less so. If mating is less assortative, the two couples will be more similar. Additionally, from the point of view of ethnicity, a low level of assortative mating quickly dilutes ethnic origin. It’s only with assortative mating that an identifiable ethnic variable can be maintained, and with it its effects on income or education.
What we can do is analyse how similar the father (first surname) and mother (second surname) of each person were, both in terms of education and in terms of the ethnicity reflected in each surname. We find that throughout the 20th century, individuals in Catalonia have married more and more assortatively, both in educational terms (university graduates marrying their peers more) and in terms of regional origin (people of Catalan origin marrying each other more). Hence the observed decrease in intergenerational mobility is quite likely due to an increase in assortative mating: there’s more to inherit.
In the graph, the red line shows the correlation between the Catalan degrees of the first and second surname for people born in each cohort, and the blue line shows the correlation between the two surnames’ average educational attainment. Note that this information refers to the degree of assortative mating in these people’s parents. In other words, this increase in assortative mating took place a generation earlier than the observed decrease in mobility, and indeed caused it.
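For readers who want to see the mechanics, the statistic behind each point on such a line can be sketched as a plain Pearson correlation, computed within one cohort, between a value attached to the first surname and the same value attached to the second. The sketch below is illustrative only: the helper `pearson` and the two tiny cohorts of (first surname, second surname) Catalan degrees are made up, not taken from our data.

```python
def pearson(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def surname_pair_correlation(cohort):
    """cohort: list of (first-surname value, second-surname value) per person."""
    firsts = [first for first, _ in cohort]
    seconds = [second for _, second in cohort]
    return pearson(firsts, seconds)


# Made-up Catalan degrees (father's surname, mother's surname) per person.
older_cohort = [(0.1, 0.6), (0.7, 0.2), (0.3, 0.3), (0.8, 0.7)]
younger_cohort = [(0.1, 0.2), (0.7, 0.8), (0.3, 0.2), (0.8, 0.7)]

r_older = surname_pair_correlation(older_cohort)
r_younger = surname_pair_correlation(younger_cohort)
```

In the invented older cohort the two surnames are almost unrelated (a correlation close to zero), while in the invented younger cohort they move together (a correlation above 0.9), which is the pattern that more assortative mating among the parents would produce.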
Political manifestations of socio-cultural asymmetries
Ok, now we can go back to the ethnic component, which is the one everyone talks about in hushed voices. In view of what we’ve seen, there seems to be conclusive evidence that Catalonia exhibits a high degree of ethno-linguistic diversity which translates into statistical differences in education (and probably income) that have increased over time. It is highly likely that this is at least partly due to a steady rise in assortative mating, both in socio-economic terms (the rich mating with the rich) and in ethnic ones (the “more” Catalan with the more Catalan). At least as far as our data goes.
In fact, if something defines the morphology of Catalan society, it’s precisely this diversity.
Given the current situation, and although our published research doesn’t delve directly into this question, it seems pertinent to ask whether this cultural and economic asymmetry has a political manifestation in Catalonia.
Obviously, this is an unpleasant topic that no one really wants to talk about… And yet a brief glance at both the distribution of surnames among leaders of various political groups and the geography of voting patterns strongly suggests that this may be the case. We cannot go into this in detail (we’d have to examine the differential effects of ethnicity and education on electoral behaviour), but we can take a look at a surprising aspect of Catalan politics that is probably not independent of the political situation.
Ten years ago, in an extraordinary book that remains essential to an understanding of the apparently surreal Catalan politics, Cambridge political scientist Thomas Jeffrey Miley compared the socio-economic and cultural parameters of Catalan elites to those of the general population, only to find a gaping chasm. The elites closely resemble what he calls the Catalan-speaking “ethno-linguistic community”, and are nothing like the Spanish-speaking one.
Using our methodology, we can verify that this hasn’t changed. The average Catalan degree of all Catalans’ surnames is 0.37. Compared to the average for the members of the regional government (0.59), high-ranking regional officials (0.59), members of the CATN or all members of the regional parliament (0.55), it seems obvious that, statistically speaking, the elites are much more “Catalan” than the population of Catalonia at large. They’re not even remotely representative. This is Miley’s result all over again: politically speaking, only one cultural group exists in Catalonia.
So, the evidence shows that Catalonia combines (i) a deep and growing socio-economic rift associated with ethno-linguistic diversity and (ii) political structures that, for whatever reason, massively over-represent the dominant social group.
Food for thought.
Maia Güell, José V. Rodríguez Mora and Christopher I. Telmer, “The Informational Content of Surnames, the Evolution of Intergenerational Mobility, and Assortative Mating”, Review of Economic Studies (2015) 82 (2): 693-735, 2014. doi:10.1093/restud/rdu041
 It’s actually slightly more complicated than this, but this captures the essence of the concept. Please see the paper for details.
 The model is equivalent to a genetic inheritance model without natural selection (of non-coding DNA).
 Our universe is the male population aged over 25 born in Spain and living in Catalonia in 2001.
 We use the first surname as a proxy for family education and the second as a proxy for Catalan-ness, but the results hold when we do the opposite or use other combinations.
 “Nacionalismo y política lingüística: El caso de Cataluña”. Madrid, Centro de Estudios Políticos y Constitucionales, 2006
 Consell Assessor per a la “Transició” Nacional, or Advisory Council for the National “Transition”