Music streams
Our research was conducted on Deezer, a music streaming platform available in 187 countries with a catalogue of over 90 million tracks. Users’ listening behaviour is captured in comprehensive logs, including the date and time of song playback, listening duration, the listener’s self-reported age and gender (when provided), preferred language, type of streaming device and network connection, city-level geolocation derived from a third-party service, and whether the stream is organic (i.e., the user explicitly chose the music) or algorithmically recommended (Algorithmic streams in Methods). We gathered all streams played in France, Brazil, and Germany between 6 March and 2 April 2023. This four-week period gives each weekday equal representation and avoids holiday seasons. To reduce noise and potential biases in our data, we excluded streams played for less than 30 seconds and streams over mobile networks, whose geolocation is unreliable. The data were provided to the researchers in anonymised form, and the analysis was not used to derive commercial profiling of any kind. In accordance with the Deezer user agreement, users have consented to their data being used for the purpose of advancing scientific research. The study was approved by the Ethics Council of the Max Planck Society (Application No: 2025_8).
User sampling
From the sampled streams, we excluded users who streamed fewer than 100 times within the study window and users who travelled frequently (more than 10 unique geolocations identified). This threshold was set by assessing the overall frequency distribution of unique locations per user (Supplementary Fig. 1), and the results were replicated with looser and stricter thresholds (Supplementary Fig. 2). Each user’s home location was approximated as the city-level location from which they streamed most frequently. Using data from GISCO (version year = 2020), French and German users were mapped to Local Administrative Unit (LAU) areas (number of LAU units: France = 34,968, Germany = 11,007). LAUs are the building blocks of the NUTS (Nomenclature of Territorial Units for Statistics) regions and other statistical areas, and comprise the municipalities and communes of the European Statistical System (ESS). Brazilian users were mapped to municipality-level areas using the data (version year = 2021) published by the Instituto Brasileiro de Geografia e Estatística (IBGE). This mapping procedure excluded 5.4% of users, who could not be mapped to any area within these geographical boundaries. Finally, we grouped the remaining users in France and Germany by NUTS3 unit areas (one level above LAU) to reduce noise when measuring between-individual diversity (BID) and within-individual diversity (WID). For Brazil, we kept the municipality level because grouping by state was too broad. Areas with fewer than 200 unique users were excluded from our analysis to reduce noise and to ensure anonymity when shared as aggregated data. This left 96 NUTS3 unit areas for France, 113 for Germany, and 218 municipalities in Brazil.
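The user-level and area-level filters described above, together with the stream-level filters in Music streams, can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It rests on three assumptions. First, streams arrive as simple (user, city) tuples. Second, a hypothetical `city_to_area` lookup maps a city directly to the final grouping area, which collapses the LAU-to-NUTS3 step. Third, the network field uses a hypothetical 'mobile' label.

```python
from collections import Counter, defaultdict

MIN_DURATION_S = 30          # streams shorter than this are dropped
MIN_STREAMS_PER_USER = 100   # users with fewer streams are dropped
MAX_UNIQUE_LOCATIONS = 10    # users seen in more locations are dropped
MIN_USERS_PER_AREA = 200     # areas with fewer users are dropped


def keep_stream(duration_s, network):
    """Stream-level filter: at least 30 s of listening and not on a mobile network.
    The 'mobile' label is a hypothetical encoding of the network field."""
    return duration_s >= MIN_DURATION_S and network != "mobile"


def sample_users(streams, city_to_area):
    """Map users to home areas and apply the user- and area-level filters.

    streams: iterable of (user_id, city) pairs for streams that passed keep_stream.
    city_to_area: hypothetical lookup from a city-level location to an area code
    (NUTS3 for France/Germany, municipality for Brazil); unknown cities are unmappable.
    """
    cities_by_user = defaultdict(Counter)
    for user, city in streams:
        cities_by_user[user][city] += 1

    home_area = {}
    for user, counts in cities_by_user.items():
        if sum(counts.values()) < MIN_STREAMS_PER_USER:
            continue  # too little activity in the study window
        if len(counts) > MAX_UNIQUE_LOCATIONS:
            continue  # frequent traveller
        home_city = counts.most_common(1)[0][0]  # most frequently streamed city
        area = city_to_area.get(home_city)
        if area is not None:  # drop users that cannot be mapped to an area
            home_area[user] = area

    # Keep only areas with enough users for stable, anonymous aggregates
    users_per_area = Counter(home_area.values())
    return {user: area for user, area in home_area.items()
            if users_per_area[area] >= MIN_USERS_PER_AREA}
```

Filtering users before counting users per area matters here. An area is only kept if it still has at least 200 users after the activity, travel and mapping exclusions.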
User demographics
In the final set, 2,544,549 users remained (France = 1,506,899, 47.1% female; Brazil = 816,101, 33.6% female; Germany = 221,547, 46.6% female). Supplementary Fig. 9 gives a detailed demographic comparison across the countries. It breaks users down by self-reported age and gender and by number of monthly streaming activities. Population census data show that the majority of the population in these countries live in urban areas (France = 81.5%; Brazil = 87.6%; Germany = 77.6%). All three values are above the world average (M = 62.0%, SD = 23.4%), so these countries represent highly urbanised settings. To assess how representative our user sample is of the general population, we compared user demographics with Eurostat census data on population size, median age, and gender (Socio-demographics in Methods). We tested this for France, given its highest sample granularity and its importance in our causal analysis. Across the NUTS3 areas of France, we observed strong correlations with the census for the number of users versus population size (r = 0.90, P < 0.001) and for median age (r = 0.61, P < 0.001). However, our user sample differed from the general population when stratified by age group. It contained 25% more young people (aged 15–30), 3% more middle-aged people (31–50), and 30% fewer older people (51–80). Our sample was also slightly skewed towards male users (52% male) compared with France’s general population (48% male). The per-area gender ratio did not significantly correlate with the census (r = 0.13, P = 0.179), suggesting platform-specific gender biases.
Embedding space
Song embedding
Many recommendation systems leverage embeddings to efficiently encode the latent relationships between users and content. An embedding is a multidimensional space in which objects (here, songs) are represented as vectors. The vectors capture relationships between the objects based on user behaviour, namely the co-occurrence of songs across playlists and listening patterns. In this space, closely related songs are positioned near each other, and tracks that share thematic or stylistic similarities tend to co-occur frequently. For instance, Adele’s ‘Someone Like You’ and Sam Smith’s ‘Stay With Me’ would be located close to each other within this space. To obtain a tractable low-dimensional space, Deezer builds a co-occurrence matrix over millions of songs and approximates it by matrix factorisation using singular value decomposition (SVD)79. This yields a 128-dimensional embedding space80 that captures the nuanced relationships between songs based on user interactions and thematic links.
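The sketch below illustrates how a co-occurrence matrix can be factorised into song vectors. It builds playlist co-occurrence counts and keeps the top singular vectors. It assumes raw counts and a dense matrix. Deezer's production pipeline (weighting, scale, the 128-dimensional output) is not described in enough detail here to reproduce, so the function names and toy data are hypothetical.

```python
import numpy as np


def cooccurrence(playlists, n_songs):
    """Song-by-song co-occurrence counts: C[a, b] = number of playlists containing both a and b."""
    X = np.zeros((len(playlists), n_songs))
    for row, songs in enumerate(playlists):
        X[row, sorted(set(songs))] = 1.0  # playlist-by-song incidence matrix
    return X.T @ X  # symmetric, positive semi-definite


def svd_embedding(C, dim):
    """Truncated SVD: keep the top `dim` singular vectors, scaled by sqrt(singular value),
    so that E @ E.T is the best rank-`dim` approximation of the PSD matrix C."""
    U, S, _ = np.linalg.svd(C)
    return U[:, :dim] * np.sqrt(S[:dim])


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In a toy example where songs 0 and 1 always share playlists and songs 2 and 3 form a separate cluster, the two-dimensional embedding places 0 and 1 on the same direction and orthogonal to 2 and 3.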
User embedding
Each user’s listening behaviour, represented in the song embedding (Song embedding in Methods), can be summarised by a central position in the space. This position is defined as the average of the embedding vectors of all songs the user has listened to in the past 28 days. The result is a 128-dimensional user vector, which can then be used to construct a user embedding space, analogous to how relations between songs were converted into song embeddings. In the user embedding space, users with similar music preferences (often fans of a particular genre) are positioned near each other, while those with distinct tastes are far apart.
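Below is a minimal sketch of the user vector and the resulting user-to-user similarity. It assumes song embeddings are stored as rows of a NumPy array indexed by song id. It also assumes the centroid is an unweighted mean over distinct songs, because the text does not say whether repeated plays are weighted. Function names are illustrative.

```python
import numpy as np


def user_vector(song_vectors, history):
    """Centroid of a user's listening: mean embedding of the distinct songs in `history`.

    song_vectors: (n_songs, d) array of song embeddings.
    history: song ids the user streamed in the past 28 days (repeats allowed).
    """
    ids = sorted(set(history))
    return song_vectors[ids].mean(axis=0)


def user_similarity(user_vectors):
    """Pairwise cosine similarity between users (rows of `user_vectors`)."""
    unit = user_vectors / np.linalg.norm(user_vectors, axis=1, keepdims=True)
    return unit @ unit.T
```

Users whose histories cover the same region of the song space end up with near-identical vectors and a cosine similarity close to 1. Users listening to unrelated songs end up close to 0.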
Measuring diversity
Between-individual diversity (BID)
To quantify the diversity found in a given population, existing research has commonly applied measures such as the Gini coefficient81, Simpson’s index82, or Shannon’s entropy83. However, these measures have been criticised for their arbitrary scales, which make results difficult to compare. As a solution, Hill’s number (also known as the effective number of species) has become an increasingly popular way to quantify the diversity of a species assemblage. It allows for standardised comparisons by encompassing various diversity metrics through the order of a single parameter q46,84. Originating in ecology, Hill’s number is computed from the relative abundances of species in an ecosystem (or field site). In our use case, each song is treated as a species. The songs streamed in a geographical area (i.e., NUTS3 areas of France and Germany, municipalities of Brazil) form its ecosystem. The measure thus quantifies how diverse music consumption is in a given area, indicating whether individuals listen to a wide range of songs or concentrate on a few popular ones. For example, suppose the 1000 music streams we sample in a bootstrap (Statistical analysis in Methods) yield a Hill’s number of 900. The observed diversity is then equivalent to that of 900 equally common songs.
We calculate the between-individual diversity (BID) of music engagement as the Hill’s number of order \(q\), \({}^{q}D\), expressed as:
$${}^{q}D={\left({\sum }_{i=1}^{S}{p}_{i}^{\,q}\right)}^{1/(1-q)}$$
(1)
Here, \(q\) defines the order of Hill’s number. Lower values of \(q\) give more weight to rare songs: at \(q\) = 0 every song counts equally, and \({}^{0}D\) equals the number of unique songs. Higher values of \(q\) increasingly emphasise the abundant, popular songs. \(S\) is the total number of unique songs, and \({p}_{i}\) is the relative abundance of song \(i\), i.e. its share of all streams in the area.
Because Eq. (1) is undefined at \(q\) = 1, the Hill’s number of order \(q\) = 1 is defined as the limit of Eq. (1) as \(q\) approaches 1:
$${}^{1}D=\frac{1}{{\prod }_{i=1}^{S}{p}_{i}^{\,{p}_{i}}}=\exp \left(-{\sum }_{i=1}^{S}{p}_{i}\,{{\mathrm{ln}}}({p}_{i})\right)$$
(2)
which is the exponential of the Shannon entropy computed with natural logarithms. In our analysis, we set \(q\) = 1 a priori, but results were robust to other values of \(q\) (Supplementary Table 3).
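Eqs. (1) and (2) translate directly into code. The sketch below computes the Hill’s number of order \(q\) from a list of streamed song ids and switches to the exponential of Shannon entropy at \(q\) = 1. The function name is hypothetical, and this is an illustration rather than the authors’ implementation.

```python
import math
from collections import Counter


def hill_number(streams, q=1.0):
    """Hill's number of order q for a non-empty list of streamed song ids."""
    counts = Counter(streams)
    total = sum(counts.values())
    p = [c / total for c in counts.values()]  # relative abundance of each song
    if math.isclose(q, 1.0):
        # Limit q -> 1 (Eq. 2): exponential of Shannon entropy in natural logs
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    # General order q != 1 (Eq. 1)
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))
```

If every song is streamed equally often, the result equals the number of songs for any \(q\). For uneven samples it decreases as \(q\) grows, since higher orders discount rare songs.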
Given that algorithmically recommended content can bias the calculation of BID, we excluded all algorithmic streams and used only organic streams for the main analysis in Fig. 2 (Algorithmic streams in Methods). Supplementary Table 2 reports additional analyses that retain algorithmic streams or consider them separately. For causal testing (Fig. 3), we did not exclude algorithmic streams, because the proportion of algorithmic streams is included as a potential confounder (Algorithmic streams in Methods).
Within-individual diversity (WID)
To assess an individual’s diversity of musical engagement, we employed the Generalist-Specialist Score (GS-Score), a metric previously validated in studies of user music exploration and discovery45,49. The GS-Score computation relies on Deezer’s high-quality song embeddings, which summarise relationships between songs as high-dimensional vectors (Song embedding in Methods). First, the centroid of user \(\mu\) in the song embedding is defined as the mean vector \({\mu }^{ \rightharpoonup }\) of all songs the user has listened to within the last 28 days. Next, we compute the cosine similarity between \({\mu }^{ \rightharpoonup }\) and each song \({s}^{ \rightharpoonup }\) the user listened to. These similarities are averaged, weighting each song by the number of times the user listened to it (\({w}_{s}\)). Equivalently, the GS-Score is the expected similarity between the centroid and a song drawn at random from the user’s listens. Because the score is a weighted average rather than a sum, it is not sensitive to the number of songs the user has listened to. Moreover, WID is computed solely from the user’s explicitly chosen content and therefore excludes algorithmically recommended streams. The resulting GS-Score captures the user’s radius of coverage in the song embedding, formally written as:
$${GS}\left(\mu \right)=\frac{1}{{\sum }_{s}{w}_{s}}{\sum}_{s}\frac{{w}_{s}\cdot {s}^{ \rightharpoonup }\cdot {\mu }^{ \rightharpoonup }}{{||}{s}^{ \rightharpoonup }{||}\cdot {||}{\mu }^{ \rightharpoonup }{||}}$$
(3)
If a user is a ‘specialist’, they would have a smaller radius, indicating a more focused interest, whereas a ‘generalist’ would exhibit a wider radius, indicating a broader range of music engagement. As such, unlike BID, WID does not rely on categorical grouping but instead captures the distances within the vector space. To make this score consistent with the direction of our BID measure, we inverted the score (\(1-{GS}(\mu )\)) and normalised the value to range between 0 and 100, where 100 represents maximal WID.
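The GS-Score of Eq. (3) and its inversion to WID can be sketched as below. Two details are assumptions rather than statements from the text. First, the centroid is taken as the unweighted mean of the user’s distinct songs. Second, the 0-100 rescaling is done by min-max scaling across users.

```python
import numpy as np


def gs_score(song_vectors, counts):
    """Generalist-Specialist score (Eq. 3) for one user.

    song_vectors: (n, d) embeddings of the distinct songs the user played organically.
    counts: (n,) play counts w_s used as weights.
    """
    w = np.asarray(counts, dtype=float)
    mu = song_vectors.mean(axis=0)  # user centroid (assumed unweighted)
    cos = song_vectors @ mu / (np.linalg.norm(song_vectors, axis=1) * np.linalg.norm(mu))
    return float(np.sum(w * cos) / np.sum(w))  # play-weighted mean similarity


def wid_scores(gs_values):
    """Invert GS (1 - GS) and rescale to 0-100 across users (assumed min-max scaling).
    Requires at least two distinct GS values to avoid division by zero."""
    raw = 1.0 - np.asarray(gs_values, dtype=float)
    return 100.0 * (raw - raw.min()) / (raw.max() - raw.min())
```

A user whose songs all point in the same direction is a pure specialist (GS = 1, lowest WID). A user split between orthogonal songs has a lower GS and therefore a higher WID.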
Dispersion in user embedding
The frequency-based measure of BID (Between-individual diversity (BID) in Methods) does not account for relationships between songs: two distinct but stylistically similar songs count as separate species. An alternative approach, offering a higher-level characterisation, is to measure how misaligned users are in their music preferences using distances in the user vector space (User embedding in Methods). As described previously, users with similar musical preferences, or taste, are close to each other in this space, while users with distinct tastes are far apart. At the level of a geographical area, we can compute pairwise similarities between users and measure their dispersion, or radius, as an indicator of the diversity of music preferences in that area. Formally, this user dispersion is the population variance of the pairwise cosine similarities between users from a given area:
$${{\rm{population}}} {{\rm{variance}}}=\frac{1}{N}{\sum }_{i=1}^{N}{\left({x}_{i}-\bar{x}\right)}^{2}$$
(4)
where \({x}_{i}\) is the cosine similarity of the \(i\)-th bootstrapped pair of user vectors from the area, \(N\) is the number of sampled pairs, and \(\bar{x}\) is the mean of these cosine similarities. This approach of measuring dispersion in a vector space is analogous to the GS-Score computation used for WID (Within-individual diversity (WID) in Methods).
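The following sketch implements the user-dispersion measure of Eq. (4) for one area. It assumes user vectors are rows of a NumPy array with at least two users. Pairs of distinct users are drawn with replacement, mirroring the bootstrapped pairs in the text. The number of pairs (1000 here) is a hypothetical default.

```python
import numpy as np


def user_dispersion(user_vectors, n_pairs=1000, rng=None):
    """Population variance (Eq. 4) of cosine similarities between random pairs
    of distinct users from one area. Needs at least two users."""
    rng = np.random.default_rng(rng)
    n = len(user_vectors)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n - 1, size=n_pairs)
    j = j + (j >= i)  # shift so that j != i while staying uniform over the rest
    unit = user_vectors / np.linalg.norm(user_vectors, axis=1, keepdims=True)
    x = np.sum(unit[i] * unit[j], axis=1)  # cosine similarity of each pair
    return float(np.var(x))  # np.var uses ddof=0, i.e. the population variance
```

An area where everyone shares the same taste has zero dispersion. An area split into two camps with unrelated tastes has a high variance of pairwise similarities.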
Socio-demographics
Our directed acyclic graph (DAG) model includes seven socio-demographic confounders. Below we outline the data source for each confounder and the rationale for its inclusion, drawing on existing literature.
Age and gender
Studies have shown that musical exploration45 and preferences51 are influenced by age, and that music taste generally consolidates during adolescence. Research has also shown significant differences between the consumption patterns of males and females. Male users on online platforms tend to consume more diverse and niche content44,85. We therefore include users’ self-reported age and gender as potential confounders of music consumption diversity. Among the users who provided this information (89.7%), age was computed from the self-reported birth date. When registering, users could select their gender from the following options: ‘Male’, ‘Female’, ‘Unknown’, ‘Non-Binary’, ‘Other’, or leave it blank. For simplicity in our DAG analyses, we only included users aged above 18 and below 65 who self-identified as Male or Female (13% excluded; see Supplementary Fig. 9 for demographic distributions). This reduced noise, because only a small number of individuals per area fell outside these criteria.
Immigration, education, and income
Immigration, or contact between populations, has been shown to act as a vibrant channel for cross-cultural exchange, introducing fresh perspectives to a culture and fostering innovation and complexity10,68. In parallel, educational attainment and economic status have been shown to be tightly linked with cultural inclination. Rooted in the cultural omnivore theory widely debated in the social sciences32, numerous studies have observed that societal elites seek a broader spectrum of cultural experiences53,54,55. Considering this literature, we include dimensions that indicate social class and immigrant status. The data were drawn from Cagé and Piketty (2023)86, who aggregated and made openly available electoral and socio-economic data for the municipalities of France, with longitudinal records dating back as early as the 18th century. These data were collected from electoral reports digitised in national archives and from statistical sources such as the Institut National de la Statistique et des Études Économiques (INSEE), the National Institute of Statistics and Economic Studies of France. Among the socio-demographic indicators they collected, we focused on immigration, education, and income at the level of communes. France has over 35,000 communes, a granularity comparable to that of postcodes. We used these area averages as proxies for each user’s attributes (see Supplementary Fig. 11 for geographical map visualisations). Specifically, we used the most recent data, from 2022, on the following indicators:
(1) the percentage of immigrants (pimmigre2022), from the ‘naticommunes’ dataset;
(2) the percentage of residents with bachelor’s degrees (pbac2022), from the ‘diplomescommunes’ dataset;
(3) average per capita income (revmoy2022), from the ‘revcommunes’ dataset.
For a precise description of the columns and data sources, see their appendix material86.
Q-Q plots showed that income and immigration percentages across municipalities were highly skewed, so a log transformation was applied (Supplementary Table 5).
Musical venues
Cultural activities concentrate in places like cities and towns87. Residents of metropolitan areas and large cities have easier access to diverse cultural offerings than their rural counterparts, which may in turn influence their cultural engagement. To approximate access to cultural events, we gathered information about musical venues at the NUTS3 level from the database of Songkick, a popular global concert discovery service. Using its API, we initially queried 10,000 venues in France in August 2023. After excluding venues without geolocation information, 6618 venues remained (see Supplementary Fig. 12 for geographical map visualisation).
Algorithmic streams
Recent studies on the effect of algorithmic recommendations have shown a direct link to the diversity of users’ music consumption. They suggest that individuals with more diverse tastes rely less on algorithmic recommendations and engage in more organic exploration49. To test for possible differences in the use of algorithmic recommendations across areas, we randomly sub-sampled 10,000 individuals per area across the three countries. We then computed the proportion of algorithm-driven streams among all of their streams, sub-grouped by age and gender. Algorithmic streams include listens such as the next song auto-played after a song or an album, and personalised recommendations. Organic streams, on the other hand, are content users explicitly chose, such as searches and songs already included in their music library. In France and Brazil, there was a moderate to strong negative relationship between algorithmic stream proportion and population size across all age groups (see Supplementary Fig. 3 for statistics). This suggests that individuals living in large metropolitan areas tend to rely less on algorithmic recommendations, although the effect was small. There was also an effect of gender across all three countries: male users used algorithmic recommendations more than female users. Given these differences in algorithm usage by area size, we include the proportion of algorithmic streams as a confounder in causal testing. This algorithmic bias might affect our BID measure (Supplementary Table 2). It does not directly affect WID, however, because WID is calculated solely from users’ explicitly chosen content (i.e., organic streams). Nonetheless, algorithm use could still influence WID through an indirect pathway, and we represent this pathway in the DAGs (Fig. 3).
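The per-group proportions can be computed with a simple tally. The sketch assumes streams are already labelled as algorithmic or organic and tagged with area, age group and gender; these labels are hypothetical. It keeps areas with fewer than 10,000 users whole, which the text does not specify.

```python
import random
from collections import defaultdict


def subsample_users(users_by_area, k=10_000, seed=0):
    """Randomly keep up to k users per area (all users if the area has fewer than k)."""
    rnd = random.Random(seed)
    return {area: rnd.sample(users, min(k, len(users)))
            for area, users in users_by_area.items()}


def algorithmic_share(streams):
    """Proportion of algorithmic streams per (area, age group, gender).

    streams: iterable of (area, age_group, gender, is_algorithmic) tuples
    for the sub-sampled users.
    """
    total = defaultdict(int)
    algorithmic = defaultdict(int)
    for area, age_group, gender, is_algorithmic in streams:
        key = (area, age_group, gender)
        total[key] += 1
        algorithmic[key] += int(bool(is_algorithmic))
    return {key: algorithmic[key] / total[key] for key in total}
```

The resulting proportions can then be related to area population size within each age and gender group.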
Social connections
International connections can significantly influence one’s exposure to diverse content by providing access to cultural content beyond one’s own cultural sphere88,89. A larger number of international connections may also indicate extensive travel experience or a background of living abroad. Recent research has also found that individuals tend to broaden their preferences and interests towards the cultural influences of the places they visit90. We used the publicly available dataset released by Meta (reference period: 13 October 2021) to approximate the number of international Facebook friends at the level of NUTS3 units. The Social Connectedness Index (SCI), first introduced by Bailey et al.91, uses an anonymised snapshot of all active Facebook users and their friendship networks to measure the intensity of social connectedness between locations. Users are assigned to locations based on their information and activity on Facebook, including the city stated on their profile and device and connection information. Formally, the SCI between two locations i and j is defined as:
$${{\mbox{SCI}}}_{i,j}=\frac{{{\mbox{Connections}}}_{i,j}}{{\mu }_{i}\times {\mu }_{j}}$$
(5)
Here, \({\mu }_{i}\) and \({\mu }_{j}\) represent the number of Facebook users in locations \(i\) and \(j\), and \({{\mbox{Connections}}}_{i,j}\) is the total number of Facebook friendship connections between individuals in the two locations. This metric captures the relative probability of a Facebook friendship link between the two locations. To quantify the international social connections of each NUTS3 area, we summed its SCI with all areas around the world outside its own country (i.e., outside France). We then normalised the scale and applied a log transformation to account for skewness (Supplementary Table 5). We also assessed how Facebook user demographics compare with those of Deezer users in France. For this, we obtained data from a private company (https://napoleoncat.com) that gathers country-level Facebook user demographics, for March 2023, the same month as our sampling window. Demographics were similar for females and males across age groups (Spearman correlation = 0.86 [0.29, 0.98], P = 0.08, identical for both genders), with adolescents making up the largest proportion on both platforms (see Supplementary Table 5 for raw values).
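The sketch below aggregates SCI into an area-level international connectedness score. The released Meta data already provide SCI values per location pair, so Eq. (5) is included only for reference. The exact scaling is not stated, so the normalisation step is assumed here to be division by the maximum before taking the log. Area and country codes are hypothetical.

```python
import math


def sci(connections, users_i, users_j):
    """Eq. 5: friendship links between locations i and j over the product of their user counts."""
    return connections / (users_i * users_j)


def international_connectedness(sci_rows, country_of):
    """For each area, sum its SCI with every area in a different country,
    rescale by the maximum (an assumed normalisation), then take the log.

    sci_rows: iterable of (area_i, area_j, sci_ij) tuples.
    country_of: mapping from area code to country code (hypothetical codes).
    """
    totals = {}
    for area_i, area_j, value in sci_rows:
        if country_of[area_i] != country_of[area_j]:  # keep only cross-border pairs
            totals[area_i] = totals.get(area_i, 0.0) + value
    top = max(totals.values())
    return {area: math.log(total / top) for area, total in totals.items()}
```

Domestic pairs (here, French area to French area) are skipped. Only connections to areas abroad contribute to an area’s score.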
Causal inference
The DAG was illustrated and evaluated using the ‘dagitty’ R package92. The causal testing procedure we applied is described step by step in Supplementary Note 1 and summarised here. We first checked the consistency of our data with the DAG models and the robustness of several candidate models60,93. Our final model passed several implied conditional independence tests. These tests evaluate whether certain variables in the model are independent of others, given a specified set of other variables. This testing is crucial, as it validates the model’s representation of relationships and lends support to our DAG hypotheses. All variables were standardised as Z-scores for effect size comparisons, and variables that did not follow a normal distribution were log-transformed (Supplementary Table 5).
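Implied independencies of a DAG are often checked with partial correlations for continuous variables. The sketch below illustrates the idea for one hypothetical implication, x ⊥ y | z. It Z-scores the variables, residualises them by least squares, and applies a Fisher z test. This is not the ‘dagitty’ implementation. Skewed variables would be log-transformed before standardising.

```python
import math
import numpy as np


def zscore(x):
    """Standardise a variable to mean 0 and (population) standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def residualise(v, Z):
    """Residuals of an OLS regression of v on the columns of Z."""
    beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ beta


def conditional_independence_test(x, y, z):
    """Test the implied independence x _||_ y | z via partial correlation.

    z may hold one or several conditioning variables (columns).
    Returns (partial correlation, two-sided p-value from Fisher's z transform).
    """
    x, y = zscore(x), zscore(y)
    z = np.asarray(z, dtype=float).reshape(len(x), -1)
    Z = np.column_stack([np.ones(len(x)), z])  # intercept + conditioning set
    r = float(np.corrcoef(residualise(x, Z), residualise(y, Z))[0, 1])
    k = z.shape[1]  # number of conditioning variables
    stat = math.atanh(r) * math.sqrt(len(x) - k - 3)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # = 2 * (1 - Phi(|stat|))
    return r, p_value
```

Two variables that share only a common cause z are correlated marginally, but their partial correlation given z is near zero. A DAG implying that independence would then be consistent with the data.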
To control for confounders, we used propensity scores to adjust for differences between users living in areas of different sizes, using the ‘WeightIt’ R package94. The propensity score condenses all observed covariates into a single metric95. Acting as a balancing score, it aims to equalise the distribution of confounders across groups. Each individual is assigned a weight by inverse probability weighting (IPW)96. The weight is based on the inverse of the individual’s estimated probability of the exposure they actually received, and it determines how much they ‘contribute’ to their group. This enables the simulation of a quasi-randomised scenario to facilitate causal inference58,93. To estimate the causal effect, a weighted generalised linear model (GLM) was fitted to the outcome of interest. To quantify the uncertainty of this estimate, we ran bootstrap simulations over the entire sampling and weighting procedure (Statistical analysis in Methods).
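The sketch below makes the weighting logic concrete. It estimates propensity scores with a hand-written logistic regression, builds IPW weights, and fits a weighted linear model (a Gaussian GLM) of the outcome on the exposure. It assumes a binary exposure (e.g. living in a large versus a small area). This is a simplification: the text does not state how area size was coded, and ‘WeightIt’ also supports non-binary treatments. It is not the WeightIt implementation.

```python
import numpy as np


def fit_logistic(X, t, n_iter=25):
    """Propensity scores P(t = 1 | X) from an unpenalised logistic regression
    fitted by Newton-Raphson."""
    X1 = np.column_stack([np.ones(len(X)), X])  # add intercept
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (t - p)
        hess = X1.T @ (X1 * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X1 @ beta))


def ipw_effect(X, t, y):
    """Average effect of a binary exposure t on outcome y using inverse probability weights."""
    e = fit_logistic(X, t)
    w = np.where(t == 1, 1.0 / e, 1.0 / (1.0 - e))  # inverse prob. of exposure received
    D = np.column_stack([np.ones(len(t)), t])
    # Weighted least squares = Gaussian GLM (identity link) with weights w
    W = D * w[:, None]
    coef = np.linalg.solve(D.T @ W, W.T @ y)
    return float(coef[1])  # coefficient on the exposure
```

On simulated data where a confounder drives both exposure and outcome, the naive difference in means is biased, while the weighted estimate should land near the true effect. In the analysis described above, this whole procedure is additionally bootstrapped.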
Statistical analysis
All hypothesis tests were conducted using a bootstrap with 1000 replications to derive the mean. When computing BID and WID, each replication drew a fixed number of units per area: 1000 unique streams for BID and 100 unique individuals for WID. This ensures that larger areas are not overrepresented a priori. Confidence intervals were derived from the 2.5% and 97.5% quantiles of the bootstrap means. P-values for Pearson and Spearman correlation coefficients were adjusted for multiple comparisons using the Holm method97. One-way ANOVA was used for comparisons across groups, and post-hoc Tukey’s test p-values were also adjusted for multiple comparisons. Cohen’s d was used for effect size estimates98. Analyses were conducted in R (version 4.3.3). Unless stated otherwise, all statistics were computed using the ‘stats’ package in base R and custom scripts (Code availability).
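The resampling scheme and the Holm correction can be sketched in a few lines. This is an illustrative Python version; the analysis itself used R, where Holm adjustment is available as `p.adjust(p, method = "holm")`. `stat` stands for any area-level statistic, for example BID over the sampled streams or an average of WID over the sampled users. The function names are hypothetical.

```python
import random
import statistics


def area_bootstrap(items, stat, sample_size, n_boot=1000, seed=0):
    """Draw `sample_size` distinct items (e.g. streams for BID, users for WID) from one
    area in each of `n_boot` replications, apply `stat`, and return the bootstrap mean
    with the 2.5% and 97.5% quantiles of the replicate values."""
    rnd = random.Random(seed)
    reps = [stat(rnd.sample(items, sample_size)) for _ in range(n_boot)]
    cuts = statistics.quantiles(reps, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.fmean(reps), cuts[0], cuts[-1]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1) and enforce monotonicity
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted
```

Because every replication draws the same number of units from each area, large and small areas contribute estimates of comparable precision.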
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
