Adam Naj, PhD, genetic epidemiologist at the University of Pennsylvania, discussed the importance of collecting genetic data from diverse ancestries to investigate the risk of Alzheimer disease.
The limited inclusion of diverse populations in early genomic investigations has impeded the identification of potentially prevalent risk variants for late-onset Alzheimer disease (LOAD) among non-European ethnicities. More recent research indicates that extensive collection of data from multiple ancestral backgrounds could substantially improve understanding of the genomic foundations of LOAD. This is supported by recent observations from a multi-ancestry study using datasets from the Alzheimer's Disease Genetics Consortium (ADGC).1
Lead author Adam Naj, PhD, genetic epidemiologist and assistant professor of epidemiology at the University of Pennsylvania Perelman School of Medicine, presented these findings in a featured research session at the 2023 Alzheimer's Association International Conference (AAIC), held July 16-20 in Amsterdam, the Netherlands. The data included genotypes from 38,774 non-Hispanic White, 7454 African American, 11,436 Hispanic, and 3277 East Asian participants. Single-variant association analyses used score-based logistic regression for population-based datasets and generalized linear mixed models for family-based studies, with covariate adjustment for age at onset or examination, sex, principal components for population substructure, and apolipoprotein E (APOE) ε2/ε3/ε4 genotype.1
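For readers unfamiliar with score-based association testing, the following is a minimal, hypothetical Python sketch of a single-variant score test in a covariate-adjusted logistic model. It is not the ADGC pipeline: the simulated cohort, the function names (`null_fit`, `score_test`, `simulate`), and the covariates (standardized age, sex, and an APOE ε4 allele count standing in for the full ε2/ε3/ε4 genotype) are assumptions for illustration, and it omits principal components and the mixed models used for family data. The test fits only the covariate model, then asks whether the residuals (case status minus fitted risk) track genotype dosage, after accounting for the covariates in the variance.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_gram(X: Matrix, w: Sequence[float]) -> Matrix:
    """X' W X for a diagonal weight matrix W = diag(w)."""
    k = len(X[0])
    return [[sum(wi * row[j] * row[c] for wi, row in zip(w, X)) for c in range(k)]
            for j in range(k)]


def null_fit(X: Matrix, y: Sequence[int], iters: int = 50) -> List[float]:
    """Fitted P(y=1) under the covariate-only logistic model (Newton-Raphson)."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        mu = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        w = [m * (1.0 - m) for m in mu]
        grad = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                for j in range(len(beta))]
        step = solve(weighted_gram(X, w), grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]


def score_test(X: Matrix, y: Sequence[int], g: Sequence[float]) -> Tuple[float, float]:
    """Score test of H0: beta_g = 0 in logit P(y=1) = X @ gamma + beta_g * g.

    Returns (1-df chi-square statistic, p-value)."""
    mu = null_fit(X, y)
    w = [m * (1.0 - m) for m in mu]
    u = sum(gi * (yi - mi) for gi, yi, mi in zip(g, y, mu))  # score for beta_g at 0
    k = len(X[0])
    xtwg = [sum(wi * row[j] * gi for wi, row, gi in zip(w, X, g)) for j in range(k)]
    gtwg = sum(wi * gi * gi for wi, gi in zip(w, g))
    # Variance of u after adjusting for the estimated covariate effects.
    v = gtwg - sum(a * b for a, b in zip(xtwg, solve(weighted_gram(X, w), xtwg)))
    stat = u * u / v
    return stat, math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)


def simulate(n: int, beta_g: float, seed: int = 1):
    """Hypothetical toy cohort: intercept, standardized age, sex, APOE e4 count, one SNP."""
    rng = random.Random(seed)
    X, y, g = [], [], []
    for _ in range(n):
        age, sex = rng.gauss(0.0, 1.0), float(rng.random() < 0.5)
        e4 = float(sum(rng.random() < 0.15 for _ in range(2)))   # assumed allele frequency
        snp = float(sum(rng.random() < 0.30 for _ in range(2)))  # assumed allele frequency
        eta = -1.0 + 0.5 * age + 0.1 * sex + 1.0 * e4 + beta_g * snp
        X.append([1.0, age, sex, e4])
        y.append(int(rng.random() < sigmoid(eta)))
        g.append(snp)
    return X, y, g


if __name__ == "__main__":
    for effect in (0.0, 0.7):
        stat, p = score_test(*simulate(2000, effect))
        print(f"simulated beta_g={effect}: chi2={stat:.2f}, p={p:.2e}")
```

Because only the covariate-only (null) model is fitted, a genome-wide scan can reuse that single fit for every variant, which is what makes score tests inexpensive at scale; this sketch refits it on each call for simplicity.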
Naj recently sat down in an interview with NeurologyLive® to further discuss the significance of incorporating genetic data from diverse ancestries in Alzheimer disease (AD) research. He also talked about how the study's findings challenge the conventional understanding of genetic risk factors for AD. Additionally, Naj spoke about the steps planned for further analysis following the identification of new genetic loci associated with AD risk.
I work with the Alzheimer's Disease Genetics Consortium, which is based predominantly at the University of Pennsylvania. We have more than 30 collaborating sites and 5 core analysis groups across the US. We have a huge collection of case-control, prospective cohort, and family-based datasets of individuals with AD and similarly aged people without AD, for comparison. We've collected a huge amount of genetic data on these individuals using genome-wide or high-density genotyping technologies, which produce the exposure data that we use in our genome-wide association analyses. We looked across a set of about 300 million markers across the human genome, in a sample of about 56,000 individuals of non-Hispanic White, African American, Hispanic, or East Asian ancestry. We wanted to see whether there were any genetic association signals that we hadn't seen before that might be identifiable by including such a diverse dataset in our sample. One of the bigger challenges has been getting more diversity in genetic data. What people may not realize about the benefits of diversity in genetic studies is that genetic variation that is rare, or not even present, in individuals of certain ancestries may be present in others. That may unveil mechanisms that affect the risk of disease across groups, consistent with the idea that disease mechanisms are the same across individuals regardless of their ancestral or genetic background, but that certain mechanisms might have more predominant effects in one group versus another because of different frequencies of variation.
While most studies to date have been done in individuals of European or non-Hispanic White ancestry because of the availability of data in that group, the reality is that there has been an effort to collect data from a variety of different backgrounds. We have a modest sample of datasets of African American, Hispanic, and East Asian ancestry. When we ran an analysis combining our non-Hispanic White ancestry data with these others, we were able to identify 2 genetic loci that hadn't been observed in studies that were much larger than ours and had many more samples. But because of the diversity, we could see these signals in our data.
We were also able to identify a few new genetic loci that were unique to individual ancestry groups but absent in others. This was a big result that showed us that even with a little bit of diversity introduced into our data, we were able to dramatically boost our ability to find genetic risk factors for AD. One of the most compelling aspects is that one of the associations we observed had previously been observed in a larger European ancestry dataset. The first time it was seen was in a dataset that included 800,000 people. We observed it in this data with only 56,000 people; we were able to see it with 7.7% of the sample size required in people of European ancestry alone. Our mixed-ancestry dataset was able to identify the signal, which shows that genetic diversity in your datasets can really boost your ability to see novel genetic risk factors.
What's interesting about this analysis is that the signals we observed vary in terms of the specific variants driving them in the different groups. One of the things we're going to follow up on is what we call fine mapping, teasing apart the sources of the signals in those genetic regions. It also means looking at potential functional effects of those variants, since there's not a lot of information about what effects variation may have on downstream components of the disease process. Typically, we think that a genetic risk factor will perturb something like gene expression, and downstream, that will have effects on the disease or disease phenotype itself, such as disease outcomes. Some people will have expression levels that don't increase their risk, and others will have expression levels that do.
We're going to start looking at what these variants may do in terms of changing expression patterns, and whether they have any effects on the way proteins are produced and function. These are going to be a whole bunch of secondary analyses that build on this first data. But we're also going to continue trying to expand our sample sizes in each of these non-European ancestry groups to maximize our ability to detect signals that may only be observable in non-European ancestry samples.
What's nice is that, while we talk about this as a new direction, there has been a big push by the National Institutes of Health, and the National Institute on Aging in particular, in the last couple of years to increase the diversity of samples. What we've seen in this study is more a reflection of the utility of that approach. It's a proof of concept that a little bit of diversity goes a long way in enabling us to see genetic risk factors. The hope is that this will continue to push people to collect more diverse datasets. The challenges that come with that involve building on an infrastructure that has typically relied on data that is more easily collected in our vicinity. US institutions collect data on individuals who are around them, and we still have a population that is predominantly of European ancestry. In order to get more diverse samples, we have to build bridges and connections to communities that may not have been involved in research before, or whose historical interactions with institutions have not always been productive and have sometimes even been harmful.
Part of it is reaching out to new groups and making them not just stakeholders in the process but partners in developing the research resources to study individuals of various ancestries. One of the things we're very cognizant of is that in the past, those interactions were not always productive, in the sense that some groups benefited and others did not. AD is a disease that individuals of all ancestry groups, all backgrounds, and all walks of life are potentially at risk for. As a result, part of it is making sure that everybody who is involved can benefit from our research and its use. That's a big challenge, but it's the direction we are headed in, and also the direction that a lot of the science is going now.
Transcript edited for clarity.