Studying Human Diversity
Studying human diversity raises important ethical and methodological questions, of which we will mention only some. Observations and inquiries into human diversity are probably as old as humanity, and they have been, and still are, linked to both blatant and subtle forms of racism.
A new study of DNA recovered from an ancient Philistine site in the Israeli city of Ashkelon confirms what we know from the Bible – that the origin of the Philistines is in southern Europe. https://t.co/bwmcPax9Jp
— Benjamin Netanyahu (@netanyahu) July 7, 2019
For the curious: Debate on ‘Race’ and Genomics
Should we study human genetic diversity?
Should we study human genetic diversity at all, or is this an area of work where the potential for misuse outweighs the potential benefits to such an extent that it should not be pursued?

Such studies already have a long history. A pragmatic answer to this question is that information on human genetic diversity is needed, and is therefore generated, for medical and forensic applications. It is thus already available whatever evolutionary geneticists decide to do, so we must be ready to consider its implications and consequences. Furthermore, genetic information undermines any scientific basis for racism, because it does not support the existence of discrete human groups: apart from a few loci affected by natural selection, most human genetic variation is found within any individual population. It can therefore be used to argue that racism, the belief that discrimination between apparent groups is justifiable, is an entirely social construct. In addition, genetic information is of enormous intrinsic interest to many people, not just scientists.
"Race" cannot be biologically defined due to genetic variation among human individuals and populations.
A) The old concept of the "five races": African, Asian, European, Native American, and Oceanian. According to this view, variation between the races is large, and thus each race is a separate category. Additionally, individual races are thought to have a relatively uniform genetic identity.
B) Actual genetic variation in humans. Human populations do roughly cluster into geographical regions. However, variation between different regions is small, thus blurring the lines between populations. Furthermore, variation within a single region is large, and there is no uniform identity.
(See source blog for more information).
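The contrast between panels A and B can be made quantitative with the fixation index F_ST, the fraction of total genetic variation that lies between populations rather than within them. The following is a minimal sketch, assuming a single biallelic locus and equally weighted populations. The function names and the allele frequencies are hypothetical and chosen purely for illustration; they are not data from any study.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(freqs):
    """Wright's F_ST = (H_T - H_S) / H_T for equally weighted populations.

    freqs: allele frequency of the same allele in each population.
    """
    mean_p = sum(freqs) / len(freqs)
    h_total = expected_heterozygosity(mean_p)  # H_T: pooled population
    h_within = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)  # H_S
    if h_total == 0:
        return 0.0  # locus is not variable anywhere, so F_ST is undefined; report 0
    return (h_total - h_within) / h_total


# Hypothetical frequencies (assumptions for illustration only).
discrete_groups = [0.05, 0.95]            # panel A view: groups almost fixed for different alleles
overlapping_regions = [0.45, 0.50, 0.55]  # panel B view: regions differ only slightly

print(f"F_ST, discrete-groups scenario:     {fst(discrete_groups):.3f}")
print(f"F_ST, overlapping-regions scenario: {fst(overlapping_regions):.3f}")
```

An F_ST close to 1 would mean that nearly all variation separates the groups, as the "five races" view implies. An F_ST close to 0 means that nearly all variation is found within each group, which is the pattern described in panel B.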
Benefits from genetic diversity studies
- Increased understanding of genetic history and relationships
- Medical advances such as the identification of genes predisposing to disease
- Accurate paternity testing, victim and assailant identification, and other forensic applications
- Immediate benefits to the population, such as medical advice or treatment

(However, the people who receive most of the long-term benefits may not be the donors.)
Outstanding issues that have not been fully resolved
- Is informed consent from members of cultures that do not subscribe to Western scientific values truly "informed"? Indeed, can even leading geneticists such as Jim Watson and Craig Venter, who have volunteered to have their whole genomes sequenced and made public, appreciate the full implications when these may only become apparent in the future, as research reveals the medical implications of DNA variants?
- How much information about the donor should accompany a cell line or DNA sample, so that the privacy of the donor is not infringed?
- Can samples collected without written consent years or even decades ago still be used?
- Can samples collected for one study be used in another?
- Can an individual give broad consent for all future studies, which may involve techniques that do not yet exist and have implications that are not currently understood?

It is difficult to give general answers to many of the ethical questions that diversity studies raise; indeed, the possibility of ever more comprehensive genetic studies is one of the driving forces in the field of medical ethics. Answers may emerge more satisfactorily through the consideration of individual cases than through prior reasoning based on principles.
For the curious: Ethics of DNA research on human remains
- Wagner, Jennifer K., et al. "Fostering responsible research on ancient DNA." The American Journal of Human Genetics 107.2 (2020): 183-195.
- Alpaslan-Roodenberg, Songül, et al. "Ethics of DNA research on human remains: Five globally applicable guidelines." Nature 599.7883 (2021): 41-46.
- Ancient-DNA researchers write their own rules
Who should be studied?
Sampling always creates problems: is the sample appropriate and representative? If not, conclusions drawn from the sample may not be applicable to the rest of the population that was sampled. Analyzing everyone would avoid the complications introduced by sampling, and some have argued that it is fairer, but at present it is impractical for DNA studies and even for DNA-free genetics based, for example, on phenotypic traits. This is likely to remain true for the foreseeable future, so the issues raised by sampling must be addressed. Although human genetic diversity has been investigated for a long time, the early studies using pre-DNA polymorphisms aroused little controversy or public interest. Attitudes changed with the launch of the Human Genome Diversity Project (HGDP) in 1991, and all subsequent large-scale projects, including the HapMap, Genographic, and 1000 Genomes Projects, have been influenced by this legacy.
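To make the sampling problem raised above concrete, the sketch below shows how the uncertainty of an allele-frequency estimate depends on the number of individuals sampled. It assumes a simple random sample of unrelated diploid individuals, so that the 2n sampled chromosomes behave as independent binomial draws. The function names and the example frequency are illustrative assumptions, not values from any of the projects mentioned.

```python
import math


def allele_freq_standard_error(p, n_individuals):
    """Binomial standard error of an allele-frequency estimate.

    Assumes n_individuals unrelated diploid individuals sampled at random,
    i.e. 2 * n_individuals independent chromosomes.
    """
    n_chromosomes = 2 * n_individuals
    return math.sqrt(p * (1.0 - p) / n_chromosomes)


def approx_confidence_interval(p, n_individuals, z=1.96):
    """Approximate 95% interval (normal approximation), clipped to [0, 1]."""
    se = allele_freq_standard_error(p, n_individuals)
    return max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical true allele frequency, for illustration only.
true_freq = 0.10
for n in (10, 100, 1000):
    low, high = approx_confidence_interval(true_freq, n)
    print(f"n = {n:4d} individuals: approx. 95% interval {low:.3f} to {high:.3f}")
```

This captures only random sampling error. If the sample is not representative of the population, for example because it is drawn from a single village or from close relatives, the estimate can be biased, and increasing the sample size does not remove that bias.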
Large-scale studies of human genetic variation and publicly available datasets
A few large-scale studies of human genetic variation have made major contributions to human evolutionary genetics. Several factors influence how useful a given dataset or resource is to different areas of human genetics. For medical genetics, population-genetic datasets are useful mainly because they provide the frequencies of particular alleles in populations of broadly defined ancestry and serve as a resource for phasing and imputation, so the sample size of the dataset is key. For population history studies, the diversity of the sampled populations is very important, as different populations may provide different insights into history. Here we mention only a few such studies, focusing on those that provide access to data from many diverse human populations, as this is clearly key to efforts to understand human genetic diversity and population history.
| Project | Description |
|---|---|
| Human Genome Project (HGP) | The initial sequencing program of the human genome |
| International HapMap Project | Study of the common patterns of human genetic variation using SNP arrays |
| 1000 Genomes Project (1KGP) | Determining human genetic variation by means of whole-genome sequencing at population scale |
| Human Genome Diversity Project (HGDP) | Biological samples and genetic data collection from different population groups throughout the world |
| Simons Genome Diversity Project (SGDP) | Whole-genome sequencing project of diverse human populations |
| GenomeAsia 100K (GA100K) | WGS-based genome study of people in South and East Asia |
| UK Biobank (UKB) | Biobank study involving 500,000 residents in the UK |
| Genomics England | WGS-based genome study of patients with rare diseases and their families, and of cancer patients, in England |
| FinnGen | Nationwide biobank and genome cohort study in Finland |
| Tohoku Medical Megabank Project | Biobank and genome cohort study for local area (north-east region) in Japan |
| Biobank Japan | Nationwide patient-based biobank and genome cohort study in Japan |
| Trans-Omics for Precision Medicine (TOPMed) | A genomic medicine research project performing omics analysis of pre-existing cohort samples |
| BioMe Biobank | Electronic health record-linked biobank of patients from the Mount Sinai Healthcare System |
| Michigan Genomics Initiative | Electronic health record-linked biobank of patients from the University of Michigan Health System |
| BioVU | Repository of DNA samples and genetic information at Vanderbilt University Medical Center |
| DiscovEHR | Electronic health record-linked genome study of participants in Geisinger’s MyCode Community Health Initiative |
| eMERGE | Consortium of biorepositories with electronic medical record systems and genomic information |
| Kaiser Permanente Research Bank | Nationwide biobank collecting genetic information from blood samples, medical record information, and survey data on lifestyle from seven areas of the US |
| Million Veteran Program | Genome cohort study and biobank of participants of the Department of Veterans Affairs (VA) health care system |
| CARTaGENE | Biobank study of 43,000 Québec residents |
| Lifelines | Multigenerational cohort study that includes over 167,000 participants from the northern population of the Netherlands |
| Taiwan Biobank | Nationwide biobank and genome cohort study of residents in Taiwan |
| China Kadoorie Biobank | Genome cohort study of patients with chronic diseases in China |
| Population Reference Sample (POPRES) | Resource for population, disease, and pharmacological genetics research containing genotype array data on close to 6,000 individuals |
| The Genome Aggregation Database (gnomAD) | International resource developed with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community |
| The Haplotype Reference Consortium (HRC) | Provides a large reference panel of human haplotypes by combining together sequencing data from multiple cohorts |
| The Estonian Biocentre Human Genome Diversity Panel (EGDP) | Provides a dataset of individuals from hundreds of diverse worldwide populations |
| Allen Ancient DNA Resource (AADR) | Provides a uniformly curated dataset of genotypes for thousands of ancient and present-day individuals, useful for analyses of population history and natural selection |
Table modified from: Practical guide for managing large-scale human genome data in research