
GenomeIndia: A National Effort in Population Genomics 

Krithika Subramanian

Author interview: Krithika Subramanian is a PhD student at the Centre for Brain Research, IISc Bangalore, and Manipal Academy of Higher Education. With over eight years of experience in next-generation sequencing analysis, her research focuses on identifying SNVs, INDELs, and structural variants using whole-genome sequencing data, and on developing and troubleshooting scalable genomic pipelines on high-performance computing infrastructure.


Lab: Prof. Bratati Kahali, Centre for Brain Research, IISc

Research Summary: GenomeIndia sequenced 10,000 genomes from 83 Indian populations, capturing vast genetic diversity and offering insights into one of the world’s most underrepresented groups in global genomic studies.

What was the core problem you aimed to solve with this research?

India is home to over 1.4 billion people and harbors one of the most complex and stratified human population structures in the world, shaped by migrations, admixture, linguistic diversification, and social practices like endogamy. However, despite this extraordinary diversity, Indian populations have been historically underrepresented in global genomic datasets. This lack of representation poses a significant barrier to understanding human genetic variation at a global scale and limits the relevance of current biomedical research for Indian populations.

Our goal was to address this gap by systematically mapping genetic variation across India through deep whole-genome sequencing. We aimed to build a foundational dataset that captures rare, common and population-specific variants, informs population structure and demographic history, and provides a resource for future genetic, anthropological, and translational studies. By focusing on diverse socio-ethnolinguistic groups across the country, we sought to ensure that this map reflects the full breadth of India’s genetic heritage.

Overview of the GenomeIndia project: Whole-genome sequencing (WGS) was performed on 10,074 individuals from 83 diverse Indian populations. The workflow included DNA isolation, array-wide genotyping, whole-genome sequencing, and large-scale data analysis. High concordance of genotype calls, assessed using the Genome in a Bottle (GIAB) truth set and cross-centre comparisons, suggests negligible batch effects. Preliminary results indicate approximately 130 million high-quality autosomal variants, the majority of which (~65%) are ultra-rare (minor allele frequency, MAF, below 0.1%). The project consumed over 0.7 million CPU hours and generated petabytes of genomic data.
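To make the "ultra-rare" threshold above concrete, here is a minimal Python sketch of how a minor allele frequency could be computed and compared against the 0.1% cutoff. It is an illustration only, not the project's pipeline: the function names are hypothetical, the sites are assumed to be biallelic and autosomal (two alleles per individual), and the cohort size of 10,074 is taken from the overview purely as an example denominator.

```python
ULTRA_RARE_MAF = 0.001  # 0.1%, the ultra-rare cutoff quoted in the overview


def minor_allele_frequency(alt_count: int, total_alleles: int) -> float:
    """Return the frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= alt_count <= total_alleles:
        raise ValueError("alt_count must lie between 0 and total_alleles")
    alt_freq = alt_count / total_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def is_ultra_rare(alt_count: int, total_alleles: int) -> bool:
    """True if the site is polymorphic and its MAF is below 0.1%."""
    maf = minor_allele_frequency(alt_count, total_alleles)
    return 0.0 < maf < ULTRA_RARE_MAF


if __name__ == "__main__":
    # Assumption: every individual contributes two alleles at an autosomal site.
    total = 2 * 10_074
    for copies in (1, 20, 21):
        maf = minor_allele_frequency(copies, total)
        print(f"{copies} copies: MAF = {maf:.6f}, ultra-rare = {is_ultra_rare(copies, total)}")
```

Under these assumptions, a variant would need to appear in at most 20 of the 20,148 sampled chromosomes to fall under the 0.1% cutoff, which gives a sense of how rare most of the reported variants are.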

How did you go about solving this problem?

The Centre for Brain Research (CBR) at IISc is the coordinating centre of the GenomeIndia project, working in collaboration with 19 partner institutions across India. Together, we established a nationwide network comprising 13 sample collection centres and 4 primary sequencing centres; the remaining institutions served as methods development centres. Through this effort, we collected 23,805 samples and sequenced the whole genomes of over 10,000 healthy, unrelated individuals from 83 distinct Indian population groups.

These populations represented all four major language families of the Indian subcontinent (Indo-European, Dravidian, Austroasiatic, and Tibeto-Burman), capturing the breadth of India’s ethnolinguistic diversity. To ensure representation of rare variants, we sequenced a median of 159 unrelated individuals from each non-tribal group and 75 from each tribal group. In addition, 3–6 parent–offspring trios were included per group to support haplotype phasing, imputation, de novo variant detection, and validation of variant calls. Comprehensive genomic analyses were conducted to identify genetic variation and characterize population structure, providing a robust dataset for understanding India’s complex population history.

GenomeIndia is the largest study to date to decode the genetic diversity of the Indian population. Krithika, as a first author, has contributed significantly with her computational skills in processing and analyzing over a petabyte of genomic data in this large national collaborative effort.

How would you explain your research outcomes (Key findings) to the non-scientific community?

Our study has created one of the most comprehensive genomic datasets of Indian populations to date. By analyzing whole-genome sequences from 9,772 individuals across 83 distinct communities, we identified approximately 130 million high-confidence genetic variants. Notably, most of these variants are rare, present in less than 0.1% of the population, and many have not been reported in existing global genomic databases.

These findings reveal a remarkable level of genetic diversity within India that has largely remained unexplored. This dataset not only helps us better understand human evolution, ancestry, and migration, but also provides a critical foundation for future research in public health and disease. Importantly, it will help make global genomics more inclusive by representing South Asian populations more accurately. We are currently preparing a manuscript to report and explore the biological and medical insights uncovered through this large-scale effort.

What are the potential implications of your findings for the field and society?

Our comprehensive genomic dataset provides a foundational resource for studies in human evolution, migration, and disease susceptibility. By capturing the deep and previously underrepresented genetic diversity of Indian populations, this project paves the way for more inclusive and globally representative genomic research. Beyond population genetics, the dataset is expected to inform future functional studies aimed at understanding the biological roles of population-specific genetic variants. These insights could contribute to disease association studies and support interventional and translational research efforts in India, ultimately helping to tailor healthcare strategies to the unique genetic landscape of its people.

What was the exciting moment during your research?

As a computational biologist, one of the most exciting aspects of this research was the opportunity to work with large-scale genomic data, comprising terabytes of whole-genome sequences from over 10,000 individuals. Performing joint genotyping and variant calling using high-performance computing infrastructure presented both a significant computational challenge and a rewarding analytical task.

A particularly striking outcome was the identification of genetic variants unique to Indian populations, many of which are completely absent from global reference datasets. Several of these variants were found exclusively within specific ethnolinguistic groups, underscoring the deep and previously undocumented genetic diversity across the Indian subcontinent. These findings highlight the critical importance of including diverse populations in genomic research and demonstrate the power of large-scale sequencing to uncover novel aspects of human genetic variation.

Reference: https://www.nature.com/articles/s41588-025-02153-x

Biopatrika News Desk
http://www.biopatrika.com
Life science news, jobs, careers, fellowships, admissions, and interviews. BioPatrika covers academia, startups, and industry, bridging the gap between science and society
