Westlake News LAB SHOW

Lab Show: Deciphering Human Genetics in the Sea of Data

09, 2021

Email: zhangchi@westlake.edu.cn
Phone: +86-(0)571-86886861
Office of Public Affairs

Try picturing the Statistical Genetics Laboratory at Westlake University: you won't find a lab bench, pipette, or Petri dish, because it is a data science laboratory. Prof. Jian Yang, the captain who steers this ship, leads his crew in fishing the keys to human genetics out of a sea of data.

You must have come across questions like these: why does the same comment irritate some people but not others? Why do some people live into their 90s while smoking and drinking, when others do not? These individual differences lie within our genomes. They exist in our bodies in the form of DNA fragments from the moment we are conceived and last throughout our lives. To give a few examples: our height is ~80% controlled by genes, and whether someone suffers from schizophrenia is ~70% determined by genetic factors. A more common mental illness such as depression is only ~40% influenced by genetics, with the remaining ~60% attributable to other factors.

Prof. Yang’s job is to study the genomic differences between people. He aims to reveal how differences in people's DNA relate to differences in their susceptibility to certain diseases. That is not an easy nut to crack.

From high school biology, we might think that each gene corresponds to one trait. However, such one-to-one, single-gene traits are very rare in real life.

“Our behaviors, physiological features, and disease susceptibilities are all caused by differences in DNA fragments that exist in massive numbers, each with a minute effect,” said Yang. “This is because of the natural selection that takes place during evolution.

“New DNA mutations tend to be harmful. If a mutation has a relatively heavy impact on the body, individuals who carry it will often die early or be at a disadvantage in competition with others. Therefore, mutations with heavy impacts tend to have a harder time being passed on,” Yang explained. Mutations with minor impacts are more likely to slip through the net of natural selection and become the genetic variants commonly seen in today's populations.

How do we locate such tiny-effect genetic variants? With big data.

The ability to process massive datasets quickly has become crucial to the development of genome science. Yang's lab develops highly efficient and powerful statistical models and software tools to help sift the gold from the sand in the vast sea of genomics and health data.

Yang said that weak variant-disease association signals tend to get lost in the noise when the sample size is small. Only when the sample is large enough do the true signals stand out from the noise and become visible to scientists. “This is how we discover widely applicable rules through minute changes and the power of big data,” he added.
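To make the point about sample size concrete, here is a minimal simulation sketch in Python. It is not one of the lab's tools. The effect size, allele frequency, and the function name `association_z` are illustrative assumptions, and the |z| of about 5.45 used for comparison is the conventional genome-wide significance cutoff (p < 5e-8), which the article itself does not mention.

```python
import numpy as np

def association_z(n, effect=0.05, maf=0.3, seed=0):
    """Simulate one variant and one trait for n people.

    Returns the z-score of a simple linear regression of trait on genotype.
    All parameter values are illustrative assumptions, not real estimates.
    """
    rng = np.random.default_rng(seed)
    # Each person carries 0, 1 or 2 copies of the variant allele.
    genotype = rng.binomial(2, maf, size=n).astype(float)
    # The trait is a tiny genetic signal buried in unit-variance noise.
    trait = effect * genotype + rng.normal(size=n)

    # Ordinary least-squares slope and its standard error.
    g = genotype - genotype.mean()
    y = trait - trait.mean()
    slope = (g @ y) / (g @ g)
    resid = y - slope * g
    se = np.sqrt((resid @ resid) / (n - 2) / (g @ g))
    return slope / se

if __name__ == "__main__":
    for n in (500, 5_000, 50_000, 500_000):
        print(f"n = {n:>7}: z = {association_z(n):.2f}")
```

Under these assumptions the expected z-score grows roughly with the square root of the sample size, so the same tiny effect that is indistinguishable from noise in a few hundred people clears the significance cutoff comfortably in several hundred thousand.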

Some of the genetic analysis methods that Yang and his colleagues have proposed over the years have become mainstream practice in genome-wide association studies. These methods could make valuable contributions to the screening, prevention, and treatment of complex diseases in the future. Yang’s lab has located genetic susceptibility loci for diseases such as obesity, diabetes, heart disease, mental illness, and cancer, providing important leads for designing future experiments to understand the pathology of complex diseases.

So when will we be able to locate the genes behind all our diseases?

“It may take 100 years, 200, or even longer,” said Yang. “We are still at an early stage of understanding in the life sciences. We would be lucky to find the corresponding genes for one or two diseases!”

That may sound far off, “but we can’t stop the work just because certain questions cannot be answered in our generation. What we can do is make the effort and gain knowledge; then perhaps we will be one step closer to understanding how our genes work,” said Yang.