CoxMDS: Multiple Data Splitting for High-Dimensional Mediation Analysis with Survival Outcomes in Epigenome-Wide Studies
Understanding how an exposure such as smoking leads to a health outcome such as cancer or Alzheimer’s disease is a complex puzzle. Often the connection isn’t direct; instead, it runs through intermediate steps, or “mediators.” One example is DNA methylation: chemical tags attached to our DNA that influence how genes are switched on and off without changing the genetic code itself. An exposure can alter these tags, and the altered tags can in turn affect the course of a disease.
Pinpointing these mediators is hard, especially in studies that screen a vast number of candidates at once, such as DNA methylation sites across the whole genome (so-called epigenome-wide studies). Existing methods struggle to identify the true mediators while avoiding false positives, particularly when the candidate mediators are correlated with one another or do not follow the distributions that standard methods assume.
A new statistical approach tackles this problem using a technique called “multiple data splitting.” The method is built for survival outcomes, meaning data on how long it takes for an event such as death or disease progression to occur. It works by repeatedly dividing the data into parts, analyzing each split, and then combining the results across splits. Aggregating over many splits makes the findings less dependent on any one random division of the data, and the procedure keeps the false discovery rate (the expected share of reported mediators that are not real mediators) under control. This allows more accurate detection of the biological pathways at play, even when many correlated genomic sites are involved.
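For readers who want to see the split, analyze, and combine idea in code, here is a minimal sketch of one common formulation of multiple data splitting, based on "mirror statistics" and inclusion rates. It is not the paper's implementation: ordinary least squares on simulated data stands in for the survival and mediator models the paper works with, and every name in it (`mirror_select`, `mds_select`, the simulated `X`, `y`, `beta`) is hypothetical and chosen only for illustration.

```python
import numpy as np

def mirror_select(X, y, q, rng):
    """One random split: return feature indices selected at nominal FDR level q."""
    n = X.shape[0]
    perm = rng.permutation(n)
    half1, half2 = perm[: n // 2], perm[n // 2:]
    # Independent coefficient estimates from the two halves.
    # Assumption: plain OLS is used here as a stand-in for the paper's models.
    b1 = np.linalg.lstsq(X[half1], y[half1], rcond=None)[0]
    b2 = np.linalg.lstsq(X[half2], y[half2], rcond=None)[0]
    # Mirror statistic: large and positive when both halves agree on a strong
    # effect, roughly symmetric around zero for features with no effect.
    m = np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))
    # Smallest cutoff t whose estimated false discovery proportion is at most q.
    for t in np.sort(np.abs(m)):
        fdp_hat = np.sum(m < -t) / max(np.sum(m > t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(m > t)
    return np.array([], dtype=int)

def mds_select(X, y, q=0.1, n_splits=50, seed=0):
    """Combine many random splits through per-feature inclusion rates."""
    rng = np.random.default_rng(seed)
    inclusion = np.zeros(X.shape[1])
    for _ in range(n_splits):
        chosen = mirror_select(X, y, q, rng)
        if chosen.size > 0:
            # Each split spreads a total weight of 1 over the features it selects.
            inclusion[chosen] += 1.0 / chosen.size
    inclusion /= n_splits
    # Drop the least-often-selected features for as long as their summed
    # inclusion rates stay within q; keep features strictly above that cutoff.
    order = np.argsort(inclusion)
    cumulative = np.cumsum(inclusion[order])
    n_drop = int(np.searchsorted(cumulative, q, side="right"))
    cutoff = inclusion[order[n_drop - 1]] if n_drop > 0 else 0.0
    return np.flatnonzero(inclusion > cutoff)

# Simulated toy data (hypothetical): 30 candidate features, the first 5 truly active.
data_rng = np.random.default_rng(1)
n, p, n_signal = 400, 30, 5
X = data_rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:n_signal] = 2.0
y = X @ beta + data_rng.standard_normal(n)

selected = mds_select(X, y, q=0.1, n_splits=50, seed=0)
print("selected features:", selected.tolist())
```

A single split can give a noisy answer because it depends on which samples land in which half. Averaging inclusion rates over many splits is what stabilizes the final list. In a toy problem this small the false discovery control is only approximate, so a few inactive features may still be reported alongside the five active ones.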
Applied to real data, the method identified specific DNA methylation sites that appear to mediate the link between smoking and both lung cancer survival and the progression of Alzheimer’s disease. It gives researchers a more powerful tool for uncovering the biological mechanisms through which exposures drive disease.
Source: link to paper