Harvard Biostatistics Dissertations

In addition to learning data collection and analysis methods, participants learn to collaborate on research by working on group projects with other participants and graduate students. Each group project is designed and mentored by a faculty member in the Department of Biostatistics or Epidemiology, together with a graduate student or postdoctoral research fellow. The projects are a good introduction to research methods, analysis, and the organization and presentation of results.

Faculty Mentor: Cory Zigler
Postdoc Mentor: Chanmin Kim
2017 Program Participants: David Angeles, Alexandra Carruthers Ferrero, Jovaniel Rodriguez Maldonado

Faculty Mentor: Jukka-Pekka Onnela
Postdoc Mentor: Patrick Staples
2017 Program Participants: Danielle Baldwin, Reibin Hiraldo, Silvio Martinez

Faculty Mentor: Sherri Rose
Graduate Student Mentor: Savannah Bergquist
2017 Program Participants: Alicia Dominguez, Julia Thome, Tyler Vu

Faculty Mentor: John Quackenbush
Postdoc Mentor: John Platig
2017 Program Participants: Andrea Ovalle, Ula Widocki

Controversy in Pharmacogenomics

Faculty Mentor: Rafa Irizarry
Graduate Student Mentor: Sheila Gaynor
2017 Program Participants: Jace Gilbert, Jeff Joseph, Daniel Meza

In 2012, two studies (Garnett et al. and Barretina et al.) attempted to correlate large numbers of gene expression, mutation, and copy number measurements in hundreds of cancer cell lines with sensitivities to hundreds of different drugs. The goal was to find genes or mutations that might indicate which kinds of cancers are vulnerable to specific drugs. However, a subsequent study (Haibe-Kains et al. 2013) that attempted to replicate the initial findings found major inconsistencies between the results of the two studies. We will review the papers, download the data, and analyze it ourselves to form our own conclusions.
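One simple way to quantify how well two studies agree on drug sensitivity is a rank correlation over the cell lines they share. The sketch below computes a Spearman correlation using only the Python standard library. The sensitivity values are made-up toy numbers, not data from either study, and the function names are chosen for illustration only; this is not a description of the analysis used in the papers above.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Toy sensitivities for the same five cell lines in two hypothetical studies.
study_a = [0.1, 0.4, 0.35, 0.8, 0.6]
study_b = [0.2, 0.3, 0.5, 0.9, 0.7]
rho = spearman(study_a, study_b)
print(round(rho, 3))
```

A value near 1 means the two studies rank the cell lines similarly for that drug, and a value near 0 means the rankings are essentially unrelated.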

Tracking air pollution from power plants: Mapping regulations to populations

Because of the well-established link between air pollution exposure and human health, regulations that limit population exposure to particulate air pollution are estimated to account for over half of the monetized benefits (and nearly half of the costs) of all federal regulations. Many controversial and high-stakes regulations target harmful emissions from power plants, a major source of particulate pollution, but statistical and data-based methods to evaluate the effectiveness of these regulations are lacking. A main challenge in evaluating such regulations is that air pollution moves through the atmosphere: intervening to reduce emissions at a power plant in Ohio can affect the air that people breathe in Boston. This project will combine time-varying data on emissions from more than 1,000 power plants in the US with measures of air pollution (and population health) at tens of thousands of US zip codes to learn about the network that defines how regulatory interventions at specific power plants spill across the country and affect the air people breathe. Such information can provide empirical support for regulatory decision making that has historically relied on non-statistical and non-data-based methods.

Smartphone-based Digital Phenotyping

Digital phenotyping is the moment-by-moment quantification of the individual-level human phenotype in situ using data from personal digital devices such as smartphones. Beiwe is our platform for gathering, storing, and analyzing digital phenotyping data in a wide range of ongoing studies, including studies of subjects with depression, schizophrenia, ALS, eating disorders, and PTSD, and of spinal surgery and neurosurgery patients. While the scale and fine-grained detail of this data have enormous potential for prediction and classification, data quality can vary widely across patients, and ground-truth validation data is difficult to come by. For this project, we will gather our own digital phenotyping data and learn statistical and programming tools in Python to visualize data quality and find predictors of daily activities.

Machine Learning for Health Outcomes Prediction

The introduction of machine learning approaches for prediction in health research has the potential to provide improved insights. Historically, prediction questions have been addressed using parametric regression. Machine learning methods aim to smooth over the data, possibly making fewer assumptions than standard parametric regression techniques. Ensembling allows researchers to combine multiple algorithms to build an optimal prediction function. In this project, students will explore publicly available health data sets and implement machine learning and ensembling algorithms for prediction using existing R packages.
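To make the idea of ensembling concrete, here is a minimal sketch in Python (the project itself uses R packages). It combines two deliberately simple base learners, a mean predictor and a one-variable linear regression, by choosing the convex weight that minimizes cross-validated squared error. The data are simulated, and every function name is an assumption made for this illustration rather than the API of any particular package.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Base learner 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Base learner 2: simple least-squares line through the data."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_predictions(fit, xs, ys, k=5):
    """Out-of-fold predictions: each point is predicted by a model
    trained on the other folds (folds assigned by index modulo k)."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(n):
            if i % k == fold:
                preds[i] = model(xs[i])
    return preds


def fit_ensemble(xs, ys, k=5, grid=101):
    """Pick the weight w in [0, 1] on the linear learner that minimizes
    cross-validated mean squared error, then refit both on all data."""
    p_lin = cv_predictions(fit_linear, xs, ys, k)
    p_mean = cv_predictions(fit_mean, xs, ys, k)

    def cv_mse(w):
        return mean((w * a + (1 - w) * b - y) ** 2
                    for a, b, y in zip(p_lin, p_mean, ys))

    w = min((i / (grid - 1) for i in range(grid)), key=cv_mse)
    m_lin, m_mean = fit_linear(xs, ys), fit_mean(xs, ys)
    return w, (lambda x: w * m_lin(x) + (1 - w) * m_mean(x))


# Simulated data with a strong linear signal: y = 2x + 1 + noise.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.5) for x in xs]
weight, predict = fit_ensemble(xs, ys)
print(weight, predict(5.0))
```

Because the simulated outcome is nearly linear in x, the cross-validated weight should land almost entirely on the linear learner. With real health data and a richer library of algorithms, the weights are typically spread across several learners.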

Estimating and Understanding Gene Regulatory Networks

Different physical states, or phenotypes, are often characterized based on differentially expressed genes. But gene expression in a cell is controlled through complex regulatory processes that involve regulatory genes (transcription factors, or TFs) activating or deactivating the expression of other genes. We developed an algorithm, PANDA, that models gene regulation as a communication process between TFs and their targets. By estimating networks separately for healthy and disease populations and comparing those networks, we can gain insight into the processes that drive changes from health to disease. Students will spend the first few days learning to program in R and will be introduced to gene regulatory networks and the pandaR package. They will then be given the opportunity to model regulatory networks in a disease model and explore their properties.
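The comparison step described above can be illustrated with a small Python sketch. It assumes each network is stored as a dictionary that maps a (TF, gene) pair to an edge weight, treats a missing edge as weight 0, and ranks edges by how much their weight changes between the two conditions. The representation, the edge weights, and the TF and gene names are all hypothetical; this is not the PANDA algorithm or the pandaR interface.

```python
def edge_differences(healthy, disease):
    """Rank edges by the absolute change in weight (disease minus healthy).

    Both arguments map (tf, gene) tuples to numeric edge weights.
    Edges absent from one network are treated as weight 0.0 (an assumption).
    """
    edges = set(healthy) | set(disease)
    diffs = {e: disease.get(e, 0.0) - healthy.get(e, 0.0) for e in edges}
    # Largest absolute change first; ties broken by edge name for stability.
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))


# Toy networks with made-up weights.
healthy = {
    ("TF_A", "gene1"): 0.9,
    ("TF_A", "gene2"): 0.1,
    ("TF_B", "gene1"): 0.5,
}
disease = {
    ("TF_A", "gene1"): 0.2,
    ("TF_A", "gene2"): 0.15,
    ("TF_B", "gene1"): 0.5,
    ("TF_B", "gene3"): 0.4,
}

ranked = edge_differences(healthy, disease)
for edge, delta in ranked:
    print(edge, round(delta, 2))
```

Edges at the top of the ranking are candidates for regulatory relationships that differ between the healthy and disease populations.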

Advancing health science research, education, and practice by turning data into knowledge and addressing the greatest public health issues of the 21st century.

The Department of Biostatistics at the Harvard Chan School offers an unparalleled environment to pursue research and education in statistical science while being at the forefront of efforts to benefit the health of populations worldwide.

    • Our faculty are leaders in the development of statistical methods for clinical trials and observational studies, studies on the environment, and genomics/genetics
    • Our graduates are armed with exceptional analytic and computing skills and are thriving in a wide range of careers in academia, industry, the government, and beyond
    • Our innovative approaches to computational biology, quantitative genomics and the analysis of massive data are strengthened by a deep foundation in theory and application
    • Our unique community provides countless resources and collaborative opportunities within Harvard Medical School, the Dana-Farber Cancer Institute and world-class hospitals in Boston

Our programs provide students with rigorous training in statistical theory, methods, and computation, and teach them to use what they learn in the classroom to address real-world problems in public health.

Our faculty work with researchers both locally and globally to apply statistical and computational methods to HIV and infectious disease research, chronic diseases, environmental health research, neurology, cancer, and psychiatry.

