Data Analysis for Life Sciences 6: High-performance Computing for Reproducible Genomics - Harvard University



Important information

  • Course
  • Online
  • When:
    To be determined

Learn how to bridge from diverse genomic assay and annotation structures to data analysis and research presentations via innovative approaches to computing. With this course you earn while you learn: you gain recognized qualifications and job-specific skills and knowledge, which help you stand out in the job market.

Requirements: PH525.3x, PH525.4x


Where and when

Start Location
To be determined

What will you learn in this course?

Data analysis


Enhanced throughput: Almost all recently manufactured laptops and desktops include multi-core CPUs. With R, it is very easy to obtain faster turnaround times for analyses by distributing tasks among the cores for concurrent execution. We will discuss how to use Bioconductor to simplify parallel computing for efficient, fault-tolerant, and reproducible high-performance analyses. This will be illustrated with common multicore architectures and Amazon’s EC2 infrastructure.

Enhanced interactivity: New approaches to programming with R and Bioconductor allow researchers to use the web browser as a highly dynamic interface for data interrogation and visualization. We will discuss how to create interactive reports that enable us to move beyond static tables and one-off graphics so that our analysis outputs can be transformed and explored in real time.

Enhanced reproducibility: New methods of virtualization of software environments, exemplified by the Docker ecosystem, are useful for achieving reproducible distributed analyses. The Docker Hub includes a considerable number of container images useful for important Bioconductor-based workflows, and we will illustrate how to use and extend these for sharable and reproducible analysis.

Given the diversity in educational background of our students, we have divided the series into seven parts. You can take the entire series or individual courses that interest you. If you are a statistician, you should consider skipping the first two or three courses. Similarly, if you are a biologist, you should consider skipping some of the introductory biology lectures. Note that the statistics and programming aspects of the class ramp up in difficulty relatively quickly across the first three courses. By the third course we will be teaching advanced statistical concepts such as hierarchical models, and by the fourth, advanced software engineering skills such as parallel computing and reproducible research concepts.

The courses in this series will be released sequentially each month and are self-paced:

PH525.1x: Statistics and R for the Life Sciences

PH525.2x: Introduction to Linear Models and Matrix Algebra

PH525.3x: Statistical Inference and Modeling for High-throughput Experiments

PH525.4x: High-Dimensional Data Analysis

PH525.5x: Introduction to Bioconductor: annotation and analysis of genomes and genomic assays 

PH525.6x: High-performance computing for reproducible genomics

PH525.7x: Case studies in functional genomics

Further information

Rafael Irizarry is a Professor of Biostatistics at the Harvard T.H. Chan School of Public Health and a Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute. For the past 15 years, Dr. Irizarry’s research has focused on the analysis of genomics data. During this time, he has also taught several classes, all related to applied statistics. Dr. Irizarry is one of the founders of the Bioconductor Project, an open source and open development software project for the analysis of genomic data. His publications related to these topics have been highly cited and his software implementations widely downloaded.