Genomics Workshop

Data Carpentry workshops are for any researcher who has data they want to analyze, and no prior computational experience is required. This hands-on workshop teaches basic concepts, skills and tools for working more effectively with data.

This workshop focuses on working with genomics data and on data management and analysis for genomics research. We will cover metadata organization in spreadsheets, data organization, connecting to and using cloud computing, the command line for sequence quality control and bioinformatics workflows, and R for data analysis and visualization. We will not teach any particular bioinformatics tool; instead, we teach the foundational skills you need to carry out an analysis and to work with the output of a genomics pipeline.

There are no prerequisites, and the materials assume no prior knowledge of the tools. Participants should bring their laptops and plan to participate actively.

In this workshop we use data from a long-term evolution experiment described in the 2012 paper "Genomic analysis of a key innovation in an experimental Escherichia coli population" by Blount ZD, Barrick JE, Davidson CJ, and Lenski RE (doi:10.1038/nature11514).

Workshop Overview

This document provides basic information for instructors about Data Carpentry Genomics workshops.

All of our materials are available on GitHub under a CC0 copyright waiver: Data Carpentry curriculum on GitHub.

Workshop Outlines

There are currently two versions of this workshop, which are arranged slightly differently and run for either two or three days.

Genomics Workshop with R

This 2-day version includes an introduction to R and analysis of metadata, an introduction to the command line, and bioinformatics analysis at the command line.

  1. Project Organization and Management
  2. Using cloud computing for genomics

  3. Cleaning and visualizing data in R and RStudio
  4. Introduction to the command line

Genomics Workshop with Pipeline Workflow

This 2-day version includes an introduction to the command line, bioinformatics analysis at the command line and the development and automation of bioinformatics pipelines.

  1. Project Organization and Management
  2. Introduction to the command line
  3. Data wrangling and processing

Teaching Platforms

In its current form, the workshop can be run on pre-imaged AWS (Amazon Web Services) instances, on CyVerse instances, or on a local compute cluster with the data and directories set up in advance. Contact us for information on platforms other than those listed here.

Platform Details

Amazon instance for workshop

All the software and data used in the workshop are available on an Amazon AMI (Amazon Machine Image).

If you want to run your own instance of the server used for this workshop, launch a t2.medium instance in the N. Virginia region with AMI ami-aab445c7, available under "Community AMIs" in the Amazon EC2 Management Console. Instructions for launching an instance are on the creating Amazon instances page.
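If you prefer to script the launch rather than click through the console, the following is a minimal sketch using boto3, the AWS SDK for Python. It assumes boto3 is installed, AWS credentials are configured, and you already have an EC2 key pair; the key pair name and the helper function names are hypothetical. It relies on the account's default security group, which may need an inbound SSH rule, and the AMI ID is copied from the text above and may no longer be available.

```python
# Minimal sketch: launch the workshop AMI with boto3 (AWS SDK for Python).
# Assumptions: boto3 is installed, AWS credentials are configured, and an
# EC2 key pair already exists. Function names here are hypothetical helpers.

WORKSHOP_AMI = "ami-aab445c7"   # AMI ID given in the workshop notes
INSTANCE_TYPE = "t2.medium"     # instance size recommended for the workshop
REGION = "us-east-1"            # N. Virginia; AMI IDs are region-specific


def build_launch_params(key_name: str) -> dict:
    """Return keyword arguments for the EC2 client's run_instances call."""
    return {
        "ImageId": WORKSHOP_AMI,
        "InstanceType": INSTANCE_TYPE,
        "MinCount": 1,  # launch exactly one instance
        "MaxCount": 1,
        "KeyName": key_name,  # existing key pair used for SSH access
    }


def launch_workshop_instance(key_name: str) -> str:
    """Launch one workshop instance and return its instance ID."""
    import boto3  # imported here so build_launch_params works without boto3

    ec2 = boto3.client("ec2", region_name=REGION)
    response = ec2.run_instances(**build_launch_params(key_name))
    return response["Instances"][0]["InstanceId"]
```

To use it, call `launch_workshop_instance("your-key-pair")` with the name of one of your own key pairs. Keeping the parameters in a separate function makes it easy to check what will be requested before anything is launched (and billed).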

Data Carpentry's teaching is hands-on, so participants are encouraged to use their own computers to ensure their tools are set up properly for an efficient workflow. These lessons assume no prior knowledge of the skills or tools, but working through them requires working copies of the software described. To get the most out of these materials, please install everything before the workshop begins.

Twitter: @datacarpentry