Following the Software Carpentry strategy of collaboratively developing hands-on, interactive workshop lessons, we facilitate and develop the lessons for Data Carpentry workshops.

Interested in helping develop lessons?

These lessons are distributed under the CC-BY license and are free for re-use or adaptation, with attribution. People have used the lessons in courses, to build new lessons, and for self-guided learning.

Data Carpentry workshops are domain-specific, so that we teach researchers the skills most relevant to their domain, using examples from their type of work. We therefore have several types of workshops, and the lessons are ordered by topic.

Ecology Workshop


This workshop uses a tabular ecology dataset from the Portal Project Teaching Database and teaches data cleaning, management, analysis and visualization. There are no prerequisites, and the materials assume no prior knowledge of the tools. We use a single dataset throughout the workshop to model the data management and analysis workflow that a researcher would use.

More workshop details

The workshop can be taught using R or Python as the base language.


Lesson | Maintainer(s)
Data Organization in Spreadsheets | Christie Bahlai, Tracy Teal
Data Cleaning with OpenRefine | Deborah Paul, Cam Macdonell
Data Management with SQL | Paula Andrea Martinez, Timothée Poisot
Data Analysis and Visualization in R | François Michonneau, Auriel Fournier
Data Analysis and Visualization in Python | John Gosset, April Wright, Mateusz Kuzak

Genomics Workshop


This workshop focuses on working with genomics data, and on data management and analysis for genomics research. It covers metadata organization in spreadsheets, data organization, connecting to and using cloud computing, the command line for sequence quality control and bioinformatics workflows, and R for data analysis and visualization. The workshop does not teach any particular bioinformatics tools; instead, it teaches the foundational skills that will allow you to conduct any analysis and analyze the output of a genomics pipeline.

More workshop details


Lesson | Maintainer(s)
Introduction to the workshop and dataset | Tracy Teal
Introduction to cloud computing for genomics | Bob Freeman
Introduction to the command line | Sheldon McKay, Karen Cranston
Data wrangling and processing | Sheldon McKay, Karen Cranston
Data analysis and visualization in R | Naupaka Zimmerman, Jason Williams, Meeta Mistry

Geospatial Data Workshop


This workshop is co-developed with the National Ecological Observatory Network (NEON). It focuses on working with geospatial data: managing and understanding spatial data formats, understanding coordinate reference systems, and working with raster and vector data in R for analysis and visualization.


Lesson | Maintainer(s)
Working with vector data in R | Leah Wasser, Joseph Stachelek
Working with raster data in R | Leah Wasser, Joseph Stachelek
Introduction to Geospatial data | Leah Wasser, Joseph Stachelek

Social Science Materials

This is not yet a full workshop, but we have a lesson focused on text mining in R.

Lesson | Maintainer(s)
Social sciences text mining | Ben Marwick

Biology Semester-long Course

The Biology Semester-long Course was developed and piloted at the University of Florida in Fall 2015. Course materials include readings, lectures, exercises and assignments that expand on the material presented at workshops focusing on SQL and R. The course is accessible to: