A Data Carpentry Workshop

Organized by the AAAS Fellows Program


Abelson/Haskins room on the second floor
March 30
8:30 am - 5:00 pm

General Information

Data Carpentry is an organization with the goal of teaching basic concepts, skills and tools for working more effectively with data. The rapid generation of large amounts of data is fundamentally changing how research is done. This deluge of data presents great opportunities, but also many challenges in managing, analyzing and sharing data. Data Carpentry teaches learners with little to no prior computational experience the skills that will enable them to be more effective and productive researchers. In particular, as large corpora of text are digitized and made broadly available, it is now possible to take a quantitative approach to analyzing text in order to address questions in public policy, develop guidelines for implementation or assess the effectiveness of policy. Much of the data collected for this kind of assessment is text-based, in the form of free-text survey responses, written reports, newspaper articles or a variety of other sources. The ability to conduct text mining and analysis effectively and reproducibly therefore affects the ability to understand policy making and contribute knowledge to it.

Data Carpentry workshops are for any researcher who has data they want to analyze, and no prior computational experience is required. This hands-on workshop teaches basic concepts, skills and tools for working more effectively with data. We will cover an introduction to R, how to use the tm package to convert text into numbers, and how to analyze and visualize the data in R. By the end of the workshop, learners should be able to manage and analyze data more effectively and to apply the tools and approaches directly to their ongoing research.

Participants should bring their laptops and plan to participate actively. Before the workshop you will need to install R and RStudio. Please see the installation instructions for more details.

Data Carpentry's aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain.


Monday
09:00 Setup, introduction and motivation
09:30 Data input
10:30 Break
11:00 Getting the tm package
11:30 Preparing the Document Term Matrix
12:00 Lunch
13:00 Analysing the Document Term Matrix
14:30 Break
15:00 Visualising the Document Term Matrix
16:00 Wrap-up

Before the workshop

Updates will be posted to this website as they become available.

The data file for the workshop is available at the following link (right-click and 'save as' to download to your computer)

The file of additional code for cluster analysis, topic modelling, reading in PDFs, etc. can be downloaded here (right-click and 'save as' to download to your computer)

The etherpad for this workshop can be found here

Instructors: Ben Marwick, Tracy Teal

Who: The course is aimed at faculty, research staff, postdocs, graduate students, advanced undergraduates, and other researchers in any field. No prior computational experience is required.

Requirements: Data Carpentry's teaching is hands-on, so participants are encouraged to bring and use their own laptops to ensure that their tools are properly set up for an efficient workflow once they leave the workshop. (We will provide instructions on setting up the required software several days in advance, and the classroom will have computers with the software installed.) There are no prerequisites, and we will assume no prior knowledge of the tools. Participants are required to abide by Software Carpentry's Code of Conduct.

Contact: Please email tkteal@datacarpentry.org for questions and information not covered here.

Twitter: #datacarpentry @datacarpentry

Acknowledgements & Support

Data Carpentry is supported by the Gordon and Betty Moore Foundation and a partnership of several NSF-funded BIO Centers (NESCent, iPlant, iDigBio, BEACON and SESYNC) and Software Carpentry, and is sponsored by the Data Observation Network for Earth (DataONE). The structure and objectives of the curriculum as well as the teaching style are informed by Software Carpentry.

Additional Resources


Where to learn more about R

Working with text in R

Plotting in R

Getting help when working with R