Scaling Collaborative Curriculum Development for Data Skills Training

Authors: Tracy Teal

We are excited to announce that we have received a grant from the Alfred P. Sloan Foundation to train researchers in essential data skills and build a general framework for collaborative lesson development to scale data training. This grant will allow us to create general infrastructure, guidelines and pathways for community engagement to establish open source lesson development as a practice and enable scalable, collaborative data training. Curriculum developed through this grant will include economics, image analysis and chemistry. This work will be a proving ground for the establishment of infrastructure and processes for collaborative and open lesson development in other domains and topics.

There is increasing awareness of the need for data skills training across a diversity of domains. Necessary core data skills include: data organization and cleaning, exploratory analysis (generating simple summaries and graphs) and data management (sharing, storage). These skills require competency with common file formats, data types, command line tools and the programming languages used by researchers within a particular domain. As different universities and organizations begin to see the need to teach these skills, there is an opportunity to work together to build curriculum, rather than each organization developing its own content in isolation. There is great power in the community perspective, both in what is essential to teach and in the development of materials, and also in the continued re-teaching and re-use of the same materials. This works to improve the content over time and helps keep it relevant and up-to-date. To be maximally effective, these training materials should be accessible, discoverable, and follow best practices derived from educational research.

The Carpentries are at the forefront of this kind of curriculum development, dissemination and teaching strategy. Our curricula are developed collaboratively, are freely available (CC-BY licensed) and are delivered by hundreds of trained volunteer instructors around the world each year. As we have built up our reputation for offering quality trainings, many communities have approached us to help develop and disseminate new content in digital humanities, astronomy, social sciences, library sciences, imaging, economics, chemistry, statistics, high performance computing, meteorology and neuroimaging.

Because of this broad interest, there is a need to establish a clearer process and infrastructure to scale this approach to lesson development. This project will build that infrastructure and develop processes that both engage the community and make contributions more effective and straightforward.

The Carpentries have hired Dr. François Michonneau to lead these curriculum development efforts. We’re excited to welcome François as our Curriculum Development Lead. He brings technical expertise, experience in both teaching and curriculum development, and an inclusive approach to lesson contributions and open source software development to the role. François is a longtime Data and Software Carpentry community member. In 2014, as he was planning to teach a semester-long R programming course for the graduate students of the biology department at the University of Florida, he came across Software Carpentry. Intrigued by the pedagogical approach of these workshops, he wanted to experience it firsthand, and attended the inaugural Data Carpentry workshop there. Soon after, he became a certified Instructor, and has since taught a dozen workshops. He is also one of the developers and maintainers for the Data Carpentry R ecology lesson, and has helped organize the development of the Reproducible Science Curriculum lessons. This summer he certified as a Carpentries Instructor Trainer.

François received his PhD at the University of Florida studying marine biodiversity, where he documented the diversity of sea cucumbers, and in the process described a new species he named after the dog of the museum collection manager assistant (both are very fluffy). As a postdoctoral researcher at the Whitney Marine Laboratory, he synthesized marine biodiversity knowledge available from public databases and used data science approaches to identify knowledge gaps and levels of digitization for the US marine invertebrate fauna.

François is also the maintainer of several R packages centered around the manipulation of phylogenetic data and an active member of the rOpenSci community. He believes that open and reproducible science can transform the scientific process by generating robust results that can more easily be expanded on. He is excited to lead the growth of the curriculum taught by the Carpentries, so more people and more disciplines can learn the skills needed to conduct open and reproducible research. François is on Twitter as @fmic_, on GitHub as fmichonneau, and his personal website is

We are excited about this project and the opportunity to scale open, collaborative curriculum development in The Carpentries and provide frameworks and processes for training in the data science community as a whole. Please join us in welcoming François as he works with the lesson infrastructure community on ideas for updates and supports the lesson development and maintainer communities.
