Data Carpentry in Africa

Authors: Matt Collins

First Data Carpentry workshop in Africa at the TDWG conference

by Matt Collins at iDigBio

Data Carpentry conducted its first workshop on the African continent in September! As part of the pre-conference training week held before the Taxonomic Databases Working Group’s (TDWG) annual conference, Deb Paul, Matthew Collins, Libby Ellwood and Kevin Love put on a two-day Data Carpentry workshop at the Multimedia University of Kenya. We taught the ecology lessons (spreadsheets, OpenRefine, R, but no SQL) to participants from 12 African countries. The participants were all graduate students and young researchers whose travel and participation in both the training week and the TDWG conference were paid for by the JRS Biodiversity Foundation. This was a significant investment on the foundation’s part to build capacity for biodiversity informatics in Africa, and we were excited to have Data Carpentry be a part of the effort.

Pre-TDWG training people

Deb Paul has written a blog post, hosted on the iDigBio web site, describing the workshop and its outcomes in detail. Her post covers what we taught, feedback from participants, and our experiences. In this post, I want to describe some of the logistical aspects of conducting a workshop in Africa for participants from such a wide range of countries.

I’ll start with the amazing participants. Sometimes in workshops the participants are less than 100% invested. Not here. Every person, all 24 of them, was completely invested in learning what we were teaching. Not only that, but even though they were all strangers to each other at the start, they were all invested in their shared bond of being African and being biologists. During the five days of training they were posting pictures on Instagram with each other, sitting in the lounge sharing wifi until midnight, and getting taxi rides together to the local mall for shopping. Every morning was filled with a round of good morning handshakes.

The only slight barrier to mixing, and to the training, was language. About a third of the participants had stronger French skills than English. During the R lesson, one of the examples used the round() function. A French-speaking participant raised his hand and asked, “What is round?” Of course this PhD student knew what rounding was; he was just missing the English word. Deb Paul quickly turned to another French speaker and asked them (in French, go Deb!) what the correct word was, and everything was cleared up in a few seconds.

This contributed to the only major issue we had with the workshop: we couldn’t get through as much material as we normally do. We didn’t touch SQL at all, and the post-workshop surveys reflected a lot of disappointment in that. Some of this was because we had to speak and switch concepts more slowly due to language. Some of it was also because only three people out of the 24 said they had written a script or programmed in any language before. Data Carpentry is designed to welcome beginners, but I had never before taught a group that was truly all beginners. Usually everyone has at least tried a scripting language before.

Everyone brought their own laptops to work on. For the most part, they were fine and software installed on them without a problem. One issue we had was that people had not updated their software in a very long time, probably due to limited internet access or per-MB internet billing. One participant had R 2.2 installed. Another had over 200 un-applied Windows updates. Another set of issues was caused by antivirus software settings being turned up so high that software installations and even browsing the web simply didn’t work. We spent a significant amount of time outside of the workshop helping a few people with system configurations. Fortunately we were all staying together in the same hotel for a week, so we had lots of time to work with people and everyone went home with working software.

The internet access was different from what you might expect. The quality was random. One moment it would be 1-2 Mb/s, the next slower than a 56 kb/s modem, then off for 5 minutes, and then back but at ISDN speeds. Time-remaining estimates for a large download looked like the output from a random number generator, changing every few seconds. The issues could manifest at any point from the wireless AP to the main connections out of the country. It led to a sense of futility when trying to use the internet for research, downloading updates, broadcasting our videos, or anything else we would normally take for granted. Pre-downloading software and data to USB keys helped, but not having a predictable connection broke the flow of interactions provided by Etherpad, Adobe Connect, GitHub, and Google searches for documentation.

Pre-TDWG training people in room

Kevin Love joined us to work his recording and broadcasting magic using Adobe Connect. We set the room up to record the presenter’s laptop and then project the feed from Adobe Connect. This meant that slides were going from the podium, over the internet to the US, then back to a computer in the back of the room connected to the projectors. For the live typing in the R lessons this was too much lag. I ended up having to move the podium to the back of the room where the projector cable was and plug in directly. This is a bit specific to our goal of recording this workshop, but the room didn’t have the cables, outlets, and hardware that are often present, so we couldn’t put together a better solution than me talking to the back of everyone’s heads. Kevin brought a lot of his own gear (he always brings it regardless of where we’re going; he does this dozens of times a year), but his standard kit just wasn’t enough to make everything work out.

This workshop was an incredible learning experience for everyone. Deb’s post has more information about the next steps we are planning, but I’m looking forward to an opportunity to return to Africa. I hope that our future workshops can be even more successful with everything we have learned.

We would also like to thank the JRS Biodiversity Foundation, the Gordon and Betty Moore Foundation’s Data Driven Discovery Initiative through Grant GBMF4563 to Dr. Ethan White at the University of Florida, and the iDigBio project for providing funds for us to conduct this workshop.
