Python for ecologists

Content Contributors: April Wright, Ethan White, John Gosset, Leah Wasser, Mariela Perignon, Tracy Teal

Lesson Maintainers: April Wright, John Gosset, Mateusz Kuzak

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with ecological data in Python.

Lessons

Data

Data for this lesson comes from the Portal Project Teaching Database, available on FigShare.

Specifically, the data files we use in these lessons are:

Requirements

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. These lessons assume no prior knowledge of the skills or tools, but working through this lesson requires working copies of the software described below. To use these materials most effectively, please make sure to install everything before working through this lesson.

Participants are required to abide by Data Carpentry’s Code of Conduct.

Setting Up Python

Python is a popular language for scientific computing and data science, as well as a great language for general-purpose programming. Installing all of the scientific packages individually can be a bit difficult, so we recommend using an all-in-one installer, like Anaconda.

Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.4 is fine).
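If you are unsure which version you ended up with, a short script along these lines can report it. This is only a sketch: the helper name is_python3 is hypothetical and is not part of the lesson materials.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple has major version 3.

    Hypothetical helper for illustration; by default it inspects the
    interpreter that is running this script.
    """
    return version_info[0] == 3


# Show the version of the interpreter running this script.
print("Running Python", sys.version.split()[0])
print("Is this Python 3.x?", is_python3())
```

Save it to a file and run it with the same python command you plan to use for the lesson; if the last line reports False, a different Python installation is earlier on your PATH.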

We will teach Python using the Jupyter notebook, a programming environment that runs in a web browser. For this to work you will need a reasonably up-to-date browser. The current versions of the Chrome, Safari and Firefox browsers are all supported (some older browsers, including Internet Explorer version 9 and below, are not).

Windows

  • Download and install Anaconda.
  • Download the default Python 3 installer. Use all of the defaults for installation, except make sure to check the "Make Anaconda the default Python" option.

Mac OS X

  • Download and install Anaconda.
  • Download the default Python 3 installer. Use all of the defaults for installation.

Linux

We recommend the all-in-one scientific Python installer Anaconda.

  1. Download the installer that matches your operating system and save it in your home folder. Download the default Python 3 installer.
  2. Open a terminal window.
  3. Type
    bash Anaconda-
    and then press tab. The name of the file you just downloaded should appear.
  4. Press enter. You will follow the text-only prompts. When there is a colon at the bottom of the screen press the down arrow to move down through the text. Type yes and press enter to approve the license. Press enter to approve the default location for the files. Type yes and press enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).

Installing ggplot Python package

ggplot is a Python implementation of the R ggplot2 graphics package. It is not intended to be a feature-for-feature port of ggplot2, but it provides some of ggplot2's functionality in the Python ecosystem.

The easiest way to install ggplot is with the conda package manager, which is included in the Anaconda distribution you installed above.

Windows

  • Open Anaconda Prompt from the Windows menu.
  • In the prompt window, type conda install -c conda-forge ggplot and accept when prompted to confirm.

Mac OS X

  • Open the Terminal app.
  • In the Terminal window, type conda install -c conda-forge ggplot and accept when prompted to confirm.

Linux

  1. Open the default terminal application (on Ubuntu that will be gnome-terminal).
  2. In the terminal, type conda install -c conda-forge ggplot and accept when prompted to confirm.

Checking that your installation worked

Now it is time to make sure that your Anaconda installation was successful. Download the check_env.py file, a Python script that checks whether Anaconda has been correctly installed on your system. From your terminal, navigate to the directory that contains check_env.py and run the following:

    python check_env.py

If you receive an AssertionError, it will inform you how to correct your installation. Otherwise, it will tell you that your system is good to go and ready for Data Carpentry!
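The contents of check_env.py are not reproduced here. As a rough sketch of the kind of check such a script might perform (not its actual implementation), the following looks up whether a set of packages can be found by the current Python. The helper name missing_packages and the package list are assumptions chosen for illustration, based on the tools named in the schedule.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that the current Python cannot find.

    Hypothetical helper: it only looks each package up, without importing it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package list for illustration; adjust it to match your workshop.
wanted = ["pandas", "matplotlib", "ggplot"]
missing = missing_packages(wanted)

if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

Looking a package up this way only shows that it is installed, not that it imports and works correctly, so it complements rather than replaces running check_env.py.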

Acknowledgements & Support

Data Carpentry is supported by the Gordon and Betty Moore Foundation and a partnership of several NSF-funded BIO Centers (NESCent, iPlant, iDigBio, BEACON and SESYNC) and Software Carpentry, and is sponsored by the Data Observation Network for Earth (DataONE). The structure and objectives of the curriculum as well as the teaching style are informed by Software Carpentry.

Schedule

Setup
    Download files used in the lesson.
00:00 Short Introduction to Programming in Python
    What is Python?
    Why should I learn Python?
00:00 Starting With Data
    How can I import data in Python?
    What is Pandas?
    Why should I use Pandas to work with data?
01:00 Indexing, Slicing and Subsetting DataFrames in Python
    How can I access specific data within my data set?
    How can Python and Pandas help me to analyse my data?
02:00 Data Types and Formats
    What types of data can be contained in a DataFrame?
    Why is the data type important?
02:45 Combining DataFrames with pandas
    Can I work with data from multiple sources?
    How can I combine data from different data sets?
03:30 Data workflows and automation
    Can I automate operations in Python?
    What are functions and why should I use them?
05:00 Plotting with ggplot
    Can I use Python to create plots?
    How can I customize plots generated in Python?
05:45 Data Ingest & Visualization - Matplotlib & Pandas
    What other tools can I use to create plots apart from ggplot?
    Why should I use Python to create plots?
06:30 Accessing SQLite Databases Using Python & Pandas
07:15 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.