PhD students generate a lot of data over the course of their studies, so managing that data effectively matters: it saves time and keeps results reliable and reusable. Each year the Southampton Research Software Group delivers a bespoke Data Management module to SoCoBio students. The module enables students to develop computational skills, discover tools that will advance their research, and gain experience using them. The course suits both students with no computational experience and those who have already done some computational research. The module is self-directed, which helps students build confidence in finding solutions in documentation and developer forums, so they can keep developing their skills independently once the module is complete.
This year, training in Bash and Git, two extremely powerful tools, has been added, so the full schedule now covers Spreadsheets, OpenRefine, Bash, Git and R. The module demonstrates how spreadsheets and OpenRefine, both programs with graphical user interfaces, can be used to handle and analyse everyday data sets, and then discusses the limitations of these tools. Students learn to use Bash and Git, both powerful command-line tools, to manage and track files, allowing them to build efficient, reproducible workflows that save time and reduce stress. The final three sessions focus on creating workflows in R, a programming language developed for data analysis. The R course starts with the basics, moves on to loading and manipulating data, and finally introduces data visualisation with the plotting library ggplot2.
Feedback from previous students has been hugely positive, with most respondents saying the content will help their PhD research.
Student comments:
“It’s very helpful to be able to automate everything I used to do very labour intensively in Excel!”
“Really liked the ‘ta-das’ when things worked!”
The second part of the module, taking place in Spring 2022, is a two-day workshop for hands-on exploration of omics data. Students will work in groups to develop know-how and skills in omics data management, analysis, visualisation and presentation.