Do's and Don'ts in Research Data Management

Christian Hillen

I have a background in history. As an archivist I have been working with historical (research) data - both analogue and digital - for the last 25 years. Currently I am a consultant with DKZ.2R at the RRZK (Regionales Rechenzentrum der Universität zu Köln).

Research Data Management Do’s and Don’ts - Step up your RDM skills!

1. Structuring and naming your folders
There is an easy way to make your data findable for you and your team: establish a folder structure which makes sense for you and your working group as well as naming conventions for your folders.

Don’t:

Paul and Suzie
»Guideline
>application
»version2_final
»v.3
»review
»3rd.version
>JD
»qn
»0-1

Instead do:

000_int_orga
»01_application
»02_review
120_questionnaires
»01_qualitative
»02_quantitative
130_data
»01_qualitative
»02_quantitative

Also Do:

000_int_orga
100_planning
»01_application
»02_review
120_qualitative
»01_guideline
»02_data
130_quantitative
»01_questionnaire
»02_data
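
A folder structure like the second example can also be created by a small script, so that every project starts from the same skeleton. The following is only a minimal sketch in Python: the `LAYOUT` dictionary and the function name `create_structure` are illustrative assumptions, and the folder names loosely follow the example above and should be replaced with your own convention.

```python
from pathlib import Path

# Hypothetical layout loosely following the example above:
# top-level folder -> list of subfolders.
LAYOUT = {
    "000_int_orga": [],
    "100_planning": ["01_application", "02_review"],
    "120_qualitative": ["01_guideline", "02_data"],
    "130_quantitative": ["01_questionnaire", "02_data"],
}


def create_structure(root, layout=LAYOUT):
    """Create every folder in `layout` below `root` and return the paths."""
    root = Path(root)
    created = []
    for top, subfolders in layout.items():
        # The top-level folder is created even if it has no subfolders.
        for relative in [Path(top)] + [Path(top) / sub for sub in subfolders]:
            target = root / relative
            # exist_ok=True makes the script safe to run repeatedly.
            target.mkdir(parents=True, exist_ok=True)
            created.append(target)
    return created
```

Calling `create_structure("my_project")` would create the folders below `my_project` in the current working directory. Existing folders are left untouched.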

Want to learn more about organizing your data?
Take part in our Data Challenge on November 7th in Cologne and learn more about Metadata and Data structuring (sign up here!) or visit University of Cologne EduLabs for more information on how to structure your data in useful ways.

2. Storing your data
Storing your data properly matters not only for making them accessible to the (right) people but also for making them findable: if you store them on a USB stick, no other member of your working group will be able to access or find the data. They won't even know that the data exist.

Don’t:

measuring device (local, remote)
laptop (local, remote)
Dropbox (local, remote)
flash drive (archive)
external H(ard)D(isk)D(rive) (archive)

Do this instead:

S(olid)S(tate)D(rive) (local, remote)
H(ard)D(isk)D(rive) (local, remote)
N(etwork)A(ttached)S(torage) (local, remote)
Sciebo (local, remote)
DataStorageNRW (archive)
Repositories (archive)

Want to learn more about storing your data? Visit University of Cologne EduLabs or the UDE Speichermatrix.

3. Naming your data
Naming your data in an understandable and consistent manner makes it much easier for you and your team to find the data you are looking for. Therefore you should take some time to develop naming conventions.

Don’t:

Really_long_file_names_because_windows_is not_able_to_process_more_than_255_characters_and_that_includes_the_name_of_the_folders
Using abbreviations that are not generally understood in your community
Using special characters like * % [ ] > / : ä ö ü ß space
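
The three don'ts above can be checked automatically before files are shared. The following Python sketch is one possible way to do this, not a definitive rule set: the function name `check_path`, the set of "safe" characters, and the use of `/` as separator are assumptions. The 255-character figure is taken from the example above; actual limits depend on the operating system and file system.

```python
import re

# Assumption: only ASCII letters, digits, underscore, hyphen and dot are "safe".
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")
# Figure quoted in the example above; it counts the folder names as well.
MAX_PATH_LENGTH = 255


def check_path(path):
    """Return a list of human-readable problems found in `path`."""
    problems = []
    if len(path) > MAX_PATH_LENGTH:
        problems.append(
            f"path is {len(path)} characters long (limit {MAX_PATH_LENGTH})"
        )
    # Check each folder and the file name separately; "/" only separates them.
    for part in path.split("/"):
        if part and not SAFE_NAME.fullmatch(part):
            # Whatever remains after removing the safe characters is unsafe.
            bad = sorted(set(re.sub(r"[A-Za-z0-9._-]", "", part)))
            problems.append(f"{part!r} contains unsafe characters: {bad}")
    return problems
```

An empty list means that no problem was found. Abbreviations that your community does not understand cannot be detected this way and still need to be documented.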

Instead do:

Readme file documenting conventions
Use inverted date format for sorting (YYYYMMDD)
If necessary add hour, minute and second
Initial numbers for sorting (01_title)
Use interoperable set of characters
A good filename could be: 20250901_sample01_H2O_v2_original.tiff.
The readme should explain the structure of your naming convention: [SamplingDate][SampleID][SampleType][VersionNumber][description]
Abbreviations should be explained as well.
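
To keep names consistent, the convention can be written down once as code and reused by the whole team. The sketch below assumes that the fields are joined with underscores, as in the example file name, and that the version is written as `v` followed by a number. The function names `build_filename` and `parse_filename` are illustrative, not part of any existing tool.

```python
from datetime import date, datetime


def build_filename(sampling_date, sample_id, sample_type, version,
                   description, extension):
    """Join the fields of the naming convention with underscores."""
    parts = [
        sampling_date.strftime("%Y%m%d"),  # inverted date sorts chronologically
        sample_id,
        sample_type,
        f"v{version}",
        description,
    ]
    return "_".join(parts) + "." + extension


def parse_filename(filename):
    """Split a file name built by `build_filename` back into its fields."""
    stem, extension = filename.rsplit(".", 1)
    # Only the description (the last field) may itself contain underscores.
    sampling_date, sample_id, sample_type, version, description = stem.split("_", 4)
    return {
        "sampling_date": datetime.strptime(sampling_date, "%Y%m%d").date(),
        "sample_id": sample_id,
        "sample_type": sample_type,
        "version": int(version.lstrip("v")),
        "description": description,
        "extension": extension,
    }
```

With these assumptions, `build_filename(date(2025, 9, 1), "sample01", "H2O", 2, "original", "tiff")` reproduces the example name `20250901_sample01_H2O_v2_original.tiff`, and `parse_filename` recovers the individual fields from it.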

Want to learn more about naming your data in a way that helps you to stay organised?
Take part in our Data Challenge on November 7th in Cologne and learn more about Metadata, Data structuring, and file naming (sign up here!).

4. Interoperability
You can enhance the use and reuse of your data by making them interoperable.

Don’t:

Encrypting your data (if not necessary for legal reasons)
Compressing data (like in a Zip-file) or using compressed file formats (e.g. jpeg)
Using proprietary software

Instead do:

Use open standards
Add lots of metadata
Document your processes of gathering, processing, naming and storing your data
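
One simple way to keep metadata with the data is a "sidecar" file in an open, text-based format such as JSON, stored next to the data file. The following Python sketch illustrates the idea under stated assumptions: the function name `write_sidecar` and all field names in `EXAMPLE_METADATA` are placeholders. Where your community has an established metadata standard, its fields should be used instead.

```python
import json
from pathlib import Path

# Hypothetical metadata fields; replace them with your community's standard.
EXAMPLE_METADATA = {
    "title": "Water sample 01, original scan",
    "creator": "Jane Doe",
    "sampling_date": "2025-09-01",
    "instrument": "placeholder instrument name",
    "processing_steps": ["scanned", "cropped"],
    "naming_convention": "[SamplingDate][SampleID][SampleType][VersionNumber][description]",
}


def write_sidecar(data_file, metadata):
    """Write `metadata` as JSON next to `data_file` and return the sidecar path."""
    data_file = Path(data_file)
    # Example: sample.tiff -> sample.tiff.json, so both files sort together.
    sidecar = data_file.with_name(data_file.name + ".json")
    sidecar.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return sidecar
```

Calling `write_sidecar("20250901_sample01_H2O_v2_original.tiff", EXAMPLE_METADATA)` would create `20250901_sample01_H2O_v2_original.tiff.json` in the same folder. The file can be read with any text editor, without proprietary software.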

5. Write a D(ata)M(anagement)P(lan)
DMPs are required by funding institutions, but they are also useful for you, your team, and your collaborators because they raise awareness of the importance of the whole data life cycle: Which data are gathered, how many, when, and how? How are they processed, stored, archived, and reused?

Don’t:

Starting with the DMP two days before handing in your grant application
Underestimating costs for processing and storing data.
Underestimating costs for curating data (human resources)

Instead do:

Start early on so you have time to consider all the different stages of your data in the life cycle.
Think about potential costs in human resources, soft- and hardware as well as storage.

Want to learn more about DMPs? Useful resources are offered by, among others, the University of Cologne, the University of Duisburg-Essen, and the Heinrich Heine University.
