
Research Data Toolkit

A library toolkit focusing on research data management.

About this page

This page offers practical advice and recommendations for data resources, addresses common data myths, and highlights the importance of the University’s data repository, RaYDaR.

Jump to a section:

Research data and research data management
The FAIR data principles
Research data myths
The research data lifecycle
RaYDaR
Policies
Upcoming research sessions
Contact

Research data and research data management

Research data

Research data is any data found or generated through a research project. Research data can take a variety of forms, including spreadsheets, field notes, interview transcripts, surveys and questionnaires. Multimedia and audio-visual material, as well as sketches and annotations made throughout a project, are also forms of research data.

You may also hear the term open data, which is research data that is free to download, share and reuse. There is growing encouragement for researchers to share research data and make it openly available as soon as a research project concludes.

Research data management (RDM)

Research data management (RDM) refers to the processes involved in managing and organising research data at each stage of the research data lifecycle. This includes planning and collecting data, processing and analysing it, storing and preserving it, and sharing and reusing it.

RDM underpins research excellence and is an essential part of conducting responsible, high-quality research. It also enables staff and students to meet the requirements of the University’s Research Data Management Policy, research funder policies and legislation.

The FAIR data principles

The FAIR data principles are a key standard for high-quality research data management. They should be kept in mind at all stages of a research project and the research data lifecycle.

There are 4 FAIR principles. They are:

Findable: Data and its metadata should be well described and easy to find. Both should also be machine-readable.
Accessible: Data should be accessible in different formats and easy to understand.
Interoperable: Data should use standardised vocabularies so that it can work across different systems and processes.
Reusable: Data should be well described so that others can interpret, combine and reuse it.

Research data myths

This section highlights 4 common misconceptions about research data and open data. Each misconception is addressed by a key benefit of RDM, with reference to some useful tools in the University’s data repository, RaYDaR.

Data myths and resolutions

Myth: RDM isn’t applicable to some projects because not all research has data, so there is nothing to share.

Benefit of RDM: All research contains data. Data can take different forms depending on the type of research and the academic discipline. When considering data in relation to RDM, think about how it links to the key research questions and how it can validate, support and demonstrate findings. In Arts and Humanities disciplines, for example, this could include sound recordings, transcripts, sketches or annotations generated or built on throughout a project.

Myth: Sharing data increases chances of others manipulating my data and then taking credit. This is particularly problematic when sharing a dataset before the research output is published.

Benefit of RDM: Sharing research data is part of the open data landscape. Others are actively encouraged to build on open research data.*

Creators are protected by Open Licences, which they choose and attach to their dataset on the University’s data repository, RaYDaR. Creative Commons Licences also ensure that creators are attributed.

RaYDaR has an embargo option, so datasets can be hidden from public view until a research output has been published. Private links can then be shared with publishers so they can view and discuss the data.

*Others cannot alter the original files of a published, open dataset in a repository. Only the creator of a dataset can change and re-upload the original files, with the assistance of repository admins.

Myth: Some research data contains identifiable information, so it can’t be shared openly. Researchers who share this type of data break the terms of ethical policies and contracts.

Benefit of RDM: All data containing identifiable information should be anonymised in all research and research data outputs.

How personal information and data generated from human participants or other sources will be stored and shared should be outlined in the Data Management Plan and discussed with research supervisors or ethics leads. Human participants should give informed consent for their data to be shared openly.

In some cases, such as NHS or government-based projects, data is highly sensitive. Processes for handling and storing this data should be outlined in additional policies.

Myth: Publishing a research output can be time-consuming and may have financial implications, so publishing research data will be similar.

Benefit of RDM: Publishing research data is less likely to involve rigorous publisher workflows such as peer review, and it may not have any publisher involvement at all.

Publishing research data on repositories like RaYDaR can be done quickly and at no extra cost to researchers. It is normal for this to take longer the first time you use RaYDaR, and the time needed depends on the number and size of the datasets being uploaded.

For more information, contact raydar@yorksj.ac.uk.

The research data lifecycle

The research data lifecycle is made up of different RDM stages.

Each stage is described below, with advice and resources.

Alternatively, you can download the full Research Data Toolkit as a PDF guide or slides, available below.

Research Data Toolkit Guide (PDF, 2.2MB)

Research Data Toolkit Slides (PDF, 2.6MB)

Data planning

A Data Management Plan (DMP) is a formal document that guides a research project. It allows researchers to consider and address risks or issues related to working with or managing data.

Writing a DMP is good research practice and is increasingly required by research funders.

DMPs are expected to evolve throughout a project, and all changes should be recorded in the plan.

Every research project is unique, so the content of a DMP will differ. However, as a general guide you should consider:

The type of data that will be created or collected. This includes different data formats or sensitive data.
Data management responsibilities. This includes ethical or legal responsibilities or complying with funder requirements.
Policies which relate to managing the data. This includes YSJU data policies and research funder policies.
Information about ownership of data or access rights. This is important when working with third-party data.
Information about data archiving or sharing arrangements.
Explanation of how the data will be organised and stored securely.

DMP resources:

DMPonline: A resource provided by the Digital Curation Centre that offers guidance and examples of DMPs.
DMPTool: This website provides DMP templates relating to funder criteria.

Data collection

It’s important that all collected data is documented to ensure the research project is beneficial in the long term. Clear and consistent documentation is valuable to other researchers who may reuse the data or replicate the project’s data collection methods.

Use consistent and meaningful naming conventions for files and folders, such as Transcript 1, Transcript 2, Transcript 3.
Keep a record of where you sourced the data, and make sure research data is cited correctly in research reports.
Make sure that any documented data is stored in an accessible format. This allows other researchers to understand the data and convert data formats efficiently.
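If you manage many files, a short script can help apply a naming convention consistently. The sketch below is a minimal, hypothetical Python example: the function name `numbered_names`, the underscore separator, the `.docx` extension and the two-digit zero-padding are illustrative assumptions, not a University requirement. Zero-padding (Transcript_01 rather than Transcript_1) keeps files in the intended order when they are sorted alphabetically.

```python
from __future__ import annotations


def numbered_names(prefix: str, count: int, extension: str = ".docx", width: int = 2) -> list[str]:
    """Return consistent file names such as 'Transcript_01.docx'.

    Hypothetical helper: the separator, extension and padding width are
    illustrative choices; adapt them to the convention set out in your DMP.
    """
    # Zero-pad each number to `width` digits so alphabetical order matches numeric order.
    return [f"{prefix}_{i:0{width}d}{extension}" for i in range(1, count + 1)]


if __name__ == "__main__":
    # Prints one name per line for three interview transcripts.
    for name in numbered_names("Transcript", 3):
        print(name)
```

Running it prints Transcript_01.docx, Transcript_02.docx and Transcript_03.docx. Without padding, a folder sorted alphabetically would place Transcript 10 before Transcript 2.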

Microsoft Forms: This software is part of Office 365 and can be used to create personalised survey designs. It returns responses directly into an Excel spreadsheet.
Qualtrics Surveys: This software is a comprehensive online survey tool. Everyone at York St John can access it with their university credentials at https://yorksj.eu.qualtrics.com/login

Further information about software can be found through YSJU’s Digital Training and Support webpages.

Data processing and analysis

Data processing is different for every project. Workflows can include cleaning data, combining different pieces of collected data or converting data files to different formats.

Data analysis is the interpretation and interrogation of data to create the findings that underpin the research output. Both processing and analysis workflows centre on the quality and transparency of data, making sure that it meets the FAIR principles.

For collaborative research projects there are additional considerations when it comes to the data collection, data processing and data analysis stages.

Ensure the research roles and responsibilities of the group are clearly defined.
Ensure that everyone has access to secure data storage spaces and the same data tools for processing and analysis.
Different people within a group may interrogate data differently. Make sure everyone uses the same file management system and naming conventions for data.

NVivo: This software assists in organising, analysing and sharing qualitative data.
SPSS: This software assists in the editing and analysis of quantitative data.

Further information about software can be found through our Digital Training and Support webpages.

Data storing and preserving

Data preservation refers to the processes that ensure research data and its metadata remain suitable for future use and are not affected by technological change.

Not all data from a project is preserved; this depends on each research project. However, you are encouraged to keep all data until the project is complete unless the DMP states otherwise. Collected data could also be used in future research projects, so it’s important that data is stored and preserved effectively.

Digital preservation is the activity of keeping data accessible in the long term. It future-proofs data against changes in storage tools, including software and hardware.

Digital Preservation Coalition: An organisation that advises on best practice in creating and preserving digital objects.
DATACC: This platform supports RDM for the physics and chemistry disciplines and has a section about digital preservation.

Preservation planning: Consider and investigate ways to preserve your data while making sure it remains findable and accessible. This should be outlined in the DMP.
Keep in mind the technological and legal considerations of storage spaces and platforms. Is the space well known and secure? Is there anything to be aware of in the terms and conditions if you are storing data externally? The University recommends using OneDrive for storing collected data and the institutional repository, RaYDaR, for completed, published datasets.
Make sure published research data receives a digital object identifier (DOI). This keeps the resource findable even if its digital location changes. RaYDaR assigns every published dataset a DOI automatically.

Data sharing and publishing

Open data: Research data that can be accessed, distributed and used freely (subject to an open licence). Open data must also align with the FAIR data principles (data should be findable, accessible, interoperable and reusable).
Creative Commons Licences: A type of Open Licence that can be used for open research data. This allows the creator of a dataset to decide the terms on which their dataset can be used.
Data Repositories: A digital space to deposit datasets along with their metadata. Our data repository, RaYDaR, is an open access repository, which means published data outputs can be accessed publicly and freely.
ORCID iD: An author’s persistent digital identifier which is attached to published datasets. This links to an author’s academic profile where all their research outputs are listed. ORCID iDs are free but require registration.

Research data can be shared during a project or only at the end. While research data is normally attached to a published research output, datasets can also be treated as standalone outputs. This allows a dataset to be used and built on in future research outputs.

As part of our data management policy, researchers are required to deposit research data that supports research outputs into the University repository, RaYDaR, unless specified otherwise in the data management plan.

This requirement applies to students whose research data is included in published research outputs; however, we expect all staff and students to uphold the principles of data management.

Data reuse and citation

When conducting research, you may find data in repositories or within other research outputs. Before reusing such data, check the citation and source details to make sure the data can be used, and to what extent.

Data in repositories will likely have an Open Licence attached to it, such as a Creative Commons licence. This outlines how the data can be used and ensures that the original author of the data is credited. If a dataset’s licence details are unclear, or you want to use the data in a way the licence terms do not cover, you will need to contact the author of the dataset to seek additional permissions.

Find out more about Creative Commons and the 6 licence types below.

Creative Commons and the 6 licence types

As best practice, when using research datasets from others you should:

Always cite data sources.
If you generate new data from an original work, use the same open licence as the original as well as attributing the original author.

RaYDaR

RaYDaR (Research at York St John Data Repository) is the University’s open data repository service, which stores and showcases research data from the University.

For information on how to create and upload datasets on RaYDaR, view the RaYDaR guide below. Additional information is also available in the Figshare Knowledgebase.

RaYDaR Guide for Staff (PDF, 160KB)

Under the University’s Research Data Management policy, staff and researchers should deposit research data that supports their research outputs into RaYDaR.

Where possible, all data uploaded to RaYDaR should be accessible and in accordance with the FAIR Data Principles.

More information about RaYDaR

Policies

UKRI framework on research data: This provides seven core principles on data sharing. Our research data policy aligns with the UKRI framework.
Concordat on Open Research Data: We ask all York St John researchers to be aware of the Concordat principles.

Upcoming research sessions

Publishing Open Access (For London and York Staff)

Thursday 2 May 2024, 12.30pm to 1.30pm

Online

Book here

Contact

For further information or queries about Research Data Management, please email RaYDaR at raydar@yorksj.ac.uk.

