Linking community and technology to enable FAIR data


What is pseudonymization?

Pseudonymization is defined here as replacing the directly identifying variables in a dataset with a pseudonym. Doing so does not guarantee that the whole dataset has been pseudonymized. Free-text fields may still contain directly traceable data. In addition, a combination of other variables that are not directly identifying, but are important for the research in question, may lead to the identification of a person.
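
The point about combinations of variables can be checked on a concrete dataset. The following is a minimal sketch, not a complete re-identification risk assessment: it counts how often each combination of indirect variables occurs and reports the combinations that occur only once. The function name, field names and sample records are hypothetical and chosen for illustration only.

```python
from collections import Counter


def unique_combinations(rows, fields):
    """Return the value combinations of `fields` that occur exactly once."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# Hypothetical, already pseudonymized records.
rows = [
    {"pseudonym": "P01", "birth_year": 1984, "postcode": "6211", "profession": "nurse"},
    {"pseudonym": "P02", "birth_year": 1984, "postcode": "6211", "profession": "nurse"},
    {"pseudonym": "P03", "birth_year": 1957, "postcode": "6229", "profession": "mayor"},
]

risky = unique_combinations(rows, ["birth_year", "postcode", "profession"])
print(risky)  # [(1957, '6229', 'mayor')]
```

A combination that occurs only once points to a single participant, even though none of the variables is directly identifying on its own.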


What is small-scale research?

We define small-scale research as: research with a limited number of participants and/or restricted financial means.


Current approach

Most institutes do not use dedicated software for pseudonymizing data. Some institutes do have tools, but these cannot be deployed directly outside their own research or institute. These tools are therefore currently not usable at a national level.

Most institutes do not have a policy on pseudonymization or on a subfield of it, such as dealing with key files. The variety of answers also shows that opinions differ widely on what is and is not permitted.


Basic steps

An LCRDM task group has identified the following basic steps that researchers and research support staff can follow when pseudonymizing datasets for small-scale research.

  1. In the data management plan, describe why and how you are going to pseudonymize data, how access to the separately stored key file and to the dataset is regulated, and what happens to the key file and the data when the project is completed.
  2. Identify the following categories in your data:
    1. data necessary for identification, to organize research or to communicate with research participants
      Store these in a key file.
    2. data required for analysis
      Preferably stored in a data management system.
    3. data not needed (e.g. in case of a supplied dataset)
      This data should be deleted.
  3. Pseudonymize the data as quickly as possible, i.e. immediately when collecting data. If you are sent a dataset with identifiable data by another party, pseudonymize the data immediately after receiving it.
  4. Use different pseudonyms for different datasets. This prevents data from participants who feature in multiple datasets from being linked via the pseudonym.
  5. Store the key file separately from the research data.
  6. Access to the key file should preferably be managed by someone who is not involved in the research project.
  7. Make sure that the key file and the data are adequately backed up and secured.
  8. Take technical and organizational measures to prevent unauthorized people from linking the key file to the research data. After the data collection, persons in the role of the researcher should be denied access to the key file.
  9. Limit access to the key file, but ensure that within the organization there is always someone who has access.
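
Steps 2 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tool recommended by the task group: the function name `pseudonymize`, the field names and the sample records are hypothetical. Pseudonyms are drawn at random on every call, so a participant who appears in two datasets receives a different pseudonym in each (step 4). Fields that are listed neither as identifiers nor as analysis data are dropped (step 2.3).

```python
import secrets


def pseudonymize(rows, direct_identifiers, analysis_fields):
    """Split records into a key file and a pseudonymized dataset.

    Returns (key_rows, data_rows). Fields named in neither list are dropped.
    """
    key_rows, data_rows = [], []
    used = set()
    for row in rows:
        # Random pseudonym that is not derived from the identifying values.
        pseudonym = secrets.token_hex(8)
        while pseudonym in used:  # extremely unlikely, but keep pseudonyms unique
            pseudonym = secrets.token_hex(8)
        used.add(pseudonym)
        key_rows.append(
            {"pseudonym": pseudonym, **{f: row[f] for f in direct_identifiers}}
        )
        data_rows.append(
            {"pseudonym": pseudonym, **{f: row[f] for f in analysis_fields}}
        )
    return key_rows, data_rows


# Hypothetical records as collected.
participants = [
    {"name": "A. Jansen", "email": "a.jansen@example.org", "age": 34, "score": 7.5, "shoe_size": 42},
    {"name": "B. de Vries", "email": "b.devries@example.org", "age": 51, "score": 6.0, "shoe_size": 39},
]

key_file, dataset = pseudonymize(participants, ["name", "email"], ["age", "score"])
```

The sketch only returns two in-memory lists. In practice, `key_file` and `dataset` would be written to two separate, access-controlled storage locations, as described in steps 5 to 9.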


Recommendations

Research institutions need a clear and explicit policy on pseudonymization, and in particular on the management of key files during and after research. In addition, infrastructure should be in place where research data and identifiable data can be stored separately, preferably in two independent, adequately secured environments.


LCRDM Report on Pseudonymization