Data is much broader than the numbers in an Excel file. Research data management grew out of several movements that support research in France and abroad, including Open Access (free access to scientific and technical information), the provision of scientific evidence so that results can be validated through replication, the reuse and enhancement of data sets and, more broadly, the safeguarding of scientific heritage. Policies (e.g. Open Science National Plan, July 4, 2018), initiatives (e.g. GO FAIR) and networks (e.g. Research Data Alliance) set the pace for research. Impacts can be immediate, such as depositing open-access datasets when submitting an article (American Journal of Political Science) or drafting a data management plan for Horizon Europe and ANR projects, or they can be longer term. Developing management strategies to document, preserve, enhance and safeguard data will help the community better understand your work and its results, and above all it will save you time and increase your visibility.
Recording your answers to these questions from the outset of your research will enable you to reuse and share your data easily.
Some first insights regarding research data management:
Big data covers everything from Amazon's recommendation systems to the study of social networks.
These very large data sets are difficult to handle with traditional database management tools. New tools for data management, processing, analysis, visualization and storage are being used:
Open data aims to share public data in order to provide new services that are useful to all, and to ensure that the State is accountable to its citizens.
A movement and philosophy in favor of free access to information, publications and data. Its aim is to encourage the re-use of public data, collected or produced by a public service as part of its mission using public funds (1978 law, 1994 circular). Public information is a common good, financed by the taxpayer and therefore collective, the dissemination of which is in the public and general interest. The stakes involved in disseminating it are therefore manifold:
► How usable is my data? A little e-learning module just for you!
► Tim O'Reilly suggests creating an open innovation platform that enables every citizen to help solve collective problems by surfacing the information and expertise dispersed throughout society.
What is research data, anyway?
Romain Couturier/Cyril Heude
OECD (Organisation for Economic Co-operation and Development) definition:
E.g.: Photographs, satellite images, diagrams, drawings, meteorological records, sound recordings, computer code, data hidden in the code (or in a separate layer)... But laboratory notebooks, preliminary analyses or samples do not fall into this category.
AAF (Association des archivistes français) definition:
For the humanities and social sciences (SHS), research data includes quantitative data, which define trends and can be quantified, verified and made intelligible with statistical tools. Research data also includes qualitative data, which characterize, but do not measure, the properties of a fact or phenomenon.
This includes observation data (field recordings), which are often unique and irreplaceable, and therefore worth keeping/sharing for future research. Examples of useful tools for creating questionnaires: Qualtrics, Survalyzer, ModaLisa, LimeSurvey...
Experimental data and compiled data are often reproducible, but at a prohibitive cost. Simulation models (economic, etc.) are often more useful than the simulation data they generate. Canonical data are organized, validated and widely used, such as INSEE data.
Details on data types:
[Video] Data Sharing and Management, NYU Health Sciences Library
Cyril Heude/Romain Couturier
Opening your data is not always required; in some cases you are legally obliged to keep it closed:
Please note: these cases explain why access to data in repositories may be subject to restrictions or embargoes, depending on the nature of the produced data. These exceptions do not exempt you from drawing up a data management plan.
To address these risks without overburdening the research, you may need a consent form, data anonymization, a declaration of processing recorded in the DPO (Data Protection Officer) register, or even an impact analysis for the most sensitive data.
The content of interviews in sociology or anthropology, for example, may only be disseminated with the interviewees' authorization. One important document is the free and informed consent form signed by the participants. It should establish that participants can withdraw from the study at any time, without justification; that they were given time to ask questions; that the interviewer left contact details so that interviewees can go back on what they said; and that only the data necessary for the project is collected and processed (principle of data minimization). Provide the form in the interviewee's language. State clearly that participation is voluntary and that participants have the right not to answer certain questions. The form should also confirm that the issues at stake in the study and the conditions for managing, sharing and archiving project data have been understood and accepted.
Anonymization must be sufficient to protect the confidentiality of personal data while ensuring the dissemination of information for research purposes.
Anonymization consists of removing direct personal identifiers (name, address, social security number...) or indirect ones (profession, ethnicity...). Service dedicated to Sciences Po.
Examples of software to modify or delete personal and sensitive data: Gimp (images), Metadata anonymisation toolkit, ExifTool (all formats).
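For tabular data such as questionnaire responses, removing identifiers can be sketched in a few lines of Python. This is a minimal illustration and not a complete anonymization procedure: the column names below are hypothetical, and dropping columns alone does not guarantee anonymity, since free-text answers or combinations of the remaining attributes can still point to a person.

```python
# Hypothetical column names; adapt them to your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "social_security_number"}
INDIRECT_IDENTIFIERS = {"profession", "ethnicity"}


def anonymize_record(record, drop_indirect=True):
    """Return a copy of `record` without direct (and optionally indirect) identifiers."""
    to_drop = set(DIRECT_IDENTIFIERS)
    if drop_indirect:
        to_drop |= INDIRECT_IDENTIFIERS
    return {key: value for key, value in record.items() if key not in to_drop}


def anonymize_rows(rows, drop_indirect=True):
    """Apply anonymize_record to every row of a dataset (a list of dicts)."""
    return [anonymize_record(row, drop_indirect) for row in rows]


# Example with an invented respondent.
respondent = {
    "name": "Jane Doe",
    "profession": "nurse",
    "age_group": "30-39",
    "answer_q1": "yes",
}
print(anonymize_record(respondent))
# {'age_group': '30-39', 'answer_q1': 'yes'}
```

Keeping the two sets of identifiers separate makes it possible to retain indirect identifiers (`drop_indirect=False`) when the analysis needs them and the risk has been assessed.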
[Video] Anonymisation: theory and practice (part 1 of 3), Mark Elliot (NCRM)
Pseudonymization is less reliable: individuals can be re-identified by cross-referencing other information. Pseudonymized data therefore remains personal data.
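To make the difference with anonymization concrete, here is a minimal Python sketch of one common way to pseudonymize: replacing a name with a keyed hash (HMAC-SHA256 from the standard library). The function name, the `P-` prefix and the key are illustrative assumptions, not a method prescribed by this guide. The same person always receives the same pseudonym, which is what allows records to be linked, and also why the result is still personal data.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Replace an identifier with a stable pseudonym derived from a secret key."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return "P-" + digest.hexdigest()[:length]


# Hypothetical key: in practice, keep it apart from the data set.
key = b"replace-with-a-long-random-secret"

first_interview = pseudonymize("Jane Doe", key)
second_interview = pseudonymize("Jane Doe", key)

# Same person, same pseudonym: records stay linkable across files.
print(first_interview == second_interview)  # True
```

Anyone who holds the key, or who can compare the remaining attributes with another source, may be able to trace a pseudonym back to a person.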
Advice: encrypt confidential data and content, and store the encryption key in a different location from the data. Software: 7-Zip.
For all these documents, templates, tools and people at Sciences Po can help you. They are listed in the Sciences Po Data Guide, where you will find all the information.
Advice: distribute at least the data that supports the research articles published within the framework of the project in order to allow your readers to deepen their understanding of your analyses.
From the base to the top, we can observe different levels of data processing: raw data, collections of reference data (statistics), processed, selected, documented data, supporting data for publications. However, it can also be said that data is never truly raw: it always has a format, an author, a context and a signifying force induced by its own publication.
If you're a sociologist or historian who takes photographs in the field, here are a few recommendations for shooting with a camera or phone. Objective: manage your images better.
Settings to configure once on your camera or phone:
Organize
Copy files to your computer or institutional Google Drive. Organize files in a tree structure (e.g. location, subject, date).
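If you have many images, the copy step can be scripted. The following Python sketch is one possible layout, assuming a location/subject/date hierarchy; the function names and folder order are illustrative choices, not a required convention.

```python
import shutil
from datetime import date
from pathlib import Path


def destination_folder(root: Path, location: str, subject: str, shot_on: date) -> Path:
    """Build the folder path root/location/subject/YYYY-MM-DD."""
    return root / location / subject / shot_on.isoformat()


def file_into_tree(source: Path, root: Path, location: str, subject: str, shot_on: date) -> Path:
    """Copy `source` into the tree, creating folders as needed, and return the new path."""
    folder = destination_folder(root, location, subject, shot_on)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

For example, `file_into_tree(Path("IMG_0001.jpg"), Path("fieldwork"), "Paris", "market", date(2024, 3, 5))` would copy the image to `fieldwork/Paris/market/2024-03-05/IMG_0001.jpg`. Writing the date in ISO format (year-month-day) keeps folders in chronological order when sorted by name.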
Rename files
See "Easy-to-find data" page.
Document
Would you like to assign keywords, captions and other attributes to your files? Edit the metadata: it's best to use image management software to edit metadata. This will enable you to make batch modifications, such as associating a keyword with a series of images. Without software, you can also edit your metadata directly from the settings. You can also use a table-type tracking file to note shooting location, context (street rendezvous, regional archives, etc.) or even the names of people or contacts.
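A table-type tracking file can be as simple as a CSV that you open in any spreadsheet program. The Python sketch below appends one row per image; the column names are assumptions chosen to match the information mentioned above (shooting location, context, people) and can be adapted freely.

```python
import csv
from pathlib import Path

# Hypothetical columns for the tracking table.
FIELDS = ["filename", "location", "context", "people", "keywords"]


def append_tracking_row(tracking_file: Path, row: dict) -> None:
    """Append one image description, writing the header if the file is new."""
    is_new = not tracking_file.exists()
    with tracking_file.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)  # missing columns are left empty


def read_tracking_rows(tracking_file: Path) -> list:
    """Read the tracking table back as a list of dictionaries."""
    with tracking_file.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

A call such as `append_tracking_row(Path("tracking.csv"), {"filename": "IMG_0001.jpg", "location": "Paris", "context": "regional archives"})` adds a line to the table. If the table contains names of people or contacts, it holds personal data and should be protected accordingly.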