Sciences Po | Library

Make your data Interoperable

  Shared, discipline-specific metadata schemes;

  Known, open, documented, shared formats.

Interoperable data = shared discipline-specific metadata schemes
 

Metadata standards are used to describe a resource so that search engines can find it. Depositing your data in data.sciencespo solves this problem, because we describe the data for you according to the DDI (Data Documentation Initiative) schema, the metadata schema chosen by Sciences Po. DDI is adapted to social science data and resource types, in particular survey data. It describes the study and its variables, source, geographical origin, temporal coverage, and collection and production methods. The standard also lets you attach computer code, which can also be shared on GitHub.

A dataset also includes metadata. It consists of a set of data files forming an intellectual unit, the documentation of that data, and metadata (descriptive, administrative and structural).

DATASET = DATA + DOCUMENTATION + METADATA

But what is metadata?

Title, author, date, publisher, etc. are the metadata of your publications. Your data can be described with metadata too (data type, creation date of the dataset, manager's name, version, format, etc.). A standardized description of your data reduces the ambiguity of natural language, so that other servers can harvest it and search engines can discover it more easily. One example of a harvesting protocol is OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting.
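To make harvesting more concrete, here is a minimal sketch of how a harvester might build an OAI-PMH request. The `ListRecords` verb and the `oai_dc` (Dublin Core) metadata prefix come from the OAI-PMH specification; the base URL and the function name are placeholders chosen for illustration, not a real Sciences Po endpoint.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the OAI-PMH base URL of the repository you harvest.
BASE_URL = "https://repository.example.org/oai"


def build_list_records_url(base_url: str,
                           metadata_prefix: str = "oai_dc",
                           from_date: Optional[str] = None) -> str:
    """Build the URL of an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records created or changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


print(build_list_records_url(BASE_URL, from_date="2025-01-01"))
```

A harvester would then fetch this URL and parse the XML response, which is why a standardized description matters: every repository answers in the same structure.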

Types of metadata:

  • Descriptive metadata: used to discover and evaluate the data described: title, author, subjects, keywords, date, etc.
  • Provenance metadata (origin, processing): this enables a clear understanding of the context, and promotes reuse.
  • Technical and administrative metadata: conditions for reading by humans and machines, software needed to read the data, configuration, software versions, formats; rights and access (distribution licenses = what the producer of the dataset does or does not authorize concerning its reuse).
  • Each metadata field has its own template: 
    • Project name: free text
    • Date: ISO 8601 standard, YYYY-MM-DD...
  • Two modes of metadata creation often coexist: 
    • External metadata
    • Embedded metadata, which is automatically retrieved when the DOI is entered or when the document is dragged and dropped (as in HAL): quality control remains necessary, since the result is not always optimal!
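The idea of a template per field can be sketched as a small validation routine. This is only an illustration under stated assumptions: the field names `project_name` and `date` and both function names are hypothetical, and a real repository applies its own rules.

```python
import re
from datetime import date

# Template for the date field: exactly YYYY-MM-DD.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):
        return False
    year, month, day = (int(part) for part in value.split("-"))
    try:
        date(year, month, day)  # rejects impossible dates such as 2025-02-30
    except ValueError:
        return False
    return True


def check_record(record: dict) -> list:
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    if not record.get("project_name", "").strip():
        problems.append("project_name is empty")  # free text, but required
    if not is_iso_date(record.get("date", "")):
        problems.append("date is not YYYY-MM-DD")
    return problems


print(check_record({"project_name": "Example survey", "date": "29/04/2025"}))
```

Checks like these are one way to apply quality control to automatically retrieved metadata before it is published.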

Sample tools to help you: a guide to the basics of the subject from the University of Texas, and detailed documentation from CESSDA.

What about metadata standards?

Metadata standards are used to describe a resource in such a way that search engines can find it, according to precise and uniform criteria for equivalent objects, e.g. Dublin Core (generalist), EML... Where humans look for engaging, interactive content, machines demand structure, logic and clarity. To meet that need, disciplinary standards exist, such as DDI (Data Documentation Initiative), which is more fine-grained and adapted to social science data and resource types, particularly survey data. Fields must be filled in so that the information can be understood by all.
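As an illustration of what "structure, logic and clarity" look like to a machine, the sketch below serializes a few descriptive fields as Dublin Core elements. The namespace URI is the one published for the Dublin Core element set; the `metadata` wrapper element, the function name and the sample values are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields: dict) -> str:
    """Serialize a flat dict of Dublin Core element names and values as XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample values, for illustration only.
record = to_dublin_core({
    "title": "Example survey dataset",
    "creator": "Doe, Jane",
    "date": "2025-04-29",
    "format": "text/csv",
})
print(record)
```

Because every field sits in a named element from a shared vocabulary, a search engine or harvester can read the title or date without guessing.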

Metadata schemas also vary according to:

  • Type of resource: an image (size, device model, color space, color profile, focal length), a video (number of frames per second, color profile, duration), an audio file (bit rate, codec, sampling frequency), a text, a book, interviews, etc. are not described in the same way. 
  • Type of repository: general (Zenodo, Dryad, HAL, Data.sciencespo) or specialized (Pangaea, GenBank), private or public.
  • Target audience: resources are not described in the same way for peers, for fellow researchers of all disciplines (to encourage interdisciplinarity), for the general public, or for a French or foreign audience.

These standards can be combined.

Sciences Po chooses the DDI schema

DDI (Data Documentation Initiative) is the schema natively implemented in data.sciencespo, Sciences Po's data repository.

It is adapted to data produced in the social sciences, including survey data.

You can consult other metadata schemas through the RDA's Metadata Standards Directory Working Group or on FAIRsharing.org.

A podcast that takes a serious look at the link between research data and scientific publication.

Andrea Talaber of CEU Press approached Sciences Po to broaden a podcast offering that was originally geared more towards feedback on publishing from academics and publishers. The podcast is part of a series, Getting published, on the scientific publication process: tips for writing an effective book proposal, finding a publisher, responding to reviewers' comments on the manuscript, distribution, promotion and marketing. Sciences Po's contribution, What is metadata?, focuses on the research data that supports these publications and on metadata: it defines metadata in a research context and explains its relevance to research teams. The podcast is available on Spotify, Apple, Amazon...

Content: definitions (research data, datasets and metadata, explained without research jargon so that they are widely understandable, with concrete examples from research projects at Sciences Po); a focus on specific metadata (photo, video); metadata standards; links between persistent identifiers and metadata; techniques for getting data indexed by search engines: metadata, but also depositing in waves, smaller file sizes, cross-deposit of data and publications, cross-links between resources, and investing in sites that are already well indexed (Wikipedia: reference to data in the “metadata” section of the website)...

Interoperable data = known, open, documented, shared formats

 

Risks: being confronted with messages along the lines of "Oops, I can't open the file. The format is not supported", or being unable to use a program because its producer has gone bankrupt.

Behind these messages lies the question of open formats, which ideally needs to be answered right from the start of your research.

Closed formats put you at risk of losing access to your information in a few years' time. Open formats, on the other hand, safeguard access by sharing their production "recipes" and making them reproducible.

Should a format be open or closed?

  • The problem with closed formats is that they require paid, proprietary software, and their "recipe" is hidden. Files can only be read or modified if you have the appropriate software (e.g. .psd > Photoshop; .xls, .doc, .ppt).
  • But not all proprietary formats are closed: standard PDF is open, as are the Office suite formats whose extensions end in "x" (.docx, .xlsx, .pptx).
  • Programs like SPSS and Excel don't work well on HPC (High Performance Computing) systems.
  • In any case, closed formats are not compatible with the FAIR data principles of interoperability and reuse.
  • Open formats: files are transparently encoded, and their "recipe" is in the public domain. They are interoperable, i.e. they can be created, read and modified by all software designed to process the same type of file: image, text, audio, etc. 

Examples: .xml, .csv, .ods (tabular data); .pdf, .txt, .docx, .odt, .rtf (textual data); .gif, .png, .jpg (images); .ora, .xcf (image editing); .mp3, .wav (sound); .zip (archives); .mp4 (video)
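A quick way to apply this list is to screen file names before deposit. The sketch below is an assumption-laden illustration: it uses only a subset of the extensions named in this guide, the function name is hypothetical, and an extension check cannot replace a real format validation tool.

```python
from pathlib import Path

# Subsets of the extensions named in this guide (not an exhaustive registry).
OPEN_EXTENSIONS = {
    ".xml", ".csv", ".pdf", ".txt", ".docx", ".odt", ".rtf",
    ".gif", ".png", ".jpg", ".ora", ".xcf", ".mp3", ".wav", ".mp4",
}
CLOSED_EXTENSIONS = {".psd", ".doc", ".ppt"}


def classify_format(filename: str) -> str:
    """Return 'open', 'closed' or 'unknown' based on the file extension."""
    extension = Path(filename).suffix.lower()
    if extension in OPEN_EXTENSIONS:
        return "open"
    if extension in CLOSED_EXTENSIONS:
        return "closed"
    return "unknown"


for name in ["survey_2024.csv", "poster.psd", "notes.xyz"]:
    print(name, "->", classify_format(name))
```

Files reported as "closed" or "unknown" are the ones to convert or document before distribution and archiving.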

Cines' Facile tool lets you check the validity of your data file formats, i.e. whether they are still readable.

The PRONOM format directory provides information on a wide range of formats, to help you make the right choice.

Format conversion should be anticipated and documented in advance of distribution and archiving operations. Data.sciencespo transforms closed formats into open formats. Magic!

How do you give your data a 5-star label?

Are you familiar with Tim Berners-Lee's 5-star Linked Open Data scheme?

  • 1 star: publish your data on the web under an open license.
  • 2 stars: structure your data to make it readable by humans and machines.
  • 3 stars: publish your data in an open, non-proprietary format, not limited to a particular software package, so that all functions are available regardless of the program used to open it.
  • 4 stars: use URIs to facilitate permanent linking and web referencing of your data.
  • 5 stars: link your data to other data to add context.

You can't give your data a higher star if the requirements of the previous star aren't met.
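The cumulative rule can be sketched in a few lines: stars are counted in order and counting stops at the first unmet requirement. The criterion keys and the function name below are hypothetical labels for the five steps described above.

```python
# Hypothetical keys for the five requirements, in star order.
CRITERIA = ["open_license", "structured", "open_format", "uses_uris", "linked"]


def star_rating(dataset: dict) -> int:
    """Count stars in order, stopping at the first requirement that is not met."""
    stars = 0
    for criterion in CRITERIA:
        if not dataset.get(criterion, False):
            break  # later stars cannot be earned once one is missing
        stars += 1
    return stars


example = {
    "open_license": True,
    "structured": True,
    "open_format": False,  # e.g. published only in a proprietary format
    "uses_uris": True,
    "linked": False,
}
print(star_rating(example))
```

In this example the dataset uses URIs but stops at two stars, because the open-format requirement comes first and is not satisfied.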

Last updated: Apr 29, 2025 3:05 PM