" "
Speak English?
Shared, discipline-specific metadata schemes;
Known, open, documented, shared formats.
Metadata standards are used to describe the resource so that it can be found by search engines. By depositing your data in data.sciencespo, you meet this need, since we describe the data for you according to the DDI (Data Documentation Initiative) schema. DDI is the metadata schema chosen by Sciences Po because it is adapted to the data and resource types of the social sciences, in particular survey data. It describes the study and its variables, the source, geographical origin, temporal coverage, and the collection and production methods. The standard also lets you attach computer code, which can also be shared on GitHub.
A dataset also includes metadata. A dataset consists of a set of data files forming an intellectual unit, the documentation of these data, and metadata (descriptive, administrative and structural).
DATASET = DATA + DOCUMENTATION + METADATA
Title, author, date, publisher, etc. are the metadata of your publications. Your data can also be described with metadata (data type, date of creation of the set, manager's name, version, format, etc.). A standardized description of your data reduces the ambiguity of natural language, allows the description to be harvested by other servers, and makes the data easier for search engines to discover. One protocol used for this harvesting is OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting).
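To make harvesting more concrete, here is a minimal Python sketch of how a harvester might build an OAI-PMH request. The verb `ListRecords` and the parameters `metadataPrefix`, `set` and `from` come from the OAI-PMH specification; the endpoint URL and the function name are hypothetical and only serve as an illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only: replace it with the
# OAI-PMH base URL published by the repository you want to harvest.
BASE_URL = "https://repository.example.org/oai"


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix: format in which records are requested
                     ("oai_dc" is simple Dublin Core).
    set_spec:        optional set (collection) to restrict the harvest to.
    from_date:       optional lower bound on the record datestamp.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


url = build_list_records_url(BASE_URL, from_date="2024-01-01")
print(url)
```

The server answers such a request with an XML document containing the metadata records, which the harvester can then index.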
► Types of metadata:
Sample tools to help you. A guide to the basics of the subject from the University of Texas. Detailed documentation from CESSDA.
Metadata standards are used to describe the resource in such a way that it can be found by search engines (according to precise and uniform criteria for equivalent objects), e.g. Dublin Core (generalist), EML... Where humans look for engaging, interactive content, machines demand structure, logic and clarity. To achieve this, disciplinary standards exist, such as DDI (Data Documentation Initiative), which are more refined and adapted to social science data and resource types, particularly survey data. Fields must be completed so that the information can be understood by all.
Metadata schemas also vary according to:
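As an illustration of what a generalist standard looks like in practice, the following Python sketch serializes a few descriptive fields as Dublin Core elements. The namespace URI is the standard Dublin Core elements namespace; the `record` wrapper element, the function name and the field values are assumptions made up for this example, not an actual data.sciencespo record.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values as XML."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Invented values, for illustration only.
example = {
    "title": "Example survey on political attitudes",
    "creator": "Doe, Jane",
    "date": "2024",
    "publisher": "Example University",
    "format": "text/csv",
}

xml_text = dublin_core_record(example)
print(xml_text)
```

A disciplinary schema such as DDI follows the same principle but with many more fields, for instance at the level of individual variables.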
Target audience: resources are not described in the same way for peers, fellow researchers of all disciplines to encourage interdisciplinarity, the general public, the French or foreign public...
These standards can be combined.
► Sciences Po chooses the DDI scheme
DDI (Data Documentation Initiative) is the schema natively implemented in data.sciencespo, Sciences Po's data warehouse.
It is adapted to data produced in the social sciences, including survey data.
You can consult other metadata schemas through the RDA's Metadata Standards Directory Working Group or on FAIRsharing.org.
A podcast that takes a serious look at the link between research data and scientific publication.
Andrea Talaber, CEU Press, approached Sciences Po to expand a podcast offering that was originally geared more towards feedback on publication by academics or publishers. The podcast is part of a series (Getting published) on the scientific publication process: tips for writing an effective book proposal, finding a publisher, responding to peer comments on the manuscript, distribution, promotion, marketing. The aim of What is metadata?, Sciences Po's contribution, is to expand on the research data that supports these publications and on metadata; defining metadata in a research context and its relevance to research teams. The podcast is available on Spotify, Apple, Amazon...
Content: definitions (research data, datasets, metadata, leaving research jargon behind to make it widely understandable, with concrete examples from research projects at Sciences Po); focus: specific metadata (photo, video); metadata standards; links between persistent identifiers and metadata; techniques for improving the visibility of data on search engines: metadata, but also depositing in successive waves, smaller file sizes, cross-deposit of data and publications, backlinks, investing in sites that are already well indexed (Wikipedia: reference to data in the “metadata” section of the website)...
Risks: being confronted with error messages along the lines of "Oops, I can't open the file. The format is not supported", or being unable to use a program because its producer has gone bankrupt.
Behind this message lies the question of open formats, which ideally needs to be answered right from the start of your research.
With closed formats, you risk no longer having access to your information in a few years' time. Open formats, on the other hand, support long-term access because their specifications are published and can be reimplemented by anyone.
Should a format be open or closed?
Examples: .xml, .csv, .ods (tabular data); .pdf, .txt, .docx, .odt, .rtf (textual data); .gif, .png, .jpg (image); .ora, .xcf (image editing); .mp3, .wav, .zip (sounds); .mp4 (videos)
The Facile tool from CINES lets you check the validity of your data file formats, i.e. whether they are still readable.
The PRONOM format directory provides information on a wide range of formats, to help you make the right choice.
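As a first, rough check before turning to these tools, you could list the files in a project whose extension is not among the examples given above. The Python sketch below only looks at the file extension and uses a subset of those examples; the function names and the sample file names are hypothetical, and the check does not validate file content the way a dedicated tool does.

```python
from pathlib import Path

# Subset of the example extensions listed in the text, grouped by data type.
EXAMPLE_FORMATS = {
    "tabular": {".xml", ".csv"},
    "text": {".pdf", ".txt", ".odt", ".rtf"},
    "image": {".gif", ".png", ".jpg"},
    "sound": {".mp3", ".wav"},
    "video": {".mp4"},
}


def data_type_for(filename):
    """Return the data type for a known extension, or None if unknown."""
    ext = Path(filename).suffix.lower()
    for data_type, extensions in EXAMPLE_FORMATS.items():
        if ext in extensions:
            return data_type
    return None


def files_needing_review(filenames):
    """Return the files whose extension is not in the example list."""
    return [name for name in filenames if data_type_for(name) is None]


# Invented file names; ".sav" stands in for a software-specific format.
to_review = files_needing_review(["survey.csv", "codebook.PDF", "analysis.sav"])
print(to_review)
```

Files flagged this way are candidates for conversion, or at least for a closer look in a format directory.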
Format conversion should be anticipated and documented in advance of distribution and archiving operations. Data.sciencespo transforms closed formats into open formats. Magic!
Are you familiar with Tim Berners-Lee's 5-star Linked Open Data scheme?
Publish your data on the web under an open license
Structure your data to make it readable by humans and machines
Publish your data in an open, non-proprietary format, not limited to a particular software package, so that all functions are available regardless of the program used to open it.
Use URIs to facilitate permanent linking and web referencing of your data
Link your data to other data to add context
You can't give your data a higher star rating if the requirements of the previous star aren't met.
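The cumulative rule can be expressed in a few lines of Python. In this sketch the criterion names are labels invented for the example; they follow the order of the five requirements listed above, and counting stops at the first requirement that is not met.

```python
# One label per star, in the order of the five requirements.
# The label names are invented for this example.
STAR_CRITERIA = [
    "open_license",          # 1 star: on the web under an open license
    "structured",            # 2 stars: structured, machine-readable data
    "open_format",           # 3 stars: open, non-proprietary format
    "uses_uris",             # 4 stars: URIs identify the data
    "linked_to_other_data",  # 5 stars: linked to other data
]


def star_rating(dataset):
    """Count stars, stopping at the first requirement that is not met."""
    stars = 0
    for criterion in STAR_CRITERIA:
        if not dataset.get(criterion, False):
            break
        stars += 1
    return stars


# This dataset uses URIs and links, but is not in an open format,
# so it cannot go beyond two stars.
example = {
    "open_license": True,
    "structured": True,
    "open_format": False,
    "uses_uris": True,
    "linked_to_other_data": True,
}
print(star_rating(example))  # prints 2
```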