Open access data publishing is the act of making data freely available online so that they can be downloaded, analysed, re-used and cited by people other than the creators of the data. This can be achieved in various ways. In the broadest sense, any upload of a dataset onto a freely accessible website could be regarded as “data publishing”.
The OpenScienceLink project aims to provide a universal, well-structured repository of scientific and research data, currently focusing on biomedical and clinical research, for experimentation and benchmarking of pertinent research works in a given thematic area. Moreover, the OpenScienceLink web-based platform will enable the publishing and sharing of publications and experimental datasets, as well as their linking to researchers and scholars.
The OpenScienceLink platform also aims to contribute to the overall open-access environment by providing open-access stakeholders, and those willing to embark on an open-access quest, with the information needed to do so successfully. This guide is therefore not addressed only to users who wish to upload their data to the OpenScienceLink platform. It outlines best practices for the publication and sharing of research data in the context of OpenScienceLink and beyond. In addition, section 7 of this deliverable provides a list of policy guidelines and best practices intended to increase the potential of open access initiatives. These guidelines are aimed at open access initiatives such as OpenScienceLink, as well as at open access policy makers.
There are several issues to be considered during the process of making scientific information freely available. This guide serves the following purposes:
- Promote Open Access by detailing relevant aspects and definitions related to Open Access Publishing.
- Provide additional insight to those unfamiliar with open access to scientific information initiatives.
- Provide an overview of potential legal caveats from both the researcher’s and platform’s perspective.
- Formulate guidelines and best practices for Open Access initiatives and Open Access Policy makers in an effort to encourage the further development and improvement of open access to scientific information.
In the following section we aim to answer frequently asked questions concerning open access scientific research. Although the considerations relate mostly to questions asked by scientific researchers, they are also informative for research institutions and publishers wishing to enter the open access sphere. These questions mostly pertain to open access to scientific information in general and could therefore be used as a template by those who wish to inform their end-users about open access policies.
1. What is Scientific Information?
In general, the term ‘scientific’ may refer to all academic disciplines, not only hard sciences. According to the European Commission, ‘scientific information’ in the context of research and innovation may fall under two broad categories[1]:
- Peer-reviewed scientific research articles (published in scholarly journals) or
- Research data (data underlying publications, curated data and/or raw data).
[1] European Commission, Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020 (30 October 2015), p. 2, http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
2. What is “Research Data”?
Research data constitutes all data from an experiment, study or measurement, including the metadata and processing data.[1] Under the EU Horizon 2020 track 'research data' refers to information, in particular facts or numbers, collected to be examined and considered and as a basis for reasoning, discussion, or calculation. In a research context, examples of data include statistics, results of experiments, measurements, observations resulting from fieldwork, survey results, interview recordings and images. The focus is on research data that is available in digital form.
[1] Sarah Hugelier, ‘Publishing Open-Access Biomedical Data: Legal Challenges’, Biomed Data J. 2015; 1(1): 43-51.
3. What is metadata?
Metadata is data about your data, i.e. a description of your data. A common example of metadata is a library catalogue record. The London School of Economics notes that metadata is usually a formally agreed set of standards often with controlled fields and vocabularies, enabling the facilitation of data preservation, discovery and citation.[1] The metadata that a researcher needs to provide may depend upon the chosen repository or archive for data publication.
4. Why add metadata?
Adding structured metadata to your publications provides many advantages. Smart metadata can facilitate the discovery of your work as well as allow others to comprehend and evaluate your data on a basic level without requiring access to the datasets themselves.
The purpose of OpenScienceLink - as well as most other open access initiatives - and the basis for its selection of data and services is the creation of a repository of well-structured and semantically linked datasets. The role of metadata herein is crucial. Therefore metadata descriptions should be based on existing and established domain ontologies.
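As a minimal sketch of what "metadata based on a controlled vocabulary" can mean in practice, the following Python example checks the topic keywords of a metadata record against a fixed list of allowed terms. The class `DatasetMetadata`, its fields and the vocabulary `ALLOWED_TOPICS` are hypothetical and chosen for illustration only; they are not the OpenScienceLink data model, and a real system would load its terms from an established domain ontology.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical controlled vocabulary; a real deployment would load its terms
# from an established domain ontology instead of hard-coding them.
ALLOWED_TOPICS = {"cardiology", "oncology", "neurology"}


@dataclass
class DatasetMetadata:
    """Illustrative metadata record (not the OpenScienceLink model)."""

    title: str
    creators: list[str]
    topics: list[str] = field(default_factory=list)

    def unknown_topics(self) -> list[str]:
        """Return the topics that are not part of the controlled vocabulary."""
        return [t for t in self.topics if t.lower() not in ALLOWED_TOPICS]


record = DatasetMetadata(
    title="Example ECG recordings",
    creators=["A. Researcher"],
    topics=["Cardiology", "heart stuff"],
)
print(record.unknown_topics())  # ['heart stuff']
```

Free-text keywords such as the second topic above are the kind of description that reduces discoverability, which is why a check of this sort might be run before a record is accepted.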
5. What is a Database?
In general terms, a database can be defined as a usually large collection of data organized especially for rapid search and retrieval, for instance by a computer. The Oxford Dictionary defines a database as a structured set of data held in a computer, especially one that is accessible in various ways.
On the European level, a database has been legally defined as a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means. A database should be understood to include inter alia scientific works, collections of works or collections of other material such as text, images, numbers, facts, and data.
6. What is a “Data Paper”?
A data paper is a publication that is designed to make other researchers aware of data that is of potential use to them. A data paper includes a description of the methods used to create the dataset, a description of the structure of the dataset and an elaboration on the dataset’s reuse potential. A link to the location of the data paper and the dataset in a repository should also be included.[1] It is important to note that a data paper does not replace a research article, but rather supplements it. When mentioning the data behind a study, a research paper should reference the data paper for further details. Similarly, the data paper should contain references to the research papers associated with the dataset.[2]
In general, the purpose of a data paper is to:
- Describe the data in a structured human-readable form,
- Bring the existence of the data to the attention of the scholarly community, and
- Provide a citable journal publication which allows the data publisher to be credited.
The description should include several important elements documenting how the dataset was collected, who owns the data, details of which software was used to create the data, how the data can be viewed, etc.
[1] Hrynaszkiewicz I, Norton ML, Vickers AJ, Altman DG. Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers. BMJ. 2010
7. What are the benefits of publishing data/data paper?
Data publishing has become increasingly important and has had an effect on the policies of funding frameworks and organizations.[1] Moreover, shifting towards the free online access of the research results of publicly-funded research is also a core strategy of the European Commission.[2]
Furthermore, data papers mean that data you have released can be cited and that those citations can be tracked. This is not only an indirect measure of impact and therefore important for career progression, but it can also help you understand who is using your data. In turn, this can lead to new collaborations.
It is now widely recognized that making research results more accessible to all societal actors contributes to better and more efficient science, and to innovation in the public and private sectors. The European Commission encourages all EU Member States to put publicly-funded research results in the public sphere in order to strengthen science and the knowledge-based economy.[3]
Other incentives for authors and institutions to publish data include:
- The idea that data produced using public funds should be openly published and made available for inspection, interpretation and re-use by third parties.
- Data-collecting efforts and associated costs will be reduced by avoiding duplication of work.
- Open data increases the potential for interdisciplinary research, and for re-use in new contexts not envisaged by the data creator.
- Open data increases transparency and the overall quality of science; published datasets can be re-analyzed and verified by others.
- Published data can be cited and re-used in the future, either alone or in association with other data.
- Published data can be indexed, made discoverable and searchable.
- Data creators, and their institutions and funding agencies, can be credited for their work of data creation and publication through the conventional channels of scholarly citation. Priority and authorship are established in the same way as with the publication of a research paper.
8. Can a Data Paper be indexed and cited?
Data Papers can be indexed and cited like any other research article, thus bringing registration of priority, a permanent publication record, recognition and academic credit to the data creators.
9. What is Open Access?
The European Commission has defined open access as the “practice of providing online access to scientific information that is free of charge to the end-user and that is re-usable”[1]. Scientific information usually refers to peer-reviewed scientific research articles and scientific research data. Even though Open Access policies exist, the question of whether scientific information should be made open arises only after the researcher has decided to publish his or her results.[2]
What does ‘open’ mean?
The Open Knowledge Foundation has defined open content as “a piece of content or data that is free to use, reuse, and redistribute by anyone — subject only, at most, to the requirement to attribute and share-alike.”
The goal of OpenScienceLink is to publish research data and datasets under conditions of Open Access.[3] Relevant in the context of research data is that ‘scientific’ refers to all scholarly disciplines.[4] Open Access entails "the free availability of scientific literature on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited."[5]
Open Access according to the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003) furthermore requires that:
- The author and right holder grant(s) to all users a free, irrevocable, worldwide, right of access to, and a license to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship, as well as the right to make small numbers of printed copies for their personal use.
- A complete version of the work and all supplemental materials, including a copy of the permission as stated above, in an appropriate standard electronic format is deposited (and thus published) in at least one online repository using suitable technical standards (such as the Open Archive definitions) that is supported and maintained by an academic institution, scholarly society, government agency, or other well-established organization that seeks to enable open access, unrestricted distribution, inter-operability, and long-term archiving.
[1] European Commission, Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020 (30 October 2015), p. 2. http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
[2] IPR Help Desk, ‘Open Access to Publications and data in Horizon 2020: Frequently Asked Questions’ (May 2014); <https://www.iprhelpdesk.eu/sites/default/files/newsdocuments/Open_Access_in_H2020.pdf>
[3] Under Horizon 2020 'Research data' refers to information, in particular facts or numbers, collected to be examined and considered and as a basis for reasoning, discussion, or calculation. In a research context, examples of data include statistics, results of experiments, measurements, observations resulting from fieldwork, survey results, interview recordings and images. The focus is on research data that is available in digital form.
[4] Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020, Version 1.0, 11 December 2013
[5] Budapest Open Access Initiative (2002). Retrieved at
10. Why Open Access?
The rationale given by many policy makers and scholars for pushing open access conditions is that "opening up the data allows for new knowledge to be discovered through comparative studies, data mining and so on. It allows greater scrutiny of how research conclusions have been reached, potentially driving up research quality."[1] This rationale is also the idea behind the OpenScienceLink project: "to foster the widest access and re-use of scientific publications and data".
When done correctly, making scientific information freely available online has the potential to benefit all stakeholders within the Open Access value chain, such as researchers, research organisations, publishers, editors and funding authorities.
Open data can be reused by the wider public for a range of purposes including teaching, journalism and citizen science projects. Making research outputs available for others to work with and build upon is part of the social contract of academia. Moreover, some studies claim that open access publications are more likely to be cited than subscription publications.[2] This Open Access citation advantage should be considered by researchers who wish to increase the impact of their works.
[1] For example: Ball, A., (2012) and Guibault, L (2012), (166), and Communication from the Commission towards better access to scientific information, COM (2012) 401 final (1)
[2] Alma Swan, ‘The Open Access Citation Advantage: Studies and Results to Date’ (2010); <http://eprints.soton.ac.uk/268516/2/Citation_advantage_paper.pdf>
11. How to Provide Open Access? (Green and Gold Open Access)
Concerning the open access publication of scientific peer-reviewed publications, such as data papers, two main open access approaches exist: green and gold open access.
- Green Open Access (Self-archiving): The scientific work (e.g. data paper) is archived by the researcher in an online repository, such as OpenScienceLink, before, after or in tandem with its publication. Access to the article is often subject to an embargo period because publishers wish to retain some type of exclusivity in order to recoup their investments.[1]
- Gold Open Access (Open Access Publishing): The scientific work is immediately provided in an open access fashion by the publisher. The costs associated with the publication no longer rest with the readers. Instead, they can be charged to the university or research institute or to the research funder.[2]
[1] IPR Help Desk, ‘Open Access to Publications and data in Horizon 2020: Frequently Asked Questions’ (May 2014), p. 2; <https://www.iprhelpdesk.eu/sites/default/files/newsdocuments/Open_Access_in_H2020.pdf>
[2] Ibid.
12. Where should data be published?
Data are stored in data repositories. These are databases or archives created to collect, disseminate and preserve scientific output and to make it freely available.[1] There are many data repositories to choose from. Repositories can be institution-, discipline- or content-specific, or general. A list of international repositories can be found at databib.org.
Some things to consider when choosing a repository are:
- Does your employer, funder, or publisher recommend or mandate certain repositories?
- Does your research discipline have conventions concerning where to publish data?
- What metadata and format of data does the repository require?
- Does the repository enable tracking of data citations by allocating a unique identifier?
- Does the repository manage conditional access to data, and is conditional access managed by the repository or the data owner?
The decision whether or not to protect the research results is important and needs to be discussed at an early stage with the research and development department of a researcher’s institution.
13. How should data be shared/structured?
For data to be exploited to its maximum potential it is necessary for it not just to be accessible but also intelligible and searchable. This is where standards for data preservation are required.
Standards cover what should be included in the dataset, 'ontologies' or controlled vocabularies for annotating datasets, and exchange formats, for facilitating sharing.
Pragmatic and technical guidance on how to go about preparing your data suitably is available from various sources. A few are listed below.
- Managing and Sharing Data (2011) is 'designed to help researchers and data managers...produce highest quality research data with the greatest potential for long-term use'.
- The DCC provides advice on how to store, manage and protect digital data. Their site includes tools and applications, MRC data plan FAQs, information on data management plans, a list of funders' policies, legal information and a developing series of 'how-to' guides.
- The site includes a growing catalogue of standards to help ensure that 'experiments are reported with enough information to be comprehensible and (in principle) reproducible, compared or integrated'.
- Provides Guidance for researchers: developing a data management and sharing plan.
- Provides tools and resources, including their Data and tissues toolkit and their Cohort dataset directory, plus a short glossary of common data-sharing terms.
- The NCRI Informatics Initiative 'supports the development of data standards and promotes a culture of data-sharing to facilitate storage and dissemination of research data'.
- Resources include examples of data sharing plans alongside more general policy documents.
Specific OpenScienceLink Open Access Policy
OpenScienceLink provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge. Authors of data made available on the platform remain the copyright holders and grant OpenScienceLink and any third parties the right to use, reproduce and share the work according to the Creative Commons license agreement.[1]
All data on the OpenScienceLink platform is made available under the following license: CC BY-NC-SA 4.0.[2] This means that anyone who wants to submit research data to the platform has to understand and agree that the work will be made available under this license. Any user of the research data has to comply with the use requirements of this license. To upload the work, the author must confirm that he or she has permission to do so by clearing the copyrights. The data may not include any personal data within the meaning of the Data Protection Directive.
The following sections will provide guidance for using the OpenScienceLink platform.[3]
[1] For more detailed information about these requirements we refer to D3.2 Legal and IPR Management Framework Specification
[2] The human readable license is available at https://creativecommons.org/licenses/by-nc-sa/4.0/
[3] For more detailed information about these requirements we refer to D3.2 Legal and IPR Management Framework Specification
A. How do I submit a data paper?
The submission guidelines for data papers on the OpenScienceLink platform can be found on the webpage of the Biomedical Data Journal:
http://biomed-data.eu/content/submission-guidance
Given the abundance of scientific journals and repositories, there is no single standard for submitting a scientific article, including a data paper. Researchers must themselves check the submission guidelines of the journal or repository to which they wish to submit the article.
B. How should I cite my data?
As a good practice you must always cite the data you use. This can also be an obligation depending on the license that is attached to the scientific work (e.g. Creative Commons licenses require proper attribution to be given, whereby the correct citation method is often indicated within the work).
If you use data from a repository that has been released under an open license you should include a reference to the data paper describing the data, followed by a reference to the data in the repository itself.
It is essential that the citations are in the references section of the article and include the Digital Object Identifiers (DOIs) or any other identifier the repository might use. Citation methods differ depending on your field of research; however, each citation must include basic elements that allow the dataset to be identified in the future, such as:
- Title
- Author
- Date
- Version
- Persistent Identifier, such as the Digital Object Identifier.
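The basic citation elements listed above can be combined into a reference string. The following Python sketch shows one possible layout; the class `DataCitation`, the ordering of the elements and the punctuation are assumptions made for illustration, and the DOI shown is a placeholder rather than a real identifier. The citation style required by a given journal or repository takes precedence.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for the basic elements of a data citation."""

    authors: tuple[str, ...]
    date: str          # publication date or year
    title: str
    version: str
    identifier: str    # persistent identifier, e.g. a DOI

    def format(self) -> str:
        """Render the elements in one illustrative order."""
        names = "; ".join(self.authors)
        return (
            f"{names} ({self.date}). {self.title}. "
            f"Version {self.version}. {self.identifier}"
        )


citation = DataCitation(
    authors=("Doe, J.", "Roe, R."),
    date="2015",
    title="Example clinical measurements",
    version="1.0",
    identifier="https://doi.org/10.1234/example",  # placeholder DOI
)
print(citation.format())
```

Keeping the elements as separate fields, rather than as one free-text string, makes it easier to re-render the same citation in the style of a different journal.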
C. What are the criteria for a repository to be accepted by OSL?
Data must be made available via a suitable repository which should meet the following criteria:
- Allow open licences
- Sustainable to ensure the long-term preservation of the data
- Suitable for the type of data involved
- Able to provide persistent identifiers (e.g. DOI, Handle, ARK etc.)
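To illustrate the last criterion, the following Python sketch recognises which persistent identifier scheme a given string appears to use. The regular expressions are deliberately simplified assumptions and do not implement the full DOI, Handle or ARK syntax; the function name `identifier_scheme` and the sample identifiers are hypothetical.

```python
import re
from typing import Optional

# Simplified patterns, for illustration only. The official identifier
# syntaxes are more permissive than these expressions.
_PATTERNS = {
    "doi": re.compile(
        r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.IGNORECASE
    ),
    "ark": re.compile(r"^ark:/?\d{5,}/\S+$", re.IGNORECASE),
    "handle": re.compile(
        r"^(?:hdl:|https?://hdl\.handle\.net/)[\w.]+/\S+$", re.IGNORECASE
    ),
}


def identifier_scheme(value: str) -> Optional[str]:
    """Return 'doi', 'ark' or 'handle', or None if no pattern matches."""
    candidate = value.strip()
    for scheme, pattern in _PATTERNS.items():
        if pattern.match(candidate):
            return scheme
    return None


print(identifier_scheme("https://doi.org/10.1234/example"))  # doi
print(identifier_scheme("ark:/12345/x6np1wh8k"))             # ark
print(identifier_scheme("hdl:20.500.12345/abc"))             # handle
print(identifier_scheme("dataset-42"))                       # None
```

A check of this kind only confirms that an identifier looks like a persistent identifier; whether it actually resolves, and whether the repository guarantees long-term preservation, has to be verified separately.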
D. Is there any specific format I need to follow to submit a dataset?
Within OpenScienceLink, the dataset format follows the OpenScienceLink data model and the file may be in any file format. Nevertheless, format requirements may differ depending on the chosen repository.
E. What Metadata Model does the OpenScienceLink platform use?
The OpenScienceLink metadata model will be based on, and extend, Unidata’s Common Data Model (CDM) and the Dryad Metadata Application Profile. The extension includes parameters such as dataset source type (real-world vs. synthetic), level of noise, popularity (e.g., number of views, number of downloads), associated research topics and citation source (i.e., which researchers have used this dataset), among others.
- Unidata’s Common Data Model (CDM)
The Common Data Model (CDM) is an abstract data model for scientific datasets. It merges the netCDF, OPeNDAP, and HDF5 data models to create a common API for many types of scientific data.
CDM has three layers, which build on top of each other to add successively richer semantics:
- The data access layer, also known as the syntactic layer, handles data reading and writing.
- The coordinate system layer identifies the coordinates of the data arrays. Coordinates are a completely general concept for scientific data; we also identify specialized georeferencing coordinate systems, which are important to the Earth Science community.
- The scientific feature type layer identifies specific types of data, such as grids, radial, and point data, adding specialized methods for each kind of data.
- Dryad Metadata Application Profile
The Dryad Metadata Application Profile is based on the Dublin Core Metadata Initiative Abstract Model, including the Singapore Framework, and is used to describe multi-disciplinary data underlying peer-reviewed scientific and medical literature. The application profile consists of three modules describing:
1. The publication, which is an article associated with content in Dryad.
2. The data package, which is a group of data files associated with a given publication.
3. The data file, which is deposited as a bitstream.
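The three-module structure, combined with the extension parameters mentioned for the OpenScienceLink model, can be pictured with a small set of Python dataclasses. This is only a sketch under stated assumptions: all class and field names below are hypothetical and were chosen to mirror the wording of this section; they are not the actual OpenScienceLink or Dryad schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class SourceType(Enum):
    REAL_WORLD = "real-world"
    SYNTHETIC = "synthetic"


@dataclass
class Publication:
    """Module 1: the article associated with the deposited content."""
    title: str
    identifier: str


@dataclass
class DataFile:
    """Module 3: a single deposited file."""
    filename: str
    size_bytes: int


@dataclass
class DataPackage:
    """Module 2: the group of data files tied to one publication."""
    publication: Publication
    files: list[DataFile] = field(default_factory=list)
    # Extension parameters named in the text (field names are assumptions).
    source_type: SourceType = SourceType.REAL_WORLD
    noise_level: float | None = None
    views: int = 0
    downloads: int = 0
    research_topics: list[str] = field(default_factory=list)
    cited_by: list[str] = field(default_factory=list)

    def total_size(self) -> int:
        """Sum of the sizes of all files in the package, in bytes."""
        return sum(f.size_bytes for f in self.files)


package = DataPackage(
    publication=Publication("Example data paper", "https://doi.org/10.1234/example"),
    files=[DataFile("measurements.csv", 100), DataFile("readme.txt", 250)],
    research_topics=["cardiology"],
)
print(package.total_size())  # 350
```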
F. From which locations will my dataset become available for download?
Once the dataset is uploaded, it is indexed and searchable by all Platform users, but it is not yet downloadable: downloading requires that the dataset is first reviewed and then published by the data journal.
Once the review is over and the dataset is accepted for publication, it enters a special status within the dataset pool, flagged as ‘ready to be published’, which enables the publisher to select it for any future volume or issue of the data journal. The candidate datasets for publishing comprise all of the peer-reviewed datasets. The rest of the uploaded datasets are either in the ‘pending to be assigned reviewers’ stage or in the ‘under review’ stage. If the reviewers request further changes or clarifications for a dataset, it remains in ‘under review’ status until all conditions posed by the reviewers are met. At every stage, the researchers who uploaded the dataset are informed of the process and can follow the status of their dataset’s review.
The publisher uses the Platform to publish the selected datasets in a predefined manner, according to particular specifications. The datasets and corresponding metadata can then be viewed and downloaded by the issue readers. Once a dataset is published, it is indexed with the appropriate issue/volume numbers and pages and moves to the ‘published’ stage. From this point on, the dataset is available to all Platform users for downloading and re-use in their research.
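The stages described in this answer can be read as a small state machine. The Python sketch below models only the four stages named in the text and the transitions between them; the names `Status`, `advance` and `is_downloadable` are hypothetical, and the sketch makes no claim about how the Platform implements its workflow (for example, rejection of a dataset is not described here and is therefore not modelled).

```python
from enum import Enum


class Status(Enum):
    PENDING_REVIEWERS = "pending to be assigned reviewers"
    UNDER_REVIEW = "under review"
    READY_TO_PUBLISH = "ready to be published"
    PUBLISHED = "published"


# Transitions implied by the text. A dataset may stay 'under review'
# while reviewer conditions are still open.
ALLOWED = {
    Status.PENDING_REVIEWERS: {Status.UNDER_REVIEW},
    Status.UNDER_REVIEW: {Status.UNDER_REVIEW, Status.READY_TO_PUBLISH},
    Status.READY_TO_PUBLISH: {Status.PUBLISHED},
    Status.PUBLISHED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the target status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"cannot move from {current.value!r} to {target.value!r}"
        )
    return target


def is_downloadable(status: Status) -> bool:
    """Uploaded datasets are searchable at once, but downloadable only when published."""
    return status is Status.PUBLISHED


status = Status.PENDING_REVIEWERS
status = advance(status, Status.UNDER_REVIEW)
status = advance(status, Status.READY_TO_PUBLISH)
print(is_downloadable(status))  # False
status = advance(status, Status.PUBLISHED)
print(is_downloadable(status))  # True
```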
G. How does the review process differ from an ordinary research article review? What do the reviewers look at when it comes to a dataset article?
Published datasets are subject to a novel review process. The publisher (or the responsible editor) logs in to the Platform to initiate the review process for submitted papers or datasets by creating a new review call. The first step is to identify the most appropriate reviewers for each paper or dataset, i.e. researchers whose research activity and output are closely related to the research topic of the paper or dataset and who could therefore serve as its reviewers. The Platform proposes a list of potential reviewers, who are retrieved from two different sources: (a) existing literature and published papers, and (b) scientific interests and the respective communities.
The next step for the publisher is to filter out from the proposed list of reviewers (for a specific research work) the scientists who are directly or indirectly related to the authors of the work under review. The Platform performs this evaluation and notifies the publisher where a conflict of interest is detected, e.g., through shared affiliations, if these are available. Next, the publisher or editor selects a number of the suggested reviewers to be invited to the review process. If a candidate reviewer is already registered with the Platform, the Platform notifies the user electronically that he or she has been selected as a candidate reviewer, and the user can then accept or decline the invitation. If the candidate reviewer is not a registered user of the Platform, the Platform sends a similar invitation with additional links, so that the user can create a user profile with the Platform in a simplified manner. The publisher or editor is notified each time an invited candidate reviewer accepts or declines an invitation.
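The conflict-of-interest step can be illustrated with a minimal Python sketch that separates candidate reviewers who share an affiliation with any author from those who do not. Shared affiliation is the only signal the text names explicitly; the `Person` class, the function `split_by_conflict` and the sample data are hypothetical, and the Platform's actual detection logic is not described in this document.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    affiliations: frozenset[str]


def split_by_conflict(
    authors: list[Person], candidates: list[Person]
) -> tuple[list[Person], list[Person]]:
    """Split candidates into (eligible, conflicted) by shared affiliation."""
    author_affiliations: set[str] = set()
    for author in authors:
        author_affiliations |= author.affiliations

    eligible: list[Person] = []
    conflicted: list[Person] = []
    for candidate in candidates:
        if candidate.affiliations & author_affiliations:
            conflicted.append(candidate)
        else:
            # Also reached when no affiliation data is available.
            eligible.append(candidate)
    return eligible, conflicted


authors = [Person("Author A", frozenset({"University X"}))]
candidates = [
    Person("Reviewer 1", frozenset({"University X"})),
    Person("Reviewer 2", frozenset({"University Y"})),
    Person("Reviewer 3", frozenset()),
]
eligible, conflicted = split_by_conflict(authors, candidates)
print([p.name for p in eligible])    # ['Reviewer 2', 'Reviewer 3']
print([p.name for p in conflicted])  # ['Reviewer 1']
```

Candidates without affiliation data pass the check in this sketch, which matches the caveat in the text that shared affiliations are used only "if these are available"; the final selection remains with the publisher or editor.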
Submitted research papers, data papers (along with their accompanying datasets) and papers on data-based modelling and models will be subjected to rigorous peer review. More information on the review process and the responsibilities of the parties involved can be found through the following links:
Ethical Responsibilities of Authors: http://www.biomed-data.eu/content/ethical-responsibilities-authors
Ethical Responsibilities of Reviewers: http://www.biomed-data.eu/content/ethical-responsibilities-reviewers
Making Scientific Research and Data Open Access – Intellectual Property Rights Guidelines
The following section aims to provide researchers with guidelines on how to handle copyright. The main goal is to make researchers aware of the responsibilities that arise from copyright. These responsibilities mainly arise when using other people’s material during the creation of a scientific work. Before uploading scientific material to the OpenScienceLink portal, researchers must take into account their responsibilities under the relevant national copyright law.
With regard to the re-use of material that was originally published on the OpenScienceLink portal, we refer to the guidelines concerning our licensing practices. Content uploaded to the OpenScienceLink platform is licensed to platform users under the Creative Commons CC BY-NC-SA 4.0 license. The two main consequences of this license for the researcher are:
- When uploading material via the OpenScienceLink portal, the author agrees that his material will be shared with others under the CC BY-NC-SA 4.0 license.
- Material originally published via the OpenScienceLink portal can be re-used under the conditions of the CC BY-NC-SA 4.0 license.
The use of scientific articles from external databases accessible via the portal is not covered by this license. For instance, GoPubMed enables access to articles from the Lancet and Open Access journals. In order to re-use these works, the licensing conditions of those journals, and of the authors who have written the articles therein, must be taken into account.
The following section not only informs the researcher about which material he can use, but also about his own rights. Whilst retaining his exclusive rights, the author agrees, when uploading material via the OpenScienceLink portal, to publish his work under the aforementioned Creative Commons license, thereby allowing others to use his material under the conditions of that license.
i. What is Copyright?
Copyright aims to protect the rights of authors by ensuring that they receive recognition, payment and protection for their works. A copyrighted work could include a production in the scientific domain, regardless of the mode or form of its expression. However, to enjoy copyright protection, works must meet the criteria for copyright protection: the work has to be “an original expression”, i.e. it must be the result of the free and creative choices of the author. In principle, scientific publications, including data papers, will enjoy copyright protection.
ii. What rights are granted to the holder of copyright?
Copyright grants the right holder several exclusive rights with regard to the actions that can be performed with the copyrighted work. As a researcher building upon other people’s work, the exclusive rights of the original author must be respected.
- The reproduction right: the exclusive right to authorize or prohibit direct or indirect, temporary or permanent reproduction by any means and in any form, in whole or in part. This right also includes the right to adapt.
- The communication to the public right: the exclusive right to authorise or prohibit any communication to the public of the copyrighted work, by wire or wireless means, including the making available to the public of this work in such a way that members of the public may access it from a place and at a time individually chosen by them.
Due to the exclusive nature of copyright, the reproduction and public communication of protected works requires the permission of the right holder.
Do Bare Facts – Research Data enjoy copyright protection?
Bare facts do not enjoy copyright protection: they belong to the domain of knowledge, which is a public good.[1] Consequently, biomedical research data per se do not enjoy copyright protection either.
Research data may still be copyright-protected if they have been expressed in a tangible form with a sufficient level of originality. Such protection, however, does not cover the data as such, but rather the work in which the data have been incorporated.[2]
Do Metadata enjoy copyright protection?
When metadata refer to descriptions of biomedical data using standardised keywords and terms, they are unlikely to be protected by copyright due to a lack of creative freedom in the choice, sequence and combination of the terms. However, copyright in metadata cannot be ruled out.
Nevertheless, the purpose of OpenScienceLink, like that of most other open access initiatives, and the basis for its selection of data and services is the creation of a repository of well-structured and semantically linked datasets. The role of metadata herein is crucial, and descriptions should be based on existing and established domain ontologies.[3] Indeed, any freedom in the description and definition of terms may reduce the discoverability of the data and should therefore be avoided.
Do datasets enjoy copyright protection?
Scientific datasets are unlikely to attract copyright when they are the result of research. This can be concluded from the fact that the level of freedom required for a researcher to express his creativity is fairly limited.
However, it is the researcher who ‘has to make sense of the data that have been collected by exploring and interpreting them’.[4] When a researcher has sufficient freedom and has made personal choices in how to present the results of the data collection activities, the resulting dataset will be protected by copyright.
Does a database enjoy copyright or any other protection?
A database may enjoy copyright protection when it is considered ‘original’, i.e. when it is the author’s own intellectual creation by reason of the selection or arrangement of the contents. A selection or arrangement that depends purely on technical factors, or that aims to achieve accuracy and exhaustiveness, will not be protected by copyright. Scientific databases, which consist of bare facts and therefore leave little room for creative freedom, are generally not considered to fall under copyright protection.[5]
The creator of a database may also enjoy a sui generis right for the investment made when creating the database (see section What is the Sui Generis Database Right?).
What are the exclusive rights of the original database’s creator?
The author of the database has the exclusive right to carry out or authorise:
- Temporary or permanent reproduction by any means, in any form, in whole or in part;
- Translation, adaptation, arrangement and any other alteration;
- Any form of distribution to the public of the database or of copies thereof (subject to Community exhaustion); and
- Any communication, display or performance to the public.
Scientific datasets and works originally provided to the OSL platform are licensed under the Creative Commons BY-NC-SA 4.0 license. This license allows the re-use of the works within the OSL repository under the conditions of the license (see section Making Scientific Research and Data Open Access – Licensing Guidelines).
Who owns copyright?
In general, this is the author or creator of the work. However, there are a few instances in which permission needs to be obtained from other parties. National legislation determines how this permission should be given and from whom it must be obtained.
1) Joint ownership
If a work has two authors, both authors hold joint copyright. When a research project involves multiple researchers or institutions, all of those researchers or institutions hold joint copyright.
If research material is derived from existing data enjoying copyright and the newly created work also enjoys copyright, there is joint copyright.
2) Works created in the course of employment
The vast majority of scholarly works are made in the course of employment with a research institution, an enterprise or a university. In some countries, copyright in works created during the course of employment will vest in the employer.[6] However, since this may differ, it is important to always consult national copyright regulations. Academic institutions and funding bodies may waive copyright in research materials and publications and assign ownership to the researchers. But the opposite might also be true: the rights of the researcher may be assigned to the research institution. It is therefore very important to check with your institution or funding body what applies to you.
It is important to check the copyright policy of your institution.
[1] Joris Deene, ‘The Legal Status of Research Data (Copyright, Database Right)’ (June 2015).
[2] Joris Deene, ‘The Legal Status of Research Data (Copyright, Database Right)’ (June 2015).
[3] OpenScienceLink consortium, D4.1 (2015).
[4] In quantitative research, data analysis often only occurs after all or much of the data have been collected. However, in qualitative research, data analysis often begins during, or immediately after, the first data are collected, although this process continues and is modified throughout the study.
[5] Lucie Guibault and Andreas Wiebe (eds.), ‘Safe to be open: Study on the protection of research data and recommendations for access and usage’.
[6] In some countries, e.g. Belgium, a transfer of rights in the form of an explicit agreement between an employer and employee may be necessary for the employer to obtain copyright.
iii. What is the Sui Generis Database Right?
The sui generis database right is an intellectual property right granted to the maker of a database when it can be demonstrated that there has been a substantial investment (qualitative and/or quantitative) in the obtaining, verification or presentation of the contents of the database.
The right was designed to safeguard the position of makers of databases against misappropriation of the results of their investments, financial resources, time, effort and energy. To enjoy this protection, this investment must have been substantial. The investment that has been made in researching and creating the data included within the database does not count towards this assessment (very often spin-off databases may therefore not be protected).
The sui generis right allows the database producer to prevent extraction and/or re-utilization of the whole or of a substantial part of the database. This means that the consent of the database owner is required if a researcher wishes to retrieve or re-use substantial portions of the database. The substantiality of extraction and re-utilization is evaluated qualitatively (e.g. how much effort went into creating the part of the database that has been extracted?) and/or quantitatively (e.g. how much content has been extracted from the database?). This analysis must be made on a case-by-case basis.
Retrieval of non-substantial parts from databases, e.g. some individual items, is allowed and does not require prior permission from the relevant right holder.
For more information on the sui generis database rights, we refer to the OSL Legal and IPR Management Framework Specification, found here: http://opensciencelink.eu/wp-content/uploads/2013/06/OSL_D3_2_LegalAndIPRManagementFrameworkSpecification.pdf
iv. Can I re-use other people’s material?
In principle, whether re-use requires the permission of the relevant right holder depends on the act involved:
- If the re-use involves creating a new physical fixation of the copyright-protected work, the right of reproduction prohibits this without authorisation.
- As long as the work is used as is and not translated or rewritten, there is no unlawful act of adaptation.
- Merely linking existing datasets, publications and associated datasets or sets of raw data does not constitute a prohibited act, as long as these data are not duplicated.
With regard to protected data, as a general rule of thumb, the following actions may still be performed without the original author’s permission[1]:
- Incorporation of factual data in an original scientific work.
- Making a copy of the research for research purposes only.
- Citing research data.
The author’s permission would still be required for other actions performed with copyright-protected data.
The OpenScienceLink platform provides services for researchers to publish and re-use publications and datasets. Tools for the re-use of data include the analysis of data for trend reports as well as the combination of different datasets for new research publications. In order to use a copyrighted work, permission from the author is required.
Content originally uploaded to the OSL platform is made available under a Creative Commons BY-NC-SA license (see section Making Scientific Research and Data Open Access – Licensing Guidelines). This enables the re-use of data under the conditions of that license. The use of scientific articles and datasets found in external databases accessible via the OSL portal is not covered by this license. For instance, GoPubMed enables access to articles from The Lancet and Open Access journals. In order to re-use these works, the licensing conditions of those journals, and of the authors who have written the articles therein, must be taken into account. Publisher copyright policies can be found via the SHERPA/RoMEO website.
Besides legal requirements, authors of scientific material also have ethical responsibilities. In this regard, the Biomedical Data Journal submission guidelines may provide assistance. For instance, authors should always give proper attribution to their sources.
A general checklist concerning the use of research data can be found at the following link:
https://www.fosteropenscience.eu/sites/default/files/pdf/1401.pdf
[1] Joris Deene, ‘The Legal Status of Research Data (Copyright, Database Right)’ (June 2015).