It has become increasingly clear that the reliable evidence needed for decision-making in health, as in other fields, must be based on all existing knowledge, and that re-analysis of the raw data from completed research is the way to obtain it.
The opening of research data is a relatively new field enabled by the growing possibilities of digital technology (1). The opening of clinical trial (CT) data is part of the broader move towards open science. Increased sharing and reuse of data are expected to lead to positive scientific, ethical, health, and economic outcomes, improving the quality of research, speeding up knowledge creation, and thus contributing to research integrity (2). By opening CT data, we could avoid unnecessary research, provide more reliable evidence for decision-making, reduce research waste, enable entrepreneurship, and facilitate innovation. This is a complex process, largely shaped by the culture, perceptions, and policies of numerous key players. The January 2016 proposal by the International Committee of Medical Journal Editors (ICMJE) to require the sharing of raw data from published CTs is an example of a recent initiative aiming to increase the reusability of CT data (3). Given the impact of the ICMJE's 2004 trial registration statement (4), it came as no surprise that the proposal attracted considerable interest.
To achieve broader sharing, and eventually the opening, of CT data, numerous barriers and obstacles must be overcome. Efforts to find a new balance in CT data sharing have intensified, and a number of data sharing initiatives led by various players currently exist. This process would benefit from observation and assessment of its dynamics, and we propose an observatory, or natural experiment, as the methodology of choice to inform and contribute to its progress.
It is widely accepted that the reliability of evidence for evidence-informed decision-making increases from in vitro to interventional studies, as illustrated by the well-known evidence pyramid in Figure 1.
The many research studies, which mostly build on each other from basic through observational to interventional research, produce findings that need to be critically verified. The critical appraisal of randomized controlled trials (RCTs), considered by many to be the gold standard, was first introduced through systematic reviews (5). The evolution of the literature review is important for understanding data sharing: it progressed from the simple literature review through critical literature reviews, overviews, systematic reviews, and Cochrane systematic reviews, all the way to Cochrane overviews of systematic reviews.
Although Cochrane systematic reviews represent a relatively small portion of all systematic reviews, Cochrane has contributed enormously to establishing a standardized methodology and has thus raised the quality of systematic reviews. Together with recognition of the benefit of a cumulative approach to assessing evidence, this has enabled systematic reviews to become a key element in informing decision-making. The interest in and importance of systematic reviews are illustrated by the existence of registries of systematic reviews, such as PROSPERO (6).
It is important to emphasize that the systematic review paradigm is changing as another level of critical appraisal and knowledge creation is added: the analysis of raw data, specifically of individual participant data (IPD). A meta-analysis of IPD may be included in a systematic review. In such IPD meta-analyses, the published literature is only one of many sources of information, often serving as a starting point for identifying studies in a given field. Additional studies and data are identified through other sources, such as trial registries and research data repositories (repositories). Such analysis goes beyond synthesizing the reported data: it reanalyzes the IPDs, whether published or not. As illustrated in Figure 1, the reanalysis of IPDs will speed up knowledge creation.
The IMProving Access to Clinical Trial data (IMPACT) Observatory is monitoring various current initiatives aimed at making data available for further research, in order to assess changes in the paradigm of the CT enterprise (7).
If data are to be re-used and re-analyzed, it is essential to know how to share them, where and in which format to deposit them, how to find them, and how to access them for any type of re-use. While some key players engage in sharing and re-analyzing CT data, others play a catalytic role in triggering the process. The loss of data from any study, however small, is of major concern, especially with regard to research integrity and research waste. The first and most critical step of data sharing is preserving the data at the source. The overall objective of the IMPACT Observatory is to identify the impact on CTs of data sharing interventions and practices by key players (funders, regulators, journal editors, pharmaceutical industry, researchers, institutions, and consumers), to identify barriers and facilitators, to inform the process, and to indicate trends and potential solutions. Once established, the IMPACT Observatory would function as a two-way street: a) it would collect, assess, analyze, and host the information gathered and shared by the IMPACT Observatory network regarding changes in data sharing policies, practices, and standards; and b) it would make this information available to those who aim to change policies and practices or to develop new standards.
The objective of this paper is to present the IMPACT Observatory as a tool for assessing changes in CT data sharing, and to present some of its preliminary findings on the dynamics of CT data sharing and on the ways in which data are shared for reanalysis. We also propose potential mechanisms that could enable the useful opening of CT data.
Materials and methods
The IMPACT Observatory is an international study, hosted by the Department for Research in Biomedicine and Health of the University of Split School of Medicine. It started in October 2014, evolving from the IMPACT Initiative (7). It was approved by the Ethics Committee of the University of Split, School of Medicine. Observatories or natural experiments are epidemiological studies that assess the impact of one or more interventions that are not controlled by the observatory researcher(s), in order to inform the process and indicate trends (8, 9). While an observatory might follow the impact of interventions by one player, the IMPACT Observatory is assessing the impact of data sharing interventions by multiple players.
In our study we use the term “data sharing” in its broad sense, which includes both the sharing and the reuse of data. The term “data” is also used in a broad sense, denoting the cleaned, anonymized IPDs along with all other documentation generated during the lifecycle of a clinical trial that is needed to reuse the data. This includes published and unpublished documents, such as trial protocols, data management and statistical analysis plans, informed consent forms and patient information sheets, regulatory and ethical documents, and clinical study reports.
We started setting up the IMPACT Observatory by building a network and choosing the methodology. A unique methodological aspect of the IMPACT Observatory is the development of a multipurpose network with a flexible interface between the network and the team, enabling people to move from one to the other according to their interest and level of engagement. As presented in Table 1, we used a combination of quantitative and qualitative research methods to assess the culture, perceptions, and practices of key players regarding CT data sharing and its transition. These include a scoping study (i.e. mapping of the existing evidence), surveys, interviews, and an environmental scan of research data repositories that host CT data (9).
The scoping study included an Internet and literature search. For the latter, we searched Medline, selected the literature that met our criteria, and extracted pre-defined information into an Excel file for analysis. Surveys were used to gather quantitative and qualitative information from key players; the questionnaires contain questions about participants' practices and perceptions with regard to data sharing and reuse. So far, we have conducted web-based surveys of journal editors and of clinician trialists using SurveyMonkey (SurveyMonkey Inc., Palo Alto, USA), the results of which are currently being analyzed.
Semi-structured in-depth interviews were conducted with a convenience sample of key players. Once the players agreed to be interviewed, a short pre-interview questionnaire was sent to them to gather quantitative information (e.g. “Did you perform a trial?”, “How many?”, “Did you register the trial?”, “What did you do with the data?”) and to help structure the interview questions. Environmental scans of repositories that host clinical trial data are performed by identifying relevant repositories on the Internet, especially by visiting registries of repositories, then extracting the pre-defined information from registry and repository websites into an Excel file, and complementing this information through communication with repository managers.
Results regarding establishing the IMPACT Observatory
The IMPACT Observatory officially started in October 2014 as an international study of the IMPACT Initiative. We incorporated and continued the environmental scan of repositories, which has been performed by one of the authors since 2012 (7).
Having defined CTs as our area of research, we started building the network and established a core team. We identified the key players that influence CT data sharing: journal editors, publishers, clinicians, trialists/CT researchers, academia, funders, regulators, industry, consumers, the media, and repositories. We then chose the methodology and started implementing it. During this one-and-a-half-year period, we presented the IMPACT Observatory at several conferences to inform the scientific community and receive its feedback. As of summer 2016, the scoping study, the analyses of the two completed surveys and of the interviews, and the data collection and analysis for the environmental scan are still ongoing. Here we present some of our preliminary findings.
Preliminary findings of the IMPACT Observatory
Scoping study findings
In our scoping study, we set the baseline at the year 2000, by which time the basic prerequisites, i.e. the foundations for data sharing, were in place. These included: an understanding of the need for greater transparency in clinical trials and for the sharing of raw data, the call for and establishment of the first CT registries, a defined basic methodology for systematic reviews, the launch of the Cochrane Collaboration (called simply Cochrane since 2015), and the existence of IPD meta-analyses (10-12).
In the period following the year 2000, the opening of CT data grew more rapidly. The major trigger came in 2004 with the historic lawsuit brought by New York State against GlaxoSmithKline (GSK), followed by the ICMJE and Ottawa statements, which led to the development of international standards for trial registration by the World Health Organization (WHO) (4, 13-15). It should be emphasized that the ICMJE, joined by other journal editors, has repeatedly played an essential role in advancing CT data sharing.
Barriers and gaps
As shown in Figure 2, numerous barriers and gaps prevent the opening of data and the related transition of CTs. Between 2013, when this schematic was created while planning the IMPACT Observatory, and 2016, most of these barriers diminished or were even overcome through the interventions of various players, one or several at a time. For example, the fear that data sharing would count as prior publication has been addressed by journal editors, while citability is addressed by assigning a digital object identifier (DOI) or another unique identifier (3). Concerns over the privacy of research participants persist, although they reflect a perception rather than a real barrier, since effective methodologies for anonymization have been developed (16).
The concept of intellectual property (IP) remains a barrier, albeit a shrinking one. Moreover, even in this field burdened with IP concerns and protections, culture and perceptions are changing, and various mechanisms of data sharing for reuse are being developed, some of which are presented in this paper. Finally, the lack of international standards for data sharing and for research data repositories remains a major barrier that needs to be addressed.
Data sharing initiatives
Various key players are taking initiatives, holding discussions, producing statements, and declaring policies regarding research data sharing (17–19). Many have contributed substantially to its growth, including the ICMJE, the Ottawa Statement, WHO, Cochrane, the Declaration of Helsinki, the REWARD (REduce research Waste And Reward Diligence) Campaign, the Institute of Medicine (IOM) report, and the AllTrials initiative (14, 20-25). Regulators also play an important role, as can be seen from the European Medicines Agency (EMA) 2014 policy on data sharing and its subsequent actions to share clinical study reports (CSRs) (26, 27).
While the citability of data has been solved by assigning persistent identifiers, finding available data is still a challenge. The ongoing bioCADDIE project of the NIH Big Data to Knowledge (BD2K) initiative aims to develop an index of all available research data, similar to what PubMed did for the literature (28).
Other initiatives also aim to address this issue, such as the EU-funded project CORBEL (Coordinated Research Infrastructures Building Enduring Life-Science Services, www.corbel-project.eu). CORBEL plans to establish a collaborative platform for harmonized user access to the biological and medical research technologies, biological samples, and data services required by cutting-edge biomedical research. ECRIN (the European Clinical Research Infrastructure Network) leads the CORBEL task groups focused on developing procedures and tools to give the scientific community access, upon request, to IPDs from previous clinical trials for various forms of re-use. Another example is OpenTrials, which aims to locate, match, and share all publicly accessible data and documents on all trials conducted, on all medicines and other treatments, globally (29). There is also “Vivli”, a recently launched data-sharing platform aimed at sharing clinical trial data, developing and promoting standards, and improving trial discoverability. It is to be built by linking existing data-sharing platforms and communities, and plans to include trials funded and conducted by academia, government, industry, and nongovernmental organizations (30). The main added value of “Vivli” will be its contribution to the creation of standards that would enable the re-analysis of CT data across different platforms, and its inclusion of all relevant players in the process.
Ongoing data sharing
Currently there are numerous mechanisms of data sharing. We have identified multiple models of data sharing that vary according to the type of access (from “upon request” to “open”), the data producer and user (trialist, systematic reviewer, academia, pharmacist), the key interested player (from researcher to regulator), and the CT area (any CTs; disease-specific, e.g. malignant melanoma; groups of diseases, e.g. cancer, mental health; population-specific).
We identified several “upon request” styles of sharing and reusing CT data:
Researcher-to-researcher requests, as in most Cochrane IPD meta-analyses. Direct researcher-to-researcher sharing often includes an offer to the original data producer to become a co-author of the systematic review.
Direct requests to pharmaceutical companies, which often have strings attached and are conditional on an agreement that usually includes confidentiality, secrecy, and non-sharing clauses (32).
Requests via an intermediary that organizes the processing of the application, while data owners still control the access to data.
These “upon request” data sharing styles include sharing of raw data and/or of comprehensive reports such as CSRs and other information. They are increasingly organized as projects or initiatives, all of which have some form of registration, application/request, and approval process, followed by a signed agreement. Three data sharing projects facilitated by intermediaries are described here. The YODA (Yale University Open Data Access) project is a partnership between Yale University and three companies (33). The Clinical Study Data Request (CSDR) is a project in which 13 companies agreed to share their anonymized participant-level CT data (34); the Wellcome Trust has been coordinating its application process, including the peer review of data request applications. Project Data Sphere is an independent, not-for-profit initiative of the CEO Roundtable on Cancer Life Sciences Consortium, formed in 2011 by more than 30 pharmaceutical companies and other organizations; it aims to share data from historical academic and industry phase III cancer CTs (35). Figure 3 illustrates the partnerships of pharmaceutical companies in these projects. Notably, several companies share their data through more than one project, and Johnson & Johnson has made its data available through all three.
Open data and research data repositories
Open science needs open data. It is increasingly understood that the first step in data sharing and reuse is the systematic preservation of data at the source. Unfortunately, this is still not the case, as shown by the preliminary results of our survey of corresponding authors of trials published in 2013: fewer than 50% of respondents had saved their trial data in an organizational database, and more than 50% had kept the data on their local computers. However, we have noticed a recent trend towards academic institutions creating repositories to preserve the data generated by their researchers. Ideally, such institutional repositories would forward data to broader national or international repositories such as Figshare and Dryad (36, 37), and would form a federation of repositories to enable interoperability and reuse of data.
Research data repositories (repositories) are electronic databases that host research data and facilitate their re-use. The following types of repositories are relevant for this discussion:
Repositories can be expected to play an essential role in increasing the accessibility and reusability of research data (2). With due respect to other, more or less organized, ways of data sharing, repositories most likely represent the best way forward if we wish to achieve open science. They could enable broad, free re-use of hosted data that are citable (through persistent identifiers), while meeting the need for acknowledgment through Creative Commons (CC) attribution. The number of existing repositories is growing exponentially, and their features are constantly improving, as can be seen by visiting registries of repositories. The Re3data registry of repositories started in 2012 and currently (July 2016) contains basic information on more than 1500 repositories. In our environmental scan, we regularly search Re3data for new repositories that might host CT data, and our analysis of relevant repositories starts with Re3data. Pharmaceutical company-based repositories have existed for some time, with various levels of access; some have joined larger data sharing networks such as the Wellcome Trust-coordinated CSDR.
The IMPACT Observatory has focused specifically on publicly accessible repositories of raw data and their relevant features. In our environmental scan, we found heterogeneity in the way data are hosted and variable, often questionable, levels of curation. Repositories exist at various levels, from individual institutional repositories to international ones. They can also be classified by accessibility, from closed to fully open access; by whether the host institution is an academic or research institution or a pharmaceutical company; and by data type (whether they accept any research data or only data from a specific field). It is important to note that there is no domain repository dedicated exclusively to CT data.
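The classification dimensions described above lend themselves to one structured record per repository, which makes filtering and tallying straightforward. The following is a minimal Python sketch under stated assumptions, not the Observatory's actual extraction sheet: the field names, category values, helper functions, and repository names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Organizational level of a repository (institutional up to international)."""
    INSTITUTIONAL = "institutional"
    NATIONAL = "national"
    INTERNATIONAL = "international"


class Access(Enum):
    """Accessibility, from closed to fully open."""
    CLOSED = "closed"
    UPON_REQUEST = "upon request"
    OPEN = "open"


class Host(Enum):
    """Type of host institution."""
    ACADEMIC_OR_RESEARCH = "academic/research institution"
    INDUSTRY = "pharmaceutical industry"


@dataclass(frozen=True)
class RepositoryRecord:
    """One row of a hypothetical environmental-scan spreadsheet."""
    name: str
    level: Level
    access: Access
    host: Host
    data_scope: str      # "general" (any research data) or a specific field
    hosts_ct_data: bool  # whether CT data were found in the repository


def open_ct_repositories(records):
    """Return fully open repositories that host CT data."""
    return [r for r in records if r.access is Access.OPEN and r.hosts_ct_data]


def ct_domain_repositories(records):
    """Return repositories restricted to clinical trial data only."""
    return [r for r in records if r.data_scope == "clinical trials"]


# Illustrative, made-up records (not findings of the scan).
scan = [
    RepositoryRecord("Repository A", Level.INTERNATIONAL, Access.OPEN,
                     Host.ACADEMIC_OR_RESEARCH, "general", True),
    RepositoryRecord("Repository B", Level.INSTITUTIONAL, Access.OPEN,
                     Host.ACADEMIC_OR_RESEARCH, "general", False),
    RepositoryRecord("Repository C", Level.INTERNATIONAL, Access.UPON_REQUEST,
                     Host.INDUSTRY, "general", True),
]

print([r.name for r in open_ct_repositories(scan)])  # ['Repository A']
print(len(ct_domain_repositories(scan)))              # 0
```

Recording each dimension as a fixed category rather than free text is one way to keep entries comparable across repeated scans. For these made-up records, `ct_domain_repositories` returns an empty list; it is the kind of query that could check the observation above against real scan data.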
The ultimate goal of opening CT data is to enable secondary analysis, which would reduce research waste, speed up knowledge creation (i.e. increase the quality and efficiency of trials), increase the reliability of evidence, and thus contribute to research integrity. All of these outcomes are interconnected: regardless of which is affected first, all will be impacted. Although CT data have not yet reached the open data stage, the evolution of CT data sharing is encouraging. Existing data sharing modalities complement each other and can inform the further transition, which relies heavily on collaboration with the “producers” of the original data.
The learning curve is steep, and the rich experience gained from the various ongoing forms of data sharing could inform the development of methods and international standards. The IMPACT Observatory aims to contribute to this process by assessing its dynamics and connecting the dots.
Data sharing starts with good data management, which includes cleaning, preservation, and curation of data at the source (preferably at the institutional level), anonymization, and posting. There is a trend towards creating more repositories with constantly improving features, but there is no domain repository for clinical trials. Furthermore, we could not find any re-analysis of data across repositories; we believe such re-analysis would help define the methodology and data sharing standards. There is much discussion about such standards and what they should include, but usable, internationally accepted standards do not yet exist. It is not up to repositories to develop them, but rather to the research enterprise. Based on what we have learned so far, such standards could build on the accumulated expertise and be developed by an interdisciplinary and intersectoral group. The standards development process could be coordinated by WHO, which coordinated the development of the trial registration standards, or by an international consortium formed specifically for this purpose that would include all interested players.
Currently, the lack of data sharing standards is the most important gap preventing the transition to a new level of CTs. However, we can start developing such standards, as we have accumulated an impressive amount of information and expertise. Moreover, certain necessary elements have been defined, such as citability with persistent identifiers (PIDs; DOIs and others) and the increasingly used CC attribution, while others are being developed. Numerous initiatives are also contributing to this process, such as the Declaration of Helsinki, the IOM, the AllTrials initiative, bioCADDIE/BD2K, CORBEL/ECRIN, and the “Vivli” project. We can also build on the expertise of the Research Data Alliance and on the coordinating capacity and authority of WHO.