IMPACT Observatory: tracking the evolution of clinical trial data sharing and research integrity

Introduction: The opening of research data is emerging thanks to the increasing possibilities of digital technology. The opening of clinical trial (CT) data is a part of this process, expected to have positive scientific, ethical, health, and economic impacts, thus contributing to research integrity. The January 2016 proposal by the International Committee of Medical Journal Editors triggered ample discussion about CT data sharing and reconfirmed the need for an ongoing assessment of its dynamics. The IMProving Access to Clinical Trials data (IMPACT) Observatory aims to play such a role: to assess the data sharing culture, policies, and practices of key players and the impact of their interventions on CTs, and to contribute to a transformation of research. The objective of this paper is to present the IMPACT Observatory as well as share some of its preliminary findings. Materials and methods: Methods include a scoping study of research, surveys, interviews, and an environmental scan of research data repositories. Results: Our preliminary findings indicate that although opening of CT data has not yet been achieved, its evolution is encouraging. Initiatives by key players contribute to increasing CT data sharing, and many barriers are shrinking or disappearing. Conclusions: The major barrier is the lack of data sharing standards, from preparing data for public sharing to its curatorship, findability and access. However, experiences accumulated by sharing CT data according to “upon request” or “open” mechanisms could inform the development of such standards. The Vivli, CORBEL-ECRIN and Open Trials projects are currently working in this direction.


Introduction
It has become increasingly clear that the reliable evidence needed for decision making in health as in other fields has to be based on all existing knowledge and that re-analysis of raw data of research performed is the way to get it.
Krleža-Jerić K. et al. IMPACT Observatory: data sharing and research integrity

The opening of research data is a relatively new field enabled by increasing possibilities of digital technology. The opening of clinical trial (CT) data is a part of this process and is expected to have positive scientific, ethical, health, and economic impacts, thus contributing to research integrity (2). Through the opening of CT data, we could avoid unnecessary research, provide more reliable evidence for decision-making, reduce research waste, enable entrepreneurship, and facilitate innovation. This is a complex process largely influenced by the culture, perceptions, and policies of numerous key players. The January 2016 proposal by the International Committee of Medical Journal Editors (ICMJE) to request the sharing of raw data from published CTs is an example of a recent initiative aiming to increase the reusability of CT data (3). It came as no surprise that it triggered great interest, considering the impact of the ICMJE's 2004 trial registration statement (4).
In order to achieve broader sharing and eventually opening of CT data, numerous barriers and obstacles must be overcome. The effort to find a new balance of CT data sharing has intensified, and there currently exist a number of data sharing initiatives led by various players. This process would benefit from the observation and assessment of its dynamics and an observatory or natural experiment is proposed as a methodology of choice to inform and contribute to its progress.
It is widely accepted that the level of reliability of evidence needed for evidence-informed decision-making increases from in vitro to interventional studies, as is illustrated by the well-known evidence pyramid presented in Figure 1.
The plethora of research studies that mostly build on each other from basic via observational to interventional studies produce findings that need to be critically verified. The critical appraisal of randomized controlled trials (RCT), which are considered by many to be the gold standard, was first introduced through systematic reviews (5). The evolution of literature review is important for understanding data sharing. It evolved from the simple literature review all the way to the Cochrane overview of systematic reviews, with critical reviews of literature, overviews, systematic reviews and Cochrane systematic reviews in between.
Although Cochrane systematic reviews represent a relatively small portion of all systematic reviews, Cochrane contributed enormously to establishing standardized methodology and thus increased their quality. Along with the recognition of the benefit of a cumulative approach to the assessment of evidence, this enabled systematic reviews to become a key element to inform decision making. The interest in and importance of systematic reviews can be illustrated by the existence of registries of systematic reviews, such as PROSPERO (6).
It is important to emphasize that the systematic review paradigm has been changing as we are adding another level of critical appraisal and knowledge creation, the analysis of raw data, actually of the individual participant data (IPD). Meta-analysis of IPD may be included in a systematic review. In such IPD meta-analyses, published literature is used as one of many sources of information, often as a starting point for identifying studies in a given field. Additional studies and data are identified through other sources such as trial registries and research data repositories (repositories). Such analysis goes beyond synthesizing the reported data: it reanalyzes the IPDs, published or not. As illustrated in Figure 1, the reanalysis of IPDs will speed up knowledge creation.

Figure 1. The hierarchy of evidence and the role of the individual participant data (IPD) meta-analysis in knowledge creation. The reliability of evidence needed for evidence-informed decision making in health increases as we move up the pyramid. It is expected that IPD meta-analysis would speed up knowledge creation.

The IMProving Access to Clinical Trial data (IMPACT) Observatory is monitoring various current initiatives aimed at making data available for further research, in order to assess changes in the paradigm of the CT enterprise (7).
If data are to be re-used and re-analysed, it is essential to know how to share them, where one can deposit them and in which format, how to find them, and how one can access data for any type of re-use. While some key players engage in sharing and re-analyzing CT data, others play the catalyst-type role of triggering the process. The loss of data from studies, no matter how small, is of major concern, especially with regard to research integrity and research waste. The first and most critical step of data sharing is their preservation at the source. The overall objective of the IMPACT Observatory is to identify the impact on CTs of data sharing interventions and practices by key players (funders, regulators, journal editors, pharmaceutical industry, researchers, institutions, and consumers), to identify barriers and facilitators, to inform the process, and to indicate trends and potential solutions. Once established, the IMPACT Observatory would function as a two-way street: a) it would collect, assess, analyze and host the information gathered and shared by the IMPACT Observatory network regarding changes in data sharing policies, practice and standards; and b) it would make the information available to those that aim to make changes to policies and practices or to develop new standards.
The objective of this paper is to present the IMPACT Observatory as a tool to assess changes in CT data sharing, as well as to present some of its preliminary findings regarding the dynamics of CT data sharing and ways data are shared in order to be reanalyzed. We also propose potential mechanisms that could enable the useful opening of CT data.

Materials and methods
The IMPACT Observatory is an international study, hosted by the Department for Research in Biomedicine and Health of the University of Split School of Medicine. It started in October 2014, evolving from the IMPACT Initiative (7). It was approved by the Ethics Committee of the University of Split, School of Medicine. Observatories or natural experiments are epidemiological studies that assess the impact of one or more interventions that are not controlled by the observatory researcher(s), in order to inform the process and indicate trends (8,9). While an observatory might follow the impact of interventions by one player, the IMPACT Observatory is assessing the impact of data sharing interventions by multiple players.
In our study we use the term "data sharing" in its broad sense, which includes the sharing and reuse of data. The term "data" is also used in a broad sense, denoting the cleaned, anonymized IPDs along with all other documentation generated during the lifecycle of a clinical trial that is needed to reuse data. This includes published and unpublished documents, such as trial protocols, data management and statistical plans, informed consent and patient information sheets, regulatory and ethical documents, and clinical study reports.
We started setting up the IMPACT Observatory by building a network and choosing the methodology. A unique methodological aspect of the IMPACT Observatory is the development of a multipurpose network with a flexible interface between the network and the team, enabling people to move from one to the other according to their interest and level of engagement. As presented in Table 1, we used a combination of quantitative and qualitative research methods to assess the culture, perceptions and practice of key players regarding CT data sharing and its transition. This includes a scoping study, i.e. mapping of the existing evidence, surveys, interviews, and an environmental scan of research data repositories that host CT data (9).
The scoping study included an Internet and literature search. For the latter, we performed a search in Medline, selected the literature that met our criteria, and extracted pre-defined information into an Excel file for analysis. Surveys were used to gather quantitative and qualitative information from key players. The questionnaire contains questions about the practice and perceptions of the participants with regard to data sharing and re-use. So far, we have performed a web-based survey of journal editors and clinician trialists using SurveyMonkey (SurveyMonkey Inc., Palo Alto, USA), the results of which are currently being analyzed.
Semi-structured in-depth interviews were performed with a convenience sample of key players. Once the players agreed to be interviewed, a short pre-interview questionnaire was sent to them in order to gather quantitative information (e.g. "Did you perform a trial?"; "How many?"; "Did you register the trial?"; "What did you do with the data?") and help structure the interview questions. Environmental scans of repositories that host clinical trial data are performed by identifying relevant repositories on the internet, especially through visiting registries of repositories, then extracting the pre-defined information from registry and repository websites into an Excel file, and complementing the information by communicating with the repository managers.

Table 1 (continued)
Inform and indicate trends | Knowledge translation through publications, conferences, website | Inform so that key players can use the IMPACT Observatory in their policy making, in development of data sharing methods and standards, and to contribute to the sustainability of the Observatory
Build sustainability of the IMPACT Observatory | Various forms of promotion of the IMPACT Observatory; applications for sponsoring and funding | IMPACT Observatory is established as a long-term tool to inform the process of data sharing and its impact on clinical trials
*These tasks are anticipated in case the IMPACT Observatory continues beyond the initial fellowship.

Results regarding establishing the IMPACT Observatory
The IMPACT Observatory officially started in October 2014 as an international study of the IMPACT Initiative. We incorporated and continued the environmental scan of repositories, which has been performed by one of the authors since 2012 (7).
Having defined CTs as our area of research, we started building the network and established a core team. We identified key players that influence CT data sharing; these are journal editors, publishers, clinicians, trialists/CT researchers, academia, funders, regulators, industry, consumers, the media, and repositories. Furthermore, we chose the methodology and started implementing it. During this one-and-a-half-year period, we presented the IMPACT Observatory at several conferences to inform the scientific community and receive their feedback. As of summer 2016, the scoping study and the analyses of two completed surveys and of the interviews are still ongoing, as are data collection and analysis for the environmental scan. Here we present some of our preliminary findings.

Preliminary findings of the IMPACT Observatory
Scoping study findings
In our scoping study, the baseline was set to 2000 when the basic prerequisites, i.e. foundations for data sharing, were present. These included: the understanding of the need for higher transparency in clinical trials and for the sharing of raw data, the call for and establishment of initial CT registries, a defined basic methodology for systematic reviews, the launch of the Cochrane Collaboration (since 2015 called Cochrane), and the existence of IPD meta-analyses (10-12).
In the period following the year 2000, the opening of CT data experienced more rapid growth. The major trigger took place in 2004 with the historic New York City against the GlaxoSmithKline Pharmaceutical Company (GSK) trial, followed by the ICMJE and the Ottawa Statements, which led to the development of international standards for trial registration by the World Health Organization (WHO) (4,13-15). It should be emphasized that ICMJE, joined by other journal editors, has repeatedly played an essential role in advancing CT data sharing.

Barriers and gaps
As shown in Figure 2, numerous barriers and gaps prevent the opening of data and the related transition of CTs. Between 2013, when this schematic was created during the process of planning the IMPACT Observatory, and 2016, most of these barriers have diminished or have even been overcome due to the interventions of various players, one or several at a time. For example, the fear of prepublication is dealt with by journal editors, while citability is addressed by assigning a digital object identifier (DOI) or some other unique identifier (3). Concerns over the privacy of research participants are still present, although this is a perception rather than a real barrier since effective methodologies for anonymization have been developed (16).
The concept of intellectual property (IP) remains a barrier, but this barrier is shrinking. Furthermore, even in this field burdened with IP concerns and protections, culture and perceptions are changing and various mechanisms of data sharing for reuse are being developed, some of which are presented in this paper. Finally, the lack of international standards for data sharing and for research data repositories remains a major barrier that needs to be addressed.

Data sharing initiatives
Various key players are taking initiatives, holding discussions, producing statements and declaring policies regarding research data sharing (17-19) [...]

Figure 2. The barriers are diminishing due to initiatives by key players. The lighter part of each barrier illustrates the tendency of shrinking or even being overcome. * The Culture barrier includes a balance of opportunities vs fear; lack of appreciation of the research opportunities that data sharing provides; fear of the human and financial resources needed; lack of recognition of sharing as a good practice; lack of incentives for academics. † The Data barrier includes the issues of data accuracy and quality, and the lack of standards for preparing data for sharing. ‡ Repositories as a barrier: lack of a domain repository and the lack of standards for data sharing via repositories (upload, hosting, maintenance, access).

[...] (29). There is also "Vivli", a recently launched data-sharing platform aimed at sharing clinical trial data, developing and promoting standards, and improving trial discoverability. It will be designed by linking existing data-sharing platforms and communities and plans to include trials funded and conducted by academia, government, industry, and nongovernmental organizations (30). The main added value of "Vivli" will be its contribution to the creation of standards that would enable the re-analysis of CT data across different platforms and the inclusion of all relevant players in the process.

Ongoing data sharing
Currently there are numerous mechanisms of data sharing. We have identified multiple formulas of data sharing that vary according to the type of access (from "upon request" to "open"), the data producer and user (trialist, systematic reviewer, academia, pharmacist), the key interested player (from researcher to regulator), the CT area (any CTs; disease specific, e.g. malignant melanoma; groups of disease, e.g. cancer, mental health; population specific).
We identified several "upon request" styles of sharing and re-using of CT data:
• Researcher-to-researcher requests, such as most Cochrane IPD meta-analyses. Direct, researcher-to-researcher sharing often includes an offer to the initial data producer to become co-author of the systematic review.
• Researcher-to-regulator (EMA, FDA) requests, including requests for clinical study reports (CSRs), which contain rich information including aggregate data but usually not the IPDs (31,32).
• Direct requests to the pharmaceutical industry, which often have strings attached and are conditioned by an agreement that usually includes confidentiality, secrecy, and non-sharing (32).
• Requests via an intermediary that organizes the processing of the application, while data owners still control the access to data.

These "upon request" data sharing styles include sharing of raw data and/or sharing of comprehensive reports such as CSRs and other information. They are increasingly performed in an organized way as a project or initiative. They all have some form of registration, application/request, and approval process followed by a signed agreement. Three data sharing projects facilitated by intermediaries are described below. The YODA project is a partnership between Yale University and three companies (33). The Clinical Study Data Request (CSDR) is a project in which 13 companies agreed to share their participant-level anonymized CT data (34). The Wellcome Trust has been coordinating the application process, including the peer review of data request applications. Project Data Sphere is focused on data from phase III cancer trials. It is an independent, not-for-profit initiative of the CEO Roundtable on Cancer Life Sciences Consortium formed in 2011 by more than 30 pharmaceutical companies and other organizations. It aims to share data from historic academic and industry phase III cancer CTs (35). Figure 3 illustrates the partnerships of pharmaceutical companies in these projects.
It is interesting to note that several companies share their data through more than one project, while Johnson & Johnson made their data available through all three projects.
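The common sequence described above for "upon request" projects (registration, application/request, approval, signed agreement) can be sketched as a simple ordered workflow. This is only an illustration: the names `RequestStage` and `next_stage` are hypothetical and do not correspond to any system used by YODA, CSDR or Project Data Sphere.

```python
from enum import Enum
from typing import Optional


class RequestStage(Enum):
    """Stages shared by the 'upon request' mechanisms, in order (illustrative names)."""
    REGISTRATION = 1
    APPLICATION = 2
    APPROVAL = 3
    SIGNED_AGREEMENT = 4


def next_stage(stage: RequestStage) -> Optional[RequestStage]:
    """Return the stage that follows `stage`, or None after the signed agreement."""
    stages = list(RequestStage)  # Enum members iterate in definition order
    position = stages.index(stage)
    if position + 1 < len(stages):
        return stages[position + 1]
    return None


# Walk a hypothetical data request through every stage.
stage: Optional[RequestStage] = RequestStage.REGISTRATION
visited = []
while stage is not None:
    visited.append(stage.name)
    stage = next_stage(stage)
print(" -> ".join(visited))
```

Individual projects may add steps, such as peer review of the application, between these stages.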

Open data and research data repositories
Open science needs open data. It has been increasingly understood that the first step in data sharing and reuse is the preservation of data at the source and that it must be done in a systematic way. Unfortunately, this is still not the case, as can be seen in the preliminary results of our survey conducted with corresponding authors of trials published in 2013. Fewer than 50% of those that responded had saved their trial data in the organizational database and more than 50% had kept the data on their local computers. However, we noticed a recent trend towards the creation of repositories by academia in order to preserve data generated by their researchers. Ideally, such institutional repositories would forward data to broader national or international repositories such as Figshare and Dryad (36,37). They would also create a federation of repositories to enable interoperability and reuse of data.
Research data repositories (repositories) are electronic databases that host research data and facilitate their re-use. The following types of repositories are relevant for this discussion: 1. CT registries that host essential elements of CT protocols, some of them including summary results. They can be accessed via the WHO portal; 2. registries of systematic reviews (such as Cochrane and PROSPERO); 3. repositories that host CT data, and 4. registries of repositories such as Re3data (39).
It can be expected that repositories will play an essential role in increasing the accessibility and reusability of research data (2). With due respect to other more or less organized ways of data sharing, repositories most likely represent the best way forward if we wish to achieve open science. They could enable the broad, free re-use of hosted data that is citable (persistent identifier) while at the same time meeting the need for acknowledgment by applying the Creative Commons (CC) citation. The number of existing repositories is growing exponentially and their features are constantly improving, as can be seen by visiting registries of repositories. The Re3data registry of repositories started in 2012 and currently (July 2016) contains basic information about more than 1500 repositories. In our environmental scan of repositories, we perform regular searches of Re3data looking for possible new repositories that might host CT data, and our analysis of relevant repositories starts with Re3data. Pharmaceutical company-based repositories have existed for a while with various levels of access. Some joined larger networks of data sharing such as the Wellcome Trust coordinated CSDR.
The IMPACT Observatory has been specifically focusing on publicly accessible repositories of raw data and their relevant features. In our environmental scan, we identified heterogeneity in the way data are hosted and rather dubious levels of curatorship. Repositories exist at various levels, from individual institution repositories to international ones. They can also be classified according to accessibility, from closed to fully open access; according to whether the host institution is an academic or research institution or the pharmaceutical industry; and according to data types (whether they contain any research data or data from a specific field). It is important to note that there is no domain repository hosting CT data only.
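The classification dimensions listed above (level, accessibility, host type, data scope) can be captured in a small record structure when extracting information during an environmental scan. The following is a minimal sketch under stated assumptions: the field names, the allowed values and the example repositories are all hypothetical and are not the Observatory's actual extraction template.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RepositoryRecord:
    """One row of a hypothetical environmental-scan extraction sheet."""
    name: str
    level: str               # e.g. "institutional" or "international"
    access: str              # e.g. "closed", "upon request" or "open"
    host: str                # e.g. "academic", "research institution" or "industry"
    scope: str               # "general" (any research data) or "field-specific"
    hosts_raw_ct_data: bool  # True if raw CT data are hosted


def publicly_accessible_raw(records: List[RepositoryRecord]) -> List[RepositoryRecord]:
    """Keep only openly accessible repositories that host raw CT data."""
    return [r for r in records if r.access == "open" and r.hosts_raw_ct_data]


# Invented example entries; they do not describe real repositories.
example_records = [
    RepositoryRecord("Repository A", "international", "open", "academic", "general", True),
    RepositoryRecord("Repository B", "institutional", "upon request", "industry", "field-specific", True),
    RepositoryRecord("Repository C", "international", "open", "research institution", "general", False),
]

for record in publicly_accessible_raw(example_records):
    print(record.name, record.level, record.host)
```

Keeping each dimension as a separate field makes it straightforward to filter or tabulate the scan results along any one of them.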

Discussion
The ultimate goal of opening the CT data is to enable secondary analysis which would reduce research waste, speed knowledge creation (i.e. increase the quality and efficiency of trials), increase the reliability of evidence, and thus contribute to research integrity. All of these outcomes are interconnected and regardless of where they start, all will be impacted. Although CTs have not reached the open data stage, the evolution of CT data sharing is encouraging. Existing data sharing modalities complement each other and can inform further transition which relies heavily on the collaboration with the "producer" of the original data.
The learning curve is steep and the rich experience gained by various ways of ongoing data sharing could inform the development of methods and international standards. The IMPACT Observatory aims to contribute to the process by assessing the dynamics and connecting the dots.
Data sharing starts with good data management, which includes the cleaning, preservation, and curatorship of data at the origin (preferably at the institutional level), anonymization and posting. There is a trend to create more repositories with constantly improving features, but there is no domain repository for clinical trials. Furthermore, we could not find any re-analysis of data across repositories and believe it would contribute to defining the methodology and data sharing standards. There is a lot of discussion about such standards and what they should include, but the usable internationally ac- [...]

Based on what we have learned so far, such standards could be built on the accumulated expertise, and developed by an interdisciplinary and intersectional group. The standards development process could be coordinated by the WHO, which coordinated the development of the trial registration standards, or by an international consortium formed specifically for this purpose that would include all interested players.
Currently, the lack of data sharing standards is the most important gap preventing transition to a new level of CTs. However, we can start developing such standards, as we have accumulated an impressive amount of information and expertise. Also, certain necessary elements have been defined, such as citability with persistent identifiers (PIDs; DOIs and others) and the increasingly used CC citation, while others are being developed. Furthermore, numerous initiatives are contributing to this process, such as the Declaration of Helsinki, IOM, the All Trials initiative, BioCADDIE/BD2K, CORBEL/ECRIN, and the "Vivli" projects. We can also build on the expertise of the Research Data Alliance, and on the coordination capacity and authority of the WHO.