Research integrity corner
Mersiha Mahmić-Kaknjo1,2, Josip Šimić3, Karmela Krleža-Jerić*4,5,6
Setting the IMPACT (IMProve Access to Clinical Trial data) Observatory baseline
Biochemia Medica 2018;28(1):010201.
1Department of Clinical Pharmacology, Zenica Cantonal Hospital, Zenica, Bosnia and Herzegovina
2Faculty of Medicine, University of Zenica, Zenica, Bosnia and Herzegovina
3Health Sciences Library, Faculty of Health Studies, University of Mostar, Mostar, Bosnia and Herzegovina
4IMPACT Observatory, Montreal, Canada
5Mediterranean Institute for Life Sciences - MedILS, Split, Croatia
6Croatian Cochrane Centre, Split, Croatia
*Corresponding author: karmela [at] krleza [dot] com

Abstract




Introduction: The aim of the IMPACT (IMProving Access to Clinical Trial data) Observatory is to assess the transformation of clinical trials (CT) related to the evolution of sharing of CT data. The objective of this study is to establish a baseline for monitoring CT data sharing by the Observatory.

Materials and methods: In this scoping review we searched for publications, published prior to December 31st 2000, that address sharing, dissemination, transparency or reuse of CT data. Two authors screened titles and abstracts of 1204 records retrieved by Medline searches and added 47 publications from direct discovery. Four researchers extracted, coded, and analyzed the predefined information from 102 selected papers.

Results: We found a growing recognition of the importance of data sharing prior to 2001. However, there were numerous obstacles, including the ambiguity of the concept of data sharing, the absence of specific terminology and the lack of an “open” culture. By the end of 2000, data, metadata, and evidence-based medicine had been defined. Data sharing, registries, databases and re-analyses of individual patient data (IPD) had emerged. The use of systematic reviews and IPD meta-analysis in decision making was promoted. Most arguments for broader data sharing came from oncology, paediatrics, rare diseases, AIDS, pregnancy, perinatal medicine, and scandals reported in the media.

Conclusions: Our findings indicate that the year 2000 could be used as a baseline for monitoring the evolution of CT data sharing, because the basic prerequisites were in place by then, including a greater understanding that CT data sharing is essential for decision making, and advances in the Internet.

Key words: clinical trial data sharing; baseline; databases; registries; Cochrane; scandals


Received: August 7, 2017; Accepted: December 19, 2017

Introduction




A clinical trial (CT) is a prospective controlled or uncontrolled study evaluating the effects of one or more health-related interventions assigned to human participants. In this paper, we use the term data sharing to describe the practice of making data from primary research publicly available for reuse. Many different types of data may be shared, including raw or analyzable data sets; metadata, or “data about the data” (e.g., protocol, statistical analysis plan, and analytic code); and aggregate, summary-level data (e.g., summary-level results posted in registries, lay summaries, publications, and clinical study reports) (1). Raw data, participant-level data and individual participant data (IPD) are unprocessed data from a clinical trial in their original form (before the information has been analyzed or statistically manipulated), in contrast to aggregate data. They can be records of original observations, measurements, and health-related interventions, researchers’ records on patients, medical charts, hospital records, lab notes, evaluations, data recorded by instruments, attending physician notes, etc. (2).

Results from health research are often considered a public good, and data sharing is seen as beneficial. In particular, the re-analysis of data is the basis of reproducible research: it can help to better understand the results of a trial and allows data from multiple trials to be pooled, revealing information beyond what can be gained from any single study (3-5).

CT data sharing has also been identified as useful for explaining disagreements between individual CTs and for preventing biases (6-8).

The objective of the IMPACT (IMProving Access to Clinical Trial data) Observatory is to assess the transition of clinical trials regarding data sharing due to ongoing initiatives, identify facilitators of and barriers to clinical trial data sharing, indicate trends, and inform the process (9,10). As we needed to establish a baseline from which to start monitoring changes regarding data sharing, we decided to perform a scoping review. Specifically, as one of the IMPACT Observatory studies, this scoping review aims to explore to what extent CT data were shared prior to 2001 and to determine whether the year 2000 is an appropriate baseline for this monitoring (10).

Observatories or natural experiments are epidemiological studies that assess the impact of one or more interventions that are not controlled by the observatory researcher(s) to inform the process and indicate trends (11,12).


Materials and methods


We performed a scoping review of the literature. A scoping review is a method used to better understand a phenomenon; it generally consists of mapping literature on a specific topic, and identifying key concepts, theories, and sources of evidence. It is particularly useful when a research question is broad and the goal is to identify qualitative rather than quantitative parameters (13,14). Literature searches were performed in Medline by two librarians using 8 different search strategies that were developed jointly with one reviewer. Searches were performed with no language limitations using strategies to select articles published prior to 2001 (i.e. up to December 31st 2000).

A flowchart of the selection process is presented in Figure 1.

Figure 1. Flowchart of selection process


The following MeSH terms were used: “clinical trials as topic”, “information dissemination”, “information storage and retrieval”, “access to information”, “disclosure”, and “drug industry, policy”. We also searched for specific terms, including “clinical trials dissemination storage and retrieval”, “clinical trial, information dissemination”, “dissemination policy”, “clinical trial dissemination–drug industry”, “disclosure” that included “drug industry”, and “access” that included “databases” and “policy”.

These searches yielded 1204 records. After deduplication and exclusion of papers published after December 31st 2000, titles and abstracts (if available) were screened by one reviewer, and then cross-checked by another reviewer. Inclusion criteria were as follows: any article reporting or possibly reporting on CT data sharing, databases, registries, repositories, re-analysis and related practice and/or policies published prior to December 31st 2000.

We also included 47 papers identified through previous work which met our inclusion criteria. Two of the authors independently evaluated 129 full text records for final inclusion. Upon full text assessment, 27 records were excluded because they did not meet the inclusion criteria: they described patient registries, librarian research, or the sharing of data other than clinical trial data. Most discrepancies were resolved by discussion; remaining disputes were resolved by the third author.

We analyzed a total of 102 full texts. Two of the authors independently extracted relevant information into a predefined Excel file (Microsoft, Redmond, USA). A different pair of authors eliminated duplicates and summarized the information. We analyzed two groups of topics (headings in the Excel file). The first group covered phenomena, including data sharing, databases, registries, and repositories. The second group covered disease and/or patient groups and scandals related to data sharing. This second group was further expanded to capture specific disease or patient groups, as the preliminary analysis indicated that they appeared multiple times and either created a climate favourable to data sharing and transparency or directly called for them.
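The record counts reported in this section can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable and function names are illustrative assumptions and are not part of the study's workflow. The number of records removed at deduplication and title/abstract screening is not reported in the text, so that step is not modelled.

```python
# Counts taken from the text of the Materials and methods section.
medline_records = 1204     # records retrieved by the Medline searches
direct_discovery = 47      # papers added from previous (related) work
full_text_assessed = 129   # full texts evaluated for final inclusion
full_text_excluded = 27    # records excluded after full text assessment


def included_after_full_text(assessed: int, excluded: int) -> int:
    """Return how many papers remain after full text exclusions."""
    if excluded > assessed:
        raise ValueError("cannot exclude more records than were assessed")
    return assessed - excluded


# Total number of records identified from both sources.
total_identified = medline_records + direct_discovery

# Papers retained for extraction, coding, and analysis.
included = included_after_full_text(full_text_assessed, full_text_excluded)

print(f"identified: {total_identified}, analyzed: {included}")
```

The result agrees with the 102 full texts analyzed.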

All three authors coded the information for the following topics: data sharing, registry, databases, oncology, AIDS, pregnancy and perinatal medicine, child health/rare diseases, adverse effects, re-analysis, fraud/falsifications, scandals, individual patient/participant data (IPD), publication bias.

Results




In our analysis of the 102 selected papers, we identified three major concepts of interest: data sharing, registries and databases. Figure 2 illustrates that this discussion peaked in 1986 and 1993; however, the volume of relevant literature expanded significantly between 1998 and 2000.

Figure 2. Number of records addressing data sharing, registries, and database in selected literature prior to December 31st 2000


As can be seen in Figure 3, calls for data sharing came from several health areas, most frequently from oncology, followed by child health/rare diseases, AIDS, and pregnancy and perinatal medicine. The most frequently discussed topic was publication bias, including both underreporting and duplicate reporting.


Figure 3. Data sharing by topic and health area in which data sharing is addressed in the literature published prior to December 31st 2000.


However, the terminology was poorly defined. For example, the term “database” was used extensively but described collections of very diverse records, including bibliographic databases (PubMed, EMBASE, etc.). Different terms were also used to describe similar activities or similar systems. The term “registry” was used for observational databases, clinical registries, patient registries and disease oriented registries; “trial banks” were also called large population cohorts, administrative databases, electronic patient records systems, large-scale databases, and databases of hospital records (15,16). In the nineties, the term registry was introduced to refer to a collection of data from CT protocols (17-22). The term “individual patient data” was used to indicate either data collected as part of providing routine healthcare for an individual patient, or data collected during a prospective clinical trial. The acronym IPD then stood for Individual Patient Data, whereas now, in the context of clinical trials, it stands for Individual Participant Data (23,24).

We selected actions and events that had a major impact on the evolution of CT data sharing prior to 2001 and present them in Figure 4.


Figure 4. Key milestones of clinical trial data sharing prior to 2001. EBM – evidence based medicine. IPD – individual patient/participant data.


These key milestones include the call for trial registration and the establishment of CT registries (International Standard Randomized Controlled Trial Number (ISRCTN) registry), IPD meta-analyses, and the onset of the Cochrane Collaboration and of Evidence Based Medicine (EBM) (19-22,24-28). We also included the Nancy Olivieri scandal, as it influenced how industry and academia interact (29,30).

We also selected highlights of the discussion of the evolution of data sharing and present them in Table 1. These include citations of arguments for and against data sharing, as well as some actions, such as the launch of databases and studies using data from industry sources.

Table 1. Highlights of discussion of data sharing evolution prior to December 31st 2000



As can be seen in Table 1, the last two decades of the 20th century were rich in actions and discussion regarding the sharing and reuse of CT data. Related discussions considered both the advantages and the disadvantages or risks of these practices.

In 1992, Chalmers et al. identified weaknesses in all stages between research and practice, from the design and conduct of clinical trials to the use of their results in decision making (25). In the same year, Pignon et al. published the first meta-analysis in which raw data were used (24). Evidence Based Medicine started to take shape (26-28). Several important reasons for sharing raw data were put forward in the literature: the inclusion of unpublished data in meta-analyses could decrease publication bias and improve the relevance of the question, the interpretation of results, and the design of future trials, as well as help avoid unnecessary duplication of research (24,25,31-33).

During this period, it was acknowledged that the availability of IPD made more powerful analyses possible, including analyses stratified by trial and subgroup analyses (24). The importance of IPD meta-analysis for producing evidence, developing guidelines, and supporting decision-making was increasingly discussed (23,24,34).

In 1995, the Cochrane Collaboration convened a workshop in London, UK, to discuss the practicalities of meta-analyses based on IPD (23). As it became clearer in 1996 that data could and should be reused, Michels and Rosner concluded that it is more important to raise awareness of the sensible handling of data than to erect artificial barriers to data sharing (8). The importance of sharing raw data was further promoted by Vamvakas and Blajchman, who noted that raw data meta-analysis could resolve disagreements between individual randomised controlled trials (RCTs) (7).

During this decade, there were also breakthrough developments in information technology, in both software and hardware, which provided the underlying means to store and manage information (35,36). The Internet was quickly recognized as an important tool for sharing a growing corpus of information, such as raw data (33,37).

Discussion




Our scoping review reveals that CT data sharing intensified during the last two decades of the 20th century. Although a strict baseline cannot be drawn, the real turning point was the use of individual patient data in a meta-analysis in 1992 (24). In this dynamic process, key terminology was defined and new methodologies were pioneered. The process was led by the Cochrane Collaboration, which sought to improve the quality of systematic reviews and of IPD meta-analysis, and developed and published methodological guidance for IPD meta-analysis (23). There was a broad discussion of the benefits of data sharing, often led by HIV/AIDS or cancer patient groups, as well as media reports on the harmful consequences of not sharing CT data. However, the growing consensus on the value of data sharing was matched by numerous obstacles, including the lack of a data sharing culture and the ambiguity surrounding key data sharing concepts and terms, some of which we described in this paper. Perceptions of data sharing wavered between pros and cons. For example, negative attitudes towards data sharing were expressed in the mid-nineties: Glass was convinced that “no one would expect investigators to publish their raw data” (31). McCarthy argued that datasets should be available to both physicians and patients, while Warlow et al. argued that unlocking large datasets is neither reliable nor cheap (6,38). Towards the end of the century, data sharing was increasingly seen as a benefit, although practical solutions still needed to be implemented to facilitate the practice. One possible explanation for this evolution is the newly founded Cochrane Collaboration, which promoted the sharing of IPD and its use in meta-analyses, together with the creation of the concept of EBM (23,26-28).

To the best of our knowledge, this early stage of clinical trial data sharing has never been analyzed, so this paper might contribute to the understanding of the process; its systematic design is its most important strength.

The major limitation of our study is the lack of specific MeSH terms for our scoping review. Furthermore, literature searches were performed only in Medline. To compensate for these limitations, two librarians performed searches independently and we added references discovered through other related work.

Conclusions




Throughout the last two decades of the 20th century, we observed early attempts at CT data sharing and a gradually emerging awareness of the risks and benefits related to data sharing. Multiple factors contributed to this evolution. Public media and scientific journals played a significant role in raising awareness and influencing the change of culture regarding CT data sharing. Vulnerable populations (patients with cancer, AIDS, or rare diseases, pregnant women, children, and patients needing expensive therapies) were frequently involved in breakthrough cases.

By the end of the century, the basis for further development had been laid. The year 2000 ended with initial trial registries, a definition of datasets, the Cochrane Collaboration, enhanced systematic reviews and emerging IPD meta-analysis, the use of evidence from IPD meta-analysis to develop clinical guidelines and to make decisions that would benefit patients, and constant improvement of Internet features, all amid growing interest, pressure and discussion among various constituencies about the need for data sharing.

Acknowledgments




We thank Ana Marušić and Pavle Jeric for useful comments on earlier versions of the manuscript, Ana Utrobičić for contributing to literature search, Jelena Barbarić for contributing to data extraction and analysis of information, and Karine Morin for assisting with proofreading and editing the text. We also thank Nevena Jerić, Apropomedia, for graphic solutions and illustrations.

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7 2007-2013) under grant agreement No. 291823 Marie Curie FP7-PEOPLE-2011-COFUND (The New International Fellowship Mobility Programme for Experienced Researchers in Croatia - NEWFELPRO). This manuscript has been prepared as a part of the project “IMPACT Observatory”, which received funding through the NEWFELPRO project under agreement No. 35.


Potential conflict of interest

None declared.

References



 1. Sharing Clinical Research Data: Workshop Summary. Washington (DC) 2013. Available at: Accessed August 2nd 2017.

 2. The free dictionary by Farlex. Available at: Accessed July 17th 2016.

 3. Krleža-Jerić K. Clinical trial registration: the differing views of industry, the WHO, and the Ottawa Group. PLoS Med 2005;2:e378.

 4. Krleža-Jerić K. Sharing of clinical trial data and research integrity. Period Biol 2014;116:337-9.

 5. Jefferson T, Jones M, Doshi P, Spencer EA, Onakpoya I, Heneghan CJ. Oseltamivir for influenza in adults and children: systematic review of clinical study reports and summary of regulatory comments. BMJ 2014;348:g2545.

 6. McCarthy M. Unlocking the datasets. Lancet 1993;342:1252-3.

 7. Vamvakas EC, Blajchman MA. A proposal for an individual patient data based meta-analysis of randomized controlled trials of allogeneic transfusion and postoperative bacterial infection. Transfus Med Rev 1997;11:180-94.

 8. Michels KB, Rosner BA. Data trawling: to fish or not to fish. Lancet 1996;348:1152-3.

 9. IMPACT Observatory. Available at: Accessed August 2nd 2017.

10. Krleža-Jerić K, Gabelica M, Banzi R, Martinić MK, Pulido B, Mahmić-Kaknjo M, et al. IMPACT Observatory: tracking the evolution of clinical trial data sharing and research integrity. Biochem Med (Zagreb) 2016;26:308-17.

11. Remler DK, Van Ryzin GG. Natural and Quasi Experiments. In: Remler DK, Van Ryzin GG, eds. Research Methods in Practice: Strategies for Description and Causation. London: SAGE Publications; 2010. p. 427–64.

12. Craig P, Cooper C, Gunnell D, Haw S, Lawson K, Macintyre S, et al. Using natural experiments to evaluate population health interventions: new Medical Research Council guidance. J Epidemiol Community Health 2012;66:1182-6.

13. Levac D, Colquhoun H, O’Brien KK. Scoping Studies: advancing the methodology. Implementation Science 2010;5:69. Accessed November 17th 2017.

14. Armstrong R, Hall BJ, Doyle J, Waters E. Cochrane Update. ‘Scoping the scope’ of a cochrane review. J Public Health (Oxf) 2011;33:147-50.

15. Richesson RL, Vehik K. Patient registries. In: Richesson RL, Andrews JE, eds. Clinical Trials Registries and Results Databases, in Clinical Research Informatics; Health Informatics Series Part 4. London: Springer; 2012. p. 233-52.

16. Sacristán JA, Soto J, Galende I. Efficacy assessment with random assignment using data bases: medicine-based evidence? Med Clin 1998;111:623-7.

17. Perry DJ, Hubbard SM. PDQ-a database of clinical trials and cancer treatment information. Cancer Metastasis Rev 1988;7:209-21.

18. Ranke MB, Dowie J. KIGS and KIMS as tools for evidence-based medicine. Horm Res 1999;51:83-6.

19. Simes RJ. Publication bias: the case for an international registry of clinical trials. J Clin Oncol 1986;4:1529-41.

20. McCray AT. Better access to information about clinical trials. Ann Intern Med 2000;133:609-14.

21. McCray AT, Ide NC. Design and implementation of a national clinical trials registry. J Am Med Inform Assoc 2000;7:313-23.

22. Faure H, Hrynaszkiewicz I. The ISRCTN Register: achievements and challenges 8 years on. J Evid Based Med 2011;4:188-92.

23. Stewart LA, Clarke MJ. Practical methodology of meta-analyses (overviews) using updated individual patient data. Cochrane Working Group. Stat Med 1995;14:2057-79.

24. Pignon JP, Arriagada R, Ihde DC, Johnson DH, Perry MC, Souhami RL, et al. A meta-analysis of thoracic radiotherapy for small-cell lung cancer. N Engl J Med 1992;327:1618-24.

25. Chalmers I, Dickersin K, Chalmers TC. Getting to grips with Archie Cochrane’s agenda. BMJ 1992;305:786-8.

26. Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ 1996;312:71.

27. Evidence-Based Medicine Working Group. Evidence Based Medicine. A New Approach to Teaching the Practice of Medicine. JAMA 1992;268:2420-5.

28. Sur RL, Dahm P. History of evidence-based medicine. Indian J Urol 2011;27:487-9.

29. Phillips RA, Hoey J. Constraints of interest: lessons at the Hospital for Sick Children. [Erratum appears in CMAJ 1998;159:1244]. CMAJ 1998;159:955-7.

30. Shuchman M. Legal issues surrounding privately funded research cause furore in Toronto. CMAJ 1998;159:983-6.

31. Glass KC. Toward a duty to report clinical trials accurately: the clinical alert and beyond. J Law Med Ethics 1994;22:327-38.

32. Melzer D. New drug treatment for Alzheimer’s disease: lessons for healthcare policy. BMJ 1998;316:762-4.

33. Impicciatore P, Pandolfini C, Bonati M. Database could give children safer medicines. Nature 2000;405:882.

34. Horton R. Data-proof practice. Lancet 1993;342:1499.

35. Korcok M. NCI offering computer database on cancer research. Can Med Assoc J 1985;133:225-7.

36. Hubbard SM. The physician data query (PDQ) cancer information system. Bull Cancer 1987;74:205-14.

37. Horton R, Smith R. Time to register randomised trials. Lancet 1999;354:1138-9.

38. Warlow C, Edouard L, Rawson NSB. Unlocking datasets. Lancet 1994;343:118.

39. Pellegrino ED. Beneficence, scientific autonomy, and self-interest: ethical dilemmas in clinical research. Camb Q Healthc Ethics 1992;1:361-9.

40. Munro AJ. Publishing the findings of clinical research. BMJ 1993;307:1340-1.

41. Easterbrook P, Berlin J. Meta-analysis. Lancet 1993;341:965.

42. Weiner MG, Hillman AL. “Virtual” clinical trials: case control experiments utilizing a health services research workstation. Proc AMIA Symp 1998; Annual Symposium: 300-4.

43. Randolph AG, Cook DJ, Gonzales CA, Andrew M. Benefit of heparin in central venous and pulmonary artery catheters: a meta-analysis of randomized controlled trials. Chest 1998;113:165-71.

44. Ebert TJ, Robinson BJ, Uhrich TD, Mackenthun A, Pichotta PJ. Recovery from sevoflurane anesthesia: a comparison to isoflurane and propofol anesthesia. [Erratum appears in Anesthesiology 1999; 90: 644]. Anesthesiology 1998;89:1524-31.

45. Heitman E. Ethical issues in technology assessment. Conceptual categories and procedural considerations. Int J Technol Assess Health Care 1998;14:544-66.

46. McNeil C. NCCN outcomes database makes debut. J Natl Cancer Inst 1998;90:488-9.

47. Silva J, Wittes R. Role of clinical trials informatics in the NCI’s cancer informatics infrastructure. Proc AMIA Symp 1999; Annual Symposium:950-4.

48. Garey KW, Amsden GW. Intravenous azithromycin. Ann Pharmacother 1999;33:218-28.

49. Lindholm LH, Tcherdakoff P, Zanchetti A. Safety profile of lacidipine: update from a clinical trials database. Drugs 1999;57:27-9.

50. Phillips AN, Grabar S, Tassie JM, Costagliola D, Lundgren JD, et al. Use of observational databases to evaluate the effectiveness of antiretroviral therapy for HIV infection: comparison of cohort studies with randomized trials. EuroSIDA, the French Hospital Database on HIV and the Swiss HIV Cohort Study Groups. AIDS 1999;13:2075-82.

51. Setter SM, Corbett CF, Campbell RK, White JR. Insulin aspart: a new rapid-acting insulin analog. Ann Pharmacother 2000;34:1423-31.

52. Kmietowicz Z. UK drugs industry sets up trials register. BMJ 2000;321:850.