The investigation of pre-analytical factors in laboratory medicine is pivotal to improving overall clinical laboratory quality and, in turn, to ensuring patient safety (1). In this regard, phlebotomy is regarded as a crucial process of the pre-analytical phase, in which a large proportion of laboratory errors is thought to arise, with the potential to affect most medical decisions (2, 3). Indeed, apart from the provision of high-quality supplies, which depends on the healthcare service’s choices, the only means of ensuring adequate sample quality are the operator’s skills and compliance with standard procedures (4). As pre-analytics is a major concern in current laboratory medicine and an issue for both practitioners and researchers, this field of investigation should be fostered in order to produce evidence for best practice (5, 6). Indeed, unnecessarily complicating patient management without any actual improvement, or oversimplifying it and thereby compromising its safety, might undermine the operator’s awareness that careful pre-analytical practice is mandatory. Carrying out a pre-analytical investigation raises methodological concerns regarding the statistical framework used to assess the investigated factor. Furthermore, with respect to phlebotomy, more specific issues arise concerning the choice of an appropriate cohort, the standardization of procedures and the deliverability of results to non-academic readers. Thus, such studies should guarantee the highest reliability, thereby dispelling any doubt that their findings might be misleading.
The scope of the present paper is to assess the appropriateness of the methodology used to carry out pre-analytical investigations of phlebotomy, and to provide researchers with specific recommendations on study set-up and delivery. To this end, the paper is divided into two parts: part I gathers evidence through a review of the available literature, and part II addresses methodological appropriateness and the choice of suitable procedures.
PART I – EVIDENCE
Literature search and data analysis
The MEDLINE literature database was searched through PubMed for papers published in the last twenty years (January 1996 to April 2016). To structure the search, we first defined a combination of keywords targeting the general topic of pre-analytics in laboratory medicine (e.g. “laboratory test”, “pre-analytic”, “biological OR individual”). We then refined the search by focusing on ten topics, each addressing a pre-analytical aspect of phlebotomy strictly related to the operator’s choices, with appropriate keywords for each (Table 1).
For each topic, suitable papers were selected on the basis of the abstract content. Papers were considered suitable only if they met all the following requirements: a) they were based on an experimental set-up aimed at investigating one or more pre-analytical factors related to the blood drawing procedure; b) they reported a quantitative effect (mean change and/or bias) on clinical chemistry, haematology or coagulation tests; c) they assessed no vein access procedure other than venepuncture. Thus, we excluded retrospective studies relying on mathematical models such as regression, studies dealing with pre-analytics in metabolomics or biobanking, investigations comparing venepuncture with other drawing techniques (e.g. saline lock devices or intravenous catheters), and studies assessing the effect of particular devices or materials (e.g. tube preservatives or infra-red vein finders).
Finally, selected papers were evaluated with respect to the experimental set up, sample size (N), kind of population used for the study (volunteers, donors, inpatients, outpatients), number of individual laboratory tests evaluated, number of factors assessed, testing of data normality, descriptive measures provided (central tendency, dispersion), measure of association between paired observations, methodology used for agreement and bias estimation, clinical significance assessment.
Data were analysed with a Microsoft Excel (Microsoft Corporation, USA) spreadsheet and the StatsDirect 2.7.2 (StatsDirect Ltd., UK) statistical package, with relative frequencies expressed as proportions according to the author guidelines (7). The normality of the data was tested by means of the Shapiro-Wilk test. Data dispersion was assessed using a dot-plot, and data were accordingly summarized as median and interquartile range (IQR). The association between qualitative variables was assessed by means of Fisher’s exact test or the Fisher-Freeman-Halton test, and the association between continuous quantitative variables by Spearman’s ρ. The Mann-Whitney U test was used to assess the association between a continuous and a qualitative variable (i.e. the effect of a factor on a median value). The effect size was expressed as Spearman’s ρ for the U test, while the paired Cohen’s d was used for the retrospective estimation of effect size from data available in the reviewed studies (8-10). The level of statistical significance was set at P < 0.05.
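To make the effect-size choice above concrete, the following minimal Python sketch (standard library only) computes the Mann-Whitney U statistic with a tie-corrected normal approximation, together with Spearman’s ρ between group membership and the ranked values. It is not the StatsDirect implementation used for the analyses; the function names and the sample sizes in the usage lines are hypothetical, invented for illustration rather than taken from the reviewed studies. Without continuity correction, |Z| equals |ρ|·√(N − 1), which is one way to see why ρ can serve as an effect size for the U test.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mann_whitney_with_rho(group_a, group_b):
    """U for group_a, tie-corrected Z (no continuity correction), two-sided P,
    and Spearman's rho between group membership and values."""
    values = list(group_a) + list(group_b)
    labels = [1] * len(group_a) + [0] * len(group_b)
    n1, n2, n = len(group_a), len(group_b), len(values)
    ranks = midranks(values)
    r1 = sum(r for r, g in zip(ranks, labels) if g == 1)
    u1 = r1 - n1 * (n1 + 1) / 2
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())  # tie-correction term
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = (u1 - n1 * n2 / 2) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Ranks of a binary label are a linear transform of the label itself,
    # so Pearson(label, value ranks) equals Spearman's rho.
    rho = pearson(labels, ranks)
    return u1, z, p, rho


# Hypothetical sample sizes of studies on outpatients vs volunteers (invented)
outpatients = [54, 70, 88, 120, 221, 300, 60]
volunteers = [12, 18, 20, 20, 25, 30, 16, 40]
u, z, p, rho = mann_whitney_with_rho(outpatients, volunteers)
print(f"U={u:.1f}, Z={z:.2f}, P={p:.4f}, rho={rho:.2f}")
```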
The search returned a total of 136 articles, of which 36 were suitable and available for this study according to the established criteria. Three further papers that appeared potentially suitable were not available through our local library service and were therefore excluded. Some papers dealt with more than one topic within the same experimental set-up; with respect to each single topic we found:
Overall, 6/36 papers (0.17) were published in 1996-2000, 5/36 (0.14) in 2001-2005, 7/36 (0.19) in 2006-2010 and 18/36 (0.50) in 2011-2016 (up to March), in 17 different journals (Figure 1).
Sample size and study population
With respect to sample size, 14/36 papers (0.39) had N ≤ 20, 6/36 (0.17) had 20 < N ≤ 30, and 16/36 (0.44) had N > 30. In some studies with N > 30, the sample was partitioned into two or more subgroups, each analysed independently, so that the actual sample size varied according to the stratification (28, 32, 33, 38, 43, 45). Only one paper (1/36, 0.03) reported that the sample size had been chosen on the basis of a preliminary power analysis (42); the others gave no information regarding preliminary calculations.
In 22/36 papers (0.61) the study population consisted of healthy volunteers, in 11/36 (0.31) of outpatients, in 1/36 (0.03) of inpatients and in 1/36 (0.03) of blood donors, while in 1/36 (0.03) it was not specified. Notably, the median sample size was N = 88 (IQR: 54.5 - 220.5) for studies on outpatients and N = 20 (IQR: 17.5 - 30.0) for studies on volunteers, the difference being statistically significant (P < 0.001, effect size ρ = 0.69).
Study design and data summarization
All the studies relied on a within-subjects (single-group repeated-measures) design. In particular, 26/36 papers (0.72) assessed one pre-analytical factor, while the remaining 10/36 (0.28) assessed two factors (e.g. tourniquet pressure and time). In addition, 2 papers also assessed a third factor unrelated to phlebotomy (sample storage and data transportation, respectively) (19, 28). In 4/36 papers (0.11) the investigation concerned 1 laboratory test, in 15/36 (0.42) from 2 to 5 tests, in 12/36 (0.33) from 6 to 24, and in 5/36 (0.14) 25 or more tests.
In 28/36 papers (0.78) no normality test was mentioned or reported, in 5/36 (0.14) the Kolmogorov-Smirnov or D’Agostino-Pearson test was used (in one case without specifying which), and in 3/36 (0.08) it was unclear whether a test was performed and which one was adopted (15, 19, 25). Notably, the use of a normality test was not associated with sample size, as its frequency did not differ statistically between studies with N ≤ 20 and N > 20 (P = 0.328).
The paired Student’s t-test was the statistical test most frequently used to assess the effect of the pre-analytical factor, appearing in 17/36 papers (0.47), while its non-parametric equivalent, the Wilcoxon signed-rank test, was used in 11/36 (0.31). In 2/36 cases (0.06) the authors stated that the choice between Student’s and Wilcoxon’s test followed the result of a normality test (12, 46). In 4 papers (0.11) the authors analysed their data with linear models, namely parametric or non-parametric (Friedman’s) one-way ANOVA, or in a single case (0.03) a linear mixed-effects model (LMEM). Except where explicitly reported, the choice of a non-parametric rather than a parametric test was independent of a sample size N ≤ 20 (P = 0.720), as well as of the prior execution of a normality test (P = 0.811). Notably, although 29/32 papers (0.91) assessing more than one laboratory test used a 2-sample location test (Student’s or Wilcoxon’s test), just 1/29 (0.03) corrected the inflation of α by means of the Bonferroni method (41).
The mean was the measure of central tendency most frequently used to summarize data (26/36 papers, 0.72), and in 5/36 cases (0.14) it was used even when the statistical assessment relied on a non-parametric test (16, 19, 34, 41, 44). With respect to variability, the standard deviation was the most frequently used measure (19/36, 0.53), followed by the interquartile range (8/36, 0.22), while just 4/36 papers (0.11) reported the 95% confidence interval alongside the mean (28, 36, 41, 42). Just 9/36 papers (0.25) provided the correlation between paired data, which allows the retrospective estimation of the observed effect size (21, 22, 25-27, 29, 39, 40, 43). The appropriateness of summarization, defined as the consistency of the measures of central tendency and dispersion with the parametric or non-parametric test adopted, was independent of the journal that published the research (P = 0.676).
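The following sketch illustrates why the correlation between paired data matters for retrospective effect-size estimation: the standard deviation of the within-subject differences, and hence a paired Cohen’s d (d_z), can only be recovered from published means and SDs if the correlation r is also reported. The function name and the numbers in the usage lines are hypothetical, and d_z is one common paired formulation that may differ from the exact variant used in the review.

```python
import math


def paired_cohens_d(mean_before, mean_after, sd_before, sd_after, r):
    """Paired Cohen's d (d_z) recovered from summary statistics.

    SD of differences = sqrt(sd1^2 + sd2^2 - 2*r*sd1*sd2); without the
    correlation r between paired observations it cannot be computed.
    """
    sd_diff = math.sqrt(sd_before ** 2 + sd_after ** 2 - 2 * r * sd_before * sd_after)
    return (mean_after - mean_before) / sd_diff


# Hypothetical summary data: 4.0 vs 4.2 mmol/L, SD 0.4 in both conditions
d_high_r = paired_cohens_d(4.0, 4.2, 0.4, 0.4, r=0.9)  # about 1.12
d_no_r = paired_cohens_d(4.0, 4.2, 0.4, 0.4, r=0.0)    # about 0.35
print(round(d_high_r, 2), round(d_no_r, 2))
```

The same mean difference yields a roughly threefold larger d when paired observations are highly correlated, so studies that omit r cannot be re-analysed in this way.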
Bias and clinical significance
Bland-Altman analysis was used to estimate bias in 12/36 papers (0.33), followed by the percentage mean difference (8/36, 0.22), computed as the difference between baseline and treatment values divided by the treatment value. Notably, in 12/36 papers (0.33) no bias estimation was reported, while in 3/36 (0.08) a Bland-Altman-like plot was used although it was not described as such in the methods (37, 38, 43). Lastly, in 2/36 papers (0.06) the bias was estimated, alongside Bland-Altman analysis, through Passing-Bablok regression with 95% confidence intervals (21, 41).
With respect to the agreement between laboratory test results obtained with and without the factor applied, Passing-Bablok regression was used in 8/36 papers (0.22), while in 24/36 (0.67) no agreement assessment was shown. In 3/36 cases (0.08) ordinary least-squares regression was used, and in a single case (1/36, 0.03) simple linear correlation (37, 39, 42, 43).
Regarding the clinical significance of the pre-analytical factor, in 16/36 papers (0.44) the authors directly compared the observed bias with biological variability data reported in databases. Conversely, in 4/36 papers (0.11) a more statistically structured approach was used, based on the total change limit (TCL) or the reference change value (RCV) (13, 14, 19, 41).
Interestingly, in the papers that allowed the retrospective estimation of the effect size (7/36, 0.19), the effect size on individual laboratory tests and the corresponding magnitude of bias were uncorrelated (Spearman’s ρ = 0.146, P = 0.142). However, when the clinical significance of bias was used as a grouping criterion, the effect size differed significantly in median magnitude (P < 0.001, effect size ρ = 0.59) (Figure 2) (21, 22, 25-29). In particular, the median effect size was 0.349 (IQR: 0.228-0.531) for clinically significant bias and 1.140 (IQR: 0.815-1.700) for non-significant bias.
Pre-analytical investigations of phlebotomy turned out to be a heterogeneous body of work, based on the general framework of the within-subjects repeated-measures design, in which the methodological approach varied even among papers published by the same group of authors.
First, we identified some methodological inaccuracies that can be considered general issues of research articles and that are usually addressed by journals in their author guidelines. For instance, the statistical test (parametric or not) was chosen independently of any assessment of the dataset structure (e.g. size, normality, dispersion). In this regard, the lack of association between the appropriateness of summarization and the journal publishing the paper suggests that such flaws probably stem from the scarce attention paid by authors to these guidelines.
Second, we observed some specific drawbacks, several of which are strictly related to the conceptual and statistical framework characterizing this kind of study. They can be summarized as follows:
the choice of study population affecting the sample size, with cohorts of healthy volunteers usually smaller than 30 subjects
the investigation of more than one laboratory test within the same experimental framework without appropriate correction for multiple comparisons, inflating the rate of falsely significant results
the lack of calibration (i.e. a sample size calculation based on the least significant detectable difference), especially in small cohorts
the use of 2-sample location tests (e.g. Student’s t-test or Wilcoxon’s test) as a “screening approach” to the investigated factor
the prevalent use of the Bland-Altman plot without regression analysis of the trend of individual differences to detect a proportional effect
the agreement analysis treated as complementary rather than fundamental for a pre-analytical investigation, and therefore often ignored or sometimes carried out with inappropriate methodologies.
In particular, regarding the last three points, the pre-analytical factor, the bias and the agreement were usually treated as unrelated, leading to the use of multiple (often redundant) tests and, in turn, to a fragmented statistical framework. Consequently, many studies were potentially calibrated non-homogeneously across different statistical methods, and were thus at risk of delivering unreliable results. For instance, we noticed that factors producing a bias that was not clinically significant were associated with a larger effect size when assessed by means of 2-sample location tests. Lastly, we noticed a certain lack of standardization in the operative procedures reported in the various investigations, and a general inhomogeneity in how data were presented and results delivered to the reader. Based on this evidence, we developed a set of recommendations, presented in part II of this document, aimed at ensuring an adequate quality level in research dealing with pre-analytical issues related to phlebotomy.
PART II – RECOMMENDATIONS
Setting up the cohort
The nature of the subjects within a cohort should be chosen to address a specific diagnostic issue rather than a generic laboratory concern. Indeed, phlebotomy is the essential link between the clinic and laboratory diagnostics, with venepuncture prompted by a precise medical question (4). Therefore, the patient-side perspective should be preferred over the laboratorian-side perspective, even when the investigation concerns a technical aspect of laboratory pre-analytics.
The choice of the population in a phlebotomy study can make a difference when the results are generalized to a different population. For instance, evidence on mechanical factors gathered in healthy subjects may not apply to oncology patients with chronic lymphocytic leukaemia or under tamoxifen treatment, as they show abnormal cell fragility (47, 48). Conversely, the same cohort might be suitable for investigating false positives due to pre-analytical errors in phlebotomy in laboratory testing of the general population.
Calibrating the study
A study should be designed to detect the meaningful effect size of a factor, avoiding both excessive (over-powered) and insufficient (under-powered) sensitivity (49). Study sensitivity depends on the particular statistical test adopted to assess significance, as well as on the size of the cohort chosen for the experiments (50-52). As sample size has the largest impact, being the parameter the researcher can most easily vary, it should be strictly managed to achieve appropriate study calibration and to avoid unreliable results (see Appendix A) (53).
Invasive procedures naturally tend to rely on small cohorts, and in phlebotomy some experiments even require multiple venepunctures (e.g. the comparison of butterfly versus straight needles). Moreover, some medical conditions can further complicate patient enrolment. For instance, it may be easy to size a study adequately when enrolling subjects on oral anticoagulant therapy with an INR between 2 and 3, but the situation can change markedly at higher INR values (38, 43). A practical way of sizing a study properly is to start from an expected magnitude of the effect, which could be estimated, for instance, by retrospective calculations on previously available data (54). The required sample size can then be obtained by entering this value into freely available stand-alone software or online tools, selecting the statistical test that will be used (55-57).
Besides calibration, a study should also rely on accurate data validation, achieved by assessing the shape of the dataset. A normality test is useful to reveal any significant distortion produced by erratic observations, such as outliers arising from biological variation (58). It should be remarked that skewness markedly affects parametric statistics (Student’s t-test), so the choice between parametric and non-parametric tests should be made carefully and not only on the basis of sample size (59-61). Indeed, choosing an inappropriate statistical test reduces sensitivity, which is already an issue in small studies (51). Therefore, data validation should always be carried out, and as strictly as possible.
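As an illustration of the sizing step, the sketch below uses a common normal-approximation formula for a two-sided paired t-test, n ≈ ((z_(1−α/2) + z_(power)) / d)² + z_(1−α/2)² / 2, where d is the expected standardized effect (mean difference divided by the SD of the differences) and the last term partly compensates for using the t distribution. It is an approximation, not a substitute for the dedicated software cited above; the function name and default values are assumptions.

```python
import math
from statistics import NormalDist


def paired_sample_size(d, alpha=0.05, power=0.80):
    """Approximate number of subjects for a two-sided paired t-test.

    d is the expected standardized effect (mean difference / SD of differences).
    Normal approximation plus a z^2/2 term that compensates for using the
    t distribution with small samples.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_power) / d) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


print(paired_sample_size(0.5), paired_sample_size(1.0))  # 34 10
```

The result can then be weighed against the number of subjects a cohort of the intended kind (e.g. patients with high INR values) can realistically provide.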
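A minimal sketch of one possible decision rule is given below, assuming SciPy is available: the Shapiro-Wilk test is applied to the within-subject differences (the quantity the paired t-test assumes to be normal), and the paired t-test or the Wilcoxon signed-rank test is chosen accordingly. The 0.05 threshold, the function name and the data are illustrative assumptions, not a prescription.

```python
from scipy import stats


def compare_paired(baseline, treated, alpha_normality=0.05):
    """Pick a paired location test after checking the differences for normality."""
    diffs = [t - b for b, t in zip(baseline, treated)]
    _, normality_p = stats.shapiro(diffs)
    if normality_p >= alpha_normality:
        name, result = "paired t-test", stats.ttest_rel(treated, baseline)
    else:
        name, result = "Wilcoxon signed-rank test", stats.wilcoxon(treated, baseline)
    return name, normality_p, result.pvalue


# Hypothetical data: nine small shifts and one erratic observation (outlier)
baseline = [4.0, 4.1, 3.9, 4.3, 4.2, 4.0, 4.4, 3.8, 4.1, 4.2]
treated = [4.1, 4.22, 3.99, 4.41, 4.3, 4.13, 4.48, 3.9, 4.21, 7.2]
test_name, p_norm, p_value = compare_paired(baseline, treated)
print(test_name, round(p_norm, 4), round(p_value, 4))
```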
Setting the procedures
Procedures used to investigate pre-analytical factors should be standardized, as the reliability of such studies strictly depends on their correct application and execution. Indeed, a lack of standardization could introduce uncontrolled confounding factors that might lead to contradictory findings, as has been shown for the “fasting” condition and for venous stasis induction (24, 62). Thus, if no referenced protocol is available, the authors should detail what was performed instead of using general terms or descriptions (e.g. “venous stasis was induced by applying an elastic tourniquet 5 cm above the insertion site at a pressure equivalent to 60 mmHg, held in place for 1 minute after insertion of a 21G needle” instead of “samples were collected after venous stasis was induced”).
With respect to the laboratory tests used as part of the experimental procedures, the recommendation concerns how they should be arranged when the study deals with a panel of multiple analytes. This implies that the same cohort is independently tested several times (once for each analyte) within the same experimental framework, which raises concerns about reliability due to the probabilistic nature of statistical testing (53). In fact, multiple testing inflates the rate of falsely significant results, requiring an appropriate correction such as an upward adjustment of the P-values through dedicated statistical procedures (see Appendix B for details) (63, 64).
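The sketch below, in plain Python, shows how quickly the family-wise error rate grows with the number of analytes tested independently at α = 0.05, together with two adjustments that raise P-values: the Bonferroni method mentioned in part I, and the Holm step-down procedure, named here as one common, less conservative alternative rather than as the method of Appendix B. Function names and the P-values in the usage lines are hypothetical, and the family-wise formula assumes independent tests.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive among m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni(pvalues):
    """Multiply each P-value by the number of tests (capped at 1)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjustment: controls the family-wise error rate like
    Bonferroni but is never more conservative."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest P-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps adjusted values monotone in the raw order.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical unadjusted P-values for four analytes of the same panel
raw = [0.01, 0.04, 0.03, 0.005]
print(round(familywise_error(0.05, 10), 2))    # 0.4 with ten analytes
print([round(p, 3) for p in bonferroni(raw)])  # [0.04, 0.16, 0.12, 0.02]
print([round(p, 3) for p in holm(raw)])        # [0.03, 0.06, 0.06, 0.02]
```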
Maximizing the design
The methodological framework of pre-analytical investigations should maximize the reliability and consistency of the information obtained. Thus, assessing the effect of the factor, the bias and the agreement within the same design as separate entities through distinct statistical methods should be discouraged as inappropriate and unnecessary.
The two major concerns arising from the use of multiple methods are homogeneity of calibration and robustness. As noted earlier regarding statistical power, it is virtually impossible to achieve homogeneous sensitivity across different statistical procedures at the same sample size (50-52). Furthermore, different methods differ in robustness towards the same shape (i.e. outliers) and variability (i.e. inhomogeneity of variance) of data, which may arise from an underlying heterogeneity of the cohort (65). Combining these two issues, a study might report redundant tests that even produce discordant evidence (see further in this section). Instead, the statistical framework should avoid any ambiguity and maximize the advantage of within-subjects designs, which control intra-individual variability, thereby increasing the precision of estimates and, in turn, study sensitivity (50, 52). In this regard, linear models such as regression and, especially, the linear mixed-effects model (LMEM) should be preferred.
The LMEM (or multilevel model) is a generalization of multiple regression (i.e. a regression with more than one predictor) suitable for handling the contribution of individual variability in the analysis of multiple effects (66, 67). It can handle both effects that can be experimentally replicated and have the same size for all tested subjects (“fixed” effects, such as two different needle bore sizes or different stasis durations), and effects that lie outside experimental control and show a certain variability (“random” effects, such as the homeostatic set point of each subject in the study) (66, 68). In pre-analytical investigations the two kinds of effects are always combined, because planned factors are applied to a random set of individuals (28, 69). Thus, the LMEM can decompose the total variability (i.e. variance) into components, showing the contributions of within-subject variation, analytical variation (i.e. method imprecision) and the factor effect (bias) separately. For instance, one may investigate the rate of pseudohyperkalemia due to needle bore size while simultaneously investigating the effect of age (random effect), MCV (random effect) and gender (fixed effect) of the subjects, by adding the appropriate terms to the model. The LMEM is an observation-centred rather than a factor-centred framework like ANOVA and repeated-measures ANOVA (70, 71). It therefore has two further strengths: a) it can handle missing data resulting from outlier removal or drop-outs, and b) it can account for correlation between observations, such as the effect of the baseline value on the response within the same individual. The major (and technically the only) limitation to the use of the LMEM is its methodological complexity, which demands an appropriate level of statistical knowledge to set up the experimental design properly, transfer the data into the statistical framework and interpret the results (66).
Passing-Bablok regression is an errors-in-variables method that relies on a non-parametric estimation of its coefficients to gain robustness (72-74). Compared with the Deming model, which relies on least-squares estimates, it is fairly insensitive to outliers and can handle a single measurement per observation pair, disregarding analytical imprecision (75, 76). Like any other regression method, it shows the agreement between paired observations, that is, how they scatter around a line with no prevailing effect of one procedure over the other. Moreover, being a linear model of the relationship, it decomposes the observed effect into a constant (intercept c) and a proportional (slope b) bias (77, 78). In addition, it relies on confidence intervals for assessing significance, accustoming researchers and readers to give up the use of the P-value (79). Limitations of the Passing-Bablok model are that it cannot handle multiple factors or missing data, and that it requires a high correlation between paired observations to hold.
It should be remarked that the use of 2-sample location tests (e.g. the t-test and its non-parametric equivalents) to assess the statistical significance of a factor, either alone or alongside regression analysis, should be discouraged. Indeed, they investigate only a systematic difference, corresponding to the constant bias in an agreement analysis, and can be considered reliable only when observations cover a narrow range and no significant trend is expected (80, 81). Therefore, a paper could report a factor as non-significant at the t-test even though it produces a proportional bias, confusing the reader.
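To illustrate the fixed/random distinction, the following sketch fits a random-intercept LMEM to simulated data with the statsmodels `mixedlm` formula interface (statsmodels, pandas and NumPy are assumed to be installed). Each hypothetical subject has a random set point, the factor adds a fixed shift, and one observation is dropped to mimic outlier removal. With a single measurement per condition, the residual variance pools within-subject biological and analytical variation; separating them, as described above, would require replicate measurements. All values are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_subjects = 30
true_bias = 0.25      # hypothetical fixed shift produced by the factor (e.g. mmol/L)
subject_sd = 0.40     # hypothetical between-subject SD of the homeostatic set point
residual_sd = 0.05    # hypothetical within-subject (biological + analytical) SD

rows = []
for subject in range(n_subjects):
    set_point = 4.2 + rng.normal(0.0, subject_sd)  # random effect of the subject
    for factor in (0, 1):  # 0 = reference procedure, 1 = factor applied
        value = set_point + true_bias * factor + rng.normal(0.0, residual_sd)
        rows.append({"subject": subject, "factor": factor, "value": value})
data = pd.DataFrame(rows)

# Drop one observation (e.g. a removed outlier): the model still uses the
# remaining observation of that subject instead of discarding the pair.
data = data.drop(index=5).reset_index(drop=True)

# Fixed effect for the factor, random intercept for each subject
result = smf.mixedlm("value ~ factor", data, groups=data["subject"]).fit()

bias = result.fe_params["factor"]                # estimated factor effect (bias)
between_subject_var = result.cov_re.iloc[0, 0]   # random-intercept variance
residual_var = result.scale                      # pooled within-subject variance
print(f"bias={bias:.3f}  between-subject var={between_subject_var:.4f}  residual var={residual_var:.4f}")
```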
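The point estimates and an approximate slope confidence interval of Passing-Bablok regression can be sketched in plain Python as follows. This is a didactic implementation of the published procedure (pairwise slopes, exclusion of slopes equal to −1, median shifted by the number of slopes below −1), written under the assumption of a reasonably large sample; it omits the intercept confidence interval and the linearity (cusum) check, so validated software should be preferred for real studies. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist, median


def passing_bablok(x, y, alpha=0.05):
    """Passing-Bablok fit y = a + b*x; returns (a, b, (b_low, b_high))."""
    n = len(x)
    slopes = []
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical pairs carry no slope information
            s = (math.inf if dy > 0 else -math.inf) if dx == 0 else dy / dx
            if s == -1:
                continue  # slopes of exactly -1 are excluded by the method
            slopes.append(s)
    slopes.sort()
    big_n = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset for the shifted median
    if big_n % 2:
        b = slopes[(big_n + 1) // 2 + k - 1]
    else:
        b = 0.5 * (slopes[big_n // 2 + k - 1] + slopes[big_n // 2 + k])
    a = median([yi - b * xi for xi, yi in zip(x, y)])
    # Rank-based confidence interval for the slope
    c = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    m1 = round((big_n - c) / 2)
    m2 = big_n - m1 + 1
    if m1 < 1 or m2 + k > big_n:
        raise ValueError("too few points for the rank-based slope CI")
    return a, b, (slopes[m1 + k - 1], slopes[m2 + k - 1])


# Hypothetical paired results in exact agreement y = 1 + 2x, then one outlier
x = list(range(1, 11))
y = [1 + 2 * v for v in x]
print(passing_bablok(x, y))                 # (1.0, 2.0, (2.0, 2.0))
print(passing_bablok(x, y[:-1] + [100])[:2])  # outlier barely matters: (1.0, 2.0)
```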
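The following plain-Python sketch illustrates the point with invented data in which the factor compresses values around 6 by 20%: positive and negative differences cancel out, so the paired t statistic is essentially zero, while the slope of the differences regressed on the pair means reveals the proportional bias. Function names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Paired t statistic on the differences y - x."""
    d = [b - a for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def trend_of_differences(x, y):
    """OLS slope of the differences (y - x) regressed on the pair means."""
    d = [b - a for a, b in zip(x, y)]
    m = [(a + b) / 2 for a, b in zip(x, y)]
    md, mm = mean(d), mean(m)
    num = sum((mi - mm) * (di - md) for mi, di in zip(m, d))
    return num / sum((mi - mm) ** 2 for mi in m)


# Hypothetical data: the factor shrinks values around 6 by 20%
baseline = [2, 3, 4, 5, 6, 7, 8, 9, 10]
with_factor = [6 + 0.8 * (v - 6) for v in baseline]
# t is close to 0, while the slope of differences is about -0.22
print(paired_t_statistic(baseline, with_factor), trend_of_differences(baseline, with_factor))
```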
Assessing clinical significance
The clinical significance of any procedure should always be assessed within the statistical framework adopted to test the factor, and reported alongside the statistical significance. If the assessment is carried out with linear models (Passing-Bablok regression, LMEM), the RCV, popularized in laboratory medicine by Fraser, can be used to obtain the actual threshold of clinical significance (82): if the observed bias is larger than the expected combined effect of analytical and biological variability, clinical significance is achieved. The RCV can be obtained at different levels of the laboratory assay, provided that the corresponding actual imprecision of the analytical method is known from quality control samples (an example applied to regression analysis is shown in Appendix C). Alternatively, the technically equivalent total change limit (TCL) can be used (83). As both depend on the underlying assumption of statistical normality (the same probability of obtaining an equally large positive or negative variation), they can be reformulated using a robust non-parametric model in order to better match the structure of the data and gain appropriate sensitivity (84, 85). Lastly, comparing the observed bias with desirable values obtained from databases is another alternative to statistically derived boundaries, but it does not take into account the actual imprecision of the methods used to perform the experiments (86).
A concluding remark on statistical methodology concerns the recommendation to use the difference plot (better known as the Bland-Altman plot) for bias assessment and clinical significance in these studies. The method was devised to estimate at a glance, through the scatter plot of individual differences between paired observations, the 95% limits of agreement, i.e. the interval of ± 1.96 standard deviations around the average bias (87-89). The procedure has the major advantages of computational simplicity and visual immediacy, but in order to emphasize the random component of bias it constrains the modelling of the systematic and proportional components (90). Nor should it be considered complementary to regression analysis, also because procedures based on least-squares estimation do not return independently distributed residuals, whereas the Bland-Altman plot assumes the differences to be independently distributed (91). Therefore, the use and interpretation of this kind of plot within a framework based on linear modelling should be undertaken carefully.
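A minimal sketch of Fraser’s RCV for two serial results, RCV = √2 · Z · √(CV_A² + CV_I²), is shown below, with CV_A the analytical and CV_I the within-subject biological coefficient of variation (in %), and Z = 1.96 for a two-sided 95% level. It implements only the symmetric (normality-based) form discussed above, not the non-parametric reformulations; the function names and CV values are hypothetical.

```python
import math
from statistics import NormalDist


def reference_change_value(cv_analytical, cv_within_subject, alpha=0.05):
    """Fraser's RCV (%) for two serial results, two-sided at level alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.sqrt(2) * z * math.sqrt(cv_analytical ** 2 + cv_within_subject ** 2)


def is_clinically_significant(bias_percent, cv_analytical, cv_within_subject, alpha=0.05):
    """True when the observed percentage bias exceeds the RCV."""
    return abs(bias_percent) > reference_change_value(cv_analytical, cv_within_subject, alpha)


# Hypothetical CVs: 2% analytical, 5% within-subject biological
rcv = reference_change_value(2.0, 5.0)
print(round(rcv, 1), is_clinically_significant(8.0, 2.0, 5.0))  # 14.9 False
```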
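For reference, the core Bland-Altman computation is sketched below in plain Python: the mean of the paired differences (bias) and the 95% limits of agreement, bias ± 1.96 SD. Plotting is omitted, and the sketch deliberately models no proportional component, which is exactly the limitation discussed above; the function name and data are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reference, test, z=1.96):
    """Mean bias and limits of agreement (bias +/- z*SD) of paired differences."""
    diffs = [t - r for r, t in zip(reference, test)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical paired results without and with the factor applied
without_factor = [3.8, 4.0, 4.1, 4.3, 4.5, 4.6]
with_factor = [3.9, 4.2, 4.1, 4.5, 4.6, 4.8]
bias, lower, upper = bland_altman(without_factor, with_factor)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```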
Delivering the evidence
A pre-analytical investigation of phlebotomy should aim to deliver information of practical relevance, and should therefore be designed to reach non-academic readers as well. This makes accessibility a major objective, and authors should consider the impact of their research on the decision-making of the phlebotomists who read it. In this regard, it is advisable to report the P-value alongside the confidence interval and the level of clinical significance, as this combination has been shown to produce the highest rate of correct interpretation of results (92, 93).
General recommendations on scientific writing should be considered mandatory for these studies (7). However, two specific recommendations concern the discussion section. First, the authors should take care to emphasize the supposed mechanisms behind the results, especially with respect to their relevance for the actual practice of phlebotomy and current operative procedures. Second, they should avoid mentioning statistical aspects related to the results so as not to distract the reader, leaving such aspects to footnotes or appendices to the main text.
In phlebotomy, operative procedures are fundamental for appropriate patient management and reliable clinical testing, and the level of compliance they can reach depends on their simplicity and effectiveness (94, 95). Academic research has a pivotal role in this, and pursuing standardization should be considered part of the consolidation process undertaken by any research field aware of its scope. In practice, this means achieving unity in the methodological paradigm in order to attain effectiveness, with a concise, consistent and efficient production of cumulative knowledge (96). Thus, issuing recommendations (summarized in Table 2) should be regarded as the first step in such cultural growth.
In this work, we mainly discussed statistical methodology, trying to identify the specific concerns of pre-analytical investigations of phlebotomy. Once we had completed the review, the body of publications appeared highly heterogeneous, with some redundancies within the statistical framework. We therefore considered it appropriate to prune the existing framework inherited from method comparison studies, based on evidence that it was not always properly or fully replicated in all its fundamental parts (97). Perhaps, lying between the clinic and the laboratory, phlebotomy has long struggled to gain its own identity in the scientific literature and the same reputation as laboratory assays. For instance, it is symptomatic that just two of the six papers published in 1996-2000 used some kind of difference plot, and neither of them mentioned the Bland-Altman eponym nor cited the original paper (which, incidentally, had already received 3914 citations according to Scopus by that time) (19, 20). Conversely, in recent years, mostly the last five (see Figure 1), we observed a change in trend and growing attention paid to this kind of research. We probably owe this to the efforts made to highlight the cardinal role and pre-analytical relevance of phlebotomy in modern laboratory medicine, making it a major concern (1, 98).
Some aspects of this work could be regarded as limitations, for which we would like to provide a justification. First, we did not structure this work as a systematic review and relied on PubMed MEDLINE alone, so it could be argued that a certain bias of partiality arose. However, we were concerned with the way scientific information was produced and delivered, not with its use for generating meta-analytical results. PubMed was our target because it is a comprehensive health information resource that academic readers prefer to query over other databases (99, 100). Thus, merging the search results of different sources would have meant deviating from the perspective of most potential readers, introducing a bias of liberality instead.
Second, it could be objected that the pruning was an arbitrary choice of suitable techniques, not based on consensus and favouring more complicated statistical procedures. In fact, the rationale was to counter the unnecessary multiplication of methods within the statistical framework, mostly caused by their customary use. For instance, we proposed handling the Bland-Altman plot with care, although it is regarded as a mainstay of the comparative paradigm (97). Interestingly, it has already been recognized that its celebrated simplicity is no guarantee of appropriateness and homogeneity in its use and diffusion (101).
In the future, the critical process initiated by issuing these recommendations should culminate in the development of a complete checklist dedicated to pre-analytical investigations (not only those strictly concerning phlebotomy), similar to the Standards for Reporting Diagnostic Accuracy (STARD) checklist. Potentially, this would consolidate the contribution of this research field to both laboratory quality and patient safety (102). However, adherence to strict requirements represents an additional effort in managing a study, which may be experienced as impractical if the peer-review process does not encourage compliance and research quality is not rewarded with an increased citation rate (103-105). The experience gained with STARD has shown how much these factors hindered the consolidation of such a new paradigm, despite the wide resonance it had in the scientific literature (106-109). Clearly, authors, peer reviewers and journals must act in concert for any new concept to gain acceptance and spread (110-113).
What is outlined above can only be a slow process of growth that demands collective awareness and a positive disposition to reach maturity. Indeed, we need to step out of the comfort zone of customary practice to follow that growth.