Standardization in laboratory medicine: Two years' experience from category 1 EQA programs in Spain.

Introduction
Standardization is the ability to obtain interchangeable results leading to the same medical interpretation. External quality assessment (EQA) is the main support of the on-going harmonization initiatives. The aim of this study was to evaluate the results obtained from two years of experience with a category 1 EQA program in Spain and to determine the impact of applying this type of EQA program on analytical standardization.


Materials and methods
Groups were established according to analytical method, traceability and instrument, and their results were evaluated by calculating the mean, the coefficient of variation and the percent deviation from the reference value. The analytical performance specifications used to evaluate the results were derived from biological variation for bias and from the inter-laboratory coefficients of variation found in a previous pilot study.


Results
Only creatinine measured by enzymatic methods gave excellent results, although few laboratories used this method. Creatine kinase and GGT gave good precision and bias in all but one of the instruments studied. For the remaining analytes (ALT, ALP, AST, bilirubin, calcium, chloride, glucose, magnesium, potassium, sodium, total protein and urate) some improvement is still necessary to achieve satisfactory standardization in our setting.


Conclusions
The two years of category 1 EQA program experience in Spain have revealed a lack of standardization of the 17 most frequent biochemistry tests used in our laboratories. Based on this information, we recommend abandoning methods such as ALT and AST without exogenous pyridoxal phosphate and the Jaffe method for creatinine, and avoiding non-commutable calibrators, such as aqueous solutions for calcium and sodium.


Introduction
Ricós C. et al. Standardization with category 1 EQA
The main objective of the clinical laboratory is to provide clear, reliable and useful information for clinical decision-making. Current healthcare systems imply performing laboratory tests in different locations, so standardization among laboratories becomes one of the cornerstones of quality patient care. Standardization can be defined as the ability to obtain interchangeable results (within certain analytical quality uncertainty) in order to achieve the same medical decision, regardless of the analytical procedure (method, traceability and instrument), measurement units and reference intervals.
The standardization should be based on six basic pillars, which include in vitro diagnostic companies, reference materials, reference methods, reference laboratories, medical laboratories and external quality assessment (EQA) organizations (1).
Recently, Greaves noted that EQA is not just a pillar but the central support for on-going harmonization (2). Discordance in results between laboratories and methods should no longer be accepted.
It is widely accepted that the best strategy to organize an EQA scheme is to use fresh frozen commutable control samples with values assigned by reference laboratories using reference methods, which can be found on www.harmonization.net (3,4).
The Spanish Society of Laboratory Medicine (SEQCML) is a non-profit scientific organization that has been providing EQA schemes in Spain since 1980 by using stabilized control materials. Since 2013 a category 1 program has been organized for basic biochemistry analytes. According to Miller et al., this kind of program distributes commutable control materials with values assigned by a reference measurement procedure (RMP), and replicate samples are tested in the surveys (3). Accuracy of individual laboratories is assessed by comparison with the RMP, reproducibility is checked both intra- and inter-laboratory, and standardization is assessed by comparing the calibration traceability of the measurement procedure with the RMP. Two initial surveys were performed in 2013 and 2014 as preliminary experiences, and regular annual surveys have been organized since 2015. For a proper assessment of bias, adequate information on the traceability of measurements is therefore crucial (5,6).
Another important aspect to consider is the analytical performance specification (APS) or acceptability limits selected for the evaluation of the derived results. When APS are based on biological variation (BV), it is highly recommended to use the gradual classification of APS according to its strictness: optimal, desirable and minimal (7). It should be noted that the APS grade could be selected according to the limitations of the current state of the art, being defined as the performance achieved by about 80% of laboratories. According to this criterion, in this study the minimal BV-based APS grade was selected for electrolytes evaluation, while desirable BV APS were chosen for enzymes and substrates.
In this regard, a performance worse than the minimum APS should alert the laboratory that its results could be at risk and clinical decision-making might be detrimentally affected. Likewise, a performance reaching only the minimal grade suggests that further improvement may be beneficial for patients (8,9).
The aim of this work is to evaluate the results obtained from two years of the category 1 EQA program (2015 and 2016 surveys) performed in our country and to assess the impact of applying this kind of EQA program on analytical standardization. Evaluation is based on the inter-laboratory imprecision and the bias of the peer group means compared with the reference method values (11,12). Throughout the years commutability has been monitored by including a native, single donation spy-sample (10,12).

Materials and methods
Six vials of fresh frozen human serum pools at different concentrations were distributed once per year in a single express shipment at -80 ºC and delivered within 24 hours to laboratories all over Spain. Different lots at different concentrations were provided for each of the two surveys. Participant laboratories were requested to maintain samples at -20 ºC until analysis, which had to be performed within the following 14 days. Each vial had to be analysed in duplicate, one vial per day, for 6 consecutive days whenever possible. Results were registered on the SEQCML-EQA website, in order to be evaluated both individually and globally.
A preliminary 2013 survey was carried out in 19 laboratories to ascertain whether the logistics of managing a non-stabilized set of control materials was workable in our country. No incidents with temperature maintenance were observed between delivery of the control materials by the provider and analysis in the laboratory.
Another point of interest of this preliminary survey was to explore whether laboratories could adequately report their analytical traceability to standards. Important difficulties were found, which prompted a meeting between the EQA organization and providers to request clear and complete information on the traceability of calibrators.
In 2014 the first survey was performed as part of a pilot European study (INPUTs) (Italy, The Netherlands, Portugal, Spain and The United Kingdom), with a total of 20 participating laboratories; its results have already been published (12,13).
Only about 45% of participants were able to report their traceability correctly, so these results are not shown in this study. This survey was therefore considered a pilot to identify the problems that could affect EQA participation and the further interpretation of results. For both surveys, as well as for those performed in 2015 and 2016, the same sample management protocol was applied. Results were categorized by measurement procedure, traceability and instrument. The description of standard materials used by participants for calibration traceability is shown in Table 2. Participant laboratories using the same combination of these three elements were considered as a peer group.
The peer groups and the number of laboratories included for each analyte are shown in Figures 1-17.
Compared to 2015, a new instrument was incorporated in the 2016 survey (Bio-systems BA 400), with only 6 participating laboratories. The overall evaluation of the 2015 survey was published on the SEQCML website and was presented at the 2016 EQALM annual meeting (13,15). Only groups formed by 5 or more final laboratories were considered in this study.
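As an illustration of the peer-group definition used here, the following minimal Python sketch groups participant records by measurement procedure, traceability and instrument, and keeps only groups with 5 or more laboratories. The record layout, the field names (`lab_id`, `method`, `traceability`, `instrument`) and the function name are hypothetical, chosen only for this example; they do not describe the SEQCML data format.

```python
from collections import defaultdict

MIN_LABS = 5  # only groups formed by 5 or more laboratories are considered


def build_peer_groups(records, min_labs=MIN_LABS):
    """Group laboratories by (method, traceability, instrument).

    `records` is an iterable of dicts with the hypothetical keys
    'lab_id', 'method', 'traceability' and 'instrument'.
    Returns a dict mapping each peer-group key to a sorted list of lab ids,
    keeping only groups with at least `min_labs` distinct laboratories.
    """
    groups = defaultdict(set)
    for rec in records:
        key = (rec["method"], rec["traceability"], rec["instrument"])
        groups[key].add(rec["lab_id"])
    return {
        key: sorted(labs)
        for key, labs in groups.items()
        if len(labs) >= min_labs
    }
```

Using a set per group means that a laboratory reporting several results for the same combination is counted once.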
Inter-laboratory imprecision was calculated by averaging the coefficients of variation (CV) obtained from the six controls distributed in the 2015 and 2016 surveys, and was compared with the best (Dutch) inter-laboratory CV derived from the 2014 pilot study, which used six similar commutable control materials (16).
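The averaging step can be sketched in Python as follows. This is a minimal illustration, assuming one list of laboratory results per control sample within a peer group and the sample standard deviation (n - 1 denominator); the function names and the data layout are hypothetical, and the source does not state which standard deviation estimator was used.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


def mean_interlab_cv(results_by_control):
    """Average the inter-laboratory CV over the control samples of a survey.

    `results_by_control` maps a control identifier to the list of results
    reported by the laboratories of one peer group (hypothetical layout).
    """
    return mean([cv_percent(v) for v in results_by_control.values()])
```

The resulting mean CV would then be compared with the inter-laboratory specification taken from the 2014 pilot study.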
Bias was calculated as the percent difference between the peer group mean (same measurement procedure, traceability and instrument) and the reference value. The analytical performance specification applied for bias evaluation was based on the BV data collected in the online 2014 database, which had been elaborated as detailed by Ricós et al., applying the minimum level of requirement for electrolytes and the desirable level for substrates and enzymes (17-19).
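A short Python sketch of this calculation follows. The percent bias is taken directly from the definition above. The allowable bias uses the conventional biological-variation model, in which the limit is a fraction (0.125, 0.25 or 0.375 for the optimal, desirable and minimum grades) of the combined within- and between-subject variation; the source does not spell out this formula, so its use here is an assumption, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean

# Conventional multipliers for BV-based bias specifications (assumed, not quoted from the source).
BIAS_FACTORS = {"optimal": 0.125, "desirable": 0.25, "minimum": 0.375}


def percent_bias(peer_results, reference_value):
    """Percent deviation of the peer-group mean from the reference value."""
    return 100.0 * (mean(peer_results) - reference_value) / reference_value


def allowable_bias(cv_within, cv_between, grade="desirable"):
    """Allowable bias (%) from within- and between-subject biological variation (%)."""
    return BIAS_FACTORS[grade] * sqrt(cv_within ** 2 + cv_between ** 2)
```

For example, a peer group with a mean of 102 against a reference value of 100 has a bias of +2%, which would be compared with the limit for the grade chosen for that analyte.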
The results of this study were examined with particular focus on the most common analytical procedures used in Spain and their contribution to the non-comparable results detected through participation in category 1 EQA schemes.
Standardization is defined by the attainment of inter-laboratory imprecision within the predefined APS and peer group bias (% mean deviation to the reference value) below the allowed bias derived from BV.
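This criterion can be expressed as a simple decision rule. The sketch below is a minimal Python illustration, assuming that a peer group is rated "OK" only when both criteria are met in every survey evaluated and "TI" (to improve) otherwise, and that results exactly at a limit are accepted. The dictionary keys and function names are hypothetical.

```python
def meets_aps(cv, aps_cv, bias, aps_bias):
    """True when imprecision and absolute bias are both within their limits (all in %)."""
    return cv <= aps_cv and abs(bias) <= aps_bias


def classify_peer_group(surveys):
    """Return 'OK' if every survey meets both specifications, otherwise 'TI'.

    `surveys` is a list of dicts with the hypothetical keys
    'cv', 'aps_cv', 'bias' and 'aps_bias'.
    """
    return "OK" if all(meets_aps(**s) for s in surveys) else "TI"
```

With this rule, a group that meets the specifications in 2015 but exceeds the allowable bias in 2016 is rated "TI".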

Results
All results exceeding the mean ± 3 standard deviations of each group were rejected as outliers. The number of rejected participant laboratories was 5 for the 2015 survey and 10 for the 2016 survey. Moreover, 30 results for lactate dehydrogenase (LD), which were 100% higher than the others due to the different substrate (pyruvate instead of lactate), were also excluded from the study. Results for bias are presented in Figures 1-17. Results for the inter-laboratory imprecision of each peer group for electrolytes, enzymes and substrates are presented in Tables 3-5 and compared with the APS for inter-laboratory imprecision (APSIL) from the pilot 2014 survey (16). An overview of the standardization achieved in our setting, according to the bias and the imprecision calculated for instruments, is presented in Table 6.
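The outlier rule can be sketched in Python as follows. This is a minimal illustration, assuming a single pass over each peer group and the sample standard deviation; the source does not state whether the rejection was repeated after recalculating the mean, and the function name is hypothetical.

```python
from statistics import mean, stdev


def reject_outliers(values, k=3.0):
    """Split results into (kept, rejected) using the mean +/- k SD rule, single pass."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation; needs at least two values
    kept = [v for v in values if abs(v - m) <= k * s]
    rejected = [v for v in values if abs(v - m) > k * s]
    return kept, rejected
```

Because an extreme value inflates the standard deviation of a small group, this rule can only flag a single outlier when the group has about 11 or more results, which is one reason some schemes prefer robust estimators.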

Discussion
The percentage of laboratories excluded was higher in 2016 than in 2015 because the traceability-instrument combinations were better known, so groups were more specific in 2016. This cannot be considered a disadvantage. The results of this study are discussed in the light of their impact on the proposed aims. These impacts are classified as positive, negative, or requiring dialogue with providers.
The main positive impacts, which indicate adequate standardization with no need for further improvement, apply to potassium and creatine kinase (CK). Potassium shows inter-laboratory imprecision and bias (Figure 4) within the allowable limits for almost all peer groups. For the remaining electrolytes, good inter-laboratory imprecision can also be seen, in good agreement with the 2014 survey (performed in collaboration with other European countries), where all participant laboratories and manufacturers fulfilled the APS for total analytical error at the minimum performance level (20). Creatine kinase shows good inter-laboratory imprecision and bias (Figure 10), except for the new group [...] 5), produces low results. Lack of commutability of calibration traceability materials was described by Panteghini and Ambruster as a crucial factor affecting standardization in medical laboratories (21,22).
Instrument-dependent problems can be seen in this study for alkaline phosphatase (ALP), with low results for Roche users (Figure 6) even though all participants use the same method and traceability; this causes an important lack of standardization in our country because Roche is the largest group. Similar results were reported by Braga et al. and Aloisio et al., who observed discrepancies among Abbott Architect users related to an "experimental" calibration factor provided by the manufacturer (23,24). Non-standardized ALP results could have a great impact in some clinical scenarios such as hypophosphatemia diagnosis, so improving the traceability of results becomes a crucial objective (25). Method-dependent problems are seen in four cases.
Firstly, for amylase, all groups using the malto-heptaoside (G7) substrate, as well as the malto-trioside (G3) group of Abbott Architect, show harmonized results. The remaining G3 groups have unacceptable negative bias (Figure 7). This lack of standardization affects one third of the participants of this study, thus producing a considerable impact on healthcare in our country.
Alanine aminotransferase (ALT) and aspartate aminotransferase (AST) testing show unacceptable inter-laboratory imprecision and bias (low results) (Figures 8 and 9) for laboratories that did not add pyridoxal-5-phosphate (P5P) in their measurement procedure. Infusino et al. and Jansen et al. reported that when reagent is supplemented with P5P the ratio of preformed holoenzyme to apoenzyme differs among specimens (12,26).
For gamma glutamyl transferase (GGT), all groups using the substrate γ-glutamyl-3-carboxy-4-nitroanilide at > 4 mmol/L have good precision and bias; however, the Siemens Dimension Vista group, which uses a different substrate concentration (< 4 mmol/L), produces unacceptably high results (Figure 11).
Lastly, creatinine shows good inter-laboratory CV. However, only enzymatic methods have good bias over the entire concentration range studied, whereas most of the Jaffe-based measurements produce unacceptably high results at low-normal concentrations (≤ 50 μmol/L) and some of them show inconsistent bias across the two surveys evaluated (Figure 14). Part of the 2015 results had been previously published and is in accordance with the 2016 survey, as well as with Jassam et al., who observed that the Abbott compensated and Jaffe methods were the most affected by glucose interference, resulting in either under- or over-estimation of GFR, which may also lead to errors in the classification of chronic kidney disease (20,27,28). Likewise, data reported by Panteghini showed an 18 μmol/L positive bias derived from the Jaffe-based method on a Beckman AU 2710 instrument (29). These results are especially relevant for the paediatric population. Our results show that, in consecutive years, the Jaffe method produced falsely high results at low-normal concentrations in all the instruments used in our country. Consequently, creatinine is not standardized in our setting and, considering the associated clinical implications, the Jaffe method should be abandoned.

Table 6. Overview of achieved results toward standardization in our setting
Analyte                  Result per instrument (one column per instrument)
ALP                      TI   OK   TI   TI   TI   OK
ALT                      TI   TI   TI   TI   TI   OK
Amylase                  OK   OK   TI   OK   TI   TI
AST                      TI   TI   TI   TI   TI   TI
Bilirubin                TI   TI   TI   TI   TI   TI
Calcium                  TI   TI   TI   TI   TI   TI
Chloride                 OK   TI   TI   TI   TI   OK
CK                       OK   OK   TI   OK   OK   OK
Creatinine, enzymatic    -    -    -    OK   -    OK
Creatinine, Jaffe        TI   TI   TI   TI   TI   TI
GGT                      OK   OK   OK   OK   OK   TI
Glucose                  TI   TI   TI   TI   TI   TI
LD                       OK   TI   -    TI   TI   TI
Magnesium                TI   TI   TI   TI   TI   TI
Potassium                OK   OK   TI   OK   OK   OK
Total protein            TI   TI   TI   TI   TI   TI
Sodium                   TI   TI   TI   TI   TI   TI
Urate                    OK   TI   TI   OK   OK   TI
OK: Bias and inter-laboratory imprecision achieve the APS. TI: To improve because either bias or inter-laboratory imprecision does not reach the APS in both or in one of the two surveys evaluated. *BA400 group (Bio-systems) began its participation in the 2016 survey. Only instruments with more than 5 participating laboratories are shown in this table. ALP - alkaline phosphatase. ALT - alanine aminotransferase. AST - aspartate aminotransferase. CK - creatine kinase. GGT - gamma glutamyl transferase. LD - lactate dehydrogenase.

Dialogue with providers is of utmost necessity in several cases. The main negative issue is the lack of adequate information about the calibration traceability of the measurement procedure; this circumstance was observed to affect 55% of participating laboratories in 2015.
In order to address and minimize this issue, the SEQCML-Analytical Quality Commission promoted regular and specific meetings with providers and held educational communications and workshops in national laboratory congresses (5,6). This effort seems to have been worthwhile, as the percentage of wrongly coded traceability decreased from 55% to 20% in 2016.
Some in vitro diagnostic medical device providers reported their methods for ALT and AST as "IFCC traceable" when no P5P was added; this created a high incidence of wrong coding by laboratory workers, which was resolved and recorded by SEQCML after informing providers and users of this circumstance.
Lactate dehydrogenase measurements gave good inter-laboratory CV in the 2015 survey but not in 2016; the reason for this remains unknown and should be discussed with providers. Bias showed an interesting improvement, resulting in satisfactory results for all users of the lactate-to-pyruvate based measurement in the 2016 survey (Figure 12).
Our findings for bilirubin, chloride, glucose, magnesium (irregular inter-laboratory CV and bias), as well as total protein and urate (good inter-laboratory imprecision, but irregular bias) led us to the opinion that a dialogue with providers would be necessary for improving standardization in our country.
A limitation of this study is the reduced number of participants in certain groups, because this program is still poorly known by many Spanish laboratories. Consequently, one symposium, various workshops in the national congress and specific meetings were organized in 2017, a book was written in 2018 and other educational activities are planned for the future to overcome this limitation.
Another drawback might be that there is a single exercise per year; this might not be enough to guarantee trueness for the rest of the year. Because of the economic difficulty of making more distributions of these control materials during the year, laboratories in Spain could use our regular EQA schemes (stabilized materials, peer group evaluation, one sample per month) to verify whether their analytical performance is maintained throughout the year.

Conclusions
The two years of category 1 EQA program experience in our country have revealed a lack of standardization of the 17 most frequent general biochemistry tests used in our laboratories. The application of this kind of EQA program allows estimating the bias of each measurement procedure-traceability-instrument combination in a way that can be extrapolated to what happens with real patient samples. Based on the information obtained from the category 1 EQA program on the lack of standardization, we recommend abandoning methods such as ALT and AST without exogenous pyridoxal phosphate, the Jaffe method for creatinine and pyruvate-lactate for LD, and avoiding non-commutable calibrators, such as aqueous solutions for calcium and sodium.

Potential conflict of interest
None declared.