Vitamin D (VitD) deficiency results in abnormalities of bone metabolism such as rickets, osteomalacia, and osteoporosis. It has also been associated with a variety of non-skeletal conditions, including cancer, diabetes, and autoimmune and cardiovascular diseases (1, 2). This has increased the demand for VitD testing and prompted clinical laboratories to seek accurate and simple methods suitable for routine measurement (3).
Immunoassays, liquid chromatography tandem mass spectrometry (LC-MS/MS), and high-performance liquid chromatography are the analytical techniques currently used to measure 25-hydroxyvitamin D (25(OH)D) (4). Early immunoassays performed poorly in terms of accuracy, precision, linearity, and agreement with reference methods. Poor antibody specificity with cross-reactivity to other VitD metabolites, incomplete extraction of 25(OH)D from the vitamin D binding protein, and confounding matrix substances such as lipids have been proposed as reasons for the significant differences in 25(OH)D results between assays. These inter-assay variations caused confusion in clinical diagnosis. Chromatography/mass spectrometry-based assays, on the other hand, are highly sensitive and specific for serum 25(OH)D, but high equipment costs and complex measurement procedures limit their widespread use. Clinical laboratories therefore rely largely on automated immunoassays with improved accuracy and precision (5). To improve the quality of 25(OH)D testing, the identification of a reference method and a reference standard has been debated over the last decade. An international effort to standardise the measurement of 25(OH)D and its metabolites is currently led by the Vitamin D Standardization Program (VDSP), established in November 2010 by the National Institutes of Health Office of Dietary Supplements, the Centers for Disease Control and Prevention, the National Institute of Standards and Technology (NIST), and Ghent University (6). NIST standard reference materials (SRM 2972 and SRM 972) have been proposed to improve the traceability and harmonization of 25(OH)D assays. Recently, new kits calibrated against these reference materials have been introduced, and existing kits have been recalibrated, to reduce inter-laboratory variability.
The Access2 25(OH)D assay is a competitive chemiluminescent enzyme immunoassay, and the Roche 25(OH)D total assay is a competitive electrochemiluminescence immunoassay; both are presented by their manufacturers as traceable to a new NIST standard.
The hypothesis of this study was that these newly calibrated immunoassays would minimise inter-laboratory variation and thereby improve clinical decision-making. We aimed to assess the analytical performance of the newly developed Access2 25(OH)D assay on two analysers, the DxI 800 and the Access2 (Beckman Coulter, Brea, CA, USA), and to compare these two systems and the recalibrated Roche 25(OH)D assay (Roche Diagnostics, Penzberg, Germany) with a reference LC-MS/MS method in order to evaluate their effect on clinical decisions.
Materials and methods
Study design and subjects
This is an analytical method evaluation study in which the records of outpatients attending our institute on two consecutive days were reviewed. Adult patients (> 18 years) with no pathological laboratory results and taking no medications were selected, and 124 remnant serum samples were included in the study. After LC-MS/MS measurement, 4 samples with measurable 25-hydroxyergocalciferol (25(OH)D2) concentrations were excluded because they were too few to form a subgroup. One sample above the measuring range of the E170 system was also excluded. The remaining 119 patient samples (mean age 58 years, range 18–80 years; 54 females (45%) and 65 males (55%)), covering a wide range of 25(OH)D concentrations (3.99–178.20 nmol/L), were evaluated.
Blood was drawn from the antecubital vein between 8:00 and 10:00 am after an overnight fast into 5 mL BD Vacutainer Serum Separating Tubes II Advance (Lot 6197520) (Becton, Dickinson and Company, BD Plymouth PL6 7BP, UK). Samples were centrifuged at 2000 × g for 10 minutes, divided into four aliquots, stored at −20 °C for a maximum of 15 days, and analysed in batches on the four systems. The analytical performance of the newly developed 25(OH)D assay was assessed on two platforms, the UniCel DxI 800 and the Access2: the limit of blank (LoB), limit of detection (LoD), limit of quantitation (LoQ), linearity, interference, and carryover studies were performed on both Beckman platforms. The 25(OH)D assay on the E170 (Roche Diagnostics, Penzberg, Germany) was included in the precision, accuracy, and method comparison studies, since the acceptable performance criteria were based on accuracy and imprecision. The other analytical performance characteristics were taken from the manufacturer for the Roche assay and from the reference laboratory for LC-MS/MS.
All studies were done according to the Clinical & Laboratory Standards Institute (CLSI) Evaluation Protocols (EP) specific to each parameter.
LC-MS/MS measurements were performed at Centro Laboratories, a certified clinical laboratory in Istanbul. The immunoassay measurements on the three systems were performed in the biochemistry laboratory of Dr. Lütfi Kırdar Kartal Research and Training Hospital between March and May 2016. This study was approved by the Ethical Committee of our institution.
Dedicated applications of the Access2 25(OH)D total assay were designed for both the DxI 800 (Cat. No. A98856) and the Access2 (Cat. No. B24838). The total coefficients of variation (CV) provided by the manufacturer were 9.3% at 38.94 nmol/L and 5.6% at 399 nmol/L for the DxI 800, and 7.5% at 61.4 nmol/L and 6.1% at 353 nmol/L for the Access2. The linear range was given as 4.99–524 nmol/L for the DxI 800 and 4.99–416 nmol/L for the Access2. The CVs for the Roche 25(OH)D assay (Cat. No. 05894913) provided by the manufacturer were 6.8% at 20.4 nmol/L and 3.7% at 174 nmol/L, with a linear range of 7.5–175 nmol/L. Both the Access2 and Roche assays are claimed to measure 25(OH)D2 and 25-hydroxycholecalciferol (25(OH)D3) together as total 25(OH)D and to be traceable to the NIST reference material.
LC-MS/MS reference measurements were performed with an in-house method at Centro Laboratories on a Triple Quad 4500 analyser (AB SCIEX, Framingham, USA) using a Phenomenex Kinetex 2.6 µm C8 100 Å LC column. The method quantifies 25(OH)D3 and 25(OH)D2 separately using atmospheric pressure chemical ionization and a deuterated internal standard. The system was calibrated with the 6PLUS1 25(OH)D3/25(OH)D2 multilevel serum calibrator (0, 13, 39, 76, 152, 262, and 357 nmol/L) (ref. no. 62039, Chromsystems, Munich, Germany), traceable to the NIST SRM 972 reference material. The method was linear over 9.98–374.4 nmol/L, with inter-assay CVs of 5.5% at 43.5 nmol/L and 4.2% at 95.5 nmol/L. The laboratory participates in the Vitamin D External Quality Assessment Scheme (DEQAS). Its five most recent monthly external control results showed biases of −4.6%, −0.8%, 9.4%, −1.1%, and −2.7% from the target value, with peer group CVs of 10.9%, 9.9%, 12.6%, 10.7%, and 11.1%, respectively.
Assay performance studies
Imprecision (within-run, between-run, between-day, and total CV) was determined using two serum pools with low (32.2 nmol/L) and high (109.5 nmol/L) 25(OH)D concentrations, selected on the basis of patient test results measured in our laboratory. According to the CLSI guideline for precision testing (EP5-A), each pool was tested in duplicate in two runs per day, at least 4 hours apart, for 20 days. Total CVs were calculated for the three immunoassay systems. The acceptance criterion for imprecision was CV ≤ 10% (7).
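The total CV from the 20-day protocol can be reproduced as a nested analysis of variance. The Python sketch below assumes the layout described above (20 days × 2 runs × 2 replicates per pool) and uses the EP5-A variance-component estimators as we understand them; the function name `ep5_precision` and the nested-list input format are illustrative only, and EP Evaluator may differ in details such as outlier handling.

```python
import math
from statistics import mean


def ep5_precision(data):
    """Sketch of EP5-style precision estimates.

    data[day][run] = (replicate_1, replicate_2); two runs per day assumed.
    Returns within-run, between-run, between-day and total SDs and total CV (%).
    """
    n_days = len(data)
    # Within-run (repeatability) variance from the duplicate differences.
    sr2 = sum((r[0] - r[1]) ** 2 for day in data for r in day) / (4 * n_days)
    # Run means, day means and grand mean.
    run_means = [[mean(r) for r in day] for day in data]
    day_means = [mean(rm) for rm in run_means]
    grand_mean = mean(day_means)
    a2 = sum((rm[0] - rm[1]) ** 2 for rm in run_means) / (2 * n_days)
    b2 = sum((dm - grand_mean) ** 2 for dm in day_means) / (n_days - 1)
    # Variance components; negative estimates are truncated at zero.
    s_run2 = max(a2 - sr2 / 2, 0.0)
    s_day2 = max(b2 - a2 / 2, 0.0)
    s_total = math.sqrt(sr2 + s_run2 + s_day2)
    return {
        "mean": grand_mean,
        "sd_within_run": math.sqrt(sr2),
        "sd_between_run": math.sqrt(s_run2),
        "sd_between_day": math.sqrt(s_day2),
        "sd_total": s_total,
        "cv_total_pct": 100 * s_total / grand_mean,
    }
```

Applying the acceptance criterion above then amounts to checking `ep5_precision(pool_results)["cv_total_pct"] <= 10` for each pool, where `pool_results` is a hypothetical name for one pool's 20 days of data.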
Three samples from the Randox International Quality Assessment Scheme (RIQAS) monthly immunoassay external quality assessment programme (code RQ9130) were used to evaluate accuracy. The first, second, and third samples of Cycle 14 were each analysed once with all methods. The percent difference from the published target mean was calculated as ((result − mean) / mean) × 100, and the acceptable accuracy limit for RIQAS was 30.2%.
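The RIQAS accuracy check reduces to a single formula and a threshold. The sketch below implements the percent difference exactly as written above; treating the 30.2% limit as symmetric (|difference| ≤ 30.2%) is our assumption, and the function names are hypothetical.

```python
RIQAS_LIMIT_PCT = 30.2  # acceptable accuracy limit stated in the text


def percent_difference_from_target(result, target_mean):
    """((result - mean) / mean) x 100 against the published target mean."""
    return (result - target_mean) / target_mean * 100


def accuracy_acceptable(result, target_mean, limit_pct=RIQAS_LIMIT_PCT):
    """Assumption: the limit applies symmetrically to positive and negative bias."""
    return abs(percent_difference_from_target(result, target_mean)) <= limit_pct
```

For an illustrative result of 60 nmol/L against a target mean of 75 nmol/L, the difference is −20%, which falls within the limit.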
LoB, LoD, and LoQ
Studies were performed according to CLSI EP17 (8). LoB was determined by analysing 20 replicates of the manufacturer's zero calibrator and was calculated using the following formula:
LoB = mean(blank) + 1.645 × SD(blank).
LoD was determined using the lowest non-zero calibrator (14.976 nmol/L) diluted twofold (1/2), of which 20 replicates were analysed. LoD was calculated using the following formula:
LoD = LoB + 1.645 × SD(low-concentration sample).
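The two formulas above translate directly into code. This is a minimal sketch of the parametric form only; the full EP17 protocol may apply a small-sample correction to the 1.645 multiplier and also allows a nonparametric LoB, neither of which is modelled here. Function names are illustrative.

```python
from statistics import mean, stdev

Z_95 = 1.645  # one-sided 95th percentile of the standard normal distribution


def limit_of_blank(blank_results):
    """LoB = mean(blank) + 1.645 x SD(blank), e.g. from 20 zero-calibrator replicates."""
    return mean(blank_results) + Z_95 * stdev(blank_results)


def limit_of_detection(lob, low_sample_results):
    """LoD = LoB + 1.645 x SD(low-concentration sample)."""
    return lob + Z_95 * stdev(low_sample_results)
```

In use, the 20 zero-calibrator results give the LoB, which is then passed together with the 20 results of the diluted low calibrator to obtain the LoD.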
The LoQ study was performed by measuring six samples with concentrations from 4.822 to 38.281 nmol/L, bracketing the LoQ stated by the manufacturer (14.976 nmol/L), over 10 days, and their CVs were determined. LoQ was defined as the concentration at which the fitted precision profile (CV versus concentration) crosses the 20% CV line.
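The text does not state which curve EP Evaluator fits to the precision profile. The sketch below assumes a power model, CV = a·c^b, fitted by least squares on the log–log scale, and solves for the concentration at which the fitted CV equals 20%; both the model choice and the function name are assumptions made for illustration.

```python
import math


def loq_from_precision_profile(concentrations, cvs_pct, target_cv_pct=20.0):
    """Assumed power-model precision profile: CV = a * c**b (fitted on log-log scale)."""
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(cv) for cv in cvs_pct]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx  # log-log slope; CV should fall as concentration rises
    if b >= 0:
        raise ValueError("CV does not decrease with concentration; no crossing point")
    log_a = y_bar - b * x_bar
    # Solve log(target) = log_a + b * log(c) for c.
    return math.exp((math.log(target_cv_pct) - log_a) / b)
```

With the six sample concentrations and their 10-day CVs as input, the returned value is the concentration at which the fitted profile crosses the 20% CV line under this assumed model.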
Linearity was assessed according to CLSI EP6 (9). Six concentrations, prepared by diluting the highest standard of each reagent kit and covering 23.5–525 nmol/L for the DxI 800 and 17.5–412 nmol/L for the Access2, were each analysed in triplicate in a single run. Acceptable recovery was within ±15% of the target concentration.
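The recovery criterion can be checked level by level. The sketch below computes each level's expected value from the dilution fraction of the highest standard, averages the triplicate results, and applies the ±15% limit. It covers only the recovery criterion stated above, not the polynomial-fit evaluation that EP6 also describes, and the helper names are hypothetical.

```python
def expected_from_dilution(high_standard, fractions):
    """Expected concentration of each dilution level of the highest standard."""
    return [high_standard * f for f in fractions]


def linearity_recoveries(expected, measured_replicates, tolerance_pct=15.0):
    """Return (expected, mean measured, recovery %, within tolerance) per level."""
    rows = []
    for exp_conc, reps in zip(expected, measured_replicates):
        measured = sum(reps) / len(reps)  # mean of the triplicate results
        recovery = 100 * measured / exp_conc
        rows.append((exp_conc, measured, recovery, abs(recovery - 100) <= tolerance_pct))
    return rows
```

Each row reports whether that dilution level met the ±15% acceptance criterion.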
Interference from haemoglobin, bilirubin, and triglycerides was tested according to CLSI EP7-A (10). All spiked samples were tested in duplicate, and the percent difference for each interferent was calculated from the duplicate means as ((spiked − nonspiked) / nonspiked) × 100. A deviation of more than 10% was considered significant.
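The interference calculation described above is sketched below: the duplicate results are averaged, the percent difference is computed, and any deviation greater than 10% in either direction is flagged. Function names are illustrative.

```python
def interference_pct(spiked_duplicates, nonspiked_duplicates):
    """((spiked - nonspiked) / nonspiked) x 100, using the mean of each duplicate pair."""
    spiked = sum(spiked_duplicates) / len(spiked_duplicates)
    base = sum(nonspiked_duplicates) / len(nonspiked_duplicates)
    return (spiked - base) / base * 100


def is_significant(pct, limit_pct=10.0):
    """A deviation of more than 10% (in either direction) is considered significant."""
    return abs(pct) > limit_pct
```

The same two functions apply to each haemoglobin, bilirubin, and triglyceride level tested.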
For the haemoglobin interference study, 6 mL venous blood samples from healthy volunteers were collected into plastic whole-blood tubes spray-coated with K2EDTA (BD Vacutainer, lot 367863) and centrifuged at 5000 × g for 5 minutes. The plasma was discarded, and the cell pellet was washed three times with 8 mL of physiological saline. The supernatant was discarded, and an equal volume of distilled water was added to the erythrocyte pellet. The tube was kept at −40 °C for 20 minutes to haemolyse the erythrocytes. After a final centrifugation, the haemolysate was collected as the supernatant. Haemoglobin concentration was measured on an LH750 haematology analyser (Beckman Coulter, Brea, CA, USA). Eight haemoglobin concentrations (range 0.25–42 g/L) were spiked into patient serum pools.
For the bilirubin interference study, a bilirubin standard (≥ 98%, Sigma Aldrich B4126, EmM = 60 at 453 nm) was dissolved in chloroform (Merck, M102445.2500), and bilirubin concentrations ranging from 54.72 to 513 μmol/L were spiked into serum pools.
For the triglyceride interference study, the lipid standard Intralipid 20% (Sigma Aldrich) was used to prepare four triglyceride concentrations (7.67–21.8 mmol/L), which were spiked into serum pools.
Sample carryover was evaluated by measuring three replicates of a high-concentration sample (a1, a2, and a3) immediately followed by three replicates of a low-concentration sample (b1, b2, and b3). Carryover (%) was calculated as ((b1 − b3) / (a3 − b3)) × 100, and values < 2% were considered negligible (11, 12).
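The carryover formula uses only the third high replicate and the first and third low replicates. A minimal sketch with illustrative function names:

```python
def carryover_pct(high, low):
    """((b1 - b3) / (a3 - b3)) x 100 for high = (a1, a2, a3), low = (b1, b2, b3)."""
    a3 = high[2]
    b1, b3 = low[0], low[2]
    return (b1 - b3) / (a3 - b3) * 100


def carryover_negligible(pct, limit_pct=2.0):
    """Carryover below 2% is accepted as negligible."""
    return pct < limit_pct
```

As an illustration, high replicates of 100 nmol/L followed by low replicates of 12, 10, and 10 nmol/L give (12 − 10) / (100 − 10) × 100 ≈ 2.2%, which would not be considered negligible.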
Samples with 25(OH)D concentrations within the measuring range of all systems were analysed in a single batch, in duplicate, after the same freeze–thaw cycle. Method comparison studies were performed according to CLSI EP9 (13).
The distribution of the data was assessed by the Kolmogorov–Smirnov test, and results were expressed as median and interquartile range. EP Evaluator Release 9 software (David G. Rhoads Associates, Kennett Square, PA) was used to calculate imprecision, LoB, LoD, LoQ, and linearity. Method comparison data were evaluated with Bland–Altman plots, Passing–Bablok regression, the concordance correlation coefficient (CCC), and kappa (κ) coefficients using MedCalc Statistical Software (version 12, MedCalc Software, Mariakerke, Belgium). Systematic error was considered significant if the 95% confidence interval did not include 1.0 for the slope (proportional error) or 0 for the intercept (constant error).
Diagnostic accuracy was assessed with the κ coefficient. Using 74.88 nmol/L as the cut-off, patients were classified as deficient or non-deficient according to their LC-MS/MS results, and the agreement of each immunoassay with this classification was expressed as κ. κ was interpreted as follows: 0.41–0.60, moderate; 0.61–0.80, substantial; and 0.81–0.99, almost perfect agreement. A κ greater than 0.61 was considered acceptable (14).
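The κ analysis can be sketched as two steps: dichotomising each method's results at the cut-off, then computing Cohen's κ from observed and chance-expected agreement. Treating "deficient" as strictly below 74.88 nmol/L is our assumption, since the text does not state how values exactly at the cut-off were handled; the function names are illustrative.

```python
CUTOFF_NMOL_L = 74.88  # cut-off stated in the text


def classify(values, cutoff=CUTOFF_NMOL_L):
    """True = deficient. Assumption: deficient means strictly below the cut-off."""
    return [v < cutoff for v in values]


def cohen_kappa(reference, test):
    """Cohen's kappa for two binary classifications of the same patients."""
    n = len(reference)
    both_def = sum(1 for r, t in zip(reference, test) if r and t)
    both_non = sum(1 for r, t in zip(reference, test) if not r and not t)
    p_obs = (both_def + both_non) / n  # observed agreement
    ref_def = sum(reference) / n
    test_def = sum(test) / n
    # Agreement expected by chance from the marginal proportions.
    p_exp = ref_def * test_def + (1 - ref_def) * (1 - test_def)
    return (p_obs - p_exp) / (1 - p_exp)
```

In use, `cohen_kappa(classify(lcms_results), classify(immunoassay_results))` gives κ for one immunoassay, where both argument names are hypothetical placeholders for paired result lists.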
The results of the assay performance studies are given in Table 1. The median (2.5th–97.5th percentile) values of the 119 samples were 50.4 (12.7–161.9) nmol/L for Access2, 56.3 (14.5–174) nmol/L for DxI 800, 59.7 (7.4–163) nmol/L for E170, and 67.8 (13.8–174) nmol/L for LC-MS/MS. All three immunoassay systems gave lower results than the reference method. Access2, with the lowest mean, deviated by 22.2% from LC-MS/MS, whereas E170 and DxI 800 deviated by 11% and 10%, respectively. Box-and-whisker plots of the results of the four methods are shown in Figure 1.
The Bland–Altman analysis yielded negative biases for all three immunoassay systems compared with the reference method, and all three biases were significantly different from zero (P < 0.05). However, none was below the 5% limit suggested by VDSP. DxI 800 had the smallest bias (−8.6%) and Access2 the largest (−19.2%, P < 0.001). The E170 system had the widest limits of agreement (−40.1% to 64.5%), indicating the greatest imprecision. Bland–Altman plots are shown in Figure 2.
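A percentage Bland–Altman analysis like the one reported above can be sketched as follows. The text does not specify how the differences were normalised; this sketch assumes (test − reference) expressed as a percentage of the pair mean, with 95% limits of agreement at bias ± 1.96 SD. The function name is hypothetical.

```python
from statistics import mean, stdev


def bland_altman_percent(test, reference):
    """Assumed definition: difference as % of the pair mean; returns (bias, lower LoA, upper LoA)."""
    diffs = [100 * (t - r) / ((t + r) / 2) for t, r in zip(test, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A negative bias from this function corresponds to the immunoassay reading lower than LC-MS/MS on average, and wider limits of agreement reflect greater random scatter between the methods.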
According to Passing–Bablok regression analysis, the DxI 800 and Access2 systems had proportional biases (slopes of 0.878 and 0.748, respectively), whereas the E170 system had a constant bias (intercept −2.797). The E170 system also had the largest random error (residual standard deviation, 5.10). Passing–Bablok regression plots are shown in Figure 3.
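To show how the Passing–Bablok slope and intercept arise, the sketch below computes the point estimates only. It takes the shifted median of all pairwise slopes, excluding slopes of exactly −1 and uninformative identical pairs, and then takes the median of y − b·x. The 95% confidence intervals used in the statistical analysis to judge systematic error are not computed here, and the function name is illustrative.

```python
import math
from statistics import median


def passing_bablok(x, y):
    """Point estimates (slope, intercept) of Passing-Bablok regression; no CIs."""
    slopes = []
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            if dx == 0:
                s = math.inf if dy > 0 else -math.inf
            else:
                s = dy / dx
            if s == -1:
                continue  # slopes of exactly -1 are excluded by the method
            slopes.append(s)
    slopes.sort()
    # K = number of slopes below -1; the median is shifted by K positions.
    k = sum(1 for s in slopes if s < -1)
    m = len(slopes)
    if m % 2 == 1:
        b = slopes[(m + 1) // 2 + k - 1]
    else:
        b = 0.5 * (slopes[m // 2 + k - 1] + slopes[m // 2 + k])
    a = median([yi - b * xi for xi, yi in zip(x, y)])
    return b, a
```

With LC-MS/MS as `x` and an immunoassay as `y`, a slope below 1 indicates proportional negative bias and a non-zero intercept indicates constant bias. Whether either is significant depends on the confidence intervals, which would require additional code or dedicated statistical software.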
In concordance correlation analysis, the DxI 800 and E170 systems showed moderate agreement (CCC = 0.941 and 0.901, respectively).
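Lin's concordance correlation coefficient combines correlation and agreement in one index: CCC = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²). The sketch below uses population (1/n) moments, as in Lin's original definition; MedCalc may use a slightly different variance estimator. The function name is illustrative.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient using population moments."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)
```

Unlike Pearson's r, the CCC is penalised by a constant offset between methods. For example, y = x + 1 on three points gives a perfect correlation but a CCC of only 4/7.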
Kappa coefficients of interrater agreement were moderate for the DxI 800 and E170 systems (κ = 0.709 and 0.771, respectively) and fair for the Access2 system (κ = 0.572).
Compared with Access2, the DxI 800 system showed a positive bias of 12.7% (P < 0.001) with an R value of 0.95 (intercept 2.26 [CI, 1.96–2.55], slope 0.78 [CI, 0.74–0.82], P < 0.001) in regression analysis. Method comparison data are shown in Table 2.
The Access2 25(OH)D assay performed well in the LoB, LoD, LoQ, linearity, interference, and carryover studies on both the DxI 800 and Access2 platforms. Imprecision at both low and high 25(OH)D concentrations was acceptable (< 10%), and accuracy was acceptable for both systems according to the RIQAS criterion (30.2%). Method comparison studies showed a persistent negative bias for all three immunoassays compared with LC-MS/MS; the DxI 800 system had the smallest bias (−8.6%) and the Access2 system the largest (−19.2%). The DxI 800 and E170 systems showed moderate agreement (CCC = 0.941 and 0.901, respectively), whereas agreement for the Access2 system was fair (CCC = 0.854). Using 74.88 nmol/L as the cut-off for insufficiency, the DxI 800 and E170 systems classified insufficient patients moderately well (κ = 0.709 and 0.771, respectively), whereas the Access2 system did so only fairly (κ = 0.572). Compared with Access2, the DxI 800 system showed a positive bias of 12.7% (P < 0.001).
Many previous studies of VitD assays reported significant biases and poor CVs (15-17), and even the VDSP recommendations have contributed little to assay performance and clinical diagnosis (18-20). An important issue in these studies was the lack of definite acceptable performance criteria, with different CV, bias, and accuracy goals set in different studies. Enko et al. used a CV < 20%, citing an opinion of the U.S. Department of Health and Human Services, Food and Drug Administration (19, 21), but did not set an a priori goal for bias. Farrell et al. used CV < 9.1% and bias < 15.8% as performance goals based on biological variation studies (16, 22, 23). Yu Chen used the DEQAS expert opinion criteria of CV < 22% and bias < 10%, and the laboratory data model goals of CV < 15% and bias < 10% proposed by Stöckl et al. (12, 24). Wyness et al. also used CV < 10% and bias < 15.8%, based on biological variation, noting that a bias of < 5% could hardly be attained (25). In our study, the total CV and bias values of all assays were acceptable according to these published criteria, but none achieved a bias of < 5%; the DxI 800 came closest (8.61%).
LC-MS/MS is a precise and reliable method with high sensitivity and specificity, and it can measure 25(OH)D2 and 25(OH)D3 separately. In the past, however, LC-MS/MS methods did not agree well with one another, and in a recent study, routine LC-MS/MS measurements were 11.2% higher than those of the standard reference procedure (26, 27). VDSP has also identified a standard reference procedure for this method and expects routine LC-MS/MS results to be traceable to it through the use of NIST reference materials. In our study, LC-MS/MS results were significantly higher than those of all the immunoassays. In the accuracy studies we performed with RIQAS external quality assessment samples, almost all immunoassay results showed negative biases of up to −23.82% (the one exception being a bias of 0.11%), whereas LC-MS/MS measurements of the same samples showed positive biases of 29.6–62.82%. LC-MS/MS results were thus consistently higher than the immunoassay results. The immunoassays participate in RIQAS, whereas the LC-MS/MS system participates in the DEQAS programme, and the acceptable accuracy limits are 25% for DEQAS and 30.2% for RIQAS (26). Each method therefore showed comparable accuracy within its own programme. The five most recent DEQAS biases of the LC-MS/MS method were −4.6%, −0.8%, 9.4%, −1.1%, and −2.7%, with peer group CVs of 10.9%, 9.9%, 12.6%, 10.7%, and 11.1%, respectively. These data indicate that the Centro laboratory results were highly accurate, although the peer group CVs show that the scatter of LC-MS/MS results was higher than expected. Although some reports have described recently improved agreement among LC-MS/MS methods (28), these CV values reflect the variation among DEQAS participants.
Unlike previous studies, we provided essential data describing both the analytical and the diagnostic performance of the Beckman assays, in addition to the imprecision and accuracy of the new Roche assay. We found that meeting analytical goals did not necessarily mean better clinical diagnosis. For instance, the DxI 800 system performed acceptably in terms of imprecision, accuracy, and bias, and moderately differentiated insufficient patients (κ = 0.709) in the agreement statistics. In clinical practice, however, these figures mean that 33% (16/49) of non-deficient patients would be misdiagnosed as deficient. Analytical and clinical aspects of method performance should therefore always be considered together.
An important limitation of this study was the lack of a subgroup with VitD2. Most supplements used in Turkey do not contain VitD2; thus, we detected measurable amounts of 25(OH)D2 in only four patients' sera and excluded them. Moreover, 3-epi-25(OH)D3, which may be present in the sera of children (29), was not measured in this study; to avoid the influence of this epimer, we did not include children's sera.
Based on the present criteria, all three immunoassays can be used for routine 25(OH)D measurement, although they classify patients' VitD status only fairly. Recent standardization efforts do not seem to have contributed much to clinical diagnosis. At a minimum, each clinical laboratory must be aware of the characteristics of its own method to avoid misinterpretation of results.