Proficiency testing of maternal serum prenatal screening in second trimester in China, 2015

Introduction: Prenatal screening and diagnosis are important for the detection of birth defects and genetic diseases. The nationwide proficiency testing (PT) scheme for maternal serum prenatal screening in the second trimester in China was launched in 2003 and partly reflects the performance of screening laboratories. This study analysed the 2015 PT results to examine the performance of clinical laboratories and different platforms. Materials and methods: Fifteen lyophilized samples with different concentrations, divided into three panels, were distributed to 613 clinical laboratories in 2015. Acceptable performance was defined as a score of at least 80% acceptable responses, with an evaluation criterion of ± 30%. The robust coefficient of variation (CV) was also analysed. The chi-square (χ2) test was used to compare acceptable performance, while the Kruskal-Wallis test and Mann-Whitney test were applied to compare the robust CV among analytes and mainstream platforms. Results: Overall, 605, 61, 214, 416, and 303 laboratories submitted effective results for alpha fetoprotein (AFP), total human chorionic gonadotropin (t-hCG), β-hCG, unconjugated estriol (uE3) and free β-hCG, respectively. The acceptable performances of AFP (μg/L), AFP (KIU/L), t-hCG, β-hCG, uE3, and free β-hCG were 98.45%, 99.24%, 95.58%, 98.72%, 94.50%, and 98.66%, respectively. The χ2 test indicated significant differences in acceptable performance among different analytes, and among platforms for uE3. The Kruskal-Wallis test and Mann-Whitney test suggested that the robust CV differed significantly among analytes and platforms. Conclusions: The majority of results were acceptable. However, further effort is needed to achieve standardization and harmonization among analytes and various platforms, particularly for uE3.


Introduction
The high prevalence of birth defects and genetic diseases in China has seriously threatened the health of neonates and affected the quality of the population (1). The objective of prenatal screening and diagnosis is to identify women at increased risk for an affected pregnancy and to maximize the options available to them (2). Maternal serum prenatal screening in the second trimester is a screening test performed on peripheral blood collected from pregnant women at 15-20 weeks (+ 6 days); it combines the maternal age-related risk of an affected pregnancy with the risks associated with the concentrations of biomarkers (3). The serum biomarkers include alpha fetoprotein (AFP), total human chorionic gonadotropin (t-hCG), β-hCG, unconjugated estriol (uE3), free-β-hCG, and Inhibin-A (Inh-A), which have been used in combination to produce double (AFP and t-hCG/β-hCG/free β-hCG), triple (AFP, uE3, and t-hCG/β-hCG/free β-hCG) and quadruple (AFP, uE3, Inh-A, and t-hCG/β-hCG/free β-hCG) tests (4). Second trimester prenatal screening is economical, simple and non-invasive, and has been widely adopted since the 1990s (5).

Special issue: External Quality Assessment in Laboratory Medicine
Original papers Zhang X. et al.
Prenatal screening began in China in the 1990s using imported software (6). Two decades later, most clinical laboratories and maternal and childcare service centres provide prenatal screening services. To ensure the reliability of second trimester screening results and to assess the performance of laboratories at the same time, the National Center for Clinical Laboratories (NCCL) in China has run nationwide proficiency testing (PT) schemes for second trimester prenatal screening since 2003, covering AFP, hCG, β-hCG, free β-hCG and uE3. The frequency of PT has increased from once a year to twice a year and then to three times a year in 2015, shortening the monitoring period for institutions. The number of participants has increased from dozens to more than 600, covering 31 provinces nationwide (6). Information obtained from a PT scheme can partly reflect the quality of a screening laboratory by comparing its results with those of its peer group using the same platform (7). In addition, extensive results from the national PT may offer valuable information on the overall performance of prenatal screening laboratories within a country. This study presents the 2015 PT results of maternal serum prenatal screening in the second trimester, in order to examine the performance of clinical laboratories and different platforms in China.

Materials
The PT samples were commercial controls purchased from Baorong (Hangzhou, China) and prepared from human serum with additives of human or animal origin, chemicals, and stabilizers. All samples had been prepared, labelled and inspected to be non-reactive for the hepatitis B surface antigen (HBsAg), hepatitis C virus antibody (HCV) and human immunodeficiency virus antibody (HIV-1, HIV-2). All samples were provided in lyophilized form to increase stability and would remain stable until the expiration date if stored intact at 2 to 8 °C. The homogeneity and stability of all samples were validated according to the China National Accreditation Service for Conformity Assessment (CNAS) guidance CNAS-GL03 (8). In this survey, three PT test panels (20151, 20152, and 20153) comprising fifteen samples in total were distributed to each participating laboratory in 2015, covering low, normal, high and clinically important decision levels of the analytes. Each sample of this PT scheme included five analytes (AFP, t-hCG, β-hCG, free β-hCG and uE3) and was coded with six digits to facilitate analysis. The first four digits indicated the year, the fifth digit represented the lot of the panel, and the last digit stated the number of the sample within its panel.
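The six-digit coding scheme described above can be illustrated with a minimal Python sketch. The names `SampleCode`, `parse_sample_code` and `panel_id` are hypothetical and introduced only for illustration; they are not part of the NCCL or Clinet-EQA software.

```python
from typing import NamedTuple


class SampleCode(NamedTuple):
    year: int    # first four digits
    panel: int   # fifth digit: lot of the panel (1 to 3 in 2015)
    sample: int  # sixth digit: number of the sample within its panel


def parse_sample_code(code: str) -> SampleCode:
    """Split a six-digit PT sample code such as '201533' into its parts."""
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected six digits, got {code!r}")
    return SampleCode(year=int(code[:4]), panel=int(code[4]), sample=int(code[5]))


def panel_id(code: str) -> str:
    """Return the five-digit panel identifier, e.g. '20153' for '201533'."""
    parse_sample_code(code)  # validate the code before slicing it
    return code[:5]
```

Under this reading, sample 201533 is the third sample of panel 20153.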

PT program organization
In total, 613 laboratories in China were invited to participate in this prenatal screening survey organized by NCCL in 2015. Fifteen control materials in three panels were distributed to participating laboratories in February 2015. Detailed instructions were provided at the same time to laboratories in hospitals and maternal and child care service centres, covering storage conditions, sample processing methods, and other procedures. Participants were required to handle the samples as instructed and to treat them in the same way as patient specimens. Participants were recommended to assay the first five samples ( The results were submitted via the Clinet-EQA reporting system developed by NCCL (http://www.clinet.com.cn) before November 2015. Participants were expected to handle the samples using their routine methods so that the results of this survey would reflect their actual measurement capability.

Evaluation of the results
The participating laboratories were classified into several subgroups according to the platforms they used. For each analyte, we merely selected the . The robust average of the results reported by all participants in a subgroup was taken as the assigned value and was calculated using Algorithm A described in ISO 13528 (9). For AFP, the result of each sample was considered acceptable if it fell within ± 30% or 5 μg/L (whichever was larger) of the assigned value; for t-hCG, β-hCG, uE3, and free-β-hCG, the criterion was ± 30%, established on the basis of testing performance in China. As in other PT programmes, participants obtained 20 points for each acceptable result. When 4 or 5 acceptable results were reported for a panel of 5 samples (80 or 100 points), the performance of the laboratory was deemed satisfactory. Unsatisfactory performance was attributed to scores below 80% for each analyte, based on CLIA '88 (10). The overall acceptable performance of each analyte was defined as (number of acceptable results) / (overall number of effective results). The acceptable performance of each panel was calculated as the number of laboratories with satisfactory performance on that panel divided by the total number of laboratories reporting that panel. The acceptable performance of each platform was the total number of laboratories with satisfactory performance on that platform divided by the total number of laboratories using that platform.
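The evaluation rules above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the Clinet-EQA implementation. The function names, the convergence tolerance, the inclusive treatment of the ± 30% boundary, and the restriction of the 5 μg/L floor to AFP reported in μg/L are all assumptions. The robust estimates follow the iterated winsorisation commonly given for Algorithm A in ISO 13528.

```python
import math
import statistics


def algorithm_a(values, tol=1e-6, max_iter=100):
    """Robust mean and SD by iterated winsorisation (ISO 13528 Algorithm A)."""
    values = list(values)
    x = statistics.median(values)
    # Initial robust SD: scaled median absolute deviation.
    s = 1.483 * statistics.median([abs(v - x) for v in values])
    for _ in range(max_iter):
        delta = 1.5 * s
        # Pull results beyond x +/- delta back to the boundary.
        clipped = [min(max(v, x - delta), x + delta) for v in values]
        new_x = statistics.fmean(clipped)
        new_s = 1.134 * statistics.stdev(clipped)
        done = (math.isclose(new_x, x, rel_tol=tol, abs_tol=1e-12)
                and math.isclose(new_s, s, rel_tol=tol, abs_tol=1e-12))
        x, s = new_x, new_s
        if done:
            break
    return x, s


def is_acceptable(result, assigned, afp_ug_per_l=False):
    """True if a result lies within +/- 30% of the assigned value.

    For AFP reported in ug/L the limit is 30% or 5 ug/L, whichever is larger
    (assumption: the absolute floor applies only to the ug/L unit).
    """
    limit = 0.30 * abs(assigned)
    if afp_ug_per_l:
        limit = max(limit, 5.0)
    return abs(result - assigned) <= limit


def panel_score(results, assigned_values, afp_ug_per_l=False):
    """Score one five-sample panel: 20 points per acceptable result.

    Returns (points, satisfactory), where satisfactory means >= 80 points.
    """
    points = sum(
        20
        for r, a in zip(results, assigned_values)
        if is_acceptable(r, a, afp_ug_per_l)
    )
    return points, points >= 80
```

In this sketch the assigned value for a sample would be the first element returned by `algorithm_a` for the peer group's results, and a laboratory with four acceptable results out of five scores 80 points and is rated satisfactory for that panel.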

Statistical analysis
Data submitted by participants were processed and statistically analysed using Microsoft Excel 2010 (Microsoft Inc., Redmond, WA, USA), SPSS 19.0 (SPSS Inc., Chicago, IL, USA) and the Clinet-EQA evaluation system designed by NCCL and developed by Clinet Information Technology (Beijing, China). For each sample, basic statistical parameters, such as the number of laboratories, arithmetic mean, standard deviation (SD), coefficient of variation (CV), robust average, robust standard deviation and robust CV, were calculated and used to assess the performance of screening laboratories. The parameters of each panel and platform were also analysed. The chi-square (χ2) test was used to compare acceptable performance among different analytes and various platforms. The nonparametric Kruskal-Wallis (K-W) test and Mann-Whitney (M-W) test were applied to identify significant differences in robust CV among platforms and analytes. P < 0.05 was defined as the threshold of significance.
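As a rough illustration of the test statistics named above, the following standard-library Python sketch computes the robust CV, the Pearson χ2 statistic for a contingency table (for example, satisfactory versus unsatisfactory laboratories by platform), and the tie-corrected Kruskal-Wallis H statistic. The study itself used SPSS; the function names here are hypothetical, and P values are left out because they require the χ2 distribution function, which the standard library does not provide.

```python
def robust_cv(robust_mean, robust_sd):
    """Robust CV in percent: robust SD divided by robust average."""
    return 100.0 * robust_sd / robust_mean


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction.

    `groups` is a list of lists, e.g. robust CVs per platform.
    """
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(grp) for rs, grp in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))
```

Where SciPy is available, `scipy.stats.chi2_contingency`, `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` return these statistics together with P values and would normally be preferred over hand-written code.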

Results
In 2015, a total of 613 screening laboratories in hospitals and maternal and child health centres providing prenatal screening services were enrolled in this PT programme, of which 605 submitted effective results. AFP results were reported in two different units, μg/L and KIU/L. Overall, 289, 316, 61, 214, 416, and 303 laboratories submitted effective results for AFP (μg/L), AFP (KIU/L), t-hCG, β-hCG, uE3, and free-β-hCG, respectively. The overall acceptable performances of AFP (μg/L), AFP (KIU/L), t-hCG, β-hCG, uE3, and free β-hCG were 98.45%, 99.24%, 95.58%, 98.72%, 94.50%, and 98.66%, respectively. The results of each screening laboratory were scored and analysed in accordance with the criteria described above. Table 1 shows the acceptable performance of the three panels for each analyte in 2015. For all analytes and panels, the proportion of laboratories with acceptable performance was above 90%, ranging from 92.8% (uE3, panel 20151) to 99.7% (AFP, KIU/L, panel 20153). The χ2 test indicated significant differences in acceptable performance among different analytes (P < 0.001).
The scatter diagram of the robust CV of each sample for the 6 analytes is shown in Figure 1. Each data point represents the robust CV of one sample (15 samples for each analyte). AFP and free-β-hCG showed better performance, with robust CV below 10%, while uE3 performed poorly, with robust CV reaching 30%. The Kruskal-Wallis test indicated statistically significant differences in robust CV among analytes (P < 0.001).
To further evaluate the robust CV of different platforms in prenatal screening testing, Figure 2 shows the assigned value (robust average), robust SD and robust CV for each sample and each mainstream platform for AFP (μg/L), AFP (KIU/L), t-hCG, β-hCG, uE3, and free β-hCG. The samples in horizontal  For AFP (μg/L), the robust CV was higher at lower concentrations. For AFP (KIU/L), large fluctuations were seen in the robust CV of Fenghua, while PerkinElmer performed better, with robust CV below 4%. For t-hCG, the robust CV using Beckman was lower than that using DPC, except for lots 201511, 201533, and 201534. The robust CV did not change drastically with the assigned value. For β-hCG, Abbott performed best among the three measurement systems. For uE3, the results indicated an observable decrease in robust CV with increasing concentration. The robust CV was extremely large for uE3; however, the robust CV of PerkinElmer was relatively low, at less than 10%. For free-β-hCG, the robust CV was more dispersed among platforms at lower concentrations. The Kruskal-Wallis test indicated that the robust CV differed significantly among platforms for AFP (μg/L, P < 0.001), AFP (KIU/L, P < 0.001), β-hCG (P < 0.001), uE3 (P < 0.001), and free-β-hCG (P = 0.002). The Mann-Whitney test showed a significant difference in robust CV between the two mainstream platforms for t-hCG (P = 0.002).

Discussion
Clinical laboratories desire to perform well and are required by a national standard and some local regulations in China to participate in PT schemes regularly. This report is an inaugural analysis of the national PT scheme for maternal serum prenatal screening in China. Information obtained from this PT programme might encourage participants to investigate failures and improve prenatal screening testing performance in China, which could help the detection of birth defects and ultimately decrease their rates.
A total of 605 laboratories in tertiary and secondary hospitals submitted effective results, covering the mainstream platforms in current use. The number of laboratories participating differed by analyte because laboratories had selected different screening protocols (double, triple, or quadruple tests). AFP, β-hCG, free-β-hCG, and uE3 were customarily chosen by laboratories, while the number of laboratories using hCG was relatively small (selected by approximately 10% of laboratories), as a previous study of prenatal screening suggested that free-β-hCG is an indicator with higher specificity than hCG at 14 to 16 weeks of pregnancy (11).
The College of American Pathologists (CAP) sets the evaluation criterion at ± 3 standard deviations of the assigned value for AFP, t-hCG, uE3, and free-β-hCG. In our study, the acceptance criterion was defined as ± 30% or 5 μg/L (whichever was larger) of the assigned value for AFP, and ± 30% for hCG, β-hCG, uE3, and free-β-hCG. The evaluation criterion was established based on state-of-the-art performance in China, taking into account the suggestions of many specialists in laboratory medicine and clinical medicine. Although the criterion used in this study differed from the criteria of other PT schemes, it can still reveal the performance of laboratories in China.
The results of this study demonstrated a significant difference in acceptable performance among analytes, with uE3 comparatively lower. A study conducted by Lü found that the stability of uE3 was relatively worse than that of other measurands in prenatal screening (12). Likewise, our report showed that the robust CV of uE3 was higher than that of other analytes, suggesting that uE3 results were more likely to exceed the evaluation criterion, which contributed to the lower acceptable performance for uE3.
The acceptable performance of maternal serum prenatal screening in this study differed significantly among platforms for uE3, but not for other analytes. The robust CV of uE3 using Beckman and PerkinElmer platforms was remarkably higher than that using the DPC platform, suggesting that results from Beckman and PerkinElmer platforms were more dispersed than those from DPC platforms, which corresponded to the lower acceptable performance. As for root causes, the differing performance among platforms for uE3 might be explained by problems of methodology, instructions, practice, or reporting, or even by the quality control awareness of laboratory staff. For AFP, t-hCG, β-hCG, and free β-hCG, although variation across platforms generated discrepancies and the robust CV differed significantly among platforms, acceptable performance did not differ significantly among those platforms.
A limitation of this study lies in the manufacture, transport, and storage of the control materials; in addition, simulated mature sera rather than samples from real pregnant women were used in the PT scheme, which may have caused an unavoidable matrix effect. Because the matrix effect might produce significant differences among the results of different platforms, the assigned value (robust average) was calculated by subgroup. Despite this, performance in the PT scheme could to some extent reflect the daily practice of laboratories and platforms.
In conclusion, the results of this prenatal screening PT scheme indicated that the majority of results were acceptable for maternal serum prenatal screening in the second trimester in China. However, significant differences existed in acceptable performance among analytes, and among platforms for uE3. The PT scheme is vital, and further effort is needed to achieve standardization and harmonization among various platforms, particularly for uE3.
the PT schemes. The Clinet website (www.clinet.com.cn) is gratefully acknowledged for providing the computer technology support used to establish the network platform for data analysis.

Potential conflict of interest
None declared.