Reference intervals for thyroid disorders calculated by indirect method and comparison with reference change values

Introduction: The aim of the study was to calculate reference intervals (RIs) for thyroid stimulating hormone (TSH), free thyroxine (fT4) and free triiodothyronine (fT3) and to evaluate the clinical significance of these intervals using the reference change values (RCVs) of the analytes. Materials and methods: Laboratory patient data between August and December 2021 were evaluated for the study. A total of 188,912 patients with TSH, fT4, fT3, anti-thyroid peroxidase antibodies (Anti-TPO) and anti-thyroglobulin antibodies (Anti-Tg) results were evaluated. All measurements were performed on Cobas c801 (Roche Diagnostics, Penzberg, Germany) using electrochemiluminescence immunoassay technology. Estimated RIs were compared with the manufacturer's RIs by means of the RCVs of the analytes. Results: Thyroid stimulating hormone values did not differ significantly by gender and age. The combined RI for the whole group (N = 28,437) was 0.41-4.37 mIU/L. Free T4 values (11.6-20.1 pmol/L, N = 13,479 in male; 10.5-19.5 pmol/L, N = 17,634 in female) and fT3 values (3.38-6.35 pmol/L, N = 2,516 in male; 3.39-5.99 pmol/L, N = 3,348 in female) differed significantly by gender (P < 0.050). Both fT4 and fT3 values also showed significant differences in age subgroup comparisons, so male and female RIs were presented separately for age subgroups. When compared with the manufacturer's RIs, the differences for the TSH whole-group RI and the fT4 subgroup RIs did not exceed the analytes' RCVs, but the differences for fT3 did. Conclusions: Reference interval estimation by the indirect method from laboratory data may be more accurate than manufacturer-provided RIs. These population-based RIs, evaluated using the RCVs of the analytes, may provide useful information for the clinical interpretation of laboratory results.


Introduction
The reference interval (RI) is defined as the interval corresponding to the central 95% of values of a reference population, including the two boundary limits: the upper reference limit (URL) and the lower reference limit (LRL). This interval is supposed to represent a well-defined physiological status, mainly "good health", together with the analytical variation of the assay system and the biological variation of the analyte in the particular population (1,2). Thus, it is recommended that medical laboratories determine their own RIs to cover the variability of their local populations and their specific analytical methods and devices. For RI determination, the Clinical Laboratory Standards Institute (CLSI) recommends the "direct" approach, in which well-defined reference subjects are selected by pre-defined criteria and the measurements are performed afterwards (3). The direct method is hard to apply in routine practice for every laboratory because it demands much time and money (4). The alternative is the "indirect" method, in which test results of patients, ordered for screening, diagnosis or follow-up purposes, are retrieved from the laboratory information system (LIS) and used to determine the RIs. The indirect method generally uses the data of outpatients and primary care patients and excludes results that do not fit the general distribution of the data. This method is faster and cheaper; it causes no discomfort or additional risk to patients and no additional workload for laboratory staff (5). In addition, the results obtained by the indirect method are closer to the actual state of the population of a given region, because they take into account the analytical and biological variability of the analysed parameter (1). Recently, the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) Committee on Reference Intervals and Decision Limits has encouraged the use of indirect methods to establish and verify reference intervals (2).
The prevalence of subclinical hypothyroidism is very high, up to 4.8% in Europe in a recent meta-analysis (6). Subclinical hypo/hyperthyroidism is diagnosed in the laboratory, so the need for accurate upper and lower reference limits has been strongly emphasized. However, there are still discrepancies between the RIs used in laboratories as well as in the up-to-date scientific literature (1). In recent studies, calculated thyroid stimulating hormone (TSH) upper limits varied from 2.84 to 5.28 mIU/L and lower limits from 0.17 to 0.64 mIU/L, with remarkable differences in the RIs of free triiodothyronine (fT3) and free thyroxine (fT4) too (7-10). Variations in the results of thyroid function tests in a healthy population may be due to analytical (CV_A), intra-individual (CV_I) and inter-individual (CV_G) variations. Thyroid stimulating hormone, by its nature, circulates as various isoforms with different glycosylation patterns. Glycosylation may alter the biological activity of the hormone without affecting its immunological pattern, so an immunoassay may return a normal result (11). Moreover, this heterogeneity may cause problems in the standardization of TSH measurements, which may explain differences of around 30-40% in TSH values due to assay technology (12). In addition, thyroid hormones, especially TSH, as well as fT4 and fT3, show a large biological variation, mainly CV_G. In such a situation, it is not advisable to use other populations' RIs; each laboratory should at least establish its own population-based intervals by means of a cheap, easy and optimized indirect statistical method applied to a large data set. For any analyte with a large CV_G, like TSH, there is a need for more granularity in the RI by partitioning into more homogeneous subgroups by age and/or gender etc. (13).
Reference change value (RCV) is the critical difference that may be attributed to a real clinical change, and it depends mainly on the CV_A and CV_I of the particular analyte (14). Because CV_A is often very small for most assays, RCV largely depends on the CV_I of the analyte (15). The RCV concept offers the clinician a more accurate tool to detect changes in a patient's health status. Furthermore, it is a better approach than population-based RIs because many analytes have been shown to have a large CV_I (14). Biological variation estimates of many analytes are available at www.biologicalvariation.eu (16).
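As a minimal sketch of the RCV concept described above, the commonly cited form of the calculation is RCV = √2 × Z × √(CV_A² + CV_I²), with Z = 1.96 for a two-sided change at 95% probability. The function name and the CV values below are illustrative assumptions, not figures from this study.

```python
import math

def reference_change_value(cv_a: float, cv_i: float, z: float = 1.96) -> float:
    """Return the RCV in percent from analytical (cv_a) and
    within-subject (cv_i) coefficients of variation, both in percent.

    z = 1.96 corresponds to a two-sided change at 95% probability.
    """
    return math.sqrt(2.0) * z * math.sqrt(cv_a ** 2 + cv_i ** 2)

# Hypothetical CVs chosen only to make the arithmetic easy to follow.
rcv = reference_change_value(cv_a=3.0, cv_i=4.0)
print(f"RCV = {rcv:.1f}%")  # prints "RCV = 13.9%"
```

With a small CV_A the result is driven almost entirely by CV_I, which is the dependence noted in the paragraph above.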
In this study, we calculated age- and gender-specific RIs for TSH, fT4 and fT3 from a large data set. We compared our results with those of manufacturers and of others in the medical literature, and assessed the significance of the differences in terms of the RCVs of these 3 analytes, in order to contribute to their clinical interpretation.

Study design
This is an indirect reference value analysis using laboratory patient data from Kartal Dr Lütfi Kırdar City Hospital, located in the Anatolian region of Istanbul, which provides healthcare with 1205 inpatient beds and 10 thousand daily outpatient visits. In addition to samples from our hospital, our core laboratory accepts approximately 15 thousand samples daily from 8 other hospitals and 166 primary care centers, serving a large population living mostly in the eastern but also in the western regions of the city.

Subjects
Laboratory information system patient data between August and December 2021 were evaluated for the study. Data of a total of 188,912 patients having all of TSH, fT4, fT3, anti-thyroid peroxidase antibodies (Anti-TPO) and anti-thyroglobulin antibodies (Anti-Tg) results were downloaded from our LIS. Only the first result of each patient was included. Patients < 18 years of age, pregnant females, inpatient results and patients with pathologic fT4, fT3, Anti-Tg or Anti-TPO results and/or with TSH > 10 mIU/L were excluded. The flow diagram of the study is shown in Figure 1.

Blood sampling
Only morning fasting samples were accepted for routine chemistry analyses. Blood was drawn into BD Vacutainer® SST™II tubes with serum separator (Becton Dickinson Italia S.p.A., Milan, Italy, ref. n. 366566) in all centers. Samples were centrifuged at 3000 x g for 10 minutes, transported to our core laboratory at 0-5 °C and measured within 2 hours of admittance.

Methods
Roche TSH, fT4, fT3, Anti-TPO and Anti-Tg assays are based on electrochemiluminescence immunoassay for use on the Cobas e 801 immunoanalyser c801 (Roche Diagnostics, Penzberg, Germany). The TSH test method is a sandwich immunoassay, while the others are competitive immunoassays. The TSH assay is calibrated against the 2nd International Reference Preparation (IRP) WHO Reference Standard 80/558, while the fT4 assay was calibrated against the 4 Enzymun-Test, which had been calibrated against an equilibrium dialysis fT4 analysis at Roche. The fT3 assay was calibrated against an equilibrium dialysis fT3 analysis at Roche. The TSH assay has a functional sensitivity of < 0.005 mIU/L and a manufacturer-provided RI of 0.27-4.20 mIU/L. The fT4 assay has a limit of detection (LOD) of 0.5 pmol/L and a manufacturer-provided RI of 12-22 pmol/L. The fT3 assay has an LOD of 0.6 pmol/L and a manufacturer-provided RI of 3.1-6.8 pmol/L. The detection limit and RI were 7.16 and < 115 IU/mL for the anti-Tg assay and 9 and < 34 IU/mL for the anti-TPO assay. These two assays were calibrated against National Institute for Biological Standards and Control materials (65/93) and (66/387), respectively. Two levels of commercial control sera provided by the manufacturer were run daily for internal quality control.

Statistical analysis
For each analyte, male and female frequency distributions were evaluated separately. The Shapiro-Wilk test was used to assess whether the distribution of the data was Gaussian. Logarithmic transformations were applied. Outliers were identified using Tukey's method and subsequently eliminated. Age partitioning was done by decade according to the existing medical literature (17,18). Reference intervals were derived by the non-parametric method and reported as the 2.5 and 97.5 percentiles with 90% confidence intervals for the lower and upper limits. The significance of differences between gender and age subgroups was assessed by the standard normal deviate test (Z-test) and RIs were reorganized accordingly (14). The reference change values were used to compare the calculated subgroup RIs with the RIs provided by the manufacturer. If the calculated % difference was less than the RCV, the difference was considered not significant (14,19). Reference change values were calculated as described by Fraser et al. (20). CV_A values were calculated from laboratory internal quality control data, using measurements of two levels of control sera over 20 days, with the formula $\text{Total CV} = \sqrt{(\text{CV of Level 1})^2 + (\text{CV of Level 2})^2}$. CV_I values of the analytes were taken from the EFLM database (16).
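The core of the procedure above (log transformation, Tukey outlier removal, non-parametric percentiles) can be sketched as follows. This is an illustration under stated assumptions, not the authors' actual code: Tukey's fences are assumed to be applied once on the log scale with the conventional multiplier of 1.5, the function names are hypothetical, the data are synthetic, and the 90% confidence intervals of the limits are omitted.

```python
import numpy as np

def tukey_filter(values: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Keep only points inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    keep = (values >= q1 - k * iqr) & (values <= q3 + k * iqr)
    return values[keep]

def indirect_reference_interval(results):
    """Return (LRL, URL, n_used) from raw patient results.

    Results are log-transformed, outliers are removed on the log scale
    (assumption: single pass, k = 1.5), and the 2.5th and 97.5th
    percentiles of the remaining values are taken as the limits.
    """
    x = np.asarray(results, dtype=float)
    x = x[x > 0]                      # the logarithm needs positive values
    cleaned = np.exp(tukey_filter(np.log(x)))
    lrl, url = np.percentile(cleaned, [2.5, 97.5])
    return lrl, url, cleaned.size

# Synthetic, right-skewed "TSH-like" data plus a few gross outliers.
rng = np.random.default_rng(0)
tsh = rng.lognormal(mean=0.5, sigma=0.5, size=5000)
tsh = np.concatenate([tsh, [45.0, 60.0, 0.001]])

lrl, url, n = indirect_reference_interval(tsh)
print(f"RI = {lrl:.2f}-{url:.2f} (n = {n})")
```

Percentiles are invariant under the log transformation, so the transformation matters here only for where the outlier fences fall.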

Results
Frequency distributions of all 3 analytes were initially non-Gaussian for both genders; male and female TSH values in particular were markedly positively skewed. After logarithmic transformation and exclusion of outliers, the distributions became Gaussian for all analytes except male TSH values, which remained slightly skewed with a longer tail towards higher values. Approximately 90% of males had TSH values < 3.5 mIU/L. For fT4 and fT3, the central 50% of values was higher in males than in females. Four percent of males but 10% of females had fT4 values < 12 pmol/L, while 10% of males but 4% of females had fT3 values > 6 pmol/L. Frequency distribution diagrams of the 3 analytes for males and females are shown in Figure 2.
Age- and gender-specific descriptive statistical data of the subgroups are presented in Table 1. Higher TSH values were observed in females in total and in all age subgroups, but none of the differences was statistically significant (P > 0.050; Tables 2, 3 and 4). The 2.5 and 97.5 percentiles derived by the non-parametric method and the significance of partitioning are presented in Table 2.
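A subgroup comparison of this kind can be sketched with one common form of the standard normal deviate test, which compares two subgroup means using their standard deviations and sizes. The exact variant and threshold used in the study are not stated here, so the function name, the summary statistics and the use of a two-sided P value are assumptions for illustration only.

```python
import math

def z_test_means(mean1, sd1, n1, mean2, sd2, n2):
    """Return (z, two-sided P) for the difference between two subgroup means."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # standard error of the difference
    z = (mean1 - mean2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))          # two-sided normal tail probability
    return z, p

# Hypothetical fT4-like summary statistics (pmol/L) for two subgroups.
z, p = z_test_means(15.8, 2.1, 400, 15.0, 2.2, 500)
print(f"z = {z:.2f}, P = {p:.3g}")
print("partition" if p < 0.05 else "combine")
```

With very large subgroups even small differences reach statistical significance, which is one reason the study also compares differences against the RCV.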
Percent differences for LRLs and URLs between the RIs calculated in this study and the manufacturer-provided RIs were smaller than the corresponding RCVs for TSH and fT4, but not for fT3 (Table 3). Thus the manufacturer-provided RIs were clinically different from our population-based RIs for fT3, but not for fT4 and TSH.
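The comparison above can be sketched as a simple rule: compute the percent difference between an estimated limit and the manufacturer's limit and flag it when its magnitude exceeds the analyte's RCV. The denominator convention (relative to the manufacturer's limit) and the RCV threshold below are assumptions, so the output is not expected to reproduce Table 3; the function names are hypothetical.

```python
def percent_difference(estimated: float, manufacturer: float) -> float:
    """Percent difference of an estimated limit relative to the manufacturer's limit."""
    return (estimated - manufacturer) / manufacturer * 100.0

def exceeds_rcv(estimated: float, manufacturer: float, rcv_percent: float) -> bool:
    """True when the absolute percent difference is larger than the RCV (%)."""
    return abs(percent_difference(estimated, manufacturer)) > rcv_percent

# TSH upper limits quoted in the text; the RCV threshold is hypothetical.
diff = percent_difference(estimated=4.37, manufacturer=4.20)
print(f"difference = {diff:.1f}%")
print("exceeds RCV" if exceeds_rcv(4.37, 4.20, rcv_percent=30.0) else "within RCV")
```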

Discussion
Table 3. Comparison of estimated reference intervals with reference intervals provided by the manufacturer. * Percent differences between group and manufacturer LRLs and URLs are separately calculated and compared with RCVs of the corresponding analytes. † Manufacturer recommended reference range. ǂ Percent differences greater than the RCV of the analyte. § TSH combined RIs for whole group. LRL - lower reference limit. URL - upper reference limit. TSH - thyroid stimulating hormone. fT4 - free thyroxine. fT3 - free triiodothyronine.
In this study we calculated population-based RIs for TSH, fT4 and fT3 from our hospital data using the indirect method. The RIs of individual groups found in this study were in accordance with the manufacturer-provided values for TSH and fT4, but not for fT3, when the differences were compared with the corresponding RCVs. Subclinical hypothyroidism has a high prevalence all over the world and the diagnosis is made mainly by laboratory tests (5,6). Given its importance, there are many reports on RIs of TSH, and a few on fT4 and fT3, in the medical literature. Table 4 shows the variabilities of studies [...] the same analyser (7,8). Thus, differences between RIs in the medical studies were hard to attribute to any single variable. In our study we used an indirect method based on our hospital's patient data, selecting outpatients and primary care patients, for whom these tests were probably ordered for screening (23). In two of the studies gender-related RIs were established and female TSH RIs had higher values than male RIs (9,10). In our study TSH values of females were higher too, but the difference was not statistically significant. Also, male fT4 values were significantly higher than female values, as in Milinković et al. (10) [...] URL of 23.5 pmol/L (1). In our study fT4 and fT3 RIs differed significantly by gender and age. There were also differences in study designs in two major respects; the first was the selection of patients according to different cut-offs for anti-TPO levels.
In the study by Inal et al., National Academy of Clinical Biochemistry (NACB) guideline criteria were applied and any patient with detectable anti-TPO was excluded; thus the URLs of TSH in that study were quite low, as also in Friis-Hansen and Hilsted (5,18,24), where anti-TPO positive subjects were excluded and TSH URLs decreased after exclusion. In our study we excluded both anti-TPO and anti-Tg positive results, and our URL was similar to that of Friis-Hansen and Hilsted (18). Hollowell et al. also noted the dependency of TSH results on anti-TPO levels (25). The second point is thyroid ultrasonography (TUS) evaluation for the selection of reference individuals. In our study and in other indirect studies this was not possible; however, TUS is not recommended even in the strict NACB guidelines, since it has not been shown to be associated with TSH RIs in some studies (7). Nevertheless, if patient selection could be made together with TUS results, it would contribute to the selection of reference individuals, so this may be considered a weakness of the study.
Another important difference is the application of different statistical procedures and the interpretation of statistical significance. For laboratory results, a statistically significant difference does not always imply clinical significance. Biological variations and/or RCVs are now important effect-size criteria for testing clinical significance (26). In our study, we used the standard normal deviate Z-test to compare the subgroups, and RCVs to compare our RIs with those of the manufacturer. We saw that manufacturer RIs should be verified before being applied. Another important point is that the percent difference between LRLs for TSH was 51.2%, a difference smaller than the RCV. Yet when we use the manufacturer-provided interval of 0.27-4.20 mIU/L, patients with low TSH values < 0.41 mIU/L appear to be misdiagnosed as normal. The important question is whether a 51.2% difference in the TSH LRL is clinically significant. Thus, apart from establishing accurate RIs, the clinician should be informed about the RCV of the analyte in order to judge any change in a patient's status between two consecutive measurements. This approach may soon replace the classic RI assessment. For many analytes like TSH, CV_I is far smaller than CV_G. For such analytes, two consecutive results from a subject may both lie within the population-based RI and yet not necessarily indicate normal thyroid function (27).
In our study TSH values decreased with age, a pattern indicative of iodine deficiency. Several studies in the literature confirm this relationship (28,29). Maintenance of an iodine deficiency programme has improved the iodine status of the Turkish population, particularly in city centers (30). In a recent study in 2014, Istanbul was still found to be mildly iodine deficient (31). According to Van de Ven et al., an inverse relationship between TSH and age is usually observed in populations with a history of iodine deficiency (32). From a pathophysiological point of view, chronic mild to moderate iodine deficiency leads to chronic TSH stimulation, causing functional thyroid autonomy. This situation in the elderly is a long-term index reflecting mild to moderate iodine deficiency lasting decades, rather than the current iodine status. Reference interval width (RIW) is a useful measure for assessing the different impacts on normal values. In our study we also observed that the TSH RIW of the age subgroups widened progressively with age; this may be another index of long-term iodine deficiency in the Turkish population.
The indirect method is a satisfactory and recommended way of establishing population-based RIs from a large data set covering the variability of the population. Differences should be compared with RCVs to decide whether they are clinically significant. Besides RIs, laboratories should inform clinicians about the RCVs of analytes for better interpretation.