Estimation of the measurement uncertainty and practical suggestion for the description of the metrological traceability in clinical laboratories

Clinicians request a large part of the measurements of biological quantities that clinical laboratories perform for diagnostic, prognostic or disease-monitoring purposes. Laboratories therefore need to provide patient results that are as reliable as possible. Metrological concepts such as measurement uncertainty and metrological traceability make it possible to know the accuracy of these results and to guarantee their comparability over time and space. Such is the importance of these two concepts that the estimation of measurement uncertainty and the knowledge of metrological traceability are required for clinical laboratories accredited by ISO 15189:2012. Although there are many publications and guidelines on estimating measurement uncertainty in clinical laboratories, it is not entirely clear what information and which formulae should be used to calculate it. Moreover, only a small number of clinical laboratories know and describe the metrological traceability of their results, even though they are aware of the lack of comparability that currently exists for patient results. To facilitate the task of clinical laboratories, this review proposes a procedure to estimate the measurement uncertainty and offers several suggestions for describing the metrological traceability. The measurement uncertainty estimation is partially based on the ISO/TS 20914:2019 guideline, and the metrological traceability is described using ISO 17511:2020. Different biological quantities routinely measured in clinical laboratories are used to exemplify the proposal and suggestions.


Introduction
The measured values of biological quantities provided by clinical laboratories supply essential information that guides clinical orientation, optimises the patients' healthcare process, and leads to appropriate therapeutic, diagnostic, or healthcare actions. Therefore, these values must be reliable (accurate) and comparable with those obtained at different times and places (traceable) (1,2).
Metrological concepts such as measurement uncertainty (MU) and metrological traceability (MT) make it possible to know the degree of accuracy of the measured values that a clinical laboratory provides, and the comparability or transferability of these results over time and space. Currently, such is the importance of these two concepts that the estimation of MU and the knowledge of MT are required for clinical laboratories accredited by ISO 15189:2012 (3).

Rigo-Bonnin R. et al. Uncertainty estimation and traceability description
Measurement uncertainty complements a measured value of a biological quantity, indicating the magnitude of the doubt about this value and providing a quantitative indication of its quality and reliability (4). Nowadays, there are two main approaches for estimating MU: so-called bottom-up and top-down. The bottom-up approach is based on a comprehensive categorisation of the measurement where each potential uncertainty source is identified and quantified. The estimates of uncertainty expressed as standard deviations (standard uncertainties) are assigned to individual components of the procedure, which are then mathematically combined using propagation rules to provide a combined standard uncertainty. Finally, an expanded uncertainty is estimated, multiplying the combined uncertainty by an appropriate coverage factor (1).
Conversely, the top-down approach considers uncertainty as a whole. First, the most significant uncertainty sources are identified and grouped. Then, their standard uncertainties are estimated using available information on laboratory test performance, such as measurement procedure validation or verification data, and intra-laboratory or inter-laboratory data (e.g. internal and external quality control data). Subsequently, the combined uncertainty is obtained from the standard uncertainties and, finally, the expanded uncertainty is estimated (1).
Furthermore, MT is defined as the property of a measurement result whereby the result can be related to a reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty (4). In other words, to achieve comparability of results over space and time, it is essential to link all the individual measurement results to some common reference. In this way, results can be compared through their relationship to that reference. Ideally, this reference should be an International System (SI) unit of measurement materialised by a primary reference measurement procedure and a primary measurement standard (5).

Measurement uncertainty estimation
The top-down approach is particularly well suited to measuring systems commonly encountered in clinical laboratories. So, the MU should be estimated using this approach and taking into account the following steps (6):

Specification of the measurand
A measurand is defined as the quantity intended to be measured (4). So, the measurand must be unequivocally defined and the measurement procedure used must be exhaustively detailed; otherwise, an insufficient specification of the measurand may itself be a significant uncertainty source (definitional or intrinsic uncertainty) that could be difficult to estimate. To specify the measurand, it is necessary to include at least the following information (6,20): • The measurement unit, e.g., mmol/L, mg/L, entities/L, µkat/L, etc. Sometimes, it is also necessary to include additional information such as the measuring system (or the measurement method or the measurement principle) used to measure the quantity, and the conditions under which the measurements are performed (e.g., the temperature for enzymes).

Identification of the uncertainty sources
According to different clinical laboratory guidelines, the most significant contributions to the overall MU are captured by the uncertainties related to the assigned value of the end-user calibrator (u cal), the long-term intermediate precision (u Rw), and the bias (u b) (6,7). Thus, it would be sufficient for clinical laboratories to consider only these three uncertainty sources to provide reasonable estimates of MU that help ensure that patient results are fit for medical use.

Estimation of the standard uncertainties
Uncertainty related to the assigned value of the end-user calibrator
A correct estimate of MU is not possible without the u cal because it includes all the uncertainty contributions accumulated across the entire traceability chain of a measurement result (13). Thus, clinical laboratories should include the u cal in the uncertainty budget when they estimate the MU. In vitro diagnostic (IVD) manufacturers are required to comply with the European Regulation 2017/746 on in vitro diagnostic medical devices and must provide this information to clinical laboratories (21). Usually, manufacturers present this information as the calibration material assigned value (x cal) together with its expanded uncertainty (U cal or %U rel(cal)) using a coverage factor (k) equal to 2. So, the u cal can be obtained as:
$$u_{cal} = \frac{U_{cal}}{k} \quad \text{or} \quad u_{cal} = \frac{\%U_{rel(cal)} \cdot x_{cal}}{100 \cdot k}$$
Instead, when clinical laboratories prepare their own calibration materials, they are entirely responsible for estimating the u cal. In these cases, the u cal can be calculated by taking into account all the information used to prepare the calibration materials, and by statistically combining the uncertainties associated with each of the sequential value assignment steps using the law of propagation of uncertainty (8,9,22).
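As a hedged illustration of the conversion from a manufacturer's expanded uncertainty to u cal, the following Python sketch divides the stated expanded uncertainty by the coverage factor. The function names and the example calibrator values are hypothetical; only the coverage factor of 2 and the use of either an absolute or a percent relative expanded uncertainty come from the text.

```python
def u_cal_from_expanded(U_cal: float, k: float = 2.0) -> float:
    """Standard uncertainty from an absolute expanded uncertainty U_cal."""
    if k <= 0:
        raise ValueError("coverage factor k must be positive")
    return U_cal / k


def u_cal_from_relative(pct_U_rel: float, x_cal: float, k: float = 2.0) -> float:
    """Standard uncertainty from a percent relative expanded uncertainty.

    pct_U_rel is expressed in percent of the calibrator assigned value x_cal.
    """
    return u_cal_from_expanded(pct_U_rel / 100.0 * x_cal, k)


# Hypothetical calibrator: assigned value 5.0 mmol/L, U = 0.30 mmol/L (k = 2),
# which is the same statement as a relative expanded uncertainty of 6 %.
print(u_cal_from_expanded(0.30))
print(u_cal_from_relative(6.0, 5.0))
```

Both calls describe the same calibrator, so they should return the same standard uncertainty, in the units of the assigned value.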

Uncertainty related to the long-term intermediate imprecision
Most of the components of the MU are included in the long-term intermediate imprecision. This imprecision can be calculated from internal quality control (IQC) data (6).
When clinical laboratories estimate the u Rw, there are several considerations that they should take into account (6,7,13):
a) The IQC materials used for estimating the u Rw should comply with specific attributes or characteristics. For example, the materials should be commutable and different from those used to check the correct alignment of the measuring systems.
b) The IQC material data must be collected over a sufficiently long time interval to reflect most of the sources of variability influencing the measurement process.
c) Different IQC material levels, with mean values close to important medical decision limits, should be used to characterise the behaviour of the u Rw across the measuring interval of the measuring systems.
d) A precision study (e.g., comparing variances using the F-test) of representative human samples and IQC materials should be performed to verify that the magnitude of imprecision for both materials is similar. An example of how to assess this type of study was published by Fuentes-Arderiu et al. (23).
e) In clinical laboratories, it is common to measure a biological quantity interchangeably with more than one identical measuring system (or with different modules of the same measuring system). Therefore, it would be advisable to obtain a single u Rw estimate that covers all of them.
f) To avoid the effect of IQC material lot changes on the uncertainty estimate, as well as for practical reasons, it would be advisable to use a single IQC material lot during the estimation study (6,8).
As far as possible, clinical laboratories should comply with most of these considerations to perform an adequate u Rw estimation, as well as to avoid a possible over-estimate of the u Rw .
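The precision study mentioned in consideration d) can be sketched as a variance-ratio (F) comparison. This is a minimal illustration, not the procedure of reference 23: the function names and data are hypothetical, and the critical value is deliberately left as a parameter that the user takes from an F table (or statistical software) for the chosen significance level and the returned degrees of freedom.

```python
from statistics import variance


def variance_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    df_a, df_b = len(sample_a) - 1, len(sample_b) - 1
    if min(var_a, var_b) == 0:
        raise ValueError("a sample with zero variance cannot be compared")
    if var_a >= var_b:
        return var_a / var_b, df_a, df_b
    return var_b / var_a, df_b, df_a


def imprecision_is_similar(sample_a, sample_b, f_critical):
    """True when the variance ratio does not exceed the supplied critical value."""
    F, _, _ = variance_ratio(sample_a, sample_b)
    return F <= f_critical


# Hypothetical replicate results for an IQC material and a patient sample pool
iqc = [4.9, 5.1, 5.0, 5.2, 4.8]
patient_pool = [5.0, 5.1, 4.9, 5.0, 5.0]
print(variance_ratio(iqc, patient_pool))
```

Keeping the critical value outside the function avoids hard-coding a significance level, which remains the laboratory's choice.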
When only one calibrator lot, one IQC lot, and a single measuring system are used during a specified time interval, the u Rw can be calculated as the classical standard deviation (s):
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
where $x_i$ represents each IQC value obtained in the specified time interval, $n$ the number of IQC replicate measurements in that interval, and $\bar{x}$ the IQC mean value obtained in that interval.
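A minimal sketch of this single-series case, using Python's standard library. The function names and the IQC values are hypothetical; the calculation is the sample standard deviation with n - 1 in the denominator, with the coefficient of variation added only as a convenience.

```python
from statistics import mean, stdev


def u_rw_single_series(iqc_values):
    """u_Rw as the sample standard deviation (n - 1 denominator) of IQC results."""
    if len(iqc_values) < 2:
        raise ValueError("at least two IQC results are needed")
    return stdev(iqc_values)


def cv_percent(iqc_values):
    """Relative form of u_Rw: coefficient of variation in percent."""
    return 100.0 * u_rw_single_series(iqc_values) / mean(iqc_values)


# Hypothetical IQC results for one level, one lot, one measuring system
iqc = [4.9, 5.1, 5.0, 5.2, 4.8]
print(u_rw_single_series(iqc), cv_percent(iqc))
```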
Furthermore, when two or more lots of calibration or IQC materials are involved in a specified time interval, or when two or more identical measuring systems are used to measure the same biological quantity, the u Rw can be calculated as a pooled standard deviation (s p ) (22): where n i is the number of IQC replicate measurements using the calibrator lot i (or the number of IQC replicate measurements using the IQC lot i; or the number of IQC replicate measurements using measuring system i), s i is the standard deviation obtained using the calibrator lot i (or the standard deviation obtained using the IQC lot i, or the standard deviation obtained using the measuring system i), x i is the IQC mean value obtained using the calibrator lot i (or the IQC mean value obtained using the IQC lot i, or the IQC mean value obtained using the measuring system i), x p is the IQC pooled mean calculated using all calibrator lots (or IQC pooled mean calculated using all IQC lots, or IQC mean value calculated using all measuring systems) and N is the total number of IQC replicate measurements.
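The multi-lot or multi-system case can be sketched as follows. Two standard formulations are shown, and which one a laboratory should apply is not asserted here: `pooled_sd_within` is the conventional degrees-of-freedom-weighted pooled standard deviation, while `combined_sd` additionally includes the scatter of the group means around the pooled mean. All function names and the example groups are hypothetical.

```python
from math import sqrt


def pooled_mean(groups):
    """Sample-size-weighted mean; groups is a list of (n_i, mean_i, sd_i)."""
    N = sum(n for n, _, _ in groups)
    return sum(n * m for n, m, _ in groups) / N


def pooled_sd_within(groups):
    """Pooled SD from within-group variances only (assumed conventional form)."""
    dof = sum(n - 1 for n, _, _ in groups)
    if dof <= 0:
        raise ValueError("not enough replicates to pool")
    return sqrt(sum((n - 1) * s ** 2 for n, _, s in groups) / dof)


def combined_sd(groups):
    """SD of all results taken together: within-group plus between-group scatter."""
    N = sum(n for n, _, _ in groups)
    x_p = pooled_mean(groups)
    within = sum((n - 1) * s ** 2 for n, _, s in groups)
    between = sum(n * (m - x_p) ** 2 for n, m, _ in groups)
    return sqrt((within + between) / (N - 1))


# Hypothetical data: two calibrator lots as (n_i, mean_i, sd_i)
lots = [(10, 5.0, 0.2), (20, 5.3, 0.1)]
print(pooled_mean(lots), pooled_sd_within(lots), combined_sd(lots))
```

When the group means differ noticeably, as in this invented example, the second estimate is larger, because lot-to-lot or system-to-system shifts are counted as part of the long-term imprecision.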

Uncertainty related to the bias
At present, how to deal with bias in clinical measurements and how to calculate the bias component of uncertainty continue to be a matter of debate. Some authors firmly state that the bias (or its uncertainty) must not be included in the uncertainty budget because the bias component is already part of the u cal (13). In other words, it is expected that IVD manufacturers ensure the traceability of their measuring systems to the highest-order available references. This statement is only partially correct because it is known that, in several cases, IVD manufacturers continue to prepare their calibration materials in-house without any traceability to higher-order metrological references, although they are required to comply with European Regulation 2017/746.
In contrast, other authors are of the opinion that when a significant bias is detected, it should be eliminated (24,25). If the bias cannot be eliminated, there are two ways of proceeding: 1) to correct the bias by applying a correction factor and incorporating its uncertainty into the uncertainty budget, or 2) to include the bias itself in the uncertainty budget. It should be noted that the first option would only be applicable in those cases in which the bias study is performed using a certified reference material (CRM), and when the traceability declared by the IVD manufacturer is to the same CRM used in the bias study (24,25).
Regarding the u b, different procedures allow the measuring system bias to be estimated, the most widely used being the one based on reference materials. The reference material can be a CRM, an IQC material (with or without an associated IQC inter-laboratory scheme), or a control material belonging to an external quality assurance service (EQAS) (6,7,25). Of all of them, CRM or commutable IQC or EQAS control materials with values assigned by an international conventional or primary measurement procedure should be used whenever possible. In the absence of these CRM or commutable control materials, inter-laboratory IQC materials, followed by control materials from an EQAS, can be used (14).
When a CRM is used to estimate the bias (b), the b and its uncertainty (u b) can be calculated as (24): where x̄ designates the mean value obtained after processing the CRM in a specific time interval, μ is the CRM assigned value and u μ the uncertainty associated with the CRM assigned value. Note that Eq. 5 should be used to calculate the x̄ value if more than one calibration lot or measuring system is used.
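A hedged sketch of a CRM-based bias study follows. The bias is taken as the difference between the mean measured value and the CRM assigned value. For its uncertainty, the sketch assumes one common formulation, combining the uncertainty of the assigned value with the standard error of the mean of the replicates; this is an assumption of the example rather than a statement of the formula in reference 24. The CRM assigned value and its uncertainty are those quoted for ERM-DA111a in the Table 1 footnote, while the replicate results are invented.

```python
from math import sqrt
from statistics import mean, stdev


def crm_bias(measured, mu, u_mu):
    """Return (b, u_b) for replicate CRM results.

    b   = mean(measured) - mu
    u_b = sqrt(u_mu**2 + s**2 / n)   # assumed formulation, see lead-in
    """
    n = len(measured)
    if n < 2:
        raise ValueError("at least two CRM replicates are needed")
    b = mean(measured) - mu
    u_b = sqrt(u_mu ** 2 + stdev(measured) ** 2 / n)
    return b, u_b


# CRM values from the Table 1 footnote (nmol/L); replicates are hypothetical
replicates = [10.9, 11.1, 11.0, 10.8, 11.2]
b, u_b = crm_bias(replicates, mu=10.64, u_mu=0.301)
print(b, u_b)
```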
Bias studies using IQC materials can be performed following the recommendations of Farrance et al. (22):
• When two or more lots of calibrator or IQC materials are involved in a specified time interval, or when two or more identical measuring systems are used to measure the same biological quantity, the bias can be calculated as a weighted mean value of bias (b w) (24): and its uncertainty (u b w) as: The IQC manufacturer must provide the μ i,k. Otherwise, if the IQC material has an associated IQC inter-laboratory scheme, it can be estimated as (26):
In the previous equations, R represents the total pool size (total number of replicate measurements, i.e., of IQC values); N the number of IQC material levels used; m the number of calibrator lots (or IQC material lots or measuring systems) used; n i,k the number of replicate measurements using the IQC material level i for the calibrator lot k (or IQC material k, or measuring system k); b i the mean bias over n i replicates using the IQC material level i for the calibrator lot k (or IQC material k, or measuring system k); x i the pooled mean value obtained using the IQC material level i for the calibrator lot k (or IQC material k, or measuring system k); μ i,k the reference value for the IQC material level i and the calibrator lot k (or IQC material k, or measuring system k), which can be the value assigned by the manufacturer of the IQC material or a conventional value calculated as the mean of the arithmetic means of the peer-group laboratories participating in an inter-laboratory IQC program (e.g. UNITY from Bio-Rad Laboratories); u μ i,k the uncertainty associated with the reference value μ i,k; s Labs i,k the robust peer-group standard deviation obtained using the IQC material level i for the calibrator lot k (or IQC material k, or measuring system k); and q i,k the number of peer-group laboratories participating in the IQC material level i for the calibrator lot k (or IQC material k, or measuring system k).
• When only one calibrator lot, one IQC lot, and a single measuring system are used during a specified time interval, the equations become: where N represents the number of IQC material levels used; n i the number of replicate measurements using the IQC material level i; M the total number of replicate measurements; b i the bias over n i replicates using the IQC material level i; x i the mean value obtained using the IQC material level i; μ i the reference value for the IQC material level i (this value corresponds to the conventional value calculated as the mean of the arithmetic means of the peer-group laboratories participating in an inter-laboratory IQC program, e.g. UNITY from Bio-Rad Laboratories); u μi the uncertainty associated with the mean reference value μ i; s Labsi the robust peer-group standard deviation obtained for the IQC material level i; and q i the number of peer-group laboratories participating in the IQC material level i.
The bias can also be estimated from EQAS. In these cases, the bias and its uncertainty can be calculated as described above. Thus:
• When more than one measuring system is used to measure the same quantity, a mean bias (b) can be calculated as (24) (Eq. 14), and its uncertainty (u b) as: The EQAS manufacturer must provide the u μ i or, if not, it could be calculated as (26): where R is the total pool size (total number of measurements including all measuring systems and EQAS participations), N is the number of EQAS participations, e i,k is the measurement error for the EQAS participation i and the measuring system k, x i,k is the measured value obtained for the EQAS participation i and the measuring system k, μ i is the reference value assigned by the EQAS manufacturer for the EQAS participation i, u μ i is the uncertainty associated with the reference value μ i, s Labs i is the robust peer-group standard deviation provided by the EQAS manufacturer for participation i and q i is the number of peer-group laboratories for the EQAS participation i.
• When only one measuring system is used, the bias (b) and its uncertainty (u b) can be calculated as:
If a significant bias is detected, its treatment should differ depending on the kind of reference material used. If a CRM is used, the bias should be eliminated by applying a correction factor to every individual measured value obtained; this factor is calculated by dividing the assigned value of the CRM by the mean value obtained in the bias study. Also, the MU associated with this correction factor (u cf), calculated in the same way as the u b, should be included in the uncertainty budget. On the contrary, if IQC or EQAS control materials are used, it is not recommended to apply a correction factor to eliminate the bias, and the bias itself should be included in the uncertainty budget (6,22,25).
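The correction factor described for the CRM case can be sketched in a few lines. The factor is the CRM assigned value divided by the mean value obtained in the bias study, applied multiplicatively to each measured value. The function names and replicate data are hypothetical; the assigned value is the one quoted in the Table 1 footnote.

```python
from statistics import mean


def correction_factor(crm_assigned_value, crm_measured_values):
    """CRM assigned value divided by the mean of the CRM results."""
    return crm_assigned_value / mean(crm_measured_values)


def apply_correction(measured_value, cf):
    """Correct one individual measured value with the factor cf."""
    return measured_value * cf


# Hypothetical bias study against a CRM with assigned value 10.64 nmol/L
cf = correction_factor(10.64, [10.9, 11.1, 11.0, 10.8, 11.2])
print(cf, apply_correction(11.0, cf))
```

A result equal to the mean of the bias study is mapped back onto the assigned value, which is a quick sanity check of the factor.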

Calculation of the combined standard uncertainty
When the individual contribution of each standard uncertainty source has been estimated, the combined standard uncertainty can be calculated by adding the estimates of the standard uncertainties considered above, according to one of the following equations: Clinical laboratories should use Eq. 20 when the compatibility study shows that the bias is not statistically significant. Equation 21 should be used when a CRM is used to estimate the bias, the bias is significant, and it has been "eliminated" by applying a correction factor. Equation 22 should be used if IQC or EQAS materials are used to estimate the bias and the laboratory cannot "eliminate" the bias.
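A minimal sketch of the combination step, assuming the components are independent so that they add in quadrature (root sum of squares). The function name and numbers are hypothetical. The caller decides which components enter the budget: u cal and u Rw when the bias is not significant, plus u cf when a CRM-based correction has been applied, or plus a bias term when the bias cannot be eliminated. The exact form of that last bias term is not asserted here.

```python
from math import sqrt


def combined_standard_uncertainty(*components):
    """Root sum of squares of independent standard uncertainty components.

    All components must be expressed in the same units (or all in percent).
    """
    if not components:
        raise ValueError("at least one component is required")
    return sqrt(sum(u * u for u in components))


# Hypothetical budget: u_cal = 0.3, u_Rw = 0.4 (same units), bias not significant
print(combined_standard_uncertainty(0.3, 0.4))

# Hypothetical budget with a third component (for example u_cf)
print(combined_standard_uncertainty(1.0, 2.0, 2.0))
```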

Calculation of the expanded uncertainty
Expanded uncertainty (U) is calculated by multiplying the u c by a coverage factor k:
$$U = k \cdot u_c$$
This k-value depends on the type of probability distribution, the level of confidence selected and the number of independent measurements made to obtain the u c. Under typical clinical laboratory working conditions, it is acceptable to use a k-value of 2 (6,7).
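The expansion step is a single multiplication, sketched below with k = 2 as the default. The helper that expresses U as a percentage of a measured value is a hypothetical convenience added for reporting relative expanded uncertainties; the names and numbers are illustrative only.

```python
def expanded_uncertainty(u_c: float, k: float = 2.0) -> float:
    """Expanded uncertainty U = k * u_c."""
    return k * u_c


def relative_expanded_uncertainty(U: float, value: float) -> float:
    """U expressed as a percentage of the value it refers to."""
    if value == 0:
        raise ValueError("relative uncertainty is undefined for a zero value")
    return 100.0 * U / abs(value)


# Hypothetical combined standard uncertainty of 0.5 at a measured value of 20.0
U = expanded_uncertainty(0.5)
print(U, relative_expanded_uncertainty(U, 20.0))
```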

Comparison of the expanded uncertainty obtained with the maximum allowable expanded uncertainty
Finally, to know whether a U value is acceptable, it must be compared with the maximum allowable (permissible) expanded uncertainty (U max). Thus, a U value is considered acceptable if it is lower than or equal to the U max previously selected by the laboratory.
Another controversial point is how the U max should be established. Measurement uncertainty requirements for defining fitness-for-purpose limits may be based on clinical outcome studies, biological variation or the state of the art; those based on biological variation, despite their limitations, are generally accepted and used (27-30). However, it should be noted that unless a country has established legal metrological requirements (e.g. the German RiliBÄK), the selection of one type of requirement or another is a matter of consensus and depends on the clinical laboratory itself.
So, although there are several ways to select the U max, we show here a procedure based on the state of the art to calculate the U max using the RiliBÄK concept named "root mean square of measurement error" (∆) (31-33): where %∆ rel(max) is the maximum allowable percent relative root mean square of measurement error, μ a the reference value for which the requirement has been established, CV max the maximum allowable coefficient of variation and %b rel(max) the maximum allowable percent relative bias.
The %∆ rel(max) values can be selected directly from RiliBÄK (31). Otherwise, they can be calculated from the CV max and %b rel(max) using biological variation data, state-of-the-art data, or data from different organizations such as CLIA, the National Cholesterol Education Program for lipid-related quantities, or the European Medicines Agency (EMA) for drugs, among others (34-38).
To illustrate the proposal for the estimation of MU, some biological quantities that are routinely measured in clinical laboratories using both "commercial" (i.e., those with CE marking) and "in-house" validated measurement procedures have been selected (see Supplementary material 1). Table 1 shows the MU budget and the maximum allowable relative expanded uncertainty. In addition, Supplementary materials 2, 3 and 4 contain spreadsheets that allow the calculation of the primary measurement uncertainty sources (u cal, u Rw and u b), the u c, and the U. They also include a study to determine whether the U obtained is acceptable compared with the U max, and show an example of how to specify the measurand. Each supplementary material considers the use of the three kinds of material to estimate the bias: CRM, IQC materials (with an associated IQC inter-laboratory scheme), and EQAS materials.
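The acceptance check can be sketched as follows, under two loudly stated assumptions that are not taken from the text: first, that the maximum allowable percent relative root mean square of measurement error combines CV max and %b rel(max) in quadrature; second, that the U max at the reference value μ a is obtained by converting that percentage into the units of μ a. All function names and numbers are hypothetical, and a laboratory should use the relationship given in its chosen specification.

```python
from math import sqrt


def pct_delta_rel_max(cv_max: float, pct_b_rel_max: float) -> float:
    """ASSUMPTION: quadrature combination of allowable CV and allowable bias (%)."""
    return sqrt(cv_max ** 2 + pct_b_rel_max ** 2)


def u_max_absolute(pct_delta_max: float, mu_a: float) -> float:
    """ASSUMPTION: U_max taken as pct_delta_max percent of the reference value mu_a."""
    return pct_delta_max / 100.0 * mu_a


def is_acceptable(U: float, U_max: float) -> bool:
    """U is acceptable when it is lower than or equal to U_max."""
    return U <= U_max


# Hypothetical requirement: CV_max = 3 %, %b_rel(max) = 4 %, mu_a = 10.0
delta = pct_delta_rel_max(3.0, 4.0)
U_max = u_max_absolute(delta, 10.0)
print(delta, U_max, is_acceptable(0.4, U_max))
```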

Metrological traceability description
As commented before, the description of MT in clinical laboratories is a less controversial matter than the estimation of MU and can be made simply on the basis of ISO 17511:2020 (5). All the information needed for its description can be provided by the manufacturers of the reagents or calibration materials, as well as obtained from certificates of analysis of CRM issued by international or national metrology institutes, and from the Joint Committee for Traceability in Laboratory Medicine (JCTLM) database (39). For each biological quantity, the strategy to follow can be based on:
• Obtaining the MT declared by the manufacturers. If this information is not present in brochures, or only incomplete information is found, it can be obtained by asking the manufacturers directly or, in some cases, from the websites of government agencies, such as the Food and Drug Administration (FDA) (40).
• Obtaining additional information related to the references (units of measurement, measurement procedures, or reference materials) to describe the calibration hierarchies and the sequence of value assignments up to the point at which metrological traceability begins. This information can be obtained from the reagent manufacturers, CRM certificates, or the JCTLM database (39).
• Preparing a table or flow chart from all the information previously collected to describe the metrological traceability chain and the calibration hierarchy of the measurement results. As an example, Table 2 and Supplementary Material 5 show the MT description for some biological quantities.

Conclusions
This review provides practical suggestions on how clinical laboratories could estimate the MU and describe the MT of the results of biological quantities, in order to help and motivate clinical laboratories to: 1) conduct this type of study, 2) incorporate information regarding uncertainty and traceability in their reports, and 3) gain a greater understanding of the importance that these concepts have in laboratory medicine. Also, in the "clinical laboratory accreditation era", this review could help laboratories to meet the ISO 15189 requirements related to these two metrological concepts.

Potential conflict of interest
None declared.

, one calibrator and IQC material lot, and two or more identical measuring systems to measure the selected quantities were used. Instead, for the mass concentration of sirolimus in blood and mass concentration of clozapine in serum, only one calibrator, IQC material lot and measuring system were used. ‡ The b and the u b for the substance concentration of sirolimus in blood were estimated using the certified reference material (CRM) ERM-DA111a. This CRM was processed twice per week for six months, and only one measuring system and calibration lot were used. The manufacturer provided the CRM assigned value and its uncertainty, which were 10.64 nmol/L and 0.301 nmol/L, respectively. The b and the u b for the mass concentration of clozapine in serum and the pH in arterial blood were estimated using data from 10 external quality assessment scheme participations during the 2019 survey period. The b and the u b for the rest of the biological quantities were estimated using IQC materials that have an associated inter-laboratory quality control scheme. The IQC data and the conditions used were the same as those described for estimating the u Rw.