Scientific papers are tools for communicating science between colleagues and peers. Every study needs to be designed, conducted and reported transparently, honestly and without any deviation from the truth. Research which does not comply with these basic principles is misleading. Such studies create distorted impressions and false conclusions and can thus lead to wrong medical decisions, harm to the patient and substantial financial losses. This article provides insight into ways of recognizing sources of bias and avoiding bias in research.
Definition of bias
Bias is any trend or deviation from the truth in data collection, data analysis, interpretation and publication which can cause false conclusions. Bias can occur either intentionally or unintentionally (1). Intentionally introducing bias into one’s research is immoral. Nevertheless, considering the possible consequences of biased research, it is almost equally irresponsible to conduct and publish biased research unintentionally.
It is worth pointing out that every study has its confounding variables and limitations. Confounding cannot be completely avoided. Every scientist should therefore be aware of all potential sources of bias and take all possible actions to minimize the deviation from the truth. If deviation is still present, authors should acknowledge it in their articles by declaring the known limitations of their work.
It is also the responsibility of editors and reviewers to detect any potential bias. If such bias exists, it is up to the editor to decide whether the bias has an important effect on the study conclusions. If that is the case, such articles need to be rejected, because their conclusions are not valid.
Bias in data collection
A population consists of all individuals with a characteristic of interest. Since studying an entire population is quite often impossible due to limited time and money, we usually study a phenomenon of interest in a sample. By doing this, we hope that what we have learned from the sample can be generalized to the entire population (2). To be able to do so, the sample needs to be representative of the population. If this is not the case, conclusions will not be generalizable, i.e. the study will not have external validity.
Sampling is therefore a crucial step in every study. While collecting data, there are numerous ways in which researchers can introduce bias into the study. If, for example, some patients are less or more likely than others to enter the study during recruitment, the sample will not be representative of the population in which the research is done. In that case, subjects who are less likely to enter the study will be under-represented, and those who are more likely to enter the study will be over-represented, relative to the general population to which the conclusions of the study are to be applied. This is what we call selection bias. To ensure that a sample is representative of a population, sampling should be random, i.e. every subject needs to have an equal probability of being included in the study. It should be noted that sampling bias can also occur if the sample is too small to represent the target population (3).
For example, if the aim of the study is to assess the average hsCRP (high-sensitivity C-reactive protein) concentration in the healthy population in Croatia, the way to go would be to recruit healthy individuals from the general population during their regular annual health check-up. On the other hand, a biased study would be one which recruits only volunteer blood donors, because healthy blood donors are usually individuals who feel healthy and who are not suffering from any condition or illness which might cause changes in hsCRP concentration. By recruiting only healthy blood donors we might conclude that hsCRP is much lower than it really is. This is a kind of sampling bias which we call volunteer bias.
Another example of volunteer bias occurs when colleagues from a laboratory or clinical department are invited to participate in a study on some new marker for anemia. It is very likely that such a study would preferentially include those participants who suspect that they are anemic and are curious to learn it from this new test. This way, anemic individuals might be over-represented. The research would then be biased and would not allow generalization of its conclusions to the rest of the population.
Generally speaking, whenever cross-sectional or case-control studies are done exclusively in hospital settings, there is a good chance that such a study will be biased. This is called admission bias. The bias exists because the population studied does not reflect the general population.
Another example of sampling bias is the so-called survivor bias, which usually occurs in cross-sectional studies. If a study aims to assess the association of altered KLK6 (human Kallikrein-6) expression with the 10-year incidence of Alzheimer’s disease, subjects who died before the study end point might be missed from the study.
Misclassification bias is a kind of sampling bias which occurs when a disease of interest is poorly defined, when there is no gold standard for diagnosis of the disease or when a disease might not be easily detectable. This way some subjects are falsely classified as cases or controls, whereas they should have been in the other group. Let us say that a researcher wants to study the accuracy of a new test for the early detection of prostate cancer in asymptomatic men. In the absence of a reliable test for early prostate cancer detection, there is a chance that some early prostate cancer cases would be misclassified as disease-free, causing under- or over-estimation of the accuracy of this new marker.
As a general rule, a research question needs to be considered carefully and every effort should be made to ensure that the sample matches the population as closely as possible.
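The effect of non-random recruitment described in this section can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is hypothetical (a log-normally distributed marker with made-up parameters), and the recruitment rule, in which subjects with higher marker values are less likely to enter the study, is an invented stand-in for something like volunteer bias. The function name and all numeric values are assumptions, not part of any real study.

```python
import random
import statistics


def simulate_selection_bias(seed=0, population_size=50_000, sample_size=2_000):
    """Compare a simple random sample with a selectively recruited sample.

    Returns (population mean, random-sample mean, biased-sample mean).
    All distribution parameters are hypothetical.
    """
    rng = random.Random(seed)

    # Hypothetical population: a skewed, positive-valued marker.
    population = [rng.lognormvariate(0.0, 0.5) for _ in range(population_size)]

    # Simple random sampling: every subject has the same inclusion probability.
    random_sample = rng.sample(population, sample_size)

    # Biased recruitment (assumed rule): the inclusion weight falls as the
    # marker value rises, so subjects with high values are under-represented.
    # rng.choices draws with replacement, which is acceptable for a sketch.
    weights = [1.0 / (1.0 + value ** 2) for value in population]
    biased_sample = rng.choices(population, weights=weights, k=sample_size)

    return (
        statistics.mean(population),
        statistics.mean(random_sample),
        statistics.mean(biased_sample),
    )


if __name__ == "__main__":
    pop_mean, random_mean, biased_mean = simulate_selection_bias()
    print(f"population mean:    {pop_mean:.3f}")
    print(f"random sample mean: {random_mean:.3f}")
    print(f"biased sample mean: {biased_mean:.3f}")
```

Under these assumptions the random sample should land close to the population mean, while the selectively recruited sample should underestimate it, no matter how many subjects are recruited. Increasing the sample size reduces random error but does not remove selection bias.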
Bias in data analysis
A researcher can introduce bias in data analysis by analyzing data in a way which favors conclusions that support the research hypothesis. Bias can be introduced during data analysis in various ways, such as by fabricating, abusing or manipulating the data. Some examples are:
· reporting non-existing data from experiments which were never done (data fabrication);
· eliminating data which do not support your hypothesis (outliers, or even whole subgroups);
· using inappropriate statistical tests to test your data;
· performing multiple testing (“fishing for P”) by pair-wise comparisons (4), testing multiple endpoints and performing secondary or subgroup analyses which were not part of the original plan, in order “to find” a statistically significant difference regardless of the hypothesis.
For example, if the study aim is to show that one biomarker is associated with another in a group of patients, and this association does not prove significant in the total cohort, researchers may start “torturing the data” by dividing their data into various subgroups until the association becomes statistically significant. If this sub-classification of the study population was not part of the original research hypothesis, such behavior is considered data manipulation and is neither acceptable nor ethical. Such studies quite often provide meaningless conclusions such as:
· CRP was statistically significant in a subgroup of women under 37 years with cholesterol concentration > 6.2 mmol/L;
· lactate concentration was negatively associated with albumin concentration in a subgroup of male patients with a body mass index in the lowest quartile and total leukocyte count below 4.00 × 10⁹/L.
Besides being biased, invalid and illogical, those conclusions are also useless, since they cannot be generalized to the entire population.
An often-quoted saying (attributed to Ronald Coase, but unpublished to the best of my knowledge) goes: “If you torture the data long enough, it will confess to anything”. This actually means that there is a good chance that statistical significance will be reached merely by increasing the number of hypotheses tested in the work. The question is then: is this significant difference real or did it occur by pure chance?
If 20 independent tests are performed at a significance level of α = 0.05 and none of the tested effects is real, one Type I error is to be expected on average, and the probability of obtaining at least one false positive result is about 64%. Therefore, the number of hypotheses to be tested in a study needs to be determined in advance. If multiple hypotheses are tested, a correction for multiple testing should be applied or the study should be declared exploratory.
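The arithmetic behind the 20-test example can be made explicit. The following is a minimal sketch, assuming that the tests are independent, that every null hypothesis is true and that each test is run at α = 0.05. The Bonferroni correction shown here is one common choice of correction for multiple testing, used only as an illustration; the function names are hypothetical.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of Type I errors when every null hypothesis is true."""
    return num_tests * alpha


def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one Type I error among independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(num_tests, alpha=0.05):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / num_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which P values remain significant after Bonferroni correction."""
    if not p_values:
        return []
    threshold = bonferroni_alpha(len(p_values), alpha)
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    tests = 20
    print(expected_false_positives(tests))       # about 1 false positive on average
    print(familywise_error_rate(tests))          # roughly 0.64
    corrected = bonferroni_alpha(tests)          # 0.05 / 20 = 0.0025
    print(familywise_error_rate(tests, corrected))  # just under 0.05
```

With 20 uncorrected tests the chance of at least one spurious "significant" finding is roughly 64%. Testing each hypothesis at 0.05 / 20 = 0.0025 brings the overall chance back below 5%, at the cost of reduced power to detect real effects.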
Bias in data interpretation
When interpreting the results, one needs to make sure that proper statistical tests were used, that results were presented correctly and that data are interpreted only if the observed relationship was statistically significant (5). Otherwise, there may be some bias in the research.
However, wishful thinking is not rare in scientific research. Some researchers believe so strongly in their original hypotheses that they tend to neglect the actual findings and interpret them in favor of their beliefs. Examples are:
· discussing observed differences and associations even if they are not statistically significant (the often used expression is “borderline significance”);
· discussing differences which are statistically significant but are not clinically meaningful;
· drawing conclusions about the causality, even if the study was not designed as an experiment;
· drawing conclusions about the values outside the range of observed data (extrapolation);
· overgeneralization of the study conclusions to the entire general population, even if a study was confined to the population subset;
· Type I (the expected effect is found significant, when actually there is none) and type II (the expected effect is not found significant, when it is actually present) errors (6).
Even if this is done as an honest error or due to negligence, it is still considered serious misconduct.
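The difference between statistical significance and clinical meaningfulness, mentioned in the list above, can be shown with a short calculation. The sketch below assumes a two-sided z-test with a known common standard deviation and equal group sizes. All numbers, including the minimal clinically important difference, are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def two_sample_z_test(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test for a difference in means.

    Assumes a known common standard deviation and equal group sizes.
    Returns (z statistic, two-sided P value).
    """
    standard_error = sd * (2.0 / n_per_group) ** 0.5
    z = (mean_a - mean_b) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


def is_clinically_meaningful(mean_a, mean_b, minimal_important_difference):
    """Check the observed difference against a pre-specified threshold."""
    return abs(mean_a - mean_b) >= minimal_important_difference


if __name__ == "__main__":
    # Hypothetical marker: group means 5.05 vs 5.00, SD 1.0, 50,000 per group.
    z, p = two_sample_z_test(5.05, 5.00, sd=1.0, n_per_group=50_000)
    print(f"z = {z:.2f}, P < 0.05: {p < 0.05}")
    # Assumed minimal clinically important difference of 0.5 units.
    print(is_clinically_meaningful(5.05, 5.00, minimal_important_difference=0.5))
```

With very large groups, a difference of 0.05 units is highly statistically significant, yet it is ten times smaller than the assumed clinically important difference. A small P value alone therefore says nothing about whether a finding matters in practice.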
Publication bias
Unfortunately, scientific journals are much more likely to accept for publication a study which reports positive findings than a study with negative findings. Such behavior creates a false impression in the literature and may have long-term consequences for the entire scientific community. Also, if negative results were not so difficult to publish, other scientists would not unnecessarily waste their time and financial resources by re-running the same experiments.
Journal editors bear most of the responsibility for this phenomenon. Ideally, a study should have an equal opportunity to be published regardless of the nature of its findings, provided it is properly designed, with valid scientific assumptions, well conducted experiments and adequate data analysis, presentation and conclusions. In reality, however, this is not the case. To enable publication of studies reporting negative findings, several journals have already been launched, such as Journal of Pharmaceutical Negative Results, Journal of Negative Results in Biomedicine, Journal of Interesting Negative Results and some others. The aim of such journals is to counterbalance the ever-increasing pressure in the scientific literature to publish only positive results.
It is our policy at Biochemia Medica to give equal consideration to submitted articles, regardless of the nature of their findings.
One sort of publication bias is the so-called funding bias, which occurs when most of the studies addressing the same scientific question are funded by the same company and support the interests of that company. It is absolutely acceptable to receive funding from a company to perform research, as long as the study is run independently and is not influenced in any way by the sponsoring company, and as long as the funding source is declared as a potential conflict of interest to the journal editors, reviewers and readers.
It is the policy of our Journal to demand such a declaration from the authors during submission and to publish this declaration in the published article (7). We believe that this gives the scientific community an opportunity to judge the presence of any potential bias in the published work.
There are many potential sources of bias in research. Bias can cause distorted results and wrong conclusions. Biased studies can lead to unnecessary costs and wrong clinical practice, and can eventually harm the patient. It is therefore the responsibility of all stakeholders involved in scientific publishing to ensure that only valid and unbiased research, conducted in a highly professional and competent manner, is published (8).