
Why Impact Factors Fell — and a Different Way to Think About Research Quality

This year’s journal impact factors arrived to an oddly quiet reaction online, as if nothing had happened. The reason is not hard to see: for a large majority of journals, the number went down. Last year almost everyone had good news to celebrate because impact factors rose across the board. This time the story was decline. People may enjoy absurd headlines in the abstract, but they are rarely enthusiastic when the bad news touches something tied closely to their own careers.

That said, impact factor swings do not bother me much. The metric was designed to evaluate journals, not people. It is understandable that publishers make a spectacle out of it, but using a journal-level indicator to judge individual researchers never made much sense. Most people who like to say they have published in “high-impact” journals probably do not have papers cited at the average level of those journals. If they did, they would talk about the citation counts of the papers themselves rather than the journal logo.

Still, the broad drop in impact factors does have a real cause, and the pandemic is probably the main one.

How the pandemic inflated journal metrics

Last year’s impact factors were based on citations from 2020 and 2021. During that period, many researchers were stuck at home writing. But the bigger issue lies in how impact factor is calculated: citations divided by citable items. The denominator, the number of articles, is relatively fixed. The numerator, citations, is much easier to inflate.
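For reference, the standard two-year definition can be written out explicitly. The symbols here are notation introduced for this sketch, not the author's: C_Y(y) is the number of citations received in year Y by items published in year y, and N_y is the number of citable items published in year y.

```latex
\mathrm{IF}_{Y} \;=\; \frac{C_{Y}(Y-1) + C_{Y}(Y-2)}{N_{Y-1} + N_{Y-2}}
```

Editorials and news items that cite recent papers add to the numerator, while typically not being counted as citable items in the denominator.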

One common way to do that is through non-peer-reviewed formats such as editorials, news pieces, or “research highlights” that introduce recently published work and cite it immediately. The biggest general-interest journals have done this for a long time, and many smaller journals have copied the tactic. In practice, if a paper appears in a major journal and that same issue includes a short feature discussing it, the paper can pick up a citation almost instantly.

Normally this mechanism contributes only a limited amount of journal self-citation. But COVID changed the scale completely. In a very short time, tens of thousands of COVID-related papers and preprints appeared. Governments also poured money into research in order to accelerate development. The result was a huge wave of pandemic-related studies in 2020 and 2021, followed by intense media attention and journal coverage. Many of the outlets amplifying this work were broad-scope journals.

Suppose a COVID paper cites 30 or 40 references, while the paper itself receives only 3 or 4 citations after publication, which is not an unreasonable average for a normal article. That paper has effectively injected 30 or 40 outgoing citations into the literature while contributing very little incoming citation value of its own. Most of those references point toward established mid- to high-tier journals. Under those conditions, the impact factors of mid- and upper-level journals were bound to rise. And that is exactly what happened last year, especially in biology and biomedical journals touched by COVID research.
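The arithmetic behind that argument can be sketched in a few lines. This is a toy calculation, not measured data: the batch size of 10,000 papers is a hypothetical figure, and 35 references and 3.5 citations per paper are simply the midpoints of the ranges mentioned above.

```python
def net_citation_outflow(n_papers: int, refs_per_paper: float, cites_per_paper: float) -> float:
    """Citations a batch of papers sends to the rest of the literature,
    minus the citations that batch receives back."""
    return n_papers * (refs_per_paper - cites_per_paper)


# Hypothetical batch: 10,000 papers, 35 references each, 3.5 citations each.
outflow = net_citation_outflow(10_000, 35, 3.5)
print(outflow)  # 315000.0
```

Under these assumed numbers, the batch hands out roughly ten times as many citations as it takes in, and that surplus has to land somewhere.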

The numbers behind the surge

The 2022 impact factor counts citations to papers published in 2020 and 2021, so those are the years worth examining.

In Science, there were 175 COVID-related papers in 2020 and 142 in 2021. Even if we look only at citations received within the same year of publication, the difference is striking. The 2020 COVID papers were cited 7,673 times in that same year, about 44 citations per paper. The 2021 COVID papers received 4,006 same-year citations, around 28 per paper.

Now compare that with the journal’s full output:

  • In 2020, Science published 2,656 papers, which received 19,788 citations in the same year.
  • In 2021, it published 2,636 papers, with 12,943 same-year citations.
  • In 2022, it published 2,496 papers, with 10,343 same-year citations.
  • In 2019, before the pandemic, it published 2,729 papers, with 7,217 same-year citations.

The scale becomes obvious when you line those numbers up. The total same-year citation count for all Science papers published in 2019 was roughly comparable to the same-year citations earned by COVID-related papers alone in 2020. There is no real mystery here: COVID brought a massive, temporary wave of attention into the literature, and that was the main engine behind impact factor inflation over the last two years.
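For readers who want to line the numbers up themselves, here is a small sketch that recomputes the per-paper rates from the figures quoted above. The variable and function names are illustrative; only the counts come from the text.

```python
# Science, same-year figures quoted above: year -> (papers, same-year citations)
all_papers = {
    2019: (2729, 7217),
    2020: (2656, 19788),
    2021: (2636, 12943),
    2022: (2496, 10343),
}
# COVID-related subset for the two peak years
covid_papers = {
    2020: (175, 7673),
    2021: (142, 4006),
}


def per_paper(papers: int, citations: int) -> float:
    """Average same-year citations per paper."""
    return citations / papers


for year, (n, c) in sorted(all_papers.items()):
    line = f"{year}: {per_paper(n, c):.1f} same-year citations per paper overall"
    if year in covid_papers:
        cn, cc = covid_papers[year]
        line += (f"; COVID papers {per_paper(cn, cc):.1f} each, "
                 f"{cc / c:.0%} of all same-year citations")
    print(line)
```

By this calculation the 2020 COVID papers, under 7% of the journal's output that year, account for well over a third of its same-year citations.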

Why the decline came this year

What changed this year is that the attention faded and the volume of related projects moved back toward normal. In 2022, Science published 2,496 papers, 85 of them COVID-related. Those 85 papers received 1,076 same-year citations, or about 13 citations each.

The problem is that many follow-up citations arrive with a delay. By the time these later pandemic papers were published, the field’s momentum was already weakening, so the citation counts did not keep climbing the way they had during the peak years. At the same time, those papers still occupied publication space that could otherwise have gone to more conventional work. The predictable result was a broad decline, with general journals and biomedical journals hit especially hard.

This year’s statistics still include 2021, which was itself an inflated citation year. But within another cycle or two, the system will likely drift back toward something resembling 2019. The idea circulating online that new journals diluted impact factors has some truth to it, but that effect is nowhere near as large as the pandemic shock. Strong papers in established mid- and high-tier journals will continue to attract disproportionate attention over time, and more papers entering the system often means more citations eventually flow toward the better work. At the journal level, though, that still says very little about individual researchers.

The real problem: every simple metric gets gamed

This loops back to the more important question: how should papers be evaluated?

The difficulty is that almost any single-number metric can be manipulated. Raw citation counts can be boosted. Paper counts can be boosted. Both are one-directional indicators: bigger is always better. Any monotonic metric will eventually be optimized in artificial ways.

The usual tricks are familiar: publish large numbers of mediocre papers to accumulate citations, or write review articles that naturally pull in many references and often earn many citations in return. That is why metrics become more robust when increasing the score also becomes meaningfully harder. The h-index, for example, is not impossible to game, but it is much harder to inflate casually.

The h-index, however, does not evaluate individual papers very well. So I have been thinking about a more general metric: a C-index.

A simple proposal: the C-index

The C-index is straightforward. For a given paper, take the number of citations it has received at the time of evaluation and divide that by the number of references the paper cites.

If the result is greater than 1, that paper has, in a meaningful sense, expanded influence beyond what it consumed from the literature. More people cited it than the number of papers it itself cited. For the field as a whole, that suggests the work had outward impact rather than merely resting on existing momentum.

For an individual researcher, count how many of their papers have a C-index greater than 1. That total is the researcher’s C-index. The same logic can also be applied to a journal.
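The two definitions above translate almost directly into code. The following is a minimal sketch: the Paper class, the function names, and the sample record are all made up for illustration, and treating a paper with zero references as undefined is an assumption, since the proposal does not address that case.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    citations: int   # citations received at the time of evaluation
    references: int  # length of the paper's own reference list


def paper_c_index(paper: Paper) -> float:
    """Citations received divided by references cited."""
    if paper.references == 0:
        # Assumption: the proposal does not say what to do here.
        raise ValueError("C-index is undefined for a paper with no references")
    return paper.citations / paper.references


def researcher_c_index(papers: list[Paper]) -> int:
    """Number of papers whose paper-level C-index is strictly greater than 1."""
    return sum(1 for p in papers if p.references > 0 and paper_c_index(p) > 1)


# Made-up publication record
record = [Paper(120, 40), Paper(35, 35), Paper(12, 48), Paper(300, 210)]
print([round(paper_c_index(p), 2) for p in record])  # [3.0, 1.0, 0.25, 1.43]
print(researcher_c_index(record))  # 2
```

Note that the paper with exactly as many citations as references does not count, because the threshold is "greater than 1". The same counting function would work for a journal by passing in all of its papers.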

This sets a fairly high bar. An experimental paper often cites 30 or 40 references. Reaching 30 or 40 citations is already a sign of quality in almost any journal. One major advantage of this metric is that it largely filters out the distortions caused by review papers. Review articles can easily gather hundreds of citations, but they also tend to cite a huge number of prior studies. Only truly landmark reviews would maintain a C-index above 1 by a wide margin. Most would not.

The core assumption is also intuitive: if a study is genuinely pioneering, it should not need to lean on more prior literature than the amount of later literature that leans on it. If a research article cites more than a hundred papers, the most likely explanation is not that it is unusually original, but that its experimental support is limited and it relies heavily on others’ work for justification.

A metric like this would also weaken the effect of unreasonable reviewer demands to “add a few citations,” especially when the real goal is to increase the reviewer’s own citation count. Every extra reference raises the threshold the paper must clear to achieve a C-index above 1.

Another appealing feature is that the C-index is naturally resistant to superficial gaming. To gain a point, a paper must be recognized by readers and survive peer review while still attracting more citations than it gave out. Work built mostly from permutations of existing ideas tends to generate more self-citation and more dependence on prior literature. The more heavily you cite your own related work, the harder it becomes to keep the ratio high.

If evaluation used something like the C-index, authors might become more selective with references instead of citing indiscriminately to satisfy reviewers. Those bloated papers are often exactly the ones that do little to lift a journal’s average impact. Whether one looks at a person or a journal, a higher C-index would suggest a larger share of genuinely original work. A review article with a strong C-index would also be especially worth reading, including for teaching purposes.

Why most researchers would not score very high

My guess is that most researchers would not reach a double-digit C-index. Most scientific work is follow-up work rather than field-defining work, and follow-up studies naturally cite more than pioneering ones.

Someone could say: fine, then I will just cite fewer papers. But that runs straight into peer review. A follow-up paper with too few references may be criticized for weak background knowledge or insufficient grounding in the field.

The main practical issue is not the idea itself but choosing the citation window. That is manageable. Every discipline has its own citation half-life, so the metric could be calculated using citations accumulated over roughly that half-life after publication. In fast-moving fields, the C-index would be even harder to manipulate. In slow-moving fields with long citation half-lives, the metric would naturally reward work with lasting value.
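One way to make the citation window concrete is sketched below. It assumes the year of each citing paper is available and that the window runs from the publication year through publication year plus the window length, inclusive. Both the function name and that boundary convention are assumptions; the half-life itself would have to come from a field-level source.

```python
def windowed_c_index(pub_year: int, citing_years: list[int],
                     references: int, window_years: int) -> float:
    """C-index that counts only citations made within `window_years`
    after publication (publication year included)."""
    if references <= 0:
        raise ValueError("references must be positive")
    last_year = pub_year + window_years
    in_window = sum(1 for y in citing_years if pub_year <= y <= last_year)
    return in_window / references


# Made-up paper: published 2015, 4 references, cited in these years
cited_in = [2015, 2016, 2016, 2017, 2019, 2022, 2024]
print(windowed_c_index(2015, cited_in, 4, window_years=3))   # 1.0
print(windowed_c_index(2015, cited_in, 4, window_years=10))  # 1.75
```

The same paper fails the "greater than 1" test under a short window and passes under a long one, which is why the window should be fixed per discipline rather than chosen per paper.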

In principle, that makes cross-field comparison possible, though I am not sure that is especially meaningful. Research assessment often ends up comparing apples and oranges anyway.

What happens if we apply it to Science?

Take a two-year window, matching the impact factor framework. Instead of checking each paper’s reference list individually, use averages.

In 2019, Science published 2,729 papers that together cited 112,842 references, an average of about 41 references per paper. How many of those papers were cited more times than the number of references they contained over 2020–2021? Roughly 550 to 600. That would put the journal’s C-index at around 550–600.
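The averaging shortcut described above can be sketched as follows. The function name and the five-paper toy journal are hypothetical; only the 112,842 references and 2,729 papers come from the text. Using the journal-wide average instead of each paper's own reference count is an approximation, so the result is an estimate rather than an exact C-index.

```python
def journal_c_index_by_average(window_citations: list[int], total_references: int) -> int:
    """Count papers cited more often in the window than the journal's
    average reference count per paper."""
    avg_refs = total_references / len(window_citations)
    return sum(1 for c in window_citations if c > avg_refs)


# Toy journal: 5 papers citing 200 references in total (average 40 per paper)
toy_citations = [5, 41, 40, 120, 39]
print(journal_c_index_by_average(toy_citations, 200))  # 2

# Threshold implied by the Science 2019 figures quoted above
print(round(112_842 / 2_729, 1))  # 41.3
```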

In 2020 the average reference count rose only modestly, to 45 per paper, which suggests the pandemic did not alter this side of the equation very much. Using the 2021–2022 citation window, the journal’s C-index again comes out to about 550–600.

That stability is the interesting part. This metric appears relatively immune to short-lived hot topics. Whether the field is in a boom cycle or not, the relationship between how much researchers cite and how much the most influential papers are cited remains fairly stable. Even for a journal as prominent as Science, the yearly change is limited. Roughly 20% of its papers seem to qualify as truly original by this standard.

A comparison with ES&T

Now consider ES&T, a top journal in environmental science.

In 2019, it published 1,548 papers citing a total of 43,981 references, or 28.4 references per paper on average. Its C-index would be around 150–200, meaning an originality share of roughly 10 to 13%.

In 2020, it published 1,691 papers citing 35,276 references, about 21 per paper on average. Its C-index rises to around 250–300, pushing the originality share to roughly 15 to 18%.

That comparison suggests something important: the gap between broad-scope journals and leading specialty journals may not be as large as people assume when it comes to the proportion of highly original papers. Generalist flagship journals seem to maintain around 20% genuinely high-originality, high-impact work under ordinary conditions. A top subfield journal can approach that level when major scientific problems emerge, though in quieter times it may sit somewhat lower.
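The shares being compared here are just the estimated C-index range divided by the number of papers published. A short sketch makes the ranges explicit. The paper counts and C-index ranges are the approximate figures quoted above, and the function name is illustrative.

```python
def share_range(c_low: int, c_high: int, papers: int) -> tuple[float, float]:
    """Fraction of a journal's papers with C-index above 1, as a (low, high) range."""
    return c_low / papers, c_high / papers


# label -> (C-index low estimate, C-index high estimate, papers published)
estimates = {
    "Science 2019": (550, 600, 2729),
    "Science 2020": (550, 600, 2656),
    "ES&T 2019": (150, 200, 1548),
    "ES&T 2020": (250, 300, 1691),
}

for label, (c_low, c_high, n) in estimates.items():
    low, high = share_range(c_low, c_high, n)
    print(f"{label}: {low:.0%} to {high:.0%}")
```

Because the C-index inputs are themselves rough ranges, the resulting percentages should be read as ballpark figures, not precise measurements.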

It also implies a less flattering truth. Roughly 80% of papers even in elite journals probably have limited long-term influence. Some are there because of networks, some because of hype, some because of luck. Many do not actually reach the journal’s average citation level, let alone reshape the field.

What the C-index would mean for individuals

For individuals, reaching a double-digit C-index would probably already place someone at the professor level. That would imply more than ten genuinely original outputs, or at least a fairly substantial h-index, perhaps somewhere around the twenties or thirties.

If you look only at first-author papers, many professors probably had not yet accumulated ten such papers before they started their own groups. Once corresponding authorship enters the picture, attribution becomes murkier: was the originality mainly due to the student, the postdoc, or the principal investigator? But for evaluating a research group as a whole, the C-index still makes sense. At a minimum, a group should probably have a C-index of at least 1, meaning at least one paper cited more often than it cites, before it can seriously claim a track record of original work.

I am not especially worried about whether the metric is easy to calculate. As an unofficial indicator, nobody is trying to optimize for it yet, and it would not be easy to optimize anyway. For now it is something people can calculate out of curiosity.

If one day it became accepted more broadly, that would not surprise me. Research should be built on pioneering work, not just on an ever-expanding pile of derivative follow-up studies.