‘X percent of journal articles in the humanities are never cited.’ How often have we seen this claim made? Much like the fabled Eskimo words for snow, the clue that it’s probably bunkum lies in the fact that X varies wildly depending on who’s speaking, and in the fact that it doesn’t really matter to the speaker what X is, as long as it’s a lot.


Some recent(ish) efforts to claim that most journal articles in the humanities are rarely or never cited by other researchers have led me to consider the completeness of available citation data, and some of the unspoken assumptions that inform interpretations of what data we have. As it turns out, there are good reasons for treating existing statistics with extreme caution at the very least.


Claims about low or non-citation rates in the humanities generally have their empirical bases in studies conducted in the sciences that examine citation rates across a range of disciplines. Recently, similar data have also come from the Google Scholar Metrics h5 index.


One study, conducted at the Institute for Scientific Information (ISI) in Philadelphia and reported in Science in December 1990 and January 1991, suggested that either 93% or 98% of humanities journal articles go uncited, depending on how one cuts the data. The latter claim was recently recycled by Steven Pearlstein in the Washington Post; in response, Libby Nelson in Vox Education pointed to the 93% figure as the original researchers’ preferred calculation.


The second source, Google’s h5 index, is a complicated metric that I won’t go into here, but according to Patrick Dunleavy, Professor of Political Science at the London School of Economics, it appears to show humanities scholars citing one another’s work at a paltry fraction of the rate of those in the sciences, especially the medical and life sciences.


Other figures arise, frankly, from hearsay. In 2014, Dahlia Remler of the City University of New York sought to debunk a 90% non-citation figure—for all disciplines—that had been doing the rounds online. Sure enough, it had no examinable basis, having been claimed by the editor of the magazine Physics Today, who took it from a presentation he once attended that could not be reproduced.


In the blog post, however, Remler also claimed that 82% of humanities articles go uncited, even though the figure does not in fact appear in the source she gives for it. To be sure, her source, a paper published online by two Canadian academics, does show low humanities citation rates, but these are hedged around by so many caveats as to suggest that, for the humanities, the figures are basically meaningless. In her post, Remler did note good cause, of the sort discussed below, to be careful of the statistics she gave. This has not stopped the 82% figure, as an absolute non-citation rate shorn of such qualifications, from gaining a life of its own among academics and others who have been less than careful in examining their sources (one such even takes the liberty of rounding the figure down to 80%). In seeking to head one myth off at the pass, Remler inadvertently generated another.


Citation figures that have a proper empirical basis typically come from examinations of article citations in other journal articles over a five-year window following publication. On this front, the h5 index is less generous than more conventional measures, allowing publications at most five years in which to be cited; at the time of writing, that means the five years prior to June 2015.


Crucially, citations of articles in books are always excluded. Citations are also counted only in selected journals. The ISI database covered only the ‘top 10% of all scientific journals published worldwide’, while the h5 index excludes publications with fewer than 100 articles published over 2010–14, which, as Katie Barclay of the University of Adelaide pointed out on Twitter, means a great many humanities journals are not counted.


The significance of other limitations notwithstanding, I focus here just on the exclusion of books and the five-year citation window. These seem to me to be the major shortcomings of the existing datasets. Both are likely to be seriously underestimated by non-humanities scholars who approach citation statistics from the disciplinary norms of the natural or social sciences.


The five-year window, perhaps appropriate enough for disciplines in which knowledge develops rapidly and just as rapidly becomes obsolete, is inadequate for the humanities. Long humanities publication lead-times mean many citations fall outside this window; at the same time, much humanities research has a much longer shelf-life—better measured in decades than years—than that in other fields.


The exclusion of books from citation data is similarly likely to give any humanities scholar pause. The difficulty, though, is that precisely because of that exclusion we don’t know how different the data would look if books were included. Does leaving books out mean that we should subject citation statistics to a caveat, or does that exclusion fatally undermine the representativeness of the data?


A small experiment suggests itself. Though the experiment is very modest and only exploratory, the results hint that, despite the extraordinary richness of modern scholarly databases, citation rates in the humanities remain extremely uncertain. Both the exclusion of books and the five-year window begin to look like serious impediments to ascertaining any meaningful statistics.


My experiment involves choosing a single article and tracking down all the citations to it—both in journal articles and in books—that I can find, and comparing these with the citations listed in Google Scholar (note: the Google Scholar Metrics h5 index is based on the Google Scholar database, but is further narrowed down by the exclusion of books and, as above, certain journal articles; Google Scholar is however a good indication of what Google’s algorithm is capable of finding in the first instance, before these further exclusions are made).


I chose Olive Anderson, ‘The Political Uses of History in Mid Nineteenth-Century England’, Past & Present 36 (1967): 87–105. I picked this article mainly because I am familiar with it from my own research, and I know that it continues to be widely cited today. Its age would usually see it excluded from the body of articles subjected to citation quantification, but there are reasons for choosing an older article that I will come back to below.


Using a combination of my own research knowledge and notes, Google Scholar, and keyword searches in Google Books and JSTOR, I found a total of 49 sources citing Anderson’s article, 14 in journal articles and 35 in books (these are listed below). As of March 2016, Google Scholar lists just 27, or around 55%, of these citations. It identifies all 14 article citations that I found, but just 13 of the 35 book citations, around 37%.
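
For readers who want to check the arithmetic, the counts above can be tallied in a few lines of Python. This is only a sketch: the variable and function names are illustrative, and the figures are simply those reported in the preceding paragraph.

```python
# Counts reported in the text (citations to Anderson's 1967 article, March 2016).
found = {"articles": 14, "books": 35}        # citations located by hand and by search
in_scholar = {"articles": 14, "books": 13}   # of those, the ones Google Scholar lists


def pct(part, whole):
    """Return `part` as a percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_found = sum(found.values())
total_in_scholar = sum(in_scholar.values())

overall_hit_rate = pct(total_in_scholar, total_found)              # Scholar's coverage overall
article_hit_rate = pct(in_scholar["articles"], found["articles"])  # coverage of journal articles
book_hit_rate = pct(in_scholar["books"], found["books"])           # coverage of books
book_share = pct(found["books"], total_found)                      # share of citations in books

print(f"Found: {total_found}, listed by Google Scholar: {total_in_scholar}")
print(f"Overall hit rate: {overall_hit_rate}%")
print(f"Article hit rate: {article_hit_rate}%")
print(f"Book hit rate:    {book_hit_rate}%")
print(f"Share of found citations that are in books: {book_share}%")
```

Because any further citations would almost certainly come from sources Google Scholar has missed, adding them to the totals could only push the hit rates down.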


The severe limitations of Google’s dataset are apparent. While Google Scholar is good at identifying journal article citations, its hit rate for book citations is only around a third, and its ability to identify even these seems often to rely on publishers’ ebooks, where these exist. There appears to be only limited linkage between Google Scholar and Google’s own OCR’d Books dataset.


I make no claim for the completeness of the list. It is very likely not exhaustive, since my method of finding citations beyond those listed in Google Scholar relies mainly on keyword searches in databases that are not themselves exhaustive. It is instructive that I have been able to include Peter Mandler’s History and National Life and Jeremy Black’s Using History only because of my own reading. Neither citation appeared in my online searches, probably because both are set to ‘no preview’ in Google Books. There are no doubt other sources with which I am not familiar. Any additional found citations would only further downgrade Google Scholar’s hit rate. The incompleteness of the Google Scholar data described here is a best-case scenario.


Scholars who use Google Scholar as a research tool therefore need to be aware of its inherent strengths and weaknesses. As for citation statistics based on the h5 index, Google’s weakness on book citations is a moot point, since as noted above the index excludes these anyway, but the overall balance between found book and journal citations is suggestive.


Precisely because of the exclusion of books from datasets we cannot know if Anderson’s article is representative; more systematic studies would be welcome. But if it is more or less typical, and something like 70% of citations (in this case 35 out of 49) are in books, this would cast serious doubt on the meaningfulness of any article-only citation metric. In his blog post on the h5 index, Dunleavy claimed that its supposed completeness had put paid to ‘we can’t be compared with STEM’ special pleading in the humanities. It has done nothing of the sort. In a disciplinary context in which book citations appear to be, at the very least, more common than journal article citations, citation metrics that ignore books are at best dangerously misleading and at worst next to useless.


Finally, I come back to my reasons for choosing an older article. What is significant here is the slippage, greased by assumptions imported from the sciences, between not being cited within five years and never being cited. These assumptions are often made explicit when citation metrics move from a research to a journalism or a marketing context: ‘never cited by another researcher’, ‘not even cited once’ and ‘fail to get cited at all’ are the sorts of phrases then used.


In this regard, it is worth noting that of the 49 identified citations, not one of them is in research published in the five years after 1967. The earliest is in P. B. M. Blaas’s Continuity and Anachronism, published 11 years later. This continued shelf-life is invisible if we base calculations only on research published within the last five years. Had five-year citation windows been imposed in the early 1970s, Anderson’s article would likely have been written off as another entry in the dreaded ‘never cited’ category.
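
The effect of a fixed window can be sketched in Python. The function below is a hypothetical illustration, not the method of any actual citation index, and the only citation year it uses is the earliest one given above; since every other citation is later still, none of them can fall inside the window either.

```python
def cited_within_window(pub_year, citation_years, window=5):
    """Return the citation years that fall within `window` years of publication."""
    return [y for y in citation_years if pub_year <= y <= pub_year + window]


PUB_YEAR = 1967
EARLIEST_CITATION = PUB_YEAR + 11  # Blaas's book, published 11 years after the article

# Only the earliest citation year is stated in the text. All other citations
# are later, so checking the earliest one is enough to settle the question.
in_window = cited_within_window(PUB_YEAR, [EARLIEST_CITATION])

print(f"Window closes: {PUB_YEAR + 5}; earliest citation: {EARLIEST_CITATION}")
print(f"Citations counted by a five-year window: {len(in_window)}")
```

On these figures a five-year window records zero citations for an article that went on to be cited at least 49 times.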


All of this suggests that, for the humanities, citation statistics need to be taken with a very large dose of salt. There is no universal database from which such metrics can be extracted. The best we have, Google Scholar, is drastically inadequate. Those who argue for extremely low humanities citation rates are guilty of an unfounded reversal of the onus of proof. Overconfident in the completeness of their data, they mistake an absence of citation evidence for proof of non-citation. The resulting willingness to believe that a wide discrepancy between humanities and non-humanities metrics reflects a problem with how the humanities are carried out, rather than with the methods of comparison, seems to reveal an implicit (and sometimes explicit) belief that most humanities research is a trivial, unnecessary luxury anyway. The possibility that humanities scholars might be doing their jobs perfectly well, according to the norms and standards of their respective disciplines, seems too often not to enter into the equation. Such perspectives do the humanities a grave disservice.


