A Journey to Open Access (1) Green open access for over 20 years

My education in open access began over 40 years ago, when I was a practicing theoretical high energy physicist. This was in the 1970s – in the days of typewriters – and in those days we typed up our research papers, made 100 xerox copies and submitted the original to Physical Review, Nuclear Physics or whatever journal we wanted. The copies were sent round to our ‘peer’ high energy physics research groups around the world and were known as ‘preprints’. While the paper copy sent to the journal was undergoing refereeing, these preprints allowed researchers to immediately build upon and refer to work done by other researchers prior to publication. This was the preprint tradition in the fast moving field of high energy physics. When papers were accepted for publication, the references to preprints that had since been published were usually updated in the published version. It has always baffled me – now that I work in the field of computer science, which if anything is even faster moving than high energy physics – that there is no similar tradition there. In computer science, it can take several years for a paper to get published in a journal – by which time it really only serves an archival purpose and as evidence for tenure committees. In contrast to the physics preprint system, the computer science community uses refereed workshop publications to provide a rapid – or at least more rapid – publication vehicle.

With the widespread availability of the Internet, and with the advent of the World Wide Web, theoretical physicist Paul Ginsparg set up a web site to save high energy physicists both the postage and the trouble of circulating preprints. The electronic version of the preprint – inevitably called an e-Print – is typically submitted to a journal and simultaneously posted to the arXiv website (http://arxiv.org/). This is now the standard method of scholarly communication of a very large fraction of the physics, astronomy and mathematics communities.

‘arXiv is the primary daily information source for hundreds of thousands of researchers in physics and related fields. Its users include 53 physics Nobel laureates, 31 Fields medalists and 55 MacArthur fellows, as well as people in countries with limited access to scientific materials. The famously reclusive Russian mathematician Grigori Perelman posted the proof for the 100-year-old Poincaré Conjecture solely in arXiv.’ Reference: http://phys.org/news142785151.html#jCp

The arXiv repository is now over 20 years old and has a submission rate of over 7,000 e-Prints per month. Full text versions of over half a million research papers are available free both to researchers and to the general public. More than 200,000 articles are downloaded from arXiv each week by about 400,000 users. Most, but not all, of the e-Prints are eventually published in a journal and this amounts to a sort of post-publication ‘quality stamp’. The apparent drawback of multiple, slightly different versions of a paper turns out not to be a serious problem in practice. Citation counts for high energy physicists usually count either the e-Print version or the published version. A detailed study of the arXiv system by Anne Gentil-Beccot, Salvatore Mele and Travis C. Brooks is published as ‘Citing and Reading Behaviours in High-Energy Physics. How a Community Stopped Worrying about Journals and Learned to Love Repositories’. The paper is, of course, available as arXiv:0906.5418.

In the terminology of today, arXiv represents a spectacularly successful example of ‘Green Open Access’. This is the situation in which researchers continue to publish in refereed, subscription-based journals but also self-archive versions of their papers either in subject-based repositories – as with arXiv and the high energy physics community – or in institutionally-based repositories. In certain fields – such as the bio-medical area with the US PubMedCentral repository – these full-text versions may only be available to the public after an embargo period of 6 or 12 months. The alternative open access model – so-called ‘Gold Open Access’ – is one in which researchers or their funders pay the journal publishers to make the full text version of the paper freely available.

Why should you care? The research described in the papers was typically funded by a grant from a Government funding agency – think NSF or NIH in the USA or RCUK in the UK. The research papers are reviewed by researchers whose salary generally also comes from the public purse. The publishers organize the review process and publish the journals – and then restrict access to these papers to those who can afford to pay a subscription to their journals. Since the research was funded by public money raised by taxes, and reviewed by researchers supported by that same money, it seems not unreasonable to demand that the general public should be allowed access to this research without having to pay an additional access fee. Now that we have the Web and the technology to make perfect digital copies of documents at zero cost, it is clear that the old rules, under which publishers controlled dissemination through the printing press, need to change – just as they have for music and journalism. No one begrudges publishers some reward for their efforts at quality control and at supporting a prestigious ‘branded journal’ like Nature. But, as will be seen in the next post, the central issue for universities is now the affordability of their present journal offerings. Subscription fees to journals have risen much faster than inflation over the last 15 years or more and now constitute an unreasonable ‘tax’ on scarce research funds, one that goes to the shareholders of the publishing companies.

To be continued …

Original post from 2013

Re-booting my blog …

After leaving Microsoft Research in Redmond 15 months ago, I have just now got around to retrieving my original blog posts on Open Access. So the next posts are just my earlier posts on open access from 2013 – which still seem to me surprisingly relevant. The first six posts are installments on my personal journey to open access and the second six are a set of contributions reviewing the status of open access around the globe.

Of course it is very frustrating that these 2013 posts are indeed still relevant and that we have not made more progress towards open access. More importantly, we need to make significant progress towards open science – with persistent links from research papers to the relevant data and software required to produce the scientific results. In the next few months I will look at where we are now on the road to reproducible open science.

So the next twelve posts are ‘old’ and from 2013 …

Tony Hey, 10th March 2017