A Global View of Open Access (1) : A French perspective on Open Access and the Episciences Initiative

This second series of blog articles on Open Access will look at the global perspective. In order to give an authoritative view, the entries are by invitation and authored by relevant researchers in the different countries. I am very pleased to open this second series with a view from France, contributed by Claude Kirchner, Executive Officer for Research and Technology Transfer for Innovation, and colleagues Laurent Romary and Pascal Guitton. In true Napoleonic fashion, France has a centralized research repository called HAL and is participating in ambitious plans to create ‘epi-journals’ – a new type of overlay journal based on peer-reviewed pre-prints (http://episciences.org/).


Tony Hey

31st May 2013

A strong Green open access policy for France… and even more for Inria

2013 will probably stand out as an important milestone in the development of Open Access in France. On 24 January, during an open access awareness event organised by the CNRS and the national consortium of university libraries (Couperin), the French Minister for Higher Education and Research, Geneviève Fioraso, expressed support for the Open Access movement, stating that « L’information scientifique est un bien commun qui doit être disponible pour tous » (“Scientific information is a public good that should be available to all”) and showed a strong preference for the green route to open access, and in particular for the use of the national publication repository HAL (maintained by the CNRS). She also strengthened the national coordination body for scientific information (BSN – Bibliothèque Scientifique Numérique) and its role in helping higher education and research institutions coordinate their policies in this domain. Following this, a major memorandum of understanding was signed by 25 national institutions stating their willingness to work together to make HAL the reference repository for all research output in France.

Inria, the French research institution for computational sciences and applied mathematics, played a seminal role in making such progress possible. It has a long-standing involvement in the open access movement: it was an early signatory of the Berlin Declaration in 2003, and as early as April 2005 it officially set up its own portal (HAL-Inria) on the national HAL repository. At that time, it recommended that all publications by its researchers be deposited there. In 2006, as a signatory of the national agreement on open archiving, it stepped up its involvement by designing additional deposit, presentation and dissemination services for the HAL platform for the benefit of its researchers.

In recent years, Inria has found it increasingly difficult to work in collaborative partnership with publishers (private companies, but also professional associations and learned societies) on defining new publishing business and editorial models. In this context, Inria decided to take the bull by the horns and proactively contribute to the development of such models. At the beginning of 2013 it issued a deposit mandate, whereby HAL-Inria became the sole source of information for all reporting and assessment activities of its researchers, teams and research centres.

Going even further, Inria is now committing resources to experimenting with new publication frameworks. It is thus involved in the Episciences initiative, which aims to create a peer-reviewing environment coupled to the deposit of pre-prints in HAL, with reduced overhead costs and maximal dissemination efficiency.

The underlying vision is that of a research infrastructure that charges no fee to its users (whether authors or readers) and offers a set of basic services facilitating the efficient dissemination and review of scholarly papers. As with traditional journals, scientific quality is ensured by the standing of the editorial committee that carries out the peer-review process.

The epi-journal platform is conceived in the spirit of traditional peer-reviewed journals, with additional facilities resulting from its being built on top of a publication repository. Indeed, open archives are now widely available and can be used by any researcher to store, index and make freely available any of their publicly accessible research documents. These documents can be, for instance, research papers, experiments, data, programs or videos. Archives such as arXiv or HAL are widely accessible and provide a sustainable, free service. In the case of the HAL platform, for instance, papers are precisely associated with author affiliation information and benefit from generic long-term archiving facilities, as well as additional services facilitating the creation of personal or institutional web pages.

In order to support the editorial committees of the journals hosted on the platform in their day-to-day business, editorial management support will be provided. This will comprise:

  • Management of the peer-review process, comprising the channelling of community based feedback;
  • Handling the management of the journal volumes and issues;
  • Contribution to some basic quality checking tasks (bibliography, meta-data, cross-references);
  • Community management: advertising papers to various channels and social networks, moderation of online discussions;
  • General visibility: interaction with major indexing services and databases (DBLP, Thomson Reuters, Scopus…), as well as adequate mirroring on relevant thematic repositories (arXiv, PMC, RePEc, etc.).

Through hosting on the national repository infrastructure HAL, all journals will benefit from a high-quality technical environment comprising round-the-clock (24/7) service, long-term archiving and a proper authentication and authorization infrastructure.

As to the copyright policy, we want the intellectual property to remain with the authors, who will grant the journal (and hence the platform) only a non-exclusive right to publish under its brand. In addition, each journal will decide on the licence to be applied, but a strong recommendation will be made to adopt a generic Creative Commons CC-BY (attribution) licence, which is well suited to scholarly purposes.

Finally, we will ensure that journal titles are freed from any private ownership. When a title is not hosted by an academic institution or a scientific society, a consortium of supporting organisations should be able to take ownership of such orphan titles on behalf of their editorial committees.

In order to provide a sustainable service, we will progressively assemble a consortium of interested parties that can provide cash or in-kind contributions to the ongoing operation of the platform. It is anticipated that such contributions can be taken out of the existing scientific information budgets of the interested institutions (e.g. subscriptions).

This is, in our view, the only way not only to control our scientific information budgets, but also to control the services we need to disseminate our research results under good conditions. Indeed, investing in such new services is just a step towards the definition of more integrated virtual research environments facilitating eScholarship.

Laurent Romary, Pascal Guitton, Claude Kirchner


A Journey to Open Access (6) : The Open Access Revolution – The Next Steps …

Since the beginning of the year, the momentum for open access to research publications has grown dramatically. On February 13th 2013, the Fair Access to Science and Technology Research (FASTR) Act was introduced in the US Senate by John Cornyn (R-TX) and Ron Wyden (D-OR), and in the House by Mike Doyle (D-PA), Zoe Lofgren (D-CA) and Kevin Yoder (R-KS). FASTR would require open access to peer-reviewed research papers arising from federally-funded research and would require the major federal research funding agencies – including the DOE, NIH and NSF – to make these papers freely available to the public through a digital archive maintained by each agency. Significantly, the bill not only talks about the requirement of accessibility – with a suggested maximum embargo time of 6 months – but also highlights the need to maximize the utility of the research by enabling re-use:

“The United States has a substantial interest in maximizing the impact of the research it funds by enabling a wide range of reuses of the peer-reviewed literature reporting the results of such research, including by enabling automated analysis by state-of-the-art technologies.”

Such automated analysis would permit a genuine realization of the vision of the Memex put forward by Vannevar Bush in his seminal paper ‘As We May Think’. The FASTR Act also includes another far-sighted requirement that federal agencies consider whether or not the terms of use should include “a royalty free copyright license that is available to the public and that permits the reuse of those research papers, on the condition that attribution is given to the author or authors of the research and any others designated by the copyright owner”. As Heather Joseph points out in her SPARC newsletter – http://www.arl.org/sparc/media/blog/with-introduction-of-fastr-congress-picks-up-the-p.shtml – this would effectively require research papers to be published under some form of Creative Commons license.

On February 22nd, just eight days after FASTR was introduced into both houses of Congress, the White House issued a directive requiring the major Federal Funding agencies “to develop a plan to support increased public access to the results of research funded by the Federal Government.” Significantly, these results include not only peer-reviewed publications but also digital data. The memorandum defines digital data “as the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens.”

The White House memorandum is from John Holdren, Director of the Office of Science and Technology Policy and underlines the Obama Administration’s belief that federally-supported basic research can catalyze innovative breakthroughs that can help grow the US economy:

“Access to digital data sets resulting from federally funded research allows companies to focus resources and efforts on understanding and exploiting discoveries. For example, open weather data underpins the forecasting industry, and making genome sequences publicly available has spawned many biotechnology innovations. In addition, wider availability of peer-reviewed publications and scientific data in digital formats will create innovative economic markets for services related to curation, preservation, analysis and visualization. Policies that mobilize these publications and data for re-use through preservation and broader public access also maximize the impact and accountability of the Federal research investment. These policies will accelerate scientific breakthroughs and innovation, promote entrepreneurship, and enhance economic growth and job creation.”


We now have OA mandates coming from both the Legislative and the Executive branches of the US Government. The White House memorandum covers both research publications and research data and requires the relevant Federal Agencies to deliver a plan within six months from February 2013. It is noteworthy that both the White House memorandum and the bi-partisan FASTR bill require green open access via repositories and say nothing about gold – in contrast to the approach preferred by the Finch Report and by the Research Councils in the UK. For more commentary on both FASTR and the White House memorandum see Peter Suber’s blog:


In the USA, I believe that these developments represent a tipping point for the Open Access movement. But besides the dramatic moves towards Open Access in the US and the UK, there have also been significant developments elsewhere around the world. In Europe, a press release from the European Commission in July 2012 about the new Horizon 2020 Research Framework stated that:

“As a first step, the Commission will make open access to scientific publications a general principle of Horizon 2020 … As of 2014, all articles produced with funding from Horizon 2020 will have to be accessible … The goal is for 60% of European publicly-funded research articles to be available under open access by 2016.”

Note that like the USA – and unlike the UK – the European Commission also does not favor gold OA over green. Similarly, in Australia, the National Health and Medical Research Council (NHMRC) and the Australian Research Council (ARC) both back green OA via repositories. In July 2012, the NHMRC policy stated:

“NHMRC therefore requires that any publications arising from an NHMRC supported research project must be deposited into an open access institutional repository within a twelve month period from the date of publication.”

Following this example, the ARC introduced an open access policy for ARC-funded research with effect from 1 January 2013. Their policy requires that any publications arising from an ARC-supported research project be deposited in an open access institutional repository within twelve months of the date of publication. These are just a few of the many examples of what is clearly now an inexorable move towards the new norm of open access for research publications.

Back in the UK, some re-thinking of Finch and RCUK’s OA policy is taking place. A recent review by the House of Lords criticized RCUK for failures in communication and for a lack of clarity about its policy and guidance. Ahead of a fuller review of its policy in 2014, RCUK issued a revision of its Open Access policy on 6th March 2013. The major change is an explicit statement that although RCUK prefers gold, either green or gold is acceptable. The Department for Business, Innovation and Skills (BIS) has also launched an inquiry into open access which has yet to report. Finally, on February 25th, the Higher Education Funding Council for England (HEFCE) began consulting the research community on ‘the role of open-access publishing in the submission of outputs to the post-2014 Research Excellence Framework (REF)’. For non-UK readers, the REF is a research review process conducted by HEFCE, the major UK university funding organization, to determine national university and departmental research rankings. HEFCE’s intent is ‘to require that outputs meeting the REF open access requirement (whether published by the gold or green route) shall be accessible through a repository of the submitting institution’.

Finally, in May of last year there was the inaugural meeting of a new organization called the Global Research Council (GRC) in Washington DC. The meeting was prompted by the White House Office of Science and Technology Policy, which invited the NSF to host a meeting of the world’s research funding agencies to discuss global standards of peer review for basic research. The GRC is a virtual organization with members of the Governing Board from the US, Germany, Brazil, Saudi Arabia, Japan, China, Europe, Canada, Russia and India. The result of the first summit, attended by around 50 research agencies, was an agreed statement on ‘Merit Review’.



The second summit meeting of the GRC will take place in Berlin from 27th to 29th May 2013, hosted by the German Research Foundation (DFG) and the Brazilian CNPq agency. The main goal of this summit will be to agree on an action plan for implementing open access to publications as the main paradigm of scientific communication in the coming years. Such unanimity on Open Access between the major global research funding organizations will surely bring about both a more sustainable model of scholarly communication and a more efficient research process for solving some of the major scientific challenges facing the world.

What scholarly communication structures will emerge in the future? I recommend reading an interesting paper by Paul Ginsparg, playfully titled ‘As We May Read’.


In particular, his conclusions deserve serious consideration:

“On the one-decade time scale, it is likely that more research communities will join some form of global unified archive system without the current partitioning and access restrictions familiar from the paper medium, for the simple reason that it is the best way to communicate knowledge and hence to create new knowledge. Ironically, it is also possible that the technology of the 21st century will allow the traditional players from a century ago, namely the professional societies and institutional libraries, to return to their dominant role in support of the research enterprise.”

This entry concludes this series of articles on my personal journey to Open Access. However, I must thank my colleagues at the University of Southampton in the UK who educated me and collaborated to achieve great things at the University – Wendy Hall, Les Carr, Chris Gutteridge, Steve Hitchcock, Tim Brody and Jessie Hey in the Department of Electronics and Computer Science, Mark Brown, Pauline Simpson and Wendy White in the University Library, and Alma Swan from Key Perspectives. But most of all I should thank the world’s most persistent evangelist for green open access, Stevan Harnad.

In this series I have so far only mentioned two of the three pioneers of Open Access – Paul Ginsparg, who created the physics arXiv, and David Lipman of NCBI and PubMed Central. But the third pioneer who deserves our thanks and homage is Stevan Harnad, whose ‘Subversive Proposal’ paper in 1994 was the opening salvo in what has turned out to be a twenty-year battle for Open Access. Stevan has steadfastly evangelized green OA as the best way to make research publications accessible. His advocacy of the Immediate-Deposit/Optional-Access model, successfully adopted by the University of Liege in Belgium, is both rational and compelling. Any given deposit can be made Closed Access instead of OA for the period of any embargo, but the requirement for immediate deposit has enabled Liege to capture over 80% of its annual refereed research output in their repository. So my final words in this series are a ‘thank you’ to Stevan Harnad – and the hope that he can now get some sleep and not feel the need to respond instantly to emails on OA at any time of the day or night!

Tony Hey

April 4th 2013

A Journey to Open Access (5) : Open Access in the USA – The Open Access Policies of the DOE, NIH and NSF

In a previous entry I wrote about the open access policy of the NIH and their PubMed Central repository. While the NIH has set a great example for open access, it is actually another US funding agency that has been the real pioneer in making the results of its non-classified R&D accessible to both researchers and the general public for over fifty years. This is the DOE, the US Department of Energy – not the NSF, as one might have expected. The DOE policy was established in the 1940s by none other than General Groves, who had led the Manhattan atomic bomb project in such secrecy during the war:

It was just over 60 years ago that General Leslie Groves, commanding the Manhattan Engineer District in Oak Ridge, TN, mandated that all classified and unclassified information related to the Atomic Bomb be brought together into one central file. Thus, in 1947, the precursor to the Office of Scientific and Technical Information (OSTI, www.osti.gov) was born.

From the OSTI website we read:

‘Established in 1947, DOE’s Office of Scientific and Technical Information (OSTI) fulfills the agency’s responsibilities related to the collection, preservation, and dissemination of scientific and technical information emanating from DOE R&D activities. This responsibility has been codified in the organic, or enabling, legislation of DOE and its predecessor agencies and, more recently, was defined as a specific OSTI responsibility in the Energy Policy Act of 2005.’

The declared mission of OSTI is ‘to advance science and sustain technological creativity by making R&D findings available and useful to DOE researchers and the public’. The Office was founded on the principle that science progresses only if knowledge is shared and the corollary that accelerating the sharing of knowledge accelerates the advancement of science.




The OSTI facility is located in Oak Ridge, Tennessee.


Although I had interacted with many of the DOE Labs over the years, I am ashamed to say that I only became aware of the activities of OSTI a few years ago. This was through our work with the British Library on Virtual Research Environments. It was Richard Boulderstone who first told me about OSTI and its leadership of the international consortium called the WorldWideScience Alliance (see http://worldwidescience.org). This is a federation of 70 national science portals giving access to over 80 research databases. OSTI have been instrumental in developing a multilingual federated search tool that allows a user to search all of these individual databases. Microsoft Research was involved in developing the translation service for the search tool using Microsoft Translator. When a user enters a query, it is translated into the appropriate language and sent to all of the WorldWideScience databases. The results are returned in relevance-ranked order, translated back into the user’s preferred language. Ten languages are currently supported: Arabic, Chinese, English, French, German, Japanese, Korean, Portuguese, Russian, and Spanish. (See http://research.microsoft.com/en-us/projects/translator/).
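The query pipeline just described (translate the query, fan it out to each national database, merge the hits, and return them in relevance-ranked order) can be sketched roughly as follows. This is a toy illustration only: the translation tables, database contents and function names are all hypothetical stand-ins, not the actual WorldWideScience or Microsoft Translator APIs.

```python
# Toy sketch of multilingual federated search. All names and data here
# are hypothetical; a real system would call a translation service and
# remote database endpoints instead of these in-memory stubs.

TRANSLATIONS = {  # stub standing in for a real translation service
    ("en", "fr"): {"energy": "énergie"},
    ("fr", "en"): {"énergie": "energy"},
}

def translate(text, src, dst):
    """Translate a single query term; fall back to the original text."""
    if src == dst:
        return text
    return TRANSLATIONS.get((src, dst), {}).get(text, text)

# In-memory stand-ins for national science databases: each entry holds
# the database's language and a list of (title, relevance score) records.
DATABASES = {
    "osti": ("en", [("Energy storage review", 0.9)]),
    "hal":  ("fr", [("Stockage de l'énergie", 0.8)]),
}

def federated_search(query, user_lang="en"):
    """Fan the translated query out to every database, merge the hits,
    and return them best-first by relevance score."""
    hits = []
    for name, (db_lang, records) in DATABASES.items():
        q = translate(query, user_lang, db_lang)
        for title, score in records:
            if q.lower() in title.lower():
                hits.append((title, score, name))
    # A real system would also translate the result records back into
    # the user's preferred language before returning them.
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

In this sketch an English query for “energy” matches both the English OSTI stub directly and the French HAL stub via its translated form “énergie”, with the merged results ordered by each database’s own relevance score.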

In 2011, OSTI partnered with Microsoft Research again and used the Microsoft Research Audio Video Indexing System (MAVIS) tool to build a multimedia search engine called ScienceCinema. (For details of the MAVIS project see http://research.microsoft.com/en-us/projects/mavis/). This makes approximately 1,000 DOE videos available and searchable by the public. ScienceCinema content continues to grow with the recent initial installment from the multimedia collection of CERN, the European Organization for Nuclear Research. ScienceCinema was launched in February 2011 and named as one of six new initiatives in DOE’s Open Government Plan 2.0 (http://www.osti.gov/sciencecinema/).

OSTI are also active in a number of other exciting open access projects such as ScienceAccelerator.gov and Science.gov, which bring together R&D information from 13 federal agencies. In addition, OSTI’s E-print network (http://www.osti.gov/eprints) provides a gateway to 35,000 websites and databases worldwide – including arXiv – and some 30,000 scientific and technical information institutional repositories. The network contains more than 5 million e-prints and its contents are searchable via Science.gov.

Jim Gray, in his January 2007 talk to the Computer Science and Telecommunications Board of the US National Research Council, called for federal science agencies to ‘establish digital libraries that support other sciences in the same way the National Library of Medicine supports the biosciences’. On February 15th of this year OSTI announced the launch of the National Library of Energy (NLE) as ‘a virtual library and open government resource to advance energy literacy, innovation and security’. The OSTI NLE search tool gives users easy access to all the major DOE information sources on energy – not only R&D results but also relevant information and technology for home-owners as well as analyses of the energy market (http://www.osti.gov/nle/ ).

The latest innovation in open access from OSTI is its development of a portal called PAGES – a Public Access Gateway for Energy and Science. This will be a web-based portal that ensures that scholarly publications resulting from DOE research are publicly accessible and searchable at no charge to readers. The research papers will either be accessible through links to publisher sites for articles that they make publicly accessible or links to a copy of the final accepted manuscript hosted in a repository, after an agreed embargo period.
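The access rule described (prefer a publicly accessible publisher version; otherwise fall back to the repository copy of the accepted manuscript once the embargo has lapsed) can be sketched in a few lines. The field names here are hypothetical illustrations, not the actual PAGES data model.

```python
from datetime import date

def resolve_access(record, today=None):
    """Return the public link for a paper under the stated rule:
    prefer a publicly accessible publisher version; otherwise use the
    repository copy of the accepted manuscript once its embargo ends."""
    today = today or date.today()
    if record.get("publisher_open_url"):
        return record["publisher_open_url"]
    repo = record.get("repository_url")
    if repo and today >= record["embargo_ends"]:
        return repo
    return None  # still under embargo: no public copy yet

# Example record: no open publisher version, embargo ending 1 Jan 2014.
paper = {
    "repository_url": "https://repository.example/ms123",
    "embargo_ends": date(2014, 1, 1),
}
```

With this record, a request after the embargo date resolves to the repository manuscript, while an earlier request yields no public copy.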

To conclude, in this post I wanted to highlight the pioneering role that OSTI and its staff have played in fostering public access to non-classified R&D results from the Department of Energy in the US for more than 50 years. More recently, the National Institutes of Health have also played a prominent role in furthering the cause of open access with their PubMed Central repository and other publicly accessible databases in the National Library of Medicine. By contrast, it is surprising – at least to me – that the major US funder of university research, the National Science Foundation, has not played a similarly active role in moving towards delivering open access of the results of its research. However, the NSF is to be applauded for taking the first step towards an ‘open data’ agenda by requiring all research proposals to include a data management plan.

In my last posting in this series on open access, I will discuss the recent announcement on February 22nd from the White House’s Office of Science and Technology Policy on ‘Increasing Access to the Results of Federally Funded Research’(http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf)

To be concluded …

Originally posted in 2013

A Journey to Open Access (4) : Open Access in the UK: The Finch Report and RCUK’s Open Access Policy

In the UK, the JISC organization has long pioneered the exploration of different models of open access and, in particular, the role of institutional repositories.  Although JISC’s future is now somewhat uncertain because of the recent change in its funding status to that of a charity, JISC is seen internationally as a major innovator in the use of advanced ICT in Higher Education. In Europe, only the Dutch SURF organization can match the breadth and originality of JISC programs. Such an innovative ‘applied research’ funding agency is lacking in the US – although the role of JISC is partially met by organizations such as the Mellon Foundation.

Until 2006, I was Chair of the JISC Committee in Support of Research. Our Committee was able to fund many innovative projects and initiatives, including the pilot study that led to the adoption of the Internet2 Shibboleth authentication by UK universities, the establishment of the Digital Curation Center (DCC) in Edinburgh, a test-bed ‘lambda network’ for high-data rate transfers and an experimental text mining service offered by the National Centre for Text Mining (NaCTeM) in Manchester. In April 2005, my committee produced a leaflet explaining the basics of ‘Open Access’. I particularly remember having to insist that the author of the report, one Alma Swan, now well-known to the Open Access community, should put the section on ‘Green Open Access’ via repositories before the section on ‘Gold Open Access’ Journals.

Other committees of JISC also funded a large number of projects exploring different aspects of open access repositories. From 2002 – 2005 the JISC FAIR Program – Focus on Access to Institutional Repositories – funded projects like the SHERPA project at Nottingham and the TARDis project at Southampton. From 2006 – 2007, the JISC Digital Repositories Program funded another 20 projects including the OpenDOAR project – a Directory of academic Open Access Repositories – and the EThOS project – to build a national e-thesis service. JISC also funded a Repository and Preservation Program which included the PRESERV project at Southampton that looked at preservation issues for eprints. All of this preamble is intended to show that the UK has had a vibrant and active ‘research repository community’ for over a decade. The ROAR site currently lists 250 UK university repositories. It is unfortunate that the ‘Working Group on Expanding Access to Published Research Findings’ – better known as the Finch Committee – seem to have chosen to ignore much of this seminal work.

The UK Government has adopted an explicit commitment to openness and transparency (http://www.cabinetoffice.gov.uk/transparency). In the context of research, this has been interpreted as making the results of ‘publicly funded research’ open, accessible and exploitable. The Government’s belief is that open access to research results will drive innovation and growth as well as increasing the public’s trust in research. With such a laudable intent, the Government set up the Finch Committee to explore how best the UK could ‘expand access to published research findings’. Unfortunately for the outcome, conventional scholarly publishers were the best represented stakeholder group on the Committee, which consisted of five publishers, four researchers or university administrators, three funders and two librarians. The majority of the ‘Finch Report’ recommendations were accepted by Minister David Willetts, and a version of them was promulgated by the combined Research Councils organization, RCUK – roughly equivalent to the NSF – in July 2012. The RCUK policy can be summarized as follows (quoting Peter Suber’s SPARC Open Access Newsletter, issue #165):

  • RCUK-funded authors ‘must’ publish in RCUK-compliant journals. A journal is RCUK-compliant if it offers a suitable gold option or a suitable green option. It need not offer both.
  • To offer a suitable gold option, a journal must provide immediate (un-embargoed) OA to the version of record from its own web site, under a CC-BY license, and must allow immediate deposit of the version of record in an OA repository, also under a CC-BY license. It may but need not levy an Article Processing Charge (APC).
  • To offer a suitable green option, a journal must allow deposit of the peer-reviewed manuscript (with or without subsequent copy-editing or formatting) in an OA repository not operated by the publisher.


To compensate the publishers – or, in the view of the Finch Committee, give them time to move their business models to accommodate the new open access world – the Finch Report advocates increasing funding to publishers ‘during a transition period’ by establishing ‘publication funds within individual universities to meet the costs of APCs’. In addition, the report also explicitly deprecates the use of institutional repositories by effectively relegating them to only providing ‘effective routes to access for research publications including reports, working papers and other grey literature, as well as theses and dissertations’.

Peter Suber, a very balanced advocate for open access, has given a detailed critique of these recommendations – as well as enumerating several erroneous assumptions made by the group about open access journals and repositories – in issue #165 of the SPARC Open Access Newsletter (http://www.earlham.edu/~peters/fos/newsletter/09-02-12.htm). Let me highlight some key points that he makes, with which I am in entire agreement.

First and foremost, we should all applaud the group for its robust statement in favor of open access:

‘the principle that the results of research that has been publicly funded should be freely accessible in the public domain is a compelling one, and fundamentally unanswerable.’

Similarly, the Finch Committee are equally forthright about their intent to induce change in the scholarly publishing industry:

‘Our recommendations and the establishment of systematic and flexible arrangements for the payment of APCs will stimulate publishers to provide an open access option in more journals.’

Minister David Willetts endorsed this goal and told the Publishers Association that:

‘To try to preserve the old model is the wrong battle to fight.’

Let me be clear: these statements represent huge progress for the Open Access movement in the UK. The Government is to be commended on its stance on openness; unfortunately, I feel that the Finch Committee missed an opportunity by not supporting mandated green open access repositories in addition to gold OA.

A major problem with the Finch and RCUK endorsements of gold OA as the preferred route to open access – and their explicit deprecation of green OA – is that the proposed interim settlement is unreasonably generous to the publishers at the expense of the UK Research Councils and HEFCE-funded UK universities. By giving publishers the choice of being paid for gold OA or offering an unpaid green OA option, it is clear that publishers will withdraw their green option and opt to pick up more money by introducing a gold one. Their shareholders would demand no less. Even the majority of OA publishers who currently charge no APC fee – contrary to the assumptions of the Finch Group – will be motivated to pick up the money on the table. Similarly, publishers who now only offer Toll Access via subscriptions will be quite happy to pick up more money by offering a gold OA option in addition to their subscription charges.

As I made clear in Part 2 of this series of articles on open access, the serials crisis means that universities are already unable to afford the subscriptions to Toll-Access (TA) journals that the publishers are offering. To offer them more money to effect some change that they should have initiated over a decade ago seems to me to make no sense. Instead of making generous accommodations for the interests of publishers, the Finch Group should have looked at the problem purely from the point of view of what was in the public interest. Now that publishers receive articles in electronic form, and research papers can be disseminated via the Web at effectively zero cost, what have publishers done in the last fifteen years or more to adapt their business models to these new realities? The answer is that they have raised journal prices by far more than the rise in the cost of living. It is this rise in subscription costs that has resulted in subscription cancellations – not competition caused by the availability of articles in green open access repositories.

Despite green OA approaching the 100% level in Physics, both the American Physical Society and the Institute of Physics have said publicly that they have seen no cancellations they can attribute to arXiv and green OA. Similarly, the Nature Publishing Group has said openly that ‘author self-archiving [is] compatible with subscription business models’. The American Association for the Advancement of Science (AAAS) – who publish ‘Science’ – also ‘endorse the green-mandating NIH policy’. There is much concern in the Finch Report for Scholarly Society publishers. In fact a survey in December 2011 showed that 530 scholarly societies currently publish over 600 OA journals. While it is true that some societies use subscription prices to subsidize other member activities, this need not be the case. Now that we have the Web, the monopoly endowed by ownership of a printing press is gone forever. Just ask the music industry or the news media.

Let me give three anecdotal examples of the serials crisis:

  • In 2007 the University of Michigan’s libraries cancelled about 2,500 journal subscriptions because of budget cuts and the increasing costs of the subscriptions.
  • In 2008, Professor Stuart Shieber of Harvard explained ‘that cumulative price increases had forced the Harvard library to undertake “serious cancellation efforts” for budgetary reasons’.
  • In 2009–2011, the UC San Diego Libraries continued to cancel journal subscriptions because of budget cuts and rising subscription costs. Around 500 titles ($180,000 worth) were cancelled in FY 2009/10, and about the same number were projected to be cancelled in FY 2010/11. The Libraries also closed several of their satellite libraries.


In fact, any research university library around the world will have a similar story to tell. When even a relatively wealthy university such as Harvard has problems with journal subscription increases, surely it is time to take note!

The transitional period envisaged by Finch and RCUK is projected to cost the UK Research Councils and universities a minimum of £37M over the next two years. This is money that will have to come out of hard-pressed Research Council budgets and already reduced university HEFCE funding. Instead of continuing to listen to the special pleading of publishers, what is needed now is some leadership from RCUK. They need to put in place a policy with sensible provisions – one that does not unduly ‘feather-bed’ the publishers and that is affordable for UK universities. Instead of being overly concerned with the risks of open access to commercial publishers, RCUK should remember its role as a champion of the public interest.

What should RCUK do now? In my opinion, RCUK could make a very small but significant change in its open access policy and adopt a rights-retention green OA mandate that requires ‘RCUK-funded authors to retain certain non-exclusive rights and use them to authorize green OA’. In the words of Peter Suber, this would ‘create a standing green option regardless of what publishers decide to offer on their own.’ In addition, RCUK should recommend that universities follow the Open Access policy guidelines of Harvard, set out by its Office of Scholarly Communication (http://osc.hul.harvard.edu/authors/policy_guide). Under this policy, Harvard authors are required to deposit a full-text version of their paper in DASH, the Harvard Open Access repository, even when the publisher does not permit open access and the author has been unable to obtain a waiver from the publisher.

The scholarly publishers have had plenty of time to read the writing on the wall: they have resisted adjusting to the new reality for more than fifteen years. It seems manifestly unreasonable to give them significantly more money and time to do what they should have been exploring fifteen years ago. By insisting on a green option, RCUK will help generate the required and inevitable changes to the scholarly publishing business and get a fairer deal for both academia and the tax-paying public.

In this short overview I have omitted many subtleties and details – such as embargo times, ‘libre green’, CC-BY licenses and other flavors of green OA. Peter Suber’s SPARC Open Access Newsletter #165 and his book on Open Access (MIT Press Essential Knowledge Series, 2012) give a much more complete discussion with detailed references.

Also, in the interests of full disclosure, I should stress that I am not ‘anti-publisher’ and have been an editor for the Wiley journal ‘Concurrency and Computation: Practice and Experience’ (CCP&E) for many years. In fact it is ironic that my university, Southampton, could not afford to subscribe to CCP&E even though it was essential reading for my research group of over 30 researchers. From this experience, and from my time as Dean of Engineering, I came to believe that the unsustainable, escalating costs of journal subscriptions, together with the advent of the Web, have irrevocably changed what we require from the scholarly publishing industry. And, after working with many different research disciplines during my time as the UK’s e-Science Director, and now at Microsoft Research, I have seen at first hand the inefficiencies of the present system and the large amount of unnecessary ‘re-inventing the wheel’ that goes on in the name of original research. Because of this I passionately believe that open access to full-text research papers and to research data can dramatically improve the efficiency of scientific research. And the world surely needs to solve some major health and environmental challenges!

To be continued …

Originally posted in 2013

A Journey to Open Access (3): Jim Gray and the Coming Revolution in Scholarly Communication

When I joined Microsoft in 2005 to create an ‘eScience’ research program with universities, Turing Award winner Jim Gray became a colleague as well as a friend. I had first met Jim in 2001 and spent the next four years having great debates about eScience. Roughly speaking, eScience is about using advanced computing technologies to assist scientists in dealing with an ever-increasing deluge of scientific data. Although Jim was a pioneer of relational databases and transaction processing for the IT industry, he had recently started working with scientists to demonstrate the value of database technologies on their large datasets and to use them to ‘stress test’ Microsoft’s SQL Server product. With astronomer Alex Szalay from Johns Hopkins University, Jim and some of Alex’s students built one of the first Web Services for scientific data. The data was from the Sloan Digital Sky Survey (SDSS) – something like the astronomical equivalent of the human genome project. Although the tens of terabytes of the SDSS now seem a quite modest amount of data, the Sloan survey was the first high-resolution survey of more than a quarter of the night sky. After the first phase of operation, the final SDSS dataset included 230 million celestial objects detected in 8,400 square degrees of imaging and spectra of 930,000 galaxies, 120,000 quasars, and 225,000 stars. Since there are only around 10,000 professional astronomers, publishing the data on the SkyServer web site (http://cas.sdss.org/dr7/en/) constituted a new model of scholarly communication – one in which the data is published before it has all been analyzed. The public availability of such a large amount of astronomical data led to one of the first really successful ‘citizen science’ projects: GalaxyZoo (http://www.galaxyzoo.org/) asked the general public for help in classifying a million galaxy images from the SDSS.
More than 50 million classifications were received by the project during its first year, and more than 150,000 people participated. Jim’s SkyServer and the Sloan Digital Sky Survey pioneered not only open data and a new paradigm for publication but also a crowd-sourcing framework for genuine citizen science.

Jim also worked with David Lipman and colleagues at the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). The NIH had established an open access policy requiring that

‘all investigators funded by the NIH submit … to the National Library of Medicine’s PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication.’

The NIH’s PubMed Central deposit policy was initially voluntary but was signed into law by George W. Bush in late 2007. The compliance rate then improved dramatically, and the NIH has now taken the further step of announcing that, sometime in 2013, it ‘will hold processing of non-competing continuation awards if publications arising from grant awards are not in compliance with the Public Access Policy.’

PubMed Central is a freely accessible database of full-text research papers in the biomedical and life sciences. The clear benefits of such an open access archive of peer-reviewed papers are summarized on the NIH website (http://publicaccess.nih.gov/FAQ.htm#753):

‘Once posted to PubMed Central, results of NIH-funded research become more prominent, integrated and accessible, making it easier for all scientists to pursue NIH’s research priority areas competitively. PubMed Central materials are integrated with large NIH research data bases such as Genbank and PubChem, which helps accelerate scientific discovery. Clinicians, patients, educators, and students can better reap the benefits of papers arising from NIH funding by accessing them on PubMed Central at no charge. Finally, the Policy allows NIH to monitor, mine, and develop its portfolio of taxpayer funded research more effectively, and archive its results in perpetuity.’

Jim’s work with NCBI was to help them develop a ‘portable’ version of the repository software, pPMC, that could be deployed at sites in other countries. In the UK, the Wellcome Trust, a major funder of biomedical research, had adopted an open access policy similar to the NIH’s. With assistance from NCBI, Wellcome collaborated with the British Library and JISC to deploy the portable version of the PubMed Central archive software, and the UK PubMed Central repository was established in 2007. Just last year, this was enlarged and re-branded as Europe PubMed Central (http://europepmc.org/), since the service is now also supported by funding agencies in Italy and Austria and by the European Research Council. PMC Canada was launched in 2009.

NCBI were also responsible for developing two XML-based Document Type Definitions, or DTDs:

‘The Publishing DTD defines a common format for the creation of journal content in XML. The Archiving DTD also defines journal articles, but it has a more open structure; it is less strict about required elements and their order. The Archiving DTD defines a target content model for the conversion of any sensibly structured journal article and provides a common format in which publishers, aggregators, and archives can exchange journal content.’

These DTDs have now been adopted by NISO, the National Information Standards Organization, and form the basis of NISO’s Journal Article Tag Suite, or JATS (http://jats.nlm.nih.gov/index.html).
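To make this a little more concrete, here is a minimal sketch in Python of what JATS-style article markup looks like and how easily it can be processed once content is in a common XML format. The fragment below uses a few common JATS element names (article, front, article-meta, article-title), but it is abbreviated and illustrative only – not a complete, DTD-valid document.

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative JATS-style fragment; real JATS documents carry
# many more required elements and attributes than shown here.
sample = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>Citing and Reading Behaviours in High-Energy Physics</article-title>
      </title-group>
    </article-meta>
  </front>
</article>"""

root = ET.fromstring(sample)
# Because the tag names are standardized, any archive or aggregator can
# extract metadata with the same simple path expression.
title = root.find("./front/article-meta/title-group/article-title")
print(title.text)
```

This interchangeability – publishers, aggregators, and archives all reading and writing the same tag set – is exactly the point of the Archiving DTD described in the quote above.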

As is now well known, Jim Gray was lost at sea at the end of January 2007. A few weeks before this tragic event, Jim had given a talk to the National Research Council’s Computer Science and Telecommunications Board. With Gordon Bell’s encouragement, two colleagues and I edited a collection of articles about Jim’s vision of a ‘Fourth Paradigm’ of data-intensive scientific research (http://research.microsoft.com/en-us/collaboration/fourthparadigm/default.aspx). The collection also included a write-up of Jim’s last talk, in which he talked about not one but two revolutions in research. The first revolution was the Fourth Paradigm; the second was what he called ‘The Coming Revolution in Scholarly Communication’. In this section, Jim talked about the pioneering efforts towards open access for NIH-funded life sciences research with NCBI’s full-text repository PubMed Central. But he believed that the Internet could do much more than just make available the full text of research papers:

‘In principle, it can unify all the scientific data with all the literature to create a world in which the data and the literature interoperate with each other (Figure 3). You can be reading a paper by someone and then go off and look at their original data. You can even redo their analysis. Or you can be looking at some data and then go off and find out all the literature about this data. Such a capability will increase the “information velocity” of the sciences and will improve the scientific productivity of researchers. And I believe that this would be a very good development!’

I include his Figure 3 below:

After talking about open access, overlay journals, peer review and the publishing of data, Jim goes on to discuss the role that ontologies and semantics will play on the road from data to information to knowledge. As a specific example, he talks about Entrez, a wonderful cross-database search tool supported by the NCBI:

‘The best example of all of this is Entrez, the Life Sciences Search Engine, created by the National Center for Biotechnology Information for the NLM. Entrez allows searches across PubMed Central, which is the literature, but they also have phylogeny data, they have nucleotide sequences, they have protein sequences and their 3-D structures, and then they have GenBank. It is really a very impressive system. They have also built the PubChem database and a lot of other things. This is all an example of the data and the literature interoperating. You can be looking at an article, go to the gene data, follow the gene to the disease, go back to the literature, and so on. It is really quite stunning!’
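The cross-database hopping Jim describes – literature to gene to disease and back – is also exposed programmatically through NCBI’s E-utilities web API. The sketch below only constructs request URLs; the endpoint and parameter names reflect my own understanding of the service rather than anything in Jim’s talk, and should be checked against NCBI’s current documentation.

```python
from urllib.parse import urlencode

# Base URL for NCBI's E-utilities, as I understand the service; verify
# against NCBI's documentation before use.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db: str, term: str) -> str:
    """Build a search query against one Entrez database."""
    return f"{BASE}/esearch.fcgi?{urlencode({'db': db, 'term': term})}"

def elink_url(dbfrom: str, db: str, uid: str) -> str:
    """Build a link query: follow a record in one database into another."""
    return f"{BASE}/elink.fcgi?{urlencode({'dbfrom': dbfrom, 'db': db, 'id': uid})}"

# e.g. search the literature, then hop from a PubMed record to related genes
# (the identifier 12345 is a placeholder, not a real record)
print(esearch_url("pubmed", "poincare conjecture"))
print(elink_url("pubmed", "gene", "12345"))
```

The design is worth noting: every Entrez database shares one query interface, which is what makes the data-to-literature round trips Jim admired possible for a script as well as for a human reader.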

This was Jim’s vision for the future of scientific research – an open access world of full-text publications and data, a global digital library that can truly accelerate the progress of science. Of course, the databases at NCBI are all carefully curated and marked up using the NLM DTDs. Outside NCBI’s walled garden, in the wild, we have a plethora of different archives, repositories and databases – and replicating the success of a federated search tool like Entrez will be difficult. Yet this is the vision that inspires me. And it is this vision that leads me to support the open access movement for more than just the blunt economic fact that the university library system can no longer afford what publishers are offering.

To be continued …

Originally posted in 2013


A Journey to Open Access (2): University Research Management and Institutional Repositories

University Deans are required to do many things for their university, including taking some responsibility for the research output of their Faculty. Each year, capturing all forms of research deliverables – journal papers, technical reports, conference and workshop proceedings, presentations, and doctoral and master’s theses – is a necessary and important chore. This is especially important in the UK, where the research funds allocated to each department by the Government are explicitly linked to the quality of its research over a four- or five-year period.

First as Chair of the Electronics and Computer Science Department, and then as Dean of Engineering at the University of Southampton, I was responsible for two of these ‘Research Assessment’ cycles in the UK. It was during the preparation of these research returns that I encountered an interesting problem: the University library could no longer afford to subscribe to all the journals in which our 200 engineering faculty members – plus a similar number of postdocs and graduate students – chose to publish. This meant that just assembling the published copies of all the publications of all research staff and students became a much less straightforward exercise. The reason for this problem is well-known to librarians – it is the so-called ‘serials crisis’. This crisis is dramatically illustrated below in a graph that shows the relative growth of serial expenditures at ARL Libraries versus the consumer price index over the past twenty-five years.

These are typical expenditure curves for all university libraries – and the University of Southampton was no exception. It was for this reason that the University Library sent out a questionnaire each year asking staff which journals they would least mind cancelling! Yet the serials crisis is a curious sort of crisis in that most research staff are simply unaware of any problem. They feel free to publish in whatever journal is most appropriate for their research and see no reason to restrict their choice to the journals that the University can afford to subscribe to.

The Research Assessment exercise in the UK is intended to measure ‘research impact’, and this is judged in a number of ways. One form of research impact that can easily be measured is the number of citations by other researchers to each paper. In order to garner citations, a research paper needs to be accessible to and read by other researchers. Not all researchers – and certainly not the general public, whose taxes have usually helped fund the research – have access to all research journals. Physicists have solved this accessibility problem by setting up arXiv – a repository for un-refereed, pre-publication ePrints. The US National Library of Medicine has solved the accessibility problem in a different fashion. The full text of every research paper produced from research funded by the National Institutes of Health is required to be deposited in the PubMed Central (PMC) repository after publication in a journal, usually after a ‘reasonable’ embargo period of 6 to 12 months. Similar open access policies have now been adopted by other funders of biomedical research such as the Wellcome Trust and the Bill and Melinda Gates Foundation.

The repositories PMC and arXiv are examples of subject-specific, centralized research repositories. However, it is my firm belief that each research university needs to establish and maintain its own open access ‘institutional repository’ covering all the fields of research pursued by the university. At Southampton, in the Electronics and Computer Science Department, with colleagues Les Carr, Wendy Hall and Stevan Harnad, we established a departmental repository to capture full-text versions of all the research output of the Department, to assist us in monitoring and assessing our research impact. A graduate student in the Department, Rob Tansley, worked with Les Carr and Stevan Harnad to develop, in 2000, the EPrints open source repository software. Rob went on to work for Hewlett-Packard Laboratories in the US, where he wrote the DSpace repository software in collaboration with MIT. The EPrints and DSpace repository software are now used by many hundreds of universities around the world. For a list of repositories and software see http://roar.eprints.org/

As Dean of Engineering, I tried to use the example of the EPrints repository in Electronics and Computer Science as a model for the entire Engineering Faculty. By the time I left Southampton, this had only partially been implemented, but I was enormously pleased to see that by 2006 the University had mandated that all research papers from all departments must be deposited in the ‘ePrints Soton’ repository. In 2008, this was extended to include PhD and MPhil theses. For more details of Southampton’s research repository, well managed by the University Library, see: http://www.southampton.ac.uk/library/research/eprints/

There is much more that can be said about this ‘Green’ route to Open Access via deposit of the full text of research papers in institutional repositories. For a balanced account, I recommend Peter Suber’s recent book ‘Open Access’, published by MIT Press and to be made available under open access 12 months after publication. Peter describes the different varieties of Open Access – such as green/gold and gratis/libre – and also the issues of assigning ‘permission to publish’ to publishers versus assigning copyright (https://mitpress.mit.edu/books/open-access). In addition, the Open Archives Initiative supports two community-supported repository standards: OAI-PMH for metadata harvesting and OAI-ORE for aggregating resources from different sites into compound digital objects (http://www.openarchives.org/). Also relevant is the Confederation of Open Access Repositories, or COAR, whose website states:

COAR, the Confederation of Open Access Repositories, is a young, fast growing association of repository initiatives launched in October 2009, uniting and representing 90 institutions worldwide (Europe, Latin America, Asia, and North America). Its mission is to enhance greater visibility and application of research outputs through global networks of Open Access digital repositories.
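The OAI-PMH standard mentioned above is what lets harvesters aggregate metadata across thousands of independent institutional repositories. A repository exposes a single base URL, and clients select an operation via a ‘verb’ query parameter. The sketch below only builds request URLs; the base URL is a placeholder, while the verbs and the metadataPrefix parameter are part of the published protocol.

```python
from urllib.parse import urlencode

def oai_request(base_url: str, verb: str, **kwargs) -> str:
    """Build an OAI-PMH request: one base URL, operation chosen by 'verb'."""
    params = {"verb": verb, **kwargs}
    return f"{base_url}?{urlencode(params)}"

# Placeholder repository endpoint, not a real one
base = "https://repository.example.org/oai"

# Ask the repository to describe itself, then harvest Dublin Core records
print(oai_request(base, "Identify"))
print(oai_request(base, "ListRecords", metadataPrefix="oai_dc"))
```

Because every conforming repository answers the same handful of verbs, a single harvester can aggregate EPrints, DSpace, and any other compliant system without per-site code – which is precisely what makes confederations like COAR workable.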

Why is all this important? It is important because the present scholarly communication model is no longer viable. While many journal publishers perform a valuable service in arranging peer review and in publishing high quality paper and online journals, the unfortunate truth is that universities can no longer afford the costs of the publishers’ present offerings. For example, it was not possible for me as Dean to establish a new research area in the Faculty and have the library purchase the relevant new journals. In such an unsustainable situation, it is obvious that we need to arrive at a more affordable scholarly publishing model. However, instead of just waiting for such a model to magically emerge, university librarians need to be proactive and take up their key role as the guardians of the intellectual output of their university researchers. It is the university library that has both the resources and the expertise to maintain the university’s institutional research repository. And this is not just an academic exercise. Managing the university’s research repository will surely become a major part of the university’s ‘reputation management’ strategy. Studies of arXiv have shown there to be a significant citation advantage for papers first posted in arXiv, and subsequently published in journals, compared to papers just published in journals (arXiv:0906.5418). Similarly, it is likely that versions of research papers that are made freely available through an institutional repository will also acquire a citation advantage – although this conclusion is currently controversial. Nevertheless, like it or not, universities will increasingly be evaluated and ranked on the published information they make available on the Web. 
For example, the Webometrics Ranking of World Universities takes account of the ‘visibility and impact’ of web publications and includes both an ‘openness’ and an ‘excellence’ measure for research repositories and citations (http://www.webometrics.info/). I am pleased to see that Southampton features in 32nd place in Europe and 119th in their World rankings :-)

To be continued …

Originally posted in 2013

A Journey to Open Access (1): Green open access for over 20 years

My education in open access began over 40 years ago, when I was a practicing theoretical high energy physicist. This was in the 1970s – in the days of typewriters – and in those days we typed up our research papers, made 100 xerox copies and submitted the original to Physical Review, Nuclear Physics or whatever journal we wanted. The copies were sent round to our ‘peer’ high energy physics research groups around the world and were known as ‘preprints’. While the paper copy was undergoing refereeing at the journal, these preprints allowed researchers to immediately build upon and refer to work done by other researchers prior to publication. This was the preprint tradition in the fast-moving field of high energy physics. When papers were accepted for publication, the references to preprints that had since been published were usually updated in the published version. It has always baffled me – now that I work in the field of computer science, which is, if anything, even faster moving than high energy physics – that there is no similar tradition. In computer science, it can take several years for a paper to get published in a journal – by which time the paper really only serves an archival purpose and provides evidence for tenure committees. In contrast to the physics preprint system, the computer science community uses refereed workshop publications to provide a rapid – or at least more rapid – publication vehicle.

With the widespread availability of the Internet, and with the advent of the World Wide Web, theoretical physicist Paul Ginsparg set up a web site to save high energy physicists both the postage and the trouble of circulating preprints. The electronic version of the preprint – inevitably called an e-Print – is typically submitted to a journal and simultaneously posted to the arXiv website (http://arxiv.org/). This is now the standard method of scholarly communication of a very large fraction of the physics, astronomy and mathematics communities.

‘arXiv is the primary daily information source for hundreds of thousands of researchers in physics and related fields. Its users include 53 physics Nobel laureates, 31 Fields medalists and 55 MacArthur fellows, as well as people in countries with limited access to scientific materials. The famously reclusive Russian mathematician Grigori Perelman posted the proof for the 100-year-old Poincaré Conjecture solely in arXiv.’ Reference: http://phys.org/news142785151.html#jCp

The arXiv repository is now over 20 years old, has a submission rate of over 7,000 e-Prints per month, and makes full-text versions of over half a million research papers available free to both researchers and the general public. More than 200,000 articles are downloaded from arXiv each week by about 400,000 users. Most, but not all, of the e-Prints are eventually published in a journal, which amounts to a sort of post-publication ‘quality stamp’. The apparent drawback of having multiple, slightly different versions of a paper turns out not to be serious in practice: citation counts for high energy physicists usually count either the e-Print version or the published version. A detailed study of the arXiv system by Anne Gentil-Beccot, Salvatore Mele and Travis C. Brooks is published as ‘Citing and Reading Behaviours in High-Energy Physics. How a Community Stopped Worrying about Journals and Learned to Love Repositories’. The paper is, of course, available as arXiv:0906.5418.
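Those half a million e-Prints can also be retrieved programmatically: arXiv exposes a public, Atom-based query API. The sketch below only builds request URLs; the endpoint and parameter names are my recollection of that interface, not something stated above, so check arXiv’s API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed arXiv API endpoint; verify against arXiv's API documentation.
ARXIV_API = "http://export.arxiv.org/api/query"

def arxiv_query(search: str, start: int = 0, max_results: int = 10) -> str:
    """Build a query URL for arXiv's Atom feed of matching e-Prints."""
    params = {"search_query": search, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"

# e.g. look up the Gentil-Beccot et al. study cited above by its identifier
print(arxiv_query("all:0906.5418", max_results=1))
```

That a 20-year-old repository offers this kind of machine access at all is part of why arXiv, and not any journal platform, became the daily information source for its field.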

In the terminology of today, arXiv represents a spectacularly successful example of ‘Green Open Access’. This is the situation in which researchers continue to publish in refereed, subscription-based journals but also self-archive versions of their papers, either in subject-based repositories – as with arXiv and the high energy physics community – or in institutional repositories. In certain fields – such as the biomedical area, with the US PubMed Central repository – these full-text versions may only become available to the public after an embargo period of 6 or 12 months. The alternative open access model – so-called ‘Gold Open Access’ – is one in which researchers or their funders pay the journal publishers to make the full-text version of the paper freely available.

Why should you care? The research described in the papers was typically funded by a grant from a government funding agency – think NSF or NIH in the USA, or RCUK in the UK. The research papers are reviewed by researchers whose salaries generally also come from the public purse. The publishers organize the review process and publish the journals – and then restrict access to these papers to those who can afford to pay a subscription. Since the research was both funded and reviewed with public money raised through taxes, it seems not unreasonable to demand that the general public should be allowed access to this research without having to pay an additional access fee. Now that we have the Web and the technology to make perfect digital copies of documents at zero cost, it is clear that the old rules, under which publishers controlled dissemination through the printing press, need to change – just as they have for music and journalism. No one begrudges publishers some reward for their efforts at quality control and for supporting a prestigious ‘branded’ journal like Nature. But, as will be seen in the next post, the central issue for universities is now the affordability of their present journal offerings. Subscription fees have risen much faster than inflation over the last 15 years or more and now constitute an unreasonable ‘tax’ on scarce research funds – a tax that goes to the shareholders of the publishing companies.

To be continued …

Originally posted in 2013