A Global View of Open Access (2) : The perspective from Brazil and the SCIELO open access portal

This second posting in the series of blog articles on global views of Open Access comes from Brazil. I am very pleased to introduce this article by Professor Carlos Henrique de Brito Cruz, the Scientific Director of the São Paulo Research Foundation (FAPESP) – better known to me just as ‘Brito’. The article starts by describing a rather different direction on open access from the focus on repositories. The SCIELO portal works with Brazilian open access journals to give more visibility to Brazilian research. SCIELO was started in 1997 and now has operational collections in most countries in Latin America. Only relatively recently has there been a movement towards creating open access repositories in major research universities in Brazil.

Enjoy! Tony Hey 31 May 2013

Open Access Initiatives in Brazil and Latin America

One of the earliest open access initiatives is the portal SCIELO, created by the São Paulo Research Foundation (FAPESP) in partnership with the Pan-American Health Organization (PAHO) in 1997 to increase the visibility of a collection of scientific journals edited in Brazil. The Brazilian collection, which started with 10 journals, grew to 269 in 2013, and its articles receive 1.2 million views each day.

Originally, the proposal for the SCIELO Portal was brought to FAPESP by researchers who were motivated by the article “Lost science in the third world”, by W. Wayt Gibbs (http://www.nature.com/scientificamerican/journal/v273/n2/pdf/scientificamerican0895-92.pdf). The argument put forth by the proposers[1] was that by using the (then new) possibilities offered by digital access, the articles published in Brazilian journals would gain international visibility. These objectives were achieved to an extent well beyond the initial expectations of the proposers and of FAPESP when the initial proposal was approved. According to Webometrics[2], a highly visible ranking of international repositories, SCIELO Brazil ranked first in the world among the “Portals” in 2011. In the DOAJ portal, Brazil ranks second in the number of open access journals[3] (884 journals, behind the U.S. with 1,334 journals).

Another indicator of the success of the SCIELO idea is that it gave rise to several spin-off sites: there are now SCIELO portals in eleven countries, shown in Table 1, and collections in development in five others. There are also two “thematic” SCIELO portals: Public Health and Social Sciences. In each of these, the team from SCIELO Brazil was instrumental in lending technology and organizational support.

From the beginning, SCIELO was much more than an open access repository, having many characteristics of a publisher. This was part of the strategy to raise the standards of the participating journals, with the objective of enhancing their visibility. An Editorial Board selects the journals, which must comply with a set of procedural and quality standards to be included. For example, they must have an international editorial board, demonstrate stable periodicity, and adhere to peer-review procedures to select articles.

Presently SCIELO Brazil offers to the selected/participating journals the following services:

a) Multilingual publication. Language is a well-known barrier to the visibility of scientific articles published in Brazil, so SCIELO works with the editors of participating journals to facilitate the publication of the full texts in English (all articles in SCIELO have an English version of the title, summary, and keywords). In 2012, 54% of the full texts were in English, 62% in Portuguese, and 16% in both languages.

Table 1. Summary data on the SCIELO collections.

Country                                Starting year   Journals   Documents (Qty)   Documents (%)

Collections fully operational
Argentina                              2004            102        18,302            4%
Brazil                                 1997            269        233,500           57%
Chile                                  1998            89         37,156            9%
Colombia                               2004            152        27,972            7%
Costa Rica                             2000            11         4,721             1%
Cuba                                   2001            42         19,667            5%
Mexico                                 2003            103        15,696            4%
Portugal                               2004            26         7,057             2%
South Africa                           2009            23         5,553             1%
Spain                                  2001            33         23,328            6%
Venezuela                              2000            33         14,214            3%
Total titles, fully operational                        883        407,166           100%

Collections under development
Bolivia                                2009            14         2,507
Paraguay                               2007            7
Peru                                   2004            14         4,932
Uruguay                                2005            10         1,803
West Indies                            2006            1          1,072
Total titles, under development                        46         10,314

Thematic collections
Public Health (*)                      2000            15         25,502
Social Sciences                        2006            33         665
Total titles, thematic collections                     48         26,167

b) Ahead of print publication of selected articles. This service is used by 54 journals (out of 269) and it is expected that by the end of 2014 50% of the collection will use it.

c) Online manuscript processing through ScholarOne. This service, in use by 60 journals, is being offered on a progressive basis at a rate of five additional journals per month. The use of this tool facilitates the participation of editors and reviewers from countries other than Brazil, contributing to the internationalization strategies of the journals.

d) Full text formatting with XML and the production of HTML, PDF, and EPUB (for smartphones and tablets). All journals will use this service by the end of 2014.

As of 2013, SCIELO is working with Thomson Reuters’ Web of Science to operate the SCIELO Citation Index as a part of the Web of Knowledge (WoK) Platform. This is expected to boost the visibility of the journals in SCIELO, as all tools for search, navigation, and connection to full articles will be integrated with those of the WoK.

Finally, it is worth mentioning two new initiatives that are in their final preparation stages to go into the implementation phase later in 2013.

One is the creation of open access repositories in the main universities in the state of São Paulo, Brazil (which accounts for 50% of the total articles published by authors in Brazil) for all articles published with funding from FAPESP. FAPESP adopted an open access policy, according to which all articles resulting from its funding must be made openly accessible, to an extent that considers the restrictions of the journal in which they were published. FAPESP does not want to interfere with the researchers’ choice of the journals in which they will publish their work, so the agency is willing to comply with whatever policy each particular journal has.

The second initiative aims at working with some journals published in Brazil, to be selected through an open call for proposals, to offer them special support to advance their professionalization, visibility, and impact. Proposals will be selected on the basis of plans submitted by editors aiming to professionalize their operations in a sustainable way and proposing a strategy for increasing the visibility and impact of the journal’s articles. Presently FAPESP is working with other funding agencies in Brazil to secure nationwide support, so that the initiative can be national.

It is our view that open access has been playing an important role in increasing the visibility of the science done in Brazil and Latin America. The results obtained with SCIELO, one of the main open access portals in the world, are very concrete and motivate new initiatives in this direction.

Carlos Henrique de Brito Cruz

Scientific Director, São Paulo Research Foundation

[1] The proposal was presented to FAPESP by Rogério Meneghini and Abel Packer, with the support of BIREME the Health Information Center of the Pan American Health Organization.

[2] http://repositories.webometrics.info/en/top_portals, accessed on May 25th, 2013.

[3] http://www.doaj.org/doaj?func=byCountry&uiLanguage=en, accessed on June 2nd, 2013.

A Global View of Open Access (1) : A French perspective on Open Access and the Episciences Initiative

This second series of blog articles on Open Access will look at the global perspective. In order to give an authoritative view, the entries are by invitation and authored by relevant researchers in the different countries. I am very pleased to open this second series with a view from France, contributed by Claude Kirchner, Executive Officer for Research and Technology Transfer for Innovation, and colleagues Laurent Romary and Pascal Guitton. In true Napoleonic fashion, France has a centralized research repository called HAL and is participating in ambitious plans to create ‘epi-journals’ – a new type of overlay journal based on peer-reviewed pre-prints (http://episciences.org/).


Tony Hey

31st May 2013

A strong Green open access policy for France… and even more for Inria

2013 will probably appear as an important milestone in the development of Open Access in France. On 24 January, during an open access awareness event organised by the CNRS and the national consortium of university libraries (Couperin), the French Minister for Research and Higher Education, Geneviève Fioraso, expressed support for the Open Access movement, stating that « L’information scientifique est un bien commun qui doit être disponible pour tous » (“Scientific information is a public good that should be available to all”), and showed a strong preference for the green route to open access, in particular through the use of the national publication repository HAL (maintained by CNRS). She also strengthened the role of the national coordination body on scientific information (BSN – Bibliothèque Scientifique Numérique) in helping higher education and research institutions coordinate their policies in this domain. Following this, a major memorandum of understanding was signed by 25 national institutions stating their willingness to work together to make HAL a reference repository for all research output in France.

Inria, the French research institution for computational sciences and applied mathematics, played a seminal role in making such progress possible. It has had a long-standing involvement in the open access movement. It was an early signatory of the Berlin Declaration in 2003 and, as early as April 2005, it officially set up its own portal (HAL-Inria) on the national HAL repository. At that time, it recommended that all publications from its researchers should be deposited there. In 2006, as a signatory to the national agreement on open archiving, it accelerated its involvement in designing additional deposit, presentation and dissemination services on the HAL platform for the benefit of its researchers.

In the recent period, Inria has recognised how difficult it has become to work in collaborative partnership with publishers (private, but also professional associations and learned societies) in defining new publishing business and editorial models. In this context, Inria decided to take the bull by the horns and to contribute proactively to the elaboration of such models. At the beginning of 2013 it issued a deposit mandate, whereby HAL-Inria becomes the only source of information for all reporting and assessment activities of its researchers, teams and research centres.

Going even further, Inria is now committing resources to experimenting with new publication frameworks. It is thus involved in the Episciences initiative, which aims at creating a peer-reviewing environment coupled to the deposit of pre-prints in HAL, with reduced overhead costs and maximal dissemination efficiency.

The underlying vision is that of a research infrastructure which charges no fee to its users (whether author or reader) and which offers a set of basic services facilitating the efficient dissemination and review of scholarly papers. As with traditional journals, scientific quality is ensured by the standing of the editorial committee that carries out the peer-reviewing process.

The epi-journal platform is conceived in the spirit of traditional peer-reviewed journals, with additional facilities resulting from its being built on top of a publication repository. Indeed, open archives are now widely available and can be used by any researcher to store, index and make freely available any of their publicly accessible research documents. These documents can be, for instance, research papers, experiments, data, programs or videos. Archives such as arXiv or HAL are widely accessible and provide a sustainable and free service. In the case of the HAL platform, for instance, papers are precisely associated with affiliation information for authors and come with generic long-term archiving facilities, as well as additional services facilitating the creation of personal or institutional web pages.

In order to support the editorial committees of the journals hosted on the platform in their day-to-day business, editorial management support will be provided. This will comprise:

  • Management of the peer-review process, comprising the channelling of community based feedback;
  • Handling the management of the journal volumes and issues;
  • Contribution to some basic quality checking tasks (bibliography, meta-data, cross-references);
  • Community management: advertising papers to various channels and social networks, moderation of online discussions;
  • General visibility: interaction with major indexing services and databases (DBLP, Thomson Reuters, Scopus…), as well as adequate mirroring on relevant thematic repositories (arXiv, PMC, RePEc, etc.).

Through hosting on the national repository infrastructure HAL, all journals will benefit from a high quality technical environment comprising round-the-clock (24/7) services, long-term archiving and a proper authentication and authorization infrastructure.

As to the copyright policy, we want the IP to remain with the authors, who will only grant the journal (and hence the platform) a non-exclusive right to publish under its brand. In addition, the journals will decide on the licence to be applied, but a strong recommendation will be made to adopt a generic Creative Commons CC-BY (attribution) licence, which is quite adequate for scholarly purposes.

Finally, we will ensure that journal titles are freed from any private ownership. When a title is not properly hosted by an academic institution or a scientific society, a consortium of supporting organisations should be able to take ownership of such orphan titles in the service of editorial committees.

In order to provide a sustainable service, we will put together step by step a consortium of interested parties that may provide further cash or in-kind contribution to the further exploitation of the platform. It is anticipated that such contributions can be taken out of the existing scientific information budget of the interested institutions (e.g. subscriptions).

In our view, this is the only way not only to master our scientific information budgets, but also to master the services we need to disseminate our research results in good conditions. Indeed, investing in such new services is just a step towards the definition of more integrated virtual research environments facilitating eScholarship.

Laurent Romary, Pascal Guitton, Claude Kirchner


A Journey to Open Access (6) : The Open Access Revolution – The Next Steps …

Since the beginning of the year, the momentum for open access to research publications has grown dramatically. On February 14th 2013, the Fair Access to Science and Technology Research (FASTR) Act was introduced in the US Senate by John Cornyn (R-TX) and Ron Wyden (D-OR), and in the House by Mike Doyle (D-PA), Zoe Lofgren (D-CA) and Kevin Yoder (R-KS). FASTR would require open access to peer-reviewed research papers arising from federally-funded research and would require the major federal research funding agencies – including DOE, NIH and NSF – to make these papers freely available to the public through a digital archive maintained by the agency. Significantly, the bill not only talks about the requirement of accessibility – with a suggested maximum embargo time of 6 months – but also highlights the need to maximize the utility of the research by enabling re-use:

“The United States has a substantial interest in maximizing the impact of the research it funds by enabling a wide range of reuses of the peer-reviewed literature reporting the results of such research, including by enabling automated analysis by state-of-the-art technologies.”

Such automated analysis would permit a genuine realization of the vision of the Memex put forward by Vannevar Bush in his seminal paper ‘As We May Think’. The FASTR act also includes another far-sighted requirement that federal agencies consider whether or not the terms of use should include “a royalty free copyright license that is available to the public and that permits the reuse of those research papers, on the condition that attribution is given to the author or authors of the research and any others designated by the copyright owner”. As Heather Joseph points out in her SPARC newsletter – http://www.arl.org/sparc/media/blog/with-introduction-of-fastr-congress-picks-up-the-p.shtml – this would effectively require research papers to be published under some form of Creative Commons license.

On February 22nd, just eight days after FASTR was introduced into both houses of Congress, the White House issued a directive requiring the major Federal Funding agencies “to develop a plan to support increased public access to the results of research funded by the Federal Government.” Significantly, these results include not only peer-reviewed publications but also digital data. The memorandum defines digital data “as the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens.”

The White House memorandum is from John Holdren, Director of the Office of Science and Technology Policy and underlines the Obama Administration’s belief that federally-supported basic research can catalyze innovative breakthroughs that can help grow the US economy:

“Access to digital data sets resulting from federally funded research allows companies to focus resources and efforts on understanding and exploiting discoveries. For example, open weather data underpins the forecasting industry, and making genome sequences publicly available has spawned many biotechnology innovations. In addition, wider availability of peer-reviewed publications and scientific data in digital formats will create innovative economic markets for services related to curation, preservation, analysis and visualization. Policies that mobilize these publications and data for re-use through preservation and broader public access also maximize the impact and accountability of the Federal research investment. These policies will accelerate scientific breakthroughs and innovation, promote entrepreneurship, and enhance economic growth and job creation.”


We now have OA mandates coming from both the Legislative and the Executive branches of the US Government. The White House memorandum covers both research publications and research data and requires the relevant Federal Agencies to deliver a plan within six months from February 2013. It is noteworthy that both the White House memorandum and the bi-partisan FASTR bill require green open access via repositories and say nothing about gold – in contrast to the approach preferred by the Finch Report and by the Research Councils in the UK. For more commentary on both FASTR and the White House memorandum, see Peter Suber’s blog.


In the USA, I believe that these developments represent a tipping point for the Open Access movement. But besides the dramatic moves towards Open Access in the US and the UK, there have also been significant developments elsewhere around the world. In Europe, a press release from the European Commission in July 2012 about the new Horizon 2020 Research Framework stated that:

“As a first step, the Commission will make open access to scientific publications a general principle of Horizon 2020 … As of 2014, all articles produced with funding from Horizon 2020 will have to be accessible … The goal is for 60% of European publicly-funded research articles to be available under open access by 2016.”

Note that like the USA – and unlike the UK – the European Commission also does not favor gold OA over green. Similarly, in Australia, the National Health and Medical Research Council (NHMRC) and the Australian Research Council (ARC) both back green OA via repositories. In July 2012, the NHMRC policy stated:

“NHMRC therefore requires that any publications arising from an NHMRC supported research project must be deposited into an open access institutional repository within a twelve month period from the date of publication.”

Following this example, ARC introduced an open access policy for ARC funded research with effect from 1 January 2013. Their policy requires that any publications arising from an ARC supported research project must be deposited into an open access institutional repository within a twelve month period from the date of publication. These are just a few of the many examples of what is clearly now an inexorable move towards the new norm of open access for research publications.

Back in the UK, some re-thinking of Finch and RCUK’s OA policy is taking place. A recent review by the House of Lords criticized RCUK for failures in communication and for lack of clarity about its policy and guidance. Prior to a more complete review of its policy in 2014, RCUK issued a revision of its Open Access policy on the 6th March 2013. The major change was that there is now an explicit statement that although RCUK prefers gold, either green or gold is acceptable. The Department for Business, Innovation and Skills (BIS) has also launched an inquiry into open access which has yet to report. Finally, on February 25th, the Higher Education Funding Council for England (HEFCE) began consulting the research community on ‘the role of open-access publishing in the submission of outputs to the post-2014 Research Excellence Framework (REF)’. For non-UK readers, the REF is a research review process conducted by HEFCE, the major UK university funding organization, to determine national university and departmental research rankings. Their intent is ‘to require that outputs meeting the REF open access requirement (whether published by the gold or green route) shall be accessible through a repository of the submitting institution’.

Finally, in May of last year there was the inaugural meeting of a new organization called the Global Research Council (GRC) in Washington DC. The meeting was prompted by the White House Office of Science and Technology Policy who invited the NSF to host a meeting of the world’s research funding agencies to discuss global standards of peer review for basic research. The GRC is a virtual organization with members of the Governing Board from the US, Germany, Brazil, Saudi Arabia, Japan, China, Europe, Canada, Russia and India. The result of the first summit attended by around 50 research agencies was an agreed statement on ‘Merit Review’.



The second summit meeting of the GRC will take place in Berlin from 27th to 29th May 2013, hosted by the German Research Foundation (DFG) and the Brazilian CNPq agency. The main goal of this summit will be to agree on an action plan for implementing Open Access to Publications as the main paradigm of scientific communication in the following years. Such unanimity on Open Access between the major global research funding organizations will surely bring about both a more sustainable model of scholarly communication and a more efficient research process for solving some of the major scientific challenges facing the world.

What scholarly communication structures will emerge in the future? I recommend reading an interesting paper by Paul Ginsparg, playfully titled ‘As We May Read’.


In particular, his conclusions deserve serious consideration:

“On the one-decade time scale, it is likely that more research communities will join some form of global unified archive system without the current partitioning and access restrictions familiar from the paper medium, for the simple reason that it is the best way to communicate knowledge and hence to create new knowledge. Ironically, it is also possible that the technology of the 21st century will allow the traditional players from a century ago, namely the professional societies and institutional libraries, to return to their dominant role in support of the research enterprise.”

This entry concludes this series of articles on my personal journey to Open Access. However, I must thank my colleagues at the University of Southampton in the UK who educated me and collaborated to achieve great things at the University – Wendy Hall, Les Carr, Chris Gutteridge, Steve Hitchcock, Tim Brody and Jessie Hey in the Department of Electronics and Computer Science, Mark Brown, Pauline Simpson and Wendy White in the University Library, and Alma Swan from Key Perspectives. But most of all I should thank the world’s most persistent evangelist for green open access, Stevan Harnad.

In this series I have so far only mentioned two of the three pioneers of Open Access – Paul Ginsparg, who created the physics arXiv, and David Lipman of NCBI and PubMed Central.  But the third pioneer who deserves our thanks and homage is Stevan Harnad whose ‘Subversive Proposal’ paper in 1994 was the opening salvo in what has turned out to be a twenty year battle for Open Access. Stevan has steadfastly evangelized green OA as the best way to make research publications accessible. His advocacy of the Immediate-Deposit/Optional-Access model successfully adopted by the University of Liege in Belgium is both rational and compelling. Any given deposit can be made Closed Access instead of OA for the period of any embargo time but the requirement for immediate deposit has enabled Liege to capture over 80% of its annual refereed research output in their repository. So my final words in this series are a ‘thank you’ to Stevan Harnad – and the hope that he can now get some sleep and not feel the need to respond instantly to emails on OA at any time of the day or night!

Tony Hey

April 4th 2013

A Journey to Open Access (5) : Open Access in the USA – The Open Access Policies of the DOE, NIH and NSF

In a previous entry I wrote about the open access policy of the NIH and their PubMed Central repository. While the NIH has set a great example for open access, it is actually another US funding agency that has been the real pioneer in making the results of its non-classified R&D accessible to both researchers and the general public for over fifty years. This is the DOE, the US Department of Energy – not the NSF as one might have expected. The DOE policy was established in the 1940s by none other than General Groves, who had led the Manhattan atomic bomb project in such secrecy during the war:

It was just over 60 years ago that General Leslie Groves, commanding the Manhattan Engineer District in Oak Ridge, TN, mandated that all classified and unclassified information related to the Atomic Bomb be brought together into one central file. Thus, in 1947, the precursor to the Office of Scientific and Technical Information (OSTI, www.osti.gov) was born.

From the OSTI website we read:

‘Established in 1947, DOE’s Office of Scientific and Technical Information (OSTI) fulfills the agency’s responsibilities related to the collection, preservation, and dissemination of scientific and technical information emanating from DOE R&D activities. This responsibility has been codified in the organic, or enabling, legislation of DOE and its predecessor agencies and, more recently, was defined as a specific OSTI responsibility in the Energy Policy Act of 2005.’

The declared mission of OSTI is ‘to advance science and sustain technological creativity by making R&D findings available and useful to DOE researchers and the public’. The Office was founded on the principle that science progresses only if knowledge is shared and the corollary that accelerating the sharing of knowledge accelerates the advancement of science.




The OSTI facility is located in Oak Ridge, Tennessee.


Although I had interacted with many of the DOE Labs over the years, I am ashamed to say that I only became aware of the activities of OSTI a few years ago. This was through our work with the British Library on Virtual Research Environments. It was Richard Boulderstone who first told me about OSTI and its leadership of the international consortium called the WorldWideScience Alliance (see http://worldwidescience.org). This is a federation of 70 national science portals giving access to over 80 research databases. OSTI have been instrumental in developing a multilingual federated search tool that allows a user to search all of these individual databases. Microsoft Research was involved in developing the translation service for the search tool using Microsoft Translator. When a user enters a query, it is translated into the appropriate language and sent to all of the WorldWideScience databases. The results are returned in relevance-ranked order, translated back into the user’s preferred language. Ten languages are currently supported: Arabic, Chinese, English, French, German, Japanese, Korean, Portuguese, Russian, and Spanish. (See http://research.microsoft.com/en-us/projects/translator/).
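For readers who like to see the mechanics, the translate-search-rank-translate loop described above can be sketched in a few lines of Python. This is only an illustrative sketch and not OSTI's or Microsoft's actual implementation: the names `Hit`, `Database`, `federated_search`, `toy_translate`, `DB_EN` and `DB_FR` are all hypothetical, the "translator" is a tiny lookup table standing in for a real translation service, and the two "databases" are in-memory stand-ins for national science portals.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    title: str      # title in the language given by `language`
    language: str   # language code of the title
    score: float    # relevance score reported by the source database


@dataclass
class Database:
    name: str
    language: str                         # language the database is indexed in
    search: Callable[[str], List[Hit]]    # query (in that language) -> hits


def federated_search(query, user_lang, databases, translate):
    """Translate the query for each database, merge hits, rank, translate back."""
    hits = []
    for db in databases:
        local_query = translate(query, user_lang, db.language)
        hits.extend(db.search(local_query))
    # Merge all sources into one relevance-ranked list (highest score first).
    hits.sort(key=lambda h: h.score, reverse=True)
    # Present every title in the user's preferred language.
    return [Hit(translate(h.title, h.language, user_lang), user_lang, h.score)
            for h in hits]


# --- Hypothetical toy data, for illustration only ---------------------------
def toy_translate(text, src, dst):
    """Lookup-table stand-in for a real machine translation service."""
    if src == dst:
        return text
    table = {
        ("energy", "en", "fr"): "énergie",
        ("Rapport sur l'énergie", "fr", "en"): "Report on energy",
    }
    return table.get((text, src, dst), text)


DB_EN = Database(
    "english-portal", "en",
    lambda q: [Hit("Energy outlook", "en", 0.7)] if "energy" in q.lower() else [],
)
DB_FR = Database(
    "french-portal", "fr",
    lambda q: [Hit("Rapport sur l'énergie", "fr", 0.9)] if "énergie" in q else [],
)

if __name__ == "__main__":
    for hit in federated_search("energy", "en", [DB_EN, DB_FR], toy_translate):
        print(f"{hit.score:.1f}  {hit.title}")
```

In a real federation the per-database searches would run in parallel and relevance scores from different sources would need to be normalised before merging; the sketch ignores both points to keep the flow of the query visible.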

In 2011, OSTI partnered with Microsoft Research again and used the Microsoft Research Audio Video Indexing System (MAVIS) tool to build a multimedia search engine called ScienceCinema. (For details of the MAVIS project see http://research.microsoft.com/en-us/projects/mavis/). This makes approximately 1,000 DOE videos available and searchable by the public. ScienceCinema content continues to grow with the recent initial installment from the multimedia collection of CERN, the European Organization for Nuclear Research. ScienceCinema was launched in February 2011 and named as one of six new initiatives in DOE’s Open Government Plan 2.0 (http://www.osti.gov/sciencecinema/).

OSTI are also active in a number of other exciting open access projects such as ScienceAccelerator.gov and Science.gov, which bring together R&D information from 13 federal agencies. In addition, OSTI’s E-print network (http://www.osti.gov/eprints) provides a gateway to 35,000 websites and databases worldwide – including arXiv – and some 30,000 scientific and technical information institutional repositories. The network contains more than 5 million e-prints and its contents are searchable via Science.gov.

Jim Gray, in his January 2007 talk to the Computer Science and Telecommunications Board of the US National Research Council, called for federal science agencies to ‘establish digital libraries that support other sciences in the same way the National Library of Medicine supports the biosciences’. On February 15th of this year OSTI announced the launch of the National Library of Energy (NLE) as ‘a virtual library and open government resource to advance energy literacy, innovation and security’. The OSTI NLE search tool gives users easy access to all the major DOE information sources on energy – not only R&D results but also relevant information and technology for home-owners as well as analyses of the energy market (http://www.osti.gov/nle/ ).

The latest innovation in open access from OSTI is its development of a portal called PAGES – a Public Access Gateway for Energy and Science. This will be a web-based portal that ensures that scholarly publications resulting from DOE research are publicly accessible and searchable at no charge to readers. The research papers will be accessible either through links to publisher sites, for articles that the publishers make publicly accessible, or through links to a copy of the final accepted manuscript hosted in a repository, after an agreed embargo period.

To conclude, in this post I wanted to highlight the pioneering role that OSTI and its staff have played in fostering public access to non-classified R&D results from the Department of Energy in the US for more than 50 years. More recently, the National Institutes of Health have also played a prominent role in furthering the cause of open access with their PubMed Central repository and other publicly accessible databases in the National Library of Medicine. By contrast, it is surprising – at least to me – that the major US funder of university research, the National Science Foundation, has not played a similarly active role in moving towards delivering open access of the results of its research. However, the NSF is to be applauded for taking the first step towards an ‘open data’ agenda by requiring all research proposals to include a data management plan.

In my last posting in this series on open access, I will discuss the recent announcement on February 22nd from the White House’s Office of Science and Technology Policy on ‘Increasing Access to the Results of Federally Funded Research’ (http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf).

To be concluded …

Originally posted in 2013

A Journey to Open Access (4) : Open Access in the UK: The Finch Report and RCUK’s Open Access Policy

In the UK, the JISC organization has long pioneered the exploration of different models of open access and, in particular, the role of institutional repositories.  Although JISC’s future is now somewhat uncertain because of the recent change in its funding status to that of a charity, JISC is seen internationally as a major innovator in the use of advanced ICT in Higher Education. In Europe, only the Dutch SURF organization can match the breadth and originality of JISC programs. Such an innovative ‘applied research’ funding agency is lacking in the US – although the role of JISC is partially met by organizations such as the Mellon Foundation.

Until 2006, I was Chair of the JISC Committee in Support of Research. Our Committee was able to fund many innovative projects and initiatives, including the pilot study that led to the adoption of the Internet2 Shibboleth authentication by UK universities, the establishment of the Digital Curation Centre (DCC) in Edinburgh, a test-bed ‘lambda network’ for high-data rate transfers and an experimental text mining service offered by the National Centre for Text Mining (NaCTeM) in Manchester. In April 2005, my committee produced a leaflet explaining the basics of ‘Open Access’. I particularly remember having to insist that the author of the leaflet, one Alma Swan, now well-known to the Open Access community, should put the section on ‘Green Open Access’ via repositories before the section on ‘Gold Open Access’ journals.

Other committees of JISC also funded a large number of projects exploring different aspects of open access repositories. From 2002 – 2005 the JISC FAIR Program – Focus on Access to Institutional Repositories – funded projects like the SHERPA project at Nottingham and the TARDis project at Southampton. From 2006 – 2007, the JISC Digital Repositories Program funded another 20 projects including the OpenDOAR project – a Directory of academic Open Access Repositories – and the EThOS project – to build a national e-thesis service. JISC also funded a Repository and Preservation Program which included the PRESERV project at Southampton that looked at preservation issues for eprints. All of this preamble is intended to show that the UK has had a vibrant and active ‘research repository community’ for over a decade. The ROAR site currently lists 250 UK university repositories. It is unfortunate that the ‘Working Group on Expanding Access to Published Research Findings’ – better known as the Finch Committee – seem to have chosen to ignore much of this seminal work.

The UK Government has adopted an explicit commitment to openness and transparency (http://www.cabinetoffice.gov.uk/transparency). In the context of research, this has been interpreted as making the results of ‘publicly funded research’ open, accessible and exploitable. The Government’s belief is that open access to research results will drive innovation and growth as well as increasing the public’s trust in research. With such a laudable intent, the Government set up the Finch Committee to explore how best the UK could ‘expand access to published research findings’. Unfortunately for the outcome, conventional scholarly publishers were the best represented stakeholder group on the Committee, which consisted of five publishers, four researchers or university administrators, three funders and two librarians. The majority of the ‘Finch Report’ recommendations were accepted by Minister David Willetts and a version of them promulgated by the combined Research Councils organization, RCUK – roughly equivalent to the NSF – in July 2012. The RCUK policy can be summarized as follows (quoting Peter Suber’s SPARC Open Access Newsletter, issue #165):

  • RCUK-funded authors ‘must’ publish in RCUK-compliant journals. A journal is RCUK-compliant if it offers a suitable gold option or a suitable green option. It need not offer both.
  • To offer a suitable gold option, a journal must provide immediate (un-embargoed) OA to the version of record from its own web site, under a CC-BY license, and must allow immediate deposit of the version of record in an OA repository, also under a CC-BY license. It may but need not levy an Article Processing Charge (APC).
  • To offer a suitable green option, a journal must allow deposit of the peer-reviewed manuscript (with or without subsequent copy-editing or formatting) in an OA repository not operated by the publisher.
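
The compliance rule in these three bullets is a simple "gold or green" test, and it may help to see it written out. The following is only a sketch of the summary as quoted above, not of the full RCUK policy text; the class and field names are my own hypothetical labels for the conditions listed, and details such as embargo lengths are deliberately left out because the summary does not state them.

```python
from dataclasses import dataclass


@dataclass
class JournalPolicy:
    """Hypothetical record of what a journal offers; field names are illustrative only."""
    immediate_oa_version_of_record: bool = False   # un-embargoed OA from the journal's own site
    site_copy_cc_by: bool = False                  # that version of record carries a CC-BY license
    allows_immediate_cc_by_deposit: bool = False   # version of record may be deposited at once, under CC-BY
    allows_manuscript_deposit: bool = False        # peer-reviewed manuscript may be deposited
    deposit_repository_independent: bool = False   # ...in an OA repository not run by the publisher


def offers_gold(p: JournalPolicy) -> bool:
    # All three gold conditions from the summary must hold; charging an APC is optional.
    return (p.immediate_oa_version_of_record
            and p.site_copy_cc_by
            and p.allows_immediate_cc_by_deposit)


def offers_green(p: JournalPolicy) -> bool:
    # Green needs deposit of the peer-reviewed manuscript outside the publisher's own repository.
    return p.allows_manuscript_deposit and p.deposit_repository_independent


def is_rcuk_compliant(p: JournalPolicy) -> bool:
    # A journal need not offer both routes; either one is enough.
    return offers_gold(p) or offers_green(p)


green_only = JournalPolicy(allows_manuscript_deposit=True,
                           deposit_repository_independent=True)
toll_access_only = JournalPolicy()
print(is_rcuk_compliant(green_only), is_rcuk_compliant(toll_access_only))
```

Written this way, the point made later in this post is easy to see: because either branch satisfies the test, a publisher is free to drop the unpaid green branch and keep only the paid gold one.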


To compensate the publishers – or, in the view of the Finch Committee, give them time to move their business models to accommodate the new open access world – the Finch Report advocates increasing funding to publishers ‘during a transition period’ by establishing ‘publication funds within individual universities to meet the costs of APCs’. In addition, the report also explicitly deprecates the use of institutional repositories by effectively relegating them to only providing ‘effective routes to access for research publications including reports, working papers and other grey literature, as well as theses and dissertations’.

Peter Suber, a very balanced advocate for open access, has given a detailed critique of these recommendations – as well as enumerating several erroneous assumptions made by the group about open access journals and repositories (see issue #165 of the SPARC Open Access Newsletter, http://www.earlham.edu/~peters/fos/newsletter/09-02-12.htm). Let me highlight some key points that he makes – with which I am in entire agreement.

First and foremost, we should all applaud the group for its robust statement in favor of open access:

‘the principle that the results of research that has been publicly funded should be freely accessible in the public domain is a compelling one, and fundamentally unanswerable.’

Similarly, the Finch Committee are equally forthright about their intent to induce change in the scholarly publishing industry:

‘Our recommendations and the establishment of systematic and flexible arrangements for the payment of APCs will stimulate publishers to provide an open access option in more journals.’

Minister David Willetts endorsed this goal and told the Publishers Association that:

‘To try to preserve the old model is the wrong battle to fight.’

Let me be clear: these statements represent huge progress for the Open Access movement in the UK. The Government is to be commended on its stance on openness. Unfortunately, I feel that the Finch Committee missed an opportunity by not supporting mandated green open access repositories in addition to gold OA.

A major problem with the Finch and RCUK endorsements of gold OA as the preferred route to open access – and their explicit deprecation of green OA – is that the proposed interim settlement is unreasonably generous to the publishers at the expense of the UK Research Councils and HEFCE-funded UK universities. By giving publishers the choice of being paid for gold OA or offering an unpaid green OA option, it is clear that publishers will cancel their green option and opt to pick up more money by introducing a gold option. Their shareholders would demand no less. Even the majority of OA publishers who currently charge no APC fee – contrary to the assumptions of the Finch Group – will be motivated to pick up the money on the table. Similarly, publishers who now only offer Toll Access via subscriptions will be quite happy to pick up more money by offering a gold OA option in addition to their subscription charges.

As I made clear in Part 2 of this series of articles on open access, the serials crisis means that universities are already unable to afford the subscriptions to Toll-Access (TA) journals that the publishers are offering. To offer them more money to effect some change that they should have initiated over a decade ago seems to me to make no sense. Instead of making generous accommodations for the interests of publishers, the Finch Group should have looked at the problem purely from the point of view of what was in the public interest. Now that publishers receive articles in electronic form, and research papers can be disseminated via the Web at effectively zero cost, what have publishers done in the last fifteen years or more to adapt their business models to these new realities? The answer is that they have raised journal prices by far more than the rise in the cost of living. It is this rise in subscription costs that has resulted in subscription cancellations – not competition caused by the availability of articles in green open access repositories.

Despite green OA approaching the 100% level in Physics, both the American Physical Society and the Institute of Physics have said publicly that they have seen no cancellations they can attribute to arXiv and green OA. Similarly, the Nature Publishing Group has said openly that ‘author self-archiving [is] compatible with subscription business models’. The American Association for the Advancement of Science (AAAS) – who publish ‘Science’ – also ‘endorse the green-mandating NIH policy’. There is much concern in the Finch Report for Scholarly Society publishers. In fact a survey in December 2011 showed that 530 scholarly societies currently publish over 600 OA journals. While it is true that some societies use subscription prices to subsidize other member activities, this need not be the case. Now that we have the Web, the monopoly endowed by ownership of a printing press is gone forever. Just ask the music industry or the news media.

Let me give three anecdotal examples of the serials crisis:

  • In 2007 the University of Michigan’s libraries cancelled about 2,500 journal subscriptions because of budget cuts and the increasing costs of the subscriptions.
  • In 2008, Professor Stuart Shieber of Harvard explained ‘that cumulative price increases had forced the Harvard library to undertake “serious cancellation efforts” for budgetary reasons’.
  • In 2009 – 2011, the UC San Diego Libraries continued to cancel journal subscriptions because of budget cuts and increasing costs of subscriptions. Around 500 titles ($180,000 worth) were cancelled in FY 2009/10, and about the same number were projected to be cancelled in FY 2010/11. UC San Diego also closed many of its satellite libraries.


In fact, any research university library around the world will have a similar story to tell. When even such a relatively wealthy university as Harvard has problems with journal subscription increases surely it is time to take note!

The transitional period envisaged by Finch and RCUK is projected to cost the UK Research Councils and Universities a minimum of £37M over the next two years. This is money that will have to come out of hard-pressed Research Council budgets and already reduced university HEFCE funding. Instead of continuing to listen to the special pleading of publishers, what is needed now is some leadership from RCUK. They need to put in place a policy with some sensible provisions that do not unduly ‘feather-bed’ the publishers and that is also affordable by UK universities. Instead of being overly concerned with the risks of open access to commercial publishers, RCUK should remember its role as a champion of the public interest.

What should RCUK do now? In my opinion, RCUK could make a very small but significant change in its open access policy and adopt a rights-retention green OA mandate that requires ‘RCUK-funded authors to retain certain non-exclusive rights and use them to authorize green OA’. In the words of Peter Suber, this would ‘create a standing green option regardless of what publishers decide to offer on their own.’ In addition, RCUK should recommend that universities follow the Open Access policy guidelines of Harvard, set out by their Office of Scholarly Communication http://osc.hul.harvard.edu/authors/policy_guide. Under this policy, Harvard authors are required to deposit a full text version of their paper in DASH, the Harvard Open Access Repository even in the case where the publisher does not permit open access and the author has been unable to obtain a waiver from the publisher.

The scholarly publishers have had plenty of time to read the writing on the wall. For more than fifteen years they have shown themselves unwilling to adjust to the new reality. It seems manifestly unreasonable to give them significantly more money and more time to do what they should have been exploring fifteen years ago. By insisting on a green option RCUK will help generate the required and inevitable changes to the scholarly publishing business and get a fairer deal for both academia and the tax-paying public.

In this short overview I have omitted many subtleties and details – such as embargo times, ‘libre green’, CC-BY licenses and other flavors of green OA. Peter Suber’s SPARC Open Access Newsletter #165 and his book on Open Access (MIT Press Essential Knowledge Series, 2012) give a much more complete discussion with detailed references.

Also, in the interests of full disclosure, I should stress that I am not ‘anti-publisher’ and have been an editor for the Wiley journal, ‘Concurrency and Computation: Practice and Experience’ (CCP&E), for many years. In fact it is ironic that my University, Southampton, could not afford to subscribe to CCP&E even though it was essential reading for my research group of over 30 researchers. From this experience, and from my time as Dean of Engineering, I came to believe that the unsustainable, escalating costs of journal subscriptions together with the advent of the Web have irrevocably changed what we require from the scholarly publishing industry. And, after working with many different research disciplines during my time as the UK’s e-Science Director, and now at Microsoft Research, I have seen at first hand the inefficiencies of the present system and the large amount of unnecessary ‘re-inventing the wheel’ that goes on in the name of original research. Because of this I passionately believe that open access to full text research papers and to the research data can dramatically improve the efficiency of scientific research. And the world surely needs to solve some major health and environmental challenges!

To be continued …

Originally posted in 2013

A Journey to Open Access (3): Jim Gray and the Coming Revolution in Scholarly Communication

When I joined Microsoft in 2005 to create an ‘eScience’ research program with universities, Turing Award winner Jim Gray became a colleague as well as a friend. I had first met Jim in 2001 and spent the next four years having great debates about eScience. Roughly speaking, eScience is about using advanced computing technologies to assist scientists in dealing with an ever increasing deluge of scientific data. Although Jim was a pioneer of relational databases and transaction processing for the IT industry, he had recently started working with scientists to demonstrate the value of database technologies on their large datasets and to use them to ‘stress test’ Microsoft’s SQL Server product. With astronomer Alex Szalay from Johns Hopkins University, Jim and some of Alex’s students built one of the first Web Services for scientific data. The data was from the Sloan Digital Sky Survey (SDSS) – which is something like the astronomical equivalent of the human genome project. Although the tens of terabytes of SDSS data now seem a quite modest amount, the Sloan survey was the first high resolution survey of more than a quarter of the night sky. After the first phase of operation, the final SDSS dataset included 230 million celestial objects detected in 8,400 square degrees of imaging and spectra of 930,000 galaxies, 120,000 quasars, and 225,000 stars. Since there are only around 10,000 or so professional astronomers, publishing the data on the SkyServer web site (http://cas.sdss.org/dr7/en/) constituted a new model of scholarly communication – one in which the data is published before it has all been analyzed. The public availability of such a large amount of astronomy data led to one of the first really successful ‘citizen science’ projects. GalaxyZoo (http://www.galaxyzoo.org/) asked the general public for help in classifying a million galaxy images from the SDSS. More than 50 million classifications were received by the project during its first year, and more than 150,000 people participated. Jim’s SkyServer and the Sloan Digital Sky Survey pioneered not only open data and a new paradigm for publication but also a crowd-sourcing framework for genuine citizen science.

Jim also worked with David Lipman and colleagues at the National Center for Biotechnology Information, NCBI, a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). The NIH had established a policy on open access that required

‘all investigators funded by the NIH submit … to the National Library of Medicine’s PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication.’

The NIH’s PubMed Central deposit policy was initially voluntary, but was signed into law by George W. Bush in late 2007. The percentage compliance rate then improved dramatically and now the NIH have taken a further step of announcing that, sometime in 2013, they ‘will hold processing of non-competing continuation awards if publications arising from grant awards are not in compliance with the Public Access Policy.’

PubMed Central is a freely accessible database of full-text research papers in the biomedical and life sciences. The clear benefits of such an open access archive of peer-reviewed papers are summarized on the NIH website  http://publicaccess.nih.gov/FAQ.htm#753

‘Once posted to PubMed Central, results of NIH-funded research become more prominent, integrated and accessible, making it easier for all scientists to pursue NIH’s research priority areas competitively. PubMed Central materials are integrated with large NIH research data bases such as Genbank and PubChem, which helps accelerate scientific discovery. Clinicians, patients, educators, and students can better reap the benefits of papers arising from NIH funding by accessing them on PubMed Central at no charge. Finally, the Policy allows NIH to monitor, mine, and develop its portfolio of taxpayer funded research more effectively, and archive its results in perpetuity.’

Jim’s work with NCBI was to help them develop a ‘portable’ version of the repository software, pPMC, that could be deployed at sites in other countries. In the UK, the Wellcome Trust, a major funder of biomedical research, had adopted a similar open access policy to the NIH. With assistance from NCBI, Wellcome collaborated with the British Library and JISC to deploy the portable version of PubMed Central archive software. The UK PubMed Central repository was established in 2007. Just last year, this was enlarged and re-branded as Europe PubMed Central (http://europepmc.org/) since this service is now also supported by funding agencies in Italy and Austria and by the European Research Council. PMC Canada was launched in 2009.

NCBI were also responsible for developing two XML-based Document Type Definitions, or DTDs:

‘The Publishing DTD defines a common format for the creation of journal content in XML. The Archiving DTD also defines journal articles, but it has a more open structure; it is less strict about required elements and their order. The Archiving DTD defines a target content model for the conversion of any sensibly structured journal article and provides a common format in which publishers, aggregators, and archives can exchange journal content.’

These DTDs have now been adopted by NISO, the National Information Standards Organization, and form the basis for NISO’s Journal Article Tag Suite or JATS http://jats.nlm.nih.gov/index.html
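
For readers who have not seen this kind of markup, a small sketch may make the idea of a ‘common format’ more concrete. The fragment below is a heavily trimmed, made-up article that uses a handful of JATS element names (article, front, article-meta, title-group, article-title, contrib-group, contrib, name, surname, given-names); it would not validate against the full DTD, and the title and author are invented. The summarize helper is my own illustration of how any archive or aggregator could pull the same fields out of any publisher’s file once the tags are shared.

```python
import xml.etree.ElementTree as ET

# A trimmed, invented fragment using JATS element names; not a complete, valid JATS document.
SAMPLE_ARTICLE = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def summarize(xml_text):
    """Return the title and author names found in a JATS-style article (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    title = root.findtext("front/article-meta/title-group/article-title")
    authors = []
    for name in root.iterfind("front/article-meta/contrib-group/contrib/name"):
        given = name.findtext("given-names", default="")
        surname = name.findtext("surname", default="")
        authors.append(f"{given} {surname}".strip())
    return {"title": title, "authors": authors}


print(summarize(SAMPLE_ARTICLE))
```

Because the element names are fixed by the tag suite rather than by each publisher, the same few lines work on content from any source that follows it, which is exactly what makes exchange between publishers, aggregators and archives practical.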

As is now well-known, Jim Gray was lost at sea at the end of January 2007. A few weeks before this tragic event, Jim had given a talk to the National Research Council’s Computer Science and Telecommunications Board. With Gordon Bell’s encouragement, I and two colleagues edited a collection of articles about Jim’s vision of a ‘Fourth Paradigm’ of data-intensive scientific research (http://research.microsoft.com/en-us/collaboration/fourthparadigm/default.aspx). The collection also included a write-up of Jim’s last talk in which he talked about not one, but two revolutions in research. The first revolution was the Fourth Paradigm; the second was about what he called ‘The Coming Revolution in Scholarly Communication’. In this section, Jim talked about the pioneering efforts towards open access for NIH funded life sciences research with NCBI’s full-text repository PubMed Central. But he believed that the Internet could do much more than just make available the full text of research papers:

‘In principle, it can unify all the scientific data with all the literature to create a world in which the data and the literature interoperate with each other (Figure 3). You can be reading a paper by someone and then go off and look at their original data. You can even redo their analysis. Or you can be looking at some data and then go off and find out all the literature about this data. Such a capability will increase the “information velocity” of the sciences and will improve the scientific productivity of researchers. And I believe that this would be a very good development!’

I include his Figure 3 below:

After talking about open access and overlay journals, peer review and publishing data, Jim goes on to discuss the role that ontologies and semantics will play on the road from data to information to knowledge. As a specific example, he talks about Entrez, a wonderful cross-database search tool supported by the NCBI:

‘The best example of all of this is Entrez, the Life Sciences Search Engine, created by the National Center for Biotechnology Information for the NLM. Entrez allows searches across PubMed Central, which is the literature, but they also have phylogeny data, they have nucleotide sequences, they have protein sequences and their 3-D structures, and then they have GenBank. It is really a very impressive system. They have also built the PubChem database and a lot of other things. This is all an example of the data and the literature interoperating. You can be looking at an article, go to the gene data, follow the gene to the disease, go back to the literature, and so on. It is really quite stunning!’
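
To give a flavor of what ‘the data and the literature interoperating’ looks like to a program rather than a person, here is a small sketch that builds request URLs for NCBI’s E-utilities, the public programmatic interface to the Entrez databases. It only constructs the URLs and makes no network calls. The helper names, the search term and the record identifier are my own hypothetical choices, and the sketch assumes that the documented ESearch and ELink endpoints and their db, term, dbfrom and id parameters are still as NCBI describes them.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities, as given in NCBI's public documentation.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL: find record IDs in one Entrez database matching a query."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def elink_url(dbfrom, db, uid):
    """Build an ELink URL: follow links from a record in one database to records in another."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


# Search the full-text literature in PubMed Central (the query is illustrative).
print(esearch_url("pmc", "open access"))

# Hop from a gene record to the linked literature; "12345" is a placeholder ID, not a real lookup.
print(elink_url("gene", "pubmed", "12345"))
```

The second call is the programmatic version of the hop Jim describes: start from a data record and ask for the literature linked to it, or go the other way round by swapping the two database names.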

This was Jim’s vision for the future of scientific research – an open access world of full text publications and data, a global digital library that can truly accelerate the progress of science. Of course, the databases at NCBI are all carefully curated and marked up using the NLM DTDs. Outside NCBI’s walled garden, in the wild world, we have a plethora of different archives, repositories and databases – and replicating the success of a federated search tool like Entrez will be difficult. Yet this is the vision that inspires me. And it is this vision that leads me to support the open access movement for more than just the blunt economic facts that the university library system can no longer afford what publishers are offering.

To be continued …

Originally posted in 2013


A Journey to Open Access (2): University Research Management and Institutional Repositories

University Deans are required to do many things for their university, including taking some responsibility for the research output of their Faculty. Each year, capturing all forms of research deliverables – journal papers, technical reports, conference and workshop proceedings, presentations and Doctorate and Masters theses – is a necessary and important chore. This is especially important in the UK – where the research funds allocated to each department by the Government are explicitly linked to the quality of its research over a four or five year period.

First as Chair of the Electronics and Computer Science Department, and then as Dean of Engineering at the University of Southampton, I was responsible for two of these ‘Research Assessment’ cycles in the UK. It was during the preparation of these research returns that I encountered an interesting problem: the University library could no longer afford to subscribe to all the journals in which our 200 engineering faculty members – plus a similar number of postdocs and graduate students – chose to publish. This meant that just assembling the published copies of all the publications of all research staff and students became a much less straightforward exercise. The reason for this problem is well-known to librarians – it is the so-called ‘serials crisis’. This crisis is dramatically illustrated below in a graph that shows the relative growth of serial expenditures at ARL Libraries versus the consumer price index over the past twenty-five years.

These are typical expenditure curves for all university libraries – and the University of Southampton was no exception. It is for this reason that the University Library sends out a questionnaire each year asking staff which journals they would least mind cancelling! Yet the serials crisis is a curious sort of crisis in that most research staff are simply unaware of any problem. They feel free to publish in whatever journal is most appropriate for their research and see no reason to restrict their choice to the journals that the University can afford to subscribe to.

The Research Assessment exercise in the UK is intended to measure ‘research impact’ and this is judged in a number of ways. One form of research impact that can easily be measured is the number of citations by other researchers to each paper. In order to garner citations, a research paper needs to be accessible and read by other researchers. Not all researchers – and certainly not the general public whose taxes have usually helped fund the research – have access to all research journals. Physicists have solved this accessibility problem by setting up arXiv – a repository for un-refereed, pre-publication ePrints. The US National Library of Medicine has solved the accessibility problem in a different fashion. The full text of all research papers produced from research funded by the National Institutes of Health is required to be deposited in the PubMed Central (PMC) repository after publication in a journal, usually after some ‘reasonable’ embargo period of 6 to 12 months. Similar open access policies have now been adopted by other funders of biomedical research such as the Wellcome Trust and the Bill and Melinda Gates Foundation.

The repositories PMC and arXiv are examples of subject-specific, centralized research repositories. However, it is my firm belief that each research university needs to establish and maintain its own open access ‘institutional repository’ covering all the fields of research pursued by the university. At Southampton, in the Electronics and Computer Science Department, with colleagues Les Carr, Wendy Hall and Stevan Harnad, we established a Departmental Repository to capture the full text versions of all the research output of the Department to assist us in monitoring and assessing our research impact. A graduate student in the Department, Rob Tansley, worked with Les Carr and Stevan Harnad to develop, in 2000, the EPrints open source repository software. Robert went on to work for Hewlett-Packard Laboratories in the US and wrote the DSpace Repository software in collaboration with MIT.  The EPrints and DSpace repository software are now used by many hundreds of universities around the world. For a list of repositories and software see:  http://roar.eprints.org/

As Dean of Engineering, I tried to use the example of the EPrints repository in Electronics and Computer Science as a model for the entire Engineering Faculty. By the time I left Southampton, this had only partially been implemented, but I was enormously pleased to see that by 2006 the University had mandated that all research papers from all departments must be deposited in the ‘ePrints Soton’ repository. In 2008, this was extended to include PhD and MPhil theses. For more details of Southampton’s research repository, well managed by the University Library, see: http://www.southampton.ac.uk/library/research/eprints/

There is much more that can be said about this ‘Green’ route to Open Access via deposit of full text of research papers in Institutional Repositories. For a balanced account, I recommend Peter Suber’s recent book on ‘Open Access’ published by MIT Press, to be available under Open Access 12 months after publication. Peter describes the different varieties of Open Access – such as green/gold, gratis/libre – and also issues of assigning ‘permission to publish’ to publishers versus assigning copyright (https://mitpress.mit.edu/books/open-access). In addition, the Open Archives Initiative supports two community-supported repository standards: OAI-PMH for metadata and OAI-ORE for aggregating resources from different sites into compound digital objects (http://www.openarchives.org/). Also relevant is the Confederation of Open Access Repositories or COAR whose website states:

COAR, the Confederation of Open Access Repositories, is a young, fast growing association of repository initiatives launched in October 2009, uniting and representing 90 institutions worldwide (Europe, Latin America, Asia, and North America). Its mission is to enhance greater visibility and application of research outputs through global networks of Open Access digital repositories.
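
Returning to the OAI-PMH standard mentioned above: it is what lets a ‘global network’ of repositories be harvested by a single service, so a short sketch may be useful. The protocol is a set of HTTP requests distinguished by a verb parameter, with Dublin Core (the oai_dc metadata prefix) as the format every repository must support. In the sketch below the repository address, the record and its title are invented, the sample response is trimmed (a real one also carries responseDate and request elements), and the two helper functions are my own illustration rather than part of any repository package.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request for a repository's OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


# Trimmed, invented response in the shape a repository would return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Deposit</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvested_titles(xml_text):
    """Extract the Dublin Core titles from a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//oai:record//dc:title", NS)]


# repository.example.org is a placeholder address, not a real repository.
print(list_records_url("https://repository.example.org/oai"))
print(harvested_titles(SAMPLE_RESPONSE))
```

Since EPrints and DSpace both expose this interface, the same harvester can in principle collect metadata from hundreds of institutional repositories without any repository-specific code.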

Why is all this important? It is important because the present scholarly communication model is no longer viable. While many journal publishers perform a valuable service in arranging peer review and in publishing high quality paper and online journals, the unfortunate truth is that universities can no longer afford the costs of the publishers’ present offerings. For example, it was not possible for me as Dean to establish a new research area in the Faculty and have the library purchase the relevant new journals. In such an unsustainable situation, it is obvious that we need to arrive at a more affordable scholarly publishing model. However, instead of just waiting for such a model to magically emerge, university librarians need to be proactive and take up their key role as the guardians of the intellectual output of their university researchers. It is the university library that has both the resources and the expertise to maintain the university’s institutional research repository. And this is not just an academic exercise. Managing the university’s research repository will surely become a major part of the university’s ‘reputation management’ strategy. Studies of arXiv have shown there to be a significant citation advantage for papers first posted in arXiv, and subsequently published in journals, compared to papers just published in journals (arXiv:0906.5418). Similarly, it is likely that versions of research papers that are made freely available through an institutional repository will also acquire a citation advantage – although this conclusion is currently controversial. Nevertheless, like it or not, universities will increasingly be evaluated and ranked on the published information they make available on the Web. 
For example, the Webometrics Ranking of World Universities takes account of the ‘visibility and impact’ of web publications and includes both an ‘openness’ and an ‘excellence’ measure for research repositories and citations (http://www.webometrics.info/). I am pleased to see that Southampton features in 32nd place in Europe and 119th in their World rankings :-)

To be continued …

Originally posted in 2013