I've never had an idea that couldn't be improved by sharing it with as many people as possible -- and I don't think anyone else has, either. That's why I have become interested in the various "Open" movements making increasing inroads into the practice of modern science. Here I will try to give a brief introduction to Open Access to research literature; in the second instalment I will look at ways in which the same concept of "openness" is being extended to encompass data as well as publications, and beyond that, what a fully Open practice of science might look like.
The original paradigm: Open Source
Although the underlying concept of information as a public good goes back at least to the invention of the printing press and the end of the aristocratic/theocratic duopoly on literacy, programmers were the first people I know of to popularize this sort of "openness" in an academic setting. Richard Stallman started the GNU Project in 1983/4 as a reaction against the rising influence of proprietary software, and a year or so later founded the Free Software Foundation, which "is dedicated to promoting computer users' rights to use, study, copy, modify, and redistribute computer programs." What Stallman and the FSF mean by "free software" is famously summed up by the dictum, "free as in speech, not free as in beer"; more precisely, they mean "free" as in:
- The freedom to run the program, for any purpose
- The freedom to study how the program works, and adapt it to your needs
- The freedom to redistribute copies
- The freedom to improve the program and release your improvements to the public
Access to the source code is a precondition for these freedoms, and many advocates prefer that the "four fundamental freedoms" also be combined with some form of copyleft (basically a licence which explicitly disallows use of the original resource in any way that restricts the four freedoms for anyone else). About a decade later the Open Source Initiative appeared, offering itself as a "more pragmatic" approach to free software. The two definitions are pretty similar, though the OSI version allows some licensing that the FSF considers too restrictive of end users. Today, both the FSF and the OSI are powerhouse advocates for non-proprietary software, code that you can get your hands on and hack to your heart's content. There is a wealth of free software freely available for scientific purposes: for instance, the OpenScience Project maintains a list, as do (inter many alia) the NCEAS, the CBS and Indiana University. The NIH and EBI both maintain extensive services, there's an entire Linux distribution for science, SourceForge lists over 350 projects under "scientific", and a simple Google search finds dozens of free applications for molecular biology.
Open Access
By analogy with Open Source, Open Access to the research literature entails the freedom to read, use and redistribute the published results of scholarly research and derivative works based on those publications. What follows is a version of Peter Suber's very brief introduction to OA; for more details, see his full Open Access Overview and Timeline of the OA Movement. The bottom line is this:
Open-access (OA) literature is digital, online, free of charge, and free of most copyright and licensing restrictions. What makes it possible is the internet and the consent of the author or copyright-holder.
Most scholarly journals do not pay authors, who therefore do not lose revenue by publishing under OA conditions. Thus the controversies about OA to music and film (was Napster "piracy"? did it cost any actual musicians any money?) do not apply to the scholarly literature, the authors of which are clearly better off if access to their work is not restricted. Online publishing is much less expensive than its print-only ancestor, but it is not free; the big question of OA is how to pay the bills that do remain without charging access fees. Nearly all current OA models reduce to one of two basic blueprints: OA archives/repositories, and OA journals.
OA archives or repositories simply make their contents freely available to the world. They may contain preprints (the author's version prior to peer review), refereed postprints, or both. Archiving preprints does not require any form of permission, and a majority of journals already permit authors to archive their postprints. Archives which comply with the metadata harvesting protocol of the Open Archives Initiative are interoperative and can be searched as though they comprised a single (enormous, virtual) database, using high-level services such as OAIster. There are a number of open-source software packages available for building and maintaining OAI-compliant archives; Peter Suber maintains a list of lists of such archives, and SHERPA maintains a database of journal policies regarding pre/post-print archiving. Archives cost very little to set up and maintain, and increasing numbers of universities and research institutions are building their own. PubMed Central, maintained by the NIH, is probably the largest and best-known in biomedical science. ArXiv, run by Cornell University, is the principal means of transfer of research results for many (if not most) mathematicians and physicists. Stevan Harnad, a leading advocate of self-archiving, maintains a comprehensive self-archiving FAQ file.
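To make the harvesting idea a little more concrete, here is a minimal sketch of how a harvester might build a request to an OAI-compliant archive. The verb `ListRecords` and the `metadataPrefix`, `from` and `set` arguments come from the OAI metadata harvesting protocol; the base URL and the function name `list_records_url` are hypothetical and used only for illustration. The sketch builds the URL and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; each real archive
# publishes its own OAI base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI ListRecords request URL (no network access)."""
    # 'verb' and 'metadataPrefix' are required for a ListRecords request;
    # 'oai_dc' (unqualified Dublin Core) is the format every archive must support.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date   # optional: only records changed since this date
    if set_spec is not None:
        params["set"] = set_spec     # optional: restrict to one set within the archive
    return base_url + "?" + urlencode(params)


print(list_records_url(BASE_URL, from_date="2006-01-01"))
```

Because every compliant archive answers the same kind of request with the same kind of XML, a service can loop over many base URLs and merge the results, which is what makes the separate archives searchable as one virtual database.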
OA journals are in most respects the same sorts of entities as traditional paid-access journals, but without the access fees. They perform peer review, and make the refereed articles available free to all comers. They pay the bills in a number of different ways. About half charge author-side fees, though who actually pays these is widely variable (author, author's institution, funding body, etc.). Publishing in an OA journal is obviously 100% compatible with self-archiving. The DOAJ currently lists nearly 2500 peer-reviewed OA journals, of which more than 700 are searchable at the article level; for larger lists of OA journals which may or may not be peer-reviewed, see JournalSeek or Yahoo's Free Full Text. Three of the most prominent OA journal publishers are the Public Library of Science, Hindawi Publishing and BioMed Central, and a number of traditional publishing companies now offer OA options.
A personal example
I have yet to publish any data here in the US, but I published a dozen or so articles while I was at the University of Queensland. More than half of these are not freely available from the journals in which they were published (J Clin Virol, Virology, Biochim Biophys Acta, Mol Biochem Parasitol, Acta Tropica -- all Elsevier journals, pfui! -- and Rev Med Virol from Wiley InterScience). I couldn't find any full-text copies online using Google Scholar or PubMed, either. You cannot read these seven papers of mine without paying a fee (usually around $30) or physically going to a library which carries (and has therefore paid for) the journal and issue in question. Neither can my professional colleagues, unless their institution happens to subscribe to the journal or some package which includes it; these subscription fees are commonly extortionate (Elsevier being a particularly egregious offender).
For you as a taxpayer, this means that you are denied access to information you've already paid for (since I've always been funded by government grants). For me as a scientist, it means that more than half of my life's work to date is, while not useless, certainly of much less use to the world than it might be. Given that a large part of why I do what I do is that I want to leave the world a better place than I found it, that is simply not acceptable to me. Fortunately, according to RoMEO, all of the journals concerned allow postprint archiving by authors, so I might be able to rescue it. Searching for "queensland" in DOAR (one of a number of such directories) leads me to ePrints UQ, so there is a relevant archive for me to use, but there's a catch: you have to be a current UQ staff member to deposit. I can (and will) talk to David Harrich, my boss at the time, about archiving all of our HIV papers, since Dave is still at UQ. My schistosomiasis papers, though, have no one on the author lists who could deposit them, so I'll have to contact the staff at ePrints UQ and see whether there's a way for ex-staff to deposit articles. If there isn't, I'll have to either find another repository that will take the articles, or make one of my own. Since my current employers don't have an institutional repository, I'm going to have to make that choice anyway for upcoming papers. Both arXiv and Cogprints will take biology papers, although mine don't seem to fit into any of their categories, and Peter Suber has mentioned building a Universal Repository in collaboration with the Internet Archive, but I'm not sure if anything has come of that endeavour. That leaves me with the option of building my own archive, for the purposes of which there are numerous open-source software packages available. 
Alternatively, at least as a first step, I could simply upload the papers to my own webspace somewhere and try to make sure the Internet Archive and Google Scholar know about them, so that they would be available though not interoperable with other repositories. Finally, there's one last catch: Elsevier won't let me use their pdf versions, and I don't have the original files in most instances. So whatever I do, I'm going to have to track down the published versions and then reverse-engineer an "unofficial" version.
Why would I go to all this trouble? Because OA offers significant benefits and advantages to a variety of stakeholders:
Benefits of Open Access
1. Maximal research efficiency. The usual version of Linus' Law says that given enough eyeballs, all bugs are shallow -- meaning that with enough people co-operating on a development process, nearly every problem will be rapidly discovered and solved. The same is clearly true of complex research problems, and OA provides a powerful framework for co-operation. For instance, Brody et al. showed that, for articles in the high-energy physics section of arXiv (one of the oldest archives available for such study), the time between deposit and citation has been decreasing steadily since 1991, and dropped by about half between 1999 and 2003. Alma Swan explains: "the research cycle in high energy physics is approaching maximum efficiency as a result of the early and free availability of articles that scientists in the field can use and build upon rapidly".
Moreover, the machine readability of a properly formatted body of open access literature opens up immense new possibilities. Paul Ginsparg, founder of arXiv, observes:
True open access permits any third party to aggregate and data mine the articles, themselves treated as computable objects, linkable and interoperable with associated databases. We are still just scratching the surface of what can be done with large and comprehensive full-text aggregations.
...exciting new developments in text-mining and data-mining are beginning to show what can be done to create new, meaningful scientific information from existing, dispersed information using computer technologies. Research articles and accompanying data files can be searched, indexed and mined using semantic technologies to put together pieces of hitherto unrelated information that will further science and scholarship in ways that we have yet to begin imagining. These technologies are just in their infancy at the moment. Real scientific advances will be made using them but the technologies can only be applied effectively to the open access corpus: literature and data hidden behind journal or databank access restrictions are invisible to the computer tools that can do this work...
Examples of such precocious infants include cheminformatics.org and the family of utilities and tools available through the NIH/NLM's PubMed interface.
2. Maximal return on public investment. Just as OA is, at least for now, primarily (though not exclusively) aimed at literature for which the authors are not paid any kind of royalty, so one obvious focus of attention is government-funded research. Why should taxpayers pay twice, once to support the research and then again when the scientists they are funding need access to the literature? More importantly, open access to a body of knowledge makes that knowledge more available and useful to researchers, physicians, manufacturers, inventors and others who make of it the various socially desirable outcomes, such as advances in health care, that government funding of research is intended to produce. Peter Suber has gone over this intuitive position in some detail here.
3. Advantages for authors. There are well over 20,000 scholarly journals, and even the best-funded libraries can afford subscriptions to only a fraction of them. OA offers authors a virtually unlimited, worldwide audience: the only access barrier is internet access (which is, of course, cheaper to provide in poorer nations than comprehensive libraries of print journals would be!). There is a large and steadily growing body of evidence showing that OA measurably increases citation indices (that is, the number of times other papers refer to a given article). For instance, of the papers published in the Astrophysical Journal in 2003, 75% are also available in the OA arXiv database; the latter papers account for 90% of the citations to any 2003 Astrophysical Journal article, a 250% citation advantage for OA. Repeating the exercise with other journals returns similar results.
Not only is this of vital importance to academics when it comes to applying for funding or competing for tenure, it's more or less the whole damn point of publishing research in the first place: so that other people can read and use it!
4. Advantages for publishers. The benefits that accrue to authors of OA works also work to the advantage of publishers: more widely read, used and cited articles translate to more submissions and a wider audience for advertising, paid editorials and other value-add schemes.
5. Advantages for administrators. One of the best available proxy measures for research impact is citation counting: how many times has a given paper been cited by other researchers in their published work? This idea led to the development of the impact factor, a measure of a particular journal's importance within its own field. These sorts of bibliometric indicators are relied upon heavily by science administrators making decisions about funding, by faculties making decisions about tenure cases, and so on. Open access, by removing the subscription barriers that splinter the research literature into inaccessible proprietary islands, raises the possibility of vast improvements in our ability to measure and manage scientific productivity.
6. Scalability. Peter Suber has pointed out that, because it reduces production, distribution, storage and access costs so dramatically, OA "accommodates growth on a gigantic scale and, best of all, supports more effective tools for searching, sorting, indexing, filtering, mining, and alerting --the tools for coping with information overload." Online distribution is necessary but not sufficient for scalability, because subscribers to paid-access journals do not have unlimited budgets even if they are enormous institutional libraries. For end users to keep pace with the explosive growth of available information, the cost of access has to be kept down to the cost of getting online.
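The citation figures quoted under point 3 can be sanity-checked with a little arithmetic. This is a rough sketch, assuming only the two rounded percentages given there (75% of papers in arXiv, 90% of citations going to those papers); the function name `citation_ratio` is made up for illustration. It compares citations per OA paper with citations per non-OA paper.

```python
from fractions import Fraction


def citation_ratio(oa_share_of_papers, oa_share_of_citations):
    """Citations per OA paper divided by citations per non-OA paper."""
    # Each rate is "share of all citations" per "share of all papers".
    oa_rate = oa_share_of_citations / oa_share_of_papers
    non_oa_rate = (1 - oa_share_of_citations) / (1 - oa_share_of_papers)
    return oa_rate / non_oa_rate


# Rounded figures quoted in point 3; exact fractions avoid float noise.
ratio = citation_ratio(Fraction(75, 100), Fraction(90, 100))
print(ratio)              # 3
print((ratio - 1) * 100)  # 200
```

On these rounded inputs, an OA paper collects three times as many citations as a non-OA paper, which is 200% more. The 250% figure quoted in point 3 presumably rests on unrounded data or a slightly different measure; that cannot be checked from the two percentages alone.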
Tune in Next Time
In the second instalment, I will look at open access to raw experimental data, cooperation over competition as a research model and the ever-expanding role of the Web in science. In the meantime, if this has piqued anyone's interest in OA (and I hope it has!), here are my Simpy collections of open access and open science links.
One Last Thing
This is an immense topic, and anyone who knows anything much about it will certainly see things I've missed or got wrong. That's what the comments are for! Blogs are conversation tools, and I'd appreciate your feedback.
Update: part 2 is here, part 3 is here.
This work is licensed under a Creative Commons Attribution 3.0 License.
Bill, you've done an excellent job of explaining something important that I knew very little about. Thanks, and welcome to 3QD!
Posted by: Abbas Raza | Monday, October 30, 2006 at 03:09 AM
*whew*
Thanks, Abbas. There was some ugliness for a while, because I used Google Docs to write the thing, and some of the resulting html was a bit funky. But I think it's OK now.
Posted by: Bill | Monday, October 30, 2006 at 03:33 AM
Bill, regarding the need to finance online peer vetting and publication, I refer you to this article about "penny per page" revenue generation:
http://computer.howstuffworks.com/penny-per-page.htm
I think there will always be a need for formal, established journal sites rather than just sticking scientific papers up at unofficial sites. Firstly, peer review and vetting offers protection from fraud and made-up data, and secondly, individual researchers want and deserve to be recognized, rather than have their work "data mined" and cannibalised without acknowledgement.
Posted by: aguy109 | Monday, October 30, 2006 at 06:32 AM
aguy said "have their work "data mined" and cannibalised without acknowledgement."
I think arXiv deals with this danger brilliantly. It puts an indelible time-stamp on EVERY version of the manuscript you submit. One can submit newer versions forever. By the same token, older versions cannot be removed; they remain accessible to the public forever.
Posted by: rrtucci | Monday, October 30, 2006 at 09:34 AM
Now this doesn't work for every single article but if you have this plug-in called a netpass then you can get free access to lots of subscription or pay to view articles. It's free so whatever.....It's at:
http://www.congoo.com/netpass/install
Posted by: Kate | Monday, October 30, 2006 at 04:04 PM
great read, kudos!
Posted by: Johan | Monday, October 30, 2006 at 05:21 PM
Thank you Bill Hooker for a most enlightening article. The devil is always in the details. True Open Access must also accompany true freedom of speech without any restrictions by prejudice by any source, including, but not limited to, publisher, name(s) of publisher, referees, usually unnamed (anonymous). It seems to me what you are discussing provides a golden opportunity, at long last, to require any "peer reviewed" articles to have the true names, addresses and telephone numbers of the "reviewers" posted at the article itself, along with all their critical comments. Ditto for "non accepted" or "non published" or "stepchild articles", etc. But in the end, nothing will work any differently than the present system if those humans operating it only still follow the old prejudiced ways of exclusion and subjectivity as opposed to a goal of objectivity. For example your "bug paradigm" has totally failed in cancer research for almost a half century. In the end, "you can still only take a horse to water but you can't make it drink that water". For example, if you make Dr. Otto Warburg's 500+ scientific papers available to the Cancer Generals of the United States through all this, that still won't guarantee the Cancer Generals actually read and consider the content of those scientific papers since they have had access to them for decades since they were published in Germany anyway, but they and the medical orthodoxy, for the most part, have been obstructing and disregarding their scientific content from the laboratory because of prejudice and corruption and conflicts of interest.
Posted by: Winfield J. Abbe | Monday, October 30, 2006 at 05:48 PM
Thanks for this post, Bill!
If anyone would like to take advantage of a free, open source solution to create your own journals - look up Open Journal Systems, at: http://pkp.sfu.ca/?q=ojs
This is part of the Public Knowledge Project. Disclosure: I am on the PKP Scholarly Communications Conference Planning Committee - for details, see http://ocs.sfu.ca/pkp2007/
Posted by: Heather Morrison | Tuesday, October 31, 2006 at 12:04 AM
I'm a UQ staff member. Email me the details when I get back from my conference on or after the 12th of November, and I'll put it up for you.
Posted by: John Wilkins | Tuesday, October 31, 2006 at 03:56 AM
My comment is rather long, so I've posted it here:
http://theparachute.blogspot.com/2006/10/nephelokykkygia.html
Posted by: Jan Velterop | Tuesday, October 31, 2006 at 10:45 AM
John: sweet, thanks! It will take me a while, because as I said I'll have to reverse-engineer non-prop. versions -- but I'll definitely take you up on that.
Everyone else -- I'm at work, but I'll have proper replies this evening.
Posted by: Bill Hooker | Tuesday, October 31, 2006 at 11:42 AM
I think you're on the right track. Thanks for letting us in on the personal side of this pervasive issue. It's no better in the humanities, either, so I feel your pain (in my wallet especially).
Posted by: David | Tuesday, October 31, 2006 at 03:00 PM
aguy,
Peer review _does not_ prevent fraudulent data from being published unless data is fraudulent in a sloppy way. I have some more comments about this here:
http://biocurious.com/peer-review-and-scientific-publishing
Also, Open Access certainly doesn't imply that papers are simply placed on "unofficial sites." The arXiv that rrtucci refers to is far from unofficial and open access journals usually operate in the same way as closed journals in terms of peer review. An interesting exception is PLOS One which is experimenting with open peer review. You can read more about that here:
http://www.plosone.org/
Posted by: Andre | Tuesday, October 31, 2006 at 07:55 PM
Thank you so much for your thoughts. I will share this article with my research methods class this week along with a discussion about different open access resources.
My own little remarks on the issue:
http://rowboat.smallsclone.com/archive/2006/10/Open_Access.shtml
Posted by: Nathan | Wednesday, November 01, 2006 at 02:12 PM
aguy: I'm a fan of micropayments, but they seem to have gone the way of the Dodo (I suspect the problem was that it's so hard to make them really convenient for consumers). Even so, I don't think they'd work for scholarly journals, since most people still browse such journals in the dead-tree version, and when accessing them online only want to grab a pdf or two. Rrtucci and Andre (do read his post, and the Ginsparg essay he points to) have the peer review idea covered. Let me just add that OA, by making the research cycle faster and authors' work more widely available, actually improves each author's chances of recognition.
Winfield J Abbe -- again, you're talking about peer review, which is entirely separate from OA. There are, as Andre points out, numerous experiments with peer review underway. Another interesting one is Biology Direct.
Kate: assuming you're not a spambot, netpass doesn't work for scholarly journals. (There are subscription packages that work sorta like that, but hoo boy are they spendy.)
David: you might be interested in Peter Suber's essay on why OA is moving slowly in the humanities, and how to speed it up.
Heather: it doesn't have to be a science journal, right? I don't want to run a science journal, but I have an odd little idea for a poetry journal...
Jan: I'll comment on your post, if a certain Prof Harnad doesn't beat me to it. :-)
Posted by: Bill | Wednesday, November 01, 2006 at 11:58 PM
Andre and Bill, thanks for detailed answers to my comment, that's what I call service! :-)
I only ever had the one paper published (virology), so I'm not in your league. What you said about dead trees was no joke: when I was at uni in the 70s there were these vast indices taking up miles of shelf space, a sort of 5-ton paper search engine in tiny print which you had to search through to get to an abstract, even before you discovered that the journal you needed wasn't in the library...
Posted by: aguy109 | Thursday, November 02, 2006 at 02:34 AM
Bill - this was really interesting. I've been trying to keep up with the whole 'open' movement, but haven't done such a great job. I'll pass the post on to my lab, I'm curious what they'll have to say about it.
Posted by: Pam | Saturday, November 04, 2006 at 03:23 PM
Pam, that's great. If they get interested, Peter Suber's Open Access News is the place to point them. I didn't make that clear enough in the original post. Not only is OAN the best news source, the sidebar links to Peter Suber's writing on OA are the best reference collection available on the topic.
Posted by: Bill | Saturday, November 04, 2006 at 07:07 PM
A really excellent post on a subject that's interested me for a long time. I'm still exploring all your links on this one, and I'm looking forward to reading part two.
Posted by: Chandra Clarke | Tuesday, December 12, 2006 at 11:40 PM
Really nice article. I also agree with you completely that knowledge should be free and readily available for all those who need it. I am a student in the field of IT and I strongly believe in Open source software and applications.
Posted by: Ebenezer laryea | Wednesday, December 19, 2007 at 08:18 AM