"Houston, we have a problem. We're standing on the shoulders of old scholars, but it feels a bit shaky."
Well, no wonder. While rocket science has clear foundations, the physical laws of nature, for many other research fields it's trickier. We rely on hundreds of years of knowledge and assume (not trust) that work to be true. And that knowledge is seemingly disappearing very fast (remember my graveyard of chemical literature observation). Published literature, generally, is too hard to reproduce to be seen as an accurate capture of research history. In other words, these shoulders are 200 years old, and our support is failing.
Open Science attempts to overcome these issues. It defines an environment where all research output is important, where everyone has access to shoulders, and trust can be replaced by reproducibility. This is a huge transition, ongoing for some 20 years now.
But I am happy to work with Rajarshi, Nina, Matthew, and Samuel to support the Open Science community in chemistry, for example, by allowing publications that describe a piece of open source cheminformatics software (Software article type). We're limited by what BioMedCentral can offer us, but within that context we try to make a change.
The journal has now existed for 10 years, as marked by our latest editorial. In it, we describe our adoption of GitHub as a free, extra service, where we fork source code published in our journal, and announce our adoption of the obligatory ORCID for all authors.
These things bring me back to those shoulders. The full adoption of the ORCID allows research to be more easily found (more FAIR) and the copying of the source code aims at making the shoulders on which future cheminformatics stands more solid. Minor steps. But even minor steps matter.
Let's see where our journal takes open science cheminformatics.
Oh, and since you are reading this, I would love to see the American Chemical Society be more open to Open Science too. Please join me in requesting them to join the Initiative for Open Citations.
Because more and more of the cheminformatics I do is with Bioclipse scripts (see doi:10.1186/1471-2105-10-397), and because Bioclipse is currently unmaintained and has become hard to install, I decided to take the plunge and rewrite some stuff so that I could run the scripts from the command line. I wrote up the first release back in April.
Today, I release Bacting 0.0.5 (doi:10.5281/zenodo.3252486) which is the first release you can download from one of the main Maven repositories. I'm still far from a Maven or Grapes expert, but at least you can use Bacting now like this without actually having to download and compile the source code locally first:
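A minimal sketch of what such a script could look like. The Maven coordinates (group io.github.egonw.bacting, module managers-cdk) are an assumption on my side and should be checked against the Bacting README. The CDKManager constructor and the fromSMILES call are the same ones used in the first Bacting write-up:

```groovy
// Assumption: Bacting's CDK manager is published on Maven Central under these
// coordinates; the version is the 0.0.5 release announced above.
@Grab(group='io.github.egonw.bacting', module='managers-cdk', version='0.0.5')

// The workspace root replaces the Bioclipse workspace; "." is the current folder.
workspaceRoot = "."
cdk = new net.bioclipse.managers.CDKManager(workspaceRoot)

// Parse a SMILES string (dimethyl ether) and print the resulting molecule object.
println cdk.fromSMILES("COC")
```

Saved as, say, test.groovy (a hypothetical file name), this runs with "groovy test.groovy"; the first run needs network access so that Grape can fetch the dependencies.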
Plan S has caused quite some discussion about what knowledge dissemination is. When it was announced, I was hesitant. But very quickly the opposition to Plan S convinced me that apparently something like Plan S is needed. I think Plan S focuses way too much on journal-channeled publishing, whereas I would rather have seen it focus on Open Science (it partly does). We argued as much to cOAlition S recently (doi:10.5281/zenodo.2560200):
The risks brought forward by Plan S opponents are real. I don't always agree with the arguments, or simply just don't understand them. With some I agree, but disagree on the alternative. This has been a difficult position to follow, as some discussions taught me. For example, some claimed that I am in favor of article processing costs. Only in a toxic, black-and-white world does not being against them equal being in favor of them.
Journal articles have proven to be an expensive form of knowledge dissemination. It was the right solution, certainly 200 years ago. The cost has to be paid by someone. Via subscription (the "old" model), via package deals with nations, universities, etc (upcoming), via a friendly funder (some wealthy foundation), or via the authors. Not accepting that publishing costs money is utopian, if you ask me.
However, what is essential, and what too few people talk about, is the open license of the research output. If we cannot share research output without paying again and again (instead of once), we inhibit innovation. If I cannot share literature with students, I cannot properly train them for their job.
So, it feels kinda awkward that I am considered to be doing something wrong, if I ensure my work is available under a CC-BY license. Check my fail rate at ImpactStory (e.g. a series of poster abstracts in Tox Sci).
Anyway, there are two topics I want to clarify. First, APC should be as low as possible. That means the infrastructure should be efficient, reducing the amount of work. Open infrastructures likely have an important role here. Why do we not have open source article submission platforms? Why don't we have open standard XML formats with matching editors so that we can submit articles in that format, rather than LaTeX or Word? Etc.
Every cent I spend on APC, I cannot spend on other research tasks. One obvious answer then, IMHO, is to return to publishing less in journals, and sharing more via other, better channels, such as open databases. I find it hard to reconcile complaining about the cost of publishing with insisting on expensive business models.
I did not always pay this. There are reductions, sometimes a co-author pays, etc. But I have no problem paying for services rendered. And when I paid, it was always part of my job, and my employer (or project) paid. Now, there are rumors that scholars sometimes have to pay on their own account, as if it is a representation cost. I'm appalled by this. I think the employers are bullying their scholars in an unacceptable way. There was a lot of discussion about academic freedom, but your employer forcing you this way into publishing in certain journals sounds like a violation of exactly that. We can discuss who is responsible for this: the funder or the employer. I know my answer.
Scholarly societies
Two other aspects in the discussions are "what about poor countries" and "what about scholarly societies". I like to combine these. I welcome scholarly societies to pick up knowledge dissemination, in an open science way. I wish all scholarly societies would do that. But I am not sure why that necessarily has to be coupled to sponsoring society activities. That particularly feels awkward given that we tend to have national societies. Why?
Why should an African scholar have to fund educational activities held in the United States or Europe via publishing in their journals? What is wrong with me paying a scholarly society APC so that everyone in the world can read my literature? What is wrong with wanting them to have access to all literature?
What is wrong with me wanting to be able to read all literature? Despite The Netherlands not being a poor country, Maastricht University is far from a rich university, and I regularly run into paywalls myself.
Yes, asking the Global South, or anyone (like a small SME), to pay 5000 euro is a lot (hell, for me it is; I'm happy that that is rare). Most publishers are not doing that. There is price differentiation and the Global South doesn't pay the European prices (tho publishers must do better in being transparent about this), which, in turn, some see as patronizing or even colonial (dividing the world into economic zones is quite common; is it unethical? well, there are more aspects of our economic systems I am not happy about).
I think the bigger problem is why Western scholars (the Global North?) are not publishing in journals published in/by the Global South. Why is that?
If we want a scholarly community to be internationally inclusive, why do we still have national scholarly societies? Maybe we can stop with that, please? What if I were not a member of the Dutch chemical society, KNCV, but a member of the Chemical Society, a scholarly society independent of continent or country?
Now, I am happy to see others are thinking in this direction too. For example, the Metabolomics Society takes this approach and a growing group of universities is rebooting the idea of a university publisher, but not limited to one university or even country (link to be added later: XXXX; once I found it again).
Because if we keep insisting on publishing in Global North (or western-led) journals (e.g. journals of Global North societies), I think we have a bigger problem than APCs, with respect to the North/South divide (and there certainly is a problem!).
I'm looking forward to reading your thoughts on how we can really reform open science knowledge dissemination.
Henry Rzepa asked us the following recently: ChemRxiv. Why? I strongly recommend reading his ponderings. I agree with a number of them, particularly the point about the following. To follow the narrative of the joke: "how many article versions does it take for knowledge to disseminate?", the answer sometimes seems to be: "at least three, to make enough money off the system".
Now, I tend to believe preprints are a good thing (see also my interview in Preprintservers, doen of niet?, C2W, 2016. PDF): release soon, release often has served open science well. In that sense, a preprint can be like that: a form of open notebook science.
However, just as we suffer from data dumps for open source software, we see exactly the same with (open access) publishing now. Is the paper ready to be submitted for peer review? Oh, let's quickly put it on a preprint server. I very much agree with Henry that the last thing we are waiting for is a third version of a published article. This is what worries me a great deal in the "green Open Access" discussion.
But it can be different. For example, people in our BiGCaT group actually are building up a routine of posting papers just before conferences. Then the oral presentation gives a layman's outline of the work, and if people want to really understand what the work is about, they can read the full paper. Of course, with the note that a manuscript may actually not be sufficient for that, so the preprint should support open science.
But importantly, a preprint is not a replacement for a proper CC-BY-licensed version of record (VoR). If the consensus is that that is what preprints are about, then I'm no longer a fan.
Part of the LIPID MAPS classification scheme in Wikidata (try it).
A bit over a week ago I attended the LIPID MAPS Working Group meeting in Cambridge, as I became a member of Working Group 2: Tools and Technical Committee in autumn. That followed a fruitful effort by Eoin Fahy to make several LIPID MAPS pathways available in WikiPathways (see this Lipids Portal), e.g. the Omega-3/Omega-6 FA synthesis pathway. It was a great pleasure to attend the meeting, meet everyone, and I learned a lot about the internals of the LIPID MAPS project.
I showed them how we contribute to WikiPathways, particularly in the area of lipids. Denise Slenter and I have been working on having more identifier mappings in Wikidata, among which the lipids. Some results of that work were part of this presentation. One of the nice things about Wikidata is that you can make live Venn diagrams, e.g. compounds in LIPID MAPS for which Wikidata also has a statement about which species it is found in (try it):
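The overlap in such a Venn diagram boils down to a SPARQL query against the Wikidata Query Service. Here is a small sketch, not the actual query behind the link: it assumes P2063 (LIPID MAPS ID) and P703 (found in taxon) are the relevant properties, and the helper name queryUrl is made up for this example. It only builds the request URL, which you can open in a browser or fetch with any HTTP client:

```groovy
// Hypothetical helper: turn a SPARQL query into a Wikidata Query Service URL.
String queryUrl(String sparql) {
    "https://query.wikidata.org/sparql?format=json&query=" +
        URLEncoder.encode(sparql, "UTF-8")
}

// Compounds that have both a LIPID MAPS ID and a "found in taxon" statement.
// Assumption: P2063 = LIPID MAPS ID, P703 = found in taxon.
sparql = '''
SELECT DISTINCT ?compound ?lmid ?taxon WHERE {
  ?compound wdt:P2063 ?lmid ;
            wdt:P703 ?taxon .
} LIMIT 10
'''.trim()

println queryUrl(sparql)
```

Dropping the wdt:P703 line gives the full LIPID MAPS circle; the difference between the two result sets is the part of LIPID MAPS without species information in Wikidata.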
Jean-Claude Bradley pitched the idea of Open Notebook Science, or Open-notebook science as the proper spelling seems to be. I have used notebooks a lot, but ever since I went digital, the use went down. During my PhD studies I still extensively used them. But in the process, I changed my approach, influenced by open source practices.
After all, open source has had a long history of version control, where commit messages explain the reason why some change was made. And people who have ever looked at my commits know that my commits tend to be small. And know that my messages describe the purpose of some commit.
That is my open notebook. It is essential to record why a certain change was made and what exactly that change was. Trivial with version control. Mind you, version control is not limited to source code. Using the right approaches, data and writing can easily be tracked with version control too. Just check, for example, my GitHub profile. You will find journal articles being written, data collected, just as if they were equal research outputs (they are).
Another great example of version control for writing and data is provided by Wikipedia and Wikidata. Now, some changes I found hard to track there: when I asked the SourceMD tool (great work by Magnus Manske) to create items for books, I wanted to see the changes made. The tool did link to the revisions made at some point, but this service integration seems to break down now and then. Then I realized that I could use the EditGroups tool directly (HT to whoever wrote that), and found this specific page for my edits, which includes not just those via SourceMD but also all edits I made via QuickStatements (also by Magnus):
If only I could give a "commit message" with each QuickStatements job I run. Can I?
In fact, I have blogged about the scripts I wrote on many occasions and in 2015 I wrote up a few blog posts on how to install new extensions:
So, last x-mas I set out with the wish to be able to have others much more easily run my scripts and, second, be able to run them from the command line. To achieve that, installing and particularly publishing Bioclipse extensions had to become much easier. Maybe as easy as Groovy just Grab-bing the dependencies from the script itself. So, Bioclipse available from Maven Central, or so.
Of course, this approach would likely lose a lot of wonderful functionality, like the graphical UX, the plugin system, the language injection, and likely more. So, one important requirement was that any script using the command line must be identical to the script in Bioclipse itself. Well, with a few permissible exceptions: we are allowed to inject the Bioclipse managers manually.
Well, of course, I would not have been blogging this had I not succeeded in reaching these goals in some way. Indeed, following up from a wonderful metaRbolomics meeting organized by de.NBI (~ ELIXIR Germany), and the powerful plans discussed with Emma Schymanski (and some ongoing work on persistent toxicants), and, fairly, actually not drowning in failed deadlines, just regularly way behind deadlines, and since I have a research line to run, I dived into hackmode. In some 14 hours, mostly in the evening hours of the past two days, I got a proof of principle up and running. The name is a reference to all the wonderful linguistic fun we had when I worked in Uppsala, thanks to Carl Mäsak, e.g. discussing the term Bioclipse Scripting Language and Perl 6.
It is not available yet from Maven Central, so there is a manual mvn clean install involved at this moment, but after that (the command installs it in your local Maven repository which will be recognized by Groovy), you can get started with something like (I marked in blue the extra sugar needed on the command line; the black code runs as is in Bioclipse 2.6.2):
workspaceRoot = "."
def cdk = new net.bioclipse.managers.CDKManager(workspaceRoot);

list = cdk.createMoleculeList()
println list
println cdk.fromSMILES("COC")
What now? In the future, once it is available on Maven Central, you will be able to skip the local install command, and @Grab will just fetch things from that online repository. I will be tagging version 0.0.1 today, as I got my important script running that takes one or more SMILES strings, checks Wikidata, and makes QuickStatements to add missing chemicals. The first time you've (maybe) seen that, was three years ago, in this blog post.
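To give an idea of the last step of such a script, here is a sketch of how QuickStatements commands for a missing chemical could be generated. This is not my actual script: the Wikidata lookup is left out, the function name quickStatementsFor is made up for this example, and the choice of statements (P31 "instance of" chemical compound Q11173, an English label, and P233 "canonical SMILES") is an assumption about the minimum a new item would need:

```groovy
// Hypothetical helper: QuickStatements (V1, tab-separated) commands that
// create a new Wikidata item for a chemical that was not found.
String quickStatementsFor(String smiles, String label) {
    [
        "CREATE",                        // start a new item
        "LAST\tP31\tQ11173",             // instance of: chemical compound
        "LAST\tLen\t\"${label}\"",       // English label
        "LAST\tP233\t\"${smiles}\""      // canonical SMILES
    ].join("\n")
}

// Dimethyl ether, the same SMILES as in the example above.
println quickStatementsFor("COC", "dimethyl ether")
```

The real script first checks Wikidata to see if the compound already exists, and only generates commands for the chemicals that are missing.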
You may wonder: why?? I asked myself the same thing, but there are a few things over the past 24 hours that I could answer and which may sketch where this is going:
that BSL book can actually show running the code and show the output in the book, just like with my CDK book;
Bioclipse offers interoperability layers, allowing me to pass a chemical structure from one Java library to another (e.g. from the CDK to Jmol to JOELib);
it allows me to update library versions without having to rebuild a full new Bioclipse stack (I'm already technically unable, let alone timewise unable);
I can start sharing Bioclipse scripts with articles that people can actually run; and,
all scripts are compatible, and all extensions I make can be easily copied into the main Bioclipse repository, if there ever will be a next major Bioclipse version (seems unlikely now).
Now, it's just being patient and migrating manager by manager. It may be possible to use the existing manager code, but that comes with so much language injection, that I decided to just take advantage of Open Science and just copy/paste the code. Most of the code is the same, minus progress monitors, and replacing Eclipse IFile code with regular Java code. But there are tons of managers, and reaching even 50% coverage will take, at the speed I can offer, months. Therefore, I'll focus on scripts I share with others, focus on reuse and reproducibility. More soon!
Open Science has been around for some time. Before Copyright became a thing, knowledge dissemination was mostly limited by how easily you could get knowledge from one place to another. The introduction of Copyright changed this. The question was no longer how to get people to know the new knowledge, but how to get people to pay for new knowledge. One misconception, for example, is that publishing is a free market. Yes, you can argue that you can publish anywhere you like (theoretically, at least, but reality says otherwise), but the monopoly is in getting access: for every new fact (and republishing the same fact is a faux pas), there is exactly one provider of that fact.
Slowly this is changing, but only slowly. What this really needs, is open licenses, just like open source licenses. Licenses that allow fixing typos, allow resharing with your students, etc.
But contrary to what has been prevalent in the Plan S discussion, these ideas are not new. And people have been trying Open Science for more than two decades already.
I have been trying to dig up the oldest references (ongoing effort) of the term Open Science (in the current meaning), and had a CiteULike group for that. But CiteULike is shutting down, so I will blog the references I found, and add some context.