Thanks in Advance For Sharing Your Data


Barbara Bates says to be sure to dress your turkey properly this season! Then invite him to eat some tofurky with you. From Flickr by carbonated

It’s American Thanksgiving this week, which means that hall traffic at your local university is likely to dwindle down to zero by Wednesday afternoon.  Because it’s a short week, this is a short post.  I wanted to briefly touch on data sharing policies in journals.

Will you be required to share your data next time you publish? If you are looking for a short answer, it’s probably not. Depending on the field you are in, the requirements for data sharing are not very… forceful. They often involve phrases like “strongly encourage” or “provided on demand”, rather than requiring researchers to archive their data, obtain an identifier, and submit that information alongside the journal article.  The journal Nature just beefed up their wording a bit; still no requirements for archiving though. Read the Nature policy on availability of data and materials.

Despite the slow progress towards data sharing mandates, a growing list of journals has signed up for the Joint Data Archiving Policy (JDAP), the brainchild of folks over at the Dryad Repository. The JDAP verbiage, which journals can use in their instructions for authors, states that supporting data must be publicly available:

<< Journal >> requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as << list of approved archives here >>. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

The emphasis on “requires, as a condition for publication” is mine, and I added it because it’s important: the journal requires, as a condition for publication, that you share your data. Now we’re cooking with gas!

The JDAP was adopted in a joint and coordinated fashion by many leading journals in the field of evolution in 2011, and JDAP has since been adopted by other journals across various disciplines. A list of journals that require data sharing via the JDAP verbiage is below.

Two other interesting bits about data sharing, in this case in PLOS:

List of Journals that require data sharing:

  • The American Naturalist
  • Biological Journal of the Linnean Society
  • BMC Ecology
  • BMC Evolutionary Biology
  • BMJ
  • BMJ Open
  • Ecological Applications
  • Ecological Monographs
  • Ecology
  • Ecosphere
  • Evolution
  • Evolutionary Applications
  • Frontiers in Ecology and the Environment
  • Functional Ecology
  • Genetics
  • Heredity
  • Journal of Applied Ecology
  • Journal of Ecology
  • Journal of Evolutionary Biology
  • Journal of Fish and Wildlife Management
  • Journal of Heredity
  • Journal of Paleontology
  • Molecular Biology and Evolution
  • Molecular Ecology and Molecular Ecology Resources
  • Nature
  • Nucleic Acids Research
  • Paleobiology
  • PLOS
  • Science
  • Systematic Biology
  • ZooKeys

Good DMP Examples + Going Beyond Two Pages

Did you know that data management plans existed before the NSF started requiring them? I know, it’s shocking. But they have inherent value despite being relatively unknown to researchers until now. Proper, thorough data management plans (DMPs) are potentially a major time saver and a huge asset for the project. Funders tend to have minimal requirements for DMPs (e.g., a mere two pages allowed for an NSF proposal), and as a result researchers tend to underestimate the importance of the document. I’ve spoken to many researchers who wait until the last minute to start creating their DMP. As a result, their plans reflect a lack of knowledge about data stewardship, and the researchers are not properly prepared when their project starts generating data.

Here are a few ways to ensure you create a high-quality, thorough DMP:

You take advantage of experts. Librarians should be partners with the researcher in creating their data management plans. Librarians are information professionals, and their business is essentially figuring out how to manage and preserve information (i.e., data). Consult them regularly when creating a plan: even if they don’t fully understand your data, they know how to find good standards, appropriate repositories, and who to talk to on campus.

You take advantage of institutional resources, such as departmental servers, backup services, and IT professionals. Often researchers are unaware of the hardware and software available from their institutions; often the institutional services and resources are available at no or low cost.

You think carefully about your data, including considering file formats, common vocabularies, codes and metadata needed, and standards that will be used for metadata. This should be done as thoroughly as possible before any data are collected to prevent the need to go back and edit your datasets (i.e., the dreaded “find/replace” tasks).

You think carefully about your workflow and sketch out the plan for data processing and analysis.  Workflows can be very informal, consisting of a simple flow chart (read my blog post about this). By considering the iterations of the data before you start collecting, you are more likely to arrange your files, datasets, and collection procedures in a logical way.

You know exactly where your data will be stored, both during the project and after the project is completed.
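The advice above about settling on vocabularies and codes before collecting any data can be made concrete with a small check. What follows is only a minimal sketch in Python: the column names, the allowed codes, and the `find_violations` helper are all hypothetical, invented here for illustration, and a real project would take its vocabulary from its own DMP.

```python
# Hypothetical data dictionary agreed on before collection starts.
# Column names and allowed codes are illustrative assumptions only.
DATA_DICTIONARY = {
    "site": {"allowed": {"N1", "N2", "S1"}},
    "species_code": {"allowed": {"QUAG", "PIPO"}},
}


def find_violations(records, dictionary=DATA_DICTIONARY):
    """Return (row_index, column, value) for each value outside the vocabulary.

    A missing column is reported with the value None.
    """
    violations = []
    for i, record in enumerate(records):
        for column, rule in dictionary.items():
            value = record.get(column)
            if value not in rule["allowed"]:
                violations.append((i, column, value))
    return violations


records = [
    {"site": "N1", "species_code": "QUAG"},
    {"site": "n1", "species_code": "PIPO"},  # lower-case site code slipped in
    {"site": "S1"},                          # species_code is missing
]

for row, column, value in find_violations(records):
    print(f"row {row}: unexpected {column!r} value {value!r}")
```

Running a check like this as data come in is one way to catch inconsistent codes early, rather than discovering them during analysis and resorting to find/replace.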


Be a good manager of your data. Need a good example manager? Michael Scott of The Office. From NBC, M. Haaseth

Perhaps most importantly, consider this:  a data management plan should be created early on and should be revisited throughout the project.  Add a reminder to your calendar – every six months, re-read your plan. Make sure new members of your lab group have read the plan and understand it. Make changes based on new developments in the project, and ensure that the work of archiving the data is not pushed entirely to the end of the project.

What about examples? There are lots of examples out there for two-page NSF DMPs:

But be sure to check out more extensive examples and resources too:

Note that future development of the DMPTool will include a “DMP Library”, full of example DMPs where researchers can access others’ plans and share their DMPs. Now go forth and plan!

Looking for a Postdoc: Data Publication

Do you love all things data? Do you think data should be considered as important as traditional scholarly publications? We do here at CDL, and we are searching for a postdoc to work on the concept of data publication and what it may entail. This person should have a background in data-rich research (e.g., natural or social sciences) OR a background in Information Science.

The position will be at the University of California’s California Digital Library, based at the Office of the President in downtown Oakland. You will be working to make data publication available and viable for all UC researchers. CDL is interested in expanding its services to include data publication, and expects the fellow to research the most effective ways to make this possible. Read more about the position and how to apply at www.clir.org/fellowships/postdoc/applicants/cdl2013

This is made possible by the CLIR-DLF Postdoctoral Fellowship program.


Come work in Oakland – City of Dreams. From Flickr by anarchosyn

 

Researchers! Make Your Previous Work OA

For the last two weeks, I’ve been posting on Open Stuff, including Open Access and Open Data, Open Science, Open Notebooks, etc etc. I’m continuing the thread this week with a discussion of how researchers can make most, if not all, of their publications open.

Need a PDF that’s not OA? Use #Icanhazpdf. Still no luck? Console yourself with Lolcats. From icanhas.cheezburger.com

Why am I devoting a whole post to this? First, because it’s really important. Individuals without institutional affiliations (e.g., between jobs), or who are affiliated with institutions that have no library or a poorly funded one (e.g., in 2nd or 3rd world countries), depend on open access articles for keeping up with the scholarly literature. The need for OA isn’t limited to jobless or international folks, though. For proof, one only has to notice that the Twitter community has developed a hashtag around this, #Icanhazpdf (hat tip to the Lolcats phenomenon). Basically, you tweet the name of the article you can’t access and add the hashtag in hopes that someone out in the Twittersphere can help you out and send it to you.

Academic libraries must pay exorbitant fees to provide their patrons (researchers) with access to scholarly publications. The very patrons that need these publications are the ones that provide the content in the form of research articles. Essentially, the researchers are paying for their own work, by proxy via their institution’s library.

In response to this, many institutions are enacting Open Access policies. The goal here is to encourage (or mandate) that their faculty provide post-print copies of all publications to an open access institutional repository. MIT  and Harvard were among the first to enact such policies. Closer to home, UC San Francisco Academic Senate signed off on an open access policy in May of this year. The policy will go up for a UC-wide vote in December, which would mean all University of California researchers would be required to place their publications in an open access institutional repository.

If you remember from two weeks back, one path to OA is “Green”, i.e. when you put a publication in an OA repository.  The publication may or may not have been originally published in an OA journal. How does this work, you ask? Let me demonstrate, using a researcher at the University of California as an example.

Let’s call our hypothetical researcher Jane. She has published three journal articles while working as a postdoc at UC Santa Cruz. The journals in which she published were Conservation Genetics, Nature, and Ecology. None of these journals is, by default, an open access journal. Jane wants to be sure her colleagues in Micronesia can access her articles, even though they are not affiliated with a major library at an academic institution. What to do? (note: this workflow is based on eScholarship’s instructions for authors)

First, Jane should check her rights to the work.  An easy way to do this is to check SHERPA/RoMEO, a free resource for helping researchers navigate the copyright policies of journals. They provide you with a brief overview of the journal’s policy and what authors are allowed to do with their work. Here’s what she found:

[Images: SHERPA/RoMEO policy summaries for Conservation Genetics, Nature, and Ecology]

So Jane can make her Conservation Genetics and Nature articles OA by archiving a post-print version, as long as it’s not the publisher’s version/PDF.  For Ecology, she can post the publisher’s version so long as she acknowledges their copyright.

Now Jane can find a repository to place her journal articles. The repository should be open and make the articles freely available to anyone, anywhere. Jane checks out this list of repositories available from OpenDOAR and finds that the UC system has an open repository available to all UC researchers called eScholarship (housed at CDL!). She follows the easy steps on the eScholarship website and submits her three articles. She receives URLs, which she then emails to her colleagues in Micronesia. And voila! Jane has participated in Open Access! Her scholarly works are now publicly available, and she has managed to ensure that anyone, anywhere can access her work.

Researchers: follow these steps to make your work available.

  1. Find out the status of your works’ copyright (use SHERPA/RoMEO)
  2. Identify an appropriate OA repository available to you (use OpenDOAR)
  3. Deposit your works and start sharing
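
For anyone tracking more than a handful of papers, the first step can be kept as a small lookup table. The sketch below is a rough illustration in Python, not a SHERPA/RoMEO client: the policies are hand-copied from what Jane found in the example above, the `Policy` type and `deposit_plan` function are hypothetical names, and journal policies change, so the current entry should always be checked before depositing.

```python
from typing import NamedTuple, Optional


class Policy(NamedTuple):
    version: str              # which version of the article may be archived
    condition: Optional[str]  # extra requirement, if any


# Hand-coded from Jane's example; NOT fetched from SHERPA/RoMEO.
POLICIES = {
    "Conservation Genetics": Policy("post-print", None),
    "Nature": Policy("post-print", None),
    "Ecology": Policy("publisher's version", "acknowledge the publisher's copyright"),
}


def deposit_plan(journals, policies=POLICIES):
    """Map each journal to a short note on what can be deposited."""
    plan = {}
    for journal in journals:
        policy = policies.get(journal)
        if policy is None:
            plan[journal] = "unknown: look up the journal's policy before depositing"
        elif policy.condition:
            plan[journal] = f"deposit the {policy.version} ({policy.condition})"
        else:
            plan[journal] = f"deposit the {policy.version}"
    return plan


journals = ["Conservation Genetics", "Nature", "Ecology", "Some Other Journal"]
for journal, action in deposit_plan(journals).items():
    print(f"{journal}: {action}")
```

Journals missing from the table are flagged rather than guessed at, which mirrors the order of the steps: check your rights first, then deposit.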

To prevent future confusion about copyright, check out the SPARC author addendum generator: it helps you generate an addendum that you can attach to your signed author agreements, thereby ensuring some of your rights.