Collecting Journal Data Policies: JoRD

My last two posts have related to IDCC 2013; that makes this post three in a row. Apparently IDCC is a gift that just keeps giving (albeit a rather short post in this case).

Today the topic is the JoRD project, funded by JISC. JoRD stands for Journal Research Data; the JoRD Policy Bank is basically a project to collect and summarize data policies for a range of academic journals.

From the JISC project website, this project aims to

provide researchers, managers of research data and other stakeholders with an easy source of reference to understand and comply with Research Data policies.

How to go about this? The project’s objectives (cribbed and edited from the project site):

  1. Identify and consult with stakeholders; develop stakeholder requirements
  2. Investigate the current state of data sharing policies within journals
  3. Deliver recommendations on a central service to summarize journal research data policies and provide a reference for guidance and information on journal policies.

I’m most interested in #2: what are journals saying about data sharing? To tackle this, project members are collecting information about data sharing policies for the top 100 and bottom 100 Science Journals, and the top 100 and bottom 100 Social Science Journals. Based on the stated journal policies about data sharing, they fill out an extensive spreadsheet. I’m anxious to see the final outcome of this data collection – my hunch is that most journals “encourage” or “recommend” data sharing, but do not mandate it.

I think of the JoRD Policy Bank as having two major benefits:

Educating Researchers. As you may be aware, many researchers are a bit slow to jump on the data sharing bandwagon. This is the case despite the fact that all signs point to future requirements for sharing at the time of publication (see my post about it, Thanks in Advance for Sharing Your Data). Once researchers come to terms with the fact that soon data sharing will not be optional, they will need to know how to comply. Enter JoRD Policy Bank!

Encouraging Publishers. The focus on stakeholder needs and requirements suggests that the outcomes of this project will provide guidance to publishers about how to proceed in their requirements surrounding data sharing. There might be a bit of peer pressure, as well: Journals don’t want to seem behind the times when it comes to data sharing, lest their credibility be threatened.

In general, the JoRD website is chock full of information about data sharing policies, open data, and data citation. Check it out!

C'mon researchers! Jump on the data sharing band wagon! From purlem.com


Thoughts on Data Publication

If you read last week’s post on the IDCC meeting in Amsterdam, you may know that today’s post was inspired by a post-conference workshop on Data Publication, sponsored by the PREPARDE group. The workshop was “Data publishing, peer review and repository accreditation: everyone a winner?” (to access the workshop agenda, goals, and slides, go to the conference workshop website and scroll down to Workshop 6).

Basically the workshop focused on all things data publication, and incited lively discussion among those in attendance. Check out the workshop’s Twitter backchannel via this Storify by Sarah Callaghan of STFC.  My previous blog post about data publication sums it up like this:

The concept of data publication is rather simple in theory: rather than relying on journal articles alone for scholarly communication, let’s publish data sets as “first class citizens”.  Data sets have inherent value that makes them standalone scholarly objects— they are more likely to be discovered by researchers in other domains and working on other questions if they are not associated with a specific journal and all of the baggage that entails.

Stealing shamelessly from Sarah’s presentation, I’m providing a brief overview of issues surrounding data publication for those not well-versed:

First, the benefits of data publication:

  • Allows credit to data producers and curators (via data citation and emerging altmetrics)
  • Encourages reuse of datasets and discourages duplication of effort
  • Encourages proper curation and management of data (you don’t want to share messy data, right?)
  • Ensures completeness of the scientific record, as well as transparency and reproducibility of research (fundamental tenets of the scientific method!)
  • Improves discoverability of datasets (they will never be discovered on that old hard drive in your desk drawer)

We had an internal meeting here at CDL yesterday about data publication. After running through this list of benefits for those in attendance, one of my colleagues asked the question: “Does listing these benefits work? Do researchers want to publish their data?” I didn’t hesitate to answer “No”.

Why not? The biggest reason is a lack of time. Preparing data for sharing and publication is laborious, and overstretched researchers aren’t motivated by these benefits given the current incentive structures in research (papers, papers, papers. And citation of those papers.). Of course, I think this is changing in the very near future. Check out my post on data sharing mandates in the works. So let’s go with the assumption that researchers want to publish. How do they go about this?

Methods for “publishing” data:

  • A personal or lab webpage. This is a common choice for researchers who wish to share data since they can maintain control of the datasets. However, data siloed on individual websites raise issues of stability, persistence, and discoverability. Plus, website maintenance often falls to the bottom of a researcher’s to-do list.
  • A disciplinary repository. This is a common solution for only a select few data types (e.g., genetic data). Most disciplines are still awaiting a culture change that will motivate researchers to share their data in this way.
  • An institutional repository. Of course, researchers have to know that this is an option (most don’t), and must then properly prepare their data for deposit.
  • Supplementary materials.  In this case, the data accompany a primary journal article as supporting information. I recently shared data this way, but recognized that the data should also be placed in a curated repository.  There are a few reasons for this apparent duplication:
    • Supplemental materials are sometimes not available many years after publication due to broken links.
    • Journals are not particularly excited about archiving lots of supplementary data, especially if it’s a large volume of data. This is not their area of expertise, after all.
  • Data article. This is a new-ish option: basically, you publish your data in a proper data journal (see this semi-complete list of data journals on the PREPARDE blog).

Wondering what a “data article” is? Let’s look to Sarah again:

A data article describes a dataset, giving details of its collection, processing, software, file formats, et cetera, without the requirement of  novel analyses or ground-breaking conclusions.

That is, it’s a standalone product of research that can be cited as such. There is much debate surrounding such data articles. Among the issues are:

  • Is it really “publication”? How is this different from a landing page for the dataset that’s stored in a repository?
  • Traditional academic use of “publication” implies peer review. How do you review datasets?
  • How should publication differ depending on the discipline?

There are no easy answers to these questions, but I love hearing the debate. I’m optimistic that the data publication postdoc we are about to hire will have some great ideas to contribute. Stay tuned!

Amsterdam! CC-BY license, C. Strasser


All Things Data in Amsterdam

The International Digital Curation Conference is wrapping up today, and I feel like I just finished a big, tasty Thanksgiving dinner: full and slightly uncomfortable, but in the brain rather than the gut. IDCC is a meeting that draws about 300 individuals from all over the world. Participants include librarians, repository administrators, publishers, funders, information technology folks, and people working at all manner of data and archiving organizations. Get these people in the same room, and the result is interesting talks, an amazing Twitter backchannel, and novel ideas for collaboration. This was my first IDCC, and I was not disappointed.

Pre-workshops started on Monday, and I participated in a data management tools update (Data Management Planning: what’s happened, what’s happening and what’s coming next?), organized primarily by Martin Donnelly of the Digital Curation Centre in the UK. It was interesting to hear about the future of the DMPTool and DMPOnline, as well as an overview of current data policies in the UK, Europe, Australia, and the US. Martin and I are arranging a similar workshop for the iConference, held next month in Fort Worth, TX.

On Tuesday, I was inundated with really great talks and conversations. The keynote, given by Ewan Birney from the European Bioinformatics Institute, covered bioinformatics infrastructure in Europe and was chock full of great examples of how data sharing can benefit research. There was also a talk by Kaitlin Thaney from Digital Science, who discussed the many projects they are funding, including Figshare and Altmetric. These two talks highlighted the many approaches people are taking to tackle digital data: we need both infrastructure and tools, as well as incentives and changes in the culture of research data.

Fun fact: Eddie and Alex Van Halen are Dutch! Photo from bumslogic.wordpress.com


Tuesday afternoon was devoted to a poster session where I schmoozed with folks over the DataUp poster. The DataUp team (Trisha Cruse, John Kunze, and myself) won 2nd place for best poster; first place went to the Right Field project (Right Field: Spreadsheet Annotation by Stealth), which was especially interesting given how closely aligned this project is with DataUp. Wednesday was more talks, meetings, and discussions. I’m excited about the post-conference workshop today on data publication. I’m guessing I will be inspired by this workshop and my next blog post will be about all things data publication.

Hungry for some Dutch music trivia? Wikipedia has a great list of songs about Amsterdam… including one by Van Halen.

NSF now allows data in biosketch accomplishments

Hip hip hooray for data! Contributed to Calisphere by Sourisseau Academy for State and Local History (click for more information)


Back in October, the National Science Foundation announced changes to its Grant Proposal Guidelines (Full GPG for January 2013 here).  I blogged about this back when the announcement was made, but now that the changes are official, I figure it warrants another mention.

As of January 2013, you can now list products in your biographical sketches, not just publications. This is big (and very good) news for data advocates like myself.

The change is that the biosketch for senior personnel should contain a list of 5 products closely related to the project and 5 other significant products that may or may not be related to the project. But what counts as a product? “products are…including but not limited to publications, data sets, software, patents, and copyrights.”

To make it count, however, it needs to be both citable and accessible. How to do this?

  1. Archive your data in a repository (find help picking a repo here)
  2. Obtain a unique, persistent identifier for your dataset (e.g., a DOI or ARK)
  3. Start citing your product!
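
For the code-inclined, here is a minimal Python sketch of step 3: assembling a citation string for an archived dataset. The citation layout, the `DatasetProduct` class, the `format_citation` function, and the example DOI are all my own illustrative assumptions; NSF lists the elements a citation should include but does not prescribe this format, and your repository may recommend its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetProduct:
    """Hypothetical record for a dataset listed as a biosketch product."""
    authors: List[str]
    year: int
    title: str
    repository: str
    identifier: str  # persistent identifier such as a DOI or ARK


def format_citation(product: DatasetProduct) -> str:
    """Join the fields into one citation string (layout is an assumption)."""
    authors = ", ".join(product.authors)
    return (
        f"{authors} ({product.year}). {product.title}. "
        f"{product.repository}. {product.identifier}"
    )


# Placeholder values only; the DOI below does not point to a real dataset.
example = DatasetProduct(
    authors=["Doe, J.", "Roe, R."],
    year=2013,
    title="Example field measurements",
    repository="Example Repository",
    identifier="doi:10.1234/example",
)

print(format_citation(example))
```

The point is simply that once a dataset has authors, a release date, a title, a home, and a persistent identifier, it can be cited much like a paper.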

For the librarians, data nerds, and information specialists in the group, the UC3 has put together a flyer you can use to promote listing data as a product. It’s available as a PDF (click on the image to the right to download). For the original PPT that you can customize for your institution and/or repository, send me an email.


Direct from the digital mouths of NSF:

Summary of changes: http://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/gpg_sigchanges.jsp

Chapter II.C.2.f(i)(c), Biographical Sketch(es), has been revised to rename the “Publications” section to “Products” and amend terminology and instructions accordingly. This change makes clear that products may include, but are not limited to, publications, data sets, software, patents, and copyrights.

New wording: http://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/gpg_2.jsp

(c) Products

A list of: (i) up to five products most closely related to the proposed project; and (ii) up to five other significant products, whether or not related to the proposed project. Acceptable products must be citable and accessible including but not limited to publications, data sets, software, patents, and copyrights. Unacceptable products are unpublished documents not yet submitted for publication, invited lectures, and additional lists of products. Only the list of 10 will be used in the review of the proposal.

Each product must include full citation information including (where applicable and practicable) names of all authors, date of publication or release, title, title of enclosing work such as journal or book, volume, issue, pages, website and Uniform Resource Locator (URL) or other Persistent Identifier.