The New OSTP Policy & What it Means

Last week, the White House Office of Science and Technology Policy (OSTP) responded to calls for broader access to federally funded research. I was curious whether this policy had any teeth, so I actually read the official memorandum. Here I summarize it and offer a few thoughts.

The overall theme of the document is best represented by this phrase:

…wider availability of peer-reviewed publications and scientific data in digital formats will create innovative economic markets for services related to curation, preservation, analysis, and visualization.

OSTP must have fielded early concerns from journal publishers, because the memo repeatedly expresses sentiments like this:

The Administration also recognizes that publishers provide valuable services, including the coordination of peer review, that are essential for ensuring the high quality and integrity of many scholarly publications. It is critical that these services continue to be made available.

And now we get to the big change:

Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products

Each of the agency plans is required to outline strategies to:

  • leverage existing archives and partnerships with journals
  • improve the public’s ability to locate and access data
  • provide optimized search, archival, and dissemination features that encourage accessibility and interoperability
  • notify researchers of their new obligations for increasing access to research products (e.g., guidance, conditions for funding)
  • measure and enforce researcher compliance

Draft plans for each agency are due within 6 months of the memo. This is all great news for open science advocates: agencies must require researchers to comply with open data mandates and help them do it.

Hopefully the teeth in this new OSTP memo won't be slowed down by its tiny arms. From Flickr by Hammerhead27

The memo then outlines what agency plans should include, dividing the guidelines into those for scientific articles and those for data.

Scientific Articles:

New agency plans must include provisions for open access to scientific articles reporting on research. The memo provides two main guidelines related to this:

  • public access to research articles (including the ability to read, download, and analyze digitally) should happen within about 12 months post-publication
  • there should be free, full public access to the research article’s metadata, in a standard format

Scientific Data:

First, the memo defines data:

…digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens.

It then sets the following guidelines. The agency plans should:

  1. Maximize free public access while keeping in mind privacy/confidentiality, proprietary interests, and that not all data should be kept forever
  2. Ensure researchers create data management plans
  3. Allow costs for data preservation and access in proposal budgets
  4. Ensure evaluation of data management plan merits
  5. Ensure researchers comply with their data management plans
  6. Promote data deposition into public repositories
  7. Encourage public/private partnerships to ensure interoperability
  8. Develop approaches for identification and attribution of datasets
  9. Educate folks about data stewardship
  10. Assess long-term needs for repositories and infrastructure

This list got me excited: there might actually be some teeth in #4 and #5 above. We all know that the NSF’s data management plan requirement has been rather weak up to now, but these guidelines suggest it will be enforced more rigorously from here on.

I’m also quite pleased to see #6: data should be deposited in public repositories. The icing on the cake is #8: datasets need identification and attribution. Overall, my feelings about this list can be summed up in one word: hooray!

Official versions of related documents:

Data Management Education: Part 2

Last week on Data Pub, I described the impetus for my latest publication with co-author Stephanie Hampton in Ecosphere about data management education (available on the Ecosphere site). The manuscript is the result of my postdoctoral work with the DataONE organization. The question that spawned the research? Whatever happened to the lab notebook? This question led to a survey of whether undergraduates in ecology are being taught about data management. The short answer? No. Here are some more detailed results.

We surveyed the instructors for ecology courses at 48 institutions. First, we asked whether they cover various data management topics in their courses:


The gist? No, these topics aren’t covered. More on why later… Next, we asked whether these same topics were important for undergrads to understand. That is, should undergrads be learning about this? The results:


White data points are the average importance reported by instructors, on a scale of one to five. So in general, YES… although understanding databases and archiving, data re-use, and meta-analysis were apparently considered less important skills. Next? Are these topics important to the instructors themselves when they are wearing their researcher hats?


These topics are important to the instructors as researchers, too. Reproducibility, in particular, ranks quite high in importance. Of course, you can’t reproduce results without first managing and sharing data, but I digress.

So why aren’t undergrads learning about this stuff? I asked the instructors to identify the barriers to teaching these topics to their undergrads. The responses were free-form; however, several answers rose to the top as repeat “offenders”:

The full set of free-form responses is available in the Appendix of the article. One recurring theme: many instructors indicated that, given better access to resources and course materials, as well as a better understanding of data stewardship issues themselves, they might be more inclined to teach their undergraduates about good data stewardship. Did you hear that, librarians? This is our opportunity to help!

Here are relevant links to the manuscript:

Not teaching undergrads about how to handle data properly is like sending them into the bathroom stall with no TP. Prepare them! From Flickr by PDXdj

Data Management Education?

Back in December, I published a paper in the open access journal Ecosphere about data management education (available on the Ecosphere site). The manuscript is the result of my postdoctoral work with the DataONE organization, advised by Stephanie Hampton at NCEAS. When I started working with Steph, she posed an interesting question: Whatever happened to the lab notebook? Yes, people still take notes and keep notebooks, but the concept has not carried over in full. That is, data and information are increasingly born digital: how do we capture that in a pen-and-paper lab notebook?

While in grad school, I printed out a lot of tables and graphs, then cut and pasted them into a lab notebook. I eventually figured out I needed to keep track of the file names associated with the printouts. Of course, there were also the methods I used while creating data tables and other outputs of my analyses: I basically neglected this part altogether. The result was a patchy notebook that in no way allowed for reproducibility of my work. Sadly, I don’t think I’m alone. Although the tide may be turning towards better data management and documentation (thanks to NSF for requiring data management plans!), we have a very long way to go.

So Steph and I asked the question: Are data management and organization practices being taught to students? 

To answer this, we first had to decide which students we were asking about. We decided to focus on students who are expected to understand the value of lab notebooks, diligent note-taking, and documentation of methods. Coverage of these topics might be a bit spotty at the high school level, but undergraduate science classes have always prioritized lab notebooks.

I set out to survey undergraduate institutions that are likely to teach future ecology graduate students. Why ecology? Partly because Steph and I are ecologists who were based at the National Center for Ecological Analysis and Synthesis. Partly because DataONE focuses on Earth, environmental, ecological, atmospheric, and oceanographic data. But also, we needed to zero in on one group, so we chose ecologists.

I examined 38 large universities considered the best for graduate studies in ecology, plus 10 smaller liberal arts institutions whose outgoing ecology students receive the highest number of NSF Graduate Research Fellowships in ecology (for a full list of institutions, see Appendix A of the paper).

Besides the obvious social norms... what are undergrads learning at college?

After selecting the institutions, I surveyed the instructor of each institution’s ecology course. The survey (available in full as a PDF) asked about all things data management, including:

  • Quality control and quality assurance
  • The proper way to name computer files
  • Types of files and software to use
  • Metadata generation
  • Workflows
  • Protecting data
  • Databases and data archiving
  • Data re-use
  • Meta-analysis
  • Data sharing
  • Reproducibility
  • Notebook protocols (lab or field)

Next week I will go into a bit more detail about the results, but the gist is this: ecology undergraduates aren’t learning about data management. Although the professors find data management topics to be important for their own work, they are not inclined to find time in their curriculum to teach their students these topics. There are many reasons why; most notably, instructors mentioned lack of time, as well as the expectation that students would learn about these topics in other courses.

In case you can’t wait to find out what I found, here are links to the manuscript: