Impact Factors: A Broken System

How big is your impact? Sedan Plowshare Crater, 1962. From Flickr by The Official CTBTO Photostream

If you are a researcher, you are very familiar with the concept of a journal’s Impact Factor (IF). Basically, it’s a way to grade journal quality. From Wikipedia:

The impact factor (IF) of an academic journal is a measure reflecting the average number of citations to recent articles published in the journal. It is frequently used as a proxy for the relative importance of a journal within its field, with journals with higher impact factors deemed to be more important than those with lower ones.
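
For the uninitiated, the standard two-year impact factor boils down to a simple ratio. The sketch below uses 2013 as an example year with made-up numbers, and glosses over exactly how Thomson Reuters decides what counts as a “citable item”:

```latex
% Two-year impact factor for a given year (illustrative):
%   numerator   = citations received that year to items the journal published in the two prior years
%   denominator = "citable items" (articles, reviews) the journal published in those two prior years
\mathrm{IF}_{2013} = \frac{\text{citations in 2013 to items published in 2011--2012}}
                          {\text{citable items published in 2011--2012}}
% Made-up example: 1{,}000 such citations spread over 250 citable items gives
% \mathrm{IF}_{2013} = 1000 / 250 = 4.0 -- an average over a highly skewed
% citation distribution, usually dominated by a handful of papers.
```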

The IF was devised in the 1970s as a tool for research libraries to judge the relative merits of journals when allocating their subscription budgets. However, it is now being used to evaluate the merits of individual scientists, something for which it was never intended. As Björn Brembs puts it, “…scientific careers are made and broken by the editors at high-ranking journals.”

In his great post, “Sick of Impact Factors”, Stephen Curry says that the real problem started when impact factors began to be applied to papers and people.

I can’t trace the precise origin of the growth but it has become a cancer that can no longer be ignored. The malady seems to particularly afflict researchers in science, technology and medicine who, astonishingly for a group that prizes its intelligence, have acquired a dependency on a valuation system that is grounded in falsity. We spend our lives fretting about how high an impact factor we can attach to our published research because it has become such an important determinant in the award of the grants and promotions needed to advance a career. We submit to time-wasting and demoralising rounds of manuscript rejection, retarding the progress of science in the chase for a false measure of prestige.

Curry isn’t alone. Just last week, Bruce Alberts, Editor-in-Chief of Science, wrote a compelling editorial about Impact Factor distortions. Alberts’ editorial was inspired by the recently released San Francisco Declaration on Research Assessment (DORA). I think this is one of the more important declarations/manifestoes peppering the internet right now, and it has the potential to really change the way researchers approach scholarly publishing.

DORA was created by a group of editors and publishers who met up at the Annual Meeting of the American Society for Cell Biology (ASCB) in 2012. Basically, it lays out all the problems with impact factors and provides a set of general recommendations for different stakeholders (funders, institutions, publishers, researchers, etc.). The goal of DORA is to improve “the way in which the quality of research output is evaluated”. Read more on the DORA website and sign the declaration (I did!).

An alternative to IF?

If most of us can agree that impact factors are not a great way to assess researchers or their work, then what’s the alternative? Curry thinks the solution lies in Web 2.0 (quoted from this post):

…we need to find ways to attach to each piece of work the value that the scientific community places on it through use and citation. The rate of accrual of citations remains rather sluggish, even in today’s wired world, so attempts are being made to capture the internet buzz that greets each new publication…

That’s right, skeptical scientists: he’s talking about buzz on the internet as a way to assess impact. Read more about “alternative metrics” in my blog post on the subject: The Future of Metrics in Science. Also check out the list of altmetrics-related tools at altmetrics.org. The great thing about altmetrics is that they don’t rely solely on citation counts, and they can take other research products (like blog posts and datasets) into account.
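
To make that idea concrete, here’s a toy sketch in Python of how an altmetrics-style score can fold several signals together instead of leaning on citations alone. The signal names and weights are entirely made up for illustration; this is not any real altmetrics provider’s formula.

```python
# Toy illustration only: a composite "impact" score that weights several
# signals rather than relying on citation counts alone. The signal names
# and weights are hypothetical, not any real altmetrics provider's formula.

WEIGHTS = {
    "citations": 5.0,          # slow-accruing, traditional signal
    "blog_mentions": 2.0,      # discussion of the work in blog posts
    "tweets": 0.25,            # lightweight "buzz"
    "dataset_downloads": 1.0,  # reuse of associated research products
}

def composite_score(counts):
    """Weighted sum of whatever signals we have for one research output."""
    return sum(WEIGHTS.get(signal, 0.0) * n for signal, n in counts.items())

# Example: a paper with modest citations but lots of online attention
paper = {"citations": 3, "blog_mentions": 4, "tweets": 120, "dataset_downloads": 50}
print(composite_score(paper))  # 3*5 + 4*2 + 120*0.25 + 50*1 = 103.0
```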

Webinar Series on Data Management & DMPTool

Operators will be standing by to connect you to our awesome webinars. From Flickr by MarkGregory007

One of the services we run at the California Digital Library is the DMPTool, an online tool that helps researchers create data management plans by guiding them through a series of prompts based on funder requirements. The tool provides resources and help in the form of links, help text, and suggested answers. It was developed by the CDL and many partners a couple of years ago, and it’s been wildly successful.

As a result of this success, we received two generous one-year grants: one from the Alfred P. Sloan Foundation to further develop and improve the existing DMPTool (read more in this post); and one from the Institute of Museum and Library Services, focused on creating resources for librarians interested in promoting the DMPTool at their institutions.

Based on input from a group of librarians back in February, we determined that a webinar series would be useful for introducing the tool, communicating how to use it effectively, and describing how it can be customized for institutional needs. We plan to present the webinars on Tuesdays, with roughly 15 currently planned, and the series will run into Fall 2013.

A few things to note:

  • All webinars will be recorded and made available for viewing afterward.
  • The webinar schedule might change a bit depending on presenters’ availability.
  • We are always interested in new webinar ideas; please send them to carly.strasser@ucop.edu or leave them as a comment below.
  • We plan to collect these webinars and make them available as a set. We then hope to create a short course in Data Management with the DMPTool that will offer certification for librarians as “DMPTool Experts” (we are still working on the title!).

Webinar Schedule

Note: for the most up-to-date schedule & links, visit the DMPTool Webinar Series Page or view the Google Calendar

| Date | Topic |
| --- | --- |
| 28 May | Introduction to the DMPTool (details & registration) |
| 4 Jun | Learning about data management: Resources, tools, materials (details & registration) |
| 18 Jun | Customizing the DMPTool for your institution (details & registration) |
| 25 Jun | Environmental Scan: Who’s important at your campus (details & registration) |
| 9 Jul | Promoting institutional services with the DMPTool; EZID as example (details & registration) |
| 16 Jul | Health Sciences & DMPTool – Lisa Federer, UCLA (details & registration) |
| 23 Jul | Digital humanities and the DMPTool – Miriam Posner, UCLA (details & registration) |
| 13 Aug | Data curation profiles and the DMPTool – Jake Carlson, Purdue (details soon) |
| TBD | How to give the data management sales pitch to various audiences |
| TBD | Other tools and resources that work with/complement the DMPTool |
| TBD | Beyond funder requirements: more extensive DMPs |
| TBD | Case studies 1 – How librarians have successfully used the tool |
| TBD | Case studies 2 – How librarians have successfully used the tool |
| TBD | Outreach Kit Introduction |
| TBD | Certification program introduction |

Large Facilities & the Data they Produce

Last week I spent three days in the desert, south of Albuquerque, at the NSF Large Facilities Workshop. What are these “large facilities”, you ask? I did too… this was a new world for me, but the workshop ended up being a great learning experience.

The NSF has a Large Facilities Office within the Office of Budget, Finance and Award Management, which supports “Major Research Equipment and Facilities Construction” (MREFC for short). Examples of these Large Facilities include NEON (National Ecological Observatory Network), IRIS PASSCAL Instrument Center (Incorporated Research Institutions for Seismology Program for Array Seismic Studies of the Continental Lithosphere), and the NRAO (National Radio Astronomy Observatory). Needless to say, I spent half of the workshop googling acronyms.

I was there to talk about data management, which made me a bit of an anomaly. Other attendees administered, managed, and worked at large facilities. In the course of my conversations with attendees, I was surprised to learn that these facilities aren’t too concerned with data sharing, and most of these administrator types implied that the data were owned by the researcher; it was therefore the researcher’s prerogative to share or not to share. From what I understand, the scenario is this: the NSF pays huge piles of money to get these facilities up and running, with hardware, software, technicians, managers, and on and on. Researchers then write grants to the NSF or to the facilities themselves to do work using these facilities. The researcher is then under no obligation to share the resulting data with their colleagues. Does this seem fishy to anyone else?

I understand the point of view of the administrators who attended this conference: they have enough on their plates to worry about without dealing with the myriad problems that accompany data management, archiving, sharing, et cetera. These problems are only compounded by researchers’ general resistance to sharing. For example, an administrator told me that, upon completion of their study, one researcher had gone into their system and deleted all of the data related to their project to make sure no one else could get it. I nearly fell over from shock.

Whatever cultural hangups the researchers have, aren’t these big datasets, collected with expensive equipment, among the most important to share? Observations of the sky at a single place and time are not reproducible. You only get one shot at collecting data on an earthquake or the current spreading rate of a rift zone. Not sharing these datasets is tantamount to scientific malpractice.

The Very Large Array, near Socorro, NM. This was the best workshop field trip EVER. CC-BY, Carly Strasser

One administrator respectfully disagreed with my charge that they should be doing more to promote data sharing. He said that their workflow for data processing was so complex and nuanced that no one could ever reproduce the dataset, and certainly no one could ever understand what exactly was done to obtain results. This marks the second time I nearly fell over during a conversation. If science isn’t reproducible because it’s too complex, you aren’t doing it right. Yes, I realize that exactly reproducing results is nearly impossible under the best of circumstances. But to not even try? With datasets this important? When all analyses are done via computers? It seems ludicrous.

So, after three days of dry skin and Mexican food, my takeaway from the workshop was this: All large facilities sponsored by the NSF need thorough, clear policies about data produced using their equipment. These policies should include provisions for sharing, access, use, and archiving. They will most certainly be met with skepticism and resistance, but in these tight fiscal times, data sharing is of the utmost importance when equipment this expensive is being used.