Looking for something? DataONE can help

I’m not overselling it – DataONE is the Ginsu knife of data tools. From Flickr by inspector_81

At long last, DataONE has gone live. Veterans of the DCXL/DataUp blog are probably well aware of the DataONE organization and project, but for newcomers I will provide a brief overview. Fine print: this is NOT the official DataONE stance on DataONE; it is merely my interpretation of it.

To explain DataONE, let’s go through a little thought exercise. Let’s pretend I’m a researcher starting a project on copepods in estuaries of the Pacific Northwest. I’m wondering who else has worked on them, what they have found, and whether I can use their data to help me parameterize my model. Any researcher will tell you the best way to do this is to start searching for relevant journal articles. I can then weave in and out of reference lists to home in on the authors, topics, and species that might be most useful, continually refining my searches until I am satisfied.

Imagine I need the data from some of those articles I found.  I look for datasets on the authors’ websites, in the papers themselves, and online.  Some of the work was funded by NOAA, so I check there for data. I Google like crazy.  Alas, the data are nowhere to be found.

In real life, this is where I ended my search and started contacting authors directly. I should have also checked data repositories, but I didn’t, mostly because I wasn’t aware of them when I did this work back in 2008. Sadly, many researchers are in the same state of ignorance that I was.

The good news is that there are A LOT of data repositories out there (check out Databib.org for an intimidating list). The bad news is that it’s very difficult to know about, let alone search, all of the repositories that might hold data you want to use.


DataONE is all about linking together existing data repositories, allowing researchers to access, search, and discover all of the data through a single portal.  It’s basically cyber-glue for the different data centers out there. The idea is that you go to the DataONE search engine (ONEMercury) and hunt for data. It tells you where the data are housed, gives you lots of metadata, and gives you access to data when the authors have allowed this.

But wait, there’s MORE!

DataONE is also all about providing tools for researchers to find, use, organize, and manage their data throughout the research life cycle.  This is where DataUp connects with DataONE: DataUp will be part of the Investigator Toolkit, which also includes nifty things like the DMPTool, ONE-R (an R package for DataONE), and ONE-Drive (a Dropbox-esque way to look at data in DataONE, in production).

The exciting news this week is that DataONE’s search and discovery tool has gone live (check out the NSF press release or the DataONE press release).  You can now start looking for data that might be housed in any participating repository.  There are only a few data repositories (called member nodes in DataONE speak) currently on board, but the number is expected to increase exponentially over the coming years.

More questions about DataONE? I can help, or at least direct you to the person who can. Alternatively, start poking around the DataONE website and ONEMercury, and give feedback so we can make it better.


ONEShare and #OR2012

From Flickr by ~Coqui

One of my UC3 colleagues is at the Open Repositories 2012 Meeting (#OR2012) in Edinburgh, Scotland this week. This prompted me to ask two questions: (1) What does open repositories mean? and (2) Why didn’t I get to go to Scotland? Of course, (2) is easily answered by my lack of knowledge about open repositories, i.e. question (1). After a little internet sleuthing, I’ve figured out what they mean by “Open Repositories”, and I realized that I have first-hand knowledge of a repository that embodies the ideas of OR: ONEShare. In this post I will share my newfound OR knowledge and give you the lowdown on ONEShare.

First, Open Repositories.  Just in case you are new to the dataverse (that’s dweeb speak for data universe), a repository is basically a place to put your data.  There are loads of data repositories, and picking one to suit your needs is an important step in data management planning.  So what is this about open repositories?

Here is a bit of text from the OR2011 website:

Open Repositories is an annual conference that brings together an international community of stakeholders engaged in the development, management, and application of digital repositories. …attendees  exchange knowledge, best practices and ideas on strategic, technical, theoretical and practical issues.

Basically, the idea of the Open Repositories group is to share knowledge among those facing similar challenges.  It’s similar to the concepts of Open Science, Open Data, and Open Access: we can accomplish more if we pool our intellectual resources.  Follow the OR2012 meeting via the #OR2012 hashtag.

Now for ONEShare. This is the data repository we’ve created specifically for DataUp users.

The name: ONEShare gets its name from its close ties to DataONE, the group enabling federation of Earth, environmental and ecological repositories. Many of the DataONE tools have “ONE” in the title (e.g., ONE-R, ONEMercury, and ONEDrive).

The concept: One of the major features of DataUp is connecting Excel users to a data repository – essentially streamlining the process of depositing and sharing your data. Although there are many data repositories, none of them allow just anyone to deposit data [Correction! Several allow this. See the comment below]. ONEShare is meant to be a “catch-all” repository for data owners who have no relationship with an existing repository. Think of it as a sort of Slideshare for data: there is a low bar for participation, and anyone can join.

In a sense, ONEShare is the epitome of the “Open Repositories” concept: a repository that’s truly open to anyone.  Maybe I can represent ONEShare at OR2013 on Prince Edward Island (Oh Canada, how I miss you!).

Strange Uses for Excel

Happy Independence Day (Americans) and a belated Happy Canada Day (Canadians)! We are smack dab in the middle of the lazy days of summer, which means lots of folks are on vacation this week. To honor these lazy days, I’m providing a complement to my post a few months back on “Fun Uses for Excel”. In this edition: strange uses for Excel. I must admit this post will be based primarily on a similar post at chandoo.org that I found fascinating.

  • Floor layouts. Will that couch you love fit underneath the window? You can use Excel as graph paper, mapping out rooms and furniture layouts. This also works for garden layouts and quilt designing (HT @Whitney!)
  • Amuse co-workers with the Speak on Enter feature. On the chandoo.org post, Jeff Weir said:

On those tortuously long work days where the clock seems to be running backward, I often turn on SPEAK ON ENTER and get Excel to speak the words “Take this job and shove it” to my co-workers (It’s actually from a country song, but has famously been covered by the punk group The Dead Kennedys). This really cracks them up. Speak on Enter is one of Excel’s most underrated functions, if you ask me. Why they didn’t put it right there on the ribbon in 2007 is a travesty. In fact, I’m not going to upgrade my version of Excel until they do.

I checked it out, and my Windows version of Excel 2007 has “Speak on Enter”, which you can add to your Quick Access Toolbar by going to “Excel Options” –> “Customize”. Of course, you can also use Excel’s “Speak” features to check data entry. Read more here.

  • Naming kids (?!). It’s actually quite strange how many people mentioned this on chandoo.org as a use for Excel.  Here’s one example from Brian S.:

For each kid, my wife and I separately brainstormed a list of viable first and middle names. I entered them into a workbook to identify any matches. (Thankfully there always have been matches.) Then I had formulas to display all possible combinations of those matches, as well as up to 2 additional “favorites” from each of us. Those results were manually whittled down based on their sound (which combinations appear fine), and whether the associated first/last or first/middle/last initials create an unexpected result. (I, with initials B.S., threw that requirement in.) This always led us to a 1st and 2nd choice. But if necessary, I was ready to move to a Web data extract to determine an additional “name uniqueness” value.

Choose baby names carefully… Nobody wants Jacques Strap for a kid. Moe knew this. (Don’t get it? Read more at http://www.snpp.com/guides/moe_calls.html) From ign.com.
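For the curious, Brian S.’s baby-naming workflow boils down to a set intersection followed by a generate-and-filter step. Here is a rough sketch of that logic in Python rather than Excel formulas. The function name `shortlist`, the sample names, the surname “Smith”, and the banned-initials set are all hypothetical, made up for illustration; the sound-based whittling he describes stays a manual step.

```python
from itertools import permutations


def shortlist(names_a, names_b, favs_a, favs_b, surname, bad_initials):
    """Hypothetical sketch of the name-matching workflow described above.

    bad_initials should contain uppercase strings such as "BS".
    """
    b_set = set(names_b)
    # Names both parents brainstormed independently, kept in list A's order.
    matches = [n for n in names_a if n in b_set]

    # Add up to 2 extra "favorites" from each parent, skipping duplicates.
    pool = list(matches)
    for fav in favs_a[:2] + favs_b[:2]:
        if fav not in pool:
            pool.append(fav)

    candidates = []
    # Every ordered (first, middle) pair of distinct names from the pool.
    for first, middle in permutations(pool, 2):
        first_last = (first[0] + surname[0]).upper()
        first_middle_last = (first[0] + middle[0] + surname[0]).upper()
        # Drop combinations whose initials spell something unfortunate.
        if first_last in bad_initials or first_middle_last in bad_initials:
            continue
        candidates.append((first, middle))
    return matches, candidates


# Hypothetical example: "Beth Smith" would have the initials B.S.
matches, candidates = shortlist(
    ["Ava", "Beth", "Claire"],  # parent A's brainstorm
    ["Claire", "Beth", "Dora"],  # parent B's brainstorm
    ["Eve"],  # parent A's extra favorites
    ["Ava"],  # parent B's extra favorites
    "Smith",
    {"BS"},
)
print(matches)  # ['Beth', 'Claire']
print(len(candidates))  # 9
```

Here the pool is Beth, Claire, Eve, and Ava, giving 12 ordered first/middle pairs; the three that start with Beth get dropped for their B.S. initials, leaving 9 to read aloud and argue over.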