Data sharing is often much easier said than done. In the past, researchers created large and valuable databases that would often languish on a university’s server, fading into oblivion after the post-doc or graduate student who created them had moved on. It has actually been shown that, in the field of ecology, the likelihood of ever being able to access a dataset again decreases by 17% every year.

While that study is specific to a particular field, I can imagine some level of data loss in every field. Even if data were described in a publication, there was no easy way for an outside researcher to access them, or even to know whether they would be useful in a new study. The times they are a-changing. There are now multiple venues for sharing data openly. One example is Figshare, where you can publicly share as much data as you like and also have a private data storage option. This service truly represents open access and the DIY mentality, with “collaboration spaces” where multiple groups can work together on projects.

Now the journal Nature is launching a new initiative to catalog and describe data resources and make them widely available.

Scientific Data is the name of their new open-access, online-only publication of scientifically valuable datasets. This new system will allow researchers to publish their datasets outside of the traditional publishing system. They will be able to get a citation for their data even if the data never led to a paper. Each publication will include something called a Data Descriptor, which is a detailed account of how the data were collected and analyzed, so that the data can be combined with other similar data or the work can be replicated by following the description. These publications are also associated with the Nature brand name… which I think will definitely influence the number and variety of data contributors.

The data will be peer-reviewed by at least one expert in the experimental science and one expert in data standards. This may help to keep poor-quality data out of the collection. It will also raise the cachet of the Nature data-sharing enterprise. Whether that is for the better of the scientific enterprise remains to be seen. All of the collected datasets will be fully searchable to help researchers identify other relevant or complementary data that they could be using in their experiments.

By making this resource open-access, Nature is encouraging replication and access by all, including those outside of bench research. I think this is one area where Nature will excel: attracting non-scientists to the data. They are also committing themselves to a high standard of speedy publication and high-quality data management. This system seems like another way to increase transparency in data analysis and provide more opportunities for replication with fewer grant dollars. In our current system, there is no way to know if someone has a dataset hanging around that would be perfectly suited to help you in your research. A system like Scientific Data can also help researchers meet the data-sharing expectations of their grant funders. For example, most, if not all, NIH grants require sharing of data.

More data for everyone!!


