This week Science for the People is looking at how powerful computers and massive data sets are changing the way we study each other, scientifically and socially. We’re joined by machine learning researcher Hannah Wallach to talk about the definition of “big data” and social science research techniques that use data about individual people to model patterns in human behavior. Then, we speak to Christian Rudder, co-founder of OkCupid and author of the OkTrends blog, about his book Dataclysm: Who We Are (When We Think No One’s Looking).
*Josh provides research & social media help to Science for the People and is, therefore, completely biased.
Data-sharing is often much easier said than done. In the past, researchers created large and valuable databases that would often languish on a university server, fading into oblivion after the particular post-doc or graduate student who created them had moved on. It has actually been shown that in the field of ecology, the likelihood of ever being able to access a study’s data again decreases by 17% every year.
While that study is specific to a particular field, I can imagine some level of data loss in every field. Even if data was described in a publication, there was no easy way for an outside researcher to access it, or even to know whether that particular data would be useful in their new study. The times they are a-changing.
Tycho Brahe, Image from Wikipedia
I just heard about a new “big data” project called Project Tycho. They chose the name Tycho in honor of Tycho Brahe, who made tons of detailed observations of the stars and planets. After his death, his data was used by Kepler to formulate the laws of planetary motion. This project wants to connect the vast amounts of public health data to scientists and policy researchers to improve their understanding of contagious diseases and their spread. Their undertaking is incredible; they digitized weekly Nationally Notifiable Disease Surveillance System reports from 1888 to 2013. Now that all of the data is digitized, they are working their way through standardizing it and making it amenable to analysis. The entire dataset can be searched online.
To me, the take-home message from David Brooks’ article “What You’ll Do Next” and Tyler Cowen’s follow-on comment is that “Big Data” is a potentially useful tool, but on its own it is not a coherent or inspiring approach to life.
The ENCODE media fail was epic enough that it totally dominated the discussion when the results were released to the public. Now that our collective fury has abated, I’d like to talk not about what ENCODE did, but about what it might mean for how we conduct genomic research in the future.
ENCODE produced an unprecedented amount of data with unprecedented levels of reproducibility between labs. This data will be useful to researchers around the world for years to come. To do so, however, it commanded tremendous resources and marginalized the concerns of independent researchers. Can we harness the data collection power of these collective projects without destroying the creativity and risk-taking of individual scientists in the crucible of collaborative compromise?