Some context to Big Data – the topic for HealthStartup III
HealthStartup III, scheduled for June 2012 in the Netherlands, will focus on Big Data in healthcare. In this article we’d like to provide a little context, explaining what we mean by Big Data, the types of startups we’re looking for, and the questions we will likely be addressing in our discussions.
Big Data defined
Digital health companies are by definition data-driven businesses, so we need to make a clear distinction between digital health in a general sense and Big Data. With Big Data we’re referring to the collection, management and analysis of massive (hence ‘big’) data sets. Some commentators go further and define Big Data as any dataset that is too large for a typical database software tool to store, manage and analyze. Or in the words of Brad Peters in a recent Forbes article, Big Data is any “body of information that is so big it cannot be analyzed directly for profitable use in its raw form.”
For our purposes, we’ll stick to the somewhat broader definition of massive data sets or combined data sets that are being put to use in new ways. To use an example, we wouldn’t describe a medical record system of a large hospital, or even a national breast cancer database, as Big Data since these datasets are highly structured and built to serve a specific purpose. It is when we start combining these disparate datasets or analyzing them for new purposes (for which they weren’t originally designed – and hence require data structuring and analytical technologies that can make sense of this data in new ways) that we begin to talk about Big Data.
Thanks to Moore’s law we are gaining the ability to collect, process and analyze increasingly massive amounts of data, and the cost of doing so is decreasing too. As Daniel Kraft points out in a recent article (touting Big Data as the next big thing in healthcare), a key example is the human genome and genomic sequencing. Ten years ago it cost a billion dollars to sequence a human genome. Today it can be done for under $1,000 (via a pilot program from 23andMe) and the cost is expected to continue declining. This means that the number of people who will be sequenced will increase exponentially in the coming years. Imagine the opportunities for gaining new insight when all this data becomes accessible for analysis in a crowd-sourced way, or when this data is combined with other data, such as pharmaceutical research data or medical record data.
In parallel, we are in the early stages of the Internet of Things. Cisco predicts that there will be 50 billion devices connected to the internet by 2020; IBM thinks there will be 1 trillion connected devices. Many of these devices will be collecting health data, from high tech medical devices and implants to consumer technologies that track your day-to-day health-related behaviours.
Imagine when all this data (medical records, genome sequences, public health data, self-monitoring data) becomes available for analysis on a macro-scale, when disparate datasets are combined to create new insights and new services, when ‘open data’ platforms and open middleware platforms (connecting disparate sensing devices at one end, and a multitude of apps at the other end) become available to startups to develop a myriad of new tools and services. That’s the promise of Big Data.
Already many health startups have Big Data ambitions. They may start out by offering a reasonably straightforward (self-)monitoring service (e.g. analyse your running, sleeping, eating, symptoms, pain, etc.) but most also look forward to the day when their aggregated customer data can be put to use to develop benchmarks and new services. Patient communities such as CureTogether and PatientsLikeMe illustrate this well, in the way they’re analyzing their aggregated users’ data and coming up with new medical insights.
Yet other Big Data enthusiasts lie in wait for third-party datasets to become available for query and analysis. For example, health data collected by governments, public health authorities or national healthcare services should increasingly become available as ‘open data’ (obviously anonymized to protect patient privacy), allowing startups to start building tools and services on the back of that data (or by combining it with their own data).
There are also tremendous opportunities in solving the technical challenges inherent in Big Data, that is, in building the technologies required to collect, integrate, store and analyse data on a massive scale.
The promise of Big Data is exciting (want more inspiration? Check out this TED talk), but there are challenges too. In the Forbes article mentioned earlier, Brad Peters argues that the Big Data optimists assume two things: firstly, that the right tools can be found to collate and analyse all of this disparate data in an efficient way, and secondly, that there really is valuable information to be extracted from all of this raw data, or at least sufficiently valuable information to justify the cost of this endeavor. Peters suspects that these two criteria have not yet been met. In the longer term, though, Peters remains an optimist:
“My gut tells me that Big Data is real, that we will find the right tools to explore it (and that they will be available to almost everyone via the Cloud), that we will find useful results almost from the beginning – and that will grow even richer the deeper we drill. And that a whole second generation of tools will convert those results into content that will form the basis for a whole new boom of entrepreneurial start-ups. There is too much historic precedent – such as the work of epidemiologists in the eighteenth century looking at illness tables – to argue otherwise.
But my hunch is also that all of this will take at least five years, and maybe more, to pull off. And that we will need every bit of that half-decade, and probably a whole lot more, to deal with the larger cultural implications of Big Data.”
Healthcare also presents some unique challenges since much health data is personal, sensitive and highly regulated. How do we convince patients and consumers to make their health data available? Will health data become a currency of sorts? How do we convince the medical establishment to start sharing data? And who owns such data, including data collected by medical devices? Should (anonymized) health data be made ‘open source’; should health data liberation be obligatory? How valuable is health data? What can policymakers do to facilitate this debate, this process? Food for thought and discussion – see you in June in The Netherlands!