Thoughts on Genomics

Next Generation of Healthcare

This is a blog post of thoughts on what the next generation of healthcare may look like. Some of these ideas are currently being worked on by universities, myself, family and colleagues, and some by enterprises.

The human genome was declared fully mapped in 2003, shortly after the .com boom/bust era. As a historic milestone in the progression of humanity, we finally had the basic blueprints for what makes us. This mapped genome has become one of the standards against which new genomes are mapped – more commonly called a reference genome. Since then, we’ve made leaps and bounds, not only in understanding how humans function but also in the ways we treat and medicate ourselves.

One of the challenges back then was that we didn’t have enough compute power to do genome mapping at scale. While we still face similar challenges today, we have much greater access to scalable resources to make these mappings possible. There are many projects in flight right now, such as the 1000 Genomes Project, that want to take our knowledge to the next level by mapping massive numbers of genomes into one database so that scientists and geneticists can find meaningful patterns. While this is the next progressive step in science, I’ll argue that the consumer-facing side – that is, healthcare – is lagging behind this next wave of tech.

Some people may have heard of the buzzword “1:1 healthcare,” where patients get personalized care to cure them of their issues. Much of today’s medicine consists of one-size-fits-all drugs that go through clinical trials en masse to find out whether they work for the majority. To me, the 1:1 buzzword is where we will eventually go, and I hope that some of the work people are doing will accelerate that. I believe the next generation of healthcare will break down to something like this:

  1. Acceleration of Mapping, Analysis, and Storage
  2. Genomic Cloud, Machine Learning, Large Scalability
  3. Computational Drug Discovery, Personal Gene Sequencing, Streaming Genome Mapping
  4. On-Demand Personalized Healthcare

If we were to place the current state of advanced healthcare backed by genomics and computing on the technology adoption curve, we would be sitting just at the early adopters portion. Even there, many of the next-generation technologies that are going to change the way healthcare functions are still being developed by universities. There are some organizations building the technology (Illumina or Synthetic Genomics), but their purchasers are governments, universities, or special divisions of enterprises; they are not close to mass-market adoption. I believe that the above four items will usher in a truly new healthcare system. Let’s look at them further.

Acceleration of Mapping, Analysis, and Storage

It took roughly 15 years and around $3B to map the first human genome. It now takes hours, and in some cases that my Dad has demonstrated, minutes. In future runs, he’s looking to map it in seconds, reducing the compute costs to cents. As genome mapping software becomes able to run in parallel, we’ll see a dramatic decrease in the time it takes to map. My Dad mapped the genome in 8 minutes and 43 seconds on ~9,000 threads and is looking to scale up to 300,000 threads. We’ll also see the analysis of the genome begin to take seconds. Many types of analysis that scientists and geneticists perform today take hours to run one query of the many needed to get meaningful insight. With new frameworks designed to handle massive data sets, such as Hadoop or Spark, analysis time will be reduced to seconds.
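The reason mapping parallelizes so well is that individual sequencing reads can be aligned independently of one another. As a minimal sketch (not the pipeline described above – real aligners use indexed references, not naive substring search), splitting reads across worker threads looks something like this:

```python
# Toy illustration of why genome mapping parallelizes: each read can be
# aligned to the reference independently, so reads fan out across workers.
from concurrent.futures import ThreadPoolExecutor

REFERENCE = "ACGTACGTGGTCCAGTACGTTAGC"  # stand-in for a 3-billion-base genome

def map_read(read):
    """Return the position of an exact match in the reference, or -1."""
    return REFERENCE.find(read)

def map_reads_parallel(reads, workers=8):
    # Each read is an independent task; real systems scale this to
    # thousands of threads across a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(map_read, reads))

positions = map_reads_parallel(["ACGTG", "GGTCC", "TTTTT"])
print(positions)  # [4, 8, -1] -- two exact matches, one unmapped read
```

The same fan-out idea is what lets a mapping run spread over ~9,000 threads today and, in principle, over hundreds of thousands.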

These large datasets are arguably where our biggest barrier lies: data storage. Each genome mapping can produce roughly 30 GB to 100 GB and requires high I/O bandwidth within the server clusters. We will need to scale to writing potentially petabytes of genomic data to one data store while simultaneously querying and analyzing at the same scale. This is the next big problem we will face when running at scale, and one that may be solved by new data structures. Each of these areas provides a leap of improvement, setting us up well for the next wave of challenging problems.
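To put a number on that barrier, a back-of-the-envelope calculation using the 30–100 GB per genome figure above shows how quickly a population-scale database lands in petabyte territory (the million-genome count here is a hypothetical, not a figure from any specific project):

```python
# Back-of-the-envelope storage estimate using the 30-100 GB per genome
# range quoted above; raw output varies widely by coverage and file format.
GB_PER_GENOME_LOW, GB_PER_GENOME_HIGH = 30, 100
genomes = 1_000_000  # hypothetical million-genome database

low_pb = genomes * GB_PER_GENOME_LOW / 1_000_000    # GB -> PB
high_pb = genomes * GB_PER_GENOME_HIGH / 1_000_000  # GB -> PB
print(f"{low_pb:.0f}-{high_pb:.0f} PB")  # 30-100 PB
```

That is 30–100 petabytes for a single collection, before replication or indexes – which is why storage, not compute, may be the harder half of the problem.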

Genomic Cloud, Machine Learning, Large Scalability

Many private corporations and open government collaborations are creating their own genome cloud databases; however, they are not doing so at true scale. With advances in genomic mapping, we’ll see these databases explode in size and capability, helping unlock key secrets of how the human body works. Currently, patterns must be found through manual analysis and specialized querying, which causes inefficiency – especially at scale. These are often one-off techniques (such as pathway interactions) or manually coded queries. With the previously described advances in database structures, we will be able to start using machine learning techniques – in-depth pattern recognition, regression analysis, clustering methods, and more – to provide “always on” analysis. There are already open source projects, such as ADAM from Big Data Genomics, that are providing the building blocks for smarter genomics.
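To make the “automated pattern-finding” idea concrete, here is a deliberately tiny sketch: grouping genomes by how many variants they share, using Jaccard similarity on variant sets. The sample names and variant IDs are invented for illustration; real pipelines (e.g. ADAM on Spark) run this kind of comparison across billions of variants.

```python
# Toy sketch of automated pattern-finding: measure how similar two
# genomes are by the overlap of their variant sets (Jaccard similarity).
def jaccard(a, b):
    """Similarity between two variant sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

# Hypothetical samples with made-up variant identifiers.
genomes = {
    "sample1": {"rs1", "rs2", "rs3"},
    "sample2": {"rs1", "rs2", "rs4"},
    "sample3": {"rs7", "rs8", "rs9"},
}

# Pairwise similarity shows sample1 and sample2 cluster together.
sim12 = jaccard(genomes["sample1"], genomes["sample2"])
sim13 = jaccard(genomes["sample1"], genomes["sample3"])
print(sim12, sim13)  # 0.5 0.0
```

An “always on” system would recompute this kind of similarity continuously as new genomes stream into the database, surfacing clusters without anyone hand-writing a query.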

By providing these building blocks, we’ll reduce the analysis bottleneck through automation, which means we can increase the number of genomes going into the database. This is the key point that gives us truly scalable insight into these collections of genomes. The scientific and genetics communities will see a transition from serial, single-machine computing to massively parallel processing (MPP), where genomes are mapped on container- and thread-based computing architectures. This will mean hundreds of genomes mapped per minute (or even per second) versus days. One of the projects my Dad is working on specifically aims at scaling a single genome mapping to 300,000 threads; however, we are looking into options to scale genome mappings to 1M+ threads split across 10 concurrent genomes.

Computational Drug Discovery, Personal Gene Sequencing, Streaming Genome Mapping

In the past 5-7 years there has been an advancement in “computational drug discovery,” which basically means building software that emulates how a drug would interact with a protein or enzyme. In the coming years, we’ll see genomes in databases gain computational properties, paired with additional data (such as blood type analysis), that react to computationally created drugs – dramatically reducing the cost and time it takes to run clinical trials. Enterprises will be able to access these databases, segment them based on the criteria the drug is trying to tackle, and test the drug virtually to reduce trial and error. This will be imperative, as well as cohesive, because personal gene sequencing will start to become widespread, with devices affordable enough for the masses and small enough to fit in a home. These devices will include a variety of services, such as blood testing paired with gene sequencing.
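The core loop of in-silico screening can be caricatured in a few lines: score each candidate compound against a target and rank them before anything touches a lab. Everything below is invented for illustration – real docking engines model 3-D geometry, chemistry, and energetics, not flat feature sets.

```python
# Caricature of computational drug screening: rank hypothetical compounds
# by how well their binding features overlap a target site's features.
TARGET_SITE = {"hydrophobic", "h-bond-donor", "aromatic"}  # made-up target

candidates = {
    "compound_a": {"hydrophobic", "h-bond-donor"},  # hypothetical compounds
    "compound_b": {"polar"},
}

def score(features):
    """Fraction of the target site's features the compound matches."""
    return len(features & TARGET_SITE) / len(TARGET_SITE)

ranked = sorted(candidates, key=lambda c: score(candidates[c]), reverse=True)
print(ranked)  # ['compound_a', 'compound_b']
```

The payoff described above comes from running this loop against millions of virtual compounds and patient genome segments, so that only the most promising candidates ever reach a physical trial.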

Sequencing devices today are small enough to fit on top of a desk, but costs range between $80,000 and $500,000 (sometimes more). I believe we’ll see software start to drive the hardware, accelerating cost reduction: simpler, more accessible software could increase adoption of and interest in the field. Genomics hardware providers will be forced to drive prices down to a more reasonable level, which is another potential trigger for next-generation healthcare.

With the advancements in personal gene sequencing, there will be a need for streaming genome mapping, where individuals receive results in seconds, on the fly. It’s conceivable that in the near future people will use a personal device to map their genome, do blood work, and much more, with personalized treatment or recommendations coming back in seconds. We’ve already seen pushes in this direction since around 2013-2014. This will require incredibly large computing centers, higher bandwidth (think Google Fiber), and tighter encryption than HIPAA currently requires. This may look like two-factor authentication, hashed user values, or other peer-to-system encryption methods. With personal health data being pushed anonymously to the cloud, enterprises will then be able to execute on the final stage of 1:1 healthcare.
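The “hashed user values” mentioned above could, in the simplest form, look like a salted one-way hash: records stay linkable across visits without the raw identity ever being stored. This is an illustrative sketch only – real HIPAA-grade de-identification involves far more than hashing an ID, and the identifier and salt below are made up.

```python
# Minimal sketch of pseudonymizing a patient identifier with a salted
# SHA-256 hash. The same id + salt always yields the same token, so
# records can be linked, but the token cannot be reversed to the id.
import hashlib

def pseudonymize(patient_id: str, salt: str) -> str:
    return hashlib.sha256((salt + patient_id).encode()).hexdigest()

# Hypothetical identifier and salt, for illustration only.
token = pseudonymize("patient-12345", salt="clinic-secret-salt")
print(len(token))  # 64 hex characters, stable for the same id + salt
```

A different salt (say, per clinic) produces an unlinkable token for the same patient, which is one knob for trading analytical utility against privacy.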

On-Demand Personalized Healthcare

Healthcare providers, pharmaceutical companies, academia, and other enterprises will be able to access massive vaults of anonymous user health data to provide more precise and actionable medical care to the masses. With a constant stream of data, tracking viruses such as the flu will help us understand potential mutation patterns more rapidly than ever before. But the holy grail is to do this with known individual patients as they come in. This is where personal health devices will come into play. When a user utilizes their personal sequencing device, we’ll be able to quickly give doctors on-the-fly recommendations for the best option for treating the patient, with the doctor making the ultimate call on what the patient should be given. This could even happen remotely, without the user ever having to set foot in the clinic, which would reduce overhead for doctors, increase profitability, and increase the throughput of patients globally. Beyond those benefits, the patient gains unprecedented access to a new level of personalized healthcare: drugs tailored to their genetic makeup, health plans suited to their needs, and constant monitoring to ensure the success of the care offered.

There are obvious challenges that vary in difficulty to execute. We have many of these systems somewhat in place today, but they are disparate and lack a common infrastructure that can serve the next generation of demands. The largest challenges will lie in scaling computing architectures, data storage, advancing software, and the cost of delivering mass-market healthcare devices. Fortunately, many of these areas are being worked on currently and progressing smoothly. It wouldn’t be inconceivable to see many of them mature within 10 years for the masses to use. While the limitations today are on the business and computing side, there will also need to be a mindset shift among patients and users toward providing truly private data to the cloud – one that will take time, political shifts, regulation changes, and plenty of oversight.

There’s a bright future for the healthcare industry, with its ability to be massively disrupted in a way that emphatically benefits the consumer. It’s only a matter of time until this level of execution gets the funding and traction to make it happen.