Thoughts on Technology

Why open source accelerates all industries


It seems to be a trend lately for companies to give back to the community through open source. I, for one, absolutely love this trend. I’m a huge advocate of big companies taking the initiative to contribute back to the communities they often pull their software or ideas from. To me, there are three main benefits of open source: community, adoption, and innovation.

Community

Fun fact: pretty much all the tech you use has some open source software in it. And the only way for OSS to exist is through a community. It’s incredible to see what a few people can come up with as a grassroots idea and how strong the gravity around them can become. Developers gravitate towards trying out new software and often enjoy simplicity paired with extensibility. Open source gives them the sandbox that many love. That sandbox is where a community starts to form: users congregate to share ideas, projects they built with the OSS code, use cases, their contributions back, and so on. Often these communities gain massive momentum, as with the open source CMS Drupal, where tens of thousands of users review the source code, add unique functionality, and create extensions of the platform. Through these contributions we see huge innovation at play as users push the limits of technology in a positive direction. Once this type of momentum is generated, the massive adoption phase follows.

Adoption

The thing I love most about OSS is the moment adoption hits critical mass and takes off. A few years ago, Spark (an Apache cluster computing framework, also OSS) was a small-scale project whose founders wanted a simpler, faster way to compute over lots of data. With the big data boom, more and more users benefited from the scalability and speed it provided. As adoption increased significantly over the next few years, so did the diversity and stability of the code. Fast forward to today and Spark is becoming the standard for truly big data solutions. It has gained enough traction to land a commitment from IBM of over 3,500 developers to help contribute to the code base. OSS powers the world and gives all users the ability to tailor it to their needs. When users tailor a solution to fix their own problem, we often see the biggest innovations surface.

Innovation

A common term in the OSS world is “forking”. No, this isn’t forking someone’s lawn for a prank. Rather, you can think of it as copying the original code base and mutating it into something different. Many of the core principles of the code may still be intact, but there will often be fundamental differences. This is a critical junction, because it is where massive innovation happens. When users start reworking the code to be more performant, more scalable, or whatever else their use case requires, we see competition happening. Through competition comes innovation. The true benefit of OSS is that it allows rapid disruption to take place. Back in the day, there was no easy way to just launch a system designed for handling massive amounts of data. Today, we’re in a much better spot, with projects like Mahout, Hadoop, Spark, and many others providing different types of solutions. This is what makes the OSS world so great: it provides healthy competition from which the end users (both businesses and developers) benefit.

The cycle keeps repeating from this point, pushing innovation forward. This isn’t limited to software, though. Other industries are starting to open up proprietary IP in an effort to contribute and drive adoption. Tesla Motors is a great example: it opened up its patents for use by all. With software roots and mentality in mind, I’m hoping that we’ll see more and more companies offer up unique parts of their business as open source.

Next Generation of Healthcare


This is a blog post of thoughts on what the next generation of healthcare may look like. Some of these ideas are currently being worked on by universities, myself, family and colleagues, and some by enterprises.

The human genome was declared fully mapped and complete in 2003, shortly after the dot-com boom and bust. It was a historic milestone in the progression of humanity: we finally had the basic blueprints for what makes us. This mapped genome has become one of the standards to map new genomes against, more commonly called a reference genome. Since then, we’ve made leaps and bounds not only in understanding how humans function but also in the ways we treat or medicate ourselves.

One of the challenges back then was that we didn’t have enough compute power to do genome mappings at large scale. While we still face similar challenges today, we have much more access to scalable resources that make these mappings possible. There are many projects in flight right now, such as the 1000 Genomes Project, that aim to take our knowledge to the next level by mapping massive numbers of genomes into one database so that scientists and geneticists can find meaningful data. While this is the next progressive step in science, I’ll argue that the consumer-facing side, that is, healthcare, is lagging behind this next wave of tech.

Some people may have heard of a buzzword called “1:1 Healthcare”, where patients get personalized healthcare to cure them of their issues. Much of today’s medicine consists of one-size-fits-all drugs that go through clinical trials en masse to find out whether they work for the majority. To me, 1:1 healthcare is where we will eventually go, and I hope that some of the work people are doing will accelerate that. I believe that the next generation of healthcare will break down to something like this:

  1. Acceleration of Mapping, Analysis, and Storage
  2. Genomic Cloud, Machine Learning, Large Scalability
  3. Computational Drug Discovery, Personal Gene Sequencing, Streaming Genome Mapping
  4. On-Demand Personalized Healthcare

If we were to put the current state of advanced healthcare backed by genomics and computing on the technology adoption curve, we would be sitting just at the early adopters portion. Even there, many of the next generation technologies that are going to change the way healthcare functions are still being developed by universities. There are some organizations building the technology (Illumina or Synthetic Genomics), with their purchasers being governments, universities, or special divisions of enterprises; however, they are not close to mass market adoption. I believe that the above four items will usher in a truly new healthcare system. Let’s look at them further.

Acceleration of Mapping, Analysis, and Storage

It took roughly 13 years and around $3B to map the first human genome. It now takes hours and, in some cases that my Dad has demonstrated, minutes. In future runs, he’s looking to map it in seconds, reducing the compute costs to cents. As genome mapping software switches to running in parallel, we’ll see a dramatic decrease in the time it takes to map. My Dad mapped the genome in 8 minutes and 43 seconds on ~9,000 threads and is looking to scale up to 300,000 threads. We’ll also see the analysis of the genome begin to take seconds. Many types of analysis that scientists and geneticists perform today take many hours to run just one of the many queries needed to get meaningful insight. With new data processing frameworks designed to handle massive data sets, such as Hadoop or Spark, the analysis time will be reduced to seconds.
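
To get a feel for what the thread counts above could mean, here is a back-of-the-envelope sketch. It assumes perfect linear scaling (total thread-seconds stay constant as threads are added), which is an idealization of mine and not a measured result; real runs lose time to I/O, coordination, and any serial portion of the work. The function name is made up for illustration.

```python
# Back-of-the-envelope estimate only: assumes perfect linear scaling,
# which real mapping runs will not achieve.

def ideal_runtime_seconds(base_seconds, base_threads, target_threads):
    """Runtime if the total work (thread-seconds) stays constant."""
    return base_seconds * base_threads / target_threads

base_seconds = 8 * 60 + 43   # 8 minutes 43 seconds, from the post
base_threads = 9_000         # "~9,000 threads", from the post
target_threads = 300_000     # the scale-up target, from the post

estimate = ideal_runtime_seconds(base_seconds, base_threads, target_threads)
print(f"Ideal runtime at {target_threads:,} threads: {estimate:.1f} s")
```

Under that idealized assumption the run would drop from about 523 seconds to roughly 16 seconds, which is why "seconds" is a plausible goal, though only as a best case.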

These large datasets are arguably where our biggest barrier lies: data storage. Each genome mapping can produce roughly 30GB to 100GB and requires high I/O bandwidth within the server clusters. We will need to scale to potentially writing petabytes’ worth of genomic data to one data store while simultaneously querying and analyzing at the same scale. When running at scale, this is the next big problem we will face, and it can potentially be solved with new data structures. Each of these areas provides leaps of improvement, which sets us up well for the next wave of challenging problems to solve.
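
The storage numbers above can be turned into a quick sizing sketch. It uses only the 30GB to 100GB per-mapping range from this post and assumes decimal units (1 PB = 1,000,000 GB); it ignores compression, replication, and indexes, so treat it as rough arithmetic rather than a capacity plan.

```python
# Rough sizing arithmetic; ignores compression, replication, and indexes.
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 1,000,000 GB (an assumption)

def genomes_per_petabyte(gb_per_genome):
    """How many whole genome mappings of a given size fit in one petabyte."""
    return GB_PER_PB // gb_per_genome

smallest_gb, largest_gb = 30, 100  # per-mapping output range from the post

fewest = genomes_per_petabyte(largest_gb)   # every mapping at the large end
most = genomes_per_petabyte(smallest_gb)    # every mapping at the small end
print(f"One petabyte holds between {fewest:,} and {most:,} genome mappings")
```

In other words, a single petabyte covers somewhere between 10,000 and about 33,000 mappings, so a database of millions of genomes would land well into the hundreds of petabytes.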

Genomic Cloud, Machine Learning, Large Scalability

Many private corporations and open government collaborations are creating their own genome cloud databases; however, they are not doing so at true scale. With advances in genomic mappings, we’ll see these databases explode in size and capabilities, helping unlock many key secrets of how the human body works. Currently, patterns must be found through manual analysis and specialized querying, which is inefficient, especially at scale. These are often one-off techniques (such as Pathway Interactions) or manually coded queries. With the previously described advances in database structures, we will be able to start using advanced machine learning techniques for in-depth pattern recognition, regression analysis, clustering, and more, providing “always on” analysis. There are already open source projects, such as ADAM from Big Data Genomics, that are providing the building blocks for smarter genomics.

By providing these building blocks, we’ll reduce the analysis bottleneck through automation, which means we can increase the number of genomes going into the database. This is the key point: it gives us the ability to create truly scalable insight into these collections of genomes. The scientific and genetics community will see a transition from serial, single-structure computing to massively parallel processing (MPP), where genomes will be mapped on container- and thread-based computing architectures. This will mean hundreds of genomes mapped per minute (or even per second) rather than over days. One of the projects my Dad is working on specifically targets scaling a single genome mapping to 300,000 threads; however, we are also looking into options to scale genome mappings to 1M+ threads split across 10 genomes mapped concurrently.
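
To make the idea of mapping in parallel a little more concrete, here is a toy sketch. It is not the software described in this post: the reference string and reads are made up, "mapping" is reduced to an exact substring search, and a small thread pool stands in for thousands of threads. (In CPython, threads illustrate the structure but will not speed up CPU-bound work; a real mapper would use native code or many processes and machines.)

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up toy data, not real genomic sequences.
REFERENCE = "ACGTACGTTAGCCGATAGGCTTAACG"
READS = ["TAGCCG", "GGCTTA", "ACGTAC", "TTTTTT"]

def map_read(read):
    """Return (read, position of first exact match in REFERENCE), -1 if none."""
    return read, REFERENCE.find(read)

def map_reads_parallel(reads, max_workers=4):
    """Map each read independently; reads do not depend on one another."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input reads.
        return list(pool.map(map_read, reads))

results = map_reads_parallel(READS)
for read, position in results:
    print(read, "->", position if position >= 0 else "unmapped")
```

The point of the sketch is that each read can be mapped without knowing anything about the others, which is what lets the work spread across very large thread counts.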

Computational Drug Discovery, Personal Gene Sequencing, Streaming Genome Mapping

In the past 5-7 years there have been advances in “computational drug discovery”, which basically means building software that emulates how a drug would interact with a protein or enzyme. In the coming years, we’ll see genomes in databases gain computational properties, paired with additional data (such as blood type analysis), that react to computationally created drugs, which will dramatically reduce the cost and time it takes to run clinical trials. Enterprises will be able to access these databases, segment them based on the criteria that the drug is trying to tackle, and test out the drug to reduce the trial and error. This will be imperative, and complementary, because personal gene sequencing will start to become widespread, with devices that are affordable enough for the masses and small enough to fit in a home. These devices will include a variety of services, such as blood testing paired with gene sequencing.

Sequencing devices today are small enough to fit on top of a desk, but they cost between $80,000 and $500,000 (sometimes more). I believe we’ll see software start to drive the hardware: simpler, more accessible software could increase adoption of and interest in the field, which in turn would accelerate the reduction in costs. Genomics hardware providers will be forced to drive prices down to a more reasonable level, which will be another potential catalyst for next generation healthcare.

With the advancements in personal gene sequencing, there will be a need to create streaming genome mappings, where individuals receive results on the fly, in seconds. It’s conceivable that in the near future people will be able to use this personal device to map their genome, do blood work, and much more, with personalized treatment or recommendations coming back in seconds. We’re already seeing pushes in this direction as of the 2013-2014 range. This will require an incredibly large computing center, higher bandwidth (Google Fiber), and tighter encryption than HIPAA currently requires. This may look similar to two-factor authentication, hashed user values, or other methods of peer-to-system encryption. With personal health data being pushed anonymously to the cloud, enterprises will then be able to execute on the final stage of 1:1 healthcare.
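
As one possible reading of "hashed user values", here is a minimal sketch that replaces a user identifier with a keyed hash (HMAC-SHA256) from Python's standard library before anything is uploaded. The identifier, key handling, and function name are hypothetical. This is pseudonymization rather than true anonymity: a genome can itself identify a person, so it does not by itself meet HIPAA or any other standard.

```python
import hashlib
import hmac
import secrets

def pseudonymize(user_id, secret_key):
    """Replace a user ID with a keyed hash so the raw ID never leaves the device."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key; in practice it would be generated once and stored securely.
device_key = secrets.token_bytes(32)

token = pseudonymize("patient-0001", device_key)  # "patient-0001" is made up
print(token)  # 64 hex characters; differs for every key
```

The same ID and key always produce the same token, so records from one person can still be linked together, while anyone without the key cannot recompute the token from the ID.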

On-Demand Personalized Healthcare

Healthcare providers, pharmaceutical companies, academia, and other enterprises will be able to access massive vaults of anonymous user health data to provide more precise and actionable medical care to the masses. With a constant stream of data, tracking viruses such as the flu will help us understand potential mutation patterns more rapidly than ever before. But the holy grail is to do this with known individual patients as they come in. This is where these personal health devices will come into play. When a user uses their personal sequencing device, we’ll be able to quickly identify the best option for curing the patient and recommend it to doctors on the fly, with the doctor making the ultimate call on what the patient should be given. This could even happen remotely, without the user ever having to set foot in the clinic, which will reduce overhead for doctors, increase profitability, and increase throughput of patients globally. Apart from those benefits, the patient will gain unprecedented access to a new level of personalized healthcare: drugs tailored to their genetic makeup, health plans suited to their needs, and constant monitoring to ensure the success of the care offered.

There are obvious challenges that vary in difficulty to execute. We have many of these systems somewhat in place today, but they are disparate and lack a common infrastructure that can meet the next generation of demands. The largest challenges to overcome will lie in scaling computing architectures, data storage, advancement of software, and the cost of delivering mass healthcare devices. Fortunately, many of these areas are being worked on currently and are progressing smoothly. It wouldn’t be inconceivable to see many of these areas become developed enough for the masses to use within 10 years. While the limitations are currently on the business and computer side, patients and users will also need a shift in mindset toward providing truly private data to the cloud, which will take time, political shifts, regulation changes, and plenty of oversight.

There’s a bright future for the healthcare industry, with its potential to be massively disrupted in a way that emphatically benefits the consumer. It’s only a matter of time before this level of execution gets the right funding and traction to make it happen.