In this week’s post, I was asked to talk about running a genetics lab: what it takes, the challenges, how I got into it, and all the fun stuff around it. As a reminder, this series stems from conversations with college students around wild questions that they want to learn more about. I have an hour to read up on the subject and an hour to write about it. Most of the questions originate in places that I’ve thought or talked about before. This week, we’re talking about genetics and my genome sequencing facility!
I’ve written about aspects of this before, though not necessarily from the business side. I can say without a doubt that it is a very different and unique domain to do a startup in compared to software, as there are a lot more pitfalls to face.
To start out with, how exactly did we (my family) end up getting into this world? It all started in late 2014. My dad, who has a PhD in Biology and Bioinformatics, was working on making human whole-genome alignments against a reference genome faster and more parallel. The problem was that it would take hours to align a genome to a reference, leaving researchers waiting around forever to get results.
We talked a lot about this and discovered NVBio, a GPU-accelerated library for sequence alignment. Over the next year, we found that we could align 32 full 30x coverage human genomes in just 25 minutes by leveraging GPUs. It was super promising! At the time, I was working at a truly big data company and we started to explore the idea of building a bioinformatics cloud. The intention was to create a hyper-performant cloud infrastructure designed just for genomics so that researchers could easily do large-scale population analysis in seconds.
We spent the next year doing a ton of validation with researchers across the globe to see if the problem had legs. The good news was that the problem was definitely there. The bad news was that nearly everyone we talked to wouldn’t pay for a solution. Why? A few reasons.
- Bioinformatics stems from an academic world and there’s a lack of value perception. (“Why pay for it when I can just get it for free?”).
- The budgets definitely weren’t there for universities, and organizations doing genomics at this scale already had their own infrastructure with teams to manage it.
- Genomics deals with massive data sets and moving that data around on public infrastructure isn’t feasible (time-wise). Upload bandwidth massively constrained this.
- Lastly, it doesn’t matter how good your infrastructure is…if your data sucks, you’ll get garbage results.
This last point is where the rubber met the road for us. After first-hand experience with Colorado State University’s genomics lab practices, as well as interviewing many other genomics labs, pharmaceutical companies, clinical researchers, and the like, we found that most produced shit-quality data. In some insane cases, there were labs reusing pipettes. This is a massive no-no because fragments of gDNA get caught in the micro-lining of the plastics and contaminate the next sample, producing insane results.
Garbage data in. Garbage data out.
At the same time, while my dad worked at the local university, there was massive in-fighting between a lot of the VPs and research leads. In one case, one VP made a big political move and, literally overnight, shut down the genomics lab my dad worked at in order to get back at another VP. This was the event that led us to start our own lab.
The university left a lot of clients high and dry, so we decided that we were sick of the bullshit and that we were going to build our own commercial lab. We spent the next 6 months finding space, getting a purchase list in place, getting financing in place, legal work, bank accounts, website, marketing campaign, etc. This was the sprint to the new business and was extremely exhausting since we were both still holding day jobs. On top of that, I was learning genomics on the fly. I had taken an advanced genetics course in high school, but it focused on Mendelian genetics. I didn’t know shit about how next-generation sequencing works, what loci were, novel alleles, drug discovery, or what any of it was. I spent my evenings plowing through articles, books, software, learning from my dad, and everything in between.
Starting a genetics lab is similar in some ways to a software startup but very different in others. On the similar side, we still had to get all the basics in place, such as accounting software, bank accounts, legal structures, a website, basic cloud infrastructure for running bioinformatics, and the like. What makes it so difficult is the wet lab portion and then integrating that into the cloud infrastructure. We call these the upstream and downstream sides of the genome sequencer.
The way genome sequencing works is that you typically get a sample into the lab, such as a cell pellet of bacteria or something. We then run it through a series of steps, each of which has its own protocol. This looks like the following: DNA extraction, sample QC, library preparation, and then genome sequencing.
DNA extraction is simply extracting DNA from the sample we receive. We then do a sample quality check to look at the quantity and quality of DNA. We measure quantity on a device called the Qubit Fluorometer and check purity through the 260/280 and 260/230 absorbance ratios. If there’s enough to run, we then do library preparation. Library preparation is basically taking the DNA and processing it through the protocol, which consists of enzymes and other things, in order to get to tagmentation. Tagmentation is effectively tagging the basepairs with a partner that lights up under UV light in a certain color. This means that when the DNA goes through the sequencer, it can see that Blue = Adenine (as an example). This can vary quite a bit, but these protocols usually take between 8 hours and 5 full days. There’s a high degree of complexity in each of these with lots of room for error. Once the library preparation is done, we run one last quality check on a device called the TapeStation. In the simplest terms, this checks the length of the different fragments of DNA that are going into the sequencer. If that passes, then the library is ready to be loaded into the sequencer.
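To make the sample QC gate a bit more concrete, here is a minimal sketch in Python. The thresholds are assumptions based on common rules of thumb (a 260/280 ratio around 1.8 for clean DNA and a 260/230 ratio of roughly 2.0 or higher), the minimum DNA amount is a made-up placeholder, and names like `SampleQC` and `qc_failures` are hypothetical. Real cutoffs depend on the library prep protocol being run.

```python
from dataclasses import dataclass
from typing import List

# Assumed rule-of-thumb thresholds, NOT the cutoffs of any specific lab or kit.
MIN_A260_280 = 1.8   # lower ratios often point to protein contamination
MAX_A260_280 = 2.0   # noticeably higher ratios often point to RNA carryover
MIN_A260_230 = 2.0   # lower ratios often point to salt or solvent carryover
MIN_DNA_NG = 100.0   # hypothetical minimum input for library prep


@dataclass
class SampleQC:
    sample_id: str
    dna_ng: float     # total DNA quantity in nanograms
    a260_280: float   # 260/280 absorbance ratio
    a260_230: float   # 260/230 absorbance ratio


def qc_failures(sample: SampleQC) -> List[str]:
    """Return the reasons a sample fails QC; an empty list means it passes."""
    failures = []
    if sample.dna_ng < MIN_DNA_NG:
        failures.append("not enough DNA")
    if not (MIN_A260_280 <= sample.a260_280 <= MAX_A260_280):
        failures.append("260/280 out of range")
    if sample.a260_230 < MIN_A260_230:
        failures.append("260/230 too low")
    return failures
```

A sample only moves on to library preparation when `qc_failures` returns an empty list. Catching a bad sample here is cheap compared to finding out after the sequencer has started.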
Congratulations, we’ve now just performed the upstream portion of genome sequencing!
Once the sequencer is running, you have to wait 2 hours before the moment of truth arrives on whether you have a successful or failed run. The number that decides it is called “cluster density”. During the first 2 hours, the sequencer is, for lack of a better description, moving all of the DNA samples into four different quadrants. Each quadrant processes a whole slew of DNA material in parallel with the others. In our sequencer, if you’re more than ~10% over or under 200k/mm², you’re fucked. The downstream quality will end up throwing a whole bunch of quality scores below Q30. Once it’s running, you can’t stop it, so if you fuck up anywhere in the upstream process, you literally cannot recover from it on the downstream side. This means that if you have a failed run, you could be out thousands of dollars. In high-cost runs, it can be tens of thousands of dollars.
So, circling back a bit to the business side, the challenge in all of this is balancing risk, cost, and client expectations. A couple of bad runs can torpedo the company finances (and have), so it’s a very high-stakes game. This was not something we had accounted for originally. Another interesting area in starting this up was the devices we needed. We thought there were a lot of devices that were required, and truthfully there were. However, we ended up buying a bunch of equipment that we didn’t need. As an example, we thought we would need a biosafety cabinet for sample processing. However, I think we’ve used it once? The reason is that most people we work with send in samples that are safe to work with. We’re not dealing with Anthrax in our facility.
One of the biggest hidden costs of the genomics world, and a large reason why there aren’t more startups in this space, is that the ongoing costs are insanely high. For example, just the basic service agreement on our sequencer is $5,000 each year. Another crazy cost is our pipettes. These are hypersensitive in what they pull up (fluid-wise) and need to be recalibrated each year. After sending ours to Germany(!), we received them back 5 weeks later with a bill for ~$4,000. Advertising is crazy expensive because it is a scientific field. It’s not every day that someone is looking for genome sequencing, so the bids on the keywords are super high, making our overall customer acquisition cost (CAC) high. Once we’ve landed a client, though, and they do repeat business with us, our total lifetime value (LTV) with them is very high with healthy margins. However, just getting the client and email bases is a huge journey itself.
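For the curious, the go/no-go check on cluster density and the meaning of Q30 can be sketched in a few lines of Python. The 200k/mm² target and the ~10% window come from our sequencer as described above; the function names are hypothetical. The Phred formula is the standard one, where a quality score Q corresponds to an error probability of 10^(-Q/10).

```python
TARGET_DENSITY_K_PER_MM2 = 200.0  # target for our sequencer, per the post
TOLERANCE = 0.10                  # the "~10%" window, per the post


def density_in_range(density_k_per_mm2: float,
                     target: float = TARGET_DENSITY_K_PER_MM2,
                     tolerance: float = TOLERANCE) -> bool:
    """True if the cluster density is within the tolerance window of the target."""
    return abs(density_k_per_mm2 - target) <= target * tolerance


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability that a base call is wrong."""
    return 10 ** (-q / 10)
```

With these, `density_in_range(230.0)` comes back false (overclustered), and `phred_to_error_probability(30)` works out to 1 wrong base call in 1,000. That is why a pile of scores below Q30 is bad news.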
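As a rough illustration of how CAC and LTV relate, here is a small Python sketch. It uses one common simplification of LTV (average project revenue times gross margin times number of projects per client). Every number in it is a placeholder I made up for the example, not a real figure from our lab, and the function names are hypothetical.

```python
def customer_acquisition_cost(ad_spend: float, new_clients: int) -> float:
    """Total acquisition spend divided by the number of clients it landed."""
    if new_clients <= 0:
        raise ValueError("need at least one new client")
    return ad_spend / new_clients


def lifetime_value(avg_project_revenue: float,
                   gross_margin: float,
                   projects_per_client: float) -> float:
    """Simplified LTV: margin earned per project times expected repeat projects."""
    return avg_project_revenue * gross_margin * projects_per_client


# Placeholder inputs purely for illustration, NOT real figures.
cac = customer_acquisition_cost(ad_spend=6000.0, new_clients=4)
ltv = lifetime_value(avg_project_revenue=5000.0,
                     gross_margin=0.5,
                     projects_per_client=6)
ratio = ltv / cac  # how many dollars of margin each acquisition dollar returns
```

The point is the shape of the math: a high CAC is survivable only when repeat business pushes the LTV well above it.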
Another major challenge in this space is that each project is slightly different. Our clients are all working on insanely complicated and different research areas, which typically require different ways of interacting with them. We have clients in the pharmaceutical space, clinical research, agrigenomics, and more. Even within those spaces, there are sub-verticals that we have to deal with. For example, with pharmaceuticals, we’ve done cancer sequencing, CRISPR validation, targeted sequencing, 16S metagenomics, and more. Those are all different protocols!
Having hit our 3-year mark, I can see why there aren’t more startups like ours. It truly is a very complicated field with lots of pitfalls that are extremely difficult to foresee or even unavoidable. Furthermore, doing it as a self-funded, bootstrapped effort has a whole different set of problems (cash flow stabilization, net30 invoicing issues, etc.). Despite all the challenges, it’s hugely rewarding when we look at what our services help our researchers accomplish. We’ve helped accelerate research on immunotherapies for kids with rare diseases. We’ve saved companies hundreds of thousands of dollars by speeding up steps in their processes during clinical trials. We’ve helped new PhD grads gain their footing in foundational research that will help pave the way to new discoveries.
It’s all very exciting! I will admit that I never thought I’d be working on a genetics startup, as that was, by far, the last place my career or mindset was headed. However, there have been a lot of interesting learnings and parallels that have carried over into other areas of my life.
Alright, per the series guardrails, I’m at the hour mark of blasting my thoughts down. There’s a lot more content coming down this route so stay tuned. If you have a question that you want me to pontificate on, let me know!