Craig Venter and I share a similar genetic mutation: we both have ADHD. I was diagnosed at around the age 13 after struggling to pay attention in class, could not stop fidgeting, couldn’t focus, would forget everything, etc. The list goes on and on. Basically, I was a total shit student because I was all over the place mentally. The only time I was able to focus was when adrenaline was kicking in or it was something that really interested me and got me excited.
After learning of my diagnoses, I did what any 13 year old would do and started to go on a vision quest of what exactly it meant to have ADHD. I didn’t quite get it because I would look at other students and long to just be able to sit still and focus. I felt completely out of place. I started by going to the trust Google and searching, searching, searching. I read many articles that I didn’t understand and lots of unclear direction as to what really caused ADHD. I turned up dry with results except for one thing: ADHD was some symptom of genetics.
Years later, while I was in a brief moment of college, I went on the quest again to understand why this happened. I don’t remember where I heard it but someone told me that people with ADHD or entrepreneurs had the “risk taking” gene. After Googling that, I found the results of DRD4. I also found out that the man who initially sequenced the human genome also had it too – Craig Venter. This is 50% where my interest in genomics comes from while the other 50% is cancer and how my family is plagued with it (a post for a different day).
Fast forward to today and I’m now tinkering with massive scalable data warehouses that can hold 1,000’s of genomes to do population-based comparative genomics. The 1st genome on the list that has been ingested into this database was Craig Venter’s as a small tribute to someone I admire. I’ve ingested his variant format file which shows all of the SNPs (single nucleotide polymorphisms) within his genome compared to a reference genome. This netted around 3 million rows inserted.
I’ve also mapped this to a database of diseases where, upon ingestion, an intersection between the variant file and diseases is surfaced. This provides instant insights into diseases that may be present based on the diseases in the database. It’s simple and crude at the moment, and is by no means up to clinical grade. However, it’s one step towards pulling in additional data and running machine learning models to find the propensity of different diseases. Based on current benchmarks, I anticipate that we can do this in less than 1 minute.
This blog is really a reflection on something pretty extraordinary that I’m proud that I’ve built. While its basic, it’s been insanely rewarding to see the results. To bring this post full circle, I hope to explore much further the impacts of base pair mutations on DRD4 which start with a simple query and a simple image:
These are the mutations of Venter’s base pairs within Chromosome 11 at the specific base pair range of DRD4. The next steps I’ll be taking are associating this with genotyping mutations, association with a gene database & annotation, and providing multiple other genomes for comparison.
As a last item, probably the most serendipitous moment in this little adventure so far has been when I ran the first intersection between a variant test file and a database of 750. Again, not clinical grade by any means, but it was still a great moment to see a very small baby step towards a grander vision.
If interested, feel free to reach out to me if you have questions, would like to help, or just want to talk shop.