7 Things I’ve Learned Recently

Posted in Life, Startups, Technology

Life is pretty interesting in the direction it takes you, the people it introduces you to, and the lessons it teaches you in weird ways. I recently went through some pretty impactful changes and figured it would be good to share my recent learnings.

You can never be too cognizant of your business health

In the past few months it has become apparent that when you run a company, no matter the size or growth you’re experiencing, you can never be too close to the core fundamental business metrics. It’s important to keep an eye on critical areas of revenue, profits, growth, retention, and reduction of complexity. Having a core vision that you have conviction around is paramount for bringing people together as well as providing a lighthouse, dictating what metrics matter the most.

Be open to meeting with unique people

I’ve recently started saying “fuck it, let’s meet” more to random strangers who reach out. I’ve done the same as well by randomly reaching out to people in different industries to learn about their work, their successes, and their failures. I’m not sure what has recently triggered this but it has been incredibly enlightening.

For example, I went to a bioinformatics MeetUp recently and learned a lot about the challenges bioinformaticians face in their day-to-day work and what matters most to them. Turns out, having open data within the science community is really important. Who knew! Apart from that, I’ve met some individuals recently who have challenged my frame of thought and helped me develop a different framework for thinking about problems.

My takeaway lesson was to put yourself out there and talk to people you wouldn’t have otherwise – especially in different industries. I’ve found that understanding how they solved their industry’s problems can translate into solving your own.

Be confident about what you’re worth

Pretty simple learning here: don’t devalue what you believe you can offer to others. They may not see the same value, which can either tell you a lot about the person or provide a conduit within a conversation to explain how valuable you are. This isn’t to say that you should be cocky, but I’m finding that it’s important to have confidence in yourself, believe that you’re worth what you think you’re worth, be persistent and consistent in that confidence, and impress that on others who may not see it the same way.

Life is pretty short

I know everyone says this but seriously, god damn does it feel short. I know this is a typical thing for someone to say and dumb to reiterate at my young age but… it really feels short. I think about where I was two years ago and it is shocking how fast things can change. However, I was reminded that life is also long, depending on your perspective, goals, and ambitions at that current time. What’s short now may seem long in the future. But for me, right now, I can’t seem to get enough time on my side.

Graphs are probably the future of computer to computer interaction

I had a great conversation recently with a colleague who is exploring different types of graphs as a method of computer to computer communication. In essence, individual components have their own operating rules and jurisdictions that they behave by. However, when interacting with other components, they have contracts that can be extended in order to have a meaningful “conversation”. This is critically important when working in increasingly complex situations like supply chain management, ERP, and other enterprise business functions.

Furthermore, with the internet of things coming into force, this complex graph of “things” will need a concrete way of sharing data. You don’t want a situation where a street light is giving instructions to a car, only to find out that the car doesn’t speak the same language. That will end badly.

The core principle is that components can have many data ingestion points to make decisions but have binary outputs that interact with other components, which in turn make binary decisions based on lots of data points, and so on. This approach is known as intelligent agents, a field closely tied to machine learning. I’ll probably write an expansion on this topic soon.
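To make that concrete, here’s a minimal sketch of the idea in Python. Everything in it (the Agent class, the threshold rule, the street light / car pairing) is my own illustration rather than any real protocol: each agent ingests many data points but exposes only a single binary decision to its neighbors.

```python
class Agent:
    """An agent with many inputs and one binary output."""

    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold

    def decide(self, readings):
        # Many data ingestion points, reduced to one binary output.
        score = sum(readings) / len(readings)
        return score >= self.threshold


# Components interact through a simple "contract": a binary signal.
street_light = Agent("street_light", threshold=0.5)
car = Agent("car", threshold=0.7)

# The street light decides from its own sensor readings...
go_signal = street_light.decide([0.4, 0.8, 0.9])

# ...and the car folds that binary signal into its own decision,
# alongside its other data points.
car_should_proceed = car.decide([1.0 if go_signal else 0.0, 0.9, 0.8])
print(car_should_proceed)
```

The “contract” here is just the boolean each agent emits; extending a contract would mean agreeing on richer signals between two components while each keeps its internal rules private.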

Every bad or tough situation has massive opportunities

Being put randomly into these situations sucks because it’s pushing you into the unknown. I hate the unknown. I’m always trying to be 100 steps ahead and super calculated. But alas, shit happens and what matters is being able to react in a flexible manner. Instead of victimizing yourself, I’ve found it better to look at the situation as “Now that I’m here, what are the good things about this situation? What unique opportunities do I have that I didn’t have before?”. Always look for the positive, even when shit really, really sucks. I’ve found that things typically course correct themselves and opportunities start opening up usually in less than a week. This all comes back to networking and being confident. Trust yourself in finding a way out of the shitty situation, even if the way out is completely unclear.

Saying no to features is hard. Finding features that are game changing is harder.

Both are difficult tasks for a product manager. You have lots of competing priorities and objectives, and your job is to act as a shit umbrella and a filter. You have to have extremely firm but loosely held opinions about the world. It’s hard to say no to features that don’t align with your long-term vision, but if you cave in to those feature requests, they’re going to come back and bite you in the ass.

Even harder is finding the features that are really game changing for customers. This is what makes a good product manager valuable. Oftentimes the answer isn’t clear and the customer doesn’t know exactly what they want. It’s a 50/50 balance of solving their pain points now while also showing them the future. I advocate solving customer problems, but I really think that doing only this doesn’t progress you as fast as it could. If I constantly solve for a 10% gain, then I’m competing with the market (you need to do this because it’s the bread and butter). However, if I’m able to give a 10x solution to a problem, I’ve leapfrogged the market significantly. Oftentimes, solving for 10x is incredibly unclear.

These are probably pretty obvious, hippie-millennial talking points, but they’ve been strongly reinforced for me over the past couple of months. Things are often not as bad as they initially seem.

Federated Queries in Genomics

Posted in Genomics, Technology

In the late 90s and early 2000s there was a trend called “federated search” that gained some ground in how the web was architected. Federated search, also known as “federated querying”, allows a system to query multiple disparate data sources in order to surface a single result from a single query. It never really took off, however, due to the disorganized state of data architecture at the time, as well as poor data storage performance.

One of the other trends in this world was storing data from many different sources in a single database, which is where the term “data lake” came into play. The term describes exactly what it is: a giant lake of data, aggregated from many sources, that you can query against.

These two trends ended up dying pretty quickly as optimal solutions became apparent for specific industries. Postgres and NoSQL arrived on the scene and became the database solutions of choice for many needs within the high-tech world. NoSQL specifically delivered incredible performance for fast ingestion and really fast query times. However, the unstructured nature of NoSQL can make it problematic in many types of organizations.

In a funny way, the “big data” world is coming around to going back to federated querying. I think specifically that Genomics is going to be a huge user of this type of architecture given the nature of dramatically different database requirements for different parts of the genomics pipelines. Some systems will want very rigid and structured databases whereas others will want the freedom of unstructured storage that allows them to scale and bend data.

For example, you might want to store information about genes and all their annotated metadata in a MySQL database with optimized query performance. However, you may want a Postgres build for ingesting human genome variant data; Postgres could be nice for this given its parallel query support. In another realm, you may want to store clinical trial data inside a NoSQL database in order to return large arrays of data extremely fast. This means you have three different databases with similar, yet different, query languages. A structured federated query language that hits all three sources would be beneficial.

An example of a federated query could look something like:

PREFIX gene: </local_endpoint/genes>
PREFIX diseases: </local_endpoint/diseases>
PREFIX genome_variants: </local_endpoint/genome_variants>
PREFIX clinical_trials: </local_endpoint/clinical_trials>

SELECT ?gene ?chromosome ?basepair WHERE {
    SERVICE </local_query_api_endpoint/> {
        ?variant genome_variants:chromosome ?chromosome .
        ?variant genome_variants:basepair ?basepair .
        ?variant gene:sameAs ?gene .
        ?gene diseases:associatedWith "ADHD" .
        FILTER(?chromosome = "11")
        FILTER(?basepair >= 100000 && ?basepair <= 200000)
        FILTER(CONTAINS(STR(?gene), "DRD4"))
    }
}

This isn’t necessarily the prettiest (or most accurate) representation of a federated query, but it shows the structure in which we might join multiple data sources, each with its own specific query, into one table. That table might be persisted, or we may just be doing a floating query, in which case we could store the result in a temp table.

Apart from their flexibility, federated queries are attractive to genomics primarily because they can provide incredible performance across massive datasets. There are other query languages and implementations in the field that act more as aggregators than federations, though their purpose is very similar. Facebook, in my opinion, has the most advanced version of this, where they use a node/leaf system paired with GPU-based operations for querying against petabytes of data.

Among the other little experiments I’ve been running, I’ll be testing out an implementation of an aggregation service on a small scale from disparate data sources to see if I can create a simple example of this.
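As a rough sketch of what that small-scale aggregation experiment might look like, here’s a toy version in Python. The data, shapes, and DRD4 coordinates are all illustrative stand-ins: each dict plays the role of a separate backend (say, MySQL for gene annotations and Postgres for variants), and the “federated” part is simply joining results across them in application code.

```python
# Stand-in "backends": a gene annotation store and a variant store.
# Coordinates below are illustrative, not authoritative annotations.
genes_db = {"DRD4": {"chromosome": "11", "start": 637_269, "end": 640_706}}
variants_db = [
    {"chromosome": "11", "position": 639_989, "ref": "C", "alt": "T"},
    {"chromosome": "4", "position": 100_000, "ref": "A", "alt": "G"},
]


def federated_query(gene_name):
    """Join gene annotations from one source against variants from another."""
    gene = genes_db.get(gene_name)
    if gene is None:
        return []
    # In a real federation this would dispatch a native query to each
    # backend; here we just filter the second source by the first.
    return [
        v for v in variants_db
        if v["chromosome"] == gene["chromosome"]
        and gene["start"] <= v["position"] <= gene["end"]
    ]


print(federated_query("DRD4"))
```

A real implementation would swap the dicts for database connections and push the range predicate down into each backend’s native query language, but the join-in-the-middle shape stays the same.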

Analyzing DRD4 on Craig Venter’s Genome

Posted in Genomics, Healthcare, Technology

Craig Venter and I share a similar genetic mutation: we both have ADHD. I was diagnosed around the age of 13 after struggling to pay attention in class: I couldn’t stop fidgeting, couldn’t focus, would forget everything, etc. The list goes on and on. Basically, I was a total shit student because I was all over the place mentally. The only time I was able to focus was when adrenaline was kicking in or when something really interested and excited me.

After learning of my diagnosis, I did what any 13 year old would do and went on a vision quest to figure out what exactly it meant to have ADHD. I didn’t quite get it, because I would look at other students and long to just be able to sit still and focus. I felt completely out of place. I started by going to the trusty Google and searching, searching, searching. I read many articles that I didn’t understand and found lots of unclear direction as to what really caused ADHD. I turned up dry except for one thing: ADHD had some genetic component.

Years later, during a brief moment of college, I went on the quest again to understand why this happened. I don’t remember where I heard it, but someone told me that people with ADHD, and entrepreneurs, had the “risk taking” gene. After Googling that, I found DRD4. I also found out that the man who initially sequenced the human genome had it too – Craig Venter. This is 50% of where my interest in genomics comes from; the other 50% is cancer and how my family is plagued with it (a post for a different day).

Fast forward to today and I’m now tinkering with massive scalable data warehouses that can hold thousands of genomes for population-based comparative genomics. The first genome ingested into this database was Craig Venter’s, as a small tribute to someone I admire. I’ve ingested his variant format file, which shows all of the SNPs (single nucleotide polymorphisms) in his genome compared to a reference genome. This netted around 3 million rows inserted.

I’ve also mapped this to a database of diseases where, upon ingestion, an intersection between the variant file and diseases is surfaced. This provides instant insights into diseases that may be present based on the diseases in the database. It’s simple and crude at the moment, and is by no means up to clinical grade. However, it’s one step towards pulling in additional data and running machine learning models to find the propensity of different diseases. Based on current benchmarks, I anticipate that we can do this in less than 1 minute.
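The core of that intersection step can be sketched in a few lines of Python. The variant positions and disease associations below are made up purely for illustration; nothing here is clinically meaningful.

```python
# SNPs from a variant file, keyed by (chromosome, position).
variants = {("11", 639_989), ("7", 55_249_071), ("1", 11_856_378)}

# Disease database: disease name -> associated (chromosome, position) sites.
# These associations are invented for the sake of the example.
disease_db = {
    "ADHD": {("11", 639_989)},
    "ExampleDisease": {("2", 1_234_567)},
}


def intersect(variants, disease_db):
    """Return diseases sharing at least one site with the variant set."""
    hits = {}
    for disease, positions in disease_db.items():
        overlap = variants & positions
        if overlap:
            hits[disease] = overlap
    return hits


print(intersect(variants, disease_db))
```

With the variant set loaded into a database, the same set-intersection becomes a join against the disease table, which is what makes the sub-minute benchmark plausible at the 3-million-row scale.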

This blog post is really a reflection on something pretty extraordinary that I’m proud to have built. While it’s basic, it’s been insanely rewarding to see the results. To bring this post full circle, I hope to explore much further the impacts of base pair mutations on DRD4, which starts with a simple query and a simple image:
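A hypothetical version of that simple query, using SQLite as a stand-in: the table and column names are invented for illustration, and the DRD4 base-pair range on chromosome 11 is assumed, not an authoritative annotation.

```python
import sqlite3

# In-memory stand-in for the variant warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants (chromosome TEXT, position INTEGER, ref TEXT, alt TEXT)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [
        ("11", 639_989, "C", "T"),   # inside the assumed DRD4 range
        ("11", 5_000_000, "G", "A"),  # chromosome 11, outside the range
        ("4", 639_989, "T", "C"),     # wrong chromosome
    ],
)

# Pull SNPs on chromosome 11 within an assumed DRD4 base-pair range.
rows = conn.execute(
    "SELECT chromosome, position, ref, alt FROM variants "
    "WHERE chromosome = '11' AND position BETWEEN 637000 AND 641000"
).fetchall()
print(rows)
```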

[Image: Venter’s base pair mutations within the DRD4 region of chromosome 11]
These are the mutations in Venter’s base pairs within chromosome 11, at the specific base pair range of DRD4. The next steps I’ll be taking are associating this with genotyped mutations, linking it to a gene database and annotations, and providing multiple other genomes for comparison.

As a last item: probably the most serendipitous moment in this little adventure so far was when I ran the first intersection between a variant test file and a database of 750. Again, not clinical grade by any means, but it was still a great moment to see a very small baby step towards a grander vision.


If interested, feel free to reach out to me if you have questions, would like to help, or just want to talk shop.