
Analyzing DRD4 on Craig Venter’s Genome


Craig Venter and I share a similar genetic mutation: we both have ADHD. I was diagnosed around the age of 13 after struggling to pay attention in class; I couldn't stop fidgeting, couldn't focus, and would forget everything. The list goes on and on. Basically, I was a total shit student because I was all over the place mentally. The only times I was able to focus were when adrenaline was kicking in or when something really interested and excited me.

After learning of my diagnosis, I did what any 13-year-old would do and went on a vision quest to figure out what exactly it meant to have ADHD. I didn't quite get it, because I would look at other students and long to just be able to sit still and focus. I felt completely out of place. I started by going to trusty Google and searching, searching, searching. I read many articles that I didn't understand and found little clear direction as to what really caused ADHD. I came up dry except for one thing: ADHD had something to do with genetics.

Years later, during my brief stint in college, I went on the quest again to understand why this happened. I don't remember where I heard it, but someone told me that people with ADHD, and entrepreneurs, carry the "risk taking" gene. Googling that led me to DRD4. I also found out that Craig Venter, the man behind one of the first sequenced human genomes, carries it too. That's 50% of where my interest in genomics comes from; the other 50% is cancer and how my family is plagued with it (a post for a different day).


Fast forward to today and I'm now tinkering with massively scalable data warehouses that can hold thousands of genomes for population-based comparative genomics. The first genome ingested into this database was Craig Venter's, as a small tribute to someone I admire. I've ingested his variant file, which lists all of the SNPs (single nucleotide polymorphisms) in his genome relative to a reference genome. That netted around 3 million inserted rows.
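For the curious, here's a minimal sketch of what that ingestion step can look like. It assumes the variant file is in VCF format and uses SQLite as a stand-in for the real warehouse; the table name and columns are purely illustrative, not the actual schema.

```python
import sqlite3

# Minimal sketch: stream a VCF (variant call format) file into a SQL table.
# Table name and columns are illustrative, not the actual warehouse schema.
conn = sqlite3.connect("genomes.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS variants (
        genome_id TEXT,      -- e.g. 'venter'
        chrom     TEXT,      -- chromosome, e.g. 'chr11'
        pos       INTEGER,   -- 1-based position on the chromosome
        ref       TEXT,      -- reference allele
        alt       TEXT       -- alternate allele observed in the sample
    )
""")

def ingest_vcf(path, genome_id):
    """Insert one row per variant record, skipping VCF header lines."""
    rows = []
    with open(path) as vcf:
        for line in vcf:
            if line.startswith("#"):          # header / metadata lines
                continue
            chrom, pos, _id, ref, alt, *_ = line.rstrip("\n").split("\t")
            rows.append((genome_id, chrom, int(pos), ref, alt))
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

# ingest_vcf("venter.vcf", "venter")  # roughly 3 million rows for a whole genome
```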

I've also mapped this to a database of diseases: upon ingestion, the intersection between the variant file and the disease database is surfaced. This provides instant insight into diseases that may be present, based on the entries in the database. It's simple and crude at the moment, and by no means clinical grade. However, it's one step toward pulling in additional data and running machine learning models to estimate the propensity for different diseases. Based on current benchmarks, I anticipate we can do this in less than a minute.
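Here's a rough sketch of how that intersection can be expressed as a single join, again using SQLite as a stand-in. The disease table layout (keyed by chromosome and position) is my assumption about how such a database might be structured, not the actual one.

```python
import sqlite3

# Sketch of the variant/disease intersection as a join on chromosome + position.
# The disease_variants layout is an assumption for illustration.
conn = sqlite3.connect("genomes.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS disease_variants (
        chrom   TEXT,
        pos     INTEGER,
        disease TEXT
    )
""")

hits = conn.execute("""
    SELECT d.disease, COUNT(*) AS matching_variants
    FROM variants v
    JOIN disease_variants d
      ON v.chrom = d.chrom AND v.pos = d.pos
    WHERE v.genome_id = 'venter'
    GROUP BY d.disease
    ORDER BY matching_variants DESC
""").fetchall()

for disease, n in hits:
    print(f"{disease}: {n} matching variant(s)")
```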


This post is really a reflection on something pretty extraordinary that I'm proud to have built. While it's basic, it's been insanely rewarding to see the results. To bring this post full circle, I hope to explore much further the impact of base-pair mutations on DRD4, which starts with a simple query and a simple image:

[Image: drd4-test]

These are the mutations in Venter's base pairs within chromosome 11, at the base-pair range spanning DRD4. The next steps I'll be taking are associating this with genotyping data, linking it to a gene database with annotation, and bringing in multiple other genomes for comparison.
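For reference, the kind of query behind that image looks something like the sketch below. The DRD4 coordinates are approximate (roughly the GRCh37 region on chromosome 11) and in practice should come from a gene annotation source rather than being hard-coded.

```python
import sqlite3

# Sketch: pull Venter's variants that fall inside the DRD4 locus on chromosome 11.
# The coordinates are approximate and for illustration only.
DRD4_CHROM = "chr11"
DRD4_START = 637_000
DRD4_END = 641_000

conn = sqlite3.connect("genomes.db")
drd4_variants = conn.execute(
    """
    SELECT chrom, pos, ref, alt
    FROM variants
    WHERE genome_id = ? AND chrom = ? AND pos BETWEEN ? AND ?
    ORDER BY pos
    """,
    ("venter", DRD4_CHROM, DRD4_START, DRD4_END),
).fetchall()

for chrom, pos, ref, alt in drd4_variants:
    print(f"{chrom}:{pos} {ref}->{alt}")
```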


As a last item, probably the most serendipitous moment in this little adventure so far was running the first intersection between a test variant file and a disease database of 750 entries. Again, not clinical grade by any means, but it was still a great moment: a very small baby step toward a grander vision.

[Image: Tableau_-_Book1]

Feel free to reach out if you have questions, would like to help, or just want to talk shop.


Different Methods of Testing, Optimizing, Predicting and Personalizing


I had a conversation with my dad today that sparked me to write this. He wrote me an email about how some products seem to just "know who he is" and appear to know what he's likely to do next. He's not the only one who has asked me about this in the past few weeks, so I figured I would try to shed some light from my point of view.

There always seems to be a lot of talk about how important A/B testing is, or about buzzwords like "optimization" and "personalization." Many of these trends start with a key blog or industry expert calling out the ROI they can provide to digital marketers. I'm all for testing and personalization, but it's key to understand what exactly each one is and what the benefits are. This is my attempt to bring it all back down to the layman.


Simple A/B Testing

There are many different types of A/B testing, but "simple A/B testing" is the most common example. Simple A/B testing is the method of testing different variations against a control group. The variations have equal weighting, meaning they are shown to users at random with equal probability. Often, tests are set up so that 95% of users receive the test (the variations) and 5% receive the control (either nothing or what was there before). Simple A/B tests are usually most effective when run for a minimum of 7 days in order to gather sufficient data; the big key with run time, however, is setting a determined end date before the test begins.

Simple A/B testing is a basic method and is often not the best option. The problem is that the variations are equally weighted: if a variation is not performing well, it still gets served to users, which may cause them to bounce. You effectively lose out on the potential to convert those users. This is called the opportunity cost. Simple A/B testing should be reserved for basic things like button color or minor layout variants, and only when you don't have a better tool to use.
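To make that concrete, here's a tiny sketch of a 95/5 split with equally weighted variations. The variation names and percentages are placeholders.

```python
import random

# Sketch of simple A/B assignment: 5% of users see the control,
# the remaining 95% are split evenly across the variations.
VARIATIONS = ["variation_a", "variation_b", "variation_c"]
CONTROL_SHARE = 0.05

def assign(user_id: str) -> str:
    rng = random.Random(user_id)   # seed on the user id so assignment is sticky
    if rng.random() < CONTROL_SHARE:
        return "control"
    return rng.choice(VARIATIONS)  # equal weighting across variations

# assign("user-123") -> e.g. "variation_b", and the same answer every time for that user
```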

Bandit Testing Algorithm

This is where things get more interesting. A bandit testing algorithm is a more sophisticated version of simple A/B testing. With simple A/B testing, the weight of each variation is the same, meaning that if one variation isn't performing well, you still serve it up. With bandit testing, as you try the variations and measure their performance, you gain feedback on how each one is doing. The bandit starts to weight the winning variation more heavily, which means it serves that variation more often (this is called exploiting). However, it continues to test the lower-performing variations to explore the possibility that they could still perform better (this is called exploring). After the test has run for some time, it often becomes clear which variation is the winner. Most testing platforms implement this type of testing method.

The two most common versions of the bandit algorithm are epsilon-greedy and Bayesian bandits. Many argue that Bayesian bandits are the most sophisticated method for testing, as they employ statistical tools such as probability distributions and densities to find the best variation faster and more reliably.

The reason bandits are often considered better is that they dynamically update themselves based on what they have already learned. As the test runs and collects conversion data for the different variations, it starts to see which variation is performing better. It then applies what we call "exploit & explore." Exploiting is the system intelligently serving up the leading variation more frequently to find the winner faster. Exploring is the system continuing to serve the losing variations every so often to make sure it isn't seeing an anomaly and that the variation that appears to perform better actually does. You can think of it as a self-check.
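A minimal epsilon-greedy sketch of that exploit/explore split is below. The 10% explore rate and the conversion bookkeeping are just for illustration.

```python
import random

# Epsilon-greedy sketch: exploit the current best variation most of the time,
# but keep exploring the others at a fixed rate (epsilon).
EPSILON = 0.10  # fraction of traffic reserved for exploration (illustrative)

shows = {"variation_a": 0, "variation_b": 0, "variation_c": 0}
conversions = {"variation_a": 0, "variation_b": 0, "variation_c": 0}

def conversion_rate(variation: str) -> float:
    return conversions[variation] / shows[variation] if shows[variation] else 0.0

def choose_variation() -> str:
    if random.random() < EPSILON:              # explore: pick a random variation
        return random.choice(list(shows))
    return max(shows, key=conversion_rate)     # exploit: serve the current leader

def record(variation: str, converted: bool) -> None:
    shows[variation] += 1
    if converted:
        conversions[variation] += 1
```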


Retention

With any product or service, customer retention is always a key metric. It is much more expensive to acquire a new customer than to get an existing customer to return, and doubly expensive when they churn! Since retention is a key metric for every company, it's important to really understand what it is. Retention is how often a user comes back to the product or service within a given time frame. In my opinion, retention is especially critical when paired with A/B testing, as it lets you measure the "stickiness" of what you're testing.
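As a rough illustration, here's one way to compute a simple 7-day retention number from a visit log. The data structure and dates are made up.

```python
from datetime import date

# Sketch: 7-day retention = share of users seen in the first window
# who come back at any point after it. Visit log is made up for illustration.
visits = {
    "alice": [date(2015, 3, 1), date(2015, 3, 9)],
    "bob":   [date(2015, 3, 2)],
    "carol": [date(2015, 3, 3), date(2015, 3, 4), date(2015, 3, 12)],
}

def retention(visits, cohort_start, window_days=7):
    cohort_end = cohort_start.toordinal() + window_days
    cohort = {u for u, days in visits.items()
              if any(cohort_start.toordinal() <= d.toordinal() < cohort_end for d in days)}
    returned = {u for u in cohort if any(d.toordinal() >= cohort_end for d in visits[u])}
    return len(returned) / len(cohort) if cohort else 0.0

print(retention(visits, date(2015, 3, 1)))  # 2 of 3 users returned -> ~0.67
```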


Monte Carlo Simulation

While not as widely used, Monte Carlo simulations help sophisticated testers predict the future with reasonable confidence. A Monte Carlo simulation draws random samples to project possible trends, which is useful for running simple what-if tests. In industry, many will run daily Monte Carlo simulations that update themselves based on historical data. For example, I may have a Monte Carlo that simulates the potential trend for the next 30 days by resampling from the past 30 days of data. This gives me a decent look at what the average trend might be, and I may even be able to pull an upper and lower limit from it as well. The testing industry uses Monte Carlo simulations to predict how a variation may perform in the future, how a particular segment may change over time, and a few other unique prediction problems.
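Here's a minimal sketch of that rolling 30-day projection: resample from the last 30 days of a daily metric, simulate many possible futures, and read off an average plus rough upper and lower bounds. The numbers are made up.

```python
import random
import statistics

# Monte Carlo sketch: simulate the next 30 days by resampling from the last
# 30 days of daily conversion rates (made-up numbers for illustration).
last_30_days = [0.021, 0.019, 0.025, 0.018, 0.022, 0.020] * 5

def simulate(history, days=30, runs=10_000):
    totals = []
    for _ in range(runs):
        # one possible future: daily rates drawn at random from history
        path = [random.choice(history) for _ in range(days)]
        totals.append(statistics.mean(path))
    totals.sort()
    return {
        "expected":    statistics.mean(totals),
        "lower_5pct":  totals[int(0.05 * runs)],
        "upper_95pct": totals[int(0.95 * runs)],
    }

print(simulate(last_30_days))
```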

Regression Analysis

Understanding trends is a big part of the value in analytics and testing. Trends help create insight into how an overall group may be changing over time, and regression analysis can help identify those trends. Regression analysis is a method for taking a group of data points or variables and estimating the relationships between them. The most basic method is linear regression, where you fit a straight line that summarizes the average relationship in the dataset. Regression analysis is useful for understanding the current state of the dataset while also giving a sense of how it may progress over time. It's a very useful analytics tool for providing insight based on history.
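A minimal linear regression sketch, fitting a line to a couple of weeks of made-up daily conversions and naively extrapolating it forward:

```python
import numpy as np

# Sketch: fit a straight line to a daily metric and project it forward.
# The data points are made up for illustration.
days = np.arange(14)  # day index 0..13
conversions = np.array([102, 98, 110, 115, 109, 121, 118,
                        125, 131, 127, 135, 140, 138, 144])

slope, intercept = np.polyfit(days, conversions, deg=1)  # least-squares fit
print(f"trend: about {slope:.1f} extra conversions per day")

day_30_estimate = slope * 30 + intercept                 # naive extrapolation
print(f"projected conversions on day 30: {day_30_estimate:.0f}")
```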


Segment Personalization

The ability to personalize based on a segment can be a very powerful feature when used properly. Segment personalization means delivering relevant content to a group of people who share similar attributes or interests. For example, let's say I've identified that my travel website has two types of visitors: Beach Going Travelers and Weekend Getaway Travelers. I have two segments with very different interests, buying cycles, content consumption, and more. Through segment personalization, I'm able to deliver relevant content about beach vacation packages to my Beach Going segment. These segments are often a mixture of known and unknown users. Segment personalization is incredibly important because delivering content that is actually relevant to a group can reduce the likelihood of users in that segment churning. The more aligned content is with a group's interests, the higher the potential affinity for a brand, site, or app.
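A toy sketch of that idea, using the travel-site example; the segment rules and content are invented for illustration:

```python
# Toy sketch of segment personalization using the travel-site example above.
# Segment rules and content are invented for illustration.
SEGMENT_CONTENT = {
    "beach_going":     "Top all-inclusive beach packages for this summer",
    "weekend_getaway": "Five weekend city escapes under $300",
    "default":         "Explore our most popular destinations",
}

def segment_for(user):
    """Assign a segment from simple behavioral attributes (known or inferred)."""
    if "beach" in user.get("recent_searches", []):
        return "beach_going"
    if user.get("avg_trip_length_days", 0) <= 3:
        return "weekend_getaway"
    return "default"

user = {"recent_searches": ["beach", "maui"], "avg_trip_length_days": 7}
print(SEGMENT_CONTENT[segment_for(user)])  # beach content for a beach-going visitor
```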

User Personalization

While segment personalization operates at a higher level, user personalization is much more prescriptive. With user personalization, we personalize content based on intimately known information about a user, through methods such as dynamic messaging. For example, we might use dynamic messaging to send an email that says, "Hey Ryan, you added this item to your cart. Check out these recommendations." We're using explicitly known data to interact with the user based on known attributes. This level of personalization is often referred to as a 1:1 conversation. We want to provide unique content and a distinct experience for each user so that they don't have to sift through weeds of information to get to something relevant.

Contextualization

Taking personalization a step further is contextualization. Contextualization is delivering personal content within the context of the user. Much of the industry is turning to this, but it's also one of the hardest things to accomplish. For example, Starbucks knows that I'm a really heavy coffee drinker. They know that I usually start work at 9am based on how many times my device passes a beacon (a location identifier). It's 8:30am and I'm walking to work. As I get close to a Starbucks, I receive a push notification through the Starbucks app that says, "Hey Ryan, it's early and cold out. Come wake up and warm up with 10% off a mocha today!" The message knows my name, my likely buying pattern, and the local weather. It's a very context-driven message that starts to tug at different levers of consumer buying behavior. We can achieve this level of sophisticated personalization through first- and third-party data sources collected from many different platforms. As we collect data from many platforms, we perform what's called an "identity merge": if you're unknown on one platform but known on another, then when you identify yourself on the unknown platform we're able to merge everything known about you into a unified user profile.
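Here's a very small sketch of that identity-merge step: an anonymous mobile profile identifies itself with an email we already know from the web, and its data gets folded into the unified profile. All field names are made up.

```python
# Sketch of an "identity merge": when an anonymous profile identifies itself
# (e.g. logs in with an email already known from another platform), fold its
# data into the unified profile. Field names are made up for illustration.
unified_profiles = {
    "ryan@example.com": {"platforms": {"web"}, "page_views": 42, "favorite_drink": "mocha"},
}

anonymous_mobile_profile = {"platforms": {"mobile"}, "page_views": 7, "push_enabled": True}

def identity_merge(email, anon):
    profile = unified_profiles.setdefault(email, {"platforms": set(), "page_views": 0})
    profile["platforms"] |= anon.get("platforms", set())
    profile["page_views"] += anon.get("page_views", 0)
    # carry over any attributes the unified profile didn't already have
    for key, value in anon.items():
        if key not in ("platforms", "page_views"):
            profile.setdefault(key, value)
    return profile

print(identity_merge("ryan@example.com", anonymous_mobile_profile))
```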


If you find this useful, feel free to comment!


Convergence of Mobile and Web: How apps are the apex of it all


There's a great term floating around that I think will eventually define how we think about software: the appification of everything. We're seeing a lot of trends moving toward this way of thinking because it provides more utility for much of the software we're building. The trend really rests on four foundations, with the forcing function being the pressure of a consumer market armed with many devices.

1) Seamless Unification of Web, Mobile, and Everything Else

The number of devices per person averages around 3 as of 2015. Since the web was the first to gain adoption, we typically see the most robust systems there. However, this has changed dramatically with the introduction of apps for mobile devices. This is the "appification" aspect of technology. Until recently, if you wanted your product to be cross-platform, you were required to develop separately for each platform. While much of that is still true today, we've seen a massive trend toward what we call "Cards" – a design and functionality pattern that provides a consistent experience across all devices. Cards are like mini-apps that let developers retain similar designs across all devices with similar levels of functionality. The key is that they still tie back to the same backend system. This is the first step toward seamless unification across all platforms. Once this happens, we'll see an intersection of user profiles and analytics where we're able to share information across platforms and devices from one central location.

2) Requirement for Different Presentation Layers yet Same Features and Functionality

With multiple platforms come multiple ways of presenting. This has posed a challenge for businesses since they often lose brand identity as they move across platforms. They often can't retain the functionality, style, fonts, and other details that make their brand what it is. With the "appification" of everything, delivered through mechanisms such as Cards, this hurdle can be conquered. Cards provide a contract to the presentation layer that basically says, "We're a card, and our outer shell will follow what your platform requires." The key, however, is the second part of that conversation: "We'll present in your format, but we're going to bring our own functionality, contained within our Card." A perfect example is how Google Maps works. Maps is cross-platform and varies in the level of data presented, yet as you interact with and expand it, it retains the same robust functionality as the full-scale app. Since we're able to retain the same functionality across platforms, we can also pass information across platforms – which is great for analytics.

3) Unification of Data and User Profiles across Platforms

Since Cards can be deployed across different platforms, we're able to collect more intimate data that is unified. In the past, it's been a nightmare to unify user profile data across platforms to build a comprehensive understanding. With Cards, we now have to worry about two things: collection and the data schema. The collection schema is specific to the environment the Card lives in, meaning that on the web I might collect "A, B, C" data for a user, whereas on mobile I collect "G, H, I" data. From there, it's up to the server-side user profile to handle the collected data, which means the user profile schema must be able to accept this information. This is incredibly useful because the server side lets you do interesting things with cross-platform data – such as recommendations or content personalization based on the unified profile. The nature of Cards also allows you to easily add, remove, or swap out different levels of functionality through microservices.
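A small sketch of that "A, B, C on web / G, H, I on mobile" idea: each platform's Card sends whatever it collects, and the server-side profile simply keys it all to the same user. The field names are invented for illustration.

```python
# Sketch: platform-specific collection feeding one server-side profile.
# Each platform's card sends its own fields; the unified schema namespaces
# them under the platform they came from. Field names are invented.
unified_profile = {"user_id": "user-123", "web": {}, "mobile": {}}

def collect(platform, payload):
    """Server-side handler: accept whatever the platform's card collected."""
    unified_profile[platform].update(payload)

collect("web",    {"pages_viewed": 12, "referrer": "search", "cart_items": 2})
collect("mobile", {"push_enabled": True, "beacon_visits": 4, "app_version": "2.1"})

# Downstream features (recommendations, personalization) read from one place:
print(unified_profile)
```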

4) Advances in Microservice Capability for Interoperability

One of the beautiful things about Cards and microservices is their ability to change constantly. With this structure, you're able to update APIs and functionality on any platform (including native ones) without having to re-release SDKs. I've seen incredible use cases for this, such as delivery of content, updates to core functionality, and swapping out logic or engines behind the scenes. Additionally, you can attach multiple microservices to a Card to extend its functionality. This plays well with interoperability across platforms, since other platforms can subscribe to data updates. One thing to note is that with Cards, you can only expand functionality within the limits of the Card. Since Cards are isolated from the exterior environment, they provide a great way to insert robust apps into hostile environments while still being able to "land and expand" functionality in the future.


In the near future, I believe we'll see a large shift toward everything being app-based. We've already seen this movement from larger corporations such as Google, Pinterest, Facebook, and Apple. I'm betting we'll soon see more platforms making this type of technology easier to build, with more advanced developer tools designed specifically for it, app-focused delivery mechanisms, and all-in-one development solutions that let you create an app in one place and deploy it everywhere. As the number of devices increases and we become more connected, I believe we'll see code bases collapse into something like apps so that we can provide the unification that businesses and users are looking for.