
The Commoditization of Data: Where real value is moving to


I’ve been doing a lot of consulting lately for large groups in the personalization and analytics space. There’s a common trend among many of the questions I’m getting asked: How do all of these providers differentiate? Where is the value for enterprises looking to deploy the new software coming into the space?

In all fairness, it’s not an easy question to answer. There are a lot of moving parts, and competitive pressures are forcing enterprises (and software vendors) to think differently about how true value is delivered to the end user. On one hand, you have a highly competitive marketing landscape at the very top of the stack, with vendors all competing for the same business. These are analytics providers that typically collect torrents of data about your users, and oftentimes they are focused on a single channel. On the other hand, you have IaaS providers, primarily Amazon AWS, completely commoditizing this space. Amazon allows in-house teams to easily create and deploy custom analytics applications. Further, Amazon is starting to offer Business Intelligence tools and Machine Learning capabilities that are simple to deploy. This is putting big pressure on the software vendors to differentiate themselves.

With the marketing software market projected to grow around 17% by 2019, to over $56B in total market cap, the need to provide real value will arrive much more rapidly than many vendors expect. Data collection, management, and querying are basically table stakes for enterprises evaluating these vendors. The real shift is toward actionable insight, predictive analytics, and machine learning that understands users at an implicit behavioral level.

To get a better perspective on where value is shifting, let’s break down the following image.

At the very bottom, we have “Data Collection & Management”. I’m using this as a catch-all term for anything around database systems: data collection, management, transformation, and so on. While there are many criteria for having a solid data practice, we can safely say that this space is thoroughly commoditized at this point. Many analytics vendors are collecting the same or very similar data, and enterprises no longer want to wait until after the fact to make decisions on what to do next (especially with digital marketing strategies). With Amazon making this layer of the stack so easy, it is no longer a competitive advantage or a value proposition for these software providers. Up the stack we go.

Pure analytics providers are getting eaten away by Amazon or startup disruptors. Vendors like Mixpanel or Swrve are pushing heavily against incumbents like Omniture to price more competitively and offer better differentiated value. Since Google Analytics is a robust offering for free, there’s pressure to provide something differentiated. Additionally, current software technology makes it easy to build graph visualizations (think d3.js), transform data, or pull data in from different sources, making this part of the stack, along with the one below it, commoditized. This isn’t to say that you can get by without the analytics or data parts of the stack. You absolutely have to have both in order to move into the areas where real value is built. Up the chain we go.
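As a small illustration of how commoditized this plumbing has become, here’s a sketch of normalizing event payloads from two different analytics sources into one common schema. The source names and field layouts are entirely hypothetical:

```javascript
// Normalize event payloads from two hypothetical analytics sources
// into one common schema: { userId, event, timestamp }.

function fromSourceA(raw) {
  // Source A nests the user object and uses epoch milliseconds.
  return { userId: raw.user.id, event: raw.action, timestamp: raw.ts };
}

function fromSourceB(raw) {
  // Source B is flat and uses ISO 8601 timestamp strings.
  return { userId: raw.uid, event: raw.eventName, timestamp: Date.parse(raw.occurredAt) };
}

function normalize(events) {
  return events.map((e) => (e.source === "A" ? fromSourceA(e.payload) : fromSourceB(e.payload)));
}

const merged = normalize([
  { source: "A", payload: { user: { id: "u1" }, action: "page_view", ts: 1450000000000 } },
  { source: "B", payload: { uid: "u2", eventName: "page_view", occurredAt: "2015-12-13T09:46:40Z" } },
]);
```

Once every vendor’s data can be flattened into the same shape this easily, the collection layer stops being a differentiator.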

Segmentation and user targeting is getting closer to the point of commoditization, but it isn’t there yet. We’re still seeing new forms of targeting drive value and decision making when enterprises are evaluating vendors. Oftentimes, these new vendors give enterprises the true ability to track and target customers across channels rather than within a single channel. This has been a big push by many of the leading research firms, such as Forrester and Gartner, who have termed it the “Unified Customer View”. In my opinion, this is most of the market today, and many of the vendors in this space are, in some form or fashion, able to accomplish it. Additional features help the value proposition, such as CMS integrations, event triggering, and automated segmentation. However, the challenge is that the brunt of the work still falls on the marketer. The reality is that marketers are having to do more with less, and time is not on their side. This portion of the tech stack is where commoditization is currently focused, as machine learning moves into understanding and reacting to user behavior.

This brings us up to the present day and the future, which is the meat of where I believe the value is moving.

The future is always unclear but here is my prediction. Software that automatically surfaces up general insights, unique insights, predictive insights, and autonomous reactions is going to be king. To break it down, let’s look at each one of these items.

General Insights

In my eyes, general insights are the audience metrics that users of a software vendor would check on a daily basis. This might be something like audience health (DAU:MAU or audience return frequency), which helps keep a pulse on audiences. In the future, I anticipate that vendors will automatically surface a suite of insights, tailored by vertical, that users get as more of a “state of the union” dashboard, with the obvious ability to add their own automated metrics via a nice query builder of sorts.
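As a rough sketch of what one of these automated metrics looks like under the hood, here’s DAU:MAU (“stickiness”) computed from daily active-user sets. The sample data is made up; a real pipeline would read these sets from an events store:

```javascript
// Audience-health sketch: DAU:MAU ("stickiness") from daily active-user sets.

function dauMau(dailyActiveSets) {
  // dailyActiveSets: array of Sets of user IDs, one Set per day.
  const mau = new Set();
  for (const day of dailyActiveSets) for (const u of day) mau.add(u);
  const avgDau = dailyActiveSets.reduce((sum, day) => sum + day.size, 0) / dailyActiveSets.length;
  return avgDau / mau.size; // closer to 1 = users return nearly every day
}

const days = [new Set(["a", "b"]), new Set(["a", "c"]), new Set(["a", "b"])];
// average DAU = 2, MAU = 3, so stickiness is about 0.67
```

A dashboard would simply compute this on a rolling window and flag when it trends down.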

Unique Insights

Users don’t always know where to look for insights. Nor do they always know the right questions to be asking. Since marketers can’t constantly look at all of the crazy data points they’re collecting and the correlations between them, future software vendors will have to move here in order to meet these demands. These insights would come from the machine crunching sophisticated machine learning models each night to surface findings the model believes the user should know about. You can think of this as “Show me what I don’t know”. An example might be something like “Your broadcast marketing campaign to all users had 20% higher engagement with French male users in the Active segment of your audience”. The software surfaces the things the marketer doesn’t even know to think about.
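A heavily simplified sketch of this idea, with no real machine learning, just a deviation threshold, and with invented segment names and numbers, might flag segments whose engagement strays far from the campaign-wide baseline:

```javascript
// "Show me what I don't know" sketch: flag segments whose engagement rate
// deviates from the campaign-wide rate by more than a threshold.

function surfaceSegmentInsights(segments, threshold = 0.15) {
  const totalEngaged = segments.reduce((s, x) => s + x.engaged, 0);
  const totalUsers = segments.reduce((s, x) => s + x.users, 0);
  const baseline = totalEngaged / totalUsers;
  return segments
    .filter((x) => Math.abs(x.engaged / x.users - baseline) > threshold)
    .map((x) => `${x.name}: ${(x.engaged / x.users * 100).toFixed(0)}% engagement vs ${(baseline * 100).toFixed(0)}% baseline`);
}

const insights = surfaceSegmentInsights([
  { name: "Active / FR / Male", users: 200, engaged: 120 }, // 60%
  { name: "Active / DE / Male", users: 300, engaged: 120 }, // 40%
  { name: "Lapsed / All", users: 500, engaged: 160 },       // 32%
]);
```

A production system would replace the threshold with proper statistical tests across many more dimensions, but the output shape, a short list of surprising segments, is the same.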

Predictive Insights

As the name suggests, these are insights based on predictive modeling. Since these vendors are required to collect troves of data on user activity and behavior, there’s a world where machine learning models can predict the performance of campaigns, when to send them, when a user may be churning, and so on. These insights help the marketer be proactive when engaging with users or handling changes within their digital ecosystem. There are lots of vendors playing in this space right now as point solutions, but very few (if any) have a buttoned-up product that is making a significant difference yet. The reason is that it’s hard to build a generalized machine learning model that can find covariance between the different data points collected in a way that works across verticals. This isn’t to say it’s impossible, but it will take more time to get to a really good spot here. This is where the marketer gets exceptionally high value, because it moves them from the reactive analytics world to the proactive “autonomous” world.
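To make the idea concrete, here’s a toy churn score using a hand-weighted logistic function. A real system would learn these weights from historical data; the features and weights below are purely illustrative:

```javascript
// Predictive-insight sketch: a hand-weighted logistic churn score.

function churnProbability(user) {
  // Illustrative weights; in practice these come from model training.
  const weights = { daysSinceLastSession: 0.15, sessionsLast30d: -0.2, bias: -1.0 };
  const z =
    weights.bias +
    weights.daysSinceLastSession * user.daysSinceLastSession +
    weights.sessionsLast30d * user.sessionsLast30d;
  return 1 / (1 + Math.exp(-z)); // logistic squash to a 0..1 probability
}

const atRisk = churnProbability({ daysSinceLastSession: 21, sessionsLast30d: 1 });
const healthy = churnProbability({ daysSinceLastSession: 1, sessionsLast30d: 25 });
```

The hard part the post describes is not this arithmetic; it’s learning weights that generalize across verticals.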

Autonomous Reactions

Building on the previous item, autonomous reactions are predominantly about automating many of the mundane tasks the marketer has to do today. For example, setting up an automated trigger to email a user when they perform a specific event on a channel. In the future, high-value world, software vendors will build machine learning models that resemble developments in artificial intelligence. The system will know when to reach out to users, and with what type of messaging, based on variable inputs at each point of the user lifecycle. The user can pinpoint where positive or negative behaviors occur and assign value weightings to these data points, but it is the machine learning model that optimizes the user journey. This is along the lines of factorial experimentation with machine learning models that build off notions such as Markov Chain Monte Carlo (MCMC) simulation. This frees the marketer to focus heavily on things that are much harder to automate, such as their acquisition and churn reduction/retention strategies.
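A much simpler cousin of this idea, sketched here with invented variant names, is an epsilon-greedy bandit that mostly exploits the best-performing message variant while occasionally exploring alternatives:

```javascript
// Autonomous-reaction sketch: epsilon-greedy selection over message variants.
// A real system would persist these stats and update them after every send.

function pickVariant(stats, epsilon, rand = Math.random) {
  if (rand() < epsilon) {
    // Explore: occasionally try a random variant to keep learning.
    return stats[Math.floor(rand() * stats.length)].name;
  }
  // Exploit: pick the variant with the best observed conversion rate.
  let best = stats[0];
  for (const v of stats) {
    if (v.conversions / v.sends > best.conversions / best.sends) best = v;
  }
  return best.name;
}

const stats = [
  { name: "discount-email", sends: 100, conversions: 12 },
  { name: "reminder-push", sends: 100, conversions: 19 },
];
```

Factorial experiments and MCMC-based models are far richer than this, but the core loop is the same: the machine, not the marketer, decides what to send next.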

With the above points in play, the only way this is possible is for these sophisticated machine learning models to understand explicit and implicit user behavior. Today, we have a lot of explicit behavior since this is easily tracked (session count, time on page, etc.). The real value is in understanding implicit user behavior. Implicit behavior is extremely valuable because you can tie in personas, which help dictate how you market to those personas. For example, say I’m an unknown user on a travel site looking for a vacation that has sandy beaches, is sunny, and is warm. I start looking at different beach destinations in tropical regions, search for different locations, input different fields about what temperature range I want, and so on. I interact with the site on a more intimate basis. In the background, a model is crunching an analysis on who I am, which can inform the broader system, specifically the CMS, about what types of content I may like.

The reason this is so fundamentally important is that we can understand who our users are without knowing explicitly who they are on an authenticated basis. Additionally, we can serve up very specific content based on a granular understanding of what the user is interested in. This is exceptionally powerful because you’re able to achieve a deeply personalized connection with your user. A great example of this is Spotify’s “Discover Weekly” engine. People have described this weekly curated list of songs, based on who you are, as “creepy awesome”. What is effectively happening is that Spotify is crunching hundreds of intimate data points on your listening habits and building a playlist each week customized for you, and you only. Some examples of these data points are:

  • Genres of songs frequently played
  • Sub-Genres of songs frequently played
  • Did user skip within first 30 seconds?
  • What type of track is it? (high energy, relaxing, etc.)
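The signals above could feed a simple content-based scoring function. The field names and weights below are made up for illustration; Spotify’s actual models are far more sophisticated:

```javascript
// Sketch of scoring candidate tracks against a listener profile built from
// signals like genre play frequency, early skips, and track type.

function scoreTrack(profile, track) {
  let score = 0;
  score += (profile.genreAffinity[track.genre] || 0) * 2.0;       // frequently played genres
  score += (profile.subGenreAffinity[track.subGenre] || 0) * 3.0; // sub-genres are more specific
  score -= (profile.skipRate[track.genre] || 0) * 1.5;            // early skips count against
  if (track.energy === profile.preferredEnergy) score += 1.0;     // track "type" match
  return score;
}

const profile = {
  genreAffinity: { electronic: 0.8, rock: 0.2 },
  subGenreAffinity: { "deep-house": 0.9 },
  skipRate: { rock: 0.6 },
  preferredEnergy: "relaxing",
};
const a = scoreTrack(profile, { genre: "electronic", subGenre: "deep-house", energy: "relaxing" });
const b = scoreTrack(profile, { genre: "rock", subGenre: "metal", energy: "high" });
```

Rank all candidate tracks by this score, take the top N, and you have the skeleton of a weekly personalized playlist.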

It’s a beautiful example of personalization that is damn near perfect. Much like the Discover Weekly engine, software vendors in the marketing space are going to need to get to this level where they can curate and deliver content at an extremely granular and personal level. As the stack moves upwards in value due to commoditization, the next battle will be fought in the insights, personalization, and proactive engagement world. It’s an exciting time to be watching and building products for this line of work. I welcome the day when brands know me well enough to know when to engage with me, how to engage, in what form, in what frequency, in what context, with what content and so much more.

Agree? Disagree? Let us hear your thoughts on the next generation of marketing software!

Old Enterprise, New Enterprise


This blog post is written as a personal opinion based on my experience and is geared specifically towards the enterprise market. Some of my suggestions or thoughts may not apply to smaller companies as they most likely aren’t experiencing the same pain points.


There’s a change happening. Enterprises are getting upended by younger companies that are faster at changing. I’ve written before about the Fortune 500 no longer having their fortunes to rely on. They’re hemorrhaging money as newer technology breaks into their verticals, disrupting them at a pace they can’t keep up with. It’s a battle between the old enterprise and the new enterprise.

Old enterprises believe that one system to rule them all is the efficient way to operate, since they’ve invested good money into it. New enterprises understand that monolithic systems are dangerous, insanely costly, and run the risk of not being agile enough to keep up with market dynamics.

Old enterprises believe that by collecting as much data as possible from all sources, they will understand their customers better. New enterprises know that deciding what data to collect has to start with questions about fundamental user behavior, so that the data informs the actions that need to be taken.

Old enterprises believe that they can get away with proprietary or single source hosting locations for digital properties and data. New enterprises know that having multi-region cloud hosting local to their users hedges against common factors that are out of their control.

Old enterprises believe that they can personalize content across channels with their current proprietary technology stack to follow their users. New enterprises know that regardless of the personalization products you buy, they need to have a technology stack that supports an open data configuration to pass personalized experiences across channels.

Old enterprises believe that being mobile first is how they keep their competitive edge and gain users. New enterprises understand that being mobile first corners them into a single way of thinking; they instead believe that mobile is an extension and utility of their entire marketing strategy, not just a separate channel to solely invest in.

Mobile First? Wrong.

It’s a bold statement, but we’re wrong about “mobile first”. In fact, I actually think we’re wrong about both the web and mobile. I used to be huge on the mobile first approach when it was the next “thing” that we needed to design around, build content around, build digital campaigns around, etc. It made sense because as users, we spend a lot of our time there. We’re progressively purchasing more, ingesting more information, connecting more, and doing more of everything there. But we’re wrong about it and the web.

I’ve had this growing ache in the back of my mind for the past year, wondering whether or not we’re chasing the next “shiny” when it comes to content creation, choosing technology stacks, engaging our users, and a whole suite of other items. Something wasn’t clicking, and it wasn’t until a recent onsite with a massive multi-billion-dollar company that it clicked for me. Enterprises are starting to think differently about mobile. It’s not that they value it less; they’re just changing their perspective. There’s a growing movement to think less about the end devices and more about the thing that really matters: content. How can I deliver the right personalized content to the end user at the right time? It’s the whole 1:1 conversation that makes marketers salivate. It’s what the 1,000+ marketing automation platforms are all trying to solve. And there’s a good chance we’re all trying to solve it from the wrong perspective.

Problems with our Perspective

Enterprises used to think about content based on a specific channel. They would generate content for a mobile device, a website, a microsite, social media, print ads, products, email, and more that I’m missing. They hire tons of content marketers, digital marketers, product marketers, analytics folks, and any other employee that can fit underneath the marketing spaceship to build content for each of these channels. Digital strategies are often created based on the channel that they are hitting the most.

The problem here is that it doesn’t scale. When you’re a multinational brand with different rollouts, different content, and different methods of engaging users, at some point the system breaks down. You start to lose brand identity. Marketing strategies start working in silos. It becomes a nightmare to manage. So in order to manage the nightmare, small teams are assembled to create microsites or different apps, all operating on different frameworks. These clandestine operations, while well intentioned, end up becoming yet another technology that needs to be managed. Another framework that developers must use. Another asset that needs active management so that it doesn’t die. While KPIs might be hit for converting users or creating demand, there is often a darker side of the P&L once you factor in development costs, ongoing support, ongoing infrastructure costs, agency costs, additional analytics platform costs, and many more. It’s an iceberg.

This is unfortunately the world we’ve put ourselves in. It’s no one’s fault. We’ve had new devices release each year with insane adoption rates, which our users spend time on (e.g., the Apple Watch), and new technology that can either A) solve a current problem or B) be the new “shiny” for a developer to try out. And both are fine! Solving for “A” means you’re creating demand or pipeline or profits in the immediate term. Solving for “B” means your developers get to tinker, hone new skills, be on the bleeding edge, stay engaged, and ultimately become experts. We want both of those to happen. But not in the current model we’ve created. We’re only in this position because we had to move quickly, but as of late, things actually seem to be slowing down. We’ve had fewer explosive “things” happen on the internet where we’re scrambling to keep up. Remember, enterprises move slower because it’s not just one or two properties that they’re looking after; it’s often hundreds to thousands.

New Perspective

Stop doing what you’re doing. I know, it sounds dramatic. But we need to stop. Many enterprises are heading toward a scale of IT, infrastructure, and digital property costs in the many millions. In fact, a great example of this is a global company I worked with that was spending on average $45M per year managing thousands of sites. When they built the system circa 2002, it was what they had available. Not their fault. This isn’t a unique situation, though. It’s a problem in literally every vertical. Pharma, Finance, E-Commerce, Media & Entertainment, B2B, Retail, and many more all have this problem.

Looking at NBC as an arbitrary example, they have 85 different shows on their list right now. That means 85 different microsites they need to manage infrastructure for, plus uptime, marketing strategies, content, design, branding, etc. This doesn’t even include other initiatives NBC might have going on, such as a native app for engaging with these TV shows. Oh! And this is just for North America. Double oh! This doesn’t even include their subsidiaries (~774). You can see how this gets super costly, fast. Forget even figuring out which users watch which shows (more on this later).

So again, stop. I propose taking a step back to really think about how we’re going to effectively deliver content to new devices over the next 20 years. Content is the face and outreach of every company, so we need to get this right or else we dilute brands, waste money, and confuse our users. Let’s get back to the basics. We know that we have a lot of distribution channels today that we need to distribute content on. We also know that there will be more devices, new channels, and different ways of engaging our users in the future. Our systems and infrastructure need to be better equipped to handle this. The infrastructure needs to be able to add, remove, and modify connections to distribution channels very quickly. It also needs to handle the use case of personalizing not only content but experiences for users. Last but not least, it needs to be able to push content outwards to these different channels. We’ll talk about the end devices of these channels in a bit.

The old perspective says “With this new device, how can we make this user friendly?”. The new perspective says “With this new device, how can we create a user-friendly experience that’s unified with all of our other channels?”. While we might have considered the new perspective during the thought process of the old, it wasn’t driving the conversations. Today, it needs to start driving them. This isn’t revolutionary thinking, and it’s not supposed to be. I’m not trying to make up a new buzzword like “content first” or “user first” or whatever. I did marketing in my past; I’ve already used up all my kool aid.

With the new perspective, we’re looking at where our users or segments are and figuring out how to create content that can be distributed across all the devices they touch. This way of thinking pushes us down the path of keeping a consistent user experience across different devices. It retains brand integrity. It scales up to new devices while keeping the same architecture (although this can be changed also; more on this later). We think of this from the user’s perspective and say “You have a lot of devices, and you interact with all of our channels. You use each of these devices in a different context for different purposes to interact with us. How can we unify your experience across channels while providing a user experience that fits the context of each device?”. From that point, we work backwards into the architecture.

It’s not about mobile. It’s not about web. It’s hardly anything to do with devices.

It’s the assumption that our users will be wherever we need to be.

This may seem obvious, but when architecting a full system, it’s a fundamental difference in thinking from how we do it today. Therefore, we need to structure our companies to have the agility and scale to be wherever users are, when they are, in a way that they understand. This means that we don’t develop content, experiences, or marketing strategies for individual channels. Rather, we develop each of these for the user and the user’s context, and construct the end devices in a way that allows the devices to subscribe to what makes sense for the context they are used in. We’re no longer creating content for each of the channels but rather creating content in full depth; once published, the end devices pull or sync that content but only grab the portions they need, when they need them, in the context that they need them. They subscribe to the CMS but are not bound by it. This allows them to operate on their own, which you want, because each channel has a different context in which it interacts with the end user.
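A minimal sketch of this subscribe-but-not-bound relationship is a tiny publish/subscribe hub where each channel registers its own renderer; the channel names and renderers here are illustrative:

```javascript
// Sketch: devices subscribe to the CMS but render content in their own context.

function createCms() {
  const subscribers = [];
  return {
    subscribe(channel, render) {
      subscribers.push({ channel, render });
    },
    publish(content) {
      // Each subscribed channel renders the same content its own way.
      return subscribers.map((s) => ({ channel: s.channel, output: s.render(content) }));
    },
  };
}

const cms = createCms();
cms.subscribe("web", (c) => `<h1>${c.title}</h1><p>${c.body}</p>`);
cms.subscribe("watch", (c) => c.title); // tiny screen: headline only
const results = cms.publish({ title: "Sale", body: "Three days only" });
```

The content is authored once in full depth; each device grabs only what fits its context, which is exactly the decoupling the paragraph describes.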

One thing I want to be clear on is that this isn’t just about content or experiences. We’re also talking about code and architectures. These are equally important because they directly impact cost, scale, security, and time to market. If we have an architecture that allows us to design and develop once, then deploy everywhere, we’re reducing an incredible amount of overhead. Now, before someone calls me out here, I get that this isn’t totally feasible right now. You need to design for different devices, code for different devices, etc. But what I’m saying is that for enterprises at scale, there are ways of creating frameworks that allow you to spin up unique microsites (for example) with different designs utilizing much of the same code base. We’ll cover this more later. Before we dive into what this system looks like, let’s look at some of the validation for this thinking among enterprises and where they’re shifting. Note that I can’t name names for NDA purposes.


I was recently onsite with one of the world’s largest hospitality companies to learn a little more about how they do business, their goals, use cases, and pain points. It was surprising to hear how advanced their thinking was, where they wanted to go, and the bets they were making. The most surprising (and heartwarming!) thing was that they were excited and loved hearing that they were on the leading edge of technology. They want to use new technology to make a difference.

They are making a big internal effort to think differently about how they engage their users. Across the different platforms they were using, they were starting to identify their customers’ different purchase patterns. For example, a high number of users would browse for personal travel tickets on mobile, and three weeks later those users had a high propensity to purchase that initial search on the web. Another example: business travelers would purchase on the web more frequently, and their mobile usage supported the “businessman/businesswoman” persona through frequent searches for local restaurants or coffee shops in the destination location within the app. This was the full context of their user lifecycle.

These are both critical insights into the user’s mind. Understanding this information is helping this company go from being blind about their users to personalizing the experience and the user’s entire journey based on the implicit or explicit behavior they exhibit. This knowledge provides the power to personalize the experience by providing the right content recommendations or experiences wherever the user goes.

Building a New Face

So we need to structure our companies so that we can be wherever our customers are. Great. That’s what we’ve been trying to do. What a novel idea. Why is this different now than before? There are three core reasons: decoupled front-ends, modern frameworks, and apps.

In our previous tech life, the systems we developed on were complete solutions. Our CMS would also act as the front-end of the site. They were intimately tied together, which meant that if we wanted to scale up more microsites, we had to add yet another CMS with another design. Similarly, we had e-commerce platforms that acted in similar formats, which created a world where the website and the e-commerce site were separate. It wasn’t, and in many cases still isn’t, uncommon to have a beautiful website yet a Frankenstein e-commerce site. You’d have a CMS for displaying beautiful content and then, when a user wanted to shop, redirect them to a URL that hit the e-commerce system. In an odd way, the old tech world simultaneously had too tightly integrated systems and completely divorced systems.

Today, we are evolving into more reasonable, civil thinkers in how we go about building our architecture. Companies are now starting to decouple the front-end experience from the back-end to meet the need of being where the customer is, reducing development overhead, learning new technologies to stay on the edge, and being more agile with code. The decoupled front-end allows us to fluidly make changes to the user interface with different markup languages without having to fundamentally change the CMS. We’re able to use Drupal or WordPress as our driving CMS while using something like Angular or React paired with HTML/CSS to keep our front-end code clean, have a lightweight simple framework, and support modern thinking with different web component libraries. When content is created in the CMS, it can then be pushed out to the subscribed framework (e.g., Angular) to render the content. If we need to spin up a different front-end experience for a different channel, or spin one down, we can easily do this without creating a new CMS or expanding our back-end stack significantly. This can be done on the fly with little setup through web components.
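As a hedged sketch of what consuming a headless CMS looks like: WordPress exposes posts over its REST API at /wp-json/wp/v2/posts, and a decoupled front-end only needs a pure render function that any framework component could wrap. The fetch itself is left as a comment, and the sample post is invented:

```javascript
// Decoupled front-end sketch: render a post fetched from a headless WordPress.
// const posts = await fetch("https://example.com/wp-json/wp/v2/posts").then(r => r.json());

function renderPost(post) {
  // The WP REST API wraps title/content in { rendered: "..." } objects.
  return [
    `<article id="post-${post.id}">`,
    `  <h1>${post.title.rendered}</h1>`,
    `  ${post.content.rendered}`,
    `</article>`,
  ].join("\n");
}

const html = renderPost({
  id: 7,
  title: { rendered: "Decoupling the Front-End" },
  content: { rendered: "<p>Body copy from the CMS.</p>" },
});
```

Swapping Angular for React here means swapping the wrapper around renderPost, not the CMS; that is the agility the decoupling buys.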

We’re no longer constricted in our thinking. We have the flexibility required. Enterprises are rapidly adopting this way of thinking because it’s cheaper, more flexible, more powerful, and provides an overall better experience for the customer. You can see this in action on sites such as Virgin America or MSNBC. Even General Motors is now shipping cars with Angular as a means to rapid updates, flexibility, and scale.

Let’s get more into the details, since talk is cheap. I’m going to give a basic scenario and show what technology can solve it.

Scenario: I’m a technology news company that specifically covers business-related technology news. We have a website and an Android mobile app. Our users typically visit 2-3 times during business hours and 1-2 times in off hours (primarily commute time). Our problems right now are that we have to create content for both channels separately, publish separately, and manage separately.

Solution: We’re going to assume we have WordPress as our CMS. The front-end of the site is built in the standard HTML/CSS that we commonly use. We can apply some more modern thinking like Material Design or other methods. From there, we can define a specific section of the site to become a “Placeholder”. A Placeholder is effectively a web component comprised of HTML5 plus a JavaScript framework, such as Oasis.js. Here’s the gotcha about Placeholders: we need to make the Placeholder independent of anything else that goes on (a.k.a. sandbox it). We can do this by using iframes. No, not the god-awful old version of iframes, but the new HTML5 iframes, which allow embedding of JavaScript. A good piece of JavaScript to insert here is Polymer, which provides you with the notion of web components. Polymer supports polyfilling out of the box so that you can insert content into the web component that lives inside the iframe. We now have a framework for front-end isolation with robust hybrid app capabilities that can be used on both web and mobile.

From there, we can go back to Oasis. Oasis has this beautiful thing called “contracts”, which basically allows the Placeholder to send a request to the CMS for content through an HTTP or API request, but render only what we need. For example, Oasis might call the server and say “I need a Title, Sub-Title, Body, and Header Image for a piece of content in a flat format like JSON”. Even though the content we’re requesting might have many more fields, meta fields, or even more content, the request can pull exactly what it needs in order to render the content in a way that makes sense for its individual context and rendering methods. The server would then respond with a JSON blob answering the request (with an AWS S3 location for the image), which the Placeholder takes and fills itself with. Since the Placeholder is its own thing, it can apply any styling it wants to itself.
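To illustrate the contract idea in plain JavaScript (this mimics the concept, not Oasis.js’s actual API, and the content record is invented), the server simply projects the full content record down to the fields the Placeholder requested:

```javascript
// Contract sketch: the Placeholder asks for only the fields it can render,
// and the server projects the full content record down to that shape.

const contentRecord = {
  title: "Q3 Roadmap",
  subtitle: "What's shipping next",
  body: "Full body copy ...",
  headerImage: "s3://bucket/header.jpg", // served as an S3 location
  seoKeywords: ["roadmap", "product"],
  internalNotes: "do not publish",
};

function fulfillContract(record, requestedFields) {
  const response = {};
  for (const f of requestedFields) {
    if (f in record) response[f] = record[f];
  }
  return JSON.stringify(response); // flat JSON blob sent back to the Placeholder
}

const blob = fulfillContract(contentRecord, ["title", "subtitle", "body", "headerImage"]);
```

The Placeholder never sees fields it didn’t ask for, which keeps each channel’s payload small and its rendering concerns isolated.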

Since we need to use this Placeholder in our native app as well, we need to wrap it in a format that fits a native app. We can enable our Android app to accept this by wrapping portions of the app in code that allows Placeholders to exist. Then, whenever we create new content and publish it, the Placeholder subscribed to our CMS can accept the content in a normalized format and render it the way it wants.

While this is a super-abbreviated version of the approach, we’re getting the constructs we need for a seamless publishing experience across channels. With new frameworks coming onto the scene designed specifically to tackle this challenge, most notably React from Facebook, I believe we’re going to see a huge progression towards this way of thinking. React, in my opinion, was built correctly from the ground up as an almost “device agnostic” structure. Many organizations are switching to and recommending React as the go-to solution for omni-channel consistency, and I tend to agree. This way of thinking gives us the modern frameworks we need, the decoupled mindset that allows for agility amid shifting trends, and an app-centric vision that provides reuse of code-structured functionality.


Cool, you just said a bunch of stuff, talked a big talk, but I don’t see any examples. Well, let’s do some examples!

Pinterest – Any image you see is actually rendered through this kind of framework. Effectively, the page is filled with Placeholders. Based on who you are, your preferences, and your “pins”, Pinterest fills in the Placeholders with content that is personalized to you.

Facebook – Using their React framework, each newsfeed item is a Placeholder that populates itself with content based on who you are. This is why, when you open up their website or native app, you see it loading or rendering for a second. What’s happening is that Facebook is asking its servers “Hey, based on this person’s Likes, Interests, Follows, etc., what content should I show them?”. The server then responds and populates the Placeholders with content.

Google Maps – Maps has three different sizes that can be used across different platforms: small, medium, and large. Each has the same functionality but a different level of content disclosure. However, they are the same app. This framework is used across devices to keep the experience and functionality consistent while adhering to the design and context of the device it’s being rendered on.

There are tons of other companies using this type of thinking and methodology right now, and many more are progressing towards it.

This isn’t to say that native apps are dead or that web components are the way of the future. We’ll still have certain sections of experiences be static and not necessarily in a component format. That said, many aspects of the experiences we’ll have across devices will be built within components (product reviews, main calls to action, content recommendations, etc.). The components merely allow us to reuse them across platforms and devices in a sandboxed format, all while retaining our ability to control them, update them, and tweak outputs on the fly instead of having to go into each system and perform the update. It’s really the advent of create once, distribute everywhere.

Building a Better Foundation

Of course, with the front end being so flexible under the structure above, we’re able to collect more intimate user data across all of our channels. Historically, most companies collected very high-level data such as number of sessions, session length, actions completed, etc. While these data points can be somewhat useful, they’re not necessarily indicative of the user’s actual behavior. There’s a shift in the market pushing for more granular data collection through metadata, while automatically surfacing unique insights (insights from pre-set criteria, surfaced all the time). For example, it’s much more useful to join session length, session frequency, and article type to understand what interests a user has than it is to just collect session length and frequency. The reason is that the latter is implicit – we’re somewhat guessing what the user is interested in – versus the former, which is more explicit: the user clicked on an article in a specific category with specific tags, spent a session on that article for a period of time, and then proceeded to repeat behaviors like that.
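As a rough sketch of what joining those signals looks like in practice, here is a minimal interest-profile aggregation. The event shape and the “rank by dwell time” scoring rule are illustrative assumptions, not a prescribed schema.

```python
# Sketch: joining session length, frequency, and article category to infer
# explicit interests, rather than guessing from raw session counts alone.
# The event shape and scoring rule are illustrative assumptions.
from collections import defaultdict

sessions = [
    {"user": "u1", "category": "cycling", "seconds": 240},
    {"user": "u1", "category": "cycling", "seconds": 310},
    {"user": "u1", "category": "finance", "seconds": 15},
]

def interest_profile(events):
    """Aggregate dwell time and visit count per category for one user."""
    profile = defaultdict(lambda: {"visits": 0, "seconds": 0})
    for e in events:
        profile[e["category"]]["visits"] += 1
        profile[e["category"]]["seconds"] += e["seconds"]
    # Rank categories by total engaged time: repeated, lengthy sessions on
    # a category are an explicit signal of interest.
    return sorted(profile.items(), key=lambda kv: kv[1]["seconds"], reverse=True)

top_category, stats = interest_profile(sessions)[0]
print(top_category, stats)  # cycling leads on both visits and dwell time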

This type of thinking helps us get better at understanding our users, but it also puts stress on the backend of our infrastructure. Handling that level of data collection across multiple platforms requires a flexible way of accessing the data, querying against it, surfacing it, and making decisions on it. The enterprises of today need a different backend stack than what they had 5-7 years ago. With new devices coming online all the time, new tech being created that unlocks new capabilities, and a world where users interact with brands on many channels, the enterprise stack needs to be flexible enough to handle this rapid change. This means moving away from proprietary vendors with contract lock-in and towards open source ecosystems where you can easily add and expand on the fly. Lack of flexibility is the death of Fortune 500s, and it’s often due to their digital business not being able to keep up.

Many new enterprises are taking advantage of the open source data storage systems on the market today, such as MongoDB, for their rapid performance and scalability. Backend systems are highly dependent on what you’re looking to accomplish. For example, you may want a NoSQL database specifically as an analytics database where internal employees can query against session data on an ad-hoc basis. You may then want to aggregate sets of data from this NoSQL database into a MySQL or flat file system to detect anomalies, track week-over-week changes, or even perform interesting machine learning techniques such as user behavior clustering.
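For instance, a week-over-week anomaly check on aggregated session counts might look like the sketch below. The 30% threshold is an illustrative assumption; in practice you would tune it to your traffic.

```python
# Sketch: flagging week-over-week anomalies in weekly session rollups, the
# kind of job that might run against data exported from a NoSQL store into
# MySQL or flat files. The threshold is an illustrative assumption.
def week_over_week_changes(weekly_counts, threshold=0.30):
    """Flag weeks whose session count moved more than `threshold` vs the prior week."""
    anomalies = []
    for prev, curr in zip(weekly_counts, weekly_counts[1:]):
        change = (curr - prev) / prev
        if abs(change) > threshold:
            anomalies.append((prev, curr, round(change, 2)))
    return anomalies

# Four weeks of aggregated session counts; week 3 drops sharply.
print(week_over_week_changes([1000, 1050, 600, 620]))  # → [(1050, 600, -0.43)]
```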

Alongside the different database systems is a “microservices” approach to integrating different systems, platforms, or coding languages together. Microservices architectures are, in my opinion, critical to modern enterprises as they act as a fluid conduit between different systems with no single point of failure (when properly architected). Many times, enterprises will either construct their internal microservices architecture to interact with different platforms or use a pre-baked solution called an Enterprise Service Bus. This allows companies to rapidly spin up integrations with different platforms, frameworks, or languages to pass around data that is critical to operations. Think omnichannel marketing or sales here. Apart from that, microservices really help engineers keep a grip on the many different connectivity points that occur within a complex enterprise.
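The heart of that conduit role is usually a normalization step: each channel speaks its own payload shape, and a small service maps them onto one canonical event. A minimal sketch, with entirely hypothetical payload shapes:

```python
# Sketch: a tiny normalization layer of the kind a microservice (or ESB
# transform) might apply, translating channel-specific payloads into one
# shared event schema. The payload shapes are hypothetical.
def normalize(source, payload):
    """Map per-channel payloads onto a single canonical event shape."""
    if source == "web":
        return {"user_id": payload["uid"], "action": payload["event"], "channel": "web"}
    if source == "mobile":
        return {"user_id": payload["device_user"], "action": payload["name"], "channel": "mobile"}
    raise ValueError(f"unknown source: {source}")

events = [
    normalize("web", {"uid": "u1", "event": "purchase"}),
    normalize("mobile", {"device_user": "u1", "name": "purchase"}),
]
# Downstream systems (CRM, analytics, marketing automation) now see one shape.
assert all(e["action"] == "purchase" for e in events)
print(events)
```

Because every downstream consumer sees the same shape, adding a new channel means writing one new mapping rather than touching every integration.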

Orthogonal to, but heavy users of, microservices are CMSs. Choosing a CMS is one of the most critical points in an enterprise’s technology stack. It’s crucial because it often acts as the delivery layer. This isn’t to say that you need the front-facing HTML/CSS driven from the CMS itself, though that is one route many enterprises take. While many employ this route today, a lot of enterprises are moving towards the decoupled route, where the CMS truly just manages the content and the end devices/channels pull the content in the context that makes sense for them.

The more critical piece of a CMS comes from the need for granular control over the content, an easy way to manage it, and an easy way to control it. In this world, there is no valid reason to choose a proprietary CMS vendor anymore. Another bold statement, but this is the reality of today’s digital ecosystem. If you choose a proprietary software vendor, you are committing your Fortune 500 company to an imminent death. Enterprises need to go with an open source solution, of which there are really only two options on the table currently: WordPress and Drupal. The debate between the two is a long-running feud on which I don’t have a strong opinion. They both have strengths and weaknesses. I won’t go into that here but will rather argue why an open source CMS is critical.

The major reasons for choosing an open source CMS are as follows:

  • Speed – Open source means you can develop on it whenever you want, not when someone else tells you to. Need to react to a digital change? Done.
  • Ownership – With proprietary vendors, you’re on their roadmap. Open source means you can add features core to your unique business or vertical whenever you want.
  • Cost – Open source is free (minus hosting and whatnot). Proprietary vendors charge absurd licensing costs that often increase year over year.
  • Community – Open source has thousands of contributors to core code bases, modules, plugins, and security. Proprietary vendors have whoever is on staff.
  • Open Connectivity – Want to connect to a 3rd party system? Install a module or script up a direct connection. Proprietary vendors often lock the data into their own systems or charge an absurd amount of money for API calls.
  • Decoupling – Open source CMSs let you choose which front-end language you serve content with, helping you stay up to date on the latest. Proprietary… not so much. Their language, their delivery methods.

There are many other reasons why open source CMSs provide much better value to enterprises. It’s why over 20% of the entire web is powered by WordPress and over 1,000,000 sites are powered by Drupal – including a majority of the Fortune 500s. In the last 5 years, there’s been a major shift among the Fortune 500s and government agencies to use Drupal, cutting significant market share from DIY and Adobe. Legacy systems just aren’t cutting it anymore.

New-age enterprises gear their CMS towards just the content creation process while decoupling their builds from the front end. Their front-end builds subscribe to their CMS and react to content created and distributed outwards.

Machine Learning & Big Data: Where it fits in

The two hottest buzzwords in the tech world today: machine learning and big data. I don’t know what it is, but I need it! Here’s the real kicker – do you actually need it? I find that most enterprises want to get on board with this technology because they feel it will be some sort of panacea. Well, here’s a little perspective on the definition of both of these terms:

Big Data – Big data is merely the ability to collect, process, and store lots of data efficiently at scale from multiple sources.

Machine Learning – Machine learning is software designed to create algorithms that possess the ability to find patterns, provide information through statistics, and create predictions based on those statistics (for lack of a better definition).

Big data doesn’t really mean much. In fact, big data can actually be a huge problem for many enterprises. I’ve been on calls and on-sites with enterprises that are collecting so much data that they’re accruing large storage costs without the ability to derive meaning from it. This is a problem. In fact, a recently published research study from PwC around big data projects found that 66% of businesses surveyed were getting little to no value from their data. They’re collecting data without knowing what to do with it. You can attribute this to the famous book “The Signal and The Noise”. These enterprises are collecting a ton of noise in hopes of finding a signal. They’re not asking what signal they’re looking for out of the noise and then seeking to collect that data explicitly. What is even more worrisome is that in the same study, only 4% of companies surveyed felt they were set up for success with their big data collection.

With machine learning, enterprises view it as a panacea for all of their problems understanding user behavior. This is wrong. Insights from machine learning are only as good as the data sets you provide. If you’re collecting a ton of noise, it’s going to be much harder to create actions from the “insights” these machine learning algorithms give you. Machine learning platforms or tools should be viewed as an extension of the organization’s knowledge base – a tool or utility, but certainly not something that will solve all problems.

Let’s answer the question though: Where does all of this fit in?

The answer to this question will vary greatly based on the vertical the enterprise is in, the questions the company is asking themselves, and what they are hoping to accomplish.

For example, if I run a mobile app, I want to notice when a user’s behavior has shifted in a way that might signal they could churn and uninstall my app. This is a classic example and a mobile app’s worst nightmare. It’s a great place to institute a machine learning model to help provide that prediction. A predictive model can crunch tons of previous data (big data) on a nightly basis, understand changes in behaviors (session or event counts dropping), and notify the app owners that a user may be likely to churn. Note that in this scenario, we would initially set out to collect explicit data that can contribute to the predictive model instead of collecting a bunch of “stuff” and trying to discern the prediction afterwards.

The more data you have here, the more finely tuned the predictive model will be. The key here though is to make sure that the data inputs into the predictive model are the data inputs you’re collecting into your “big database”.
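The churn check above can be sketched as a nightly job. A production system would learn its model from historical data; this fixed-ratio rule, along with the window lengths, is only an illustrative stand-in.

```python
# Sketch: a nightly churn check comparing a user's recent session counts
# against their own baseline. A real model would be learned from historical
# data; this fixed-ratio rule and these window sizes are illustrative only.
def churn_risk(daily_sessions, baseline_days=21, recent_days=7, drop_ratio=0.5):
    """Flag a user whose recent activity fell below half their baseline average."""
    baseline = daily_sessions[-(baseline_days + recent_days):-recent_days]
    recent = daily_sessions[-recent_days:]
    baseline_avg = sum(baseline) / len(baseline)
    recent_avg = sum(recent) / len(recent)
    return recent_avg < baseline_avg * drop_ratio

# 21 days of ~3 sessions/day, then a week of near-silence.
history = [3] * 21 + [0, 1, 0, 0, 1, 0, 0]
print(churn_risk(history))  # → True: flag for a re-engagement campaign
```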

In terms of where it fits in your stack, big data is often dumped into a core analytics database and then chunked out into a flat file format for further model crunching. This database will likely have lots of different tables that can be referenced. The machine learning section often lives to the “side” of the database as a subscriber. It crunches lots of data from those databases, might store something locally as an aggregate, but ultimately interfaces with or informs other sections of the stack to either take action or prompt users to take action.

The Push Internet

What this ultimately all boils up to – machine learning and big data specifically – is a concept pushed by Dries Buytaert, the founder of Drupal and CTO of Acquia. As a former boss of mine, Dries pushed hard on the idea that the web is in a current state of “pull” – meaning that systems unintelligently request information from backend systems for things like content and products. They have a hard time creating personalized experiences for users and an even harder time (if not an impossible one) doing it across channels. Front-end applications request information when the user requests it. This is how the internet has worked since its inception.

However, Dries started to notice a trend. Frameworks are starting to catch up, and “moments” are a huge force in how users interact with brands. This has started to change the conversation to a “push” internet, where backend systems send out prescriptive, in-context communications when it makes sense for the user.

By far, the most noticeable example I have experienced of the Push Internet in action is Flipboard. In the beginning of my relationship with Flipboard, I didn’t get push notifications. I merely used their product when it was convenient for me – which happened to be on my morning subway commute around 8am and my evening commute around 6pm. After 15-30 days of using their app on a fairly frequent basis, I started getting push notifications called “Daily News” and “Recap” (or something like that) where the state of the world is delivered to me. To further this proactive outreach, the push notifications were personalized with content relevant to my reading habits. This is an example of big data, machine learning, personalization, contextualization, and the Push Internet working harmoniously.

Other examples of this are Google Now reading my email data for things like flight confirmations, hotel data, and more. Notifications tell me when I need to leave in order to make it to the airport on time, or provide handy information about how to get to my hotel and check in.

The prediction is that an omnichannel, truly user-first company will deliver a contextual and personalized experience any time I interact with one of their end points. This means email, website, mobile app, billboard, and retail. The company should have a great idea about my likes, dislikes, when to engage me, and when not to, in what context. The best part is that this doesn’t have to be creepy, invasive, or big-brother-like. It is a symbiotic relationship between myself and the company that should be built up over time. A well-tuned system will understand my typical level of interaction, purchasing behavior, interaction behavior, and desired privacy level. Based on the data collected, it should provide me with the correct level of interaction at the right times without annoying me.

The system should complement my lifestyle and interactions.

This comes back to a point made earlier around machine learning. Machine learning is actually a crucial step forward in delivering an overall positive experience with a brand. It provides the conduit to creating a strong relationship between a brand and a user. Since these algorithms and models can mine tons of data and derive insights from it, they can help optimize towards the level of experience users would like. Some users don’t mind being pinged all day by push notifications. They enjoy being up to date on the latest and embrace the “always on” mentality of being connected to the human nervous system that is the internet. Their connection to the internet has become an extension of themselves. Other users, like myself, are picky about what level of privacy we allow access to and how we interact with brands. I give certain apps the ability to track my location and send me push notifications because I was informed properly and upfront about how they would enhance my experience. I prefer less interruption and less of a relationship with my device or brands because I’m not a power user. I don’t want to be constantly informed about news items, deals from brands, or other articles trying to get me to purchase something. I personally have a low LTV but a long retention window. I view the relationship as a tool or utility for how I interact with the internet. It’s an extension of my overall personality, and the device is my bridge between my thoughts and the world’s thoughts.

This is why the combination of the Push Internet and machine learning is so undeniably critical. Brands don’t want to alienate users by over-messaging them, but they also don’t want to lose mindshare by under-messaging them. The end points where users interact with brands – all end points – are where relationships are made or broken. Brands need to understand the personalized, contextualized experiences a user requires, paired with the appropriate level of engagement.

What’s Next

It’s always hard to tell where the market will shift next, so I won’t claim to have a crystal ball with all the answers. What I do think is that we’ll see a big progressive shift in the enterprise market to standardize on a lot of the open source platforms, particularly CMSs, to take advantage of the great work the communities are doing. I believe enterprises are starting to see that in order to scale up, stay competitive, and retain customers, they will need to make these transformations.

I believe the biggest changes that are going to happen are not only the “reverse” of the web from pull to push but also a reversal in how we architect software stacks. As I previously described, we architect systems based on how we want to push content out to our customers rather than architecting them to allow devices to pull the content they need in the context they need it in. The difference in this thinking is that while the CMS can push content out to the end user, the device still has to be subscribed to certain portions of that content and decide when to interact with the user. Obviously this is dictated by an internal system, but the point is that it has to be very systematic in order to fit within the lifestyle of users.

Being systematic means understanding users at an intimate level. This doesn’t mean knowing their SSN but rather understanding their behaviors at a more fundamental level. I believe this is where machine learning and big data will be incredibly powerful and will ultimately drive the conversation on how we “push” content out to end devices. I don’t believe it will ever be a 100% “push web” but rather a symbiotic relationship between the user and the business – a push and pull environment.

For enterprises looking to make a move towards the new world, the first notion to clear out of everyone’s head is that this will be a fast switch or that it has to be done all at once. I believe the way to really tackle this is step by step, starting with the front end (personal opinion). I’ve seen successful transformations of enterprises towards a modern architecture simply by carving out a specific project as a pilot run. Often, the best approach is to bring in a group of consultants (that’s what I do!) who can work as an accelerated ancillary development branch for the company. These are also individuals who can help educate the rest of the stakeholders on how the development was done and how it fits into the current stack. From there, it’s a matter of tackling portions of the stack step by step. This is critical because it gives progressive deadlines that spread out the risk of changing everything at once.

The key is commitment to the transformation. If you don’t commit, it won’t succeed. And I believe that if these enterprises don’t make these transformations, they will feel big pressure from competitors that do. It’s no secret that enterprises are staffing up on data scientists, front-end gurus, and data architects to combat this ever-changing landscape. The ones who understand how these transformations are made are the ones who will come out on top. The ones who don’t will end up getting squeezed and die out.

If I had to bucket what I think will happen in the future, here are some common themes.

Modern Frameworks: Hybrid apps that are built to subscribe to holistic content versus device-specific content. What I mean is that the device can receive content from a core CMS where the content was built to be distributed on any channel. By subscribing to this holistic content, the device can pick and choose which fields to display.

Frameworks such as AngularJS provide options to polyfill content or data on the fly without needing extra authentication. Normally, this is incredibly difficult because any time you want to provide a seamless user experience across channels, you need to insert the content or code into a hostile environment, such as a native app or ecommerce platform. This often bastardizes the content, and you lose either the personal experience or the brand integrity. With modern frameworks, though, you can provide placeholders inside the hostile environment that mesh well with the host framework but protect any content filled inside the placeholder. This means that when a CMS pushes out content, the placeholder framework can accept the content and choose what to show and how to show it without sacrificing integrity. This polyfilling technique, seen in many modern frameworks, is the conduit to a unified experience delivered through apps.

Apps: To clarify, this isn’t just native mobile applications. Apps are going through a convergence with other platforms. I’ve written about this before in my post Convergence of the Mobile and Web. In that post, I talk about how there’s a blend happening between mobile and web devices that you could call the “appification” of the internet. The way we build end user experiences is transitioning to being delivered by apps – whether native mobile apps, single-page JavaScript apps, or small sub-set web apps.

As I talked about previously, modern frameworks are giving us the freedom to build unified experiences for our users, delivered through apps that can be reused across different platforms and devices. Many modern methods use the new HTML5 specifications for iframes to deliver this experience. Old iframes were awful. New iframes not only shelter you from the hostile platform but can communicate outwards to other services to polyfill content. These are often wrapped in some sort of app framework that we can push to other areas.

A great example of this is Google Cards. When you look at a Google Maps Card, there are three sizes: Small, Medium, Large. Each has a different level of progressive disclosure of content, but all have the same underlying content and functionality. The way these cards are built also allows them to be reused across many modern platforms. You can attribute a card to an app – an app that has the same functionality but the ability to be displayed anywhere.

Data powering everything: The last bucket is pretty self-explanatory. I believe there will be (and already is) a big shift towards the truly data-driven enterprise. This means that at every layer – marketing, sales, support, etc. – data will power the decisions being made. We’ll stop collecting torrents of random data “because we can”; rather, we’ll take a step back and focus on exactly what we need to collect in order to be extremely signal-driven about our outcomes. These signal-based outcomes will drive much more targeted and ultimately more sophisticated machine learning models that help curate and power the backbone of the push/pull web. These models will help bring in a world where the machine knows my actual behavior versus basic “who am I” data in order to interact with me on an intimate, contextually oriented basis.

It’s going to be a steep road for lots of enterprises, but it’s manageable, and many are getting through it today. For CMOs, CIOs, CTOs, and CEOs, making these shifts needs to be a high priority in order to retain users for the future. If you’re not sure where to start, how to go about this transformation, or need some guidance, feel free to reach out.

Hybrid Genomics Cloud: Why genetics needs local and cloud computing

Posted by | Thoughts on Genomics, Thoughts on Technology | No Comments

There’s an interesting battle brewing in the genetics world. It’s bred from much of the hype around cloud computing and how these visions paint a panacea for all verticals. People who know me will say that I’m a huge proponent of the cloud in general. However, genetics isn’t a general thing, and I believe we’ll see an emerging hybrid cloud that specializes in handling just this – a “biocloud” that acts as a platform for biologists.

There are many contributing factors to why I believe this is the likely outcome for the genetics market. Personally, many of the accelerated genomics pipeline tools I’m working on are hedging on this outcome due to challenges I’ve seen with computing on the cloud. The reality is that the cloud is designed for general purpose computing. For the most part, the problems solved by cloud computing are fairly basic and don’t require huge clusters of servers. They involve smaller files and simpler computations when compared to genomics.

For genomics, the data sets are massive, have countless permutations of interactions, and constantly evolve. In a weird way, the data we’re storing has many different dimensions – time being one of them. In order to do queries and complex computations, a specific computing environment is needed from both a software and a hardware perspective.

To back up, an accelerated genomics pipeline looks something like this:

  1. Sequenced data is stored for alignment
  2. Sequenced genome is aligned against reference genome
  3. Aligned genome and variant file are passed to structured database storage
  4. End user queries against one or many of the datasets ad-hoc to collect data for a hypothesis
  5. End user performs complex computations against large collections of datasets for deep understanding of datasets (simulation, similarities, etc.)

While step 2 is considered a high performance computing example, you could easily say the same for steps 4 and 5. However, the computations being performed are completely different. In step 2, we’re running an algorithm called “Bowtie” based on the popular Burrows-Wheeler Transform. This algorithm aligns short read sequences to a larger genome; often this means aligning 3.1 billion rows against 3.1 billion rows. Doing this at scale takes a specially designed system utilizing sophisticated hardware architectures such as GPUs, InfiniBand, and flash array block storage. We’ve personally tried using AWS for this, and it has either failed to complete or been so slow that it nets a negative gain. On a custom-designed system, though, we’ve seen close to 10x improvements in speed, reducing the time to align a full human genome to seconds from nearly an hour or days.
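To give a flavor of the transform Bowtie builds on, here is the textbook Burrows-Wheeler Transform on a toy string. This naive construction is illustration only; real aligners use suffix-array and FM-index techniques to handle billions of bases.

```python
# Sketch: the Burrows-Wheeler Transform that FM-index aligners like Bowtie
# build on. This naive O(n^2 log n) construction is for illustration only;
# production aligners use suffix-array tricks to scale to whole genomes.
def bwt(text):
    """Return the last column of the sorted rotation matrix of `text`."""
    assert text.endswith("$"), "requires a unique end-of-string sentinel"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

print(bwt("banana$"))  # → annb$aa
```

The transform groups identical characters together, which is what makes compressed, searchable genome indexes possible.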

That said, this system isn’t designed for storing data in a way that is very useful. It’s purely designed for speed. We consider this part of our pipeline the “local compute cluster”, where we stream data coming from the sequencer and align it on the fly, allowing us to do “genomics” in real time. On the flip side, we want to take advantage of the economies of scale of cloud storage and computing. This is where the output of the aligned data should go, since we get many of these benefits over time. Personally, we’ve tested passing aligned data to the cloud for storage, analysis, and automation with positive outcomes. In our tests, we’ve used Redshift from AWS (a PostgreSQL-derived database) as our core database for prototype purposes. We’ve had great success with very low query times for full disease-to-genetic mappings, providing a viable solution for the “cloud” portion of our pipeline. In the future, we plan on using different types of elastic and scalable resources, such as EC2, for doing interesting data analysis utilizing machine learning software.
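The shape of a disease-to-genetic mapping query might look like the sketch below. SQLite stands in for Redshift here so the example is self-contained, and the schema, variant IDs, and disease names are all hypothetical.

```python
# Sketch: the shape of a disease-to-variant mapping query of the kind we run
# against Redshift. SQLite stands in so the example is self-contained; the
# schema and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants (variant_id TEXT, gene TEXT);
    CREATE TABLE disease_links (variant_id TEXT, disease TEXT);
    INSERT INTO variants VALUES ('rs001', 'BRCA1'), ('rs002', 'TP53');
    INSERT INTO disease_links VALUES ('rs001', 'breast cancer'), ('rs002', 'li-fraumeni');
""")

# Join variant annotations to disease associations, the core of a
# "full disease to genetic mapping" query.
rows = conn.execute("""
    SELECT d.disease, v.gene
    FROM disease_links d
    JOIN variants v ON v.variant_id = d.variant_id
    ORDER BY d.disease
""").fetchall()
print(rows)  # → [('breast cancer', 'BRCA1'), ('li-fraumeni', 'TP53')]
```

On a columnar warehouse like Redshift, the same join runs over billions of variant rows, which is where the cloud half of the hybrid pipeline earns its keep.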

At the end of the day, while many of the major cloud providers have “Life Sciences” vertical cloud offerings, the reality is that they don’t stand up to the real use cases of the commercial environment that will be required for genetics at scale. They currently cater to much of the ad-hoc analysis done by researchers, which is incredibly useful. However, once we start to scale up to a medical-grade, commercial level, there will need to be a specific pipeline and hybrid platform that gives us the benefit of both speed and complex analysis. There’s a world where an entirely new genomics cloud emerges that, while “cloud” based, isn’t part of one of the major cloud providers. Rather, it is a separate cloud environment designed specifically for the biology world, tuned for the incredibly complex and massive datasets that we haven’t seen yet. For now, though, I believe the best solution is a combination of both local and cloud-based computing for the full benefits of an optimized system.