
The Balance Between Quality & Speed in Product Development


I’m going to make some controversial statements in this post regarding product management. If you’re currently a product manager, I’m sure you’ve been peppered with the “do this! don’t do this! you’re a bad PM if you do this! 10x PMs do this!” rhetoric. Honestly, it’s exhausting. Earlier in my career, I read a ton of books on how to be an effective product manager. What stuck out to me is that these books often pushed opposite agendas – sometimes even within the same book.

At this point in my product management career, I’ve realized that there is no silver bullet for doing a good job. In fact, it largely depends on where your company is in its lifecycle, the industry, the team you’re working on, etc. These books often lay out a set of criteria PMs must meet. They paint the picture of a panacea: do these 10 things and everything works out. That’s never the case.

What I learned early on (and quickly) is that there is a very hard balance between delivering high-quality products and delivering them rapidly. These aren’t mutually exclusive, but they are very hard to balance. Sometimes they shouldn’t be balanced.

For example, let’s say you’re losing a ton of business because you lack a major feature set. The company has put it off for too long, and it’s now a top priority to help sales stop losing competitive deals. Reaching feature parity would take a minimum of 9 months. Looking backwards, you know you’ve lost $900k to competitors in the last 9 months because you lack this feature – roughly $100k a month – so it’s safe to assume that over the full 18-month window (the 9 months already behind you plus the 9 months of development), you’ll have lost $1.8M.

Do you build as fast as possible and ship something “good enough”? Do you go for a high quality product and release later?

Personally, I’d opt to release something that is barely good enough. Something that is almost embarrassing, but that paints the vision and demonstrates we can accomplish the basics of what our competitor can do, refocusing the sale back onto the features where you crush competitors.

On the other hand, let’s say you’re trying to leapfrog the competition. Time is on your side, and the feature you’re developing is very complex, with a high barrier to entry from a user experience perspective. You know that if you ship early, it’s going to see little to no adoption, which will make iterative testing and development challenging since you’ll be stuck building in the dark.

Do you build as fast as possible and ship something “good enough”? Do you go for a high quality product and release later?

In this scenario, I’d say build a high-quality product with lots of iterative UX/UI testing, prototypes, etc. before you commit engineering to it. This is going to take a lot longer, but you know it will make adoption easier. Plus, you’re confident you have time on your side since you’re blazing a new path through a different part of the market.

My opinion is this: product management is a case-by-case job. There is no silver-bullet method for building great products because every company is in a different place. A super fast, agile, “release often” approach may not be appropriate for monolithic, highly audited, complex industries. A PM must understand the vertical and adjust their style to the most effective way of delivering product in that organization.

This isn’t to say the paradigm can’t change over time. Often, it’s the product department that helps create ways to be more efficient and deliver products differently. However, these things don’t happen overnight, and they require a flexible product management style.

As the title of this post suggests, it’s a constant battle between delivering very high-quality products and delivering products to market very fast. They’re not mutually exclusive by any means, but rarely (in the real world) is the environment set up to enable both.

New York, Paris, London – A Software Development Dilemma


New York is a well-thought-out, grid-based city, highly planned from the ground up.

Paris is a complex city with many organic roads built long ago; during the Napoleonic era, a few grand boulevards were cut through to increase the efficiency and connectedness of the city. A hybrid of planning and organic growth.

London is an organically developed city with curving roads, a complex subway system, and tons of legacy, but it is still highly functional.

In all three cases, the city is massive, efficient, highly productive, and gets the job done. They were all planned in different ways but achieved the same outcome. New York wanted to be very structured, Paris wanted a hybrid, and London opted for an organic route. No one city has it right, and each has its own charms, culture, efficiencies, and issues.

In enterprise software development, we have very similar issues. Sweeping changes are uniquely hard in enterprise software due to scale, complexity, impact, and cost. In certain cases, adding just a single dimension to a database can mean hundreds of thousands, possibly millions, of extra dollars in storage costs at operational scale – not to mention the performance impact on query times. When a new product or feature with sweeping impact and ramifications for the rest of the organization needs to be introduced, it’s often a good idea to take a step back and think about New York, Paris, and London.

Strategically, does it make sense to rewrite a quarter of the platform? Is there a workaround? Should we build something entirely new from scratch and then migrate? Is there a middle ground where we can build a connection? Is that a long-term solution or just a band-aid that will hurt us later? These are all questions that go back to the same macro-level issues as city planning.

I think most companies start out with a basic plan but rapidly move into an organic model – much like London. The development goes in different directions, workarounds are instituted, compromises are made, it becomes more costly to engineer, but it still works. At some point this breaks down at scale, at which point organizations move to the Paris model.

Moving to the Paris model means creating large boulevards that enable certain efficiencies while leaving room for organic growth to fit in around them. Sometimes we even dig tunnels to find ways around the boulevards and organic streets, but at the end of the day we still provide connections back to the main boulevards.

Then, at some point, the intersection of feature needs and tech debt forces organizations to consider building New York – a new paradigm that offers a foreseeably scalable solution and is well planned from the ground up. We build a basic version of New York and then start to port over all the features. New York works for a while until we start to see flaws, at which point we drift back toward feature development that looks like London and Paris: organic growth.

Repeat a few times and you’ve got software development for the enterprise.

This isn’t necessarily a bad thing at all. Much of the world’s greatest software follows one of these patterns, which has proven that no single model is “the best”. What has become very apparent to me is that it’s important to know which model you’re building toward. It’s critically important because it clarifies the specific choices and sacrifices you’re willing to make.

Put simply: be aware of which city you’re building.

Where I’ve seen these models fail most is when organizations aren’t aware of the type of city they’re trying to build, making compromises or building features that back them into incredibly costly situations. Now, this isn’t to say you need to consider this for every feature. I personally think it pertains largely to macro/strategic development (e.g. user permissions, data portability, etc.).

Obviously we will make mistakes and sometimes bulldoze the wrong areas while building our boulevards, but the hope is that being more conscious about these decisions will shed light on the long-term impacts and make decision-making easier.

Anomaly Detection – A Novel Approach


One of the harder things to do in monitoring system health, or even brand health, is detecting anomalies or “events” that may be out of the ordinary. It gets harder when your data fluctuates frequently or when you’re trying to build a model that can be applied to dramatically different datasets. This topic has been discussed many times and contested with different theories, mathematics, and approaches, all aimed at avoiding “alert fatigue”. Of course, I had to try to do it differently! Let’s talk about the approach I’ve been testing out.

TL;DR – I’m testing out a model that looks at the velocity vector moving average and the derivative moving average. By looking at three time-series increments of the derivative into the past and extrapolating three into the future, paired with the velocity vector, we get a good idea of when an anomaly may be happening.

I’ve explored many different approaches, including sophisticated machine learning methods. However, one afternoon I had a thought about looking at the problem a different way. This approach borrows methods from day trading, physics, and calculus. The idea is simple enough: look at the change in slope against the moving average. The reality is that there is a lot more to getting it to work. And now, for the deconstruction…

Acceleration Moving Average

The first portion of this approach is the acceleration moving average. This is often found in day trading as an indicator of a dramatic shift in direction that outpaces prior accelerations. For this portion, we use the standard acceleration formula:

a = ∆v/∆t

For each time-series increment, we store the calculated value of a and compare it against the moving average. Internally, we have tested a 14-day moving average on a 10-minute time series: for each 10-minute increment, we compare the current acceleration against the moving average. However, as you can imagine, this can fluctuate quite dramatically and cause alerts that shouldn’t be sent. The risk with this approach on its own is that you end up setting a static threshold – e.g. if the current acceleration is greater than the acceleration moving average by 20%, send an alert. Where this really breaks down is when you get multiple spikes over the course of a day, with each subsequent spike being smaller in volume (but still notable). Since the moving average increases to account for the most recent spike, you lose out on the subsequent spikes. Example below.

[Figure: Twitter volume over time, with a large spike around 12/1/15 and smaller spikes around 12/15/15 and 12/18/15]

If you look at the large red line at around 12/1/15, you’ll notice that if we were to use a moving average, our moving average line would be pulled up dramatically. This causes the subsequent events at around 12/15/15 and 12/18/15 to be missed. While the acceleration moving average is a novel approach, we’ve found that it isn’t as useful as we’d like. It is often led astray by wild fluctuations in volume and has a high propensity to trigger alerts that aren’t actually needed – such as the above. This led us to look at a different approach.
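To make the mechanics concrete, here’s a minimal sketch of that naive static-threshold check – a rough illustration rather than our production code. The pandas usage, function name, and magnitude comparison are my own assumptions; the 14-day window and 20% threshold come from above.

```python
import pandas as pd

def acceleration_alerts(volume: pd.Series, window: str = "14D",
                        threshold: float = 0.20) -> pd.Series:
    """Flag points where acceleration outpaces its moving average.

    Assumes `volume` is a count series with a DatetimeIndex at 10-minute
    increments, so one diff step equals one unit of time.
    """
    velocity = volume.diff()        # Δv per 10-minute step
    acceleration = velocity.diff()  # a = Δv/Δt with Δt = 1 step
    accel_ma = acceleration.abs().rolling(window).mean()  # 14-day moving average
    # Static threshold: current acceleration magnitude exceeds the MA by 20%
    return acceleration.abs() > accel_ma * (1 + threshold)
```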

Velocity Vector

Vectors allow us to quantify an object’s direction and magnitude. When looking at an anomaly, we want to understand its direction of movement on an x,y axis and pair that with the magnitude of the volume change. We could arguably get rid of the acceleration moving average at this point, as the two effectively become the same thing once we look at the moving average. The velocity vector gives us a real-time sense of what is happening to our volume. See example below.

[Figure: velocity vectors plotted against Twitter volume]

Twitter volume can be sporadic. Even when comparing the velocity vector moving average against the current value, we still find that alarms are triggered more frequently than we’d like. This is primarily because the data isn’t smoothed: we get snapshots of volume at different time frames as whole numbers, such as 10, 50, 34, etc. That makes it hard to discern the significance of a change in the vector portion of the velocity vector. This brings us to the third portion of the formula.
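For reference, here’s a rough sketch of how the velocity vector step could look in code. Treating each 10-minute step as a 2-D (Δt, Δvolume) vector is my own reading of the approach, and the names and pandas/NumPy usage are illustrative; the 14-day moving average is from the text.

```python
import numpy as np
import pandas as pd

def velocity_vector(volume: pd.Series, dt_minutes: float = 10.0) -> pd.DataFrame:
    """Direction and magnitude of the volume change at each 10-minute step,
    treating each step as a 2-D vector (Δt, Δvolume) on the x/y axes."""
    dv = volume.diff()
    angle = np.arctan2(dv, dt_minutes)              # direction of movement
    magnitude = np.sqrt(dv ** 2 + dt_minutes ** 2)  # length of the vector
    out = pd.DataFrame({"angle": angle, "magnitude": magnitude})
    out["magnitude_ma"] = out["magnitude"].rolling("14D").mean()  # 14-day MA
    return out
```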

Fourier Smoothing

Since Twitter volume data comes in as chunks of whole numbers, our vectors change dramatically, which renders the prior step nearly useless. Velocity vectors really only appear to be useful when the data is smoothed out between the actual time-series counts. For example, if we have the two data points 1 and 5, we’d want to fill in the gap with 1.1, 1.2, 1.3, 1.4, and so on. Interestingly, Twitter volume data can sometimes look like an audio signal, in the sense that it can be incredibly choppy. To smooth it out, we can use Fourier smoothing to create a nice-looking dataset as the Twitter volume counts come in. Below is an example of Fourier smoothing, where we take discrete values of temperature by day and smooth out the data using this technique.

[Figure: Fourier smoothing applied to discrete daily temperature values]
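Here’s a minimal sketch of one way to do this kind of smoothing, using an FFT low-pass filter in NumPy. The function name and the fraction of frequencies kept are assumptions, and the right cutoff is very much data-dependent.

```python
import numpy as np

def fourier_smooth(values: np.ndarray, keep_fraction: float = 0.1) -> np.ndarray:
    """Smooth a 1-D signal by zeroing out its high-frequency FFT components."""
    spectrum = np.fft.rfft(values)
    cutoff = max(1, int(len(spectrum) * keep_fraction))  # keep only the lowest frequencies
    spectrum[cutoff:] = 0
    return np.fft.irfft(spectrum, n=len(values))
```

In practice, the raw 10-minute volume counts would be run through something like this before computing the vectors and derivatives that follow.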

Now when we look at the velocity vector moving average, the value becomes more stable and doesn’t change nearly as much as it did with no smoothing applied. If we look at the velocity vector on 10-minute increments against a 14-day moving average, we get some nice insight into the different fluctuations happening. However, we’re still looking at the current state and don’t yet have a good way of letting the machine tell us not only when to trigger something, but when something might happen. To solve the predictive portion of that problem, we looked to derivatives.

Derivatives

Since Fourier smoothing of the dataset gives us nice smooth curves, we can easily calculate the derivative at any data point at any given time. In our environment, we have tested looking at the derivative at each 10-minute increment. Since the derivative gives us a tangent line that theoretically extends into both the past and the future, we look up to three time-series increments into the future and the past. From there, we calculate the change along the y-axis of that tangent line. See example below.

[Figure: the tangent line to a curve at a single point]

By doing this, we can predict the change in the derivative up to 30 minutes before we get to that point. This is key because we’re looking specifically at the slope of an extrapolated tangent line. But how do we know when an anomaly may happen? We look at a 14-day moving average of the change in derivative slope. If the current change in slope exceeds the moving average, we likely have an anomaly on our hands. However, we’ve found this to be a bit too sensitive on its own, which led us to combine the velocity vector moving average and the derivative slope moving average.
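Here’s a hedged sketch of the derivative step as I’ve described it. The use of np.gradient, the column names, and the exact form of the ±3-step projection are my own interpretation.

```python
import numpy as np
import pandas as pd

def derivative_signal(smoothed: pd.Series, steps: int = 3) -> pd.DataFrame:
    """Slope at each point of the smoothed series, plus the projected change
    in y when the tangent line is extended `steps` increments back and
    forward (3 steps x 10 minutes = the 30-minute look-ahead)."""
    slope = pd.Series(np.gradient(smoothed.to_numpy()), index=smoothed.index)
    projected_dy = slope * (2 * steps)  # rise of the tangent over ±3 steps
    out = pd.DataFrame({"slope": slope, "projected_dy": projected_dy})
    out["projected_dy_ma"] = out["projected_dy"].abs().rolling("14D").mean()
    return out
```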

By combining the two, we force a decision to be made. If the velocity vector is within its moving average but the derivative slope isn’t, it is most likely not an anomaly, and vice versa. What I did find is that if both the derivative slope and the velocity vector exceed their moving averages, it’s a strong indication that an anomaly is happening or will happen. I’ve also tried requiring the values to be one standard deviation above the moving average as a dynamic threshold. Adding this creates a system that only pulls out the most extreme anomalies. In further tests, I’ll probably try using units of standard deviation as a way to create a more/less sensitive alerting system – almost like a user-driven knob or refinement method.
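Putting it together, here’s a minimal sketch of the combined rule, built on the hypothetical outputs of the sketches above; n_std is the sensitivity knob mentioned, and all of the names are illustrative.

```python
import pandas as pd

def combined_anomaly(vel: pd.DataFrame, deriv: pd.DataFrame,
                     n_std: float = 1.0) -> pd.Series:
    """Flag an anomaly only when BOTH signals exceed their moving averages
    by `n_std` rolling standard deviations."""
    vel_limit = vel["magnitude_ma"] + n_std * vel["magnitude"].rolling("14D").std()
    deriv_limit = (deriv["projected_dy_ma"]
                   + n_std * deriv["projected_dy"].abs().rolling("14D").std())
    return (vel["magnitude"] > vel_limit) & (deriv["projected_dy"].abs() > deriv_limit)
```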


Is this a finished approach? Absolutely not. There are a lot of challenges in getting this to work properly, to the point that it meets some sort of statistical rigor. But I’ve been encouraged by the early results from looking at real examples of events happening with our customers in the Twittersphere. So far, I’ve seen a decent amount of success in predicting when an anomaly may be happening. There are other methods we could use to refine the model, such as tracking an F-score (built from precision and recall) to better evaluate the predictions.