## Anomaly Detection – A Novel Approach

Posted by rcasey

One of the harder things to do in monitoring system health, or even brand health, is detecting anomalies, or “events,” that may be out of the ordinary. It gets harder when your data fluctuates frequently or when you’re trying to build a model that can be applied to dramatically different datasets. This topic has been discussed many times and contested with different theories, mathematics, and approaches, all aimed at avoiding “alert fatigue.” Of course, I had to try and do it differently! Let’s talk about the approach I’ve been testing out.

**TL;DR – I’m testing out a model that looks at the velocity vector moving average and the derivative moving average. By looking at three time-series data points of the derivative in the past and extrapolating into the future, paired with the velocity vector, we get a good idea of when an anomaly may be happening.**

I’ve explored many different approaches, including sophisticated machine learning methods. However, one afternoon I thought about looking at the problem a different way, borrowing methods from day trading, physics, and calculus. The approach is simple enough: look at the change in slope against the moving average. The reality is that there is a lot more to getting it to work. And now, for the deconstruction…

**Acceleration Moving Average**

The first portion of this theory is to take a look at the acceleration moving average. This is often found in day trading as an indicator of a dramatic shift in direction that outpaces prior accelerations. In this portion of the formula, we use the acceleration formula:

*a = ∆v/∆t*

For each time series increment, we store the calculated *a*. From there, we compare it against the moving average. Internally, we have tested a 14-day moving average on a 10-minute time series. So for each 10-minute increment, we look at the current *a* and compare it against the moving average. However, as you can imagine, this can fluctuate quite dramatically and cause alerts to be sent that shouldn’t be. The risk of looking at this in isolation is that you set a static threshold – e.g., if the current acceleration is greater than the acceleration moving average by 20%, send an alert. Where this really breaks down is when you get multiple spikes over the course of a day, with each subsequent spike being lower in volume (but still notable). Since the moving average increases to account for the most recent spike, you lose out on the subsequent spikes. Example below.
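To make the mechanics concrete, here is a minimal sketch of the static-threshold variant just described. The function names, the pure-Python implementation, and the 20% default are my own illustrative choices, not the production code:

```python
def acceleration(counts, dt=1.0):
    # a = Δv/Δt, where v is itself the change in volume per increment,
    # so acceleration is the second difference of the raw counts.
    velocity = [(b - a) / dt for a, b in zip(counts, counts[1:])]
    return [(b - a) / dt for a, b in zip(velocity, velocity[1:])]

def moving_average(series, window):
    # Trailing moving average; early points use the shorter prefix available.
    return [sum(series[max(0, i - window + 1):i + 1]) / min(i + 1, window)
            for i in range(len(series))]

def static_threshold_alerts(counts, window, pct=0.20):
    # Alert when current acceleration exceeds its moving average by pct.
    # This is the naive rule discussed above: a big spike inflates the
    # moving average and can mask later, smaller spikes.
    acc = acceleration(counts)
    ma = moving_average(acc, window)
    return [i for i, (a, m) in enumerate(zip(acc, ma))
            if m > 0 and a > m * (1 + pct)]
```

A flat series with one large spike triggers a single alert at the spike, while a quiet series triggers none; the failure mode is that the inflated average then swallows smaller follow-up spikes.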

If you look at the large red line right around 12/1/15, you’ll notice that a moving average line would be pulled up dramatically. This causes the subsequent events happening around 12/15/15 and 12/18/15 to be missed. While the acceleration moving average is a novel approach, we’ve actually found that it isn’t as useful as we’d like. It is often led astray by wild fluctuations in volume and has a high propensity to trigger alerts that are not actually needed – such as the above. This led us to look at a different approach.

**Velocity Vector**

Vectors allow us to quantify an object’s direction and magnitude. When looking at an anomaly, we want to understand its direction of movement on an *x,y* axis and then pair that with the magnitude of volume. We could arguably get rid of the acceleration moving average at this point, as the two effectively become the same thing once we look at the moving average. The velocity vector gives us some real-time understanding of what is happening to our volume. See example below.

When analyzing Twitter volume, the data can be sporadic. Even when comparing the velocity vector moving average against the current value, we still find that alarms are triggered more frequently than we’d like. This is primarily because the data is not smoothed: we get snapshots of volume at different time frames as whole numbers, such as 10, 50, 34, etc. This makes it hard to discern the significance of a change in the vector portion of the velocity vector. This brings us to the third portion of the formula.
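As a rough sketch of what “velocity vector” means here: treat each 10-minute step as a segment on a time/volume plane, and record its angle (direction) and length (magnitude). The encoding below, including the use of `atan2` for direction, is my own assumption about one reasonable representation:

```python
import math

def velocity_vectors(counts, dt=1.0):
    # Each step becomes a (direction, magnitude) pair:
    # direction = angle of the segment on the time/volume plane,
    # magnitude = length of that segment.
    vectors = []
    for prev, cur in zip(counts, counts[1:]):
        dv = cur - prev
        vectors.append((math.atan2(dv, dt), math.hypot(dt, dv)))
    return vectors
```

With raw whole-number counts like 10, 50, 34, the direction component swings wildly from step to step, which is exactly why a smoothing pass is needed before the vectors become meaningful.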

**Fourier Smoothing**

Since Twitter volume data comes in as chunks of whole numbers, our vectors change dramatically, which renders the prior step useless. Velocity vectors appear to be useful only when the data is smoothed out between the actual time series counts. For example, if we have the two data points 1 and 5, we’d actually want to fill in the difference with 1.1, 1.2, 1.3, 1.4, etc. Interestingly, Twitter volume data can sometimes look like audio signal data, in the sense that it can be incredibly choppy. To smooth it out, we can use Fourier smoothing to create a nice-looking dataset as the Twitter volume count comes in. Below is an example of Fourier smoothing, where we look at discrete values of temperature by day and smooth out the data using this technique.
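The post doesn’t give its exact smoothing implementation; one common way to realize Fourier smoothing is a low-pass filter via the FFT, keeping only the lowest-frequency components. A sketch with NumPy, where the `keep` cutoff is an assumed tuning parameter:

```python
import numpy as np

def fourier_smooth(counts, keep=5):
    # Transform to the frequency domain, zero out everything above the
    # `keep` lowest-frequency components, and transform back.
    spectrum = np.fft.rfft(counts)
    spectrum[keep:] = 0
    return np.fft.irfft(spectrum, n=len(counts))
```

The result is a smooth curve threaded through the choppy counts, filling in intermediate values much like the 1.1, 1.2, 1.3 … interpolation described above.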

Now when we look at the velocity vector moving average, the value becomes much more stable and doesn’t change nearly as much as it did with no smoothing applied. If we look at the velocity vector in 10-minute increments as a 14-day moving average, we get some nice insight into the different fluctuations happening. However, we’re still looking at the current state and still don’t have a good way of letting the machine tell us not only when to trigger something but also when something might happen. To solve the predictive portion of the problem, we looked to derivatives.

**Derivatives**

Since our Fourier smoothing of the dataset provides nice smooth curves, we can easily calculate the derivative at any data point at any given time. In our environment, we have tested looking at the derivative at each 10-minute increment. Since the derivative gives us a tangent line that theoretically extends both into the past and the future, we actually look up to three time-series increments into the future and the past. From there, we calculate the change in the *y* axis of the derivatives. See example below.

By doing this, we can predict the change in the derivative up to 30 minutes before we get to that point. This is key because we’re looking specifically at the slope of an extrapolated derivative. But how do we know when an anomaly may happen? We look at the moving average of the past 14 days of the change in derivative slope. If the current change in slope exceeds the moving average, we likely have an anomaly on our hands. However, we have found this to be a bit too sensitive by itself, which led us to combine the velocity vector moving average and the derivative slope moving average.
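A minimal sketch of the derivative step, under my own assumptions about the details: slopes come from central differences on the smoothed series, the tangent line at each point is extrapolated three increments forward (3 × 10 minutes = the 30-minute lookahead), and the change in slope between consecutive points is the quantity compared against its moving average:

```python
def slopes(smoothed, dt=1.0):
    # Central-difference derivative at each interior point of the smoothed series.
    return [(smoothed[i + 1] - smoothed[i - 1]) / (2 * dt)
            for i in range(1, len(smoothed) - 1)]

def tangent_forecast(smoothed, i, steps=3, dt=1.0):
    # Extend the tangent line at point i `steps` increments into the future.
    slope = (smoothed[i + 1] - smoothed[i - 1]) / (2 * dt)
    return smoothed[i] + slope * steps * dt

def slope_changes(smoothed, dt=1.0):
    # Change in derivative slope between consecutive points; this is the
    # series whose moving average would drive the alert decision.
    d = slopes(smoothed, dt)
    return [abs(b - a) for a, b in zip(d, d[1:])]
```

On a parabola the central-difference slope grows linearly, so every slope change is the same constant, which makes the behavior easy to sanity-check.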

By combining both, we force a decision to be made. If the velocity vector is within the moving average but the derivative slope isn’t, it is most likely not an anomaly; the converse also applies. What I found, though, is that if both the derivative slope *and* the velocity vector exceed the moving average, it’s a strong indication that an anomaly is happening or will happen. I’ve also tried pairing this with one standard deviation away from the moving average as a dynamic threshold. Adding this creates a system that only pulls out the most extreme anomalies. In further tests, I’ll probably try using units of standard deviation as a way to create a more or less sensitive alerting system – almost like a user-driven knob or refinement method.
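Putting the pieces together, the combined decision might look like the sketch below; the function signature and the way the standard-deviation “knob” enters are my own framing of the rule described above:

```python
def is_anomaly(vv_mag, vv_ma, vv_sd, slope_chg, slope_ma, slope_sd, k=1.0):
    # Flag only when BOTH signals exceed their moving average by k standard
    # deviations. k is the user-facing sensitivity knob: larger k keeps only
    # the most extreme cases, smaller k makes alerting more sensitive.
    vv_exceeds = vv_mag > vv_ma + k * vv_sd
    slope_exceeds = slope_chg > slope_ma + k * slope_sd
    return vv_exceeds and slope_exceeds
```

Either signal alone is not enough; only agreement between the two triggers an alert, which is the decision the combination forces.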

Is this a finished approach? Absolutely not. There are a lot of challenges in getting this to work properly, to the point that it meets some sort of statistical rigor. I’ve been encouraged by the early results from looking at real examples of events happening with our customers in the Twittersphere. So far, I’ve seen a decent amount of success in predicting when an anomaly may be happening. There are other methods we could use to refine the model, such as computing an F-score from precision and recall for better accuracy on the prediction front.