# Anomaly detection in time series data using BigQuery ML

## Create an ARIMA model, then detect anomalies

Let’s say we have a time series dataset of the number of bicycles rented in London every day:

`SELECT `

EXTRACT(date from start_date) AS start_date,

COUNT(*) AS num_trips

FROM `bigquery-public-data.london_bicycles.cycle_hire`

GROUP BY start_date

How can we find unusual days in terms of the number of bicycle rentals?

In BigQuery, all we need are two SQL statements.

## 1. ARIMA+ Model

First, create an ML model to predict the number of trips on future days using an ARIMA model:

`CREATE OR REPLACE MODEL ch09eu.bicycle_daily_trips`

OPTIONS(

**model_type='arima_plus',**

TIME_SERIES_DATA_COL='num_trips',

TIME_SERIES_TIMESTAMP_COL='start_date',

**DECOMPOSE_TIME_SERIES=TRUE**

)

AS (

SELECT

EXTRACT(date from start_date) AS start_date,

COUNT(*) AS num_trips

FROM `bigquery-public-data.london_bicycles.cycle_hire`

GROUP BY start_date

);

One key thing is that we ask the model to keep track of forecast errors during the modeling using DECOMPOSE_TIME_SERIES. This is so that we can turn around and ask to see the anomalies.

## 2. Detecting anomalies

Next, detect anomalies (unusual days):

`SELECT * `

FROM **ML.DETECT_ANOMALIES(**

MODEL ch09eu.bicycle_daily_trips,

STRUCT (0.95 AS anomaly_prob_threshold))

ORDER BY anomaly_probability DESC

LIMIT 10

The result?

Okay, the second item on the list is Christmas. But why was July 9, 2015 so unusual?

## What happened in London on July 9, 2015?

Let’s ask Google.

Ah yes. The subway network was shutdown due to a strike. That’s why there was an unusually high number of bicycle rentals.

Enjoy!