> ## Documentation Index
> Fetch the complete documentation index at: https://cubed3-feat-druid-driver-streaming.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Pre-aggregations

> Pre-aggregations are materialized summaries of data that accelerate query performance by pre-computing results.

Pre-aggregations must have, at minimum, a [name](#name) and a [type](#type).
Pre-aggregations must include all dimensions and measures you will query with.

A pre-aggregation is typically placed in a cube that defines most of dimensions
and measures that this pre-aggregation includes. If a pre-aggregation doesn't
include any dimensions or measures from a cube it's defined in, this
pre-aggregation is ignored.

## Parameters

### `name`

The `name` parameter serves as the identifier of a pre-aggregation. It must be
unique among all pre-aggregations within a cube and follow the [naming
conventions][ref-naming].

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: orders_by_status
          dimensions:
            - CUBE.status
          measures:
            - CUBE.count

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      orders_by_status: {
        dimensions: [
          status
        ],
        measures: [
          count
        ]
      }
    },

    // ...
  })
  ```
</CodeGroup>

This `name`, along with the name of the cube, will be used as a prefix for
pre-aggregation tables created in the database.

### `type`

Cube supports the following types of pre-aggregations:

* [`rollup`][self-rollup]
* [`original_sql`][self-originalsql]
* [`rollup_join`][self-rollupjoin]
* [`rollup_lambda`][self-rolluplambda]

The default type is `rollup`.

#### `rollup`

Rollup pre-aggregations are the most effective way to boost performance of any
analytical application. The blazing fast performance of tools like Google
Analytics or Mixpanel are backed by a similar concept. The theory behind it lies
in multi-dimensional analysis, and a rollup pre-aggregation is the result of a
[roll-up operation on an OLAP cube][wiki-olap-ops]. A rollup pre-aggregation is
essentially the summarized data of the original cube grouped by any selected
dimensions of interest.

The most performant kind of rollup pre-aggregation is an **additive** rollup:
all measures of which are based on [decomposable aggregate
functions][wiki-composable-agg-fn]. Additive measure types are: `count`, `sum`,
`min`, `max` or `count_distinct_approx`. The performance boost in this case is
based on two main properties of additive rollup pre-aggregations:

1. A rollup pre-aggregation table usually contains many fewer rows than its'
   corresponding original fact table. The fewer dimensions that are selected
   for roll-up means fewer rows in the materialized result. A smaller number of
   rows therefore means less time to query rollup pre-aggregation tables.

2. If your query is a subset of dimensions and measures of an additive rollup,
   then it can be used to calculate a query without accessing the raw data. The
   more dimensions and measures are selected for roll-up, the more queries can
   use this particular rollup.

Rollup definitions can contain members from a single cube as well as from
multiple cubes. In case of multiple cubes being involved, the join query will be
built according to the standard rules of cubes joining.

Rollups are selected for querying based on properties found in queries made to
the Cube REST (JSON) API. A thorough explanation can be found under [Getting Started
with Pre-Aggregations][ref-caching-preaggs-target].

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: orders_by_company
          measures:
            - count
          dimensions:
            - status

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      orders_by_company: {
        measures: [
          count
        ],
        dimensions: [
          status
        ]
      }
    },

    // ...
  })
  ```
</CodeGroup>

#### `original_sql`

As the name suggests, it persists the results of the `sql` property of the cube
in the data source (rather than Cube Store). `original_sql` pre-aggregations should
**only** be used when the cube's `sql` is a complex query (i.e., nested subqueries,
window functions, and/or multiple joins).

<Warning>
  `originalSql` pre-aggregations **must only** be stored in the data source.
  While you can set `external: true` for `original_sql` pre-aggregation, this is
  not recommended or generally supported.
</Warning>

They often do not provide much in the way of performance directly, but there are
two specific applications:

1. They can be used in tandem with the
   [`use_original_sql_pre_aggregations`][self-origsql-preaggs] option in other
   rollup pre-aggregations.

2. Situations where it is not possible to use a `rollup` pre-aggregations, such
   as [funnels][ref-recipe-funnels].

For example, to pre-aggregate all completed orders, you could do the following:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: completed_orders
      sql: |
        SELECT *
        FROM orders
        WHERE completed = true

      pre_aggregations:
        - name: main
          type: original_sql

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`completed_orders`, {
    sql: `
      SELECT *
      FROM orders
      WHERE completed = true
    `,

    pre_aggregations: {
      main: {
        type: `original_sql`
      }
    },

    // ...
  })
  ```
</CodeGroup>

#### `rollup_join`

<Warning>
  `rollup_join` is currently in Preview, and the API may change in a future
  version.
</Warning>

<Warning>
  Rollup join is designed to join data across different data sources.
  To join data within the same data source you can list members from different
  cubes within a regular rollup. Rollup join can be used only to join two
  rollups.

  The rollup on the right side of the join is bounded by the number of physical
  Cube Store partitions it can have, which depends on your Cube Store compute
  tier. Note that these are Cube Store physical partitions — not Cube logical
  partitions.
</Warning>

Cube is capable of performing joins between pre-aggregations from different
[data sources][ref-schema-ref-cube-datasource] to avoid making excessive queries
to them.

<Warning>
  `rollup_join` is an ephemeral pre-aggregation that relies on referenced rollups
  when queries are executed. Setting
  [`scheduled_refresh`](#scheduled_refresh) to `true` is unnecessary
  for `rollup_join` and will result in an error. Appropriate freshness controls
  should be set on referenced rollups instead.
</Warning>

In the following example, we have a `users` cube with a `users_rollup`
pre-aggregation, and an `orders` cube with an `orders_rollup` pre-aggregation,
and an `orders_with_users_rollup` pre-aggregation. Note the following:

* Both cubes have different values for `data_source`.
* The type of `orders_with_users_rollup` is `rollup_join`.
* This pre-aggregation has a special property `rollups` which is an array
  containing references to both "source" rollups.

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: users
      data_source: postgres
      sql_table: users

      pre_aggregations:
        - name: users_rollup
          dimensions:
            - CUBE.id
            - CUBE.name

      measures:
        - name: count
          type: count

      dimensions:
        - name: id
          sql: id
          type: number
          primary_key: true

        - name: name
          sql: "{CUBE}.first_name || {CUBE}.last_name"
          type: string

    - name: orders
      data_source: mssql
      sql_table: orders

      pre_aggregations:
        - name: orders_rollup
          measures:
            - CUBE.count
          dimensions:
            - CUBE.user_id
            - CUBE.status
          time_dimension: CUBE.created_at
          granularity: day

        - name: orders_with_users_rollup
          type: rollup_join
          measures:
            - CUBE.count
          dimensions:
            - users.name
          rollups:
            - users.users_rollup
            - CUBE.orders_rollup

      joins:
        - name: users
          relationship: many_to_one
          sql: "{CUBE.user_id} = {users.id}"

      measures:
        - name: count
          type: count

      dimensions:
        - name: id
          sql: id
          type: number
          primary_key: true

        - name: user_id
          sql: user_id
          type: number

        - name: status
          sql: status
          type: string

        - name: created_at
          sql: created_at
          type: time
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`users`, {
    data_source: `postgres`,
    sql_table: `users`,

    pre_aggregations: {
      users_rollup: {
        dimensions: [CUBE.id, CUBE.name]
      }
    },

    measures: {
      count: {
        type: `count`
      }
    },

    dimensions: {
      id: {
        sql: `id`,
        type: `number`,
        // We need to set this field as the primary key for joins to work
        primary_key: true
      },

      name: {
        sql: `first_name || last_name`,
        type: `string`
      }
    }
  })

  cube(`orders`, {
    data_source: `mssql`,
    sql_table: `orders`,

    pre_aggregations: {
      orders_rollup: {
        measures: [CUBE.count],
        dimensions: [CUBE.user_id, CUBE.status],
        time_dimension: CUBE.created_at,
        granularity: `day`
      },

      // Here we add a new pre-aggregation of type `rollup_join`
      orders_with_users_rollup: {
        type: `rollup_join`,
        measures: [CUBE.count],
        dimensions: [users.name],
        rollups: [users.users_rollup, CUBE.orders_rollup]
      }
    },

    joins: {
      users: {
        relationship: `many_to_one`,
        // Make sure the join uses dimensions on the cube, rather than
        // the column names from the underlying SQL
        sql: `${CUBE.user_id} = ${users.id}`
      }
    },

    measures: {
      count: {
        type: `count`
      }
    },

    dimensions: {
      id: {
        sql: `id`,
        type: `number`,
        primary_key: true
      },

      user_id: {
        sql: `user_id`,
        type: `number`
      },

      status: {
        sql: `status`,
        type: `string`
      },

      created_at: {
        sql: `created_at`,
        type: `time`
      }
    }
  })
  ```
</CodeGroup>

<Info>
  `rollup_join` is not required to join cubes from the same data source; instead,
  include the foreign cube's dimensions/measures in the rollup definition
  directly:
</Info>

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: orders_rollup
          measures:
            - CUBE.count
          dimensions:
            - users.name
            - CUBE.status
          time_dimension: CUBE.created_at
          granularity: day
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      orders_rollup: {
        measures: [CUBE.count],
        dimensions: [users.name, CUBE.status],
        time_dimension: CUBE.created_at,
        granularity: `day`
      }
    }
  })
  ```
</CodeGroup>

#### `rollup_lambda`

<Warning>
  `rollup_lambda` pre-aggregations **must** be defined **before** any other
  pre-aggregations in a cube.
</Warning>

A `rollup_lambda` pre-aggregation is a special type of pre-aggregation that can
combine data from data sources and other rollups. It is extremely useful in
scenarios where real-time data is required.

[Lambda pre-aggregations][ref-caching-lambda-preaggs] can be used to combine
data from a data source and a pre-aggregation, or even from multiple
pre-aggregations across different cubes that share the same dimensions and
measures.

### `measures`

The `measures` property is an array of [measures from the
cube][ref-schema-measures] that should be included in the pre-aggregation:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: users_rollup
          measures:
            - CUBE.count

      measures:
        - name: count
          type: count
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      users_rollup: {
        measures: [CUBE.count]
      }
    },

    measures: {
      count: {
        type: `count`
      }
    }
  })
  ```
</CodeGroup>

### `dimensions`

The `dimensions` property is an array of [dimensions from the
cube][ref-schema-dimensions] that should be included in the pre-aggregation:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: users_rollup
          dimensions:
            - CUBE.status

      dimensions:
        - name: status
          type: string
          sql: status
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      users_rollup: {
        dimensions: [CUBE.status]
      }
    },

    dimensions: {
      status: {
        type: `string`,
        sql: `status`
      }
    }
  })
  ```
</CodeGroup>

### `time_dimension`

The `time_dimension` property can be any [`dimension`][ref-schema-dimensions] of
type [`time`][ref-schema-types-dim-time]. All other measures and dimensions in
the data model are aggregated. This property is an extremely useful tool for
improving performance with massive datasets.

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: orders_by_status
          measures:
            - CUBE.count
          dimensions:
            - CUBE.status
          time_dimension: CUBE.created_at
          granularity: day

      measures:
        - name: count
          type: count

      dimensions:
        - name: status
          type: string
          sql: status

        - name: created_at
          type: time
          sql: created_at
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      orders_by_status: {
        measures: [CUBE.count],
        dimensions: [CUBE.status],
        time_dimension: CUBE.created_at,
        granularity: `day`
      }
    },

    measures: {
      count: {
        type: `count`
      }
    },

    dimensions: {
      status: {
        type: `string`,
        sql: `status`
      },

      created_at: {
        type: `time`,
        sql: `created_at`
      }
    }
  })
  ```
</CodeGroup>

A [`granularity`][self-granularity] **must** also be included in the
pre-aggregation definition.

### `granularity`

The `granularity` property defines the time dimension granularity of data
*within* the pre-aggregation. If set to `week`, for example, then Cube will
pre-aggregate the data by week and persist it to Cube Store.

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: users_rollup_by_week
          measures:
            - CUBE.count
          dimensions:
            - CUBE.status
          time_dimension: CUBE.created_at
          granularity: week
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      users_rollup_by_week: {
        measures: [CUBE.count],
        dimensions: [CUBE.status],
        time_dimension: CUBE.created_at,
        granularity: `week`
      }
    },

    // ...
  })
  ```
</CodeGroup>

The value can be either a default granularity (i.e., `second`, `minute`, `hour`,
`day`, `week`, `month`, `quarter`, or `year`) or a [custom
granularity][ref-custom-granularity]. This property is required when using
[`time_dimension`][self-timedimension].

### `segments`

The `segments` property is an array of [segments from the
cube][ref-schema-segments] that can target the pre-aggregation:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: main
          measures:
            - CUBE.count
          segments:
            - CUBE.only_complete

      measures:
        - name: count
          type: count

      segments:
        - name: only_complete
          sql: "{CUBE}.status = 'completed'"
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      main: {
        measures: [CUBE.count],
        segments: [CUBE.only_complete]
      }
    },

    measures: {
      count: {
        type: `count`
      }
    },

    segments: {
      only_complete: {
        sql: `${CUBE}.status = 'completed'`
      }
    }
  })
  ```
</CodeGroup>

### `partition_granularity`

The `partition_granularity` defines the granularity for each
[partition][ref-caching-partitioning] of the pre-aggregation:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: users_rollup
          measures:
            - CUBE.count
          dimensions:
            - CUBE.status
          time_dimension: CUBE.created_at
          granularity: day
          partition_granularity: month

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      users_rollup: {
        measures: [CUBE.count],
        dimensions: [CUBE.status],
        time_dimension: CUBE.created_at,
        granularity: `day`,
        partition_granularity: `month`
      }
    },

    // ...
  })
  ```
</CodeGroup>

The value can be one of `hour`, `day`, `week`, `month`, `quarter`, `year`. A
[`time_dimension`][self-timedimension] and [`granularity`][self-granularity]
**must** also be included in the pre-aggregation definition. This property is
required when using [partitioned pre-aggregations][ref-caching-partitioning].

Number of partitions to be built per cube is calculated as
[build\_range][self-buildrange] divided by `partition_granularity`. Number of
partitions to be built per cube is multiplied by the count of time zones and
tenants in case different tenants have different pre-aggregation SQL.

<Warning>
  Choose the count of partitions wisely as those consume memory and CPU resources.
  As a rule of thumb, you do not want to go over 500-1,000 partitions per pre-aggregation in total
  to keep the partitioning overhead low. Too many partitions will most likely
  cause out of memory.
  In case of very long build ranges please consider use [Lambda pre-aggregations][ref-caching-lambda-preaggs] to reduce partition count per pre-aggregation.
</Warning>

### `refresh_key`

Cube can also take care of keeping pre-aggregations up to date with the
`refresh_key` property. By default, it is set to `every: '1 hour'`,
if neither of the cubes' pre-aggregation references don't override `refresh_key`.

<Info>
  When using [partitioned pre-aggregations][ref-caching-partitioning], the refresh
  key is evaluated for each partition separately.
</Info>

#### `sql`

You can set up a custom refresh check strategy by using the `sql` property:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: main
          measures:
            - CUBE.count
          refresh_key:
            sql: SELECT MAX(created_at) FROM orders

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      main: {
        measures: [CUBE.count],
        refresh_key: {
          sql: `SELECT MAX(created_at) FROM orders`
        }
      }
    },

    // ...
  })
  ```
</CodeGroup>

In the above example, the refresh key SQL will be executed every 10 seconds, as
[`every`][self-refreshkey-every] is not defined. If the results of the SQL
refresh key differ from the last execution, then the pre-aggregation will be
refreshed.

#### `every`

The `refresh_key` can define an `every` property which can be used to refresh
pre-aggregations based on a time interval. By default, it is set to `1 hour`
unless the [`sql` property][self-refreshkey-sql] is also defined in any of cubes pre-aggregation references, in which case
it is set to `10 seconds`. For example:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: main
          measures:
            - CUBE.count
          refresh_key:
            every: 1 day

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      main: {
        measures: [CUBE.count],
        refresh_key: {
          every: `1 day`
        }
      }
    },

    // ...
  })
  ```
</CodeGroup>

To have a pre-aggregation rebuild at a specific time of day, you can use a CRON string with some limitations. For more details about values that can be used with the `every` parameter, please refer to the
[`refreshKey`][ref-cube-refreshkey] documentation.

You can also use `every` with `sql`:

<Warning>
  Using CRON strings and `sql` is not supported and will result in a compilation error.
</Warning>

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: main
          measures:
            - CUBE.count
          refresh_key:
            every: 1 hour
            sql: SELECT MAX(created_at) FROM orders
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      main: {
        measures: [CUBE.count],
        refreshKey: {
          every: `1 hour`,
          sql: `SELECT MAX(created_at) FROM orders`
        }
      }
    }
  })
  ```
</CodeGroup>

In the above example, the refresh key SQL will be executed every hour. If the
results of the SQL refresh key differ from the last execution, then the
pre-aggregation will be refreshed.

<Warning>
  The `every` parameter guarantees that the refresh key will be executed **at
  least** once during the specified interval. However, it does **not** guarantee
  that it will be executed **at most** once. The refresh key may be checked more
  frequently.

  When using `every` with `sql`, the purpose is to reduce the load from running
  the refresh key query itself, not to minimize the number of pre-aggregation
  refreshes. The refresh key SQL should be designed to return consistent results
  and only change when the underlying data needs to be updated, ensuring minimal
  refreshes even if the query is executed multiple times.
</Warning>

#### `incremental`

You can incrementally refresh partitioned rollups by setting
`incremental: true`. This option defaults to `false`.

<Warning>
  Partition tables are refreshed as a whole. When a new partition table is
  available, it replaces the old one. Old partition tables are collected by a
  garbage collection mechanism. Append is never used to add new rows to the
  existing tables.
</Warning>

<Warning>
  Because incremental refreshes generate their own SQL, you **must not** use the
  [`sql`](#refresh_key) property here.
</Warning>

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: main
          measures:
            - CUBE.count
          time_dimension: CUBE.created_at
          granularity: day
          partition_granularity: day
          refresh_key:
            every: 1 day
            incremental: true

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      main: {
        measures: [CUBE.count],
        time_dimension: CUBE.created_at,
        granularity: `day`,
        partition_granularity: `day`,
        refresh_key: {
          every: `1 day`,
          incremental: true
        }
      }
    },

    // ...
  })
  ```
</CodeGroup>

#### `update_window`

<Warning>
  Incremental refreshes without a defined `update_window` will only update the
  last partition as determined by the pre-aggregation's `partition_granularity`.
</Warning>

The `incremental: true` flag generates a special `refresh_key` SQL query which
triggers a refresh for partitions where the end date lies within the
`update_window` from the current time.

<Warning>
  Because incremental refreshes generate their own SQL, you **must not** use the
  [`sql`](#refresh_key) property here.
</Warning>

In the example below, it will refresh today's and the last 7 days of partitions
once a day. Partitions before the `7 day` interval **will not** be refreshed
once they are built unless the rollup SQL is changed.

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: main
          measures:
            - CUBE.count
          time_dimension: CUBE.created_at
          granularity: day
          partition_granularity: day
          refresh_key:
            every: 1 day
            incremental: true
            update_window: 7 day

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      main: {
        measures: [CUBE.count],
        time_dimension: CUBE.created_at,
        granularity: `day`,
        partition_granularity: `day`,
        refresh_key: {
          every: `1 day`,
          incremental: true,
          update_window: `7 day`
        }
      }
    },

    // ...
  })
  ```
</CodeGroup>

This property is required when using [`incremental`][self-incremental]
refreshes.

### `allow_non_strict_date_range_match`

The `allow_non_strict_date_range_match` parameter is used to allow queries to match a
pre-aggregation even if a query contains a *non-strict date range*. It is set to `true`
by default via the \[[`CUBEJS_PRE_AGGREGATIONS_ALLOW_NON_STRICT_DATE_RANGE_MATCH`](/reference/configuration/environment-variables#cubejs_pre_aggregations_allow_non_strict_date_range_match)][ref-env-allow-non-strict]
environment variable.

If this flag is set to `false`, Cube would check if requested date range exactly matches
pre-aggregation granularity. For example, if you're requesting half of a day or your date
range filter is just one millisecond off for a pre-aggregation with the daily granularity,
Cube would not use such a pre-aggregation.

With this flag set to `true`, that strict check is lifted. It allows queries from BI tools
to still match pre-aggregations at the cost of a slight potential data discrepancy.
This is convenient when using Cube with visualization tools such as [Tableau][ref-config-downstream-tableau]
or [Apache Superset][ref-config-downstream-superset] that use loose date ranges.

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: main
          measure:
            - CUBE.count
          time_dimension: CUBE.created_at
          granularity: day
          partition_granularity: day
          allow_non_strict_date_range_match: true
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql: `orders`,

    pre_aggregations: {
      main: {
        measure: [CUBE.count],
        time_dimension: CUBE.created_at,
        granularity: `day`,
        partition_granularity: `day`,
        allow_non_strict_date_range_match: true
      }
    },

    // ...
  })
  ```
</CodeGroup>

### `use_original_sql_pre_aggregations`

Cube supports multi-stage pre-aggregations by reusing original SQL
pre-aggregations in rollups through the `use_original_sql_pre_aggregations`
property. It is helpful in cases where you want to re-use a heavy SQL query
calculation in multiple `rollup` pre-aggregations. Without
`use_original_sql_pre_aggregations` enabled, Cube will always re-execute all
underlying SQL calculations every time it builds new rollup tables.

<Warning>
  `originalSql` pre-aggregations **must only** be stored in the data source.
  While you can set `external: true` for `original_sql` pre-aggregation, this is
  not recommended or generally supported.
</Warning>

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql: |
        SELECT * FROM orders1 UNION ALL SELECT * FROM orders2 UNION ALL SELECT *
        FROM orders3

      pre_aggregations:
        - name: main
          type: original_sql

        - name: statuses
          measures:
            - CUBE.count
          dimensions:
            - CUBE.status
          use_original_sql_pre_aggregations: true

        - name: completed_orders
          measures:
            - CUBE.count
          time_dimension: CUBE.completed_at
          granularity: day
          use_original_sql_pre_aggregations: true

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql: `
      SELECT * FROM orders1
      UNION ALL
      SELECT * FROM orders2
      UNION ALL
      SELECT * FROM orders3
    `,

    pre_aggregations: {
      main: {
        type: `original_sql`
      },

      statuses: {
        measures: [CUBE.count],
        dimensions: [CUBE.status],
        use_original_sql_pre_aggregations: true
      },

      completed_orders: {
        measures: [CUBE.count],
        time_dimension: CUBE.completed_at,
        granularity: `day`,
        use_original_sql_pre_aggregations: true
      }
    },

    // ...
  })
  ```
</CodeGroup>

### `scheduled_refresh`

To always keep pre-aggregations up-to-date, you can set
`scheduled_refresh: true`. This option defaults to `true`.
In production mode, pre-aggregations with `scheduled_refresh: false` will not be
built automatically and require external orchestration to trigger their refresh.
Additionally, any `scheduled_refresh: false` pre-aggregations that were built manually or on-demand will be considered
stale and orphaned over time, and will be dropped by the refresh worker during
cleanup processes.
The [`refresh_key`][self-refreshkey] is used to determine if there's a need to
update specific pre-aggregations on each scheduled refresh run. For partitioned
pre-aggregations, `min` and `max` dates for
[`time_dimension`][self-timedimension] are checked to determine range for the
refresh.

Each time a scheduled refresh is run, it takes every pre-aggregation partition
starting with most recent ones in time and checks if the
[`refresh_key`][self-refreshkey] has changed. If a change was detected, then
that partition will be refreshed.

In development mode, Cube runs the background refresh by default and will
refresh all pre-aggregations which have `scheduled_refresh: true`.

Please consult [Production Checklist][ref-production-checklist-refresh] for best
practices on running background refresh in production environments.

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: orders_by_status
          measures:
            - CUBE.count
          dimensions:
            - CUBE.status
          time_dimension: CUBE.created_at
          granularity: day
          partition_granularity: month

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      orders_by_status: {
        measures: [CUBE.count],
        dimensions: [CUBE.status],
        time_dimension: CUBE.created_at,
        granularity: `day`,
        partition_granularity: `month`
      }
    },

    // ...
  })
  ```
</CodeGroup>

### `build_range_start` and `build_range_end`

<Warning>
  Currently `build_range_start` and `build_range_end` doesn't have any effect on
  pre-aggregations without `partition_granularity`. This behavior can be changed
  in future versions.
</Warning>

<Warning>
  Cube will **not** return results outside of the defined build range. Data will
  **only** be queried within this range, which can lead to an empty result set,
  depending on the query. Consider using [Lambda
  pre-aggregations][self-rolluplambda] if you want to query data outside of the
  build range.
</Warning>

The build range defines what partitions should be built by a scheduled refresh.
By default, the build range is defined as the minimum and maximum values
possible for the [`time_dimension`][self-timedimension] used in the rollup.

<Warning>
  The SQL queries for the build range (as defined by the `sql` property) are
  executed based on the [`refresh_key`][self-refreshkey] settings of the
  pre-aggregation.

  In case of very small `update_window` or `FILTER_PARAMS` are used in the [`refresh_key`][self-refreshkey] definition
  and the current timestamp is used as `build_range_end`, there's a possibility
  to write [`refresh_key`][self-refreshkey] which won't refresh due to cycle dependency on each other.
  To address such cases, you can use relative date in the future for the `build_range_end`.
  For example, you can add one day to the current timestamp and use it as `build_range_end`.
  The actual value to be added depends on how frequently `refresh_key` is updated.

  On the other hand, if the current timestamp is used as `build_range_end`, it can trigger
  updates of the first and last partitions more frequently than the [`refresh_key`][self-refreshkey] does,
  since the build range boundaries change with each evaluation.
</Warning>

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: main
          measures:
            - CUBE.count
          time_dimension: CUBE.created_at
          granularity: day
          partition_granularity: month
          build_range_start:
            sql: SELECT CURRENT_DATE - INTERVAL '300 day'
          build_range_end:
            sql: SELECT CURRENT_DATE

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      main: {
        measures: [CUBE.count],
        time_dimension: CUBE.created_at,
        granularity: `day`,
        partition_granularity: `month`,
        build_range_start: {
          sql: `SELECT CURRENT_DATE - INTERVAL '300 day'`
        },
        build_range_end: {
          sql: `SELECT CURRENT_DATE`
        }
      }
    },

    // ...
  })
  ```
</CodeGroup>

It can be used together with the pre-aggregation's `refresh_key` to define
granular update settings. Set `refresh_key.update_window` to the interval in
which your data can change and `build_range_start` to the earliest point of time
when history should be available.

In the following example, `refresh_key.update_window` is `1 week` and
`build_range_start` is `SELECT NOW() - INTERVAL '365 day'`, so the scheduled
refresh will build historical partitions for 365 days in the past and will only
refresh last week's data.

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: main
          measures:
            - CUBE.count
          time_dimension: CUBE.created_at
          granularity: day
          partition_granularity: month
          build_range_start:
            sql: SELECT NOW() - INTERVAL '365 day'
          build_range_end:
            sql: SELECT NOW()
          refresh_key:
            update_window: 1 week
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      main: {
        measures: [CUBE.count],
        time_dimension: CUBE.created_at,
        granularity: `day`,
        partition_granularity: `month`,
        build_range_start: {
          sql: `SELECT NOW() - INTERVAL '365 day'`
        },
        build_range_end: {
          sql: `SELECT NOW()`
        },
        refresh_key: {
          update_window: `1 week`
        }
      }
    },

    // ...
  })
  ```
</CodeGroup>

### `union_with_source_data`

This option allows combining a pre-aggregation with fresh data retrieved from
the data source; this is extremely useful in scenarios where latency can be
sacrificed for accuracy.

To configure a pre-aggregation to behave in this way, ensure the pre-aggregation
is of type `rollup_lambda`, and then set `union_with_source_data` to `true`:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      # ...

      pre_aggregations:
        - name: lambda
          type: rollup_lambda
          union_with_source_data: true
          rollups:
            - orders.main
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    // ...

    pre_aggregations: {
      lambda: {
        type: `rollup_lambda`,
        union_with_source_data: true,
        rollups: [orders.main]
      }
    }
  })
  ```
</CodeGroup>

### `indexes`

This option allows to define [indexes][ref-indexes]. Indexes are used to
fine-tune pre-aggregation performance when pre-aggregations have significant
cardinality. (For [aggregating indexes][ref-aggregating-indexes], see the
[`type` option](#type-2) below.)

Here's how you can define an index:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: category_and_date
          measures:
            - count
          dimensions:
            - category
          time_dimension: created_at
          granularity: day
          indexes:
            - name: category_index
              columns:
                - category

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      category_and_date: {
        measures: [
          count
        ],
        dimensions: [
          category
        ],
        time_dimension: created_at,
        granularity: `day`,
        indexes: {
          category_index: {
            columns: [
              category
            ]
          }
        }
      }
    },

    // ...
  })
  ```
</CodeGroup>

To reference a time dimension column with its granularity in the index,
use the `CUBE.<time_dimension>.<granularity>` syntax:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: category_and_date
          measures:
            - CUBE.count
          dimensions:
            - CUBE.category
          time_dimension: CUBE.created_at
          granularity: day
          indexes:
            - name: category_created_at_index
              columns:
                - CUBE.category
                - CUBE.created_at.day
      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      category_and_date: {
        measures: [
          CUBE.count
        ],
        dimensions: [
          CUBE.category
        ],
        time_dimension: CUBE.created_at,
        granularity: `day`,
        indexes: {
          category_created_at_index: {
            columns: [
              CUBE.category,
              CUBE.created_at.day
            ]
          }
        }
      }
    },

    // ...
  })
  ```
</CodeGroup>

Alternatively, you can reference the time dimension column as a string
using the `<cube_name>__<time_dimension_name>_<granularity>` format:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      # ...

      pre_aggregations:
        - name: category_and_date
          # ...

          indexes:
            - name: category_created_at_index
              columns:
                - CUBE.category
                - "{'orders__created_at_day'}"
      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    // ...

    pre_aggregations: {
      category_and_date: {
        // ...

        indexes: {
          category_created_at_index: {
            columns: [
              CUBE.category,
              `orders__created_at_day`
            ]
          }
        }
      }
    },

    // ...
  })
  ```
</CodeGroup>

The same approach works for `original_sql` pre-aggregations:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: main
          type: original_sql
          indexes:
            - name: timestamp_index
              columns:
                - timestamp
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      main: {
        type: `original_sql`,
        indexes: {
          timestamp_index: {
            columns: [
              `timestamp`
            ]
          }
        }
      }
    }
  })
  ```
</CodeGroup>

<Note>
  In some cases, indexes would not work with `original_sql` pre-aggregations.
  Please [track this issue](https://github.com/cube-js/cube/issues/7420).
</Note>

#### `type`

This option is used to define [aggregating indexes][ref-aggregating-indexes]
that contain **only** dimensions and pre-aggregated measures from the
pre-aggregation definition.

<Note>
  A query can only target an aggregating index when every measure it uses is
  [additive](/docs/pre-aggregations/matching-pre-aggregations)
  (e.g. built on `sum`, `count`, `min`, `max`, or `countDistinctApprox`), and every
  dimension and filter in the query is one of the index's `columns`. Non-additive
  measures, such as `avg` or exact `countDistinct`, cannot use an aggregating index.
  Queries that don't qualify automatically fall back to a regular index.
</Note>

Here's how you can define an aggregating index:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      pre_aggregations:
        - name: category_and_date
          measures:
            - CUBE.count
          dimensions:
            - CUBE.status
          time_dimension: CUBE.created_at
          granularity: day
          indexes:
            - name: category_index
              columns:
                - CUBE.status
                - Products.name
            - name: aggregating_index
              columns:
                - CUBE.status
              type: aggregate

      # ...
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    pre_aggregations: {
      category_and_date: {
        measures: [
          CUBE.count
        ],
        dimensions: [
          CUBE.status
        ],
        time_dimension: CUBE.created_at,
        granularity: `day`,
        indexes: {
          category_index: {
            columns: [
              CUBE.status,
              products.name
            ]
          },
          aggregating_index: {
            columns: [
              CUBE.status
            ],
            type: `aggregate`
          }
        }
      }
    },

    // ...
  })
  ```
</CodeGroup>

[ref-caching-lambda-preaggs]: /docs/pre-aggregations/lambda-pre-aggregations

[ref-caching-partitioning]: /docs/pre-aggregations/using-pre-aggregations#partitioning

[ref-caching-preaggs-target]: /docs/pre-aggregations/matching-pre-aggregations#matching-algorithm

[ref-config-downstream-superset]: /admin/connect-to-data/visualization-tools/superset

[ref-cube-refreshkey]: /reference/data-modeling/cube#refresh_key

[ref-production-checklist-refresh]: /cube-core/running-in-production#set-up-refresh-worker

[ref-recipe-funnels]: /recipes/data-modeling/funnels

[ref-schema-ref-cube-datasource]: /reference/data-modeling/cube#data_source

[ref-schema-dimensions]: /reference/data-modeling/dimensions

[ref-schema-measures]: /reference/data-modeling/measures

[ref-schema-segments]: /reference/data-modeling/segments

[ref-schema-types-dim-time]: /reference/data-modeling/dimensions#type

[ref-naming]: /docs/data-modeling/syntax#naming

[self-granularity]: #granularity

[self-incremental]: #incremental

[self-origsql-preaggs]: #use_original_sql_pre_aggregations

[self-originalsql]: #original_sql

[self-refreshkey]: #refresh_key

[self-refreshkey-every]: #every

[self-refreshkey-sql]: #sql

[self-rollup]: #rollup

[self-rollupjoin]: #rollup_join

[self-rolluplambda]: #rollup_lambda

[self-timedimension]: #time_dimension

[self-buildrange]: #build_range_start-and-build_range_end

[wiki-olap-ops]: https://en.wikipedia.org/wiki/OLAP_cube#Operations

[wiki-composable-agg-fn]: https://en.wikipedia.org/wiki/Aggregate_function#Decomposable_aggregate_functions

[ref-indexes]: /docs/pre-aggregations/using-pre-aggregations#using-indexes

[ref-aggregating-indexes]: /docs/pre-aggregations/using-pre-aggregations#aggregating-indexes

[ref-pre-aggs]: /docs/pre-aggregations/using-pre-aggregations

[ref-ref-cubes]: /reference/data-modeling/cube

[ref-custom-granularity]: /reference/data-modeling/dimensions#granularities

[ref-env-allow-non-strict]: /reference/configuration/environment-variables#cubejs-pre-aggregations-allow-non-strict-date-range-match

[ref-config-downstream-tableau]: /admin/connect-to-data/visualization-tools/tableau