> ## Documentation Index
> Fetch the complete documentation index at: https://cubed3-feat-druid-driver-streaming.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Joins

> Joins define relationships between cubes, allowing you to access and compare members from multiple cubes at the same time.

You can use the `joins` parameter within [cubes][ref-ref-cubes] to define joins to other cubes.
Joins allow to access and compare members from two or more cubes at the same time.

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: my_cube
      # ...

      joins:
        - name: target_cube
          relationship: one_to_one || one_to_many || many_to_one
          sql: SQL ON clause
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`my_cube`, {
    // ...

    joins: {
      target_cube: {
        relationship: `one_to_one` || `one_to_many` || `many_to_one`,
        sql: `SQL ON clause`
      }
    }
  })
  ```
</CodeGroup>

All joins are generated as `LEFT JOIN`. The cube which defines the join serves
as a main table, and any cubes referenced inside the `joins` property are used
in the `LEFT JOIN` clause. Learn more about direction of joins
[here][ref-schema-fundamentals-join-dir].

The semantics of `INNER JOIN` can be achieved with additional filtering. For
example, a simple check of whether the column value `IS NOT NULL` by using [set
filter][ref-restapi-query-filter-op-set] satisfies this requirement.

There's also no way to define `FULL OUTER JOIN` and `RIGHT OUTER JOIN` for the
sake of join modeling simplicity. To get `RIGHT OUTER JOIN` semantics just
define join [from other side of relationship][ref-schema-fundamentals-join-dir].
The `FULL OUTER JOIN` can be built inside cube [sql][ref-schema-cube-sql]
parameter. Quite frequently, `FULL OUTER JOIN` is used to solve [Data
Blending][ref-schema-data-blenging] or similar problems. In that case, it's best
practice to have a separate cube for such an operation.

## Parameters

### name

The name must match the name of the joined cube and, thus, follow the [naming
conventions][ref-naming].

For example, when the `products` cube is joined on to the `orders` cube, we
would define the join as follows:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      # ...

      joins:
        - name: products
          relationship: many_to_one
          sql: "{CUBE.id} = {products.order_id}"
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    // ...

    joins: {
      products: {
        relationship: `many_to_one`,
        sql: `${CUBE.id} = ${products.order_id}`
      }
    }
  })
  ```
</CodeGroup>

### relationship

The `relationship` property is used to describe the type of the relationship
between joined cubes. It’s important to properly define the type of relationship
so Cube can accurately calculate measures.

The cube that declares the join is considered *left* in terms of the [left
join][wiki-left-join] semantics, and the joined cube is considered *right*. It
means that all rows of the *left* cube are selected, while only those rows of
the *right* cube that match the condition are selected as well. For more
information and specific examples, please see [join
directions][ref-schema-fundamentals-join-dir].

<Info>
  The join does not need to be defined on both cubes, but the definition can
  affect the [join direction][ref-schema-fundamentals-join-dir].
</Info>

You can use the following types of relationships:

* `one_to_one` for [one-to-one][wiki-1-1] relationships
* `one_to_many` for [one-to-many][wiki-1-m] relationships
* `many_to_one` for the opposite of [one-to-many][wiki-1-m] relationships

<Warning>
  The types of relationships listed above were introduced in v0.32.19 for clarity
  as they are commonly used in the data space. The following aliases were used
  before and are still valid, so there's no need to update existing data models:

  * `one_to_one` was known as `has_one` or `hasOne`
  * `one_to_many` was known as `has_many` or `hasMany`
  * `many_to_one` was known as `belongs_to` or `belongsTo`
</Warning>

#### One-to-one

The `one_to_one` type indicates a [one-to-one][wiki-1-1] relationship between
the declaring cube and the joined cube. It means that one row in the declaring
cube can match only one row in the joined cube.

For example, in a data model containing `users` and their `profiles`, the
`users` cube would declare the following join:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: users
      # ...

      joins:
        - name: profiles
          relationship: one_to_one
          sql: "{users}.id = {profiles.user_id}"
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`users`, {
    // ...

    joins: {
      profiles: {
        relationship: `one_to_one`,
        sql: `${CUBE}.id = ${profiles.user_id}`
      }
    }
  })
  ```
</CodeGroup>

#### One-to-many

The `one_to_many` type indicates a [one-to-many][wiki-1-m] relationship between
the declaring cube and the joined cube. It means that one row in the declaring
cube can match many rows in the joined cube.

For example, in a data model containing `authors` and the `books` they have
written, the `authors` cube would declare the following join:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: authors
      # ...

      joins:
        - name: books
          relationship: one_to_many
          sql: "{authors}.id = {books.author_id}"
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`authors`, {
    // ...

    joins: {
      books: {
        relationship: `one_to_many`,
        sql: `${CUBE}.id = ${books.author_id}`
      }
    }
  })
  ```
</CodeGroup>

#### Many-to-one

The `many_to_one` type indicates the many-to-one relationship between the
declaring cube and the joined cube. You’ll often find this type of relationship
on the opposite side of the [one-to-many][wiki-1-m] relationship. It means that
one row in the declaring cube matches a single row in the joined cube, while a
row in the joined cube can match many rows in the declaring cube.

For example, in a data model containing `orders` and `customers` who made them,
the `orders` cube would have the following join:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      # ...

      joins:
        - name: customers
          relationship: many_to_one
          sql: "{orders}.customer_id = {customers.id}"
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    // ...

    joins: {
      customers: {
        relationship: `many_to_one`,
        sql: `${CUBE}.customer_id = ${customers.id}`
      }
    }
  })
  ```
</CodeGroup>

### sql

`sql` is necessary to indicate a related column between cubes. It is important
to properly specify a matching column when creating joins. Take a look at the
example below:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      # ...

      joins:
        - name: customers
          relationship: many_to_one
          sql: "{orders}.customer_id = {customers.id}"
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    // ...

    joins: {
      customers: {
        relationship: `many_to_one`,
        // The `customer_id` column of the `orders` cube corresponds to the
        // `id` dimension of the `customers` cube
        sql: `${CUBE}.customer_id = ${customers.id}`
      }
    }
  })
  ```
</CodeGroup>

## Setting a primary key

In order for a join to work, it is necessary to define a `primary_key` as
specified below. It is a requirement when a join is defined so that Cube can
handle row multiplication issues such as chasm and fan traps.

Let's imagine you want to calculate `Order Amount` by `Order Item Product Name`.
In this case, `Order` rows will be multiplied by the `Order Item` join due to
the `one_to_many` relationship. In order to produce correct results, Cube will
select distinct primary keys from `Order` first and then will join these primary
keys with `Order` to get the correct `Order Amount` sum result. Please note that
`primary_key` should be defined in the `dimensions` section.

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      # ...

      dimensions:
        - name: customer_id
          sql: id
          type: number
          primary_key: true
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    // ...

    dimensions: {
      customer_id: {
        sql: `id`,
        type: `number`,
        primary_key: true
      }
    }
  })
  ```
</CodeGroup>

<Info>
  Setting `primary_key` to `true` will change the default value of the `public`
  parameter to `false`. If you still want `public` to be `true` — set it manually.
</Info>

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      # ...

      dimensions:
        - name: customer_id
          sql: id
          type: number
          primary_key: true
          public: true
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    // ...

    dimensions: {
      customer_id: {
        sql: `id`,
        type: `number`,
        primary_key: true,
        public: true
      }
    }
  })
  ```
</CodeGroup>

If you don't have a single column in a cube's table that can act as a primary
key, you can create a composite primary key as shown below.

<Info>
  The example uses Postgres string concatenation; note that SQL may be different
  depending on your database.
</Info>

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: users
      # ...

      dimensions:
        - name: id
          sql:
            "{CUBE}.user_id || '-' || {CUBE}.signup_week || '-' ||
            {CUBE}.activity_week"
          type: string
          primary_key: true
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`users`, {
    // ...

    dimensions: {
      id: {
        sql: `${CUBE}.user_id || '-' || ${CUBE}.signup_week || '-' || ${CUBE}.activity_week`,
        type: `string`,
        primary_key: true
      }
    }
  })
  ```
</CodeGroup>

## Chasm and fan traps

Cube automatically detects chasm and fan traps based on the `many_to_one` and `one_to_many` relationships defined in join.
When detected, Cube generates a deduplication query that evaluates all distinct primary keys within the multiplied measure's cube and then joins distinct primary keys to this cube on itself to calculate the aggregation result.
If there's more than one multiplied measure in a query, then such query is generated for every such multiplied measure, and results are joined.
Cube solves for chasm and fan traps during query time.
If there's pre-aggregregation that fits measure multiplication requirements it'd be leveraged to serve such a query.
Such pre-aggregations and queries are always considered non-additive for the purpose of pre-aggregation matching.

Let's consider an example data model:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      dimensions:
        - name: id
          sql: id
          type: number
          primary_key: true
        - name: city
          sql: city
          type: string

      joins:
        - name: customers
          relationship: many_to_one
          sql: "{orders}.customer_id = {customers.id}"

  - name: customers
      sql_table: customers

      dimensions:
        - name: id
          sql: id
          type: number
          primary_key: true

      measures:
        - name: average_age
          sql: age
          type: avg

  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`

    dimensions: {
      id: {
        sql: `id`,
        type: `number`,
        primary_key: true
      },
      city: {
        sql: `city`,
        type: `string`
      }
    },

    joins: {
      customers: {
        relationship: `many_to_one`,
        sql: `${CUBE}.customer_id = ${customers.id}`
      }
    }
  })

  cube(`customers`, {
    sql_table: `customers`

    measures: {
      count: {
        type: `count`
      }
    },

    dimensions: {
      id: {
        sql: `id`,
        type: `number`,
        primary_key: true
      }
    }
  })
  ```
</CodeGroup>

If we try to query `customers.average_age` by `orders.city`, the Cube detects that the `average_age` measure in the `customers` cube would be multiplied by `orders` to `customers` and would generate SQL similar to:

```sql theme={null}
SELECT
  "keys"."orders__city",
  avg("customers_key__customers".age) "customers__average_age"
FROM
  (
    SELECT
      DISTINCT "customers_key__orders".city "orders__city",
      "customers_key__customers".id "customers__id"
    FROM
      orders AS "customers_key__orders"
      LEFT JOIN customers AS "customers_key__customers" ON "customers_key__orders".customer_id = "customers_key__customers".id
  ) AS "keys"
  LEFT JOIN customers AS "customers_key__customers" ON "keys"."customers__id" = "customers_key__customers".id
GROUP BY
  1
```

## CUBE reference

When you have several joined cubes, you should accurately use columns’ names to
avoid any mistakes. One way to make no mistakes is to use the `CUBE` reference.
It allows you to specify columns’ names in cubes without any ambiguity. During
the implementation of the query, this reference will be used as an alias for a
basic cube. Take a look at the following example:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: users
      # ...

      dimensions:
        - name: name
          sql: "{CUBE}.name"
          type: string
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`users`, {
    // ...

    dimensions: {
      name: {
        sql: `${CUBE}.name`,
        type: `string`
      }
    }
  })
  ```
</CodeGroup>

## Transitive joins

<Warning>
  Join graph is directed and `a → b` join is different from `b → a`. [Learn more
  about it here][ref-schema-fundamentals-join-dir].
</Warning>

Cube automatically takes care of transitive joins. For example, consider the
following data model:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: a
      # ...

      joins:
        - name: b
          sql: "{a}.b_id = {b.id}"
          relationship: many_to_one

      measures:
        - name: count
          type: count

    - name: b
      # ...

      joins:
        - name: c
          sql: "{b}.c_id = {c.id}"
          relationship: many_to_one

    - name: c
      # ...

      dimensions:
        - name: category
          sql: category
          type: string
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`a`, {
    // ...

    joins: {
      b: {
        sql: `${a}.b_id = ${b.id}`,
        relationship: `many_to_one`
      }
    },

    measures: {
      count: {
        type: `count`
      }
    }
  })

  cube(`b`, {
    // ...

    joins: {
      c: {
        sql: `${b}.c_id = ${c.id}`,
        relationship: `many_to_one`
      }
    }
  })

  cube(`c`, {
    // ...

    dimensions: {
      category: {
        sql: `category`,
        type: `string`
      }
    }
  })
  ```
</CodeGroup>

Assume that the following query is run:

```json theme={null}
{
  "measures": ["a.count"],
  "dimensions": ["c.category"]
}
```

Joins `a → b` and `b → c` will be resolved automatically. Cube uses the
[Dijkstra algorithm][wiki-djikstra-alg] to find a join path between cubes given
requested members.

In case there are multiple join paths that can be used to join the same set of cubes, Cube will collect cube names from members in the following order:

1. Measures
2. Dimensions
3. Segments
4. Time dimensions

Cube makes join trees as predictable and stable as possible, but this isn't guaranteed in case multiple join paths exist.
Please use views to address join predictability and stability.

[ref-ref-cubes]: /reference/data-modeling/cube

[ref-restapi-query-filter-op-set]: /reference/core-data-apis/rest-api/query-format#set

[ref-schema-fundamentals-join-dir]: /docs/data-modeling/joins#direction-of-joins

[ref-schema-cube-sql]: /reference/data-modeling/cube#sql

[ref-schema-data-blenging]: /docs/data-modeling/concepts/data-blending#data-blending

[ref-naming]: /docs/data-modeling/syntax#naming

[wiki-djikstra-alg]: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm

[wiki-left-join]: https://en.wikipedia.org/wiki/Join_\(SQL\)#Left_outer_join

[wiki-1-1]: https://en.wikipedia.org/wiki/One-to-one_\(data_model\)

[wiki-1-m]: https://en.wikipedia.org/wiki/One-to-many_\(data_model\)
