> ## Documentation Index
> Fetch the complete documentation index at: https://cubed3-feat-druid-driver-streaming.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Joins in the SQL API

> The best practice is to use views to specify explicit join paths for SQL API queries.

## Views

The best practice is to use [views][ref-views] to specify explicit [join
paths][ref-join-paths] for SQL API queries.

While BI tools would see a view as a table, in fact, no materialization is performed
until Cube is queried through the SQL API. At the query time, Cube will try to maximize
member pushdown, so only required parts of the view are materialized at query time.

Cube also solves fan and chasm traps based on the dimensions selected in the query, so
if measure aggregation types are properly set up, you will see correct results in BI tools
even though cubes and views are seen just as tables.

Consider the following data model:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  views:
    - name: orders_users
      cubes:
        - join_path: orders
          includes:
            - status
            - count

        - join_path: orders.users
          prefix: true
          includes: 
            - id
            - city
            - state
  ```

  ```javascript title="JavaScript" theme={null}
  view(`orders_users`, {
    cubes: [
      {
        join_path: orders,
        includes: ['status', 'count']
      },
      {
        join_path: orders.users,
        prefix: true,
        includes: ['id', 'city', 'state']
      }
    ]
  })
  ```
</CodeGroup>

With this data model, here's how you can get orders count by users' cities:

```sql theme={null}
cube=> SELECT MEASURE(count) AS count, users_city FROM orders_users GROUP BY 2;

 count |   users_city
-------+---------------
  1416 | Los Angeles
  1412 | Seattle
  1365 | Mountain View
  1263 | New York
  1220 | Austin
  1164 | Chicago
  1101 | San Francisco
  1059 | Palo Alto
(8 rows)
```

## `CROSS JOIN` and `__cubeJoinField`

The SQL API also supports joins through the `__cubeJoinField` virtual column, which
allows to control how specific cubes are joined. This is considered an advanced
functionality, and views should be used where possible. Join can also be done through
`CROSS JOIN`.

Usage of `CROSS JOIN` or `__cubeJoinField` instructs Cube to perform join as it's defined
in a data model while using provided cubes as [join hints][ref-join-hints].

For example, the following query joins the `orders` and `products` tables under
the hood on `orders.product_id = products.id`, exactly the same way as the
REST (JSON) API query does:

```sql theme={null}
cube=> SELECT p.name, SUM(o.count) FROM orders o LEFT JOIN products p ON o.__cubeJoinField = p.__cubeJoinField GROUP BY 1 LIMIT 5;
           name           | SUM(o.count)
--------------------------+--------------
 Tasty Plastic Mouse      |          121
 Intelligent Cotton Ball  |          119
 Ergonomic Steel Tuna     |          116
 Intelligent Rubber Pants |          116
 Generic Wooden Gloves    |          116
(5 rows)
```

Or through `CROSS JOIN`:

```sql theme={null}
cube=> SELECT p.name, sum(o.count) FROM orders o CROSS JOIN products p GROUP BY 1 LIMIT 5;
           name           | SUM(o.count)
--------------------------+--------------
 Tasty Plastic Mouse      |          121
 Intelligent Cotton Ball  |          119
 Ergonomic Steel Tuna     |          116
 Intelligent Rubber Pants |          116
 Generic Wooden Gloves    |          116
(5 rows)
```

In the resulting query plan, you won't see any joins as you can't see those for
REST (JSON) API queries either:

```sql theme={null}
cube=> EXPLAIN SELECT p.name, sum(o.count) FROM orders o LEFT JOIN products p ON o.__cubeJoinField = p.__cubeJoinField GROUP BY 1 LIMIT 5;
   plan_type   |         plan
---------------+-----------------------
 logical_plan  | CubeScan: request={  +
               |   "measures": [      +
               |     "orders.count"   +
               |   ],                 +
               |   "dimensions": [    +
               |     "products.name"  +
               |   ],                 +
               |   "segments": [],    +
               |   "limit": 5         +
               | }
 physical_plan | CubeScanExecutionPlan+
               |
(2 rows)
```

This feature allows you to join cubes even joined transitively only.

In most of the BI tools you'd use `__cubeJoinField` to define joins between cube
tables. In tools that allow defining custom SQL datasets, you can use joined
tables as a dataset SQL. For example:

```sql theme={null}
SELECT o.count as count, p.name as product_name, p.description as product_description
FROM orders o
CROSS JOIN products p;
```

Please note we use aliasing to avoid name clashing between cube members in a
resulting data set. In this case, wrapped SQL will be properly processed by
Cube, pushing down all operations to Cube query:

```sql theme={null}
cube=> SELECT product_name, SUM(count) FROM (
  SELECT o.count as count, p.name as product_name, p.description as product_description
  FROM orders o CROSS
  JOIN products p
) joined
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
       product_name       | SUM(joined.count)
--------------------------+-------------------
 Tasty Plastic Mouse      |               121
 Intelligent Cotton Ball  |               119
 Ergonomic Steel Tuna     |               116
 Intelligent Rubber Pants |               116
 Generic Wooden Gloves    |               116
(5 rows)
```

We can see this by introspecting the `EXPLAIN` plan for this query:

```sql theme={null}
cube=> EXPLAIN SELECT product_name, SUM(count) FROM (
  SELECT o.count as count, p.name as product_name, p.description as product_description
  FROM orders o
  CROSS JOIN products p
) joined
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
   plan_type   |         plan
---------------+-----------------------
 logical_plan  | CubeScan: request={  +
               |   "measures": [      +
               |     "orders.count"   +
               |   ],                 +
               |   "dimensions": [    +
               |     "products.name"  +
               |   ],                 +
               |   "segments": [],    +
               |   "order": [         +
               |     [                +
               |       "orders.count",+
               |       "desc"         +
               |     ]                +
               |   ],                 +
               |   "limit": 5         +
               | }
 physical_plan | CubeScanExecutionPlan+
               |
(2 rows)
```

Please note that, even if `product_description` is in the inner selection, it isn't
evaluated in the final query as it isn't used in any way.

## Joining views on a shared dimension

When you join two views on a dimension that resolves to the **same underlying
cube member** and group by that dimension, Cube doesn't perform a row-level
join. Instead it merges them into a single
[multi-fact query][ref-multi-fact-views]: each view becomes its own
aggregating subquery and the results are stitched together on the shared key,
so measures from both views are combined without fan-out.

```sql theme={null}
SELECT
  o.name,
  MEASURE(o.total_amount),
  MEASURE(r.total_refund)
FROM orders_view o
LEFT JOIN returns_view r ON r.name = o.name
GROUP BY 1
```

The `JOIN` type (`INNER`, `LEFT`, `RIGHT`, `FULL`) controls which keys are
kept. This requires the [Tesseract SQL planner][ref-tesseract-env] and only
applies to grouped queries whose `GROUP BY` is the join key. See
[multi-fact views][ref-multi-fact-views] for the full explanation.

[ref-views]: /docs/data-modeling/views

[ref-join-paths]: /docs/data-modeling/joins#join-paths

[ref-join-hints]: /docs/data-modeling/joins#join-hints

[ref-multi-fact-views]: /docs/data-modeling/multi-fact-views

[ref-tesseract-env]: /reference/configuration/environment-variables#cubejs_tesseract_sql_planner
