> ## Documentation Index
> Fetch the complete documentation index at: https://cubed3-feat-druid-driver-streaming.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Implementing Entity-Attribute-Value Model (EAV)

> Shape sparse entity-attribute-value warehouse tables into queryable dimensions and joins while preserving flexibility across entities.

## Use case

We want to create a cube for a dataset which uses the
[Entity-Attribute-Value](https://en.wikipedia.org/wiki/Entity–attribute–value_model)
model (EAV). It stores entities in a table that can be joined to another table
with numerous attribute-value pairs. Each entity is not guaranteed to have the
same set of associated attributes, thus making the entity-attribute-value
relation a sparse matrix. In the cube, we'd like every attribute to be modeled
as a dimension.

## Data modeling

Let's explore the `users` cube that contains the entities:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: users
      sql_table: users

      joins:
        - name: orders
          relationship: one_to_many
          sql: "{CUBE}.id = {orders.user_id}"

      dimensions:
        - name: name
          sql: "first_name || ' ' || last_name"
          type: string
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`users`, {
    sql_table: `users`,

    joins: {
      orders: {
        relationship: "one_to_many",
        sql: `${CUBE}.id = ${orders.user_id}`
      }
    },

    dimensions: {
      name: {
        sql: `first_name || ' ' || last_name`,
        type: `string`
      }
    }
  })
  ```
</CodeGroup>

The `users` cube is joined with the `orders` cube to reflect that there might be
many orders associated with a single user. The orders remain in various
statuses, as reflected by the `status` dimension, and their creation dates are
available via the `created_at` dimension:

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: orders
      sql_table: orders

      dimensions:
        - name: user_id
          sql: user_id
          type: string

        - name: status
          sql: status
          type: string

        - name: created_at
          sql: created_at
          type: time
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`orders`, {
    sql_table: `orders`,

    dimensions: {
      user_id: {
        sql: `user_id`,
        type: `string`
      },

      status: {
        sql: `status`,
        type: `string`
      },

      created_at: {
        sql: `created_at`,
        type: `time`
      }
    }
  })
  ```
</CodeGroup>

Currently, the dataset contains orders in the following statuses:

| status     |
| ---------- |
| completed  |
| processing |
| shipped    |

Let's say that we'd like to know, for each user, the earliest creation date for
their orders in any of these statuses. In terms of the EAV model:

* the users serve as *entities* and they should be modeled with a *cube*
* order statuses serve as *attributes* and they should be modeled as
  *dimensions*
* the earliest creation dates for each status serve as attribute *values* and
  they will be modeled as *dimension values*

Let's explore some possible ways to model that.

### Static attributes

We already know that the following statuses are present in the dataset:
`completed`, `processing`, and `shipped`. Let's assume this set of statuses is
not going to change often.

Then, modeling the cube is as simple as defining a few joins (one join per
attribute):

<CodeGroup>
  ```yaml title="YAML" theme={null}
  cubes:
    - name: users_statuses_joins
      sql: |
        SELECT
          users.first_name,
          users.last_name,
          MIN(cOrders.created_at) AS cCreatedAt,
          MIN(pOrders.created_at) AS pCreatedAt,
          MIN(sOrders.created_at) AS sCreatedAt
        FROM public.users AS users
        LEFT JOIN public.orders AS cOrders
          ON users.id = cOrders.user_id AND cOrders.status = 'completed'
        LEFT JOIN public.orders AS pOrders
          ON users.id = pOrders.user_id AND pOrders.status = 'processing'
        LEFT JOIN public.orders AS sOrders
          ON users.id = sOrders.user_id AND sOrders.status = 'shipped'
        GROUP BY 1, 2

      dimensions:
        - name: name
          sql: "first_name || ' ' || last_name"
          type: string

        - name: completed_created_at
          sql: cCreatedAt
          type: time

        - name: processing_created_at
          sql: pCreatedAt
          type: time

        - name: shipped_created_at
          sql: sCreatedAt
          type: time
  ```

  ```javascript title="JavaScript" theme={null}
  cube(`users_statuses_joins`, {
    sql: `
      SELECT
        users.first_name,
        users.last_name,
        MIN(cOrders.created_at) AS cCreatedAt,
        MIN(pOrders.created_at) AS pCreatedAt,
        MIN(sOrders.created_at) AS sCreatedAt
      FROM public.users AS users
      LEFT JOIN public.orders AS cOrders
        ON users.id = cOrders.user_id AND cOrders.status = 'completed'
      LEFT JOIN public.orders AS pOrders
        ON users.id = pOrders.user_id AND pOrders.status = 'processing'
      LEFT JOIN public.orders AS sOrders
        ON users.id = sOrders.user_id AND sOrders.status = 'shipped'
      GROUP BY 1, 2
    `,

    dimensions: {
      name: {
        sql: `first_name || ' ' || last_name`,
        type: `string`
      },

      completed_created_at: {
        sql: `cCreatedAt`,
        type: `time`
      },

      processing_created_at: {
        sql: `pCreatedAt`,
        type: `time`
      },

      shipped_created_at: {
        sql: `sCreatedAt`,
        type: `time`
      }
    }
  })
  ```
</CodeGroup>

Querying the cube would yield data like this. As we can see, every user has
attributes that show the earliest creation date for their orders in all three
statuses. However, some attributes don't have values (meaning that a user
doesn't have orders in this status).

| name              | completed\_created\_at | processing\_created\_at | shipped\_created\_at |
| ----------------- | ---------------------: | ----------------------: | -------------------: |
| Ally Blanda       |             2019-03-05 |                       — |           2019-04-06 |
| Cayla Mayert      |             2019-06-14 |              2021-05-20 |                    — |
| Concepcion Maggio |                      — |              2020-07-14 |           2019-07-19 |

The drawback is that when the set of statuses changes, we'll need to amend the
cube definition in several places: update selected values and joins in SQL as
well as update the dimensions. Let's see how to work around that.

### Static attributes, DRY version

<Note>
  This approach uses JavaScript-specific features — programmatic code generation
  with helper functions and `Object.assign`. It is not available in YAML data
  models.
</Note>

We can embrace the
[Don't Repeat Yourself](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)
principle and eliminate the repetition by generating the cube definition
dynamically based on the list of statuses. Let's create a new JavaScript model
so we can move all repeated code patterns into handy functions and iterate over
statuses in relevant parts of the cube's code.

```javascript theme={null}
const statuses = ["completed", "processing", "shipped"]

const createValue = (status, index) =>
  `MIN(orders_${index}.created_at) AS created_at_${index}`

const createJoin = (status, index) =>
  `LEFT JOIN public.orders AS orders_${index}
    ON users.id = orders_${index}.user_id
    AND orders_${index}.status = '${status}'`;

const createDimension = (status, index) => ({
  [`${status}_created_at`]: {
    sql: (CUBE) => `created_at_${index}`,
    type: `time`
  }
})

cube(`users_statuses_DRY`, {
  sql: `
    SELECT
      users.first_name,
      users.last_name,
      ${statuses.map(createValue).join(",")}
    FROM public.users AS users
    ${statuses.map(createJoin).join("")}
    GROUP BY 1, 2
  `,

  dimensions: Object.assign(
    {
      name: {
        sql: `first_name || ' ' || last_name`,
        type: `string`
      }
    },
    statuses.reduce(
      (all, status, index) => ({
        ...all,
        ...createDimension(status, index)
      }),
      {}
    )
  )
})
```

The new `users_statuses_DRY` cube is functionally identical to the
`users_statuses_joins` cube above. Querying this new cube would yield the same
data. However, there's still a static list of statuses present in the cube's
source code. Let's work around that next.

### Dynamic attributes

<Note>
  This approach uses JavaScript-specific features — `asyncModule`, `require`, and
  the Node.js `pg` package. It is not available in YAML data models.
</Note>

We can eliminate the list of statuses from the cube's code by loading this list
from an external source, e.g., the data source. Here's the code from the
`fetch.js` file that defines the `fetchStatuses` function that would load the
statuses from the database. Note that it uses the `pg` package (Node.js client
for Postgres) and reuses the credentials from Cube.

```javascript theme={null}
const { Pool } = require("pg")

const pool = new Pool({
  host: process.env.CUBEJS_DB_HOST,
  port: process.env.CUBEJS_DB_PORT,
  user: process.env.CUBEJS_DB_USER,
  password: process.env.CUBEJS_DB_PASS,
  database: process.env.CUBEJS_DB_NAME
})

const statusesQuery = `
  SELECT DISTINCT status
  FROM public.orders
`;

exports.fetchStatuses = async () => {
  const client = await pool.connect()
  const result = await client.query(statusesQuery)
  client.release()

  return result.rows.map((row) => row.status)
}
```

In the cube file, we will use the `fetchStatuses` function to load the list of
statuses. We will also wrap the cube definition with the `asyncModule` built-in
function that allows the data model to be created
[dynamically](/docs/data-modeling/dynamic).

```javascript theme={null}
const fetchStatuses = require("../fetch").fetchStatuses

asyncModule(async () => {
  const statuses = await fetchStatuses()

  const createValue = (status, index) =>
    `MIN(orders_${index}.created_at) AS created_at_${index}`

  const createJoin = (status, index) =>
    `LEFT JOIN public.orders AS orders_${index}
      ON users.id = orders_${index}.user_id
      AND orders_${index}.status = '${status}'`;

  const createDimension = (status, index) => ({
    [`${status}_created_at`]: {
      sql: (CUBE) => `created_at_${index}`,
      type: `time`
    }
  })

  cube(`users_statuses_dynamic`, {
    sql: `
      SELECT
        users.first_name,
        users.last_name,
        ${statuses.map(createValue).join(",")}
      FROM public.users AS users
      ${statuses.map(createJoin).join("")}
      GROUP BY 1, 2
    `,

    dimensions: Object.assign(
      {
        name: {
          sql: `first_name || ' ' || last_name`,
          type: `string`
        }
      },
      statuses.reduce(
        (all, status, index) => ({
          ...all,
          ...createDimension(status, index)
        }),
        {}
      )
    )
  })
})
```

Again, the new `users_statuses_dynamic` cube is functionally identical to the
previously created cubes. So, querying this new cube would yield the same data
too.
