Measuring Cross-Product Adoption Using dbt_set_similarity | by Matthew Senick



Enhancing cross-product insights inside dbt workflows

For multi-product companies, one critical metric is often what is called “cross-product adoption” (i.e. understanding how users engage with multiple offerings in a given product portfolio).

One measure suggested for calculating cross-product or cross-feature usage in the popular book Hacking Growth [1] is the Jaccard Index. Traditionally used to measure the similarity between two sets, the Jaccard Index can also serve as a powerful tool for assessing product adoption patterns. By quantifying the overlap in users between products, you can identify cross-product synergies and growth opportunities.

A dbt package, dbt_set_similarity, is designed to simplify the calculation of set similarity metrics directly within an analytics workflow. This package provides a way to calculate Jaccard Indices within SQL transformation workloads.

To import this package into your dbt project, add the following to the packages.yml file. We will also need dbt_utils for the purposes of this article's example. Run a dbt deps command within your project to install the packages.

packages:
  - package: Matts52/dbt_set_similarity
    version: 0.1.1
  - package: dbt-labs/dbt_utils
    version: 1.3.0

The Jaccard Index, also known as the Jaccard Similarity Coefficient, is a metric used to measure the similarity between two sets. It is defined as the size of the intersection of the sets divided by the size of their union.

Mathematically, it can be expressed as:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The Jaccard Index represents the “Intersection” over the “Union” of two sets (image by author)

Where:

A and B are two sets (e.g. users of product A and users of product B)
The numerator represents the number of elements in both sets
The denominator represents the total number of distinct elements across both sets
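
As a minimal sketch of the definition above, the index can be computed on plain Python sets. The function name `jaccard_index` and the sample user IDs are illustrative assumptions and are not part of dbt_set_similarity; returning 0.0 for two empty sets is a convention chosen here, since the ratio is undefined in that case.

```python
def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        # Both sets are empty: the ratio is undefined, so 0.0 is an assumed convention.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical user IDs for two products.
users_a = {"u1", "u2", "u3", "u4"}
users_b = {"u3", "u4", "u5"}

print(jaccard_index(users_a, users_b))  # 2 shared users / 5 distinct users = 0.4
```

The result always lies between 0 (no shared users) and 1 (identical user bases).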

The Jaccard Index is particularly useful in the context of cross-product adoption because:

It focuses on the overlap between two sets, making it ideal for understanding shared user bases
It accounts for differences in the total size of the sets, ensuring that results are proportional and not skewed by outliers

For example:

If 100 users adopt Product A and 50 adopt Product B, with 25 users adopting both, the Jaccard Index is 25 / (100 + 50 - 25) = 0.2, indicating a 20% overlap between the two user bases.
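
The arithmetic in this example can be checked from counts alone, because the size of the union equals |A| + |B| - |A ∩ B|. The helper name `jaccard_from_counts` is hypothetical and used only for this sketch.

```python
def jaccard_from_counts(n_a: int, n_b: int, n_both: int) -> float:
    """Jaccard Index from set sizes, using |A ∪ B| = |A| + |B| - |A ∩ B|."""
    n_union = n_a + n_b - n_both
    if n_union == 0:
        # Nobody adopted either product; 0.0 is an assumed convention.
        return 0.0
    return n_both / n_union


# Figures from the example: 100 users of Product A, 50 of Product B, 25 of both.
print(jaccard_from_counts(100, 50, 25))  # 25 / 125 = 0.2
```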

The example dataset we will be using is a fictional SaaS company which offers storage space as a product for users. This company provides two distinct storage products: document storage (doc_storage) and photo storage (photo_storage). Each of these columns is either true, indicating the product has been adopted, or false, indicating the product has not been adopted.

Additionally, the demographics (user_category) that this company serves are either tech enthusiasts or homeowners.
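
To make the shape of this dataset concrete, here is a small Python sketch with a handful of made-up rows. Only the column names doc_storage, photo_storage and user_category come from the description above; the user_id column, the category labels, the row values and the helper `jaccard_by_category` are hypothetical. It computes the Jaccard Index per user category directly from the boolean adoption flags.

```python
from collections import defaultdict

# Made-up rows for illustration only; not the article's actual data.
rows = [
    {"user_id": 1, "user_category": "tech_enthusiast", "doc_storage": True, "photo_storage": True},
    {"user_id": 2, "user_category": "tech_enthusiast", "doc_storage": True, "photo_storage": False},
    {"user_id": 3, "user_category": "tech_enthusiast", "doc_storage": False, "photo_storage": True},
    {"user_id": 4, "user_category": "homeowner", "doc_storage": True, "photo_storage": True},
    {"user_id": 5, "user_category": "homeowner", "doc_storage": False, "photo_storage": True},
    {"user_id": 6, "user_category": "homeowner", "doc_storage": False, "photo_storage": False},
]


def jaccard_by_category(rows):
    """Jaccard Index of doc_storage vs. photo_storage adopters, per user_category."""
    both = defaultdict(int)    # users who adopted both products (intersection)
    either = defaultdict(int)  # users who adopted at least one product (union)
    for r in rows:
        cat = r["user_category"]
        if r["doc_storage"] and r["photo_storage"]:
            both[cat] += 1
        if r["doc_storage"] or r["photo_storage"]:
            either[cat] += 1
    # Categories where nobody adopted either product never enter `either` and are omitted.
    return {cat: both[cat] / n for cat, n in either.items()}


result = jaccard_by_category(rows)
for category, score in result.items():
    print(category, round(score, 3))
```

Users who adopted neither product (user 6 here) count toward neither the intersection nor the union, so they do not affect the index.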