Measuring Cross-Product Adoption Using dbt_set_similarity | by Matthew Senick | Dec, 2024

by Admin
December 28, 2024
in Machine Learning

Enhancing cross-product insights within dbt workflows

Matthew Senick

Towards Data Science

For multi-product companies, one critical metric is often what is called “cross-product adoption” (i.e., understanding how users engage with multiple offerings in a given product portfolio).

One measure suggested for calculating cross-product or cross-feature usage in the popular book Hacking Growth [1] is the Jaccard Index. Traditionally used to measure the similarity between two sets, the Jaccard Index can also serve as a powerful tool for assessing product adoption patterns. By quantifying the overlap in users between products, you can identify cross-product synergies and growth opportunities.

The dbt package dbt_set_similarity is designed to simplify the calculation of set similarity metrics directly within an analytics workflow. It provides a way to calculate Jaccard indices within SQL transformation workloads.

To import this package into your dbt project, add the following to the packages.yml file. We will also need dbt_utils for the purposes of this article's example. Run the dbt deps command within your project to install the packages.

packages:
  - package: Matts52/dbt_set_similarity
    version: 0.1.1
  - package: dbt-labs/dbt_utils
    version: 1.3.0

The Jaccard Index, also known as the Jaccard Similarity Coefficient, is a metric used to measure the similarity between two sets. It is defined as the size of the intersection of the sets divided by the size of their union.

Mathematically, it can be expressed as:

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$

The Jaccard Index represents the “Intersection” over the “Union” of two sets (image by author)

Where:

  • A and B are two sets (e.g., users of product A and users of product B)
  • The numerator represents the number of elements in both sets
  • The denominator represents the total number of distinct elements across both sets
(image by author)
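As a quick illustration of the definition, here is a minimal Python sketch that computes the index directly from two sets of user IDs. The function name `jaccard_index`, the sample IDs, and the choice to return 0.0 when both sets are empty (where the ratio is undefined) are illustrative assumptions, not part of dbt_set_similarity.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| for two sets of user IDs.

    Assumption of this sketch: two empty sets get a similarity of 0.0,
    since the ratio is undefined in that case.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical user IDs for two products
product_a_users = {1, 2, 3, 4}
product_b_users = {3, 4, 5}
print(jaccard_index(product_a_users, product_b_users))  # 2 / 5 = 0.4
```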

The Jaccard Index is particularly useful in the context of cross-product adoption because:

  • It focuses on the overlap between two sets, making it ideal for understanding shared user bases
  • It accounts for differences in the total size of the sets by dividing by the size of their union, so results fall between 0 and 1 and remain comparable across product pairs with very different user counts

For instance:

  • If 100 users adopt Product A and 50 adopt Product B, with 25 users adopting both, the Jaccard Index is 25 / (100 + 50 - 25) = 0.2, indicating a 20% overlap between the two user bases.
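When only adopter counts are available rather than full user lists, the union can be recovered by inclusion-exclusion, |A ∪ B| = |A| + |B| - |A ∩ B|, which is exactly the arithmetic in the example above. A minimal Python sketch (the function name `jaccard_from_counts` is hypothetical):

```python
def jaccard_from_counts(n_a: int, n_b: int, n_both: int) -> float:
    """Jaccard index from adopter counts via inclusion-exclusion:
    |A ∪ B| = |A| + |B| - |A ∩ B|."""
    if n_both > min(n_a, n_b):
        raise ValueError("overlap cannot exceed either product's user count")
    union = n_a + n_b - n_both
    # Assumption of this sketch: no adopters at all yields 0.0
    return n_both / union if union else 0.0


print(jaccard_from_counts(100, 50, 25))  # 25 / 125 = 0.2
```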

The example dataset we will be using comes from a fictional SaaS company that offers storage space as a product for users. This company provides two distinct storage products: document storage (doc_storage) and photo storage (photo_storage). Each column is either true, indicating the product has been adopted, or false, indicating it has not.

Additionally, the demographics (user_category) that this company serves are either tech enthusiasts or homeowners.

For the sake of this example, we will read this CSV file in as a “seed” model named seed_example within the dbt project.
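The seed file's contents are not reproduced in this text, so as a rough sketch of the data's shape, the Python snippet below parses a small hypothetical CSV with the column names used in the dbt models (user_id, user_category, doc_storage, photo_storage). The rows and the category labels `tech_enthusiast` and `homeowner` are invented for illustration; they were chosen only so that they agree with the Jaccard values reported in the article.

```python
import csv
import io

# Hypothetical stand-in for the seed file (NOT the article's actual data);
# column names follow those referenced in the dbt models.
SEED_CSV = """user_id,user_category,doc_storage,photo_storage
1,tech_enthusiast,true,false
2,tech_enthusiast,true,true
3,tech_enthusiast,false,true
4,homeowner,true,true
5,homeowner,true,true
6,homeowner,false,false
"""


def load_seed(text: str) -> list[dict]:
    """Parse the CSV text into dicts with typed values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "user_id": int(record["user_id"]),
            "user_category": record["user_category"],
            # "true"/"false" strings become Python booleans
            "doc_storage": record["doc_storage"].strip().lower() == "true",
            "photo_storage": record["photo_storage"].strip().lower() == "true",
        })
    return rows


rows = load_seed(SEED_CSV)
print(len(rows), rows[0])
```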

Now, let’s say we want to calculate the Jaccard index (cross-adoption) between our document storage and photo storage products. First, we need to create an array (list) of the users who have the document storage product, alongside an array of the users who have the photo storage product. In the final select statement, we apply the jaccard_coef macro from the dbt_set_similarity package to easily compute the Jaccard coefficient between the two arrays of user IDs.

with product_users as (
    -- Collect the IDs of users who have adopted each product into arrays
    select
        array_agg(user_id) filter (where doc_storage = true)
            as doc_storage_users,
        array_agg(user_id) filter (where photo_storage = true)
            as photo_storage_users
    from {{ ref('seed_example') }}
)

select
    doc_storage_users,
    photo_storage_users,
    -- Jaccard coefficient between the two arrays of user IDs
    {{
        dbt_set_similarity.jaccard_coef(
            'doc_storage_users',
            'photo_storage_users'
        )
    }} as cross_product_jaccard_coef
from product_users

Output from the above dbt model (image by author)
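To make the logic of this model concrete outside the warehouse, here is a minimal Python sketch of the same two steps: collect the IDs of each product's adopters (the counterpart of `array_agg(...) filter (where ...)`), then divide the size of their intersection by the size of their union. It uses hypothetical rows consistent with the reported 0.6 result and assumes the arrays are compared as sets of distinct IDs; it is not the implementation of `dbt_set_similarity.jaccard_coef`.

```python
# Hypothetical rows (not the article's seed file), chosen to be consistent
# with the overall result of 0.6 discussed in the article.
ROWS = [
    {"user_id": 1, "user_category": "tech_enthusiast", "doc_storage": True, "photo_storage": False},
    {"user_id": 2, "user_category": "tech_enthusiast", "doc_storage": True, "photo_storage": True},
    {"user_id": 3, "user_category": "tech_enthusiast", "doc_storage": False, "photo_storage": True},
    {"user_id": 4, "user_category": "homeowner", "doc_storage": True, "photo_storage": True},
    {"user_id": 5, "user_category": "homeowner", "doc_storage": True, "photo_storage": True},
    {"user_id": 6, "user_category": "homeowner", "doc_storage": False, "photo_storage": False},
]


def cross_product_jaccard(rows):
    """Return (doc users, photo users, Jaccard coefficient) over all rows."""
    # Counterpart of array_agg(user_id) filter (where doc_storage = true)
    doc_storage_users = {r["user_id"] for r in rows if r["doc_storage"]}
    # Counterpart of array_agg(user_id) filter (where photo_storage = true)
    photo_storage_users = {r["user_id"] for r in rows if r["photo_storage"]}
    union = doc_storage_users | photo_storage_users
    # Assumption: the arrays are compared as sets of distinct IDs
    coef = len(doc_storage_users & photo_storage_users) / len(union) if union else 0.0
    return sorted(doc_storage_users), sorted(photo_storage_users), coef


doc_users, photo_users, coef = cross_product_jaccard(ROWS)
print(doc_users, photo_users, coef)  # [1, 2, 4, 5] [2, 3, 4, 5] 0.6
```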

As we can see, 60% of users who have adopted either product have adopted both. We can graphically verify our result by placing the user ID sets into a Venn diagram, where we see that three users have adopted both products, out of five users who have adopted at least one: 3/5 = 0.6.

What the collection of user IDs and product adoption would look like, verifying our result (image by author)
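The Venn-diagram check can also be expressed in a few lines of Python: count the users in each region of the diagram and divide the shared region by the total. The user IDs below are hypothetical, picked so that three of five adopters fall in the overlap, matching the diagram described above.

```python
def venn_regions(a: set, b: set) -> dict:
    """Count users in each region of a two-set Venn diagram."""
    return {
        "only_a": len(a - b),  # adopted A but not B
        "only_b": len(b - a),  # adopted B but not A
        "both": len(a & b),    # adopted both
    }


# Hypothetical adopters of document storage and photo storage
doc = {1, 2, 4, 5}
photo = {2, 3, 4, 5}
regions = venn_regions(doc, photo)
total = sum(regions.values())  # users who adopted at least one product
print(regions, regions["both"] / total)  # {'only_a': 1, 'only_b': 1, 'both': 3} 0.6
```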

Using the dbt_set_similarity package, creating segmented Jaccard indices for our different user categories should be fairly natural. We will follow the same pattern as before; however, we will simply group our aggregations by the user category that each user belongs to.

with product_users as (
    -- One row per user category, with arrays of each product's adopters
    select
        user_category,
        array_agg(user_id) filter (where doc_storage = true)
            as doc_storage_users,
        array_agg(user_id) filter (where photo_storage = true)
            as photo_storage_users
    from {{ ref('seed_example') }}
    group by user_category
)

select
    user_category,
    doc_storage_users,
    photo_storage_users,
    -- Jaccard coefficient computed separately within each user category
    {{
        dbt_set_similarity.jaccard_coef(
            'doc_storage_users',
            'photo_storage_users'
        )
    }} as cross_product_jaccard_coef
from product_users

Output from the above dbt model (image by author)
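A Python analogue of the `group by user_category` version follows, again a sketch over hypothetical rows (with invented category labels) that reproduce the segment results discussed in the article: one pair of user-ID sets is built per category and the Jaccard index is computed within each. Returning 0.0 for a segment with no adopters is an assumption of this sketch; in PostgreSQL, `array_agg` with a filter that matches no rows returns NULL instead.

```python
from collections import defaultdict

# Hypothetical rows (not the article's seed file)
ROWS = [
    {"user_id": 1, "user_category": "tech_enthusiast", "doc_storage": True, "photo_storage": False},
    {"user_id": 2, "user_category": "tech_enthusiast", "doc_storage": True, "photo_storage": True},
    {"user_id": 3, "user_category": "tech_enthusiast", "doc_storage": False, "photo_storage": True},
    {"user_id": 4, "user_category": "homeowner", "doc_storage": True, "photo_storage": True},
    {"user_id": 5, "user_category": "homeowner", "doc_storage": True, "photo_storage": True},
    {"user_id": 6, "user_category": "homeowner", "doc_storage": False, "photo_storage": False},
]


def jaccard_by_segment(rows):
    """Mirror 'group by user_category': one Jaccard index per category."""
    doc = defaultdict(set)
    photo = defaultdict(set)
    categories = set()
    for r in rows:
        cat = r["user_category"]
        categories.add(cat)
        if r["doc_storage"]:
            doc[cat].add(r["user_id"])
        if r["photo_storage"]:
            photo[cat].add(r["user_id"])
    result = {}
    for cat in sorted(categories):
        union = doc[cat] | photo[cat]
        result[cat] = len(doc[cat] & photo[cat]) / len(union) if union else 0.0
    return result


print(jaccard_by_segment(ROWS))
# {'homeowner': 1.0, 'tech_enthusiast': 0.3333333333333333}
```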

We can see from the data that cross-product adoption, as measured by the Jaccard index, is higher among homeowners. As shown in the output, all homeowners who have adopted one of the products have adopted both. Meanwhile, only one-third of the tech enthusiasts who have adopted at least one product have adopted both. Thus, in our very small dataset, cross-product adoption is higher among homeowners than among tech enthusiasts.

We can graphically verify the output by once again creating Venn diagrams:

Venn diagrams split by the two segments (image by author)

dbt_set_similarity provides a straightforward and efficient way to calculate cross-product adoption metrics such as the Jaccard Index directly within a dbt workflow. By applying this method, multi-product companies can gain valuable insights into user behavior and adoption patterns across their product portfolio. In our example, we demonstrated the calculation of overall cross-product adoption as well as segmented adoption for distinct user categories.

Using the package for cross-product adoption is just one simple application. There are many other potential applications of this technique, for example:

  • Feature usage analysis
  • Marketing campaign impact analysis
  • Support analysis
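For feature usage analysis, the same idea extends from a single product pair to every pair of features. The sketch below, with hypothetical feature names and user IDs, computes the Jaccard index for each unordered pair so the most strongly co-adopted features can be ranked; it is a plain-Python illustration, not a capability of the dbt package.

```python
from itertools import combinations

# Hypothetical mapping: feature name -> set of user IDs who used it
FEATURE_USERS = {
    "search": {1, 2, 3, 4},
    "export": {2, 3},
    "sharing": {3, 4, 5},
}


def pairwise_jaccard(feature_users: dict) -> dict:
    """Jaccard index for every unordered pair of features."""
    scores = {}
    for f1, f2 in combinations(sorted(feature_users), 2):
        a, b = feature_users[f1], feature_users[f2]
        union = a | b
        scores[(f1, f2)] = len(a & b) / len(union) if union else 0.0
    return scores


scores = pairwise_jaccard(FEATURE_USERS)
# Rank feature pairs from most to least overlapping user bases
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```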

Additionally, this style of analysis is certainly not limited to SaaS; it can apply to virtually any industry. Happy Jaccard-ing!

References

[1] Sean Ellis and Morgan Brown, Hacking Growth (2017)

Sources

dbt package hub
