Handling Feedback Loops in Recommender Systems — Deep Bayesian Bandits | by Sachin Hosmani | Jul, 2024

Understanding the fundamentals of exploration and Deep Bayesian Bandits to handle feedback loops in recommender systems

Sachin Hosmani

Towards Data Science

Image from ChatGPT-4o

Recommender system models are often trained to optimize for user engagement such as clicks and purchases. The well-meaning intention behind this is to favor items that the user has previously engaged with. However, this creates a feedback loop that over time can manifest as the "cold start problem". Simply put, the items that have historically been popular for a user tend to continue being favored by the model. In contrast, new but highly relevant items don't receive much exposure. In this article, I introduce exploration techniques from the basics and ultimately explain Deep Bayesian Bandits, a highly effective algorithm described in a paper by Guo, Dalin, et al. [1].

Let us use a simple ad recommender system as an example throughout this article.

A simple three-component ad recommender system. Image by author

It is a three-component system:

  • Retrieval: a component to efficiently retrieve candidates for ranking
  • Ranking: a deep neural network that predicts the click-through rate (CTR) as the score for an ad given a user
    score = predict_ctr(user features, ad features)
  • Auction: a component that
    – retrieves candidate ads for the user
    – scores them using the ranking model
    – selects the highest-scored ad and returns it*

Our focus in this article will be solely on the ranking model.

*real-world auction systems also take the ad's bid amount into consideration, but we ignore that for simplicity

Ranking model architecture

The ranking model is a deep neural network that predicts the click-through rate (CTR) of an ad, given the user and ad features. For simplicity, I propose a simple fully connected DNN below, but one could very well enrich it with techniques like wide-and-deep networks, DCN, and DeepFM without any loss of applicability of the techniques I explain in this article.

A binary classifier deep neural network that predicts pCTR. Image by author
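For concreteness, here is a minimal sketch of such a ranking model in PyTorch; the class name, layer sizes, and 32-dimensional feature vector are illustrative assumptions rather than details from the paper:

```python
import torch
import torch.nn as nn

class CTRRanker(nn.Module):
    """A simple fully connected binary classifier that predicts pCTR."""
    def __init__(self, num_features: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden // 2),
            nn.ReLU(),
            nn.Linear(hidden // 2, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is the concatenation of user features and ad features
        return torch.sigmoid(self.net(x)).squeeze(-1)

# Score a batch of candidate ads for one user (random features for illustration)
model = CTRRanker(num_features=32)
features = torch.randn(4, 32)
pctr = model(features)  # one predicted CTR per candidate ad
```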

Training data

The ranking model is trained on data that comprises clicks as binary labels and a concatenation of user and ad features. The exact set of features used is unimportant to this article, but I have assumed that some advertiser brand-related features are present to help the model learn the user's affinity towards brands.

Training data with sample features. Image by author

Imagine we successfully trained our ranking model on our ad click dataset, and the model has learned that one of our users, Jane, loves buying bags from the bag company "Vogue Voyage". But there is a new bag company "Radiant Clutch" in the market, and they sell great bags. However, despite "Radiant Clutch" running ad campaigns to reach users like Jane, Jane never sees their ads. This is because our ranking model has so firmly learned that Jane likes bags from "Vogue Voyage" that only their ads are shown to her. She often clicks on them, and when the model is further trained on these new clicks, it only strengthens the model's belief. This becomes a vicious cycle, leaving some items in the dark.

The feedback loop in action, causing the cold-start problem: bags from Radiant Clutch don't stand a chance. Image by author, thumbnails generated with ChatGPT-4o

If we ponder this, we would realize that the model didn't do anything wrong by learning that Jane likes bags from "Vogue Voyage". The problem is simply that the model isn't being given a chance to learn Jane's interest in other companies' bags.

Exploration vs exploitation

This is a good time to introduce the trade-off between exploration and exploitation.

Exploitation: During the ad auction, once we get our CTR predictions from our ranking model, we simply pick the ad with the highest score. This is a 100% exploitation strategy because we are acting entirely on our current best knowledge to achieve the highest immediate reward.

Exploration: What our approach has been lacking is the willingness to take some risk and show an ad even if it wasn't assigned the highest score. If we did that, the user might click on it, and the ranking model, when updated on this data, would learn something new about it. But if we never take the risk, the model will never learn anything new. That is the motivation behind exploration.

Exploration vs exploitation is a balancing act. Too little exploration would leave us with the cold-start problem, and too much exploration would risk showing highly irrelevant ads to users, thus losing user trust and money.

Now that we have set the stage for exploration, let us delve into some concrete techniques for controlled exploration.

ε-greedy policy

The idea here is simple. In our auction service, when we have the scores for all the candidate ads, instead of simply taking the top-scored ad, we do the following:

  1. pick a random number r in [0, 1)
  2. if r < ε, select a random ad from our candidates (exploration)
  3. else, select the top-scored ad (exploitation)

where ε is a constant that we carefully select in [0, 1), knowing that the algorithm will explore with ε probability and exploit with 1 − ε probability.
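A minimal sketch of this selection logic, assuming `candidate_ads` and their model scores are already available:

```python
import random

def epsilon_greedy_select(candidate_ads, scores, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the top-scored ad."""
    if random.random() < epsilon:
        return random.choice(candidate_ads)                      # exploration
    best_idx = max(range(len(scores)), key=lambda i: scores[i])
    return candidate_ads[best_idx]                               # exploitation
```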

Exploration with ε probability: pick any candidate ad at random. Image by author
Exploitation with 1 − ε probability: pick the highest CTR ad. Image by author

This is a very simple yet powerful technique. However, it can be too naive because when it explores, it completely randomly selects an ad. Even if an ad has an absurdly low pCTR prediction that the user has repeatedly disliked in the past, we might still show the ad. This can be a bit harsh and can lead to a serious loss in revenue and user trust. We can certainly do better.

Upper confidence bound (UCB)

Our motivation for exploration was to ensure that all ad candidates have an opportunity to be shown to the user. But as we give some exposure to an ad, if the user still doesn’t engage with it, it becomes prudent to cut future exposure to it. So, we need a mechanism by which we select the ad based on both its score estimate and also the amount of exposure it has already received.

Imagine our ranking model could produce not just the CTR score but also a confidence interval for it*.

*how this is achieved is explained later in the article

The model predicts a confidence interval along with the score. Image by author

Such a confidence interval is typically inversely proportional to the amount of exposure the ad has received because the more an ad is shown to the user, the more user feedback we have about it, which reduces the uncertainty interval.

Increased exposure to an ad leads to a decrease in the confidence interval in the model’s score prediction. Image by author

During auction, instead of selecting the ad with the greatest pCTR, we select the ad with the highest upper confidence bound. This approach is called UCB. The philosophy here is “Optimism in the face of uncertainty”. This approach effectively takes into account both the ad’s score estimate and also the uncertainty around it.
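A minimal sketch of UCB selection, assuming the ranking model gives us a mean pCTR and an uncertainty (standard deviation) per ad; the 1.96 multiplier for a roughly 95% interval under a normal assumption is an illustrative choice:

```python
import numpy as np

def ucb_select(candidate_ads, mean_ctr, std_ctr, z=1.96):
    """Pick the ad with the highest upper confidence bound (mean + z * std)."""
    upper_bound = np.asarray(mean_ctr) + z * np.asarray(std_ctr)
    return candidate_ads[int(np.argmax(upper_bound))]
```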

UCB in action: Ad-1 wins auction at first on account of its large confidence interval, but as the model learns about it, its UCB falls leading to Ad-2 winning auction. Image by author

Thompson sampling

The UCB approach went with the philosophy of "(complete) optimism in the face of uncertainty". Thompson sampling softens this optimism a little. Instead of using the upper confidence bound as the score of an ad, why not sample a score from the posterior distribution?

For this to be possible, imagine our ranking model could produce not just the CTR and the confidence interval but an actual score distribution*.

*how this is achieved is explained later in the article

The model can predict a distribution of scores for one ad. Image by author

Then, we just sample a score from this distribution and use that as the score during auction.
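As a sketch, and assuming each ad's score distribution can be approximated by a normal with the model's mean and standard deviation, Thompson sampling during auction could look like this:

```python
import numpy as np

def thompson_select(candidate_ads, mean_ctr, std_ctr, rng=None):
    """Sample one score per ad from its distribution and pick the argmax."""
    rng = rng or np.random.default_rng()
    sampled_scores = rng.normal(loc=mean_ctr, scale=std_ctr)
    return candidate_ads[int(np.argmax(sampled_scores))]
```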

Ad-1 wins auction due to a high sampled score from its wide distribution. Image by author
Ad-1 has received exposure and the model has less uncertainty about it. Ad-2 wins auction due to its higher score distribution mass. Image by author
Ad-2’s score distribution stdev further shrinks as it gets more exposure. Image by author

For the UCB and Thompson sampling techniques to work, we must update our models as often as possible. Only then can they update their uncertainty estimates in response to user feedback. The ideal setup is a continuous learning setup where user feedback events are sent in near-real time to the model to update its weights. However, periodically and statefully updating the weights of the model is also a viable option if continuous learning infrastructure is too expensive to set up.

A high-level continuous learning setup utilizing streaming infrastructure. Image by author, thumbnail generated by ChatGPT-4o

In the UCB and Thompson sampling approaches, I explained the idea of our model producing not just one score but an uncertainty measure as well (either as a confidence interval or a distribution of scores). How can this be possible? Our DNN can produce just one output after all! Here are the approaches discussed in the paper.

Bootstrapping

Bootstrapping in statistics simply means sampling with replacement. What this means for us is that we apply bootstrapping on our training dataset to create several closely related but slightly different datasets and train a separate model with each dataset. The models learned would thereby be slight variants of each other. If you have studied decision trees and bagging, you would already be familiar with the idea of training multiple related trees that are slight variants of each other.

Bootstrapped datasets are used to train separate models, resulting in a distribution of scores. Image by author

During auction, for each ad, we get one score from each bootstrapped model. This gives us a distribution of scores which is exactly what we wanted for Thompson sampling. We can also extract a confidence interval from the distribution if we choose to use UCB.
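A rough sketch of the bootstrapping approach; `train_model` and the `predict` interface of each model are assumed placeholders for whatever training and serving stack you use:

```python
import numpy as np

def bootstrap_indices(n_rows, k, rng=None):
    """k bootstrap samples (drawn with replacement) over a dataset of n_rows rows."""
    rng = rng or np.random.default_rng(0)
    return [rng.integers(0, n_rows, size=n_rows) for _ in range(k)]

# Training: one model per bootstrapped dataset (train_model is assumed to exist)
# models = [train_model(X[idx], y[idx]) for idx in bootstrap_indices(len(X), k=10)]

def score_distribution(models, features):
    """One score per bootstrapped model; the spread is our uncertainty estimate."""
    scores = np.array([m.predict(features) for m in models])
    return scores.mean(axis=0), scores.std(axis=0)  # feed into UCB or Thompson sampling
```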

The biggest drawback with this approach is the sheer computational and maintenance overhead of training and serving several models.

Multi-head bootstrapping

To mitigate the costs of several bootstrapped models, this approach unifies the several models into one multi-head model with one head for each output.

Multi-head model. Image by author

The key cost reduction comes from the fact that all the layers except the last are shared.

Training is done as usual on bootstrapped subsets of data. While each bootstrapped subset of data should update the weights of all the shared layers, care must be taken to update the weights of just one output head with each subset of data.
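One way to sketch this in PyTorch: the trunk is shared, each bootstrap sample carries the index of the head it is allowed to train, and the loss is computed only on that head's output so gradients never reach the other heads. The layer sizes and head count are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MultiHeadRanker(nn.Module):
    """Shared trunk with k output heads, one per bootstrapped subset of the data."""
    def __init__(self, num_features, k_heads=10, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(num_features, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(k_heads)])

    def forward(self, x):
        h = self.trunk(x)
        # one pCTR per head -> a distribution of scores for every example
        return torch.sigmoid(torch.cat([head(h) for head in self.heads], dim=-1))

model = MultiHeadRanker(num_features=32)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.BCELoss()

def train_step(x, y, head_idx):
    """Update the shared trunk and only the head this bootstrap sample belongs to."""
    optimizer.zero_grad(set_to_none=True)
    pred = model(x)[:, head_idx]   # only this head (and the trunk) receives gradients
    loss_fn(pred, y).backward()
    optimizer.step()
```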

Constrained influence of each bootstrapped subset of data on one head during backprop. Image by author

Stochastic gradient descent (SGD)

Instead of using separate bootstrapped datasets to train different models, we can just use one dataset, but train each model with SGD with random weight initialization thus utilizing the inherent stochasticity offered by SGD. Each model trained thus becomes a variant of the other.
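A sketch of this idea; `make_model` and `train_loader` are assumed placeholders for your model constructor and data pipeline:

```python
import torch

def train_sgd_ensemble(make_model, train_loader, k=5, epochs=1, lr=0.01):
    """Train k copies of the same model on the same data; only the random seed differs."""
    models = []
    loss_fn = torch.nn.BCELoss()
    for seed in range(k):
        torch.manual_seed(seed)            # different random weight initialization per model
        model = make_model()
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in train_loader:      # SGD's mini-batch noise adds further variation
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()
                optimizer.step()
        models.append(model)
    return models
```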

Multi-head SGD

In the same way that a multi-head architecture brought the number of models trained with bootstrapping down to one, we can use a multi-head architecture with SGD. We just have to randomly initialize the weights at each head so that, upon training on the whole dataset, each head is learned to be a slight variant of the others.

Forward-propagation dropout

Dropout is a well-known regularization technique where during model training, some of the nodes of a layer are randomly dropped to prevent chances of overfitting. We borrow the same idea here except that we use it during forward propagation to create controlled randomness.

We modify our ranking model’s last layer to introduce dropout. Then, when we want to score an ad, we pass it through the model several times, each time getting a slightly different score on account of the randomness introduced by dropout. This gives us the distribution and confidence interval that we seek.
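A sketch of this forward-propagation (Monte Carlo) dropout idea in PyTorch; the dropout rate, layer sizes, and number of passes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DropoutRanker(nn.Module):
    """Ranking model with dropout just before the output layer."""
    def __init__(self, num_features, hidden=64, p=0.3):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(num_features, hidden), nn.ReLU())
        self.dropout = nn.Dropout(p)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        return torch.sigmoid(self.out(self.dropout(self.trunk(x)))).squeeze(-1)

def mc_dropout_scores(model, features, n_passes=20):
    """Run several forward passes with dropout active to get a distribution of scores."""
    model.train()                      # keep dropout enabled at inference time
    with torch.no_grad():
        samples = torch.stack([model(features) for _ in range(n_passes)])
    return samples.mean(dim=0), samples.std(dim=0)
```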

The same model produces a distribution of scores through random dropout. Image by author

One significant disadvantage of this approach is that it requires several full forward passes through the network which can be quite costly during inference time.

Hybrid

In the hybrid approach, we perform a key optimization to give us the advantages of dropout and bootstrapping while bringing down the serving and training costs:

  • With dropout applied to just the last-but-one layer, we don’t have to run a full forward pass several times to generate our score distribution. We can do one forward pass until the dropout layer and then do several invocations of just the dropout layer in parallel. This gives us the same effect as the multi-head model where each dropout output acts like a multi-head output.

Also, with dropout deactivating one or more nodes randomly, it serves as a Bernoulli mask on the higher-order features at its layer, thus producing an effect equivalent to bootstrapping with different subsets of the dataset.
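A sketch of the serving-side optimization, assuming the ranking model has been split into a shared `trunk` and a final linear `head` (names and shapes are illustrative): the trunk runs once, and only the cheap dropout-plus-head step is repeated.

```python
import torch
import torch.nn as nn

def hybrid_scores(trunk, head, features, n_samples=20, p=0.3):
    """One expensive trunk pass, then repeated cheap dropout + head passes."""
    with torch.no_grad():
        h = trunk(features)                                 # shared forward pass, done once
        # replicate the hidden representation and apply an independent Bernoulli mask to each copy
        h_rep = h.unsqueeze(0).repeat(n_samples, 1, 1)
        masked = nn.functional.dropout(h_rep, p=p, training=True)
        samples = torch.sigmoid(head(masked)).squeeze(-1)   # n_samples scores per ad
    return samples.mean(dim=0), samples.std(dim=0)
```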

Which technique should we use?

Unfortunately, there is no easy answer. The best way is to experiment under the constraints of your problem and see what works best. But if the findings from the authors of the Deep Bayesian Bandits paper are anything to go by,

  1. ε-greedy unsurprisingly gives the lowest CTR improvement due to its unsophisticated exploration; however, its simplicity and low cost make it very alluring.
  2. UCB generally outperformed Thompson sampling.
  3. Bootstrap UCB gave the highest CTR return but was also the most computationally expensive due to the need to work with multiple models.
  4. The hybrid model which relied on dropout at the penultimate layer needed more training epochs to perform well and was on par with SGD UCB’s performance but at lower computational cost.
  5. The model's PR AUC measured offline was inversely related to the CTR gain. This is an important observation: offline performance can easily be attained by giving the model easier training data (for example, data not containing significant exploration), but that will not always translate into online CTR uplifts. This underscores the significance of robust online tests.

That said, the findings can be quite different for a different dataset and problem. Hence, real-world experimentation remains vital.

In this article, I introduced the cold-start problem created by feedback loops in recommender systems. Following the Deep Bayesian Bandits paper, we framed our ad recommender system as a k-arm bandit and saw many practical applications of reinforcement learning techniques to mitigate the cold-start problem. We also scratched the surface of capturing uncertainty in our neural networks, which is a good segue into Bayesian neural networks.

[1] Guo, Dalin, et al. "Deep Bayesian Bandits: Exploring in Online Personalized Recommendations." Proceedings of the 14th ACM Conference on Recommender Systems. 2020.
