
Gaussian Naive Bayes, Explained: A Visual Guide with Code Examples for Beginners | by Samy Baladram | Oct, 2024



CLASSIFICATION ALGORITHM

Bell-shaped assumptions for better predictions

Samy Baladram

Towards Data Science

⛳️ More CLASSIFICATION ALGORITHMS, explained:
· Dummy Classifier
· K Nearest Neighbor Classifier
· Bernoulli Naive Bayes
▶ Gaussian Naive Bayes
· Decision Tree Classifier
· Logistic Regression
· Support Vector Classifier
· Multilayer Perceptron (soon!)

Building on our previous article about Bernoulli Naive Bayes, which handles binary data, we now explore Gaussian Naive Bayes for continuous data. Unlike the binary approach, this algorithm assumes each feature follows a normal (Gaussian) distribution.

Here, we’ll see how Gaussian Naive Bayes handles continuous, bell-shaped data — ringing in accurate predictions — all without getting into the intricate math of Bayes’ Theorem.

All visuals: Author-created using Canva Pro. Optimized for mobile; may appear oversized on desktop.

Like other Naive Bayes variants, Gaussian Naive Bayes makes the “naive” assumption of feature independence. It assumes that the features are conditionally independent given the class label.

However, while Bernoulli Naive Bayes is suited to datasets with binary features, Gaussian Naive Bayes assumes that the features follow a continuous normal (Gaussian) distribution. Although this assumption may not always hold true in reality, it simplifies the calculations and often leads to surprisingly accurate results.

Bernoulli NB assumes binary data, Multinomial NB works with discrete counts, and Gaussian NB handles continuous data assuming a normal distribution.
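
As a quick aside (not part of the original walkthrough), the three variants map onto separate scikit-learn estimators; the tiny arrays below are made-up examples meant only to show the expected input type for each:

from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
import numpy as np

y = np.array([0, 1, 0, 1])

# BernoulliNB: binary features (e.g., word present / absent)
BernoulliNB().fit(np.array([[0, 1], [1, 0], [1, 1], [0, 0]]), y)

# MultinomialNB: non-negative discrete counts (e.g., word counts)
MultinomialNB().fit(np.array([[2, 0], [0, 3], [1, 1], [4, 0]]), y)

# GaussianNB: continuous real-valued features (e.g., temperature, humidity)
GaussianNB().fit(np.array([[20.5, 65.0], [25.1, 80.2], [18.3, 70.4], [30.0, 90.1]]), y)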

Throughout this article, we’ll use this artificial golf dataset (made by the author) as an example. This dataset predicts whether a person will play golf based on weather conditions.

Columns: ‘RainfallAmount’ (in mm), ‘Temperature’ (in Celsius), ‘Humidity’ (in %), ‘WindSpeed’ (in km/h) and ‘Play’ (Yes/No, target feature)
# IMPORTING DATASET #
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

dataset_dict = {
'Rainfall': [0.0, 2.0, 7.0, 18.0, 3.0, 3.0, 0.0, 1.0, 0.0, 25.0, 0.0, 18.0, 9.0, 5.0, 0.0, 1.0, 7.0, 0.0, 0.0, 7.0, 5.0, 3.0, 0.0, 2.0, 0.0, 8.0, 4.0, 4.0],
'Temperature': [29.4, 26.7, 28.3, 21.1, 20.0, 18.3, 17.8, 22.2, 20.6, 23.9, 23.9, 22.2, 27.2, 21.7, 27.2, 23.3, 24.4, 25.6, 27.8, 19.4, 29.4, 22.8, 31.1, 25.0, 26.1, 26.7, 18.9, 28.9],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'WindSpeed': [2.1, 21.2, 1.5, 3.3, 2.0, 17.4, 14.9, 6.9, 2.7, 1.6, 30.3, 10.9, 3.0, 7.5, 10.3, 3.0, 3.9, 21.9, 2.6, 17.3, 9.6, 1.9, 16.0, 4.6, 3.2, 8.3, 3.2, 2.2],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)

# Set feature matrix X and target vector y
X, y = df.drop(columns='Play'), df['Play']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)
print(pd.concat([X_train, y_train], axis=1), end='\n\n')
print(pd.concat([X_test, y_test], axis=1))

Gaussian Naive Bayes works with continuous data, assuming each feature follows a Gaussian (normal) distribution.

  1. Calculate the probability of each class in the training data.
  2. For each feature and class, estimate the mean and variance of the feature values within that class.
  3. For a new instance:
    a. For each class, calculate the probability density function (PDF) of each feature value under the Gaussian distribution of that feature within the class.
    b. Multiply the class probability by the product of the PDF values for all features.
  4. Predict the class with the highest resulting probability.
Gaussian Naive Bayes uses the normal distribution to model the likelihood of different feature values for each class. It then combines these likelihoods to make a prediction.
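
To make steps 2–4 concrete, here is a minimal sketch for a single feature; the class statistics and the new value below are hypothetical placeholders, not taken from the golf dataset:

import numpy as np

def gaussian_pdf(x, mean, var):
    # Probability density of x under a normal distribution with the given mean and variance
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical per-class statistics for one feature (e.g., Temperature): (mean, variance)
stats = {'Yes': (23.0, 9.0), 'No': (27.0, 16.0)}
priors = {'Yes': 0.6, 'No': 0.4}

# Score each class: prior multiplied by the PDF of the new value under that class's Gaussian
x_new = 25.0
scores = {cls: priors[cls] * gaussian_pdf(x_new, m, v) for cls, (m, v) in stats.items()}
print(scores, max(scores, key=scores.get))  # the class with the highest score is the prediction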

Transforming non-Gaussian distributed data

Remember how this algorithm naively assumes that all the input features follow a Gaussian/normal distribution?

Since we aren’t really sure about the distribution of our data, especially for features that clearly don’t follow a Gaussian distribution, applying a power transformation (like Box-Cox or Yeo-Johnson) before using Gaussian Naive Bayes can be helpful. This approach can help make the data more Gaussian-like, which aligns better with the assumptions of the algorithm.

All columns are rescaled using a power transformation and then standardized.
from sklearn.preprocessing import PowerTransformer

# Initialize and fit the PowerTransformer (standard scaling already included)
# Note: the default method is 'yeo-johnson', which also handles the zero rainfall values
pt = PowerTransformer(standardize=True)
X_train_transformed = pt.fit_transform(X_train)
X_test_transformed = pt.transform(X_test)

Now we are ready for the training.

1. Class Probability Calculation: For each class, calculate its probability: (Number of instances in this class) / (Total number of instances)

from fractions import Fraction

def calc_target_prob(attr):
    # Express each class count as an exact fraction of the total number of instances
    total_counts = attr.value_counts().sum()
    prob_series = attr.value_counts().apply(lambda x: Fraction(x, total_counts).limit_denominator())
    return prob_series

print(calc_target_prob(y_train))

2. Feature Probability Calculation: For each feature and each class, calculate the mean (μ) and standard deviation (σ) of the feature values within that class using the training data. Then, calculate the likelihood using the Gaussian Probability Density Function (PDF) formula.

For each weather condition, determine the mean and standard deviation for both “YES” and “NO” instances. Then calculate their PDF using the PDF formula for the normal/Gaussian distribution.
The same process is applied to all the other features.
def calculate_class_probabilities(X_train_transformed, y_train, feature_names):
    classes = y_train.unique()
    equations = pd.DataFrame(index=classes, columns=feature_names)

    for cls in classes:
        # Per-class mean and standard deviation of each (transformed) feature
        X_class = X_train_transformed[y_train == cls]
        mean = X_class.mean(axis=0)
        std = X_class.std(axis=0)
        k1 = 1 / (std * np.sqrt(2 * np.pi))
        k2 = 2 * (std ** 2)

        for i, column in enumerate(feature_names):
            # Human-readable Gaussian PDF equation for this feature and class
            equation = f"{k1[i]:.3f}·exp(-(x-({mean[i]:.2f}))²/{k2[i]:.3f})"
            equations.loc[cls, column] = equation

    return equations

# Use the function with the transformed training data
equation_table = calculate_class_probabilities(X_train_transformed, y_train, X.columns)

# Display the equation table
print(equation_table)

3. Smoothing: Gaussian Naive Bayes uses a unique smoothing approach. Unlike Laplace smoothing in other variants, it adds a tiny value (0.000000001 times the largest variance) to all variances. This prevents numerical instability from division by zero or very small numbers.
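
To make that concrete, here is a rough sketch of the idea (scikit-learn applies this internally via its var_smoothing parameter; the helper below only mimics the behaviour and is not part of the library):

import numpy as np

def smoothed_variances(X, var_smoothing=1e-9):
    # Per-feature variances, each padded with a tiny fraction of the largest variance
    variances = X.var(axis=0)
    return variances + var_smoothing * variances.max()

# Example: a constant feature (zero variance) no longer causes division by zero in the PDF
X_demo = np.array([[1.0, 5.0], [1.0, 6.0], [1.0, 7.0]])  # first column is constant
print(smoothed_variances(X_demo))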

Given a new instance with continuous features:

1. Likelihood Collection:
For each possible class:
· Start with the probability of this class occurring (class probability).
· For each feature in the new instance, calculate the probability density function of that feature within the class.

For ID 14, we calculate the PDF of each feature for both “YES” and “NO” instances.

2. Score Calculation & Prediction:
For each class:
· Multiply all the collected PDF values together.
· The result is the score for this class.
· The class with the highest score is the prediction.

from scipy.stats import norm

def calculate_class_probability_products(X_train_transformed, y_train, X_new, feature_names, target_name):
    classes = y_train.unique()
    n_features = X_train_transformed.shape[1]

    # Create column names using actual feature names
    column_names = [target_name] + list(feature_names) + ['Product']

    probability_products = pd.DataFrame(index=classes, columns=column_names)

    for cls in classes:
        # Per-class mean and standard deviation of each (transformed) feature
        X_class = X_train_transformed[y_train == cls]
        mean = X_class.mean(axis=0)
        std = X_class.std(axis=0)

        # Class probability (prior)
        prior_prob = np.mean(y_train == cls)
        probability_products.loc[cls, target_name] = prior_prob

        # PDF of each feature value of the new instance under this class's Gaussian
        feature_probs = []
        for i, feature in enumerate(feature_names):
            prob = norm.pdf(X_new[0, i], mean[i], std[i])
            probability_products.loc[cls, feature] = prob
            feature_probs.append(prob)

        # Final score: prior multiplied by the product of all feature PDFs
        product = prior_prob * np.prod(feature_probs)
        probability_products.loc[cls, 'Product'] = product

    return probability_products

# Assuming X_new is your new sample reshaped to (1, n_features)
X_new = np.array([-1.28, 1.115, 0.84, 0.68]).reshape(1, -1)

# Calculate probability products
prob_products = calculate_class_probability_products(X_train_transformed, y_train, X_new, X.columns, y.name)

# Display the probability product table
print(prob_products)

For this particular dataset, this accuracy is considered quite good.
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Initialize and train the Gaussian Naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train_transformed, y_train)

# Make predictions on the test set
y_pred = gnb.predict(X_test_transformed)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print the accuracy
print(f"Accuracy: {accuracy:.4f}")

GaussianNB is known for its simplicity and effectiveness. The main things to remember about its parameters are:

  1. priors: This is the most notable parameter, similar to Bernoulli Naive Bayes. Usually, you don’t need to set it manually; by default, it’s calculated from your training data, which often works well.
  2. var_smoothing: This is a stability parameter that you rarely need to adjust (the default is 1e-9, i.e. 0.000000001).

The key takeaway is that this algorithm is designed to work well out of the box. In most situations, you can use it without worrying about parameter tuning.
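
If you ever do need to set these two parameters explicitly, a minimal sketch looks like this (the prior values below are arbitrary and purely illustrative; they are listed in sorted class-label order):

from sklearn.naive_bayes import GaussianNB

# Default behaviour: priors estimated from class frequencies, var_smoothing=1e-9
gnb_default = GaussianNB()

# Manually supplied class priors (must sum to 1) and a slightly larger smoothing value
gnb_custom = GaussianNB(priors=[0.3, 0.7], var_smoothing=1e-8)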

Pros:

  1. Simplicity: Maintains the easy-to-implement and easy-to-understand trait.
  2. Efficiency: Remains swift in training and prediction, making it suitable for large-scale applications with continuous features.
  3. Flexibility with Data: Handles both small and large datasets well, adapting to the scale of the problem at hand.
  4. Continuous Feature Handling: Thrives with continuous and real-valued features, making it ideal for tasks like predicting real-valued outputs or working with data where features vary on a continuum.

Cons:

  1. Independence Assumption: Still assumes that features are conditionally independent given the class, which might not hold in all real-world scenarios.
  2. Gaussian Distribution Assumption: Works best when feature values actually follow a normal distribution. Non-normal distributions may lead to suboptimal performance (but can be fixed with the power transformation we’ve discussed).
  3. Sensitivity to Outliers: Can be significantly affected by outliers in the training data, as they skew the mean and variance calculations.

Gaussian Naive Bayes stands as an efficient classifier for a wide range of applications involving continuous data. Its ability to handle real-valued features extends its use beyond binary classification tasks, making it a go-to choice for numerous applications.

While it makes some assumptions about the data (feature independence and normal distribution), when these conditions are met, it provides robust performance, making it a favorite among both beginners and seasoned data scientists for its balance of simplicity and power.

import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import PowerTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the dataset
dataset_dict = {
'Rainfall': [0.0, 2.0, 7.0, 18.0, 3.0, 3.0, 0.0, 1.0, 0.0, 25.0, 0.0, 18.0, 9.0, 5.0, 0.0, 1.0, 7.0, 0.0, 0.0, 7.0, 5.0, 3.0, 0.0, 2.0, 0.0, 8.0, 4.0, 4.0],
'Temperature': [29.4, 26.7, 28.3, 21.1, 20.0, 18.3, 17.8, 22.2, 20.6, 23.9, 23.9, 22.2, 27.2, 21.7, 27.2, 23.3, 24.4, 25.6, 27.8, 19.4, 29.4, 22.8, 31.1, 25.0, 26.1, 26.7, 18.9, 28.9],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'WindSpeed': [2.1, 21.2, 1.5, 3.3, 2.0, 17.4, 14.9, 6.9, 2.7, 1.6, 30.3, 10.9, 3.0, 7.5, 10.3, 3.0, 3.9, 21.9, 2.6, 17.3, 9.6, 1.9, 16.0, 4.6, 3.2, 8.3, 3.2, 2.2],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}

df = pd.DataFrame(dataset_dict)

# Prepare data for the model
X, y = df.drop('Play', axis=1), (df['Play'] == 'Yes').astype(int)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, shuffle=False)

# Apply PowerTransformer
pt = PowerTransformer(standardize=True)
X_train_transformed = pt.fit_transform(X_train)
X_test_transformed = pt.transform(X_test)

# Train the model
nb_clf = GaussianNB()
nb_clf.fit(X_train_transformed, y_train)

# Make predictions
y_pred = nb_clf.predict(X_test_transformed)

# Check accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
