
Log Link vs Log Transformation in R — The Difference that Misleads Your Entire Data Analysis

by Admin
May 9, 2025
in Machine Learning

Normal distributions are the most commonly used, but a lot of real-world data unfortunately isn't normal. When faced with extremely skewed data, it's tempting to reach for a log transformation to normalize the distribution and stabilize the variance. I recently worked on a project analyzing the energy consumption of training AI models, using data from Epoch AI [1]. There is no official data on the energy usage of each model, so I calculated it by multiplying each model's power draw by its training time. The new variable, Energy (in kWh), was highly right-skewed, with some extreme and overdispersed outliers (Fig. 1).

Figure 1. Histogram of Energy Consumption (kWh)

To address this skewness and heteroskedasticity, my first instinct was to apply a log transformation to the Energy variable. The distribution of log(Energy) looked much more normal (Fig. 2), and a Shapiro-Wilk test did not reject normality (p ≈ 0.5).
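
For reference, this check takes only a few lines in R (a minimal sketch, assuming the data frame df and the Energy_kWh column used in the models below):

# Inspect the raw and log-transformed distributions
hist(df$Energy_kWh, breaks = 50, main = "Energy (kWh)")
hist(log(df$Energy_kWh), breaks = 50, main = "log of Energy (kWh)")

# Shapiro-Wilk test: a large p-value means normality is not rejected
shapiro.test(log(df$Energy_kWh))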

Figure 2. Histogram of log of Energy Consumption (kWh)

Modeling Dilemma: Log Transformation vs Log Link

The visualization looked good, but when I moved on to modeling, I faced a dilemma: should I model the log-transformed response variable (log(Y) ~ X), or should I model the original response variable using a log link function (Y ~ X, link = "log")? I also considered two distributions, Gaussian (normal) and Gamma, and combined each distribution with both log approaches. This gave me four different models, all fitted using R's Generalized Linear Models (GLM):

all_gaussian_log_link <- glm(Energy_kWh ~ Parameters +
      Training_compute_FLOP +
      Training_dataset_size +
      Training_time_hour +
      Hardware_quantity +
      Training_hardware, 
    family = gaussian(link = "log"), data = df)
all_gaussian_log_transform <- glm(log(Energy_kWh) ~ Parameters +
                          Training_compute_FLOP +
                          Training_dataset_size +
                          Training_time_hour +
                          Hardware_quantity +
                          Training_hardware, 
                         data = df)
all_gamma_log_link  <- glm(Energy_kWh ~ Parameters +
                    Training_compute_FLOP +
                    Training_dataset_size +
                    Training_time_hour +
                    Hardware_quantity +
                    Training_hardware + 0, 
                  family = Gamma(link = "log"), data = df)
all_gamma_log_transform  <- glm(log(Energy_kWh) ~ Parameters +
                    Training_compute_FLOP +
                    Training_dataset_size +
                    Training_time_hour +
                    Hardware_quantity +
                    Training_hardware + 0, 
                  family = Gamma(), data = df)

Model Comparison: AIC and Diagnostic Plots

I compared the four models using the Akaike Information Criterion (AIC), which is an estimator of prediction error. Typically, the lower the AIC, the better the model fits.

AIC(all_gaussian_log_link, all_gaussian_log_transform, all_gamma_log_link, all_gamma_log_transform)

                           df       AIC
all_gaussian_log_link      25 2005.8263
all_gaussian_log_transform 25  311.5963
all_gamma_log_link         25 1780.8524
all_gamma_log_transform    25  352.5450

Among the four models, those using the log-transformed outcome have much lower AIC values than those using a log link. Since the difference in AIC between the log-transformed and log-link models was substantial (311 and 352 vs 1780 and 2005), I also examined the diagnostic plots to further validate that the log-transformed models fit better:
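
Diagnostic plots like those in Figures 4-7 can be produced with R's built-in plot() method for fitted glm objects (a minimal sketch for one model; the same call works for the other three):

# Residuals vs Fitted, Q-Q, Scale-Location, and Residuals vs Leverage in a 2x2 grid
par(mfrow = c(2, 2))
plot(all_gamma_log_transform)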

Figure 4. Diagnostic plots for the log-linked Gaussian model. The Residuals vs Fitted plot suggests linearity despite a few outliers. However, the Q-Q plot shows noticeable deviations from the theoretical line, suggesting non-normality.
Figure 5. Diagnostic plots for the log-transformed Gaussian model. The Q-Q plot shows a much better fit, supporting normality. However, the Residuals vs Fitted plot has a dip to -2, which may suggest non-linearity.
Figure 6. Diagnostic plots for the log-linked Gamma model. The Q-Q plot looks okay, yet the Residuals vs Fitted plot shows clear signs of non-linearity.
Figure 7. Diagnostic plots for the log-transformed Gamma model. The Residuals vs Fitted plot looks good, with a small dip of -0.25 at the beginning. However, the Q-Q plot shows some deviation at both tails.

Based on the AIC values and diagnostic plots, I decided to move forward with the log-transformed Gamma model, since it had the second-lowest AIC value and its Residuals vs Fitted plot looks better than that of the log-transformed Gaussian model.
I proceeded to explore which explanatory variables were useful and which interactions might be significant. The final model I selected was:

glm(formula = log(Energy_kWh) ~ Training_time_hour * Hardware_quantity + 
    Training_hardware + 0, family = Gamma(), data = df)

Interpreting Coefficients

However, when I started interpreting the model's coefficients, something felt off. Since only the response variable was log-transformed, the effects of the predictors are multiplicative, and we need to exponentiate the coefficients to convert them back to the original scale. A one-unit increase in 𝓍 multiplies the outcome 𝓎 by exp(β), or equivalently, each additional unit of 𝓍 leads to a (exp(β) − 1) × 100 % change in 𝓎 [2].
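
In R, this back-conversion is a one-liner (a sketch, assuming the fitted model is stored as glm3, the name used in the prediction code further below):

# Multiplicative effect of a one-unit increase in each predictor
exp(coef(glm3))

# Equivalent percent change in the response
(exp(coef(glm3)) - 1) * 100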

Looking at the model's results table below, Training_time_hour, Hardware_quantity, and their interaction term Training_time_hour:Hardware_quantity are continuous variables, so their coefficients represent slopes. Meanwhile, since I specified +0 in the model formula, all levels of the categorical Training_hardware act as intercepts, meaning each hardware type serves as the intercept β₀ when its corresponding dummy variable is active.
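
To see this dummy coding directly, you can inspect the design matrix: with +0, model.matrix() produces one indicator column per hardware type and no global intercept (a quick sketch):

head(model.matrix(~ Training_time_hour * Hardware_quantity + Training_hardware + 0,
                  data = df))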

> glm(formula = log(Energy_kWh) ~ Training_time_hour * Hardware_quantity + 
    Training_hardware + 0, family = Gamma(), data = df)

Coefficients:
                                                 Estimate Std. Error t value Pr(>|t|)    
Training_time_hour                             -1.587e-05  3.112e-06  -5.098 5.76e-06 ***
Hardware_quantity                              -5.121e-06  1.564e-06  -3.275  0.00196 ** 
Training_hardwareGoogle TPU v2                  1.396e-01  2.297e-02   6.079 1.90e-07 ***
Training_hardwareGoogle TPU v3                  1.106e-01  7.048e-03  15.696  < 2e-16 ***
Training_hardwareGoogle TPU v4                  9.957e-02  7.939e-03  12.542  < 2e-16 ***
Training_hardwareHuawei Ascend 910              1.112e-01  1.862e-02   5.969 2.79e-07 ***
Training_hardwareNVIDIA A100                    1.077e-01  6.993e-03  15.409  < 2e-16 ***
Training_hardwareNVIDIA A100 SXM4 40 GB         1.020e-01  1.072e-02   9.515 1.26e-12 ***
Training_hardwareNVIDIA A100 SXM4 80 GB         1.014e-01  1.018e-02   9.958 2.90e-13 ***
Training_hardwareNVIDIA GeForce GTX 285         3.202e-01  7.491e-02   4.275 9.03e-05 ***
Training_hardwareNVIDIA GeForce GTX TITAN X     1.601e-01  2.630e-02   6.088 1.84e-07 ***
Training_hardwareNVIDIA GTX Titan Black         1.498e-01  3.328e-02   4.501 4.31e-05 ***
Training_hardwareNVIDIA H100 SXM5 80GB          9.736e-02  9.840e-03   9.894 3.59e-13 ***
Training_hardwareNVIDIA P100                    1.604e-01  1.922e-02   8.342 6.73e-11 ***
Training_hardwareNVIDIA Quadro P600             1.714e-01  3.756e-02   4.562 3.52e-05 ***
Training_hardwareNVIDIA Quadro RTX 4000         1.538e-01  3.263e-02   4.714 2.12e-05 ***
Training_hardwareNVIDIA Quadro RTX 5000         1.819e-01  4.021e-02   4.524 3.99e-05 ***
Training_hardwareNVIDIA Tesla K80               1.125e-01  1.608e-02   6.993 7.54e-09 ***
Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   1.072e-01  1.353e-02   7.922 2.89e-10 ***
Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  9.444e-02  2.030e-02   4.653 2.60e-05 ***
Training_hardwareNVIDIA V100                    1.420e-01  1.201e-02  11.822 8.01e-16 ***
Training_time_hour:Hardware_quantity            2.296e-09  9.372e-10   2.450  0.01799 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 0.05497984)

    Null deviance:    NaN  on 70  degrees of freedom
Residual deviance: 3.0043  on 48  degrees of freedom
AIC: 345.39

When converting the slopes to percent change in the response variable, the effect of each continuous variable was practically zero, even slightly negative:
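For example, plugging the two main slopes from the table above into the conversion formula:

(exp(-1.587e-05) - 1) * 100  # Training_time_hour: about -0.0016% per additional hour
(exp(-5.121e-06) - 1) * 100  # Hardware_quantity: about -0.0005% per additional chip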

All the intercepts also converted back to just around 1 kWh on the original scale. These results didn't make any sense, as at least one of the slopes should grow along with the huge energy consumption. I wondered whether a log-linked model with the same predictors would yield different results, so I fit the model again:

glm(formula = Energy_kWh ~ Training_time_hour * Hardware_quantity + 
    Training_hardware + 0, family = Gamma(link = "log"), data = df)

Coefficients:
                                                 Estimate Std. Error t value Pr(>|t|)    
Training_time_hour                              1.818e-03  1.640e-04  11.088 7.74e-15 ***
Hardware_quantity                               7.373e-04  1.008e-04   7.315 2.42e-09 ***
Training_hardwareGoogle TPU v2                  7.136e+00  7.379e-01   9.670 7.51e-13 ***
Training_hardwareGoogle TPU v3                  1.004e+01  3.156e-01  31.808  < 2e-16 ***
Training_hardwareGoogle TPU v4                  1.014e+01  4.220e-01  24.035  < 2e-16 ***
Training_hardwareHuawei Ascend 910              9.231e+00  1.108e+00   8.331 6.98e-11 ***
Training_hardwareNVIDIA A100                    1.028e+01  3.301e-01  31.144  < 2e-16 ***
Training_hardwareNVIDIA A100 SXM4 40 GB         1.057e+01  5.635e-01  18.761  < 2e-16 ***
Training_hardwareNVIDIA A100 SXM4 80 GB         1.093e+01  5.751e-01  19.005  < 2e-16 ***
Training_hardwareNVIDIA GeForce GTX 285         3.042e+00  1.043e+00   2.916  0.00538 ** 
Training_hardwareNVIDIA GeForce GTX TITAN X     6.322e+00  7.379e-01   8.568 3.09e-11 ***
Training_hardwareNVIDIA GTX Titan Black         6.135e+00  1.047e+00   5.862 4.07e-07 ***
Training_hardwareNVIDIA H100 SXM5 80GB          1.115e+01  6.614e-01  16.865  < 2e-16 ***
Training_hardwareNVIDIA P100                    5.715e+00  6.864e-01   8.326 7.12e-11 ***
Training_hardwareNVIDIA Quadro P600             4.940e+00  1.050e+00   4.705 2.18e-05 ***
Training_hardwareNVIDIA Quadro RTX 4000         5.469e+00  1.055e+00   5.184 4.30e-06 ***
Training_hardwareNVIDIA Quadro RTX 5000         4.617e+00  1.049e+00   4.401 5.98e-05 ***
Training_hardwareNVIDIA Tesla K80               8.631e+00  7.587e-01  11.376 3.16e-15 ***
Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   9.994e+00  6.920e-01  14.443  < 2e-16 ***
Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  1.058e+01  1.047e+00  10.105 1.80e-13 ***
Training_hardwareNVIDIA V100                    9.208e+00  3.998e-01  23.030  < 2e-16 ***
Training_time_hour:Hardware_quantity           -2.651e-07  6.130e-08  -4.324 7.70e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 1.088522)

    Null deviance: 2.7045e+08  on 70  degrees of freedom
Residual deviance: 1.0593e+02  on 48  degrees of freedom
AIC: 1775

This time, Training_time_hour and Hardware_quantity would increase the total energy consumption by about 0.18% per additional hour and 0.07% per additional chip, respectively. Meanwhile, their interaction would decrease the energy use by about 2.7 × 10⁻⁵% per unit. These results made much more sense, given that Training_time_hour can reach up to 7,000 hours and Hardware_quantity up to 16,000 chips.
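These figures come from the same conversion formula, now applied to the log-link coefficients:

(exp(1.818e-03) - 1) * 100   # Training_time_hour: about +0.18% per hour
(exp(7.373e-04) - 1) * 100   # Hardware_quantity: about +0.07% per chip
(exp(-2.651e-07) - 1) * 100  # interaction: about -2.7e-05% per unit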

To visualize the differences better, I created two plots comparing the predictions (shown as dashed lines) from both models. The left panel uses the log-transformed Gamma GLM, where the dashed lines are nearly flat and close to zero, nowhere near the solid lines fitted to the raw data. On the other hand, the right panel uses the log-linked Gamma GLM, where the dashed lines align much more closely with the actual fitted lines.

library(dplyr)
library(ggplot2)
library(patchwork)  # for combining p1 + p2

# glm3 is the final log-transformed Gamma model; glm3_alt is the log-linked one
test_data <- df[, c("Training_time_hour", "Hardware_quantity", "Training_hardware")]
prediction_data <- df %>%
  mutate(
    # the log-transformed model predicts log(Energy), so exponentiate back
    pred_energy1 = exp(predict(glm3, newdata = test_data)),
    # the log-linked model predicts Energy directly on the response scale
    pred_energy2 = predict(glm3_alt, newdata = test_data, type = "response")
  )
y_limits <- c(min(df$Energy_kWh, prediction_data$pred_energy1, prediction_data$pred_energy2),
              max(df$Energy_kWh, prediction_data$pred_energy1, prediction_data$pred_energy2))

p1 <- ggplot(df, aes(x = Hardware_quantity, y = Energy_kWh, color = Training_time_group)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(data = prediction_data, aes(y = pred_energy1), method = "lm", se = FALSE, 
              linetype = "dashed", size = 1) + 
  scale_y_log10(limits = y_limits) +
  labs(x = "Hardware Quantity", y = "log of Energy (kWh)") +
  theme_minimal() +
  theme(legend.position = "none") 
p2 <- ggplot(df, aes(x = Hardware_quantity, y = Energy_kWh, color = Training_time_group)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(data = prediction_data, aes(y = pred_energy2), method = "lm", se = FALSE, 
              linetype = "dashed", size = 1) + 
  scale_y_log10(limits = y_limits) +
  labs(x = "Hardware Quantity", color = "Training Time Level") +
  theme_minimal() +
  theme(axis.title.y = element_blank()) 
p1 + p2
Figure 8. Relationship between hardware quantity and log of energy consumption across training time groups. In both panels, raw data is shown as points, solid lines represent fitted values from linear models, and dashed lines represent predicted values from generalized linear models. The left panel uses a log-transformed Gamma GLM, while the right panel uses a log-linked Gamma GLM with the same predictors.

Why Log Transformation Fails

To understand why the log-transformed model can't capture the underlying effects the way the log-linked one can, let's walk through what happens when we apply a log transformation to the response variable:

Let's say Y is equal to some function of X plus an error term:

Y = f(X) + ε

When we apply a log transformation to Y, we are actually compressing both f(X) and the error:

log(Y) = log(f(X) + ε)

This means we are modeling an entirely new response variable, log(Y). When we plug in our own function g(X) (in my case, g(X) = Training_time_hour * Hardware_quantity + Training_hardware), it is trying to capture the combined effects of both the "shrunk" f(X) and the compressed error term.

In contrast, when we use a log link, we are still modeling the original Y, not a transformed version. Instead, the model exponentiates our own function g(X) to predict Y.

The model then minimizes the difference between the actual Y and the predicted Y. That way, the error term stays intact on the original scale:

Y = exp(g(X)) + ε
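
To make the contrast concrete, here is a tiny toy simulation (my own illustration, entirely separate from the Epoch AI data), where the true relationship is Y = exp(0.5x) plus additive noise:

set.seed(42)
x <- runif(200, 0, 10)
y <- pmax(exp(0.5 * x) + rnorm(200, sd = 20), 1)  # keep Y positive

# Log transformation: models log(Y), so the noise is compressed along with f(X)
m_transform <- lm(log(y) ~ x)

# Log link: still models Y itself, leaving the error on the original scale
m_link <- glm(y ~ x, family = gaussian(link = "log"))

coef(m_transform)["x"]  # typically drifts well away from the true 0.5
coef(m_link)["x"]       # typically lands close to 0.5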

Conclusion

Log-transforming a variable isn't the same as using a log link, and it may not always yield reliable results. Under the hood, a log transformation alters the variable itself and distorts both the variation and the noise. Understanding this subtle mathematical difference behind your models is just as important as searching for the best-fitting model.


[1] Epoch AI. Data on Notable AI Models. Retrieved from https://epoch.ai/data/notable-ai-models

[2] University of Virginia Library. Interpreting Log Transformations in a Linear Model. Retrieved from https://library.virginia.edu/data/articles/interpreting-log-transformations-in-a-linear-model
