Stepwise Selection Made Simple: Improve Your Regression Models in Python

August 29, 2025


To get the most out of this tutorial, you should already have a solid understanding of how linear regression works and the assumptions behind it. You should also be aware that, in practice, multicollinearity is addressed using the Variance Inflation Factor (VIF). In addition, you should understand what prediction risk means, and be familiar with the basics of Python as well as its core functions.

At the end of this article, you will find the code for the stepwise selection procedure used here. The implementation follows two key principles: orthogonality and Don't Repeat Yourself (DRY), ensuring clean, modular, and reusable code.

Reducing the number of variables in a regression model is not only a technical exercise; it is a strategic choice that must be guided by the goals of the analysis. In a previous article, we demonstrated how simple tools, such as correlation analysis and the Variance Inflation Factor (VIF), can already shrink a dataset with hundreds of predictors into a far more compact model. Yet, even after this initial reduction, models often still contain too many variables to work with effectively. A smaller model with fewer predictors offers several advantages: it may yield better predictions than a larger model, it is more parsimonious and hence easier to interpret, and it often generalizes better. As more variables are added, the model's bias decreases but its variance increases. This is the essence of the bias–variance trade-off: too few variables lead to high bias (underfitting), while too many lead to high variance (overfitting). Good predictive performance requires a balance between the two.

This raises a fundamental question for anyone working with regression models: how do we decide which variables should be included in the model? In other words, how can we reduce the dimensionality of our data without losing essential information?


The answer depends on the goal of the analysis. Should the model provide precise estimates of the coefficients? Should it identify which predictors are significant? Or should it maximize predictive accuracy? Each of these goals calls for a different approach to model selection, and ignoring this distinction can lead to misleading conclusions.

In this article, we address the problem of model selection in regression. We begin by outlining the general framework of linear regression (readers already familiar with it may skip this section). We then review the main scoring criteria used to evaluate competing models, followed by a discussion of the procedures that allow us to explore subsets of the possible model space. Finally, we illustrate these methods with a Python application using the Communities and Crime dataset.

1. Framework of Linear Regression

In this section, we provide a brief overview of the linear regression model. We begin by describing the dataset, including the number of observations and the number of covariates. We then introduce the model itself and outline the assumptions made about the data.

We assume that we have a dataset with n observations and p covariates. The response variable, denoted by Y, is continuous, and the covariates are denoted by X₁, …, Xₚ. We assume that the relationship between the response variable and the covariates is linear, that is:

Yᵢ = β₀ + β₁Xᵢ₁ + … + βₚXᵢₚ + εᵢ

for i = 1, …, n, where β₀ is the intercept, βⱼ is the coefficient of the j-th covariate, and εᵢ is the error term. We assume that the error terms are independent and identically distributed (i.i.d.) with mean zero and variance σ².

With the regression framework in place, the next step is to face the central challenge of model selection: how do we compare different subsets of variables?

2. Scoring Criteria for Evaluating Competing Models

In model selection, the first challenge is to assign a score to each model, where a model is defined by a particular subset of covariates. This section explains how models can be scored.

Let us first discuss the problem of scoring models. Let S ⊂ {1, …, p} and let 𝓧S = {Xⱼ : j ∈ S} denote a subset of the covariates. Let βS denote the coefficients of the corresponding set of covariates and let β̂S denote the least squares estimate of βS. Also, let XS denote the design matrix for this subset of covariates and define r̂S(x) to be the estimated regression function. The predicted values from model S are denoted by Ŷᵢ(S) = r̂S(Xᵢ). The prediction risk is defined to be

R(S) = Σᵢ₌₁ⁿ E[(Ŷᵢ(S) − Yᵢ*)²],

where Yᵢ* is a future observation of Yᵢ at the covariate value Xᵢ.

The goal of model selection is to find the subset S that minimizes the prediction risk R(S).

With real data, we cannot compute the prediction risk R(S) directly. In this situation, we generally use an estimate R̂(S) based on the available data. This estimate of the prediction risk serves as our scoring criterion.

A naive estimate of the prediction risk is the training error, defined as:

R̂ₜᵣ(S) = Σᵢ₌₁ⁿ (Ŷᵢ(S) − Yᵢ)²,

where Yᵢ is the observed value of the response variable for the i-th observation.

However, the training error is a very biased estimate of the prediction risk: it is always smaller than the prediction risk R(S). In fact,

E[R̂ₜᵣ(S)] − R(S) = −2 Σᵢ₌₁ⁿ Cov(Ŷᵢ(S), Yᵢ).

What explains this bias is that the data are used twice: once to fit the model and once to compute the training error. When we fit a complex model with many parameters, the covariance Cov(Ŷᵢ(S), Yᵢ) becomes large and the bias of the training error gets worse. This is why we need a more reliable estimate of the prediction risk.

2.1 Mallows' Cp Statistic

Mallows' Cp statistic is a popular criterion for model selection. It is defined as:

Cp(S) = R̂ₜᵣ(S) + 2|S|·σ̂²,

where |S| is the number of terms in S, and σ̂² is the estimate of σ², the variance of the error term, obtained from the full model with all k variables. This value represents the training error plus a bias correction. The first term measures how well the model fits the data, while the second term measures the model's complexity. The more complex the model, the larger the second term will be, and consequently, the larger Mallows' Cp statistic will be.

Mallows' Cp statistic therefore represents a trade-off between model fit and complexity. Finding a good model requires balancing these two aspects. The goal is to identify the model that minimizes Mallows' Cp statistic.
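
As an illustration, here is a minimal sketch of how Mallows' Cp could be computed with statsmodels, assuming a covariate DataFrame X, a response y, and a list of column names subset already exist (the full procedure used later in this article is given in the appendix):

import statsmodels.api as sm

def mallows_cp(y, X, subset):
    # sigma^2 estimated from the full model with all k covariates
    full = sm.OLS(y, sm.add_constant(X)).fit()
    sigma2_hat = full.mse_resid                       # RSS / (n - k - 1)
    # training error (RSS) of the candidate model S
    sub = sm.OLS(y, sm.add_constant(X[subset])).fit()
    # Cp = training error + 2 * |S| * estimated sigma^2
    return sub.ssr + 2 * len(subset) * sigma2_hat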

Beyond Mallows' Cp, model selection criteria can also be derived from likelihood-based estimation with penalty terms. This leads us to the next family of methods.

2.2 Likelihood and Penalization

The approach below estimates the prediction risk based on maximum likelihood estimation of the parameters.

Under the hypothesis that the error term is normally distributed, the likelihood function is given by:

L(βS, σ²) = (2πσ²)^(−n/2) · exp( −(1/(2σ²)) Σᵢ₌₁ⁿ (Yᵢ − β₀ − Σⱼ∈S βⱼXᵢⱼ)² ).

Computing the maximum likelihood estimates of the parameters β and σ² for model S, which has |S| variables, gives respectively:
the ML estimate of βS, which coincides with the OLS estimate β̂(S), and σ̂²(S) = (1/n) Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ(S))².

The log-likelihood of model S, which has |S| variables, is then given by:

𝓁(S) = −(n/2)·log(2π) − (n/2)·log(σ̂²(S)) − n/2.

Choosing the model that maximizes the log-likelihood is equivalent to choosing the model with the smallest residual sum of squares (RSS), that is:

RSS(S) = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ(S))².

To obtain a criterion to minimize, we work with the negative log-likelihood. The criterion is generally defined as:

−2𝓁(S) + |S|·f(n)

where f(n) is a penalty function that depends on the sample size n.
This formulation allows us to define the AIC and BIC criteria, given as follows:

2.2.1 Akaike Information Criterion (AIC)

Another approach to model selection is the Akaike Information Criterion (AIC). The goal of the AIC is to identify the model that minimizes information loss. In practice, this means choosing the subset S that minimizes the AIC, defined as:

AIC(𝑆) = −2𝓁ₛ + 2|𝑆|

where 𝓁ₛ is the log-likelihood of model S evaluated at the maximum likelihood estimates of its parameters. Here, f(n) = 2.

This criterion can be viewed as a combination of goodness of fit and model complexity.
When comparing two models, the one with the lower AIC value is preferred.

2.2.2 Bayesian Information Criterion (BIC)

The Bayesian Information Criterion (BIC) is another criterion for model selection. It is similar to the AIC, and is defined as:

BIC(𝑆) = −2𝓁ₛ + |𝑆|·log(𝑛)

where 𝓁ₛ is the log-likelihood of model S evaluated at the maximum likelihood estimates of its parameters. Here, f(n) = log(n).

It is called the Bayesian Information Criterion because it can be derived from a Bayesian perspective. Let S = {S₁, …, Sₘ} denote a set of models. If we assign each model Sᵢ a prior probability π(Sᵢ) = 1/m, then the posterior probability of model Sᵢ given the data is proportional to its marginal likelihood, which the BIC approximates. This leads to the following expression:

P(Sᵢ | data) ≈ exp(−BIC(Sᵢ)/2) / Σⱼ₌₁ᵐ exp(−BIC(Sⱼ)/2).

Thus, choosing the model that minimizes the BIC is equivalent to choosing the model with the highest (approximate) posterior probability given the data.

BIC also has an interpretation in terms of minimum description length: it balances model fit against complexity. Because its penalty term uses f(n) = log(n), the BIC applies a stronger penalty than the AIC as soon as log(n) > 2, that is, when n > 7. As a result, BIC typically selects more parsimonious models than AIC, especially as the sample size grows.

As with AIC, when comparing two models, the preferred model is the one with the lower BIC.
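
In practice there is no need to code these formulas by hand: under the normality assumption, statsmodels reports both criteria for any fitted OLS model. A minimal sketch, assuming a covariate DataFrame X, a response y, and hypothetical column names x1, x2, x3:

import statsmodels.api as sm

model_a = sm.OLS(y, sm.add_constant(X[['x1', 'x2']])).fit()          # hypothetical subset
model_b = sm.OLS(y, sm.add_constant(X[['x1', 'x2', 'x3']])).fit()    # hypothetical larger subset

# Lower is better for both criteria
print("AIC:", model_a.aic, model_b.aic)
print("BIC:", model_a.bic, model_b.bic)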

2.3 Leave-One-Out Cross-Validation (LOOCV) and k-Fold Cross-Validation

Another widely used method for model selection is leave-one-out cross-validation (LOOCV). In this approach, the risk estimator is defined as:

R̂CV(S) = Σᵢ₌₁ⁿ (Yᵢ − Ŷ₋ᵢ(S))²,

where Ŷ₋ᵢ(S) is the prediction for Yᵢ using model S fitted on all observations except the i-th one, and Yᵢ is the actual response for the i-th observation.

It can be shown that:

R̂CV(S) = Σᵢ₌₁ⁿ [ (Yᵢ − Ŷᵢ(S)) / (1 − hᵢᵢ(S)) ]²,

where hᵢᵢ(S) is the i-th diagonal element of the hat matrix HS = XS(XSᵀXS)⁻¹XSᵀ for model S.

This formula shows that it is unnecessary to refit the model repeatedly, leaving out one observation at a time. Instead, LOOCV can be computed directly from the fitted values and the hat matrix.
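
A minimal sketch of this shortcut with statsmodels, assuming X and y already exist; a single fit is enough, and the leverages hᵢᵢ come from the influence diagnostics:

import numpy as np
import statsmodels.api as sm

fit = sm.OLS(y, sm.add_constant(X)).fit()
h = fit.get_influence().hat_matrix_diag          # leverages h_ii(S)
loocv_risk = np.sum((fit.resid / (1 - h)) ** 2)  # LOOCV risk from a single fit
print(loocv_risk)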

A natural extension of LOOCV is k-fold cross-validation, in which the data are partitioned into k folds, and the model is trained on k − 1 folds and validated on the remaining fold. This process is repeated across all folds, and the results are averaged to estimate the prediction error.

k-Fold Cross-Validation

In this approach, the data are divided into k groups, or folds (commonly k = 5 or k = 10). One fold is left out, the model is fitted on the remaining k − 1 folds, and the fitted model is then used to predict the responses in the omitted fold. The risk for that fold is estimated as:

R̂fold(S) = Σᵢ∈fold (Yᵢ − Ŷᵢ(S))²,

where the sum is taken over all observations in the omitted fold. This procedure is repeated for each of the k folds, and the overall risk estimate is obtained by averaging the k individual risk values.
This method is particularly suitable when the primary goal of regression is prediction. In this setting, alternative performance measures [such as the Mean Absolute Error (MAE) or the Root Mean Squared Error (RMSE)] can also be used to evaluate predictive accuracy.
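
If scikit-learn is available (an assumption; the code in the appendix relies only on statsmodels), k-fold cross-validation can be sketched as follows, with k = 5 and the mean squared error as the risk measure:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# X is the covariate DataFrame and y the response
mse_folds = -cross_val_score(LinearRegression(), X, y,
                             cv=5, scoring="neg_mean_squared_error")
print(mse_folds.mean())   # averaged 5-fold risk estimate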

2.4 Other Criteria

In addition to the criteria discussed above, several other measures are commonly used in the literature for model selection. One widely used option is the adjusted coefficient of determination, defined as:

R²adj(S) = 1 − (1 − R²(S)) · (n − 1) / (n − |S| − 1).

Another approach is to use nested model tests, such as the F-test. The F-test compares two nested models: a smaller model S₁, whose covariates form a subset of those in a larger model S₂. The null hypothesis states that the additional variables in S₂ do not significantly improve the fit relative to S₁.
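
A minimal sketch of such a nested comparison with statsmodels' anova_lm, assuming X, y, and hypothetical column names:

import statsmodels.api as sm
from statsmodels.stats.anova import anova_lm

small = sm.OLS(y, sm.add_constant(X[['x1', 'x2']])).fit()               # model S1
large = sm.OLS(y, sm.add_constant(X[['x1', 'x2', 'x3', 'x4']])).fit()   # model S2, nesting S1

# F-test of H0: the extra variables in S2 do not improve the fit
print(anova_lm(small, large))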

Overall, the methods presented above primarily address the two central goals of linear regression: parameter estimation and variable selection.

Having defined several ways to score a model, the remaining question is how to search the set of candidate models to find the one with the best score.

3. Selection Procedures

Once models can be scored, the next step is to search either the entire space of possible models or a particular subset of it to identify the one with the best score. With k covariates, there are 2ᵏ − 1 possible models, a number that quickly becomes impractical for large k (for instance, more than one million models when k = 20). In such cases, exhaustive search is computationally infeasible, and heuristic methods are preferred. Broadly, model selection strategies fall into two categories: exhaustive search and stepwise search.

3.1 Exhaustive Search

This approach evaluates every possible model and selects the one with the best score. It is feasible only when k is small, since the computational burden becomes prohibitive with a large number of covariates.

3.2 Stepwise Search

Stepwise methods aim to identify a local optimum, that is, a model that performs better than its immediate neighbors. These methods are generally recommended only when exhaustive search is not feasible (e.g., when both n and p are large).

3.2.1 Forward Stepwise Selection

  • Choose a scoring criterion (e.g., AIC, BIC, Mallows' Cp).
  • Start with an empty model.
  • At each step, add the variable that provides the greatest improvement in the criterion.
  • Continue until no variable improves the score or all variables are included in the model (a minimal code sketch of this loop follows).
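
To make the loop concrete, here is a minimal, self-contained sketch of forward selection with AIC, assuming a covariate DataFrame X and a response y (the full, multi-criterion implementation used in this article is given in the appendix):

import numpy as np
import statsmodels.api as sm

def forward_selection_aic(y, X):
    selected, remaining = [], list(X.columns)
    best_aic = np.inf
    while remaining:
        # score every one-variable extension of the current model
        scores = [(sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().aic, c)
                  for c in remaining]
        aic, best = min(scores)
        if aic >= best_aic:        # no further improvement: stop
            break
        best_aic = aic
        selected.append(best)
        remaining.remove(best)
    return selected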

3.2.2 Backward Stepwise Selection

  • Choose a scoring criterion (e.g., AIC, BIC, Mallows' Cp).
  • Start with the full model containing all variables.
  • At each step, remove the variable whose elimination yields the greatest improvement in the criterion.
  • Continue until no further improvement is possible or only the essential variables remain.

3.2.3 Stepwise Selection (Mixed Method)

  • Choose a scoring criterion (e.g., AIC, BIC, Mallows' Cp).
  • Start with an empty model and add variables one at a time, as in forward selection, until no variable further improves the score.
  • Then proceed as in backward selection, removing variables one at a time if doing so improves the criterion.
  • Stop when no further improvement can be achieved or when all variables are included.

The next section shows how to apply these procedures to real data.

4. Application

In practice, before applying model selection methods, it is essential to ensure that the covariates are not highly correlated. The following procedure can be applied:

  1. Preliminary filtering: Remove covariates that are clearly irrelevant to the response variable (based on expert judgment, treatment of missing values, etc.).
  2. Correlation with the response variable: Define a threshold for the correlation between each covariate and the response variable (e.g., 0.6). Covariates below this threshold may be excluded. (Here, we do not apply this filter, in order to retain a sufficient number of covariates for selection.)
  3. Correlation among covariates: Define a threshold for pairwise correlations between covariates (e.g., 0.7). Compute the correlation matrix; if two covariates exceed the threshold, keep the one with the strongest correlation with the response variable or the one with better interpretability from a domain perspective.
  4. Variance Inflation Factor (VIF): Compute the VIF for all remaining covariates. If a covariate's VIF exceeds 5 or 10, it is considered highly collinear with the others and should be removed (a short code sketch follows this list).
  5. Model selection: Apply the chosen model selection methods. In this case, we will use Mallows' Cp as the scoring criterion and backward stepwise selection as the variable selection method.
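
A minimal sketch of the VIF computation in step 4 with statsmodels, assuming X_reduced is the DataFrame of covariates remaining after steps 1–3:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X_const = sm.add_constant(X_reduced)
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])],
    index=X_reduced.columns,
)
print(vif[vif > 5])   # covariates that are candidates for removal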

Finally, we will implement a stepwise selection procedure that can incorporate all of the criteria discussed above (AIC, BIC, Mallows' Cp, etc.) under either a forward or a backward strategy. This unified approach will allow us to compare models and select the one that best balances goodness of fit and complexity.

To illustrate the procedure, let us now present the dataset that will be used for the analysis.

4.1 Presentation of the Dataset

We use the Communities and Crime dataset from the UCI Machine Learning Repository, which contains socio-economic and demographic information about U.S. communities. The dataset includes more than 100 variables. The response variable is the number of violent crimes per population (violentCrimesPerPop). Our goal is to apply the model selection methods discussed above to identify the covariates most strongly associated with this response.

4.2 Handling Missing Values

For this analysis, we remove all rows containing missing values.
An alternative strategy would be to:

  • Drop variables with a high proportion of missingness (e.g., >10%), and
  • Assess whether the remaining missing values are Missing At Random (MAR), Missing Completely At Random (MCAR), or Missing Not At Random (MNAR), applying an appropriate imputation method if necessary.

Here, however, we adopt the simpler approach of discarding all incomplete rows. After this step, the dataset contains no missing values and includes 103 variables in total: the response variable violentCrimesPerPop plus the covariates.
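
A minimal sketch of this step with pandas, assuming the raw data have already been loaded into a DataFrame df with named columns (in the UCI distribution, missing entries are coded as "?"):

import numpy as np

# df is assumed to already hold the raw data with named columns
df_clean = df.replace("?", np.nan).dropna()   # keep only the complete rows
print(df_clean.shape)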

4.3 Selection of Relevant Variables Using Expert Judgment

We then apply expert judgment to assess the relevance of each variable and determine whether its relationship with the response is meaningful. This requires consultation between the statistician and domain experts to understand the context and significance of each covariate.

For this dataset, we remove:

  • communityname (a categorical variable with many levels), and
  • fold (a technical variable used only for cross-validation).

After this filtering step, we retain 101 variables: the response violentCrimesPerPop and 100 covariates.

4.4 Reducing Covariates Using a Correlation Threshold

To further reduce dimensionality, we compute the correlation matrix of the covariates and the response. When several covariates are highly correlated with one another (correlation > 0.6), we retain only the one with the strongest correlation to the response. This procedure reduces redundancy while mitigating multicollinearity.
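
A minimal sketch of this rule, assuming X holds the covariates and y the response (the 0.6 threshold follows the text):

corr_with_y = X.corrwith(y).abs()
corr_X = X.corr().abs()

to_drop = set()
cols = list(X.columns)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if corr_X.loc[a, b] > 0.6 and a not in to_drop and b not in to_drop:
            # keep whichever of the pair is more correlated with the response
            to_drop.add(b if corr_with_y[a] >= corr_with_y[b] else a)

X_reduced = X.drop(columns=sorted(to_drop))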

After applying this filter and computing the Variance Inflation Factor (VIF), we retain a final set of 19 covariates, all with VIF values below 5.

Table 1: Variance inflation factors of the retained covariates.

These preprocessing steps are explained in greater detail in my article on Feature Selection. Now, let us apply our selection procedure to identify the most relevant variables.

4.5 Model Selection with Stepwise Selection

With 19 variables, the total number of possible models is 2¹⁹ − 1 = 524,287, which makes an exhaustive search computationally prohibitive on many systems. To reduce the search space, we use a stepwise selection procedure. We implement a function, stepwise_selection, that identifies the most relevant variables based on a chosen selection criterion and strategy (forward, backward, or mixed). In this example, we use Mallows' Cp as the selection criterion and apply both forward and backward stepwise selection.
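
A typical call to this function (defined in the appendix) might look as follows, assuming df_final is the preprocessed DataFrame containing the 19 retained covariates and the response:

selected, best_model, history = stepwise_selection(
    df_final,
    target="violentCrimesPerPop",
    method="backward",     # or "forward"
    metric="Cp",
)
print(best_model.summary())

# Plot the criterion path (plotting helper also in the appendix)
plot_stepwise_crosses(history, selected, metric="Cp")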

4.5.1 Backward Stepwise Selection Using Mallows' Cp

Applying backward selection with Mallows' Cp, we proceed as follows:

  • Step 1: Remove pctWFarmSelf. Its exclusion reduces the criterion to Cp = 41.74, lower than that of the full model.
  • Step 2: Remove PctWOFullPlumb. This further decreases Cp to 41.69895.
  • Step 3: Remove indianPerCap. The criterion is reduced again to Cp = 41.66073.

In total, three variables are removed, yielding the final model.

4.5.2 Forward Stepwise Selection Using Mallows' Cp

Forward stepwise selection is often recommended when the number of variables is large, since it is less computationally demanding than backward selection. Starting from an empty model, variables are added sequentially, one at a time, according to the improvement in the criterion.

In this example, forward selection identifies the same set of variables as backward selection. Figure 1 below illustrates the sequence of variables added to the model, together with their corresponding Cp values. The process begins with PctKids2Par, followed by PctWorkMom and LandArea, and continues until the final model is reached, achieving a criterion value of Cp = 41.66.

Figure 1. Variable selection based on Mallows' Cp criterion. The corresponding Python implementation is provided in the appendix.

Warning! This does not yet address the question of which variables are causes of the response variable.

Conclusion

In this article, we addressed the question of model selection. The core principle of the procedure is to assign a score to each model in order to measure its quality, and then to search through the set of possible models to identify the one with the best score. This score is defined by balancing both the quality of fit and the complexity of the model.

Among the available procedures, we presented the forward and backward stepwise methods, which we implemented in Python. We applied them using different evaluation criteria: AIC, BIC, and Mallows' Cp.

These methods, however, have a limitation: they explore only a subset of all possible models. As a result, the selected models may sometimes represent oversimplifications of reality. Nonetheless, they remain very useful when the number of variables is large and exhaustive approaches become computationally too expensive.

Finally, when dealing with regression for predictive purposes, it is essential to split the dataset into two parts: training and test sets. Variable selection must be performed only on the training set, and never on the test set, in order to ensure an honest evaluation of the model's predictive performance.

Image Credits

All images and visualizations in this article were created by the author using Python (pandas, matplotlib, seaborn, and plotly) and Excel, unless otherwise stated.

References

Wasserman, L. (2013). All of Statistics: A Concise Course in Statistical Inference. Springer Science & Business Media.

Redmond, M. (2002). Communities and Crime [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C53W3X.

Cornillon, P. A., Hengartner, N., Matzner-Løber, E., & Rouvière, L. (2023). Régression avec R (3ème édition). EDP Sciences.

Data & Licensing

The dataset used in this article is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This license allows anyone to share and adapt the dataset for any purpose, including commercial use, provided that proper attribution is given to the source.

For more details, see the official license text: CC BY 4.0.

Disclaimer

Any remaining errors or inaccuracies are the author's responsibility. Feedback and corrections are welcome.

Appendix: Code

import numpy as np
import statsmodels.api as sm

def compute_score(y, X, vars_to_test, metric, full_model_mse=None):
    X_train = sm.add_constant(X[vars_to_test])
    model = sm.OLS(y, X_train).fit()
    p = len(vars_to_test) + 1  # +1 for the constant

    if metric == 'AIC':
        return model.aic

    elif metric == 'BIC':
        return model.bic

    elif metric == 'Cp':
        if full_model_mse is None:
            raise ValueError("full_model_mse must be provided to compute Mallows' Cp.")
        rss = sum(model.resid ** 2)
        return rss + 2 * p * full_model_mse

    elif metric == 'R2_adj':
        return -model.rsquared_adj  # negated so that lower is always better

    else:
        raise ValueError("Unknown metric. Use 'AIC', 'BIC', 'Cp' or 'R2_adj'.")

def get_best_candidate(y, X, selected, candidates, metric, method, full_model_mse=None):
    scores_with_candidates = []
    for candidate in candidates:
        vars_to_test = selected + [candidate] if method == 'forward' else [var for var in selected if var != candidate]
        score = compute_score(y, X, vars_to_test, metric, full_model_mse)
        scores_with_candidates.append((score, candidate, vars_to_test))

    scores_with_candidates.sort()
    print("Candidates tested:", [(v, round(s, 2)) for s, v, _ in scores_with_candidates])
    return scores_with_candidates[0] if scores_with_candidates else (None, None, None)

def stepwise_selection(df, target, method='forward', metric='AIC', verbose=True):
    if df.isnull().values.any():
        raise ValueError("Missing values are present in the DataFrame.")

    X = df.drop(columns=[target])
    y = df[target]
    variables = list(X.columns)

    selected = [] if method == 'forward' else variables.copy()
    remaining = variables.copy() if method == 'forward' else []

    # Precompute the full-model MSE needed for Mallows' Cp
    if metric == 'Cp':
        X_full = sm.add_constant(X)
        full_model = sm.OLS(y, X_full).fit()
        full_model_mse = sum(full_model.resid ** 2) / (len(y) - len(variables) - 1)
    else:
        full_model_mse = None

    current_score = np.inf
    history = []
    step = 0

    while True:
        step += 1
        candidates = remaining if method == 'forward' else selected
        best_score, best_candidate, vars_to_test = get_best_candidate(y, X, selected, candidates, metric, method, full_model_mse)

        if best_candidate is None:
            if verbose:
                print("No candidate available.")
            break

        if verbose:
            action = "add" if method == 'forward' else "remove"
            print(f"\nStep {step}: best variable to {action}: {best_candidate} (score={round(best_score, 5)})")

        improvement = best_score < current_score - 1e-6

        if improvement:
            if method == 'forward':
                selected.append(best_candidate)
                remaining.remove(best_candidate)
            else:
                selected.remove(best_candidate)

            current_score = best_score
            history.append({
                'step': step,
                'selected': selected.copy(),
                'score': current_score,
                'modified': best_candidate
            })
        else:
            if verbose:
                print("No further improvement in the score.")
            break

    X_final = sm.add_constant(X[selected])
    best_model = sm.OLS(y, X_final).fit()

    if verbose:
        print("\nSelected variables:", selected)
        final_score = best_model.aic if metric == 'AIC' else best_model.bic
        if metric == 'Cp':
            final_score = compute_score(y, X, selected, metric, full_model_mse)
        elif metric == 'R2_adj':
            final_score = -compute_score(y, X, selected, metric)
        print(f"Final score ({metric}): {round(final_score, 5)}")

    return selected, best_model, history
import matplotlib.pyplot as plt


def plot_stepwise_crosses(history, all_vars, metric="AIC", title=None):
    """
    Plot the stepwise path as a heatmap-like chart of crosses:
    - X axis: the explanatory variables modified at least once (in order of appearance)
    - Y axis: the score (AIC/BIC/Cp) at each step of the history
    - A black cross marks the variable modified at each step
    - Blank elsewhere
    - The score curve is overlaid
    """
    n_steps = len(history)
    scores = [h['score'] for h in history]

    # Extract the ordered list of variables that were actually modified
    modified_vars = []
    for h in history:
        var = h['modified']
        if var not in modified_vars and var is not None:
            modified_vars.append(var)

    n_mod_vars = len(modified_vars)

    # X positions of the crosses (according to modified_vars)
    mod_pos = [modified_vars.index(h['modified']) if h['modified'] in modified_vars else None for h in history]

    fig, ax = plt.subplots(figsize=(min(1.3 * n_mod_vars, 8), 6))
    # Place a black cross at each step
    for i, x in enumerate(mod_pos):
        if x is not None:
            ax.scatter(x, scores[i], color='black', marker='x', s=100, zorder=3)
    # Draw the score curve
    ax.plot(range(n_steps), scores, color='grey', alpha=0.7, linewidth=2, zorder=1)
    # X axis: vertical labels, reduced font size (modified variables only)
    ax.set_xticks(range(n_mod_vars))
    ax.set_xticklabels(modified_vars, rotation=90, fontsize=10)
    ax.set_xlabel("Modified variables")
    ax.set_ylabel(metric)
    ax.set_title(title or f"Stepwise ({metric}) – variable modified at each step")
    ax.grid(True, axis='y', alpha=0.2)
    plt.tight_layout()
    plt.show()