Journey to Full-Stack Data Scientist: Model Deployment | by Alex Davis | Jan, 2025

First, for our example, we need to develop a model. Since this article focuses on model deployment, we will not worry about the performance of the model. Instead, we will build a simple model with limited features so we can focus on learning model deployment.

In this example, we will predict a data professional's salary based on a few features, such as experience, job title, company size, etc.

See the data here: https://www.kaggle.com/datasets/ruchi798/data-science-job-salaries (CC0: Public Domain). I slightly modified the data to reduce the number of options for certain features.

#import packages for data manipulation
import pandas as pd
import numpy as np
#import packages for machine learning
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.metrics import mean_squared_error, r2_score
#import packages for data management
import joblib

First, let's take a look at the data.
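The loading step itself is not shown here, so the following is only a minimal sketch of how the data might be read and inspected. The column names match the ones used later in this article; the four rows and the in-memory CSV are invented stand-ins, and with the real download you would pass your own file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV; these rows are invented.
sample_csv = io.StringIO(
    "experience_level,employment_type,job_title,salary_in_usd,company_size\n"
    "EN,FT,Data Analyst,60000,S\n"
    "MI,FT,Data Scientist,95000,M\n"
    "SE,FT,Data Engineer,140000,L\n"
    "EX,CT,Data Scientist,200000,L\n"
)

# With the real file, replace sample_csv with the path to your CSV.
salary_data = pd.read_csv(sample_csv)

# Preview the first rows and the column types.
print(salary_data.head())
print(salary_data.dtypes)

# Count the distinct values in each categorical column.
category_counts = salary_data.drop(columns="salary_in_usd").nunique()
print(category_counts)
```

Checking the number of distinct values per column is a quick way to see how many dummy columns the encoding step will create.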

Since all of our features are categorical, we will use encoding to transform our data to numerical values. Below, we use ordinal encoders to encode experience level and company size. These are ordinal because they represent some kind of progression (0 = entry level, 1 = mid-level, etc.; OrdinalEncoder codes start at 0).

For job title and employment type, we will create dummy variables for each option (note that we drop the first to avoid multicollinearity).

#use ordinal encoder to encode experience level
encoder = OrdinalEncoder(categories=[['EN', 'MI', 'SE', 'EX']])
salary_data['experience_level_encoded'] = encoder.fit_transform(salary_data[['experience_level']])
#use ordinal encoder to encode company size
encoder = OrdinalEncoder(categories=[['S', 'M', 'L']])
salary_data['company_size_encoded'] = encoder.fit_transform(salary_data[['company_size']])
#encode employment type and job title using dummy columns
salary_data = pd.get_dummies(salary_data, columns = ['employment_type', 'job_title'], drop_first = True, dtype = int)
#drop original columns
salary_data = salary_data.drop(columns = ['experience_level', 'company_size'])
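To make the effect of the two encodings concrete, here is a small sketch on an invented four-row frame (the `toy` data is hypothetical, not taken from the dataset). It shows that `OrdinalEncoder` assigns codes in the order given by `categories`, starting at 0, and that `drop_first = True` removes the alphabetically first dummy column.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Invented toy rows, used only to illustrate the encoders.
toy = pd.DataFrame({
    "experience_level": ["SE", "EN", "EX", "MI"],
    "employment_type": ["FT", "CT", "FT", "PT"],
})

# Codes follow the listed order: EN -> 0, MI -> 1, SE -> 2, EX -> 3.
encoder = OrdinalEncoder(categories=[["EN", "MI", "SE", "EX"]])
toy["experience_level_encoded"] = encoder.fit_transform(
    toy[["experience_level"]]
).ravel()

# Dummy columns are created per option; the first option (CT) is dropped
# and becomes the implicit baseline.
toy_encoded = pd.get_dummies(
    toy, columns=["employment_type"], drop_first=True, dtype=int
)

print(toy_encoded)
```

A row with zeros in every `employment_type_*` column therefore belongs to the dropped baseline category.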

Now that we have transformed our model inputs, we can create our training and test sets. We will input these features into a simple linear regression model to predict the employee's salary.

#define independent and dependent features
X = salary_data.drop(columns = 'salary_in_usd')
y = salary_data['salary_in_usd']
#split between training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state = 104, test_size = 0.2, shuffle = True)
#fit linear regression model
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
#make predictions
y_pred = regr.predict(X_test)
#print the coefficients
print("Coefficients: \n", regr.coef_)
#print the MSE
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
#print the R2 value (r2_score returns plain R2, not adjusted R2)
print("R2: %.2f" % r2_score(y_test, y_pred))
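Note that `r2_score` returns the ordinary coefficient of determination. If an adjusted R2 is wanted, it can be derived from the plain value; the sketch below uses the standard formula 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of features. The helper `adjusted_r2` and the example numbers are hypothetical and not part of scikit-learn.

```python
from sklearn.metrics import r2_score


def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R2 from plain R2 (hypothetical helper, not a sklearn function)."""
    n = len(y_true)
    if n - n_features - 1 <= 0:
        raise ValueError("Need more observations than features + 1.")
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)


# Invented numbers, only to exercise the helper.
y_true_demo = [3, 5, 7, 9, 11]
y_pred_demo = [2.8, 5.1, 7.3, 8.7, 11.2]

plain = r2_score(y_true_demo, y_pred_demo)
adjusted = adjusted_r2(y_true_demo, y_pred_demo, n_features=1)
print("R2: %.4f, adjusted R2: %.4f" % (plain, adjusted))
```

With the model above, the call would look like `adjusted_r2(y_test, y_pred, X_test.shape[1])`. Because the dummy encoding creates many columns, the adjusted value can be noticeably lower than the plain one.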

Let's see how our model did.

Looks like our R-squared is 0.27, yikes. A lot more work would need to be done with this model. We would likely need more data and more information on the observations. But for the sake of this article, we will move forward and save our model.

#save model using joblib
joblib.dump(regr, 'lin_regress.sav')
