
I Won $10,000 in a Machine Learning Competition — Here's My Full Strategy

By Admin
June 16, 2025
in Artificial Intelligence
I won $10,000 in my first ML competition and honestly, I'm still a bit shocked.

I've worked as a data scientist in FinTech for six years. When I saw that Spectral Finance was running a credit scoring challenge for Web3 wallets, I decided to give it a try despite having zero blockchain experience.

Here were my constraints:

  • I used my own computer, which has no GPU
  • I only had a weekend (~10 hours) to work on it
  • I had never touched Web3 or blockchain data before
  • I had never built a neural network for credit scoring

The competition goal was simple: predict which Web3 wallets were likely to default on loans using their transaction history. Essentially, traditional credit scoring, but with DeFi data instead of bank statements.

To my surprise, I came second and won $10k in USD Coin! Unfortunately, Spectral Finance has since taken the competition website and leaderboard down, but here's a screenshot from when I won:

My username was Ds-clau, second place with a score of 83.66 (image by author)

This experience taught me that understanding the business problem really matters. In this post, I'll show you exactly how I did it, with detailed explanations and Python code snippets, so you can replicate this approach in your next machine learning project or competition.

Getting Started: You Don't Need Expensive Hardware

Let me be clear: you don't necessarily need an expensive cloud computing setup to win ML competitions (unless the dataset is too large to fit locally).

The dataset for this competition contained 77 features and 443k rows, which isn't small by any means. The data came as a .parquet file that I downloaded using duckdb.

I used my personal laptop, a MacBook Pro with 16GB RAM and no GPU. The entire dataset fit locally on my laptop, though I must admit the training process was a bit slow.

Insight: Clever sampling strategies get you 90% of the insights without the high computational cost. Many people are intimidated by large datasets and assume they need big cloud instances. You can start a project locally by sampling a portion of the dataset and analyzing that sample first.
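A minimal sketch of that sampling idea, assuming a pandas DataFrame with a binary label column (the function and its arguments are illustrative, not from the original post):

```python
import pandas as pd

def stratified_sample(df: pd.DataFrame, target: str,
                      frac: float = 0.1, seed: int = 42) -> pd.DataFrame:
    """Sample the same fraction within each class, so the sample keeps
    the full dataset's label distribution."""
    return (df.groupby(target, group_keys=False)
              .apply(lambda g: g.sample(frac=frac, random_state=seed)))
```

Stratifying matters in imbalanced problems like this one: a naive random sample of a small fraction could skew the 62/38 split and distort your EDA.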

EDA: Know Your Data

Here's where my fintech background became my superpower; I approached this like any other credit risk problem.

First question in credit scoring: what's the class distribution?
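The check itself is one line. The original snippet didn't survive the page, so here `df` is a toy frame reproducing the article's split, standing in for the real competition data:

```python
import pandas as pd

# Toy frame with the article's 62/38 split; the real `df` came from
# the competition parquet file and had a `target` default label.
df = pd.DataFrame({"target": [0] * 62 + [1] * 38})

# Normalized counts give the class proportions directly
print(df["target"].value_counts(normalize=True))
```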

Seeing the 62/38 split made me shiver… 38% is a very high default rate from a business perspective, but luckily, the competition wasn't about pricing this product.

Next, I wanted to see which features actually mattered:
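A sketch of that correlation pass, assuming a numeric DataFrame with the target column (the helper name is mine, not from the original post):

```python
import pandas as pd

def target_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every numeric feature with the target,
    sorted by absolute strength (strong negatives matter too)."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)
```

Sorting by absolute value is the important detail: `time_since_last_liquidated` below is strongly *negatively* correlated, and a plain descending sort would bury it.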

This is where I got excited. The patterns were exactly what I'd expect from credit data:

  • risk_factor was the strongest predictor, showing > 0.4 correlation with the target variable (higher risk factor = more likely to default)
  • time_since_last_liquidated showed a strong negative correlation: the more recently a wallet last liquidated, the riskier it was. This lines up with expectations, since high velocity is usually a high-risk signal (recent liquidation = risky)
  • liquidation_count_sum_eth suggested that borrowers with higher liquidation counts in ETH were risk flags (more liquidations = riskier behaviour)

Insight: Pearson correlation is a simple yet intuitive way to understand linear relationships between features and the target variable. It's a great way to gain intuition on which features should and shouldn't be included in your final model.

Feature Selection: Less is More

Here's something that always puzzles executives when I explain it to them:

More features doesn't always mean better performance.

In fact, too many features usually mean worse performance and slower training, because extra features add noise. Every irrelevant feature makes your model a little bit worse at finding the true patterns.

So, feature selection is a crucial step that I never skip. I used recursive feature elimination to find the optimal number of features. Let me walk you through my exact process:
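A sketch of recursive feature elimination with cross-validation. The original snippet is gone, so the estimator choice and the synthetic stand-in data (the real set had 77 features and 443k rows) are assumptions; only the technique and the AUC metric come from the article.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the competition data
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=6, random_state=42)

selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,                    # eliminate one feature per round
    cv=StratifiedKFold(5),     # stratified folds for the imbalanced target
    scoring="roc_auc",         # same metric the article optimizes
)
selector.fit(X, y)
print("Optimal number of features:", selector.n_features_)
```

`selector.support_` is then a boolean mask you can use to keep only the surviving columns.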

The sweet spot was 34 features. Beyond this point, model performance as measured by the AUC score didn't improve with more features. So I ended up using less than half of the given features to train my model, going from 77 features down to 34.

Insight: This reduction eliminated noise while preserving signal from the important features, leading to a model that was both faster to train and more predictive.

Building the Neural Network: Simple Yet Powerful Architecture

Before defining the model architecture, I had to set up the dataset properly:

  1. Split into training and validation sets (for verifying results after model training)
  2. Scale features, because neural networks are very sensitive to outliers
  3. Convert datasets to PyTorch tensors for efficient computation

Here's my actual data preprocessing pipeline:
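The three steps above can be sketched as follows. The random stand-in data (shaped like the post-selection set: 34 features) and the split/seed values are assumptions, since the original snippet was stripped from the page:

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for the real X (selected features) and y (default labels)
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 34))
y = rng.integers(0, 2, 1000)

# 1. Split into training and validation sets (stratified on the label)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 2. Scale features; fit on the training set only to avoid leakage
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)

# 3. Convert to float32 tensors; targets get a trailing dim for BCELoss
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
X_val_t = torch.tensor(X_val, dtype=torch.float32)
y_val_t = torch.tensor(y_val, dtype=torch.float32).unsqueeze(1)
```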

Now comes the fun part: building the actual neural network model.

Important context: Spectral Finance (the competition organizer) restricted model deployments to neural networks and logistic regression because of their zero-knowledge proof system.

ZK proofs require mathematical circuits that can cryptographically verify computations without revealing the underlying data, and both neural networks and logistic regression can be efficiently converted into ZK circuits.

Since it was my first time building a neural network for credit scoring, I wanted to keep things simple but effective. Here's my model architecture:
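A reconstruction of the architecture the article describes (5 hidden layers of 64 units, ReLU, dropout 0.2, sigmoid output); the class name and exact layer ordering are assumptions, since the original code block is missing:

```python
import torch.nn as nn

class CreditScoringNet(nn.Module):
    """Feed-forward binary classifier: 5 hidden layers of 64 units."""

    def __init__(self, n_features: int = 34):
        super().__init__()
        layers = []
        in_dim = n_features
        for _ in range(5):                      # 5 hidden layers
            layers += [nn.Linear(in_dim, 64),   # 64 neurons per layer
                       nn.ReLU(),
                       nn.Dropout(0.2)]         # zero 20% of activations
            in_dim = 64
        layers += [nn.Linear(64, 1), nn.Sigmoid()]  # probability output
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```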

Let's walk through my architecture choices in detail:

  • 5 hidden layers: Deep enough to capture complex patterns, shallow enough to avoid overfitting
  • 64 neurons per layer: A good balance between capacity and computational efficiency
  • ReLU activation: The standard choice for hidden layers; prevents vanishing gradients
  • Dropout (0.2): Prevents overfitting by randomly zeroing 20% of neurons during training
  • Sigmoid output: Ideal for binary classification; outputs probabilities between 0 and 1

Training the Model: Where the Magic Happens

Now for the training loop that kicks off the model learning process:
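A sketch of a loop with the ingredients the article names (SGD with momentum, early stopping on validation loss, validation monitoring); the hyperparameter values and full-batch setup are assumptions:

```python
import torch
import torch.nn as nn

def train(model, X_train, y_train, X_val, y_val,
          epochs: int = 200, patience: int = 10, lr: float = 0.01):
    """Full-batch training with SGD + momentum and early stopping."""
    criterion = nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    best_val, wait, best_state = float("inf"), 0, None

    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()

        # Validation monitoring: track held-out loss, not training loss
        model.eval()
        with torch.no_grad():
            val_loss = criterion(model(X_val), y_val).item()

        if val_loss < best_val:
            best_val, wait = val_loss, 0
            best_state = {k: v.clone()
                          for k, v in model.state_dict().items()}
        else:
            wait += 1
            if wait >= patience:       # early stopping
                break

    model.load_state_dict(best_state)  # restore the best checkpoint
    return model, best_val
```

Restoring the best checkpoint at the end matters: by the time early stopping triggers, the current weights are `patience` epochs past the best validation score.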

Here are some details on the model training process:

  • Early stopping: Prevents overfitting by stopping when validation performance stops improving
  • SGD with momentum: A simple but effective optimizer choice
  • Validation monitoring: Essential for tracking real performance, not just training loss

The training curves showed steady improvement without overfitting. That's exactly what I wanted to see.

Model training loss curves (image by author)

The Secret Weapon: Threshold Optimization

Here's where I probably outperformed others with more complicated models: I bet most people submitted predictions with the default 0.5 threshold.

But because of the class imbalance (~38% of loans defaulted), I knew the default threshold would be suboptimal. So I used precision-recall analysis to pick a better cutoff.
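The sweep looks like this. The validation labels and model probabilities here are random stand-ins, since the real ones came from the trained network:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Stand-in validation labels and predicted probabilities
rng = np.random.default_rng(42)
y_val = rng.integers(0, 2, 1000)
val_probs = np.clip(0.38 * y_val + 0.6 * rng.random(1000), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_val, val_probs)
# F1 at every candidate threshold (harmonic mean of precision & recall)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
# precision/recall have one more entry than thresholds; drop the last
best = int(np.argmax(f1[:-1]))
print(f"best threshold = {thresholds[best]:.2f}, F1 = {f1[best]:.3f}")
```

You then binarize predictions with `val_probs >= thresholds[best]` instead of the default 0.5 cutoff.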

I ended up maximizing the F1 score, the harmonic mean of precision and recall. The optimal threshold based on the highest F1 score was 0.35 instead of 0.5. This single change improved my competition score by several percentage points, likely the difference between placing and winning.

Insight: In the real world, different types of errors have different costs. Missing a default loses you money, while rejecting a good customer only loses you potential profit. The threshold should reflect this reality, not be set arbitrarily at 0.5.

Conclusion

This competition reinforced something I've known for a while:

Success in machine learning isn't about having the fanciest tools or the most complex algorithms.

It's about understanding your problem, applying solid fundamentals, and focusing on what actually moves the needle.

You don't need a PhD to be a data scientist or win an ML competition.

You don't need to implement the latest research papers.

You also don't need expensive cloud resources.

What you do need is domain knowledge, solid fundamentals, and attention to the details that others might overlook (like threshold optimization).


Want to build your AI skills?

👉🏻 I run the AI Weekender, which features fun weekend AI projects and quick, practical tips to help you build with AI.

© 2024 Newsaiworld.com. All rights reserved.
