For ML engineers and software developers, squeezing every bit of performance out of a codebase is often a crucial consideration. If you’re a Python user, you’ll be aware of some of its shortcomings in this respect. Python is considered a slow language, and you’ve probably heard that much of the reason for this is its Global Interpreter Lock (GIL) mechanism.
It is what it is, but what can we do about it? There are several ways we can ameliorate this issue when coding in Python, especially if you’re using a reasonably up-to-date version of Python.
- The very latest releases of Python have a way of running code without using the GIL.
- We can utilise high-performance third-party libraries, such as NumPy, to perform number crunching.
- There are also many methods for parallel and concurrent processing built into the language now.
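As a small illustration of the second option in the list above, here is a minimal sketch that computes a sum of squares twice, once with an interpreted loop and once with a single NumPy call. The function names and the test data are made up for this illustration.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Interpreted loop: one Python-level multiply and add per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Same result, but the loop runs inside NumPy's compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


data = [0.5 * i for i in range(1000)]
print(sum_of_squares_loop(data), sum_of_squares_numpy(data))
```

Both functions return the same value; the difference is where the loop executes, which is the whole idea behind offloading number crunching to a library.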
One other method we can use is to call other high-performance languages from within Python for time-critical sections of our code. That’s what we’ll cover in this article as I show you how to call Mojo code from Python.
Have you heard of Mojo before? If not, here’s a quick history lesson.
Mojo is a relatively new systems-level language developed by Modular Inc. (an AI infrastructure company co-founded in 2022 by compiler-writing legend Chris Lattner, of LLVM and Swift fame, and former Google TPU lead Tim Davis) and first shown publicly in May 2023.
It was born from a simple pain point, i.e. Python’s lack of performance that we discussed earlier. Mojo tackles this head-on by grafting a superset of Python’s syntax onto an LLVM/MLIR-based compiler pipeline that delivers zero-cost abstractions, static typing, ownership-based memory management, automatic vectorisation, and seamless code generation for CPU and GPU accelerators.
Early benchmarks demoed at its launch ran kernel-dense workloads up to 35,000× faster than vanilla Python, suggesting that Mojo can match, or exceed, the raw throughput of C/CUDA while letting developers stay in familiar “pythonic” territory.
However, there is always a stumbling block, and that is people’s reluctance to move entirely to a new language. I’m one of those people, too, so I was delighted when I read that, as of a few weeks ago, it was possible to call Mojo code directly from Python.
Does this mean we get the best of both worlds: the simplicity of Python and the performance of Mojo?
To test the claims, we will write some code using vanilla Python. Then, for each example, we’ll also code a version using NumPy and, finally, a Python version that offloads some of its computation to a Mojo module. At the end, we’ll compare the various run times.
Will we see significant performance gains? Read on to find out.
Setting up a development environment
I’ll be using WSL2 Ubuntu for Windows for my development. Best practice is to set up a new development environment for each project you’re working on. I usually use conda for this, but as everyone and their granny seems to be moving towards the new uv package manager, I’m going to give that a go instead. There are a couple of ways you can install uv.
$ curl -LsSf https://astral.sh/uv/install.sh | sh
or...
$ pip install uv
Next, initialise a project.
$ uv init mojo-test
Initialized project `mojo-test` at `/home/tom/projects/mojo-test`
$ cd mojo-test
$ uv venv
$ source .venv/bin/activate
(mojo-test) $ ls -al
total 28
drwxr-xr-x 3 tom tom 4096 Jun 27 09:20 .
drwxr-xr-x 15 tom tom 4096 Jun 27 09:20 ..
drwxr-xr-x 7 tom tom 4096 Jun 27 09:20 .git
-rw-r--r-- 1 tom tom 109 Jun 27 09:20 .gitignore
-rw-r--r-- 1 tom tom 5 Jun 27 09:20 .python-version
-rw-r--r-- 1 tom tom 0 Jun 27 09:20 README.md
-rw-r--r-- 1 tom tom 87 Jun 27 09:20 main.py
-rw-r--r-- 1 tom tom 155 Jun 27 09:20 pyproject.toml
Now, add any external libraries we need.
(mojo-test) $ uv pip install modular numpy matplotlib
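Before going further, it can be worth confirming that the packages are visible from the active virtual environment. The sketch below only checks whether the top-level package names can be found; it assumes that the `modular` install provides a top-level package called `max`, which is what the import statements later in this article rely on. The helper name `check_packages` is made up for this illustration.

```python
import importlib.util


def check_packages(names=("max", "numpy", "matplotlib")):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_packages().items():
    print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

A `MISSING` entry usually means the install went into a different environment from the one that is currently activated.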
How does calling Mojo from Python work?
Let’s assume we have the following simple Mojo function that takes a Python variable as an argument and adds two to its value. For example,
# mojo_func.mojo
#
fn add_two(py_obj: PythonObject) raises -> PythonObject:
    var n = Int(py_obj)
    return n + 2
When Python tries to load the mojo_func module, it looks for a function called PyInit_mojo_func(). Inside PyInit_mojo_func(), we have to declare all Mojo functions and types that are callable from Python using PythonModuleBuilder. So, in fact, our Mojo code in its final form will resemble this.
from python import PythonObject
from python.bindings import PythonModuleBuilder
from os import abort
@export
fn PyInit_mojo_func() -> PythonObject:
    try:
        var m = PythonModuleBuilder("mojo_func")
        m.def_function[add_two]("add_two", docstring="Add 2 to n")
        return m.finalize()
    except e:
        return abort[PythonObject](String("Error creating Python Mojo module:", e))
fn add_two(py_obj: PythonObject) raises -> PythonObject:
    var n = Int(py_obj)
    return n + 2
The Python code requires extra boilerplate code to function correctly, as shown here.
import max.mojo.importer
import sys
sys.path.insert(0, "")
import mojo_func
print(mojo_func.add_two(5))
# Should print 7
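Because the Mojo importer may not be installed on every machine that runs your code, one option is to wrap the import in a guard and fall back to plain Python. This is a minimal sketch under that assumption; `add_two_py` is a hypothetical fallback written for this illustration, and only `ImportError` is handled, so other kinds of failure would still propagate.

```python
import sys


def add_two_py(n):
    """Pure-Python fallback with the same behaviour as the Mojo add_two."""
    return n + 2


sys.path.insert(0, "")
try:
    import max.mojo.importer  # noqa: F401  (enables importing .mojo files)
    import mojo_func

    add_two = mojo_func.add_two
except ImportError:
    # Either the Modular package or mojo_func.mojo could not be found.
    add_two = add_two_py

print(add_two(5))
```

The rest of the program can call `add_two` without caring which implementation it got, which also makes it easy to benchmark the two side by side.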
Code examples
For each of my examples, I’ll show three different versions of the code. One will be written in pure Python, one will utilise NumPy to speed things up, and the other will substitute calls to Mojo where appropriate.
Be warned that calling Mojo code from Python is in early development. You can expect significant changes to the API and ergonomics.
Example 1: Calculating a Mandelbrot set
For our first example, we’ll compute and display a Mandelbrot set. This is quite computationally expensive, and as we’ll see, the pure Python version takes a considerable amount of time to complete.
We’ll need four files in total for this example.
1/ mandelbrot_pure_py.py
# mandelbrot_pure_py.py
def compute(width, height, max_iters):
    """Generates a Mandelbrot set image using pure Python."""
    image = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            c = complex(-2.0 + 3.0 * col / width, -1.5 + 3.0 * row / height)
            z = 0
            n = 0
            while abs(z) <= 2 and n < max_iters:
                z = z*z + c
                n += 1
            image[row][col] = n
    return image
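The inner loop of the function above is the classic escape-time test. To make it easier to reason about, here is the same loop pulled out into a standalone helper for a single point; the name `escape_count` is made up for this sketch.

```python
def escape_count(c, max_iters):
    """Count iterations of z -> z*z + c until |z| exceeds 2 (or the cap is hit)."""
    z = 0
    n = 0
    while abs(z) <= 2 and n < max_iters:
        z = z * z + c
        n += 1
    return n


# A point inside the set never escapes, so it hits the iteration cap.
print(escape_count(0j, 50))
# A point outside the set escapes after only a few iterations.
print(escape_count(1 + 1j, 50))
```

Points that hit the cap are the expensive ones, which is why a large `max_iters` makes the pure Python version so slow.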
2/ mandelbrot_numpy.py
# mandelbrot_numpy.py
import numpy as np
def compute(width, height, max_iters):
    """Generates a Mandelbrot set using NumPy for vectorized computation."""
    x = np.linspace(-2.0, 1.0, width)
    y = np.linspace(-1.5, 1.5, height)
    c = x[:, np.newaxis] + 1j * y[np.newaxis, :]
    z = np.zeros_like(c, dtype=np.complex128)
    image = np.zeros(c.shape, dtype=int)
    for n in range(max_iters):
        not_diverged = np.abs(z) <= 2
        image[not_diverged] = n
        z[not_diverged] = z[not_diverged]**2 + c[not_diverged]
    image[np.abs(z) <= 2] = max_iters
    return image.T
3/ mandelbrot_mojo.mojo
# mandelbrot_mojo.mojo
from python import PythonObject, Python
from python.bindings import PythonModuleBuilder
from os import abort
from complex import ComplexFloat64
# This is the core logic that will run fast in Mojo
fn compute_mandel_pixel(c: ComplexFloat64, max_iters: Int) -> Int:
    var z = ComplexFloat64(0, 0)
    var n: Int = 0
    while n < max_iters:
        # abs(z) > 2 is the same as z.norm() > 4, which is faster
        if z.norm() > 4.0:
            break
        z = z * z + c
        n += 1
    return n
# This is the function that Python will call
fn mandelbrot_mojo_compute(width_obj: PythonObject, height_obj: PythonObject, max_iters_obj: PythonObject) raises -> PythonObject:
    var width = Int(width_obj)
    var height = Int(height_obj)
    var max_iters = Int(max_iters_obj)
    # We will build a Python list in Mojo to return the results
    var image_list = Python.list()
    for row in range(height):
        # We create a nested list to represent the 2D image
        var row_list = Python.list()
        for col in range(width):
            var c = ComplexFloat64(
                -2.0 + 3.0 * col / width,
                -1.5 + 3.0 * row / height
            )
            var n = compute_mandel_pixel(c, max_iters)
            row_list.append(n)
        image_list.append(row_list)
    return image_list
# This is the special function that "exports" our Mojo function to Python
@export
fn PyInit_mandelbrot_mojo() -> PythonObject:
    try:
        var m = PythonModuleBuilder("mandelbrot_mojo")
        m.def_function[mandelbrot_mojo_compute]("compute", "Generates a Mandelbrot set.")
        return m.finalize()
    except e:
        return abort[PythonObject]("error creating mandelbrot_mojo module")
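The comment inside compute_mandel_pixel relies on the identity that |z| > 2 is equivalent to |z|² > 4, which saves a square root per iteration. The sketch below checks that identity in plain Python. It makes no claim about the Mojo API; whether a given library method returns the magnitude or its square is worth confirming in that library’s documentation. Both function names are made up for this illustration.

```python
def escaped_abs(z):
    """Escape test using the magnitude (needs a square root internally)."""
    return abs(z) > 2.0


def escaped_squared(z):
    """Equivalent escape test on the squared magnitude (no square root)."""
    return z.real * z.real + z.imag * z.imag > 4.0


samples = [0j, 1 + 1j, 2 + 0j, 1.5 + 1.5j, -3j, -0.5 + 0.25j]
for z in samples:
    print(z, escaped_abs(z), escaped_squared(z))
```

The two tests agree on every sample, including the boundary point 2 + 0j, where neither reports an escape.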
4/ main.py
This will call the other three programs and also allow us to plot the Mandelbrot graph in a Jupyter Notebook. I’ll only show the plot once. You’ll have to take my word that it was plotted correctly on all three runs of the code.
# main.py (Final version with visualization)
import time
import numpy as np
import sys
import matplotlib.pyplot as plt # Now, import pyplot
# --- Mojo Setup ---
try:
    import max.mojo.importer
except ImportError:
    print("Mojo importer not found. Please ensure the MODULAR_HOME and PATH are set correctly.")
    sys.exit(1)
sys.path.insert(0, "")
# --- Import Our Modules ---
import mandelbrot_pure_py
import mandelbrot_numpy
import mandelbrot_mojo
# --- Visualization Function ---
def visualize_mandelbrot(image_data, title="Mandelbrot Set"):
    """Displays the Mandelbrot set data as an image using Matplotlib."""
    print(f"Displaying image for: {title}")
    plt.figure(figsize=(10, 8))
    # 'hot', 'inferno', and 'plasma' are all great colormaps for this
    plt.imshow(image_data, cmap='hot', interpolation='bicubic')
    plt.colorbar(label="Iterations")
    plt.title(title)
    plt.xlabel("Width")
    plt.ylabel("Height")
    plt.show()
# --- Test Runner ---
def run_test(name, compute_func, *args):
    """A helper function to run and time a test."""
    print(f"Running {name} version...")
    start_time = time.time()
    # The compute function returns the image data
    result_data = compute_func(*args)
    duration = time.time() - start_time
    print(f"-> {name} version took: {duration:.4f} seconds")
    # Return the data so we can visualize it
    return result_data
if __name__ == "__main__":
    WIDTH, HEIGHT, MAX_ITERS = 800, 600, 5000
    print("Starting Mandelbrot performance comparison...")
    print("-" * 40)
    # Run Pure Python Test
    py_image = run_test("Pure Python", mandelbrot_pure_py.compute, WIDTH, HEIGHT, MAX_ITERS)
    visualize_mandelbrot(py_image, "Pure Python Mandelbrot")
    print("-" * 40)
    # Run NumPy Test
    np_image = run_test("NumPy", mandelbrot_numpy.compute, WIDTH, HEIGHT, MAX_ITERS)
    # uncomment the line below if you want to see the plot
    #visualize_mandelbrot(np_image, "NumPy Mandelbrot")
    print("-" * 40)
    # Run Mojo Test
    mojo_list_of_lists = run_test("Mojo", mandelbrot_mojo.compute, WIDTH, HEIGHT, MAX_ITERS)
    # Convert Mojo's list of lists into a NumPy array for visualization
    mojo_image = np.array(mojo_list_of_lists)
    # uncomment the line below if you want to see the plot
    #visualize_mandelbrot(mojo_image, "Mojo Mandelbrot")
    print("-" * 40)
    print("Comparison complete.")
Finally, here is the output.

Okay, so that’s an impressive start for Mojo. It was almost 20 times faster than the pure Python implementation and 5 times faster than the NumPy code.
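Single-run wall-clock timings like these can be noisy, especially when one of the runs includes a first-time compile or a cold cache. A common way to reduce that noise is to repeat each measurement and keep the best time. The helper below is a sketch of that idea rather than the method used for the figures in this article; `best_of` and its `repeats` parameter are made-up names.

```python
import time


def best_of(func, *args, repeats=3):
    """Run func(*args) several times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        t0 = time.perf_counter()  # monotonic, high-resolution timer
        result = func(*args)
        best = min(best, time.perf_counter() - t0)
    return best, result


secs, value = best_of(sum, range(1_000_000))
print(f"best of 3: {secs:.4f} s, result={value}")
```

It could be dropped in wherever `run_test` is called, for example `best_of(mandelbrot_numpy.compute, WIDTH, HEIGHT, MAX_ITERS)`, at the cost of a longer total run.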
Example 2: Numerical integration
For this example, we will perform numerical integration using Simpson’s rule to determine the integral of sin(x) over the interval 0 to π. Recall that Simpson’s rule is a method of calculating an approximate value for an integral and is defined as,
∫ f(x) dx ≈ (h/3) * [f(x₀) + 4f(x₁) + 2f(x₂) + 4f(x₃) + … + 2f(xₙ₋₂) + 4f(xₙ₋₁) + f(xₙ)]
Where:
- h is the width of each step.
- The weights are 1, 4, 2, 4, 2, …, 4, 1.
- The first and last points have a weight of 1.
- The points at odd indices have a weight of 4.
- The interior points at even indices have a weight of 2.
The true value of the integral we’re trying to calculate is two. Let’s see how accurate (and fast) our methods are.
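To make the weighting pattern concrete, here is a tiny worked sketch that builds the weights explicitly and applies them with only four intervals. It is written for clarity rather than speed, and the helper names `simpson_weights` and `simpson` are made up for this illustration.

```python
import math


def simpson_weights(n):
    """Weights 1, 4, 2, 4, ..., 2, 4, 1 for an even number of intervals n."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    weights = [1] * (n + 1)
    for i in range(1, n):
        weights[i] = 4 if i % 2 == 1 else 2
    return weights


def simpson(f, start, end, n):
    """Composite Simpson's rule as a weighted sum of n + 1 samples."""
    h = (end - start) / n
    weights = simpson_weights(n)
    return (h / 3) * sum(w * f(start + i * h) for i, w in enumerate(weights))


print(simpson_weights(4))
print(simpson(math.sin, 0.0, math.pi, 4))
```

Even with just four intervals, the estimate lands within about half a percent of the true value of 2, and it tightens rapidly as n grows.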
Once again, we need four files.
1/ integration_pure_py.py
# integration_pure_py.py
import math
def compute(start, end, n):
    """Calculates the definite integral of sin(x) using Simpson's rule."""
    if n % 2 != 0:
        n += 1 # Simpson's rule requires an even number of intervals
    h = (end - start) / n
    integral = math.sin(start) + math.sin(end)
    for i in range(1, n, 2):
        integral += 4 * math.sin(start + i * h)
    for i in range(2, n, 2):
        integral += 2 * math.sin(start + i * h)
    integral *= h / 3
    return integral
2/ integration_numpy.py
# integration_numpy.py
import numpy as np
def compute(start, end, n):
    """Calculates the definite integral of sin(x) using NumPy."""
    if n % 2 != 0:
        n += 1
    x = np.linspace(start, end, n + 1)
    y = np.sin(x)
    # Apply Simpson's rule weights: 1, 4, 2, 4, ..., 2, 4, 1
    integral = (y[0] + y[-1] + 4 * np.sum(y[1:-1:2]) + 2 * np.sum(y[2:-1:2]))
    h = (end - start) / n
    # Scale the weighted sum by h / 3 to obtain the integral
    return integral * h / 3
3/ integration_mojo.mojo
# integration_mojo.mojo
from python import PythonObject, Python
from python.bindings import PythonModuleBuilder
from os import abort
from math import sin
# Note: The 'fn' keyword is used here as it's compatible with all versions.
fn compute_integral_mojo(start_obj: PythonObject, end_obj: PythonObject, n_obj: PythonObject) raises -> PythonObject:
    # Bridge crossing happens ONCE at the start.
    var start = Float64(start_obj)
    var end = Float64(end_obj)
    var n = Int(n_obj)
    if n % 2 != 0:
        n += 1
    var h = (end - start) / n
    # All computation below is on NATIVE Mojo types. No Python interop.
    var integral = sin(start) + sin(end)
    # First loop for the '4 * f(x)' terms
    var i_1: Int = 1
    while i_1 < n:
        integral += 4 * sin(start + i_1 * h)
        i_1 += 2
    # Second loop for the '2 * f(x)' terms
    var i_2: Int = 2
    while i_2 < n:
        integral += 2 * sin(start + i_2 * h)
        i_2 += 2
    integral *= h / 3
    # Bridge crossing happens ONCE at the end.
    return Python.float(integral)
@export
fn PyInit_integration_mojo() -> PythonObject:
    try:
        var m = PythonModuleBuilder("integration_mojo")
        m.def_function[compute_integral_mojo]("compute", "Calculates a definite integral in Mojo.")
        return m.finalize()
    except e:
        return abort[PythonObject]("error creating integration_mojo module")
4/ main.py
import time
import sys
import numpy as np
# --- Mojo Setup ---
try:
    import max.mojo.importer
except ImportError:
    print("Mojo importer not found. Please ensure your environment is set up correctly.")
    sys.exit(1)
sys.path.insert(0, "")
# --- Import Our Modules ---
import integration_pure_py
import integration_numpy
import integration_mojo
# --- Test Runner ---
def run_test(name, compute_func, *args):
    print(f"Running {name} version...")
    start_time = time.time()
    result = compute_func(*args)
    duration = time.time() - start_time
    print(f"-> {name} version took: {duration:.4f} seconds")
    print(f" Result: {result}")
# --- Main Test Execution ---
if __name__ == "__main__":
    # Use a very large number of steps to highlight loop performance
    START = 0.0
    END = np.pi
    NUM_STEPS = 100_000_000 # 100 million steps
    print(f"Calculating integral of sin(x) from {START} to {END:.2f} with {NUM_STEPS:,} steps...")
    print("-" * 50)
    run_test("Pure Python", integration_pure_py.compute, START, END, NUM_STEPS)
    print("-" * 50)
    run_test("NumPy", integration_numpy.compute, START, END, NUM_STEPS)
    print("-" * 50)
    run_test("Mojo", integration_mojo.compute, START, END, NUM_STEPS)
    print("-" * 50)
    print("Comparison complete.")
And the results?
Calculating integral of sin(x) from 0.0 to 3.14 with 100,000,000 steps...
--------------------------------------------------
Running Pure Python version...
-> Pure Python version took: 4.9484 seconds
Result: 2.0000000000000346
--------------------------------------------------
Running NumPy version...
-> NumPy version took: 0.7425 seconds
Result: 1.9999999999999998
--------------------------------------------------
Running Mojo version...
-> Mojo version took: 0.8902 seconds
Result: 2.0000000000000346
--------------------------------------------------
Comparison complete.
It’s interesting that this time, the NumPy code was marginally faster than the Mojo code, and its final value was more accurate. This highlights a key concept in high-performance computing: the trade-off between vectorisation and JIT-compiled loops.
NumPy’s strength lies in its ability to vectorise operations. It allocates a large block of memory and then calls highly optimised, pre-compiled C code that leverages modern CPU features, such as SIMD, to apply the sin() function across millions of values in bulk. This “burst processing” is extremely efficient.
Mojo, on the other hand, takes our simple while loop and JIT-compiles it into highly efficient machine code. While this avoids the large initial memory allocation of NumPy, in this specific case, the raw power of NumPy’s vectorisation gave it a slight edge.
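The memory cost mentioned above is easy to estimate: with 100 million steps, each float64 array holds 100,000,001 values at 8 bytes apiece, or roughly 0.8 GB, and the NumPy version needs one array for x and another for sin(x). One way to keep the vectorisation while capping memory is to process the interior points in blocks. The sketch below shows that idea; it is not the code that was benchmarked, and `simpson_sin_chunked` and its `chunk` parameter are made-up names.

```python
import numpy as np


def simpson_sin_chunked(start, end, n, chunk=1_000_000):
    """Simpson's rule for sin(x), evaluated block by block to bound memory use."""
    if n % 2 != 0:
        n += 1
    h = (end - start) / n
    total = np.sin(start) + np.sin(end)  # the two endpoints have weight 1
    for lo in range(1, n, chunk):
        hi = min(lo + chunk, n)
        idx = np.arange(lo, hi)  # interior indices handled in this block
        weights = np.where(idx % 2 == 1, 4.0, 2.0)  # odd -> 4, even -> 2
        total += np.sum(weights * np.sin(start + idx * h))
    return float(total * h / 3)


print(simpson_sin_chunked(0.0, np.pi, 100_000))
```

Peak memory now depends on `chunk` rather than on n, at the price of one Python-level loop iteration per block.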
Example 3: The sigmoid function
The sigmoid function is an important concept in AI as it’s the cornerstone of binary classification.
Also known as the logistic function, it is defined as follows.
S(x) = 1 / (1 + e^(-x))
The sigmoid function takes any real-valued input x and “squashes” it smoothly into the open interval (0,1). In simple terms, no matter what is passed to the sigmoid function, it will always return a value between 0 and 1.
So, for example,
S(-197865) ≈ 0
S(-2) = 0.1192029
S(3) = 0.9525741
S(10776.87) ≈ 1
This makes it ideal for representing things like probabilities.
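One practical wrinkle with extreme inputs like the ones above: evaluating 1 / (1 + exp(-x)) naively in Python raises an OverflowError for a large negative x, because exp(-x) becomes too large for a float. A common workaround is to branch on the sign of x so that exp() only ever receives a non-positive argument. The sketch below shows that variant; it is not the formula used in the benchmark that follows, which stays within [-5, 5] where overflow is not a concern.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for any finite float x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z lies in [0, 1) and cannot overflow
    return z / (1.0 + z)


for x in (-197865, 0, 3, 10776.87):
    print(x, sigmoid(x))
```

In floating point the two extreme inputs come out as exactly 0.0 and 1.0, even though the mathematical values lie strictly inside the open interval.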
Because the Python code is simpler, we can include it in the benchmarking script, so we only have two files this time.
sigmoid_mojo.mojo
from python import Python, PythonObject
from python.bindings import PythonModuleBuilder
from os import abort
from math import exp
from time import perf_counter
# ----------------------------------------------------------------------
# Fast Mojo routine (no Python calls inside)
# ----------------------------------------------------------------------
fn sigmoid_sum(n: Int) -> (Float64, Float64):
    # deterministic fill, sized once
    var data = List[Float64](length = n, fill = 0.0)
    for i in range(n):
        data[i] = (Float64(i) / Float64(n)) * 10.0 - 5.0 # [-5, +5)
    var t0: Float64 = perf_counter()
    var total: Float64 = 0.0
    for x in data: # single tight loop
        total += 1.0 / (1.0 + exp(-x))
    var elapsed: Float64 = perf_counter() - t0
    return (total, elapsed)
# ----------------------------------------------------------------------
# Python-visible wrapper
# ----------------------------------------------------------------------
fn py_sigmoid_sum(n_obj: PythonObject) raises -> PythonObject:
    var n: Int = Int(n_obj) # validates arg
    var (tot, secs) = sigmoid_sum(n)
    # safest container: build a Python list (auto-boxes scalars)
    var out = Python.list()
    out.append(tot)
    out.append(secs)
    return out # -> PythonObject
# ----------------------------------------------------------------------
# Module initialiser (name must match: PyInit_sigmoid_mojo)
# ----------------------------------------------------------------------
@export
fn PyInit_sigmoid_mojo() -> PythonObject:
    try:
        var m = PythonModuleBuilder("sigmoid_mojo")
        m.def_function[py_sigmoid_sum](
            "sigmoid_sum",
            "Return [total_sigmoid, elapsed_seconds]"
        )
        return m.finalize()
    except e:
        # if anything raises, abort with an error message
        return abort[PythonObject]("error creating sigmoid_mojo module")
sigmoid_bench.py
# sigmoid_bench.py
import time, math, numpy as np
N = 50_000_000
# --------------------------- pure-Python -----------------------------------
py_data = [(i / N) * 10.0 - 5.0 for i in range(N)]
t0 = time.perf_counter()
py_total = sum(1 / (1 + math.exp(-x)) for x in py_data)
print(f"Pure-Python : {time.perf_counter()-t0:6.3f} s - Σσ={py_total:,.1f}")
# --------------------------- NumPy -----------------------------------------
np_data = np.linspace(-5.0, 5.0, N, dtype=np.float64)
t0 = time.perf_counter()
np_total = float(np.sum(1 / (1 + np.exp(-np_data))))
print(f"NumPy : {time.perf_counter()-t0:6.3f} s - Σσ={np_total:,.1f}")
# --------------------------- Mojo ------------------------------------------
import max.mojo.importer # installs .mojo import hook
import sigmoid_mojo # compiles & loads shared object
mj_total, mj_secs = sigmoid_mojo.sigmoid_sum(N)
print(f"Mojo : {mj_secs:6.3f} s - Σσ={mj_total:,.1f}")
Here is the output.
$ python sigmoid_bench.py
Pure-Python : 1.847 s - Σσ=24,999,999.5
NumPy : 0.323 s - Σσ=25,000,000.0
Mojo : 0.150 s - Σσ=24,999,999.5
The Σσ=… outputs show the sum of all the calculated sigmoid values. In theory, this should approach the input N divided by 2 as N tends towards infinity, because S(x) + S(-x) = 1 and the inputs are spread symmetrically around zero.
But as we see, the Mojo implementation represents a decent uplift of over 2x on the already fast NumPy code and is over 12x faster than the base Python implementation.
Not too shabby.
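A side note on the Σσ totals, which differ by 0.5 between NumPy and the other two. The likely explanation is the input grid rather than any numerical inaccuracy: the pure Python and Mojo versions generate points with (i / N) * 10 - 5, which never reaches +5, whereas np.linspace includes both endpoints. Since S(x) + S(-x) = 1, the half-open grid pairs up all but one point and leaves S(-5) over. The sketch below reproduces the effect with a small N.

```python
import math

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


N = 1_000
# Half-open grid [-5, 5), as used by the pure Python and Mojo versions.
half_open = sum(sigmoid((i / N) * 10.0 - 5.0) for i in range(N))
# Closed grid [-5, 5], as produced by np.linspace in the NumPy version.
closed = float(np.sum(1.0 / (1.0 + np.exp(-np.linspace(-5.0, 5.0, N)))))

print(f"half-open grid: {half_open:.4f}")
print(f"closed grid:    {closed:.4f}")
print(f"N/2:            {N / 2:.4f}")
```

The closed grid sums to N/2, while the half-open grid comes out at N/2 - 0.5 + S(-5), which rounds to the same 0.5 shortfall seen in the benchmark output.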
Abstract
This article explored the exciting new capability of calling high-performance Mojo code directly from Python to accelerate computationally intensive tasks. Mojo, a relatively new systems programming language from Modular, promises C-level performance with a familiar Pythonic syntax, aiming to solve Python’s historical speed limitations.
To test this promise, we benchmarked three computationally expensive scenarios: Mandelbrot set generation, numerical integration, and the sigmoid function calculation, implementing each in pure Python, optimised NumPy, and a hybrid Python-Mojo approach.
The results reveal a nuanced performance landscape. For loop-heavy algorithms where data can be processed entirely with native Mojo types, Mojo can significantly outperform both pure Python and even highly optimised NumPy code. However, we also saw that for tasks that align perfectly with NumPy’s vectorised, pre-compiled C functions, NumPy can maintain a slight edge over Mojo.
This investigation demonstrates that while Mojo is a powerful new tool for Python acceleration, achieving maximum performance requires a thoughtful approach to minimising the “bridge-crossing” overhead between the two language runtimes.
As always, when considering performance improvements to your code, test, test, test. That’s the final arbiter as to whether it’s worthwhile or not.