Saturday, September 13, 2025
newsaiworld

Use OpenAI Whisper for Automated Transcriptions

Admin by Admin
June 26, 2025
in Artificial Intelligence


There is a lot of development currently with large language models (LLMs). A lot of the focus is on the question-answering you can do with either pure text-based models or vision-language models (VLMs), where you can also input images.

However, there's another dimension that has developed a ton over the past couple of years: audio. There are now models that can transcribe (speech -> text), synthesize speech (text -> speech), and do speech-to-speech, where you can have an entire conversation with a language model, with audio going both in and out.

The architecture and training pipeline for OpenAI's Whisper model. Image from the OpenAI Whisper GitHub repository, MIT license.

In this article, I'll discuss how I'm using the developments within the audio model space to my advantage, becoming an even more efficient programmer.

This is an example video of me using the transcription tool. I first select the prompt field in Cursor and use my hotkey to activate the microphone, which is indicated by the orange icon in the top left. I then speak the sentence I want to transcribe, and it quickly appears in the prompt window without me having to type on the keyboard at all. This is a more efficient way to type long English prompts into your editor. Video by the author.

Motivation

My main motivation for writing this article is that I'm continually looking for ways to become a more efficient programmer. After using the ChatGPT mobile app for a while, I discovered its transcription option (the microphone icon to the right in the user input field). I tried it and quickly realized how much better this transcription is than others I've used before, such as Apple's built-in iPhone transcription.

OpenAI's transcription almost always captures all of my words, with very few errors. Even when I use less common words, for example, acronyms related to computer science, it's still able to pick up what I'm saying.

The transcription icon from the OpenAI application. Image by the author, taken from OpenAI's ChatGPT.

This transcription was only available in the ChatGPT app. However, I know that OpenAI has an API endpoint for their Whisper model, which is (presumably) the same model they're using to transcribe text in the app. I thus wanted to set this model up on my Mac to be available via a shortcut.

(I know there are apps such as MacWhisper available, but I wanted to develop a completely free solution, apart from the costs of the API calls themselves.)

Prerequisites

  • Alfred (I will be using Alfred on the Mac to trigger some scripts. However, alternatives to this also exist. In general, you need a way to trigger scripts on your Mac / PC from a hotkey.)

Pros

The main advantage of using this transcription is that you can input words into your computer more quickly. When I type as quickly as I can on my computer, I'm not even able to reach 100 words per minute, and if I'm to type at that speed, I really have to focus. However, the average speaking speed is at a minimum of 110 words per minute, according to this article.

This means you can be a lot more effective if you are able to speak your words with transcription, instead of typing them out on the keyboard.

I think this is especially relevant after the rise of large language models such as ChatGPT. You spend more time prompting the language models, for example, asking questions to ChatGPT, or prompting Cursor to implement a feature or fix a bug. Thus, the use of the English language is much more prevalent now than before, compared to the use of programming languages such as Python directly.

Note: Of course, you'll still be writing a lot of code, but from experience, I spend a lot more time prompting Cursor, for example, with extensive English prompts, in which case using this transcription saves me a lot of time.

Cons

There can, however, be some downsides to using the transcription as well. One of the main ones is that a lot of the time, you don't want to speak out loud when programming. You might be sitting in the airport (as I am when writing this article), or even in your office. In these scenarios, you probably don't want to disturb those around you by speaking out loud. However, if you are sitting in a home office, this is naturally not a problem.

Another downside is that smaller prompts might not be that much faster. Consider this: if you just want to write a prompt of a single sentence, it will, in many scenarios, be faster just to type the prompt out by hand. This is because of the delay in starting, stopping, and transcribing audio into text. Sending the API call takes a little bit of time, and the shorter the prompt you have, the larger the fraction of the time you have to spend waiting for the response.
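To make the trade-off for short prompts concrete, here is a minimal sketch of the break-even point between typing and dictating. The 110 words-per-minute speaking speed is the figure quoted above; the 70 words-per-minute typing speed and the 5-second overhead for starting, stopping, and waiting on the API are assumptions chosen purely for illustration, so substitute your own measurements.

```python
def seconds_to_type(words: int, typing_wpm: float) -> float:
    """Time needed to type a prompt of the given length by hand."""
    return words / typing_wpm * 60


def seconds_to_dictate(words: int, speaking_wpm: float, overhead_s: float) -> float:
    """Time needed to speak the prompt, plus a fixed start/stop/API overhead."""
    return words / speaking_wpm * 60 + overhead_s


def break_even_words(typing_wpm: float, speaking_wpm: float, overhead_s: float) -> float:
    """Prompt length (in words) at which typing and dictating take equally long."""
    # Seconds saved per word by speaking instead of typing.
    per_word_saving = 60 / typing_wpm - 60 / speaking_wpm
    if per_word_saving <= 0:
        return float("inf")  # speaking is never faster under these inputs
    return overhead_s / per_word_saving


if __name__ == "__main__":
    # Assumed values: 70 wpm typing and 5 s overhead are illustrative only.
    words = break_even_words(typing_wpm=70, speaking_wpm=110, overhead_s=5)
    print(f"Dictation starts to pay off above roughly {words:.0f} words")
```

Under these assumed numbers, the break-even point is roughly 16 words, which matches the intuition that single-sentence prompts are often faster to type.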

How to implement

You can see the code I used in this article on my GitHub. However, you also need to add hotkeys to run the scripts.

First, you need to:

  • Clone the GitHub repository:
git clone https://github.com/EivindKjosbakken/whisper-shortcut.git
  • Create a virtual environment called .venv and install the required packages:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
  • Get an OpenAI API key. You can do that by:
    • Going to the OpenAI API Overview and logging in or creating a profile
    • Going to your profile, then API Keys
    • Creating a new key. Remember to copy the key, as you will not be able to see it again
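Once you have a key, the scripts need some way to read it. The sketch below assumes the key is stored in the OPENAI_API_KEY environment variable, which is the variable the official OpenAI Python SDK looks for by default; how the repository itself loads the key is not shown in this article, and load_api_key is a hypothetical helper name.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Failing here gives a clearer message than a later authentication error.
        raise RuntimeError(f"Set the {env_var} environment variable before running the scripts")
    return key
```

Keeping the key in an environment variable, rather than in the scripts, also avoids accidentally committing it to Git.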

The scripts from the GitHub repository work as follows:

  • start_recording.sh: starts recording your voice. The first time you use this, it will ask you for permission to use the microphone
  • stop_recording.sh: sends a stop signal to the script to stop recording, then sends the recorded audio to OpenAI for transcription. Additionally, it adds the transcribed text to your clipboard and pastes the text if you have a text field selected on your PC
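The transcription and clipboard steps can be sketched as follows. This is not the repository's code but a minimal illustration under stated assumptions: it uses the official openai Python package (version 1 or later) with the whisper-1 model, the macOS pbcopy tool for the clipboard, and a hypothetical file name recording.wav. The function names transcribe_file, copy_to_clipboard, and main are made up for this example.

```python
import subprocess
from pathlib import Path


def transcribe_file(client, audio_path: Path, model: str = "whisper-1") -> str:
    """Send one recorded audio file to the transcription endpoint and return the text."""
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text.strip()


def copy_to_clipboard(text: str) -> None:
    """Place text on the macOS clipboard using the built-in pbcopy tool."""
    subprocess.run(["pbcopy"], input=text.encode("utf-8"), check=True)


def main() -> None:
    # Imported here so the helpers above stay usable without the SDK installed.
    from openai import OpenAI  # assumes `pip install openai` (v1 or later)

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable
    text = transcribe_file(client, Path("recording.wav"))  # hypothetical file name
    copy_to_clipboard(text)
    print(text)
```

Passing the client in as an argument keeps transcribe_file easy to try out with a stand-in object before spending money on real API calls. Call main() from your own script once a recording exists.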

The entire repository is available under an MIT license.

Alfred

You can find the Alfred workflow in the GitHub repository here: Transcribe.alfredworkflow.

This is how I set up the Alfred workflow:

My Alfred workflow. I have two hotkeys, one to start the transcription (record voice), and one to stop transcription (stop recording, and send the audio to the OpenAI Whisper API for transcription). Option + Q runs the start_recording.sh script, and Option + W runs the stop_recording.sh script. You can, of course, change the hotkeys for these commands. Image by the author.

You can simply download it and add it to your Alfred.

Also, remember to have a terminal window open whenever you want to run this script, as you activate the Python script from the terminal. I had to do it this way because if the script was activated directly from Alfred, I got permission issues. The first time you run the script, you should be prompted to give your terminal access to the microphone, which you should approve.
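Because the start and stop hotkeys launch two separate scripts, the stop script needs a way to find the running recorder. One common pattern, sketched below, is a PID file plus a signal. This is an assumption about how such a pair of scripts could coordinate, not a description of the repository's implementation: the PID file location, the choice of SIGTERM, and the names run_recorder and request_stop are all hypothetical, and record_chunk stands in for whatever actually captures audio.

```python
import os
import signal
import tempfile
import time
from pathlib import Path

# Hypothetical location where the recorder advertises its process ID.
PID_FILE = Path(tempfile.gettempdir()) / "whisper_recorder.pid"
stop_requested = False


def _handle_stop(signum, frame):
    """Signal handler: only flips a flag; the main loop does the cleanup."""
    global stop_requested
    stop_requested = True


def run_recorder(record_chunk, poll_seconds: float = 0.1) -> int:
    """Call record_chunk repeatedly until SIGTERM arrives; return the chunk count."""
    global stop_requested
    stop_requested = False
    signal.signal(signal.SIGTERM, _handle_stop)
    PID_FILE.write_text(str(os.getpid()))
    chunks = 0
    try:
        while not stop_requested:
            record_chunk()
            chunks += 1
            time.sleep(poll_seconds)
    finally:
        # Remove the PID file so a later stop request does not hit a stale process.
        PID_FILE.unlink(missing_ok=True)
    return chunks


def request_stop() -> bool:
    """What a stop script could do: signal the process ID stored by the recorder."""
    if not PID_FILE.exists():
        return False
    os.kill(int(PID_FILE.read_text()), signal.SIGTERM)
    return True
```

The handler does nothing except set a flag, which keeps the signal path short and lets the loop finish the current chunk and clean up before the audio is sent for transcription.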

Cost

An important consideration when using APIs such as OpenAI Whisper is the cost of the API usage. I'd consider the cost of using OpenAI's Whisper model rather high. As always, the cost is completely dependent on how much you use the model. I'd say I use the model up to 25 times a day, with up to 150 words each time, and the cost is less than 1 dollar per day.

This means, however, that if you use the model a lot, you can see costs up to 30 dollars per month, which is definitely a substantial cost. However, I think it's important to be mindful of the time savings you get from the model. If each model usage saves you 30 seconds, and you use it 20 times per day, you have just saved ten minutes of your day. Personally, I'm willing to pay one dollar to save ten minutes of my day performing a task (writing on my keyboard) that doesn't really grant me any other benefit. If anything, using your keyboard may contribute to a higher risk of injuries such as carpal tunnel syndrome. Using the model is thus definitely worth it for me.
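The arithmetic above can be reproduced with a short sketch. The usage figures (25 uses per day, 150 words per use, 20 uses saving 30 seconds each) come from the text, and the 110 words-per-minute speaking speed is the figure quoted earlier. The price per audio minute is an assumption for illustration only; check OpenAI's current pricing page before relying on it.

```python
def daily_audio_minutes(uses_per_day: int, words_per_use: int, speaking_wpm: float) -> float:
    """Minutes of audio sent to the API per day."""
    return uses_per_day * words_per_use / speaking_wpm


def daily_api_cost(audio_minutes: float, price_per_minute: float) -> float:
    """Daily cost for an API that bills per minute of audio."""
    return audio_minutes * price_per_minute


def daily_minutes_saved(uses_per_day: int, seconds_saved_per_use: float) -> float:
    """Minutes saved per day compared with typing."""
    return uses_per_day * seconds_saved_per_use / 60


if __name__ == "__main__":
    PRICE_PER_MINUTE = 0.006  # assumption: USD per audio minute, verify current pricing
    minutes = daily_audio_minutes(uses_per_day=25, words_per_use=150, speaking_wpm=110)
    print(f"Audio per day: {minutes:.1f} min")
    print(f"Cost per day: ${daily_api_cost(minutes, PRICE_PER_MINUTE):.2f}")
    print(f"Time saved per day: {daily_minutes_saved(20, 30):.0f} min")
```

With the assumed price, the upper-bound usage described above stays well under one dollar per day, which is consistent with the figure reported in the text.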

Conclusion

In this article, I started off discussing the immense advances within language models in the last few years. This has helped us create powerful chatbots, saving us enormous amounts of time. However, with the advances of language models, we've also seen advances in voice models. Transcription using OpenAI Whisper is now near perfect (from personal experience), which makes it a powerful tool you can use to input words on your computer more effectively. I discussed the pros and cons of using OpenAI Whisper on your PC, and I also went step-by-step through how you can implement it on your own computer.

Tags: automated, OpenAI, Transcriptions, Whisper

© 2024 Newsaiworld.com. All rights reserved.