Translating a Memoir: A Technical Journey | by Valeria Cortez | Dec, 2024

December 12, 2024

Leveraging GPT-3.5 and unstructured APIs for translations

Valeria Cortez

Towards Data Science

This blog post details how I used GPT to translate the personal memoir of a family friend, making it accessible to a broader audience. Specifically, I employed GPT-3.5 for translation and Unstructured's APIs for efficient content extraction and formatting.

The memoir, a heartfelt account by my family friend Carmen Rosa, chronicles her upbringing in Bolivia and her romantic journey in Paris with an Iranian man during the vibrant 1970s. Originally written in Spanish, we aimed to preserve the essence of her narrative while expanding its reach to English-speaking readers through the application of LLM technologies.

Cover image of “Un Destino Sorprendente”, used with permission of author Carmen Rosa Wichtendahl.

Below you can read about the translation process in more detail, or you can access the Colab Notebook here.

I followed these steps for the translation of the book:

  1. Import Book Data: I imported the book from a Docx document using the Unstructured API and divided it into chapters and paragraphs.
  2. Translation Approach: I translated each chapter using GPT-3.5. For each paragraph, I provided the latest three translated sentences (if available) from the same chapter. This approach served two purposes:
  • Style Consistency: Maintaining a consistent style throughout the translation by providing context from previous translations.
  • Token Limit: Limiting the number of tokens processed at once to avoid exceeding the model's context limit.

3. Exporting translation as Docx: I used Unstructured's API once again to save the translated content in Docx format.
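The rolling-context idea in step 2 can be sketched independently of the API. The `fake_translate` function below is a placeholder I am using purely for illustration; it is not part of the original code, which calls GPT-3.5 instead:

```python
def fake_translate(paragraph: str, context: str) -> str:
    # Placeholder for a real GPT call; here we just tag the text
    return f"[EN] {paragraph}"

def translate_with_rolling_context(paragraphs: list[str]) -> list[str]:
    translated = []
    for p in paragraphs:
        # Pass along only the most recent translations as context,
        # keeping the prompt small enough for the model's context limit
        context = " ".join(translated[-3:])
        translated.append(fake_translate(p, context))
    return translated

print(translate_with_rolling_context(["Hola.", "¿Cómo estás?", "Adiós."]))
# ['[EN] Hola.', '[EN] ¿Cómo estás?', '[EN] Adiós.']
```

The sliding window trades a little prompt length for stylistic continuity between consecutive paragraphs.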

1. Libraries

We'll start with the installation and import of the required libraries.

pip install --upgrade openai
pip install python-dotenv
pip install unstructured
pip install python-docx
import openai

# Unstructured
from unstructured.partition.docx import partition_docx
from unstructured.cleaners.core import group_broken_paragraphs

# Data and other libraries
import pandas as pd
import re
from typing import List, Dict
import os
from dotenv import load_dotenv

2. Connecting to OpenAI’s API

The code below sets up the OpenAI API key for use in a Python project. It's important to save your API key in an .env file.

import openai

# Specify the path to the .env file
dotenv_path = '/content/.env'

_ = load_dotenv(dotenv_path) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']
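For reference, the .env file that load_dotenv reads needs just one line; the key value shown here is a placeholder, not a real key:

```
OPENAI_API_KEY=sk-...
```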

3. Loading the book

The code allows us to import the book in Docx format and divide it into individual paragraphs.

elements = partition_docx(
    filename="/content/libro.docx",
    paragraph_grouper=group_broken_paragraphs
)

The code below returns the paragraph at the tenth index of elements.

print(elements[10])

# Returns: Destino sorprendente, es el título que la autora le puso ...

4. Group book into titles and chapters

The next step involves creating a list of chapters. Each chapter will be represented as a dictionary containing a title and a list of paragraphs. This structure simplifies the process of translating each chapter and paragraph individually. Here's an example of this format:

[
{"title": title 1, "content": [paragraph 1, paragraph 2, ..., paragraph n]},
{"title": title 2, "content": [paragraph 1, paragraph 2, ..., paragraph n]},
...
{"title": title n, "content": [paragraph 1, paragraph 2, ..., paragraph n]},
]
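Once the book is in this shape, each chapter can be handled independently. A quick illustration with toy data (the paragraph strings here are invented; "Proemio" is a real chapter title from the book):

```python
chapters = [
    {"title": "Proemio", "content": ["párrafo 1", "párrafo 2"]},
    {"title": "Capítulo 1", "content": ["párrafo 1"]},
]

for chapter in chapters:
    # Each chapter carries its own title and list of paragraphs
    print(f"{chapter['title']}: {len(chapter['content'])} paragraph(s)")
# Proemio: 2 paragraph(s)
# Capítulo 1: 1 paragraph(s)
```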

To achieve this, we'll create a function called group_by_chapter. Here are the key steps involved:

  1. Extract Relevant Information: We can get each narrative text and title by calling element.category. These are the only categories we're interested in translating at this point.
  2. Identify Narrative Titles: We recognise that some titles should be part of the narrative text. To account for this, we assume that italicised titles belong to the narrative paragraph.
def group_by_chapter(elements: List) -> List[Dict]:
    chapters = []
    current_title = None

    for element in elements:

        text_style = element.metadata.emphasized_text_tags # checks if it is 'b' or 'i' and returns a list
        unique_text_style = list(set(text_style)) if text_style is not None else None

        # we consider an element a title if it has the Title category and the style is bold
        is_title = (element.category == "Title") & (unique_text_style == ['b'])

        # we consider an element narrative content if it has the NarrativeText category or
        # if it has the Title category but is italic, or italic and bold
        is_narrative = (element.category == "NarrativeText") | (
            ((element.category == "Title") & (unique_text_style is None)) |
            ((element.category == "Title") & (unique_text_style == ['i'])) |
            ((element.category == "Title") & (unique_text_style == ['b', 'i']))
        )

        # for new titles
        if is_title:
            print(f"Adding title {element.text}")

            # Add the previous chapter when a new one comes in, unless the current title is None
            if current_title is not None:
                chapters.append(current_chapter)

            current_title = element.text
            current_chapter = {"title": current_title, "content": []}

        elif is_narrative:
            print(f"Adding Narrative {element.text}")
            current_chapter["content"].append(element.text)

        else:
            print(f'### No need to convert. Element type: {element.category}')

    return chapters

Below, we can see an example:

book_chapters[2]

# Returns
{'title': 'Proemio',
 'content': [
    'La autobiografía es considerada ...',
    'Dentro de las artes literarias, ...',
    'Se encuentra más próxima a los, ...',
  ]
}

5. Book translation

To translate the book, we follow these steps:

  1. Translate Chapter Titles: We translate the title of each chapter.
  2. Translate Paragraphs: We translate each paragraph, providing the model with the latest three translated sentences as context.
  3. Save Translations: We save both the translated titles and content.

The function below automates this process.

def translate_book(book_chapters: List[Dict]) -> List[Dict]:
    translated_book = []
    for chapter in book_chapters:
        print(f"Translating following chapter: {chapter['title']}.")
        translated_title = translate_title(chapter['title'])
        translated_chapter_content = translate_chapter(chapter['content'])
        translated_book.append({
            "title": translated_title,
            "content": translated_chapter_content
        })
    return translated_book

For the title, we ask GPT for a simple translation as follows:

def translate_title(title: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "system",
            "content": f"Translate the following book title into English:\n{title}"
        }]
    )
    return response.choices[0].message.content

To translate a single chapter, we provide the model with the corresponding paragraphs. We instruct the model as follows:

  1. Establish the role: We tell the model that it is a helpful translator for a book.
  2. Provide context: We share the latest three translated sentences from the chapter.
  3. Request translation: We ask the model to translate the next paragraph.

During this process, the function combines all translated paragraphs into a single string.

# Function to translate a chapter using the OpenAI API
def translate_chapter(chapter_paragraphs: List[str]) -> str:
    translated_content = ""

    for i, paragraph in enumerate(chapter_paragraphs):

        print(f"Translating paragraph {i + 1} out of {len(chapter_paragraphs)}")

        # Builds the messages dynamically based on whether there is previous translated content
        messages = [{
            "role": "system",
            "content": "You are a helpful translator for a book."
        }]

        if translated_content:
            latest_content = get_last_three_sentences(translated_content)
            messages.append(
                {
                    "role": "system",
                    "content": f"This is the latest text from the book that you have translated from Spanish into English:\n{latest_content}"
                }
            )

        # Adds the user message for the current paragraph
        messages.append(
            {
                "role": "user",
                "content": f"Translate the following text from the book into English:\n{paragraph}"
            }
        )

        # Calls the API
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages
        )

        # Extracts the translated content and appends it
        paragraph_translation = response.choices[0].message.content
        translated_content += paragraph_translation + '\n\n'

    return translated_content

Finally, below we can see the supporting function that gets the latest three sentences.

def get_last_three_sentences(paragraph: str) -> str:
    # Use regex to split the text into sentences
    sentences = re.split(r'(?<=[.!?])\s+', paragraph)

    # Get the last three sentences (or fewer if the paragraph has less than 3 sentences)
    last_three = sentences[-3:]

    # Join the sentences into a single string
    return ' '.join(last_three)
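As a quick sanity check with a made-up paragraph. Note that the `(?<=[.!?])\s+` pattern is a standard sentence-boundary split (punctuation followed by whitespace), reconstructed here because the regex in the original snippet was cut off:

```python
import re

def get_last_three_sentences(paragraph: str) -> str:
    # Split on sentence-ending punctuation followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', paragraph)
    return ' '.join(sentences[-3:])

sample = "She grew up in Bolivia. She moved to Paris. There she met him. The rest is history."
print(get_last_three_sentences(sample))
# She moved to Paris. There she met him. The rest is history.
```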

6. Book export

Finally, we pass the list of chapters to a function that adds each title as a heading and each chapter's content as a paragraph. After each chapter, a page break is added to separate the chapters. The resulting document is then saved locally as a Docx file.

from docx import Document

def create_docx_from_chapters(chapters: List[Dict], output_filename: str) -> None:
    doc = Document()

    for chapter in chapters:
        # Add chapter title as Heading 1
        doc.add_heading(chapter['title'], level=1)

        # Add chapter content as normal text
        doc.add_paragraph(chapter['content'])

        # Add a page break after each chapter
        doc.add_page_break()

    # Save the document
    doc.save(output_filename)

While using GPT and APIs for translation is fast and efficient, there are key limitations compared to human translation:

  • Pronoun and Reference Errors: GPT did misinterpret pronouns or references in a few cases, potentially attributing actions or statements to the wrong person in the narrative. A human translator can better resolve such ambiguities.
  • Cultural Context: GPT missed subtle cultural references and idioms that a human translator might interpret more accurately. In this case, several slang phrases unique to Santa Cruz, Bolivia, were retained in the original language without additional context or clarification.

Combining AI with human review can balance speed and quality, ensuring translations are both accurate and authentic.

This project demonstrates an approach to translating a book using a combination of GPT-3.5 and Unstructured APIs. By automating the translation process, we significantly reduced the manual effort required. While the initial translation output may require some minor human revisions to refine the nuances and ensure the highest quality, this approach serves as a strong foundation for efficient and effective book translation.

If you have any feedback or suggestions on how to improve this process or the quality of the translations, please feel free to share them in the comments below.
