How to Import Pre-Annotated Data into Label Studio and Run the Full Stack with Docker

By Admin, September 2, 2025, in Artificial Intelligence

Preparing a dataset for an object detection training workflow can take a long time and is often frustrating. Label Studio, an open-source data annotation tool, can help by providing an easy way to annotate datasets. It supports a wide variety of annotation templates, including computer vision, natural language processing, and audio or speech processing. Here, however, we’ll focus specifically on the object detection workflow.

But what if you want to take advantage of pre-annotated open-source datasets, such as the Pascal VOC dataset? In this article, I’ll show you how to import these tasks into Label Studio’s format while setting up the full stack, including a PostgreSQL database, MinIO object storage, an Nginx reverse proxy, and the Label Studio backend. MinIO is an S3-compatible object storage service: you might use cloud-native storage in production, but you can also run MinIO locally for development and testing.

In this tutorial, we’ll go through the following steps:

  1. Convert Pascal VOC annotations: transform bounding boxes from XML into Label Studio tasks in JSON format.
  2. Run the full stack: start Label Studio with PostgreSQL, MinIO, Nginx, and the backend using Docker Compose.
  3. Set up a Label Studio project: configure a new project inside the Label Studio interface.
  4. Upload images and tasks to MinIO: store your dataset in an S3-compatible bucket.
  5. Connect MinIO to Label Studio: add the cloud storage bucket to your project so Label Studio can fetch images and annotations directly.

Prerequisites

To follow this tutorial, make sure you have:

From VOC to Label Studio: Preparing Annotations

The Pascal VOC dataset has a folder structure in which the training and validation splits are already defined. The Annotations folder contains the annotation file for each image. In total, the dataset includes 17,125 images, each with a corresponding annotation file.

.
└── VOC2012
    ├── Annotations  # 17125 annotations
    ├── ImageSets
    │   ├── Action
    │   ├── Layout
    │   ├── Main
    │   └── Segmentation
    ├── JPEGImages  # 17125 images
    ├── SegmentationClass
    └── SegmentationObject

The XML snippet below, taken from one of the annotation files, defines a bounding box around an object labeled “person”. The box is specified by four pixel coordinates: xmin, ymin, xmax, and ymax.

XML snippet from the Pascal VOC dataset (Image by Author)
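Because the snippet above is only shown as an image, here is a minimal sketch of reading the same fields with Python’s standard xml.etree.ElementTree. The sample XML is an illustrative stand-in that assumes the standard VOC layout (size/width, size/height, object/name, object/bndbox); its numbers are chosen to be consistent with the converted JSON example later in the article, and parse_voc is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative VOC-style annotation. The values are assumptions picked to be
# consistent with the converted JSON example shown later in the article.
SAMPLE_XML = """
<annotation>
  <filename>2007_000027.jpg</filename>
  <size>
    <width>486</width>
    <height>500</height>
    <depth>3</depth>
  </size>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>174</xmin>
      <ymin>101</ymin>
      <xmax>349</xmax>
      <ymax>351</ymax>
    </bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return (filename, width, height, boxes) from a Pascal VOC XML string.

    Each box is a tuple (label, xmin, ymin, xmax, ymax) in pixels.
    """
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    boxes = []
    # findall only looks at direct <object> children of <annotation>.
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        # Coordinates are usually integers, but some files store them as floats.
        coords = [int(float(bb.findtext(k))) for k in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return filename, width, height, boxes


if __name__ == "__main__":
    print(parse_voc(SAMPLE_XML))
```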

The illustration below shows the inner rectangle as the annotated bounding box, defined by its top-left corner (xmin, ymin) and bottom-right corner (xmax, ymax), inside the outer rectangle that represents the image.

Pascal VOC bounding box coordinates in pixel format (Image by Author)

Label Studio expects each bounding box to be defined by its width, height, and top-left corner, all expressed as percentages of the image size. Below is a working example of the converted JSON for the annotation shown above.

{
  "data": {
    "image": "s3:////2007_000027.jpg"
  },
  "annotations": [
    {
      "result": [
        {
          "from_name": "label",
          "to_name": "image",
          "type": "rectanglelabels",
          "value": {
            "x": 35.802,
            "y": 20.20,
            "width": 36.01,
            "height": 50.0,
            "rectanglelabels": ["person"]
          }
        }
      ]
    }
  ]
}
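The percentages in this JSON follow directly from the pixel coordinates. Writing W and H for the image width and height in pixels, the conversion is shown below. The worked numbers assume an image of 486 × 500 pixels and a box (xmin, ymin, xmax, ymax) = (174, 101, 349, 351); these values are not stated in the article, but they reproduce the JSON above.

```latex
\begin{aligned}
x &= 100 \cdot \frac{x_{\min}}{W}, &
y &= 100 \cdot \frac{y_{\min}}{H}, \\
\text{width} &= 100 \cdot \frac{x_{\max} - x_{\min}}{W}, &
\text{height} &= 100 \cdot \frac{y_{\max} - y_{\min}}{H}.
\end{aligned}

% Worked example (assumed W = 486, H = 500, box = (174, 101, 349, 351)):
\begin{aligned}
x &= 100 \cdot \tfrac{174}{486} \approx 35.802, &
y &= 100 \cdot \tfrac{101}{500} = 20.2, \\
\text{width} &= 100 \cdot \tfrac{349 - 174}{486} \approx 36.01, &
\text{height} &= 100 \cdot \tfrac{351 - 101}{500} = 50.0.
\end{aligned}
```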

As you can see in the JSON, you also need to specify the location of the image file, for example a path in MinIO or in an S3 bucket if you’re using cloud storage.
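Combining the percentage conversion with the image location, a converter for one image might look like the sketch below. It is a minimal, standard-library-only illustration, not the author’s actual preprocessing script: the function names and the s3://my-bucket/images prefix are hypothetical, and rounding to three decimals is an arbitrary choice (so the width comes out as 36.008 rather than 36.01).

```python
import json


def voc_box_to_ls_value(label, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert one VOC pixel box into Label Studio's percentage-based value."""
    return {
        "x": round(100.0 * xmin / img_w, 3),
        "y": round(100.0 * ymin / img_h, 3),
        "width": round(100.0 * (xmax - xmin) / img_w, 3),
        "height": round(100.0 * (ymax - ymin) / img_h, 3),
        "rectanglelabels": [label],
    }


def build_task(filename, img_w, img_h, boxes, image_prefix):
    """Build one Label Studio task with the VOC boxes as an annotation.

    `image_prefix` (e.g. "s3://my-bucket/images") is a hypothetical location.
    "label" and "image" must match the names used in the labeling config.
    """
    result = [
        {
            "from_name": "label",
            "to_name": "image",
            "type": "rectanglelabels",
            "value": voc_box_to_ls_value(*box, img_w, img_h),
        }
        for box in boxes  # each box: (label, xmin, ymin, xmax, ymax)
    ]
    return {
        "data": {"image": f"{image_prefix.rstrip('/')}/{filename}"},
        "annotations": [{"result": result}],
    }


if __name__ == "__main__":
    task = build_task("2007_000027.jpg", 486, 500,
                      [("person", 174, 101, 349, 351)],
                      "s3://my-bucket/images")
    print(json.dumps(task, indent=2))
```

Writing one such task per image to its own .json file is one way to fill the tasks folder you later upload to MinIO.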

While preprocessing the data, I merged the full dataset, even though it was already divided into training and validation sets. This simulates a real-world scenario in which you typically start with a single dataset and split it into training and validation sets yourself before training.
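Since the merged dataset has to be split again before training, here is a minimal sketch of a reproducible random split. The function name, the 20% validation fraction, and the fixed seed are illustrative assumptions; the article does not say how the author performs the split, and a class-stratified split would need extra logic.

```python
import random


def split_tasks(tasks, val_fraction=0.2, seed=42):
    """Shuffle a merged list of tasks and split it into (train, val).

    A plain random split; the seed makes the result reproducible.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(tasks)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    train, val = split_tasks(range(10))
    print(len(train), len(val))
```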

Running the Full Stack with Docker Compose

I merged the docker-compose.yml and docker-compose.minio.yml files into a single simplified configuration so the full stack can run on the same network. Both files were taken from the official Label Studio GitHub repository.



services:
  nginx:
    # Acts as a reverse proxy for the Label Studio frontend/backend
    image: heartexlabs/label-studio:latest
    restart: unless-stopped
    ports:
      - "8080:8085"
      - "8081:8086"
    depends_on:
      - app
    environment:
      - LABEL_STUDIO_HOST=${LABEL_STUDIO_HOST:-}
    volumes:
      - ./mydata:/label-studio/data:rw  # Stores Label Studio projects, configs, and uploaded files
    command: nginx

  app:
    stdin_open: true
    tty: true
    image: heartexlabs/label-studio:latest
    restart: unless-stopped
    expose:
      - "8000"
    depends_on:
      - db
    environment:
      - DJANGO_DB=default
      - POSTGRE_NAME=postgres
      - POSTGRE_USER=postgres
      - POSTGRE_PASSWORD=
      - POSTGRE_PORT=5432
      - POSTGRE_HOST=db
      - LABEL_STUDIO_HOST=${LABEL_STUDIO_HOST:-}
      - JSON_LOG=1
    volumes:
      - ./mydata:/label-studio/data:rw  # Stores Label Studio projects, configs, and uploaded files
    command: label-studio-uwsgi

  db:
    image: pgautoupgrade/pgautoupgrade:13-alpine
    hostname: db
    restart: unless-stopped
    environment:
      - POSTGRES_HOST_AUTH_METHOD=trust
      - POSTGRES_USER=postgres
    volumes:
      - ${POSTGRES_DATA_DIR:-./postgres-data}:/var/lib/postgresql/data  # Persistent storage for the PostgreSQL database

  minio:
    image: "minio/minio:${MINIO_VERSION:-RELEASE.2025-04-22T22-12-26Z}"
    command: server /data --console-address ":9009"
    restart: unless-stopped
    ports:
      - "9000:9000"
      - "9009:9009"
    volumes:
      - minio-data:/data  # Stores uploaded dataset objects (such as images or JSON tasks)
    # Configure env vars in a .env file or in your system's environment
    environment:
      - MINIO_ROOT_USER=${MINIO_ROOT_USER:-minio_admin_do_not_use_in_production}
      - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-minio_admin_do_not_use_in_production}
      - MINIO_PROMETHEUS_URL=${MINIO_PROMETHEUS_URL:-http://prometheus:9090}
      - MINIO_PROMETHEUS_AUTH_TYPE=${MINIO_PROMETHEUS_AUTH_TYPE:-public}

volumes:
  minio-data:  # Named volume for MinIO object storage

This simplified Docker Compose file defines four core services with their volume mappings:

App: runs the Label Studio backend itself.

  • Shares the mydata directory, which stores projects, configurations, and uploaded files, with Nginx.
  • Uses a bind mount: ./mydata:/label-studio/data:rw → maps a folder on your host into the container.

Nginx: acts as a reverse proxy for the Label Studio frontend and backend.

  • Shares the mydata directory with the App service.

PostgreSQL (db): manages metadata and project information.

  • Stores the persistent database files.
  • Uses a bind mount: ${POSTGRES_DATA_DIR:-./postgres-data}:/var/lib/postgresql/data.

MinIO: an S3-compatible object storage service.

  • Stores dataset objects such as images or JSON annotation tasks.
  • Uses a named volume: minio-data:/data.

When you mount host folders such as ./mydata and ./postgres-data, you need to give ownership of them on the host to the same user that runs inside the container. Label Studio doesn’t run as root; it uses a non-root user with UID 1001. If the host directories are owned by a different user, the container won’t have write access, and you’ll run into “permission denied” errors.

After creating these folders in your project directory, you can adjust their ownership with:

mkdir mydata 
mkdir postgres-data
sudo chown -R 1001:1001 ./mydata ./postgres-data
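To confirm that the chown took effect before starting the stack, a small check like the one below can help. It is a sketch that only compares each directory’s owner UID with 1001, the UID stated above; the function name is hypothetical, and it does not inspect group or mode bits.

```python
import os

LABEL_STUDIO_UID = 1001  # non-root UID used by the Label Studio container (per the text above)


def dirs_with_wrong_owner(paths, uid=LABEL_STUDIO_UID):
    """Return the paths that are missing, not directories, or not owned by `uid`."""
    bad = []
    for path in paths:
        if not os.path.isdir(path) or os.stat(path).st_uid != uid:
            bad.append(path)
    return bad


if __name__ == "__main__":
    problems = dirs_with_wrong_owner(["./mydata", "./postgres-data"])
    print("OK" if not problems else f"Fix ownership of: {problems}")
```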

Now that the directories are ready, we can bring up the stack with Docker Compose. Simply run:

docker compose up -d

It may take a few minutes to pull all the required images from Docker Hub and set up Label Studio. Once the setup is complete, open http://localhost:8080 in your browser to access the Label Studio interface. You need to create a new account, after which you can log in with your credentials. You can enable a legacy API token under Organization → API Token Settings. This token lets you communicate with the Label Studio API, which is especially useful for automation tasks.
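As an illustration of using that token, the sketch below builds an authenticated request with Python’s urllib. It assumes the legacy token is sent in an “Authorization: Token &lt;token&gt;” header and that /api/projects/ lists projects; verify both against the API documentation for your Label Studio version. LABEL_STUDIO_TOKEN is a hypothetical environment variable name, and the request is only sent when it is set.

```python
import json
import os
import urllib.request

LABEL_STUDIO_URL = "http://localhost:8080"  # Nginx port mapped in docker-compose.yml


def make_request(path, token, base_url=LABEL_STUDIO_URL):
    """Build a GET request authenticated with a legacy Label Studio token."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{path.lstrip('/')}",
        headers={"Authorization": f"Token {token}"},
    )


if __name__ == "__main__":
    token = os.environ.get("LABEL_STUDIO_TOKEN")  # hypothetical variable name
    if token:
        with urllib.request.urlopen(make_request("/api/projects/", token)) as resp:
            # Print only the beginning of the JSON response.
            print(json.dumps(json.load(resp), indent=2)[:500])
```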

Set up a Label Studio project

Now we can create our first data annotation project in Label Studio, specifically for an object detection workflow. But before you start annotating your images, you need to define the classes to choose from. The Pascal VOC dataset has 20 classes of pre-annotated objects.

XML-style labeling setup (Image by Author)
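Instead of typing the 20 classes by hand, you could generate the labeling config. The sketch below assumes the standard Pascal VOC class names and uses Label Studio’s Image and RectangleLabels tags; the names image and label are chosen to match the "to_name"/"from_name" values in the JSON example earlier, and $image must match the key under the task’s data object.

```python
from xml.sax.saxutils import quoteattr

# The 20 standard Pascal VOC object classes.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def labeling_config(classes=VOC_CLASSES):
    """Return a Label Studio labeling config for bounding-box annotation."""
    # quoteattr adds the surrounding quotes and escapes special characters.
    labels = "\n".join(f"    <Label value={quoteattr(c)}/>" for c in classes)
    return (
        "<View>\n"
        '  <Image name="image" value="$image"/>\n'
        '  <RectangleLabels name="label" toName="image">\n'
        f"{labels}\n"
        "  </RectangleLabels>\n"
        "</View>"
    )


if __name__ == "__main__":
    print(labeling_config())
```

You can paste the printed XML into the project’s labeling interface settings.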

Upload images and tasks to MinIO

You can open the MinIO user interface in your browser at localhost:9009, the console port set with --console-address in the Compose file (port 9000 serves the S3 API), and then log in with the credentials you specified for the minio service in the docker-compose.yml file.

I created a bucket with folders: one for storing images and another for the JSON tasks formatted according to the instructions above.

Screenshot of an example bucket in MinIO (Image by Author)

We set up an S3-like service locally that lets us simulate S3 cloud storage without incurring any costs. If you want to transfer files to an S3 bucket on AWS, it’s better to do so directly over the internet, considering the data transfer costs. The good news is that you can also interact with your MinIO bucket using the AWS CLI. To do this, add a profile in ~/.aws/config and provide the corresponding credentials in ~/.aws/credentials under the same profile name.

Then you can sync your local folder using the following commands:

#!/bin/bash
set -e

# Fill in these values for your setup
PROFILE=
MINIO_ENDPOINT=   # e.g. http://localhost:9000
BUCKET_NAME=
SOURCE_DIR=
DEST_DIR=

# The trailing backslashes keep all options part of a single command
aws s3 sync \
      --endpoint-url "$MINIO_ENDPOINT" \
      --no-verify-ssl \
      --profile "$PROFILE" \
      "$SOURCE_DIR" "s3://$BUCKET_NAME/$DEST_DIR"
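The script relies on a named AWS CLI profile. As a hedged sketch, the snippet below generates matching ~/.aws/config and ~/.aws/credentials entries with configparser; it only returns the text and does not write to your home directory. The profile name minio, the us-east-1 region, and using the MinIO root user and password as the access key pair are assumptions for a local setup.

```python
import configparser
import io


def aws_profile_files(profile, access_key, secret_key, region="us-east-1"):
    """Return (config_text, credentials_text) for a named AWS CLI profile.

    In ~/.aws/config a non-default profile uses a "[profile <name>]" header,
    while ~/.aws/credentials uses just "[<name>]".
    """
    # interpolation=None so secrets containing "%" are stored verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config[f"profile {profile}"] = {"region": region}
    creds = configparser.ConfigParser(interpolation=None)
    creds[profile] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    texts = []
    for parser in (config, creds):
        buf = io.StringIO()
        parser.write(buf)
        texts.append(buf.getvalue())
    return texts[0], texts[1]


if __name__ == "__main__":
    cfg, cred = aws_profile_files("minio", "minio_admin_do_not_use_in_production",
                                  "minio_admin_do_not_use_in_production")
    print(cfg)
    print(cred)
```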

 

Connect MinIO to Label Studio

Once all the data, including the images and annotations, has been uploaded, we can move on to adding cloud storage to the project we created in the previous step.

In your project settings, go to Cloud Storage and fill in the required parameters: the endpoint (the service name in the Docker stack together with the port number, e.g., minio:9000), the bucket name, and the prefix where the annotation files are stored. Each path inside the JSON files then points to the corresponding image.

Screenshot of the Cloud Storage settings (Image by Author)
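Because each task points to its image by URI, a missing or misnamed object only shows up later as a broken image in the task list. A quick pre-flight check, sketched below, compares the URIs in your tasks with the object keys you uploaded. The function name is hypothetical, and how you collect the keys (for example, from `aws s3 ls --recursive` output) is left to you.

```python
from urllib.parse import urlparse


def missing_images(tasks, bucket, uploaded_keys):
    """Return image URIs referenced by tasks that are not in `uploaded_keys`.

    `uploaded_keys` is a set of object keys such as "images/2007_000027.jpg".
    URIs that are not s3:// or that point at another bucket count as missing.
    """
    missing = []
    for task in tasks:
        uri = task["data"]["image"]
        parsed = urlparse(uri)  # s3://bucket/key -> netloc="bucket", path="/key"
        key = parsed.path.lstrip("/")
        if parsed.scheme != "s3" or parsed.netloc != bucket or key not in uploaded_keys:
            missing.append(uri)
    return missing


if __name__ == "__main__":
    demo = [{"data": {"image": "s3://my-bucket/images/2007_000027.jpg"}}]
    print(missing_images(demo, "my-bucket", {"images/2007_000027.jpg"}))
```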

After verifying that the connection works, you can sync your project with the cloud storage. You may need to run the sync several times, since the dataset contains 22,263 images. The sync may appear to fail at first, but each time you restart it, it continues to make progress. Eventually, all of the Pascal VOC data will be imported into Label Studio.
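If you prefer to automate those restarts, a retry wrapper is one option. In the sketch below, retry is a generic helper that re-runs any callable on network errors with a linear back-off. trigger_sync assumes an endpoint of the form /api/storages/s3/&lt;id&gt;/sync with token authentication; treat that path as an assumption to check against your version’s API reference, and the storage ID as a placeholder. Note that this only retries when the HTTP call itself fails; a sync that starts and then stalls still has to be triggered again.

```python
import time
import urllib.error
import urllib.request


def retry(fn, attempts=5, delay_s=10.0,
          retry_on=(urllib.error.URLError, TimeoutError), sleep=time.sleep):
    """Call fn() until it succeeds, waiting delay_s * attempt between tries.

    Re-raises the last error if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay_s * attempt)  # linear back-off


def trigger_sync(base_url, storage_id, token):
    """POST to the (assumed) S3 import-storage sync endpoint; return the HTTP status."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/storages/s3/{storage_id}/sync",
        method="POST",
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage, with a placeholder storage ID of 1: `retry(lambda: trigger_sync("http://localhost:8080", 1, token))`.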

Screenshot of the task list (Image by Author)

You can see the imported tasks with their thumbnail images in the task list. When you click a task, the image appears with its pre-annotations.

Screenshot of an image with bounding boxes (Image by Author)

Conclusions

In this tutorial, we demonstrated how to import the Pascal VOC dataset into Label Studio by converting XML annotations into Label Studio’s JSON format, running a full stack with Docker Compose, and connecting MinIO as S3-compatible storage. This setup lets you work with large-scale, pre-annotated datasets in a reproducible and cost-effective way, all on your local machine. Testing your project settings and file formats locally first makes the later move to cloud environments smoother.

I hope this tutorial helps you kickstart your data annotation project with pre-annotated data that you can easily extend or validate. Once your dataset is ready for training, you can export all the tasks in popular formats such as COCO or YOLO.
