Smelt AI

LLM-powered structured data transformation.

Feed in rows of data, get back strictly typed Pydantic models — batched, concurrent, and validated.

from smelt import Model, Job
from pydantic import BaseModel

class Classification(BaseModel):
    sector: str
    sub_sector: str
    is_public: bool

model = Model(provider="openai", name="gpt-4.1-mini")
job = Job(
    prompt="Classify each company by industry sector and whether it's publicly traded.",
    output_model=Classification,
    batch_size=20,
    concurrency=3,
)

result = job.run(model, data=[
    {"name": "Apple", "desc": "Consumer electronics and software"},
    {"name": "Stripe", "desc": "Payment processing platform"},
    {"name": "Mayo Clinic", "desc": "Nonprofit medical center"},
])

for row in result.data:
    print(row)
    # Classification(sector='Technology', sub_sector='Consumer Electronics', is_public=True)
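Each batch response is parsed and validated against your Pydantic schema before you ever see it. For reference, the validation step smelt performs is equivalent to doing this yourself with plain Pydantic v2 (the payload below is an illustrative example of what an LLM might return for one batch, not real output):

```python
from pydantic import BaseModel, TypeAdapter

class Classification(BaseModel):
    sector: str
    sub_sector: str
    is_public: bool

# A raw JSON-like payload, as an LLM might return it for one batch.
payload = [
    {"sector": "Technology", "sub_sector": "Consumer Electronics", "is_public": True},
    {"sector": "Financial Services", "sub_sector": "Payments", "is_public": False},
]

# Validate the whole batch into typed objects; a malformed row raises ValidationError.
rows = TypeAdapter(list[Classification]).validate_python(payload)
```

Any row that fails validation raises, which is what triggers smelt's retry and error-collection behavior described below.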

Features

  • Structured output — define your schema with Pydantic, get back validated typed objects
  • Batch processing — automatically splits data into batches for efficient LLM calls
  • Concurrent execution — async semaphore-based concurrency, no threads or process pools
  • Automatic retries — exponential backoff with jitter for transient failures
  • Row ordering — results always match original input order, regardless of batch completion order
  • Test mode — validate your setup with a single row before running the full dataset
  • Provider agnostic — works with any LangChain-supported LLM (OpenAI, Anthropic, Google, etc.)
  • Detailed metrics — token usage, timing, retry counts, and per-batch error tracking
  • Flexible error handling — fail fast or collect errors, with partial results always available
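The retry behavior listed above follows the standard exponential-backoff-with-full-jitter pattern. A minimal sketch of that pattern (names, delays, and the broad `except` are illustrative, not smelt's actual implementation):

```python
import asyncio
import random

async def with_retries(coro_factory, max_attempts=4, base_delay=1.0):
    """Retry an async call with exponential backoff plus full jitter."""
    for attempt in range(max_attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; surface the failure
            # Backoff window doubles each attempt; random jitter spreads
            # out concurrent retries so they don't hammer the API in lockstep.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))
```

Jitter matters here because smelt runs batches concurrently: without it, batches that fail together retry together.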

Quick install

pip install "smelt-ai[openai]"      # OpenAI models
pip install "smelt-ai[anthropic]"   # Anthropic models
pip install "smelt-ai[google]"      # Google Gemini models

Requires Python 3.10+.

How it works

list[dict]  →  Tag with row_id  →  Split into batches  →  Concurrent LLM calls  →  Validate  →  Reorder  →  SmeltResult[T]
  1. Your input rows get tagged with positional IDs for tracking
  2. Rows are split into batches of configurable size
  3. Batches run concurrently through the LLM with structured output
  4. Each response is validated (schema, row IDs, count)
  5. Results are reordered to match your original input order
  6. Everything is packaged into a typed SmeltResult with metrics
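The six steps above can be sketched in a few lines of asyncio (a simplified illustration of the flow, not smelt's internals; `call_llm` and the dict-based rows are hypothetical stand-ins):

```python
import asyncio

async def run_pipeline(rows, call_llm, batch_size=20, concurrency=3):
    # 1. Tag each row with a positional ID for tracking.
    tagged = [{"row_id": i, **row} for i, row in enumerate(rows)]
    # 2. Split into batches of configurable size.
    batches = [tagged[i:i + batch_size] for i in range(0, len(tagged), batch_size)]
    # 3. Run batches concurrently, bounded by a semaphore.
    sem = asyncio.Semaphore(concurrency)

    async def process(batch):
        async with sem:
            out = await call_llm(batch)
        # 4. Validate: the response must cover exactly the rows sent.
        assert {r["row_id"] for r in out} == {r["row_id"] for r in batch}
        return out

    results = await asyncio.gather(*(process(b) for b in batches))
    # 5. Flatten and reorder to match the original input order.
    flat = [r for batch in results for r in batch]
    return sorted(flat, key=lambda r: r["row_id"])
```

The semaphore is why no threads or process pools are needed: each batch is just an awaited coroutine, and at most `concurrency` of them hold an LLM call open at once.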

Learn more about the architecture →

Documentation

Getting Started

  • Installation — set up smelt with your LLM provider
  • Quickstart — build your first transformation in 5 minutes

Guide

Cookbook

API Reference

  • Model — LLM provider configuration
  • Job — transformation definition and execution
  • Results & Metrics — SmeltResult, SmeltMetrics, BatchError
  • Errors — exception hierarchy

Changelog