Smelt AI¶
LLM-powered structured data transformation.
Feed in rows of data, get back strictly typed Pydantic models — batched, concurrent, and validated.
from smelt import Model, Job
from pydantic import BaseModel

class Classification(BaseModel):
    sector: str
    sub_sector: str
    is_public: bool

model = Model(provider="openai", name="gpt-4.1-mini")

job = Job(
    prompt="Classify each company by industry sector and whether it's publicly traded.",
    output_model=Classification,
    batch_size=20,
    concurrency=3,
)

result = job.run(model, data=[
    {"name": "Apple", "desc": "Consumer electronics and software"},
    {"name": "Stripe", "desc": "Payment processing platform"},
    {"name": "Mayo Clinic", "desc": "Nonprofit medical center"},
])

for row in result.data:
    print(row)
# Classification(sector='Technology', sub_sector='Consumer Electronics', is_public=True)
Features¶
- Structured output — define your schema with Pydantic, get back validated typed objects
- Batch processing — automatically splits data into batches for efficient LLM calls
- Concurrent execution — async semaphore-based concurrency, no threads or process pools
- Automatic retries — exponential backoff with jitter for transient failures
- Row ordering — results always match original input order, regardless of batch completion order
- Test mode — validate your setup with a single row before running the full dataset (see the sketch after this list)
- Provider agnostic — works with any LangChain-supported LLM (OpenAI, Anthropic, Google, etc.)
- Detailed metrics — token usage, timing, retry counts, and per-batch error tracking
- Flexible error handling — fail fast or collect errors, with partial results always available
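A minimal sketch of the test-then-run workflow, reusing only the API from the opening example. The dedicated test mode may expose its own flag; slicing the input to one row gives the same cheap sanity check. Note that `result.metrics` is an assumed attribute name here; the exact fields live in the Results & Metrics reference below.

from pydantic import BaseModel

from smelt import Model, Job

class Label(BaseModel):
    category: str

model = Model(provider="openai", name="gpt-4.1-mini")
job = Job(
    prompt="Categorize each support ticket.",
    output_model=Label,
    batch_size=20,
    concurrency=3,
)

rows = [{"text": "Can't log in"}, {"text": "Billing charged twice"}]

# Validate prompt and schema on a single row before paying for the full run
preview = job.run(model, data=rows[:1])
print(preview.data[0])

# Then run everything
result = job.run(model, data=rows)
print(result.metrics)  # assumed attribute name; see Results & Metrics below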
Quick install¶
pip install smelt-ai[openai] # OpenAI models
pip install smelt-ai[anthropic] # Anthropic models
pip install smelt-ai[google] # Google Gemini models
Requires Python 3.10+.
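Most providers authenticate via environment variables. Assuming smelt delegates to LangChain's standard integrations, the usual variables are OPENAI_API_KEY, ANTHROPIC_API_KEY, and GOOGLE_API_KEY:

import os

# Set the variable matching your installed extra; LangChain's provider
# integrations pick these up automatically.
os.environ["OPENAI_API_KEY"] = "sk-..."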
How it works¶
list[dict] → Tag with row_id → Split into batches → Concurrent LLM calls → Validate → Reorder → SmeltResult[T]
- Your input rows get tagged with positional IDs for tracking
- Rows are split into batches of configurable size
- Batches run concurrently through the LLM with structured output
- Each response is validated (schema, row IDs, count)
- Results are reordered to match your original input order
- Everything is packaged into a typed SmeltResult with metrics (a simplified sketch of the pipeline follows this list)
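A simplified, synchronous sketch of those steps in plain Python. This is illustrative only, not smelt's internals: the real pipeline issues the per-batch LLM calls concurrently and validates each response against your Pydantic schema.

from itertools import islice

def chunked(rows, size):
    """Yield successive batches of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

rows = [{"name": "Apple"}, {"name": "Stripe"}, {"name": "Mayo Clinic"}]

# 1. Tag each row with a positional ID
tagged = [{"row_id": i, **row} for i, row in enumerate(rows)]

# 2. Split into batches of configurable size
batches = list(chunked(tagged, 2))

# 3. Each batch becomes an LLM call; calls may finish out of order,
#    so fake that here by reversing the batch list
responses = [batches[1], batches[0]]

# 4. Flatten and sort by row_id to restore the original input order
flat = [row for batch in responses for row in batch]
ordered = sorted(flat, key=lambda r: r["row_id"])
assert [r["row_id"] for r in ordered] == [0, 1, 2]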
Learn more about the architecture →
Documentation¶
Getting Started¶
- Installation — set up smelt with your LLM provider
- Quickstart — build your first transformation in 5 minutes
Guide¶
- Architecture — how smelt works under the hood
- Batching & Concurrency — tune batch_size, concurrency, shuffle, and retries
- Writing Prompts — write prompts that produce consistent results
- Error Handling — strategies for handling failures
- Providers — provider setup, model recommendations, and cost comparison
Cookbook¶
- Classification — categorize data with fixed or open-ended labels
- Sentiment Analysis — extract sentiment, emotions, and opinions
- Data Extraction — parse structured fields from unstructured text
- Data Enrichment — add summaries, translations, and generated content
- Advanced Patterns — chaining, retries, pandas, large datasets
API Reference¶
- Model — LLM provider configuration
- Job — transformation definition and execution
- Results & Metrics — SmeltResult, SmeltMetrics, BatchError
- Errors — exception hierarchy