Smelt AI¶
LLM-powered structured data transformation.
Feed in rows of data, get back strictly typed Pydantic models — batched, concurrent, and validated.
from smelt import Model, Job
from pydantic import BaseModel

class Classification(BaseModel):
    sector: str
    sub_sector: str
    is_public: bool

model = Model(provider="openai", name="gpt-4.1-mini")

job = Job(
    prompt="Classify each company by industry sector and whether it's publicly traded.",
    output_model=Classification,
    batch_size=20,
    concurrency=3,
)

result = job.run(model, data=[
    {"name": "Apple", "desc": "Consumer electronics and software"},
    {"name": "Stripe", "desc": "Payment processing platform"},
    {"name": "Mayo Clinic", "desc": "Nonprofit medical center"},
])

for row in result.data:
    print(row)
# Classification(sector='Technology', sub_sector='Consumer Electronics', is_public=True)
Features¶
- Structured output — define your schema with Pydantic, get back validated typed objects
- Batch processing — automatically splits data into batches for efficient LLM calls
- Concurrent execution — async semaphore-based concurrency, no threads or process pools
- Automatic retries — exponential backoff with jitter for transient failures
- Row ordering — results always match original input order, regardless of batch completion order
- Test mode — validate your setup with a single row before running the full dataset (see the sketch after this list)
- Provider agnostic — works with any LangChain-supported LLM (OpenAI, Anthropic, Google, etc.)
- Detailed metrics — token usage, timing, retry counts, and per-batch error tracking
- Flexible error handling — fail fast or collect errors, with partial results always available
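A minimal sketch of the test-then-run workflow, reusing only the API from the opening example. The dedicated test mode may expose its own flag; slicing the input to one row gives the same cheap sanity check. Note that `result.metrics` is an assumed attribute name here; the exact fields live in the Results & Metrics reference below.

from pydantic import BaseModel

from smelt import Model, Job

class Label(BaseModel):
    category: str

model = Model(provider="openai", name="gpt-4.1-mini")
job = Job(
    prompt="Categorize each support ticket.",
    output_model=Label,
    batch_size=20,
    concurrency=3,
)

rows = [{"text": "Can't log in"}, {"text": "Billing charged twice"}]

# Validate prompt and schema on a single row before paying for the full run
preview = job.run(model, data=rows[:1])
print(preview.data[0])

# Then run everything
result = job.run(model, data=rows)
print(result.metrics)  # assumed attribute name; see Results & Metrics below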
Quick install¶
pip install smelt-ai[openai] # OpenAI models
pip install smelt-ai[anthropic] # Anthropic models
pip install smelt-ai[google] # Google Gemini models
Requires Python 3.10+.
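Most providers authenticate via environment variables. Assuming smelt delegates to LangChain's standard integrations, the usual variables are OPENAI_API_KEY, ANTHROPIC_API_KEY, and GOOGLE_API_KEY:

import os

# Set the variable matching your installed extra; LangChain's provider
# integrations pick these up automatically.
os.environ["OPENAI_API_KEY"] = "sk-..."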
How it works¶
list[dict] → Tag with row_id → Split into batches → Concurrent LLM calls → Validate → Reorder → SmeltResult[T]
- Your input rows get tagged with positional IDs for tracking
- Rows are split into batches of configurable size
- Batches run concurrently through the LLM with structured output
- Each response is validated (schema, row IDs, count)
- Results are reordered to match your original input order
- Everything is packaged into a typed SmeltResult with metrics (a simplified sketch of the pipeline follows this list)
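A simplified, synchronous sketch of those steps in plain Python. This is illustrative only, not smelt's internals: the real pipeline issues the per-batch LLM calls concurrently and validates each response against your Pydantic schema.

from itertools import islice

def chunked(rows, size):
    """Yield successive batches of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

rows = [{"name": "Apple"}, {"name": "Stripe"}, {"name": "Mayo Clinic"}]

# 1. Tag each row with a positional ID
tagged = [{"row_id": i, **row} for i, row in enumerate(rows)]

# 2. Split into batches of configurable size
batches = list(chunked(tagged, 2))

# 3. Each batch becomes an LLM call; calls may finish out of order,
#    so fake that here by reversing the batch list
responses = [batches[1], batches[0]]

# 4. Flatten and sort by row_id to restore the original input order
flat = [row for batch in responses for row in batch]
ordered = sorted(flat, key=lambda r: r["row_id"])
assert [r["row_id"] for r in ordered] == [0, 1, 2]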
Learn more about the architecture →
Documentation¶
Getting Started¶
- Installation — set up smelt with your LLM provider
- Quickstart — build your first transformation in 5 minutes
Guide¶
- Architecture — how smelt works under the hood
- Batching & Concurrency — tune batch_size, concurrency, shuffle, and retries
- Writing Prompts — write prompts that produce consistent results
- Error Handling — strategies for handling failures
- Providers — provider setup, model recommendations, and cost comparison
Cookbook¶
- Classification — categorize data with fixed or open-ended labels
- Sentiment Analysis — extract sentiment, emotions, and opinions
- Data Extraction — parse structured fields from unstructured text
- Data Enrichment — add summaries, translations, and generated content
- Advanced Patterns — chaining, retries, pandas, large datasets
API Reference¶
- Model — LLM provider configuration
- Job — transformation definition and execution
- Results & Metrics — SmeltResult, SmeltMetrics, BatchError
- Errors — exception hierarchy