
Writing Effective Prompts

Your prompt is the most important factor in output quality. This page covers how smelt uses your prompt and how to write prompts that produce consistent, high-quality results.

How smelt uses your prompt

Your prompt string is embedded in a system message that smelt constructs:

You are a structured data transformation assistant.

## Task
{your prompt here}          ← This is what you control

## Rules
- You will receive a JSON array of objects. Each object has a "row_id" field.
- For EACH input object, produce exactly one output object.
- Every output object MUST include the same "row_id" as the corresponding input object.
- Do NOT skip, merge, duplicate, or reorder rows.
- Return ALL rows — the count of output rows must equal the count of input rows.

## Output Schema
Each output row must conform to this schema:
- row_id (int, required)
- sector (str, required) — Primary industry sector
- sub_sector (str, required) — More specific sub-sector
- is_public (bool, required) — Whether the company is publicly traded

Return your response as a JSON object with a single "rows" key...

Smelt automatically handles:

  • Row-level instructions (row_id tracking, count matching)
  • Schema description (generated from your Pydantic model's field names, types, and descriptions)
  • Output format (JSON structure)

You only need to describe what transformation to apply.
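
For example, the sector/sub_sector/is_public lines in the schema above would be generated from a Pydantic model along these lines (the class name here is illustrative, and row_id is assumed to be injected by smelt itself):

from pydantic import BaseModel, Field

class CompanyClassification(BaseModel):
    sector: str = Field(description="Primary industry sector")
    sub_sector: str = Field(description="More specific sub-sector")
    is_public: bool = Field(description="Whether the company is publicly traded")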

Prompt anatomy

A good prompt has three parts:

1. Task description

What to do with each row:

# Bad — vague
prompt = "Classify the companies"

# Good — specific
prompt = "Classify each company by its primary industry sector and sub-sector"

# Better — specific + criteria
prompt = (
    "Classify each company by its primary GICS industry sector and sub-sector. "
    "Determine if the company is publicly traded on a major stock exchange "
    "(NYSE, NASDAQ, LSE, or equivalent)."
)

2. Rules and constraints

Business logic and edge cases:

prompt = (
    "Analyze the sentiment of each product review. "
    "Rules: "
    "- Score must be between 0.0 (most negative) and 1.0 (most positive). "
    "- 'mixed' sentiment means both positive and negative aspects are present. "
    "- 'neutral' means neither positive nor negative. "
    "- Extract 1-3 key themes, not more."
)
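
A schema that pairs with this prompt might look like the following sketch (class and field names are illustrative, not part of smelt's API):

from typing import Literal

from pydantic import BaseModel, Field

class ReviewSentiment(BaseModel):
    sentiment: Literal["positive", "negative", "neutral", "mixed"] = Field(
        description="'mixed' when both positive and negative aspects are present"
    )
    score: float = Field(description="0.0 (most negative) to 1.0 (most positive)")
    themes: list[str] = Field(description="1-3 key themes from the review")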

3. Context and calibration

Help the LLM understand your domain:

prompt = (
    "Create a concise structured summary for each company. "
    "Calculate the approximate age based on the founded year (current year is 2026). "
    "Size classification: startup (<50 employees), small (50-200), "
    "medium (200-1000), large (1000-10000), enterprise (10000+)."
)
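
The matching schema can encode the same size buckets as a Literal, so the prompt and the type system agree (a sketch; names are illustrative):

from typing import Literal

from pydantic import BaseModel, Field

class CompanySummary(BaseModel):
    summary: str = Field(description="Concise structured summary")
    approximate_age: int = Field(description="Years since founding, assuming the current year is 2026")
    size: Literal["startup", "small", "medium", "large", "enterprise"] = Field(
        description="startup (<50 employees), small (50-200), medium (200-1000), large (1000-10000), enterprise (10000+)"
    )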

Field descriptions matter

Smelt includes your Field(description=...) values in the system prompt. These are the LLM's primary guide for each output field.

# Without descriptions — LLM guesses what "tier" means
class Pricing(BaseModel):
    tier: str
    score: float

# With descriptions — LLM knows exactly what you want
class Pricing(BaseModel):
    tier: str = Field(description="Price tier: budget (<$50), mid-range ($50-200), premium ($200-500), luxury ($500+)")
    score: float = Field(description="Value-for-money score from 0.0 (poor value) to 1.0 (excellent value)")

Tip

Think of Field(description=...) as micro-prompts for individual fields. Be as specific as possible — include ranges, formats, examples, and edge cases.

Prompt patterns

Classification with fixed categories

prompt = (
    "Classify each support ticket into exactly one category. "
    "Categories are: billing (payment, charges, invoices), "
    "technical (bugs, errors, performance), "
    "shipping (delivery, tracking, returns), "
    "account (login, settings, permissions), "
    "general (everything else)."
)
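
Pinning the categories in a Literal keeps the LLM from inventing new ones and keeps the prompt and schema in agreement (illustrative sketch):

from typing import Literal

from pydantic import BaseModel, Field

class TicketCategory(BaseModel):
    category: Literal["billing", "technical", "shipping", "account", "general"] = Field(
        description="Use 'general' only when no other category fits"
    )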

Extraction with format rules

prompt = (
    "Extract contact information from each text block. "
    "Normalize phone numbers to E.164 format (+1XXXXXXXXXX for US numbers). "
    "Normalize email addresses to lowercase. "
    "If a field is not present in the text, return 'not_found'."
)
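
A paired schema can restate the normalization and 'not_found' conventions in the field descriptions (hypothetical names):

from pydantic import BaseModel, Field

class ContactInfo(BaseModel):
    phone: str = Field(description="Phone number in E.164 format, or 'not_found' if absent")
    email: str = Field(description="Lowercase email address, or 'not_found' if absent")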

Scoring with calibration

prompt = (
    "Rate each product review on a scale from 0.0 to 1.0. "
    "Calibration: "
    "0.0 = extremely negative (product is broken, dangerous, or useless) "
    "0.25 = mostly negative (significant issues, would not recommend) "
    "0.5 = neutral or mixed (equal positives and negatives) "
    "0.75 = mostly positive (minor issues, would recommend) "
    "1.0 = extremely positive (perfect, no complaints)"
)
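
Pydantic can also enforce the range at validation time with ge/le bounds, so an out-of-range score raises an error instead of slipping through (a sketch using the same 0.0-1.0 scale):

from pydantic import BaseModel, Field

class ReviewRating(BaseModel):
    rating: float = Field(
        ge=0.0,  # reject anything below the calibrated minimum
        le=1.0,  # reject anything above the calibrated maximum
        description="Review rating from 0.0 (extremely negative) to 1.0 (extremely positive)",
    )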

Enrichment with knowledge boundaries

prompt = (
    "Enrich each company with market analysis. "
    "Base your analysis on publicly available information. "
    "If you are not confident about a fact, indicate uncertainty. "
    "Do not speculate about future performance."
)
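
One way to make "indicate uncertainty" actionable is to give it a structured field of its own; the confidence field below is an assumption for illustration, not something smelt requires:

from typing import Literal

from pydantic import BaseModel, Field

class MarketAnalysis(BaseModel):
    analysis: str = Field(description="Market analysis based on publicly available information")
    confidence: Literal["high", "medium", "low"] = Field(
        description="Use 'low' when key facts could not be verified"
    )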

Multi-step reasoning

prompt = (
    "For each support ticket: "
    "1. Identify the core issue (what went wrong). "
    "2. Categorize by department (billing, technical, shipping, account, general). "
    "3. Assess priority based on: urgency (explicit deadline), impact (number of users affected), "
    "   and severity (data loss > functionality > cosmetic). "
    "4. Draft a response that acknowledges the issue and provides a next step."
)
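
Listing the schema fields in the same order as the steps can nudge the LLM to reason in that order, since output fields are typically generated sequentially (field names are illustrative):

from typing import Literal

from pydantic import BaseModel, Field

class TicketTriage(BaseModel):
    core_issue: str = Field(description="What went wrong, in one sentence")
    department: Literal["billing", "technical", "shipping", "account", "general"]
    priority: Literal["low", "medium", "high", "urgent"] = Field(
        description="Based on urgency, impact, and severity as defined in the prompt"
    )
    draft_response: str = Field(description="Acknowledges the issue and gives a concrete next step")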

Common mistakes

Too vague

# Bad
prompt = "Process the data"

# Good
prompt = "Classify each company by GICS sector and determine if publicly traded"

Contradicting the schema

# Bad — prompt says 3 categories but Literal has 5
prompt = "Classify as positive, negative, or neutral"

class Output(BaseModel):
    sentiment: Literal["positive", "negative", "neutral", "mixed", "unknown"]

# Good — prompt matches schema
prompt = "Classify sentiment as positive, negative, neutral, mixed, or unknown"

Missing edge cases

# Bad — what if the review is in another language? What if it's empty?
prompt = "Analyze the sentiment of each review"

# Good — covers edge cases
prompt = (
    "Analyze the sentiment of each review. "
    "If the review is in a non-English language, analyze it in its original language. "
    "If the review is empty or unintelligible, classify as 'neutral' with score 0.5."
)

Overloading the prompt

# Bad — too many instructions; the LLM may miss some
prompt = (
    "Classify by industry and sub-industry and determine if public and calculate market cap "
    "and find the CEO name and headquarters city and founding year and number of employees "
    "and annual revenue and profit margin and stock ticker and..."
)

# Good — focus on what matters; let the schema handle field definitions
prompt = (
    "Classify each company by industry sector. "
    "Use GICS classification for sector and sub-sector."
)
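
The details dropped from the prompt belong in the schema instead, where each field description carries its own instructions (a sketch with hypothetical fields):

from pydantic import BaseModel, Field

class CompanyProfile(BaseModel):
    sector: str = Field(description="GICS sector, e.g. 'Information Technology'")
    sub_sector: str = Field(description="GICS sub-sector within the sector")
    is_public: bool = Field(description="True if listed on a major stock exchange")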

Testing your prompt

Always test with job.test() before a full run:

# Test with one row
result = job.test(model, data=data)
print(result.data[0])

# Check if the output makes sense
# If not, refine your prompt and test again

For systematic testing, compare outputs across temperature settings:

for temp in [0, 0.5, 1.0]:
    m = Model(provider="openai", name="gpt-4.1-mini", params={"temperature": temp})
    result = job.test(m, data=data)
    print(f"temp={temp}: {result.data[0]}")

If results vary wildly across temperatures, your prompt may be too ambiguous. Tighten the instructions until temperature=0 and temperature=0.5 produce similar results.