
Beyond the Prompt

A system prompt is only one part of an agent's configuration. Temperature, tool access, and model selection work together with the prompt to define how an agent behaves. Getting the prompt right but the configuration wrong will produce an agent that feels broken.

Temperature

Temperature controls how randomly the model samples each next token. Lower values make output more deterministic; higher values introduce more variation.
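As a rough illustration of what the temperature setting does, this sketch uses the standard formulation: divide the model's raw scores (logits) by the temperature, then apply softmax to get sampling probabilities. The tokens and logit values are made up, and the sampling pipeline in each on-device runtime may add steps (such as top-k or top-p filtering) that are not modeled here.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw logits into probabilities, scaled by temperature.

    Temperatures below 1 sharpen the distribution (the top token dominates);
    temperatures above 1 flatten it (lower-ranked tokens get picked more often).
    """
    if temperature <= 0:
        # Temperature 0 is conventionally treated as greedy decoding:
        # all probability goes to the single highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng=random):
    """Pick one token according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


tokens = ["the", "a", "crimson"]
logits = [3.0, 2.0, 0.5]  # made-up scores for three candidate next tokens
for t in (0.2, 1.0):
    print(t, [round(p, 2) for p in softmax_with_temperature(logits, t)])
```

With these made-up logits, temperature 0.2 gives roughly `[0.99, 0.01, 0.0]`, so "the" is chosen almost every time, while temperature 1.0 gives roughly `[0.69, 0.25, 0.06]`, leaving real room for "a" and "crimson".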

Temperature | Behavior | Best For
0.0 – 0.3 | Highly deterministic | Factual lookup, classification, structured output
0.4 – 0.6 | Mostly deterministic with slight variation | Code generation, technical writing, analysis
0.7 – 0.8 | Balanced: reliable but with personality | Conversational agents, knowledge bases, support
0.9 – 1.0 | High variation, surprising word choices | Creative writing, brainstorming, storytelling
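If you keep agent presets in code, the temperature table can be captured as a small lookup. This is a hypothetical sketch: the task labels and the use of each band's midpoint as a default are assumptions for illustration, not settings the app is known to use.

```python
# Temperature bands from the temperature table, keyed by hypothetical task labels.
TEMPERATURE_BANDS = {
    "factual": (0.0, 0.3),         # factual lookup, classification, structured output
    "technical": (0.4, 0.6),       # code generation, technical writing, analysis
    "conversational": (0.7, 0.8),  # conversational agents, knowledge bases, support
    "creative": (0.9, 1.0),        # creative writing, brainstorming, storytelling
}


def default_temperature(task: str) -> float:
    """Return the midpoint of the task's band (an assumed default, not a rule)."""
    low, high = TEMPERATURE_BANDS[task]
    return round((low + high) / 2, 2)


def in_band(task: str, temperature: float) -> bool:
    """True if the temperature falls inside the recommended band for the task."""
    low, high = TEMPERATURE_BANDS[task]
    return low <= temperature <= high
```

For example, `default_temperature("technical")` returns `0.5`, and `in_band("creative", 0.2)` returns `False`.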

How Temperature Interacts with Prompts

Mismatch (Bad)

A prompt saying “be creative and expressive” at temperature 0.2 will produce bland output.

Low temperature works against the prompt's intent by strongly favoring the safest, most common tokens.

Mismatch (Bad)

A prompt saying “be precise and accurate” at temperature 1.0 will produce unreliable output.

High temperature introduces randomness that undermines precision.

Rule of thumb: Your temperature should match the freedom your prompt gives the agent.
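The two mismatches above can be caught mechanically. This is a deliberately crude, hypothetical check: it looks for the cue words from the two example prompts and compares the temperature against the edges of the lowest and highest bands in the temperature table. A real prompt would need a better signal than keyword matching.

```python
from typing import Optional

# Hypothetical cue words, taken from the two mismatch example prompts.
PRECISION_CUES = ("precise", "accurate")
CREATIVE_CUES = ("creative", "expressive")


def temperature_mismatch(prompt: str, temperature: float) -> Optional[str]:
    """Return a warning if the temperature fights the prompt's intent, else None."""
    text = prompt.lower()
    wants_precision = any(cue in text for cue in PRECISION_CUES)
    wants_creativity = any(cue in text for cue in CREATIVE_CUES)
    # 0.3 is the top of the "highly deterministic" band.
    if wants_creativity and temperature <= 0.3:
        return "creative prompt at low temperature: output will likely be bland"
    # 0.9 is the bottom of the "high variation" band.
    if wants_precision and temperature >= 0.9:
        return "precise prompt at high temperature: output will likely be unreliable"
    return None
```

Both bad examples are flagged: `temperature_mismatch("Be creative and expressive", 0.2)` and `temperature_mismatch("Be precise and accurate", 1.0)` each return a warning, while a precise prompt at 0.2 returns `None`.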

The Three Built-in Agents

@guide (temperature 0.8): Warm, conversational, but grounded in real features

@writer (temperature 1.0): Maximum creative variation

@coder (temperature 0.4): Mostly deterministic, accurate technical output

Tool Access

Tools give agents the ability to take actions: check the weather, search the web, access calendars. But enabling more tools is not always better.

The Principle

Every enabled tool is a classification candidate. When the model receives a user message, it decides whether to call a tool or respond with text. The more tools are enabled, the more chances the model has to misclassify a message.
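To make the principle concrete, here is a toy router in which every enabled tool is a candidate and any tool whose trigger words appear in the message fires. This is a hypothetical stand-in: a real model decides from the full message and the tool descriptions, not from keyword lists, and the tool names and keywords are invented. The point it illustrates holds either way: a tool that is not enabled can never be selected by mistake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    keywords: frozenset  # hypothetical trigger words for this toy router


TOOLS = (
    Tool("weather", frozenset({"weather", "storm", "stormy", "rain", "forecast"})),
    Tool("web_search", frozenset({"search", "latest", "docs"})),
    Tool("calendar", frozenset({"schedule", "meeting", "calendar"})),
)


def route(message, enabled):
    """Return names of enabled tools that would fire; [] means reply with text."""
    words = set(message.lower().replace(",", " ").replace(".", " ").split())
    # Only tools in `enabled` are ever considered as candidates.
    return [tool.name for tool in enabled if words & tool.keywords]
```

With all three tools enabled, `route("Write about a stormy night", TOOLS)` returns `["weather"]`, the kind of misfire a writing agent should never have; with no tools enabled, `route("Write about a stormy night", ())` returns `[]` and the model answers with text.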

Agent | Tools | Reasoning
@guide | All enabled | Needs to demonstrate every feature
@writer | All disabled | Should generate language, never trigger actions
@coder | Web search only | Needs current docs, nothing else
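One way to keep these choices explicit is to store each agent's settings together. The sketch uses a hypothetical `AgentConfig` type and an invented three-item tool list; the article does not say which tools @guide's "all enabled" covers, so `EXAMPLE_TOOLS` is only a placeholder. The temperatures come from the three built-in agents described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    handle: str
    temperature: float
    enabled_tools: frozenset  # names of tools this agent may call


# Placeholder tool names; the app's real tool list is not given in this article.
EXAMPLE_TOOLS = frozenset({"weather", "web_search", "calendar"})

BUILT_IN_AGENTS = {
    "@guide": AgentConfig("@guide", 0.8, EXAMPLE_TOOLS),              # all enabled
    "@writer": AgentConfig("@writer", 1.0, frozenset()),              # all disabled
    "@coder": AgentConfig("@coder", 0.4, frozenset({"web_search"})),  # web search only
}
```

Keeping the tool set as data puts the question "does each enabled tool serve this agent?" in one place that is easy to audit.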

Enable a tool when:

  • The agent's core job requires it
  • Users would naturally expect the capability
  • It enhances responses rather than distracting from them

Disable a tool when:

  • It has no relation to the agent's purpose
  • It could cause misclassification of core queries
  • The agent should focus purely on language generation

Common Mistakes

Leaving all tools enabled “just in case”

This is the most common configuration error. An agent with 11 tools enabled is more likely to misclassify requests than one with 2.

Disabling tools the agent needs

A research agent without web search, or a scheduling agent without calendar access, will frustrate users.

Enabling tools that create ambiguity

A writing agent with weather tools might interpret “Write about a stormy night” as a weather query.

Model Selection

Different models have different strengths. The model tier determines the quality ceiling for your agent.

Model | Runtime / Size | Best For
Apple Intelligence | Foundation Models | Highest quality, best reasoning, nuanced output
Gemma 3 4B | 2.5 GB, on-device | Strong general purpose, good for most agents
Gemma 3n E4B | 2.7 GB, on-device | Efficient text generation, good balance
Gemma 3n E2B | 1.5 GB, on-device | Lightweight, fast responses, simpler tasks
Qwen 2.5 0.5B | Smallest, on-device | Quick answers, limited reasoning depth

How Model Choice Affects Prompts

Smaller models typically have shorter context windows and less reasoning depth. This means:

  • Shorter prompts work better on smaller models; a 200-word system prompt might consume too much of the available context
  • Simpler directives are more reliably followed; complex conditionals may be lost
  • Explicit examples help smaller models more than abstract instructions do
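A quick way to see why prompt length matters is to estimate what share of the context window a system prompt takes up. This sketch assumes roughly 1.3 tokens per English word, a common rule of thumb rather than a property of any model listed here; for real numbers, count with the model's own tokenizer. The context window size is a parameter because the article does not state each model's limit, and the 1,024-token window in the usage line is hypothetical.

```python
def estimated_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough token count; real counts depend on each model's tokenizer."""
    return round(len(text.split()) * tokens_per_word)


def prompt_share(system_prompt: str, context_tokens: int, tokens_per_word: float = 1.3) -> float:
    """Fraction of the context window the system prompt would occupy."""
    return estimated_tokens(system_prompt, tokens_per_word) / context_tokens


prompt = "word " * 200  # stand-in for a 200-word system prompt
print(estimated_tokens(prompt))  # 260
print(f"{prompt_share(prompt, 1024):.0%} of a hypothetical 1,024-token window")
```

Under these assumptions, a 200-word prompt is about 260 tokens, roughly a quarter of a 1,024-token window, before the user has typed anything.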

Putting It All Together

The best agents have coherent configuration — every setting reinforces the same intent.

Coherent Configuration (Good)

Prompt:  Creative writing assistant
Temp:    1.0 (supports creativity)
Tools:   None (pure language generation)
Model:   Apple Intelligence (highest quality)

Every setting says “creative freedom.”

Incoherent Configuration (Bad)

Prompt:  Creative writing assistant
Temp:    0.2 (suppresses creativity)
Tools:   All enabled (may fire on writing requests)
Model:   Qwen 2.5 0.5B (too small for creative work)

The prompt says creativity, but every other setting fights it.

Configuration Checklist

1. Temperature: Does it match the freedom level in my prompt?
2. Tools: Does every enabled tool serve this agent's purpose?
3. Model: Is it capable enough for what my prompt asks?
4. Prompt length: Is it appropriate for the model tier?
5. Consistency: Do all settings reinforce the same intent?
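The checklist can also be run as code. This hypothetical sketch covers items 1 to 3 and treats an empty result as passing item 5; prompt length (item 4) is left out because the article gives no per-model limits. Two assumptions are worth flagging: the capability ranking simply follows the order of the model table, and the `freedom` label and `min_model` for each agent are judgments you supply, not values the app defines.

```python
from dataclasses import dataclass

# Models in the order the model table lists them, treated here as a rough
# capability ranking (an assumption; index 0 = most capable).
MODEL_RANK = ["Apple Intelligence", "Gemma 3 4B", "Gemma 3n E4B", "Gemma 3n E2B", "Qwen 2.5 0.5B"]

# Temperature bands from the temperature table, keyed by a hypothetical freedom label.
FREEDOM_BANDS = {
    "strict": (0.0, 0.3),
    "technical": (0.4, 0.6),
    "conversational": (0.7, 0.8),
    "creative": (0.9, 1.0),
}


@dataclass
class AgentSetup:
    freedom: str         # hypothetical label for how much freedom the prompt gives
    temperature: float
    enabled_tools: set
    needed_tools: set    # tools the agent's core job actually requires
    model: str
    min_model: str       # weakest model you judge capable of this prompt


def check_configuration(setup):
    """Return checklist problems; an empty list means the settings agree."""
    problems = []
    low, high = FREEDOM_BANDS[setup.freedom]
    if not low <= setup.temperature <= high:  # item 1: temperature
        problems.append(f"temperature {setup.temperature} outside {low}-{high} for a {setup.freedom} prompt")
    extra = setup.enabled_tools - setup.needed_tools  # item 2: tools
    if extra:
        problems.append(f"tools that do not serve the agent's purpose: {sorted(extra)}")
    if MODEL_RANK.index(setup.model) > MODEL_RANK.index(setup.min_model):  # item 3: model
        problems.append(f"{setup.model} is less capable than {setup.min_model}")
    return problems


incoherent = AgentSetup("creative", 0.2, {"weather", "web_search", "calendar"}, set(),
                        "Qwen 2.5 0.5B", "Gemma 3 4B")
coherent = AgentSetup("creative", 1.0, set(), set(), "Apple Intelligence", "Gemma 3 4B")
```

Encoding the incoherent configuration from the earlier example this way fails all three checks, while the coherent one returns an empty list.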