Raysurfer is designed for agents that chain multiple tool calls together in production workflows. If your agent only makes 1-3 tool calls per run, check "Is Raysurfer Right for You?" first.

Installation

uv add raysurfer

Set Your API Key

Get your API key from the dashboard and set it as an environment variable:
export RAYSURFER_API_KEY=your_api_key_here
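If you want your script to fail fast when the key is missing, a small check before creating a client can help. This is a minimal sketch that only assumes the key lives in the RAYSURFER_API_KEY environment variable as shown above; require_api_key is a hypothetical helper, not part of the Raysurfer SDK.

```python
import os


def require_api_key(name: str = "RAYSURFER_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    # Hypothetical helper for illustration; not a Raysurfer function.
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running your agent")
    return key
```

Calling require_api_key() at startup surfaces a missing or empty key immediately instead of partway through an agent run.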

Drop-in Replacement

Raysurfer wraps the Claude Agent SDK. Swap your client and method names:
# Before
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

options = ClaudeAgentOptions()  # add your usual settings here

# Run inside an async function
async with ClaudeSDKClient(options) as client:
    await client.query("Generate quarterly report")
    async for msg in client.receive_response():
        print(msg)

# After
from raysurfer import RaysurferClient
from claude_agent_sdk import ClaudeAgentOptions  # Options come from Claude SDK

options = ClaudeAgentOptions()  # same options object as before

# Run inside an async function
async with RaysurferClient(options) as client:
    await client.query("Generate quarterly report")
    async for msg in client.response():
        print(msg)

Key Differences

Claude SDK                   Raysurfer
ClaudeSDKClient              RaysurferClient
client.query()               client.query()
client.receive_response()    client.response()
Options (ClaudeAgentOptions) come directly from claude_agent_sdk — no Raysurfer-specific options needed.

Verification Checklist

Follow these three steps to confirm caching is working and understand the similarity threshold for your use case.

Step 1: Confirm Code Generation on First Run

Run your agent with a unique query for the first time. This run generates code from scratch via the Claude Agent SDK, so it will take the full generation time.
await client.query("Generate a quarterly revenue report for Q3 2025")
Watch the output — you should see the agent generating code, executing tools, and producing results at normal speed. This first run populates the cache.

Step 2: Confirm Cache Hit on Second Run

Run the exact same query again:
await client.query("Generate a quarterly revenue report for Q3 2025")
This time, Raysurfer pulls the cached code file instead of regenerating it. You should see the result return near-instantly because the only synchronous work is fetching the cached code — everything else runs asynchronously.
If the second run completes dramatically faster than the first, caching is working. You’re in the clear.
Run the same query using claude-agent-sdk directly (without Raysurfer) to get a baseline time, then compare it against the cached Raysurfer run to measure your exact speedup.
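The baseline comparison described above can be sketched with a small timing helper. This is an illustration rather than part of Raysurfer: time_run, speedup, and fake_run are hypothetical names, and fake_run stands in for a coroutine that calls client.query() and drains the response stream.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def time_run(run: Callable[[], Awaitable[None]]) -> float:
    """Return the wall-clock seconds taken by one awaited run."""
    start = time.perf_counter()
    await run()
    return time.perf_counter() - start


def speedup(baseline_seconds: float, cached_seconds: float) -> float:
    """Return how many times faster the cached run was than the baseline."""
    if cached_seconds <= 0:
        raise ValueError("cached_seconds must be positive")
    return baseline_seconds / cached_seconds


async def fake_run() -> None:
    # Placeholder for: await client.query(...) followed by reading
    # every message from the response stream.
    await asyncio.sleep(0.01)


if __name__ == "__main__":
    elapsed = asyncio.run(time_run(fake_run))
    print(f"run took {elapsed:.3f}s")
```

Time one run with claude-agent-sdk directly and one cached Raysurfer run, then pass both durations to speedup() to get the ratio.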

Step 3: Explore the Similarity Boundary

Now try similar but not identical queries to see how far you can drift before you stop getting cache hits. Start close and gradually move further away:
# Very similar — likely cache hit
await client.query("Generate a quarterly revenue report for Q4 2025")

# Moderately similar — may or may not hit cache
await client.query("Generate a monthly revenue summary for Q3 2025")

# Different intent — likely cache miss, generates fresh code
await client.query("Build a customer churn analysis dashboard")
For each query, note whether the response time is near-instant (cache hit) or takes the full generation time (cache miss). This tells you how semantically similar your queries need to be to reuse the same cached workflow code.
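One way to keep track of the probes is to record each query's latency and label it against your baseline. The sketch below is a hypothetical heuristic, not something Raysurfer provides: Probe and classify are made-up names, and the 0.25 ratio is an arbitrary starting point you would tune from your own measurements.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    query: str      # the prompt that was sent
    seconds: float  # measured wall-clock time for that run


def classify(
    probes: list[Probe], baseline_seconds: float, ratio: float = 0.25
) -> dict[str, str]:
    """Label a query "hit" if it ran in under ratio * baseline, else "miss"."""
    # Assumption: cache hits are much faster than full generation, so a
    # latency cutoff separates the two. The ratio is a guess to tune.
    cutoff = baseline_seconds * ratio
    return {p.query: ("hit" if p.seconds < cutoff else "miss") for p in probes}
```

Feeding it the timings from the three queries above gives a quick table of which phrasings still reuse the cached workflow code.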

Need help tuning for your use case?

Book a 15-minute call and we’ll walk through your integration together.

What Gets Cached?

Raysurfer caches:
  • Code outputs from agent tool calls
  • Generated reports, templates, and documents
  • API response patterns
  • Any structured output your agent produces
Raysurfer automatically verifies and stores successful code snippets from your runs for future reuse.

Programmatic Tool Calling

Register tools, then either pass in your own code (the primary mode) or, optionally, have code generated in the sandbox using your own provider key and prompt:
from raysurfer import AsyncRaySurfer

rs = AsyncRaySurfer(api_key="your_api_key")

@rs.tool
def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b

user_code = "result = add(5, 3)\nprint(result)"
result = await rs.execute(
    "Use add to compute 5 + 3, then print the result",
    user_code=user_code,
)
print(result.result)     # "8"
print(result.cache_hit)  # False (reserved field)
See the Programmatic Tool Calling sections of the Python SDK and TypeScript SDK references for full details.
In Python, @rs.tool can wrap any function and uses its docstring as the tool description in the schema payload. In TypeScript, tool(name, description, parameters, callback) wraps any callback and uses your description string the same way.
execute() supports two modes: user_code/userCode (primary) or optional sandbox codegen with a user-provided prompt and provider key.
If you already have an LLM tool loop and just want to wrap it with .search() + .upload(), use Wrap Any LLM Tool Flow.
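To make the docstring-as-description idea concrete, here is a toy registry showing how a decorator can derive a schema entry from a function's signature and docstring. It is a sketch under assumptions, not Raysurfer's implementation: ToolRegistry and the schema layout are invented for illustration, and the real payload format may differ.

```python
import inspect
from typing import Any, Callable


class ToolRegistry:
    """Toy registry for illustration only; not Raysurfer's implementation."""

    def __init__(self) -> None:
        self.schemas: dict[str, dict[str, Any]] = {}
        self.callables: dict[str, Callable[..., Any]] = {}

    def tool(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Register fn and build a schema entry from its signature and docstring."""
        sig = inspect.signature(fn)
        self.schemas[fn.__name__] = {
            # The docstring becomes the tool description.
            "description": inspect.getdoc(fn) or "",
            # Map each parameter name to the name of its annotated type.
            "parameters": {
                name: getattr(p.annotation, "__name__", str(p.annotation))
                for name, p in sig.parameters.items()
            },
        }
        self.callables[fn.__name__] = fn
        return fn  # the function stays directly callable


registry = ToolRegistry()


@registry.tool
def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b
```

Because the decorator returns the original function, add(5, 3) still works as a normal call while registry.schemas["add"] holds the description and parameter types.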

Next Steps

Python SDK

Full Python SDK reference and examples

TypeScript SDK

Full TypeScript SDK reference and examples

How It Works

Learn about proven code retrieval and reputation scoring

Is It Right for Me?

Check if your agent workflow is a good fit for Raysurfer