Cat breed detector with Llama 3.2 Vision on Cloudflare
Step-by-step tutorial: build a Cloudflare Worker that runs Llama 3.2 11B Vision on Workers AI and returns structured JSON cat-breed predictions.
In this tutorial, you will build a cat breed detector with Meta’s Llama 3.2 11B Vision Instruct model on Cloudflare Workers AI.
The 11B model strikes a good balance between accuracy and performance for focused vision tasks. For applications like breed identification, where speed and predictable pricing matter more than multi-step reasoning, a smaller specialised model running at the edge can be a better fit than a larger general-purpose one.
What you’ll build
- A Cloudflare Worker that fetches a random cat image and identifies the breed
- Vision inference at the edge using Llama 3.2 11B Vision Instruct
- Structured JSON output enforced with a strict JSON Schema
- A deployment on Cloudflare’s global network with predictable per-call pricing
Tutorial setup: 4 prerequisites
- A Cloudflare account: sign up at cloudflare.com if you do not have one.
- Workers AI access: accept the Workers AI Terms of Service in your Cloudflare dashboard.
- Llama 3.2 Vision model access: accept the terms of service for @cf/meta/llama-3.2-11b-vision-instruct in the Workers AI catalog.
- Node.js: download the LTS version from nodejs.org.
Step-by-step guide
Step 1: Create a new Worker project
Create a new Worker project named cat-breed-detector:
npm create cloudflare@latest cat-breed-detector
When prompted:
- Select Hello World Example for the template
- Choose Worker only for the deployment target
- Select TypeScript for the language
This creates a new directory with a basic Worker project structure.
Step 2: Configure the AI binding
To use Workers AI from your Worker, add an AI binding to wrangler.jsonc.
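Open wrangler.jsonc and add the following as a top-level key below the observability section. Make sure the preceding entry ends with a comma, because keys in the file must be comma-separated:
"ai": {
  "binding": "AI"
}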
This binding exposes the Workers AI runtime to your Worker through env.AI.
Next, from inside the cat-breed-detector directory, generate TypeScript types for your Worker bindings:
npx wrangler types
This creates a worker-configuration.d.ts file with type definitions for your environment bindings.
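To make the role of env.AI concrete, here is a minimal sketch of calling a binding through a narrow interface. The names AiLike, EnvLike, describeImage, and createStubAi are hypothetical and exist only for illustration; in the real Worker, the generated types describe the binding. The sketch assumes only what this tutorial uses: a run method that takes a model ID and an inputs object.

```typescript
// Hypothetical minimal shape of the AI binding, for illustration only.
// In a real project the binding type comes from worker-configuration.d.ts.
interface AiLike {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface EnvLike {
  AI: AiLike;
}

// Calls the binding the way the Worker does: a model ID plus an inputs object.
async function describeImage(env: EnvLike, prompt: string): Promise<unknown> {
  return env.AI.run('@cf/meta/llama-3.2-11b-vision-instruct', {
    messages: [{ role: 'user', content: prompt }],
  });
}

// A stub binding that records the requested model instead of running inference.
function createStubAi(canned: unknown): AiLike & { calls: string[] } {
  const calls: string[] = [];
  return {
    calls,
    async run(model: string): Promise<unknown> {
      calls.push(model);
      return canned;
    },
  };
}
```

Passing a stub such as createStubAi({ breed: 'Test' }) in place of the real binding lets you exercise handler logic locally without making an inference call.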
Step 3: Start the development server
If you are not already in the project directory, navigate to it, then start the dev server:
cd cat-breed-detector
npm run start
Open the localhost URL shown in your terminal (usually http://localhost:8787). You should see “Hello World!” in your browser.
Step 4: Implement the cat breed detector
Replace the contents of src/index.ts with:
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Fetch a random cat image and fail fast if the image service is unavailable
    const res = await fetch('https://cataas.com/cat');
    if (!res.ok) {
      return new Response('Could not fetch a cat image', { status: 502 });
    }
    const blob = await res.arrayBuffer();
    // Use the image type reported by the server, falling back to JPEG
    const mimeType = res.headers.get('content-type') ?? 'image/jpeg';

    // Convert array buffer to base64 data URL in chunks to avoid stack overflow
    const uint8Array = new Uint8Array(blob);
    let binaryString = '';
    const chunkSize = 8192;
    for (let i = 0; i < uint8Array.length; i += chunkSize) {
      // subarray() returns a view, so the chunk is not copied
      const chunk = uint8Array.subarray(i, i + chunkSize);
      binaryString += String.fromCharCode(...chunk);
    }
    const base64 = btoa(binaryString);
    const dataUrl = `data:${mimeType};base64,${base64}`;

    // A system message sets the behaviour; the user message carries text and image
    const messages = [
      {
        role: 'system',
        content:
          'You are a cat breed expert assistant. You must respond with valid JSON only, matching the provided schema exactly.',
      },
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'Analyze this image and identify the cat breed. Respond with a JSON object containing: breed (string), confidence (one of: high, medium, low), and description (string with brief description of the cat).',
          },
          { type: 'image_url', image_url: { url: dataUrl } },
        ],
      },
    ];

    // Run inference and constrain the output to the schema below
    const response = await env.AI.run('@cf/meta/llama-3.2-11b-vision-instruct', {
      messages,
      max_tokens: 512,
      response_format: {
        type: 'json_schema',
        json_schema: {
          name: 'cat_breed_analysis',
          strict: true,
          schema: {
            type: 'object',
            properties: {
              breed: {
                type: 'string',
                description: 'The cat breed identified in the image',
              },
              confidence: {
                type: 'string',
                enum: ['high', 'medium', 'low'],
                description: 'Confidence level of the breed identification',
              },
              description: {
                type: 'string',
                description: 'A brief description of the cat in the image',
              },
            },
            required: ['breed', 'confidence', 'description'],
            additionalProperties: false,
          },
        },
      },
    });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
The Worker fetches a random cat image from cataas.com/cat and converts it to a base64 data URL in 8192-byte chunks. Chunking matters because spreading an entire image into a single String.fromCharCode call passes one argument per byte, which can exceed the engine's argument limit and overflow the call stack. The Worker then sends the image and a text prompt to Llama 3.2 Vision using the Messages API, and uses response_format to require the model to answer in a specific JSON shape.
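The chunked conversion described above can be isolated into a small helper. This is a minimal sketch: the function name toBase64DataUrl and its mimeType parameter are hypothetical, and it relies on the btoa global, which Workers and recent Node.js versions both provide.

```typescript
// Encode bytes as a base64 data URL, building the binary string chunk by chunk.
function toBase64DataUrl(bytes: Uint8Array, mimeType: string, chunkSize = 8192): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // subarray() creates a view over the same memory, so no bytes are copied.
    const chunk = bytes.subarray(i, i + chunkSize);
    // Spreading at most `chunkSize` values keeps the argument count small.
    binary += String.fromCharCode(...chunk);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Usage: the five bytes of "hello" encoded as a text/plain data URL.
const sample = new Uint8Array([104, 101, 108, 108, 111]);
console.log(toBase64DataUrl(sample, 'text/plain')); // data:text/plain;base64,aGVsbG8=
```

Taking the MIME type as a parameter avoids hard-coding JPEG when the image source returns another format.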
The Messages API
Llama 3.2 Vision uses the chat-based Messages API format, like other modern LLMs. The messages array contains:
- A system message that sets the model’s behaviour
- A user message with both text and image content
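The two-message structure can be written down as types plus a small builder. This is an illustrative sketch: the type names ContentPart and ChatMessage and the helper buildMessages are hypothetical, and they cover only the message shapes used in this tutorial.

```typescript
// Content parts used in this tutorial: plain text and an image passed by URL.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

// A system message carries a plain string; the user message carries parts.
type ChatMessage =
  | { role: 'system'; content: string }
  | { role: 'user'; content: ContentPart[] };

// Hypothetical helper: assembles the conversation for a single image.
function buildMessages(dataUrl: string, question: string): ChatMessage[] {
  return [
    {
      role: 'system',
      content: 'You are a cat breed expert assistant. Respond with valid JSON only.',
    },
    {
      role: 'user',
      content: [
        { type: 'text', text: question },
        { type: 'image_url', image_url: { url: dataUrl } },
      ],
    },
  ];
}
```

Keeping the text part and the image part in the same user message is what ties the question to the image.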
JSON Schema mode
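The response_format parameter requests structured output. With strict: true, the model’s response is constrained to the schema: breed, confidence, and description, with confidence restricted to high, medium, or low. This removes most of the need to parse free-form text. Treat it as a strong constraint rather than an absolute guarantee, though: if the model cannot satisfy the schema, the request can fail with an error, so handle that case in production code.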
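If you want a defensive check on top of schema enforcement, a runtime type guard that mirrors the schema is cheap to add. The following is a sketch only: the names CatBreedAnalysis and isCatBreedAnalysis are hypothetical, and the guard simply restates the three required fields and the confidence enum from the schema above.

```typescript
type Confidence = 'high' | 'medium' | 'low';

interface CatBreedAnalysis {
  breed: string;
  confidence: Confidence;
  description: string;
}

const CONFIDENCE_LEVELS: readonly string[] = ['high', 'medium', 'low'];

// Returns true only for objects with exactly the three schema fields.
function isCatBreedAnalysis(value: unknown): value is CatBreedAnalysis {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  const confidence = record['confidence'];
  return (
    // Mirrors additionalProperties: false by rejecting extra keys.
    Object.keys(record).length === 3 &&
    typeof record['breed'] === 'string' &&
    typeof confidence === 'string' &&
    CONFIDENCE_LEVELS.indexOf(confidence) !== -1 &&
    typeof record['description'] === 'string'
  );
}
```

A handler could call this guard on the model output and return an error response when it fails, instead of forwarding an unexpected payload to clients.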
Step 5: Deploy to Cloudflare
Deploy your Worker to Cloudflare’s global network:
npx wrangler deploy
Wrangler prints the URL where your Worker is live, for example https://cat-breed-detector.your-subdomain.workers.dev.
Verification / Testing
With the dev server still running, refresh your browser. You should see a JSON response similar to:
{
"breed": "Domestic Shorthair",
"confidence": "medium",
"description": "An orange tabby cat with distinctive striped markings"
}
Each refresh fetches a different cat image, so the breed and description will change. The shape stays the same because the schema constrains it.
After deploying, hit the *.workers.dev URL Wrangler printed and confirm you get the same JSON shape from the edge.
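The manual check above can also be scripted. The sketch below is hypothetical throughout: smokeTest, hasExpectedShape, and FetchLike are illustrative names, and the fetch function is injected so that the check can be exercised without network access. It also assumes the three fields may arrive either at the top level or nested under a response key, depending on how the model output is wrapped.

```typescript
// Minimal slice of the fetch API that the check needs, so a fake can be injected.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

const EXPECTED_KEYS = ['breed', 'confidence', 'description'];

// True when every expected key is present and holds a string.
function hasExpectedShape(value: unknown): boolean {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return EXPECTED_KEYS.every((key) => typeof record[key] === 'string');
}

// Hypothetical smoke test: resolves to true when the endpoint returns the shape.
async function smokeTest(url: string, fetchImpl: FetchLike): Promise<boolean> {
  const res = await fetchImpl(url);
  if (!res.ok) return false;
  const body = await res.json();
  if (hasExpectedShape(body)) return true;
  // Assumption: the payload may be nested under a `response` key.
  if (typeof body === 'object' && body !== null) {
    return hasExpectedShape((body as Record<string, unknown>)['response']);
  }
  return false;
}
```

To run it against a live deployment, pass the URL Wrangler printed together with the global fetch function, for example smokeTest(url, fetch).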
What you learned
- How to wire a Cloudflare Worker to Workers AI with the AI binding
- How to send images to a vision model using the Messages API format
- How to enforce a strict JSON shape on the response with JSON Schema mode