Why use the 11B Vision model instead of a larger one?

For focused vision tasks like breed identification, the 11B model offers a good balance of accuracy, latency, and predictable per-call pricing. When speed and cost matter more than multi-step reasoning, a smaller model running at the edge can be a more practical choice than a larger one.

Why base64-encode the image in chunks?

Spreading an entire Uint8Array into a single String.fromCharCode call passes every byte as a separate argument, which can exceed the engine’s argument limit and throw a RangeError on larger images. Encoding in 8192-byte chunks avoids the error while producing an identical result.

How does JSON Schema mode constrain the response shape?

Setting response_format with strict mode asks the model to emit JSON matching the supplied schema. The breed, confidence, and description fields are required, and confidence is limited to high, medium, or low. It is still worth handling the case where the request fails because the model could not satisfy the schema.

Cat breed detector with Llama 3.2 Vision on Cloudflare

Step-by-step tutorial: build a Cloudflare Worker that runs Llama 3.2 11B Vision on Workers AI and returns structured JSON cat-breed predictions.

November 1, 2025 · Updated May 9, 2026 · 6 min read · Koppelvlak

In this tutorial you build a cat breed detector with Meta’s Llama 3.2 11B Vision Instruct model on Cloudflare Workers AI.

The 11B model strikes a good balance between accuracy and performance for focused vision tasks. For applications like breed identification, where speed and predictable pricing matter more than multi-step reasoning, a smaller model running at the edge can be a more practical choice than a larger one.

What you’ll build

A Cloudflare Worker that fetches a random cat image and identifies the breed
Vision inference at the edge using Llama 3.2 11B Vision Instruct
Structured JSON output enforced with a strict JSON Schema
A deployment on Cloudflare’s global network with predictable per-call pricing

Tutorial setup: 4 prerequisites

Requirements

A Cloudflare account: sign up at cloudflare.com if you do not have one.
Workers AI access: accept the Workers AI Terms of Service in your Cloudflare dashboard.
Llama 3.2 Vision model access: accept the terms of service for @cf/meta/llama-3.2-11b-vision-instruct in the Workers AI catalog.
Node.js: download the LTS version from nodejs.org.

Step-by-step guide

Step 1: Create a new Worker project

Create a new Worker project named cat-breed-detector:

npm create cloudflare@latest cat-breed-detector

When prompted:

Select Hello World Example for the template
Choose Worker only for the deployment target
Select TypeScript for the language

This creates a new directory with a basic Worker project structure.

Step 2: Configure the AI binding

To use Workers AI from your Worker, add an AI binding to wrangler.jsonc.

Open wrangler.jsonc and add the following below the observability section. Make sure the preceding section’s closing brace is followed by a comma so the file stays valid:

"ai": {
  "binding": "AI"
}

This binding exposes the Workers AI runtime to your Worker through env.AI.

Next, generate TypeScript types for your Worker bindings. Run this from inside the cat-breed-detector directory:

npx wrangler types

This creates a worker-configuration.d.ts file with type definitions for your environment bindings.

Step 3: Start the development server

If you are not already in your project directory, navigate to it, then start the dev server:

cd cat-breed-detector
npm run start

Open the localhost URL shown in your terminal (usually http://localhost:8787). You should see “Hello World!” in your browser.

Step 4: Implement the cat breed detector

Replace the contents of src/index.ts with:

export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
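    // Fetch a random cat image and fail fast if the upstream request fails
    const res = await fetch('https://cataas.com/cat');
    if (!res.ok) {
      return new Response('Failed to fetch cat image', { status: 502 });
    }
    const buffer = await res.arrayBuffer();
    // Use the MIME type reported by the image service, falling back to JPEG
    const mimeType = res.headers.get('content-type') ?? 'image/jpeg';

    // Convert the bytes to base64 in chunks: spreading the whole array into
    // String.fromCharCode can exceed the engine's argument limit on large images
    const bytes = new Uint8Array(buffer);
    let binaryString = '';
    const chunkSize = 8192;
    for (let i = 0; i < bytes.length; i += chunkSize) {
      // subarray returns a view, so each chunk is read without copying
      binaryString += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
    }
    const base64 = btoa(binaryString);
    const dataUrl = `data:${mimeType};base64,${base64}`;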

    const messages = [
      {
        role: 'system',
        content:
          'You are a cat breed expert assistant. You must respond with valid JSON only, matching the provided schema exactly.',
      },
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'Analyze this image and identify the cat breed. Respond with a JSON object containing: breed (string), confidence (one of: high, medium, low), and description (string with brief description of the cat).',
          },
          { type: 'image_url', image_url: { url: dataUrl } },
        ],
      },
    ];

    const response = await env.AI.run('@cf/meta/llama-3.2-11b-vision-instruct', {
      messages,
      max_tokens: 512,
      response_format: {
        type: 'json_schema',
        json_schema: {
          name: 'cat_breed_analysis',
          strict: true,
          schema: {
            type: 'object',
            properties: {
              breed: {
                type: 'string',
                description: 'The cat breed identified in the image',
              },
              confidence: {
                type: 'string',
                enum: ['high', 'medium', 'low'],
                description: 'Confidence level of the breed identification',
              },
              description: {
                type: 'string',
                description: 'A brief description of the cat in the image',
              },
            },
            required: ['breed', 'confidence', 'description'],
            additionalProperties: false,
          },
        },
      },
    });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
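
The chunked base64 conversion in the handler can also be pulled out into small helpers, which makes it easier to test on its own. The sketch below is one way to do that. The names toBase64 and toDataUrl are hypothetical and not part of any Cloudflare API, and it assumes a runtime with a global btoa (Workers and recent Node.js versions have one).

```typescript
// Hypothetical helper: encode bytes as base64 without spreading the whole
// array into a single String.fromCharCode call.
function toBase64(bytes: Uint8Array, chunkSize = 8192): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // Each call receives at most chunkSize arguments, which stays well
    // below the engine's argument limit.
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// Hypothetical helper: wrap the base64 payload in a data URL.
// The default MIME type is an assumption; pass the real one when known.
function toDataUrl(bytes: Uint8Array, mimeType = 'image/jpeg'): string {
  return `data:${mimeType};base64,${toBase64(bytes)}`;
}
```

With helpers like these, the handler body would reduce to a single toDataUrl(new Uint8Array(buffer), mimeType) call.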

The Messages API

Llama 3.2 Vision uses the chat-based Messages API format, like other modern LLMs. The messages array contains:

A system message that sets the model’s behaviour
A user message with both text and image content
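
To make the two-message structure explicit, here is a minimal sketch that builds the same array from a data URL and a question. The type names and the buildMessages function are illustrative assumptions that mirror the shapes used in the handler. They are not official Workers AI types.

```typescript
// Illustrative types for the message shapes used in this tutorial.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image_url'; image_url: { url: string } };
type ChatMessage =
  | { role: 'system'; content: string }
  | { role: 'user'; content: Array<TextPart | ImagePart> };

// Hypothetical helper: build the system + user message pair for one image.
function buildMessages(dataUrl: string, question: string): ChatMessage[] {
  return [
    {
      // The system message sets the model's behaviour.
      role: 'system',
      content:
        'You are a cat breed expert assistant. You must respond with valid JSON only, matching the provided schema exactly.',
    },
    {
      // The user message carries both the text prompt and the image.
      role: 'user',
      content: [
        { type: 'text', text: question },
        { type: 'image_url', image_url: { url: dataUrl } },
      ],
    },
  ];
}
```

Keeping the message construction in one function makes it simple to change the prompt without touching the inference call.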

JSON Schema mode

The response_format parameter requests structured output. With strict: true, the model’s output is constrained to the schema: breed, confidence, and description, with confidence restricted to high, medium, or low. This removes most free-form text parsing, but it is still worth handling the case where the request fails because the model could not satisfy the schema.
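
Even with a schema in place, a small runtime check before using the result is a cheap safeguard. The sketch below mirrors the three schema fields in a TypeScript type guard. CatBreedResult, isCatBreedResult, and parseCatBreedResult are hypothetical names introduced for illustration, and the guard is deliberately lenient: it does not reject extra properties.

```typescript
type Confidence = 'high' | 'medium' | 'low';

interface CatBreedResult {
  breed: string;
  confidence: Confidence;
  description: string;
}

const CONFIDENCE_LEVELS: readonly string[] = ['high', 'medium', 'low'];

// Returns true when the value has the three fields the schema requires.
function isCatBreedResult(value: unknown): value is CatBreedResult {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const confidence = v['confidence'];
  return (
    typeof v['breed'] === 'string' &&
    typeof confidence === 'string' &&
    CONFIDENCE_LEVELS.indexOf(confidence) !== -1 &&
    typeof v['description'] === 'string'
  );
}

// Accepts either an already-parsed object or a JSON string and returns
// null when the input is not valid JSON or does not match the shape.
function parseCatBreedResult(raw: unknown): CatBreedResult | null {
  let candidate: unknown = raw;
  if (typeof candidate === 'string') {
    try {
      candidate = JSON.parse(candidate);
    } catch (_err) {
      return null;
    }
  }
  return isCatBreedResult(candidate) ? candidate : null;
}
```

A null result can then be mapped to an error response instead of passing an unexpected shape on to clients.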

Step 5: Deploy to Cloudflare

Deploy your Worker to Cloudflare’s global network:

npx wrangler deploy

Wrangler prints the URL where your Worker is live, for example https://cat-breed-detector.your-subdomain.workers.dev.

Verification / Testing

With the dev server still running, refresh your browser. You should see a JSON response similar to:

{
  "breed": "Domestic Shorthair",
  "confidence": "medium",
  "description": "An orange tabby cat with distinctive striped markings"
}

Each refresh fetches a different cat image, so the breed and description will change, while the schema keeps the shape of the response the same.

After deploying, hit the *.workers.dev URL Wrangler printed and confirm you get the same JSON shape from the edge.
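
The manual check can also be scripted. The following is a rough sketch of a smoke test, not an official tool: smokeTest and FetchLike are hypothetical names, and the fetch function is passed in so the check can run against a stub. It also assumes the JSON may arrive wrapped in a top-level response field and unwraps it if so. Compare that with the output you actually see.

```typescript
// Minimal fetch-like signature so the check can be run against a stub.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Returns a list of problems; an empty list means the shape looks right.
async function smokeTest(url: string, fetchImpl: FetchLike): Promise<string[]> {
  const res = await fetchImpl(url);
  if (!res.ok) return [`HTTP ${res.status}`];

  const body = await res.json();
  const outer = (typeof body === 'object' && body !== null ? body : {}) as Record<string, unknown>;
  // Assumption: unwrap a top-level `response` object if one is present.
  const inner = outer['response'];
  const payload = (typeof inner === 'object' && inner !== null ? inner : outer) as Record<string, unknown>;

  const problems: string[] = [];
  for (const key of ['breed', 'confidence', 'description']) {
    if (typeof payload[key] !== 'string') {
      problems.push(`missing or non-string field: ${key}`);
    }
  }
  return problems;
}
```

In a runtime with a global fetch, calling smokeTest with the URL Wrangler printed and fetch as the second argument should resolve to an empty array when the Worker is healthy.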

What you learned

How to wire a Cloudflare Worker to Workers AI with the AI binding
How to send images to a vision model using the Messages API format
How to enforce a strict JSON shape on the response with JSON Schema mode

#workers-ai #llama #vision-models
