
Cat breed detector with Llama 3.2 Vision on Cloudflare

Step-by-step tutorial: build a Cloudflare Worker that runs Llama 3.2 11B Vision on Workers AI and returns structured JSON cat-breed predictions.

6 min read · Koppelvlak

In this tutorial you build a cat breed detector with Meta’s Llama 3.2 11B Vision Instruct model on Cloudflare Workers AI.

The 11B model offers a practical balance between accuracy and speed for focused vision tasks. For applications like breed identification, where latency and predictable cost matter more than multi-step reasoning, a smaller model running at the edge can be a better fit than a much larger general-purpose one.

What you’ll build

  • A Cloudflare Worker that fetches a random cat image and identifies the breed
  • Vision inference at the edge using Llama 3.2 11B Vision Instruct
  • Structured JSON output enforced with a strict JSON Schema
  • A deployment on Cloudflare’s global network with usage-based Workers AI pricing
Requirements
  • A Cloudflare account: sign up at cloudflare.com if you do not have one.
  • Workers AI access: accept the Workers AI Terms of Service in your Cloudflare dashboard.
  • Llama 3.2 Vision model access: accept the terms of service for @cf/meta/llama-3.2-11b-vision-instruct in the Workers AI catalog.
  • Node.js: download the LTS version from nodejs.org.

Step-by-step guide

Step 1: Create a new Worker project

Create a new Worker project named cat-breed-detector:

npm create cloudflare@latest cat-breed-detector

When prompted:

  • Select Hello World Example for the template
  • Choose Worker only for the deployment target
  • Select TypeScript for the language

This creates a new directory with a basic Worker project structure.

Step 2: Configure the AI binding

To use Workers AI from your Worker, add an AI binding to wrangler.jsonc.

Open wrangler.jsonc and add the following as a top-level key, for example below the observability section (add a comma after the preceding entry if needed):

"ai": {
  "binding": "AI"
}

This binding exposes the Workers AI runtime to your Worker through env.AI.

Next, from inside the cat-breed-detector directory, generate TypeScript types for your Worker bindings:

npx wrangler types

This creates a worker-configuration.d.ts file with type definitions for your environment bindings.
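Calling the real binding from tests incurs Workers AI usage and returns a different answer on every run. A common alternative is to pass a stub in place of env.AI. The sketch below assumes your code only ever calls env.AI.run(model, inputs); AiLike, RunCall, and MockAi are hypothetical names for illustration, not part of Workers AI or Wrangler.

```typescript
// Hypothetical minimal shape of the AI binding, covering only the run() call
// this tutorial uses; the Ai type generated by wrangler types is far richer.
interface AiLike {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// One recorded call, so tests can check the model name and inputs.
interface RunCall {
  model: string;
  inputs: Record<string, unknown>;
}

// Stub binding that returns a canned result instead of calling Workers AI.
class MockAi implements AiLike {
  readonly calls: RunCall[] = [];
  private readonly cannedResult: unknown;

  constructor(cannedResult: unknown) {
    this.cannedResult = cannedResult;
  }

  async run(model: string, inputs: Record<string, unknown>): Promise<unknown> {
    this.calls.push({ model, inputs });
    return this.cannedResult;
  }
}

// Usage: an env-like object whose AI property is the stub.
const mockEnv = {
  AI: new MockAi({
    breed: 'Siamese',
    confidence: 'high',
    description: 'A cream-coloured cat with dark points',
  }),
};
```

Because the Worker's Env declares AI as the full Ai type, passing this stub into the fetch handler would need a cast such as mockEnv as unknown as Env.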

Step 3: Start the development server

If you are not already there, navigate to your project directory and start the dev server:

cd cat-breed-detector
npm run start

Open the localhost URL shown in your terminal (usually http://localhost:8787). You should see “Hello World!” in your browser.

Step 4: Implement the cat breed detector

Replace the contents of src/index.ts with:

export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
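    // Fetch a random cat image from cataas.com
    const res = await fetch('https://cataas.com/cat');
    if (!res.ok) {
      return new Response(`Failed to fetch cat image: ${res.status}`, { status: 502 });
    }
    // Use the MIME type the server reports (without parameters) instead of assuming JPEG
    const contentType = (res.headers.get('content-type') ?? 'image/jpeg').split(';')[0].trim();
    const imageBuffer = await res.arrayBuffer();

    // Convert the bytes to a base64 data URL in chunks; spreading the whole
    // array into String.fromCharCode at once can overflow the call stack
    const uint8Array = new Uint8Array(imageBuffer);
    let binaryString = '';
    const chunkSize = 8192;
    for (let i = 0; i < uint8Array.length; i += chunkSize) {
      // subarray() creates a view without copying the bytes
      const chunk = uint8Array.subarray(i, i + chunkSize);
      binaryString += String.fromCharCode(...chunk);
    }
    const base64 = btoa(binaryString);
    const dataUrl = `data:${contentType};base64,${base64}`;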

    const messages = [
      {
        role: 'system',
        content:
          'You are a cat breed expert assistant. You must respond with valid JSON only, matching the provided schema exactly.',
      },
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'Analyze this image and identify the cat breed. Respond with a JSON object containing: breed (string), confidence (one of: high, medium, low), and description (string with brief description of the cat).',
          },
          { type: 'image_url', image_url: { url: dataUrl } },
        ],
      },
    ];

    const response = await env.AI.run('@cf/meta/llama-3.2-11b-vision-instruct', {
      messages,
      max_tokens: 512,
      response_format: {
        type: 'json_schema',
        json_schema: {
          name: 'cat_breed_analysis',
          strict: true,
          schema: {
            type: 'object',
            properties: {
              breed: {
                type: 'string',
                description: 'The cat breed identified in the image',
              },
              confidence: {
                type: 'string',
                enum: ['high', 'medium', 'low'],
                description: 'Confidence level of the breed identification',
              },
              description: {
                type: 'string',
                description: 'A brief description of the cat in the image',
              },
            },
            required: ['breed', 'confidence', 'description'],
            additionalProperties: false,
          },
        },
      },
    });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
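The chunked base64 conversion in the Worker can be pulled out into a small standalone helper, which also makes it easy to test outside the Workers runtime. A sketch; the name arrayBufferToDataUrl is hypothetical, and it assumes btoa is available as a global (it is in Workers and in Node.js 16 and later).

```typescript
// Convert raw bytes to a base64 data URL.
// Bytes are turned into a "binary string" in fixed-size chunks, because
// spreading a multi-megabyte array into String.fromCharCode in one call
// can exceed the engine's argument or stack limits.
function arrayBufferToDataUrl(buffer: ArrayBuffer, mimeType: string, chunkSize = 8192): string {
  const bytes = new Uint8Array(buffer);
  const parts: string[] = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // subarray() creates a view over the same memory, no copy
    const chunk = bytes.subarray(i, i + chunkSize);
    // Each byte (0-255) maps to one character, which btoa accepts
    parts.push(String.fromCharCode(...chunk));
  }
  return `data:${mimeType};base64,${btoa(parts.join(''))}`;
}
```

Inside the Worker, the conversion block would then collapse to a single call such as arrayBufferToDataUrl(imageBuffer, contentType).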

The Messages API

Llama 3.2 Vision uses the chat-based Messages API format, like other modern LLMs. The messages array contains:

  • A system message that sets the model’s behaviour
  • A user message whose content is an array of two parts: a text part with the instructions and an image_url part carrying the base64 data URL
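Writing the message shapes down as TypeScript types makes the structure of the user message explicit and lets the compiler catch typos in part names. A sketch; TextPart, ImageUrlPart, ChatMessage, and buildVisionMessages are hypothetical names that mirror the object literals in src/index.ts, not types exported by Cloudflare.

```typescript
// Hypothetical types mirroring the messages array used in src/index.ts.
type TextPart = { type: 'text'; text: string };
type ImageUrlPart = { type: 'image_url'; image_url: { url: string } };

type ChatMessage =
  | { role: 'system'; content: string }
  | { role: 'user'; content: string | Array<TextPart | ImageUrlPart> };

// Build a system message plus a user message that pairs a question with one image.
function buildVisionMessages(
  systemPrompt: string,
  question: string,
  imageDataUrl: string,
): ChatMessage[] {
  return [
    { role: 'system', content: systemPrompt },
    {
      role: 'user',
      content: [
        { type: 'text', text: question },
        { type: 'image_url', image_url: { url: imageDataUrl } },
      ],
    },
  ];
}
```

Keeping the text part before the image part matches the order used in the Worker.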

JSON Schema mode

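The response_format parameter requests structured output that follows the schema: breed, confidence, and description, with confidence restricted to high, medium, or low. With strict: true and additionalProperties: false you normally get exactly those three fields and do not need to parse free-form text. The Workers AI documentation notes, however, that a model is not guaranteed to satisfy every schema; when it cannot, the call returns an error, so production code should handle that case.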
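Even with response_format set, a cheap runtime check before trusting the output is a reasonable defensive habit, and it gives the rest of your code a typed value. A sketch; CatBreedAnalysis, isCatBreedAnalysis, and parseCatBreedAnalysis are hypothetical names, and accepting either a JSON string or an already-parsed object is an assumption about how the model output may arrive.

```typescript
// Mirrors the JSON Schema sent in response_format.
type Confidence = 'high' | 'medium' | 'low';

interface CatBreedAnalysis {
  breed: string;
  confidence: Confidence;
  description: string;
}

const CONFIDENCE_LEVELS: readonly string[] = ['high', 'medium', 'low'];

// Runtime check that a value has exactly the fields the schema requires.
function isCatBreedAnalysis(value: unknown): value is CatBreedAnalysis {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) return false;
  const record = value as Record<string, unknown>;
  // additionalProperties: false, so exactly three keys are allowed
  if (Object.keys(record).length !== 3) return false;
  return (
    typeof record.breed === 'string' &&
    typeof record.confidence === 'string' &&
    CONFIDENCE_LEVELS.includes(record.confidence) &&
    typeof record.description === 'string'
  );
}

// Accepts a JSON string or a parsed object; returns null if it does not match.
function parseCatBreedAnalysis(raw: unknown): CatBreedAnalysis | null {
  let value: unknown = raw;
  if (typeof raw === 'string') {
    try {
      value = JSON.parse(raw);
    } catch {
      return null;
    }
  }
  return isCatBreedAnalysis(value) ? value : null;
}
```

In the Worker you might pass the model output to parseCatBreedAnalysis and return an error status such as 502 when it yields null; check what env.AI.run actually returns for this model before deciding which field to pass.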

Step 5: Deploy to Cloudflare

Deploy your Worker to Cloudflare’s global network:

npx wrangler deploy

Wrangler prints the URL where your Worker is live, for example https://cat-breed-detector.your-subdomain.workers.dev.

Verification / Testing

With the dev server still running, refresh your browser. You should see a JSON response similar to:

{
  "breed": "Domestic Shorthair",
  "confidence": "medium",
  "description": "An orange tabby cat with distinctive striped markings"
}

Each refresh fetches a different cat image, so the breed and description will change, while the schema keeps the JSON shape the same.

After deploying, hit the *.workers.dev URL Wrangler printed and confirm you get the same JSON shape from the edge.
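A short script can run the same check against either the local or the workers.dev URL without eyeballing the browser. A sketch; smokeTestCatEndpoint is a hypothetical helper, it assumes the top-level breed, confidence, and description fields shown in the sample response, and its fetchImpl parameter exists only so a stub can replace the network in tests.

```typescript
// Any function with fetch's basic call shape; the global fetch qualifies.
type FetchLike = (url: string) => Promise<Response>;

const EXPECTED_CONFIDENCE = ['high', 'medium', 'low'];

// Returns a list of problems; an empty list means every check passed.
async function smokeTestCatEndpoint(url: string, fetchImpl: FetchLike = fetch): Promise<string[]> {
  const problems: string[] = [];
  const res = await fetchImpl(url);
  if (res.status !== 200) problems.push(`expected status 200, got ${res.status}`);

  const contentType = res.headers.get('content-type') ?? '';
  if (!contentType.includes('application/json')) {
    problems.push(`expected a JSON content-type, got "${contentType}"`);
  }

  let body: unknown;
  try {
    body = await res.json();
  } catch {
    problems.push('body is not valid JSON');
    return problems;
  }

  // Assumes the top-level shape shown in the sample response
  const record = (typeof body === 'object' && body !== null ? body : {}) as Record<string, unknown>;
  if (typeof record.breed !== 'string') problems.push('missing string field "breed"');
  if (typeof record.description !== 'string') problems.push('missing string field "description"');
  if (!EXPECTED_CONFIDENCE.includes(record.confidence as string)) {
    problems.push(`unexpected confidence value: ${String(record.confidence)}`);
  }
  return problems;
}
```

With a TypeScript runner such as tsx you could call smokeTestCatEndpoint with http://localhost:8787 or your workers.dev URL and print the returned list.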

What you learned

  • How to wire a Cloudflare Worker to Workers AI with the AI binding
  • How to send images to a vision model using the Messages API format
  • How to enforce a strict JSON shape on the response with JSON Schema mode