
Building AI-Powered Applications with Cloudflare Workers AI#

Cloudflare Workers AI brings artificial intelligence to the edge, allowing you to run ML models globally with zero cold starts. This comprehensive guide covers building AI-powered applications using language models, computer vision, embeddings, and vector search capabilities.

Introduction to Workers AI#

Cloudflare Workers AI provides access to popular open-source AI models that run on Cloudflare’s global network. Unlike traditional AI services, Workers AI offers:

  • Zero cold starts - Models are already loaded and ready
  • Global distribution - Run AI at 285+ locations worldwide
  • Cost-effective pricing - Pay per request, not per hour
  • No infrastructure management - Serverless AI inference
  • Privacy-focused - Data never leaves Cloudflare’s network

Available Model Categories#

| Category | Models | Use Cases |
| --- | --- | --- |
| Text Generation | Llama 2, Mistral, CodeLlama | Chatbots, content creation, code generation |
| Text Classification | DistilBERT, RoBERTa | Sentiment analysis, content moderation |
| Translation | m2m100, NLLB | Multi-language translation |
| Text Embeddings | BGE, E5 | Semantic search, similarity matching |
| Image Classification | ResNet, EfficientNet | Object detection, content tagging |
| Image Generation | Stable Diffusion | AI art, image synthesis |
| Speech Recognition | Whisper | Voice transcription |
| Object Detection | YOLO, DETR | Computer vision applications |
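
Every model is addressed by a namespaced identifier of the form @cf/vendor/model, and all categories go through the same run call; a minimal sketch, assuming the ai instance created in the Getting Started section below (input and output shapes vary by category):

// One `run` interface for every category; only the model id and input differ.
const chat = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
  messages: [{ role: 'user', content: 'Summarize Workers AI in one sentence.' }],
});
const embedding = await ai.run('@cf/baai/bge-base-en-v1.5', {
  text: 'Semantic search input',
});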

Getting Started#

1. Project Setup#

# Create a new Workers AI project
npm create cloudflare@latest my-ai-app -- --template=worker-ai
cd my-ai-app
# Install additional dependencies
npm install @cloudflare/ai zod openai

2. Configuration#

wrangler.toml:

name = "ai-powered-app"
main = "src/index.ts"
compatibility_date = "2025-01-10"
# AI binding
[ai]
binding = "AI"
# Optional: Add D1 for conversation history
[[d1_databases]]
binding = "DB"
database_name = "ai-conversations"
database_id = "your-database-id"
# Optional: Add KV for caching
[[kv_namespaces]]
binding = "CACHE"
id = "your-kv-namespace-id"
# Optional: Add R2 for file storage
[[r2_buckets]]
binding = "STORAGE"
bucket_name = "ai-files"
[vars]
OPENAI_API_KEY = "your-openai-key" # For comparison/fallback; set real keys with `wrangler secret put` instead of committing them
MAX_TOKENS = "2048"
RATE_LIMIT = "100"

3. Basic Workers AI Structure#

import { Ai } from '@cloudflare/ai';

export interface Env {
  AI: Ai;
  DB?: D1Database;
  CACHE?: KVNamespace;
  STORAGE?: R2Bucket;
  OPENAI_API_KEY?: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const ai = new Ai(env.AI);

    // Route requests
    const url = new URL(request.url);
    if (url.pathname === '/chat') {
      return handleChat(request, ai, env);
    } else if (url.pathname === '/image') {
      return handleImageGeneration(request, ai, env);
    } else if (url.pathname === '/analyze') {
      return handleImageAnalysis(request, ai, env);
    }
    return new Response('AI Service Ready', { status: 200 });
  },
};
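
With wrangler dev running, the routes can be exercised from any HTTP client; a minimal smoke-test sketch (8787 is wrangler's default local port, and the handlers are defined in the sections that follow):

// Quick local check against `wrangler dev` (default port 8787).
const base = 'http://localhost:8787';

const res = await fetch(`${base}/chat`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'Hello!' }),
});
console.log(await res.json());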

Text Generation and Chatbots#

1. Basic Text Generation#

// Simple text completion
async function generateText(prompt: string, ai: Ai): Promise<string> {
  const response = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
    messages: [
      { role: 'user', content: prompt }
    ],
    max_tokens: 256,
    temperature: 0.7,
  });
  return response.response;
}

// Advanced text generation with system prompts
async function generateWithContext(
  systemPrompt: string,
  userPrompt: string,
  ai: Ai
): Promise<string> {
  const response = await ai.run('@cf/mistral/mistral-7b-instruct-v0.1', {
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt }
    ],
    max_tokens: 512,
    temperature: 0.8,
    top_p: 0.9,
  });
  return response.response;
}

2. Conversational Chatbot#

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
  timestamp: string;
}

interface Conversation {
  id: string;
  messages: ChatMessage[];
  model: string;
  created_at: string;
  updated_at: string;
}

async function handleChat(request: Request, ai: Ai, env: Env): Promise<Response> {
  try {
    const { message, conversationId, model = '@cf/meta/llama-2-7b-chat-int8' } =
      await request.json() as {
        message: string;
        conversationId?: string;
        model?: string;
      };

    // Get or create conversation
    let conversation = await getConversation(conversationId, env);
    if (!conversation) {
      conversation = await createConversation(model, env);
    }

    // Add user message
    conversation.messages.push({
      role: 'user',
      content: message,
      timestamp: new Date().toISOString()
    });

    // Keep only the last 10 messages to manage the context window
    if (conversation.messages.length > 10) {
      conversation.messages = conversation.messages.slice(-10);
    }

    // Generate AI response
    const aiResponse = await ai.run(model, {
      messages: conversation.messages,
      max_tokens: 512,
      temperature: 0.7,
      stream: false
    });

    // Add AI message
    const assistantMessage: ChatMessage = {
      role: 'assistant',
      content: aiResponse.response,
      timestamp: new Date().toISOString()
    };
    conversation.messages.push(assistantMessage);
    conversation.updated_at = new Date().toISOString();

    // Save conversation
    await saveConversation(conversation, env);

    return new Response(JSON.stringify({
      message: aiResponse.response,
      conversationId: conversation.id,
      usage: {
        tokens: aiResponse.response.length, // Approximate
        model: model
      }
    }), {
      headers: { 'Content-Type': 'application/json' }
    });
  } catch (error) {
    console.error('Chat error:', error);
    return new Response(JSON.stringify({
      error: 'Failed to generate response'
    }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' }
    });
  }
}

// Conversation management functions
async function getConversation(
  id: string | undefined,
  env: Env
): Promise<Conversation | null> {
  if (!id || !env.DB) return null;

  const result = await env.DB.prepare(
    'SELECT * FROM conversations WHERE id = ?'
  ).bind(id).first();
  if (!result) return null;

  return {
    ...result,
    messages: JSON.parse(result.messages as string)
  } as Conversation;
}

async function createConversation(model: string, env: Env): Promise<Conversation> {
  const id = crypto.randomUUID();
  const now = new Date().toISOString();
  const conversation: Conversation = {
    id,
    messages: [{
      role: 'system',
      content: 'You are a helpful AI assistant. Provide accurate, concise, and friendly responses.',
      timestamp: now
    }],
    model,
    created_at: now,
    updated_at: now
  };

  if (env.DB) {
    await env.DB.prepare(`
      INSERT INTO conversations (id, messages, model, created_at, updated_at)
      VALUES (?, ?, ?, ?, ?)
    `).bind(
      id,
      JSON.stringify(conversation.messages),
      model,
      now,
      now
    ).run();
  }
  return conversation;
}

// Persist the updated message list back to D1
async function saveConversation(conversation: Conversation, env: Env): Promise<void> {
  if (!env.DB) return;
  await env.DB.prepare(`
    UPDATE conversations SET messages = ?, updated_at = ? WHERE id = ?
  `).bind(
    JSON.stringify(conversation.messages),
    conversation.updated_at,
    conversation.id
  ).run();
}

3. Streaming Responses#

async function handleStreamingChat(request: Request, ai: Ai, env: Env): Promise<Response> {
  const { message } = await request.json() as { message: string };

  // Create a readable stream
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();

  // Pump the model output into the stream; intentionally not awaited so the
  // response can start flowing immediately
  streamAIResponse(message, ai, writer);

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
    }
  });
}

async function streamAIResponse(message: string, ai: Ai, writer: WritableStreamDefaultWriter) {
  try {
    // With stream: true, the model returns a ReadableStream of encoded chunks
    const stream = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
      messages: [{ role: 'user', content: message }],
      stream: true
    }) as ReadableStream<Uint8Array>;

    const reader = stream.getReader();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      await writer.write(value);
    }
  } catch (error) {
    await writer.write(new TextEncoder().encode(`Error: ${(error as Error).message}`));
  } finally {
    await writer.close();
  }
}
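
On the client, the chunked body can be read incrementally with the standard fetch reader API; a minimal sketch (the Worker URL is a placeholder):

// Consume the streamed chat response as it arrives (browser or Node 18+).
const res = await fetch('https://your-worker.example.com/chat-stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'Tell me a story' }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value, { stream: true }));
}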

Image Generation and Computer Vision#

1. AI Image Generation#

async function generateImage(prompt: string, ai: Ai): Promise<Response> {
  try {
    const response = await ai.run('@cf/stabilityai/stable-diffusion-xl-base-1.0', {
      prompt,
      num_steps: 20,
      strength: 1.0,
      guidance: 7.5,
    });

    return new Response(response, {
      headers: {
        'Content-Type': 'image/png',
        'Cache-Control': 'public, max-age=3600',
      },
    });
  } catch (error) {
    console.error('Image generation error:', error);
    return new Response('Failed to generate image', { status: 500 });
  }
}

// Advanced image generation with parameters
async function handleImageGeneration(request: Request, ai: Ai, env: Env): Promise<Response> {
  const formData = await request.formData();
  const prompt = formData.get('prompt') as string;
  const negativePrompt = formData.get('negative_prompt') as string || '';
  const steps = parseInt(formData.get('steps') as string || '20');
  const guidance = parseFloat(formData.get('guidance') as string || '7.5');
  const width = parseInt(formData.get('width') as string || '1024');
  const height = parseInt(formData.get('height') as string || '1024');

  if (!prompt) {
    return new Response('Prompt is required', { status: 400 });
  }

  try {
    const image = await ai.run('@cf/stabilityai/stable-diffusion-xl-base-1.0', {
      prompt,
      negative_prompt: negativePrompt,
      num_steps: Math.min(Math.max(steps, 1), 50), // Limit steps
      guidance: Math.min(Math.max(guidance, 1.0), 20.0), // Limit guidance
      width: Math.min(width, 1024), // Limit dimensions
      height: Math.min(height, 1024),
    });

    // Buffer the bytes once so the result can be both stored and returned
    // (a streamed body can only be consumed a single time)
    const imageBytes = await new Response(image as BodyInit).arrayBuffer();

    // Optionally store in R2
    if (env.STORAGE) {
      const key = `generated/${Date.now()}-${crypto.randomUUID()}.png`;
      await env.STORAGE.put(key, imageBytes, {
        customMetadata: {
          prompt,
          model: '@cf/stabilityai/stable-diffusion-xl-base-1.0',
          generated_at: new Date().toISOString(),
        }
      });
    }

    return new Response(imageBytes, {
      headers: {
        'Content-Type': 'image/png',
        'X-Generated-At': new Date().toISOString(),
        'X-Model': '@cf/stabilityai/stable-diffusion-xl-base-1.0',
      },
    });
  } catch (error) {
    console.error('Image generation error:', error);
    return new Response('Failed to generate image', { status: 500 });
  }
}

2. Image Analysis and Classification#

async function analyzeImage(imageData: ArrayBuffer, ai: Ai): Promise<any> {
  // Image classification
  const classification = await ai.run('@cf/microsoft/resnet-50', {
    image: [...new Uint8Array(imageData)]
  });

  // Object detection
  const objects = await ai.run('@cf/facebook/detr-resnet-50', {
    image: [...new Uint8Array(imageData)]
  });

  return {
    classification,
    objects,
    analysis_timestamp: new Date().toISOString()
  };
}

async function handleImageAnalysis(request: Request, ai: Ai, env: Env): Promise<Response> {
  try {
    const formData = await request.formData();
    const imageFile = formData.get('image') as File;

    if (!imageFile) {
      return new Response('No image provided', { status: 400 });
    }

    // Validate file type
    if (!imageFile.type.startsWith('image/')) {
      return new Response('Invalid file type', { status: 400 });
    }

    // Validate file size (max 10MB)
    if (imageFile.size > 10 * 1024 * 1024) {
      return new Response('File too large', { status: 400 });
    }

    const imageBuffer = await imageFile.arrayBuffer();

    // Analyze image
    const analysis = await analyzeImage(imageBuffer, ai);

    // Extract insights
    const insights = {
      dominant_objects: analysis.objects?.[0]?.label || 'Unknown',
      confidence_score: analysis.classification?.[0]?.score || 0,
      detected_objects: analysis.objects?.length || 0,
      categories: analysis.classification?.slice(0, 3).map((c: any) => ({
        label: c.label,
        confidence: c.score
      })) || [],
      metadata: {
        file_size: imageFile.size,
        file_type: imageFile.type,
        analyzed_at: new Date().toISOString(),
      }
    };

    // Cache results
    if (env.CACHE) {
      const cacheKey = `analysis:${await hashArrayBuffer(imageBuffer)}`;
      await env.CACHE.put(cacheKey, JSON.stringify(insights), {
        expirationTtl: 3600, // 1 hour
      });
    }

    return new Response(JSON.stringify(insights), {
      headers: { 'Content-Type': 'application/json' }
    });
  } catch (error) {
    console.error('Image analysis error:', error);
    return new Response(JSON.stringify({
      error: 'Failed to analyze image'
    }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' }
    });
  }
}

// Helper function to hash image for caching
async function hashArrayBuffer(buffer: ArrayBuffer): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', buffer);
  return Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}

3. Image-to-Text (OCR) and Captioning#

async function extractTextFromImage(imageData: ArrayBuffer, ai: Ai): Promise<string> {
  // Use an OCR model
  const response = await ai.run('@cf/microsoft/trocr-base-printed', {
    image: [...new Uint8Array(imageData)]
  });
  return response.text || '';
}

async function generateImageCaption(imageData: ArrayBuffer, ai: Ai): Promise<string> {
  // Use image captioning model
  const response = await ai.run('@cf/microsoft/git-large-coco', {
    image: [...new Uint8Array(imageData)]
  });
  return response.description || 'Unable to generate caption';
}
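
Wiring the two helpers into an endpoint mirrors the analysis handler above; a minimal sketch (the handler name is illustrative):

// Hypothetical endpoint combining OCR and captioning for one upload.
async function handleImageText(request: Request, ai: Ai): Promise<Response> {
  const formData = await request.formData();
  const imageFile = formData.get('image') as File | null;
  if (!imageFile) {
    return new Response('No image provided', { status: 400 });
  }
  const buffer = await imageFile.arrayBuffer();

  // The two models are independent, so run them in parallel
  const [text, caption] = await Promise.all([
    extractTextFromImage(buffer, ai),
    generateImageCaption(buffer, ai),
  ]);

  return new Response(JSON.stringify({ text, caption }), {
    headers: { 'Content-Type': 'application/json' }
  });
}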

Embeddings and Vector Search#

1. Generate Text Embeddings#

async function generateEmbedding(text: string, ai: Ai): Promise<number[]> {
  const response = await ai.run('@cf/baai/bge-base-en-v1.5', {
    text: text
  });
  return response.data[0];
}

async function generateMultipleEmbeddings(texts: string[], ai: Ai): Promise<number[][]> {
  const embeddings: number[][] = [];

  // Process in batches to avoid rate limits
  const batchSize = 10;
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const batchPromises = batch.map(text => generateEmbedding(text, ai));
    const batchResults = await Promise.all(batchPromises);
    embeddings.push(...batchResults);
  }
  return embeddings;
}

2. Vector Database with D1#

-- Create vector table schema
CREATE TABLE IF NOT EXISTS documents (
  id TEXT PRIMARY KEY,
  content TEXT NOT NULL,
  embedding_json TEXT NOT NULL, -- Store as JSON string
  metadata_json TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create index for fast retrieval
CREATE INDEX idx_documents_created_at ON documents(created_at);

A VectorStore class wraps embedding generation, storage, and brute-force cosine search over this table:

interface Document {
  id: string;
  content: string;
  embedding: number[];
  metadata?: any;
  created_at: string;
}

class VectorStore {
  private db: D1Database;
  private ai: Ai;

  constructor(db: D1Database, ai: Ai) {
    this.db = db;
    this.ai = ai;
  }

  async addDocument(content: string, metadata?: any): Promise<string> {
    const id = crypto.randomUUID();
    const embedding = await generateEmbedding(content, this.ai);

    await this.db.prepare(`
      INSERT INTO documents (id, content, embedding_json, metadata_json)
      VALUES (?, ?, ?, ?)
    `).bind(
      id,
      content,
      JSON.stringify(embedding),
      JSON.stringify(metadata || {})
    ).run();
    return id;
  }

  async addDocuments(docs: Array<{ content: string; metadata?: any }>): Promise<string[]> {
    const embeddings = await generateMultipleEmbeddings(
      docs.map(d => d.content),
      this.ai
    );

    const ids: string[] = [];
    const statements = docs.map((doc, index) => {
      const id = crypto.randomUUID();
      ids.push(id);
      return this.db.prepare(`
        INSERT INTO documents (id, content, embedding_json, metadata_json)
        VALUES (?, ?, ?, ?)
      `).bind(
        id,
        doc.content,
        JSON.stringify(embeddings[index]),
        JSON.stringify(doc.metadata || {})
      );
    });

    await this.db.batch(statements);
    return ids;
  }

  async search(query: string, limit: number = 10): Promise<Document[]> {
    // Generate embedding for query
    const queryEmbedding = await generateEmbedding(query, this.ai);

    // Get all documents (in production, you'd want pagination)
    const results = await this.db.prepare(`
      SELECT * FROM documents ORDER BY created_at DESC LIMIT 1000
    `).all();

    // Calculate similarities and sort
    const documentsWithSimilarity = results.results.map(doc => {
      const docEmbedding = JSON.parse(doc.embedding_json as string);
      const similarity = this.cosineSimilarity(queryEmbedding, docEmbedding);
      return {
        ...doc,
        similarity,
        embedding: docEmbedding,
        metadata: JSON.parse(doc.metadata_json as string || '{}')
      };
    });

    // Sort by similarity and return top results
    return documentsWithSimilarity
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, limit)
      .map(doc => ({
        id: doc.id as string,
        content: doc.content as string,
        embedding: doc.embedding,
        metadata: doc.metadata,
        created_at: doc.created_at as string
      }));
  }

  private cosineSimilarity(a: number[], b: number[]): number {
    // Reduce parameters renamed to avoid shadowing the Ai binding name
    const dotProduct = a.reduce((sum, av, i) => sum + av * b[i], 0);
    const magnitudeA = Math.sqrt(a.reduce((sum, av) => sum + av * av, 0));
    const magnitudeB = Math.sqrt(b.reduce((sum, bv) => sum + bv * bv, 0));
    return dotProduct / (magnitudeA * magnitudeB);
  }
}
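
A minimal usage sketch, assuming the D1 schema above has already been applied:

// Index a few documents, then run a semantic search against them.
const store = new VectorStore(env.DB!, ai);

await store.addDocuments([
  { content: 'Workers AI runs models at the edge.', metadata: { source: 'docs' } },
  { content: 'D1 is a serverless SQL database.', metadata: { source: 'docs' } },
]);

const hits = await store.search('Where do the models run?', 3);
console.log(hits.map(h => h.content));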

3. RAG (Retrieval Augmented Generation)#

async function handleRAGQuery(
  query: string,
  vectorStore: VectorStore,
  ai: Ai
): Promise<string> {
  // 1. Search for relevant documents
  const relevantDocs = await vectorStore.search(query, 5);

  // 2. Prepare context from retrieved documents
  const context = relevantDocs
    .map(doc => doc.content)
    .join('\n\n---\n\n');

  // 3. Generate response with context
  const prompt = `Based on the following context, answer the user's question:

Context:
${context}

Question: ${query}

Answer:`;

  const response = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
    messages: [
      {
        role: 'system',
        content: 'You are a helpful assistant that answers questions based on provided context. If the context doesn\'t contain relevant information, say so.'
      },
      {
        role: 'user',
        content: prompt
      }
    ],
    max_tokens: 512,
    temperature: 0.3,
  });
  return response.response;
}

// RAG endpoint
async function handleRAG(request: Request, ai: Ai, env: Env): Promise<Response> {
  try {
    const { query } = await request.json() as { query: string };

    if (!query || !env.DB) {
      return new Response('Query and database required', { status: 400 });
    }

    const vectorStore = new VectorStore(env.DB, ai);
    const answer = await handleRAGQuery(query, vectorStore, ai);

    return new Response(JSON.stringify({
      query,
      answer,
      timestamp: new Date().toISOString()
    }), {
      headers: { 'Content-Type': 'application/json' }
    });
  } catch (error) {
    console.error('RAG error:', error);
    return new Response(JSON.stringify({
      error: 'Failed to process query'
    }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' }
    });
  }
}

Advanced AI Applications#

1. Multi-Modal AI (Text + Image)#

async function handleMultiModal(request: Request, ai: Ai, env: Env): Promise<Response> {
  try {
    const formData = await request.formData();
    const text = formData.get('text') as string;
    const imageFile = formData.get('image') as File;

    let imageAnalysis = null;
    if (imageFile) {
      const imageBuffer = await imageFile.arrayBuffer();
      imageAnalysis = await analyzeImage(imageBuffer, ai);
    }

    // Combine text and image context
    let combinedPrompt = text;
    if (imageAnalysis) {
      const imageDescription = `Image contains: ${imageAnalysis.classification?.[0]?.label || 'unknown object'}`;
      combinedPrompt = `${text}\n\nImage context: ${imageDescription}`;
    }

    // Generate response
    const response = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
      messages: [{ role: 'user', content: combinedPrompt }],
      max_tokens: 512,
    });

    return new Response(JSON.stringify({
      text_response: response.response,
      image_analysis: imageAnalysis,
      combined_context: combinedPrompt
    }), {
      headers: { 'Content-Type': 'application/json' }
    });
  } catch (error) {
    console.error('Multi-modal error:', error);
    return new Response('Failed to process multi-modal request', { status: 500 });
  }
}

2. Content Moderation AI#

interface ModerationResult {
  safe: boolean;
  categories: {
    hate: number;
    harassment: number;
    selfHarm: number;
    sexual: number;
    violence: number;
    spam: number;
  };
  confidence: number;
  flagged_content?: string[];
}

async function moderateContent(content: string, ai: Ai): Promise<ModerationResult> {
  // Use text classification for content moderation
  const response = await ai.run('@cf/huggingface/distilbert-sst-2-int8', {
    text: content
  });

  // This is a simplified example - you'd use specialized moderation models
  const result: ModerationResult = {
    safe: true,
    categories: {
      hate: 0,
      harassment: 0,
      selfHarm: 0,
      sexual: 0,
      violence: 0,
      spam: 0,
    },
    // Classification models return an array of { label, score } entries
    confidence: (response as Array<{ label: string; score: number }>)[0]?.score ?? 0,
  };

  // Add custom rules
  const flaggedWords = ['spam', 'hate', 'violence', 'inappropriate'];
  const flaggedContent = flaggedWords.filter(word =>
    content.toLowerCase().includes(word)
  );

  if (flaggedContent.length > 0) {
    result.safe = false;
    result.flagged_content = flaggedContent;
    result.categories.spam = flaggedContent.includes('spam') ? 0.8 : 0;
  }
  return result;
}

async function handleContentModeration(request: Request, ai: Ai, env: Env): Promise<Response> {
  try {
    const { content, userId, contextId } = await request.json() as {
      content: string;
      userId?: string;
      contextId?: string;
    };

    const moderation = await moderateContent(content, ai);

    // Log moderation result
    if (env.DB) {
      await env.DB.prepare(`
        INSERT INTO moderation_logs
        (user_id, context_id, content, safe, categories, timestamp)
        VALUES (?, ?, ?, ?, ?, ?)
      `).bind(
        userId,
        contextId,
        content,
        moderation.safe ? 1 : 0, // D1 stores booleans as integers
        JSON.stringify(moderation.categories),
        new Date().toISOString()
      ).run();
    }

    return new Response(JSON.stringify({
      safe: moderation.safe,
      confidence: moderation.confidence,
      action: moderation.safe ? 'allow' : 'block',
      categories: moderation.categories,
      flagged_content: moderation.flagged_content
    }), {
      headers: { 'Content-Type': 'application/json' }
    });
  } catch (error) {
    console.error('Moderation error:', error);
    return new Response(JSON.stringify({
      safe: false,
      action: 'block',
      error: 'Moderation failed'
    }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' }
    });
  }
}

3. AI-Powered Code Generation#

async function generateCode(
  prompt: string,
  language: string,
  ai: Ai
): Promise<string> {
  const systemPrompt = `You are an expert programmer. Generate clean, well-commented code in ${language}.
Include error handling and follow best practices.`;

  const response = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: prompt }
    ],
    max_tokens: 1024,
    temperature: 0.3,
  });
  return response.response;
}

async function handleCodeGeneration(request: Request, ai: Ai, env: Env): Promise<Response> {
  try {
    const { prompt, language = 'javascript', validate = false } = await request.json() as {
      prompt: string;
      language?: string;
      validate?: boolean;
    };

    const code = await generateCode(prompt, language, ai);

    let validation = null;
    if (validate) {
      // Optional: validate generated code
      validation = await validateCode(code, language, ai);
    }

    // Cache generated code
    if (env.CACHE) {
      const cacheKey = `code:${await hashString(prompt + language)}`;
      await env.CACHE.put(cacheKey, JSON.stringify({ code, validation }), {
        expirationTtl: 3600,
      });
    }

    return new Response(JSON.stringify({
      prompt,
      language,
      code,
      validation,
      generated_at: new Date().toISOString()
    }), {
      headers: { 'Content-Type': 'application/json' }
    });
  } catch (error) {
    console.error('Code generation error:', error);
    return new Response(JSON.stringify({
      error: 'Failed to generate code'
    }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' }
    });
  }
}

// SHA-256 hex digest of a string, used for cache keys
async function hashString(input: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(input));
  return Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}

async function validateCode(code: string, language: string, ai: Ai): Promise<any> {
  const validationPrompt = `Analyze the following ${language} code for:
1. Syntax errors
2. Best practices
3. Security issues
4. Performance concerns

Code:
\`\`\`${language}
${code}
\`\`\`

Provide a JSON response with issues found.`;

  const response = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
    messages: [{ role: 'user', content: validationPrompt }],
    max_tokens: 512,
    temperature: 0.1,
  });

  try {
    return JSON.parse(response.response);
  } catch {
    return { analysis: response.response };
  }
}

Performance and Optimization#

1. Caching Strategies#

class AICache {
  private kv: KVNamespace;
  private ttl: number;

  constructor(kv: KVNamespace, ttl: number = 3600) {
    this.kv = kv;
    this.ttl = ttl;
  }

  async get<T>(key: string): Promise<T | null> {
    const cached = await this.kv.get(key, { type: 'json' });
    return cached as T | null;
  }

  async set<T>(key: string, value: T, customTtl?: number): Promise<void> {
    await this.kv.put(key, JSON.stringify(value), {
      expirationTtl: customTtl || this.ttl
    });
  }

  async getOrSet<T>(
    key: string,
    factory: () => Promise<T>,
    customTtl?: number
  ): Promise<T> {
    const cached = await this.get<T>(key);
    if (cached !== null) return cached; // Explicit null check: cached values may be falsy

    const value = await factory();
    await this.set(key, value, customTtl);
    return value;
  }

  generateKey(prefix: string, ...parts: string[]): string {
    return `${prefix}:${parts.join(':')}`;
  }
}

// Usage in AI endpoints
async function cachedTextGeneration(prompt: string, ai: Ai, cache: AICache): Promise<string> {
  const cacheKey = cache.generateKey('text', await hashString(prompt));

  return cache.getOrSet(cacheKey, async () => {
    const response = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
      messages: [{ role: 'user', content: prompt }],
      max_tokens: 256,
      temperature: 0.7,
    });
    return response.response;
  }, 7200); // Cache for 2 hours
}

2. Request Batching#

class RequestBatcher {
  private batches: Map<string, any[]> = new Map();
  private timers: Map<string, any> = new Map();
  private readonly batchSize: number;
  private readonly batchTimeout: number;

  constructor(batchSize: number = 10, batchTimeout: number = 100) {
    this.batchSize = batchSize;
    this.batchTimeout = batchTimeout;
  }

  async batch<T, R>(
    key: string,
    item: T,
    processor: (items: T[]) => Promise<R[]>
  ): Promise<R> {
    return new Promise((resolve, reject) => {
      // Get or create batch
      if (!this.batches.has(key)) {
        this.batches.set(key, []);
      }
      const batch = this.batches.get(key)!;
      batch.push({ item, resolve, reject });

      // Process if batch is full
      if (batch.length >= this.batchSize) {
        this.processBatch(key, processor);
        return;
      }

      // Set timer for batch timeout
      if (!this.timers.has(key)) {
        const timer = setTimeout(() => {
          this.processBatch(key, processor);
        }, this.batchTimeout);
        this.timers.set(key, timer);
      }
    });
  }

  private async processBatch<T, R>(
    key: string,
    processor: (items: T[]) => Promise<R[]>
  ): Promise<void> {
    const batch = this.batches.get(key);
    if (!batch || batch.length === 0) return;

    // Clear batch and timer
    this.batches.delete(key);
    const timer = this.timers.get(key);
    if (timer) {
      clearTimeout(timer);
      this.timers.delete(key);
    }

    try {
      const items = batch.map(b => b.item);
      const results = await processor(items);

      // Resolve all promises
      batch.forEach((b, index) => {
        if (results[index] !== undefined) {
          b.resolve(results[index]);
        } else {
          b.reject(new Error('No result for item'));
        }
      });
    } catch (error) {
      // Reject all promises
      batch.forEach(b => b.reject(error));
    }
  }
}

// Usage
const embeddingBatcher = new RequestBatcher(20, 50); // Batch 20 items, 50ms timeout

async function batchedEmbedding(text: string, ai: Ai): Promise<number[]> {
  return embeddingBatcher.batch('embeddings', text, async (texts: string[]) => {
    return generateMultipleEmbeddings(texts, ai);
  });
}

3. Rate Limiting and Circuit Breaker#

class RateLimiter {
  private requests: Map<string, number[]> = new Map();
  private readonly windowMs: number;
  private readonly maxRequests: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  async checkLimit(key: string): Promise<boolean> {
    const now = Date.now();
    const windowStart = now - this.windowMs;

    // Get request times for this key
    const requests = this.requests.get(key) || [];

    // Remove old requests
    const validRequests = requests.filter(time => time > windowStart);

    // Check if limit exceeded
    if (validRequests.length >= this.maxRequests) {
      return false;
    }

    // Add current request
    validRequests.push(now);
    this.requests.set(key, validRequests);
    return true;
  }

  getRemaining(key: string): number {
    const now = Date.now();
    const windowStart = now - this.windowMs;
    const requests = this.requests.get(key) || [];
    const validRequests = requests.filter(time => time > windowStart);
    return Math.max(0, this.maxRequests - validRequests.length);
  }
}

class CircuitBreaker {
  private failures: number = 0;
  private lastFailureTime: number = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private readonly failureThreshold: number = 5,
    private readonly recoveryTimeout: number = 60000
  ) {}

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailureTime > this.recoveryTimeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is open');
      }
    }

    try {
      const result = await operation();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  private onFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();
    if (this.failures >= this.failureThreshold) {
      this.state = 'open';
    }
  }
}
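
Both guards compose in front of a model call; a minimal wiring sketch (the limits and handler name are illustrative). Note that both classes hold state in isolate memory, so counts are approximate across Cloudflare's distributed isolates; the KV-backed limiter in the security section below is the durable variant.

// Illustrative wiring: per-IP rate limit plus a circuit breaker around inference.
const limiter = new RateLimiter(100, 60_000); // 100 requests per minute
const breaker = new CircuitBreaker(5, 60_000);

async function guardedChat(request: Request, ai: Ai): Promise<Response> {
  const ip = request.headers.get('CF-Connecting-IP') ?? 'unknown';
  if (!(await limiter.checkLimit(ip))) {
    return new Response('Rate limit exceeded', {
      status: 429,
      headers: { 'Retry-After': '60' },
    });
  }

  const { message } = await request.json() as { message: string };
  const result = await breaker.execute(() =>
    ai.run('@cf/meta/llama-2-7b-chat-int8', {
      messages: [{ role: 'user', content: message }],
    })
  );
  return new Response(JSON.stringify(result), {
    headers: { 'Content-Type': 'application/json' }
  });
}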

Deployment and Monitoring#

1. Environment Configuration#

# wrangler.toml for production
name = "ai-app-prod"
main = "src/index.ts"
compatibility_date = "2025-01-10"
[ai]
binding = "AI"
[[d1_databases]]
binding = "DB"
database_name = "ai-prod-db"
database_id = "prod-db-id"
[[kv_namespaces]]
binding = "CACHE"
id = "prod-cache-id"
preview_id = "dev-cache-id"
[[r2_buckets]]
binding = "STORAGE"
bucket_name = "ai-prod-storage"
preview_bucket_name = "ai-dev-storage"
[env.production.vars]
ENVIRONMENT = "production"
LOG_LEVEL = "info"
RATE_LIMIT_MAX = "1000"
RATE_LIMIT_WINDOW = "3600000"
# Staging environment
[env.staging]
name = "ai-app-staging"
[env.staging.vars]
ENVIRONMENT = "staging"
LOG_LEVEL = "debug"
RATE_LIMIT_MAX = "100"
# Development environment
[env.development]
name = "ai-app-dev"
[env.development.vars]
ENVIRONMENT = "development"
LOG_LEVEL = "debug"
RATE_LIMIT_MAX = "50"

2. Monitoring and Logging#

interface LogEntry {
  timestamp: string;
  level: 'debug' | 'info' | 'warn' | 'error';
  message: string;
  metadata?: any;
  requestId?: string;
  userId?: string;
}

class Logger {
  constructor(
    private readonly level: string = 'info',
    private readonly requestId?: string
  ) {}

  private shouldLog(level: string): boolean {
    const levels = ['debug', 'info', 'warn', 'error'];
    return levels.indexOf(level) >= levels.indexOf(this.level);
  }

  private log(level: 'debug' | 'info' | 'warn' | 'error', message: string, metadata?: any): void {
    if (!this.shouldLog(level)) return;

    const entry: LogEntry = {
      timestamp: new Date().toISOString(),
      level,
      message,
      metadata,
      requestId: this.requestId,
    };
    console.log(JSON.stringify(entry));
  }

  debug(message: string, metadata?: any): void {
    this.log('debug', message, metadata);
  }

  info(message: string, metadata?: any): void {
    this.log('info', message, metadata);
  }

  warn(message: string, metadata?: any): void {
    this.log('warn', message, metadata);
  }

  error(message: string, metadata?: any): void {
    this.log('error', message, metadata);
  }
}

// Usage metrics
class MetricsCollector {
  private metrics: Map<string, number> = new Map();

  increment(metric: string, value: number = 1): void {
    const current = this.metrics.get(metric) || 0;
    this.metrics.set(metric, current + value);
  }

  gauge(metric: string, value: number): void {
    this.metrics.set(metric, value);
  }

  timing(metric: string, duration: number): void {
    this.metrics.set(`${metric}_duration`, duration);
  }

  getMetrics(): Record<string, number> {
    return Object.fromEntries(this.metrics);
  }

  reset(): void {
    this.metrics.clear();
  }
}

// Middleware for monitoring
async function withMonitoring(
  request: Request,
  handler: (request: Request, logger: Logger, metrics: MetricsCollector) => Promise<Response>
): Promise<Response> {
  const requestId = crypto.randomUUID();
  const logger = new Logger('info', requestId);
  const metrics = new MetricsCollector();
  const startTime = Date.now();

  logger.info('Request started', {
    method: request.method,
    url: request.url,
    headers: Object.fromEntries(request.headers.entries())
  });

  try {
    const response = await handler(request, logger, metrics);
    const duration = Date.now() - startTime;
    metrics.timing('request', duration);
    metrics.increment('requests_success');

    logger.info('Request completed', {
      status: response.status,
      duration,
      metrics: metrics.getMetrics()
    });

    // Add monitoring headers; clone first because handler responses may have immutable headers
    const monitored = new Response(response.body, response);
    monitored.headers.set('X-Request-ID', requestId);
    monitored.headers.set('X-Response-Time', String(duration));
    return monitored;
  } catch (error) {
    const duration = Date.now() - startTime;
    metrics.timing('request', duration);
    metrics.increment('requests_error');

    logger.error('Request failed', {
      error: (error as Error).message,
      stack: (error as Error).stack,
      duration,
      metrics: metrics.getMetrics()
    });
    throw error;
  }
}
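
The wrapper then slots into the Worker's fetch handler; a minimal sketch (the routing body is a placeholder):

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    return withMonitoring(request, async (req, logger, metrics) => {
      metrics.increment('ai_requests');
      logger.debug('Routing request', { path: new URL(req.url).pathname });
      // ...dispatch to the handlers defined earlier...
      return new Response('OK');
    });
  },
};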

3. Health Checks and Diagnostics#

async function handleHealthCheck(request: Request, env: Env): Promise<Response> {
  const health = {
    status: 'healthy',
    timestamp: new Date().toISOString(),
    checks: {} as any
  };

  // Check AI service
  try {
    await env.AI.run('@cf/meta/llama-2-7b-chat-int8', {
      messages: [{ role: 'user', content: 'test' }],
      max_tokens: 1
    });
    health.checks.ai = { status: 'healthy' };
  } catch (error) {
    health.checks.ai = { status: 'unhealthy', error: (error as Error).message };
    health.status = 'degraded';
  }

  // Check database
  if (env.DB) {
    try {
      await env.DB.prepare('SELECT 1').first();
      health.checks.database = { status: 'healthy' };
    } catch (error) {
      health.checks.database = { status: 'unhealthy', error: (error as Error).message };
      health.status = 'degraded';
    }
  }

  // Check cache
  if (env.CACHE) {
    try {
      await env.CACHE.put('health-check', 'test', { expirationTtl: 60 });
      await env.CACHE.get('health-check');
      health.checks.cache = { status: 'healthy' };
    } catch (error) {
      health.checks.cache = { status: 'unhealthy', error: (error as Error).message };
      health.status = 'degraded';
    }
  }

  const statusCode = health.status === 'healthy' ? 200 : 503;
  return new Response(JSON.stringify(health, null, 2), {
    status: statusCode,
    headers: { 'Content-Type': 'application/json' }
  });
}

Security Best Practices#

1. Input Validation and Sanitization#

import { z } from 'zod';

// Validation schemas
const ChatRequestSchema = z.object({
  message: z.string().min(1).max(4000),
  conversationId: z.string().uuid().optional(),
  model: z.string().regex(/^@cf\//).optional(),
});

const ImageGenerationSchema = z.object({
  prompt: z.string().min(1).max(1000),
  negative_prompt: z.string().max(1000).optional(),
  steps: z.number().int().min(1).max(50).optional(),
  guidance: z.number().min(1.0).max(20.0).optional(),
});

// Sanitization functions
function sanitizeText(text: string): string {
  return text
    .replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '')
    .replace(/<[^>]+>/g, '')
    .trim();
}

function validateAndSanitizeInput<T>(
  data: unknown,
  schema: z.ZodSchema<T>
): T {
  const validated = schema.parse(data);

  // Sanitize string fields
  if (typeof validated === 'object' && validated !== null) {
    const record = validated as Record<string, unknown>;
    Object.keys(record).forEach(key => {
      if (typeof record[key] === 'string') {
        record[key] = sanitizeText(record[key] as string);
      }
    });
  }
  return validated;
}
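
Plugged in front of the chat endpoint, validation rejects malformed payloads before any tokens are spent; a minimal sketch (handleValidatedChat is an illustrative name):

// Validate and sanitize the incoming payload before touching the model.
async function handleValidatedChat(request: Request, ai: Ai): Promise<Response> {
  let input: z.infer<typeof ChatRequestSchema>;
  try {
    input = validateAndSanitizeInput(await request.json(), ChatRequestSchema);
  } catch {
    return new Response(JSON.stringify({ error: 'Invalid request body' }), {
      status: 400,
      headers: { 'Content-Type': 'application/json' },
    });
  }

  const result = await ai.run(input.model ?? '@cf/meta/llama-2-7b-chat-int8', {
    messages: [{ role: 'user', content: input.message }],
  });
  return new Response(JSON.stringify(result), {
    headers: { 'Content-Type': 'application/json' },
  });
}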

2. Rate Limiting and Abuse Prevention#

class AdvancedRateLimiter {
  constructor(
    private kv: KVNamespace,
    private readonly limits: {
      requests: { window: number; max: number };
      tokens: { window: number; max: number };
      cost: { window: number; max: number };
    }
  ) {}

  async checkLimits(
    key: string,
    cost: number = 1,
    tokens: number = 0
  ): Promise<{ allowed: boolean; details: any }> {
    const now = Date.now();
    const promises = [
      this.checkLimit(`requests:${key}`, this.limits.requests, now),
      this.checkLimit(`tokens:${key}`, this.limits.tokens, now, tokens),
      this.checkLimit(`cost:${key}`, this.limits.cost, now, cost),
    ];
    const [requestsResult, tokensResult, costResult] = await Promise.all(promises);
    const allowed = requestsResult.allowed && tokensResult.allowed && costResult.allowed;

    return {
      allowed,
      details: {
        requests: requestsResult,
        tokens: tokensResult,
        cost: costResult,
      }
    };
  }

  private async checkLimit(
    key: string,
    limit: { window: number; max: number },
    now: number,
    increment: number = 1
  ): Promise<{ allowed: boolean; current: number; remaining: number; resetTime: number }> {
    // Fixed-window counter keyed by the current window index
    const windowKey = `${key}:${Math.floor(now / limit.window)}`;
    const current = await this.kv.get(windowKey) || '0';
    const currentValue = parseInt(current) + increment;
    const allowed = currentValue <= limit.max;
    const remaining = Math.max(0, limit.max - currentValue);
    const resetTime = Math.ceil(now / limit.window) * limit.window;

    if (allowed) {
      // KV requires a minimum TTL of 60 seconds
      await this.kv.put(windowKey, String(currentValue), {
        expirationTtl: Math.max(60, Math.ceil(limit.window / 1000))
      });
    }

    return {
      allowed,
      current: currentValue,
      remaining,
      resetTime,
    };
  }
}

3. Content Security and Filtering#

class ContentFilter {
  private readonly bannedPatterns: RegExp[];
  private readonly sensitivePatterns: RegExp[];

  constructor() {
    this.bannedPatterns = [
      /\b(hack|exploit|malware)\b/i,
      /\b(password|token|secret)\s*[:=]\s*\S+/i,
      /<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi,
    ];
    this.sensitivePatterns = [
      /\b\d{3}-\d{2}-\d{4}\b/, // SSN
      /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/, // Credit card
      /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/, // Email
    ];
  }

  filterContent(content: string): {
    safe: boolean;
    filtered: string;
    issues: string[];
  } {
    const issues: string[] = [];
    let filtered = content;

    // Check for banned content
    for (const pattern of this.bannedPatterns) {
      if (pattern.test(content)) {
        issues.push('Contains prohibited content');
        filtered = filtered.replace(pattern, '[FILTERED]');
      }
    }

    // Check for sensitive data
    for (const pattern of this.sensitivePatterns) {
      if (pattern.test(content)) {
        issues.push('Contains sensitive information');
        filtered = filtered.replace(pattern, '[REDACTED]');
      }
    }

    return {
      safe: issues.length === 0,
      filtered,
      issues,
    };
  }
}
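
Applied on the way in, the filter ensures only redacted text reaches the model; a minimal sketch, assuming ai and the user's raw userInput are in scope:

// Filter user input before prompting the model.
const filter = new ContentFilter();

const { safe, filtered, issues } = filter.filterContent(userInput);
if (!safe) {
  console.warn('Content filtered before inference:', issues);
}

// Always forward the filtered text, never the raw input
const response = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
  messages: [{ role: 'user', content: filtered }],
});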

Conclusion#

Cloudflare Workers AI democratizes access to artificial intelligence by providing:

  • Global edge deployment of AI models
  • Zero cold start latency for instant responses
  • Cost-effective pricing with pay-per-use model
  • Easy integration with existing Workers ecosystem
  • Privacy-first approach with data processing at the edge

Key Benefits#

  1. Performance - Sub-100ms AI inference globally
  2. Scalability - Handle millions of requests automatically
  3. Cost Efficiency - No idle costs, only pay for usage
  4. Developer Experience - Simple API, powerful capabilities
  5. Privacy - Data never leaves Cloudflare’s network

Getting Started Checklist#

  • Set up Cloudflare Workers AI account
  • Choose appropriate models for your use case
  • Implement caching and rate limiting
  • Add monitoring and logging
  • Test performance and accuracy
  • Deploy with proper security measures
  • Monitor usage and costs
  • Optimize for production workloads

Future Possibilities#

  • Custom model deployment - Train and deploy your own models
  • Multi-modal AI - Combined text, image, and audio processing
  • Real-time streaming - Live AI interactions
  • Edge fine-tuning - Adapt models to specific use cases
  • Federated learning - Collaborative model training
