Claude API: Vision + Computer Use — Function Reference

Source files:

  • vision/engine.js — core image analysis with Claude
  • vision/advanced.js (842 LOC) — object detection, face analysis, scene classification
  • vision/analyzer.js — multi-type analysis orchestrator
  • vision/storage.js — image buffer + analysis cache
  • vision/preprocessor.js — image format normalization
  • vision/batch-vision.js — batch image processing
  • vision/pdf-vision.js — PDF visual analysis
  • computer/beta-session.js (831 LOC) — Anthropic Computer Use beta session manager
  • computer/beta-api.js (746 LOC) — Computer Use API client
  • computer/tools.js (677 LOC) — computer use tool definitions
  • computer/session.js, computer/actions.js, computer/safety.js, computer/screenshot.js

Exported Functions — vision/engine.js

analyzeImage(imageId, analysisType, options): Promise<AnalysisResult>

Purpose: Send image to Claude API for analysis using type-specific prompts. Parameters:

  • imageId (string): Stored image ID
  • analysisType (ANALYSIS_TYPES): One of general, ocr, chart, objects, face, scene, color, qa
  • useCache (boolean): Return cached result if available (default: true)
  • question (string | null): For QA type — the question to ask

Returns: { imageId, analysisType, result, confidence, model, tokensUsed, processingTimeMs, cached }

Anthropic SDK call: Direct fetch to the /messages endpoint with an image block in the message content.

Notes for rewrite: Uses fetch() directly with the x-api-key header, not the Anthropic SDK client class. Caches results via storeAnalysis().
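For reference, the request body that analyzeImage presumably assembles looks roughly like this. A minimal sketch: buildImagePayload and the model string are illustrative, but the field names (model, max_tokens, messages, image/text content blocks, base64 source) follow Anthropic's published Messages API.

```javascript
// Sketch of the /messages request body analyzeImage likely builds.
// buildImagePayload is a hypothetical helper; field names follow
// Anthropic's documented Messages API.
function buildImagePayload(base64Data, mimeType, prompt) {
  return {
    model: "claude-3-5-sonnet-20241022", // illustrative model ID
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: { type: "base64", media_type: mimeType, data: base64Data },
          },
          { type: "text", text: prompt },
        ],
      },
    ],
  };
}

const payload = buildImagePayload("iVBORw0...", "image/png", "Describe this image.");
console.log(payload.messages[0].content[0].type); // "image"
```

This payload is what would be POSTed via fetch() with the x-api-key and anthropic-version headers.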


batchAnalyzeImages(imageIds, analysisType, options): Promise<BatchResult>

Purpose: Analyze multiple images with configurable concurrency. Parameters:

  • imageIds (string[]): Array of image IDs
  • concurrency (number): Parallel requests (default: 3)

Returns: { success, completed, failed, results, errors }
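The concurrency behaviour can be sketched as a simple promise pool. This is an illustrative stand-in, not the actual implementation; analyzeFn here substitutes for the real per-image analysis call.

```javascript
// Minimal promise-pool sketch of batchAnalyzeImages' concurrency control:
// `concurrency` workers each pull the next image ID until the list is drained.
async function runWithConcurrency(imageIds, analyzeFn, concurrency = 3) {
  const results = [];
  const errors = [];
  let index = 0;

  async function worker() {
    while (index < imageIds.length) {
      const id = imageIds[index++]; // synchronous check-and-increment, so no race
      try {
        results.push(await analyzeFn(id));
      } catch (err) {
        errors.push({ imageId: id, error: err.message });
      }
    }
  }

  await Promise.all(Array.from({ length: concurrency }, worker));
  return {
    success: errors.length === 0,
    completed: results.length,
    failed: errors.length,
    results,
    errors,
  };
}
```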


compareImages(imageId1, imageId2, options): Promise<ComparisonResult>

Purpose: Send two images to Claude for side-by-side comparison. Parameters:

  • detailed (boolean): Request detailed comparison (default: false)

Returns: Comparison analysis with similarities, differences, description.


Constants: ANALYSIS_TYPES

| Value | Prompt focus |
|-------|--------------|
| GENERAL | Full description, composition, text |
| OCR | Text extraction with location hints |
| CHART | Data extraction, chart type, JSON data |
| OBJECTS | Object names, counts, locations |
| FACE | Age, gender, emotion, orientation (objective only) |
| SCENE | Indoor/outdoor, time of day, weather, mood |
| COLOR | Dominant colors, palette, temperature, harmony |
| QA | Answer a specific question about the image |


Exported Functions — vision/advanced.js

detectObjectsWithBoundingBoxes(imageId, options): Promise<DetectionResult>

Purpose: Detect objects and generate bounding box coordinates. Parameters:

  • confidenceThreshold (number): Min confidence to include (default: 0.5)
  • maxObjects (number): Max objects to return (default: 50)
  • categories (string[] | null): Filter to categories: person, vehicle, animal, furniture, electronics, nature, food, building, clothing

Returns: { success, imageId, totalDetected, objects: [{id, label, category, confidence, boundingBox: {x,y,width,height}, count}], processingTimeMs }

Notes for rewrite: Bounding boxes are SYNTHETIC — generated from location hints in Claude's text response, not by a real CV model. Replace generateSyntheticBoundingBox() with YOLO/Detectron2 in production.
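Because the boxes are synthesized from textual location hints, the mapping presumably resembles the sketch below. hintToBox is a hypothetical name, and the thirds-based grid is an assumption about how generateSyntheticBoundingBox() works; coordinates are normalized to 0–1 as in BoundingBox.

```javascript
// Hypothetical sketch: turn a location hint parsed from Claude's text
// (e.g. "top-left", "bottom-right") into a normalized bounding box.
// Divides the image into a 3x3 grid and returns the matching cell.
function hintToBox(hint) {
  const thirds = {
    left: 0, center: 1 / 3, right: 2 / 3,
    top: 0, middle: 1 / 3, bottom: 2 / 3,
  };
  const [vert = "middle", horiz = "center"] = hint.split("-");
  return {
    x: thirds[horiz] ?? 1 / 3,
    y: thirds[vert] ?? 1 / 3,
    width: 1 / 3,
    height: 1 / 3,
  };
}

console.log(hintToBox("top-left")); // box covering the top-left third of the image
```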


analyzeFaces(imageId, options): Promise<FaceResult>

Purpose: Detect and analyze faces with optional emotion/attribute analysis. Parameters:

  • includeEmotions (boolean): Include emotion detection (default: true)
  • includeAttributes (boolean): Include attributes (default: true)
  • includeLandmarks (boolean): Include facial landmarks (default: false)

Returns: { success, imageId, faceCount, faces: [{id, ageRange, gender, emotions, orientation, features, boundingBox, landmarks?}], confidence, processingTimeMs }

Notes for rewrite: Landmarks are also SYNTHETIC (generateSyntheticLandmarks()). All analysis is parsed from Claude's text output.


classifySceneAdvanced(imageId, options): Promise<SceneResult>

Purpose: Advanced scene classification with environment attributes. Parameters:

  • includeAttributes (boolean): Include scene attributes (default: true)
  • includeActivities (boolean): Include activity detection (default: false)

Returns: { success, imageId, classification: {primaryCategory, secondaryCategories, indoor, timeOfDay, weather, location, mood, confidence}, processingTimeMs }


extractTextFromImage(imageId, options): Promise<OCRResult>

Purpose: OCR extraction with position and style information.

Returns: { success, imageId, text, blocks: [{text, location, style, confidence}], processingTimeMs }


analyzeChartData(imageId, options): Promise<ChartResult>

Purpose: Extract chart/graph data including type, labels, and structured data.

Returns: { success, imageId, chartType, title, data, trends, processingTimeMs }


analyzeImageColors(imageId, options): Promise<ColorResult>

Purpose: Color analysis including dominant colors and palette.

Returns: { success, imageId, analysis: {dominant, palette, temperature, contrast, harmony}, processingTimeMs }


compareImagesAdvanced(imageId1, imageId2, options): Promise<AdvancedComparisonResult>

Purpose: Detailed comparison with similarity score and diff highlights.


Exported Class — BetaComputerSession (computer/beta-session.js)

constructor(config)

Parameters:

  • displayWidth (number): Screen width (default: 1920)
  • displayHeight (number): Screen height (default: 1080)
  • maxActions (number): Action limit per session (default: 100)
  • maxDurationMs (number): Max session duration (default: 5 min)
  • maxScreenshots (number): Screenshot limit (default: 50)
  • captureScreenshots (boolean): Auto-screenshot (default: true)
  • autoPauseOnError (boolean): Pause on action error (default: false)
  • description (string | null): Optional session description

start(): Promise<StartResult>

Purpose: Initialize the session, check beta API availability, and capture an initial screenshot.

Returns: { success, sessionId, startTime, config }


end(): Promise<EndResult>

Purpose: Finalize the session, capture a final screenshot, and sync the action history.

Returns: { success, sessionId, duration, summary }


pause(): PauseResult

Purpose: Pause active session (sync, not async).


resume(): ResumeResult

Purpose: Resume a paused session and update totalPausedDuration.


executeAction(actionType, params): Promise<ActionResult>

Purpose: Execute a single computer use action with safety validation. Parameters:

  • actionType (BetaActionTypes): One of screenshot, mouse_move, left_click, right_click, double_click, type, key, scroll
  • params (object): Action-specific parameters

Returns: { success, sessionId, actionNumber, ...actionData }

Notes for rewrite: Validates params via validateBetaActionParams(), checks session limits before every action, and tracks cursor positions for mouse actions.


executeSequence(actions, options): Promise<SequenceResult>

Purpose: Execute multiple actions sequentially with delay between each. Parameters:

  • actions (Array): [{ action: actionType, params: {} }]
  • continueOnError (boolean): Continue if an action fails (default: false)
  • delayBetweenActions (number): ms between actions (default: 100)

Returns: { success, results, totalActions, completedActions, failedActions, duration }
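The sequential semantics described above can be sketched as follows. This is an illustrative stand-alone version, not the session method itself; executeAction is passed in as a stand-in so the loop structure is visible.

```javascript
// Sketch of executeSequence's loop: run actions one at a time, wait
// `delayBetweenActions` ms between them, and optionally continue past failures.
async function runSequence(
  executeAction,
  actions,
  { continueOnError = false, delayBetweenActions = 100 } = {}
) {
  const started = Date.now();
  const results = [];
  let completed = 0;
  let failed = 0;

  for (const { action, params } of actions) {
    try {
      results.push(await executeAction(action, params));
      completed++;
    } catch (err) {
      failed++;
      results.push({ success: false, error: err.message });
      if (!continueOnError) break; // stop on first failure by default
    }
    await new Promise((r) => setTimeout(r, delayBetweenActions));
  }

  return {
    success: failed === 0,
    results,
    totalActions: actions.length,
    completedActions: completed,
    failedActions: failed,
    duration: Date.now() - started,
  };
}
```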


getSummary(): SessionSummary

Purpose: Return session statistics without ending the session.

Returns: { sessionId, state, duration, actionCount, screenshotCount, errorCount, errors }


getActionHistory(): ActionRecord[]

Purpose: Get full action history from the BetaAPIClient.


checkLimits(): LimitCheckResult

Returns: { limited: boolean, reason?: string }
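The check presumably compares the session's counters against the configured ceilings and reports the first limit hit. A hypothetical sketch (field names taken from the session config and summary documented above; the actual method reads the instance's own state):

```javascript
// Hypothetical sketch of checkLimits: compare session counters against
// configured ceilings and report the first limit reached.
function checkLimits(session, limits) {
  if (session.actionCount >= limits.maxActions) {
    return { limited: true, reason: "maxActions reached" };
  }
  if (Date.now() - session.startTime >= limits.maxDurationMs) {
    return { limited: true, reason: "maxDurationMs exceeded" };
  }
  if (session.screenshotCount >= limits.maxScreenshots) {
    return { limited: true, reason: "maxScreenshots reached" };
  }
  return { limited: false };
}
```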


Constants: BetaSessionStates

IDLE, ACTIVE, PAUSED, ENDED, ERROR

Constants: BETA_SAFETY_LIMITS

  • maxActions: 1000
  • maxDurationMs: 3600000 (1 hour)
  • maxScreenshots: 200
  • maxTextLength: 10000
  • maxWaitTime: 60000
  • maxCoordinate: 4096

Exported Functions — computer/beta-api.js

BetaAPIClient (class)

Wraps Anthropic’s computer use beta API:

executeAction(actionType, params): Promise<ActionResult>

Anthropic SDK call: client.beta.messages.create() with the computer_20241022 tool

screenshot(): Promise<ScreenshotResult>

Returns: { success, data: base64String, mimeType: "image/png" }

getActionHistory(): ActionRecord[]


isBetaAPIAvailable(): Promise<boolean>

Purpose: Check if the computer use beta API is accessible.


Constants: BetaActionTypes

SCREENSHOT, MOUSE_MOVE, LEFT_CLICK, RIGHT_CLICK, DOUBLE_CLICK, TYPE, KEY, SCROLL


Exported Functions — computer/tools.js

validateBetaActionParams(actionType, params): ValidationResult

Purpose: Validate action parameters before API call.

convertBetaToInternal(actionType, params): InternalParams

Purpose: Convert beta API params to internal action format.

getBetaToolDefinitions(): ToolDefinition[]

Purpose: Return Anthropic-compatible tool definition objects for computer use.

Notes for rewrite: These are the tool schemas passed with the anthropic-beta: computer-use-2024-10-22 header.
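For context, a computer use tool definition for this beta takes roughly the following shape. The field names (type, name, display_width_px, display_height_px, display_number) follow Anthropic's published computer use beta documentation; the display values shown are illustrative.

```javascript
// Shape of the computer use tool definition sent alongside the
// anthropic-beta: computer-use-2024-10-22 header. Display values
// here are illustrative, not taken from computer/tools.js.
const computerTool = {
  type: "computer_20241022",
  name: "computer",
  display_width_px: 1920,
  display_height_px: 1080,
  display_number: 1,
};

console.log(computerTool.type); // "computer_20241022"
```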


Key Data Structures

| Structure | Fields | Purpose |
|-----------|--------|---------|
| AnalysisResult | imageId, analysisType, result, confidence, model, tokensUsed, processingTimeMs, cached | Image analysis result |
| DetectedObject | id, label, category, confidence, boundingBox, count | Object detection result |
| BoundingBox | x, y, width, height | Normalized coordinates (0–1) |
| FaceResult | ageRange, gender, emotions[], orientation, features[], position | Face analysis result |
| ActionRecord | actionType, params, result, timestamp, actionNumber | Computer use action log |

External Dependencies (vision/engine.js)

  • fetch() (Node.js built-in) — direct HTTP to Anthropic API
  • @modelcontextprotocol/sdk/types.js — McpError, ErrorCode
  • ../../config/claude.js — API key + URL constants

External Dependencies (computer/)

  • Anthropic SDK: client.beta.messages.create() with anthropic-beta: computer-use-2024-10-22 header

Notes for Rewrite

  • Vision engine.js uses raw fetch() not the Anthropic SDK — refactor to use ClaudeClient from core/client.js.
  • Bounding boxes in advanced.js are SYNTHETIC — production needs a real CV model.
  • BetaComputerSession creates a BetaAPIClient internally — cannot inject a custom client.
  • Computer use requires anthropic-beta: computer-use-2024-10-22 header.
  • Safety limits are constants, not configurable per-deployment — consider making them env-var driven.
  • vision/storage.js stores image buffers in memory (Map) — add persistent storage for production.


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
