Claude API: Vision + Computer Use — Function Reference

Source files:

  • vision/engine.js — core image analysis with Claude
  • vision/advanced.js (842 LOC) — object detection, face analysis, scene classification
  • vision/analyzer.js — multi-type analysis orchestrator
  • vision/storage.js — image buffer + analysis cache
  • vision/preprocessor.js — image format normalization
  • vision/batch-vision.js — batch image processing
  • vision/pdf-vision.js — PDF visual analysis
  • computer/beta-session.js (831 LOC) — Anthropic Computer Use beta session manager
  • computer/beta-api.js (746 LOC) — Computer Use API client
  • computer/tools.js (677 LOC) — computer use tool definitions
  • computer/session.js, computer/actions.js, computer/safety.js, computer/screenshot.js

Exported Functions — vision/engine.js

analyzeImage(imageId, analysisType, options): Promise<AnalysisResult>

Purpose: Send image to Claude API for analysis using type-specific prompts. Parameters:

  • imageId (string): Stored image ID
  • analysisType (ANALYSIS_TYPES): One of general, ocr, chart, objects, face, scene, color, qa
  • useCache (boolean): Return cached result if available (default: true)
  • question (string | null): For QA type — the question to ask

Returns: { imageId, analysisType, result, confidence, model, tokensUsed, processingTimeMs, cached }

Anthropic SDK call: Direct fetch to the /messages endpoint with an image block in the message content.

Notes for rewrite: Uses fetch() directly with the x-api-key header, not the Anthropic SDK client class. Caches results via storeAnalysis().
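For reference, the request body that analyzeImage presumably assembles looks roughly like this. A minimal sketch: buildImagePayload and the model string are illustrative, but the field names (model, max_tokens, messages, image/text content blocks, base64 source) follow Anthropic's published Messages API.

```javascript
// Sketch of the /messages request body analyzeImage likely builds.
// buildImagePayload is a hypothetical helper; field names follow
// Anthropic's documented Messages API.
function buildImagePayload(base64Data, mimeType, prompt) {
  return {
    model: "claude-3-5-sonnet-20241022", // illustrative model ID
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: { type: "base64", media_type: mimeType, data: base64Data },
          },
          { type: "text", text: prompt },
        ],
      },
    ],
  };
}

const payload = buildImagePayload("iVBORw0...", "image/png", "Describe this image.");
console.log(payload.messages[0].content[0].type); // "image"
```

This payload is what would be POSTed via fetch() with the x-api-key and anthropic-version headers.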


batchAnalyzeImages(imageIds, analysisType, options): Promise<BatchResult>

Purpose: Analyze multiple images with configurable concurrency. Parameters:

  • imageIds (string[]): Array of image IDs
  • concurrency (number): Parallel requests (default: 3)

Returns: { success, completed, failed, results, errors }
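The concurrency behaviour can be sketched as a simple promise pool. This is an illustrative stand-in, not the actual implementation; analyzeFn here substitutes for the real per-image analysis call.

```javascript
// Minimal promise-pool sketch of batchAnalyzeImages' concurrency control:
// `concurrency` workers each pull the next image ID until the list is drained.
async function runWithConcurrency(imageIds, analyzeFn, concurrency = 3) {
  const results = [];
  const errors = [];
  let index = 0;

  async function worker() {
    while (index < imageIds.length) {
      const id = imageIds[index++]; // synchronous check-and-increment, so no race
      try {
        results.push(await analyzeFn(id));
      } catch (err) {
        errors.push({ imageId: id, error: err.message });
      }
    }
  }

  await Promise.all(Array.from({ length: concurrency }, worker));
  return {
    success: errors.length === 0,
    completed: results.length,
    failed: errors.length,
    results,
    errors,
  };
}
```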


compareImages(imageId1, imageId2, options): Promise<ComparisonResult>

Purpose: Send two images to Claude for side-by-side comparison. Parameters:

  • detailed (boolean): Request detailed comparison (default: false)

Returns: Comparison analysis with similarities, differences, description.


Constants: ANALYSIS_TYPES

| Value | Prompt focus |
|-------|--------------|
| GENERAL | Full description, composition, text |
| OCR | Text extraction with location hints |
| CHART | Data extraction, chart type, JSON data |
| OBJECTS | Object names, counts, locations |
| FACE | Age, gender, emotion, orientation (objective only) |
| SCENE | Indoor/outdoor, time of day, weather, mood |
| COLOR | Dominant colors, palette, temperature, harmony |
| QA | Answer a specific question about the image |


Exported Functions — vision/advanced.js

detectObjectsWithBoundingBoxes(imageId, options): Promise<DetectionResult>

Purpose: Detect objects and generate bounding box coordinates. Parameters:

  • confidenceThreshold (number): Min confidence to include (default: 0.5)
  • maxObjects (number): Max objects to return (default: 50)
  • categories (string[] | null): Filter to categories: person, vehicle, animal, furniture, electronics, nature, food, building, clothing

Returns: { success, imageId, totalDetected, objects: [{id, label, category, confidence, boundingBox: {x,y,width,height}, count}], processingTimeMs }

Notes for rewrite: Bounding boxes are SYNTHETIC — generated from location hints in Claude's text response, not by a real CV model. Replace generateSyntheticBoundingBox() with YOLO/Detectron2 in production.
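Because the boxes are synthesized from textual location hints, the mapping presumably resembles the sketch below. hintToBox is a hypothetical name, and the thirds-based grid is an assumption about how generateSyntheticBoundingBox() works; coordinates are normalized to 0–1 as in BoundingBox.

```javascript
// Hypothetical sketch: turn a location hint parsed from Claude's text
// (e.g. "top-left", "bottom-right") into a normalized bounding box.
// Divides the image into a 3x3 grid and returns the matching cell.
function hintToBox(hint) {
  const thirds = {
    left: 0, center: 1 / 3, right: 2 / 3,
    top: 0, middle: 1 / 3, bottom: 2 / 3,
  };
  const [vert = "middle", horiz = "center"] = hint.split("-");
  return {
    x: thirds[horiz] ?? 1 / 3,
    y: thirds[vert] ?? 1 / 3,
    width: 1 / 3,
    height: 1 / 3,
  };
}

console.log(hintToBox("top-left")); // box covering the top-left third of the image
```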


analyzeFaces(imageId, options): Promise<FaceResult>

Purpose: Detect and analyze faces with optional emotion/attribute analysis. Parameters:

  • includeEmotions (boolean): Include emotion detection (default: true)
  • includeAttributes (boolean): Include attributes (default: true)
  • includeLandmarks (boolean): Include facial landmarks (default: false)

Returns: { success, imageId, faceCount, faces: [{id, ageRange, gender, emotions, orientation, features, boundingBox, landmarks?}], confidence, processingTimeMs }

Notes for rewrite: Landmarks are also SYNTHETIC (generateSyntheticLandmarks()). All analysis is parsed from Claude's text output.


classifySceneAdvanced(imageId, options): Promise<SceneResult>

Purpose: Advanced scene classification with environment attributes. Parameters:

  • includeAttributes (boolean): Include scene attributes (default: true)
  • includeActivities (boolean): Include activity detection (default: false)

Returns: { success, imageId, classification: {primaryCategory, secondaryCategories, indoor, timeOfDay, weather, location, mood, confidence}, processingTimeMs }


extractTextFromImage(imageId, options): Promise<OCRResult>

Purpose: OCR extraction with position and style information.

Returns: { success, imageId, text, blocks: [{text, location, style, confidence}], processingTimeMs }


analyzeChartData(imageId, options): Promise<ChartResult>

Purpose: Extract chart/graph data including type, labels, and structured data.

Returns: { success, imageId, chartType, title, data, trends, processingTimeMs }


analyzeImageColors(imageId, options): Promise<ColorResult>

Purpose: Color analysis including dominant colors and palette.

Returns: { success, imageId, analysis: {dominant, palette, temperature, contrast, harmony}, processingTimeMs }


compareImagesAdvanced(imageId1, imageId2, options): Promise<AdvancedComparisonResult>

Purpose: Detailed comparison with similarity score and diff highlights.


Exported Class — BetaComputerSession (computer/beta-session.js)

constructor(config)

Parameters:

  • displayWidth (number): Screen width (default: 1920)
  • displayHeight (number): Screen height (default: 1080)
  • maxActions (number): Action limit per session (default: 100)
  • maxDurationMs (number): Max session duration (default: 5 min)
  • maxScreenshots (number): Screenshot limit (default: 50)
  • captureScreenshots (boolean): Auto-screenshot (default: true)
  • autoPauseOnError (boolean): Pause on action error (default: false)
  • description (string | null): Optional session description

start(): Promise<StartResult>

Purpose: Initialize the session, check beta API availability, and capture an initial screenshot.

Returns: { success, sessionId, startTime, config }


end(): Promise<EndResult>

Purpose: Finalize the session, capture a final screenshot, and sync the action history.

Returns: { success, sessionId, duration, summary }


pause(): PauseResult

Purpose: Pause active session (sync, not async).


resume(): ResumeResult

Purpose: Resume a paused session and update totalPausedDuration.


executeAction(actionType, params): Promise<ActionResult>

Purpose: Execute a single computer use action with safety validation. Parameters:

  • actionType (BetaActionTypes): One of screenshot, mouse_move, left_click, right_click, double_click, type, key, scroll
  • params (object): Action-specific parameters

Returns: { success, sessionId, actionNumber, ...actionData }

Notes for rewrite: Validates params via validateBetaActionParams(), checks session limits before every action, and tracks cursor positions for mouse actions.


executeSequence(actions, options): Promise<SequenceResult>

Purpose: Execute multiple actions sequentially with delay between each. Parameters:

  • actions (Array): [{ action: actionType, params: {} }]
  • continueOnError (boolean): Continue if an action fails (default: false)
  • delayBetweenActions (number): ms between actions (default: 100)

Returns: { success, results, totalActions, completedActions, failedActions, duration }
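The sequential semantics described above can be sketched as follows. This is an illustrative stand-alone version, not the session method itself; executeAction is passed in as a stand-in so the loop structure is visible.

```javascript
// Sketch of executeSequence's loop: run actions one at a time, wait
// `delayBetweenActions` ms between them, and optionally continue past failures.
async function runSequence(
  executeAction,
  actions,
  { continueOnError = false, delayBetweenActions = 100 } = {}
) {
  const started = Date.now();
  const results = [];
  let completed = 0;
  let failed = 0;

  for (const { action, params } of actions) {
    try {
      results.push(await executeAction(action, params));
      completed++;
    } catch (err) {
      failed++;
      results.push({ success: false, error: err.message });
      if (!continueOnError) break; // stop on first failure by default
    }
    await new Promise((r) => setTimeout(r, delayBetweenActions));
  }

  return {
    success: failed === 0,
    results,
    totalActions: actions.length,
    completedActions: completed,
    failedActions: failed,
    duration: Date.now() - started,
  };
}
```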


getSummary(): SessionSummary

Purpose: Return session statistics without ending the session.

Returns: { sessionId, state, duration, actionCount, screenshotCount, errorCount, errors }


getActionHistory(): ActionRecord[]

Purpose: Get full action history from the BetaAPIClient.


checkLimits(): LimitCheckResult

Returns: { limited: boolean, reason?: string }
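The check presumably compares the session's counters against the configured ceilings and reports the first limit hit. A hypothetical sketch (field names taken from the session config and summary documented above; the actual method reads the instance's own state):

```javascript
// Hypothetical sketch of checkLimits: compare session counters against
// configured ceilings and report the first limit reached.
function checkLimits(session, limits) {
  if (session.actionCount >= limits.maxActions) {
    return { limited: true, reason: "maxActions reached" };
  }
  if (Date.now() - session.startTime >= limits.maxDurationMs) {
    return { limited: true, reason: "maxDurationMs exceeded" };
  }
  if (session.screenshotCount >= limits.maxScreenshots) {
    return { limited: true, reason: "maxScreenshots reached" };
  }
  return { limited: false };
}
```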


Constants: BetaSessionStates

IDLE, ACTIVE, PAUSED, ENDED, ERROR

Constants: BETA_SAFETY_LIMITS

  • maxActions: 1000
  • maxDurationMs: 3600000 (1 hour)
  • maxScreenshots: 200
  • maxTextLength: 10000
  • maxWaitTime: 60000
  • maxCoordinate: 4096

Exported Functions — computer/beta-api.js

BetaAPIClient (class)

Wraps Anthropic’s computer use beta API:

executeAction(actionType, params): Promise<ActionResult>

Anthropic SDK call: client.beta.messages.create() with the computer_20241022 tool

screenshot(): Promise<ScreenshotResult>

Returns: { success, data: base64String, mimeType: "image/png" }

getActionHistory(): ActionRecord[]


isBetaAPIAvailable(): Promise<boolean>

Purpose: Check if the computer use beta API is accessible.


Constants: BetaActionTypes

SCREENSHOT, MOUSE_MOVE, LEFT_CLICK, RIGHT_CLICK, DOUBLE_CLICK, TYPE, KEY, SCROLL


Exported Functions — computer/tools.js

validateBetaActionParams(actionType, params): ValidationResult

Purpose: Validate action parameters before API call.

convertBetaToInternal(actionType, params): InternalParams

Purpose: Convert beta API params to internal action format.

getBetaToolDefinitions(): ToolDefinition[]

Purpose: Return Anthropic-compatible tool definition objects for computer use.

Notes for rewrite: These are the tool schemas passed with the anthropic-beta: computer-use-2024-10-22 header.
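For context, a computer use tool definition for this beta takes roughly the following shape. The field names (type, name, display_width_px, display_height_px, display_number) follow Anthropic's published computer use beta documentation; the display values shown are illustrative.

```javascript
// Shape of the computer use tool definition sent alongside the
// anthropic-beta: computer-use-2024-10-22 header. Display values
// here are illustrative, not taken from computer/tools.js.
const computerTool = {
  type: "computer_20241022",
  name: "computer",
  display_width_px: 1920,
  display_height_px: 1080,
  display_number: 1,
};

console.log(computerTool.type); // "computer_20241022"
```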


Key Data Structures

| Structure | Fields | Purpose |
|-----------|--------|---------|
| AnalysisResult | imageId, analysisType, result, confidence, model, tokensUsed, processingTimeMs, cached | Image analysis result |
| DetectedObject | id, label, category, confidence, boundingBox, count | Object detection result |
| BoundingBox | x, y, width, height | Normalized coordinates (0–1) |
| FaceResult | ageRange, gender, emotions[], orientation, features[], position | Face analysis result |
| ActionRecord | actionType, params, result, timestamp, actionNumber | Computer use action log |

External Dependencies (vision/engine.js)

  • fetch() (Node.js built-in) — direct HTTP to Anthropic API
  • @modelcontextprotocol/sdk/types.js — McpError, ErrorCode
  • ../../config/claude.js — API key + URL constants

External Dependencies (computer/)

  • Anthropic SDK: client.beta.messages.create() with anthropic-beta: computer-use-2024-10-22 header

Notes for Rewrite

  • Vision engine.js uses raw fetch() not the Anthropic SDK — refactor to use ClaudeClient from core/client.js.
  • Bounding boxes in advanced.js are SYNTHETIC — production needs a real CV model.
  • BetaComputerSession creates a BetaAPIClient internally — cannot inject a custom client.
  • Computer use requires anthropic-beta: computer-use-2024-10-22 header.
  • Safety limits are constants, not configurable per-deployment — consider making them env-var driven.
  • vision/storage.js stores image buffers in memory (Map) — add persistent storage for production.


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
