Claude API: Vision + Computer Use — Function Reference
Source files:

- `vision/engine.js` — core image analysis with Claude
- `vision/advanced.js` (842 LOC) — object detection, face analysis, scene classification
- `vision/analyzer.js` — multi-type analysis orchestrator
- `vision/storage.js` — image buffer + analysis cache
- `vision/preprocessor.js` — image format normalization
- `vision/batch-vision.js` — batch image processing
- `vision/pdf-vision.js` — PDF visual analysis
- `computer/beta-session.js` (831 LOC) — Anthropic Computer Use beta session manager
- `computer/beta-api.js` (746 LOC) — Computer Use API client
- `computer/tools.js` (677 LOC) — computer use tool definitions
- `computer/session.js`, `computer/actions.js`, `computer/safety.js`, `computer/screenshot.js`
Exported Functions — vision/engine.js
analyzeImage(imageId, analysisType, options): Promise<AnalysisResult>
Purpose: Send image to Claude API for analysis using type-specific prompts.

Parameters:
- `imageId` (string): Stored image ID
- `analysisType` (ANALYSIS_TYPES): `general`, `ocr`, `chart`, `objects`, `face`, `scene`, `color`, `qa`
- `useCache` (boolean): Return cached result if available (default: true)
- `question` (string|null): For the QA type, the question to ask
Returns: { imageId, analysisType, result, confidence, model, tokensUsed, processingTimeMs, cached }
API call: direct `fetch()` to the `/messages` endpoint with an image block in the message content.
Notes for rewrite: Uses fetch() directly with x-api-key header, not the Anthropic SDK client class. Caches results via storeAnalysis().
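The raw-`fetch()` call pattern described above can be sketched as follows. `buildVisionRequest()` and the model name are illustrative assumptions, not the module's actual internals; only the endpoint, headers, and image-block shape are taken from the notes.

```javascript
// Sketch of the direct fetch() call engine.js makes (no SDK client class).
// buildVisionRequest() is a hypothetical helper; the model choice is an assumption.
function buildVisionRequest(base64Image, mimeType, prompt) {
  return {
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{
      role: 'user',
      content: [
        { type: 'image', source: { type: 'base64', media_type: mimeType, data: base64Image } },
        { type: 'text', text: prompt },
      ],
    }],
  };
}

async function analyzeViaFetch(apiKey, body) {
  const res = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': apiKey, // raw header auth, per the note above
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json',
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Anthropic API error: ${res.status}`);
  return res.json();
}
```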
batchAnalyzeImages(imageIds, analysisType, options): Promise<BatchResult>
Purpose: Analyze multiple images with configurable concurrency.

Parameters:
- `imageIds` (string[]): Array of image IDs
- `concurrency` (number): Parallel requests (default: 3)
Returns: { success, completed, failed, results, errors }
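The concurrency-limited pattern such a batch function typically uses can be sketched generically. `mapWithConcurrency()` is an illustrative stand-in, not the module's exported API; the default of 3 mirrors the documented `concurrency` default.

```javascript
// Generic worker-pool sketch: at most `concurrency` promises in flight at once.
// Results keep the input order; per-item failures are captured, not thrown.
async function mapWithConcurrency(items, worker, concurrency = 3) {
  const results = new Array(items.length);
  let next = 0;
  async function run() {
    while (next < items.length) {
      const i = next++; // single-threaded JS: no race on the counter
      try {
        results[i] = { ok: true, value: await worker(items[i]) };
      } catch (err) {
        results[i] = { ok: false, error: err.message };
      }
    }
  }
  const runners = Array.from({ length: Math.min(concurrency, items.length) }, run);
  await Promise.all(runners);
  return results;
}
```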
compareImages(imageId1, imageId2, options): Promise<ComparisonResult>
Purpose: Send two images to Claude for side-by-side comparison.

Parameters:
- `detailed` (boolean): Request detailed comparison (default: false)
Returns: Comparison analysis with similarities, differences, description.
Constants: ANALYSIS_TYPES
| Value | Prompt focus |
|-------|--------------|
| GENERAL | Full description, composition, text |
| OCR | Text extraction with location hints |
| CHART | Data extraction, chart type, JSON data |
| OBJECTS | Object names, counts, locations |
| FACE | Age, gender, emotion, orientation (objective only) |
| SCENE | Indoor/outdoor, time of day, weather, mood |
| COLOR | Dominant colors, palette, temperature, harmony |
| QA | Answer a specific question about the image |
Exported Functions — vision/advanced.js
detectObjectsWithBoundingBoxes(imageId, options): Promise<DetectionResult>
Purpose: Detect objects and generate bounding box coordinates.

Parameters:
- `confidenceThreshold` (number): Min confidence to include (default: 0.5)
- `maxObjects` (number): Max objects to return (default: 50)
- `categories` (string[]|null): Filter to categories: person, vehicle, animal, furniture, electronics, nature, food, building, clothing
Returns: { success, imageId, totalDetected, objects: [{id, label, category, confidence, boundingBox: {x,y,width,height}, count}], processingTimeMs }
Notes for rewrite: Bounding boxes are SYNTHETIC — generated from location hints in Claude’s text response (not real CV model). Replace generateSyntheticBoundingBox() with YOLO/Detectron2 in production.
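One plausible way to derive a synthetic box from a textual location hint is sketched below. This is purely illustrative; the real `generateSyntheticBoundingBox()` may map hints differently, and the fixed 0.3 extent is an assumption.

```javascript
// Illustrative sketch: map a coarse location hint from Claude's text response
// to a normalized (0-1) bounding box. NOT the module's actual implementation.
function syntheticBoundingBox(locationHint) {
  const centers = {
    'top-left': [0.25, 0.25], 'top-right': [0.75, 0.25],
    'bottom-left': [0.25, 0.75], 'bottom-right': [0.75, 0.75],
    center: [0.5, 0.5],
  };
  const [cx, cy] = centers[locationHint] || centers.center; // unknown hints fall back to center
  const width = 0.3, height = 0.3; // fixed placeholder extent, an assumption
  return { x: cx - width / 2, y: cy - height / 2, width, height };
}
```

A real CV model (YOLO, Detectron2) would replace this entirely, which is exactly the production recommendation above.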
analyzeFaces(imageId, options): Promise<FaceResult>
Purpose: Detect and analyze faces with optional emotion/attribute analysis.

Parameters:
- `includeEmotions` (boolean): Include emotion detection (default: true)
- `includeAttributes` (boolean): Include attributes (default: true)
- `includeLandmarks` (boolean): Include facial landmarks (default: false)
Returns: { success, imageId, faceCount, faces: [{id, ageRange, gender, emotions, orientation, features, boundingBox, landmarks?}], confidence, processingTimeMs }
Notes for rewrite: Landmarks are also SYNTHETIC (generateSyntheticLandmarks()). All analysis is Claude-text-parsed.
classifySceneAdvanced(imageId, options): Promise<SceneResult>
Purpose: Advanced scene classification with environment attributes.

Parameters:
- `includeAttributes` (boolean): Include scene attributes (default: true)
- `includeActivities` (boolean): Include activity detection (default: false)
Returns: { success, imageId, classification: {primaryCategory, secondaryCategories, indoor, timeOfDay, weather, location, mood, confidence}, processingTimeMs }
extractTextFromImage(imageId, options): Promise<OCRResult>
Purpose: OCR extraction with position and style information.
Returns: { success, imageId, text, blocks: [{text, location, style, confidence}], processingTimeMs }
analyzeChartData(imageId, options): Promise<ChartResult>
Purpose: Extract chart/graph data including type, labels, and structured data.
Returns: { success, imageId, chartType, title, data, trends, processingTimeMs }
analyzeImageColors(imageId, options): Promise<ColorResult>
Purpose: Color analysis including dominant colors and palette.
Returns: { success, imageId, analysis: {dominant, palette, temperature, contrast, harmony}, processingTimeMs }
compareImagesAdvanced(imageId1, imageId2, options): Promise<AdvancedComparisonResult>
Purpose: Detailed comparison with similarity score and diff highlights.
Exported Class — BetaComputerSession (computer/beta-session.js)
constructor(config)
Parameters:
- `displayWidth` (number): Screen width (default: 1920)
- `displayHeight` (number): Screen height (default: 1080)
- `maxActions` (number): Action limit per session (default: 100)
- `maxDurationMs` (number): Max session duration (default: 5 min)
- `maxScreenshots` (number): Screenshot limit (default: 50)
- `captureScreenshots` (boolean): Auto-screenshot (default: true)
- `autoPauseOnError` (boolean): Pause on action error (default: false)
- `description` (string|null): Optional session description
start(): Promise<StartResult>
Purpose: Initialize session, check beta API availability, capture initial screenshot.
Returns: { success, sessionId, startTime, config }
end(): Promise<EndResult>
Purpose: Finalize session, capture final screenshot, sync action history.
Returns: { success, sessionId, duration, summary }
pause(): PauseResult
Purpose: Pause active session (sync, not async).
resume(): ResumeResult
Purpose: Resume paused session, updates totalPausedDuration.
executeAction(actionType, params): Promise<ActionResult>
Purpose: Execute a single computer use action with safety validation.

Parameters:
- `actionType` (BetaActionTypes): `screenshot`, `mouse_move`, `left_click`, `right_click`, `double_click`, `type`, `key`, `scroll`
- `params` (object): Action-specific parameters
Returns: { success, sessionId, actionNumber, ...actionData }
Notes for rewrite: Validates params via validateBetaActionParams(). Checks session limits before every action. Tracks cursor positions for mouse actions.
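The pre-action limit check can be sketched as below. The session field names (`actionCount`, `startTime`, `totalPausedDuration`, `screenshotCount`) are assumptions; the `{ limited, reason? }` return shape matches the documented `checkLimits()` contract.

```javascript
// Sketch of the limit check run before every action (field names assumed).
function checkLimits(session, limits) {
  if (session.actionCount >= limits.maxActions)
    return { limited: true, reason: `action limit ${limits.maxActions} reached` };
  // Paused time does not count toward the duration budget.
  const elapsed = Date.now() - session.startTime - session.totalPausedDuration;
  if (elapsed >= limits.maxDurationMs)
    return { limited: true, reason: 'session duration limit exceeded' };
  if (session.screenshotCount >= limits.maxScreenshots)
    return { limited: true, reason: 'screenshot limit reached' };
  return { limited: false };
}
```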
executeSequence(actions, options): Promise<SequenceResult>
Purpose: Execute multiple actions sequentially with a delay between each.

Parameters:
- `actions` (Array): `[{ action: actionType, params: {} }]`
- `continueOnError` (boolean): Continue if an action fails (default: false)
- `delayBetweenActions` (number): ms between actions (default: 100)
Returns: { success, results, totalActions, completedActions, failedActions, duration }
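A minimal sketch of this sequential loop, under stated assumptions: `executeFn` stands in for the session's `executeAction`, and the aggregation logic is illustrative rather than the module's exact code.

```javascript
// Sketch: run actions one at a time, pausing between them, and aggregate
// a result object matching the documented SequenceResult shape.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function executeSequence(executeFn, actions, { continueOnError = false, delayBetweenActions = 100 } = {}) {
  const results = [];
  let failed = 0;
  const start = Date.now();
  for (const { action, params } of actions) {
    const result = await executeFn(action, params);
    results.push(result);
    if (!result.success) {
      failed++;
      if (!continueOnError) break; // default: stop on first failure
    }
    await sleep(delayBetweenActions);
  }
  return {
    success: failed === 0,
    results,
    totalActions: actions.length,
    completedActions: results.length - failed,
    failedActions: failed,
    duration: Date.now() - start,
  };
}
```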
getSummary(): SessionSummary
Purpose: Return session statistics without ending it.
Returns: { sessionId, state, duration, actionCount, screenshotCount, errorCount, errors }
getActionHistory(): ActionRecord[]
Purpose: Get full action history from the BetaAPIClient.
checkLimits(): LimitCheckResult
Returns: { limited: boolean, reason?: string }
Constants: BetaSessionStates
IDLE, ACTIVE, PAUSED, ENDED, ERROR
Constants: BETA_SAFETY_LIMITS
- maxActions: 1000
- maxDurationMs: 3600000 (1 hour)
- maxScreenshots: 200
- maxTextLength: 10000
- maxWaitTime: 60000
- maxCoordinate: 4096
Exported Functions — computer/beta-api.js
BetaAPIClient (class)
Wraps Anthropic’s computer use beta API:
executeAction(actionType, params): Promise<ActionResult>
Anthropic SDK call: `client.beta.messages.create()` with the `computer_20241022` tool type (beta flag `computer-use-2024-10-22`)
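The request payload for that call can be sketched as below. The tool type and beta flag match Anthropic's 2024-10-22 computer use beta; the wrapper function, model name, and `display_number` value are illustrative assumptions.

```javascript
// Sketch of the payload passed to client.beta.messages.create() for computer use.
// buildComputerUseRequest() is a hypothetical helper, not the client's actual API.
function buildComputerUseRequest(displayWidth, displayHeight, userText) {
  return {
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    tools: [{
      type: 'computer_20241022',        // tool version tied to the beta
      name: 'computer',
      display_width_px: displayWidth,
      display_height_px: displayHeight,
      display_number: 1,
    }],
    messages: [{ role: 'user', content: userText }],
    betas: ['computer-use-2024-10-22'], // SDK sends this as the anthropic-beta header
  };
}
```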
screenshot(): Promise<ScreenshotResult>
Returns: { success, data: base64String, mimeType: "image/png" }
getActionHistory(): ActionRecord[]
isBetaAPIAvailable(): Promise<boolean>
Purpose: Check if the computer use beta API is accessible.
Constants: BetaActionTypes
SCREENSHOT, MOUSE_MOVE, LEFT_CLICK, RIGHT_CLICK, DOUBLE_CLICK, TYPE, KEY, SCROLL
Exported Functions — computer/tools.js
validateBetaActionParams(actionType, params): ValidationResult
Purpose: Validate action parameters before API call.
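A sketch of per-action-type validation, using the documented safety limits (`maxCoordinate`, `maxTextLength`); the exact rules and error messages are assumptions about what `validateBetaActionParams()` checks.

```javascript
// Illustrative validation: mouse actions need an in-range [x, y] pair,
// type actions need a bounded string. Not the module's exact rule set.
function validateActionParams(actionType, params, limits = { maxCoordinate: 4096, maxTextLength: 10000 }) {
  const errors = [];
  if (['mouse_move', 'left_click', 'right_click', 'double_click'].includes(actionType)) {
    const [x, y] = params.coordinate ?? [];
    if (!Number.isInteger(x) || !Number.isInteger(y)) {
      errors.push('coordinate must be [x, y] integers');
    } else if (x < 0 || y < 0 || x > limits.maxCoordinate || y > limits.maxCoordinate) {
      errors.push('coordinate out of range');
    }
  }
  if (actionType === 'type' && (typeof params.text !== 'string' || params.text.length > limits.maxTextLength)) {
    errors.push('text must be a string within maxTextLength');
  }
  return { valid: errors.length === 0, errors };
}
```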
convertBetaToInternal(actionType, params): InternalParams
Purpose: Convert beta API params to internal action format.
getBetaToolDefinitions(): ToolDefinition[]
Purpose: Return Anthropic-compatible tool definition objects for computer use.
Notes for rewrite: These are the tool schemas passed to anthropic-beta: computer-use-2024-10-22.
Key Data Structures
| Structure | Fields | Purpose |
|---|---|---|
| AnalysisResult | imageId, analysisType, result, confidence, model, tokensUsed, processingTimeMs, cached | Image analysis |
| DetectedObject | id, label, category, confidence, boundingBox, count | Object detection result |
| BoundingBox | x, y, width, height | Normalized coordinates 0-1 |
| FaceResult | ageRange, gender, emotions[], orientation, features[], position | Face analysis |
| ActionRecord | actionType, params, result, timestamp, actionNumber | Computer use action log |
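Since `BoundingBox` coordinates are normalized to 0-1 (per the table above), consumers usually need a pixel-space conversion; a small illustrative helper:

```javascript
// Convert a normalized BoundingBox to pixel coordinates for a given image size.
function toPixelBox(box, imageWidth, imageHeight) {
  return {
    x: Math.round(box.x * imageWidth),
    y: Math.round(box.y * imageHeight),
    width: Math.round(box.width * imageWidth),
    height: Math.round(box.height * imageHeight),
  };
}
```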
External Dependencies (vision/engine.js)

- `fetch()` (Node.js built-in) — direct HTTP to the Anthropic API
- `@modelcontextprotocol/sdk/types.js` — `McpError`, `ErrorCode`
- `../../config/claude.js` — API key + URL constants
External Dependencies (computer/)
- Anthropic SDK: `client.beta.messages.create()` with the `anthropic-beta: computer-use-2024-10-22` header
Notes for Rewrite
- Vision: `engine.js` uses raw `fetch()`, not the Anthropic SDK — refactor to use `ClaudeClient` from `core/client.js`.
- Bounding boxes in `advanced.js` are SYNTHETIC — production needs a real CV model.
- `BetaComputerSession` creates a `BetaAPIClient` internally — a custom client cannot be injected.
- Computer use requires the `anthropic-beta: computer-use-2024-10-22` header.
- Safety limits are constants, not configurable per deployment — consider making them env-var driven.
- `vision/storage.js` stores image buffers in memory (a Map) — add persistent storage for production.
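The env-var suggestion for safety limits could look like the following sketch; the variable names are hypothetical, and the fallbacks mirror the documented BETA_SAFETY_LIMITS defaults.

```javascript
// Sketch: load BETA_SAFETY_LIMITS from the environment with the documented
// constants as fallbacks. Env var names (CU_*) are assumptions.
function loadSafetyLimits(env = process.env) {
  const num = (key, fallback) => {
    const v = Number.parseInt(env[key] ?? '', 10);
    return Number.isFinite(v) ? v : fallback; // bad or missing values fall back
  };
  return {
    maxActions: num('CU_MAX_ACTIONS', 1000),
    maxDurationMs: num('CU_MAX_DURATION_MS', 3600000),
    maxScreenshots: num('CU_MAX_SCREENSHOTS', 200),
    maxTextLength: num('CU_MAX_TEXT_LENGTH', 10000),
    maxWaitTime: num('CU_MAX_WAIT_TIME', 60000),
    maxCoordinate: num('CU_MAX_COORDINATE', 4096),
  };
}
```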