
POST /api/image

Generates an AI image from one or more input images and a text prompt, using Google’s gemini-3-pro-image-preview Gemini model.

Authentication

Requires a valid NextAuth session.

Request Body

{
  "images": ["string"],
  "prompt": "string"
}

Parameters

images
string[]
required
Array of image URLs or base64-encoded images. Must contain at least one image.
  • Can be Azure Blob Storage URLs
  • Can be base64 data URLs (e.g., data:image/png;base64,...)
  • Can be external HTTP/HTTPS URLs
prompt
string
required
Text description of the desired image transformation or generation. Example prompts:
  • “Make this image look like a watercolor painting”
  • “Transform this into a cyberpunk style”
  • “Convert to black and white with high contrast”
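
A minimal client-side sketch of checking both parameters before sending the request. The helper name buildImageRequestBody is hypothetical, and the checks only mirror the rules listed above (at least one image, data URL or HTTP/HTTPS URL, a prompt string); the server's own validation is not documented here and may differ.

```javascript
// Hypothetical helper: validate inputs and build the JSON body for POST /api/image.
// These checks are an assumption based on the parameter descriptions, not the
// endpoint's actual validation logic.
function buildImageRequestBody(images, prompt) {
  if (!Array.isArray(images) || images.length === 0) {
    throw new Error('At least one image is required');
  }
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    throw new Error('A non-empty prompt is required');
  }
  for (const image of images) {
    const accepted =
      typeof image === 'string' &&
      (image.startsWith('data:image/') || /^https?:\/\//i.test(image));
    if (!accepted) {
      throw new Error('Each image must be a data URL or an HTTP/HTTPS URL');
    }
  }
  return JSON.stringify({ images, prompt });
}
```

The returned string can be passed directly as the body of the fetch call.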

Response

Success Response (200)

{
  "imageBase64": "data:image/png;base64,...",
  "imageUrl": "https://[storage-account].blob.core.windows.net/catafract/generated-[timestamp].png",
  "text": "Optional text response from the model"
}
imageBase64
string
Base64-encoded image data URL for immediate display
imageUrl
string
Permanent URL to the generated image in Azure Blob Storage
text
string
Optional text generated by the model (if any)

Error Responses

{
  "error": "No images provided"
}
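
A sketch of how a client might surface the error field. The status codes returned for errors are not documented here, so this example treats any non-2xx response, or any body that carries an error string, as a failure. The helper names unwrapImageResponse and generateImage are hypothetical.

```javascript
// Hypothetical helper: return the payload on success, throw on failure.
// Assumption: error bodies have the shape { "error": "..." } shown above.
function unwrapImageResponse(ok, data) {
  if (!ok || (data && typeof data.error === 'string')) {
    const message = data && data.error ? data.error : 'Image generation failed';
    throw new Error(message);
  }
  return data;
}

async function generateImage(images, prompt) {
  const response = await fetch('/api/image', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ images, prompt }),
  });
  // Fall back to an empty object if the body is not valid JSON.
  const data = await response.json().catch(() => ({}));
  return unwrapImageResponse(response.ok, data);
}
```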

Example Request

const response = await fetch('/api/image', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    images: [
      'https://example.blob.core.windows.net/catafract/image1.png',
      'https://example.blob.core.windows.net/catafract/image2.png'
    ],
    prompt: 'Combine these images into a dramatic sunset scene'
  })
});

const data = await response.json();
console.log(data.imageUrl); // Azure Blob Storage URL

Generation Metadata

The endpoint also saves generation metadata to Azure Cosmos DB:
{
  "id": "gen-[timestamp]",
  "userId": "user@example.com",
  "prompt": "User's prompt text",
  "inputImages": ["array of input image URLs"],
  "outputImageUrl": "https://...",
  "createdAt": "2024-01-01T00:00:00.000Z"
}
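
For illustration, a record of this shape could be assembled as follows. This is a sketch, not the endpoint's code: the function name buildGenerationRecord is hypothetical, and treating [timestamp] as epoch milliseconds is an assumption, since the actual format is not specified.

```javascript
// Hypothetical builder for a metadata record matching the documented shape.
// Assumption: the id suffix is epoch milliseconds; the real format is not documented.
function buildGenerationRecord({ userId, prompt, inputImages, outputImageUrl }, now = new Date()) {
  return {
    id: `gen-${now.getTime()}`,
    userId,
    prompt,
    inputImages,
    outputImageUrl,
    createdAt: now.toISOString(),
  };
}
```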

Processing Time

Image generation typically takes 20-30 seconds. The frontend displays a countdown timer during generation.
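
A countdown like this can be driven by a small pure function. The sketch below is not the frontend's actual code; remainingSeconds is a hypothetical name, and the default of 30 seconds is an assumption taken from the upper end of the typical range.

```javascript
// Seconds left on the countdown, clamped at zero once the estimate is exceeded.
// expectedSeconds = 30 is an assumed default, not a documented limit.
function remainingSeconds(startedAtMs, nowMs, expectedSeconds = 30) {
  const elapsed = Math.floor((nowMs - startedAtMs) / 1000);
  return Math.max(0, expectedSeconds - elapsed);
}

// Example wiring (browser or Node.js):
// const startedAt = Date.now();
// const timer = setInterval(() => {
//   console.log(remainingSeconds(startedAt, Date.now()));
// }, 1000);
// ...call clearInterval(timer) when the request settles.
```

Clamping at zero keeps the display sensible when a generation runs longer than the estimate.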

Model Configuration

  • Model: gemini-3-pro-image-preview (primary)
  • Alternative: gemini-2.5-flash-image (commented out in code)
  • Input: Supports multiple images + text prompt
  • Output: Single generated image

Image Processing

  1. Input Processing:
    • External URLs are fetched and converted to base64
    • Data URLs are processed to extract base64 data
    • MIME types are detected automatically
  2. Generation:
    • Images are sent to Gemini API as inline data
    • Prompt is appended as text content
    • Model generates a new image
  3. Output Processing:
    • Generated image is uploaded to Azure Blob Storage
    • Metadata is saved to Cosmos DB
    • Both base64 and permanent URL are returned
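
The input-processing and prompt-assembly steps above can be sketched as follows. This is an illustration under stated assumptions, not the endpoint's source: the function names are hypothetical, MIME detection via the Content-Type header is assumed, and the { inlineData: { mimeType, data } } part shape follows the Gemini JavaScript SDK convention. It assumes Node.js 18 or later for the global fetch and Buffer.

```javascript
// Split a base64 data URL into its MIME type and base64 payload.
function parseDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.+)$/s.exec(dataUrl);
  if (!match) {
    throw new Error('Not a base64 data URL');
  }
  return { mimeType: match[1], data: match[2] };
}

// Fetch an external image and base64-encode it.
async function fetchAsInlineData(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status}`);
  }
  // Assumption: the MIME type is taken from the Content-Type header,
  // with image/png as an arbitrary fallback.
  const mimeType = (response.headers.get('content-type') || 'image/png').split(';')[0];
  const data = Buffer.from(await response.arrayBuffer()).toString('base64');
  return { mimeType, data };
}

// One inline-data part per image, followed by the prompt as a text part.
async function buildParts(images, prompt) {
  const parts = [];
  for (const image of images) {
    const inlineData = image.startsWith('data:')
      ? parseDataUrl(image)
      : await fetchAsInlineData(image);
    parts.push({ inlineData });
  }
  parts.push({ text: prompt });
  return parts;
}
```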

Best Practices

  • Use clear, descriptive prompts
  • Provide high-quality input images
  • Limit to 2-3 input images for best results
  • Account for the roughly 30-second generation time in your UI: show progress, and handle requests that time out or fail
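
One way to keep a UI from waiting indefinitely on a slow generation is a client-side timeout. The sketch below uses the standard AbortController API; the function name postImageWithTimeout, the injectable fetchImpl parameter, and the 60000 ms default are illustrative assumptions, not documented server behavior.

```javascript
// Hypothetical wrapper: abort the request if it exceeds timeoutMs.
// 60000 ms is an illustrative default chosen to sit well above the
// typical generation time; it is not a documented server limit.
async function postImageWithTimeout(body, timeoutMs = 60000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl('/api/image', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    return await response.json();
  } catch (err) {
    // fetch rejects with an AbortError when the signal fires.
    if (err.name === 'AbortError') {
      throw new Error(`Image generation timed out after ${timeoutMs} ms`);
    }
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```

Passing fetchImpl explicitly makes the wrapper easy to exercise with a stub; in normal use the default global fetch applies.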

Limitations

  • Requires at least one input image
  • Generation typically takes 20-30 seconds
  • Image quality depends on prompt clarity
  • Relies on Gemini API availability