Image Generation

POST /api/image

Generates an image from one or more input images and a text prompt, using Google’s Gemini model gemini-3-pro-image-preview.

Authentication

Requires a valid NextAuth session.

Request Body

{
  "images": ["string"],
  "prompt": "string"
}

Parameters

images (string[], required)

Array of image URLs or base64-encoded images. Must contain at least one image. Each entry can be:

- An Azure Blob Storage URL
- A base64 data URL (e.g., data:image/png;base64,...)
- An external HTTP/HTTPS URL
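
The three accepted forms can be told apart on the client before a request is sent. The sketch below is illustrative only: classifyImageSource is a hypothetical helper, not part of the API, and the Azure check assumes the standard .blob.core.windows.net host suffix.

```javascript
// Hypothetical helper: classify one entry of the `images` array by its form.
function classifyImageSource(image) {
  if (typeof image !== 'string' || image.length === 0) return 'invalid';

  // Base64 data URL, e.g. data:image/png;base64,...
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(image)) return 'data-url';

  try {
    const url = new URL(image);
    if (url.protocol !== 'http:' && url.protocol !== 'https:') return 'invalid';
    // Assumption: Azure Blob Storage hosts end in .blob.core.windows.net
    return url.hostname.endsWith('.blob.core.windows.net')
      ? 'azure-blob'
      : 'external-url';
  } catch {
    // new URL() throws on strings that are not absolute URLs
    return 'invalid';
  }
}
```

Filtering out 'invalid' entries before calling the endpoint avoids spending a generation request on input the server cannot fetch.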

prompt (string, required)

Text description of the desired image transformation or generation.

Example prompts:

- “Make this image look like a watercolor painting”
- “Transform this into a cyberpunk style”
- “Convert to black and white with high contrast”
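
Because both parameters are required, it can help to validate them on the client before posting. This is a minimal sketch; buildImageRequestBody is a hypothetical name, and rejecting a blank prompt is an assumption about what "required" means rather than documented server behavior.

```javascript
// Hypothetical helper: validate the two required parameters and
// serialize them into the JSON body expected by POST /api/image.
function buildImageRequestBody(images, prompt) {
  if (!Array.isArray(images) || images.length === 0) {
    throw new Error('images must contain at least one image');
  }
  // Assumption: a blank prompt is treated as missing.
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    throw new Error('prompt must be a non-empty string');
  }
  return JSON.stringify({ images, prompt: prompt.trim() });
}
```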

Response

Success Response (200)

{
  "imageBase64": "data:image/png;base64,...",
  "imageUrl": "https://[storage-account].blob.core.windows.net/catafract/generated-[timestamp].png",
  "text": "Optional text response from the model"
}

imageBase64 (string)

Base64-encoded image data URL for immediate display.

imageUrl (string)

Permanent URL to the generated image in Azure Blob Storage.

text (string, optional)

Text generated by the model, if any.
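
A client can normalize the success payload so that the optional text field is handled in one place. The following is a sketch under the assumption that imageBase64 and imageUrl are always present on success; readGenerationResponse is a hypothetical helper.

```javascript
// Hypothetical helper: check the documented fields of a 200 response
// and normalize the optional `text` field to null when it is absent.
function readGenerationResponse(data) {
  if (data === null || typeof data !== 'object') {
    throw new Error('Response body is not an object');
  }
  if (typeof data.imageBase64 !== 'string' || !data.imageBase64.startsWith('data:image/')) {
    throw new Error('imageBase64 is missing or is not an image data URL');
  }
  if (typeof data.imageUrl !== 'string') {
    throw new Error('imageUrl is missing');
  }
  return {
    imageBase64: data.imageBase64, // usable directly as an <img> src
    imageUrl: data.imageUrl,       // permanent Azure Blob Storage URL
    text: typeof data.text === 'string' ? data.text : null,
  };
}
```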

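Error Responses

Errors are returned as a JSON object with an error message, for example when the images array is empty:
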
{
  "error": "No images provided"
}
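
One way to surface these errors in client code is sketched below. The HTTP status that accompanies the error body is not documented here, so the sketch checks both response.ok and the error field. toApiError and generateImage are hypothetical names, and fetchImpl is an injectable stand-in for fetch.

```javascript
// Hypothetical helper: build an Error from a failed response,
// preferring the server's `error` message when one is present.
function toApiError(status, payload) {
  const hasMessage =
    payload !== null && typeof payload === 'object' && typeof payload.error === 'string';
  const err = new Error(hasMessage ? payload.error : `Request failed with status ${status}`);
  err.status = status;
  return err;
}

// Hypothetical wrapper around POST /api/image that throws on failure.
async function generateImage(body, fetchImpl = fetch) {
  const response = await fetchImpl('/api/image', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  // The body may not be JSON on unexpected failures, so parse defensively.
  const payload = await response.json().catch(() => null);
  if (!response.ok || !payload || payload.error) {
    throw toApiError(response.status, payload);
  }
  return payload;
}
```

Callers can then wrap generateImage in try/catch and show err.message to the user.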

Example Request

// Send two input images and a prompt to the endpoint
const response = await fetch('/api/image', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    images: [
      'https://example.blob.core.windows.net/catafract/image1.png',
      'https://example.blob.core.windows.net/catafract/image2.png'
    ],
    prompt: 'Combine these images into a dramatic sunset scene'
  })
});

const data = await response.json();
if (!response.ok) {
  // Error responses carry a message in the `error` field
  throw new Error(data.error ?? `Request failed with status ${response.status}`);
}
console.log(data.imageUrl); // Azure Blob Storage URL

Generation Metadata

The endpoint also saves generation metadata to Azure Cosmos DB:

{
  "id": "gen-[timestamp]",
  "userId": "user@example.com",
  "prompt": "User's prompt text",
  "inputImages": ["array of input image URLs"],
  "outputImageUrl": "https://...",
  "createdAt": "2024-01-01T00:00:00.000Z"
}
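
A record of this shape could be assembled as follows. This is an illustration, not the endpoint's actual code: buildGenerationRecord is a hypothetical helper, the Cosmos DB write is omitted, and treating [timestamp] as epoch milliseconds is an assumption.

```javascript
// Hypothetical helper: assemble the metadata document shown above.
// `now` is injectable so the output is deterministic in tests.
function buildGenerationRecord({ userId, prompt, inputImages, outputImageUrl }, now = new Date()) {
  return {
    id: `gen-${now.getTime()}`, // assumption: [timestamp] is epoch milliseconds
    userId,
    prompt,
    inputImages,
    outputImageUrl,
    createdAt: now.toISOString(),
  };
}
```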

Processing Time

Image generation typically takes 20-30 seconds. The frontend displays a countdown timer during generation.
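
The countdown logic can be as small as the sketch below. It is not the frontend's actual implementation; remainingSeconds is a hypothetical helper, and the 30-second default is taken from the upper end of the typical range.

```javascript
// Hypothetical helper: seconds left on a countdown that started at
// `startedAtMs`, clamped at zero if generation runs past the estimate.
function remainingSeconds(startedAtMs, nowMs, estimateSeconds = 30) {
  const elapsedSeconds = Math.floor((nowMs - startedAtMs) / 1000);
  return Math.max(0, estimateSeconds - elapsedSeconds);
}
```

A UI would typically call this once per second with Date.now() and stop the timer when the response arrives, since the estimate is not a guarantee.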

Model Configuration

- Model: gemini-3-pro-image-preview (primary)
- Alternative: gemini-2.5-flash-image (commented out in code)
- Input: Supports multiple images + text prompt
- Output: Single generated image

Image Processing

Input Processing:
- External URLs are fetched and converted to base64
- Data URLs are processed to extract base64 data
- MIME types are detected automatically

Generation:
- Images are sent to Gemini API as inline data
- Prompt is appended as text content
- Model generates a new image

Output Processing:
- Generated image is uploaded to Azure Blob Storage
- Metadata is saved to Cosmos DB
- Both base64 and permanent URL are returned
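
The data URL handling and the assembly of the model input can be sketched as follows. This is not the route's source code: parseDataUrl and buildGeminiParts are hypothetical helpers, the inlineData/mimeType/data shape is an assumption based on the part format used by Google's JavaScript SDK, and fetching external URLs is left out.

```javascript
// Hypothetical helper: split a base64 data URL into its MIME type and payload.
function parseDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) throw new Error('Not a base64 data URL');
  return { mimeType: match[1], data: match[2] };
}

// Hypothetical helper: one inline-data part per image, followed by the
// prompt as a text part (assumed Gemini part shape, see lead-in).
function buildGeminiParts(imageDataUrls, prompt) {
  const imageParts = imageDataUrls.map((url) => ({ inlineData: parseDataUrl(url) }));
  return [...imageParts, { text: prompt }];
}
```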

Best Practices

- Use clear, descriptive prompts
- Provide high-quality input images
- Limit to 2-3 input images for best results
- Account for the 20-30 second generation time in your UI, for example with a loading state or countdown

Limitations

- Requires at least one input image
- Generation typically takes 20-30 seconds
- Image quality depends on prompt clarity
- Relies on Gemini API availability
