Database Schema - Catafract

Overview

Catafract uses Azure Cosmos DB (SQL API) for persistent storage. The database is named catafract and contains four primary collections, each stored as a Cosmos DB container.

Collections

Users Collection

Stores user account information from Google OAuth.

Container Name: users
Partition Key: /email

Schema

interface User {
  id: string;                    // UUID (auto-generated)
  email: string;                 // From Google OAuth (partition key)
  name: string;                  // From Google OAuth
  image: string;                 // Profile picture URL
  createdAt: string;             // ISO 8601 timestamp
  isPro: boolean;                // Subscription status
  provider: string;              // Always "google"
  polarCustomerId?: string;      // Polar customer ID (if subscribed)
  subscriptionStatus?: string;   // "active" | "wont-renew" | "revoked"
}

Example Document

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "email": "user@example.com",
  "name": "John Doe",
  "image": "https://lh3.googleusercontent.com/a/...",
  "createdAt": "2024-01-15T10:30:00.000Z",
  "isPro": false,
  "provider": "google"
}

Operations

Create: On first Google OAuth sign-in
Read: On session validation, user data fetch
Update: On subscription changes via Polar webhook
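
The create and update paths above can be sketched as two pure functions. This is a minimal sketch rather than the project's actual code: GoogleProfile, buildNewUser and applySubscriptionStatus are hypothetical names, the id is assumed to be a UUID supplied by the caller, and the mapping from subscriptionStatus to isPro is an assumption.

```typescript
// Mirrors the User schema above.
type SubscriptionStatus = "active" | "wont-renew" | "revoked";

interface User {
  id: string;
  email: string;
  name: string;
  image: string;
  createdAt: string;
  isPro: boolean;
  provider: string;
  polarCustomerId?: string;
  subscriptionStatus?: SubscriptionStatus;
}

// Hypothetical shape of the profile fields taken from Google OAuth.
interface GoogleProfile {
  email: string;
  name: string;
  picture: string;
}

// Builds the document stored on first sign-in; new users start without Pro.
function buildNewUser(profile: GoogleProfile, id: string, now: Date = new Date()): User {
  return {
    id,
    email: profile.email,
    name: profile.name,
    image: profile.picture,
    createdAt: now.toISOString(),
    isPro: false,
    provider: "google",
  };
}

// Returns an updated copy for a Polar webhook event (the input is not mutated).
// Assumption: only "revoked" removes Pro access.
function applySubscriptionStatus(
  user: User,
  status: SubscriptionStatus,
  polarCustomerId: string,
): User {
  return { ...user, polarCustomerId, subscriptionStatus: status, isPro: status !== "revoked" };
}
```

Keeping these functions free of database calls makes the document shape easy to check before it is written to the users container.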

Projects Collection

Stores project metadata for organizing canvas workspaces.

Container Name: projects
Partition Key: /userId

Schema

interface Project {
  id: string;           // UUID
  userId: string;       // Reference to User.id (partition key)
  name: string;         // Project name
  createdDate: Date;    // Creation timestamp (serialized as an ISO 8601 string)
}

Example Document

{
  "id": "7b2e8f45-a1c3-4d92-9f21-3b4e5f6a7c8d",
  "userId": "550e8400-e29b-41d4-a716-446655440000",
  "name": "My First Project",
  "createdDate": "2024-01-15T14:22:00.000Z"
}

Operations

Create: When user creates a new project
Read: When loading user’s projects list
Update: Not currently implemented
Delete: Not currently implemented

Canvas Collection

Stores the node and edge data for each project’s canvas.

Container Name: canvas
Partition Key: /projectId

Schema

interface Canvas {
  id: string;           // UUID
  projectId: string;    // Reference to project (partition key)
  nodes: Node[];        // Array of canvas nodes
  edges: Edge[];        // Array of connections
}

interface Node {
  id: string;
  type: 'upload' | 'generation';
  position: { x: number; y: number };
  data: {
    type: 'upload' | 'generation';
    image?: string;
    prompt?: string;
    isGenerating?: boolean;
  };
}

interface Edge {
  id: string;
  source: string;
  target: string;
  sourceHandle?: string;
  targetHandle?: string;
}

Example Document

{
  "id": "9c4f2b1e-3d5a-4e8b-a7c9-1f2e3d4a5b6c",
  "projectId": "7b2e8f45-a1c3-4d92-9f21-3b4e5f6a7c8d",
  "nodes": [
    {
      "id": "upload-1",
      "type": "upload",
      "position": { "x": 100, "y": 100 },
      "data": {
        "type": "upload",
        "image": "https://catafractstorage.blob.core.windows.net/catafract/1704067200000-photo.jpg"
      }
    },
    {
      "id": "generation-1",
      "type": "generation",
      "position": { "x": 400, "y": 100 },
      "data": {
        "type": "generation",
        "prompt": "Make this image look like a watercolor painting",
        "image": "https://catafractstorage.blob.core.windows.net/catafract/generated-1704067230000.png"
      }
    }
  ],
  "edges": [
    {
      "id": "edge-1",
      "source": "upload-1",
      "target": "generation-1"
    }
  ]
}
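
Because edges refer to nodes by id, a canvas document is only consistent when every source and target matches a node in the same document. The check below is a minimal sketch of that invariant; findDanglingEdges is a hypothetical helper, and the types are trimmed to the fields it needs (named CanvasNode and CanvasEdge here to avoid clashing with the DOM's global Node type).

```typescript
interface CanvasNode {
  id: string;
}

interface CanvasEdge {
  id: string;
  source: string;
  target: string;
}

// Returns the edges whose source or target does not match any node id.
function findDanglingEdges(nodes: CanvasNode[], edges: CanvasEdge[]): CanvasEdge[] {
  const nodeIds = new Set(nodes.map((node) => node.id));
  return edges.filter((edge) => !nodeIds.has(edge.source) || !nodeIds.has(edge.target));
}
```

For the example document above, the single edge links upload-1 to generation-1, both of which exist, so the function returns an empty array.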

Operations

Create/Upsert: Auto-saved via upsert, debounced at 1 second
Read: When loading a project’s canvas
Update: Via upsert operation on every auto-save
Delete: Not currently implemented
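
A debounced auto-save restarts its timer on every change, so the upsert runs only once edits pause for the delay. The following is a minimal sketch of that pattern, not the application's actual implementation: createDebouncedSaver and the Scheduler interface are hypothetical, and error handling for a failed save is left out.

```typescript
type SaveFn<T> = (value: T) => void | Promise<void>;

// Timer abstraction so the debounce logic can be exercised without real timers.
interface Scheduler {
  set(callback: () => void, delayMs: number): unknown;
  clear(handle: unknown): void;
}

const defaultScheduler: Scheduler = {
  set: (callback, delayMs) => setTimeout(callback, delayMs),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Returns a function that schedules `save` with the latest value and
// cancels any save that is still waiting.
function createDebouncedSaver<T>(
  save: SaveFn<T>,
  delayMs: number = 1000,
  scheduler: Scheduler = defaultScheduler,
): (value: T) => void {
  let handle: unknown = undefined;
  let pending = false;

  return (value: T): void => {
    if (pending) {
      scheduler.clear(handle);
    }
    pending = true;
    handle = scheduler.set(() => {
      pending = false;
      void save(value);
    }, delayMs);
  };
}
```

In use, the save function would perform the upsert of the full canvas document, and the returned function would be called from the canvas change handler.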

Generations Collection

Stores metadata about AI-generated images for analytics and tracking.

Container Name: generations
Partition Key: /userId

Schema

interface Generation {
  id: string;              // Format: "gen-{timestamp}"
  userId: string;          // User email (partition key)
  prompt: string;          // Text prompt used
  inputImages: string[];   // Array of input image URLs
  outputImageUrl: string;  // Generated image URL
  createdAt: string;       // ISO 8601 timestamp
}

Example Document

{
  "id": "gen-1704067230000",
  "userId": "user@example.com",
  "prompt": "Make this image look like a watercolor painting",
  "inputImages": [
    "https://catafractstorage.blob.core.windows.net/catafract/1704067200000-photo.jpg"
  ],
  "outputImageUrl": "https://catafractstorage.blob.core.windows.net/catafract/generated-1704067230000.png",
  "createdAt": "2024-01-15T15:30:30.000Z"
}

Operations

Create: After successful image generation
Read: Not currently implemented (reserved for future analytics)
Update: Not currently implemented
Delete: Not currently implemented
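
The create step can be sketched as a function that derives both the id and createdAt from one timestamp. This is an illustrative sketch: buildGeneration is a hypothetical name, and it assumes the {timestamp} in the id format is milliseconds since the Unix epoch, as the example document suggests.

```typescript
// Mirrors the Generation schema above.
interface Generation {
  id: string;
  userId: string;
  prompt: string;
  inputImages: string[];
  outputImageUrl: string;
  createdAt: string;
}

// Builds the record written after a successful image generation.
function buildGeneration(
  userId: string,
  prompt: string,
  inputImages: string[],
  outputImageUrl: string,
  now: Date = new Date(),
): Generation {
  return {
    id: `gen-${now.getTime()}`, // milliseconds since epoch (assumed format)
    userId,
    prompt,
    inputImages: [...inputImages], // copy so later caller edits do not leak in
    outputImageUrl,
    createdAt: now.toISOString(),
  };
}
```

Cosmos DB requires id to be unique within a logical partition, so with this format two generations for the same user in the same millisecond would collide; a random suffix would avoid that if it ever matters.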

Partition Key Strategy

Why These Partition Keys?

Users (/email):

Natural partition key as email is unique
All queries for user data filter by email
Even distribution across partitions

Projects (/userId):

Users query their own projects most frequently
Co-locates all projects for a user
Efficient for listing user’s projects

Canvas (/projectId):

Each project has exactly one canvas
Canvas is always loaded by project ID
Perfect 1:1 relationship

Generations (/userId):

Future-proofing for per-user analytics
Can track generation history by user
Even distribution expected
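
One way to keep these choices in a single place is a lookup table from container name to partition key path, plus a helper that reads the key value from a document before a write. This is a sketch under that assumption; PARTITION_KEY_PATHS and partitionKeyValue are hypothetical names.

```typescript
// Partition key paths as listed for each container in this document.
const PARTITION_KEY_PATHS = {
  users: "/email",
  projects: "/userId",
  canvas: "/projectId",
  generations: "/userId",
} as const;

type ContainerName = keyof typeof PARTITION_KEY_PATHS;

// Reads the partition key value from a document and fails early if it is missing.
function partitionKeyValue(container: ContainerName, doc: Record<string, unknown>): string {
  const field = PARTITION_KEY_PATHS[container].slice(1); // drop the leading "/"
  const value = doc[field];
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(`Document for "${container}" is missing partition key field "${field}"`);
  }
  return value;
}
```

Failing before the write gives a clearer error than a rejected request and keeps point reads and single-partition queries supplied with the right key.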

Indexing Policy

Cosmos DB uses automatic indexing by default. All properties are indexed unless explicitly excluded.

Recommended Custom Indexes

For production optimization, consider an indexing policy that excludes the large array properties:

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*"
    }
  ],
  "excludedPaths": [
    {
      "path": "/nodes/*"
    },
    {
      "path": "/edges/*"
    },
    {
      "path": "/inputImages/*"
    }
  ]
}

Rationale:

Exclude large arrays (nodes and edges in canvas, inputImages in generations) from indexing
Reduces index size and write costs
These fields are not queried directly
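
Indexing policies are set per container, and each excluded path exists in only one of them. The sketch below generates a policy per container so that each carries only the exclusions relevant to it. The per-container split is an assumption, and indexingPolicyFor is a hypothetical helper.

```typescript
interface IndexingPath {
  path: string;
}

interface IndexingPolicy {
  indexingMode: "consistent";
  automatic: boolean;
  includedPaths: IndexingPath[];
  excludedPaths: IndexingPath[];
}

type ContainerName = "users" | "projects" | "canvas" | "generations";

// Large array properties that are stored but never queried directly.
const EXCLUDED_PATHS: Record<ContainerName, string[]> = {
  users: [],
  projects: [],
  canvas: ["/nodes/*", "/edges/*"],
  generations: ["/inputImages/*"],
};

// Builds the policy for one container: index everything except its large arrays.
function indexingPolicyFor(container: ContainerName): IndexingPolicy {
  return {
    indexingMode: "consistent",
    automatic: true,
    includedPaths: [{ path: "/*" }],
    excludedPaths: EXCLUDED_PATHS[container].map((path) => ({ path })),
  };
}
```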

Query Patterns

Get User by Email

SELECT * FROM c WHERE c.email = @email

Get User’s Projects

SELECT * FROM c WHERE c.userId = @userId

Get Canvas for Project

SELECT * FROM c WHERE c.projectId = @projectId

Get User Generations (Future)

SELECT * FROM c WHERE c.userId = @userId ORDER BY c.createdAt DESC
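
Each query above takes one parameter, so it can be expressed as a query text plus a parameter list. The sketch below builds objects in the { query, parameters } shape that the @azure/cosmos JavaScript SDK accepts for parameterized queries. The QuerySpec type is declared locally to keep the example self-contained, and the helper names are hypothetical.

```typescript
interface QueryParameter {
  name: string;
  value: string;
}

interface QuerySpec {
  query: string;
  parameters: QueryParameter[];
}

// Wraps a query text with a single named parameter.
function singleParamQuery(query: string, name: string, value: string): QuerySpec {
  return { query, parameters: [{ name, value }] };
}

function userByEmail(email: string): QuerySpec {
  return singleParamQuery("SELECT * FROM c WHERE c.email = @email", "@email", email);
}

function projectsByUser(userId: string): QuerySpec {
  return singleParamQuery("SELECT * FROM c WHERE c.userId = @userId", "@userId", userId);
}

function canvasByProject(projectId: string): QuerySpec {
  return singleParamQuery("SELECT * FROM c WHERE c.projectId = @projectId", "@projectId", projectId);
}

function generationsByUser(userId: string): QuerySpec {
  return singleParamQuery(
    "SELECT * FROM c WHERE c.userId = @userId ORDER BY c.createdAt DESC",
    "@userId",
    userId,
  );
}
```

Passing values as parameters rather than concatenating them into the query text avoids injection problems. Every filter here is on the container's partition key, so each query can be served from a single partition.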

Performance Considerations

Request Units (RU)

Typical RU costs:

User lookup: ~2-3 RU
Project list: ~3-5 RU per project
Canvas load: ~10-50 RU (depends on node count)
Canvas save: ~10-50 RU (depends on node count)

Optimization Tips

Use partition keys in queries: Always filter on the partition key so a query is served by a single partition instead of fanning out
Limit canvas size: Large canvas documents (1000+ nodes) increase RU costs
Consider pagination: For users with many projects or generations
Monitor RU consumption: Set up alerts for unexpected spikes
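
Canvas size can be checked before a save by measuring the serialized document. Cosmos DB limits a single item to 2 MB, measured as the UTF-8 length of its JSON representation. The sketch below measures a document against that limit; the helper names and the 80% warning threshold are assumptions.

```typescript
// Cosmos DB's maximum item size (2 MB).
const MAX_ITEM_BYTES = 2 * 1024 * 1024;

// Size of the document as it would be stored: UTF-8 bytes of its JSON form.
function documentSizeBytes(doc: unknown): number {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// True when the document uses at least `threshold` (0 to 1) of the item size limit.
function isNearSizeLimit(doc: unknown, threshold: number = 0.8): boolean {
  return documentSizeBytes(doc) >= MAX_ITEM_BYTES * threshold;
}
```

Because image fields in the example documents hold blob storage URLs rather than image data, node count is the main driver of canvas document size.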

Scaling Considerations

Throughput: Start with 400 RU/s, scale based on usage
Storage: Monitor document sizes, especially canvas documents
Partitioning: Current strategy scales to millions of users
Archival: Consider archiving old generations
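
The RU estimates above allow a rough capacity check for auto-save traffic. The function below is a back-of-the-envelope sketch: maxConcurrentEditors is a hypothetical name, and it assumes each actively editing user triggers at most one canvas save per second (the debounce interval) and that no other traffic consumes throughput.

```typescript
// Upper bound on users who can auto-save at the same time without
// exceeding the provisioned throughput.
function maxConcurrentEditors(
  provisionedRuPerSecond: number,
  ruPerSave: number,
  savesPerSecondPerUser: number = 1,
): number {
  if (ruPerSave <= 0 || savesPerSecondPerUser <= 0) {
    throw new Error("ruPerSave and savesPerSecondPerUser must be positive");
  }
  return Math.floor(provisionedRuPerSecond / (ruPerSave * savesPerSecondPerUser));
}
```

At 400 RU/s this gives 40 simultaneous editors when a save costs 10 RU and 8 when a save costs 50 RU. Canvas saves are therefore likely to be the first workload to cause throttling as usage grows.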

Database Utilities

Creating Collections

See Configuration for Azure CLI commands to create all required collections.

Backup Strategy

Azure Cosmos DB provides automatic backups every 4 hours with 8-hour retention by default. Consider upgrading to continuous backup for production.

Monitoring

Monitor these metrics:

Request Unit consumption
Document count per collection
Average document size
Query performance
Throttling events (429 errors)
