Overview
Catafract uses Azure Cosmos DB (SQL API) for persistent storage. The database name is catafract, and it contains four primary collections.
Collections
Users Collection
Stores user account information from Google OAuth. Container name: users
Partition Key: /email
Schema
Example Document
Operations
- Create: On first Google OAuth sign-in
- Read: On session validation and when fetching user data
- Update: On subscription changes via Polar webhook
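The create-on-first-sign-in, read-on-validation, and update-on-webhook flow can be sketched as a find-or-create keyed by email. This is a minimal, hypothetical sketch: an in-memory Map stands in for the users container, and the function names, the id format, and the subscription field are assumptions, not Catafract's actual code. Normalizing the email is also an assumption; it matters because email is the partition key, and partition key values are compared as exact strings, so differently cased addresses would otherwise become separate users.

```typescript
// Hypothetical minimal user document: `id` is required on every Cosmos DB item and
// `email` is the documented partition key; `subscription` is an assumed field name.
interface UserDoc {
  id: string;
  email: string;
  subscription?: string;
}

// In-memory stand-in for the users container, keyed by partition key value.
const users = new Map<string, UserDoc>();

// Assumption: emails are trimmed and lower-cased before use as the partition key.
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

// Creates the user on first sign-in and returns the existing one afterwards.
function findOrCreateUser(rawEmail: string): { user: UserDoc; created: boolean } {
  const email = normalizeEmail(rawEmail);
  const existing = users.get(email);
  if (existing) return { user: existing, created: false };
  const user: UserDoc = { id: `user-${users.size + 1}`, email };
  users.set(email, user);
  return { user, created: true };
}

// Update path, e.g. when a subscription webhook arrives; false if the user is unknown.
function updateSubscription(rawEmail: string, subscription: string): boolean {
  const email = normalizeEmail(rawEmail);
  const user = users.get(email);
  if (!user) return false;
  users.set(email, { ...user, subscription });
  return true;
}
```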
Projects Collection
Stores project metadata for organizing canvas workspaces. Container name: projects
Partition Key: /userId
Schema
Example Document
Operations
- Create: When user creates a new project
- Read: When loading the user’s project list
- Update: Not currently implemented
- Delete: Not currently implemented
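Because projects are partitioned by userId, creating and listing a user's projects only ever touches that user's logical partition. The sketch below models that with an in-memory map holding one bucket per partition key value; the name field, the id format, and the function names are hypothetical.

```typescript
// Hypothetical minimal project shape: `id` and the partition key `userId` come from
// this page; `name` is an illustrative assumption.
interface ProjectDoc {
  id: string;
  userId: string;
  name: string;
}

// In-memory stand-in for the projects container: one bucket per logical partition (userId).
const partitions = new Map<string, ProjectDoc[]>();
let nextId = 1;

// Create: the new document lands in the creating user's partition.
function createProject(userId: string, name: string): ProjectDoc {
  const project: ProjectDoc = { id: `project-${nextId++}`, userId, name };
  const bucket = partitions.get(userId) ?? [];
  bucket.push(project);
  partitions.set(userId, bucket);
  return project;
}

// Read: listing a user's projects reads exactly one partition; returns a copy.
function listProjects(userId: string): ProjectDoc[] {
  return [...(partitions.get(userId) ?? [])];
}
```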
Canvas Collection
Stores the node and edge data for each project’s canvas. Container name: canvas
Partition Key: /projectId
Schema
Example Document
Operations
- Create/Upsert: Auto-saved with a 1-second debounce (the save runs 1 second after the last change)
- Read: When loading a project’s canvas
- Update: Via upsert operation on every auto-save
- Delete: Not currently implemented
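The debounced auto-save can be sketched as a trailing-edge debounce in front of an upsert: every edit resets the timer, so only the latest canvas state is written once edits pause for the delay. The upsertCanvas stand-in, the id value, and the CanvasDoc shape beyond projectId, nodes, and edges are assumptions; the real client code may differ.

```typescript
// Generic trailing-edge debounce: `fn` runs once, `delayMs` after the last call,
// with the arguments of that last call.
function debounce<A extends unknown[]>(fn: (...args: A) => void, delayMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, delayMs);
  };
}

// Hypothetical canvas shape using the documented `projectId`, `nodes`, and `edges`.
interface CanvasDoc {
  id: string;
  projectId: string;
  nodes: unknown[];
  edges: unknown[];
}

const savedSnapshots: CanvasDoc[] = [];

// Stand-in for an upsert; a real client would write to the canvas container here.
function upsertCanvas(doc: CanvasDoc): void {
  savedSnapshots.push(doc);
}

// Each edit calls scheduleSave; only the final state after a 1-second pause is saved.
const scheduleSave = debounce(upsertCanvas, 1000);
```

Note that a debounce fires after a pause, not on a fixed schedule: continuous editing postpones the save until the user stops for a full second.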
Generations Collection
Stores metadata about AI-generated images for analytics and tracking. Container name: generations
Partition Key: /userId
Schema
Example Document
Operations
- Create: After successful image generation
- Read: Not currently implemented (for future analytics)
- Update: Not implemented
- Delete: Not implemented
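Writing the record only after a successful generation can be sketched by awaiting the generator first and appending the record afterwards, so a thrown error skips the write entirely. The generate callback, the prompt and createdAt fields, and the id format are hypothetical; only userId and inputImages are mentioned elsewhere on this page.

```typescript
// Hypothetical record shape: `userId` (partition key) and `inputImages` appear on
// this page; `prompt` and `createdAt` are illustrative assumptions.
interface GenerationDoc {
  id: string;
  userId: string;
  prompt: string;
  inputImages: string[];
  createdAt: string;
}

// In-memory stand-in for the generations container.
const generations: GenerationDoc[] = [];

// Runs the generator and records metadata only if it resolves successfully.
async function generateAndRecord(
  userId: string,
  prompt: string,
  inputImages: string[],
  generate: (prompt: string, inputImages: string[]) => Promise<string>,
): Promise<string> {
  // If this rejects, the error propagates and no record is written.
  const result = await generate(prompt, inputImages);
  generations.push({
    id: `gen-${generations.length + 1}`,
    userId,
    prompt,
    inputImages,
    createdAt: new Date().toISOString(),
  });
  return result;
}
```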
Partition Key Strategy
Why These Partition Keys?
Users (/email):
- Natural partition key as email is unique
- All queries for user data filter by email
- Even distribution across partitions
Projects (/userId):
- Users query their own projects most frequently
- Co-locates all projects for a user
- Efficient for listing user’s projects
Canvas (/projectId):
- Each project has exactly one canvas
- Canvas is always loaded by project ID
- Perfect 1:1 relationship
Generations (/userId):
- Future-proofing for per-user analytics
- Can track generation history by user
- Even distribution expected
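Cosmos DB routes each item by reading the value at its container's partition key path. The helper below resolves such a path against a plain object, which can be useful in tests to confirm that a document carries the value its container partitions on. The paths are the ones listed above; the helper itself is a hypothetical sketch and does not handle escaped or bracketed path segments.

```typescript
// Partition key paths as listed on this page.
const partitionKeyPaths = {
  users: "/email",
  projects: "/userId",
  canvas: "/projectId",
  generations: "/userId",
} as const;

// Resolves a path such as "/userId" (or a nested "/a/b") against a document.
// Returns undefined when any segment along the path is missing.
function partitionKeyValue(doc: unknown, path: string): unknown {
  let current: unknown = doc;
  for (const segment of path.split("/").filter((s) => s.length > 0)) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}
```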
Indexing Policy
Cosmos DB uses automatic indexing by default: every property is indexed unless explicitly excluded.
Recommended Custom Indexes
For production optimization, consider this change to the indexing policy:
- Exclude large arrays (nodes, edges, inputImages) from indexing
- This reduces index size and write costs
- These fields are not queried directly
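Since nodes and edges live in the canvas container while inputImages lives in generations, the exclusion has to be configured per container. Below is a sketch of the two policies in the JSON shape Cosmos DB uses for indexing policies; the local types and the policyExcluding helper are assumptions (the @azure/cosmos SDK exports its own IndexingPolicy type), and the `/"_etag"/?` entry mirrors the exclusion already present in the default policy.

```typescript
// Local types mirroring the JSON shape of a Cosmos DB indexing policy.
interface IndexPath {
  path: string;
}
interface IndexingPolicy {
  indexingMode: "consistent" | "none";
  automatic: boolean;
  includedPaths: IndexPath[];
  excludedPaths: IndexPath[];
}

// Index everything except the given top-level properties (and their subtrees)
// plus the system `_etag` property.
function policyExcluding(...properties: string[]): IndexingPolicy {
  return {
    indexingMode: "consistent",
    automatic: true,
    includedPaths: [{ path: "/*" }],
    excludedPaths: [...properties.map((p) => ({ path: `/${p}/*` })), { path: '/"_etag"/?' }],
  };
}

// `nodes` and `edges` belong to the canvas container; `inputImages` to generations.
const canvasPolicy = policyExcluding("nodes", "edges");
const generationsPolicy = policyExcluding("inputImages");

console.log(JSON.stringify(canvasPolicy, null, 2));
```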
Query Patterns
Get User by Email
Get User’s Projects
Get Canvas for Project
Get User Generations (Future)
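The four patterns above can be expressed as parameterized queries that each filter on the target container's partition key, keeping every query inside one logical partition. The query text and helper names are a hypothetical sketch rather than Catafract's actual queries; the { query, parameters } shape matches the parameterized query format Cosmos DB SDKs accept. Restricting the property to a fixed union keeps caller input out of the query string itself.

```typescript
// Local mirror of a parameterized Cosmos DB query.
interface SqlParameter {
  name: string;
  value: string;
}
interface SqlQuerySpec {
  query: string;
  parameters: SqlParameter[];
}

// Only partition key properties are allowed, so values always travel as parameters.
type PartitionProperty = "email" | "userId" | "projectId";

function byProperty(property: PartitionProperty, value: string): SqlQuerySpec {
  return {
    query: `SELECT * FROM c WHERE c.${property} = @value`,
    parameters: [{ name: "@value", value }],
  };
}

const queries = {
  userByEmail: (email: string) => byProperty("email", email), // users container
  userProjects: (userId: string) => byProperty("userId", userId), // projects container
  projectCanvas: (projectId: string) => byProperty("projectId", projectId), // canvas container
  userGenerations: (userId: string) => byProperty("userId", userId), // generations container (future)
};
```

With the @azure/cosmos SDK, a spec like this would typically be passed to container.items.query(spec).fetchAll() on the matching container.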
Performance Considerations
Request Units (RU)
Typical RU costs:
- User lookup: ~2-3 RU
- Project list: ~3-5 RU per project
- Canvas load: ~10-50 RU (depends on node count)
- Canvas save: ~10-50 RU (depends on node count)
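These estimates translate into a throughput budget by simple division. The sketch below assumes the entire provisioned throughput (400 RU/s, the starting point suggested under Scaling Considerations) goes to a single operation type; in practice reads, other writes, and background work share it. The function names are hypothetical.

```typescript
// Whole operations per second a throughput budget sustains for one operation type,
// at the high (worst case) and low (best case) ends of its RU cost range.
function opsPerSecond(throughputRuPerSec: number, ruLow: number, ruHigh: number): { worst: number; best: number } {
  return {
    worst: Math.floor(throughputRuPerSec / ruHigh),
    best: Math.floor(throughputRuPerSec / ruLow),
  };
}

// Total RU range for listing a user's projects, using the ~3-5 RU per project estimate.
function projectListRu(projectCount: number): { low: number; high: number } {
  return { low: 3 * projectCount, high: 5 * projectCount };
}

console.log(opsPerSecond(400, 10, 50)); // { worst: 8, best: 40 }
console.log(projectListRu(20)); // { low: 60, high: 100 }
```

At 10-50 RU per canvas save, 400 RU/s covers roughly 8 to 40 saves per second. With the 1-second debounce, each actively editing user issues at most about one save per second, so that range is also a rough ceiling on concurrent editors before throttling.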
Optimization Tips
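- Use partition keys in queries: Always include the partition key so a query is served from a single partition instead of fanning out across all of them
- Limit canvas size: Large canvas documents (1000+ nodes) increase RU costs, and Cosmos DB rejects any single item larger than 2 MB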
- Consider pagination: For users with many projects or generations
- Monitor RU consumption: Set up alerts for unexpected spikes
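Pagination of Cosmos DB queries works through continuation tokens: each page comes back with an opaque token that is passed into the next request until none is returned. The generator below sketches that loop; fakeFetch is a hypothetical in-memory stand-in whose token is just an offset, whereas a real implementation would call the SDK's query iterator and forward the token it returns.

```typescript
// One page of results plus an opaque token for the next page (undefined when done).
interface Page<T> {
  items: T[];
  continuationToken?: string;
}

type FetchPage<T> = (continuationToken: string | undefined, pageSize: number) => Promise<Page<T>>;

// Yields one page at a time so callers can stop early without fetching the rest.
async function* paginate<T>(fetchPage: FetchPage<T>, pageSize: number): AsyncGenerator<T[]> {
  let token: string | undefined;
  do {
    const page = await fetchPage(token, pageSize);
    yield page.items;
    token = page.continuationToken;
  } while (token !== undefined);
}

// Hypothetical in-memory fetcher for illustration: the token encodes the next offset.
function fakeFetch<T>(all: T[]): FetchPage<T> {
  return async (token, pageSize) => {
    const start = token === undefined ? 0 : Number(token);
    const end = start + pageSize;
    return {
      items: all.slice(start, end),
      continuationToken: end < all.length ? String(end) : undefined,
    };
  };
}
```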
Scaling Considerations
- Throughput: Start with 400 RU/s, scale based on usage
- Storage: Monitor document sizes, especially canvas documents
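- Partitioning: The current strategy should scale to millions of users, since each key (email, userId, projectId) spreads data across many small logical partitions; note that each logical partition is capped at 20 GB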
- Archival: Consider archiving old generations
Database Utilities
Creating Collections
See Configuration for Azure CLI commands to create all required collections.
Backup Strategy
Azure Cosmos DB provides automatic backups every 4 hours with 8-hour retention by default. Consider upgrading to continuous backup for production.
Monitoring
Monitor these metrics:
- Request Unit consumption
- Document count per collection
- Average document size
- Query performance
- Throttling events (429 errors)
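Alongside the Azure Monitor metrics, the application can watch RU usage itself, since responses from the @azure/cosmos SDK expose a requestCharge value. The tracker below is a hypothetical sketch that keeps a rolling window of charges per operation, flags a charge above a multiple of the recent mean, and counts throttled (429) responses; the window size, threshold, and operation names are arbitrary assumptions.

```typescript
// Hypothetical rolling RU tracker; window size and spike factor are arbitrary defaults.
class RuTracker {
  private readonly samples = new Map<string, number[]>();
  private readonly throttles = new Map<string, number>();
  private readonly windowSize: number;
  private readonly spikeFactor: number;

  constructor(windowSize = 50, spikeFactor = 3) {
    this.windowSize = windowSize;
    this.spikeFactor = spikeFactor;
  }

  // Records one request charge; returns true if it exceeds spikeFactor x the mean
  // of the previous samples for the same operation.
  record(operation: string, requestCharge: number): boolean {
    const history = this.samples.get(operation) ?? [];
    const previousMean = history.length > 0 ? sum(history) / history.length : undefined;
    history.push(requestCharge);
    if (history.length > this.windowSize) history.shift();
    this.samples.set(operation, history);
    return previousMean !== undefined && requestCharge > this.spikeFactor * previousMean;
  }

  // Counts a throttled (HTTP 429) response for the operation.
  recordThrottle(operation: string): void {
    this.throttles.set(operation, (this.throttles.get(operation) ?? 0) + 1);
  }

  meanCharge(operation: string): number | undefined {
    const history = this.samples.get(operation);
    return history && history.length > 0 ? sum(history) / history.length : undefined;
  }

  throttleCount(operation: string): number {
    return this.throttles.get(operation) ?? 0;
  }
}

function sum(values: number[]): number {
  return values.reduce((total, v) => total + v, 0);
}
```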