Model Guide
21 models across image, video, audio, and 3D. Here's what each one does, what it costs, and when to pick it.
Image Generation
Text to image
Start with FLUX Pro 1.1 for most work. Switch to Ideogram when you need text in images, HiDream for speed, or Ultra for print-resolution output.
FLUX Pro 1.1
Black Forest Labs' flagship. Top-tier prompt adherence, photorealism, and output diversity. The default choice for most creative work.
FLUX Pro Ultra
Native 4-megapixel output (2048×2048+). Includes a Raw mode for candid, less-synthetic aesthetics. Use when resolution matters.
FLUX Dev
Open-weight model distilled from Pro. Near-Pro quality at lower cost. Great for iteration and testing before committing to Pro.
Ideogram v3
The best model for rendering readable text, logos, and typography inside images. Also supports style transfer from reference images.
HiDream Fast
17B sparse MoE model that generates in just 14 steps. Highest prompt adherence scores on benchmarks. Trades some fine detail for raw speed.
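The rules of thumb at the top of this section can be written down as a small helper. This is a minimal sketch, not part of the studio: the function name, the flags, and the order in which the flags are checked are all assumptions made for illustration. Only the model names and the reason to pick each one come from the guide.

```python
def pick_text_to_image_model(
    needs_text: bool = False,
    needs_print_resolution: bool = False,
    needs_speed: bool = False,
    drafting: bool = False,
) -> str:
    """Return a model name following the guide's rules of thumb.

    The priority order below is an assumption; the guide does not rank
    these needs against each other.
    """
    if needs_text:
        return "Ideogram v3"      # readable text, logos, typography
    if needs_print_resolution:
        return "FLUX Pro Ultra"   # native 4-megapixel output
    if needs_speed:
        return "HiDream Fast"     # 14-step generation
    if drafting:
        return "FLUX Dev"         # cheaper iteration before Pro
    return "FLUX Pro 1.1"         # default for most work


print(pick_text_to_image_model())                 # FLUX Pro 1.1
print(pick_text_to_image_model(needs_text=True))  # Ideogram v3
```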
Enhancement
Post-processing
Use Bria for clean product cutouts, BEN for tricky hair/fur edges. Creative Upscale to add detail, Super Resolution to preserve it. CodeFormer for faces.
BG Remove (Bria)
Commercial-grade background removal. Clean edges on products and objects. Fully licensed for commercial use with no IP risk.
BG Remove (BEN)
Open-source Background Erase Network with confidence-guided matting. Excels at hair, fur, and semi-transparent edges. Also handles 4K images and video.
Creative Upscale
Generative upscaler that re-imagines detail as it scales up. Uses your prompt to hallucinate plausible textures and features. Adds detail that wasn’t there.
Super Resolution
DRCT (Dense-Residual-Connected Transformer). Pixel-accurate 4× upscaling that preserves the original precisely. No hallucinated detail.
Face Restore
CodeFormer. Reconstructs faces severely degraded by blur, compression, or low resolution. An adjustable fidelity slider controls how much it generates versus preserves.
Video
Image to video
Kling Pro for final production clips. Standard for iteration. Luma Ray 2 Flash for fast prototyping. LTX for budget batches.
Kling 1.6 Pro
1080p clips up to 10 seconds with first-frame and last-frame conditioning. Precise control over start and end states for transitions and storytelling.
Kling 1.6 Standard
720p clips up to 5 seconds. Solid baseline quality with natural motion at a more accessible price point. Good for iteration before upgrading to Pro.
Luma Ray 2 Flash
3× faster and ⅓ the cost of full Ray 2. Physically plausible motion and realistic lighting. Ideal for rapid prototyping and quick iterations.
LTX Video
Fastest open-source video model. Extreme 192:1 compression for near real-time generation. Trades quality for speed and needs descriptive prompts to shine.
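The two Kling tiers differ in resolution and clip length, so a quick check can tell you which tier a planned clip fits. The sketch below encodes only the limits stated in this guide (1080p up to 10 seconds for Pro, 720p up to 5 seconds for Standard). The `ClipLimits` type and `tiers_that_fit` function are hypothetical names, not a studio API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipLimits:
    max_height: int   # vertical resolution in pixels
    max_seconds: int  # longest clip the tier produces


# Limits as stated in the guide; other video models list no limits here.
KLING_LIMITS = {
    "Kling 1.6 Pro": ClipLimits(max_height=1080, max_seconds=10),
    "Kling 1.6 Standard": ClipLimits(max_height=720, max_seconds=5),
}


def tiers_that_fit(height: int, seconds: float) -> list[str]:
    """List the Kling tiers whose stated limits cover the requested clip."""
    return [
        name
        for name, limits in KLING_LIMITS.items()
        if height <= limits.max_height and seconds <= limits.max_seconds
    ]


print(tiers_that_fit(720, 5))    # ['Kling 1.6 Pro', 'Kling 1.6 Standard']
print(tiers_that_fit(1080, 8))   # ['Kling 1.6 Pro']
print(tiers_that_fit(1080, 12))  # []
```

When both tiers fit, the guide's advice is to iterate on Standard and move to Pro for the final clip.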
Voice
Text to speech
Kokoro for fast, cheap narration in English. MiniMax HD for professional voiceover quality. ElevenLabs when you need non-English languages.
Kokoro EN
82M-parameter model ranked #1 on the Hugging Face TTS Arena. Processes text in under 0.3 seconds at 210× realtime speed. English only, fixed voice set.
MiniMax HD
#1 on the Speech Arena ELO leaderboard. Maximum voice quality, emotional expression, and naturalness. The choice for audiobooks and polished output.
MiniMax Turbo
Ultra-low-latency variant. Thousands of characters per second. #3 on Speech Arena. Trades some expressiveness for speed, which suits live applications.
ElevenLabs Turbo
Industry standard with support for 32 languages. Best voice cloning ecosystem. Balanced quality, latency, and language breadth. The safe all-rounder.
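To put Kokoro's 210× realtime figure in concrete terms, here is a rough estimate of synthesis time for a clip of a given length. It assumes the usual reading of a realtime factor (audio duration divided by compute time) and ignores network and queueing overhead. The function name is illustrative only.

```python
def synthesis_seconds(audio_seconds: float, realtime_factor: float = 210.0) -> float:
    """Estimate compute time for a clip, given a realtime speed factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# One minute of narration at 210x realtime.
print(round(synthesis_seconds(60), 2))  # 0.29
```

Under that assumption, the "under 0.3 seconds" figure would correspond to roughly a minute of generated audio.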
3D
Image to mesh
Trellis for fast, clean meshes with PBR materials. Hunyuan3D Full for production-grade geometry and textures. Turbo for quick previews.
Trellis
Microsoft’s open-source model using sparse voxel latents. Generates meshes with full PBR materials (base color, roughness, metallic) in ~3 seconds.
Hunyuan3D Full
Tencent’s full-scale system combining a shape generator with dedicated texture synthesis. Superior geometric detail for games, rendering, and downstream editing.
Hunyuan3D Turbo
Distilled variant generating 3D assets in ~1 second with only 5GB VRAM. Fastest 3D option for rapid iteration. Quality below the full model.
Preset Chains
One-click workflows that chain models together. Available in the studio inspector.
Product Shoot
Starts with: Product photo
Animate Shot
Starts with: Text prompt
Podcast Kit
Starts with: Topic prompt
Launch Pack
Starts with: Text prompt
Product to 3D
Starts with: Product photo
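Each preset is keyed by what you start with, so a lookup by input type is one way to narrow the list. This sketch uses only the preset names and starting inputs listed above. The mapping and the `presets_for` helper are illustrative and not a studio API, and the models inside each chain are deliberately left out because this guide does not list them.

```python
# Preset name -> starting input, as listed in the guide.
PRESET_INPUTS = {
    "Product Shoot": "Product photo",
    "Animate Shot": "Text prompt",
    "Podcast Kit": "Topic prompt",
    "Launch Pack": "Text prompt",
    "Product to 3D": "Product photo",
}


def presets_for(starting_input: str) -> list[str]:
    """Return the presets that begin from the given kind of input."""
    wanted = starting_input.strip().lower()
    return [
        name
        for name, start in PRESET_INPUTS.items()
        if start.lower() == wanted
    ]


print(presets_for("product photo"))  # ['Product Shoot', 'Product to 3D']
print(presets_for("Text prompt"))    # ['Animate Shot', 'Launch Pack']
```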
Pay only for what you generate. No subscriptions.