Model Guide

21 models across image, video, audio, and 3D. Here's what each one does, what it costs, and when to pick it.

Image Generation

Text to image

Start with FLUX Pro 1.1 for most work. Switch to Ideogram v3 when you need text in images, HiDream Fast for speed, or FLUX Pro Ultra for print-resolution output.

RECOMMENDED

FLUX Pro 1.1

10 cr ($0.040)
All-rounder

Black Forest Labs' flagship. Top-tier prompt adherence, photorealism, and output diversity. The default choice for most creative work.

Speed: ~5s

FLUX Pro Ultra

15 cr ($0.060)
High-res & print

Native 4-megapixel output (2048×2048+). Includes a Raw mode for candid, less-synthetic aesthetics. Use when resolution matters.

Speed: ~15s

FLUX Dev

6 cr ($0.025)
Budget & experimentation

Open-weight model distilled from Pro. Near-Pro quality at lower cost. Great for iteration and testing before committing to Pro.

Speed: ~15s

Ideogram v3

15 cr ($0.060)
Text in images

The best model for rendering readable text, logos, and typography inside images. Also supports style transfer from reference images.

Speed: ~15s

HiDream Fast

3 cr ($0.010)
Speed & cost

17B sparse MoE model that generates in just 16 steps. Highest prompt adherence scores on benchmarks. Trades some fine detail for raw speed.

Speed: ~5s
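
The selection advice at the top of this section can be written as a small decision function. This is a minimal sketch, not part of any published API: the function name, its flags, and the order in which the checks run are assumptions made for illustration. Only the model names and the "when to pick it" rules come from this guide.

```python
def pick_text_to_image_model(
    needs_text: bool = False,
    needs_print_resolution: bool = False,
    prioritize_speed: bool = False,
) -> str:
    """Return a model name following this guide's text-to-image advice.

    Hypothetical helper: the priority order of the checks is an assumption.
    """
    if needs_text:
        # Readable text, logos, and typography inside the image.
        return "Ideogram v3"
    if needs_print_resolution:
        # Native 4-megapixel output.
        return "FLUX Pro Ultra"
    if prioritize_speed:
        # Cheapest and fastest option, at some cost in fine detail.
        return "HiDream Fast"
    # Default choice for most creative work.
    return "FLUX Pro 1.1"


print(pick_text_to_image_model())
print(pick_text_to_image_model(needs_text=True))
```

When several flags are set, the first matching rule wins, so reorder the checks if your priorities differ.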

Enhancement

Post-processing

Use BG Remove (Bria) for clean product cutouts and BG Remove (BEN) for tricky hair and fur edges. Pick Creative Upscale to add detail, Super Resolution to preserve it, and Face Restore (CodeFormer) for damaged faces.

RECOMMENDED

BG Remove (Bria)

5 cr ($0.018)
Product cutouts

Commercial-grade background removal. Clean edges on products and objects. Fully licensed for commercial use with no IP risk.

Speed: ~5s

BG Remove (BEN)

6 cr ($0.025)
Hair & fur edges

Open-source eraser with confidence-guided matting. Excels at hair, fur, and semi-transparent edges. Also supports 4K and video.

Speed: ~15s

Creative Upscale

12 cr ($0.050)
AI-generated images

Generative upscaler that re-imagines detail as it scales up. Uses your prompt to hallucinate plausible textures and features. Adds detail that wasn’t there.

Speed: ~15s

Super Resolution

2 cr ($0.0045)
Faithful 4× upscale

Built on the Dense Residual Connected Transformer (DRCT). Pixel-accurate 4× upscaling that preserves the original precisely. No hallucinated detail.

Speed: ~5s

Face Restore

1 cr ($0.0021)
Damaged faces

Powered by CodeFormer. Reconstructs severely degraded faces from blurry, heavily compressed, or low-resolution images. An adjustable fidelity slider controls how much it creates versus preserves.

Speed: <1s

Video

Image to video

Kling 1.6 Pro for final production clips. Kling 1.6 Standard for iteration. Luma Ray 2 Flash for fast prototyping. LTX Video for budget batches.

RECOMMENDED

Kling 1.6 Pro

100 cr ($0.47)
Production quality

1080p clips up to 10 seconds with first-frame and last-frame conditioning. Precise control over start and end states for transitions and storytelling.

Speed: ~60s

Kling 1.6 Standard

55 cr ($0.23)
General video

720p clips up to 5 seconds. Solid baseline quality with natural motion at a more accessible price point. Good for iteration before upgrading to Pro.

Speed: ~15s

Luma Ray 2 Flash

50 cr ($0.20)
Fast previews

3× faster and ⅓ the cost of full Ray 2. Physically plausible motion and realistic lighting. Ideal for rapid prototyping and quick iterations.

Speed: ~5s

LTX Video

10 cr ($0.040)
Budget & speed

Fastest open-source video model. Extreme 192:1 compression for near real-time generation. Trades quality for speed and needs descriptive prompts to shine.

Speed: ~5s
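
The "iterate on Standard, finish on Pro" advice is easy to check with arithmetic. The sketch below uses only the credit prices listed on the Kling cards (55 cr and 100 cr). The function names are hypothetical, and the comparison assumes every attempt is billed once at the listed price.

```python
KLING_STANDARD_CR = 55   # credits per Kling 1.6 Standard clip (from this guide)
KLING_PRO_CR = 100       # credits per Kling 1.6 Pro clip (from this guide)


def draft_then_final_credits(drafts: int) -> int:
    """Credits for `drafts` Standard clips followed by one final Pro clip."""
    if drafts < 0:
        raise ValueError("drafts must be non-negative")
    return drafts * KLING_STANDARD_CR + KLING_PRO_CR


def all_pro_credits(attempts: int) -> int:
    """Credits when every attempt, drafts included, runs on Pro."""
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return attempts * KLING_PRO_CR


# Four drafts plus one final render, versus five Pro renders.
print(draft_then_final_credits(4))  # 4 * 55 + 100 = 320
print(all_pro_credits(5))           # 5 * 100 = 500
```

Standard drafts are 720p and capped at 5 seconds, so they preview motion and composition rather than the exact final Pro output.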

Voice

Text to speech

Kokoro for fast, cheap narration in English. MiniMax HD for professional voiceover quality. ElevenLabs when you need non-English languages.

RECOMMENDED

Kokoro EN

5 cr ($0.020)
Fast English narration

82M-parameter model ranked #1 on HuggingFace TTS Arena. Processes text in under 0.3 seconds at 210× realtime speed. English only, fixed voice set.

Speed: <1s

MiniMax HD

12 cr ($0.050)
Professional voiceover

#1 on the Speech Arena ELO leaderboard. Maximum voice quality, emotional expression, and naturalness. The choice for audiobooks and polished output.

Speed: ~15s

MiniMax Turbo

8 cr ($0.030)
Real-time & chatbots

Ultra-low-latency variant. Thousands of characters per second. #3 on Speech Arena. Trades some expressiveness for speed, which suits live applications.

Speed: ~5s

ElevenLabs Turbo

12 cr ($0.050)
Multilingual

Industry standard with support for 32 languages. Best voice cloning ecosystem. Balanced quality, latency, and language breadth. The safe all-rounder.

Speed: ~5s

3D

Image to mesh

Trellis for fast, clean meshes with PBR materials. Hunyuan3D Full for production-grade geometry and textures. Hunyuan3D Turbo for quick previews.

RECOMMENDED

Trellis

5 cr ($0.020)
Fast PBR assets

Microsoft’s open-source model using sparse voxel latents. Generates meshes with full PBR materials (base color, roughness, metallic) in ~3 seconds.

Speed: ~5s

Hunyuan3D Full

40 cr ($0.16)
Production meshes

Tencent’s full-scale system combining a shape generator with dedicated texture synthesis. Superior geometric detail for games, rendering, and downstream editing.

Speed: ~60s

Hunyuan3D Turbo

20 cr ($0.080)
Quick previews

Distilled variant generating 3D assets in ~1 second with only 5GB VRAM. Fastest 3D option for rapid iteration. Quality below the full model.

Speed: ~5s

Preset Chains

One-click workflows that chain models together. Available in the studio inspector.

Product Shoot

72 cr (~$0.29)

Starts with: Product photo

BG Remove → Creative Upscale → Kling 1.6 Standard

Animate Shot

65 cr (~$0.27)

Starts with: Text prompt

FLUX Pro 1.1 → Kling 1.6 Standard

Podcast Kit

15 cr (~$0.06)

Starts with: Topic prompt

FLUX Pro 1.1 → Kokoro EN

Launch Pack

78 cr (~$0.32)

Starts with: Text prompt

FLUX Dev → BG Remove → Creative Upscale → Kling 1.6 Standard

Product to 3D

10 cr (~$0.04)

Starts with: Product photo

BG Remove → Trellis
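
Each chain's credit price appears to be the sum of its steps' per-model prices. The sketch below recomputes the totals from the model cards in this guide. It assumes the "BG Remove" step is the Bria variant (5 cr) because that is the value that reproduces the listed totals; the guide does not state which variant the chains use. The dictionary and function names are illustrative only.

```python
# Per-step credit prices copied from the model cards in this guide.
CREDITS = {
    "FLUX Pro 1.1": 10,
    "FLUX Dev": 6,
    "BG Remove": 5,  # assumption: Bria variant (5 cr), not BEN (6 cr)
    "Creative Upscale": 12,
    "Kling 1.6 Standard": 55,
    "Kokoro EN": 5,
    "Trellis": 5,
}

# Steps of each preset chain, in the order they run.
CHAINS = {
    "Product Shoot": ["BG Remove", "Creative Upscale", "Kling 1.6 Standard"],
    "Animate Shot": ["FLUX Pro 1.1", "Kling 1.6 Standard"],
    "Podcast Kit": ["FLUX Pro 1.1", "Kokoro EN"],
    "Launch Pack": ["FLUX Dev", "BG Remove", "Creative Upscale", "Kling 1.6 Standard"],
    "Product to 3D": ["BG Remove", "Trellis"],
}


def chain_credits(steps: list[str]) -> int:
    """Sum the credit price of every step in a chain."""
    return sum(CREDITS[step] for step in steps)


for name, steps in CHAINS.items():
    print(f"{name}: {chain_credits(steps)} cr")
```

Under that assumption the computed totals match the listed chain prices (72, 65, 15, 78, and 10 cr). The same function can price a custom sequence of steps before you run it.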

Pay only for what you generate. No subscriptions.
