Model Guide
35+ models across image, video, audio, and 3D. Here's what each one does, what it costs, and when to pick it.
Image Generation
Text to image
Start with Nano Banana Pro or FLUX 2 Pro for most work. Switch to Ideogram when you need text in images, Recraft for design assets, or Grok for aesthetic imagery.
Nano Banana Pro
Google's Gemini 3 Pro Image model. State-of-the-art generation with exceptional character consistency across poses, lighting, and scenes. Supports up to 4K resolution.
FLUX 2 Pro
Latest FLUX generation. Zero-config approach — just prompt and go. Studio-grade quality with enhanced typography and text rendering.
Nano Banana
Google’s base image model. Fast, affordable, and surprisingly capable. Great for iteration before upgrading to Pro.
Recraft V3
State of the art on the Hugging Face text-to-image benchmark. Generates vector art, long text, and brand-consistent imagery. Supports style presets and color palettes.
Grok Imagine
xAI’s highly aesthetic image generator. Produces visually striking results with strong artistic style. Fast and affordable.
FLUX Pro 1.1
Black Forest Labs' flagship. Top-tier prompt adherence, photorealism, and output diversity. Proven workhorse for professional creative work.
FLUX Pro Ultra
Native 4-megapixel output (2048×2048+). Includes a Raw mode for candid, less-synthetic aesthetics. Use when resolution matters.
FLUX Dev
Open-weight model distilled from Pro. Near-Pro quality at lower cost. Great for iteration and testing before committing to Pro.
Ideogram v3
The best model for rendering readable text, logos, and typography inside images. Also supports style transfer from reference images.
HiDream Fast
17B sparse MoE model that generates in just 14 steps. Highest prompt adherence scores on benchmarks. Trades some fine detail for raw speed.
Enhancement
Post-processing
Use Bria for clean product cutouts and BEN for tricky hair or fur edges. For upscaling, pick Topaz for professional results, Creative Upscale to add detail, and Super Resolution to preserve the original exactly.
BG Remove (Bria)
Commercial-grade background removal. Clean edges on products and objects. Fully licensed for commercial use with no IP risk.
BG Remove (BEN)
Open-source eraser with confidence-guided matting. Excels at hair, fur, and semi-transparent edges. Also supports 4K and video.
Topaz Upscale
Industry-standard AI upscaling with multiple modes (Standard, High Fidelity, CGI, Text Refine). Face enhancement built in. Up to 4× with exceptional detail.
Creative Upscale
Generative upscaler that re-imagines detail as it scales up. Uses your prompt to hallucinate plausible textures and features. Adds detail that wasn’t there.
Super Resolution
Dense Residual Connected Transformer. Pixel-accurate 4× upscaling that preserves the original precisely. No hallucinated detail.
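To make the 4× figure concrete, here is a minimal sketch of the output size you can expect. It assumes the factor applies to each side of the image, which is the usual convention for upscalers but is not stated on this page. The function name `upscaled_size` is hypothetical and not part of any model's API.

```python
def upscaled_size(width, height, factor=4):
    """Return the (width, height) after upscaling each side by `factor`.

    Assumption: the advertised factor is per side, so the total pixel
    count grows by factor squared.
    """
    return width * factor, height * factor


src_w, src_h = 1024, 768
out_w, out_h = upscaled_size(src_w, src_h)

print(out_w, out_h)                          # 4096 3072
print((out_w * out_h) // (src_w * src_h))    # 16 (times as many pixels)
```

Under that assumption, a 4× upscale produces 16 times the pixels, which is worth keeping in mind for file size and downstream processing time.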
Face Restore
CodeFormer. Reconstructs severely degraded faces from blur, compression, or low resolution. An adjustable fidelity slider controls how much it invents versus how much it preserves.
Video
Image to video
Veo 3 for cinematic clips with audio. Kling 2.1 Pro for precise motion control. Hailuo for character consistency. LTX for budget batches.
Veo 3
Google DeepMind’s flagship. Generates cinematic video with natural dialogue, voice-overs, and ambient audio. Up to 8 seconds at 1080p.
Veo 3 Fast
Faster variant of Veo 3 with audio generation. Same quality class at reduced cost. Good for iteration before final renders.
Veo 2
Advanced physics simulation with precise camera controls and high-fidelity motion. Cinematic lighting and realistic material interactions.
Kling 2.1 Pro
Professional-grade video with enhanced visual fidelity. Supports motion brushes, special effects, multi-image input, and precise camera control.
Hailuo 02
MiniMax’s latest. Consistent character motion with end-frame conditioning. 768p at 25fps. Strong for narrative sequences and character animation.
Wan 2.6
Alibaba’s latest. Transforms a single image into multi-scene narratives with proper transitions. Supports 5–15 second clips with prompt expansion.
Kling 1.6 Pro
1080p clips up to 10 seconds with first-frame and last-frame conditioning. Precise control over start and end states for transitions and storytelling.
Kling 1.6 Standard
720p clips up to 5 seconds. Solid baseline quality with natural motion at a more accessible price point. Good for iteration before upgrading to Pro.
Luma Ray 2 Flash
3× faster and ⅓ the cost of full Ray 2. Physically plausible motion and realistic lighting. Ideal for rapid prototyping and quick iterations.
LTX Video
Fastest open-source video model. Extreme 192:1 compression for near real-time generation. Trades quality for speed, and needs descriptive prompts to shine.
Voice
Text to speech
Dia TTS for multi-speaker dialogue and emotional expression. Kokoro for fast English narration. MiniMax HD for professional voiceover quality. ElevenLabs for multilingual work.
Dia TTS
Studio-quality speech with multi-speaker dialogue using [S1]/[S2] tags. Supports emotional nonverbals like laughter, sighs, and throat clearing.
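A short sketch of how a two-speaker script might be assembled with the [S1]/[S2] tags mentioned above. The helper `build_dialogue` is hypothetical, and writing nonverbals as parenthesized cues such as `(laughs)` is an assumption about Dia's input format; this page only names the speaker tags.

```python
def build_dialogue(turns):
    """Join (speaker, text) pairs into one script with [S1]/[S2] tags.

    `speaker` is 1 or 2. Nonverbal cues like "(laughs)" are passed
    through unchanged as part of the text (assumed syntax).
    """
    parts = []
    for speaker, text in turns:
        if speaker not in (1, 2):
            raise ValueError("this sketch only handles speakers 1 and 2")
        parts.append(f"[S{speaker}] {text.strip()}")
    return " ".join(parts)


script = build_dialogue([
    (1, "Did the render finish?"),
    (2, "Just now. (laughs) It only took three tries."),
])
print(script)
# [S1] Did the render finish? [S2] Just now. (laughs) It only took three tries.
```

The resulting string is what you would paste into the text field; how the studio forwards it to the model is not covered here.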
Kokoro EN
82M-parameter model ranked #1 on the Hugging Face TTS Arena. Processes text in under 0.3 seconds at 210× realtime speed. English only, with a fixed voice set.
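One way to read the 210× realtime figure is as audio duration divided by synthesis time. The sketch below uses that reading, which is an assumption, to estimate how long a clip of a given length would take to generate. The function name is hypothetical, and actual latency will also depend on hardware and queueing.

```python
def synthesis_seconds(audio_seconds, realtime_factor=210.0):
    """Estimate generation time for a clip of `audio_seconds`.

    Assumption: realtime_factor = audio duration / synthesis time.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# A 63-second narration at 210x realtime:
print(round(synthesis_seconds(63), 3))   # 0.3
# A 10-minute chapter:
print(round(synthesis_seconds(600), 2))  # 2.86
```

Under this reading, the "under 0.3 seconds" figure corresponds to roughly a minute of speech or less.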
MiniMax HD
#1 on the Speech Arena ELO leaderboard. Maximum voice quality, emotional expression, and naturalness. The choice for audiobooks and polished output.
MiniMax Turbo
Ultra-low-latency variant. Thousands of characters per second. #3 on Speech Arena. Trades some expressiveness for speed — perfect for live applications.
ElevenLabs Turbo
Industry standard with support for 32 languages. Best voice cloning ecosystem. Balanced quality, latency, and language breadth. The safe all-rounder.
3D
Image to mesh
Meshy 6 for production-ready meshes. Trellis for fast PBR assets. Hunyuan3D Full for maximum geometric detail.
Meshy 6
Latest from Meshy. Generates realistic, production-ready 3D meshes from images or text. High-quality topology suitable for rendering, animation, and real-time apps.
Trellis
Microsoft’s open-source model using sparse voxel latents. Generates meshes with full PBR materials (base color, roughness, metallic) in ~3 seconds.
Hunyuan3D Full
Tencent’s full-scale system combining a shape generator with dedicated texture synthesis. Superior geometric detail for games, rendering, and downstream editing.
Hunyuan3D Turbo
Distilled variant generating 3D assets in ~1 second with only 5 GB of VRAM. Fastest 3D option for rapid iteration. Quality is below that of the full model.
Preset Chains
One-click workflows that chain models together. Available in the studio inspector.
Product Shoot
Starts with: Product photo
Animate Shot
Starts with: Text prompt
Podcast Kit
Starts with: Topic prompt
Launch Pack
Starts with: Text prompt
Product to 3D
Starts with: Product photo
Pay only for what you generate. No subscriptions.