Hi James, the most important things are a strong model and high-quality reference audio (a single speaker and minimal background noise). There are many models available, and the amount of reference audio they require ranges from about 3 minutes up to 25 or 30 minutes. ElevenLabs is the best known, but I'd recommend exploring the full ecosystem to find the best model for your needs. Here are a few to look at:
- Cartesia
- Resemble AI
- Hume AI
- Sesame CSM 1B
- Fish Audio
- Qwen 3 TTS