SublyAI is an AI-powered video subtitle generator that uses Google Gemini AI for speech recognition and translation. Unlike cloud-based tools such as VEED or Kapwing, SublyAI processes videos locally in your browser, so video files never leave your device.
Supported formats: MP4, MOV, AVI, WebM. No file size limits.
Audio is extracted locally in your browser using the WebCodecs API. Video never leaves your device.
Phase 1: An LLM creates a transcript with word-level timestamps. Phase 2: A second LLM performs the final translation or refines the transcript for readability.
Automatic translation using Google Gemini AI into languages including English, Czech, German, French, Spanish, and more.
Export as SRT, VTT, or burn-in (subtitles embedded directly into the video, client-side).
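To make the two text export formats concrete, here is a minimal sketch of serializing subtitle cues as SRT and WebVTT. The `Cue` shape and the function names are assumptions made for illustration, not SublyAI's actual code; only the timestamp and file layouts follow the two formats themselves.

```typescript
// Hypothetical cue shape for this sketch; times are in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Format seconds as HH:MM:SS<sep>mmm.
// SRT separates milliseconds with ",", WebVTT with ".".
function formatTime(seconds: number, sep: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${sep}${pad(ms, 3)}`;
}

// SRT: numbered blocks (starting at 1) separated by a blank line.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) => `${i + 1}\n${formatTime(c.start, ",")} --> ${formatTime(c.end, ",")}\n${c.text}\n`)
    .join("\n");
}

// WebVTT: a "WEBVTT" header line, a blank line, then the cue blocks.
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((c) => `${formatTime(c.start, ".")} --> ${formatTime(c.end, ".")}\n${c.text}\n`)
    .join("\n");
  return `WEBVTT\n\n${body}`;
}
```

The two formats differ mainly in the header and the millisecond separator, which is why one shared `formatTime` helper is enough here.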
Current AI models have inherent limitations: they either provide accurate word-level timestamps but imperfect translation, or they adapt text well for readability but lose timing precision (so-called "timestamp drift"). SublyAI is the first in the world to combine two specialized LLMs: Phase 1 extracts a precise transcript with word-level timestamps, and Phase 2 uses a different LLM optimized for language quality and context. The result is subtitles that are accurate in both timing and linguistic expression.
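One way to picture the two-phase idea is sketched below: word-level timestamps (as Phase 1 would produce) are grouped into cues, and refined text (as Phase 2 would produce) then replaces the wording while each cue keeps its original timing. This is only an illustration under stated assumptions. The types, the function names, and the `maxChars` and `maxGap` thresholds are all hypothetical, and SublyAI's actual pipeline is not described in this detail.

```typescript
// Hypothetical shapes for this sketch; times are in seconds.
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

interface TimedCue {
  start: number;
  end: number;
  text: string;
}

// Group words into cues. A new cue starts when the line would exceed
// maxChars or when the pause before the next word exceeds maxGap seconds.
function groupWords(words: TimedWord[], maxChars = 42, maxGap = 0.7): TimedCue[] {
  const cues: TimedCue[] = [];
  let current: TimedCue | null = null;
  for (const w of words) {
    const fits: boolean =
      current !== null &&
      current.text.length + 1 + w.text.length <= maxChars &&
      w.start - current.end <= maxGap;
    if (current !== null && fits) {
      current.text += " " + w.text;
      current.end = w.end;
    } else {
      current = { start: w.start, end: w.end, text: w.text };
      cues.push(current);
    }
  }
  return cues;
}

// Swap in refined text cue by cue while keeping the original timings,
// so rewording cannot shift when a subtitle appears or disappears.
function applyRefinedText(cues: TimedCue[], refined: string[]): TimedCue[] {
  if (cues.length !== refined.length) {
    throw new Error("refined text must contain exactly one entry per cue");
  }
  return cues.map((c, i) => ({ start: c.start, end: c.end, text: refined[i] }));
}
```

Keeping timing and wording in separate steps is the design point: the second step is free to translate or rephrase because it never touches `start` or `end`.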
| Feature | SublyAI | VEED | Kapwing |
|---|---|---|---|
| Video Processing | Client-side (in your browser) | Cloud-based (uploaded to servers) | Cloud-based (uploaded to servers) |
| Privacy | Video never leaves your device | Video uploaded to cloud | Video uploaded to cloud |
| AI Technology | Google Gemini + two-phase processing | Proprietary AI/not specified | Proprietary AI/not specified |
| Timestamp Accuracy | Word-level precision | Sentence-level | Sentence-level |
| Speed | ~30 seconds (no queues) | Depends on queue | Depends on queue |
| Price | 60 min/week FREE | Paid (from $12/month) | Freemium with watermark |
| Import Own Subtitles | Free, no credits deducted | Limited | Limited |
SublyAI uses the WebCodecs API for audio extraction and FFmpeg.wasm for video processing directly in your browser. Your video files are processed locally; only the extracted audio is transmitted to Google Cloud for AI analysis.
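As a rough illustration of what client-side burn-in could look like, the sketch below builds an FFmpeg argument list that renders a subtitle file onto the video with FFmpeg's `subtitles` filter and copies the audio stream unchanged. The function name and file names are hypothetical, and the sketch assumes an FFmpeg build that includes the `subtitles` filter (which depends on libass) and has a usable font available. Whether SublyAI invokes FFmpeg.wasm this way is not stated.

```typescript
// Build FFmpeg arguments for burning subtitles into a video.
// Assumes simple file names: paths containing characters that are special
// in FFmpeg filter syntax (such as ":" or "'") would need escaping.
function burnInArgs(input: string, subtitles: string, output: string): string[] {
  return [
    "-i", input,                      // source video
    "-vf", `subtitles=${subtitles}`,  // render the subtitle file onto each frame
    "-c:a", "copy",                   // keep the audio stream as-is
    output,
  ];
}

// Example with hypothetical file names:
const args = burnInArgs("input.mp4", "subs.srt", "output.mp4");
console.log(args.join(" "));
```

Because the video frames are re-encoded with the text drawn in, burned-in subtitles cannot be switched off by the viewer, unlike SRT or VTT files.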
Phase 1: Speech-to-text with word-level alignment. Phase 2: Language refinement for optimal readability and translation. This approach overcomes limitations of current models that suffer from either timestamp drift or suboptimal language output.
Extracted audio is sent over an encrypted connection (SSL/TLS). We use ephemeral storage (signed URLs), and audio files are automatically deleted once processing completes.
MP4, MOV, AVI, WebM
SRT, VTT
SRT, VTT, Burn-in (video with subtitles)
99+ languages including English, Czech, German, French, Spanish, Italian, Polish, Russian, Chinese, Japanese, and more
All features free during beta
After official launch: priority pricing for early adopters