In today's world of digital content, voice and audio are becoming increasingly important. From podcasts and audiobooks to accessibility features and multilingual content, the demand for high-quality text-to-speech (TTS) and audio dubbing solutions continues to grow. As developers, we're often tasked with integrating these capabilities into our applications, and doing so in a way that's maintainable, testable, and scalable.
In this article, we'll explore how to integrate ElevenLabs, one of the most advanced AI voice platforms, into a NestJS application. We'll build a production-ready audio service that leverages ElevenLabs' powerful features, including text-to-speech generation, voice cloning, audio streaming, and automatic dubbing across multiple languages. More importantly, we'll implement these features following NestJS best practices and clean architecture principles. So let's dive right in!
What is ElevenLabs?
ElevenLabs is an AI audio research and deployment company that provides state-of-the-art text-to-speech and voice AI technology. Their API allows developers to generate incredibly realistic voiceovers in multiple languages, clone voices, stream audio in real time, and even dub entire videos automatically while preserving the speaker's emotion, timing, and tone.
Some key features include:
- Text-to-Speech: Convert text to lifelike speech in over 32 languages
- Voice Cloning: Create custom AI voices from audio samples
- Real-time Streaming: Stream audio as it's being generated for low-latency applications
- AI Dubbing: Automatically translate and dub audio/video content across 29 languages with speaker detection and voice preservation
- Multiple Models: Choose from various models optimized for quality, speed, or multilingual support
Prerequisites
Before we begin, make sure you have the following installed:
- Node.js (version 18 or higher)
- NestJS CLI installed globally
- An ElevenLabs account with an API key (get yours at elevenlabs.io)
Setting Up the Project
Let's start by creating a new NestJS project and installing the necessary dependencies:
$ nest new elevenlabs-nest
$ cd elevenlabs-nest
Select your preferred package manager when prompted, and let's install some required packages:
$ npm i @nestjs/config @elevenlabs/elevenlabs-js
We'll be using the @nestjs/config package to help us manage environment variables, and we're installing the official ElevenLabs Node.js SDK, @elevenlabs/elevenlabs-js.
Configuring the Application
Let's start off by setting up our environment variables. Create a .env file in the root of our project:
ELEVENLABS_API_KEY=your_api_key_here
Now let's update the app.module.ts file to include the ConfigModule:
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { AppController } from './app.controller';
import { AppService } from './app.service';

@Module({
  imports: [
    ConfigModule.forRoot({
      isGlobal: true,
    }),
  ],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}
Creating the Audio Module
Now that we have the fundamental parts set up, let's get to the fun stuff and start building our ElevenLabs-related audio code.
Let's generate a dedicated module for our audio-related functionality:
$ nest generate module audio
$ nest generate service audio
$ nest generate controller audio
This creates an audio folder with the necessary files.
Just as a look ahead, our folder structure will eventually look like this:
src/
├── audio/
│ ├── audio.controller.ts
│ ├── audio.module.ts
│ ├── audio.service.ts
│ ├── dto/
│ │ ├── text-to-speech.dto.ts
│ │ └── dubbing.dto.ts
│ └── interfaces/
│ └── audio-service.interface.ts
├── app.module.ts
└── main.ts
Implementing the Audio Service Interface
Since we all love clean architecture principles, let's first define an interface for our audio service. This way, our business logic stays independent of any specific implementation (ElevenLabs in this case), which allows for easy testing and potential provider swapping in the future.
Create src/audio/interfaces/audio-service.interface.ts:
import { Readable } from 'stream';

export interface Voice {
  voiceId: string;
  name: string;
  category?: string;
  description?: string;
}

export interface TextToSpeechOptions {
  text: string;
  voiceId: string;
  modelId?: string;
  stability?: number;
  similarityBoost?: number;
}

export interface DubbingOptions {
  sourceFile: Buffer;
  targetLanguage: string;
  sourceLanguage?: string;
  numSpeakers?: number;
  watermark?: boolean;
}

export interface DubbingStatus {
  dubbingId: string;
  status: string;
  targetLanguages?: string[];
  error?: string;
}

export interface IAudioService {
  convertTextToSpeech(options: TextToSpeechOptions): Promise<Buffer>;
  streamTextToSpeech(options: TextToSpeechOptions): Promise<Readable>;
  getAvailableVoices(): Promise<Voice[]>;
  createDubbing(options: DubbingOptions): Promise<string>;
  getDubbingStatus(dubbingId: string): Promise<DubbingStatus>;
  downloadDubbedAudio(dubbingId: string, languageCode: string): Promise<Buffer>;
}

export const AUDIO_SERVICE_TOKEN = 'AUDIO_SERVICE_TOKEN';
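To see the testing benefit in action, here's a minimal sketch of a unit test that swaps a fake implementation in behind the injection token (assuming Jest and @nestjs/testing; the import path assumes the test sits next to the interface file):
import { Test } from '@nestjs/testing';
import {
  IAudioService,
  AUDIO_SERVICE_TOKEN,
} from './audio-service.interface';

describe('IAudioService consumers', () => {
  it('can swap in a fake implementation', async () => {
    // Hypothetical stub: only the method under test needs to be faked
    const audioServiceMock = {
      convertTextToSpeech: jest.fn().mockResolvedValue(Buffer.from('fake-audio')),
    };

    const moduleRef = await Test.createTestingModule({
      providers: [
        { provide: AUDIO_SERVICE_TOKEN, useValue: audioServiceMock },
      ],
    }).compile();

    const audioService = moduleRef.get<IAudioService>(AUDIO_SERVICE_TOKEN);
    const audio = await audioService.convertTextToSpeech({
      text: 'hello',
      voiceId: 'test-voice',
    });

    expect(audio.toString()).toBe('fake-audio');
  });
});
Nothing here knows, or cares, that ElevenLabs is behind the token in production.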
Creating DTOs
DTOs (Data Transfer Objects) help us validate and type our incoming requests. Let's create the following files, populating each one with the classes shown below it.
src/audio/dto/text-to-speech.dto.ts:
import { IsString, IsOptional, IsNumber, Min, Max } from 'class-validator';

export class TextToSpeechDto {
  @IsString()
  text: string;

  @IsString()
  voiceId: string;

  @IsOptional()
  @IsString()
  modelId?: string;

  @IsOptional()
  @IsNumber()
  @Min(0)
  @Max(1)
  stability?: number;

  @IsOptional()
  @IsNumber()
  @Min(0)
  @Max(1)
  similarityBoost?: number;
}
src/audio/dto/dubbing.dto.ts:
import { IsString, IsOptional, IsNumber, IsBoolean } from 'class-validator';
import { Type, Transform } from 'class-transformer';

export class CreateDubbingDto {
  @IsString()
  targetLanguage: string;

  @IsOptional()
  @IsString()
  sourceLanguage?: string;

  // multipart/form-data fields arrive as strings, so coerce them before validating
  @IsOptional()
  @Type(() => Number)
  @IsNumber()
  numSpeakers?: number;

  @IsOptional()
  @Transform(({ value }) => value === 'true' || value === true)
  @IsBoolean()
  watermark?: boolean;
}

export class DubbingStatusDto {
  @IsString()
  dubbingId: string;
}

export class DownloadDubbingDto {
  @IsString()
  dubbingId: string;

  @IsString()
  languageCode: string;
}
Implementing the ElevenLabs Service
Great! Now let's implement our audio service using the ElevenLabs SDK. Create a new file, src/audio/elevenlabs-audio.service.ts:
import { Injectable, Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import { Readable } from 'stream';
import {
  IAudioService,
  Voice,
  TextToSpeechOptions,
  DubbingOptions,
  DubbingStatus,
} from './interfaces/audio-service.interface';

@Injectable()
export class ElevenLabsAudioService implements IAudioService {
  private readonly logger = new Logger(ElevenLabsAudioService.name);
  private readonly client: ElevenLabsClient;

  constructor(private configService: ConfigService) {
    const apiKey = this.configService.get<string>('ELEVENLABS_API_KEY');
    if (!apiKey) {
      throw new Error('ELEVENLABS_API_KEY is not configured');
    }
    this.client = new ElevenLabsClient({ apiKey });
    this.logger.log('ElevenLabs client initialized');
  }

  async convertTextToSpeech(options: TextToSpeechOptions): Promise<Buffer> {
    this.logger.log(`Converting text to speech for voice: ${options.voiceId}`);
    try {
      const audio = await this.client.textToSpeech.convert(options.voiceId, {
        text: options.text,
        model_id: options.modelId || 'eleven_multilingual_v2',
        voice_settings:
          options.stability !== undefined || options.similarityBoost !== undefined
            ? {
                stability: options.stability ?? 0.5,
                similarity_boost: options.similarityBoost ?? 0.75,
              }
            : undefined,
      });
      // Collect the async iterator's chunks into a single buffer
      const chunks: Buffer[] = [];
      for await (const chunk of audio) {
        chunks.push(Buffer.from(chunk));
      }
      const buffer = Buffer.concat(chunks);
      this.logger.log(`Generated audio buffer of size: ${buffer.length} bytes`);
      return buffer;
    } catch (error) {
      this.logger.error(`Failed to convert text to speech: ${error.message}`);
      throw error;
    }
  }

  async streamTextToSpeech(options: TextToSpeechOptions): Promise<Readable> {
    this.logger.log(`Streaming text to speech for voice: ${options.voiceId}`);
    try {
      const audioStream = await this.client.textToSpeech.convertAsStream(
        options.voiceId,
        {
          text: options.text,
          model_id: options.modelId || 'eleven_turbo_v2_5',
          voice_settings:
            options.stability !== undefined || options.similarityBoost !== undefined
              ? {
                  stability: options.stability ?? 0.5,
                  similarity_boost: options.similarityBoost ?? 0.75,
                }
              : undefined,
        },
      );
      // Readable.from() safely wraps the async iterator in a Node.js stream
      // (a hand-rolled read() loop risks starting multiple competing iterations)
      return Readable.from(audioStream);
    } catch (error) {
      this.logger.error(`Failed to stream text to speech: ${error.message}`);
      throw error;
    }
  }

  async getAvailableVoices(): Promise<Voice[]> {
    this.logger.log('Fetching available voices');
    try {
      const response = await this.client.voices.getAll();
      return response.voices.map((voice) => ({
        voiceId: voice.voice_id,
        name: voice.name,
        category: voice.category,
        description: voice.description,
      }));
    } catch (error) {
      this.logger.error(`Failed to fetch voices: ${error.message}`);
      throw error;
    }
  }

  async createDubbing(options: DubbingOptions): Promise<string> {
    this.logger.log(
      `Creating dubbing project for language: ${options.targetLanguage}`,
    );
    try {
      const response = await this.client.dubbing.dubAVideoOrAnAudioFile({
        file: options.sourceFile,
        target_lang: options.targetLanguage,
        source_lang: options.sourceLanguage,
        num_speakers: options.numSpeakers || 0, // 0 for automatic detection
        watermark: options.watermark ?? false,
      });
      this.logger.log(`Dubbing project created with ID: ${response.dubbing_id}`);
      return response.dubbing_id;
    } catch (error) {
      this.logger.error(`Failed to create dubbing: ${error.message}`);
      throw error;
    }
  }

  async getDubbingStatus(dubbingId: string): Promise<DubbingStatus> {
    this.logger.log(`Checking dubbing status for ID: ${dubbingId}`);
    try {
      const metadata = await this.client.dubbing.getDubbingProjectMetadata(dubbingId);
      return {
        dubbingId: metadata.dubbing_id,
        status: metadata.status,
        targetLanguages: metadata.target_languages,
        error: metadata.error_message,
      };
    } catch (error) {
      this.logger.error(`Failed to get dubbing status: ${error.message}`);
      throw error;
    }
  }

  async downloadDubbedAudio(
    dubbingId: string,
    languageCode: string,
  ): Promise<Buffer> {
    this.logger.log(
      `Downloading dubbed audio for ID: ${dubbingId}, language: ${languageCode}`,
    );
    try {
      // getDubbedFile returns the rendered audio for the requested language;
      // the exact method name may vary between SDK versions
      const audioStream = await this.client.dubbing.getDubbedFile(
        dubbingId,
        languageCode,
      );
      // Collect the stream into a buffer
      const chunks: Buffer[] = [];
      for await (const chunk of audioStream) {
        chunks.push(Buffer.from(chunk));
      }
      const buffer = Buffer.concat(chunks);
      this.logger.log(
        `Downloaded dubbed audio buffer of size: ${buffer.length} bytes`,
      );
      return buffer;
    } catch (error) {
      this.logger.error(`Failed to download dubbed audio: ${error.message}`);
      throw error;
    }
  }
}
Creating the Controller
Now let's create a controller that exposes these capabilities through REST endpoints. Update src/audio/audio.controller.ts:
import {
  Controller,
  Post,
  Get,
  Body,
  Param,
  Res,
  UseInterceptors,
  UploadedFile,
  BadRequestException,
  NotFoundException,
  Inject,
  StreamableFile,
} from '@nestjs/common';
import { FileInterceptor } from '@nestjs/platform-express';
import { Response } from 'express';
import {
  IAudioService,
  AUDIO_SERVICE_TOKEN,
} from './interfaces/audio-service.interface';
import { TextToSpeechDto } from './dto/text-to-speech.dto';
import { CreateDubbingDto, DownloadDubbingDto } from './dto/dubbing.dto';

@Controller('audio')
export class AudioController {
  constructor(
    @Inject(AUDIO_SERVICE_TOKEN)
    private readonly audioService: IAudioService,
  ) {}

  @Post('text-to-speech')
  async generateSpeech(
    @Body() dto: TextToSpeechDto,
    @Res() res: Response,
  ): Promise<void> {
    const audioBuffer = await this.audioService.convertTextToSpeech({
      text: dto.text,
      voiceId: dto.voiceId,
      modelId: dto.modelId,
      stability: dto.stability,
      similarityBoost: dto.similarityBoost,
    });
    res.set({
      'Content-Type': 'audio/mpeg',
      'Content-Disposition': 'attachment; filename="speech.mp3"',
      'Content-Length': audioBuffer.length,
    });
    res.send(audioBuffer);
  }

  @Post('text-to-speech/stream')
  async streamSpeech(@Body() dto: TextToSpeechDto): Promise<StreamableFile> {
    const audioStream = await this.audioService.streamTextToSpeech({
      text: dto.text,
      voiceId: dto.voiceId,
      modelId: dto.modelId,
      stability: dto.stability,
      similarityBoost: dto.similarityBoost,
    });
    return new StreamableFile(audioStream, {
      type: 'audio/mpeg',
      disposition: 'attachment; filename="speech.mp3"',
    });
  }

  @Get('voices')
  async getVoices() {
    return this.audioService.getAvailableVoices();
  }

  @Post('dubbing')
  @UseInterceptors(FileInterceptor('file'))
  async createDubbing(
    @UploadedFile() file: Express.Multer.File,
    @Body() dto: CreateDubbingDto,
  ) {
    if (!file) {
      throw new BadRequestException('File is required');
    }
    const dubbingId = await this.audioService.createDubbing({
      sourceFile: file.buffer,
      targetLanguage: dto.targetLanguage,
      sourceLanguage: dto.sourceLanguage,
      numSpeakers: dto.numSpeakers,
      watermark: dto.watermark,
    });
    return {
      dubbingId,
      message: 'Dubbing project created successfully',
    };
  }

  @Get('dubbing/:dubbingId/status')
  async getDubbingStatus(@Param('dubbingId') dubbingId: string) {
    const status = await this.audioService.getDubbingStatus(dubbingId);
    if (!status) {
      throw new NotFoundException('Dubbing project not found');
    }
    return status;
  }

  @Get('dubbing/:dubbingId/download/:languageCode')
  async downloadDubbing(
    @Param() params: DownloadDubbingDto,
    @Res() res: Response,
  ): Promise<void> {
    const audioBuffer = await this.audioService.downloadDubbedAudio(
      params.dubbingId,
      params.languageCode,
    );
    res.set({
      'Content-Type': 'audio/mpeg',
      'Content-Disposition': `attachment; filename="dubbed-${params.languageCode}.mp3"`,
      'Content-Length': audioBuffer.length,
    });
    res.send(audioBuffer);
  }
}
Wiring Everything Together
Now we need to update our audio.module.ts to provide all the necessary dependencies:
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { AudioController } from './audio.controller';
import { ElevenLabsAudioService } from './elevenlabs-audio.service';
import { AUDIO_SERVICE_TOKEN } from './interfaces/audio-service.interface';

@Module({
  imports: [ConfigModule],
  controllers: [AudioController],
  providers: [
    {
      provide: AUDIO_SERVICE_TOKEN,
      useClass: ElevenLabsAudioService,
    },
  ],
  exports: [AUDIO_SERVICE_TOKEN],
})
export class AudioModule {}
Finally, let's make sure to import the AudioModule into our app.module.ts:
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { AudioModule } from './audio/audio.module';
import { AppController } from './app.controller';
import { AppService } from './app.service';

@Module({
  imports: [
    ConfigModule.forRoot({
      isGlobal: true,
    }),
    AudioModule,
  ],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}
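Because the AudioModule exports AUDIO_SERVICE_TOKEN, any other feature module that imports AudioModule can inject the very same service behind the interface. A minimal sketch (this PodcastService is a hypothetical consumer, not part of our project):
import { Injectable, Inject } from '@nestjs/common';
import {
  IAudioService,
  AUDIO_SERVICE_TOKEN,
} from '../audio/interfaces/audio-service.interface';

@Injectable()
export class PodcastService {
  constructor(
    @Inject(AUDIO_SERVICE_TOKEN)
    private readonly audioService: IAudioService,
  ) {}

  // Generates an episode intro without knowing which provider sits behind the token
  async generateIntro(voiceId: string): Promise<Buffer> {
    return this.audioService.convertTextToSpeech({
      text: 'Welcome back to the show!',
      voiceId,
    });
  }
}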
Adding File Upload Support
To support file uploads for dubbing, we only need to install the Multer type definitions, since the Multer middleware itself already ships with @nestjs/platform-express:
$ npm i -D @types/multer
Let's update our main.ts to enable CORS and set up global validation:
import { NestFactory } from '@nestjs/core';
import { ValidationPipe } from '@nestjs/common';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.useGlobalPipes( // <-- new
    new ValidationPipe({
      whitelist: true,
      forbidNonWhitelisted: true,
      transform: true,
    }),
  );
  app.enableCors(); // <-- new
  await app.listen(3000);
  console.log('Application is running on: http://localhost:3000');
}
bootstrap();
Testing the Implementation
Now let's test our implementation by firing up our NestJS application!
$ npm run start:dev
1. Getting Available Voices
Using cURL, let's run through and test how everything is working so far!
curl http://localhost:3000/audio/voices
This will return a list of available voices that you can use for text-to-speech.
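Based on the Voice interface we defined earlier, the response is an array shaped roughly like this (the values here are illustrative):
[
  {
    "voiceId": "21m00Tcm4TlvDq8ikWAM",
    "name": "Rachel",
    "category": "premade",
    "description": "A calm, young adult female voice"
  }
]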
2. Converting Text to Speech
curl -X POST http://localhost:3000/audio/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! This is a test of the ElevenLabs integration with NestJS.",
    "voiceId": "21m00Tcm4TlvDq8ikWAM",
    "modelId": "eleven_multilingual_v2"
  }' \
  --output speech.mp3
3. Streaming Text to Speech
curl -X POST http://localhost:3000/audio/text-to-speech/stream \
  -H "Content-Type: application/json" \
  -d '{
    "text": "This is a streaming response for low-latency applications.",
    "voiceId": "21m00Tcm4TlvDq8ikWAM",
    "modelId": "eleven_turbo_v2_5"
  }' \
  --output streamed-speech.mp3
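Note that cURL simply buffers the response into a file; to actually benefit from the low latency, you'd consume the stream incrementally. A minimal Node.js client sketch (assuming Node 18+ for the built-in fetch and Readable.fromWeb):
import { createWriteStream } from 'fs';
import { Readable } from 'stream';
import { pipeline } from 'stream/promises';

async function saveStreamedSpeech() {
  const res = await fetch('http://localhost:3000/audio/text-to-speech/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: 'This is a streaming response for low-latency applications.',
      voiceId: '21m00Tcm4TlvDq8ikWAM',
    }),
  });

  // Write chunks to disk (or a speaker) as they arrive, instead of waiting for the full file
  await pipeline(
    Readable.fromWeb(res.body as any), // cast: Node and DOM web-stream typings differ
    createWriteStream('streamed-speech.mp3'),
  );
}

saveStreamedSpeech();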
4. Creating a Dubbing Project
curl -X POST http://localhost:3000/audio/dubbing \
  -F "file=@your-audio-file.mp3" \
  -F "targetLanguage=es" \
  -F "sourceLanguage=en"
This will return a dubbing ID that you can use to check status and download the result.
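Given what our controller returns, the response looks like this (the ID value will differ per project):
{
  "dubbingId": "dub_abc123",
  "message": "Dubbing project created successfully"
}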
5. Checking Dubbing Status
curl http://localhost:3000/audio/dubbing/{dubbingId}/status
6. Downloading Dubbed Audio
Once the status is "dubbed", you can download the result:
curl http://localhost:3000/audio/dubbing/{dubbingId}/download/es \
  --output dubbed-spanish.mp3
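Since dubbing runs asynchronously, in a real application you'd typically poll the status until the project reports "dubbed" before downloading. A minimal sketch of that loop against our own service interface (the "failed" status value and the retry/timeout numbers are assumptions):
import { setTimeout as sleep } from 'timers/promises';
import { IAudioService } from './interfaces/audio-service.interface';

// Polls until the dubbing project is ready (or fails), then downloads the audio
async function waitForDubbing(
  audioService: IAudioService,
  dubbingId: string,
  languageCode: string,
): Promise<Buffer> {
  for (let attempt = 0; attempt < 60; attempt++) {
    const { status, error } = await audioService.getDubbingStatus(dubbingId);
    if (status === 'dubbed') {
      return audioService.downloadDubbedAudio(dubbingId, languageCode);
    }
    if (status === 'failed') {
      throw new Error(`Dubbing failed: ${error}`);
    }
    await sleep(10_000); // wait 10 seconds between checks
  }
  throw new Error('Timed out waiting for dubbing to complete');
}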
Advanced Features and Best Practices
Implementing Caching
For production applications, you'll want to cache generated audio to reduce API calls and costs:
import { Injectable, Logger, Inject } from '@nestjs/common';
import { CACHE_MANAGER } from '@nestjs/cache-manager';
import { Cache } from 'cache-manager';
import * as crypto from 'crypto';
import {
  IAudioService,
  TextToSpeechOptions,
  AUDIO_SERVICE_TOKEN,
} from './interfaces/audio-service.interface';

@Injectable()
export class CachedAudioService implements IAudioService {
  private readonly logger = new Logger(CachedAudioService.name);

  constructor(
    @Inject(AUDIO_SERVICE_TOKEN)
    private readonly audioService: IAudioService,
    @Inject(CACHE_MANAGER) private cacheManager: Cache,
  ) {}

  async convertTextToSpeech(options: TextToSpeechOptions): Promise<Buffer> {
    // Create a cache key based on the options
    const cacheKey = this.generateCacheKey(options);

    // Check cache first
    const cached = await this.cacheManager.get<Buffer>(cacheKey);
    if (cached) {
      this.logger.log('Returning cached audio');
      return cached;
    }

    // Generate new audio
    const audio = await this.audioService.convertTextToSpeech(options);

    // Cache for 24 hours (ttl in seconds here; note that cache-manager v5 expects milliseconds)
    await this.cacheManager.set(cacheKey, audio, 86400);

    return audio;
  }

  private generateCacheKey(options: TextToSpeechOptions): string {
    const data = JSON.stringify(options);
    return crypto.createHash('md5').update(data).digest('hex');
  }

  // Implement other methods...
}
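For the CACHE_MANAGER injection above to resolve, the cache module has to be registered somewhere. A minimal sketch, assuming the @nestjs/cache-manager and cache-manager packages are installed (shown as a standalone module for brevity; in our app you'd add the import to the existing AppModule). It defaults to an in-memory store, so for production you'd likely plug in a shared store like Redis:
import { Module } from '@nestjs/common';
import { CacheModule } from '@nestjs/cache-manager';

@Module({
  imports: [
    // Registers the default in-memory cache and makes CACHE_MANAGER injectable everywhere
    CacheModule.register({ isGlobal: true }),
  ],
})
export class AppModule {}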
Implementing Error Handling
In our applications, we always want to implement proper error handling, with custom exceptions for the situations where things go wrong.
import { HttpException, HttpStatus } from '@nestjs/common';

export class AudioGenerationException extends HttpException {
  constructor(message: string) {
    super(
      {
        statusCode: HttpStatus.INTERNAL_SERVER_ERROR,
        message: 'Audio generation failed',
        error: message,
      },
      HttpStatus.INTERNAL_SERVER_ERROR,
    );
  }
}

export class VoiceNotFoundException extends HttpException {
  constructor(voiceId: string) {
    super(
      {
        statusCode: HttpStatus.NOT_FOUND,
        message: `Voice with ID ${voiceId} not found`,
      },
      HttpStatus.NOT_FOUND,
    );
  }
}
Rate Limiting
Let's also implement rate limiting to protect our API from abuse! After installing @nestjs/throttler, register the ThrottlerModule:
import { Module } from '@nestjs/common';
import { ThrottlerModule } from '@nestjs/throttler';

@Module({
  imports: [
    ThrottlerModule.forRoot([
      {
        ttl: 60000,
        limit: 10,
      },
    ]),
  ],
})
export class AppModule {}
Let's also add the throttler guard to our AudioController:
import { UseGuards } from '@nestjs/common';
import { ThrottlerGuard } from '@nestjs/throttler';

@Controller('audio')
@UseGuards(ThrottlerGuard)
export class AudioController {
  // ... controller methods
}
Use Cases
The integration we've built so far opens up numerous possibilities for our Nest applications:
- Content Creation Platforms: Generate voiceovers for video content, podcasts, or educational materials
- Accessibility Features: Convert written content to audio for visually impaired users
- Language Learning Apps: Create pronunciation guides and multilingual content
- Customer Service: Generate automated voice responses or create multilingual support content
- Audiobook Production: Convert books and articles into audio format
- Marketing: Create multilingual ad campaigns with consistent voice branding across languages
Bringing it all together
In no time, we've managed to build a production-ready integration between ElevenLabs and NestJS. By abstracting the audio service behind an interface, we've created a maintainable and testable solution that can easily adapt to changing requirements or alternative providers.
The implementation we've created provides:
- Text-to-speech conversion with customizable voice settings
- Real-time audio streaming for low-latency applications
- Voice discovery and management
- Automatic dubbing and translation across multiple languages
- A clean separation of concerns following SOLID principles
- Easy testability and maintainability
We've also set everything up in a way that makes it straightforward to extend the functionality further, so have fun seeing what else you can do! You could add features like voice cloning, sound effects generation, or integration with other AI services. The sky's the limit! 🐈
Learn NestJS - Official NestJS Courses 📚
Level-up your NestJS and Node.js ecosystem skills in these incremental workshop-style courses, from the NestJS Creator himself, and help support the NestJS framework! 🐈🚀 The NestJS Fundamentals Course is now LIVE and 25% off for a limited time!
🎉 NEW - NestJS Course Extensions now live!
- NestJS Advanced Concepts Course now LIVE!
- NestJS Advanced Bundle (Advanced Architecture and Advanced Concepts) now 22% OFF!
- NestJS Microservices now LIVE!
- NestJS Authentication / Authorization Course now LIVE!
- NestJS GraphQL Course (code-first & schema-first approaches) is now LIVE!