In today's world of digital content, voice and audio are becoming increasingly important. From podcasts and audiobooks to accessibility features and multilingual content, the demand for high-quality text-to-speech (TTS) and audio dubbing solutions continues to grow. As developers, we're often tasked with integrating these capabilities into our applications, and doing so in a way that's maintainable, testable, and scalable.
In this article, we'll explore how to integrate ElevenLabs, one of the most advanced AI voice platforms, into a NestJS application. We'll build a production-ready audio service that leverages ElevenLabs' powerful features, including text-to-speech generation, voice cloning, audio streaming, and automatic dubbing across multiple languages. More importantly, we'll implement these features following NestJS best practices and clean architecture principles. So let's dive right in!
What is ElevenLabs?
ElevenLabs is an AI audio research and deployment company that provides state-of-the-art text-to-speech and voice AI technology. Their API allows developers to generate incredibly realistic voiceovers in multiple languages, clone voices, stream audio in real time, and even dub entire videos automatically while preserving the speaker's emotion, timing, and tone.
Some key features include:
- Text-to-Speech: Convert text to lifelike speech in over 32 languages
- Voice Cloning: Create custom AI voices from audio samples
- Real-time Streaming: Stream audio as it's being generated for low-latency applications
- AI Dubbing: Automatically translate and dub audio/video content across 29 languages with speaker detection and voice preservation
- Multiple Models: Choose from various models optimized for quality, speed, or multilingual support
Prerequisites
Before we begin, make sure you have the following installed:
- Node.js (version 18 or higher)
- NestJS CLI installed globally
- An ElevenLabs account with an API key (get yours at elevenlabs.io)
Setting Up the Project
Let's start by creating a new NestJS project and installing the necessary dependencies:
$ nest new elevenlabs-nest
$ cd elevenlabs-nest
Select your preferred package manager when prompted, and let's install some required packages:
$ npm i @nestjs/config @elevenlabs/elevenlabs-js
We'll be using the @nestjs/config package to help us manage environment variables, and we're installing the official ElevenLabs Node.js SDK, @elevenlabs/elevenlabs-js.
Configuring the Application
Let's start off by setting up our environment variables. Create a .env file in the root of our project:
ELEVENLABS_API_KEY=your_api_key_here
Now let's update the app.module.ts file to include the ConfigModule:
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { AppController } from './app.controller';
import { AppService } from './app.service';

@Module({
  imports: [
    ConfigModule.forRoot({
      isGlobal: true,
    }),
  ],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}
Creating the Audio Module
Now that we have the fundamental parts set up, let's get to the fun stuff and start building our ElevenLabs-related audio code.
Let's generate a dedicated module for our audio-related functionality:
$ nest generate module audio
$ nest generate service audio
$ nest generate controller audio
This creates an audio folder with the necessary files.
Just as a look ahead, our folder structure will eventually look like this:
src/
├── audio/
│ ├── audio.controller.ts
│ ├── audio.module.ts
│ ├── audio.service.ts
│ ├── dto/
│ │ ├── text-to-speech.dto.ts
│ │ └── dubbing.dto.ts
│ └── interfaces/
│ └── audio-service.interface.ts
├── app.module.ts
└── main.ts
Implementing the Audio Service Interface
Since we all love clean architecture principles, let's first define an interface for our audio service. This way, our business logic stays independent of any specific implementation (ElevenLabs in this case), which allows for easy testing and potential provider swapping in the future.
Create src/audio/interfaces/audio-service.interface.ts:
import { Readable } from 'stream';

export interface Voice {
  voiceId: string;
  name: string;
  category?: string;
  description?: string;
}

export interface TextToSpeechOptions {
  text: string;
  voiceId: string;
  modelId?: string;
  stability?: number;
  similarityBoost?: number;
}

export interface DubbingOptions {
  sourceFile: Buffer;
  targetLanguage: string;
  sourceLanguage?: string;
  numSpeakers?: number;
  watermark?: boolean;
}

export interface DubbingStatus {
  dubbingId: string;
  status: string;
  targetLanguages?: string[];
  error?: string;
}

export interface IAudioService {
  convertTextToSpeech(options: TextToSpeechOptions): Promise<Buffer>;
  streamTextToSpeech(options: TextToSpeechOptions): Promise<Readable>;
  getAvailableVoices(): Promise<Voice[]>;
  createDubbing(options: DubbingOptions): Promise<string>;
  getDubbingStatus(dubbingId: string): Promise<DubbingStatus>;
  downloadDubbedAudio(dubbingId: string, languageCode: string): Promise<Buffer>;
}

export const AUDIO_SERVICE_TOKEN = 'AUDIO_SERVICE_TOKEN';
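To see the testing benefit in action, here's a minimal sketch of a unit test that swaps a fake implementation in behind the injection token (assuming Jest and @nestjs/testing; the import path assumes the test sits next to the interface file):
import { Test } from '@nestjs/testing';
import {
  IAudioService,
  AUDIO_SERVICE_TOKEN,
} from './audio-service.interface';

describe('IAudioService consumers', () => {
  it('can swap in a fake implementation', async () => {
    // Hypothetical stub: only the method under test needs to be faked
    const audioServiceMock = {
      convertTextToSpeech: jest.fn().mockResolvedValue(Buffer.from('fake-audio')),
    };

    const moduleRef = await Test.createTestingModule({
      providers: [
        { provide: AUDIO_SERVICE_TOKEN, useValue: audioServiceMock },
      ],
    }).compile();

    const audioService = moduleRef.get<IAudioService>(AUDIO_SERVICE_TOKEN);
    const audio = await audioService.convertTextToSpeech({
      text: 'hello',
      voiceId: 'test-voice',
    });

    expect(audio.toString()).toBe('fake-audio');
  });
});
Nothing here knows, or cares, that ElevenLabs is behind the token in production.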
Creating DTOs
DTOs (Data Transfer Objects) help us validate and type our incoming requests. Let's create the following files, populating each one with the classes shown below it.
src/audio/dto/text-to-speech.dto.ts:
import { IsString, IsOptional, IsNumber, Min, Max } from 'class-validator';

export class TextToSpeechDto {
  @IsString()
  text: string;

  @IsString()
  voiceId: string;

  @IsOptional()
  @IsString()
  modelId?: string;

  @IsOptional()
  @IsNumber()
  @Min(0)
  @Max(1)
  stability?: number;

  @IsOptional()
  @IsNumber()
  @Min(0)
  @Max(1)
  similarityBoost?: number;
}
src/audio/dto/dubbing.dto.ts:
import { IsString, IsOptional, IsNumber, IsBoolean } from 'class-validator';
import { Type, Transform } from 'class-transformer';

export class CreateDubbingDto {
  @IsString()
  targetLanguage: string;

  @IsOptional()
  @IsString()
  sourceLanguage?: string;

  // multipart/form-data fields arrive as strings, so coerce them before validating
  @IsOptional()
  @Type(() => Number)
  @IsNumber()
  numSpeakers?: number;

  @IsOptional()
  @Transform(({ value }) => value === 'true' || value === true)
  @IsBoolean()
  watermark?: boolean;
}

export class DubbingStatusDto {
  @IsString()
  dubbingId: string;
}

export class DownloadDubbingDto {
  @IsString()
  dubbingId: string;

  @IsString()
  languageCode: string;
}
Implementing the ElevenLabs Service
Great! Now let's implement our audio service using the ElevenLabs SDK. Create a new file, src/audio/elevenlabs-audio.service.ts:
import { Injectable, Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import { Readable } from 'stream';
import {
  IAudioService,
  Voice,
  TextToSpeechOptions,
  DubbingOptions,
  DubbingStatus,
} from './interfaces/audio-service.interface';

@Injectable()
export class ElevenLabsAudioService implements IAudioService {
  private readonly logger = new Logger(ElevenLabsAudioService.name);
  private readonly client: ElevenLabsClient;

  constructor(private configService: ConfigService) {
    const apiKey = this.configService.get<string>('ELEVENLABS_API_KEY');
    if (!apiKey) {
      throw new Error('ELEVENLABS_API_KEY is not configured');
    }
    this.client = new ElevenLabsClient({ apiKey });
    this.logger.log('ElevenLabs client initialized');
  }

  async convertTextToSpeech(options: TextToSpeechOptions): Promise<Buffer> {
    this.logger.log(`Converting text to speech for voice: ${options.voiceId}`);
    try {
      const audio = await this.client.textToSpeech.convert(options.voiceId, {
        text: options.text,
        model_id: options.modelId || 'eleven_multilingual_v2',
        voice_settings:
          options.stability !== undefined || options.similarityBoost !== undefined
            ? {
                stability: options.stability ?? 0.5,
                similarity_boost: options.similarityBoost ?? 0.75,
              }
            : undefined,
      });
      // Collect the async iterator's chunks into a single buffer
      const chunks: Buffer[] = [];
      for await (const chunk of audio) {
        chunks.push(Buffer.from(chunk));
      }
      const buffer = Buffer.concat(chunks);
      this.logger.log(`Generated audio buffer of size: ${buffer.length} bytes`);
      return buffer;
    } catch (error) {
      this.logger.error(`Failed to convert text to speech: ${error.message}`);
      throw error;
    }
  }

  async streamTextToSpeech(options: TextToSpeechOptions): Promise<Readable> {
    this.logger.log(`Streaming text to speech for voice: ${options.voiceId}`);
    try {
      const audioStream = await this.client.textToSpeech.convertAsStream(
        options.voiceId,
        {
          text: options.text,
          model_id: options.modelId || 'eleven_turbo_v2_5',
          voice_settings:
            options.stability !== undefined || options.similarityBoost !== undefined
              ? {
                  stability: options.stability ?? 0.5,
                  similarity_boost: options.similarityBoost ?? 0.75,
                }
              : undefined,
        },
      );
      // Readable.from() safely wraps the async iterator in a Node.js stream
      // (a hand-rolled read() loop risks starting multiple competing iterations)
      return Readable.from(audioStream);
    } catch (error) {
      this.logger.error(`Failed to stream text to speech: ${error.message}`);
      throw error;
    }
  }

  async getAvailableVoices(): Promise<Voice[]> {
    this.logger.log('Fetching available voices');
    try {
      const response = await this.client.voices.getAll();
      return response.voices.map((voice) => ({
        voiceId: voice.voice_id,
        name: voice.name,
        category: voice.category,
        description: voice.description,
      }));
    } catch (error) {
      this.logger.error(`Failed to fetch voices: ${error.message}`);
      throw error;
    }
  }

  async createDubbing(options: DubbingOptions): Promise<string> {
    this.logger.log(
      `Creating dubbing project for language: ${options.targetLanguage}`,
    );
    try {
      const response = await this.client.dubbing.dubAVideoOrAnAudioFile({
        file: options.sourceFile,
        target_lang: options.targetLanguage,
        source_lang: options.sourceLanguage,
        num_speakers: options.numSpeakers || 0, // 0 for automatic detection
        watermark: options.watermark ?? false,
      });
      this.logger.log(`Dubbing project created with ID: ${response.dubbing_id}`);
      return response.dubbing_id;
    } catch (error) {
      this.logger.error(`Failed to create dubbing: ${error.message}`);
      throw error;
    }
  }

  async getDubbingStatus(dubbingId: string): Promise<DubbingStatus> {
    this.logger.log(`Checking dubbing status for ID: ${dubbingId}`);
    try {
      const metadata = await this.client.dubbing.getDubbingProjectMetadata(dubbingId);
      return {
        dubbingId: metadata.dubbing_id,
        status: metadata.status,
        targetLanguages: metadata.target_languages,
        error: metadata.error_message,
      };
    } catch (error) {
      this.logger.error(`Failed to get dubbing status: ${error.message}`);
      throw error;
    }
  }

  async downloadDubbedAudio(
    dubbingId: string,
    languageCode: string,
  ): Promise<Buffer> {
    this.logger.log(
      `Downloading dubbed audio for ID: ${dubbingId}, language: ${languageCode}`,
    );
    try {
      // getDubbedFile returns the rendered audio for the requested language;
      // the exact method name may vary between SDK versions
      const audioStream = await this.client.dubbing.getDubbedFile(
        dubbingId,
        languageCode,
      );
      // Collect the stream into a buffer
      const chunks: Buffer[] = [];
      for await (const chunk of audioStream) {
        chunks.push(Buffer.from(chunk));
      }
      const buffer = Buffer.concat(chunks);
      this.logger.log(
        `Downloaded dubbed audio buffer of size: ${buffer.length} bytes`,
      );
      return buffer;
    } catch (error) {
      this.logger.error(`Failed to download dubbed audio: ${error.message}`);
      throw error;
    }
  }
}
Creating the Controller
Now let's create a controller that exposes these capabilities through REST endpoints. Update src/audio/audio.controller.ts:
import {
  Controller,
  Post,
  Get,
  Body,
  Param,
  Res,
  UseInterceptors,
  UploadedFile,
  BadRequestException,
  NotFoundException,
  Inject,
  StreamableFile,
} from '@nestjs/common';
import { FileInterceptor } from '@nestjs/platform-express';
import { Response } from 'express';
import {
  IAudioService,
  AUDIO_SERVICE_TOKEN,
} from './interfaces/audio-service.interface';
import { TextToSpeechDto } from './dto/text-to-speech.dto';
import { CreateDubbingDto, DownloadDubbingDto } from './dto/dubbing.dto';

@Controller('audio')
export class AudioController {
  constructor(
    @Inject(AUDIO_SERVICE_TOKEN)
    private readonly audioService: IAudioService,
  ) {}

  @Post('text-to-speech')
  async generateSpeech(
    @Body() dto: TextToSpeechDto,
    @Res() res: Response,
  ): Promise<void> {
    const audioBuffer = await this.audioService.convertTextToSpeech({
      text: dto.text,
      voiceId: dto.voiceId,
      modelId: dto.modelId,
      stability: dto.stability,
      similarityBoost: dto.similarityBoost,
    });
    res.set({
      'Content-Type': 'audio/mpeg',
      'Content-Disposition': 'attachment; filename="speech.mp3"',
      'Content-Length': audioBuffer.length,
    });
    res.send(audioBuffer);
  }

  @Post('text-to-speech/stream')
  async streamSpeech(@Body() dto: TextToSpeechDto): Promise<StreamableFile> {
    const audioStream = await this.audioService.streamTextToSpeech({
      text: dto.text,
      voiceId: dto.voiceId,
      modelId: dto.modelId,
      stability: dto.stability,
      similarityBoost: dto.similarityBoost,
    });
    return new StreamableFile(audioStream, {
      type: 'audio/mpeg',
      disposition: 'attachment; filename="speech.mp3"',
    });
  }

  @Get('voices')
  async getVoices() {
    return this.audioService.getAvailableVoices();
  }

  @Post('dubbing')
  @UseInterceptors(FileInterceptor('file'))
  async createDubbing(
    @UploadedFile() file: Express.Multer.File,
    @Body() dto: CreateDubbingDto,
  ) {
    if (!file) {
      throw new BadRequestException('File is required');
    }
    const dubbingId = await this.audioService.createDubbing({
      sourceFile: file.buffer,
      targetLanguage: dto.targetLanguage,
      sourceLanguage: dto.sourceLanguage,
      numSpeakers: dto.numSpeakers,
      watermark: dto.watermark,
    });
    return {
      dubbingId,
      message: 'Dubbing project created successfully',
    };
  }

  @Get('dubbing/:dubbingId/status')
  async getDubbingStatus(@Param('dubbingId') dubbingId: string) {
    const status = await this.audioService.getDubbingStatus(dubbingId);
    if (!status) {
      throw new NotFoundException('Dubbing project not found');
    }
    return status;
  }

  @Get('dubbing/:dubbingId/download/:languageCode')
  async downloadDubbing(
    @Param() params: DownloadDubbingDto,
    @Res() res: Response,
  ): Promise<void> {
    const audioBuffer = await this.audioService.downloadDubbedAudio(
      params.dubbingId,
      params.languageCode,
    );
    res.set({
      'Content-Type': 'audio/mpeg',
      'Content-Disposition': `attachment; filename="dubbed-${params.languageCode}.mp3"`,
      'Content-Length': audioBuffer.length,
    });
    res.send(audioBuffer);
  }
}
Wiring Everything Together
Now we need to update our audio.module.ts to provide all the necessary dependencies:
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { AudioController } from './audio.controller';
import { ElevenLabsAudioService } from './elevenlabs-audio.service';
import { AUDIO_SERVICE_TOKEN } from './interfaces/audio-service.interface';

@Module({
  imports: [ConfigModule],
  controllers: [AudioController],
  providers: [
    {
      provide: AUDIO_SERVICE_TOKEN,
      useClass: ElevenLabsAudioService,
    },
  ],
  exports: [AUDIO_SERVICE_TOKEN],
})
export class AudioModule {}
Finally, let's make sure to import the AudioModule into our app.module.ts:
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { AudioModule } from './audio/audio.module';
import { AppController } from './app.controller';
import { AppService } from './app.service';

@Module({
  imports: [
    ConfigModule.forRoot({
      isGlobal: true,
    }),
    AudioModule,
  ],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}
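Because the AudioModule exports AUDIO_SERVICE_TOKEN, any other feature module that imports AudioModule can inject the very same service behind the interface. A minimal sketch (this PodcastService is a hypothetical consumer, not part of our project):
import { Injectable, Inject } from '@nestjs/common';
import {
  IAudioService,
  AUDIO_SERVICE_TOKEN,
} from '../audio/interfaces/audio-service.interface';

@Injectable()
export class PodcastService {
  constructor(
    @Inject(AUDIO_SERVICE_TOKEN)
    private readonly audioService: IAudioService,
  ) {}

  // Generates an episode intro without knowing which provider sits behind the token
  async generateIntro(voiceId: string): Promise<Buffer> {
    return this.audioService.convertTextToSpeech({
      text: 'Welcome back to the show!',
      voiceId,
    });
  }
}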
Adding File Upload Support
To support file uploads for dubbing, we only need to install the Multer type definitions, since the Multer middleware itself already ships with @nestjs/platform-express:
$ npm i -D @types/multer
Let's update our main.ts to enable CORS and set up global validation:
import { NestFactory } from '@nestjs/core';
import { ValidationPipe } from '@nestjs/common';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.useGlobalPipes( // <-- new
    new ValidationPipe({
      whitelist: true,
      forbidNonWhitelisted: true,
      transform: true,
    }),
  );
  app.enableCors(); // <-- new
  await app.listen(3000);
  console.log('Application is running on: http://localhost:3000');
}
bootstrap();
Testing the Implementation
Now let's test our implementation by firing up our NestJS application!
$ npm run start:dev
1. Getting Available Voices
Using cURL, let's run through and test how everything is working so far!
curl http://localhost:3000/audio/voices
This will return a list of available voices that you can use for text-to-speech.
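Based on the Voice interface we defined earlier, the response is an array shaped roughly like this (the values here are illustrative):
[
  {
    "voiceId": "21m00Tcm4TlvDq8ikWAM",
    "name": "Rachel",
    "category": "premade",
    "description": "A calm, young adult female voice"
  }
]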
2. Converting Text to Speech
curl -X POST http://localhost:3000/audio/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! This is a test of the ElevenLabs integration with NestJS.",
    "voiceId": "21m00Tcm4TlvDq8ikWAM",
    "modelId": "eleven_multilingual_v2"
  }' \
  --output speech.mp3
3. Streaming Text to Speech
curl -X POST http://localhost:3000/audio/text-to-speech/stream \
  -H "Content-Type: application/json" \
  -d '{
    "text": "This is a streaming response for low-latency applications.",
    "voiceId": "21m00Tcm4TlvDq8ikWAM",
    "modelId": "eleven_turbo_v2_5"
  }' \
  --output streamed-speech.mp3
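Note that cURL simply buffers the response into a file; to actually benefit from the low latency, you'd consume the stream incrementally. A minimal Node.js client sketch (assuming Node 18+ for the built-in fetch and Readable.fromWeb):
import { createWriteStream } from 'fs';
import { Readable } from 'stream';
import { pipeline } from 'stream/promises';

async function saveStreamedSpeech() {
  const res = await fetch('http://localhost:3000/audio/text-to-speech/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: 'This is a streaming response for low-latency applications.',
      voiceId: '21m00Tcm4TlvDq8ikWAM',
    }),
  });

  // Write chunks to disk (or a speaker) as they arrive, instead of waiting for the full file
  await pipeline(
    Readable.fromWeb(res.body as any), // cast: Node and DOM web-stream typings differ
    createWriteStream('streamed-speech.mp3'),
  );
}

saveStreamedSpeech();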
4. Creating a Dubbing Project
curl -X POST http://localhost:3000/audio/dubbing \
  -F "file=@your-audio-file.mp3" \
  -F "targetLanguage=es" \
  -F "sourceLanguage=en"
This will return a dubbing ID that you can use to check status and download the result.
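Given what our controller returns, the response looks like this (the ID value will differ per project):
{
  "dubbingId": "dub_abc123",
  "message": "Dubbing project created successfully"
}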
5. Checking Dubbing Status
curl http://localhost:3000/audio/dubbing/{dubbingId}/status
6. Downloading Dubbed Audio
Once the status is "dubbed", you can download the result:
curl http://localhost:3000/audio/dubbing/{dubbingId}/download/es \
  --output dubbed-spanish.mp3
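Since dubbing runs asynchronously, in a real application you'd typically poll the status until the project reports "dubbed" before downloading. A minimal sketch of that loop against our own service interface (the "failed" status value and the retry/timeout numbers are assumptions):
import { setTimeout as sleep } from 'timers/promises';
import { IAudioService } from './interfaces/audio-service.interface';

// Polls until the dubbing project is ready (or fails), then downloads the audio
async function waitForDubbing(
  audioService: IAudioService,
  dubbingId: string,
  languageCode: string,
): Promise<Buffer> {
  for (let attempt = 0; attempt < 60; attempt++) {
    const { status, error } = await audioService.getDubbingStatus(dubbingId);
    if (status === 'dubbed') {
      return audioService.downloadDubbedAudio(dubbingId, languageCode);
    }
    if (status === 'failed') {
      throw new Error(`Dubbing failed: ${error}`);
    }
    await sleep(10_000); // wait 10 seconds between checks
  }
  throw new Error('Timed out waiting for dubbing to complete');
}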
Advanced Features and Best Practices
Implementing Caching
For production applications, you'll want to cache generated audio to reduce API calls and costs:
import { Injectable, Logger, Inject } from '@nestjs/common';
import { CACHE_MANAGER } from '@nestjs/cache-manager';
import { Cache } from 'cache-manager';
import * as crypto from 'crypto';
import {
  IAudioService,
  TextToSpeechOptions,
  AUDIO_SERVICE_TOKEN,
} from './interfaces/audio-service.interface';

@Injectable()
export class CachedAudioService implements IAudioService {
  private readonly logger = new Logger(CachedAudioService.name);

  constructor(
    @Inject(AUDIO_SERVICE_TOKEN)
    private readonly audioService: IAudioService,
    @Inject(CACHE_MANAGER) private cacheManager: Cache,
  ) {}

  async convertTextToSpeech(options: TextToSpeechOptions): Promise<Buffer> {
    // Create a cache key based on the options
    const cacheKey = this.generateCacheKey(options);

    // Check cache first
    const cached = await this.cacheManager.get<Buffer>(cacheKey);
    if (cached) {
      this.logger.log('Returning cached audio');
      return cached;
    }

    // Generate new audio
    const audio = await this.audioService.convertTextToSpeech(options);

    // Cache for 24 hours (ttl in seconds here; note that cache-manager v5 expects milliseconds)
    await this.cacheManager.set(cacheKey, audio, 86400);

    return audio;
  }

  private generateCacheKey(options: TextToSpeechOptions): string {
    const data = JSON.stringify(options);
    return crypto.createHash('md5').update(data).digest('hex');
  }

  // Implement other methods...
}
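For the CACHE_MANAGER injection above to resolve, the cache module has to be registered somewhere. A minimal sketch, assuming the @nestjs/cache-manager and cache-manager packages are installed (shown as a standalone module for brevity; in our app you'd add the import to the existing AppModule). It defaults to an in-memory store, so for production you'd likely plug in a shared store like Redis:
import { Module } from '@nestjs/common';
import { CacheModule } from '@nestjs/cache-manager';

@Module({
  imports: [
    // Registers the default in-memory cache and makes CACHE_MANAGER injectable everywhere
    CacheModule.register({ isGlobal: true }),
  ],
})
export class AppModule {}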
Implementing Error Handling
In our applications, we always want to implement proper error handling, with custom exceptions for the situations where things go wrong.
import { HttpException, HttpStatus } from '@nestjs/common';

export class AudioGenerationException extends HttpException {
  constructor(message: string) {
    super(
      {
        statusCode: HttpStatus.INTERNAL_SERVER_ERROR,
        message: 'Audio generation failed',
        error: message,
      },
      HttpStatus.INTERNAL_SERVER_ERROR,
    );
  }
}

export class VoiceNotFoundException extends HttpException {
  constructor(voiceId: string) {
    super(
      {
        statusCode: HttpStatus.NOT_FOUND,
        message: `Voice with ID ${voiceId} not found`,
      },
      HttpStatus.NOT_FOUND,
    );
  }
}
Rate Limiting
Let's also implement rate limiting to protect our API from abuse! After installing @nestjs/throttler, register the ThrottlerModule:
import { Module } from '@nestjs/common';
import { ThrottlerModule } from '@nestjs/throttler';

@Module({
  imports: [
    ThrottlerModule.forRoot([
      {
        ttl: 60000,
        limit: 10,
      },
    ]),
  ],
})
export class AppModule {}
Let's also add the throttler guard to our AudioController:
import { UseGuards } from '@nestjs/common';
import { ThrottlerGuard } from '@nestjs/throttler';

@Controller('audio')
@UseGuards(ThrottlerGuard)
export class AudioController {
  // ... controller methods
}
Use Cases
The integration we've built so far opens up numerous possibilities for our Nest applications:
- Content Creation Platforms: Generate voiceovers for video content, podcasts, or educational materials
- Accessibility Features: Convert written content to audio for visually impaired users
- Language Learning Apps: Create pronunciation guides and multilingual content
- Customer Service: Generate automated voice responses or create multilingual support content
- Audiobook Production: Convert books and articles into audio format
- Marketing: Create multilingual ad campaigns with consistent voice branding across languages
Bringing it all together
In no time, we've managed to build a production-ready integration between ElevenLabs and NestJS. By abstracting the audio service behind an interface, we've created a maintainable and testable solution that can easily adapt to changing requirements or alternative providers.
The implementation we've created provides:
- Text-to-speech conversion with customizable voice settings
- Real-time audio streaming for low-latency applications
- Voice discovery and management
- Automatic dubbing and translation across multiple languages
- A clean separation of concerns following SOLID principles
- Easy testability and maintainability
We've also set everything up in a way that makes it straightforward to extend the functionality further, so have fun seeing what else you can do! You could add features like voice cloning, sound effects generation, or integration with other AI services. The sky's the limit! 🐈
Learn NestJS - Official NestJS Courses 📚
Level-up your NestJS and Node.js ecosystem skills in these incremental workshop-style courses, from the NestJS Creator himself, and help support the NestJS framework! 🐈🚀 The NestJS Fundamentals Course is now LIVE and 25% off for a limited time!
🎉 NEW - NestJS Course Extensions now live!
- NestJS Advanced Concepts Course now LIVE!
- NestJS Advanced Bundle (Advanced Architecture and Advanced Concepts) now 22% OFF!
- NestJS Microservices now LIVE!
- NestJS Authentication / Authorization Course now LIVE!
- NestJS GraphQL Course (code-first & schema-first approaches) is now LIVE!