Voice AI Integration with ElevenLabs and NestJS

Mark Pieszak | Trilon Consulting

In today's world of digital content, voice and audio are becoming increasingly important. From podcasts and audiobooks to accessibility features and multilingual content, the demand for high-quality text-to-speech (TTS) and audio dubbing solutions continues to grow. As developers, we're often tasked with integrating these capabilities into our applications, and doing so in a way that's maintainable, testable, and scalable.

In this article, we'll explore how to integrate ElevenLabs - one of the most advanced AI voice platforms - into a NestJS application. We'll build a production-ready audio service that leverages ElevenLabs' powerful features, including text-to-speech generation, voice cloning, audio streaming, and automatic dubbing across multiple languages. More importantly, we'll implement these features following NestJS best practices and clean architecture principles. So let's dive right in!

What is ElevenLabs?

ElevenLabs is an AI audio research and deployment company that provides state-of-the-art text-to-speech and voice AI technology. Their API allows developers to generate incredibly realistic voiceovers in multiple languages, clone voices, stream audio in real time, and even dub entire videos automatically while preserving the speaker's emotion, timing, and tone.

Some key features include:

  • Text-to-Speech: Convert text to lifelike speech in over 32 languages
  • Voice Cloning: Create custom AI voices from audio samples
  • Real-time Streaming: Stream audio as it's being generated for low-latency applications
  • AI Dubbing: Automatically translate and dub audio/video content across 29 languages with speaker detection and voice preservation
  • Multiple Models: Choose from various models optimized for quality, speed, or multilingual support

Prerequisites

Before we begin, make sure you have the following installed:

  • Node.js (version 18 or higher)
  • NestJS CLI installed globally
  • An ElevenLabs account with an API key (get yours at elevenlabs.io)

Setting Up the Project

Let's start by creating a new NestJS project and installing the necessary dependencies:

$ nest new elevenlabs-nest
$ cd elevenlabs-nest

Select your preferred package manager when prompted, then install the required packages:

$ npm i @nestjs/config @elevenlabs/elevenlabs-js class-validator class-transformer

We'll be using the @nestjs/config package to help us manage environment variables, the official ElevenLabs Node.js SDK @elevenlabs/elevenlabs-js, and class-validator together with class-transformer for the request validation we'll set up later.

Configuring the Application

Let's start by setting up our environment variables. Create a .env file in the root of our project:

ELEVENLABS_API_KEY=your_api_key_here

Now let's update the app.module.ts file to include the ConfigModule:

import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { AppController } from './app.controller';
import { AppService } from './app.service';

@Module({
  imports: [
    ConfigModule.forRoot({
      isGlobal: true,
    }),
  ],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}

Creating the Audio Module

Now that we have the fundamental parts set up, let's get to the fun stuff and start writing our ElevenLabs-related audio code.

Let's generate a dedicated module for our audio-related functionality:

$ nest generate module audio
$ nest generate service audio
$ nest generate controller audio

This creates an audio folder with the necessary files. Just as a look ahead, our folder structure will eventually look like this:

src/
├── audio/
│   ├── audio.controller.ts
│   ├── audio.module.ts
│   ├── audio.service.ts
│   ├── dto/
│   │   ├── text-to-speech.dto.ts
│   │   └── dubbing.dto.ts
│   └── interfaces/
│       └── audio-service.interface.ts
├── app.module.ts
└── main.ts

Implementing the Audio Service Interface

Since we all love clean architecture principles, let's first define an interface for our audio service. This way our business logic stays independent of any specific implementation (ElevenLabs in this case), which allows for easy testing and potential provider swapping in the future.

Create src/audio/interfaces/audio-service.interface.ts:

import { Readable } from 'stream';

export interface Voice {
  voiceId: string;
  name: string;
  category?: string;
  description?: string;
}

export interface TextToSpeechOptions {
  text: string;
  voiceId: string;
  modelId?: string;
  stability?: number;
  similarityBoost?: number;
}

export interface DubbingOptions {
  sourceFile: Buffer;
  targetLanguage: string;
  sourceLanguage?: string;
  numSpeakers?: number;
  watermark?: boolean;
}

export interface DubbingStatus {
  dubbingId: string;
  status: string;
  targetLanguages?: string[];
  error?: string;
}

export interface IAudioService {
  convertTextToSpeech(options: TextToSpeechOptions): Promise<Buffer>;
  streamTextToSpeech(options: TextToSpeechOptions): Promise<Readable>;
  getAvailableVoices(): Promise<Voice[]>;
  createDubbing(options: DubbingOptions): Promise<string>;
  getDubbingStatus(dubbingId: string): Promise<DubbingStatus>;
  downloadDubbedAudio(dubbingId: string, languageCode: string): Promise<Buffer>;
}

export const AUDIO_SERVICE_TOKEN = 'AUDIO_SERVICE_TOKEN';
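Because consumers depend on the IAudioService contract (resolved through AUDIO_SERVICE_TOKEN) rather than the ElevenLabs SDK directly, tests can substitute a stub that makes no network calls and needs no API key. Here's a standalone sketch of the idea - the StubAudioService and narrate function are illustrative (not part of the app's code), and only a slice of the interface is repeated so the snippet is self-contained:

```typescript
// Minimal slice of the IAudioService contract, repeated here
// so the sketch runs on its own.
interface TextToSpeechOptions {
  text: string;
  voiceId: string;
}

interface AudioServiceLike {
  convertTextToSpeech(options: TextToSpeechOptions): Promise<Buffer>;
}

// A stub implementation for tests: no network, no API key.
class StubAudioService implements AudioServiceLike {
  public calls: TextToSpeechOptions[] = [];

  async convertTextToSpeech(options: TextToSpeechOptions): Promise<Buffer> {
    this.calls.push(options);
    // Return a fake "audio" payload derived from the input text.
    return Buffer.from(`audio:${options.text}`);
  }
}

// A consumer written against the interface, not against the SDK.
async function narrate(service: AudioServiceLike, text: string): Promise<number> {
  const audio = await service.convertTextToSpeech({ text, voiceId: 'test-voice' });
  return audio.length; // real code would persist or stream the buffer
}
```

In a Nest testing module you'd achieve the same thing by providing the stub under AUDIO_SERVICE_TOKEN, so the controller under test never touches ElevenLabs.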

Creating DTOs

DTOs (Data Transfer Objects) help us validate and type our incoming requests. Let's create a few new files and populate each with the class shown below it.

src/audio/dto/text-to-speech.dto.ts:

import { IsString, IsOptional, IsNumber, Min, Max } from 'class-validator';

export class TextToSpeechDto {
  @IsString()
  text: string;

  @IsString()
  voiceId: string;

  @IsOptional()
  @IsString()
  modelId?: string;

  @IsOptional()
  @IsNumber()
  @Min(0)
  @Max(1)
  stability?: number;

  @IsOptional()
  @IsNumber()
  @Min(0)
  @Max(1)
  similarityBoost?: number;
}

src/audio/dto/dubbing.dto.ts:

import { IsString, IsOptional, IsNumber, IsBoolean } from 'class-validator';

export class CreateDubbingDto {
  @IsString()
  targetLanguage: string;

  @IsOptional()
  @IsString()
  sourceLanguage?: string;

  @IsOptional()
  @IsNumber()
  numSpeakers?: number;

  @IsOptional()
  @IsBoolean()
  watermark?: boolean;
}

export class DubbingStatusDto {
  @IsString()
  dubbingId: string;
}

export class DownloadDubbingDto {
  @IsString()
  dubbingId: string;

  @IsString()
  languageCode: string;
}

Implementing the ElevenLabs Service

Great! Now let's implement our audio service using the ElevenLabs SDK. Create a new file src/audio/elevenlabs-audio.service.ts:

import { Injectable, Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import { Readable } from 'stream';
import {
  IAudioService,
  Voice,
  TextToSpeechOptions,
  DubbingOptions,
  DubbingStatus,
} from './interfaces/audio-service.interface';

@Injectable()
export class ElevenLabsAudioService implements IAudioService {
  private readonly logger = new Logger(ElevenLabsAudioService.name);
  private readonly client: ElevenLabsClient;

  constructor(private configService: ConfigService) {
    const apiKey = this.configService.get<string>('ELEVENLABS_API_KEY');
    if (!apiKey) {
      throw new Error('ELEVENLABS_API_KEY is not configured');
    }
    this.client = new ElevenLabsClient({ apiKey });
    this.logger.log('ElevenLabs client initialized');
  }

  async convertTextToSpeech(options: TextToSpeechOptions): Promise<Buffer> {
    this.logger.log(`Converting text to speech for voice: ${options.voiceId}`);
    try {
      const audio = await this.client.textToSpeech.convert(options.voiceId, {
        text: options.text,
        model_id: options.modelId || 'eleven_multilingual_v2',
        voice_settings:
          options.stability !== undefined || options.similarityBoost !== undefined
            ? {
                stability: options.stability ?? 0.5,
                similarity_boost: options.similarityBoost ?? 0.75,
              }
            : undefined,
      });
      // Convert the async iterator to a buffer
      const chunks: Buffer[] = [];
      for await (const chunk of audio) {
        chunks.push(Buffer.from(chunk));
      }
      const buffer = Buffer.concat(chunks);
      this.logger.log(`Generated audio buffer of size: ${buffer.length} bytes`);
      return buffer;
    } catch (error) {
      this.logger.error(`Failed to convert text to speech: ${error.message}`);
      throw error;
    }
  }
  async streamTextToSpeech(options: TextToSpeechOptions): Promise<Readable> {
    this.logger.log(`Streaming text to speech for voice: ${options.voiceId}`);
    try {
      const audioStream = await this.client.textToSpeech.convertAsStream(
        options.voiceId,
        {
          text: options.text,
          model_id: options.modelId || 'eleven_turbo_v2_5',
          voice_settings:
            options.stability !== undefined || options.similarityBoost !== undefined
              ? {
                  stability: options.stability ?? 0.5,
                  similarity_boost: options.similarityBoost ?? 0.75,
                }
              : undefined,
        },
      );
      // Readable.from safely bridges the SDK's async iterator to a
      // Node.js stream (a hand-rolled read() implementation risks
      // re-entrant iteration if read() fires again mid-loop)
      return Readable.from(audioStream);
    } catch (error) {
      this.logger.error(`Failed to stream text to speech: ${error.message}`);
      throw error;
    }
  }
  async getAvailableVoices(): Promise<Voice[]> {
    this.logger.log('Fetching available voices');
    try {
      const response = await this.client.voices.getAll();
      return response.voices.map((voice) => ({
        voiceId: voice.voice_id,
        name: voice.name,
        category: voice.category,
        description: voice.description,
      }));
    } catch (error) {
      this.logger.error(`Failed to fetch voices: ${error.message}`);
      throw error;
    }
  }

  async createDubbing(options: DubbingOptions): Promise<string> {
    this.logger.log(`Creating dubbing project for language: ${options.targetLanguage}`);
    try {
      const response = await this.client.dubbing.dubAVideoOrAnAudioFile({
        file: options.sourceFile,
        target_lang: options.targetLanguage,
        source_lang: options.sourceLanguage,
        num_speakers: options.numSpeakers || 0, // 0 for automatic detection
        watermark: options.watermark ?? false,
      });
      this.logger.log(`Dubbing project created with ID: ${response.dubbing_id}`);
      return response.dubbing_id;
    } catch (error) {
      this.logger.error(`Failed to create dubbing: ${error.message}`);
      throw error;
    }
  }

  async getDubbingStatus(dubbingId: string): Promise<DubbingStatus> {
    this.logger.log(`Checking dubbing status for ID: ${dubbingId}`);
    try {
      const metadata = await this.client.dubbing.getDubbingProjectMetadata(dubbingId);
      return {
        dubbingId: metadata.dubbing_id,
        status: metadata.status,
        targetLanguages: metadata.target_languages,
        error: metadata.error_message,
      };
    } catch (error) {
      this.logger.error(`Failed to get dubbing status: ${error.message}`);
      throw error;
    }
  }
  async downloadDubbedAudio(
    dubbingId: string,
    languageCode: string,
  ): Promise<Buffer> {
    this.logger.log(
      `Downloading dubbed audio for ID: ${dubbingId}, language: ${languageCode}`,
    );
    try {
      // Note: getDubbedFile returns the rendered audio; the similarly
      // named getTranscriptForDub only returns the text transcript
      const audioStream = await this.client.dubbing.getDubbedFile(
        dubbingId,
        languageCode,
      );
      // Convert stream to buffer
      const chunks: Buffer[] = [];
      for await (const chunk of audioStream) {
        chunks.push(Buffer.from(chunk));
      }
      const buffer = Buffer.concat(chunks);
      this.logger.log(`Downloaded dubbed audio buffer of size: ${buffer.length} bytes`);
      return buffer;
    } catch (error) {
      this.logger.error(`Failed to download dubbed audio: ${error.message}`);
      throw error;
    }
  }
}
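You may have noticed that convertTextToSpeech and downloadDubbedAudio collapse an async iterable of chunks into a single Buffer with the same loop. That repetition can be factored into a tiny helper - a sketch (the streamToBuffer name is our own, not from the SDK):

```typescript
// Collects an async iterable of byte chunks (the shape the
// ElevenLabs SDK yields) into one Buffer.
async function streamToBuffer(
  iterable: AsyncIterable<Uint8Array>,
): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of iterable) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}
```

With that in place, both service methods reduce to a single call, e.g. `const buffer = await streamToBuffer(audio);`.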

Creating the Controller

Now let's create a controller that exposes these capabilities through REST endpoints. Update src/audio/audio.controller.ts:

import {
  Controller,
  Post,
  Get,
  Body,
  Param,
  Res,
  UseInterceptors,
  UploadedFile,
  BadRequestException,
  NotFoundException,
  Inject,
  StreamableFile,
} from '@nestjs/common';
import { FileInterceptor } from '@nestjs/platform-express';
import { Response } from 'express';
import { IAudioService, AUDIO_SERVICE_TOKEN } from './interfaces/audio-service.interface';
import { TextToSpeechDto } from './dto/text-to-speech.dto';
import { CreateDubbingDto, DownloadDubbingDto } from './dto/dubbing.dto';

@Controller('audio')
export class AudioController {
  constructor(
    @Inject(AUDIO_SERVICE_TOKEN)
    private readonly audioService: IAudioService,
  ) {}

  @Post('text-to-speech')
  async generateSpeech(
    @Body() dto: TextToSpeechDto,
    @Res() res: Response,
  ): Promise<void> {
    const audioBuffer = await this.audioService.convertTextToSpeech({
      text: dto.text,
      voiceId: dto.voiceId,
      modelId: dto.modelId,
      stability: dto.stability,
      similarityBoost: dto.similarityBoost,
    });
    res.set({
      'Content-Type': 'audio/mpeg',
      'Content-Disposition': 'attachment; filename="speech.mp3"',
      'Content-Length': audioBuffer.length,
    });
    res.send(audioBuffer);
  }

  @Post('text-to-speech/stream')
  async streamSpeech(@Body() dto: TextToSpeechDto): Promise<StreamableFile> {
    const audioStream = await this.audioService.streamTextToSpeech({
      text: dto.text,
      voiceId: dto.voiceId,
      modelId: dto.modelId,
      stability: dto.stability,
      similarityBoost: dto.similarityBoost,
    });
    return new StreamableFile(audioStream, {
      type: 'audio/mpeg',
      disposition: 'attachment; filename="speech.mp3"',
    });
  }

  @Get('voices')
  async getVoices() {
    return this.audioService.getAvailableVoices();
  }

  @Post('dubbing')
  @UseInterceptors(FileInterceptor('file'))
  async createDubbing(
    @UploadedFile() file: Express.Multer.File,
    @Body() dto: CreateDubbingDto,
  ) {
    if (!file) {
      throw new BadRequestException('File is required');
    }
    const dubbingId = await this.audioService.createDubbing({
      sourceFile: file.buffer,
      targetLanguage: dto.targetLanguage,
      sourceLanguage: dto.sourceLanguage,
      numSpeakers: dto.numSpeakers,
      watermark: dto.watermark,
    });
    return {
      dubbingId,
      message: 'Dubbing project created successfully',
    };
  }

  @Get('dubbing/:dubbingId/status')
  async getDubbingStatus(@Param('dubbingId') dubbingId: string) {
    const status = await this.audioService.getDubbingStatus(dubbingId);
    if (!status) {
      throw new NotFoundException('Dubbing project not found');
    }
    return status;
  }

  @Get('dubbing/:dubbingId/download/:languageCode')
  async downloadDubbing(
    @Param() params: DownloadDubbingDto,
    @Res() res: Response,
  ): Promise<void> {
    const audioBuffer = await this.audioService.downloadDubbedAudio(
      params.dubbingId,
      params.languageCode,
    );
    res.set({
      'Content-Type': 'audio/mpeg',
      'Content-Disposition': `attachment; filename="dubbed-${params.languageCode}.mp3"`,
      'Content-Length': audioBuffer.length,
    });
    res.send(audioBuffer);
  }
}

Wiring Everything Together

Now we need to update our audio.module.ts to provide all the necessary dependencies:

import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { AudioController } from './audio.controller';
import { ElevenLabsAudioService } from './elevenlabs-audio.service';
import { AUDIO_SERVICE_TOKEN } from './interfaces/audio-service.interface';

@Module({
  imports: [ConfigModule],
  controllers: [AudioController],
  providers: [
    {
      provide: AUDIO_SERVICE_TOKEN,
      useClass: ElevenLabsAudioService,
    },
  ],
  exports: [AUDIO_SERVICE_TOKEN],
})
export class AudioModule {}

Finally, let's make sure to import the AudioModule into our app.module.ts:

import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { AudioModule } from './audio/audio.module';
import { AppController } from './app.controller';
import { AppService } from './app.service';

@Module({
  imports: [
    ConfigModule.forRoot({
      isGlobal: true,
    }),
    AudioModule,
  ],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}

Adding File Upload Support

To support file uploads for dubbing, we'll use Multer. The @nestjs/platform-express package (included in a default NestJS install) already ships with it, so we only need to add the type definitions:

$ npm i -D @types/multer

Let's update our main.ts to enable CORS and set up global validation:

import { NestFactory } from '@nestjs/core';
import { ValidationPipe } from '@nestjs/common';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.useGlobalPipes( // <--
    new ValidationPipe({
      whitelist: true,
      forbidNonWhitelisted: true,
      transform: true,
    }),
  );
  app.enableCors(); // <--
  await app.listen(3000);
  console.log('Application is running on: http://localhost:3000');
}
bootstrap();

Testing the Implementation

Now let's test our implementation by firing up our NestJS application!

$ npm run start:dev

1. Getting Available Voices

Using cURL, let's run through our endpoints and see how everything works so far!

curl http://localhost:3000/audio/voices

This will return a list of available voices that you can use for text-to-speech.

2. Converting Text to Speech

curl -X POST http://localhost:3000/audio/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "Hello! This is a test of the ElevenLabs integration with NestJS.",
"voiceId": "21m00Tcm4TlvDq8ikWAM",
"modelId": "eleven_multilingual_v2"
}' \
--output speech.mp3

3. Streaming Text to Speech

curl -X POST http://localhost:3000/audio/text-to-speech/stream \
-H "Content-Type: application/json" \
-d '{
"text": "This is a streaming response for low-latency applications.",
"voiceId": "21m00Tcm4TlvDq8ikWAM",
"modelId": "eleven_turbo_v2_5"
}' \
--output streamed-speech.mp3

4. Creating a Dubbing Project

curl -X POST http://localhost:3000/audio/dubbing \
-F "file=@your-audio-file.mp3" \
-F "targetLanguage=es" \
-F "sourceLanguage=en"

This will return a dubbing ID that you can use to check status and download the result.

5. Checking Dubbing Status

curl http://localhost:3000/audio/dubbing/{dubbingId}/status

6. Downloading Dubbed Audio

Once the status is "dubbed", you can download the result:

curl http://localhost:3000/audio/dubbing/{dubbingId}/download/es \
--output dubbed-spanish.mp3
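Since dubbing runs asynchronously on ElevenLabs' side, client code typically polls the status endpoint until it reports "dubbed" before attempting the download. Here's a framework-agnostic sketch of that loop - pollUntilDubbed and its defaults are our own names, and getStatus would wrap a fetch of the /audio/dubbing/{dubbingId}/status endpoint:

```typescript
interface DubStatus {
  status: string; // e.g. 'dubbing' | 'dubbed' | 'failed'
  error?: string;
}

// Polls a status-fetching function until the project is dubbed,
// failing fast on errors or after maxAttempts tries.
async function pollUntilDubbed(
  getStatus: () => Promise<DubStatus>,
  { intervalMs = 5000, maxAttempts = 60 } = {},
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { status, error } = await getStatus();
    if (status === 'dubbed') return;
    if (status === 'failed') throw new Error(error ?? 'Dubbing failed');
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for dubbing to complete');
}
```

A production version might add jitter to the interval or switch to webhooks if ElevenLabs notifications are configured, but a bounded poll like this is the simplest starting point.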

Advanced Features and Best Practices

Implementing Caching

For production applications, you'll want to cache generated audio to reduce API calls and costs:

import { Injectable, Logger, Inject } from '@nestjs/common';
import { CACHE_MANAGER } from '@nestjs/cache-manager';
import { Cache } from 'cache-manager';
import * as crypto from 'crypto';
import {
  IAudioService,
  TextToSpeechOptions,
  AUDIO_SERVICE_TOKEN,
} from './interfaces/audio-service.interface';

@Injectable()
export class CachedAudioService implements IAudioService {
  private readonly logger = new Logger(CachedAudioService.name);

  constructor(
    @Inject(AUDIO_SERVICE_TOKEN)
    private readonly audioService: IAudioService,
    @Inject(CACHE_MANAGER)
    private cacheManager: Cache,
  ) {}

  async convertTextToSpeech(options: TextToSpeechOptions): Promise<Buffer> {
    // Create a cache key based on the options
    const cacheKey = this.generateCacheKey(options);
    // Check cache first
    const cached = await this.cacheManager.get<Buffer>(cacheKey);
    if (cached) {
      this.logger.log('Returning cached audio');
      return cached;
    }
    // Generate new audio
    const audio = await this.audioService.convertTextToSpeech(options);
    // Cache for 24 hours
    await this.cacheManager.set(cacheKey, audio, 86400);
    return audio;
  }

  private generateCacheKey(options: TextToSpeechOptions): string {
    const data = JSON.stringify(options);
    return crypto.createHash('md5').update(data).digest('hex');
  }

  // Implement other methods...
}
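One subtlety worth noting in generateCacheKey: JSON.stringify is sensitive to property order, so two logically identical option objects can hash to different keys and defeat the cache. Sorting the keys before serializing makes the key canonical. A sketch (canonicalCacheKey is our own name, and it canonicalizes top-level keys only):

```typescript
import * as crypto from 'crypto';

// Builds a deterministic cache key: sort the option keys before
// serializing so logically identical requests hash the same.
// Note: only top-level keys are sorted here.
function canonicalCacheKey(options: Record<string, unknown>): string {
  const sortedEntries = Object.keys(options)
    .sort()
    .map((key) => [key, options[key]]);
  return crypto
    .createHash('sha256') // md5 works too; sha256 just makes collisions likelier to never matter
    .update(JSON.stringify(sortedEntries))
    .digest('hex');
}
```

For example, `{ text: 'hi', voiceId: 'v1' }` and `{ voiceId: 'v1', text: 'hi' }` now produce the same key.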

Implementing Error Handling

In our applications we always want to implement proper error handling, with custom exceptions for the situations where things go wrong:

import { HttpException, HttpStatus } from '@nestjs/common';

export class AudioGenerationException extends HttpException {
  constructor(message: string) {
    super(
      {
        statusCode: HttpStatus.INTERNAL_SERVER_ERROR,
        message: 'Audio generation failed',
        error: message,
      },
      HttpStatus.INTERNAL_SERVER_ERROR,
    );
  }
}

export class VoiceNotFoundException extends HttpException {
  constructor(voiceId: string) {
    super(
      {
        statusCode: HttpStatus.NOT_FOUND,
        message: `Voice with ID ${voiceId} not found`,
      },
      HttpStatus.NOT_FOUND,
    );
  }
}
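Custom exceptions cover how failures are reported; for transient failures (rate limits, network hiccups) it's also common to retry the underlying SDK call with exponential backoff before giving up. A generic, framework-agnostic sketch - withRetry and its defaults are our own, and a real version would inspect the error type before retrying:

```typescript
// Retries an async operation with exponential backoff.
// Intended for transient errors; non-retryable errors (e.g. 400s)
// should be rethrown immediately in a production implementation.
async function withRetry<T>(
  operation: () => Promise<T>,
  { attempts = 3, baseDelayMs = 500 } = {},
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (i < attempts - 1) {
        const delay = baseDelayMs * 2 ** i; // 500ms, 1s, 2s, ...
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Inside the service this would wrap the ElevenLabs call, e.g. `withRetry(() => this.client.voices.getAll())`, before translating any final failure into one of the exceptions above.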

Rate Limiting

Let's also implement rate limiting to protect our API from abuse! First install the package with $ npm i @nestjs/throttler, then register it:

import { Module } from '@nestjs/common';
import { ThrottlerModule } from '@nestjs/throttler';

@Module({
  imports: [
    ThrottlerModule.forRoot([{
      ttl: 60000,
      limit: 10,
    }]),
  ],
})
export class AppModule {}

Let's also add the throttler guard to our AudioController:

import { UseGuards } from '@nestjs/common';
import { ThrottlerGuard } from '@nestjs/throttler';

@Controller('audio')
@UseGuards(ThrottlerGuard)
export class AudioController {
  // ... controller methods
}

Use Cases

The integration we've built so far opens up numerous possibilities for our Nest applications:

  1. Content Creation Platforms: Generate voiceovers for video content, podcasts, or educational materials
  2. Accessibility Features: Convert written content to audio for visually impaired users
  3. Language Learning Apps: Create pronunciation guides and multilingual content
  4. Customer Service: Generate automated voice responses or create multilingual support content
  5. Audiobook Production: Convert books and articles into audio format
  6. Marketing: Create multilingual ad campaigns with consistent voice branding across languages

Bringing it all together

In no time, we've managed to build a production-ready integration between ElevenLabs and NestJS. By abstracting the audio service behind an interface, we've created a maintainable and testable solution that can easily adapt to changing requirements or alternative providers.

The implementation we've created provides:

  • Text-to-speech conversion with customizable voice settings
  • Real-time audio streaming for low-latency applications
  • Voice discovery and management
  • Automatic dubbing and translation across multiple languages
  • A clean separation of concerns following SOLID principles
  • Easy testability and maintainability

We've also set everything up in a way that makes it straightforward to extend the functionality further, so have fun seeing what else you can do! You could add features like voice cloning, sound effects generation, or integration with other AI services. The sky's the limit! 🐈

