TTS API Tutorial
A developer-focused guide to using the asynchronous Text-to-Speech API, with complete code examples based on the OpenAPI specification.
This guide walks through the asynchronous, job-based workflow of the WesenAI Text-to-Speech (TTS) API, from submitting a synthesis request to retrieving the final audio file.
Core Concepts: The Asynchronous Workflow
High-quality speech synthesis is not instantaneous. To prevent request timeouts and provide a non-blocking experience, our TTS API operates on an asynchronous model. This is a common pattern for resource-intensive tasks: your client is free to do other work while the job runs and can check back on its own schedule.
The process involves these distinct steps:
- Submit a Job: You send a POST request with your text and configuration. The server immediately accepts the job and returns a unique jobId.
- Poll for Status: You use the jobId to make GET requests to a status endpoint. You poll this endpoint periodically until the job's state becomes completed or failed.
- Retrieve the Result: Once the job is completed, you use the jobId to download the audio from a separate endpoint.
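The three steps above can be sketched as a single function. This is a minimal illustration, not an official client: the `submit`, `get_status`, and `download` callables are hypothetical stand-ins for the HTTP calls covered in the steps of this guide, passed in so the control flow can be read and exercised without network access. The `state` and `error` keys follow the status response used in this guide's polling example.

```python
import time
from typing import Callable


def run_tts_job(
    submit: Callable[[], str],
    get_status: Callable[[str], dict],
    download: Callable[[str], bytes],
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Submit a job, poll until it reaches a terminal state, then fetch the audio."""
    job_id = submit()  # Step 1: the server answers with a jobId
    while True:
        status = get_status(job_id)  # Step 2: ask for the current state
        state = status.get("state")
        if state == "completed":
            return download(job_id)  # Step 3: fetch the audio bytes
        if state == "failed":
            detail = status.get("error", "unknown error")
            raise RuntimeError(f"Job {job_id} failed: {detail}")
        sleep(poll_interval)  # Still queued or processing: wait, then poll again
```

Injecting `sleep` keeps the function testable: a test can pass a no-op instead of actually waiting between polls.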
API Reference
- Base URL: https://tts.api.wesen.ai
- Authentication: Authorization: Bearer YOUR_API_KEY or X-API-Key: YOUR_API_KEY
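Either header can be built with a small helper. The sketch below is illustrative only: the function name and the `scheme` labels ("bearer", "x-api-key") are invented for this example, while the header names themselves come from the reference above.

```python
def build_auth_headers(api_key: str, scheme: str = "bearer") -> dict:
    """Return request headers for one of the two documented auth styles."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    # 'scheme' values are labels local to this sketch, not API parameters
    raise ValueError(f"Unknown auth scheme: {scheme!r}")
```

The returned dictionary can be passed as the `headers` argument of `requests.get` or `requests.post`.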
Step 1: Discovering Available Voices
Before submitting a job, you need to know which voices are available. You can retrieve a list of all supported voice configurations from the /v1/meta/voices endpoint.
GET /v1/meta/voices

    import requests

    WESEN_API_KEY = "YOUR_API_KEY"
    API_URL = "https://tts.api.wesen.ai/v1/meta/voices"

    headers = {
        "Authorization": f"Bearer {WESEN_API_KEY}"
    }

    response = requests.get(API_URL, headers=headers)

    if response.status_code == 200:
        voices = response.json()
        print("Available Voices:")
        for voice in voices:
            print(f"- ID: {voice.get('id')}, Name: {voice.get('name')}, Language: {voice.get('language')}")
        # Example Output:
        # - ID: dawit, Name: Jack (US English), Language: en-US
    else:
        print(f"Error: {response.status_code} - {response.text}")

From this list, choose a voice id to use as the configId in your job submission.
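When the list is long, filtering by language narrows the choice. The sketch below assumes the response is a JSON list of objects with `id`, `name`, and `language` keys, as the example above implies. The helper name is hypothetical, and the second sample entry (including its `am-ET` language tag) is invented for illustration.

```python
def voices_for_language(voices: list, language: str) -> list:
    """Return the voices whose 'language' matches, ignoring case."""
    wanted = language.lower()
    return [v for v in voices if str(v.get("language", "")).lower() == wanted]


# Sample data: the first entry mirrors the example output above,
# the second is made up for this sketch.
sample_voices = [
    {"id": "dawit", "name": "Jack (US English)", "language": "en-US"},
    {"id": "almaz", "name": "Almaz", "language": "am-ET"},
]

matches = voices_for_language(sample_voices, "am-et")
print(matches[0]["id"])  # prints: almaz
```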
Step 2: Submitting a Synthesis Job
To start the synthesis process, you send a POST request to /v1/job. The body must contain the configId and either text or ssml.
POST /v1/job
Request Body (TtsRequestDto):
- configId (string, required): The ID of the voice configuration (e.g., am-female-1).
- text (string, optional): The plain text to synthesize.
- ssml (string, optional): SSML-formatted text for more control. You must provide either text or ssml.
- format (string, optional): mp3 (default) or wav.
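Validating the body on the client catches mistakes before a request is sent. The helper below is a sketch: its name and signature are hypothetical, and it treats supplying both text and ssml as an error, which is an assumption, since the field list above only says that one of them must be present.

```python
from typing import Optional

ALLOWED_FORMATS = {"mp3", "wav"}


def build_tts_payload(
    config_id: str,
    text: Optional[str] = None,
    ssml: Optional[str] = None,
    audio_format: str = "mp3",
) -> dict:
    """Build a request body with configId, format, and exactly one of text/ssml."""
    if not config_id:
        raise ValueError("config_id is required")
    # Assumption: exactly one of text/ssml; the API may or may not accept both
    if (text is None) == (ssml is None):
        raise ValueError("Provide exactly one of text or ssml")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")

    payload = {"configId": config_id, "format": audio_format}
    if text is not None:
        payload["text"] = text
    else:
        payload["ssml"] = ssml
    return payload
```

The result can be passed directly as the `json` argument of `requests.post`.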
A successful submission returns a 201 Created status and a JobSubmissionResponse object containing the crucial jobId.

    # Continuing from the previous example...
    TTS_JOB_URL = "https://tts.api.wesen.ai/v1/job"

    payload = {
        "configId": "almaz",  # An example voice ID
        "text": "ሰላም! ይህ የዌሰን ኤአይ የጽሑፍ-ወደ-ንግግር አገልግሎት ሙከራ ነው።",
        "format": "mp3"
    }

    submit_response = requests.post(TTS_JOB_URL, headers=headers, json=payload)

    if submit_response.status_code == 201:
        job_info = submit_response.json()
        job_id = job_info.get("jobId")
        print(f"Job submitted successfully. Job ID: {job_id}")
    else:
        print(f"Error submitting job: {submit_response.status_code} - {submit_response.text}")
        job_id = None  # Later steps check job_id and are skipped when it is None

Step 3: Polling for Job Completion
With the jobId, you can now poll the /v1/job/{jobId}/status endpoint to check on the progress.
GET /v1/job/{jobId}/status
The response is a TtsJobStatusDto object which contains a state field. The possible states are: queued, processing, completed, failed. You should continue polling until the state is either completed or failed.

    import time

    # Continuing from the previous example...
    job_state = None  # Defined up front so later steps can check it even if polling never ran

    if job_id:
        status_url = f"https://tts.api.wesen.ai/v1/job/{job_id}/status"

        while True:
            status_response = requests.get(status_url, headers=headers)

            if status_response.status_code != 200:
                print(f"Error fetching status: {status_response.status_code} - {status_response.text}")
                break

            status_data = status_response.json()
            job_state = status_data.get("state")
            progress = status_data.get("progress", 0)
            print(f"Job state: {job_state} ({progress}%)")

            if job_state == "completed":
                print("Synthesis complete!")
                break
            elif job_state == "failed":
                error_details = status_data.get("error", "Unknown error")
                print(f"Job failed: {error_details}")
                break

            # Wait for a reasonable interval before polling again
            time.sleep(5)

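A fixed five-second wait is simple, but short jobs finish sooner than that and long jobs generate many requests. One common alternative is to grow the wait between polls up to a ceiling. The generator below is a sketch of that idea; the starting delay, growth factor, and cap are illustrative values, not limits or recommendations from the API.

```python
import itertools
from typing import Iterator


def backoff_intervals(
    initial: float = 1.0, factor: float = 2.0, cap: float = 30.0
) -> Iterator[float]:
    """Yield an endless sequence of wait times that grows geometrically up to cap."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First six waits with the defaults
print(list(itertools.islice(backoff_intervals(), 6)))
# prints: [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

In the polling loop, `time.sleep(5)` could be replaced by `time.sleep(next(waits))`, with `waits = backoff_intervals()` created before the loop starts.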
Step 4: Retrieving the Audio File
Once the job state is completed, the audio file is ready for download from the /v1/job/{jobId}/audio endpoint.
GET /v1/job/{jobId}/audio
This endpoint streams the binary audio data.

    # Continuing from the previous example...
    if job_state == "completed":
        audio_url = f"https://tts.api.wesen.ai/v1/job/{job_id}/audio"

        audio_response = requests.get(audio_url, headers=headers, stream=True)

        if audio_response.status_code == 200:
            file_path = f"{job_id}.mp3"
            with open(file_path, "wb") as f:
                for chunk in audio_response.iter_content(chunk_size=8192):
                    f.write(chunk)
            print(f"Audio file saved to: {file_path}")
        else:
            print(f"Error downloading audio: {audio_response.status_code} - {audio_response.text}")

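If the connection drops partway through a streamed download, writing straight to the final path leaves a truncated audio file behind. A possible safeguard, sketched below with a hypothetical helper name, is to write to a temporary file in the same directory and rename it only after every chunk has arrived.

```python
import os
import tempfile
from typing import Iterable


def save_audio_chunks(chunks: Iterable[bytes], dest_path: str) -> int:
    """Write chunks to a temp file, then move it to dest_path. Returns bytes written."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Same directory as the destination, so the final rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                if chunk:  # Skip empty chunks
                    tmp.write(chunk)
                    written += len(chunk)
        os.replace(tmp_path, dest_path)  # Replaces any existing file at dest_path
    except BaseException:
        # Download or write failed: remove the partial file and re-raise
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written
```

With the `requests` response from the example above, the call would look like `save_audio_chunks(audio_response.iter_content(chunk_size=8192), file_path)`.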
Advanced Topic: Streaming vs. Polling
For UIs requiring real-time feedback, polling can feel sluggish. The API provides an SSE (Server-Sent Events) endpoint at POST /v1/job/stream that submits the job and immediately begins streaming status updates over the same connection, removing the need to poll. This is ideal for interactive applications.
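The shape of the events sent by this endpoint is not described here, so the sketch below covers only the generic part: turning the lines of a Server-Sent Events stream into one payload per event, following the standard SSE framing (`data:` fields, blank line as event terminator, lines starting with `:` as comments). The sample payloads, JSON objects with a `state` key, are an assumption modeled on the status response from Step 3.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE line stream."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # Comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # A single leading space after the colon is dropped
        if field == "data":
            data_lines.append(value)
    # An event with no terminating blank line is discarded


# Hypothetical stream contents, for illustration only
sample_stream = [
    'data: {"state": "processing"}',
    "",
    ": keep-alive",
    'data: {"state": "completed"}',
    "",
]
for payload in iter_sse_data(sample_stream):
    print(json.loads(payload)["state"])
# prints "processing", then "completed"
```

With `requests`, the lines could come from a response opened with `stream=True` and read through `response.iter_lines(decode_unicode=True)`.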