TTS API Tutorial
A developer-focused guide to using the asynchronous Text-to-Speech API, with complete code examples based on the OpenAPI specification.
This guide walks through the WesenAI Text-to-Speech (TTS) API. It covers the asynchronous job-based workflow end to end, from submitting a synthesis request to retrieving the final audio file.
Core Concepts: The Asynchronous Workflow
High-quality speech synthesis is not instantaneous. To prevent request timeouts and provide a non-blocking experience, our TTS API operates on an asynchronous model. This is a common pattern for resource-intensive tasks: the submit call returns immediately, and the client decides when and how often to check for the result.
The process involves these distinct steps:
- Submit a Job: You send a POST request with your text and configuration. The server immediately accepts the job and returns a unique jobId.
- Poll for Status: You use the jobId to make GET requests to a status endpoint. You poll this endpoint periodically until the job's state becomes completed or failed.
- Retrieve the Result: Once the job is completed, you use the jobId to download the audio from a separate endpoint.
API Reference
- Base URL: https://tts.api.wesen.ai
- Authentication: Authorization: Bearer YOUR_API_KEY or X-API-Key: YOUR_API_KEY
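The two header schemes listed above can be wrapped in a small helper so the rest of a client does not care which one is in use. This is a minimal sketch: the function name build_auth_headers and the scheme values "bearer" and "x-api-key" are hypothetical, and only the header names themselves come from the reference above.

```python
def build_auth_headers(api_key: str, scheme: str = "bearer") -> dict:
    """Return request headers for one of the two documented auth schemes.

    Hypothetical helper: the function name and the `scheme` values are
    illustrative; only the header names come from the API reference.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown auth scheme: {scheme!r}")


headers_bearer = build_auth_headers("YOUR_API_KEY")
headers_key = build_auth_headers("YOUR_API_KEY", scheme="x-api-key")
```

Either dictionary can be passed as the headers argument of a requests call.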
Step 1: Discovering Available Voices
Before submitting a job, you need to know which voices are available. You can retrieve a list of all supported voice configurations from the /v1/meta/voices endpoint.
GET /v1/meta/voices
import requests

WESEN_API_KEY = "YOUR_API_KEY"
API_URL = "https://tts.api.wesen.ai/v1/meta/voices"

headers = {
    "Authorization": f"Bearer {WESEN_API_KEY}"
}

response = requests.get(API_URL, headers=headers)

if response.status_code == 200:
    voices = response.json()
    print("Available Voices:")
    for voice in voices:
        print(f"- ID: {voice.get('id')}, Name: {voice.get('name')}, Language: {voice.get('language')}")
    # Example Output:
    # - ID: dawit, Name: Jack (US English), Language: en-US
else:
    print(f"Error: {response.status_code} - {response.text}")
From this list, choose a voice id to use as the configId in your job submission.
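When the list is long, it may help to narrow it by language before picking an id. The sketch below assumes each entry carries the id and language fields printed in the example above; the function name voices_for_language and the sample entries are made up for illustration.

```python
def voices_for_language(voices: list, language: str) -> list:
    """Return the ids of all voices whose `language` matches, ignoring case."""
    wanted = language.lower()
    return [
        v["id"]
        for v in voices
        if v.get("id") and str(v.get("language", "")).lower() == wanted
    ]


# Sample data shaped like the fields printed above (the values are invented).
sample_voices = [
    {"id": "voice-a", "name": "Voice A", "language": "en-US"},
    {"id": "voice-b", "name": "Voice B", "language": "am-ET"},
]
config_id_candidates = voices_for_language(sample_voices, "am-et")
```

Any id in config_id_candidates could then be used as the configId.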
Step 2: Submitting a Synthesis Job
To start the synthesis process, you send a POST request to /v1/job. The body must contain the configId and either text or ssml.
POST /v1/job
Request Body (TtsRequestDto):
- configId (string, required): The ID of the voice configuration (e.g., am-female-1).
- text (string, optional): The plain text to synthesize.
- ssml (string, optional): SSML-formatted text for more control. You must provide either text or ssml.
- format (string, optional): mp3 (default) or wav.
A successful submission returns a 201 Created status and a JobSubmissionResponse object containing the crucial jobId.
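Validating the request body on the client side can catch mistakes before a round trip to the server. The following is a sketch only: build_tts_payload is a hypothetical helper, and rejecting a request that supplies both text and ssml is an assumption of this example, since the field list above only says that one of them is required.

```python
ALLOWED_FORMATS = ("mp3", "wav")  # the two values listed for `format`


def build_tts_payload(config_id, text=None, ssml=None, audio_format="mp3"):
    """Build a request body for POST /v1/job, checking the documented rules.

    Assumption: supplying both `text` and `ssml` is treated as an error here;
    the server's actual behavior in that case is not described in this guide.
    """
    if not config_id:
        raise ValueError("configId is required")
    if (text is None) == (ssml is None):
        raise ValueError("provide exactly one of text or ssml")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")

    payload = {"configId": config_id, "format": audio_format}
    if text is not None:
        payload["text"] = text
    else:
        payload["ssml"] = ssml
    return payload


example_payload = build_tts_payload("almaz", text="Hello")
```

The returned dictionary can be passed directly as the json argument of requests.post.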
# Continuing from the previous example...
TTS_JOB_URL = "https://tts.api.wesen.ai/v1/job"

payload = {
    "configId": "almaz",  # An example voice ID
    "text": "ሰላም! ይህ የዌሰን ኤአይ የጽሑፍ-ወደ-ንግግር አገልግሎት ሙከራ ነው።",
    "format": "mp3"
}

submit_response = requests.post(TTS_JOB_URL, headers=headers, json=payload)

if submit_response.status_code == 201:
    job_info = submit_response.json()
    job_id = job_info.get("jobId")
    print(f"Job submitted successfully. Job ID: {job_id}")
else:
    print(f"Error submitting job: {submit_response.status_code} - {submit_response.text}")
    job_id = None  # No job to track; the later steps check this and skip
Step 3: Polling for Job Completion
With the jobId, you can now poll the /v1/job/{jobId}/status endpoint to check on the progress.
GET /v1/job/{jobId}/status
The response is a TtsJobStatusDto object which contains a state field. The possible states are: queued, processing, completed, failed. You should continue polling until the state is either completed or failed.
import time

# Continuing from the previous example...
job_state = None  # Stays None if no job was submitted or the status call fails

if job_id:
    status_url = f"https://tts.api.wesen.ai/v1/job/{job_id}/status"

    while True:
        status_response = requests.get(status_url, headers=headers)
        if status_response.status_code != 200:
            print(f"Error fetching status: {status_response.status_code} - {status_response.text}")
            break

        status_data = status_response.json()
        job_state = status_data.get("state")
        progress = status_data.get("progress", 0)
        print(f"Job state: {job_state} ({progress}%)")

        if job_state == "completed":
            print("Synthesis complete!")
            break
        elif job_state == "failed":
            error_details = status_data.get("error", "Unknown error")
            print(f"Job failed: {error_details}")
            break

        # Wait for a reasonable interval before polling again
        time.sleep(5)
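The loop above polls at a fixed interval and never gives up, which can hang a client if a job stalls. One possible refinement is a deadline combined with a growing delay between requests. The sketch below is not part of the API: poll_until_terminal, its parameters, and the default timings are all illustrative choices, and fetch_status stands for any zero-argument callable that returns the parsed status object.

```python
import time

TERMINAL_STATES = {"completed", "failed"}  # from the state list above


def poll_until_terminal(fetch_status, timeout_s=300.0, initial_delay_s=1.0,
                        max_delay_s=10.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal state or time runs out.

    The delay doubles after each attempt, capped at max_delay_s. `sleep` and
    `clock` are injectable so the function can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        # Give up if the next wait would run past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"job still {status.get('state')!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)


# Simulated status sequence, so the example runs without network access.
_fake_states = iter([{"state": "queued"}, {"state": "processing"}, {"state": "completed"}])
final_status = poll_until_terminal(lambda: next(_fake_states), sleep=lambda s: None)
```

In real use, fetch_status could be something like lambda: requests.get(status_url, headers=headers).json(), with error handling for non-200 responses added around it.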
Step 4: Retrieving the Audio File
Once the job state is completed, the audio file is ready for download from the /v1/job/{jobId}/audio endpoint.
GET /v1/job/{jobId}/audio
This endpoint streams the binary audio data.
# Continuing from the previous example...
if job_state == "completed":
    audio_url = f"https://tts.api.wesen.ai/v1/job/{job_id}/audio"

    audio_response = requests.get(audio_url, headers=headers, stream=True)

    if audio_response.status_code == 200:
        file_path = f"{job_id}.mp3"
        with open(file_path, "wb") as f:
            for chunk in audio_response.iter_content(chunk_size=8192):
                f.write(chunk)
        print(f"Audio file saved to: {file_path}")
    else:
        print(f"Error downloading audio: {audio_response.status_code} - {audio_response.text}")
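The download code above hard-codes an .mp3 extension, which would mislabel the file if the job had been submitted with format set to wav. A small helper can derive the file name from the requested format instead. This is an illustrative sketch; audio_filename is a hypothetical name, and it assumes the downloaded bytes match the format value sent at submission.

```python
def audio_filename(job_id: str, audio_format: str = "mp3") -> str:
    """Return a file name whose extension matches the requested format.

    Assumes the audio endpoint returns data in the format that was
    requested when the job was submitted.
    """
    extensions = {"mp3": ".mp3", "wav": ".wav"}
    if audio_format not in extensions:
        raise ValueError(f"unsupported format: {audio_format!r}")
    return f"{job_id}{extensions[audio_format]}"


wav_name = audio_filename("example-job-id", "wav")
```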
Advanced Topic: Streaming vs. Polling
For UIs requiring real-time feedback, polling can feel sluggish. The API provides an SSE (Server-Sent Events) endpoint at POST /v1/job/stream that submits the job and immediately begins streaming status updates over the same connection, removing the need to poll. This is ideal for interactive applications.
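To make the streaming option more concrete, here is a rough sketch of how a client might decode a Server-Sent Events stream. The wire format (data: lines, with a blank line ending each event) follows the general SSE convention. The assumption that each event carries a JSON object with a state field is ours and is not confirmed by this guide; iter_sse_events and the sample lines are likewise illustrative.

```python
import json


def iter_sse_events(lines):
    """Yield the decoded JSON payload of each SSE event.

    `lines` is any iterable of text lines. Assumption: every event's data
    is a JSON document; the real event schema is not described in this guide.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # One optional leading space after the colon is not part of the value.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.


# Invented sample stream, so the example runs without network access.
sample_stream = [
    'data: {"state": "queued"}',
    "",
    ": keep-alive",
    'data: {"state": "completed"}',
    "",
]
events = list(iter_sse_events(sample_stream))
```

With the requests library, the lines could come from iter_lines() on a response opened with stream=True, decoded to text before being passed in.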