TTS API Tutorial

A developer-focused guide to using the asynchronous Text-to-Speech API, with complete code examples based on the OpenAPI specification.

This guide walks through the WesenAI Text-to-Speech (TTS) API. It covers the asynchronous, job-based workflow end to end, from submitting a synthesis request to retrieving the final audio file.

Core Concepts: The Asynchronous Workflow

High-quality speech synthesis is not instantaneous. To prevent request timeouts and avoid blocking the client, our TTS API operates on an asynchronous model. This is a common pattern for resource-intensive tasks: the client submits the work, stays free to do other things while the job runs, and collects the result when it is ready.

The process involves these distinct steps:

  1. Submit a Job: You send a POST request with your text and configuration. The server immediately accepts the job and returns a unique jobId.
  2. Poll for Status: You use the jobId to make GET requests to a status endpoint. You poll this endpoint periodically until the job's state becomes completed or failed.
  3. Retrieve the Result: Once the job is completed, you use the jobId to download the audio from a separate endpoint.

API Reference

  • Base URL: https://tts.api.wesen.ai
  • Authentication: Authorization: Bearer YOUR_API_KEY or X-API-Key: YOUR_API_KEY
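The two authentication options above differ only in which header carries the key. A minimal sketch of a helper that builds either form follows; the function name build_auth_headers and its use_x_api_key flag are hypothetical conveniences for this tutorial, not part of the API.

```python
from typing import Dict


def build_auth_headers(api_key: str, use_x_api_key: bool = False) -> Dict[str, str]:
    """Return request headers for one of the two documented auth schemes.

    Hypothetical helper: only the header names and the "Bearer" prefix
    come from the API reference above.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if use_x_api_key:
        # Alternative scheme: the raw key in a dedicated header
        return {"X-API-Key": api_key}
    # Default scheme: bearer token in the Authorization header
    return {"Authorization": f"Bearer {api_key}"}
```

Either dictionary can be passed as the headers argument in the requests calls used throughout this guide.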

Step 1: Discovering Available Voices

Before submitting a job, you need to know which voices are available. You can retrieve a list of all supported voice configurations from the /v1/meta/voices endpoint.

GET /v1/meta/voices

import requests

WESEN_API_KEY = "YOUR_API_KEY"
API_URL = "https://tts.api.wesen.ai/v1/meta/voices"

headers = {
    "Authorization": f"Bearer {WESEN_API_KEY}"
}

response = requests.get(API_URL, headers=headers)

if response.status_code == 200:
    voices = response.json()
    print("Available Voices:")
    for voice in voices:
        print(f"- ID: {voice.get('id')}, Name: {voice.get('name')}, Language: {voice.get('language')}")
        # Example Output:
        # - ID: dawit, Name: Jack (US English), Language: en-US
else:
    print(f"Error: {response.status_code} - {response.text}")

From this list, choose a voice id to use as the configId in your job submission.

Step 2: Submitting a Synthesis Job

To start the synthesis process, you send a POST request to /v1/job. The body must contain the configId and either text or ssml.

POST /v1/job

Request Body (TtsRequestDto):

  • configId (string, required): The ID of the voice configuration (e.g., am-female-1).
  • text (string, optional): The plain text to synthesize.
  • ssml (string, optional): SSML-formatted text for more control. You must provide either text or ssml.
  • format (string, optional): mp3 (default) or wav.

A successful submission returns a 201 Created status and a JobSubmissionResponse object containing the jobId, which every later request for this job requires.
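Because the body must carry either text or ssml, it can help to validate the payload locally before sending it. The sketch below is one way to do that; build_tts_payload is a hypothetical helper, and rejecting a request that sets both text and ssml is an assumption on our part, since the field list above only says that one of them must be present.

```python
from typing import Dict, Optional

ALLOWED_FORMATS = ("mp3", "wav")  # Formats listed in the request body description


def build_tts_payload(
    config_id: str,
    text: Optional[str] = None,
    ssml: Optional[str] = None,
    audio_format: str = "mp3",
) -> Dict[str, str]:
    """Build a request body for POST /v1/job, failing fast on obvious mistakes."""
    if not config_id:
        raise ValueError("configId is required")
    if text is None and ssml is None:
        raise ValueError("provide either text or ssml")
    if text is not None and ssml is not None:
        # Assumption: the server expects only one input field at a time
        raise ValueError("provide text or ssml, not both")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")

    payload = {"configId": config_id, "format": audio_format}
    if text is not None:
        payload["text"] = text
    else:
        payload["ssml"] = ssml
    return payload
```

Catching these cases on the client saves a round trip and gives a clearer error than a generic 4xx response.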

# Continuing from the previous example...

TTS_JOB_URL = "https://tts.api.wesen.ai/v1/job"

payload = {
    "configId": "almaz", # An example voice ID
    "text": "ሰላም! ይህ የዌሰን ኤአይ የጽሑፍ-ወደ-ንግግር አገልግሎት ሙከራ ነው።",
    "format": "mp3"
}

submit_response = requests.post(TTS_JOB_URL, headers=headers, json=payload)

if submit_response.status_code == 201:
    job_info = submit_response.json()
    job_id = job_info.get("jobId")
    print(f"Job submitted successfully. Job ID: {job_id}")
else:
    print(f"Error submitting job: {submit_response.status_code} - {submit_response.text}")
    job_id = None  # No job was created; the polling and download steps check this and skip

Step 3: Polling for Job Completion

With the jobId, you can now poll the /v1/job/{jobId}/status endpoint to check on the progress.

GET /v1/job/{jobId}/status

The response is a TtsJobStatusDto object which contains a state field. The possible states are: queued, processing, completed, failed. You should continue polling until the state is either completed or failed.

import time

# Continuing from the previous example...

# Stays None if no job was submitted, so the download step is skipped safely
job_state = None

if job_id:
    status_url = f"https://tts.api.wesen.ai/v1/job/{job_id}/status"
    
    while True:
        status_response = requests.get(status_url, headers=headers)
        
        if status_response.status_code != 200:
            print(f"Error fetching status: {status_response.status_code} - {status_response.text}")
            break

        status_data = status_response.json()
        job_state = status_data.get("state")
        progress = status_data.get("progress", 0)

        print(f"Job state: {job_state} ({progress}%)")

        if job_state == "completed":
            print("Synthesis complete!")
            break
        elif job_state == "failed":
            error_details = status_data.get("error", "Unknown error")
            print(f"Job failed: {error_details}")
            break
        
        # Wait for a reasonable interval before polling again
        time.sleep(5)
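The loop above polls until the job reaches a terminal state, however long that takes. In production code you will usually want an upper bound. The following is a minimal sketch of the same polling logic with a deadline; poll_until_done, its parameters, and the 300-second default are hypothetical, and the status fetch is passed in as a callable so the logic can be exercised without network access.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = {"completed", "failed"}  # States named in the status description


def poll_until_done(
    fetch_status: Callable[[], Dict],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Call fetch_status() until the job is completed or failed, or time runs out.

    fetch_status should return the parsed JSON of GET /v1/job/{jobId}/status.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        # Give up if the next poll would land after the deadline
        if clock() + interval > deadline:
            raise TimeoutError(f"job still '{status.get('state')}' after {timeout}s")
        sleep(interval)
```

With the variables from the example above, fetch_status could be lambda: requests.get(status_url, headers=headers).json(), with whatever HTTP error handling your application needs added around it.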

Step 4: Retrieving the Audio File

Once the job state is completed, the audio file is ready for download from the /v1/job/{jobId}/audio endpoint.

GET /v1/job/{jobId}/audio

This endpoint streams the binary audio data.

# Continuing from the previous example...

if job_state == "completed":
    audio_url = f"https://tts.api.wesen.ai/v1/job/{job_id}/audio"
    audio_response = requests.get(audio_url, headers=headers, stream=True)

    if audio_response.status_code == 200:
        # Use the format requested in Step 2 so a wav job is not saved as .mp3
        file_path = f"{job_id}.{payload.get('format', 'mp3')}"
        with open(file_path, "wb") as f:
            for chunk in audio_response.iter_content(chunk_size=8192):
                f.write(chunk)
        print(f"Audio file saved to: {file_path}")
    else:
        print(f"Error downloading audio: {audio_response.status_code} - {audio_response.text}")

Advanced Topic: Streaming vs. Polling

For UIs requiring real-time feedback, polling can feel sluggish. The API provides an SSE (Server-Sent Events) endpoint at POST /v1/job/stream that submits the job and immediately begins streaming status updates over the same connection, removing the need to poll. This is ideal for interactive applications.
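To consume such a stream, the client reads the response line by line and groups the lines into events. The sketch below handles only the generic part of Server-Sent Events: data lines are collected, and a blank line ends the event. The payload shown in the test data (JSON with a state field) is an assumption on our part, as the event format for /v1/job/stream is not described in this guide.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE line stream.

    Only the "data" field is handled; other fields (event, id, retry)
    and comment lines are ignored in this sketch.
    """
    buffer: List[str] = []
    for line in lines:
        if line == "":
            # A blank line dispatches the event collected so far
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # Comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event not terminated by a blank line is discarded
```

With requests, the lines could come from a response opened with stream=True and read through iter_lines(decode_unicode=True). If the events do carry JSON, each yielded string can be passed to json.loads and the loop can stop on a terminal state.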