STT API Tutorial

A developer-focused guide to the WesenAI Speech-to-Text API, including file uploads, job-based transcription, and result retrieval based on the OpenAPI specification.

This tutorial walks through the primary workflow of the WesenAI Speech-to-Text (STT) API: uploading an audio file, polling the resulting transcription job, and retrieving the results.

Core Concepts: Asynchronous Transcription

Similar to our TTS service, audio transcription is an asynchronous process. The STT API is designed to handle audio files of varying lengths without blocking your application or causing request timeouts.

The primary workflow is as follows:

  1. Upload Audio File: You POST your audio file, together with the transcription parameters, to the /v1/job/upload endpoint. The server creates a transcription job and returns a jobId.
  2. Poll for Status: Using the jobId, you poll the /v1/job/{jobId}/status endpoint until the job is completed.
  3. Retrieve Results: Once complete, you retrieve the transcription from the /v1/job/{jobId}/result endpoint.

API Reference

  • Base URL: https://stt.api.wesen.ai
  • Authentication: Authorization: Bearer YOUR_API_KEY or X-API-Key: YOUR_API_KEY
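Either header can be attached to every request. A minimal sketch of both options (the key value is a placeholder):

```python
WESEN_API_KEY = "YOUR_API_KEY"  # placeholder; substitute your real key

# Option 1: Bearer token (used throughout this tutorial)
bearer_headers = {"Authorization": f"Bearer {WESEN_API_KEY}"}

# Option 2: dedicated API-key header
api_key_headers = {"X-API-Key": WESEN_API_KEY}
```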

Step 1: Uploading an Audio File

The process begins by uploading your audio file. The API accepts various audio formats, but WAV or MP3 is recommended.

POST /v1/job/upload

This endpoint accepts multipart/form-data. You must provide the audio file and the necessary parameters for transcription.

import requests

WESEN_API_KEY = "YOUR_API_KEY"
UPLOAD_URL = "https://stt.api.wesen.ai/v1/job/upload"
AUDIO_FILE_PATH = "path/to/your/audio.wav" # Replace with your file path

headers = {
    "Authorization": f"Bearer {WESEN_API_KEY}"
}

# Open the file in a context manager so the handle is closed after the upload.
with open(AUDIO_FILE_PATH, "rb") as audio_file:
    files = {
        "file": (AUDIO_FILE_PATH, audio_file, "audio/wav"),
        "provider": (None, "google"),
        "language": (None, "am-ET"),  # Amharic (Ethiopia)
        "includeWordBoundaries": (None, "true"),
    }
    upload_response = requests.post(UPLOAD_URL, headers=headers, files=files)

job_id = None
if upload_response.status_code == 201:
    job_info = upload_response.json()
    job_id = job_info.get("jobId")
    print(f"File uploaded and job created successfully. Job ID: {job_id}")
else:
    print(f"Error uploading file: {upload_response.status_code} - {upload_response.text}")

The response to a successful upload is a JobSubmissionResponse object, which contains the jobId needed for the next steps.

Step 2: Polling for Transcription Completion

With the jobId, you can now poll the /v1/job/{jobId}/status endpoint.

GET /v1/job/{jobId}/status

The logic here is identical to the TTS workflow. You poll this endpoint periodically until the job state becomes completed or failed.

import time

if job_id:
    status_url = f"https://stt.api.wesen.ai/v1/job/{job_id}/status"
    
    while True:
        status_response = requests.get(status_url, headers=headers)
        
        if status_response.status_code != 200:
            print(f"Error fetching status: {status_response.status_code} - {status_response.text}")
            break

        status_data = status_response.json()
        job_state = status_data.get("state")
        progress = status_data.get("progress", 0)

        print(f"Job state: {job_state} ({progress}%)")

        if job_state == "completed":
            print("Transcription complete!")
            break
        elif job_state == "failed":
            error_details = status_data.get("error", "Unknown error")
            print(f"Job failed: {error_details}")
            break
        
        time.sleep(5)
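The polling loop above can be wrapped in a small reusable helper. The sketch below adds an overall timeout and takes the status-fetching callable as a parameter so the loop can be exercised without network access; the field name `state` and the terminal values `completed`/`failed` follow the responses shown in this tutorial.

```python
import time

def wait_for_job(fetch_status, poll_interval=5, timeout=600):
    """Poll fetch_status() until the job reaches a terminal state or times out.

    fetch_status must return a status dict like {"state": ..., "progress": ...}.
    Returns the final status dict; raises TimeoutError if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("state") in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("Job did not reach a terminal state in time")
        time.sleep(poll_interval)
```

With the requests-based loop above, `fetch_status` would be something like `lambda: requests.get(status_url, headers=headers).json()`.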

Step 3: Retrieving the Transcription Result

Once the job state is completed, the transcription is ready. You can retrieve it from the /v1/job/{jobId}/result endpoint.

GET /v1/job/{jobId}/result

You can request the result as either json (default) or text. The JSON response includes the full transcript, confidence score, and word timings if requested.

if job_state == "completed":
    result_url = f"https://stt.api.wesen.ai/v1/job/{job_id}/result?format=json"
    result_response = requests.get(result_url, headers=headers)

    if result_response.status_code == 200:
        transcription_data = result_response.json()
        print("\n--- Transcription Result ---")
        print(f"Text: {transcription_data.get('text')}")
        print(f"Confidence: {transcription_data.get('confidence')}")
        
        word_timings = transcription_data.get('wordTimings')
        if word_timings:
            print("\n--- Word Timings ---")
            for word_info in word_timings:
                word = word_info.get('word')
                start = word_info.get('startTime')
                end = word_info.get('endTime')
                print(f"- Word: '{word}', Start: {start}s, End: {end}s")
    else:
        print(f"Error downloading result: {result_response.status_code} - {result_response.text}")
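If you only need the raw transcript, the `format=text` variant mentioned above skips JSON parsing entirely. A small helper for building the result URL (a sketch; the query parameter name `format` follows this tutorial's JSON example):

```python
BASE_URL = "https://stt.api.wesen.ai"

def result_url(job_id, as_text=False):
    """Build the result endpoint URL; format=text returns the plain transcript."""
    fmt = "text" if as_text else "json"
    return f"{BASE_URL}/v1/job/{job_id}/result?format={fmt}"
```

For example, `requests.get(result_url(job_id, as_text=True), headers=headers).text` would then yield the transcript as a plain string rather than a JSON object.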

Alternative Workflow: Transcribe from URL

If your audio file is already accessible via a public URL, you can skip the upload step and submit the job directly using the /v1/job/transcribe endpoint.

POST /v1/job/transcribe

Request Body (TranscriptionRequestDto):

  • audioUrl (string, required): A public URL to the audio file.
  • provider (string, required): google, azure, etc.
  • language (string, required): Language code (e.g., am-ET).

This method initiates the same asynchronous job, and you would use the same polling and result retrieval logic as shown above.
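Assuming the same Bearer authentication as the upload example, a URL-based submission might look like the sketch below. The request body fields follow the TranscriptionRequestDto above; the exact response shape should be checked against the OpenAPI specification, but it is expected to mirror the upload endpoint's JobSubmissionResponse.

```python
import requests

TRANSCRIBE_URL = "https://stt.api.wesen.ai/v1/job/transcribe"

def build_transcription_request(audio_url, provider="google", language="am-ET"):
    """Build the TranscriptionRequestDto body for /v1/job/transcribe."""
    return {"audioUrl": audio_url, "provider": provider, "language": language}

def submit_transcription_job(audio_url, api_key, **kwargs):
    """POST a URL-based transcription job and return the jobId."""
    response = requests.post(
        TRANSCRIBE_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_transcription_request(audio_url, **kwargs),
    )
    response.raise_for_status()
    return response.json().get("jobId")
```

The returned jobId feeds directly into the status-polling and result-retrieval code from Steps 2 and 3.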