STT API Tutorial
A developer-focused guide to the WesenAI Speech-to-Text API, including file uploads, job-based transcription, and result retrieval based on the OpenAPI specification.
This tutorial walks through the primary workflow of the WesenAI Speech-to-Text (STT) API: uploading a file, submitting a transcription job, and retrieving the results.
Core Concepts: Asynchronous Transcription
Similar to our TTS service, audio transcription is an asynchronous process. The STT API is designed to handle audio files of varying lengths without blocking your application or causing request timeouts.
The primary workflow is as follows:
- Upload Audio File: You send your audio file via a `POST` request to the `/v1/job/upload` endpoint.
- Submit Transcription Job: The server processes the upload and returns a `jobId`. You use this `jobId` to start the transcription.
- Poll for Status: Using the `jobId`, you poll the `/v1/job/{jobId}/status` endpoint until the job is `completed`.
- Retrieve Results: Once complete, you retrieve the transcription from the `/v1/job/{jobId}/result` endpoint.
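The endpoints in this workflow can be sketched as a few small URL helpers. This is only an illustration: the function names are hypothetical, and the base URL is the one listed in the API Reference section of this tutorial.

```python
from urllib.parse import quote

# Base URL as documented in the API Reference section.
BASE_URL = "https://stt.api.wesen.ai"


def upload_url() -> str:
    """URL for uploading an audio file (POST)."""
    return f"{BASE_URL}/v1/job/upload"


def status_url(job_id: str) -> str:
    """URL for polling the status of a job (GET)."""
    # quote() guards against unexpected characters in the job ID.
    return f"{BASE_URL}/v1/job/{quote(job_id, safe='')}/status"


def result_url(job_id: str) -> str:
    """URL for retrieving the finished transcription (GET)."""
    return f"{BASE_URL}/v1/job/{quote(job_id, safe='')}/result"
```

Keeping URL construction in one place avoids repeating string formatting at every step of the workflow.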
API Reference
- Base URL: `https://stt.api.wesen.ai`
- Authentication: `Authorization: Bearer YOUR_API_KEY` or `X-API-Key: YOUR_API_KEY`
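As a minimal sketch, the two authentication options above could be wrapped in a helper that returns the header dictionary for either scheme. The helper name `build_auth_headers` and its `use_bearer` flag are hypothetical; only the header names come from the reference above.

```python
def build_auth_headers(api_key: str, use_bearer: bool = True) -> dict:
    """Return request headers for one of the two documented auth schemes."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}


# Either dictionary can be passed as `headers=` to requests.get / requests.post.
bearer_headers = build_auth_headers("YOUR_API_KEY")
api_key_headers = build_auth_headers("YOUR_API_KEY", use_bearer=False)
```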
Step 1: Uploading an Audio File
The process begins by uploading your audio file. The API accepts various formats, but `wav` or `mp3` are recommended.
POST /v1/job/upload
This endpoint accepts `multipart/form-data`. You must provide the audio file and the necessary parameters for transcription.
    import requests

    WESEN_API_KEY = "YOUR_API_KEY"
    UPLOAD_URL = "https://stt.api.wesen.ai/v1/job/upload"
    AUDIO_FILE_PATH = "path/to/your/audio.wav"  # Replace with your file path

    headers = {
        "Authorization": f"Bearer {WESEN_API_KEY}"
    }

    job_id = None

    # Open the file in a context manager so the handle is closed after the upload.
    with open(AUDIO_FILE_PATH, "rb") as audio_file:
        files = {
            "file": (AUDIO_FILE_PATH, audio_file, "audio/wav"),
            # (None, value) tuples are sent as plain multipart form fields.
            "provider": (None, "google"),
            "language": (None, "am-ET"),  # Amharic (Ethiopia)
            "includeWordBoundaries": (None, "true"),
        }
        upload_response = requests.post(UPLOAD_URL, headers=headers, files=files)

    if upload_response.status_code == 201:
        job_info = upload_response.json()
        job_id = job_info.get("jobId")
        print(f"File uploaded and job created successfully. Job ID: {job_id}")
    else:
        print(f"Error uploading file: {upload_response.status_code} - {upload_response.text}")
The response to a successful upload is a `JobSubmissionResponse` object, which contains the `jobId` needed for the next steps.
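The upload example above hardcodes `audio/wav` as the content type. Since `mp3` is also recommended, one possible sketch is to pick the content type from the file extension when building the multipart fields. The helper name `build_upload_files`, the extension mapping, and the `"false"` value for `includeWordBoundaries` are assumptions made for illustration; whether the API inspects the part's content type is not stated in this tutorial.

```python
import os

# Assumed mapping for the two recommended formats.
AUDIO_CONTENT_TYPES = {".wav": "audio/wav", ".mp3": "audio/mpeg"}


def build_upload_files(path, file_obj, provider="google", language="am-ET",
                       include_word_boundaries=True):
    """Build the `files` argument for requests.post() to /v1/job/upload."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in AUDIO_CONTENT_TYPES:
        raise ValueError(f"Unsupported audio extension: {extension!r}")
    return {
        # Send only the base name, not the full local path, as the filename.
        "file": (os.path.basename(path), file_obj, AUDIO_CONTENT_TYPES[extension]),
        # (None, value) tuples are sent as plain multipart form fields.
        "provider": (None, provider),
        "language": (None, language),
        "includeWordBoundaries": (None, "true" if include_word_boundaries else "false"),
    }
```

In use, the returned dictionary would be passed as `files=` inside a `with open(path, "rb") as f:` block, so the file handle is closed once the request finishes.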
Step 2: Polling for Transcription Completion
With the `jobId`, you can now poll the `/v1/job/{jobId}/status` endpoint.
GET /v1/job/{jobId}/status
The logic here is identical to the TTS workflow. You poll this endpoint periodically until the job `state` becomes `completed` or `failed`.
    import time

    job_state = None  # Stays None if no job was created or the first status request fails

    if job_id:
        status_url = f"https://stt.api.wesen.ai/v1/job/{job_id}/status"

        while True:
            status_response = requests.get(status_url, headers=headers)
            if status_response.status_code != 200:
                print(f"Error fetching status: {status_response.status_code} - {status_response.text}")
                break

            status_data = status_response.json()
            job_state = status_data.get("state")
            progress = status_data.get("progress", 0)
            print(f"Job state: {job_state} ({progress}%)")

            if job_state == "completed":
                print("Transcription complete!")
                break
            elif job_state == "failed":
                error_details = status_data.get("error", "Unknown error")
                print(f"Job failed: {error_details}")
                break

            # Wait before polling again to avoid hammering the API.
            time.sleep(5)
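The loop above polls indefinitely if a job never reaches a terminal state. A hedged variant is sketched below: it separates the polling logic from the HTTP call and adds an overall timeout. The function name `poll_until_done`, the default timeout of 600 seconds, and the injectable `fetch_status` and `sleep` callables are assumptions for illustration; only the `state` field and the `completed` and `failed` values come from this tutorial.

```python
import time
from typing import Callable

# Terminal states named in this tutorial.
TERMINAL_STATES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until the job is completed or failed, or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        # Give up if the next wait would pass the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(
                f"Job still in state {status.get('state')!r} after {timeout} seconds"
            )
        sleep(interval)
```

With `requests`, `fetch_status` could be something like `lambda: requests.get(status_url, headers=headers, timeout=30).json()`, where `status_url` and `headers` are defined as in the steps above. Injecting the callable also makes the loop testable without network access.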
Step 3: Retrieving the Transcription Result
Once the job state is `completed`, the transcription is ready. You can retrieve it from the `/v1/job/{jobId}/result` endpoint.
GET /v1/job/{jobId}/result
You can request the result as either `json` (default) or `text`. The JSON response includes the full transcript, confidence score, and word timings if requested.
    # Checking job_id first skips this step cleanly when the upload did not create a job.
    if job_id and job_state == "completed":
        result_url = f"https://stt.api.wesen.ai/v1/job/{job_id}/result?format=json"
        result_response = requests.get(result_url, headers=headers)

        if result_response.status_code == 200:
            transcription_data = result_response.json()
            print("\n--- Transcription Result ---")
            print(f"Text: {transcription_data.get('text')}")
            print(f"Confidence: {transcription_data.get('confidence')}")

            # Word timings are only present if includeWordBoundaries was requested.
            word_timings = transcription_data.get('wordTimings')
            if word_timings:
                print("\n--- Word Timings ---")
                for word_info in word_timings:
                    word = word_info.get('word')
                    start = word_info.get('startTime')
                    end = word_info.get('endTime')
                    print(f"- Word: '{word}', Start: {start}s, End: {end}s")
        else:
            print(f"Error downloading result: {result_response.status_code} - {result_response.text}")
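Once the JSON result is parsed, the word timings can be post-processed locally. The sketch below turns them into timestamped lines. The field names (`wordTimings`, `word`, `startTime`, `endTime`) are taken from the example above; the helper name `format_word_timings`, the assumption that times are numeric seconds, and the sample payload are invented for illustration and do not represent real API output.

```python
def format_word_timings(result: dict) -> list:
    """Return one '[start - end] word' line per entry in result['wordTimings']."""
    lines = []
    # `or []` covers both a missing key and an explicit null value.
    for info in result.get("wordTimings") or []:
        start = float(info.get("startTime", 0.0))
        end = float(info.get("endTime", 0.0))
        lines.append(f"[{start:.2f}s - {end:.2f}s] {info.get('word', '')}")
    return lines


# Hypothetical payload, shaped like the fields read in the example above.
sample_result = {
    "text": "selam alem",
    "confidence": 0.93,
    "wordTimings": [
        {"word": "selam", "startTime": 0.0, "endTime": 0.42},
        {"word": "alem", "startTime": 0.5, "endTime": 0.95},
    ],
}

timing_lines = format_word_timings(sample_result)
```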
Alternative Workflow: Transcribe from URL
If your audio file is already accessible via a public URL, you can skip the upload step and submit the job directly using the `/v1/job/transcribe` endpoint.
POST /v1/job/transcribe
Request Body (`TranscriptionRequestDto`):
- `audioUrl` (string, required): A public URL to the audio file.
- `provider` (string, required): `google`, `azure`, etc.
- `language` (string, required): Language code (e.g., `am-ET`).
This method initiates the same asynchronous job, and you would use the same polling and result retrieval logic as shown above.
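A minimal sketch of the URL-based submission follows. It assumes the request body is sent as JSON and that the response carries a `jobId` field like the upload response does; neither detail is confirmed in this tutorial. The helper names `build_transcribe_payload` and `submit_transcription` are hypothetical.

```python
TRANSCRIBE_URL = "https://stt.api.wesen.ai/v1/job/transcribe"


def build_transcribe_payload(audio_url, provider="google", language="am-ET"):
    """Build the request body using the three required fields listed above."""
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audio_url must be a public http(s) URL")
    return {"audioUrl": audio_url, "provider": provider, "language": language}


def submit_transcription(api_key, audio_url, provider="google", language="am-ET"):
    """Submit the job and return its ID (assumes a JSON body and a `jobId` response field)."""
    import requests  # Imported here so the payload helper has no third-party dependency.

    response = requests.post(
        TRANSCRIBE_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_transcribe_payload(audio_url, provider, language),
        timeout=30,
    )
    response.raise_for_status()  # Raise on any 4xx/5xx response.
    return response.json().get("jobId")
```

The returned job ID would then feed into the same status polling and result retrieval steps described earlier.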