STT API Tutorial
A developer-focused guide to the WesenAI Speech-to-Text API, including file uploads, job-based transcription, and result retrieval based on the OpenAPI specification.
This tutorial walks through the primary workflow of the WesenAI Speech-to-Text (STT) API: uploading a file, submitting a transcription job, and retrieving the results.
Core Concepts: Asynchronous Transcription
Similar to our TTS service, audio transcription is an asynchronous process. The STT API is designed to handle audio files of varying lengths without blocking your application or causing request timeouts.
The primary workflow is as follows:
- Upload Audio File: You send your audio file via a POST request to the /v1/job/upload endpoint.
- Submit Transcription Job: The server processes the upload and returns a jobId. You use this jobId to start the transcription.
- Poll for Status: Using the jobId, you poll the /v1/job/{jobId}/status endpoint until the job is completed.
- Retrieve Results: Once complete, you retrieve the transcription from the /v1/job/{jobId}/result endpoint.
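The endpoints in this workflow differ only in their path, so it can help to keep the URL construction in one place. The following is a minimal sketch, not part of the WesenAI API itself: the helper names (upload_url, status_url, result_url) are hypothetical, and the base URL is the one listed in the API reference of this guide.

```python
from urllib.parse import quote

BASE_URL = "https://stt.api.wesen.ai"  # base URL as listed in this guide


def upload_url() -> str:
    """URL for step 1: uploading the audio file."""
    return f"{BASE_URL}/v1/job/upload"


def status_url(job_id: str) -> str:
    """URL for step 3: polling the job status."""
    # quote() guards against characters that are not valid in a URL path.
    return f"{BASE_URL}/v1/job/{quote(job_id, safe='')}/status"


def result_url(job_id: str, fmt: str = "json") -> str:
    """URL for step 4: retrieving the result as 'json' or 'text'."""
    if fmt not in ("json", "text"):
        raise ValueError(f"Unsupported result format: {fmt!r}")
    return f"{BASE_URL}/v1/job/{quote(job_id, safe='')}/result?format={fmt}"
```

Each helper returns a plain string, so the result can be passed directly to requests.get or requests.post.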
API Reference
- Base URL: https://stt.api.wesen.ai
- Authentication: Authorization: Bearer YOUR_API_KEY or X-API-Key: YOUR_API_KEY
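As an illustration of the two authentication options, the sketch below builds the request headers for either scheme. The function names and the WESEN_API_KEY environment variable are assumptions made for this example; only the header names come from the reference above.

```python
import os


def load_api_key(env=None, var_name: str = "WESEN_API_KEY") -> str:
    """Read the API key from an environment mapping.

    The variable name is a hypothetical convention, not something the API requires.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


def auth_headers(api_key: str, scheme: str = "bearer") -> dict:
    """Return headers for one of the two schemes listed in the API reference."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    raise ValueError(f"Unknown auth scheme: {scheme!r}")
```

Reading the key from the environment keeps it out of source control; either header form can then be passed as the headers argument of a request.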
Step 1: Uploading an Audio File
The process begins by uploading your audio file. The API accepts various formats, but wav or mp3 are recommended.
POST /v1/job/upload
This endpoint accepts multipart/form-data. You must provide the audio file and the necessary parameters for transcription.
import os
import requests

WESEN_API_KEY = "YOUR_API_KEY"
UPLOAD_URL = "https://stt.api.wesen.ai/v1/job/upload"
AUDIO_FILE_PATH = "path/to/your/audio.wav"  # Replace with your file path

headers = {
    "Authorization": f"Bearer {WESEN_API_KEY}"
}

job_id = None

# Open the file in a context manager so the handle is closed after the upload.
with open(AUDIO_FILE_PATH, "rb") as audio_file:
    files = {
        # File part: (filename, file object, content type)
        "file": (os.path.basename(AUDIO_FILE_PATH), audio_file, "audio/wav"),
        # Plain form fields are passed as (None, value) tuples
        "provider": (None, "google"),
        "language": (None, "am-ET"),  # Amharic (Ethiopia)
        "includeWordBoundaries": (None, "true"),
    }
    # requests has no default timeout, so set one explicitly
    upload_response = requests.post(UPLOAD_URL, headers=headers, files=files, timeout=60)

if upload_response.status_code == 201:
    job_info = upload_response.json()
    job_id = job_info.get("jobId")
    print(f"File uploaded and job created successfully. Job ID: {job_id}")
else:
    print(f"Error uploading file: {upload_response.status_code} - {upload_response.text}")
The response to a successful upload is a JobSubmissionResponse object, which contains the jobId needed for the next steps.
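If you upload more than one kind of file, the multipart fields can be assembled by a small helper instead of being written out by hand. This is a sketch under a few assumptions: the helper name build_upload_fields is hypothetical, the field names are the ones used in the upload example, the fallback content type for unknown extensions is a guess, and the includeWordBoundaries field is simply omitted when word timings are not wanted.

```python
import os

# Content types for the two formats this guide recommends.
CONTENT_TYPES = {".wav": "audio/wav", ".mp3": "audio/mpeg"}


def build_upload_fields(path, file_obj, provider, language,
                        include_word_boundaries=False):
    """Build the dict passed as files= to requests.post for /v1/job/upload."""
    ext = os.path.splitext(path)[1].lower()
    # Assumption: fall back to a generic binary type for other extensions.
    content_type = CONTENT_TYPES.get(ext, "application/octet-stream")
    fields = {
        # File part: (filename, file object, content type)
        "file": (os.path.basename(path), file_obj, content_type),
        # Plain form fields are passed as (None, value) tuples
        "provider": (None, provider),
        "language": (None, language),
    }
    if include_word_boundaries:
        fields["includeWordBoundaries"] = (None, "true")
    return fields
```

The caller stays responsible for opening and closing the file, for example with a with open(path, "rb") block around the request.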
Step 2: Polling for Transcription Completion
With the jobId, you can now poll the /v1/job/{jobId}/status endpoint.
GET /v1/job/{jobId}/status
The logic here is identical to the TTS workflow. You poll this endpoint periodically until the job state becomes completed or failed.
import time

job_state = None  # Remains None if no job was created or the status request fails

if job_id:
    status_url = f"https://stt.api.wesen.ai/v1/job/{job_id}/status"
    while True:
        status_response = requests.get(status_url, headers=headers, timeout=30)
        if status_response.status_code != 200:
            print(f"Error fetching status: {status_response.status_code} - {status_response.text}")
            break
        status_data = status_response.json()
        job_state = status_data.get("state")
        progress = status_data.get("progress", 0)
        print(f"Job state: {job_state} ({progress}%)")
        if job_state == "completed":
            print("Transcription complete!")
            break
        elif job_state == "failed":
            error_details = status_data.get("error", "Unknown error")
            print(f"Job failed: {error_details}")
            break
        time.sleep(5)  # Wait 5 seconds between polls
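The loop above runs until the job reaches a terminal state, which means a stuck job would keep it polling indefinitely. One way to bound the wait is sketched below. The function poll_until_done and its parameters are hypothetical; it relies only on the state field and the completed and failed values used in this guide, and it takes the status lookup as a callable so that it can be exercised without network access.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}  # terminal states named in this guide


def poll_until_done(fetch_status: Callable[[], dict],
                    interval: float = 5.0,
                    timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> dict:
    """Call fetch_status() until the job completes or fails, or time runs out.

    fetch_status must return the parsed JSON body of the status endpoint.
    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        # Stop if the next poll would start after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"Job still {status.get('state')!r} after {timeout}s")
        sleep(interval)
```

With requests, fetch_status could be a lambda such as lambda: requests.get(status_url, headers=headers, timeout=30).json(). The caller then checks whether the returned state is completed or failed.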
Step 3: Retrieving the Transcription Result
Once the job state is completed, the transcription is ready. You can retrieve it from the /v1/job/{jobId}/result endpoint.
GET /v1/job/{jobId}/result
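You can request the result as either json (default) or text by setting the format query parameter. The JSON response includes the full transcript, a confidence score, and, if you requested them at upload time (includeWordBoundaries in the upload example), word timings.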
if job_state == "completed":
    result_url = f"https://stt.api.wesen.ai/v1/job/{job_id}/result?format=json"
    result_response = requests.get(result_url, headers=headers)
    if result_response.status_code == 200:
        transcription_data = result_response.json()
        print("\n--- Transcription Result ---")
        print(f"Text: {transcription_data.get('text')}")
        print(f"Confidence: {transcription_data.get('confidence')}")
        word_timings = transcription_data.get('wordTimings')
        if word_timings:
            print("\n--- Word Timings ---")
            for word_info in word_timings:
                word = word_info.get('word')
                start = word_info.get('startTime')
                end = word_info.get('endTime')
                print(f"- Word: '{word}', Start: {start}s, End: {end}s")
    else:
        print(f"Error downloading result: {result_response.status_code} - {result_response.text}")
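To reuse the word timings outside of print statements, the parsing can be separated from the output. The sketch below assumes the JSON result has the shape implied by the example above (a wordTimings list whose entries carry word, startTime, and endTime); the function name and the sample payload are made up for illustration and are not real API output.

```python
def format_word_timings(result: dict) -> list:
    """Turn the wordTimings entries of a result into one display line per word.

    Returns an empty list when the result has no word timings.
    """
    lines = []
    for info in result.get("wordTimings") or []:
        word = info.get("word")
        start = info.get("startTime")
        end = info.get("endTime")
        lines.append(f"[{start}s - {end}s] {word}")
    return lines


# Hypothetical payload shaped like the fields read in the example above.
sample_result = {
    "text": "selam alem",
    "confidence": 0.93,
    "wordTimings": [
        {"word": "selam", "startTime": 0.0, "endTime": 0.42},
        {"word": "alem", "startTime": 0.5, "endTime": 0.9},
    ],
}

for line in format_word_timings(sample_result):
    print(line)
```

Because a missing or empty wordTimings value yields an empty list, the same function works for jobs submitted without word boundaries.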
Alternative Workflow: Transcribe from URL
If your audio file is already accessible via a public URL, you can skip the upload step and submit the job directly using the /v1/job/transcribe endpoint.
POST /v1/job/transcribe
Request Body (TranscriptionRequestDto):
- audioUrl (string, required): A public URL to the audio file.
- provider (string, required): google, azure, etc.
- language (string, required): Language code (e.g., am-ET).
This method initiates the same asynchronous job, and you would use the same polling and result retrieval logic as shown above.
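A request to this endpoint might be assembled as in the sketch below. It assumes the TranscriptionRequestDto is sent as a JSON body and that Bearer authentication is used; the helper name build_transcribe_request is hypothetical, and the exact response format should be checked against the OpenAPI specification.

```python
TRANSCRIBE_URL = "https://stt.api.wesen.ai/v1/job/transcribe"


def build_transcribe_request(api_key: str, audio_url: str,
                             provider: str, language: str) -> dict:
    """Return keyword arguments suitable for requests.post(**kwargs)."""
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audio_url must be a public http(s) URL")
    return {
        "url": TRANSCRIBE_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        # Assumption: the three required fields are sent as a JSON body.
        "json": {
            "audioUrl": audio_url,
            "provider": provider,
            "language": language,
        },
    }
```

You would then submit the job with requests.post(**build_transcribe_request(...)), read the job identifier from the response, and continue with the polling step as before.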