Each integer value is associated with one of the speakers in the list below. Names of the form “caf_XXXXX” (female) and “cam_XXXXX” (male) belong to speakers from the OpenSLR69 dataset; the short Catalan names belong to speakers from the festcat dataset.
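The naming convention above can be turned into a small lookup. The following is a minimal sketch: `describeSpeaker` is a hypothetical helper, not part of the API, and it assumes that any name without a `caf_` or `cam_` prefix comes from festcat.

```javascript
// Hypothetical helper: infer the source dataset and gender from a speaker name.
function describeSpeaker(name) {
  if (name.startsWith("caf_")) {
    return { dataset: "OpenSLR69", gender: "female" };
  }
  if (name.startsWith("cam_")) {
    return { dataset: "OpenSLR69", gender: "male" };
  }
  // Assumption: every other name is one of the short festcat names.
  // The name alone does not encode gender, so it is reported as unknown.
  return { dataset: "festcat", gender: "unknown" };
}

// "caf_XXXXX" is the placeholder pattern from the text, not a real speaker name.
console.log(describeSpeaker("caf_XXXXX"));
```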
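Take care with apostrophes in the input text, since they must be escaped inside single-quoted shell strings. An effective approach is to write the text and input parameters to a temporary JSON file and pass that file to the inference request.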
bash
# Write the request body to a temporary file; '\'' escapes the apostrophe in "L'Aina"
printf '%s' '{ "text": "L'\''Aina ha preparat aquest model de síntesi de veu.", "voice": 39, "type": "text"}' > data.json
# Send the request and play the returned WAV audio
curl -X POST https://x6g02u4lkf25gcjo.us-east-1.aws.endpoints.huggingface.cloud/api/tts -H "Content-Type: application/json" -H "Authorization: Bearer <hf_token>" -d @data.json | play -t wav -
# Remove the temporary file
rm data.json
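Apostrophe escaping can also be sidestepped by letting a JSON serializer write the temporary file. A minimal Node.js sketch follows. `writePayload` is a hypothetical helper name, the field names mirror the shell example above, and the file is placed in the system temporary directory as an illustrative choice.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: serialize the request body to a JSON file.
// Apostrophes need no escaping in JSON, and no shell quoting is involved.
function writePayload(filePath, text, voice) {
  const body = JSON.stringify({ text: text, voice: voice, type: "text" });
  fs.writeFileSync(filePath, body, "utf8");
  return body;
}

const payloadPath = path.join(os.tmpdir(), "data.json");
writePayload(payloadPath, "L'Aina ha preparat aquest model de síntesi de veu.", 39);
console.log(fs.readFileSync(payloadPath, "utf8"));
```

The resulting file can then be passed to `curl` with `-d @<path>` in the same way as `data.json` in the shell example.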
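JavaScript
Run the example with Node.js. Install npm (Node Package Manager) and use it to install the node-fetch library. Because the script loads the library with require, it needs node-fetch version 2; version 3 is published as an ES module only.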
example.js
const fetch = require("node-fetch");
const fs = require("fs");

// Define the API URL and headers
const API_URL = "https://x6g02u4lkf25gcjo.us-east-1.aws.endpoints.huggingface.cloud/api/tts";
const headers = {
  "Authorization": "Bearer <hf_token>",
  "Content-Type": "application/json"
};

// Function to send the request
async function query(text) {
  const data = {
    text: text,
    voice: 20,
  };

  try {
    // POST request
    const response = await fetch(API_URL, {
      method: "POST",
      headers: headers,
      body: JSON.stringify(data)
    });

    // Check the response
    if (!response.ok) {
      throw new Error(`Error: ${response.status} ${response.statusText}`);
    }

    // Convert the response to a buffer
    const buffer = await response.buffer();

    // Write the buffer to an output file
    fs.writeFile("output.wav", buffer, (err) => {
      if (err) {
        console.error("Error saving the file:", err);
      } else {
        console.log("File saved as output.wav");
      }
    });
  } catch (error) {
    console.error("Error making request:", error);
  }
}

// Example usage
query("Bon dia.");
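Before sending a request, the body can be checked against the parameter descriptions in this section. The sketch below is illustrative only. `buildPayload` is a hypothetical helper, the bounds 0.2 and 0.67 are taken from the temperature entry of the parameter list, and it is an assumption that the endpoint accepts `temperature` as a JSON field next to `text` and `voice`. Whether the server itself rejects out-of-range values is not stated here.

```javascript
// Hypothetical helper: validate inputs and build the request body.
function buildPayload(text, voice, temperature) {
  if (typeof text !== "string") {
    throw new TypeError("text must be a string");
  }
  if (!Number.isInteger(voice)) {
    throw new TypeError("voice must be an integer speaker ID");
  }
  const payload = { text: text, voice: voice };
  if (temperature !== undefined) {
    // Range taken from the parameter list (0.2 to 0.67).
    if (typeof temperature !== "number" || temperature < 0.2 || temperature > 0.67) {
      throw new RangeError("temperature should be between 0.2 and 0.67");
    }
    payload.temperature = temperature;
  }
  return payload;
}

console.log(JSON.stringify(buildPayload("Bon dia.", 20, 0.5)));
```

The returned object can be passed to `JSON.stringify` as the `body` of the `fetch` call in `query`.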
text | Input text to be converted to speech | string | Any text input | N/A
voice | Speaker ID for voice output | int | 47 different speakers available | N/A
temperature | Controls sampling variance during inference. Lower values yield higher quality but less variability | float | 0.2 to 0.67 | 0.2 to 0.67
length_scale
Related to speech speed. Higher values make the speech slower, while lower values make it faster