List of available accents and their associated speakers (female and male voices).
Region
Name 1
Name 2
Each integer value is associated with one of the speakers in the list below. Names starting with "caf_XXXXX" (female) and "cam_XXXXX" (male) belong to speakers in the OpenSLR69 dataset; short Catalan names belong to speakers in the festcat dataset.
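The naming rule above can be sketched as a small lookup. This is only an illustration: the function name `describeSpeaker` and the example ID `caf_00001` are hypothetical, and the rule is applied exactly as stated (any name without a `caf_` or `cam_` prefix is treated as a festcat speaker, whose gender cannot be read from the name).

```javascript
// Hypothetical helper: infers dataset and gender from a speaker name,
// following the prefix convention described in the text.
function describeSpeaker(name) {
  if (name.startsWith("caf_")) {
    return { dataset: "OpenSLR69", gender: "female" };
  }
  if (name.startsWith("cam_")) {
    return { dataset: "OpenSLR69", gender: "male" };
  }
  // Short Catalan names: festcat dataset; gender is not encoded in the name.
  return { dataset: "festcat", gender: null };
}

console.log(describeSpeaker("caf_00001")); // "caf_00001" is a made-up example ID
console.log(describeSpeaker("quim"));
```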
One effective way to run inference is to create a temporary JSON file that holds the text and the input parameters, then send that file to the endpoint.
printf '%s' '{ "text": "Ara caldrà veure què passa amb aquestes investigacions judicials, si segueixen endavant i se l'\''acaba jutjant, o si es desestimen d'\''acord amb la immunitat que tindrà un altre cop com a president del país.",
"voice": "quim", "accent": "balear", "type": "text"}' > data.json
curl -X POST https://p1b28cv1e843tih1.eu-west-1.aws.endpoints.huggingface.cloud/api/tts \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <hf_token>" \
  -d @data.json | play -t wav -
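JavaScript (Node.js)
Run with Node.js. Install npm (Node Package Manager) and use it to install the node-fetch library. The example loads the library with require, so it needs node-fetch version 2 (npm install node-fetch@2); version 3 is ESM-only.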
const fetch = require("node-fetch");
const fs = require("fs");

// Define the API URL and headers
const API_URL = "https://p1b28cv1e843tih1.eu-west-1.aws.endpoints.huggingface.cloud/api/tts";
const headers = {
  "Authorization": "Bearer <hf_token>",
  "Content-Type": "application/json"
};

// Function to send the request
async function query(text) {
  const data = { text: text, accent: "balear", voice: "olga" };
  try {
    // POST request
    const response = await fetch(API_URL, {
      method: "POST",
      headers: headers,
      body: JSON.stringify(data)
    });
    // Check the response
    if (!response.ok) {
      throw new Error(`Error: ${response.status} ${response.statusText}`);
    }
    // Convert the response to a buffer
    const buffer = await response.buffer();
    // Write the buffer to an output file
    fs.writeFile("output.wav", buffer, (err) => {
      if (err) {
        console.error("Error saving the file:", err);
      } else {
        console.log("File saved as output.wav");
      }
    });
  } catch (error) {
    console.error("Error making request:", error);
  }
}

// Your query
query("Bon dia.");
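Before sending a request, it can help to check the body against the accent values and temperature range listed for the request parameters. The following is a minimal sketch under stated assumptions: `buildTtsPayload` is a hypothetical helper, treating the documented 0.2 to 0.67 temperature range as a hard limit is an assumption, and the source does not confirm that the endpoint rejects values outside it.

```javascript
// Hypothetical helper: builds the JSON request body for the TTS endpoint.
// Accent values and the temperature range are copied from the parameter list;
// enforcing them on the client side is an assumption, not documented behaviour.
const ACCENTS = ["balear", "central", "nord-occidental", "valencia"];
const TEMPERATURE_RANGE = { min: 0.2, max: 0.67 };

function buildTtsPayload({ text, voice, accent, temperature }) {
  if (typeof text !== "string" || text.length === 0) {
    throw new Error("text must be a non-empty string");
  }
  if (typeof voice !== "string" || voice.length === 0) {
    throw new Error("voice must be a non-empty speaker name");
  }
  if (!ACCENTS.includes(accent)) {
    throw new Error(`accent must be one of: ${ACCENTS.join(", ")}`);
  }
  const payload = { text, voice, accent };
  // temperature is optional; only validate and include it when provided.
  if (temperature !== undefined) {
    const { min, max } = TEMPERATURE_RANGE;
    if (typeof temperature !== "number" || temperature < min || temperature > max) {
      throw new Error(`temperature must be a number between ${min} and ${max}`);
    }
    payload.temperature = temperature;
  }
  return JSON.stringify(payload);
}

console.log(buildTtsPayload({ text: "Bon dia.", voice: "olga", accent: "balear" }));
```

The returned string could be passed as the `body` of the `fetch` call in place of `JSON.stringify(data)`, so that a mistyped accent fails locally instead of after a network round trip.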
text | Input text to be converted to speech | string | Any text input | N/A
voice | Speaker name for voice output | string | 8 speakers available | N/A
accent | Selected dialect for voice output | string | balear, central, nord-occidental, valencia | N/A
temperature | Controls sampling variance during inference. Lower values yield higher quality but less variability | float | 0.2 to 0.67 | 0.2 to 0.67
length_scale
Related to speech speed. Higher values make the speech slower, while lower values make it faster