speechbrain-m-ctc-t-large
Version: 1
Publisher: HuggingFace
Last updated: May 2025
speechbrain/m-ctc-t-large, powered by the Hugging Face Inference Toolkit

Send Request

You can use cURL or any REST client to send a request to the AzureML endpoint with your AzureML token.
curl <AZUREML_ENDPOINT_URL> \
    -X POST \
    -H "Authorization: Bearer <AZUREML_TOKEN>" \
    -H "Content-Type: audio/flac" \
    --data-binary @"audio.flac"
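Equivalently, here is a minimal Python sketch of the same raw-bytes request. It assumes the requests library is installed and that a FLAC file named audio.flac sits in the working directory; the endpoint URL and token are placeholders to substitute with your own AzureML values.

import requests

# Placeholders: substitute your own AzureML endpoint URL and token.
ENDPOINT_URL = "<AZUREML_ENDPOINT_URL>"
TOKEN = "<AZUREML_TOKEN>"

with open("audio.flac", "rb") as f:
    audio_bytes = f.read()

response = requests.post(
    ENDPOINT_URL,
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "audio/flac",
    },
    data=audio_bytes,  # raw bytes payload, mirroring the cURL --data-binary call
)
response.raise_for_status()
print(response.json())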

Supported Parameters

  • inputs (string): The input audio data as a base64-encoded string. If no parameters are provided, you can also send the audio data as a raw bytes payload, as in the request examples above. (A JSON example that combines base64-encoded inputs with parameters follows this list.)
  • parameters (object):
    • return_timestamps (boolean): Whether to output corresponding timestamps with the generated text.
    • generation_parameters (object):
      • temperature (float): The value used to modulate the next token probabilities.
      • top_k (integer): The number of highest probability vocabulary tokens to keep for top-k filtering.
      • top_p (float): If set to a float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
      • typical_p (float): Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to a float < 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. See this paper for more details.
      • epsilon_cutoff (float): If set to a float strictly between 0 and 1, only tokens with a conditional probability greater than epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
      • eta_cutoff (float): Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to a float strictly between 0 and 1, a token is only considered if it is greater than either eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter term is intuitively the expected next-token probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
      • max_length (integer): The maximum length (in tokens) of the generated text, including the input.
      • max_new_tokens (integer): The maximum number of tokens to generate. Takes precedence over max_length.
      • min_length (integer): The minimum length (in tokens) of the generated text, including the input.
      • min_new_tokens (integer): The minimum number of tokens to generate. Takes precedence over min_length.
      • do_sample (boolean): Whether to use sampling instead of greedy decoding when generating new tokens.
      • early_stopping (enum): Controls the stopping condition for beam-based methods. Possible values: never, true, false.
      • num_beams (integer): The number of beams to use for beam search.
      • num_beam_groups (integer): The number of groups to divide num_beams into in order to ensure diversity among different groups of beams. See this paper for more details.
      • penalty_alpha (float): Balances model confidence and the degeneration penalty in contrastive search decoding.
      • use_cache (boolean): Whether the model should use past key/value attentions to speed up decoding.
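As referenced in the inputs description above, here is a minimal Python sketch of a JSON request that sends base64-encoded audio together with a few of these parameters. The endpoint URL and token are placeholders, and the parameter values are illustrative rather than recommendations; note that a JSON body uses Content-Type: application/json instead of audio/flac.

import base64
import requests

with open("audio.flac", "rb") as f:
    # Encode the raw audio bytes as a base64 string for the "inputs" field.
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "inputs": audio_b64,
    "parameters": {
        "return_timestamps": True,
        "generation_parameters": {
            "temperature": 0.7,   # illustrative values, not recommendations
            "do_sample": True,
            "max_new_tokens": 256,
        },
    },
}

response = requests.post(
    "<AZUREML_ENDPOINT_URL>",
    headers={
        "Authorization": "Bearer <AZUREML_TOKEN>",
        "Content-Type": "application/json",  # JSON body rather than raw audio bytes
    },
    json=payload,
)
response.raise_for_status()
print(response.json())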
Check the full API specification in the Hugging Face Inference documentation.
Model Specifications
License: Apache-2.0
Last Updated: May 2025
Publisher: HuggingFace