microsoft-unispeech-sat-base-100h-libri-ft
Version: 1

UniSpeech-SAT-Base-Finetuned-100h-Libri

Microsoft's UniSpeech-SAT base model (unispeech-sat-base) fine-tuned on 100 hours of LibriSpeech, using 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.
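If your recordings use a different sampling rate, resample them before feeding them to the model. A minimal sketch using torchaudio (the file name audio.wav is a placeholder):

 import torchaudio

 # load the waveform together with its native sampling rate
 waveform, sample_rate = torchaudio.load("audio.wav")  # placeholder file

 # resample to the 16kHz rate the model expects
 if sample_rate != 16_000:
     waveform = torchaudio.functional.resample(waveform, orig_freq=sample_rate, new_freq=16_000)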
Paper: UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
Authors: Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

Abstract
Self-supervised learning (SSL) is a long-standing goal for speech processing, since it utilizes large-scale unlabeled data and avoids extensive human labeling. Recent years have witnessed great successes in applying self-supervised learning to speech recognition, while limited exploration has been attempted in applying SSL to modeling speaker characteristics. In this paper, we aim to improve the existing SSL framework for speaker representation learning. Two methods are introduced for enhancing the unsupervised speaker information extraction. First, we apply multi-task learning to the current SSL framework, where we integrate the utterance-wise contrastive loss with the SSL objective function. Second, for better speaker discrimination, we propose an utterance mixing strategy for data augmentation, where additional overlapped utterances are created in an unsupervised manner and incorporated during training. We integrate the proposed methods into the HuBERT framework. Experiment results on the SUPERB benchmark show that the proposed system achieves state-of-the-art performance in universal representation learning, especially for speaker identification oriented tasks. An ablation study is performed to verify the efficacy of each proposed method. Finally, we scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvement in all SUPERB tasks.
The original model can be found under https://github.com/microsoft/UniSpeech/tree/main/UniSpeech-SAT.

Usage

To transcribe audio files, the model can be used as a standalone acoustic model as follows:
 from transformers import Wav2Vec2Processor, UniSpeechSatForCTC
 from datasets import load_dataset
 import torch
 
 # load processor and model
 processor = Wav2Vec2Processor.from_pretrained("microsoft/unispeech-sat-base-100h-libri-ft")
 model = UniSpeechSatForCTC.from_pretrained("microsoft/unispeech-sat-base-100h-libri-ft")
     
 # load dummy dataset
 ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
 
 # preprocess: extract input values (audio must be sampled at 16kHz)
 input_values = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt", padding="longest").input_values  # Batch size 1
 
 # retrieve logits (no gradient tracking needed for inference)
 with torch.no_grad():
     logits = model(input_values).logits
 
 # take argmax and decode
 predicted_ids = torch.argmax(logits, dim=-1)
 transcription = processor.batch_decode(predicted_ids)
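The same pipeline works on your own recordings. A minimal sketch reusing the processor and model loaded above (speech.wav is a placeholder for a 16kHz mono file):

 import torchaudio

 # load a local 16kHz mono recording (placeholder file name)
 waveform, sample_rate = torchaudio.load("speech.wav")
 assert sample_rate == 16_000, "resample first if needed"

 inputs = processor(waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt")
 with torch.no_grad():
     logits = model(inputs.input_values).logits
 print(processor.batch_decode(torch.argmax(logits, dim=-1))[0])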

Contribution

The model was contributed by cywang and patrickvonplaten.

License

The official license can be found in the original model repository.
microsoft/unispeech-sat-base-100h-libri-ft is powered by the Hugging Face Inference Toolkit.

Send Request

You can use cURL or any REST Client to send a request to the AzureML endpoint with your AzureML token.
curl <AZUREML_ENDPOINT_URL> \
    -X POST \
    -H "Authorization: Bearer <AZUREML_TOKEN>" \
    -H "Content-Type: audio/flac" \
    --data-binary @"audio.flac"
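Equivalently, a minimal Python sketch using requests (the endpoint URL, token, and file name are placeholders):

 import requests

 # placeholders -- substitute your own endpoint and token
 AZUREML_ENDPOINT_URL = "<AZUREML_ENDPOINT_URL>"
 AZUREML_TOKEN = "<AZUREML_TOKEN>"

 with open("audio.flac", "rb") as f:
     response = requests.post(
         AZUREML_ENDPOINT_URL,
         headers={
             "Authorization": f"Bearer {AZUREML_TOKEN}",
             "Content-Type": "audio/flac",
         },
         data=f.read(),
     )
 print(response.json())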

Supported Parameters

  • inputs (string): The input audio data as a base64-encoded string. If no parameters are provided, you can also provide the audio data as a raw bytes payload.
  • parameters (object):
    • return_timestamps (boolean): Whether to output corresponding timestamps with the generated text.
    • generation_parameters (object):
      • temperature (float): The value used to modulate the next token probabilities.
      • top_k (integer): The number of highest probability vocabulary tokens to keep for top-k filtering.
      • top_p (float): If set to a float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
      • typical_p (float): Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to a float < 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. See this paper for more details.
      • epsilon_cutoff (float): If set to a float strictly between 0 and 1, only tokens with a conditional probability greater than epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
      • eta_cutoff (float): Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to a float strictly between 0 and 1, a token is only considered if it is greater than either eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
      • max_length (integer): The maximum length (in tokens) of the generated text, including the input.
      • max_new_tokens (integer): The maximum number of tokens to generate. Takes precedence over max_length.
      • min_length (integer): The minimum length (in tokens) of the generated text, including the input.
      • min_new_tokens (integer): The minimum number of tokens to generate. Takes precedence over min_length.
      • do_sample (boolean): Whether to use sampling instead of greedy decoding when generating new tokens.
      • early_stopping (enum): Possible values: never, true, false.
      • num_beams (integer): Number of beams to use for beam search.
      • num_beam_groups (integer): Number of groups to divide num_beams into in order to ensure diversity among different groups of beams. See this paper for more details.
      • penalty_alpha (float): The value that balances the model confidence and the degeneration penalty in contrastive search decoding.
      • use_cache (boolean): Whether the model should use the past key/values attentions to speed up decoding.
Check the full API Specification at the Hugging Face Inference documentation.
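Note that the generation parameters above target autoregressive decoding and may have no effect on a CTC model such as this one. For reference, a sketch of a JSON request that sets parameters (endpoint, token, and exact field support depend on your deployment):

 import base64
 import requests

 with open("audio.flac", "rb") as f:
     audio_b64 = base64.b64encode(f.read()).decode("utf-8")

 # illustrative payload; parameter support depends on the deployed toolkit version
 payload = {
     "inputs": audio_b64,
     "parameters": {"return_timestamps": True},
 }

 response = requests.post(
     "<AZUREML_ENDPOINT_URL>",  # placeholder
     headers={"Authorization": "Bearer <AZUREML_TOKEN>", "Content-Type": "application/json"},
     json=payload,
 )
 print(response.json())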
Model Specifications

License: Apache-2.0
Last Updated: August 2025
Provider: HuggingFace