deci-decidiffusion-v1-0
Version: 7
DeciDiffusion 1.0 is an 820 million parameter latent diffusion model designed for text-to-image generation. The model was initially trained on the LAION-v2 dataset and fine-tuned on the LAION-ART dataset; training incorporated advanced techniques to improve speed, training performance, and inference quality.
DeciDiffusion 1.0 retains key elements from Stable Diffusion, such as the Variational Autoencoder (VAE) and CLIP's pre-trained text encoder, while introducing notable improvements. The U-Net is replaced with U-Net-NAS, a more efficient architecture developed by Deci. This novel component streamlines the model by reducing parameters, resulting in enhanced computational efficiency.
For more details, see the blog.
Training Details
Training Procedure
This model was trained in 4 phases:
- It was trained from scratch for 1.28 million steps at a resolution of 256x256 using 320 million samples from LAION-v2.
- The model was trained for 870k steps at a higher resolution of 512x512 on the same dataset to capture more fine-detailed information.
- It was trained for 65k steps with EMA, a different learning rate scheduler, and higher-quality data.
- Finally, the model was fine-tuned on a 2 million sample subset of the LAION-ART dataset.
Limitations and Biases
Limitations
The model has limitations and may not perform optimally in various scenarios. It doesn't generate entirely photorealistic images. Rendering legible text is beyond its capability. The generation of faces and human figures may lack precision. The model is primarily optimized for English captions and may not be as effective with other languages. The auto-encoding component of the model is lossy.
Biases
DeciDiffusion was primarily trained on subsets of LAION-v2 with a focus on English descriptions. As a result, non-English communities and cultures may be underrepresented, potentially introducing bias towards white and Western norms. Outputs from non-English prompts are notably less accurate. Considering these biases, users are advised to exercise caution when using DeciDiffusion, irrespective of the input provided.
License
creativeml-openrail++-m
Inference Samples
Inference type | Python sample (Notebook) | CLI with YAML |
---|---|---|
Real time | text-to-image-online-endpoint.ipynb | text-to-image-online-endpoint.sh |
Batch | text-to-image-batch-endpoint.ipynb | text-to-image-batch-endpoint.sh |
Inference with Azure AI Content Safety (AACS) Samples
Inference type | Python sample (Notebook) |
---|---|
Real time | safe-text-to-image-online-deployment.ipynb |
Batch | safe-text-to-image-batch-endpoint.ipynb |
Sample input and output
Sample input
{
"input_data": {
"columns": ["prompt"],
"data": ["A photo of an astronaut riding a horse on Mars"],
"index": [0]
}
}
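The request payload above can be assembled programmatically. The sketch below is illustrative only: `build_payload` is a hypothetical helper, and sending the JSON to a deployed endpoint would additionally require a scoring URI and key, as shown in the sample notebooks.

```python
import json

def build_payload(prompt: str) -> str:
    # Build the request body in the shape the endpoint expects:
    # an "input_data" object with columns, data rows, and an index.
    body = {
        "input_data": {
            "columns": ["prompt"],
            "data": [prompt],
            "index": [0],
        }
    }
    return json.dumps(body)

payload = build_payload("A photo of an astronaut riding a horse on Mars")
print(payload)
```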
Sample output
[
{
"prompt": "A photo of an astronaut riding a horse on Mars",
"generated_image": "image",
"nsfw_content_detected": null
}
]
Note:
- The "generated_image" string is base64-encoded.
- The deci-decidiffusion-v1-0 model checks for NSFW content in the generated image. We highly recommend using the model with Azure AI Content Safety (AACS). Please refer to the sample online and batch notebooks for AACS-integrated deployments.
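Since the "generated_image" field is base64-encoded, it can be decoded back into image bytes with Python's standard library. A minimal sketch, assuming the string came from an endpoint response (the bytes below are a stand-in, not a real generated image):

```python
import base64

def save_generated_image(b64_image: str, path: str) -> None:
    # Decode the base64 payload back into raw image bytes and write them out.
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_image))

# Illustrative only: a real response would carry a full base64-encoded PNG.
fake_png_bytes = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG file signature
b64 = base64.b64encode(fake_png_bytes).decode("ascii")
save_generated_image(b64, "astronaut.png")
```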
Visualization of inference result for a sample prompt - "a photograph of an astronaut riding a horse"
Model Specifications
License: creativeml-openrail++-m
Last Updated: June 2024
Publisher: Deci AI