Gretel Navigator Tabular
Version: 1
Publisher: Gretel
Last updated: January 2025

Gretel Navigator Tabular generates production-quality synthetic data, optimized for AI and machine learning development, from prompts, schema definitions, or seed examples. Unlike single-LLM approaches to data generation, Navigator Tabular uses a compound AI architecture engineered specifically for synthetic data, combining leading open-source small language models (SLMs) fine-tuned across 10+ industry domains. This purpose-built system creates diverse, domain-specific datasets of hundreds to millions of examples while preserving complex statistical relationships, and it offers greater speed and accuracy than manual data creation.

Key Features

  • Natural language interface to specify data requirements
  • Schema-based data generation
  • Real-time and streaming data generation
  • Dataset augmentation and modification
  • Structured data supported as LLM inputs and outputs

Top Use Cases

  • Creating synthetic data for LLM training and fine-tuning
  • Generating evaluation datasets for AI models and RAG systems
  • Augmenting limited training data with diverse synthetic samples
  • Creating realistic PII/PHI data for model testing

Documentation and Resources

Input Examples

Natural Language Prompts

Generate customer bank transaction data with the following columns:
- customer_name: Full names in Western format
- transaction_date: Dates within the last 30 days
- transaction_amount: Dollar amounts between $1-$10,000
- transaction_type: Either 'debit' or 'credit'
- transaction_category: Common banking categories like 'dining', 'retail', 'utilities'
- account_balance: Running balance after each transaction
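
The constraints in a prompt like the one above can also serve as a checklist for the generated rows. The following is a minimal sketch of such a check, not part of Navigator itself. The function name `check_transactions`, the sample rows, the opening balance, and the rule that a credit increases the balance while a debit decreases it are all assumptions made for illustration.

```python
from datetime import date, timedelta
from decimal import Decimal


def check_transactions(rows, opening_balance, today):
    """Return a list of problems found in generated transaction rows.

    Assumes rows are in chronological order and that a credit adds to the
    balance while a debit subtracts from it (the prompt does not say so).
    """
    problems = []
    balance = Decimal(opening_balance)
    earliest = today - timedelta(days=30)
    for i, row in enumerate(rows):
        if row["transaction_type"] not in ("debit", "credit"):
            problems.append(f"row {i}: unknown transaction_type")
            continue
        amount = Decimal(row["transaction_amount"])
        when = date.fromisoformat(row["transaction_date"])
        if not Decimal("1") <= amount <= Decimal("10000"):
            problems.append(f"row {i}: amount out of range")
        if not earliest <= when <= today:
            problems.append(f"row {i}: date outside the last 30 days")
        # Recompute the running balance and compare it with the row's value.
        balance += amount if row["transaction_type"] == "credit" else -amount
        if Decimal(row["account_balance"]) != balance:
            problems.append(f"row {i}: balance does not match running total")
            balance = Decimal(row["account_balance"])  # resync and continue
    return problems


# Hypothetical rows, written by hand to show the expected shape.
rows = [
    {"customer_name": "Ada Moreno", "transaction_date": "2025-01-10",
     "transaction_amount": "250.00", "transaction_type": "credit",
     "transaction_category": "retail", "account_balance": "1250.00"},
    {"customer_name": "Ada Moreno", "transaction_date": "2025-01-12",
     "transaction_amount": "42.50", "transaction_type": "debit",
     "transaction_category": "dining", "account_balance": "1207.50"},
]

problems = check_transactions(rows, "1000.00", today=date(2025, 1, 15))
```

An empty `problems` list means every row satisfied the range, date, type, and running-balance rules. Amounts are handled as `Decimal` rather than `float` so that currency comparisons stay exact.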

Schema-Based Input

CREATE TABLE transactions (
    customer_name VARCHAR(100),
    customer_id CHAR(8),
    transaction_date DATE,
    transaction_amount DECIMAL(10,2),
    transaction_type VARCHAR(6),
    transaction_category VARCHAR(50),
    account_balance DECIMAL(10,2)
);
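
Before loading generated rows into a table defined like this, it can help to confirm that each value fits its declared column type. The sketch below is one way to do that in plain Python, under the assumption that rows arrive as dictionaries of strings (as they would from a CSV file). The `SCHEMA` mapping is transcribed by hand from the statement above, and the helper names `fits_decimal` and `schema_violations` are hypothetical.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

# Column rules transcribed by hand from the CREATE TABLE statement.
SCHEMA = {
    "customer_name": ("varchar", 100),
    "customer_id": ("char", 8),
    "transaction_date": ("date", None),
    "transaction_amount": ("decimal", (10, 2)),
    "transaction_type": ("varchar", 6),
    "transaction_category": ("varchar", 50),
    "account_balance": ("decimal", (10, 2)),
}


def fits_decimal(text, precision, scale):
    """True if `text` parses as a number that fits DECIMAL(precision, scale)."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return False
    if not value.is_finite():
        return False
    _, digits, exponent = value.as_tuple()
    frac_digits = max(0, -exponent)
    int_digits = max(0, len(digits) + exponent)
    return frac_digits <= scale and int_digits <= precision - scale


def schema_violations(row):
    """Return a list of columns in `row` whose values do not fit SCHEMA."""
    problems = []
    for column, (kind, arg) in SCHEMA.items():
        value = row.get(column)
        if value is None:
            problems.append(f"{column}: missing")
        elif kind in ("varchar", "char") and len(value) > arg:
            problems.append(f"{column}: longer than {arg} characters")
        elif kind == "decimal" and not fits_decimal(value, *arg):
            problems.append(f"{column}: does not fit DECIMAL{arg}")
        elif kind == "date":
            try:
                date.fromisoformat(value)
            except ValueError:
                problems.append(f"{column}: not an ISO date")
    return problems


# Hypothetical row, written by hand for illustration.
sample_row = {
    "customer_name": "Ada Moreno", "customer_id": "C0000001",
    "transaction_date": "2025-01-12", "transaction_amount": "42.50",
    "transaction_type": "debit", "transaction_category": "dining",
    "account_balance": "1207.50",
}
violations = schema_violations(sample_row)
```

This checks only type fit (length, precision, scale, date syntax). It does not check semantic rules such as which values `transaction_type` may take, since the schema itself does not encode them.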

Data Generation Architecture

  • Agentic workflow system for synthetic data generation
  • Multi-modal support (tabular, text)
  • Scalable generation (up to millions of records)
  • Underlying LLMs fine-tuned by Gretel on data and formats from 10 different industries, including healthcare, life sciences, finance, manufacturing, and retail

Example Open Datasets

High-quality open synthetic datasets created using Navigator are available on Hugging Face.

Service Limitations

  • Fine-tuning capability for Gretel Navigator Tabular is not yet available on Azure AI.

Responsible AI Considerations

Navigator Tabular is designed to democratize synthetic data generation while upholding high standards of responsible AI development. The system incorporates automated alignment checks to detect the generation of harmful or discriminatory data while respecting legitimate use cases across industries. Navigator Tabular is trained exclusively on high-quality, license-compliant datasets spanning 10+ sectors, ensuring both legal compliance and output quality. However, like any advanced AI system, Navigator Tabular may occasionally produce unexpected or biased outputs. We therefore recommend that users conduct appropriate testing and validation for their specific use cases. Gretel’s governance framework includes privacy-preserving architecture, regular security audits, and continuous monitoring for bias and quality control. Through ongoing model updates and strict access controls, we maintain alignment with responsible AI principles while protecting against potential misuse. Users are encouraged to review our Responsible Use Guidelines and implement appropriate safety measures based on their specific applications and industry requirements.

Model Specifications

License: Llama 3.1 Community License
Training Data: Dec 2023
Last Updated: January 2025
Input Type: Text, JSON, CSV
Output Type: Text, JSON, CSV
Publisher: Gretel
Languages: 1 language
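
Because both CSV and JSON appear among the input and output types, tabular records may need to move between the two when preparing seed data or consuming results. The following is a minimal sketch using only the Python standard library. The function names and the sample data are hypothetical, and the sketch assumes a non-empty list of flat records that all share the same keys.

```python
import csv
import io
import json


def csv_to_json_records(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


def json_records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    # Column order follows the keys of the first record (assumed non-empty).
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample, written by hand for illustration.
sample_csv = "customer_id,transaction_type\nC0000001,debit\n"
as_json = csv_to_json_records(sample_csv)
round_trip = json_records_to_csv(as_json)
```

All values come back as strings after the CSV step, so numeric columns need explicit conversion if typed JSON is required.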