Overview
Sanitize Parquet is a feature that processes structured data sources to extract metadata and prepare them for further processing such as data augmentation or text-to-SQL operations in agents. This feature works with any structured data format including parquet files, CSV files, databases, time series data, knowledge graphs, and data warehouses.
Category
Structured Data - This feature is designed to work with structured data sources containing columns, tables, and relational data formats.
Input
The Sanitize Parquet feature accepts data from the following input sources:
- File Upload - Upload structured files directly from your local system. Supported formats include:
  - Parquet files: Columnar storage format optimized for analytics
  - CSV files: Comma-separated values and other delimited text files
  - Any data with columns: Tabular data formats with defined schemas
- Live Data Connectors - Connect directly to live data sources without duplicating data:
  - Databases: Direct connections to relational databases
  - Data Warehouses: Snowflake, Databricks, and other data warehouse systems
  - Time Series Data: Structured time-based data sources
  - Knowledge Graphs: Graph-structured data with relationships
The input data source must be marked as “Structured Type” in your data room. The feature works with any data that has columns and a defined schema.
Output
Corvic Table - The Sanitize Parquet feature produces a new Corvic Table with extracted metadata to prepare for further processing such as augmentation or text-to-SQL operations in agents. The output includes:
- Schema Metadata: Column names, data types, and constraints extracted from the input data
- Data Quality Metrics: Statistics and quality indicators for each column
- Relationship Information: Foreign keys, relationships, and dependencies between tables
- Processing Ready Format: Structured representation optimized for downstream operations
The metadata in the output Corvic Table is the basis for downstream operations, including data augmentation, text-to-SQL query generation, and agent-based data operations.
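To make the schema metadata and data quality metrics listed above more concrete, the following is a minimal sketch of per-column profiling written with the Python standard library only. It is not the Sanitize Parquet implementation: the function names `infer_type` and `profile_csv`, the three reported fields, and the sample data are all assumptions chosen for illustration, and the actual contents of a Corvic Table may differ.

```python
import csv
import io


def infer_type(values):
    """Return 'int', 'float', 'string', or 'unknown' for a list of non-empty strings."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "string"


def profile_csv(text):
    """Collect illustrative per-column schema and quality metadata from CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    profile = {}
    for name in columns:
        raw = [row[name] for row in rows]
        # Treat empty strings and missing fields as nulls (an assumption).
        present = [v for v in raw if v not in ("", None)]
        profile[name] = {
            "type": infer_type(present),
            "null_count": len(raw) - len(present),
            "distinct_count": len(set(present)),
        }
    return profile


# Hypothetical sample data, used only to exercise the sketch.
SAMPLE = "id,price,city\n1,9.5,Oslo\n2,,Lima\n3,7,Oslo\n"
PROFILE = profile_csv(SAMPLE)
```

For the sample above, `PROFILE["price"]` reports a `float` column with one null value, which is the kind of column-level signal a text-to-SQL agent could use when deciding how to filter or aggregate.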
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| input_data_source | string | Yes | The structured data source to sanitize. Select a data source from your data room that contains structured data with columns. Can be parquet files, CSV files, database connections, time series data, knowledge graphs, or data warehouse connections (Snowflake, Databricks, etc.). The data source must be marked as “Structured Type”. |
| output_name | string | No | Optional custom name for the output Corvic Table. If not provided, a default name will be automatically generated based on the input data source name. |
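
The two parameters in the table can be pictured as a small typed record with one required and one optional field. The sketch below is purely illustrative: `SanitizeParquetParams` is a hypothetical class, not part of any Corvic SDK, and the value `"sales_parquet"` is a made-up data source name. Leaving `output_name` as `None` stands for "let the platform generate the default name"; the format of that default is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SanitizeParquetParams:
    """Hypothetical container mirroring the parameter table."""

    input_data_source: str             # required: a "Structured Type" data source
    output_name: Optional[str] = None  # optional: None means use the generated default

    def __post_init__(self):
        # The table marks input_data_source as required, so reject blank values.
        if not self.input_data_source.strip():
            raise ValueError("input_data_source is required")


# Only the required parameter is supplied; output_name stays unset.
EXAMPLE = SanitizeParquetParams(input_data_source="sales_parquet")
```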
Usage Example
To use Sanitize Parquet in a Data App:
- Add your structured data source to the Data App canvas
- Click the “+” button next to the data source
- Select “Sanitize Parquet” from the actions menu
- Select the input data source (if not already selected)
- Optionally provide a name for the output Corvic Table
- Run the Data App to execute the sanitization
- Review the generated Corvic Table containing extracted metadata ready for further processing such as augmentation or text-to-SQL operations

