DuckDB

DuckDB is an embedded analytical database that runs in-process inside ilum-core. It provides zero-overhead, single-node SQL execution for small-to-medium data and ad-hoc exploration. Combined with the DuckLake catalog, DuckDB is a first-class option for fast local analytics over object storage.

DuckDB is enabled by default in Ilum.

When to use DuckDB

DuckDB is the right engine for:

  • Quick queries on small-to-medium datasets.
  • Ad-hoc exploration where pod startup latency would be a bottleneck.
  • Analytics over DuckLake-managed tables.
  • Single-user, single-node workloads.
  • Rapid prototyping before scaling out to Spark or Trino.

For distributed workloads on large data, prefer Apache Spark. For interactive analytics on large data with concurrent users, prefer Trino.

Execution model

DuckDB runs in-process with ilum-core:

  • No driver pod, no executor pods, no network round-trips for query execution.
  • Single-node parallelism via DuckDB's vectorized execution engine.
  • Direct reads from object storage (MinIO, S3, GCS, Azure Blob, HDFS) without copying data into a cluster.

This model delivers sub-second response times on small queries that would otherwise be dominated by Spark or Trino startup overhead.

DuckLake catalog

DuckLake is a DuckDB-native catalog enabled by default in Ilum. Tables created through DuckLake are stored on S3-compatible object storage and accessible through DuckDB SQL with no additional configuration.

DuckLake is the default catalog for new DuckDB workloads. Hive Metastore tables remain accessible to DuckDB through standard catalog connectors.

Supported table formats

DuckDB reads and writes:

  • Parquet: Native, with predicate pushdown and zone maps.
  • CSV, JSON: Direct read with schema inference.
  • DuckLake-managed tables: ACID writes through DuckLake.
  • Delta Lake and Iceberg: Read access through DuckDB extensions.
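
As a rough illustration of how these formats are usually addressed in DuckDB SQL, the sketch below maps each file-based format to the DuckDB table function commonly used to read it and builds a preview query. The helper name `build_scan_query` and the bucket path are hypothetical. `delta_scan` and `iceberg_scan` come from DuckDB's `delta` and `iceberg` extensions; whether those extensions are preloaded in a given Ilum deployment is an assumption to verify. DuckLake-managed tables are not covered here, since they are queried by table name rather than through a scan function.

```python
# Hypothetical helper: maps a format to the DuckDB table function
# commonly used to read it. Names and paths are illustrative only.
SCAN_FUNCTIONS = {
    "parquet": "read_parquet",
    "csv": "read_csv",
    "json": "read_json",
    "delta": "delta_scan",      # provided by the DuckDB `delta` extension
    "iceberg": "iceberg_scan",  # provided by the DuckDB `iceberg` extension
}


def build_scan_query(fmt: str, path: str, limit: int = 10) -> str:
    """Return a DuckDB SELECT statement that previews the data at `path`."""
    try:
        func = SCAN_FUNCTIONS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    # Double any single quotes so the path is a valid SQL string literal.
    literal = path.replace("'", "''")
    return f"SELECT * FROM {func}('{literal}') LIMIT {int(limit)};"


if __name__ == "__main__":
    # Example location only; replace with a path that exists in your storage.
    print(build_scan_query("parquet", "s3://example-bucket/events/*.parquet"))
```

The generated statement can be pasted into the SQL Editor with DuckDB selected as the engine, assuming the object storage credentials are already configured for ilum-core.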

Configuration

DuckDB and DuckLake are enabled out of the box. The relevant Helm values:

ilum-core:
  sql:
    duckdb:
      enabled: true
      idleTimeout: 1h
      ducklake:
        enabled: true

DuckLake table data is stored in MinIO (or any configured S3-compatible backend) at a path configurable through ilum-core.sql.duckdb.ducklake.path.

Selecting DuckDB in the SQL Editor

In the Ilum SQL Editor, the Engine Selector dropdown lets you choose DuckDB for any query. The engine status indicator confirms the in-process engine is ready.

When the automatic engine router is enabled, DuckDB is selected automatically for queries that target small datasets, DuckLake-managed tables, or ad-hoc exploration patterns.
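
The router's actual rules are not described on this page. Purely to illustrate the kind of decision involved, the sketch below encodes the guidance given here as a simple heuristic. The size threshold, the `QueryProfile` fields, and the `choose_engine` function are all assumptions made for the example and do not describe Ilum's implementation.

```python
from dataclasses import dataclass

# Assumed cutoff for "small" data; the real router's threshold is not documented here.
SMALL_DATA_BYTES = 10 * 1024**3  # 10 GiB


@dataclass
class QueryProfile:
    estimated_bytes: int          # estimated size of the data scanned
    concurrent_users: int = 1     # expected number of simultaneous users
    ducklake_table: bool = False  # True when the query targets a DuckLake-managed table


def choose_engine(q: QueryProfile) -> str:
    """Pick an engine following this page's guidance (illustrative only)."""
    # Small datasets and DuckLake-managed tables go to the in-process engine.
    if q.ducklake_table or q.estimated_bytes <= SMALL_DATA_BYTES:
        return "duckdb"
    # Large data: Trino for interactive multi-user analytics, Spark otherwise.
    if q.concurrent_users > 1:
        return "trino"
    return "spark"
```

For example, `choose_engine(QueryProfile(estimated_bytes=1024))` returns `"duckdb"`, while a terabyte-scale query with many concurrent users is routed to `"trino"`.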

Limitations

  • DuckDB is single-node; it does not scale horizontally across executors.
  • Query concurrency is bounded by the resources allocated to ilum-core.
  • Long-running queries should use Spark or Trino instead, both for resource isolation and for failure recovery.