This section covers the backends supported by Fugue. Backends are the execution engines that Fugue runs code on. It is common to mix and match execution engines: for example, big data processing can happen on SparkSQL, and DuckDB can then be used on the smaller, processed subset of the data.
Have questions? Chat with us on GitHub or Slack:
The following backends are supported in addition to Spark, Dask, and Ray.
Ibis is a Python framework for writing analytical workloads on top of data warehouses (as well as DataFrames). Ibis can be used together with Fugue to query data warehouses.
Polars is a DataFrame library written in Rust (with a Python API) that supports multi-threaded and out-of-core operations. Polars already parallelizes operations well on a local machine, so Fugue’s integration focuses on allowing Polars code to run on a cluster with Spark, Dask, or Ray. In certain use cases, this can improve the performance of distributed applications.
The following FugueSQL backends are supported in addition to SparkSQL.
Dask-SQL is a Dask project that provides a SQL interface on top of Dask DataFrames (including Dask on GPU). FugueSQL can use the Dask-SQL backend to run Dask-SQL and Dask code together.
DuckDB is an in-process SQL OLAP database management system. It is similar to SQLite but optimized for analytical workloads. DuckDB optimizes queries, which allows it to be 10x to 100x faster than Pandas in some cases. Good use cases include testing locally before moving to SparkSQL for big data, and using DuckDB to query the initial data before working with local Pandas for more complicated transformations.