FugueSQL is an alternative to the Fugue Python interface. Both describe your end-to-end workflow logic. SQL semantics are platform- and scale-agnostic, so logic written in SQL is high-level and abstract, and the underlying computing framework will try to execute it in an optimal way.
FugueSQL's syntax is a blend of standard SQL, JSON, and Python. The goals behind this design are:
To be fully compatible with standard SQL
To create a seamless flow between SQL and Python coding
To minimize syntax overhead, keeping code as short as possible while still easy to read
A full FugueSQL tutorial is available, so we only cover the basics here. That tutorial is also suitable for new users.
To use FugueSQL, make sure you have installed the SQL extra:
pip install "fugue[sql]"
With it installed, we can enable the %%fsql cell magic and use FugueSQL cells in Jupyter notebooks.
from fugue_notebook import setup
setup()
import pandas as pd
df = pd.DataFrame([[0, "hello"], [1, "world"]], columns=["a", "b"])
df.head()
FugueSQL maps to the same operations as the programming interface. All ANSI SQL keywords are available in FugueSQL.
%%fsql
SELECT * FROM df WHERE a=0
PRINT
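As a sanity check on what the query above selects, here is the same filter written directly in pandas. This is only a sketch of the query's semantics on the toy DataFrame from this page; it is not how Fugue executes the query internally, and the variable name `result` is introduced here purely for illustration.

```python
import pandas as pd

# Same toy DataFrame as defined earlier on this page
df = pd.DataFrame([[0, "hello"], [1, "world"]], columns=["a", "b"])

# pandas equivalent of: SELECT * FROM df WHERE a=0
# (boolean mask keeps only the rows where column "a" equals 0)
result = df[df["a"] == 0]

# Roughly what PRINT does in FugueSQL: display the resulting rows
print(result)
```

Only the row with a=0 and b="hello" survives the filter, which is the row the FugueSQL cell is expected to print.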
As with Fugue's programming interface, the same query can run on Spark or Dask by specifying the engine after %%fsql.
%%fsql spark
SELECT * FROM df WHERE a=0
PRINT
This interface lets users implement their logic in SQL. There are also ways to combine FugueSQL and Fugue extensions with Python; those are shown in the full FugueSQL tutorial. There is also an example of an end-to-end workflow in the COVID-19 examples.