FugueSQL
FugueSQL is an alternative to the Fugue Python interface. Both describe end-to-end workflow logic. SQL semantics are platform- and scale-agnostic, so logic written in SQL is high level and abstract, and the underlying computing framework will try to execute it in the optimal way.
The syntax of FugueSQL sits between standard SQL, JSON, and Python. The goals behind this design are:
To be fully compatible with the standard SQL SELECT statement
To create a seamless flow between SQL and Python coding
To minimize syntax overhead, keeping code as short as possible while still easy to read
A full FugueSQL tutorial is available and is written to be accessible to new users, so we only cover the basics here.
Hello World
To use FugueSQL, make sure you have installed the SQL extra:
pip install fugue[sql]
This lets us import the %%fsql cell magic and use FugueSQL cells in Jupyter notebooks.
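from fugue_notebook import setup
setup()  # enables the %%fsql cell magic in this notebook
import pandas as pd
# Small two-row DataFrame used by the queries below
df = pd.DataFrame([[0, "hello"], [1, "world"]], columns=["a", "b"])
df.head()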
| | a | b |
|---|---|---|
| 0 | 0 | hello |
| 1 | 1 | world |
FugueSQL statements map to the same operations as the programming interface. All ANSI SQL keywords are available in FugueSQL.
%%fsql
SELECT *
FROM df
WHERE a=0
PRINT
| | a | b |
|---|---|---|
| 0 | 0 | hello |
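For readers more familiar with pandas than SQL, the query above can be read as a row filter. The snippet below is only a rough pandas counterpart of the SELECT ... WHERE step, written under the assumption that the same two-row df is used. It does not call Fugue, and it is not a description of how Fugue executes the query internally. The name `result` is introduced here for illustration.

```python
import pandas as pd

# Same sample data as in the notebook cell above
df = pd.DataFrame([[0, "hello"], [1, "world"]], columns=["a", "b"])

# Rough pandas counterpart of: SELECT * FROM df WHERE a=0
# (`result` is a hypothetical name, not part of Fugue)
result = df[df["a"] == 0]

# Stand-in for the PRINT step of the FugueSQL cell
print(result)
```

The SQL version has the advantage that it does not reference pandas at all, which is what allows the same text to be handed to a different engine.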
As with Fugue's programming interface, the same query can be brought to Spark or Dask by specifying the engine after %%fsql.
%%fsql spark
SELECT *
FROM df
WHERE a=0
PRINT
| | a | b |
|---|---|---|
| 0 | 0 | hello |
This interface lets users implement their logic in SQL. There are also ways to combine FugueSQL and Fugue extensions with Python, which are shown in the full FugueSQL tutorial. An end-to-end workflow example is available in the COVID-19 examples.