FugueSQL


FugueSQL is an alternative to the Fugue Python interface. Both describe your end-to-end workflow logic. SQL semantics are platform- and scale-agnostic, so logic written in SQL is high level and abstract, and the underlying computing framework will try to execute it in the optimal way.

The syntax of FugueSQL sits between standard SQL, JSON, and Python. The goals behind this design are:

  • To be fully compatible with the standard SQL SELECT statement

  • To create a seamless flow between SQL and Python coding

  • To minimize syntax overhead, keeping code as short as possible while still easy to read

There is a full FugueSQL tutorial, so we will only cover the basics here. The FugueSQL tutorial is also accessible to new users.

Hello World

To use FugueSQL, make sure you have installed the SQL extra:

pip install "fugue[sql]"

This lets us enable the %%fsql cell magic and use FugueSQL cells in Jupyter notebooks:

from fugue_notebook import setup
setup()  # registers the %%fsql cell magic

import pandas as pd

# Small example DataFrame used by the queries below
df = pd.DataFrame([[0, "hello"], [1, "world"]], columns=["a", "b"])
df.head()
a b
0 0 hello
1 1 world

FugueSQL is mapped to the same operations as the programming interface. All ANSI SQL keywords are available in FugueSQL.

%%fsql
SELECT * 
  FROM df
 WHERE a=0 
 PRINT
a b
0 0 hello
schema: a:long,b:str
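
The SELECT portion of the cell above is plain SQL; only the trailing PRINT is a FugueSQL keyword. As a rough illustration of that compatibility goal, the sketch below runs the same SELECT through Python's built-in sqlite3 module. SQLite is not part of Fugue and is used here only as a stand-in SQL engine; the table name df and the column types are assumptions chosen to mirror the DataFrame above.

```python
import sqlite3

# Same two rows as the pandas DataFrame in the tutorial.
rows = [(0, "hello"), (1, "world")]

# In-memory database; the table is named "df" only to mirror the tutorial.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO df VALUES (?, ?)", rows)

# The SELECT from the FugueSQL cell, without the FugueSQL-specific PRINT.
result = conn.execute("SELECT * FROM df WHERE a=0").fetchall()
conn.close()

print(result)  # [(0, 'hello')]
```

The query text is unchanged from the FugueSQL cell, which is the point: a standard SELECT statement can be moved into FugueSQL as-is, and FugueSQL keywords such as PRINT are layered on top.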

Similar to the programming interface of Fugue, we can also bring the same query to Spark or Dask by specifying the engine after %%fsql:

%%fsql spark
SELECT * 
  FROM df
 WHERE a=0 
 PRINT
a b
0 0 hello
schema: a:long,b:str

This interface lets users implement their logic in SQL. There are also ways to combine FugueSQL and Fugue extensions with Python. Those are shown in the full FugueSQL tutorial. There is also an example of an end-to-end workflow in the COVID-19 examples.