Using Python#

FugueSQL integrates with Python by creating extensions and applying them in the %%fsql cells. This tutorial will show an example of applying a Fugue Transformer. Additionally, we’ll see the simpler ways that the Python interface and FugueSQL interact with each other.

from fugue_jupyter import setup
setup()
import pandas as pd
df = pd.DataFrame({"number": [0,1], "word": ["hello", "world"]})

Jinja Templating to Insert Variables#

Before going to functions, the simplest way FugueSQL integrates with Python is through Jinja templating. DataFrames defined previously are automatically accessible by the DAG. Variables on the other hand, need to be passed with Jinja templating.

x=0
%%fsql
SELECT * FROM df WHERE number={{x}}  # see we can use variable x directly
PRINT
number word
0 0 hello
schema: number:long,word:str

Using Transformers#

Fugue has different extensions that allow Python to interact with SQL. Transformer is most commonly used because it modifies that data in a dataframe. Below we create a Transformer in Python and apply it in the FugueSQL. More on Transformer syntax can be found here.

import re
from typing import Iterable, Dict, Any

# schema: *, vowel_count:int, consonant_count:int
def letter_count(df:Iterable[Dict[str,Any]]) -> Iterable[Dict[str,Any]]:
    for row in df:
        row['vowel_count'] = len(re.findall(r'[aeiou]', row['word'], flags=re.IGNORECASE))
        space_count = len(re.findall(r'[ -]', row['word'], flags=re.IGNORECASE))
        row['consonant_count'] = len(row['word']) - row['vowel_count'] - space_count
        yield row
%%fsql
SELECT * 
FROM df 
WHERE number=1
TRANSFORM USING letter_count
PRINT
number word vowel_count consonant_count
0 1 world 1 4
schema: number:long,word:str,vowel_count:int,consonant_count:int