Welcome to the Fugue Tutorials!

All questions are welcome in the Slack channel.

Binder ⬅️ Launch these tutorials in Binder

Homepage ⬅️ Check out our source code

Slack Status ⬅️ Chat with us on Slack

What Does Fugue Do?

Fugue provides a simpler interface to distributed computing and helps accelerate big data projects. It does this by minimizing the amount of code you need to write and by handling the tricks and optimizations that lead to more efficient execution on distributed compute.

Quick Links:

  • Bringing a Python/Pandas function to Spark or Dask? Check the Fugue Transform section.

  • Need a SQL interface on top of Pandas, Spark and Dask? Check the FugueSQL section.

  • For previous conference presentations and blog posts, check the Resources section.

Installation

To set up your own environment, pip install the package. This includes Fugue on native Python, Spark and Dask, with FugueSQL support.

  • Spark requires Java to be installed separately.

pip install "fugue[all]"
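After installing, a quick sanity check can confirm that the pieces mentioned above are visible to your interpreter. This is a minimal sketch using only the standard library; the helper name check_environment is hypothetical, and it only tests whether the packages can be located and whether a java executable is on the PATH, not whether the versions are compatible.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report which parts of a Fugue setup can be found (hypothetical helper)."""
    modules = ["fugue", "pyspark", "dask"]
    # find_spec returns None when a top-level package cannot be located.
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    # Spark needs Java, which pip does not install.
    report["java"] = shutil.which("java") is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If java is reported missing, the Spark examples are unlikely to run until Java is installed separately.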

Running the Code

The simplest way to run the tutorials interactively is to use Binder (mybinder), which spins up an environment in a container. Keep two limitations in mind:

  • Some code snippets run slowly on Binder because its machines are not powerful enough for a distributed framework such as Spark.

  • Parallel executions can become sequential, so some of the performance comparison examples will not produce representative numbers.

Alternatively, you should get decent performance by running the tutorials' Docker image on your own machine:

docker run -p 8888:8888 fugueproject/tutorials:latest