Welcome to the Fugue Tutorials!
All questions are welcome in the Slack channel.
What Does Fugue Do?
Fugue provides a simpler interface to distributed compute and helps accelerate big data projects. It does this by minimizing the amount of code you need to write and by taking care of the tricks and optimizations that lead to more efficient execution on distributed compute.
To set up your own environment, pip install the package. The command below installs Fugue with support for native Python, Spark, and Dask, along with Fugue SQL.
Note that Spark requires Java, which must be installed separately.
pip install "fugue[all]"
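After installing, it can be handy to confirm that the pieces mentioned above are actually visible from your Python interpreter. The following is a small sketch using only the standard library; the helper name `check_environment` is hypothetical and not part of Fugue. It only checks that the packages can be located and that a `java` executable is on the `PATH`, not that Spark will start successfully.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report which parts of a Fugue setup can be found (hypothetical helper)."""
    packages = ["fugue", "pyspark", "dask"]
    # find_spec locates a top-level package without importing it.
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    # Spark needs a Java runtime; look for a `java` executable on PATH.
    report["java"] = shutil.which("java") is not None
    return report


report = check_environment()
for name, available in report.items():
    print(f"{name}: {'found' if available else 'missing'}")
```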
Running the Code
The simplest way to run the tutorials interactively is to use Binder (mybinder), which spins up an environment in a container.
Some code snippets run slowly on Binder because its machines are not powerful enough for a distributed framework such as Spark.
Parallel executions can become sequential there, so some of the performance comparison examples will not give representative numbers.
Alternatively, you should get decent performance by running the tutorials' Docker image on your own machine:
docker run -p 8888:8888 fugueproject/tutorials:latest