Welcome to the Fugue Tutorials!
Have questions? Chat with us on GitHub or Slack.
What Does Fugue Do?
Fugue provides an easier interface for using distributed compute effectively and accelerates big data projects. It does this by minimizing the amount of code you need to write and by taking care of the tricks and optimizations that lead to more efficient execution on distributed compute. Fugue ports Python, Pandas, and SQL code to Spark, Dask, and Ray.
To set up your own environment, install the package with pip (or conda). This includes Fugue on native Python, Spark, and Dask, with Fugue SQL support.
Spark requires Java to be installed separately.
pip install fugue
Backend engines are installed separately through pip extras. For example, to install Spark:
pip install "fugue[spark]"
If Spark, Dask, or Ray is already installed on your machine, Fugue will be able to detect it.
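To see which of these backends are importable in your current environment, a quick standard-library check such as the one below can help. This is only an illustrative sketch and is not Fugue's own detection logic. The helper name `installed_backends` is hypothetical, and the module names (`pyspark`, `dask`, `ray`) are the usual import names for these projects.

```python
from importlib.util import find_spec

# Display name -> top-level module that the backend installs
BACKENDS = {"Spark": "pyspark", "Dask": "dask", "Ray": "ray"}


def installed_backends() -> dict:
    # find_spec returns None when a top-level module cannot be found
    return {name: find_spec(module) is not None for name, module in BACKENDS.items()}


for name, available in installed_backends().items():
    print(f"{name}: {'installed' if available else 'not installed'}")
```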
Running the Code
The simplest way to run the tutorial interactively is to use mybinder. Binder spins up an environment using a container.
Some code snippets run slowly on Binder because its machines are not powerful enough for a distributed framework such as Spark.
Parallel executions can become sequential there, so some of the performance comparison examples will not give representative numbers.
Alternatively, you should get decent performance by running the tutorials' Docker image on your own machine:
docker run -p 8888:8888 fugueproject/tutorials:latest