Welcome to the Fugue Tutorials!
Have questions? Chat with us on GitHub or Slack.
Fugue provides an easier interface for using distributed compute effectively and accelerates big data projects. It does this by minimizing the amount of code you need to write and by taking care of the tricks and optimizations that lead to more efficient execution on distributed compute. Fugue ports Python, Pandas, and SQL code to Spark, Dask, and Ray.
Quick Links:
Scaling Pandas code to Spark, Dask, or Ray? Start with Fugue in 10 minutes.
Need a SQL interface on top of Pandas, Spark and Dask? Check FugueSQL in 10 minutes.
For previous conference presentations and blog posts, check the Content page.
Installation
To set up your own environment, you can pip (or conda) install the package:
pip install fugue
Backend engines are installed separately through pip extras. For example, to install Spark:
pip install "fugue[spark]"
If Spark, Dask, or Ray is already installed on your machine, Fugue will be able to detect it. Spark requires Java, which must be installed separately.
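Before installing any extras, it can help to see which backends are already importable in your environment. The sketch below is not how Fugue itself performs detection; it is a standalone check that uses only the Python standard library and assumes the usual import names `pyspark`, `dask`, and `ray`. The `installed_backends` helper is a hypothetical name introduced for illustration.

```python
from importlib.util import find_spec

# Assumed import names for the optional backends (not taken from Fugue).
BACKENDS = {"Spark": "pyspark", "Dask": "dask", "Ray": "ray"}


def installed_backends() -> dict:
    """Map each backend name to True if its package can be imported."""
    return {
        name: find_spec(module) is not None
        for name, module in BACKENDS.items()
    }


if __name__ == "__main__":
    for name, available in installed_backends().items():
        status = "installed" if available else "not installed"
        print(f"{name}: {status}")
```

Any backend reported as not installed can be added through the matching pip extra.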
Running the Code
The simplest way to run the tutorial interactively is to use mybinder. Binder spins up an environment using a container.
Some code snippets run slowly on Binder because its machines are not powerful enough for a distributed framework such as Spark.
Parallel executions can become sequential there, so some of the performance comparison examples will not give you representative numbers.
Alternatively, you should get decent performance if running the Docker image on your own machine:
docker run -p 8888:8888 fugueproject/tutorials:latest
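Whichever environment you choose, a quick look at how many CPU cores the Python process can use gives a rough idea of whether the performance comparisons will be meaningful. This is a minimal sketch using only the standard library; `available_cores` is a hypothetical helper, not part of Fugue or the tutorials.

```python
import os


def available_cores() -> int:
    """Return the number of CPU cores this process may use (at least 1)."""
    try:
        # Linux only: respects container and affinity limits.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: fall back to the total core count.
        return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"Usable CPU cores: {available_cores()}")
```

With only one usable core, parallel work runs sequentially, so timing comparisons between engines are not informative.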