01-05-2021



After reading this, you will be able to run Python files and Jupyter notebooks that execute Apache Spark code in your local environment. This tutorial applies to OS X and Linux systems. We assume you are already familiar with Python and a console environment.


1. Download Apache Spark


We will download the latest version available at the time of writing, 3.0.1, from the official website.

Download it and extract it on your computer. The path I'll be using for this tutorial is /Users/myuser/bigdata/spark. This folder will contain all the extracted Spark files.


Now, edit the .bashrc file, located in your user's home directory.

Then we will update our environment variables so that we can run Spark programs and our Python environments can locate the Spark libraries.
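The exact lines to add depend on where Spark was extracted. As a rough sketch, the Python helper below prints export lines that could be appended to .bashrc. The helper name spark_bashrc_lines is hypothetical, and the choice of variables (SPARK_HOME, PATH, PYTHONPATH) is an assumption based on a typical Spark setup, not something taken from this tutorial. It looks for the py4j archive that Spark bundles under python/lib, because the version in that file name changes between releases.

```python
import glob
import os


def spark_bashrc_lines(spark_home):
    """Build export lines for ~/.bashrc (hypothetical helper, assumed variable set)."""
    lines = [
        f'export SPARK_HOME="{spark_home}"',
        # Makes spark-shell, pyspark and spark-submit available on the command line.
        'export PATH="$SPARK_HOME/bin:$PATH"',
    ]
    python_path = ["$SPARK_HOME/python"]
    # Spark ships py4j as a zip under python/lib; pick up whatever version is there.
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    for zip_path in sorted(glob.glob(pattern)):
        python_path.append("$SPARK_HOME/python/lib/" + os.path.basename(zip_path))
    python_path.append("$PYTHONPATH")
    lines.append('export PYTHONPATH="' + ":".join(python_path) + '"')
    return lines


# The path below is the one used in this tutorial; adjust it to your own.
print("\n".join(spark_bashrc_lines("/Users/myuser/bigdata/spark")))
```

If the folder does not exist yet, the output simply omits the py4j entry, so it is worth running this only after extracting the archive.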

Save the file and load the changes by executing $ source ~/.bashrc. If this worked, you will be able to open a Spark shell.
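Besides opening a Spark shell, a quick way to confirm that the new variables are visible is a small check from Python. This is only a sketch: check_spark_setup is a hypothetical helper, and it assumes the setup exposes SPARK_HOME and puts the Spark bin folder on PATH.

```python
import os
import shutil


def check_spark_setup(env=None):
    """Report SPARK_HOME and where the Spark launchers are found (hypothetical helper)."""
    env = os.environ if env is None else env
    search_path = env.get("PATH")
    return {
        "SPARK_HOME": env.get("SPARK_HOME"),
        # shutil.which returns None when the command is not found on the search path.
        "spark-shell": shutil.which("spark-shell", path=search_path),
        "pyspark": shutil.which("pyspark", path=search_path),
    }


for name, value in check_spark_setup().items():
    print(f"{name}: {value or 'not found'}")
```

Any entry reported as "not found" suggests the terminal has not picked up the updated .bashrc yet.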

We are now done installing Spark.

2. Install Visual Studio Code

One of the good things about this IDE is that it allows us to run Jupyter notebooks inside it. Follow the set-up instructions and then install Python and the VS Code Python extension.

Then, open a new terminal and install the pyspark package via pip: $ pip install pyspark. Note: depending on your installation, the command may be pip3 instead.
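To confirm that the package landed in the interpreter VS Code is using, a short check like the following can help. It is a minimal sketch: installed_pyspark_version is a hypothetical helper name, and it only reports whether pyspark can be imported, not whether Java or Spark itself work.

```python
import importlib.util


def installed_pyspark_version():
    """Return the installed pyspark version, or None if it cannot be imported."""
    if importlib.util.find_spec("pyspark") is None:
        return None
    import pyspark

    return pyspark.__version__


version = installed_pyspark_version()
print(f"pyspark {version}" if version else "pyspark is not installed for this interpreter")
```

If this reports that pyspark is missing even though pip succeeded, pip and the selected interpreter probably belong to different Python installations.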

3. Run your pyspark code

Create a new file or notebook in VS Code and you should be able to execute and get some results using the Pi example provided by the library itself.
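For reference, here is a sketch along the lines of the Pi estimation example that ships with Spark (examples/src/main/python/pi.py). It is adapted rather than copied: the function names, the default sample counts and the wrapping into estimate_pi are my own assumptions. It throws random points at a square and counts how many land inside the unit circle.

```python
from operator import add
from random import random


def inside(_):
    """Return 1 if a random point in the square [-1, 1] x [-1, 1] is inside the unit circle."""
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0


def estimate_pi(partitions=2, samples_per_partition=100000):
    """Estimate Pi on a local Spark session (hypothetical helper, not part of pyspark)."""
    # Imported here so the file can still be loaded when pyspark is missing.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PythonPi").getOrCreate()
    try:
        n = samples_per_partition * partitions
        count = (
            spark.sparkContext
            .parallelize(range(n), partitions)
            .map(inside)
            .reduce(add)
        )
        # The circle covers pi/4 of the square, so scale the hit ratio by 4.
        return 4.0 * count / n
    finally:
        spark.stop()
```

Adding print(estimate_pi()) as the last line of the file or notebook cell should print a value close to 3.14, with small variations between runs because the points are random.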

Troubleshoot


If you are on a distribution that installs only python3 by default (e.g. Ubuntu 20.04), pyspark will most likely fail with an error message like env: 'python': No such file or directory.

The first option to fix it is to add the following content to your .profile or .bashrc file

Remember to always reload the configuration via source .bashrc
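The same idea can also be applied from inside a script or notebook, which may help when the editor does not read your shell configuration. The sketch below assumes the relevant variables are PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON, which Spark uses to choose the Python executable; the helper name use_current_interpreter is hypothetical, and whether this removes the error in your setup is not guaranteed.

```python
import os
import sys


def use_current_interpreter():
    """Point PySpark at the interpreter running this script or notebook."""
    # PYSPARK_PYTHON selects the Python used by workers,
    # PYSPARK_DRIVER_PYTHON the one used by the driver.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
    return sys.executable


# Call this before the first SparkSession is created.
print("PySpark will use:", use_current_interpreter())
```

The variables only affect the current process and the Spark processes it launches, so this has to run in every script or notebook that needs it.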

In my case, this solution worked when I executed pyspark from the command line but not from a VS Code notebook. Since I am using a Debian-based distribution, installing the following package fixed it:

sudo apt-get install python-is-python3