Install Apache Spark for Ubuntu


Apache Spark is a framework for processing big data. The platform gained widespread popularity thanks to its ease of use and faster data processing compared to Hadoop. Apache Spark can distribute a workload across a group of computers in a cluster to process large data sets more efficiently. This open-source tool supports several programming languages, including Java, Scala, Python, and R. In this article, I will walk through how to install and configure Apache Spark on Ubuntu.

Install the necessary packages for Spark

Before installing Apache Spark, your computer must have the following environments installed: Java, Scala, and Git. If it does not, open a terminal and install them all with the following commands:
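A minimal sketch of the installation, assuming Ubuntu's default repositories (the `default-jdk` and `scala` package names are the standard Ubuntu ones; adjust if you need a specific Java or Scala version):

```shell
# Refresh the package lists, then install Java, Scala, and Git
sudo apt update
sudo apt install -y default-jdk scala git
```
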

To check whether the Java and Scala environments are installed on your machine, use the following commands:
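For example, each tool prints its version if it is installed; a "command not found" error means the corresponding package is missing:

```shell
# Print the installed versions of Java, Scala, and Git
java -version
scala -version
git --version
```
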

Download and set up Spark for Ubuntu

To download Apache Spark for Ubuntu, visit the official Apache Spark download page and choose the version appropriate for your computer.
Copy the compressed file to wherever you want to put Spark. Added software is usually placed in the /opt directory on Ubuntu, but you can put it anywhere you find convenient. Extract the archive with the following command:
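A sketch of the extraction step; the filename below is an example, so substitute the version you actually downloaded:

```shell
# Extract the Spark archive into /opt
# (spark-3.5.1-bin-hadoop3.tgz is an example filename)
sudo tar -xzf spark-3.5.1-bin-hadoop3.tgz -C /opt
```
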

Configure Spark environment

In your home folder, show hidden files, open the .profile file, and add the following lines at the end of the file:

For example, I configured my .profile file as follows (note that your computer must also have python3 installed):
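A minimal .profile sketch, assuming Spark was extracted to /opt (the versioned path is an example; point `SPARK_HOME` at your own Spark directory):

```shell
# Appended to ~/.profile -- adjust SPARK_HOME to your install location
export SPARK_HOME=/opt/spark-3.5.1-bin-hadoop3
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_PYTHON=python3
```

After saving, run `source ~/.profile` (or log out and back in) so the changes take effect in your current session.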

Start Spark Standalone

Use the cd command to move into the sbin folder inside your Spark directory. Run the following command to launch Spark:
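A sketch of launching the standalone master (and, optionally, a local worker); the `spark://localhost:7077` URL is the usual default, but check the master's log or Web UI for the exact URL it reports:

```shell
# From $SPARK_HOME/sbin: start the standalone master
./start-master.sh

# Optionally, attach a worker to the local master
./start-worker.sh spark://localhost:7077
```
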

To see the Spark Web UI, open a web browser and go to localhost on port 8080 (http://localhost:8080):

(Remember to turn off all other applications that share ports with Spark to avoid conflicts.)
Here is the interface after a successful launch (wait for about 10 seconds):



Source: Viblo