Install Apache Spark for Ubuntu


Apache Spark is a framework for processing big data. The platform gained widespread popularity thanks to its ease of use and faster data processing compared to Hadoop. Apache Spark can distribute a workload across a group of computers in a cluster to process large data sets more efficiently. This open-source tool supports several programming languages, including Java, Scala, Python, and R. In this article, I will walk through how to install and configure Apache Spark on Ubuntu.

Install the necessary packages for Spark

Before installing Apache Spark, your computer must have the following environments installed: Java, Scala, and Git. If it does not, open a terminal and install them all with the following commands:
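A minimal sketch of the installation, assuming Ubuntu's default repositories (the `default-jdk` and `scala` package names are the standard Ubuntu ones; adjust if you need a specific Java or Scala version):

```shell
# Refresh the package lists, then install Java, Scala, and Git
sudo apt update
sudo apt install -y default-jdk scala git
```
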

To check whether the Java and Scala environments are installed on your machine, use the following commands:
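For example, each tool prints its version if it is installed; a "command not found" error means the corresponding package is missing:

```shell
# Print the installed versions of Java, Scala, and Git
java -version
scala -version
git --version
```
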

Download and set up Spark for Ubuntu

To download Apache Spark for Ubuntu, visit the official Apache Spark download page and choose the version appropriate for your computer.
Copy the compressed file to wherever you want to put Spark. Added software is usually placed in the /opt directory on Ubuntu, but you can put it anywhere you find convenient. Extract the archive with the following command:
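A sketch of the extraction step; the filename below is an example, so substitute the version you actually downloaded:

```shell
# Extract the Spark archive into /opt
# (spark-3.5.1-bin-hadoop3.tgz is an example filename)
sudo tar -xzf spark-3.5.1-bin-hadoop3.tgz -C /opt
```
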

Configure Spark environment

In your home folder, show hidden files, open the .profile file, and add the following lines at the end of the file:

For example, I configured my .profile file as follows (note that your computer must also have python3 installed):
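A minimal .profile sketch, assuming Spark was extracted to /opt (the versioned path is an example; point `SPARK_HOME` at your own Spark directory):

```shell
# Appended to ~/.profile -- adjust SPARK_HOME to your install location
export SPARK_HOME=/opt/spark-3.5.1-bin-hadoop3
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_PYTHON=python3
```

After saving, run `source ~/.profile` (or log out and back in) so the changes take effect in your current session.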

Start Spark Standalone

Use the cd command to move into the sbin folder inside your Spark directory. Run the following command to launch Spark:
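A sketch of launching the standalone master (and, optionally, a local worker); the `spark://localhost:7077` URL is the usual default, but check the master's log or Web UI for the exact URL it reports:

```shell
# From $SPARK_HOME/sbin: start the standalone master
./start-master.sh

# Optionally, attach a worker to the local master
./start-worker.sh spark://localhost:7077
```
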

To see the Spark Web UI, open a web browser and go to localhost on port 8080 (http://localhost:8080):

(Remember to turn off all other applications that share ports with Spark to avoid conflicts.)
Here is the interface after a successful launch (wait for about 10 seconds):



Source: Viblo