Introduction
The following article (fair warning: it is quite heavy on the technical side) provides a step-by-step guide to installing Hadoop 2.6 on a 64-bit CentOS 7 system.
Prerequisites
- Time (a few hours)
- An internet connection (in some capacity)
- VirtualBox or VMware (or access to a cloud machine)
- Root access (you can install Hadoop without root access, but it is a bit more complicated; root access is required only during the installation phase, not for running the applications/services).
How to:
1. Download VMware Player* or Oracle VirtualBox.
2. Download CentOS 7 ISO image** or any other distro based on RHEL.
3. Install VM software.
4. Install CentOS from the ISO image.
5. Launch Installed VM.
6. Open Terminal.
7. Switch to root user.
8. Execute the following:
# sudo su -
# sudo yum update
9. Install all updates and remove existing JAVA:
# sudo yum remove java
10. Download Oracle JAVA***
a. Download the 64bit .rpm package
b. Execute # yum localinstall <java_package_name>.rpm
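For example, if you downloaded the 64-bit JDK 8u65 rpm, the command would look like the following (the file name is only an illustration; use whatever package you actually downloaded):
# yum localinstall jdk-8u65-linux-x64.rpm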
11. Set JAVA_HOME****
# vi /etc/profile.d/java.sh
12. Add the following lines:
#!/bin/bash
JAVA_HOME=/usr/java/default
PATH=$JAVA_HOME/bin:$PATH
export PATH JAVA_HOME
# chmod +x /etc/profile.d/java.sh
# source /etc/profile.d/java.sh
13. Check java:
# java -version
Which should return the java version:
# echo $JAVA_HOME
Which in turn should return the java home dir path.
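For reference, the output of these two checks should look roughly like this (the version strings below are from JDK 8u65; yours will reflect whatever JDK you installed):
# java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
# echo $JAVA_HOME
/usr/java/default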
14. Download Maven
# tar -zxvf <maven_package_name>.tar.gz -C /opt/
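For example, with Maven 3.3.9 (the archive name is only an illustration; substitute the one you actually downloaded):
# tar -zxvf apache-maven-3.3.9-bin.tar.gz -C /opt/
In that case the Maven directory under /opt would be apache-maven-3.3.9, which is the value you will use for M3_HOME in the next step.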
15. Set M3_HOME
# vi /etc/profile.d/maven.sh
16. Add the following lines:
#!/bin/bash
M3_HOME=/opt/<maven_dir_name>
PATH=$M3_HOME/bin:$PATH
export PATH M3_HOME
# chmod +x /etc/profile.d/maven.sh
# source /etc/profile.d/maven.sh
17. Check Maven
# mvn -version
Which should return the Maven version:
# echo $M3_HOME
Which in turn should return the Maven home dir path.
18. Download the following tools for Hadoop native code compilation.
# yum groupinstall "Development Tools"
# yum install openssl-devel zlib-devel
19. Install Protocol Buffers*****
# yum -y install protobuf*
20. Prep for Hadoop: execute the following commands
# groupadd hadoop
# useradd -g hadoop yarn (Note: the yarn user will be used to run the node manager)
# useradd -g hadoop hdfs (Note: the hdfs user is for everything related to the HDFS file system)
# useradd -g hadoop mapred (Note: the mapred user is related to MapReduce jobs)
(Note: You can set passwords for these users if you like.)
21. Log in as hdfs (Note: This step is required because Hadoop needs an ssh connection without a passphrase.)
# su - hdfs
# ssh-keygen -t rsa -P ""
# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# chmod 0600 ~/.ssh/authorized_keys
22. Test ssh
# ssh localhost date
(Type "yes" when asked to confirm the host key; the command should print the date without prompting for a password.)
23. Exit hdfs user
# exit
24. Download Apache Hadoop (source)
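Any Apache mirror will do. For example, the 2.6.2 source tarball can be fetched from the Apache archive (the URL is just one option; adjust it to the version you want, and install wget via yum if it is missing):
# wget http://archive.apache.org/dist/hadoop/common/hadoop-2.6.2/hadoop-2.6.2-src.tar.gz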
25. Extract tar file into /opt Dir
# tar -zxvf <hadoop_package_name>.tar.gz -C /opt/
26. Navigate to the new Hadoop dir
# cd /opt/<hadoop_dir_name>/
27. Edit the pom.xml file and add <additionalparam>-Xdoclint:none</additionalparam> to the properties section. For example:
…
<!-- platform encoding override -->
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<additionalparam>-Xdoclint:none</additionalparam>
</properties>
…
(Note: This step is only required if you decided to use Java 8.)
28. Execute the following commands:
# cd ..
# chown hdfs:hadoop <hadoop_dir_name> -R
(Note: Make sure there are no permission problems left that could block the build.)
29. Build the native Hadoop library
# su - hdfs
# cd /opt/<hadoop_dir_name>
# mvn package -Pdist,native -DskipTests -Dtar
Go grab some coffee or tea, as this takes a while. Building the native library is not mandatory, but it is recommended. Here is what you should see by the end of the process:
[INFO] ————————————————————————
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main …………………………… SUCCESS [ 16.389 s]
[INFO] Apache Hadoop Project POM …………………….. SUCCESS [ 6.905 s]
[INFO] Apache Hadoop Annotations …………………….. SUCCESS [ 8.923 s]
[INFO] Apache Hadoop Assemblies ……………………… SUCCESS [ 0.340 s]
[INFO] Apache Hadoop Project Dist POM ………………… SUCCESS [ 5.277 s]
[INFO] Apache Hadoop Maven Plugins …………………… SUCCESS [ 8.378 s]
[INFO] Apache Hadoop MiniKDC ………………………… SUCCESS [02:25 min]
[INFO] Apache Hadoop Auth …………………………… SUCCESS [01:47 min]
[INFO] Apache Hadoop Auth Examples …………………… SUCCESS [ 4.060 s]
[INFO] Apache Hadoop Common …………………………. SUCCESS [03:10 min]
[INFO] Apache Hadoop NFS ……………………………. SUCCESS [ 7.413 s]
[INFO] Apache Hadoop KMS ……………………………. SUCCESS [ 45.635 s]
[INFO] Apache Hadoop Common Project ………………….. SUCCESS [ 0.046 s]
[INFO] Apache Hadoop HDFS …………………………… SUCCESS [02:32 min]
[INFO] Apache Hadoop HttpFS …………………………. SUCCESS [ 21.490 s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ………….. SUCCESS [ 17.206 s]
[INFO] Apache Hadoop HDFS-NFS ……………………….. SUCCESS [ 4.122 s]
[INFO] Apache Hadoop HDFS Project ……………………. SUCCESS [ 0.044 s]
[INFO] hadoop-yarn …………………………………. SUCCESS [ 0.054 s]
[INFO] hadoop-yarn-api ……………………………… SUCCESS [ 37.593 s]
[INFO] hadoop-yarn-common …………………………… SUCCESS [01:36 min]
[INFO] hadoop-yarn-server …………………………… SUCCESS [ 0.036 s]
[INFO] hadoop-yarn-server-common …………………….. SUCCESS [ 15.557 s]
[INFO] hadoop-yarn-server-nodemanager ………………… SUCCESS [ 42.800 s]
[INFO] hadoop-yarn-server-web-proxy ………………….. SUCCESS [ 2.961 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ……. SUCCESS [ 6.280 s]
[INFO] hadoop-yarn-server-resourcemanager …………….. SUCCESS [ 20.282 s]
[INFO] hadoop-yarn-server-tests ……………………… SUCCESS [ 5.231 s]
[INFO] hadoop-yarn-client …………………………… SUCCESS [ 7.769 s]
[INFO] hadoop-yarn-applications ……………………… SUCCESS [ 0.031 s]
[INFO] hadoop-yarn-applications-distributedshell ………. SUCCESS [ 3.625 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ….. SUCCESS [ 2.082 s]
[INFO] hadoop-yarn-site …………………………….. SUCCESS [ 0.038 s]
[INFO] hadoop-yarn-registry …………………………. SUCCESS [ 5.406 s]
[INFO] hadoop-yarn-project ………………………….. SUCCESS [ 6.252 s]
[INFO] hadoop-mapreduce-client ………………………. SUCCESS [ 0.080 s]
[INFO] hadoop-mapreduce-client-core ………………….. SUCCESS [ 22.981 s]
[INFO] hadoop-mapreduce-client-common ………………… SUCCESS [ 17.918 s]
[INFO] hadoop-mapreduce-client-shuffle ……………….. SUCCESS [ 4.349 s]
[INFO] hadoop-mapreduce-client-app …………………… SUCCESS [ 10.538 s]
[INFO] hadoop-mapreduce-client-hs ……………………. SUCCESS [ 8.806 s]
[INFO] hadoop-mapreduce-client-jobclient ……………… SUCCESS [ 9.771 s]
[INFO] hadoop-mapreduce-client-hs-plugins …………….. SUCCESS [ 1.889 s]
[INFO] Apache Hadoop MapReduce Examples ………………. SUCCESS [ 5.765 s]
[INFO] hadoop-mapreduce …………………………….. SUCCESS [ 4.789 s]
[INFO] Apache Hadoop MapReduce Streaming ……………… SUCCESS [ 8.040 s]
[INFO] Apache Hadoop Distributed Copy ………………… SUCCESS [ 9.787 s]
[INFO] Apache Hadoop Archives ……………………….. SUCCESS [ 2.165 s]
[INFO] Apache Hadoop Rumen ………………………….. SUCCESS [ 6.321 s]
[INFO] Apache Hadoop Gridmix ………………………… SUCCESS [ 4.502 s]
[INFO] Apache Hadoop Data Join ………………………. SUCCESS [ 2.613 s]
[INFO] Apache Hadoop Ant Tasks ………………………. SUCCESS [ 2.081 s]
[INFO] Apache Hadoop Extras …………………………. SUCCESS [ 3.048 s]
[INFO] Apache Hadoop Pipes ………………………….. SUCCESS [ 7.640 s]
[INFO] Apache Hadoop OpenStack support ……………….. SUCCESS [ 4.934 s]
[INFO] Apache Hadoop Amazon Web Services support ………. SUCCESS [ 24.968 s]
[INFO] Apache Hadoop Client …………………………. SUCCESS [ 8.046 s]
[INFO] Apache Hadoop Mini-Cluster ……………………. SUCCESS [ 0.084 s]
[INFO] Apache Hadoop Scheduler Load Simulator …………. SUCCESS [ 5.169 s]
[INFO] Apache Hadoop Tools Dist ……………………… SUCCESS [ 9.050 s]
[INFO] Apache Hadoop Tools ………………………….. SUCCESS [ 0.025 s]
[INFO] Apache Hadoop Distribution ……………………. SUCCESS [ 36.246 s]
[INFO] ————————————————————————
[INFO] BUILD SUCCESS
[INFO] ————————————————————————
[INFO] Total time: 20:28 min
[INFO] Finished at: 2015-11-23T07:50:32-08:00
[INFO] Final Memory: 215M/847M
[INFO] ————————————————————————
Configuration
1. Switch back to root
# exit
2. Move the native Hadoop to opt
# mv /opt/<hadoop_dir_name>/hadoop-dist/target/<hadoop_version> /opt/
3. Create data dir
# mkdir -p /var/data/hadoop/hdfs/nn
# mkdir -p /var/data/hadoop/hdfs/snn
# mkdir -p /var/data/hadoop/hdfs/dn
# chown hdfs:hadoop /var/data/hadoop/hdfs -R
4. Create log dir
# cd /opt/<hadoop_version>
(Note: This is the new dir we moved a few steps before)
# mkdir logs
# chmod g+w logs
# chown -R yarn:hadoop .
5. Set HADOOP_HOME
# vi /etc/profile.d/hadoop.sh
6. Add the following lines:
#!/bin/bash
HADOOP_HOME=/opt/<hadoop_version>
PATH=$HADOOP_HOME/bin:$PATH
export PATH HADOOP_HOME
# chmod +x /etc/profile.d/hadoop.sh
# source /etc/profile.d/hadoop.sh
7. Check Hadoop
# echo $HADOOP_HOME
This should return the Hadoop home dir path.
8. Configure Hadoop
# cd /opt/<hadoop_version>/etc/hadoop/
# vim core-site.xml
9. Add the following code inside of configuration:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>hdfs</value>
</property>
# vim hdfs-site.xml
10. Add the following code inside of configuration:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/var/data/hadoop/hdfs/nn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:/var/data/hadoop/hdfs/snn</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:/var/data/hadoop/hdfs/snn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/var/data/hadoop/hdfs/dn</value>
</property>
# vim mapred-site.xml
11. Add the following code inside of configuration:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
# vim yarn-site.xml
12. Add the following code inside of configuration:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
13. Switch to hdfs user
# su - hdfs
# cd /opt/<hadoop_dir>/bin
# ./hdfs namenode -format
# cd /opt/<hadoop_dir>/sbin
# ./hadoop-daemon.sh start namenode
# ./hadoop-daemon.sh start secondarynamenode
# ./hadoop-daemon.sh start datanode
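At this point a quick sanity check is useful. The JDK ships a small tool called jps that lists running Java processes; while still logged in as hdfs you should see something like the following (the PIDs will differ on your machine):
# jps
12081 NameNode
12176 SecondaryNameNode
12259 DataNode
12344 Jps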
14. Create /mr-history in hdfs file system for job history
# hdfs dfs -mkdir -p /mr-history/tmp
# hdfs dfs -mkdir -p /mr-history/done
# hdfs dfs -chown -R yarn:hadoop /mr-history
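You can verify the directories and their ownership with a quick listing; both entries should show up owned by yarn:hadoop:
# hdfs dfs -ls /mr-history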
15. Start YARN services
# su - yarn
# cd /opt/<hadoop_dir>/sbin
# ./yarn-daemon.sh start resourcemanager
# ./yarn-daemon.sh start nodemanager
# ./mr-jobhistory-daemon.sh start historyserver
(Note: The node manager must be running, otherwise the sample job below will never get a container.)
Check the following:
- Check that the services are up and running.
- Open a web browser (Firefox recommended) and open two tabs with the following URLs:
- http://localhost:50070
- http://localhost:8088
- Run a sample job to test that Hadoop is working
# su - hdfs
# export YARN_EXAMPLES=/opt/<hadoop_dir>/share/hadoop/mapreduce
# yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-<hadoop_version>.jar pi 8 100000
- You should start to see the execution in the terminal.
- You can also check how your job is performing in the browser, via the ResourceManager UI (port 8088).
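If everything is wired up correctly, the terminal output ends with a line reporting the computed result, similar to the following (the exact digits depend on the parameters you passed to the job):
Estimated value of Pi is 3.14...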
You are all set: your test environment is ready. Of course, you can adjust the configuration I have provided here to your liking and needs.
*For the purpose of this demonstration, VMWare player 12.0.1 was used.
**The CentOS 7 Full DVD ISO image was used. Ubuntu and other Debian-based Linux distros will also work, but some installation steps may differ.
*** You can install the latest version of Java or use the recommended version. A list of recommended versions can be found online.
**** There are several ways to set up Java; I find this the easiest, and it guarantees that JAVA_HOME and PATH stay the same across reboots.
***** You can download the latest version of Protocol Buffers from "https://developers.google.com/protocol-buffers/", but you will need to run a couple of extra commands. The above method is faster and it works just fine.
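For reference, those extra commands are the usual autotools build. A sketch for protobuf 2.5.0 (the version the Hadoop 2.x build expects; the tarball name is an assumption) would be:
# tar -zxvf protobuf-2.5.0.tar.gz
# cd protobuf-2.5.0
# ./configure
# make
# make install
# ldconfig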