Thursday, May 7, 2015

Hortonworks HDP Installation using Ambari

1.     Introduction
This guide describes deploying Hortonworks HDP with Apache Ambari 2.x in a VMware virtualized environment.

2.     Environment
This installation guide applies to the following environment.
·         Apache Ambari Server 2.X
·         Hortonworks HDP 2.x
·         VMware vSphere 5.5 or later
·         RHEL/CentOS 6.5 or later
·         Internet Explorer 10 or later/Firefox 18 or later/Chrome 26 or later

3.     Installation
1.     Overview
Below is an overview of the installation process that this document describes.
·    Confirm prerequisites
·    Setup one VM as the DNS, NTP, and yum repository server for the HDP cluster
·    Setup one VM as the Ambari Server
·    Setup 6 VMs as HDP cluster node servers
·    Setup SSH public key authentication on the Ambari Server
·    Use the Ambari web UI to deploy the HDP cluster

2.     Confirm Prerequisites
1.     Prepare VMware virtualized environment
Before you start the installation process, a VMware virtualized environment must be ready to provision the virtual machines required for this deployment. ESXi 5.5 or a later revision is recommended.
2.     Determine Stack Compatibility
Check the Ambari & HDP Compatibility Matrix to make sure your Ambari and HDP stack versions are compatible.
3.     Prepare the installation images
·         RHEL/CentOS 6.5 or 6.6 DVD ISO files
·         Windows Installation ISO files (Win7, Win2008, Win2012, etc)
·         Internet Explorer 11 installation image
·         Tarball Repository Files
·         Ambari 2.2.2.0
·         HDP 2.4.2.0
·         HDP Utility 1.1.0.20
4.     Meet System Requirements
·         Operating System Requirement (RHEL/CentOS 6.x/7.x)
·         Browser Requirements (IE 10/Firefox 18/Chrome 26)
·         HDP Cluster Nodes Requirements
·         OpenSSL (v1.01, build 16 or later) ~ version check command: openssl version && rpm -qa | grep openssl
·         Python (RHEL/CentOS 6: 2.6.*, RHEL/CentOS 7: 2.7.*) ~ version check command: python -V
·         JDK Requirements (Oracle JDK + JCE)
·         Oracle JDK 1.8 64-bit (minimum 1.8.0_60) (default)
·         Oracle JDK 1.7 64-bit (minimum 1.7.0_67)
·         JDK Requirements (Open JDK)
·         Open JDK 7
·         JCE is not required for most OpenJDK builds, as it is included by default
Example output of the version checks (reconstructed from a screenshot):
[root@localhost Packages]# openssl version
OpenSSL 1.0.1e-fips 11 Feb 2013
[root@localhost Packages]# rpm -qa | grep openssl
openssl-1.0.1e.el6.x86_64
[root@localhost Packages]# python -V
Python 2.6.6
[root@localhost Packages]# java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
·         Memory Requirements
The Ambari host should have at least 1 GB RAM, with 500 MB free. To check available memory on any host, run:
free -m
Check the Memory Requirement Matrix to confirm the memory requirements for your cluster (4 GB RAM can support up to 100 hosts).
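The free-memory check above can be scripted; a minimal sketch, assuming the free(1) output format of RHEL/CentOS (the 500 MB threshold comes from the requirement stated above):

```shell
# Warn if this host has less than 500 MB of free memory.
free_mb=$(free -m | awk '/^Mem:/ {print $4}')
if [ "$free_mb" -lt 500 ]; then
    echo "WARNING: only ${free_mb} MB free"
else
    echo "OK: ${free_mb} MB free"
fi
```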
5.     Collect Server Information
ns.bigdata.emc.local                               192.168.1.1
ntp.bigdata.emc.local                              192.168.1.1
yumrepo.bigdata.emc.local                          192.168.1.1
jump.bigdata.emc.local                             192.168.1.5
hdp-ambari.bigdata.emc.local                       192.168.1.10
hdp-master01.bigdata.emc.local                     192.168.1.11
hdp-master02.bigdata.emc.local                     192.168.1.12
hdp-worker01.bigdata.emc.local                     192.168.1.21
hdp-worker02.bigdata.emc.local                     192.168.1.22
hdp-worker03.bigdata.emc.local                     192.168.1.23
hdp-worker04.bigdata.emc.local                     192.168.1.24

3.     Setup one VM as DNS, NTP, YUM Repository Server
Once all the prerequisites are ready, you can start the installation process. The first step is to set up one VM that serves as the DNS server, NTP server, and yum repository server for the HDP cluster deployment.
1.     Create the Linux VM using the specification below:
vCPU: 2 cores
RAM: 4 GB
Operating System: RHEL 6.6
OS Disk Capacity: 50 GB
IP Address: 192.168.1.1/255.255.255.0
Host Name: ns.bigdata.emc.local
DNS Server: 192.168.1.1
Default Search Domain: bigdata.emc.local
2.     Install BIND packages to set up the DNS server
1.     Install bind package
yum install bind
yum install bind-libs
yum install bind-utils
2.     Configure daemon to start automatically and disable firewall
chkconfig named on
chkconfig iptables off
service iptables stop
3.     Edit configuration file /etc/named.conf
<<named.conf.conf>>
4.     Edit configuration file /var/named/named-forward.zone
<<named-forward.zone.zone>>
5.     Edit configuration file /var/named/named-reverse.zone
<<named-reverse.zone.zone>>
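The zone files themselves are attached above; for orientation, a minimal forward zone sketch matching the host list in this guide might look like the following (serial number and TTL values are illustrative):

```
$TTL 86400
@   IN  SOA  ns.bigdata.emc.local. root.bigdata.emc.local. (
        2015050701  ; serial
        3600        ; refresh
        900         ; retry
        604800      ; expire
        86400 )     ; minimum
@               IN  NS   ns.bigdata.emc.local.
ns              IN  A    192.168.1.1
ntp             IN  A    192.168.1.1
yumrepo         IN  A    192.168.1.1
jump            IN  A    192.168.1.5
hdp-ambari      IN  A    192.168.1.10
hdp-master01    IN  A    192.168.1.11
hdp-master02    IN  A    192.168.1.12
hdp-worker01    IN  A    192.168.1.21
hdp-worker02    IN  A    192.168.1.22
hdp-worker03    IN  A    192.168.1.23
hdp-worker04    IN  A    192.168.1.24
```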
6.     Correct the configuration files owner and permission
ls -l /var/named
chown named:named /var/named/named-forward.zone
chown named:named /var/named/named-reverse.zone
ls -l /var/named
7.     Validate the configuration files
named-checkconf /etc/named.conf
named-checkzone bigdata.emc.local /var/named/named-forward.zone
named-checkzone 1.168.192.in-addr.arpa /var/named/named-reverse.zone
8.     Restart the service
service named restart

3.     Install the ntp package to set up the NTP server
1.     Setup the time zone
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
2.     Install ntp package
yum install ntp
3.     Edit configuration file /etc/ntp.conf and add the content below to set up the local clock as a time source
server 127.127.1.0
fudge 127.127.1.0 stratum 10
4.     Configure daemon to start automatically
service ntpd start
chkconfig ntpd on

4.     Install the httpd package and set up the yum repository server
1.     Install yum utility and httpd package
yum -y install yum-utils
yum -y install createrepo
yum -y install httpd
2.     Configure daemon to start automatically
service httpd start
chkconfig httpd on
3.     Create the HDP tarball repository mount point and mount it (the CD-ROM device contains the HDP tarball files)
mkdir -p /var/www/html/hdp
mount /dev/cdrom /var/www/html/hdp
4.     Create the RHEL DVD repository mount point and mount it (the second CD-ROM device contains the RHEL DVD ISO)
mkdir -p /var/www/html/dvd
mount /dev/cdrom1 /var/www/html/dvd
5.     Create file /etc/yum.repos.d/ambari.repo
<<ambari.repo.repo>>
6.     Validate yum repository
yum repolist all
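A minimal sketch of the ambari.repo file attached above, assuming the tarballs were extracted under the local web server set up in this section (the baseurl path is illustrative and must match your extracted directory layout):

```
[Updates-ambari-2.2.2.0]
name=ambari-2.2.2.0 - Updates
baseurl=http://yumrepo.bigdata.emc.local/hdp/ambari/centos6/
gpgcheck=0
enabled=1
priority=1
```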

4.     Setup one VM as Ambari Server
1.     Create the Linux VM using the specification below:
vCPU: 2 cores
RAM: 4 GB
Operating System: RHEL 6.6
OS Disk Capacity: 50 GB
IP Address: 192.168.1.10/255.255.255.0
Host Name: hdp-ambari.bigdata.emc.local
DNS Server: 192.168.1.1
Default Search Domain: bigdata.emc.local
2.     Install the ambari-server package to set up the Ambari Server
1.     Create Ambari Repository configuration file /etc/yum.repos.d/ambari.repo
2.     Install Ambari Server
yum install ambari-server
3.     Setup Ambari Server
ambari-server setup --java-home /usr/lib/jvm/jre-1.7.0-openjdk.x86_64
4.     Configure Ambari Server to start automatically
service ambari-server start
chkconfig ambari-server on

5.     Setup 6 VMs as HDP cluster node servers
1.     Create 6 Linux VMs using the specification below:
vCPU: 2 cores
RAM: 4 GB
Operating System: RHEL 6.6
OS Disk Capacity: 50 GB
IP Address:
192.168.1.11/255.255.255.0
192.168.1.12/255.255.255.0
192.168.1.21/255.255.255.0
192.168.1.22/255.255.255.0
192.168.1.23/255.255.255.0
192.168.1.24/255.255.255.0
Host Name:
hdp-master01.bigdata.emc.local
hdp-master02.bigdata.emc.local
hdp-worker01.bigdata.emc.local
hdp-worker02.bigdata.emc.local
hdp-worker03.bigdata.emc.local
hdp-worker04.bigdata.emc.local
DNS Server: 192.168.1.1
Default Search Domain: bigdata.emc.local
2.     Increase the open file limits on all HDP cluster nodes
1.     Edit /etc/sysctl.conf and add the content below
fs.file-max = 65536
2.     Edit /etc/security/limits.conf and add the content below
*       soft    nofile   10000
*       hard    nofile   10000
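The new limits can be verified with the commands below (fs.file-max takes effect after sysctl -p; the nofile limits apply to new login sessions):

```shell
# Print the kernel-wide file handle limit and this shell's open-file limit.
cat /proc/sys/fs/file-max 2>/dev/null || sysctl -n fs.file-max
ulimit -n
```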
3.     Setup NTP client on all HDP cluster nodes
1.     Install ntp package
yum -y install ntp
2.     Edit configuration file /etc/ntp.conf and add the line below to point at the cluster NTP server
server 192.168.1.1
3.     Configure ntpd daemon to start automatically
service ntpd start
chkconfig ntpd on
4.     Setup DNS client on all HDP cluster nodes
1.     Edit /etc/resolv.conf and add the content below
nameserver 192.168.1.1
search bigdata.emc.local
5.     Disable SELinux on all HDP cluster nodes
1.     Edit /etc/selinux/config and set the SELINUX property to disabled
SELINUX=disabled
6.     Disable Transparent Huge Pages on all HDP cluster nodes
1.     Edit /etc/grub.conf and append the kernel parameter below to the kernel line
transparent_hugepage=never
2.     Edit /etc/rc.d/rc.local and add the content below (RHEL 6 exposes the THP switches under redhat_transparent_hugepage)
if test -f /sys/kernel/mm/redhat_transparent_hugepage/enabled; then
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
fi

if test -f /sys/kernel/mm/redhat_transparent_hugepage/defrag; then
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
fi
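After a reboot (or once rc.local has run), the THP status can be checked with a small helper; the active value is shown in brackets, e.g. [never]. This sketch probes both the RHEL 6 path and the upstream sysfs path:

```shell
# Print the current THP setting, whichever sysfs path this kernel exposes.
thp_status() {
    for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
             /sys/kernel/mm/transparent_hugepage/enabled; do
        if [ -f "$f" ]; then
            cat "$f"
            return 0
        fi
    done
    echo "THP interface not present"
}
thp_status
```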
7.     Set umask 0022
1.     Set the default file and directory creation mask to 0022 (022) in /etc/profile; 022 is the default setting for RHEL/CentOS.
echo umask 0022 >> /etc/profile
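To confirm the new default, open a fresh login shell and print the mask:

```shell
# Set and read back the mask in the current shell; new logins
# pick it up from /etc/profile.
umask 0022
umask   # prints 0022
```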
8.     Install Oracle JDK 1.8 (optional; the system default OpenJDK 1.7 can be used instead)
1.     Install the JDK package and register it with alternatives
yum install jdk1.8.0_77
alternatives --install /usr/bin/java java /usr/java/jdk1.8.0_77/bin/java 1
alternatives --install /usr/bin/jar jar /usr/java/jdk1.8.0_77/bin/jar 1
alternatives --install /usr/bin/javac javac /usr/java/jdk1.8.0_77/bin/javac 1
alternatives --config java
alternatives --config jar
alternatives --config javac
2.     Install JCE
unzip -o -j -q jce_policy-8.zip -d /usr/java/jdk1.8.0_77/jre/lib/security/
9.     Disable iptables on all HDP cluster nodes
service iptables stop
chkconfig iptables off

6.     Setup SSH Public Key authentication on Ambari Server
1.     Login Ambari Server using root account
2.     Generate SSH public key and private key
ssh-keygen &&
cd /root/.ssh  &&
cat id_rsa.pub >> authorized_keys &&
chmod 600 /root/.ssh/authorized_keys &&
echo "Done"
3.     Copy the SSH public key file id_rsa.pub and authorized_keys to all HDP cluster nodes
ssh root@hdp-master01 "mkdir -p /root/.ssh && chmod 700 /root/.ssh" && scp /root/.ssh/authorized_keys root@hdp-master01:/root/.ssh/ &&
ssh root@hdp-master02 "mkdir -p /root/.ssh && chmod 700 /root/.ssh" && scp /root/.ssh/authorized_keys root@hdp-master02:/root/.ssh/ &&
ssh root@hdp-worker01 "mkdir -p /root/.ssh && chmod 700 /root/.ssh" && scp /root/.ssh/authorized_keys root@hdp-worker01:/root/.ssh/ &&
ssh root@hdp-worker02 "mkdir -p /root/.ssh && chmod 700 /root/.ssh" && scp /root/.ssh/authorized_keys root@hdp-worker02:/root/.ssh/ &&
ssh root@hdp-worker03 "mkdir -p /root/.ssh && chmod 700 /root/.ssh" && scp /root/.ssh/authorized_keys root@hdp-worker03:/root/.ssh/ &&
ssh root@hdp-worker04 "mkdir -p /root/.ssh && chmod 700 /root/.ssh" && scp /root/.ssh/authorized_keys root@hdp-worker04:/root/.ssh/ &&
echo "Done"
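The six near-identical commands above can also be written as a loop; this sketch only echoes the commands (a dry run), so remove the echo to execute them:

```shell
# Distribute the SSH key material to every cluster node (dry run).
HOSTS="hdp-master01 hdp-master02 hdp-worker01 hdp-worker02 hdp-worker03 hdp-worker04"
for host in $HOSTS; do
    echo "ssh root@$host 'mkdir -p /root/.ssh && chmod 700 /root/.ssh'"
    echo "scp /root/.ssh/authorized_keys root@$host:/root/.ssh/"
done
```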
4.     Test the SSH public key authentication
clear &&
ssh root@hdp-master01 "hostname" &&
ssh root@hdp-master02 "hostname" &&
ssh root@hdp-worker01 "hostname" &&
ssh root@hdp-worker02 "hostname" &&
ssh root@hdp-worker03 "hostname" &&
ssh root@hdp-worker04 "hostname" &&
echo "Done"
5.     Verify DNS and NTP configuration on all nodes using SSH
clear &&
ssh root@hdp-ambari   "nslookup \$(hostname) && ntpq -p" &&
ssh root@hdp-master01 "nslookup \$(hostname) && ntpq -p" &&
ssh root@hdp-master02 "nslookup \$(hostname) && ntpq -p" &&
ssh root@hdp-worker01 "nslookup \$(hostname) && ntpq -p" &&
ssh root@hdp-worker02 "nslookup \$(hostname) && ntpq -p" &&
ssh root@hdp-worker03 "nslookup \$(hostname) && ntpq -p" &&
ssh root@hdp-worker04 "nslookup \$(hostname) && ntpq -p" &&
echo "Done"
6.     Retain a copy of the SSH private key on the machine (jump server) from which you will run the web-based Ambari Install Wizard.

7.     Use the Ambari web UI to deploy the HDP cluster
1.     Log in to Apache Ambari at http://192.168.1.10:8080 with the default user name and password (admin/admin)
2.     From the Ambari Welcome page, choose Launch Install Wizard
3.     Name your cluster
4.     Select Stack HDP 2.4
5.     Expand Advanced Repository Options and set the base URL to the local yum repository server set up earlier
6.     Setup Installation Options (Target Hosts, Host Registration Information)
7.     Confirm Hosts
8.     Ensure there are no errors or warnings for host installation and validation
9.     Choose Services
10.   Assign Masters
11.   Assign Slaves and Clients
12.   Customize Services
13.   Review
14.   Summary

8.     Validation
1.     Log in to an HDP master node and run the commands below to validate the Hadoop cluster
hdfs dfs -ls /
hdfs dfs -put -f /etc/hosts /tmp
hdfs dfs -cat /tmp/hosts
hdfs dfs -rm -skipTrash /tmp/hosts
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 10 1000
2.     Run TeraSort to test performance
clear &&
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen  10485760 /tmp/TeraGen.1G &&
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar terasort /tmp/TeraGen.1G /tmp/TeraSort.1G &&
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teravalidate  /tmp/TeraSort.1G /tmp/TeraValidate.1G &&
echo "Done"
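The teragen row count above is chosen to produce roughly 1 GB of input: each TeraGen row is 100 bytes, so:

```shell
# 10485760 rows x 100 bytes/row = 1048576000 bytes (~1 GB).
echo $((10485760 * 100))   # prints 1048576000
```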


3.     Run TestDFSIO to test HDFS read/write performance
clear &&
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1 &&
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read  -nrFiles 10 -fileSize 1 &&
echo "Done"




4.     Reference:
·         HDP 2.2.x Installation Book
·         Hortonworks Ambari 2.2.2.0 Repository Download
·         Hortonworks HDP Repository Download
·         Resolving Cluster Deployment Problems