Hadoop.CDH.Installation.Isilon
1. Introduction
This document describes how to create a Hadoop environment using Cloudera CDH with EMC Isilon Scale-Out NAS as HDFS-accessible shared storage.
The nodes in an Isilon OneFS system work together as peers in a shared-nothing hardware architecture with no single point of failure. Each node acts as both a Hadoop name node and a data node; the name node daemon is a distributed process that runs on all the nodes in the cluster. A compute client can connect to any node through HDFS.
As nodes are added, the file system expands dynamically and redistributes data.
2. Environment
This installation guide applies to the following environment.
· Cloudera CDH 5.7.X
· VMware vSphere 5.5 or later
· RHEL/CentOS 6.7
· Internet Explorer 10 or later
· Isilon OneFS 8.0.0 or later
3. Installation
1. Overview
The installation process described in this document consists of the following steps.
· Confirm prerequisites
· Install Isilon OneFS
· Configure Isilon OneFS
· Use Cloudera Manager to deploy the CDH cluster on Isilon OneFS
· Validate CDH deployment
2. Confirm Prerequisites
1. Prepare VMware virtualized environment
Before you start the installation process, a VMware virtualized environment must be ready to provision the virtual machines required for this deployment. ESXi 5.5 or later is recommended for the virtualized environment.
2. Prepare Cloudera Manager Server and Cluster Nodes
Prepare the Cloudera Manager server and cluster nodes according to the instructions in the note "Hadoop.CDH.Installation.PathB".
3. Isilon OneFS
For low-capacity, non-performance testing of Isilon, the EMC Isilon OneFS Simulator can be used instead of a cluster of physical Isilon appliances.
4. Networking
· 10 GbE Ethernet is required
· If using the EMC Isilon Simulator, at least two static IP addresses are required: one for the node ext-1 interface and another for the SmartConnect service IP. Each additional Isilon node requires an additional IP address.
· At a minimum, you will need to allocate one IP address per Access Zone per Isilon node.
· # of IP addresses = 2 * (# of Isilon Nodes) * (# of Access Zones)
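The formula above can be turned into a quick calculation when sizing the address pool. This is a minimal sketch; the function name isilon_ip_count and the sample values (3 nodes, 2 access zones) are illustrative assumptions and do not come from the environment described here.

```shell
# Hypothetical helper: number of IP addresses according to the formula
# 2 * (# of Isilon nodes) * (# of access zones).
# $1 = number of Isilon nodes, $2 = number of access zones
isilon_ip_count() {
    echo $(( 2 * $1 * $2 ))
}

# Sample values only: 3 Isilon nodes and 2 access zones.
isilon_ip_count 3 2   # prints 12
```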
3. Install Isilon OneFS
In this document, Isilon Simulator 8.0.0.1 is used to set up a free, non-production Isilon OneFS 8.0.0 cluster environment for the deployment of CDH 5.7.X.
You can download Isilon Simulator from this
link: http://www.emc.com/products-solutions/trial-software-download/isilon.htm
For the detailed installation process of Isilon Simulator, refer to the instructions in the note "Isilon.OneFS.Simulator.Installation".
4. Configure Isilon OneFS
1. Add the Isilon Simulator nodes' hostnames and IP addresses to the named server configuration files
· Add the content below to the file /var/named/named-forward.zone
cdh-isilon IN A 172.16.1.20
isilon02 IN A 172.16.1.21
· Add the content below to the file /var/named/named-reverse.zone
20 IN PTR cdh-isilon.bigdata.emc.local.
21 IN PTR isilon02.bigdata.emc.local.
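When more Isilon nodes are added, the matching forward and reverse records can be generated rather than typed by hand. The sketch below is only an illustration: the helper name zone_records is hypothetical, and it assumes a /24 reverse zone in which the PTR owner name is the last octet of the address, as in the entries above.

```shell
# Hypothetical helper: print the forward (A) and reverse (PTR) records
# for one host, in the same layout as the zone file entries above.
# $1 = short hostname, $2 = IPv4 address, $3 = DNS domain
# Assumption: the reverse zone covers a /24, so the PTR name is the last octet.
zone_records() {
    printf '%s IN A %s\n' "$1" "$2"
    # ${2##*.} strips everything up to the last dot, e.g. 172.16.1.20 -> 20
    printf '%s IN PTR %s.%s.\n' "${2##*.}" "$1" "$3"
}

zone_records cdh-isilon 172.16.1.20 bigdata.emc.local
zone_records isilon02 172.16.1.21 bigdata.emc.local
```

The first line of each pair belongs in the forward zone file and the second in the reverse zone file.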
2. Add a license to activate the Isilon Simulator HDFS module. The license key in the commands below expires on 7/17/2015, so before adding the license, set the current date on the Isilon Simulator node to a date before the expiration so that the license can be applied.
date 1501010001
isi license licenses activate ACCEL-34PS2-32FWX-RNIWX-LLADX
isi license licenses list
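The argument passed to date above is terse. OneFS is FreeBSD-based, and the value appears to follow the FreeBSD date format yymmddHHMM, so 1501010001 would correspond to January 1, 2015 at 00:01. The sketch below only decodes such a string for readability; the helper name decode_bsd_date is hypothetical, and it assumes a ten-digit value with a two-digit year in the 2000s.

```shell
# Hypothetical helper: expand a ten-digit yymmddHHMM string into a
# readable timestamp. Assumes the year falls in 20yy.
# $1 = ten-digit date string, e.g. 1501010001
decode_bsd_date() {
    printf '%s\n' "$1" |
        sed 's/^\(..\)\(..\)\(..\)\(..\)\(..\)$/20\1-\2-\3 \4:\5/'
}

decode_bsd_date 1501010001   # prints 2015-01-01 00:01
```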
3. Run the Isilon Hadoop Tools scripts on the Isilon Simulator node to create the required users and directories
· Download the Isilon Hadoop Tools scripts from https://github.com/claudiofahey/isilon-hadoop-tools/releases
· Upload isilon_create_users.sh and isilon_create_directories.sh to the Isilon Simulator node
· Run these two scripts
bash ./isilon_create_users.sh --dist cdh
bash ./isilon_create_directories.sh --dist cdh --fixperm
4. Run the commands below to map the hdfs user to the Isilon superuser (root), which allows the hdfs user to chown all files, and then restart the HDFS service
isi zone zones modify System --user-mapping-rules="hdfs=>root"
isi services hdfs disable
isi services hdfs enable
5. Use Cloudera Manager to deploy the CDH cluster on Isilon OneFS
1. Log in to the Cloudera Manager Server (CMS) web console
The default port of CMS is 7180
The default user account is admin/admin
After logging in, accept the EULA and click "Continue"
2. Choose Cloudera Manager Edition
From the Welcome to Cloudera Manager page, you can select the edition of Cloudera Manager to install
3. Specify hosts for your CDH cluster installation
Specify the hosts for your CDH cluster:
cdh-master01.bigdata.emc.local
cdh-worker01.bigdata.emc.local
cdh-worker02.bigdata.emc.local
cdh-worker03.bigdata.emc.local
4. Cluster Installation - Select Repository
· Use Parcels
  · Parcel Directory: /opt/cloudera/parcels
  · Local Parcel Repository Path: /opt/cloudera/parcel-repo
  · Remote Parcel Repository URLs
  · Select the specific release of the Cloudera Manager Agent you want to install on your hosts
· Use Packages (Preferred)
  · Select the version of CDH
  · Select the specific release of CDH you want to install on your hosts (Custom Repository)
  · Select the specific release of the Cloudera Manager Agent you want to install on your hosts (Custom Repository)
5. Cluster Installation - JDK Installation Options
· Install Oracle Java SE Development Kit (JDK): Unchecked (already installed in VM template)
· Install Java Unlimited Strength Encryption Policy Files: Unchecked (already installed in VM template)
6. Cluster Installation - Enable Single User Mode
· Single User Mode: Unchecked
7. Cluster Installation - Provide SSH login credentials
· Password
· Private Key
8. Cluster Installation - Installing
Click "Continue"
9. Cluster Installation - Detecting CDH versions on all hosts
Click "Continue"
10. Cluster Installation - Inspect hosts for correctness
Click "Finish"
11. Cluster Setup - Choose the CDH 5 services that you want to install on your cluster
· Custom Services
  · Isilon
  · YARN
  · ZooKeeper
Note:
Do not select HDFS or Cloudera Navigator.
Isilon takes the place of the usual HDFS service.
Cloudera Navigator is not currently supported with Isilon HDFS.
12. Cluster Setup - Customize Role Assignments
· cdh-master01.bigdata.emc.local
  · YARN Resource Manager
  · Isilon Gateway
· cdh-worker01~03.bigdata.emc.local
  · YARN Node Manager
  · HBase Region Server (If Installed)
  · Impala Daemon (If Installed)
Click "Continue"
13. Cluster Setup - Database Setup
Check "Use Embedded Database" and click "Continue"
14. Cluster Setup - Review Changes
· Default File System URI
hdfs://isilon02.bigdata.emc.local:8020
hdfs://cdh-isilon.bigdata.emc.local:8020
· WebHDFS URL
Click "Continue"
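Once the cluster is deployed, the Default File System URI chosen on this page can be cross-checked from a cluster node. The sketch below is one possible check, not part of the original procedure: the helper name check_default_fs is hypothetical, and the expected URI in the usage section is only a sample taken from the values listed above, so substitute whichever URI you actually configured. It relies on hdfs getconf -confKey to read fs.defaultFS and skips the live check when the hdfs command is not installed.

```shell
# Hypothetical helper: compare the expected default file system URI
# with the one the client configuration reports.
# $1 = expected URI, $2 = actual URI
check_default_fs() {
    if [ "$1" = "$2" ]; then
        echo "OK: fs.defaultFS is $2"
    else
        echo "MISMATCH: expected $1, got $2" >&2
        return 1
    fi
}

# Run the live check only where the hdfs client is available.
# The expected value here is a sample; use the URI you configured.
if command -v hdfs >/dev/null 2>&1; then
    check_default_fs "hdfs://cdh-isilon.bigdata.emc.local:8020" \
        "$(hdfs getconf -confKey fs.defaultFS)"
fi
```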
15. Cluster Setup - First Run Command
Click "Continue"
16. Cluster Setup - Congratulations!
Click "Finish"
6. Validate CDH deployment
1. Log in to the CDH master compute node and run the command below to validate the Hadoop cluster
clear &&
hdfs dfs -ls / &&
hdfs dfs -put -f /etc/hosts /tmp &&
hdfs dfs -ls /tmp &&
hdfs dfs -cat /tmp/hosts &&
hdfs dfs -rm -skipTrash /tmp/hosts &&
hdfs dfs -ls /tmp &&
(sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000 ||
sudo -u hdfs yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000) &&
echo "Done"
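The command chain above tries the package path for the examples jar first and retries with the parcel path if the first yarn jar invocation fails for any reason. A variation, sketched here under the assumption that only the install location differs between the two cases, picks the jar by checking which file exists, so that a genuine job failure is not followed by a second attempt. The helper name first_existing is hypothetical.

```shell
# Hypothetical helper: print the first argument that is an existing
# regular file; return non-zero if none of them exists.
first_existing() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Package install path first, then parcel install path (both from the text above).
if jar="$(first_existing \
        /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
        /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar)"; then
    sudo -u hdfs yarn jar "$jar" pi 10 1000
else
    echo "hadoop-mapreduce-examples.jar not found in either location" >&2
fi
```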
4. References
1. EMC Isilon Hadoop Starter Kit for Cloudera with VMware Big Data Extensions
2. EMC Isilon Best Practices for Hadoop Data Storage