Tuesday, March 1, 2016

Cloudera CDH Installation Using Isilon Storage

Hadoop.CDH.Installation.Isilon
1.     Introduction
This document describes how to create a Hadoop environment using Cloudera CDH with EMC Isilon Scale-Out NAS as HDFS-accessible shared storage.

The nodes in an Isilon OneFS system work together as peers in a shared-nothing hardware architecture with no single point of failure. Each node acts as both a Hadoop NameNode and a DataNode; the NameNode daemon is a distributed process that runs on all the nodes in the cluster. A compute client can connect to any node through HDFS.

As nodes are added, the file system expands dynamically and redistributes data.

2.     Environment
This installation guide applies to the following environment:
·         Cloudera CDH 5.7.X
·         VMware vSphere 5.5 or later
·         RHEL/CentOS 6.7
·         Internet Explorer 10 or later
·         Isilon OneFS 8.0.0 or later

3.     Installation
1.     Overview
The installation process described in this document consists of the following steps:
·    Confirm prerequisites
·    Install Isilon OneFS
·    Configure Isilon OneFS
·    Use Cloudera Manager to deploy CDH cluster upon Isilon OneFS
·    Validate CDH deployment
2.     Confirm Prerequisites
1.     Prepare VMware virtualized environment
Before you start the installation process, a VMware virtualized environment must be ready to provision the virtual machines required for this deployment. ESXi 5.5 or later is recommended.
2.     Prepare Cloudera Manager Server and Cluster Nodes
Prepare the Cloudera Manager server and cluster nodes by following the instructions in the note "Hadoop.CDH.Installation.PathB".
3.     Isilon OneFS
For low-capacity, non-performance testing of Isilon, the EMC Isilon OneFS Simulator can be used instead of a cluster of physical Isilon appliances.
4.     Networking
·         10 GbE Ethernet is required.
·         If you use the EMC Isilon Simulator, at least two static IP addresses are required (one for the node's ext-1 interface and one for the SmartConnect service IP); each additional Isilon node requires one more IP address.
·         At a minimum, you will need to allocate one IP address per Access Zone per Isilon node.
·         Recommended # of IP addresses = 2 * (# of Isilon Nodes) * (# of Access Zones)
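The two sizing rules above can be compared with a small shell sketch. The function names isilon_ip_min and isilon_ip_count are hypothetical; they only encode the one-address-per-zone-per-node minimum and the 2x formula from the list.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that apply the IP sizing rules from the Networking list.

# Minimum: one IP address per Access Zone per Isilon node.
isilon_ip_min() {
  local nodes="$1" zones="$2"
  echo $(( nodes * zones ))
}

# Formula: 2 * (# of Isilon nodes) * (# of Access Zones).
isilon_ip_count() {
  local nodes="$1" zones="$2"
  echo $(( 2 * nodes * zones ))
}
```

For example, isilon_ip_count 4 2 prints 16 for four nodes in two Access Zones, against a minimum of 8 from isilon_ip_min 4 2. For a single simulator node in one zone the formula gives 2, which matches the two static addresses needed for the simulator.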
3.     Install Isilon OneFS
In this document, the Isilon Simulator 8.0.0.1 is used to set up a free, non-production Isilon OneFS 8.0.0 cluster for the deployment of CDH 5.7.X.
You can download the Isilon Simulator from: http://www.emc.com/products-solutions/trial-software-download/isilon.htm

For the detailed installation process of the Isilon Simulator, refer to the instructions in the note "Isilon.OneFS.Simulator.Installation".

4.     Configure Isilon OneFS
1.     Add the Isilon Simulator node hostnames and IP addresses to the DNS server (named) zone files
·         Add the following content to /var/named/named-forward.zone:
cdh-isilon      IN      A       172.16.1.20
isilon02        IN      A       172.16.1.21
·         Add the following content to /var/named/named-reverse.zone:
20              IN      PTR     cdh-isilon.bigdata.emc.local.
21              IN      PTR     isilon02.bigdata.emc.local.
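When more Isilon nodes are added, their records follow the same pattern. The hypothetical zone_records helper below prints A or PTR lines for "hostname ip" pairs. It assumes, as the example above does, that the reverse zone covers a single /24 network, so the PTR owner name is just the last octet of the address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print BIND records for "hostname ip" pairs read from stdin.
#   zone_records forward            -> A records for named-forward.zone
#   zone_records reverse <domain>   -> PTR records for named-reverse.zone
# Assumes a /24 reverse zone, so the PTR owner name is the last IPv4 octet.
zone_records() {
  local kind="$1" domain="$2" name ip
  while read -r name ip; do
    [ -n "$name" ] || continue   # skip blank lines
    case "$kind" in
      forward) printf '%-16sIN      A       %s\n' "$name" "$ip" ;;
      reverse) printf '%-16sIN      PTR     %s.%s.\n' "${ip##*.}" "$name" "$domain" ;;
      *) echo "usage: zone_records forward|reverse [domain]" >&2; return 1 ;;
    esac
  done
}
```

For example, printf 'cdh-isilon 172.16.1.20\nisilon02 172.16.1.21\n' | zone_records reverse bigdata.emc.local prints the two PTR lines shown above.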
2.     Add a license to activate the Isilon Simulator HDFS module. The license key in the command below expires on 7/17/2015, so before adding it you need to set the date on the Isilon Simulator node to a date before the expiry:
date 1501010001
isi license licenses activate ACCEL-34PS2-32FWX-RNIWX-LLADX
isi license licenses list
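The date argument above is easy to misread. Assuming the OneFS shell uses the FreeBSD date(1) format [[[[[cc]yy]mm]dd]HH]MM[.ss], 1501010001 sets the clock to 2015-01-01 00:01, which is before the 7/17/2015 expiry. GNU date on the RHEL/CentOS nodes reads a bare argument as MMDDhhmm[[CC]YY] instead, so this form is only meant for the Isilon node. The hypothetical bsd_date_arg helper builds such an argument from calendar fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a FreeBSD-style date(1) argument (yymmddHHMM)
# from year, month, day, hour and minute. Assumes the OneFS shell accepts
# the BSD format [[[[[cc]yy]mm]dd]HH]MM.
bsd_date_arg() {
  local year="$1" month="$2" day="$3" hour="$4" minute="$5"
  # 10# forces base 10, so values such as "08" are not parsed as octal
  printf '%02d%02d%02d%02d%02d\n' \
    $(( 10#$year % 100 )) $(( 10#$month )) $(( 10#$day )) \
    $(( 10#$hour )) $(( 10#$minute ))
}
```

On the Isilon node, date "$(bsd_date_arg 2015 1 1 0 1)" would then be equivalent to the date 1501010001 command above.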
3.     Run the Isilon Hadoop Tools scripts on the Isilon Simulator node to create the required users and directories
·         Download the Isilon Hadoop Tools scripts from https://github.com/claudiofahey/isilon-hadoop-tools/releases
·         Upload isilon_create_users.sh and isilon_create_directories.sh to the Isilon Simulator node
·         Run the two scripts:
bash ./isilon_create_users.sh --dist cdh
bash ./isilon_create_directories.sh --dist cdh --fixperm
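The upload-and-run steps above can be wrapped in a small guard that checks both scripts are present and runs them in the same order. The function name run_isilon_hadoop_tools and its directory argument are hypothetical; the script names and flags are the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two Isilon Hadoop Tools scripts.
# $1: directory the scripts were uploaded to; $2: distribution (default: cdh).
run_isilon_hadoop_tools() {
  local dir="$1" dist="${2:-cdh}" script
  for script in isilon_create_users.sh isilon_create_directories.sh; do
    if [ ! -f "$dir/$script" ]; then
      echo "missing $dir/$script" >&2
      return 1
    fi
  done
  # Create the users first, then the directories (same order as above);
  # stop if user creation fails.
  bash "$dir/isilon_create_users.sh" --dist "$dist" || return 1
  bash "$dir/isilon_create_directories.sh" --dist "$dist" --fixperm
}
```

Run from the upload directory, run_isilon_hadoop_tools . performs the same two commands as above and fails early if either script was not uploaded.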
4.     Run the following commands to map the hdfs user to the Isilon superuser (root), which allows the hdfs user to chown all files, and then restart the HDFS service:
isi zone zones modify System --user-mapping-rules="hdfs=>root"
isi services hdfs disable
isi services hdfs enable

5.     Use Cloudera Manager to deploy CDH cluster upon Isilon OneFS
1.     Log in to the Cloudera Manager Admin Console at http://cdh-manager.bigdata.emc.local:7180
The default Cloudera Manager Server port is 7180.
The default credentials are admin/admin.
After logging in, accept the EULA and click "Continue".
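If the console does not load, a quick first check is whether anything is listening on port 7180. The port_open helper below is a hypothetical sketch that relies on bash's /dev/tcp redirection and the coreutils timeout command; it only tests TCP reachability, not that Cloudera Manager itself is healthy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within the timeout (default 5 seconds). Uses bash's /dev/tcp pseudo-device;
# host and port are passed as positional arguments to avoid quoting issues.
port_open() {
  local host="$1" port="$2" secs="${3:-5}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, port_open cdh-manager.bigdata.emc.local 7180 && echo "port 7180 is open" reports success only if the Cloudera Manager server accepts the connection.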
2.     Choose the Cloudera Manager Edition
From the "Welcome to Cloudera Manager" page, select the edition of Cloudera Manager to install.
3.     Specify hosts for your CDH cluster installation
Specify the hosts for your CDH cluster:
cdh-master01.bigdata.emc.local
cdh-worker01.bigdata.emc.local
cdh-worker02.bigdata.emc.local
cdh-worker03.bigdata.emc.local
4.     Cluster Installation - Select Repository
·         Use Parcels
·         Parcel Directory
/opt/cloudera/parcels
·         Local Parcel Repository Path
/opt/cloudera/parcel-repo
·         Remote Parcel Repository URLs
·         Select the specific release of the Cloudera Manager Agent you want to install on your hosts
·         Use Packages (Preferred)
·         Select the version of CDH
·         Select the specific release of CDH you want to install on your hosts (Custom Repository)
·         Select the specific release of the Cloudera Manager Agent you want to install on your hosts (Custom Repository)

5.     Cluster Installation - JDK Installation Options
·         Install Oracle Java SE Development Kit (JDK)
Unchecked (already installed in the VM template)
·         Install Java Unlimited Strength Encryption Policy Files
Unchecked (already installed in the VM template)
6.     Cluster Installation - Enable Single User Mode
·         Single User Mode
Unchecked
7.     Cluster Installation - Provide SSH login credentials
·         Password
·         Private Key
8.     Cluster Installation - Installing
Click "Continue"
9.     Cluster Installation - Detecting CDH versions on all hosts
Click "Continue"
10. Cluster Installation - Inspect hosts for correctness
Click "Finish"
11. Cluster Setup - Choose the CDH 5 services that you want to install on your cluster
·    Custom Services
·         Isilon
·         YARN
·         ZooKeeper
Note:
Do not select HDFS or Cloudera Navigator.
Isilon takes the place of the usual HDFS service. Cloudera Navigator is not currently supported with Isilon HDFS.
12. Cluster Setup - Customize Role Assignments
·    cdh-master01.bigdata.emc.local
·         YARN Resource Manager
·         Isilon Gateway
·    cdh-worker01~03.bigdata.emc.local
·         YARN Node Manager
·         HBase Region Server (If Installed)
·         Impala Daemon (If Installed)
Click "Continue"
13. Cluster Setup - Database Setup
Check "Use Embedded Database" and click "Continue"
14. Cluster Setup - Review Changes
·    Default File System URI
hdfs://isilon02.bigdata.emc.local:8020
hdfs://cdh-isilon.bigdata.emc.local:8020
·    WebHDFS URL

Click "Continue"
15. Cluster Setup - First Run Command
Click "Continue"
16. Cluster Setup - Congratulations!
Click "Finish"

6.     Validate CDH deployment
1.     Log in to the CDH master compute node and run the following commands to validate the Hadoop cluster:
clear &&
hdfs dfs -ls / &&
hdfs dfs -put -f /etc/hosts /tmp &&
hdfs dfs -ls /tmp &&
hdfs dfs -cat /tmp/hosts &&
hdfs dfs -rm -skipTrash /tmp/hosts &&
hdfs dfs -ls /tmp &&
(sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000 ||
sudo -u hdfs yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000) &&
echo "Done"






