Showing posts with label Cloudera. Show all posts

Thursday, March 17, 2016

Cloudera CDH installation using remote repo

Hadoop.CDH.Installation.RemoteRepository
1.     Overview
This topic describes how to create a remote RPM packages/parcels repository and direct hosts in your Cloudera Manager deployment to use that repository.

Once you have created a parcels repository, go to Configuring the Cloudera Manager Server to Use the Parcel URL. After completing these steps, you have established the environment required to install a previous version of Cloudera Manager or install Cloudera Manager to hosts that are not connected to the Internet. Proceed with the installation process, being sure to target the newly created repository.

2.     Creating a Permanent Remote Repository
The repository is typically hosted using HTTP on a host inside your network. If you already have a web server in your organization, you can reuse it and put the parcel files into it.

Follow these steps to set up a permanent remote repository:

1.      Log on to the server that will host the repository and run the following commands to install and start the Apache httpd web server:
yum install httpd
systemctl start httpd
systemctl enable httpd
·    RPM Packages
a.      Download the RPM packages for your OS distribution from:
b.      Extract the RPM package files into the web server directory, and modify file permissions
mkdir -p /var/www/html/cdh5/packages
tar -xvf cm5.*-centos7.tar.gz -C /var/www/html/cdh5/packages
chmod -R ugo+rX /var/www/html/cdh5  (may not be required)
c.      After extracting the files and changing permissions, visit http://hostname:80/cdh5/packages to verify that you can access the RPM packages. Apache may have been configured to not show indexes, which is also acceptable.
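To direct a cluster host at the RPM repository, yum needs a .repo file whose baseurl points at the web server. The following is a minimal sketch rather than the documented Cloudera procedure: the function name make_cm_repo, the repository id cloudera-manager-local, and the gpgcheck=0 setting are assumptions for illustration, and the base URL must be whichever directory actually contains the repodata folder after extraction.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a yum repository definition for a local mirror.
# Argument: base URL of the directory that holds the extracted RPMs and repodata/.
make_cm_repo() {
    local baseurl="$1"
    cat <<EOF
[cloudera-manager-local]
name=Cloudera Manager (local mirror)
baseurl=${baseurl}
enabled=1
gpgcheck=0
EOF
}

# Example (run as root on each host; the URL is a placeholder):
# make_cm_repo "http://hostname/cdh5/packages/" > /etc/yum.repos.d/cloudera-manager-local.repo
# yum clean all && yum repolist
```

Generating the file from a function keeps the URL in one place, which is convenient when the same definition has to be copied to every host in the cluster.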

·    Parcels
a.      Download the parcel and manifest.json files for your OS distribution from:
·         CDH 5 - Impala, Spark, and Search are included in the CDH parcel
·         Accumulo - https://archive.cloudera.com/accumulo-c5/parcels/
·         GPL Extras - https://archive.cloudera.com/gplextras5/parcels/
b.     Move the .parcel and manifest.json files to the web server directory, and modify file permissions
mkdir -p /var/www/html/cdh5/parcels
mv CDH-5.*-el7.parcel /var/www/html/cdh5/parcels
mv manifest.json /var/www/html/cdh5/parcels
chmod -R ugo+rX /var/www/html/cdh5  (may not be required)
c.      After moving the files and changing permissions, visit http://hostname:80/cdh5/parcels to verify that you can access the parcel. Apache may have been configured to not show indexes, which is also acceptable.
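Before pointing Cloudera Manager at the repository, it can help to confirm on the web server that the directory holds what the parcel steps put there: a readable manifest.json and at least one readable .parcel file. The sketch below is only a local sanity check under that assumption. The function name check_parcel_repo is hypothetical, and it does not validate the manifest contents or the parcel hashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a parcel repository directory contains
# a readable manifest.json and at least one readable .parcel file.
check_parcel_repo() {
    local dir="$1" f found=0
    if [ ! -r "$dir/manifest.json" ]; then
        echo "missing or unreadable manifest.json in $dir" >&2
        return 1
    fi
    for f in "$dir"/*.parcel; do
        # When nothing matches, the pattern stays literal and -r fails.
        if [ -r "$f" ]; then
            found=1
        fi
    done
    if [ "$found" -ne 1 ]; then
        echo "no readable .parcel files in $dir" >&2
        return 1
    fi
    echo "ok"
}

# Example: check_parcel_repo /var/www/html/cdh5/parcels
```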

3.     Configuring the Cloudera Manager Server to Use the Parcel URL
1.      Use one of the following methods to open the parcel settings page:
·         Navigation bar
·         Click the parcel icon in the top navigation bar.
·         Click the Configuration button.
·         Menu
·         Select Administration > Settings
·         Select Category > Parcels
2.      In the Remote Parcel Repository URLs list, click the add (+) button to open an additional row.
3.      Enter the path to the parcel. For example, http://hostname:port/cdh5/parcels/.
4.      Click Save Changes to commit the changes.



4.     Reference
Creating and Using a Remote Parcel Repository for Cloudera Manager



Tuesday, March 1, 2016

Cloudera CDH installation using Isilon storage

Hadoop.CDH.Installation.Isilon
1.     Introduction
This document describes how to create a Hadoop environment using Cloudera CDH with EMC Isilon Scale-Out NAS as HDFS-accessible shared storage.

The nodes in an Isilon OneFS system work together as peers in a shared-nothing hardware architecture with no single point of failure. Each node acts as both a Hadoop name node and a data node; the name node daemon is a distributed process that runs on all the nodes in the cluster. A compute client can connect to any node through HDFS.

As nodes are added, the file system expands dynamically and redistributes data.

2.     Environment
This installation guide applies to the following environment.
·         Cloudera CDH 5.7.X
·         VMware vSphere 5.5 or later
·         RHEL/CentOS 6.7
·         Internet Explorer 10 or later
·         Isilon OneFS 8.0.0 or later

3.     Installation
1.     Overview
This document covers the following installation steps:
·    Confirm prerequisites
·    Install Isilon OneFS
·    Configure Isilon OneFS
·    Use Cloudera Manager to deploy CDH cluster upon Isilon OneFS
·    Validate CDH deployment
2.     Confirm Prerequisites
1.     Prepare VMware virtualized environment
Before you start the installation process, a VMware virtualized environment must be ready to provision the virtual machines required for this deployment. ESXi 5.5 or later is recommended for the virtualized environment.
2.     Prepare Cloudera Manager Server and Cluster Nodes
Prepare the Cloudera Manager server and cluster nodes by following the instructions in the note "Hadoop.CDH.Installation.PathB".
3.     Isilon OneFS
For low-capacity, non-performance testing of Isilon, the EMC Isilon OneFS Simulator can be used instead of a cluster of physical Isilon appliances.
4.     Networking
·         10 GbE Ethernet is required
·         If using the EMC Isilon Simulator, at least two static IP addresses are required (one for the node's ext-1 interface and one for the SmartConnect service IP); each additional Isilon node requires an additional IP address.
·         At a minimum, you will need to allocate one IP address per Access Zone per Isilon node.
·         # of IP addresses = 2 * (# of Isilon Nodes) * (# of Access Zones)
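As a quick worked example of the formula above, the following sketch computes the address count for a given cluster size. The function name isilon_ip_count and the sample values (3 nodes, 1 access zone) are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of IP addresses from the formula
# 2 * (number of Isilon nodes) * (number of access zones).
isilon_ip_count() {
    local nodes="$1" zones="$2"
    echo $(( 2 * nodes * zones ))
}

# Example: 3 Isilon nodes and 1 access zone.
isilon_ip_count 3 1   # prints 6
```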
3.     Install Isilon OneFS
In this document, Isilon Simulator 8.0.0.1 is used to set up a free, non-production Isilon OneFS 8.0.0 cluster environment for the deployment of CDH 5.7.X.
You can download Isilon Simulator from this link: http://www.emc.com/products-solutions/trial-software-download/isilon.htm

For the detailed installation process of the Isilon Simulator, refer to the instructions in the note "Isilon.OneFS.Simulator.Installation".

4.     Configure Isilon OneFS
1.     Add the hostname and IP address of each Isilon Simulator node to the named server zone files
·         Add the following records to the file /var/named/named-forward.zone
cdh-isilon      IN      A       172.16.1.20
isilon02        IN      A       172.16.1.21
·         Add the following records to the file /var/named/named-reverse.zone
20              IN      PTR     cdh-isilon.bigdata.emc.local.
21              IN      PTR     isilon02.bigdata.emc.local.
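The forward and reverse entries above follow a fixed pattern, so for additional nodes they can be generated rather than typed by hand. This is a minimal sketch assuming a /24 reverse zone (so the PTR owner name is just the last octet of the address) and the bigdata.emc.local domain used in this document; the function name dns_records is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the A record and the PTR record for one host.
# Arguments: short hostname, IPv4 address, DNS domain (without trailing dot).
# Assumes a /24 reverse zone, so the PTR owner is the last octet of the address.
dns_records() {
    local host="$1" ip="$2" domain="$3"
    local last_octet="${ip##*.}"
    printf '%s\tIN\tA\t%s\n' "$host" "$ip"
    printf '%s\tIN\tPTR\t%s.%s.\n' "$last_octet" "$host" "$domain"
}

# Example: records for the first simulator node.
dns_records cdh-isilon 172.16.1.20 bigdata.emc.local
```

The first output line belongs in the forward zone file and the second in the reverse zone file; the name server has to reload its zones before the new records resolve.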
2.     Add the license that activates the HDFS module on the Isilon Simulator. The license key in the command below expired on 7/17/2015, so before adding the license you need to set the date on the Isilon Simulator node to a time before the expiry (the date command below sets the clock to January 1, 2015, 00:01).
date 1501010001
isi license licenses activate ACCEL-34PS2-32FWX-RNIWX-LLADX
isi license licenses list
3.     Run the Isilon Hadoop Tools scripts on the Isilon Simulator node to create the required users and directories
·         Download the Isilon Hadoop Tools scripts from https://github.com/claudiofahey/isilon-hadoop-tools/releases
·         Upload isilon_create_users.sh and isilon_create_directories.sh to the Isilon Simulator node
·         Run the two scripts
bash ./isilon_create_users.sh --dist cdh
bash ./isilon_create_directories.sh --dist cdh --fixperm
4.     Run the following commands to map the hdfs user to the Isilon superuser (root), which allows the hdfs user to chown all files, and then restart the HDFS service
isi zone zones modify System --user-mapping-rules="hdfs=>root"
isi services hdfs disable
isi services hdfs enable

5.     Use Cloudera Manager to deploy CDH cluster upon Isilon OneFS
1.     Log in to the Cloudera Manager Admin Console at http://cdh-manager.bigdata.emc.local:7180
The default port of the Cloudera Manager Server is 7180.
The default user account is admin/admin.
After logging in, accept the EULA and click "Continue"
2.     Choose Cloudera Manager Edition
On the Welcome to Cloudera Manager page, select the edition of Cloudera Manager to install
3.     Specify hosts for your CDH cluster installation
Enter the following hosts:
cdh-master01.bigdata.emc.local
cdh-worker01.bigdata.emc.local
cdh-worker02.bigdata.emc.local
cdh-worker03.bigdata.emc.local
4.     Cluster Installation - Select Repository
·         Use Parcels
·         Parcel Directory
/opt/cloudera/parcels
·         Local Parcel Repository Path
/opt/cloudera/parcel-repo
·         Remote Parcel Repository URLs
·         Select the specific release of the Cloudera Manager Agent you want to install on your hosts
·         Use Packages (Preferred)
·         Select the version of CDH
·         Select the specific release of CDH you want to install on your hosts (Custom Repository)
·         Select the specific release of the Cloudera Manager Agent you want to install on your hosts (Custom Repository)

5.     Cluster Installation - JDK Installation Options
·         Install Oracle Java SE Development Kit (JDK)
Unchecked (already installed in VM template)
·         Install Java Unlimited Strength Encryption Policy Files
Unchecked (already installed in VM template)
6.     Cluster Installation - Enable Single User Mode
·         Single User Mode
Unchecked
7.     Cluster Installation - Provide SSH login credentials
·         Password
·         Private Key
8.     Cluster Installation - Installing
Click "Continue"
9.     Cluster Installation - Detecting CDH versions on all hosts
Click "Continue"
10.    Cluster Installation - Inspect hosts for correctness
Click "Finish"
11.    Cluster Setup - Choose the CDH 5 services that you want to install on your cluster
·    Custom Services
·         Isilon
·         YARN
·         ZooKeeper
Note:
Do not select HDFS or Cloudera Navigator.
Isilon takes the place of the usual HDFS service. Cloudera Navigator is not currently supported with Isilon HDFS.
12.    Cluster Setup - Customize Role Assignments
·    cdh-master01.bigdata.emc.local
·         YARN Resource Manager
·         Isilon Gateway
·    cdh-worker01~03.bigdata.emc.local
·         YARN Node Manager
·         HBase Region Server (If Installed)
·         Impala Daemon (If Installed)
Click "Continue"
13.    Cluster Setup - Database Setup
Check "Use Embedded Database" and click "Continue"
14.    Cluster Setup - Review Changes
·    Default File System URI
hdfs://isilon02.bigdata.emc.local:8020
hdfs://cdh-isilon.bigdata.emc.local:8020
·    WebHDFS URL

Click "Continue"
15.    Cluster Setup - First Run Command
Click "Continue"
16.    Cluster Setup - Congratulations!
Click "Finish"

6.     Validate CDH deployment
1.     Log in to the CDH master compute node and run the following commands to validate the Hadoop cluster
clear &&
hdfs dfs -ls / &&
hdfs dfs -put -f /etc/hosts /tmp &&
hdfs dfs -ls /tmp &&
hdfs dfs -cat /tmp/hosts &&
hdfs dfs -rm -skipTrash /tmp/hosts &&
hdfs dfs -ls /tmp &&
(sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000 ||
sudo -u hdfs yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000) &&
echo "Done"



4.     References:
1.      EMC Isilon Hadoop Starter Kit for Cloudera with VMware Big Data Extensions
2.      EMC Isilon Best Practices for Hadoop Data Storage