Tuesday, March 1, 2016

Cloudera CDH installation using Isilon storage

Hadoop.CDH.Installation.Isilon
1.     Introduction
This document describes how to create a Hadoop environment utilizing the Cloudera CDH and EMC Isilon Scale-Out NAS for HDFS accessible shared storage.

The nodes in an Isilon OneFS system work together as peers in a shared-nothing hardware architecture with no single point of failure. Each node acts as both a Hadoop name node and a data node; the name node daemon is a distributed process that runs on all the nodes in the cluster. A compute client can connect to any node through HDFS.

As nodes are added, the file system expands dynamically and redistributes data.

2.     Environment
This installation guide applies to the following environment.
·         Cloudera CDH 5.7.X
·         VMware vSphere 5.5 or later
·         RHEL/CentOS 6.7
·         Internet Explorer 10 or later
·         Isilon OneFS 8.0.0 or later

3.     Installation
1.     Overview
Below is an overview of the installation process that this document describes.
·    Confirm prerequisites
·    Install Isilon OneFS
·    Configure Isilon OneFS
·    Use Cloudera Manager to deploy CDH cluster upon Isilon OneFS
·    Validate CDH deployment
2.     Confirm Prerequisites
1.     Prepare VMware virtualized environment
Before you start the installation process, a VMware virtualized environment must be ready to provision the virtual machines required for this deployment. ESXi 5.5 or a later revision is recommended for the virtualized environment.
2.     Prepare Cloudera Manager Server and Cluster Nodes
Please prepare the Cloudera Manager server and cluster nodes based on the instructions in the note "Hadoop.CDH.Installation.PathB".
3.     Isilon OneFS
For low-capacity, non-performance testing of Isilon, the EMC Isilon OneFS Simulator can be used instead of a cluster of physical Isilon appliances.
4.     Networking
·         10 GbE Ethernet is required
·         If using the EMC Isilon Simulator, at least two static IP addresses are required: one for the node ext-1 interface and one for the SmartConnect service IP. Each additional Isilon node will require an additional IP address.
·         At a minimum, you will need to allocate one IP address per Access Zone per Isilon node.
·         # of IP addresses = 2 * (# of Isilon Nodes) * (# of Access Zones)
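For example, using the formula above, a 4-node Isilon cluster serving a single Access Zone would need 2 * 4 * 1 = 8 IP addresses, while the stated minimum of one IP per Access Zone per node would be 4 (the node count of 4 is only an illustration).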
3.     Install Isilon OneFS
In this document, the Isilon Simulator 8.0.0.1 is used to set up a free, non-production Isilon OneFS 8.0.0 cluster environment for the deployment of CDH 5.7.X.
You can download Isilon Simulator from this link: http://www.emc.com/products-solutions/trial-software-download/isilon.htm

For the detailed installation process of the Isilon Simulator, please refer to the instructions in the note "Isilon.OneFS.Simulator.Installation".

4.     Configure Isilon OneFS
1.     Add the Isilon Simulator node hostnames and IP addresses to the named (DNS) server configuration files
·         Add the content below to the file /var/named/named-forward.zone
cdh-isilon      IN      A       172.16.1.20
isilon02        IN      A       172.16.1.21
·         Add the content below to the file /var/named/named-reverse.zone
20              IN      PTR     cdh-isilon.bigdata.emc.local.
21              IN      PTR     isilon02.bigdata.emc.local.
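After updating the zone files, reload the name server and verify that both the forward and reverse records resolve. A minimal check, assuming the zones above belong to the bigdata.emc.local domain and that the dig utility is available on the DNS server:
service named reload
dig +short cdh-isilon.bigdata.emc.local
dig +short -x 172.16.1.20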
2.     Add a license to activate the Isilon Simulator HDFS module. The license key listed in the command below expired on 7/17/2015; before adding the license, you need to change the current date on the Isilon Simulator node so that the license can be applied:
date 1501010001
isi license licenses activate ACCEL-34PS2-32FWX-RNIWX-LLADX
isi license licenses list
3.     Run the Isilon Hadoop Tools scripts on the Isilon Simulator node to create the required users and directories
·         Download the Isilon Hadoop Tools scripts from https://github.com/claudiofahey/isilon-hadoop-tools/releases
·         Upload isilon_create_users.sh and isilon_create_directories.sh to the Isilon Simulator node
·         Run the two scripts:
bash ./isilon_create_users.sh --dist cdh
bash ./isilon_create_directories.sh --dist cdh --fixperm
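To spot-check the result, you can list the Hadoop accounts created on the Isilon node (a sketch; the exact output format of the isi auth commands varies by OneFS release):
isi auth users list | grep -i hdfs
isi auth groups list | grep -i hadoop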
4.     Run the commands below to map the hdfs user to the Isilon root (super) user; this allows the hdfs user to chown all files:
isi zone zones modify System --user-mapping-rules="hdfs=>root"
isi services hdfs disable
isi services hdfs enable

5.     Use Cloudera Manager to deploy CDH cluster upon Isilon OneFS
1.     Log in to the Cloudera Manager Admin Console: http://cdh-manager.bigdata.emc.local:7180
The default port of the Cloudera Manager Server is 7180
The default user account is: admin/admin
After logging in, accept the EULA and click "Continue"
2.     Choose Cloudera Manager Edition
From the Welcome to Cloudera Manager page, you can select the edition of Cloudera Manager to install
3.     Specify hosts for your CDH cluster installation
Specify the hosts for your CDH cluster
cdh-master01.bigdata.emc.local
cdh-worker01.bigdata.emc.local
cdh-worker02.bigdata.emc.local
cdh-worker03.bigdata.emc.local
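Before continuing, it can help to confirm that the Cloudera Manager server can resolve and reach each of these hosts. A minimal sketch, run from the Cloudera Manager node and assuming the hostnames above match your DNS records:
for h in cdh-master01 cdh-worker01 cdh-worker02 cdh-worker03; do
  ping -c 1 "${h}.bigdata.emc.local" > /dev/null && echo "${h}: OK" || echo "${h}: unreachable"
done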
4.     Cluster Installation - Select Repository
·         Use Parcels
·         Parcel Directory
/opt/cloudera/parcels
·         Local Parcel Repository Path
/opt/cloudera/parcel-repo
·         Remote Parcel Repository URLs
·         Select the specific release of the Cloudera Manager Agent you want to install on your hosts
·         Use Packages (Preferred)
·         Select the version of CDH
·         Select the specific release of CDH you want to install on your hosts (Custom Repository )
·         Select the specific release of the Cloudera Manager Agent you want to install on your hosts (Custom Repository )

5.     Cluster Installation - JDK Installation Options
·         Install Oracle Java SE Development Kit (JDK)
Unchecked (already installed in VM template)
·         Install Java Unlimited Strength Encryption Policy Files
Unchecked (already installed in VM template)
6.     Cluster Installation - Enable Single User Mode
·         Single User Mode
Unchecked
7.     Cluster Installation - Provide SSH login credentials
·         Password
·         Private Key
8.     Cluster Installation - Installing
Click "Continue"
9.     Cluster Installation - Detecting CDH versions on all hosts
Click "Continue"
10. Cluster Installation - Inspect hosts for correctness
Click "Finish"
11. Cluster Setup - Choose the CDH 5 services that you want to install on your cluster
·    Custom Services
·         Isilon
·         YARN
·         ZooKeeper
Note:
Do not select HDFS or Cloudera Navigator.
Isilon takes the place of the usual HDFS service. Cloudera Navigator is not currently supported with Isilon HDFS.
12. Cluster Setup - Customize Role Assignments
·    cdh-master01.bigdata.emc.local
·         YARN Resource Manager
·         Isilon Gateway
·    cdh-worker01~03.bigdata.emc.local
·         YARN Node Manager
·         HBase Region Server (If Installed)
·         Impala Daemon (If Installed)
Click "Continue"
13. Cluster Setup - Database Setup
Check "Use Embedded Database" and click "Continue"
14. Cluster Setup - Review Changes
·    Default File System URI
hdfs://isilon02.bigdata.emc.local:8020
hdfs://cdh-isilon.bigdata.emc.local:8020
·    WebHDFS URL

Click "Continue"
15. Cluster Setup - First Run Command
Click "Continue"
16. Cluster Setup - Congratulations!
Click "Finish"

6.     Validate CDH deployment
1.     Log in to the CDH master compute node and run the commands below to validate the Hadoop cluster:
clear &&
hdfs dfs -ls / &&
hdfs dfs -put -f /etc/hosts /tmp &&
hdfs dfs -ls /tmp &&
hdfs dfs -cat /tmp/hosts &&
hdfs dfs -rm -skipTrash /tmp/hosts &&
hdfs dfs -ls /tmp &&
(sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000 ||
sudo -u hdfs yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000) &&
echo "Done"



4.     References:
1.      EMC Isilon Hadoop Starter Kit for Cloudera with VMware Big Data Extensions
2.      EMC Isilon Best Practices for Hadoop Data Storage




Installation of Hortonworks (HDP)

Steps to Install Hortonworks (HDP) on 2 Nodes Using CentOS

  1. Pre-requisites
    1. Hadoop can be installed on the following operating systems
      1. Red Hat Enterprise Linux (RHEL) v6.x
      2. Red Hat Enterprise Linux (RHEL) v5.x (deprecated)
      3. CentOS v6.x
      4. CentOS v5.x (deprecated)
      5. Oracle Linux v6.x
      6. Oracle Linux v5.x (deprecated)
      7. SUSE Linux Enterprise Server (SLES) v11, SP1 and SP3
      8. Ubuntu Precise v12.04
    2. Ensure that the nodes have fully qualified hostnames (FQDNs).
      Command to check hostname: "hostname -f"
    3. Ensure that you have the following Linux software:
      1. yum and rpm (RHEL/CentOS/Oracle Linux)
      2. scp, curl, unzip, tar, and wget
      3. OpenSSL (v1.0.1, build 16 or later)
      4. Python v2.6
      Ensure that you have installed Java. Run the command "yum install java-1.7.0-openjdk".
    4. Database & Memory requirements - Ambari requires a relational database to store information about the cluster configuration and topology. If you install HDP Stack with Hive or Oozie, they also require a relational database.
      1. Ambari : By default, install an instance of PostgreSQL on the Ambari Server host.
      2. Hive : By default (on RHEL/CentOS/Oracle Linux 6), Ambari will install an instance of MySQL on the Hive Metastore host.
      3. Oozie : By default, Ambari will install an instance of Derby on the Oozie Server host. You can also use an existing instance of PostgreSQL, MySQL, or Oracle. For the Ambari database, if you use an existing Oracle database, make sure the Oracle listener runs on a port other than 8080 to avoid a conflict with the default Ambari port.
      Also ensure that you have at least 8 GB of RAM on each host. Apache recommends that you have at least 1 GB of RAM available.
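      A quick way to confirm the available memory and free disk space on each host before installing, assuming standard CentOS utilities:
      # free -g
      # df -h /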
  2. Hosts Preparation:
    Node      Host Name             IP Address
    Node 1    hdp1.stratapps.com    172.16.16.202
    Node 2    hdp2.stratapps.com    172.16.16.203

    On Node 1, i.e. hdp1.stratapps.com

    1. Checking Kernel Versions & Centos Release Versions
      Step -1:
      Ensure the installed CentOS is 64-bit and check the kernel version using the command:
      # uname -r
      Ensure the installed operating system is CentOS 6.7 using the command:
      # cat /etc/redhat-release
  3. Disabling IP Tables & Selinux
    Step-2:
    Configuring iptables: During cluster installation, we need to ensure that connections between cluster hosts are open without any restrictions.
    In order to achieve that we need to Turn Off the IP tables (Firewall).
    The easiest way to do this is by disabling the iptables, as follows:
    # service iptables stop
    
    # service ip6tables stop
    
    #chkconfig iptables off
    
    #chkconfig  ip6tables off
    Step-3:
    Disable SELinux and PackageKit. For this, the following file needs to be changed:
    # vi /etc/sysconfig/selinux
    To permanently disable SELinux, set SELINUX=disabled.
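    Editing /etc/sysconfig/selinux only takes effect after a reboot; to turn off enforcement for the current session and confirm the state, the standard SELinux utilities can be used (a minimal sketch):
    # setenforce 0
    # getenforce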
  4. Adding Host Names to Cluster Nodes
    Step -4:
    Ensure that the nodes have fully qualified domain name (full hostname).
    Use command "hostname -f" to check the full hostname or FQDN.
    Use the /etc/hosts file to add both the node hostnames.
    172.16.16.202 hdp1.stratapps.com hdp1
    172.16.16.203 hdp2.stratapps.com hdp2

    On Node 2, i.e. hdp2.stratapps.com

    Follow steps 1 through 4 on Node 2, i.e. hdp2.stratapps.com, as shown above.
    Once all the steps are completed, ensure that the nodes can communicate with each other by running the commands below on both nodes.
    On Node1 issue the command
    #ping hdp2.stratapps.com
    #ping hdp1.stratapps.com
    On Node 2 issue the command
    #ping hdp1.stratapps.com
    #ping hdp2.stratapps.com
  5. SSH Configuration On Cluster Nodes
    Step-5:
    Set up password-less SSH. To have Ambari Server automatically install Ambari Agents on all your cluster hosts, you must set up password-less SSH connections between the Ambari Server host and all other hosts in the cluster. The Ambari Server host uses SSH public key authentication to remotely access and install the Ambari Agent.
    Generate public and private SSH keys on the Ambari Server host (command: ssh-keygen).
    Copy the SSH public key (id_rsa.pub) to the root account on your target hosts.
    The keys are stored as .ssh/id_rsa (private key) and .ssh/id_rsa.pub (public key).
    # ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@hdp1.stratapps.com
    # ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@hdp2.stratapps.com
    
    Test the SSH connection to the host without a password:
    #ssh hdp1.stratapps.com
    and also connect to Node 2 using SSH without a password:
    #ssh hdp2.stratapps.com
    Step-6:
    The same needs to be performed on Node 2, i.e. hdp2.stratapps.com:
    # ssh-keygen -t rsa
    # ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@hdp1.stratapps.com
    # ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@hdp2.stratapps.com
    Test the SSH connection to the host without a password:
    #ssh hdp2.stratapps.com
    and also connect to Node 1 using SSH without a password:
    #ssh hdp1.stratapps.com
  6. Installation of httpd & ntp Packages
    Step -7:
    Configure an HTTP server. Install httpd on both Node 1 and Node 2 using the command:
    # yum install httpd
    The HTTP server is required for web browser access.
    Enable NTP on Node 1 (hdp1.stratapps.com) and Node 2 (hdp2.stratapps.com):
    # yum install ntp
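    After installing the packages, you will typically also want to start both services and enable them at boot; a sketch assuming the CentOS 6 SysV init tools, run on both nodes:
    # service httpd start
    # chkconfig httpd on
    # service ntpd start
    # chkconfig ntpd on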
  7. Download Ambari and hdp Repositories for Cluster Installation
    Step -8:
    Setting up a local repository: If your cluster is behind a firewall that prevents or limits Internet access, you can install Ambari and a Stack using local repositories; otherwise, you can use the Internet to access the Hortonworks repositories.
    Obtaining the repositories - Ambari repository: If you do not have Internet access for setting up the Ambari repository, use the link appropriate for your OS family to download a tarball that contains the software. With Internet access, download the Ambari repository file directly:
    # wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.7.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
    HDP Stack repositories: If you do not have Internet access to set up the Stack repositories, use the link appropriate for your OS family to download a tarball that contains the HDP Stack version you plan to install. With Internet access, download the HDP repository file directly:
    # wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/GA/2.2.0.0/hdp.repo -O /etc/yum.repos.d/HDP.repo
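    Once both .repo files are in place, you can confirm that yum sees the new repositories (the repository names shown will depend on the versions downloaded):
    # yum clean all
    # yum repolist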
  8. Installation and Configuration of Ambari Server
    Step-9: Installing Ambari
    Issue the command "yum install ambari-server" on Node 1, hdp1.stratapps.com.
    Once Ambari is installed, run the setup using the command:
    #ambari-server setup
    Select a JDK version to download. Enter 1 to download Oracle JDK 1.7. By default, Ambari Server setup downloads and installs Oracle JDK 1.7 along with the Java Cryptography Extension (JCE) Policy Files.
    Select n at "Enter advanced database configuration" to use the default embedded PostgreSQL database for Ambari. The default PostgreSQL database name is ambari; the default user name and password are ambari/bigdata. Otherwise, to use an existing PostgreSQL, MySQL, or Oracle database with Ambari, select y.
    Once the setup is completed - Start the Ambari Server
    Run the following command on the Ambari Server host hdp1.stratapps.com
    To start the server:
    #ambari-server start
    To stop the Ambari Server:
    #ambari-server stop
    To know status:
    #ambari-server status
    Installing, Configuring, and Deploying a HDP Cluster
    We will use the Ambari Install Wizard running in your browser to install, configure, and deploy our cluster.
    Log In to Apache Ambari
    After starting the Ambari service, open Ambari Web using a web browser and point your browser to http://<ambari-server-host>:8080, where <ambari-server-host> is the name of your Ambari Server host, i.e. http://hdp1.stratapps.com:8080/#/login
    Log in to the Ambari Server using the default user name/password: admin/admin.
    You can change these credentials later.
  9. Post Installation using Ambari Server UI (User Interface)
    Step-10: Log in to the Ambari Web UI as described in the previous step.
  10. Step-11: From the Ambari Welcome page, choose Launch Install Wizard.
    Then press "Launch Install Wizard" to create a new cluster.
  11. Step-12: In Name for your cluster, type a name for the cluster you want to create. There should be no white spaces or special characters in the name.
    Give your own name for the cluster and press "Next".
  12. Step-13: The Service Stack (the Stack) is a coordinated and tested set of HDP components. Use a radio button to select the Stack version you want to install. To install an HDP 2.x stack, select the HDP 2.2, HDP 2.1, or HDP 2.0 radio button.
    Press "Next".
  13. Step-14: Provide the fully qualified domain names of the hosts here for cluster creation.
    The wizard needs access to the private key file created earlier on the Ambari Server host.
    To do that, first open Node 1, i.e. hdp1.stratapps.com, with WinSCP and copy the id_rsa file (private key) to your desktop; alternatively, see the command after this step to print the key for pasting.
    In order to build up the cluster, the install wizard prompts you for general information about how you want to set it up. You need to supply the FQDN of each of your hosts. The wizard also needs to access the private key file you created earlier. Using the host names and key file information, the wizard can locate, access, and interact securely with all hosts in the cluster. If you want to let Ambari automatically install the Ambari Agent on all your hosts using SSH, select Provide your SSH Private Key and either use the Choose File button in the Host Registration Information section to find the private key file that matches the public key you installed earlier on all your hosts or cut and paste the key into the text box manually.
    Then press “Register and Confirm”.
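    If you prefer to paste the key rather than copying the file with WinSCP, you can print it on Node 1 and copy it from the terminal, assuming the key was generated under root's home directory as above:
    # cat /root/.ssh/id_rsa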
  14. Step-15: Confirm Hosts
    Here the wizard checks SSH authentication with the provided private key.
  15. Step-16: Choose Services you want to deploy
  16. Step-17:
    The Ambari install wizard assigns the master components for selected services to appropriate hosts in your cluster and displays the assignments in Assign Masters.
    The Hive Server, Hive Metastore, and WebHCat should be hosted on the same machine. Here we hosted them on Node 1, i.e. hdp1.stratapps.com.
  17. Step-18:
    The Ambari installation wizard assigns the slave components (DataNodes, NodeManagers, and RegionServers) to appropriate hosts in your cluster. It also attempts to select hosts for installing the appropriate set of clients.
    All the clients were installed on Node 2, i.e. hdp2.stratapps.com.
  18. Step-19:
    Customize Services: here we need to provide usernames and passwords in the required fields.
    For example, we need to provide a password and an email ID for the Nagios service.
    Review all the settings and services before starting the installation.
  19. Step-20:
    Start the installation by clicking the Deploy button.
  20. Step-21:
    Here, the dashboard shows that all services are up and running.
    This completes our Hadoop server cluster deployment.
    Note: Difference between single-node and multi-node installation
    In a single-node HDP installation, all Hadoop daemons run on a single node in separate JVMs. In a multi-node installation, the Hadoop daemons are distributed across multiple nodes. For example, here we have 2 nodes, and the Hadoop daemons were distributed across the two nodes based on our hardware requirements.