Pig Installation Guide
Environment Setup
- 1.1 Install the latest stable Apache Hadoop build.
- 1.2 Download Pig 0.11.1.
Pig Installation
Download a stable release of Pig. This guide uses Pig 0.11.1, which works with Hadoop 0.20.x, 0.23.x, 1.x and 2.x.
- Download the Pig 0.11.1 release archive, pig-0.11.1.tar.gz.
- Create the folder /usr/local/pig and copy the Pig archive into it by executing the following commands:
sudo mkdir -p /usr/local/pig
sudo cp pig-0.11.1.tar.gz /usr/local/pig
- Change to the directory /usr/local/pig by executing the following command:
cd /usr/local/pig
- Extract the compressed Pig archive by executing the following command:
sudo tar -xvzf pig-0.11.1.tar.gz
- Update the .bashrc file for hduser so that the Pig environment variables are set every time hduser logs in. Open the file and add the entries shown below:
$ vi ~/.bashrc
export PIG_HOME='/usr/local/pig/pig-0.11.1'
export PATH=$HADOOP_HOME/bin:$PIG_HOME/bin:$JAVA_HOME/bin:$PATH
- Set the environment variable JAVA_HOME to point to the Java installation directory, which Pig uses internally.
export JAVA_HOME=<<Java_installation_directory>>
- Reload the .bashrc file so the changes take effect in the current shell by executing the following command:
. ~/.bashrc
- Pig is now set up and configured for further use.
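Editing .bashrc by hand is easy to get wrong when the setup is repeated, because the entries end up duplicated. The bash sketch below is one way to script the last few steps and then confirm the result. The helper names add_pig_env and check_pig_setup are hypothetical (not part of Pig), and the path assumes Pig was extracted to /usr/local/pig/pig-0.11.1 as described above.

```shell
# Hypothetical helpers (not part of Pig), meant to be sourced into bash.
# Assumes Pig was extracted to /usr/local/pig/pig-0.11.1.

# Append the Pig entries to an rc file only if they are not already there,
# so running the setup twice does not duplicate them.
add_pig_env() {
  local rc_file="${1:-$HOME/.bashrc}"
  local line
  touch "$rc_file"
  for line in \
    "export PIG_HOME='/usr/local/pig/pig-0.11.1'" \
    'export PATH=$HADOOP_HOME/bin:$PIG_HOME/bin:$JAVA_HOME/bin:$PATH'
  do
    # -x: whole-line match, -F: literal string (no regex), -q: no output
    grep -qxF -- "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
  done
}

# Verify that PIG_HOME points at an executable launcher and that
# the pig command resolves on PATH. Returns non-zero on the first problem.
check_pig_setup() {
  if [ -z "${PIG_HOME:-}" ]; then
    echo "PIG_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$PIG_HOME/bin/pig" ]; then
    echo "no executable launcher at $PIG_HOME/bin/pig" >&2
    return 1
  fi
  if ! command -v pig >/dev/null 2>&1; then
    echo "pig is not on PATH" >&2
    return 1
  fi
  echo "pig found at $(command -v pig)"
}

# Example usage:
#   add_pig_env ~/.bashrc && . ~/.bashrc
#   check_pig_setup && pig -version
```

The check only looks at the environment; running pig -version afterwards should print the same "Apache Pig version 0.11.1" banner that appears in the start-up log.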
Execution Modes in Pig
Pig has two execution modes: local mode and MapReduce mode. Both are described below.
Local Mode
Local mode is used to test and debug Pig scripts and queries. It suits small datasets on a single machine: Pig runs in a single JVM and reads from and writes to the local filesystem.
- To run in local mode, execute the following command:
$ pig -x local
grunt>
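A quick way to try local mode is a small word-count script run against a local text file. The bash sketch below assumes pig is on PATH; the file names input.txt and wordcount.pig and the helper names write_wordcount_demo and run_wordcount_local are illustrative only. The Pig Latin uses the built-in TOKENIZE and COUNT functions.

```shell
# Hypothetical local-mode demo, meant to be sourced into bash.

# Write a tiny input file and a word-count Pig Latin script into a directory.
write_wordcount_demo() {
  local dir="$1"
  mkdir -p "$dir"
  printf 'hello pig\nhello hadoop\n' > "$dir/input.txt"
  cat > "$dir/wordcount.pig" <<'EOF'
-- Split each line into words and count how often each word occurs.
lines   = LOAD 'input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
DUMP counts;
EOF
}

# Run the script in local mode; the relative path in LOAD is resolved
# against the directory pig is started from, so cd there first.
run_wordcount_local() {
  local dir="$1"
  (cd "$dir" && pig -x local wordcount.pig)
}

# Example usage:
#   write_wordcount_demo /tmp/pigdemo
#   run_wordcount_local /tmp/pigdemo
```

With this input, the DUMP should print one tuple per distinct word, for example (hello,2); the order of the tuples is not guaranteed.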
MapReduce Mode
This is the default mode. Pig translates the queries into MapReduce jobs, which require access to a Hadoop cluster and its filesystem (HDFS).
- To run in MapReduce mode, execute the following command:
$ pig
As soon as the command runs, Pig prints its start-up log messages and opens the Grunt shell, where the user can run Pig commands against the Hadoop filesystem:
2013-10-28 11:39:44,767 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-10-28 11:39:44,767 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hduser/pig_1382985584762.log
2013-10-28 11:39:44,797 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hduser/.pigbootup not found
2013-10-28 11:39:45,094 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://Hadoopmaster:54310
2013-10-28 11:39:45,592 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: Hadoopmaster:54311
grunt>
- The log messages show the filesystem (hdfs://Hadoopmaster:54310) and the job tracker (Hadoopmaster:54311) that Pig connects to.
- Grunt is an interactive shell for running Pig queries and commands.
- There are three ways to run Pig programs: as Pig scripts, as interactive queries in the Grunt shell, or embedded in a Java program.
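As an illustration of the script route in MapReduce mode, the bash sketch below copies a local file into HDFS and then submits a Pig script to the cluster. The helpers run_cmd and run_on_cluster are hypothetical, and the script is assumed to read its input through a parameter named INPUT (for example LOAD '$INPUT' AS (line:chararray);), which Pig's -param option fills in. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
# Hypothetical MapReduce-mode helpers, meant to be sourced into bash.

# Execute a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Stage a local file in HDFS, then run a Pig script against it.
run_on_cluster() {
  local input="$1" script="$2" hdfs_path="$3"
  # A relative hdfs_path lands under the user's HDFS home directory
  # (e.g. /user/hduser); -put fails if the destination already exists.
  run_cmd hadoop fs -put "$input" "$hdfs_path"
  # MapReduce is Pig's default mode; -x mapreduce only makes it explicit.
  # -param exposes the HDFS path to the script as $INPUT.
  run_cmd pig -x mapreduce -param "INPUT=$hdfs_path" "$script"
}

# Example usage:
#   DRY_RUN=1 run_on_cluster input.txt wordcount.pig words.txt
#   run_on_cluster input.txt wordcount.pig words.txt
```

The dry-run switch is there so the command sequence can be checked on a machine without a cluster before submitting real jobs.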