Sunday, 30 August 2015

Installing Hadoop using Ambari

In this post, I will explain how to set up a Hadoop cluster using Ambari.

SCENARIO: In this setup, I have 5 servers. The details of each of them, including their IP addresses, are as below:

------
192.168.151.140  hadoop-master-test.xxx.com
192.168.151.141  hadoop-secondary-test.xxx.com
192.168.151.142  data-node-1-test.xxx.com
192.168.151.143  data-node-2-test.xxx.com
192.168.151.144  data-node-3-test.xxx.com
--------

I am going to use these nodes for my Hadoop installation.

PREREQUISITES:

Do the following on all the nodes.

1) Disable iptables.
---------
service iptables stop
chkconfig iptables off
---------

2) Disable libvirtd.
---------
service libvirtd stop
chkconfig libvirtd off
---------

3) Disable transparent hugepages (THP) as below.

Run the following commands in the terminal:

------------
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
------------
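To confirm that THP is now disabled, check the current setting; the value in square brackets is the active one and should be never:

------
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
------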

Open the file /etc/rc.local in a text editor and add the following lines to the end, so that the setting survives a reboot:

---------------
if test -f /sys/kernel/mm/redhat_transparent_hugepage/enabled; then
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
fi

if test -f /sys/kernel/mm/redhat_transparent_hugepage/defrag; then
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
fi
----------------

More details about THP can be found in Red Hat's official documentation on transparent hugepages.

4) Enable NTPD

Run the following commands in the terminal:

-------
service ntpd start
chkconfig ntpd on
---------
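To verify that the time service is actually synchronising, you can query the configured peers (ntpq ships with the ntp package); each node should list at least one reachable time server:

---------
ntpq -p
---------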

5) Make sure forward and reverse DNS work well. In short, the commands:
------
hostname -i   ---> should give the IP address
hostname -f   ---> should give the fully qualified domain name
------
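For example, on the master node of the scenario above the output would look roughly like this (illustrative values):

------
hostname -i
192.168.151.140

hostname -f
hadoop-master-test.xxx.com
------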

If you don't have DNS entries for all hosts, add all the details to the "/etc/hosts" file on every node, as below:
-------
192.168.151.140   hadoop-master-test.xxx.com
192.168.151.141   hadoop-secondary-test.xxx.com
192.168.151.142   data-node-1-test.xxx.com
192.168.151.143   data-node-2-test.xxx.com
192.168.151.144   data-node-3-test.xxx.com
-------
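If you maintain /etc/hosts by hand, one way to keep it identical on every node is to copy it out from one machine with scp (a sketch; it assumes you can SSH to each node as root):

-------
for host in hadoop-secondary-test.xxx.com data-node-1-test.xxx.com data-node-2-test.xxx.com data-node-3-test.xxx.com; do
  scp /etc/hosts root@$host:/etc/hosts
done
-------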

6) Adding repositories.

Now, navigate to the folder "/etc/yum.repos.d/" and download the file ambari.repo as below:
-----
cd /etc/yum.repos.d/

wget http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.0.0/ambari.repo
------
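You can confirm that yum picked up the new repository (the repo id shown may differ slightly depending on the Ambari version):

------
yum repolist | grep -i ambari
------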

More details about adding the Ambari repository can be found at the link below:
--------
https://cwiki.apache.org/confluence/display/AMBARI/Install+Ambari+2.0.0+from+Public+Repositories
--------

PROCEDURE: 
-----------------

Install the Ambari Server:
---------
yum -y install ambari-server
---------

Set up the Ambari Server using the command below:
---------
ambari-server setup
---------

During the setup process, it will ask you about the choice of JDK; select "Oracle JDK 1.7", and for the rest of the options simply press Enter to accept the defaults. At the end, it should print "Ambari Server 'setup' completed successfully."
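If you would rather not answer the prompts interactively, a silent setup that accepts the defaults should also work (the -s flag is available in Ambari 2.0; check ambari-server setup --help on your version):

---------
ambari-server setup -s
---------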

Run the command below to start the Ambari server:
-----
ambari-server start
-----

In the browser, load the Ambari server URL as below (replace hostname with the host name of the machine running the Ambari server):
-----
http://hostname:8080
-----

Log in with the default username "admin" and password "admin".
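You can also verify from the command line that the server is up, with a quick call to the Ambari REST API (same default credentials; replace hostname as above):

-----
curl -u admin:admin http://hostname:8080/api/v1/clusters
-----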

The user interface will look as below:



Installing Hadoop components using Ambari:

STEP 1: Click the "Launch Install Wizard" button to start the installation process and name the cluster as you prefer:



STEP 2: Select the "HDP 2.2" stack.


STEP 3: Generate a private key.

There are two ways to log in to a server over SSH.

1) The first method is to supply a login and a password.
2) The second method is key-based authentication, which works as follows when user B logs in to machine A:

Machine A knows the public key of user B (it is stored in the authorized_keys file on machine A).

When logging in, user B's SSH client signs a challenge with the private key.

Machine A verifies the signature using the public key. If the verification succeeds, machine A knows that this is really user B.

Therefore, the Ambari Server needs the SSH private key to be able to log in to all machines in the cluster and run commands on them.

So, let us generate a key pair. Open a terminal and run the following commands:

-----------
cd /root/.ssh

ssh-keygen
------------

Press Enter several times to accept the defaults. Then run the command:

------
ls
------

It will show you that there are two new files in the .ssh directory: id_rsa and id_rsa.pub.

The first one contains the private key, the second one contains the public key for user root.

Now, copy the public key in "/root/.ssh/id_rsa.pub" into the file "/root/.ssh/authorized_keys" on all machines in the cluster. On the current machine:

----------
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
----------

This appends the contents of root's id_rsa.pub to the authorized_keys file (note the ">>"; a single ">" would overwrite any keys already there).

Therefore, when a user connects to this machine by SSH and presents the matching private key, the machine will look it up in authorized_keys and know that this is root.
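The cat command above only covers the machine you are currently on (presumably the Ambari Server host). One way to push the same public key to the remaining nodes is ssh-copy-id; a sketch using the hostnames from the scenario above, assuming the Ambari Server runs on hadoop-master-test (you will be asked for each node's root password once):

---------
for host in hadoop-secondary-test.xxx.com data-node-1-test.xxx.com data-node-2-test.xxx.com data-node-3-test.xxx.com; do
  ssh-copy-id root@$host
done
---------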

To test the key-based login, run:

---------
ssh yourhostname
---------
(replace yourhostname with the real host name)

The command will prompt you for confirmation and then show the prompt [root@yourhostname ~], which means that we have set up an SSH connection to the host yourhostname (we were already on the same host, but that doesn't matter).

Press Ctrl+D; you will disconnect from the host yourhostname and get the message "Connection to yourhostname closed".

Now run the command:

---------
cat /root/.ssh/id_rsa
----------

This will dump the private key to the terminal. Select the whole key, including the "BEGIN" and "END" marker lines, and paste it into the Ambari UI.

Also, enter all the hostnames in the cluster, as below:



STEP 4: Confirm the hosts to continue.



STEP 5: Once it finishes, you will see a success message.



STEP 6: Select the services to be installed.


STEP 7: The next step is to assign masters:


STEP 8: Assign slaves and clients.



STEP 9: In this step, some services will show warning messages. Click on those services and enter the passwords in the required boxes.



STEP 10: You can start deploying.



On successful installation, you will see the following message.


The Ambari UI will now look as below:


CONFIGURING SERVICES:

Now, when you try to connect to the Hive interface, you will see the following error message.




The permissions on the HDFS "/user" directory are initially wrong.

The "/user" directory has rwx permissions for the user "hdfs" (who is its owner), but only r-x for the group "hdfs". Let us add write permission for the group "hdfs" and add root to that group. The HDFS permission change must be done as the user "hdfs" (the HDFS superuser), while usermod must be run as root.


---------
passwd hdfs
---------

It will prompt you for the new password; enter it twice. Then, still as root, add root to the "hdfs" group and switch to the "hdfs" user:

----------
usermod -a -G hdfs root

su - hdfs
----------

This switches you to the "hdfs" user (the HDFS superuser). Then run:

----------
hdfs dfs -chmod -R 775 /user

hdfs dfs -ls /
----------

And here is the result:



Now, switch back to the root user (press Ctrl+D) and create a home directory for root in HDFS as below:
-----------
sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root:root /user/root
-----------
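A quick way to confirm that the directory exists and is owned by root:

-----------
sudo -u hdfs hadoop fs -ls /user
-----------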

Now, you will be able to log in to the Hive shell.


Configuring HDFS for other users:

For other users (who will later be using the cluster), it is necessary to do the following:

--------------
hdfs dfs -mkdir /user/Manoj

hdfs dfs -chown Manoj:Manoj /user/Manoj
--------------

Reason: every user needs a home directory in HDFS.
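If you have several users to onboard, the same two commands can be scripted in one go (the user names below are just placeholders):

--------------
for u in alice bob; do
  sudo -u hdfs hdfs dfs -mkdir /user/$u
  sudo -u hdfs hdfs dfs -chown $u:$u /user/$u
done
--------------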
