Setup Hadoop Cluster using Ansible

Rahulbhatia1998
Mar 21, 2021

Task Description 📝:

Configure the Hadoop cluster using Ansible.

Steps to achieve the goal:

  1. Transfer the JDK and Hadoop installation files.
  2. Install the JDK.
  3. Install Hadoop.
  4. Make a name node/data node directory.
  5. Update the config files, i.e. core-site.xml and hdfs-site.xml.
  6. Format the name node.
  7. Start the name node and data node services.
  8. Get the Hadoop cluster report.

The project is laid out as follows: data_node_conf and name_node_conf hold the configuration file templates, ip.txt is the inventory, hadoop-setup.yml is the playbook, rpm modules holds the installation packages, and vars_file.yml holds the variables. The inventory groups are named namenode and datanode.
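To make the layout concrete, here is a minimal skeleton of how hadoop-setup.yml could be organised around those groups. It is a sketch, not the playbook from the repository: the three-play split and the play names are assumptions, and the group names must match whatever is written in ip.txt.

```yaml
# hadoop-setup.yml (skeleton only; the task lists are left empty here)
- name: Common setup on every node
  hosts: all
  vars_files:
    - vars_file.yml
  tasks: []

- name: Configure the name node
  hosts: namenode          # inventory group from ip.txt
  vars_files:
    - vars_file.yml
  tasks: []

- name: Configure the data nodes
  hosts: datanode          # inventory group from ip.txt
  vars_files:
    - vars_file.yml
  tasks: []
```

A playbook shaped like this would be run with `ansible-playbook -i ip.txt hadoop-setup.yml`.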

Let’s start automating the setup of the Hadoop Cluster.

These are the common tasks for all the nodes.

  1. Switch off the firewall service.

  2. Transfer and install common packages on all the nodes.
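The two common steps above could look roughly like the tasks below. This is a sketch under several assumptions: the nodes run a RHEL-family system with firewalld, the packages are RPM files, and `jdk_rpm` and `hadoop_rpm` are hypothetical variables (for example defined in vars_file.yml) that hold the local paths to those files.

```yaml
# Tasks for the play that targets all nodes.
- name: Stop the firewall service
  ansible.builtin.service:
    name: firewalld            # assumption: firewalld is the firewall in use
    state: stopped

- name: Copy the JDK and Hadoop RPMs to the node
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: "/root/{{ item | basename }}"
  loop:
    - "{{ jdk_rpm }}"          # hypothetical variable
    - "{{ hadoop_rpm }}"       # hypothetical variable

- name: Install the JDK first, then Hadoop
  ansible.builtin.yum:
    name: "/root/{{ item | basename }}"
    state: present
    disable_gpg_check: true    # local RPM files are usually unsigned
  loop:
    - "{{ jdk_rpm }}"
    - "{{ hadoop_rpm }}"
```

Looping over the two packages in order keeps the JDK installed before Hadoop, which depends on it.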

Setting up the master node.

  1. Make a name node directory.

  2. Copy the configuration files to the master node.

  3. Format the name node.

  4. Start the Hadoop service on the name node.

  5. Print the output of the start command for debugging.
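The master node steps above can be sketched as the following tasks. The variables `namenode_dir` and `hadoop_conf_dir` are hypothetical, and the commands assume a Hadoop release that ships `hadoop namenode -format` and `hadoop-daemon.sh` on the PATH; adjust them to the version actually installed.

```yaml
# Tasks for the play that targets the namenode group.
- name: Create the name node directory
  ansible.builtin.file:
    path: "{{ namenode_dir }}"               # hypothetical variable
    state: directory

- name: Copy the configuration files to the name node
  ansible.builtin.template:
    src: "name_node_conf/{{ item }}"
    dest: "{{ hadoop_conf_dir }}/{{ item }}"  # hypothetical variable
  loop:
    - core-site.xml
    - hdfs-site.xml

- name: Format the name node
  ansible.builtin.shell:
    cmd: echo Y | hadoop namenode -format
    # Assumption: formatting creates a "current" directory, so its
    # presence is used to skip the step on later runs.
    creates: "{{ namenode_dir }}/current"

- name: Start the name node service
  ansible.builtin.command: hadoop-daemon.sh start namenode
  register: namenode_start

- name: Show the output of the start command
  ansible.builtin.debug:
    var: namenode_start.stdout_lines
```

The `creates` guard matters because reformatting an existing name node wipes its metadata. The start task has no such guard, so a second run against an already running daemon will not behave idempotently.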

Setting up the data nodes.

  1. Make a data node directory.

  2. Copy the configuration files to data nodes.

  3. Start the Hadoop service on data nodes and debug the output.
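The data node steps mirror the master node ones without the format step. As before, this is only a sketch: `datanode_dir` and `hadoop_conf_dir` are hypothetical variables, and `hadoop-daemon.sh` is assumed to be on the PATH.

```yaml
# Tasks for the play that targets the datanode group.
- name: Create the data node directory
  ansible.builtin.file:
    path: "{{ datanode_dir }}"               # hypothetical variable
    state: directory

- name: Copy the configuration files to the data node
  ansible.builtin.template:
    src: "data_node_conf/{{ item }}"
    dest: "{{ hadoop_conf_dir }}/{{ item }}"  # hypothetical variable
  loop:
    - core-site.xml
    - hdfs-site.xml

- name: Start the data node service
  ansible.builtin.command: hadoop-daemon.sh start datanode
  register: datanode_start

- name: Show the output of the start command
  ansible.builtin.debug:
    var: datanode_start.stdout_lines
```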

Use the command hadoop dfsadmin -report on any of the nodes in the cluster to get the cluster report.
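If you would rather collect the report from the playbook than log in to a node, something like the following could be appended to one of the plays. It is a sketch that assumes the `hadoop` command is on the PATH of the target node; the register name `cluster_report` is arbitrary.

```yaml
- name: Get the Hadoop cluster report
  ansible.builtin.command: hadoop dfsadmin -report
  register: cluster_report
  changed_when: false    # read-only command, never reports a change
  run_once: true         # one node is enough, the report covers the cluster

- name: Show the cluster report
  ansible.builtin.debug:
    var: cluster_report.stdout_lines
  run_once: true
```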

Now, the Hadoop cluster is ready with three data nodes and one master node.

You can find the code here: https://github.com/rahulbhatia-rb/ansible-hadoop

Give a clap, if you found this helpful :).
