How to change replication factor of existing files in HDFS

To set replication of an individual file to 4:

./bin/hadoop dfs -setrep -w 4 /path/to/file

You can also do this recursively. To change replication of entire HDFS to 1:

./bin/hadoop dfs -setrep -R -w 1 /
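After changing the replication factor it can be useful to read the value back from HDFS. The sketch below is only an illustration: the function name set_and_check_replication and the HADOOP_CMD variable are made up for this example, and it assumes your Hadoop version supports the %r format of "fs -stat", which prints the replication factor of a file.

```shell
# Hypothetical wrapper: set the replication factor of a file, then print
# the replication factor that HDFS reports for it.
# HADOOP_CMD is an illustrative override; it defaults to ./bin/hadoop.
set_and_check_replication() {
    local factor="$1"
    local path="$2"
    local hadoop="${HADOOP_CMD:-./bin/hadoop}"

    # Reject anything that is not a positive integer before touching HDFS.
    case "$factor" in
        ''|*[!0-9]*|0*)
            echo "replication factor must be a positive integer" >&2
            return 1
            ;;
    esac

    # -w waits until the target replication has been reached.
    "$hadoop" fs -setrep -w "$factor" "$path" || return 1

    # %r prints the replication factor of the file.
    "$hadoop" fs -stat %r "$path"
}
```

For example, set_and_check_replication 4 /path/to/file runs the same setrep command as above and then prints the factor that HDFS now reports for that file.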

PASSWORD-LESS AUTHENTICATION USEFUL FOR HADOOP ADMINISTRATORS AND LINUX USERS

 

 
Although Hadoop itself does not require password-less authentication for its nodes to communicate, it gives a Hadoop administrator great flexibility when managing multiple nodes together.

a. Generate the SSH private/public key pair for the hadoop user on the Namenode

i. ssh-keygen -t rsa

b. By default the key pair is created in the ~/.ssh directory, and the public key is named id_rsa.pub

c. Because the hadoop user was created by the Hadoop rpm without a password, copying the public key with the ssh-copy-id command will not work.

d. Copy the public key from the Namenode to every data node and the secondary name node with the scp command, using the loginid user.

i. scp ~/.ssh/id_rsa.pub loginid@IP:/tmp

e. Then log in to every system as the hadoop user and copy id_rsa.pub to ~/.ssh/authorized_keys

i. mkdir ~/.ssh

ii. cp /tmp/id_rsa.pub ~/.ssh/authorized_keys

f. Set the permissions for the .ssh directory and the authorized_keys file

i. chmod 700 ~/.ssh

ii. chmod 644 ~/.ssh/authorized_keys
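The last two steps, run on each node as the hadoop user, can also be wrapped in a small helper. This is a minimal sketch, not part of Hadoop or OpenSSH: the function name install_pubkey and its optional second argument are assumptions made for illustration. Unlike a plain cp, it appends the key, so an existing authorized_keys file is not overwritten.

```shell
# Hypothetical helper: install a public key for the current user.
# $1 = path to the public key file (for example /tmp/id_rsa.pub)
# $2 = target .ssh directory (optional, defaults to $HOME/.ssh)
install_pubkey() {
    local pubkey="$1"
    local ssh_dir="${2:-$HOME/.ssh}"
    local auth="$ssh_dir/authorized_keys"

    [ -r "$pubkey" ] || { echo "cannot read $pubkey" >&2; return 1; }

    # Create the directory if needed and restrict it to the owner.
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1

    touch "$auth" || return 1

    # Append only if this exact key line is not already present,
    # so running the helper twice does not duplicate the key.
    if ! grep -qxF -f "$pubkey" "$auth"; then
        cat "$pubkey" >> "$auth"
    fi

    # Same mode as in step f.
    chmod 644 "$auth"
}
```

Running install_pubkey /tmp/id_rsa.pub as the hadoop user would then replace the mkdir, cp and chmod commands above.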

Once password-less authentication is set up, it becomes very easy for the administrator to write a single script that executes the same command throughout the cluster from the Namenode.
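Such a script might look like the sketch below. It assumes a plain text file with one hostname per line (the file and the names run_on_cluster and SSH_CMD are hypothetical, chosen for this example) and that the key setup described above is already in place on every listed host.

```shell
# Hypothetical helper: run one command on every host listed in a file.
# $1 = hosts file, one hostname per line; blank lines and lines
#      starting with '#' are skipped
# remaining arguments = the command to run on each host
# SSH_CMD is an illustrative override for the ssh invocation.
run_on_cluster() {
    local hosts_file="$1"
    shift
    local host

    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|\#*) continue ;;
        esac
        echo "== $host =="
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        # Redirecting stdin from /dev/null keeps the remote command from
        # consuming the rest of the hosts file.
        ${SSH_CMD:-ssh -o BatchMode=yes} "$host" "$@" </dev/null
    done < "$hosts_file"
}
```

For example, run_on_cluster hosts.txt uptime would print the uptime of every node in hosts.txt, one after another, without any password prompt.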
