Wednesday 11 October 2017

distcp: Copying Data Between Secured and Non-Secured Clusters

Security settings determine whether DistCp should be run on the source cluster or the destination cluster. The general rule of thumb is that if one cluster is secure and the other is not, DistCp should be run from the secure cluster; otherwise you may run into security-related issues. When data is copied from a secured cluster to a non-secured cluster, distcp may throw an exception with the following error:

-------
java.io.IOException: Failed on local exception: java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.; Host Details : local host is: xxx; destination host is: "yyy":8020;
---------------
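Applying the rule of thumb starts with knowing which cluster is secure. Below is a minimal sketch, assuming bash and that each cluster reports its mode through the standard hadoop.security.authentication property ("kerberos" or "simple"). The helper name pick_launch_side is hypothetical and is not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch of the rule of thumb: when exactly one cluster uses Kerberos,
# launch distcp from that (secure) cluster.
# pick_launch_side is a hypothetical helper, not a Hadoop command.

# Print which side should launch distcp, given each cluster's value of
# hadoop.security.authentication ("kerberos" or "simple").
pick_launch_side() {
  local src_auth=$1 dst_auth=$2
  if [[ $src_auth == kerberos && $dst_auth != kerberos ]]; then
    echo source
  elif [[ $dst_auth == kerberos && $src_auth != kerberos ]]; then
    echo destination
  else
    # Both secure or both simple: this rule of thumb does not apply.
    echo n/a
  fi
}
```

On an edge node of each cluster, hdfs getconf -confKey hadoop.security.authentication prints the value to pass in. For example, pick_launch_side kerberos simple prints "source". Before launching from the secure side, klist -s exits with a non-zero status if no valid Kerberos ticket is cached, in which case run kinit first.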

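If you encounter the "Server asks us to fall back to SIMPLE auth" error, run distcp with the ipc.client.fallback-to-simple-auth-allowed=true configuration property. This allows the secure client to fall back to SIMPLE authentication when the non-secured cluster requests it: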

-------
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://xxx:8020/src_path hdfs://yyy:8020/target_path
-------

Depending on the cluster setup, the command above can still fail with the following error:

-------
java.io.EOFException: End of File Exception between local host is: yyy; destination host is: xxx;
-------

This means that RPC communication to the target cluster is blocked. In such cases, the HTTP-based webhdfs protocol can be used for the target instead, so the distcp command above can be rewritten as:

-----
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://xxx:8020/src_path webhdfs://yyy:50070/target_path
-----
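You can check whether RPC is blocked before choosing the URI scheme for the target. The sketch below assumes bash with /dev/tcp support and the coreutils timeout command. The helper names port_open and target_uri are hypothetical. A successful TCP connect is only a rough proxy for working RPC. Also, 50070 is the Hadoop 2.x default NameNode HTTP port (Hadoop 3.x uses 9870), so adjust the ports for your clusters.

```shell
#!/usr/bin/env bash
# Sketch: choose hdfs:// (RPC) or webhdfs:// (HTTP) for the distcp target,
# depending on whether the target NameNode's RPC port accepts connections.
# port_open and target_uri are hypothetical helpers, not part of Hadoop.

# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
port_open() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print the target URI: hdfs:// when the RPC port is reachable,
# otherwise fall back to webhdfs:// on the NameNode HTTP port.
target_uri() {
  local host=$1 rpc_port=$2 http_port=$3 path=$4
  if port_open "$host" "$rpc_port"; then
    printf 'hdfs://%s:%s%s\n' "$host" "$rpc_port" "$path"
  else
    printf 'webhdfs://%s:%s%s\n' "$host" "$http_port" "$path"
  fi
}
```

Usage, with the placeholder hosts from above: hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://xxx:8020/src_path "$(target_uri yyy 8020 50070 /target_path)"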
