
Monday, May 25, 2020

Spark Debugging tricks

Q1) To list the temporary RDD variables defined in a spark-shell session (i.e., check which RDDs are in memory) using the interpreter handle $intp.
Ans:
$intp.definedTerms.map(dT => s"${dT.toTermName}: ${$intp.typeOfTerm(dT.toTermName.toString)}").filter(x => x.contains("()org.apache.spark.rdd.RDD")).foreach(println)

Q2) To get the lineage of an RDD.
Ans:
Use the toDebugString method on the RDD; it prints the chain of parent RDDs that would be recomputed if a partition were lost.
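An illustrative spark-shell sketch (not a captured session; the output shape is indicative and abridged, with indentation marking shuffle boundaries):

```scala
scala> val counts = sc.textFile("/user/hello/test.txt")
     |   .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
scala> println(counts.toDebugString)
// e.g.
// (2) ShuffledRDD[4] at reduceByKey ...
//  +-(2) MapPartitionsRDD[3] at map ...
//     |  MapPartitionsRDD[2] at flatMap ...
```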

Q3) To know whether an RDD's dependencies are narrow or wide.
Ans:
Use the dependencies method on the RDD. Narrow dependencies (e.g. OneToOneDependency) need no shuffle; wide ones (ShuffleDependency) do.
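A minimal spark-shell sketch (illustrative, not a captured session) contrasting a narrow map dependency with the wide dependency introduced by reduceByKey:

```scala
scala> val pairs = sc.parallelize(Seq(("a", 1), ("b", 2)))
scala> pairs.map(identity).dependencies
// narrow: List(org.apache.spark.OneToOneDependency@...)
scala> pairs.reduceByKey(_ + _).dependencies
// wide:   List(org.apache.spark.ShuffleDependency@...)
```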

Monday, May 4, 2020

HDFS commands for production environment

Q1) To check the size of an HDFS directory:
Ans: hdfs dfs -count -v -q -h /user/hive/warehouse/test.db/
(-v prints a header row, -q adds quota columns, and -h prints sizes in human-readable units.)

Q2) To check whether a file exists:
Ans:
hdfs dfs -test -e /user/hello/test.txt
echo $?
The exit status is 0 if the file exists and 1 otherwise.
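A runnable sketch of the same exit-code pattern, using the local `test -e` command as a stand-in for `hdfs dfs -test -e` (the HDFS command follows the same convention):

```shell
# Stand-in demo: local `test -e` uses the same exit-code convention
# as `hdfs dfs -test -e` (0 = file exists, 1 = missing).
f=$(mktemp)                   # temporary file that certainly exists
test -e "$f"; exists_rc=$?    # 0: file exists
rm -f "$f"
test -e "$f"; missing_rc=$?   # 1: file is gone
echo "exists_rc=$exists_rc missing_rc=$missing_rc"
```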

Q3) To check whether a file is zero length:
Ans:
hdfs dfs -test -z /user/hello/test.txt
echo $?
The exit status is 0 if the file size is 0 and 1 otherwise.
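A runnable local sketch of the zero-length check: `hdfs dfs -test -z` exits 0 for an empty file, and negating the local `test -s` (non-empty) mirrors that convention:

```shell
# Stand-in demo: `hdfs dfs -test -z` exits 0 for a zero-length file;
# locally, negating `test -s` (non-empty) gives the same exit codes.
empty=$(mktemp)                         # zero bytes
nonempty=$(mktemp); echo data > "$nonempty"
test ! -s "$empty"; empty_rc=$?         # 0: zero-length file
test ! -s "$nonempty"; nonempty_rc=$?   # 1: file has content
rm -f "$empty" "$nonempty"
echo "empty_rc=$empty_rc nonempty_rc=$nonempty_rc"
```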

Q4) To set the replication factor of an HDFS file or directory:
Ans:
For a file:
hdfs dfs -setrep -w 4 /user/hello/test.txt
For a directory (applies to every file inside the directory, not to the directory itself):
hdfs dfs -setrep -w 1 -R /user/hello/testdir
(-w waits until the target replication is reached.)

Q5) To permanently delete files from the HDFS .Trash folder:
Ans:
hdfs dfs -expunge
(Removes trash checkpoints older than the configured retention interval.)

Q6) To delete files, optionally bypassing the Trash folder:
Ans:
For a file (moves it to the .Trash folder):
hdfs dfs -rm /user/hello/test.txt
For a directory (moves it to the .Trash folder):
hdfs dfs -rm -R /user/hello/testdir
For a file (-skipTrash deletes it permanently):
hdfs dfs -rm -skipTrash /user/hello/test.txt
For a directory (-skipTrash deletes it permanently):
hdfs dfs -rm -R -skipTrash /user/hello/testdir
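In scripts it is easy to pass -skipTrash by accident. A small hypothetical helper (build_rm_cmd is not a real HDFS command; it only builds the command string rather than executing it) makes the choice explicit:

```shell
# Hypothetical helper: returns the delete command for a directory,
# appending -skipTrash only when the second argument is "yes".
build_rm_cmd() {
  if [ "$2" = "yes" ]; then
    echo "hdfs dfs -rm -R -skipTrash $1"
  else
    echo "hdfs dfs -rm -R $1"
  fi
}
cmd_perm=$(build_rm_cmd /user/hello/testdir yes)   # permanent delete
cmd_trash=$(build_rm_cmd /user/hello/testdir no)   # goes to .Trash
echo "$cmd_perm"
```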