Elasticsearch: JSON document
Elasticsearch is a highly scalable, open-source, full-text search and analytics engine.
It lets you store, search, and analyze large volumes of data quickly and in near real time.
It is generally used as the underlying engine that powers applications with complex search features and requirements.
Elasticsearch APIs:
-: Document API
-: Search API
-: Indices API
-: Cat API
-: Cluster API
The Document APIs are split into single-document APIs and multi-document APIs.
Single-document APIs:
-: Index API (create a document)
-: Get API
-: Delete API
-: Update API
Multi-document APIs:
-: Multi get API
-: Bulk API
-: Delete by query API
-: Update by query API
-: Reindex API (copies documents from one index into another, e.g. as a backup)
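Of the multi-document APIs above, the Bulk API has the most unusual request shape: it expects a newline-delimited JSON (NDJSON) body in which each action line is followed by the document source, and the body must end with a newline. A minimal Python sketch of building such a body (the index name `products` and the documents are made up for illustration):

```python
import json

def build_bulk_body(index, docs):
    """Build an NDJSON body for Elasticsearch's _bulk endpoint.

    Each document becomes two lines: an action line ({"index": ...})
    followed by the document source. The body must end with a newline.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = build_bulk_body("products", {"1": {"name": "pen"}, "2": {"name": "book"}})
print(body)
```

POSTing this string to `/_bulk` with a `Content-Type: application/x-ndjson` header would index both documents in one round trip.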
Elasticsearch-to-database mapping:
-: _index == database
-: _type == table (note: mapping types are deprecated in recent Elasticsearch versions)
-: _id == primary key
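The mapping above shows up directly in the REST URLs of the single-document APIs: a document lives at `/<index>/_doc/<id>`. A sketch that only constructs the requests without contacting a cluster (the host, index name, and id are assumptions for illustration):

```python
import json
from urllib.request import Request

ES = "http://localhost:9200"  # assumed local cluster address

def index_request(index, doc_id, source):
    # Index API (create/replace): PUT /<index>/_doc/<id>
    return Request(f"{ES}/{index}/_doc/{doc_id}",
                   data=json.dumps(source).encode(),
                   headers={"Content-Type": "application/json"},
                   method="PUT")

def get_request(index, doc_id):
    # Get API: GET /<index>/_doc/<id>
    return Request(f"{ES}/{index}/_doc/{doc_id}", method="GET")

def delete_request(index, doc_id):
    # Delete API: DELETE /<index>/_doc/<id>
    return Request(f"{ES}/{index}/_doc/{doc_id}", method="DELETE")

req = index_request("customers", "42", {"name": "Alice"})
print(req.get_method(), req.full_url)
```

Passing any of these `Request` objects to `urllib.request.urlopen` would perform the actual call; the verbs and paths match the single-document APIs listed above.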
Saturday, July 18, 2020
Monday, May 25, 2020
Spark Debugging tricks
Q1) How to find temporary RDD variables in spark-shell (i.e., check which variables the interpreter holds in memory) with $intp?
Ans: (run inside spark-shell; $intp is the shell's interpreter instance)
$intp.definedTerms.map(dT => s"${dT.toTermName}: ${$intp.typeOfTerm(dT.toTermName.toString)}").filter(x => x.contains("()org.apache.spark.rdd.RDD")).foreach(println)
Q2) How to get the lineage of an RDD?
Ans:
Call the toDebugString method on the RDD; it prints the chain of transformations that produced it.
Q3) How to know whether an RDD's dependencies are wide or narrow?
Ans:
Call the dependencies method on the RDD; narrow dependencies appear as e.g. OneToOneDependency, while wide ones appear as ShuffleDependency.
Monday, May 4, 2020
HDFS commands for production environment
Q1) How to check the size of an HDFS directory?
Ans: hdfs dfs -count -v -q -h /user/hive/warehouse/test.db/
Q2) How to check whether a file exists?
Ans:
hdfs dfs -test -e /user/hello/test.txt
echo $?    # prints 0 if the file exists, 1 otherwise
Q3) How to check whether a file has zero size?
Ans:
hdfs dfs -test -z /user/hello/test.txt
echo $?    # prints 0 if the file size is zero, 1 otherwise
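Both -test checks follow the same convention: exit status 0 means "true". That pattern can be wrapped once and reused from a script. A small Python sketch; since no HDFS cluster is assumed to be available here, the demonstration at the bottom substitutes the local POSIX test(1) command, which follows the same 0-means-true convention:

```python
import subprocess
import tempfile

def passes(cmd):
    """Return True when the command exits with status 0."""
    return subprocess.run(cmd).returncode == 0

# With a cluster available you would call, for example:
#   passes(["hdfs", "dfs", "-test", "-e", "/user/hello/test.txt"])  # exists?
#   passes(["hdfs", "dfs", "-test", "-z", "/user/hello/test.txt"])  # zero size?

# Local stand-in demonstrating the same exit-code convention with test(1):
with tempfile.NamedTemporaryFile() as f:
    print(passes(["test", "-e", f.name]))            # existing file: True
    print(passes(["test", "-e", f.name + ".nope"]))  # missing file: False
```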
Q4) How to set the replication factor of an HDFS file?
Ans:
For a file (-w waits until replication completes):
hdfs dfs -setrep -w 4 /user/hello/test.txt
For a directory (applies to every file inside the directory, not to the directory itself):
hdfs dfs -setrep -w 1 -R /user/hello/testdir
Q5) How to empty the HDFS .Trash folder?
Ans:
hdfs dfs -expunge
Q6) How to delete files, and how to bypass the .Trash folder?
Ans:
For a file (moves it to the .Trash folder):
hdfs dfs -rm /user/hello/test.txt
For a directory (moves it to the .Trash folder):
hdfs dfs -rm -R /user/hello/testdir
For a file (-skipTrash deletes it permanently):
hdfs dfs -rm -skipTrash /user/hello/test.txt
For a directory (-skipTrash deletes it permanently):
hdfs dfs -rm -R -skipTrash /user/hello/testdir