Role: Data Engineer
We need someone with 5-8 years of extensive experience in data warehousing, ETL, and big data technologies (Hadoop, Hive, Sqoop, etc.), plus at least 3 years of mandatory experience in Spark with Python or Scala, including more than one end-to-end implementation.
Roles and Responsibilities
- Develop Scala or Python scripts and UDFs using DataFrames, Spark SQL, Datasets, and RDDs in Spark 2.3+ for data aggregation and queries, and write data back into the OLTP system through Sqoop.
- Should have a very good understanding of partitioning and bucketing concepts, and should have designed both managed and external tables and ORC files in Hive to optimize performance.
- Write and implement Spark and Scala scripts to load data from and store data into Cassandra, HBase, or any other NoSQL store.
- Implement SCD Type 1 and Type 2 models using Spark.
- Develop Oozie workflows for scheduling and orchestrating the ETL process.
- Should be experienced in performance tuning of Spark applications, including setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Stream data into Elasticsearch for visualization using Kibana.
- Should have implemented mapping parameters/variables at the mapping and session levels to increase code reusability and parameterize hardcoded values.
- Knowledge of the AWS stack: AWS Glue, S3, SQS
- Exposure to Elasticsearch or Solr is a plus
- Exposure to NoSQL databases (Cassandra, MongoDB)
- Exposure to serverless computing