Role description:
The Data Integration Engineer provides technical analysis, data acquisition, data transformation (ETL), and data provisioning services on the client's Data Analytics Platform (DAP), powered by Cloudera Data Platform.
Expected experience and knowledge:
▪ Knowledge of general Hadoop concepts and openness to learning new technologies and toolsets.
▪ Experience in software programming/data engineering, with a good understanding of CI/CD (Git Flow), Azure DevOps, etc.
▪ Awareness of Cloudera Data Platform, ETL, relational databases, and data processing; hands-on experience is a plus.
▪ Understanding of query engines such as Hive and Impala, and of tools such as Hue, JupyterHub, Azure DevOps, SAP PowerDesigner, etc.
▪ Experience with Structured Query Language (SQL) expected, covering both DML and DDL statements.
▪ Hadoop Hive/Impala and Big Data processing experience preferred.
▪ Experience parsing and processing text files on Linux using Python and native Linux tools (national character encodings, analysis of file content and structure, identification and clean-up of problematic characters such as embedded newlines and separators, etc.).
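The file clean-up task described above can be sketched in Python. This is a minimal illustration, not a prescribed implementation: the pipe delimiter, UTF-8 encoding, and space replacement character are assumptions chosen for the example.

```python
# Sketch of the clean-up task described above: decode a raw byte stream
# (tolerating undecodable national characters) and strip embedded newlines
# and separator characters from field values so a delimited file loads
# cleanly. Delimiter, encoding, and replacement are illustrative assumptions.
import csv
import io


def clean_field(value: str, sep: str = "|") -> str:
    """Replace embedded newlines and separator characters inside a field."""
    return value.replace("\r", " ").replace("\n", " ").replace(sep, " ").strip()


def clean_delimited_text(raw: bytes, encoding: str = "utf-8", sep: str = "|") -> str:
    """Decode raw bytes (replacing undecodable characters) and rewrite
    each row with cleaned fields, one record per output line."""
    text = raw.decode(encoding, errors="replace")
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    return "\n".join(sep.join(clean_field(f, sep) for f in row) for row in reader)
```

For example, a quoted field containing a literal newline (`"x\ny"`) is flattened to `x y`, so the record stays on a single output line.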
Optional:
▪ Hands-on PySpark/Python programming exposure and knowledge of Spark SQL welcome.
▪ Experience with the Azure data stack (ADLS2, Azure Database, Azure Data Factory, Azure Functions, Azure Databricks…) welcome.
Others:
▪ Learning attitude and flexibility regarding business requirements, projects, and timelines.
▪ Good communication and presentation skills towards both business and IT audiences.
Experience level:
▪ 0–1 years, with demonstrable SQL experience.