- Experience of AWS tools (e.g. Athena, Redshift, Glue, EMR), PySpark, Scala and Python; minimum 6-10 years.
- Experience of developing enterprise-grade ETL/ELT data pipelines and demonstrable knowledge of applying Data Engineering best practices (coding practices for DS, unit testing, version control, code review).
- Big Data ecosystems: Cloudera/Hortonworks, AWS EMR, GCP Dataproc or GCP Cloud Data Fusion.
- Streaming technologies and processing engines: Kinesis, Kafka, Pub/Sub and Spark Streaming.
- Experience of working with CI/CD technologies (Git, Jenkins, Spinnaker, GCP Cloud Build, Ansible, etc.), and experience building and deploying solutions to the cloud (AWS, Google Cloud), including cloud provisioning tools.
- Experience with relational SQL and NoSQL databases, including Redshift, DynamoDB, RDS Postgres and Oracle.
- Experience with data pipeline and workflow management technologies; Airflow is a must.
- Cloud skills, specifically AWS EC2, S3 and IAM.
- Proficiency with CI/CD tools.
- Ability to work with a variety of individuals and groups in a constructive and collaborative manner, and to build and maintain effective relationships.
- Full-stack development experience across distributed applications and services.
- Experience implementing the Software Development Lifecycle in an agile environment.