- Proven experience in a Data Engineering role, with a strong track record of delivering scalable data pipelines.
- Extensive experience designing data solutions, including data modelling, is required.
- Extensive hands-on experience developing data processing jobs (PySpark / SQL), demonstrating a strong understanding of software engineering principles, is needed.
- Experience orchestrating data pipelines using tools such as ADF or Airflow is necessary.
- Experience working with both real-time and batch data is important.
- Experience building data pipelines on Azure is crucial; experience with AWS data pipelines is beneficial.
- Fluency in SQL (any flavour), including window functions and other advanced features, is required.
Responsibilities:
- Data Pipeline Development: Develop and maintain data pipelines that extract, transform, and load (ETL) data from various sources into a centralized data storage system, such as a data warehouse or data lake.
- Data Integration: Integrate data from multiple sources and systems, including databases, APIs, log files, streaming platforms, and external data providers.
- Data Transformation and Processing: Develop data transformation routines to clean, normalize, and aggregate data. Apply data processing techniques to handle complex data structures and missing or inconsistent data, and to prepare the data for analysis, reporting, or machine learning tasks.
- Contribute to common frameworks and best practices in code development, deployment, and automation/orchestration of data pipelines.
- Implement data governance in line with company standards.
- Partner with Data Analytics and Product leaders to design best practices and standards for developing and productionising analytic pipelines.
- Partner with Infrastructure leaders on architecture approaches to advance the data and analytics platform, including exploring new tools and techniques that leverage the cloud environment (Azure, Databricks, others).