Job Description
Responsibilities
- Design, build, and maintain secure, scalable, and efficient data pipelines to support regulatory analysis and reporting.
- Collect, clean, transform, and integrate data from diverse sources (APIs, databases, flat files, logs) to enable downstream analysis and visualization.
- Collaborate with internal teams and external data providers to assess requirements, define data integration methods, and troubleshoot technical issues.
- Develop and manage data infrastructure using open-source or commercial tools, on cloud platforms or in on-premises environments.
- Implement robust validation checks, logging, and error-handling mechanisms to ensure data quality and integrity.
- Document technical workflows, data schemas, and pipeline logic to support transparency, collaboration, and maintainability.
- Enforce data security and privacy standards, including encryption, access controls, and compliance with Rwanda’s data protection laws.
- Work closely with analysts, policy teams, and departments to deliver timely, accurate, and actionable data for regulatory decision-making.
- Optimize data workflows for performance, scalability, and cost-efficiency, especially for large or sensitive datasets.
- Support DevOps practices in data engineering, including CI/CD pipelines, version control, automated testing, and monitoring.
- Contribute to the development of data documentation standards, metadata management, and RURA’s data cataloging initiatives.
Requirements
- A Master’s degree in Data Engineering, Computer Science, Software Engineering, Information Systems, or a related technical field (e.g., Data Science, Applied Mathematics, or Statistics), with a minimum of one (1) year of professional experience in data engineering, ETL development, or building data infrastructure; or
- A Bachelor’s degree in Data Engineering, Computer Science, Software Engineering, Information Systems, or a related technical field (e.g., Data Science, Applied Mathematics, or Statistics), with a minimum of three (3) years of professional experience in data engineering, ETL development, database management, or building and maintaining data infrastructure.
- Workflow Orchestration: Proficient in Apache Airflow to build, schedule, and monitor DAGs, including creating custom operators and managing retries and failures.
- Containerization & Dev Environments: Skilled in using Docker to containerize applications, write Dockerfiles, and manage multi-container setups with Docker Compose.
- Linux & Bash Scripting: Comfortable working in Linux environments for scripting, automation (e.g., cron jobs), file management, and troubleshooting.
- Programming (Python/R): Able to write clean, modular ETL code in Python or R, including integration with APIs, databases, and third-party services.
- SQL & Databases: Experience writing complex queries and working with various database engines (e.g., PostgreSQL, MySQL, ClickHouse) in both OLTP and OLAP contexts.
- Data Warehousing & Lakes: Knowledge of data modeling, partitioning, performance optimization, and handling large-scale datasets in warehousing environments.
- Cloud Infrastructure: Capable of deploying and managing data services and infrastructure on cloud platforms such as AWS, Azure, or similar.
- Monitoring & Logging: Experience using tools to monitor pipeline health, detect failures, and generate alerts for proactive maintenance.
- Data Quality Management: Able to implement validation checks, profiling tools, and data monitoring systems to ensure consistency and reliability.
- Version Control: Proficient in using Git for collaborative development, versioning, and deployment of data workflows.
- Security & Privacy: Understands best practices for data protection, including access controls, encryption, and secure data transfer methods (e.g., SFTP, VPN).
- Networking Fundamentals: Familiar with basic networking concepts such as IP addresses, ports, DNS, and firewalls, particularly in multi-environment setups.
- CI/CD for Data Pipelines: Experience automating deployment and testing using tools such as GitHub Actions, Jenkins, or similar frameworks.
- Documentation & Communication: Able to document data workflows clearly and communicate technical concepts to non-technical stakeholders.
- Possesses strong problem-solving skills and attention to detail, especially when designing and debugging data systems.
- Works well independently and in cross-functional teams, collaborating with analysts, engineers, and policy stakeholders.
- Demonstrates a deep understanding of data modeling, pipelines, and statistical concepts relevant to data quality and performance.
- Comfortable sourcing, transforming, and integrating data from both structured and unstructured sources using scalable methods.
- Able to clearly define technical problems, design solutions, and document processes for both technical and non-technical audiences.
- Brings a proactive, creative mindset to building resilient data platforms and is motivated to improve data access and quality across the organization.
- Proven ability to design and maintain robust data pipelines across varied data sources and formats.
- Proficiency in both OLAP and OLTP databases such as ClickHouse, PostgreSQL, MySQL, or similar systems.
- Comfortable working in Linux-based environments, including scripting, task automation, and basic system troubleshooting.
- Strong programming skills in Python or another language used in data engineering (e.g., Scala, Java).
- Familiarity with tools such as Apache Airflow, Docker, Git, and cloud platforms like AWS, GCP, or Azure.
- Knowledge of data quality, integrity, and security standards, including best practices for governance.
- Ability to work cross-functionally with analysts, data stewards, and policy teams to deliver high-quality data products.