Sr. Data Engineer Azure Databri…

Fusemachines

Kathmandu

Experience: More than 5 years

Source: Other Source

Key Skills: Custom Integration, Database Design, PySpark, Apache Kafka, Apache Spark

This job expired 1 year, 2 months ago

Sr. Data Engineer Azure Databricks

Basic Job Information

Job Category	:	IT & Telecommunication
Job Level	:	Mid Level
No. of Vacancy/s	:	[ 1 ]
Employment Type	:	Full Time
Job Location	:	Kathmandu
Apply Before (Deadline)	:	May 17, 2024 16:10 (1 year, 2 months ago)

Job Specification

Education Level	:	Under Graduate (Bachelor)
Experience Required	:	More than 5 years
Professional Skill Required	:	Custom Integration, Database Design, PySpark, Apache Kafka, Apache Spark

About the job

About Fusemachines

Fusemachines is a leading AI strategy, talent, and education services and products provider. Founded by Sameer Maskey Ph.D., Adjunct Associate Professor at Columbia University, Fusemachines has a core mission of democratizing AI. With a presence in 4 countries (Nepal, United States, Canada, and Dominican Republic) and more than 400 full-time employees, Fusemachines seeks to bring its global expertise in AI to transform companies around the world.

About The Role

This is a full-time position responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics (BI, visualization and Advanced Analytics).

We are looking for a skilled Senior Data Engineer with a strong background in Python, SQL, PySpark, Azure, Databricks, Synapse, Azure Data Lake, DevOps, and cloud-based large-scale data applications, and with a passion for data quality, performance, and cost optimization. The ideal candidate will develop in an Agile environment, contributing to the architecture, design, and implementation of data products, including migration from Synapse to Azure Data Lake. This role involves hands-on coding, mentoring junior staff, and collaborating with multidisciplinary teams to achieve project objectives.

Qualification & Experience

Must have a full-time Bachelor's degree in Computer Science or similar

At least 5 years of experience as a data engineer with strong expertise in Databricks, DevOps, and Azure or other hyperscalers

5+ years of experience with Azure DevOps, GitHub

Proven experience as a data engineer delivering large-scale data and analytics projects and products, including migrations

The following certifications:

Databricks Certified Associate Developer for Apache Spark

Databricks Certified Data Engineer Associate

Microsoft Certified: Azure Fundamentals

Microsoft Certified: Azure Data Engineer Associate

Microsoft Exam: Designing and Implementing Microsoft DevOps Solutions (nice to have)

Required Skills/Competencies

Strong programming skills in one or more languages such as Python (must have) or Scala, with proficiency in writing efficient and optimized code for data integration, migration, storage, processing, and manipulation

Strong understanding and experience with SQL and writing advanced SQL queries

Thorough understanding of big data principles, techniques, and best practices

Strong experience with scalable, distributed data processing technologies such as Spark/PySpark (must have: experience with Azure Databricks), dbt, and Kafka to handle large volumes of data

Solid Databricks development experience with significant use of Python, PySpark, Spark SQL, Pandas, and NumPy in an Azure environment

Strong experience in designing and implementing efficient ELT/ETL processes in Azure and Databricks, and in using open source solutions, with the ability to develop custom integration solutions as needed

Skilled in Data Integration from different sources such as APIs, databases, flat files, event streaming

Expertise in data cleansing, transformation, and validation

Proficiency with relational databases (Oracle, SQL Server, MySQL, Postgres, or similar) and NoSQL databases (MongoDB or Table)

Good understanding of data modeling and database design principles, including the ability to design and implement efficient database schemas that meet the requirements of the data architecture and support data solutions

Strong experience in designing and implementing data warehouse, data lake, and data lakehouse solutions in Azure and Databricks

Good experience with Delta Lake, Unity Catalog, Delta Sharing, Delta Live Tables (DLT)

Strong understanding of the software development lifecycle (SDLC), especially Agile methodologies

Strong knowledge of SDLC tools and technologies, particularly Azure DevOps and GitHub, including project management software (Jira, Azure Boards, or similar), source code management (GitHub, Azure Repos, or similar), CI/CD systems (GitHub Actions, Azure Pipelines, Jenkins, or similar), and binary repository managers (Azure Artifacts or similar)

Strong understanding of DevOps principles, including continuous integration and continuous delivery (CI/CD), infrastructure as code (IaC, with hands-on experience in Terraform and ARM), configuration management, automated testing, performance tuning, and cost management and optimization

Strong knowledge of cloud computing, specifically Microsoft Azure services related to data and analytics, such as Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake, Azure Stream Analytics, SQL Server, Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, etc.

Experience in Orchestration using technologies like Databricks workflows and Apache Airflow

Strong knowledge of data structures and algorithms and good software engineering practices

Proven experience migrating from Azure Synapse to Azure Data Lake, or other technologies

Strong analytical skills to identify and address technical issues, performance bottlenecks, and system failures

Proficiency in debugging and troubleshooting issues in complex data and analytics environments and pipelines

Good understanding of Data Quality and Governance, including implementation of data quality checks and monitoring processes to ensure that data is accurate, complete, and consistent.

Experience with BI solutions, including Power BI, is a plus

Strong written and verbal communication skills to collaborate and articulate complex situations concisely with cross-functional teams, including business users, data architects, DevOps engineers, data analysts, data scientists, developers, and operations teams

Ability to document processes, procedures, and deployment configurations

Understanding of security practices, including network security groups, Azure Active Directory, encryption, and compliance standards

Ability to implement security controls and best practices within data and analytics solutions, including proficient knowledge of and working experience with common cloud security vulnerabilities and ways to mitigate them

Self-motivated with the ability to work well in a team, and experienced in mentoring and coaching different members of the team

A willingness to stay updated with the latest services, Data Engineering trends, and best practices in the field

Comfortable with picking up new technologies independently and working in a rapidly changing environment with ambiguous requirements

Care about architecture, observability, testing, and building reliable infrastructure and data pipelines

Responsibilities

Architect, design, develop, test, and maintain high-performance, large-scale, complex data architectures that support data integration (batch and real-time, ETL and ELT patterns from heterogeneous data systems: APIs and platforms), storage (data lakes, warehouses, data lakehouses, etc.), processing, orchestration, and infrastructure. Ensure the scalability, reliability, and performance of data systems, focusing on Databricks and Azure

Contribute to detailed design, architectural discussions, and customer requirements sessions

Actively participate in the design, development, and testing of big data products.

Construct and fine-tune Apache Spark jobs and clusters within the Databricks platform

Migrate out of Azure Synapse to Azure Data Lake or other technologies

Assess best practices and design schemas that match business needs for delivering a modern analytics solution (descriptive, diagnostic, predictive, prescriptive)

Design and implement data models and schemas that support efficient data processing and analytics

Design and develop clear, maintainable code with automated testing using pytest and unittest, covering integration tests, performance tests, regression tests, etc.

Collaborate with cross-functional teams, including Product, Engineering, Data Scientists, and Analysts, to understand data requirements and develop data solutions, including reusable components that meet product deliverables

Evaluate and implement new technologies and tools to improve data integration, data processing, storage, and analysis

Evaluate, design, implement and maintain data governance solutions: cataloging, lineage, data quality and data governance frameworks that are suitable for a modern analytics solution, considering industry-standard best practices and patterns

Continuously monitor and fine-tune workloads and clusters to achieve optimal performance

Provide guidance and mentorship to junior team members, sharing knowledge and best practices

Maintain clear and comprehensive documentation of the solutions, configurations, and best practices implemented

Promote and enforce best practices in data engineering, data governance, and data quality

Ensure data quality and accuracy

Design, implement, and maintain data security and privacy measures

Be an active member of an Agile team, participating in all ceremonies and continuous improvement activities, and working both independently and collaboratively

Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.
