Data Engineer
Job Description: Data Engineer
Experience: 3-5 Years
Location: Remote (Bengaluru)
Employment Type: Full-time
Notice Period: Immediate Joiner
Role Overview
We are seeking a skilled Data Engineer with 3 to 5 years of experience building and maintaining scalable data solutions. The ideal candidate has hands-on experience across data engineering, data analysis, and visualization, strong programming fundamentals, and the ability to work independently in a remote environment.
Key Responsibilities
Design, develop, and maintain ETL pipelines using PySpark, Apache Airflow, and Azure Data Factory (ADF).
Build and optimize distributed data processing jobs using PySpark.
Orchestrate and schedule workflows using Apache Airflow.
Develop and manage data ingestion and transformation pipelines in Azure Data Factory.
Write clean, efficient, and reusable code in Python.
Develop and optimize complex SQL queries for MySQL and PostgreSQL databases.
Work with MongoDB for handling semi-structured and unstructured data.
Perform data analysis using Pandas and NumPy to support business insights.
Create basic to intermediate data visualizations using Matplotlib, Power BI, and Streamlit.
Monitor data pipelines, troubleshoot issues, and ensure data quality and performance.
Collaborate with cross-functional teams including analysts, data scientists, and product teams.
Required Skills & Qualifications
Programming Languages
Proficiency in Python
Databases
Strong experience with MySQL and PostgreSQL
Hands-on exposure to MongoDB
Data Engineering
Experience building ETL/ELT pipelines
Hands-on experience with PySpark for large-scale data processing
Experience using Apache Airflow for workflow orchestration
Experience with Azure Data Factory (ADF)
Data Analysis & Visualization
Hands-on experience with Pandas and NumPy
Ability to create visualizations using Matplotlib
Experience with Power BI for dashboards and reporting
Exposure to Streamlit for building data-driven applications
Good to Have
Experience with Azure data services (Azure Data Lake, Synapse Analytics).
Familiarity with Git and version control best practices.
Basic understanding of data warehousing concepts.
Exposure to cloud-based deployments and CI/CD pipelines.