Data Engineer
HAVI is a global, privately owned company focused on innovating, optimizing and managing the supply chains of leading brands. Offering services in marketing analytics, packaging, supply chain management and logistics, HAVI partners with companies to address challenges big and small across the supply chain, from commodity to customer. Founded in 1974, HAVI employs more than 10,000 people and serves customers in more than 100 countries. HAVI’s supply chain services are complemented by the customer engagement services offered by our affiliated company The Marketing Store. For more information, please visit HAVI.com.
Architect, design, implement, enhance, and maintain highly scalable, available, secure, and elastic cloud-ready data solutions using cutting-edge technologies to support our predictive and prescriptive analytics needs. Be an expert in our data domains, act as a trusted partner and advisor to solutions architects and data scientists, and become a crucial part of the analytics solution lifecycle, from prototype through production and operations of our data science and advanced analytics solutions in areas such as promotions, supply and demand planning, item/menu-level analytics, supply chain simulation and optimization, competitive benchmarking, and root cause analysis. Continuously improve and advance our data solutions.
This is a hybrid role based at 303-1 Concorde Gate, North York, ON M3C 3N6, Canada. Candidates must reside in the Toronto metropolitan area. Relocation assistance is not offered at this time.
Responsibilities:
- Work with the data management, data science, decision science, and technology teams to address supply chain data needs in demand and supply planning, replenishment, pricing, and optimization
- Develop/refine the data requirements, design/develop data deliverables, and optimize data pipelines in non-production and production environments
- Design, build, and manage/monitor data pipelines for data structures encompassing data transformation, data models, schemas, metadata, and workload management, working with both IT and business stakeholders
- Integrate analytics and data science output into business processes and workflows
- Build and optimize data pipelines, pipeline architectures, and integrated datasets. These should include ETL/ELT, data replication/CDC (change data capture), API design, and access; a minimal pipeline sketch follows this list
- Work with and optimize existing ETL processes and data integration and preparation flows and help move them to production
- Work with popular data discovery, analytics, BI, and AI tools for semantic-layer-based data discovery
- Apply agile methodologies and DevOps and DataOps principles to data pipelines to improve communication, integration, reuse, and automation of data flows between data managers and data consumers across the organization
- Implement agentic AI capabilities to drive efficiency and opportunity
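For illustration, a minimal sketch of the kind of batch ELT step referenced in the pipeline responsibilities above might look like the following; a Databricks environment with Delta Lake is assumed, and the paths, table, and column names are hypothetical.

```python
# Minimal batch ELT sketch: read raw files, apply light cleansing, append to Delta.
# Assumes Databricks with Delta Lake; all names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()


def load_daily_orders(source_path: str, target_table: str) -> None:
    """Read raw order files, cleanse them, and append to a partitioned Delta table."""
    raw = spark.read.format("json").load(source_path)

    cleaned = (
        raw.dropDuplicates(["order_id"])                        # remove duplicate records
           .withColumn("order_date", F.to_date("order_timestamp"))
           .filter(F.col("quantity") > 0)                       # drop obviously bad rows
    )

    (cleaned.write
            .format("delta")
            .mode("append")
            .partitionBy("order_date")
            .saveAsTable(target_table))


if __name__ == "__main__":
    # In practice these would come from job/pipeline parameters rather than literals.
    load_daily_orders(
        "abfss://raw@storageacct.dfs.core.windows.net/orders/2024-06-01/",
        "supply_chain.bronze_orders",
    )
```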
Qualifications:
- Bachelor’s degree in computer science, data management, information systems, information science, or a related field; an advanced degree in one of these fields is preferred.
- 3+ years of data engineering experience building production data pipelines (batch and/or streaming) with Spark in the cloud.
- 2+ years of hands-on Azure Databricks experience (PySpark/Scala, Spark SQL, Delta Lake), including:
- Delta Lake operations (MERGE/CDC, OPTIMIZE/Z-ORDER, VACUUM, partitioning, schema evolution); a brief upsert/maintenance sketch follows this list.
- Unity Catalog (RBAC, permissions, lineage, data masking/row-level access); a grants sketch follows this list.
- Databricks Jobs/Workflows or Delta Live Tables.
- Azure Data Factory for orchestration (pipelines, triggers, parameterization, integration runtimes) and integration with ADLS Gen2 and Key Vault.
- Strong SQL across large datasets; performance tuning (joins, partitions, file sizing).
- Data quality at scale (e.g., Great Expectations/Deequ), monitoring and alerting; debug/backfill playbooks. A minimal check is sketched after this list.
- DevOps for data: Git branching, code reviews, unit/integration testing (pytest/dbx), CI/CD (Azure DevOps/GitHub Actions); a sample unit test is sketched after this list.
- Infrastructure as Code (Terraform or Bicep) for Databricks workspaces, cluster policies, ADF, storage.
- Observability & cost control: Azure Monitor/Log Analytics; cluster sizing, autoscaling, Photon; cost/perf trade-offs.
- Proven experience collaborating with cross-functional stakeholders (analytics, data governance, product, security) to ship and support data products.
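For illustration, the Delta Lake operations above (a MERGE-based CDC upsert plus routine OPTIMIZE/VACUUM maintenance) might be sketched as follows; the code assumes Databricks with the delta-spark package, and the table and column names are hypothetical.

```python
# Sketch of a CDC-style upsert into a Delta table plus routine table maintenance.
# Assumes Databricks with delta-spark; table and column names are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()


def upsert_changes(changes_df: DataFrame, target_table: str = "supply_chain.silver_items") -> None:
    """Apply a batch of change records (CDC feed) to the target Delta table."""
    target = DeltaTable.forName(spark, target_table)
    (target.alias("t")
           .merge(changes_df.alias("s"), "t.item_id = s.item_id")
           .whenMatchedUpdateAll()      # update existing rows with the latest values
           .whenNotMatchedInsertAll()   # insert rows seen for the first time
           .execute())


def run_maintenance(target_table: str = "supply_chain.silver_items") -> None:
    """Compact small files, co-locate a hot key column, and purge old file versions."""
    spark.sql(f"OPTIMIZE {target_table} ZORDER BY (item_id)")
    spark.sql(f"VACUUM {target_table} RETAIN 168 HOURS")  # keep 7 days of history
```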
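Unity Catalog permissions are typically managed with SQL; a sketch of the kind of grants involved, issued here via spark.sql from a notebook, is below. The catalog, schema, table, and group names are hypothetical.

```python
# Sketch of Unity Catalog grants giving an analyst group read access to one table.
# Catalog, schema, table, and group names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("GRANT USE CATALOG ON CATALOG supply_chain TO `analytics_readers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA supply_chain.silver TO `analytics_readers`")
spark.sql("GRANT SELECT ON TABLE supply_chain.silver.items TO `analytics_readers`")
```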
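Data-quality gates would normally use a framework such as Great Expectations or Deequ; the hand-rolled PySpark sketch below simply stands in for one, with hypothetical thresholds and column names.

```python
# Minimal hand-rolled data-quality gate in PySpark, standing in for a framework
# such as Great Expectations or Deequ; thresholds and column names are hypothetical.
from pyspark.sql import DataFrame, functions as F


def check_orders_quality(df: DataFrame, max_null_rate: float = 0.01) -> None:
    """Raise if key columns have nulls or duplicates beyond tolerance."""
    total = df.count()
    if total == 0:
        raise ValueError("Quality check failed: input DataFrame is empty")

    null_keys = df.filter(F.col("order_id").isNull()).count()
    dupes = total - df.dropDuplicates(["order_id"]).count()

    if null_keys / total > max_null_rate:
        raise ValueError(f"Quality check failed: {null_keys} null order_id values")
    if dupes > 0:
        raise ValueError(f"Quality check failed: {dupes} duplicate order_id values")
```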
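Finally, a sketch of the kind of pytest unit test that might run in CI (Azure DevOps or GitHub Actions) against a small PySpark transform; the transform under test is a hypothetical helper, not something defined in this posting.

```python
# Sketch of a pytest unit test for a small PySpark transform, suitable for CI.
# The transform under test (add_order_date) is a hypothetical helper.
import pytest
from pyspark.sql import SparkSession, functions as F


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()


def add_order_date(df):
    """Toy transform under test: derive order_date from order_timestamp."""
    return df.withColumn("order_date", F.to_date("order_timestamp"))


def test_add_order_date(spark):
    df = spark.createDataFrame(
        [("A1", "2024-06-01 10:15:00")],
        ["order_id", "order_timestamp"],
    )
    result = add_order_date(df).collect()[0]
    assert str(result["order_date"]) == "2024-06-01"
```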
Dimensions & Stakeholders:
- Content scope: Data engineering, data modeling, agentic AI and automation, and data solution operationalization
- Geographical scope: Global
- Stakeholders & networks: Analytics and Insights, DevOps, Product Management, Technology
- Working model: Individual contributor; collaborates with cross-functional teams within global planning and analytics