Data Engineer III - Hybrid Position

Posted 14 March 2022
Location Oklahoma City, United States of America
Job type Full Time

Job Description

(Requires 1 day per week in either the OKC or the Grapevine, TX office)


This position will be located within the Development and IT space and will work closely with computer scientists, IT, and data scientists to build, deploy, and optimize data pipelines and to integrate data infrastructure that enables analytics, reporting, and machine learning workloads at scale.

  • Build, test, and validate robust production-grade data pipelines that can ingest, aggregate, and transform large datasets according to the specifications of the internal teams who will be consuming the data

  • Build frameworks and custom tooling for data pipeline code development

  • Deploy data pipelines and data connectors to production environments

  • Configure connections to source data systems and validate schema definitions with the teams responsible for the source data

  • Monitor data pipelines and data connectors and troubleshoot issues as they arise

  • Monitor data lake environment for performance and data integrity

  • Manage data infrastructure such as Kafka and Kubernetes clusters

  • Collaborate with IT and database teams to maintain the overall data ecosystem

  • Assist data science, business intelligence, and other teams in using the data provided by the data pipelines

  • Mentor junior data engineers

  • Deploy machine learning models to production environments

  • Gather requirements and determine scope of new projects

  • Research and evaluate new technologies and set up proof-of-concept deployments

  • Test proof-of-concept deployments of new technologies

  • Collaborate with data governance and compliance teams to ensure data pipelines and data storage environments meet requirements

  • Serve as on-call for production issues related to data pipelines and other data infrastructure maintained by the data engineering team


  • BS degree in Computer Science or related field

  • Experience:

  • 5+ years of data engineering work experience

  • Experience with ETL and ELT processes in data pipelines

  • Experience with Apache Spark or Databricks and reading and writing Parquet, Avro and JSON

  • Experience coding in Java or Scala and using build tools such as Maven, Gradle, and SBT is required

  • Experience with SQL databases

  • Experience with CI/CD tools and processes

  • Experience working with HDFS or S3 storage environments

  • Experience working in a Unix or Linux environment, including writing shell scripts


  • Experience coding in Python

  • Experience with NoSQL solutions

  • Experience with Apache Kafka or Confluent is highly preferred

  • Experience with data lake query engines such as Presto or Dremio

  • Experience with Docker and Kubernetes is highly preferred

  • Experience with workflow orchestration tools like Apache Airflow, Control-M, or Arrow is highly preferred

Skills and Abilities:

  • Strong expertise in computer science fundamentals: data structures, performance complexities, algorithms, and implications of computer architecture on software performance such as I/O and memory tuning.

  • Working knowledge of software engineering fundamentals: version control systems such as Git and GitHub, workflows, and the ability to write production-ready code.

  • Knowledge of data architecture and data processing engines such as Spark and Hadoop.

  • Ability to create SQL queries of moderate complexity.

  • Knowledge of Java or Scala.

  • Knowledge of Apache Spark.

  • Knowledge of Python, R, C#, or PHP is helpful.

  • Knowledge of HDFS and S3 storage environments.

  • Strong troubleshooting skills.

  • Knowledge of technical infrastructure.

  • Strong technical aptitude.

  • Has strong critical thinking skills and the ability to relate them to the products.

  • Demonstrates excellent verbal and written communication skills.