Big Data Engineering, AWS, Spark, Scala, Python, Hadoop, Airflow, Kubernetes
Updated on 10.10.2024
Profile
Freelancer / Self-employed
Remote work
Available from: 04.11.2024
Availability: 100%
of which on-site: 0%
AWS
Apache Spark
Big Data
Scala
Python
Java
Hadoop
Apache Airflow
Apache Kafka
Kubernetes
Apache Cassandra
Neo4j
English
very good
German
beginner

Work locations

Germany, Switzerland, Austria

Currently I'm only open to remote projects.


Projects

2 years 2 months
2022-11 - present

Working on a Credit and Liquidity Risk Stress Testing system

Big Data Engineer / Contractor
Spark Hadoop S3 Dremio
Scala Python Java
Nomura
Remote
2 years 7 months
2022-06 - present

Working on the migration from an on-prem Hadoop cluster to AWS

Big Data Engineer
  • Converting multiple MapReduce jobs to Spark jobs. 
  • Updating Spark jobs to use DataFrames and Datasets instead of RDDs. 
  • Optimizing existing Spark jobs. 
  • Creating Airflow DAGs to schedule data processing.
Hadoop Spark EMR Airflow S3 Docker Terraform Scala Java Python
Groupon
Remote
1 year 1 month
2021-06 - 2022-06

Preparing a framework for efficient feature calculation

Lead Big Data Engineer/Contractor
  • Designing and implementing a feature store for machine learning. Preparing a framework for efficient calculation of thousands of different aggregate values (features) from terabytes of data.
  • Dockerizing Spark applications to run them as containers on an EMR cluster in a more isolated and standardized way.
  • Implementing an application for serving a machine learning model as a REST API on a Kubernetes cluster using Flask and Gunicorn. Significantly improved the API's response time by using an approximate nearest neighbor search algorithm.
  • Implementing a Lambda function for transforming new objects created in an S3 bucket and storing the records in a DynamoDB table.
  • Implementing a REST API application using Akka HTTP that serves recommendations stored in a DynamoDB table.
  • Optimizing existing Spark applications.
  • Working with data scientists on optimizing their solutions and making them production-ready.
Hadoop Spark Hive Delta Lake Docker Kubernetes EMR S3 CloudFormation Lambda DynamoDB Akka HTTP Flask Gunicorn. Programming languages: Scala Python
Adidas
Remote
4 months
2021-03 - 2021-06

Implementing multiple PySpark jobs running on Kubernetes

Big Data Engineer/Contractor
  • Implementing multiple PySpark jobs running on a Kubernetes cluster for transforming data from MS SQL Server and storing it in S3.
  • Implementing Airflow pipelines for scheduling PySpark jobs and defining dependencies between them.
Hadoop Spark Hive Docker Kubernetes Airflow AWS MS SQL Server Terraform
Nike
Remote
11 months
2020-05 - 2021-03

Designing and implementing a feature store for machine learning

Lead Big Data Engineer

  • Designing and implementing a feature store for machine learning. Preparing a framework for efficient calculation of thousands of different aggregate values (features) from terabytes of data.
  • Dockerizing Spark applications to run them as containers on an EMR cluster in a more isolated and standardized way.
  • Implementing an application for serving a machine learning model as a REST API on a Kubernetes cluster using Flask and Gunicorn. Significantly improved the API's response time by using an approximate nearest neighbor search algorithm.
  • Implementing a Lambda function for transforming new objects created in an S3 bucket and storing the records in a DynamoDB table.
  • Implementing a REST API application using Akka HTTP that serves recommendations stored in a DynamoDB table.
  • Optimizing existing Spark applications.
  • Working with data scientists on optimizing their solutions and making them production-ready.

Hadoop Spark Hive Delta Lake Docker Kubernetes EMR S3 CloudFormation Lambda DynamoDB Akka HTTP Flask Gunicorn Scala Python
Adidas
Remote
2 years 4 months
2018-01 - 2020-04

Core Banking Platform

Big Data Engineer/Contractor

  • Implementing report generators for the Core Banking Platform using Spark.
  • Implementing random data generators for verifying the performance of Spark applications. Analyzing the outputs of performance tests and making the necessary improvements.
  • Implementing Spark jobs for file compaction and repartitioning to improve the performance of report generators and Hive queries.
  • Working on the migration from the Cloudera to the MapR distribution of Hadoop.
  • Using Flume for reading messages from Kafka, transforming them and persisting them into HDFS and HBase.
  • Using Sqoop to ingest data from an Oracle database into HDFS.
  • Implementing ETL jobs for transforming files in various formats into Avro format.
  • Automating application deployment with Ansible, which greatly reduced the number of issues during production deployments.

Hadoop Spark Kafka Hive Flume HBase Oozie Sqoop Splunk Avro Scala Python
Nordea
Warsaw, Remote
2 months
2018-01 - 2018-02

Using PySpark and GraphFrames to run graph algorithms

Big Data Engineer/Contractor
  • Using PySpark and GraphFrames to run graph algorithms. Comparing the performance with Neo4j. 
  • Using PySpark to transform data stored in S3 and generate CSV files in order to import them into Neo4j.
Spark Athena S3 EMR Neo4j. Programming language: Python
Agata Tudek LDI
Remote
5 months
2017-07 - 2017-11

Risk Data Processing

Scala Software Engineer / Contractor

  • Developing an application for collecting risk data from various sources and processing it in real time.

Hadoop Spark Streaming Kafka Avro Camel Scala
Citi
Warsaw
2 years 2 months
2015-07 - 2017-08

Business Graph

Java, Scala Software Engineer/Contractor

Developing a system that uses data from public sources to infer "who-knows-who" relationships and help companies identify valuable relations among their existing customers.

  • Designing and implementing an algorithm for inferring "knows" relationships between persons using Spark.
  • Designing and implementing an algorithm for finding the ultimate beneficial owner of a company using Spark GraphX.
  • Creating a Neo4j Server plugin for finding the shortest paths between nodes in the graph using defined business rules.
  • Implementing REST services that run Cypher queries to retrieve data from nodes and relationships.
  • Implementing fast data import into the Neo4j database by writing directly to the files using the batch inserter API.
  • Implementing Spark transformations of data stored in Cassandra into a format that can be easily imported into the Neo4j database.
  • Designing and implementing synchronization between Cassandra and Neo4j using an event-driven architecture.
  • Implementing node search in the graph using Cypher queries and a Lucene index.
  • Data modelling.
  • Configuring and tuning the Neo4j database.

Neo4j Cassandra Spark Streaming Spark GraphX Spring Spray ActiveMQ Docker Redis Solr Scala Java
Kantwert GmbH
Poznań, Remote
1 year
2014-07 - 2015-06

Developing system for optimal planning and precise balancing

Java Developer
  • Developing the PSIcarlos system for optimal planning and precise balancing of crude transportation.
  • Designing and implementing new system functions based on defined requirements.
  • Preparing technical documentation.
  • Close cooperation within an international team. Discussing the customer's requirements.
  • Technical support for system users.
Spring Hibernate ActiveMQ Oracle Apache Tomcat
PSI Polska
Poznań
4 months
2014-03 - 2014-06

Developing V-Desk workflow system for document circulation

Java, C# Developer
  • Developing (mainly optimizing) a service for automatic text recognition in scanned documents and for retrieving key information from documents using regular expressions.
  • Developing an application for document scanning and barcode recognition.
WPF WinForms MS SQL
PrimeSoft Polska
Poznań
1 year
2013-03 - 2014-02

Developing the V-Tell Call Center system.

C# Developer
  • Designing a scalable system architecture.
  • Developing multithreaded WCF services.
  • Implementing calling in different modes by sending requests and handling events sent using the AMI protocol from the Asterisk PBX.
  • Implementing automatic calling by integrating the Asterisk PBX with the PostgreSQL database to create dynamic call queues.
  • Developing the predictive dialer algorithm that calculates the number of calls to be made based on collected statistics, e.g. percentage of received calls and talk time.
  • Preparing the mechanism for sending, mixing, compressing and saving recorded calls to the database.
WCF WPF Mono PostgreSQL MongoDB Asterisk
V-TELL
Poznań
1 year 9 months
2011-07 - 2013-03

Developing the Verax Network Management System.

Java Developer

  • Implementing advanced plugins for detecting problems and monitoring devices and applications in real time, such as:
    • PostgreSQL and MySQL databases
    • Active Directory service
    • VMware ESX servers and virtual machines
    • .NET applications
    • Windows and Unix workstations
    • Cisco, MRV and Juniper routers and switches
    • APC UPS devices
    • Devices with undetected type
  • Creating a module for monitoring changes in software installed on detected devices.

Spring Hibernate Adobe Flex Oracle MS SQL
Verax Systems
Poznań

Education and training

1 year 9 months
2013-10 - 2015-06

Computer Science - Master's studies

Faculty of Computing, Poznań University of Technology
  • Part-time
  • Master thesis: on request
3 years 6 months
2009-10 - 2013-03

Computer Science - Bachelor's studies

Faculty of Computing, Poznań University of Technology

Position

  • Big Data Engineering
  • Software development

Skills

Top skills

AWS Apache Spark Big Data Scala Python Java Hadoop Apache Airflow Apache Kafka Kubernetes Apache Cassandra Neo4j

Products / Standards / Experience / Methods

Version control systems

  • Git
  • Mercurial
  • SVN
  • CVS


Continuous integration systems 

  • Jenkins
  • Hudson
  • TeamCity


Others 

  • Cassandra
  • Spark
  • Neo4j
  • Kafka
  • Hadoop
  • Hive
  • Flume
  • HBase
  • AWS
  • Docker
  • Kubernetes
  • Airflow
  • Ansible
  • Gradle
  • Maven
  • Ant
  • SBT
  • UML


Hobbies and interests

  • Big data technologies 
  • Machine learning

Operating systems

Windows
Linux

Programming languages

Functional Programming Principles in Scala
Functional Program Design in Scala
Parallel programming
Scala
Java
Python
Groovy
Cypher
CQL
SQL
PL/SQL
Bash

Databases

AWS Certified Big Data - Specialty
MapR Certified Spark Developer v2
Neo4j Certified Professional
Google Cloud Certified Professional Data Engineer

