Freelancer: Big Data Engineering, AWS, Spark, Scala, Python, Hadoop, Airflow, Kubernetes

Freelancer / Self-employed

Remote work

Available from: 2024-11-04

Availability: 100%

of which on-site: 0%

Top skills

AWS

Apache Spark

Big Data

Scala

Python

Java

Hadoop

Apache Airflow

Apache Kafka

Kubernetes

Apache Cassandra

Neo4j

Languages

English

German

Work locations

Countries

Germany, Switzerland, Austria

I am currently open to remote projects only.

Remote work

possible

Projects

2 years 2 months

2022-11 - present

Working on a Credit and Liquidity Risk Stress Testing system

Big Data Engineer / Contractor Scala Python Java

Role

Big Data Engineer / Contractor

Project description

Products

Spark Hadoop S3 Dremio

Skills

Scala Python Java

Client

Nomura

Location

Remote

2 years 7 months

2022-06 - present

Working on the migration from an on-prem Hadoop cluster to AWS

Big Data Engineer Hadoop Spark EMR ...

Role

Big Data Engineer

Project description

Converting multiple MapReduce jobs to Spark jobs.
Updating Spark jobs to use DataFrames and Datasets instead of RDDs.
Optimizing existing Spark jobs.
Creating Airflow DAGs to schedule data processing.

Skills

Hadoop Spark EMR Airflow S3 Docker Terraform Scala Java Python

Client

Groupon

Location

Remote

1 year 1 month

2021-06 - 2022-06

Preparing a framework

Lead Big Data Engineer/Contractor Hadoop Spark Hive ...

Role

Lead Big Data Engineer/Contractor

Project description

Designing and implementing a feature store for machine learning. Preparing a framework for efficient calculation of thousands of different aggregate values (features) from terabytes of data.
Dockerizing Spark applications to run them as containers on an EMR cluster in a more isolated and standardized way.
Implementing an application that serves a machine learning model as a REST API on a Kubernetes cluster using Flask and Gunicorn. Significantly improved the API's response time by using an approximate nearest neighbor search algorithm.
Implementing a Lambda function that transforms new objects created in an S3 bucket and stores the records in a DynamoDB table.
Implementing a REST API application using Akka HTTP that serves recommendations stored in a DynamoDB table.
Optimizing existing Spark applications.
Working with data scientists on optimizing their solutions and making them production-ready.

Skills

Hadoop Spark Hive Delta Lake Docker Kubernetes EMR S3 CloudFormation Lambda DynamoDB Akka HTTP Flask Gunicorn. Programming languages: Scala Python

Client

Adidas

Location

Remote

4 months

2021-03 - 2021-06

Implementing multiple PySpark jobs running on Kubernetes

Big Data Engineer/Contractor Hadoop Spark Hive ...

Role

Big Data Engineer/Contractor

Project description

Implementing multiple PySpark jobs running on a Kubernetes cluster that transform data from MS SQL Server and store it in S3.
Implementing Airflow pipelines for scheduling PySpark jobs and defining dependencies between them.

Skills

Hadoop Spark Hive Docker Kubernetes Airflow AWS MS SQL Server Terraform

Client

Nike

Location

Remote

11 months

2020-05 - 2021-03

Designing and implementing a feature store for machine learning

Lead Big Data Engineer Hadoop Spark Hive ...

Role

Lead Big Data Engineer

Project description

Designing and implementing a feature store for machine learning. Preparing a framework for efficient calculation of thousands of different aggregate values (features) from terabytes of data.
Dockerizing Spark applications to run them as containers on an EMR cluster in a more isolated and standardized way.
Implementing an application that serves a machine learning model as a REST API on a Kubernetes cluster using Flask and Gunicorn. Significantly improved the API's response time by using an approximate nearest neighbor search algorithm.
Implementing a Lambda function that transforms new objects created in an S3 bucket and stores the records in a DynamoDB table.
Implementing a REST API application using Akka HTTP that serves recommendations stored in a DynamoDB table.
Optimizing existing Spark applications.
Working with data scientists on optimizing their solutions and making them production-ready.

Skills

Hadoop Spark Hive Delta Lake Docker Kubernetes EMR S3 CloudFormation Lambda DynamoDB Akka HTTP Flask Gunicorn Scala Python

Client

Adidas

Location

Remote

2 years 4 months

2018-01 - 2020-04

Core Banking Platform

Big Data Engineer/Contractor Hadoop Spark Kafka ...

Role

Big Data Engineer/Contractor

Project description

Implementing report generators for the Core Banking Platform using Spark.
Implementing random data generators to verify the performance of Spark applications. Analyzing the outputs of performance tests and making the necessary improvements.
Implementing Spark jobs for file compaction and repartitioning to improve the performance of report generators and Hive queries.
Working on the migration from the Cloudera to the MapR distribution of Hadoop.
Using Flume to read messages from Kafka, transform them and persist them into HDFS and HBase.
Using Sqoop to ingest data from an Oracle database into HDFS.
Implementing ETL jobs that transform files in various formats into Avro format.
Automating the deployment of applications using Ansible, which greatly reduced the number of issues during production deployments.

Skills

Hadoop Spark Kafka Hive Flume HBase Oozie Sqoop Splunk Avro Scala Python

Client

Nordea

Location

Warsaw, Remote

2 months

2018-01 - 2018-02

Using PySpark and GraphFrames to run graph algorithms

Big Data Engineer/Contractor Spark Athena S3 ...

Role

Big Data Engineer/Contractor

Project description

Using PySpark and GraphFrames to run graph algorithms. Comparing the performance with Neo4j.
Using PySpark to transform data stored in S3 and generate CSV files for import into Neo4j.

Skills

Spark Athena S3 EMR Neo4j. Programming language: Python

Client

Agata Tudek LDI

Location

Remote

5 months

2017-07 - 2017-11

Risk Data Processing

Scala Software Engineer / Contractor Hadoop Spark Streaming Kafka ...

Role

Scala Software Engineer / Contractor

Project description

Developing an application for collecting risk data from various sources and processing it in real time.

Skills

Hadoop Spark Streaming Kafka Avro Camel Scala

Client

Citi

Location

Warsaw

2 years 2 months

2015-07 - 2017-08

Business Graph

Java, Scala Software Engineer/Contractor Neo4j Cassandra Spark Streaming ...

Role

Java, Scala Software Engineer/Contractor

Project description

Developing a system that uses data from public sources to infer "who-knows-who" relationships and helps companies identify valuable relations among their existing customers.

Designing and implementing an algorithm for inferring "knows" relationships between persons using Spark.
Designing and implementing an algorithm for finding the ultimate beneficial owner of a company using Spark GraphX.
Creating a Neo4j Server plugin for finding the shortest paths between nodes in the graph according to defined business rules.
Implementing REST services that run Cypher queries to retrieve data from nodes and relationships.
Implementing fast data import into the Neo4j database by writing directly to the files using the batch inserter API.
Implementing Spark transformations of data stored in Cassandra into a format that can be easily imported into the Neo4j database.
Designing and implementing synchronization between Cassandra and Neo4j using an event-driven architecture.
Implementing node search in the graph using Cypher queries and a Lucene index.
Data modelling.
Configuring and tuning the Neo4j database.

Skills

Neo4j Cassandra Spark Streaming Spark GraphX Spring Spray ActiveMQ Docker Redis Solr Scala Java

Client

Kantwert GmbH

Location

Poznań, Remote

1 year

2014-07 - 2015-06

Developing system for optimal planning and precise balancing

Java Developer Spring Hibernate ActiveMQ ...

Role

Java Developer

Project description

Developing the PSIcarlos system for optimal planning and precise balancing of crude transportation.
Designing and implementing new system functions based on defined requirements.
Preparing technical documentation.
Cooperating closely within an international team. Discussing the customer's requirements.
Providing technical support for system users.

Skills

Spring Hibernate ActiveMQ Oracle Apache Tomcat

Client

PSI Polska

Location

Poznań

4 months

2014-03 - 2014-06

Developing V-Desk workflow system for document circulation

Java, C# Developer WPF WinForms MS SQL

Role

Java, C# Developer

Project description

Developing (mainly optimizing) a service for automatic text recognition from scanned documents and for retrieving key information from documents using regular expressions.
Developing an application for document scanning and barcode recognition.

Skills

WPF WinForms MS SQL

Client

PrimeSoft Polska

Location

Poznań

1 year

2013-03 - 2014-02

Developing V-Tell Call Center system.

C# Developer WCF WPF Mono ...

Role

C# Developer

Project description

Designing a scalable system architecture.
Developing multithreaded WCF services.
Implementing calling in different modes by sending requests to the Asterisk PBX and handling the events it sends over the AMI protocol.
Implementing automatic calling by integrating the Asterisk PBX with the PostgreSQL database to create dynamic call queues.
Developing the predictive dialer algorithm that calculates the number of calls to be made based on collected statistics, e.g. percentage of received calls and talk time.
Preparing the mechanism for sending, mixing, compressing and saving recorded calls to the database.

Skills

WCF WPF Mono PostgreSQL MongoDB Asterisk

Client

V-TELL

Location

Poznań

1 year 9 months

2011-07 - 2013-03

Developing Verax Network Management System.

Java Developer Spring Hibernate Adobe Flex ...

Role

Java Developer

Project description

Implementing advanced plugins for detecting problems and real-time monitoring of devices and applications such as:
- PostgreSQL and MySQL databases
- Active Directory service
- VMware ESX servers and virtual machines
- .NET applications
- Windows and Unix workstations
- Cisco, MRV and Juniper routers and switches
- APC UPS devices
- Devices with undetected type
Creating a module for monitoring changes in software installed on detected devices.

Skills

Spring Hibernate Adobe Flex Oracle MS SQL

Client

Verax Systems

Location

Poznań

Education and training

1 year 9 months

2013-10 - 2015-06

Computer Science - Master studies

Faculty of Computing, Poznań University of Technology

Institution, location

Faculty of Computing, Poznań University of Technology

Focus

Part-time
Master thesis: on request

3 years 6 months

2009-10 - 2013-03

Computer Science - Bachelor studies

Faculty of Computing, Poznań University of Technology

Institution, location

Faculty of Computing, Poznań University of Technology

Position

Big Data Engineering
Software development

Competencies

Top skills

AWS Apache Spark Big Data Scala Python Java Hadoop Apache Airflow Apache Kafka Kubernetes Apache Cassandra Neo4j

Products / Standards / Experience / Methods

Version control systems

Git
Mercurial
SVN
CVS

Continuous integration systems

Jenkins
Hudson
TeamCity

Others

Cassandra
Spark
Neo4j
Kafka
Hadoop
Hive
Flume
HBase
AWS
Docker
Kubernetes
Airflow
Ansible
Gradle
Maven
Ant
SBT
UML

Hobbies and interests

Big data technologies
Machine learning

Operating systems

Windows

Linux

Programming languages

Functional Programming Principles in Scala

Functional Program Design in Scala

Parallel programming

Scala

Java

Python

Groovy

Cypher

CQL

SQL

PL/SQL

Bash

Databases

AWS Certified Big Data - Specialty

MapR Certified Spark Developer v2

Neo4j Certified Professional

Google Cloud Certified Professional Data Engineer

