Data Engineer

By Nebojša Jotović | June 27, 2022

Full time

Seven Bridges | Centralna Srbija, Serbia (201-500 employees)

https://www.sevenbridges.com/

About the job

At Seven Bridges we are building the most advanced cloud computing platform for genomics data analysis. Our team and product enable scientists to analyze genomic data faster and more efficiently than ever, so they can focus on making progress in genomics and personalized medicine. Through our collaboration with the largest genomics projects, we connect the world’s biomedical information to enable the most efficient analysis at scale. We are a global company with offices in the US, UK, Serbia and Turkey, with roughly 300 employees, and we are growing rapidly!

Do you want to help us engineer a healthier tomorrow, together?

As a member of our Engineering team, you will have the opportunity to extend our platform and help build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and various ‘big data’ technologies. Seven Bridges is a ‘Remote Friendly’ company. The successful candidate for this position can work remotely from anywhere in Serbia or from our Belgrade or Novi Sad offices.

Some of the problems we are solving are:

Data wrangling and exploration; analyzing and organizing raw data
Working with our bioinformaticians on developing tools and pipelines for structured transformation of rich datasets
Schema design and data modeling
Query optimization

Do you have what it takes?

B.S., M.S. or Ph.D. degree in Computer Science or a related technical field
At least 3 years of professional experience working with data
Experience with relational and non-relational databases
Experience with SQL
Experience in the design, optimization and support of big data ecosystems
Experience with Python and/or Java programming languages
Experience working with one or more of the following (or related) technologies: Snowflake, Amazon Redshift, Amazon Athena, Google BigQuery, Azure Synapse.
Experience using RCFile, Apache ORC and/or Apache Parquet
Experience with cloud service providers such as AWS, GCP or Azure
Experience with schema design and dimensional data modeling
Ability to design, build, and maintain ETL pipelines and data warehouses
Habit of writing unit and integration tests, contributing to the engineering wiki, and documenting your work

It would be great if you:

Have a strong bioinformatics background
Have previous experience with genomic, phenotypic, EMR and EHR data, including time series and many other types of data