Data Engineer
Full time
Seven Bridges Central Serbia, Serbia (201-500 employees)
About the job
At Seven Bridges, we are building the most advanced cloud computing platform for genomics data analysis. Our team and product enable scientists to analyze genomic data faster and more efficiently than ever, so they can focus on making progress in genomics and personalized medicine. Through our collaboration with the largest genomics projects, we connect the world’s biomedical information to enable the most efficient analysis at scale. We are a global company of roughly 300 employees, with offices in the US, UK, Serbia, and Turkey, and we are growing rapidly!
Do you want to help us engineer a healthier tomorrow, together?
As a member of our Engineering team, you will have the opportunity to extend our platform and help build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and various ‘big data’ technologies. Seven Bridges is a ‘Remote Friendly’ company. The successful candidate for this position can work remotely within Serbia or from our Belgrade or Novi Sad office.
Some of the problems we are solving are:
- Data wrangling and exploration; analyzing and organizing raw data
- Working with our bioinformaticians on developing tools and pipelines for structured transformation of rich datasets
- Designing schemas and data modeling
- Query optimization
Do you have what it takes?
- B.S., M.S. or Ph.D. degree in Computer Science or a related technical field
- At least 3 years of professional experience working with data
- Experience with relational and non-relational databases
- Experience with SQL
- Experience in the design, optimization and support of big data ecosystems
- Experience with Python and/or Java programming languages
- Experience working with one or more of the following (or related) technologies: Snowflake, Amazon Redshift, Amazon Athena, Google BigQuery, Azure Synapse
- Experience using RCFile, Apache ORC and/or Apache Parquet
- Experience with cloud service providers like AWS, GCP, Azure…
- Experience with schema design and dimensional data modeling
- Ability to design, build, and maintain the ETL pipeline and data warehouse
- A habit of writing unit and integration tests, contributing to the engineering wiki, and documenting your work
It would be great if you:
- Have a strong bioinformatics background
- Have previous experience with genomic, phenotypic, EMR, and EHR data, including time series and many other data types