The Big Data Analytics team supports the need for advanced analytics. As a competency center for analytics, the team helps transform data into insight using techniques such as text mining, process mining, network analytics and predictive modelling.
The team is currently looking for a Data Engineer whose core objectives will be:
· Collect, clean, prepare and load the necessary data - structured or unstructured - onto Hadoop, our Big Data analytics platform, so that the data scientists can use it to create insights and answer business challenges
· Act as a liaison between the team and other stakeholders, and contribute to supporting the Hadoop cluster and the compatibility of all the different software that runs on the platform (Spark, R, Python, …)
· Experiment with new tools and technologies related to data extraction, exploration or processing
· Depending on his / her skills, the new data engineer may also be involved in the analytical aspects of data science projects
· Identify the most appropriate data sources to use for a given purpose and understand their structures and contents, if necessary with the help of SMEs
· Extract structured and unstructured data from the source systems (relational databases, data warehouses, document repositories, file systems …), prepare such data (cleanse, re-structure, aggregate …) and load them onto Hadoop.
· Actively support data scientists in the data exploration and data preparation phases. Where data quality issues are detected, liaise with the data supplier to do root cause analysis
· Where a use case is meant to become a production application, contribute to the design, build and launch activities
· Ensure the maintenance and support of production applications (watch duty)
· Liaise with a team to address infrastructure issues and to ensure that the components and software used on the platform are all consistent
· Where the skills allow for it, perform advanced data analysis on a selection of business use cases, supported by data scientists
o With understanding and creating data flows, with data architecture, with ETL/ELT development (MS SQL Server SSIS, DataStage, …) and with processing structured and unstructured data
o In working with customers to identify and clarify requirements
o With open source technologies used in Big Data analytics like Pig, Hive, HBase, Kafka, …
· Proven experience with using data stored in RDBMSs, and experience with or a good understanding of NoSQL databases
· Understanding of the Hadoop ecosystem including Hadoop file formats like Parquet and ORC
· Very good knowledge of Spark & Scala
· Ability to
o Write MapReduce & Spark jobs and performant SQL statements
o Analyze data to identify issues such as gaps and inconsistencies, and perform root cause analysis
o Design solutions that are fit for purpose whilst keeping options open for future needs
· Strong verbal and written communication skills, good customer relationship skills
The following will be considered assets:
· Knowledge of Cloudera
· Experience with Linux and Shell scripting
· Knowledge of
o IBM mainframe and DB2
o Classic and new/emerging business intelligence methodologies (or experience with them)
o Statistics, data mining, machine learning and predictive modeling, data visualization and information discovery techniques
· A challenging position in a fast growing company with an international presence.
· A stimulating working environment with a really good team spirit maintained by lots of internal events (teambuilding, ...).
· A dynamic culture focused on personal development.
· A wide range of training and career development opportunities.
Interested? Please send your CV in English to firstname.lastname@example.org.
Please apply now!