Data Engineer Interview Questions

Data engineers are IT professionals who are needed in almost every industry. They track data trends to determine the best next steps for a business. A crucial part of a data engineer's work is turning raw data into usable data by creating data pipelines and building data systems.

Most frequently asked data engineer interview questions and how to answer them

Question 1: Can you describe in detail your level of knowledge of programming languages?

How to answer: Before the interview, review your résumé and/or portfolio and list the programs you are most proficient in. If it becomes clear that you lack the required expertise in a program the company mainly uses, describe yourself as a highly motivated, self-directed person who will work tirelessly to learn it.
Question 2: Explain in your own words what data engineering involves.

How to answer: Explain your role in relation to the wider organization and to other roles, such as data scientists, to make clear what you contribute to the overall business system. Clarify the difference between a database-focused engineer and a pipeline-focused engineer.
Question 3: Can you describe your experience with Apache Hadoop and data management in a cloud environment?

How to answer: Prepare for this question by researching the company's software, cloud data storage products, and the use of Apache Hadoop. Data engineers must be able to work with programming languages and data management systems that are used throughout the industry, such as Apache Hadoop.

20,956 data engineer interview questions shared by candidates

Data Model: LinkedIn data model; model a 1st-degree connection.

Python:
1. Dedup items in a list, retaining the order of items (cannot use dict/set since order will not be retained). Follow-up question: how would you handle nested lists? (They are looking for recursion.)
2. Find the number of words in a sentence and the average length of a word.

SQL: On the product-sales-customers data model that is preloaded in coderpad.io, write the following queries:
1. Count of stores in OR state with area_sqft > 25000.
2. Average number of female customers, grouped by state.
3. Customer FirstName, LastName and count of unique products purchased, by state. Follow-up question: return the top customer by state based on diverse products purchased (diverse = count(distinct product_id)).
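The two Python items above could be approached roughly as follows. This is a minimal sketch, not the interviewer's expected answer: the function names are hypothetical, and the set is used only for membership tests while a list carries the order. A bare set does lose order; since Python 3.7 a dict preserves insertion order, so `list(dict.fromkeys(items))` would also work for hashable items if the interviewer allowed it. If sets are ruled out entirely, `item not in result` is a drop-in replacement at O(n²) cost.

```python
def dedup_keep_order(items):
    """Return a list without duplicates, keeping each item's first occurrence."""
    seen = set()      # membership checks only; order lives in `result`
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def flatten(items):
    """Recursively yield the non-list elements of arbitrarily nested lists."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)   # recurse into the nested list
        else:
            yield item


def dedup_nested(items):
    """Flatten nested lists first, then dedup while keeping order."""
    return dedup_keep_order(flatten(items))


def word_stats(sentence):
    """Return (word count, average word length), splitting on whitespace."""
    words = sentence.split()
    if not words:
        return 0, 0.0
    return len(words), sum(len(w) for w in words) / len(words)
```

Note that `word_stats` counts punctuation as part of a word; whether to strip it first is worth clarifying with the interviewer.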
Senior Data Engineering Manager

Interviewed at Meta

3.5
Aug 26, 2020
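For the SQL part of the Meta question, the post does not show the coderpad.io schema, so the tables, columns, and rows below are entirely hypothetical. The sketch runs the queries through Python's built-in sqlite3 module and covers query 1, query 3, and the follow-up. Query 2 is left out because its wording is ambiguous. It assumes "by state" means the customer's state and that the bundled SQLite is version 3.25 or later (needed for window functions).

```python
import sqlite3

# Hypothetical schema and rows: the real coderpad.io tables are not shown in the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores    (store_id INTEGER PRIMARY KEY, state TEXT, area_sqft INTEGER);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT,
                        last_name TEXT, gender TEXT, state TEXT);
CREATE TABLE sales     (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_id INTEGER, store_id INTEGER);
INSERT INTO stores VALUES (1,'OR',30000),(2,'OR',20000),(3,'WA',40000),(4,'OR',26000);
INSERT INTO customers VALUES (1,'Ann','Lee','F','OR'),(2,'Bob','Ray','M','OR'),
                             (3,'Cy','Orr','M','WA');
INSERT INTO sales VALUES (1,1,10,1),(2,1,10,1),(3,1,11,4),(4,2,10,1),
                         (5,3,12,3),(6,3,13,3),(7,3,14,3);
""")

# Query 1: count of Oregon stores larger than 25,000 sq ft.
store_count = conn.execute(
    "SELECT COUNT(*) FROM stores WHERE state = 'OR' AND area_sqft > 25000"
).fetchone()[0]

# Query 3: distinct products bought per customer, reported with the customer's state.
PER_CUSTOMER = """
    SELECT c.state, c.first_name, c.last_name,
           COUNT(DISTINCT s.product_id) AS n_products
    FROM customers c
    JOIN sales s ON s.customer_id = c.customer_id
    GROUP BY c.state, c.customer_id, c.first_name, c.last_name
"""
per_customer = conn.execute(
    PER_CUSTOMER + " ORDER BY c.state, c.customer_id"
).fetchall()

# Follow-up: rank customers within each state and keep rank 1 (ties are all kept).
top_by_state = conn.execute(f"""
    WITH per_customer AS ({PER_CUSTOMER}),
    ranked AS (
        SELECT *, RANK() OVER (PARTITION BY state ORDER BY n_products DESC) AS rnk
        FROM per_customer
    )
    SELECT state, first_name, last_name, n_products
    FROM ranked
    WHERE rnk = 1
    ORDER BY state
""").fetchall()
```

`RANK()` is used rather than `ROW_NUMBER()` so that customers tied for first place in a state are all returned; which behavior is wanted is a point to confirm in the interview.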

A satellite television provider XYZ is looking forward to your help in retrieving some critical information from its data warehouse. The provider offers television channels bundled in packages to its subscribers. The diagram below explains the key data tables and the relationships between them: [Don't have diag]

D_SUBSCRIBER is a dimension table that has details of subscribers (customers).
D_PACKAGES is a dimension table that has details of the various packages offered by the provider.
F_SUBSCRIPTIONS is a fact table that stores package subscriptions by each subscriber. A subscriber can have multiple records in this table during his lifetime.

Please note the following business rules:
• There are 2 types of packages: BASE pack and ADDON pack (given in the PACKAGE_TYPE attribute).
• A subscriber will have at least 1 BASE pack. They can have 0 or more ADDON packs.
• A package is considered active for a subscriber if PACKAGE_SUB_END_DT in F_SUBSCRIPTIONS is NULL. Otherwise the package is not active for the subscriber.
• A subscriber is considered active if there is a BASE pack linked to him with a NULL PACKAGE_SUB_END_DT. Otherwise the subscriber is considered inactive.
• Each package has a PRICE_PER_DAY, which is the fee for a single day's usage of the package. A subscriber's monthly fee is the PRICE_PER_DAY of each of his active packages multiplied by the number of active days in that month that he has been on that package. The fee is considered revenue to the business.

Please find below sample data for each table for better understanding.

• D_SUBSCRIBER
D_SUBSCRIBER_ID   SUBSCRIBER_NAME   SUBSCRIBER_JOINING_DT
1E+08             Rahul Agarwal     ########
1E+08             Ajay Mehta        ########
1E+08             Pranab Shetty     ########

• D_PACKAGES
D_PACKAGE_ID   PACKAGE_NAME            PACKAGE_TYPE   PRICE_PER_DAY
131            Metro                   Base           ₹ 4
132            Dhamaal Cricket         Base           ₹ 5
133            South Special           Base           ₹ 4
134            Supreme Kids            Base           ₹ 3
231            Hindi Movies            Addon          ₹ 1
232            English News            Addon          ₹ 2
233            English Entertainment   Addon          ₹ 2

• F_SUBSCRIPTIONS
D_SUBSCRIBER_ID   D_PACKAGE_ID   PACKAGE_START_DT   PACKAGE_END_DT
1E+08             131            ########
1E+08             231            ########           19-Jan-15
1E+08             233            ########
1E+08             133            ########           1-Jun-15
1E+08             232            17-Jan-15          1-Mar-15
1E+08             131            ########

The business relies on you to pull out important data to get a better understanding of the business and its subscribers. For each of the questions below, please write a SQL query. Please follow the ground rules below when answering the questions.

1) The marketing team wants to run a campaign to bring back subscribers who are no longer active. Write a query to pull out subscribers who are no longer active.
2) The products team wants to understand subscribers who have at least 2 add-on packages. Write a query to pull out subscribers who have at least 2 active ADDON packs.
3) The marketing team believes that people who subscribe to the 'English News' package should also need the 'English Entertainment' package. Write a SQL query to pull out subscribers who have an active 'English News' package but do not have an active 'English Entertainment' package.
4) The finance team wants to track revenue generated by each subscriber.
5) Write a SQL query to get the total revenue generated by each subscriber in the year 2014.
6) Write a query to identify the top 3 base packages (in terms of revenue collected) for each month in 2014.
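Questions 1 to 3 above can be sketched as follows, using Python's built-in sqlite3 module to run the SQL. The table and column names come from the question, with the end-date column named PACKAGE_SUB_END_DT as in the business rules (the sample header calls it PACKAGE_END_DT). The subscriber IDs and dates are invented because the sample values in the post are unreadable, and the ACTIVE_SUBS view is a hypothetical helper rather than part of the exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE D_SUBSCRIBER (D_SUBSCRIBER_ID INTEGER PRIMARY KEY, SUBSCRIBER_NAME TEXT,
                           SUBSCRIBER_JOINING_DT TEXT);
CREATE TABLE D_PACKAGES (D_PACKAGE_ID INTEGER PRIMARY KEY, PACKAGE_NAME TEXT,
                         PACKAGE_TYPE TEXT, PRICE_PER_DAY INTEGER);
CREATE TABLE F_SUBSCRIPTIONS (D_SUBSCRIBER_ID INTEGER, D_PACKAGE_ID INTEGER,
                              PACKAGE_START_DT TEXT, PACKAGE_SUB_END_DT TEXT);
-- IDs and dates below are invented; the sample values in the post are unreadable.
INSERT INTO D_SUBSCRIBER VALUES (1,'Rahul Agarwal','2014-01-10'),
    (2,'Ajay Mehta','2014-01-01'),(3,'Pranab Shetty','2014-05-01');
INSERT INTO D_PACKAGES VALUES (131,'Metro','Base',4),(133,'South Special','Base',4),
    (231,'Hindi Movies','Addon',1),(232,'English News','Addon',2),
    (233,'English Entertainment','Addon',2);
INSERT INTO F_SUBSCRIPTIONS VALUES
    (1,131,'2014-01-10',NULL),(1,232,'2014-02-01',NULL),(1,231,'2014-03-01',NULL),
    (2,133,'2014-01-01','2015-06-01'),(2,232,'2015-01-17','2015-03-01'),
    (3,131,'2014-05-01',NULL),(3,232,'2014-06-01',NULL),(3,233,'2014-06-01',NULL);
-- Hypothetical helper view: one row per subscription that is still active.
CREATE VIEW ACTIVE_SUBS AS
    SELECT f.D_SUBSCRIBER_ID, p.D_PACKAGE_ID, p.PACKAGE_NAME,
           UPPER(p.PACKAGE_TYPE) AS PACKAGE_TYPE
    FROM F_SUBSCRIPTIONS f
    JOIN D_PACKAGES p ON p.D_PACKAGE_ID = f.D_PACKAGE_ID
    WHERE f.PACKAGE_SUB_END_DT IS NULL;
""")


def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# 1) Inactive subscribers: no BASE pack with a NULL end date.
inactive = names("""
    SELECT s.SUBSCRIBER_NAME FROM D_SUBSCRIBER s
    WHERE NOT EXISTS (SELECT 1 FROM ACTIVE_SUBS a
                      WHERE a.D_SUBSCRIBER_ID = s.D_SUBSCRIBER_ID
                        AND a.PACKAGE_TYPE = 'BASE')
    ORDER BY s.D_SUBSCRIBER_ID
""")

# 2) Subscribers with at least two active ADDON packs.
two_addons = names("""
    SELECT s.SUBSCRIBER_NAME FROM D_SUBSCRIBER s
    JOIN ACTIVE_SUBS a ON a.D_SUBSCRIBER_ID = s.D_SUBSCRIBER_ID
    WHERE a.PACKAGE_TYPE = 'ADDON'
    GROUP BY s.D_SUBSCRIBER_ID, s.SUBSCRIBER_NAME
    HAVING COUNT(DISTINCT a.D_PACKAGE_ID) >= 2
    ORDER BY s.D_SUBSCRIBER_ID
""")

# 3) Active 'English News' but no active 'English Entertainment'.
news_only = names("""
    SELECT s.SUBSCRIBER_NAME FROM D_SUBSCRIBER s
    WHERE EXISTS (SELECT 1 FROM ACTIVE_SUBS a
                  WHERE a.D_SUBSCRIBER_ID = s.D_SUBSCRIBER_ID
                    AND a.PACKAGE_NAME = 'English News')
      AND NOT EXISTS (SELECT 1 FROM ACTIVE_SUBS a
                      WHERE a.D_SUBSCRIBER_ID = s.D_SUBSCRIBER_ID
                        AND a.PACKAGE_NAME = 'English Entertainment')
    ORDER BY s.D_SUBSCRIBER_ID
""")
```

`NOT EXISTS` is used for the "does not have" conditions because it stays correct when a subscriber has no matching rows at all, a case that an inner join would silently drop.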
Data Engineer

Interviewed at Amazon

3.5
Jan 13, 2021
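For the 2014 revenue question in the satellite-provider exercise, one possible approach is to clamp each subscription to the calendar year and multiply the overlapping days by PRICE_PER_DAY. The sketch below makes several assumptions the post does not confirm: dates are stored as ISO-8601 text, the end date is exclusive (the day a package ends is not billed), ended packages still earn revenue for the days they were active, and all rows are invented. The SQL is run through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE D_PACKAGES (D_PACKAGE_ID INTEGER PRIMARY KEY, PACKAGE_NAME TEXT,
                         PACKAGE_TYPE TEXT, PRICE_PER_DAY INTEGER);
CREATE TABLE F_SUBSCRIPTIONS (D_SUBSCRIBER_ID INTEGER, D_PACKAGE_ID INTEGER,
                              PACKAGE_START_DT TEXT, PACKAGE_SUB_END_DT TEXT);
INSERT INTO D_PACKAGES VALUES (131,'Metro','Base',4),(133,'South Special','Base',4),
                              (231,'Hindi Movies','Addon',1);
-- Invented subscriber IDs and ISO dates, chosen to exercise each clamping case.
INSERT INTO F_SUBSCRIPTIONS VALUES
    (1,131,'2014-12-01',NULL),          -- open-ended: 31 days in 2014
    (1,231,'2014-12-22','2015-01-19'),  -- ends in 2015: 10 days in 2014
    (2,133,'2013-06-01','2014-01-11'),  -- starts in 2013: 10 days in 2014
    (3,131,'2015-02-01',NULL);          -- entirely after 2014: excluded
""")

# Two-argument MIN/MAX are SQLite scalar functions; they clamp each
# subscription to the half-open window [2014-01-01, 2015-01-01).
REVENUE_2014 = """
    SELECT f.D_SUBSCRIBER_ID,
           CAST(SUM(p.PRICE_PER_DAY * (
                 julianday(MIN(COALESCE(f.PACKAGE_SUB_END_DT, '2015-01-01'),
                               '2015-01-01'))
               - julianday(MAX(f.PACKAGE_START_DT, '2014-01-01'))
           )) AS INTEGER) AS REVENUE_2014
    FROM F_SUBSCRIPTIONS f
    JOIN D_PACKAGES p ON p.D_PACKAGE_ID = f.D_PACKAGE_ID
    WHERE f.PACKAGE_START_DT < '2015-01-01'
      AND COALESCE(f.PACKAGE_SUB_END_DT, '9999-12-31') > '2014-01-01'
    GROUP BY f.D_SUBSCRIBER_ID
    ORDER BY f.D_SUBSCRIBER_ID
"""
revenue = conn.execute(REVENUE_2014).fetchall()
```

With the invented rows, subscriber 1 is billed 4 × 31 + 1 × 10 = 134 and subscriber 2 is billed 4 × 10 = 40. The same clamping idea, applied per month and restricted to BASE packs, would be a starting point for the top-3-packages question.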


Glassdoor has 20,956 interview questions and reports from Data engineer interviews.