Data Engineer Interview Questions

Data engineers are IT professionals who are needed in almost every industry. They track data trends to determine the best next steps for businesses. A crucial part of a data engineer's work is turning raw data into usable data by creating data pipelines and building data systems.

Most frequently asked interview questions for a data engineer (M/F/X) and how to answer them

Question 1

Can you describe in detail your level of knowledge of programming languages?

How to answer
Before the interview, review your résumé and/or portfolio and list the programs you are most proficient in. If it becomes clear that you lack the required expertise in a program the company primarily uses, describe yourself as a highly motivated, self-reliant person who will work tirelessly to learn these programs.
Question 2

Explain in your own words what data engineering involves.

How to answer
Explain your role in relation to the wider organization and to other roles, such as data scientists, to make clear what you contribute to the overall business system. Clarify the difference between a database-focused engineer and a pipeline-focused engineer.
Question 3

Can you describe your experience with Apache Hadoop and data management in a cloud environment?

How to answer
Prepare for this question by researching the company's software, cloud data storage products, and the use of Apache Hadoop. Data engineers must be able to work with programming languages and data management systems that are used throughout the industry, such as Apache Hadoop.

20,944 data engineer interview questions shared by candidates

Data Engineer

Interviewed at Meta

3.5
Apr 30, 2018

Given a multi-step product feature, write SQL to see how well this feature is doing (loading times, step completion %). Then use Python to constantly update average step time as new values stream in, given that there are too many to store in memory.
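One way to approach the Python half of this question is an incremental mean that keeps only a count and the current average, so no past values are stored. This is a minimal sketch, not the interviewer's expected answer; the names RunningMean and stream_means are made up for illustration.

```python
class RunningMean:
    """Incrementally maintained mean; stores only a count and the current mean."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        """Fold one new observation into the mean and return the updated mean."""
        self.count += 1
        # Incremental form: new_mean = old_mean + (value - old_mean) / n
        self.mean += (value - self.mean) / self.count
        return self.mean


def stream_means(values):
    """Yield the running mean after each value of an iterable (hypothetical helper)."""
    tracker = RunningMean()
    for value in values:
        yield tracker.update(value)
```

Because the update uses only the previous mean and the count, memory use stays constant no matter how many step times arrive.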

Data Engineer

Interviewed at Meta

3.5
Sep 21, 2018

SQL:
1. Percentage increase in revenue, comparing promoted and non-promoted products.
2. Product classes that have the highest number of transactions.
3. Count of customers who bought two item types (A, B).
4. Don't remember.

Python:
1. Average word length (letters per word).
2. Parse an IP address (this is a favourite FB question).
3. [[A],[A,B],[A,C],[B,D],[C,A]]: find the letter with the most neighbors. (I wasn't able to solve it because of the time limit, but the interviewer said she got what I wanted to convey. I gave her an algorithm for what I would have done.)
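Two of the Python prompts above can be sketched as follows. The report does not say how "neighbors" is defined, so this sketch assumes that letters sharing an inner list are neighbors of each other (undirected) and that ties should all be returned. It also assumes words are whitespace-separated and punctuation is not stripped. The function names are illustrative only.

```python
from collections import defaultdict


def average_word_length(text: str) -> float:
    """Mean number of characters per whitespace-separated word (0.0 if no words)."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def most_neighbors(groups):
    """Return (letters, count) for the letters with the most distinct neighbors.

    Assumption: letters that appear in the same inner list are neighbors of
    each other (undirected); a single-letter list adds a node with no edge.
    """
    neighbors = defaultdict(set)
    for group in groups:
        for letter in group:
            # Every other distinct letter in the same group is a neighbor.
            neighbors[letter].update(other for other in group if other != letter)
    if not neighbors:
        return [], 0
    best = max(len(adjacent) for adjacent in neighbors.values())
    winners = sorted(
        letter for letter, adjacent in neighbors.items() if len(adjacent) == best
    )
    return winners, best
```

Under a directed reading of the pairs (first letter points to second) the answer could differ, so it is worth confirming the definition with the interviewer before coding.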

Data Engineer

Interviewed at Amazon

3.5
Apr 29, 2020

In Python, given a JSON object with nested objects, write a function that flattens all the objects into a single key-value dictionary. Do not use the library that already performs this function. { a:{b:c,d:e} } becomes {a_b:c, a_d:e} (not {a:"b:c,d:e"}).
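A recursive sketch of this flattening, assuming the JSON has already been parsed into nested Python dicts and that keys are joined with an underscore as in the example. The function name and its parameters are illustrative, and lists are left as plain values here.

```python
def flatten(obj: dict, parent_key: str = "", sep: str = "_") -> dict:
    """Flatten nested dicts into one dict whose keys are joined with `sep`."""
    flat = {}
    for key, value in obj.items():
        # Prefix the key with its parent's path, if there is one.
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat
```

Note that an empty nested object contributes no keys in this version, and two different paths that join to the same string would overwrite each other. Both are edge cases worth raising in an interview.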

Data Engineer

Interviewed at Amazon

3.5
Apr 29, 2020

SQL: Select the value of one column based on the maximum of a second column within each grouping of a third column. Given Column A, Column B, and Column C: for each group defined by Column A, return the value of Column B where Column C is the maximum for that group.
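One common way to write this query is to join the table to a per-group aggregate. The sketch below runs it through Python's built-in sqlite3 module so it can be tried locally. The table name t, the column names a, b, c, and the helper b_at_max_c are assumptions for illustration. If several rows tie for the maximum, all of them are returned.

```python
import sqlite3

# Join each row to its group's maximum of c and keep only the matching rows.
QUERY = """
SELECT t.a, t.b
FROM t
JOIN (SELECT a, MAX(c) AS max_c FROM t GROUP BY a) AS m
  ON t.a = m.a AND t.c = m.max_c
ORDER BY t.a, t.b
"""


def b_at_max_c(rows):
    """Load (a, b, c) rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (a TEXT, b TEXT, c INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

A window-function version (ranking rows within each group) is an alternative where the database supports it.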

Data Engineer

Interviewed at Meta

3.5
Jun 8, 2020

# Question 3:
# Complete a function that returns a list containing all the mismatched words (case sensitive) between two given input strings
# For example:
# - string 1 : "Firstly this is the first string"
# - string 2 : "Next is the second string"
#
# - output : ['Firstly', 'this', 'first', 'Next', 'second']
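A sketch that reproduces the example output, assuming words are split on whitespace and that the result lists the words unique to the first string, in order, followed by those unique to the second. The function name is illustrative.

```python
def mismatched_words(first: str, second: str) -> list:
    """Return words that appear in only one of the two strings (case sensitive)."""
    first_words = first.split()
    second_words = second.split()
    # Sets give constant-time membership checks; lists preserve original order.
    first_set, second_set = set(first_words), set(second_words)
    only_first = [word for word in first_words if word not in second_set]
    only_second = [word for word in second_words if word not in first_set]
    return only_first + only_second
```

A word repeated within one string appears once per occurrence in the result. Whether duplicates should be collapsed is not stated in the prompt.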

Data Engineer

Interviewed at Amazon

3.5
Feb 16, 2017

1. What difference have you made in your current team apart from regular work?
2. What steps do you follow to rebuild a table in a database?
3. How did you do performance tuning?
4. How do you find the skewness of data in a table?
5. Difference between RDBMS and dimensional modeling

SQL

1) purchase

customer_id  product_id  quantity  purchase_date
1            111         1         01/01/2017
1            111         2         01/02/2017
1            222         2         01/02/2017
2            111         3         01/04/2017
2            222         1         01/03/2017
3            222         1         01/05/2017
3            222         1         01/06/2017
3            111         1         01/06/2017
3            111         1         01/04/2017

Q: How many customers bought each product how many times during the week?

Product_Id  Number_of_Customers  Number_of_Times
111         2                    2
111         1                    1
222         2                    1
222         1                    2

2) daily_usage

account_id  usage_amount  usage_date
1           10            1
1           20            2
1           15            3
1           30            4

Q: a) How do you print the usage_amount of the previous/consecutive rows? b) Do it without using window functions.
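Both SQL questions can be sketched with SQLite through Python's sqlite3 module. For the first, a two-level aggregation counts purchases per customer and product, then counts customers per purchase count. It assumes all rows fall within the week in question, so no date filter is applied. For the second, a self-join stands in for a window function and assumes usage_date values are consecutive integers. The helper query_rows and the constant names are made up for illustration.

```python
import sqlite3

# Inner query: purchases per (customer, product). Outer: customers per count.
CUSTOMERS_PER_COUNT = """
SELECT product_id, COUNT(*) AS number_of_customers, times AS number_of_times
FROM (
    SELECT customer_id, product_id, COUNT(*) AS times
    FROM purchase
    GROUP BY customer_id, product_id
) AS per_customer
GROUP BY product_id, times
ORDER BY product_id, number_of_customers DESC
"""

# Self-join: pair each row with the row dated one day earlier (assumed to exist
# only when usage_date values are consecutive integers).
PREVIOUS_USAGE = """
SELECT cur.account_id, cur.usage_date, cur.usage_amount,
       prev.usage_amount AS previous_amount
FROM daily_usage AS cur
LEFT JOIN daily_usage AS prev
  ON prev.account_id = cur.account_id
 AND prev.usage_date = cur.usage_date - 1
ORDER BY cur.account_id, cur.usage_date
"""

PURCHASES = [
    (1, 111, 1, "01/01/2017"), (1, 111, 2, "01/02/2017"),
    (1, 222, 2, "01/02/2017"), (2, 111, 3, "01/04/2017"),
    (2, 222, 1, "01/03/2017"), (3, 222, 1, "01/05/2017"),
    (3, 222, 1, "01/06/2017"), (3, 111, 1, "01/06/2017"),
    (3, 111, 1, "01/04/2017"),
]
DAILY_USAGE = [(1, 10, 1), (1, 20, 2), (1, 15, 3), (1, 30, 4)]


def query_rows(create_sql, insert_sql, rows, query):
    """Create an in-memory table, load rows, and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(create_sql)
        conn.executemany(insert_sql, rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


purchase_result = query_rows(
    "CREATE TABLE purchase (customer_id INTEGER, product_id INTEGER,"
    " quantity INTEGER, purchase_date TEXT)",
    "INSERT INTO purchase VALUES (?, ?, ?, ?)",
    PURCHASES,
    CUSTOMERS_PER_COUNT,
)
usage_result = query_rows(
    "CREATE TABLE daily_usage (account_id INTEGER, usage_amount INTEGER,"
    " usage_date INTEGER)",
    "INSERT INTO daily_usage VALUES (?, ?, ?)",
    DAILY_USAGE,
    PREVIOUS_USAGE,
)
```

If the dates had gaps, the join condition would need to look up the latest earlier date per account instead of subtracting one.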

Data Engineer

Interviewed at Meta

3.5
Nov 26, 2018

SQL: Top 3 products by sales, percentages using CASE, a basic HAVING clause, and one set-operator (INTERSECT) type question.
Python: Average word length, IP address parsing, dictionaries, list of lists, flattening a list of lists. (Similar to previous interview experiences.)
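The report does not give the exact wording of the list-flattening prompt. As a rough sketch, assuming arbitrarily nested Python lists and an illustrative function name, a recursive version could look like this:

```python
def flatten_list(items: list) -> list:
    """Flatten arbitrarily nested lists into a single flat list, preserving order."""
    flat = []
    for item in items:
        if isinstance(item, list):
            # Recurse into sublists and append their elements in order.
            flat.extend(flatten_list(item))
        else:
            flat.append(item)
    return flat
```

For a single level of nesting, a list comprehension over the inner lists would be enough. The recursive form is shown because the depth of nesting is not specified.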

