Data Engineer Interview Questions

Data engineers are IT professionals and are needed in almost every industry. They track data trends to determine the best next steps for businesses. A crucial part of a data engineer's work is turning raw data into usable data by creating data pipelines and building data systems.

Most frequently asked interview questions for a data engineer (m/f/x) and how to answer them

Question 1: Can you describe in detail your level of knowledge of programming languages?

How to answer: Before the interview, review your résumé and/or portfolio and list the programs you are most proficient in. If it becomes clear that you lack the required expertise in a program the company mainly uses, describe yourself as a highly motivated, self-reliant person who will work tirelessly to learn these programs.

Question 2: Explain in your own words what data engineering involves.

How to answer: Explain your role in relation to the wider organization and to other roles, such as data scientists, to make clear how you contribute to the overall business system. Clarify the difference between a database-centric engineer and a pipeline-centric engineer.

Question 3: Can you describe your experience with Apache Hadoop and data management in a cloud environment?

How to answer: Prepare for this question by researching the company's software, cloud data storage products, and the use of Apache Hadoop. Data engineers must be able to work with programming languages and data management systems that are used throughout the industry, such as Apache Hadoop.

20,944 data engineer interview questions shared by candidates

Data Engineer

Interviewed at Meta

3.5
Jun 8, 2020

     sales                                  products
     +------------------+---------+         +---------------------+---------+
     | product_id       | INTEGER |>--------| product_id          | INTEGER |
     | store_id         | INTEGER |    +---<| product_class_id    | INTEGER |
     | customer_id      | INTEGER |    |    | brand_name          | VARCHAR |
+---<| promotion_id     | INTEGER |    |    | product_name        | VARCHAR |
|    | store_sales      | DECIMAL |    |    | is_low_fat_flg      | TINYINT |
|    | store_cost       | DECIMAL |    |    | is_recyclable_flg   | TINYINT |
|    | units_sold       | DECIMAL |    |    | gross_weight        | DECIMAL |
|    | transaction_date | DATE    |    |    | net_weight          | DECIMAL |
|    +------------------+---------+    |    +---------------------+---------+
|                                      |
|    promotions                        |    product_classes
|    +------------------+---------+    |    +---------------------+---------+
+----| promotion_id     | INTEGER |    +----| product_class_id    | INTEGER |
     | promotion_name   | VARCHAR |         | product_subcategory | VARCHAR |
     | media_type       | VARCHAR |         | product_category    | VARCHAR |
     | cost             | DECIMAL |         | product_department  | VARCHAR |
     | start_date       | DATE    |         | product_family      | VARCHAR |
     | end_date         | DATE    |         +---------------------+---------+
     +------------------+---------+

Question 1: What percent of all products in the grocery chain's catalog are both low fat and recyclable?
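
One possible way to approach Question 1 is conditional aggregation over the products table. The sketch below runs such a query against an in-memory SQLite database from Python so it can be tried locally. The four sample rows, the `QUERY` name, and the `pct_low_fat_recyclable` alias are assumptions made for illustration; only the table and column names come from the schema in the post.

```python
import sqlite3

# In-memory stand-in for the products table; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "product_id INTEGER, is_low_fat_flg TINYINT, is_recyclable_flg TINYINT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 0), (3, 0, 1), (4, 0, 0)],
)

# Count rows where both flags are set, then divide by the total row count.
# Multiplying by 100.0 first forces floating-point division.
QUERY = """
SELECT 100.0 * SUM(CASE WHEN is_low_fat_flg = 1 AND is_recyclable_flg = 1
                        THEN 1 ELSE 0 END) / COUNT(*) AS pct_low_fat_recyclable
FROM products
"""

pct = conn.execute(QUERY).fetchone()[0]
print(pct)  # 25.0 for the four sample rows above
```

The query reads only the products table because both flags live there, so no join is needed for this particular question.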

Data Engineer

Interviewed at Amazon

3.5
Apr 8, 2021

First round (written): three SQL questions and one question about what you would do to speed up inserts into a huge table.

Second round: get the players with the highest streak; get the details of the employee who has the most members in a team; and, in Python, return the numbers that have the maximum count in a list.

Third round: behavioral questions and one question on Python lists. Given two lists, get the numbers that are common to both and return them as in this example: [1,2,3,3,1,1,1] and [1,1,2,2,3] return [1,1,2,3].
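
The two Python list questions in this post can be sketched with `collections.Counter`. This assumes the third-round question is a multiset intersection (each number is returned as many times as it occurs in both lists), which matches the example given; the function names `common_with_duplicates` and `most_frequent` are hypothetical, and sorting the result is a choice made here to reproduce the example's ordering.

```python
from collections import Counter


def common_with_duplicates(a, b):
    """Multiset intersection: each value appears min(count in a, count in b) times."""
    overlap = Counter(a) & Counter(b)
    return sorted(overlap.elements())


def most_frequent(nums):
    """Return every value that shares the highest count in nums."""
    if not nums:
        return []
    counts = Counter(nums)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(common_with_duplicates([1, 2, 3, 3, 1, 1, 1], [1, 1, 2, 2, 3]))  # [1, 1, 2, 3]
print(most_frequent([4, 1, 4, 2, 1]))  # [4, 1]
```

`most_frequent` returns a list rather than a single value because the question asks for "the numbers", which suggests ties should all be reported.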

Data Engineer

Interviewed at Meta

3.5
Nov 16, 2020

Python question: given a two-dimensional list, for example [[2,3],[3,4],[5]], where person 2 is friends with person 3 and so on, find how many friends each person has. Note that one person has no friends.

SQL question: find the top 10 colleges/companies that an average social person interacts with, or something along those lines. I split the query in two. I was not able to finish coding, but I explained and wrote both parts; I didn't have time to test them.

There were also data modeling questions about a social network website. I can't give details.
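
A minimal sketch of the friend-count question follows. It assumes friendships are mutual, that each inner list names people who are friends with one another, and that a one-element list marks a person with no friends; the post does not spell these rules out, and the function name `friend_counts` is hypothetical.

```python
from collections import defaultdict


def friend_counts(groups):
    """Count distinct friends per person; a one-element entry is a person with no friends."""
    friends = defaultdict(set)
    for group in groups:
        for person in group:
            friends[person]  # register the person even if they end up friendless
            for other in group:
                if other != person:
                    friends[person].add(other)
    return {person: len(others) for person, others in friends.items()}


print(friend_counts([[2, 3], [3, 4], [5]]))  # {2: 1, 3: 2, 4: 1, 5: 0}
```

Storing friends in a set means a pair listed twice is still counted once.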

Data Engineer

Interviewed at Meta

3.5
Aug 25, 2020

Python

1. Return the number of times a given character occurs in a given string.

    s1 = 'missisipi'
    res = []
    for ch in s1:
        # Collect every matching character; s1.count('s') gives the same result.
        if ch == 's':
            res.append(ch)
    print(len(res))  # 3

2. Fill each None with the previous value: [1, None, 1, 2, None] --> [1, 1, 1, 2, 2]

    arr = [1, None, 1, 2, None]
    new_l = []
    last = None  # most recent non-None value; a leading None stays None
    for x in arr:
        if x is not None:
            last = x
        new_l.append(last)
    print(new_l)  # [1, 1, 1, 2, 2]

3. Given two sentences, construct an array of the words that appear in one sentence and not the other.

    A = "Geeks for Geeks"
    B = "Learning from Geeks for Geeks"
    d = {}
    # Count every word across both sentences.
    for w in A.split() + B.split():
        d[w] = d.get(w, 0) + 1
    # A word counted exactly once occurs in only one of the sentences.
    unmatchedW = [w for w in d if d[w] == 1]
    print(unmatchedW)  # ['Learning', 'from']

4. Sort a dictionary's items by value, largest first.

    d = {"a": 4, "c": 3, "b": 12}
    sorted(d.items(), key=lambda x: x[1], reverse=True)
    # [('b', 12), ('a', 4), ('c', 3)]

SQL

     sales                                  products
     +------------------+---------+         +---------------------+---------+
     | product_id       | INTEGER |>--------| product_id          | INTEGER |
     | store_id         | INTEGER |    +---<| product_class_id    | INTEGER |
     | customer_id      | INTEGER |    |    | brand_name          | VARCHAR |
+---<| promotion_id     | INTEGER |    |    | product_name        | VARCHAR |
|    | store_sales      | DECIMAL |    |    | is_low_fat_flg      | TINYINT |
|    | store_cost       | DECIMAL |    |    | is_recyclable_flg   |…

1. Find the top 5 sales products having promotions.

    select sum(s.store_sales), p.brand_name, count(p.product_id)
    from products p
    inner join sales s on p.product_id = s.product_id
    where s.promotion_id is not null
    group by p.brand_name
    having count(p.product_id) = 1 /* single-channel media type */
    order by 1 desc
    limit 5

2. Of sales that had a valid promotion, the VP of marketing wants to know what % of transactions occur on either the very first day or the very last day of a promotion campaign.

    select sum(case when valid_promotion = 1 then 1 else 0 end) / count(*) * 100 as percentage
    from sales
    where day = First_day(date) or day = last_day(date)

or

    select sum(case when transaction_date = (select min(transaction_date) from sales) then 1 else 0 end) / count(*) as first_day_sales,
           sum(case when transaction_date = (select max(transaction_date) from sales) then 1 else 0 end) / count(*) as last_day_sales
    from sales

or

    select avg(transaction_date in (p.start_date, p.end_date)) * 100 as first_last_pct
    from sales s
    join promotions p using (promotion_id)
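
The first-day/last-day question can be tried out with a small in-memory SQLite database. The sketch below is one reading of the question, assuming that a "valid promotion" means the sale joins to a row in promotions and that the promotions table has the promotion_id, start_date, and end_date columns used in the last query of the post. The sample rows and the `QUERY` name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Four sales belong to promotion 1 (two of them on its first or last day);
# the fifth sale has no promotion and is dropped by the inner join.
conn.executescript("""
CREATE TABLE promotions (promotion_id INTEGER, start_date DATE, end_date DATE);
CREATE TABLE sales (promotion_id INTEGER, transaction_date DATE);
INSERT INTO promotions VALUES (1, '2020-01-01', '2020-01-10');
INSERT INTO sales VALUES
    (1, '2020-01-01'),
    (1, '2020-01-05'),
    (1, '2020-01-10'),
    (1, '2020-01-07'),
    (NULL, '2020-01-01');
""")

# Among promoted sales, count those dated on the campaign's start or end date.
QUERY = """
SELECT 100.0 * SUM(CASE WHEN s.transaction_date IN (p.start_date, p.end_date)
                        THEN 1 ELSE 0 END) / COUNT(*) AS first_last_pct
FROM sales s
JOIN promotions p ON p.promotion_id = s.promotion_id
"""

first_last_pct = conn.execute(QUERY).fetchone()[0]
print(first_last_pct)  # 50.0 for the sample rows above
```

The explicit CASE expression is used instead of averaging a boolean so the query does not depend on a database treating comparisons as 1/0.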
