HQL面试题(一)
1. 数据准备
gz.txt 用户工资组成表
用户id(uid), 基本工资(jb) 奖金(jj) 提成(tc) 部门编号( deptno)
1,2000,3000,1500,1 2,5000,500,1000,2 3,1500,1000,3000,2 4,3000,6000,8000,3 5,1500,2000,1800,1 6,2500,1000,1900,1
bm.txt 部门表
1,销售 2,技术 3,行政
yg.txt 员工信息表
uid name gender age
1,zs,M,28 2,ww,F,36 3,zl,F,48 4,pp,M,44 5,wb,M,32 6,TQ,F,32
2. 建表并到导入数据到Hive数据表中
建立数据库db_company
create database db_company
;
use db_company
;
建表
create table tb_gz
(
uid
int,
jb
int,
jj
int,
tc
int,
deptno
int
)
row format delimited
fields terminated by ",";
load data local inpath
"/root/hive_sql/gz.txt" into table gz
;
create table bm
(
deptno string
,
name string
)
row format delimited
fields terminated by ",";
load data local inpath
"/root/hive_sql/bm.txt" into table bm
;
create table yg
(
uid
int,
name string
,
gender string
,
age
int
)
row format delimited
fields terminated by ",";
load data local inpath
"/root/hive_sql/yg.txt" into table yg
;
表格建立后,数据如下
+---------+--------+--------+--------+------------+
| gz.uid | gz.jb | gz.jj | gz.tc | gz.deptno |
+---------+--------+--------+--------+------------+
| 1 | 2000 | 3000 | 1500 | 1 |
| 2 | 5000 | 500 | 1000 | 2 |
| 3 | 1500 | 1000 | 3000 | 2 |
| 4 | 3000 | 6000 | 8000 | 3 |
| 5 | 1500 | 2000 | 1800 | 1 |
| 6 | 2500 | 1000 | 1900 | 1 |
+---------+--------+--------+--------+------------+
+------------+----------+
| bm.deptno | bm.name |
+------------+----------+
| 1 | 销售 |
| 2 | 技术 |
| 3 | 行政 |
+------------+----------+
+---------+----------+------------+---------+
| yg.uid | yg.name | yg.gender | yg.age |
+---------+----------+------------+---------+
| 1 | zs | M | 28 |
| 2 | ww | F | 36 |
| 3 | zl | F | 48 |
| 4 | pp | M | 44 |
| 5 | wb | M | 32 |
| 6 | TQ | F | 32 |
+---------+----------+------------+---------+
3. 题目
3.1 求出公司中每个员工的姓名 和 三类收入中最高的那种收入的类型
思路,
需要员工名字,收入,不涉及部门,所以yg表和gz表join收入最高,也就是需要求出三个收入部分最高的,查阅资料可以知道有一个函数greatest需要知道哪个部分最高,使用case when
sql
第一步,求出名字以及每个人对应工资中最高的部分信息
select
yg
.name
,
greatest
(gz
.jb
, gz
.jj
, gz
.tc
) as highest
from
gz
join
yg
on
gz
.uid
=yg
.uid
;
+----------+----------+
| yg.name | highest |
+----------+----------+
| zs | 3000 |
| ww | 5000 |
| zl | 3000 |
| pp | 8000 |
| wb | 2000 |
| TQ | 2500 |
+----------+----------+
再进行比对,最高的这部分是哪一个部分,case when即可
select
yg
.name
,
greatest
(gz
.jb
, gz
.jj
, gz
.tc
) as highest
,
case
when greatest
(gz
.jb
, gz
.jj
, gz
.tc
)=gz
.jb
then '基本工资'
when greatest
(gz
.jb
, gz
.jj
, gz
.tc
)=gz
.jj
then '奖金'
when greatest
(gz
.jb
, gz
.jj
, gz
.tc
)=gz
.tc
then '提成'
end highest_income_part
from
gz
join
yg
on
gz
.uid
=yg
.uid
;
+----------+----------+----------------------+
| yg.name | highest | highest_income_part |
+----------+----------+----------------------+
| zs | 3000 | 奖金 |
| ww | 5000 | 基本工资 |
| zl | 3000 | 提成 |
| pp | 8000 | 提成 |
| wb | 2000 | 奖金 |
| TQ | 2500 | 基本工资 |
+----------+----------+----------------------+
3.2 求出公司中每个岗位的薪资总和
思路
只有岗位也就是部门,以及薪资,所以gz表和bm表join即可需要求出每个部门的薪资总合,所以需要对薪资先求和,jj+jb+tc需要划分部门
sql
先求出部门中的薪资之和,再针对部门进行求和
select
bm
.name
,
(gz
.jb
+gz
.jj
+gz
.tc
) as total
from
gz
join
bm
on
gz
.deptno
=bm
.deptno
;
+----------+--------+
| bm.name | total |
+----------+--------+
| 销售 | 6500 |
| 技术 | 6500 |
| 技术 | 5500 |
| 行政 | 17000 |
| 销售 | 5300 |
| 销售 | 5400 |
+----------+--------+
select
t
.name
,
sum(t
.total
)
from
(select
bm
.name
,
(gz
.jb
+gz
.jj
+gz
.tc
) as total
from
gz
join
bm
on
gz
.deptno
=bm
.deptno
) t
group by t
.name
;
+---------+--------+
| t.name | _c1 |
+---------+--------+
| 技术 | 12000 |
| 行政 | 17000 |
| 销售 | 17200 |
+---------+--------+
3.3 求出公司中每个岗位不同性别员工薪资总和
思路
每个岗位,不同性别,不同员工,薪资总和也就是三张表都涉及到,join在一起、可以使用join on join on语法,也可以使用join 。。。join。。。join。。。on。。。and on。。。语法需要针对岗位,针对性别进行分类,同时对薪资进行求和。所以先针对岗位进行分类,再针对性别进行分类,再对薪资进行求和
sql
先把需要的信息都检索出来,然后根据需要进行处理
select
yg
.name
,
yg
.gender
,
(gz
.jb
+gz
.jj
+gz
.tc
) as total
,
bm
.name
from
yg
join
gz
on
yg
.uid
=gz
.uid
join
bm
on
bm
.deptno
= gz
.deptno
;
+----------+------------+--------+----------+
| yg.name | yg.gender | total | bm.name |
+----------+------------+--------+----------+
| zs | M | 6500 | 销售 |
| ww | F | 6500 | 技术 |
| zl | F | 5500 | 技术 |
| pp | M | 17000 | 行政 |
| wb | M | 5300 | 销售 |
| TQ | F | 5400 | 销售 |
+----------+------------+--------+----------+
这里需要的是根据部门岗位和性别进行划分
select
yg
.name
as username
,
yg
.gender
,
gz
.jb
,
gz
.jj
,
gz
.tc
,
bm
.name
from
yg
join
gz
on
yg
.uid
=gz
.uid
join
bm
on
bm
.deptno
= gz
.deptno
;
+-----------+------------+--------+----------+
| username | yg.gender | total | bm.name |
+-----------+------------+--------+----------+
| zs | M | 6500 | 销售 |
| ww | F | 6500 | 技术 |
| zl | F | 5500 | 技术 |
| pp | M | 17000 | 行政 |
| wb | M | 5300 | 销售 |
| TQ | F | 5400 | 销售 |
+-----------+------------+--------+----------+
再将信息根据部门和性别进行分组,group by
with
a
as
(select
yg
.name username
,
yg
.gender
,
gz
.jb
,
gz
.jj
,
gz
.tc
,
bm
.name
from
yg
join
gz
on
yg
.uid
=gz
.uid
join
bm
on
bm
.deptno
= gz
.deptno
)
select
name
,
gender
,
sum(jb
+jj
+tc
)
from
a
group by name
, gender
;
+-------+---------+--------+
| name | gender | _c2 |
+-------+---------+--------+
| 技术 | F | 12000 |
| 行政 | M | 17000 |
| 销售 | F | 5400 |
| 销售 | M | 11800 |
+-------+---------+--------+
3.4 求出公司中不同性别、不同年龄阶段(20-30,31-40,41-50)的员工薪资总和
思路
需要拿到员工信息、工资信息,但不涉及部门信息,所以gz表和yg表join即可需要针对性别、年龄段进行分类
sql
初步数据检索
select
gender
,
age
,
gz
.jb
,
gz
.jj
,
gz
.tc
from
gz
join
yg
on
gz
.uid
=yg
.uid
;
+---------+------+--------+--------+--------+
| gender | age | gz.jb | gz.jj | gz.tc |
+---------+------+--------+--------+--------+
| M | 28 | 2000 | 3000 | 1500 |
| F | 36 | 5000 | 500 | 1000 |
| F | 48 | 1500 | 1000 | 3000 |
| M | 44 | 3000 | 6000 | 8000 |
| M | 32 | 1500 | 2000 | 1800 |
| F | 32 | 2500 | 1000 | 1900 |
+---------+------+--------+--------+--------+
进一步进行分类,性别和年龄段
select
gender
,
age
,
gz
.jb
,
gz
.jj
,
gz
.tc
,
case
when age
>=20 and age
<=30 then '20~30'
when age
>=31 and age
<=40 then '31~40'
when age
>=41 and age
<50 then '41~50'
else 'too old'
end as age_range
from
gz
join
yg
on
gz
.uid
=yg
.uid
;
+---------+------+--------+--------+--------+------------+
| gender | age | gz.jb | gz.jj | gz.tc | age_range |
+---------+------+--------+--------+--------+------------+
| M | 28 | 2000 | 3000 | 1500 | 20~30 |
| F | 36 | 5000 | 500 | 1000 | 31~40 |
| F | 48 | 1500 | 1000 | 3000 | 41~50 |
| M | 44 | 3000 | 6000 | 8000 | 41~50 |
| M | 32 | 1500 | 2000 | 1800 | 31~40 |
| F | 32 | 2500 | 1000 | 1900 | 31~40 |
with
a
as
(select
yg
.gender
,
yg
.age
,
gz
.jb
,
gz
.jj
,
gz
.tc
,
case
when age
>=20 and age
<=30 then '20~30'
when age
>=31 and age
<=40 then '31~40'
when age
>=41 and age
<50 then '41~50'
else 'too old'
end as age_range
from
gz
join
yg
on
gz
.uid
=yg
.uid
)
select
gender
,
age_range
,
sum(jj
+jb
+tc
) as total
from
a
group by gender
, age_range
;
+---------+------------+--------+
| gender | age_range | total |
+---------+------------+--------+
| F | 31~40 | 11900 |
| F | 41~50 | 5500 |
| M | 20~30 | 6500 |
| M | 31~40 | 5300 |
| M | 41~50 | 17000 |
+---------+------------+--------+
4. 查阅Hive的函数
show functions
;
+------------------------------+
| tab_name |
+------------------------------+
| ! |
| != |
| $sum0 |
| % |
| & |
| * |
| + |
| - |
| / |
| < |
| <= |
| <=> |
| <> |
| = |
| == |
| > |
| >= |
| ^ |
| abs |
| acos |
| add_months |
| aes_decrypt |
| aes_encrypt |
| and |
| array |
| array_contains |
| ascii |
| asin |
| assert_true |
| assert_true_oom |
| atan |
| avg |
| base64 |
| between |
| bin |
| bloom_filter |
| bround |
| cardinality_violation |
| case |
| cbrt |
| ceil |
| ceiling |
| char_length |
| character_length |
| chr |
| coalesce |
| collect_list |
| collect_set |
| compute_stats |
| concat |
| concat_ws |
| context_ngrams |
| conv |
| corr |
| cos |
| count |
| covar_pop |
| covar_samp |
| crc32 |
| create_union |
| cume_dist |
| current_authorizer |
| current_database |
| current_date |
| current_groups |
| current_timestamp |
| current_user |
| date_add |
| date_format |
| date_sub |
| datediff |
| day |
| dayofmonth |
| dayofweek |
| decode |
| degrees |
| dense_rank |
| div |
| e |
| elt |
| encode |
| enforce_constraint |
| exp |
| explode |
| extract_union |
| factorial |
| field |
| find_in_set |
| first_value |
| floor |
| floor_day |
| floor_hour |
| floor_minute |
| floor_month |
| floor_quarter |
| floor_second |
| floor_week |
| floor_year |
| format_number |
| from_unixtime |
+------------------------------+
| tab_name |
+------------------------------+
| from_utc_timestamp |
| get_json_object |
| get_splits |
| greatest |
| grouping |
| hash |
| hex |
| histogram_numeric |
| hour |
| if |
| in |
| in_bloom_filter |
| in_file |
| index |
| initcap |
| inline |
| instr |
| internal_interval |
| isfalse |
| isnotfalse |
| isnotnull |
| isnottrue |
| isnull |
| istrue |
| java_method |
| json_tuple |
| lag |
| last_day |
| last_value |
| lcase |
| lead |
| least |
| length |
| levenshtein |
| like |
| likeall |
| likeany |
| ln |
| locate |
| log |
| log10 |
| log2 |
| logged_in_user |
| lower |
| lpad |
| ltrim |
| map |
| map_keys |
| map_values |
| mask |
| mask_first_n |
| mask_hash |
| mask_last_n |
| mask_show_first_n |
| mask_show_last_n |
| matchpath |
| max |
| md5 |
| min |
| minute |
| mod |
| month |
| months_between |
| murmur_hash |
| named_struct |
| negative |
| next_day |
| ngrams |
| noop |
| noopstreaming |
| noopwithmap |
| noopwithmapstreaming |
| not |
| ntile |
| nullif |
| nvl |
| octet_length |
| or |
| parse_url |
| parse_url_tuple |
| percent_rank |
| percentile |
| percentile_approx |
| pi |
| pmod |
| posexplode |
| positive |
| pow |
| power |
| printf |
| quarter |
| radians |
| rand |
| rank |
| reflect |
| reflect2 |
| regexp |
| regexp_extract |
| regexp_replace |
| regr_avgx |
+------------------------------+
| tab_name |
+------------------------------+
| regr_avgy |
| regr_count |
| regr_intercept |
| regr_r2 |
| regr_slope |
| regr_sxx |
| regr_sxy |
| regr_syy |
| repeat |
| replace |
| replicate_rows |
| restrict_information_schema |
| reverse |
| rlike |
| round |
| row_number |
| rpad |
| rtrim |
| second |
| sentences |
| sha |
| sha1 |
| sha2 |
| shiftleft |
| shiftright |
| shiftrightunsigned |
| sign |
| sin |
| size |
| sort_array |
| sort_array_by |
| soundex |
| space |
| split |
| sq_count_check |
| sqrt |
| stack |
| std |
| stddev |
| stddev_pop |
| stddev_samp |
| str_to_map |
| struct |
| substr |
| substring |
| substring_index |
| sum |
| tan |
| to_date |
| to_epoch_milli |
| to_unix_timestamp |
| to_utc_timestamp |
| translate |
| trim |
| trunc |
| ucase |
| udftoboolean |
| udftobyte |
| udftodouble |
| udftofloat |
| udftointeger |
| udftolong |
| udftoshort |
| udftostring |
| unbase64 |
| unhex |
| unix_timestamp |
| upper |
| uuid |
| var_pop |
| var_samp |
| variance |
| version |
| weekofyear |
| when |
| width_bucket |
| windowingtablefunction |
| xpath |
| xpath_boolean |
| xpath_double |
| xpath_float |
| xpath_int |
| xpath_long |
| xpath_number |
| xpath_short |
| xpath_string |
| year |
| | |
| ~ |
+------------------------------+