这里和pandas差不多,用的是 groupby子句

mysql> select vend_id,count(*) from products group by vend_id;

+---------+----------+

| vend_id | count(*) |

+---------+----------+

|    1001 |        3 |

|    1002 |        2 |

|    1003 |        7 |

|    1005 |        2 |

+---------+----------+

4 rows in set (0.02 sec)

上面代码中 groupbyvend_id子句按照vend_id进行分组, selectvend\_id,count(*)则返回vend_id,而计数函数则是对每个分组进行计算。

在分组中,我们也使用 having可添加过滤条件

mysql> select vend_id,count(*) as title from products group by vend_id having count(*) > 2;     #只返回行数大于2的分组

+---------+-------+

| vend_id | title |

+---------+-------+

|    1001 |     3 |

|    1003 |     7 |

+---------+-------+

2 rows in set (0.01 sec)

需要注意的就是 where和 having的区别,where是对行的过滤,比如我们制定只返回vend_id=1001的数据,我们筛选的行,但是having是对分组的筛选,比如上面的只返回行数大于2的分组。另外我们也可以理解为where是在数据分组前进行过滤,而having是在数据分组后进行的筛选。

限免来看下同时使用where和having的例子;

mysql> select vend_id,prod_price from products;

+---------+------------+

| vend_id | prod_price |

+---------+------------+

+---------+------------+

14 rows in set (0.00 sec)

mysql> select vend_id,prod_price from products where prod_price > 5;

+---------+------------+

| vend_id | prod_price |

+---------+------------+

+---------+------------+

10 rows in set (0.00 sec)

mysql> select vend_id,count(*) from products where prod_price > 5 group by vend_id;

+---------+----------+

| vend_id | count(*) |

+---------+----------+

|    1001 |        3 |

|    1002 |        1 |

|    1003 |        4 |

|    1005 |        2 |

+---------+----------+

4 rows in set (0.00 sec)

mysql> select vend_id,count(*) from products where prod_price > 5 group by vend_id having count(*) > 2;

+---------+----------+

| vend_id | count(*) |

+---------+----------+

|    1001 |        3 |

|    1003 |        4 |

+---------+----------+

2 rows in set (0.00 sec)

另外需要注意的一点是,虽然group by子句返回的数据确实是以分组顺序输出的,但是一般在使用group by子句的时候,也应该给出order by子句,不要依赖group by的排序,容易出错。

mysql> select order_num,quantity,item_price from orderitems;

+-----------+----------+------------+

| order_num | quantity | item_price |

+-----------+----------+------------+

+-----------+----------+------------+

7 rows in set (0.00 sec)

mysql> select order_num,sum(quantity*item_price) from orderitems group by order_num;

+-----------+--------------------------+

| order_num | sum(quantity*item_price) |

+-----------+--------------------------+

+-----------+--------------------------+

4 rows in set (0.00 sec)

mysql> select order_num,sum(quantity*item_price) as ordertotal from orderitems group by order_num having sum(quantity*item_price) > 50 order by ordertotal desc;

+-----------+------------+

| order_num | ordertotal |

+-----------+------------+

+-----------+------------+

3 rows in set (0.01 sec)

子查询其实就是嵌套,一个select语句中嵌套一个select语句。

mysql> select order_num from orderitems where prod_id = 'TNT2';

+-----------+

| order_num |

+-----------+

|     20007 |

+-----------+

1 row in set (0.00 sec)

mysql> select cust_id from orders where order_num = 20007;

+---------+

| cust_id |

+---------+

|   10004 |

+---------+

1 row in set (0.00 sec)

mysql> select cust_name,cust_contact from customers where cust_id = 10004;

+----------------+--------------+

| cust_name      | cust_contact |

+----------------+--------------+

| Yosemite Place | Y Sam        |

+----------------+--------------+

1 row in set (0.00 sec)

如果我们用子查询,也就是嵌套的方式,就是

mysql> select cust_name,cust_contact from customers where cust_id in (select

-> cust_id from orders where order_num in (select

-> order_num from orderitems where prod_id = 'TNT2'));

+----------------+--------------+

| cust_name      | cust_contact |

+----------------+--------------+

| Yosemite Place | Y Sam        |

+----------------+--------------+

1 row in set (0.01 sec)

这一步应该还是比较好理解的,下面我们更进一步,我们想返回vendors表中的供应商名字,id并且还想返回这些供应商分别提供了多少种产品,多少中产品这个我们可以通过products表得到。

mysql> select vend_id,vend_name from vendors;

+---------+----------------+

| vend_id | vend_name      |

+---------+----------------+

|    1001 | Anvils R Us    |

|    1002 | LT Supplies    |

|    1003 | ACME           |

|    1004 | Furball Inc.   |

|    1005 | Jet Set        |

|    1006 | Jouets Et Ours |

+---------+----------------+

6 rows in set (0.00 sec)

mysql> select vend_id,prod_name from products;

+---------+----------------+

| vend_id | prod_name      |

+---------+----------------+

|    1001 | .5 ton anvil   |

|    1001 | 1 ton anvil    |

|    1001 | 2 ton anvil    |

|    1003 | Detonator      |

|    1003 | Bird seed      |

|    1003 | Carrots        |

|    1002 | Fuses          |

|    1005 | JetPack 1000   |

|    1005 | JetPack 2000   |

|    1002 | Oil can        |

|    1003 | Safe           |

|    1003 | Sling          |

|    1003 | TNT (1 stick)  |

|    1003 | TNT (5 sticks) |

+---------+----------------+

14 rows in set (0.00 sec)

这里也需要使用子查询方法:

mysql> select vend_id,vend_name,(select

-> count(*) from products where products.vend_id = vendors.vend_id) as prod_total

-> from vendors order by vend_id;

+---------+----------------+------------+

| vend_id | vend_name      | prod_total |

+---------+----------------+------------+

|    1001 | Anvils R Us    |          3 |

|    1002 | LT Supplies    |          2 |

|    1003 | ACME           |          7 |

|    1004 | Furball Inc.   |          0 |

|    1005 | Jet Set        |          2 |

|    1006 | Jouets Et Ours |          0 |

+---------+----------------+------------+

6 rows in set (0.00 sec)

mysql是一种关系型数据库,这里要说下主键和外键的概念,其实很好理解,主键是一个表中作为标识的一列,每一行数据都会有一个唯一的主键。外键是表中的另一列,它是另一个表的主键,那么两个表中就会通过这个外键相连接。

比如说vendors表中,vend_id就是它的主键,而在products表中,vend_id就是它的外键,这样这两个表就进行了联结。

创建联结很简单,规定要联结的表以及他们如何关联即可

mysql> select vend_name,prod_name,prod_price from vendors,products where vendors.vend_id = products.vend_id;

+-------------+----------------+------------+

| vend_name   | prod_name      | prod_price |

+-------------+----------------+------------+

| Anvils R Us | .5 ton anvil   |       5.99 |

| Anvils R Us | 1 ton anvil    |       9.99 |

| Anvils R Us | 2 ton anvil    |      14.99 |

| LT Supplies | Fuses          |       3.42 |

| LT Supplies | Oil can        |       8.99 |

| ACME        | Detonator      |      13.00 |

| ACME        | Bird seed      |      10.00 |

| ACME        | Carrots        |       2.50 |

| ACME        | Safe           |      50.00 |

| ACME        | Sling          |       4.49 |

| ACME        | TNT (1 stick)  |       2.50 |

| ACME        | TNT (5 sticks) |      10.00 |

| Jet Set     | JetPack 1000   |      35.00 |

| Jet Set     | JetPack 2000   |      55.00 |

+-------------+----------------+------------+

14 rows in set (0.00 sec)

因为vend_id在vendors和products两个表中都有,所有上面的where子句需要限定列名。另外需要注意的是所有的联结都必须要有where或者后面的on子句,不然返回的就是笛卡尔积了。

我们也可以联结多个表

mysql> select cust_name,cust_contact from customers,orders,orderitems where customers.cust_id = orders.cust_id and orders.order_num = orderitems.order_num and orderitems.prod_id = 'TNT2';

+----------------+--------------+

| cust_name      | cust_contact |

+----------------+--------------+

| Yosemite Place | Y Sam        |

+----------------+--------------+

1 row in set (0.00 sec)

联结的化建议使用下面这种写法,一是指定联结类型,二是更加规范

mysql> select vend_name,prod_name,prod_price from vendors inner join products on vendors.vend_id = products.vend_id;

+-------------+----------------+------------+

| vend_name   | prod_name      | prod_price |

+-------------+----------------+------------+

| Anvils R Us | .5 ton anvil   |       5.99 |

| Anvils R Us | 1 ton anvil    |       9.99 |

| Anvils R Us | 2 ton anvil    |      14.99 |

| LT Supplies | Fuses          |       3.42 |

| LT Supplies | Oil can        |       8.99 |

| ACME        | Detonator      |      13.00 |

| ACME        | Bird seed      |      10.00 |

| ACME        | Carrots        |       2.50 |

| ACME        | Safe           |      50.00 |

| ACME        | Sling          |       4.49 |

| ACME        | TNT (1 stick)  |       2.50 |

| ACME        | TNT (5 sticks) |      10.00 |

| Jet Set     | JetPack 1000   |      35.00 |

| Jet Set     | JetPack 2000   |      55.00 |

+-------------+----------------+------------+

14 rows in set (0.00 sec)

就是用inner join指定联结类型,用on关键字过滤条件。和上面的from...where...子句是一样的。

这里的inner join 在mysql中叫内部联结,就是pandas中的pd.merge()函数中的innner,是一样的。后面还会有left join,right join,full join,在mysql中叫外部联结,也是和pd.merge()中的left,right,outer是一样的。

另外还有一种叫自联结,就比如我们想在products表中找到 prod_name='TNT2'的vend_id,然后在products表中找出这个供应商生产的所有产品。

mysql> select a.vend_id,a.prod_name from products as a,products as b where a.vend_id = b.vend_id and b.prod_name = 'safe';

+---------+----------------+

| vend_id | prod_name      |

+---------+----------------+

|    1003 | Detonator      |

|    1003 | Bird seed      |

|    1003 | Carrots        |

|    1003 | Safe           |

|    1003 | Sling          |

|    1003 | TNT (1 stick)  |

|    1003 | TNT (5 sticks) |

+---------+----------------+

7 rows in set (0.00 sec)

外部联结,就是上面说的left join,right join,full join。这里和pandas一样看个例子就知道,由于下面例子中right join,full join,inner join返回的结果是一样的就不演示了。

mysql> select vend_id,vend_name from vendors;

+---------+----------------+

| vend_id | vend_name      |

+---------+----------------+

|    1001 | Anvils R Us    |

|    1002 | LT Supplies    |

|    1003 | ACME           |

|    1004 | Furball Inc.   |

|    1005 | Jet Set        |

|    1006 | Jouets Et Ours |

+---------+----------------+

6 rows in set (0.00 sec)

mysql> select prod_name,vend_id,prod_price from products;

+----------------+---------+------------+

| prod_name      | vend_id | prod_price |

+----------------+---------+------------+

| .5 ton anvil   |    1001 |       5.99 |

| 1 ton anvil    |    1001 |       9.99 |

| 2 ton anvil    |    1001 |      14.99 |

| Detonator      |    1003 |      13.00 |

| Bird seed      |    1003 |      10.00 |

| Carrots        |    1003 |       2.50 |

| Fuses          |    1002 |       3.42 |

| JetPack 1000   |    1005 |      35.00 |

| JetPack 2000   |    1005 |      55.00 |

| Oil can        |    1002 |       8.99 |

| Safe           |    1003 |      50.00 |

| Sling          |    1003 |       4.49 |

| TNT (1 stick)  |    1003 |       2.50 |

| TNT (5 sticks) |    1003 |      10.00 |

+----------------+---------+------------+

14 rows in set (0.00 sec)

mysql> select vendors.vend_id,products.prod_name from vendors inner join products on vendors.vend_id = products.vend_id;

+---------+----------------+

| vend_id | prod_name      |

+---------+----------------+

|    1001 | .5 ton anvil   |

|    1001 | 1 ton anvil    |

|    1001 | 2 ton anvil    |

|    1002 | Fuses          |

|    1002 | Oil can        |

|    1003 | Detonator      |

|    1003 | Bird seed      |

|    1003 | Carrots        |

|    1003 | Safe           |

|    1003 | Sling          |

|    1003 | TNT (1 stick)  |

|    1003 | TNT (5 sticks) |

|    1005 | JetPack 1000   |

|    1005 | JetPack 2000   |

+---------+----------------+

14 rows in set (0.00 sec)

mysql> select vendors.vend_id,products.prod_name from vendors left join products on vendors.vend_id = products.vend_id;

+---------+----------------+

| vend_id | prod_name      |

+---------+----------------+

|    1001 | .5 ton anvil   |

|    1001 | 1 ton anvil    |

|    1001 | 2 ton anvil    |

|    1002 | Fuses          |

|    1002 | Oil can        |

|    1003 | Detonator      |

|    1003 | Bird seed      |

|    1003 | Carrots        |

|    1003 | Safe           |

|    1003 | Sling          |

|    1003 | TNT (1 stick)  |

|    1003 | TNT (5 sticks) |

|    1004 | NULL           |

|    1005 | JetPack 1000   |

|    1005 | JetPack 2000   |

|    1006 | NULL           |

+---------+----------------+

16 rows in set (0.00 sec)

还可以将外部联结和聚集函数count()以及分组数据关键词group by结合在一起使用。