1
2
3
%load_ext sql
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
%sql sqlite:///cs61a.db
Aggregation
- Aggregation functions can be used to perform aggregation over multiple rows.
- Simple
SELECTstatements contain column expressions that actually refer to the value of the column in each row of the table.SELECT [columns] FROM [table] WHERE [expression] ORDER BY [expression]- In fact, nearly everything we’ve been performing with
SELECThas been of the value of the column in a given row. - The SELECT expression is evaluated per row.
- An aggregate function in the
[columns]clause computes a value from a group of rows. - Ex: Getting the max value of a column.
1
2
3
4
5
6
7
8
9
10
%%sql
CREATE TABLE animals AS
SELECT "dog" AS kind, 4 AS legs, 20 AS weight UNION
SELECT "cat", 4, 10 UNION
SELECT "ferret", 4, 10 UNION
SELECT "parrot", 2, 6 UNION
SELECT "penguin", 2, 10 UNION
SELECT "t-rex", 2, 12000;
SELECT MAX(legs) FROM animals;
1
2
3
* sqlite:///cs61a.db
Done.
Done.
| MAX(legs) |
|---|
| 4 |
- Ex: Select the sum of the weight of every animal
1
2
%%sql
SELECT SUM(weight) FROM animals;
1
2
* sqlite:///cs61a.db
Done.
| SUM(weight) |
|---|
| 12056 |
- Aggregations may also be combined together
- Ex: Max weight and min weight.
1
2
%%sql
SELECT MAX(weight) - MIN(weight) AS difference FROM animals;
1
2
* sqlite:///cs61a.db
Done.
| difference |
|---|
| 11994 |
- Aggregations can also be used with
WHEREclauses - Ex: Exclude t-rex when calculating max weight difference.
1
2
%%sql
SELECT MAX(weight) - MIN(weight) AS difference FROM animals WHERE kind!="t-rex";
1
2
* sqlite:///cs61a.db
Done.
| difference |
|---|
| 14 |
- Ex: Average number of legs (
AVG)
1
2
%%sql
SELECT AVG(legs) AS average_legs FROM animals;
1
2
* sqlite:///cs61a.db
Done.
| average_legs |
|---|
| 3.0 |
- Ex: Get number of rows (
COUNT)- We would use a particular column name in side count if we wanted a distinct count of each value within the column. (‘DISTINCT’)
DISTINCTmay also be applied over other aggregation functions within a column:
1
2
%%sql
SELECT COUNT(*) AS num_of_rows, COUNT(DISTINCT legs) AS num_of_unique_legs, COUNT(DISTINCT weight) AS num_of_unique_weights FROM animals;
1
2
* sqlite:///cs61a.db
Done.
| num_of_rows | num_of_unique_legs | num_of_unique_weights |
|---|---|---|
| 6 | 2 | 4 |
1
2
%%sql
SELECT SUM(DISTINCT weight) AS sum_distinct_weight FROM animals;
1
2
* sqlite:///cs61a.db
Done.
| sum_distinct_weight |
|---|
| 12036 |
Mixing Aggregate Functions and Single Values
- An aggregate function really selects a row in the table and also aggregate the value provided.
- Thus, whenever we use an aggregate function, we may retrieve a row that provides us with more information about that particular aggregate value.
- This may or may not be useful. values may or may not be meaningful.
1
2
%%sql
SELECT MAX(weight), kind, legs FROM animals;
1
2
* sqlite:///cs61a.db
Done.
| MAX(weight) | kind | legs |
|---|---|---|
| 12000 | t-rex | 2 |
1
2
%%sql
SELECT MIN(weight), kind, legs FROM animals;
1
2
* sqlite:///cs61a.db
Done.
| MIN(weight) | kind | legs |
|---|---|---|
| 6 | parrot | 2 |
1
2
%%sql
SELECT MIN(kind), weight, legs FROM animals;
1
2
* sqlite:///cs61a.db
Done.
| MIN(kind) | weight | legs |
|---|---|---|
| cat | 10 | 4 |
1
2
%%sql
SELECT MAX(legs), weight, kind FROM animals;
1
2
* sqlite:///cs61a.db
Done.
| MAX(legs) | weight | kind |
|---|---|---|
| 4 | 10 | cat |
- This does not give us a meaningful value, as there are many animals with the maximum number of legs.
1
2
%%sql
SELECT AVG(weight), kind FROM animals;
1
2
* sqlite:///cs61a.db
Done.
| AVG(weight) | kind |
|---|---|
| 2009.3333333333333 | cat |
- This does not give us a meaningful value, as there are no animals with the average weight.
Groups
- By default, aggregate functions use all rows of the table in one big group to compute the value.
- Thus the result of the aggregate function only has one row.
- Select statements can define multiple groups.
- Rows in a table can be grouped, and aggregation is performed on each group:
1
SELECT [columns] FROM [table] GROUP BY [expression] HAVING [expression];
- The number of groups is the number of unique values of an expression
- Find the max weight for animals with each number of legs.
1
2
%%sql
SELECT legs, MAX(weight) FROM animals GROUP BY legs
1
2
* sqlite:///cs61a.db
Done.
| legs | MAX(weight) |
|---|---|
| 2 | 12000 |
| 4 | 20 |
- Ex: Select the number of legs from each group in animals where we group by legs.
- This retrieves the unique values of legs within the table
1
2
%%sql
SELECT legs FROM animals GROUP BY legs
1
2
* sqlite:///cs61a.db
Done.
| legs |
|---|
| 2 |
| 4 |
- Ex: Compute the number of rows that exists for each group of legs
1
2
%%sql
SELECT legs, COUNT(*) AS num_of_rows FROM animals GROUP BY legs
1
2
* sqlite:///cs61a.db
Done.
| legs | num_of_rows |
|---|---|
| 2 | 3 |
| 4 | 3 |
- We may also group over multiple columns at the same time.
- Ex: All unique combinations of legs and weight
1
2
%%sql
SELECT legs, weight FROM animals GROUP BY legs, weight;
1
2
* sqlite:///cs61a.db
Done.
| legs | weight |
|---|---|
| 2 | 6 |
| 2 | 10 |
| 2 | 12000 |
| 4 | 10 |
| 4 | 20 |
- Ex: Getting the lexicographically superior animal kind for every weignt to legs ratio
1
2
%%sql
SELECT max(kind) AS kind, weight/legs AS ratio FROM animals GROUP BY weight/legs;
1
2
* sqlite:///cs61a.db
Done.
| kind | ratio |
|---|---|
| ferret | 2 |
| parrot | 3 |
| penguin | 5 |
| t-rex | 6000 |
Selecting Groups
- Rows in a table can be grouped, and aggregatoin is performed on each group
SELECT [column] FROM [table] GROUP BY [expression] HAVING [expression];
- A
HAVINGclause would filter the set of groups that are aggregated.- This is different than a where clause, as we may include aggregation within the Having
- Ex: Getting the leg to weight ratio along with the number of animals that satisfy that ratio only if there are more than one such animal that satisfy the ratoi
1
2
%%sql
SELECT weight/legs AS ratio, count(*) AS count FROM animals GROUP BY weight/legs HAVING COUNT(*)>1;
1
2
* sqlite:///cs61a.db
Done.
| ratio | count |
|---|---|
| 2 | 2 |
| 5 | 2 |