1
2
3
%load_ext sql
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
%sql sqlite:///cs61a.db
Aggregation
- Aggregation functions can be used to perform aggregation over multiple rows.
- Simple
SELECT
statements contain column expressions that actually refer to the value of the column in each row of the table.SELECT [columns] FROM [table] WHERE [expression] ORDER BY [expression]
- In fact, nearly everything we’ve been performing with
SELECT
has been of the value of the column in a given row. - The SELECT expression is evaluated per row.
- An aggregate function in the
[columns]
clause computes a value from a group of rows. - Ex: Getting the max value of a column.
1
2
3
4
5
6
7
8
9
10
%%sql
CREATE TABLE animals AS
SELECT "dog" AS kind, 4 AS legs, 20 AS weight UNION
SELECT "cat", 4, 10 UNION
SELECT "ferret", 4, 10 UNION
SELECT "parrot", 2, 6 UNION
SELECT "penguin", 2, 10 UNION
SELECT "t-rex", 2, 12000;
SELECT MAX(legs) FROM animals;
1
2
3
* sqlite:///cs61a.db
Done.
Done.
MAX(legs) |
---|
4 |
- Ex: Select the sum of the weight of every animal
1
2
%%sql
SELECT SUM(weight) FROM animals;
1
2
* sqlite:///cs61a.db
Done.
SUM(weight) |
---|
12056 |
- Aggregations may also be combined together
- Ex: Max weight and min weight.
1
2
%%sql
SELECT MAX(weight) - MIN(weight) AS difference FROM animals;
1
2
* sqlite:///cs61a.db
Done.
difference |
---|
11994 |
- Aggregations can also be used with
WHERE
clauses - Ex: Exclude t-rex when calculating max weight difference.
1
2
%%sql
SELECT MAX(weight) - MIN(weight) AS difference FROM animals WHERE kind!="t-rex";
1
2
* sqlite:///cs61a.db
Done.
difference |
---|
14 |
- Ex: Average number of legs (
AVG
)
1
2
%%sql
SELECT AVG(legs) AS average_legs FROM animals;
1
2
* sqlite:///cs61a.db
Done.
average_legs |
---|
3.0 |
- Ex: Get number of rows (
COUNT
)- We would use a particular column name in side count if we wanted a distinct count of each value within the column. (‘DISTINCT’)
DISTINCT
may also be applied over other aggregation functions within a column:
1
2
%%sql
SELECT COUNT(*) AS num_of_rows, COUNT(DISTINCT legs) AS num_of_unique_legs, COUNT(DISTINCT weight) AS num_of_unique_weights FROM animals;
1
2
* sqlite:///cs61a.db
Done.
num_of_rows | num_of_unique_legs | num_of_unique_weights |
---|---|---|
6 | 2 | 4 |
1
2
%%sql
SELECT SUM(DISTINCT weight) AS sum_distinct_weight FROM animals;
1
2
* sqlite:///cs61a.db
Done.
sum_distinct_weight |
---|
12036 |
Mixing Aggregate Functions and Single Values
- An aggregate function really selects a row in the table and also aggregate the value provided.
- Thus, whenever we use an aggregate function, we may retrieve a row that provides us with more information about that particular aggregate value.
- This may or may not be useful. values may or may not be meaningful.
1
2
%%sql
SELECT MAX(weight), kind, legs FROM animals;
1
2
* sqlite:///cs61a.db
Done.
MAX(weight) | kind | legs |
---|---|---|
12000 | t-rex | 2 |
1
2
%%sql
SELECT MIN(weight), kind, legs FROM animals;
1
2
* sqlite:///cs61a.db
Done.
MIN(weight) | kind | legs |
---|---|---|
6 | parrot | 2 |
1
2
%%sql
SELECT MIN(kind), weight, legs FROM animals;
1
2
* sqlite:///cs61a.db
Done.
MIN(kind) | weight | legs |
---|---|---|
cat | 10 | 4 |
1
2
%%sql
SELECT MAX(legs), weight, kind FROM animals;
1
2
* sqlite:///cs61a.db
Done.
MAX(legs) | weight | kind |
---|---|---|
4 | 10 | cat |
- This does not give us a meaningful value, as there are many animals with the maximum number of legs.
1
2
%%sql
SELECT AVG(weight), kind FROM animals;
1
2
* sqlite:///cs61a.db
Done.
AVG(weight) | kind |
---|---|
2009.3333333333333 | cat |
- This does not give us a meaningful value, as there are no animals with the average weight.
Groups
- By default, aggregate functions use all rows of the table in one big group to compute the value.
- Thus the result of the aggregate function only has one row.
- Select statements can define multiple groups.
- Rows in a table can be grouped, and aggregation is performed on each group:
1
SELECT [columns] FROM [table] GROUP BY [expression] HAVING [expression];
- The number of groups is the number of unique values of an expression
- Find the max weight for animals with each number of legs.
1
2
%%sql
SELECT legs, MAX(weight) FROM animals GROUP BY legs
1
2
* sqlite:///cs61a.db
Done.
legs | MAX(weight) |
---|---|
2 | 12000 |
4 | 20 |
- Ex: Select the number of legs from each group in animals where we group by legs.
- This retrieves the unique values of legs within the table
1
2
%%sql
SELECT legs FROM animals GROUP BY legs
1
2
* sqlite:///cs61a.db
Done.
legs |
---|
2 |
4 |
- Ex: Compute the number of rows that exists for each group of legs
1
2
%%sql
SELECT legs, COUNT(*) AS num_of_rows FROM animals GROUP BY legs
1
2
* sqlite:///cs61a.db
Done.
legs | num_of_rows |
---|---|
2 | 3 |
4 | 3 |
- We may also group over multiple columns at the same time.
- Ex: All unique combinations of legs and weight
1
2
%%sql
SELECT legs, weight FROM animals GROUP BY legs, weight;
1
2
* sqlite:///cs61a.db
Done.
legs | weight |
---|---|
2 | 6 |
2 | 10 |
2 | 12000 |
4 | 10 |
4 | 20 |
- Ex: Getting the lexicographically superior animal kind for every weignt to legs ratio
1
2
%%sql
SELECT max(kind) AS kind, weight/legs AS ratio FROM animals GROUP BY weight/legs;
1
2
* sqlite:///cs61a.db
Done.
kind | ratio |
---|---|
ferret | 2 |
parrot | 3 |
penguin | 5 |
t-rex | 6000 |
Selecting Groups
- Rows in a table can be grouped, and aggregatoin is performed on each group
SELECT [column] FROM [table] GROUP BY [expression] HAVING [expression];
- A
HAVING
clause would filter the set of groups that are aggregated.- This is different than a where clause, as we may include aggregation within the Having
- Ex: Getting the leg to weight ratio along with the number of animals that satisfy that ratio only if there are more than one such animal that satisfy the ratoi
1
2
%%sql
SELECT weight/legs AS ratio, count(*) AS count FROM animals GROUP BY weight/legs HAVING COUNT(*)>1;
1
2
* sqlite:///cs61a.db
Done.
ratio | count |
---|---|
2 | 2 |
5 | 2 |