Home CS61A: Aggregation
Post
Cancel

CS61A: Aggregation

1
2
3
%load_ext sql
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
%sql sqlite:///cs61a.db

Aggregation

  • Aggregation functions can be used to perform aggregation over multiple rows.
  • Simple SELECT statements contain column expressions that actually refer to the value of the column in each row of the table.
    • SELECT [columns] FROM [table] WHERE [expression] ORDER BY [expression]
    • In fact, nearly everything we’ve been performing with SELECT has been of the value of the column in a given row.
    • The SELECT expression is evaluated per row.
  • An aggregate function in the [columns] clause computes a value from a group of rows.
  • Ex: Getting the max value of a column.
1
2
3
4
5
6
7
8
9
10
%%sql
CREATE TABLE animals AS
    SELECT "dog" AS kind, 4 AS legs, 20 AS weight UNION
    SELECT "cat", 4, 10 UNION
    SELECT "ferret", 4, 10 UNION
    SELECT "parrot", 2, 6 UNION
    SELECT "penguin", 2, 10 UNION
    SELECT "t-rex", 2, 12000;

SELECT MAX(legs) FROM animals;
1
2
3
 * sqlite:///cs61a.db
Done.
Done.
MAX(legs)
4
  • Ex: Select the sum of the weight of every animal
1
2
%%sql
SELECT SUM(weight) FROM animals;
1
2
 * sqlite:///cs61a.db
Done.
SUM(weight)
12056
  • Aggregations may also be combined together
  • Ex: Max weight and min weight.
1
2
%%sql
SELECT MAX(weight) - MIN(weight) AS difference FROM animals;
1
2
 * sqlite:///cs61a.db
Done.
difference
11994
  • Aggregations can also be used with WHERE clauses
  • Ex: Exclude t-rex when calculating max weight difference.
1
2
%%sql
SELECT MAX(weight) - MIN(weight) AS difference FROM animals WHERE kind!="t-rex";
1
2
 * sqlite:///cs61a.db
Done.
difference
14
  • Ex: Average number of legs (AVG)
1
2
%%sql
SELECT AVG(legs) AS average_legs FROM animals;
1
2
 * sqlite:///cs61a.db
Done.
average_legs
3.0
  • Ex: Get number of rows (COUNT)
    • We would use a particular column name in side count if we wanted a distinct count of each value within the column. (‘DISTINCT’)
    • DISTINCT may also be applied over other aggregation functions within a column:
1
2
%%sql
SELECT COUNT(*) AS num_of_rows, COUNT(DISTINCT legs) AS num_of_unique_legs, COUNT(DISTINCT weight) AS num_of_unique_weights FROM animals;
1
2
 * sqlite:///cs61a.db
Done.
num_of_rowsnum_of_unique_legsnum_of_unique_weights
624
1
2
%%sql
SELECT SUM(DISTINCT weight) AS sum_distinct_weight FROM animals;
1
2
 * sqlite:///cs61a.db
Done.
sum_distinct_weight
12036

Mixing Aggregate Functions and Single Values

  • An aggregate function really selects a row in the table and also aggregate the value provided.
  • Thus, whenever we use an aggregate function, we may retrieve a row that provides us with more information about that particular aggregate value.
    • This may or may not be useful. values may or may not be meaningful.
1
2
%%sql
SELECT MAX(weight), kind, legs FROM animals;
1
2
 * sqlite:///cs61a.db
Done.
MAX(weight)kindlegs
12000t-rex2
1
2
%%sql
SELECT MIN(weight), kind, legs FROM animals;
1
2
 * sqlite:///cs61a.db
Done.
MIN(weight)kindlegs
6parrot2
1
2
%%sql
SELECT MIN(kind), weight, legs FROM animals;
1
2
 * sqlite:///cs61a.db
Done.
MIN(kind)weightlegs
cat104
1
2
%%sql
SELECT MAX(legs), weight, kind FROM animals;
1
2
 * sqlite:///cs61a.db
Done.
MAX(legs)weightkind
410cat
  • This does not give us a meaningful value, as there are many animals with the maximum number of legs.
1
2
%%sql
SELECT AVG(weight), kind FROM animals;
1
2
 * sqlite:///cs61a.db
Done.
AVG(weight)kind
2009.3333333333333cat
  • This does not give us a meaningful value, as there are no animals with the average weight.

Groups

  • By default, aggregate functions use all rows of the table in one big group to compute the value.
    • Thus the result of the aggregate function only has one row.
  • Select statements can define multiple groups.
    • Rows in a table can be grouped, and aggregation is performed on each group:
1
SELECT [columns] FROM [table] GROUP BY [expression] HAVING [expression];
  • The number of groups is the number of unique values of an expression
    • Find the max weight for animals with each number of legs.
1
2
%%sql
SELECT legs, MAX(weight) FROM animals GROUP BY legs
1
2
 * sqlite:///cs61a.db
Done.
legsMAX(weight)
212000
420
  • Ex: Select the number of legs from each group in animals where we group by legs.
    • This retrieves the unique values of legs within the table
1
2
%%sql
SELECT legs FROM animals GROUP BY legs
1
2
 * sqlite:///cs61a.db
Done.
legs
2
4
  • Ex: Compute the number of rows that exists for each group of legs
1
2
%%sql
SELECT legs, COUNT(*) AS num_of_rows FROM animals GROUP BY legs
1
2
 * sqlite:///cs61a.db
Done.
legsnum_of_rows
23
43
  • We may also group over multiple columns at the same time.
    • Ex: All unique combinations of legs and weight
1
2
%%sql
SELECT legs, weight FROM animals GROUP BY legs, weight;
1
2
 * sqlite:///cs61a.db
Done.
legsweight
26
210
212000
410
420
  • Ex: Getting the lexicographically superior animal kind for every weignt to legs ratio
1
2
%%sql
SELECT max(kind) AS kind, weight/legs AS ratio  FROM animals GROUP BY weight/legs;
1
2
 * sqlite:///cs61a.db
Done.
kindratio
ferret2
parrot3
penguin5
t-rex6000

Selecting Groups

  • Rows in a table can be grouped, and aggregatoin is performed on each group
    • SELECT [column] FROM [table] GROUP BY [expression] HAVING [expression];
  • A HAVING clause would filter the set of groups that are aggregated.
    • This is different than a where clause, as we may include aggregation within the Having
  • Ex: Getting the leg to weight ratio along with the number of animals that satisfy that ratio only if there are more than one such animal that satisfy the ratoi
1
2
%%sql
SELECT weight/legs AS ratio, count(*) AS count FROM animals GROUP BY weight/legs HAVING COUNT(*)>1;
1
2
 * sqlite:///cs61a.db
Done.
ratiocount
22
52
This post is licensed under CC BY 4.0 by the author.

CS61A: Tables

CS61A: Lecture 33