Measuring Efficiency
- Efficiency is a measure of how long our program takes to run.
- Ex: the recursive computation of the Fibonacci sequence
```python
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-2) + fib(n-1)
```
- For tree-recursive algorithms, one way to measure efficiency is to count how many function calls the program makes to compute the final return value.
- Ex: fib(5) requires 15 function calls.
- We can implement a higher-order function `count` to track how many calls fib requires.
```python
def count(f):
    """Return a counting version of f that tracks its number of calls."""
    def counted(n):
        counted.call_count += 1
        return f(n)
    counted.call_count = 0
    return counted

fib = count(fib)
fib(5)
print("Number of calls for fib(5):", fib.call_count)
fib.call_count = 0  # reset the counter before the next experiment
fib(30)
print("Number of calls for fib(30):", fib.call_count)
```
```
Number of calls for fib(5): 15
Number of calls for fib(30): 2692537
```
- From this experiment, we can see that the jump in function calls from fib(5) to fib(30) is dramatic.
- The reason is that we recompute the same values many times over; the sketch below makes the redundancy visible.
- We could speed this up by avoiding these redundant calculations!
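- A minimal sketch of the redundancy (traced_fib is a hypothetical helper, not part of the notes' code), printing each argument as it is computed:

```python
def traced_fib(n):
    # Like fib, but print each argument so the repeated work is visible
    print(f'fib({n})')
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return traced_fib(n-2) + traced_fib(n-1)

traced_fib(5)  # fib(1) is printed 5 times and fib(2) is printed 3 times
```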
Memoization
- Memoization: A technique to speed up the running time of a program by storing/remembering the values that have been computed before.
- We may create another decorator function `memo` to memoize the results of our recursive function and make it more efficient.
```python
def memo(f):
    cache = {}  # Keys of the cache are arguments that map to return values
    def memoized(n):
        if n not in cache:
            cache[n] = f(n)
        return cache[n]
    return memoized  # Same behavior as f if f is a pure function
```
- Our decorator preserves behavior only for pure functions. If f is non-pure, the memoized version may skip calling f for repeated arguments, so its side effects would be lost.
- Only pure functions may be memoized without changing behavior, because memoization changes the total number of times the function body executes (see the sketch below).
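- A minimal sketch of what goes wrong with a non-pure function (noisy_square is a hypothetical example, assuming the memo decorator above):

```python
def noisy_square(n):
    print('computing', n)  # a side effect: this is not a pure function
    return n * n

noisy_square = memo(noisy_square)
noisy_square(4)  # prints "computing 4" and returns 16
noisy_square(4)  # returns 16 from the cache; nothing is printed this time
```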
```python
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-2) + fib(n-1)

fib = count(fib)   # counts the underlying computations
counted_fib = fib  # keep a reference to this inner counter
fib = memo(fib)    # cache previously computed results
fib = count(fib)   # counts calls to the memoized function
```
```python
fib(30)
print(fib.call_count)
print(counted_fib.call_count)
```
```
59
31
```
- The process for the memoized function: for each value of n from 0 to 30, the underlying `fib(n)` is computed exactly once (hence counted_fib.call_count is 31), and each cached value is then accessed once more by a later computation, except for the top values n and n-1 (hence the memoized wrapper's 59 calls).
Exponentiation
- Let's write an exponentiation function in two ways.
- Goal: one additional multiplication should let us double the problem size, so that each extra function call covers twice as many exponents.
- Method 1: an incremental recursive definition \(b^n = \left\{ \begin{array}{cl} 1 & : \ \text{if } n = 0 \\ b \cdot b^{n-1} & : \ \text{otherwise} \end{array} \right.\)
```python
def exp(b, n):
    if n == 0:
        return 1
    else:
        return b * exp(b, n-1)
```
- Method 1 does not achieve our goal for efficiency: every increase in the exponent results in one additional function call.
- Method 2: split the problem in half whenever n is even \(b^{n} = \left\{ \begin{array}{cl} 1 & : \ \text{if } n = 0 \\ (b^{n/2})^{2} & : \ \text{if } n \text{ is even} \\ b \cdot b^{n-1} & : \ \text{if } n \text{ is odd} \end{array} \right.\)
```python
def exp_fast(b, n):
    square = lambda x: x * x
    if n == 0:
        return 1
    elif n % 2 == 0:
        return square(exp_fast(b, n//2))
        # NOT: exp_fast(b, n//2) * exp_fast(b, n//2) -- making two
        # recursive calls per step would make the running time linear again
    else:
        return b * exp_fast(b, n-1)
```
- `exp_fast` is an example of linear recursion, as only one recursive call is made at each step.
- In the case that n is even, `exp_fast` reduces the problem size by half, which provides even greater efficiency (compared in the sketch below).
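- A quick sketch comparing the two, using a variadic variant of the count decorator from above (the original counted only single-argument functions):

```python
def count(f):
    """Count calls to f; this variant accepts any number of arguments."""
    def counted(*args):
        counted.call_count += 1
        return f(*args)
    counted.call_count = 0
    return counted

exp = count(exp)            # recursive calls go through the counter too
exp_fast = count(exp_fast)
exp(2, 100)
exp_fast(2, 100)
print(exp.call_count)       # 101: one call per decrement of the exponent
print(exp_fast.call_count)  # 10: roughly one call per halving
```

Note that this rebinds `exp` and `exp_fast` to counting wrappers, so re-run their original definitions before the timing experiments below.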
Efficiency Analysis
- We may plot the efficiencies of both implementations:
```python
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rc('font', size=16)
from timeit import repeat
from numpy import median

def plot_times(name, xs, n=15):
    """Plot the median running time of name(x), in milliseconds, for each x in xs."""
    f = lambda x: name + '(' + str(x) + ')'
    g = globals()
    samples = []
    for _ in range(n):
        times = lambda x: repeat(f(x), globals=g, number=1, repeat=n)
        samples.append([median(times(x)) for x in xs])
    ys = [1e3 * median(sample) for sample in zip(*samples)]  # seconds -> milliseconds
    plt.figure(figsize=(8, 8))
    plt.plot(xs, ys)
    plt.xlabel('n')
    plt.ylabel('milliseconds')
```
- The graph of the times for `exp`:

```python
import sys
sys.setrecursionlimit(10000)  # exp(2, 1590) recurses deeper than the default limit (~1000)

exp_2 = lambda n: exp(2, n)
plot_times('exp_2', range(20, 1600, 10))
```
- The graph of the times for `exp_fast`:

```python
exp_fast_2 = lambda n: exp_fast(2, n)
plot_times('exp_fast_2', range(20, 1600, 10))
```
- By looking at the graph of runtime for `exp`, we observe that the implementation is linear in n.
- By looking at the graph of runtime for `exp_fast`, we observe that the implementation is logarithmic in n.
- We see that every time we double the input, it only takes a constant amount of extra work.
- This efficiency is due to making recursive calls on an n that is only half the size of the preceding call.
Comparison between linear and logarithmic time
| Linear Time | Logarithmic Time |
|---|---|
| Doubling the input doubles the time | Doubling the input increases the time by a constant C |
| 1024x the input takes 1024x as much time | 1024x the input increases the time by only 10 times C |
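- A quick check of the logarithmic column: if \(T(n) = C \cdot \log_2 n\), then \(T(1024\,n) = C\log_2(1024\,n) = C\log_2 n + C\log_2 1024 = T(n) + 10\,C\), since \(\log_2 1024 = 10\).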
Orders of Growth
- The order of growth of a function's execution time places the function into a general category.
- Functions with the same order of growth scale similarly.
Exponential Time
- Ex: Tree-recursive functions take exponential time
```python
def fib(n):
    """Compute the nth Fibonacci number tree-recursively."""
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
```
- We observe that the running time of `fib` scales exponentially compared to the input size n.
- This is because fib has to do around 60% more work to compute an n that is one larger than the previous term (see the sketch after the plot below).
```python
fibo_ranges = lambda n: fib(n)
plot_times('fibo_ranges', range(1, 30, 1))
```
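- A small sketch behind the "60% more work" figure (calls is a hypothetical helper): the number of calls C(n) made by tree-recursive fib satisfies C(0) = C(1) = 1 and C(n) = C(n-1) + C(n-2) + 1, so the ratio of consecutive counts approaches the golden ratio, about 1.618:

```python
def calls(n):
    """Number of calls made by the tree-recursive fib(n)."""
    if n < 2:
        return 1
    return calls(n-1) + calls(n-2) + 1

print(calls(5))               # 15, matching the experiment above
print(calls(30))              # 2692537, matching the experiment above
print(calls(30) / calls(29))  # ~1.618: about 60% more work per increment
```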
Quadratic Time
- Ex: Functions that process all pairs of values in a sequence of length n take quadratic time
```python
def overlap(a, b):
    """Count the total number of overlaps between one list and another."""
    count = 0
    for i in a:
        for j in b:
            if i == j:
                count += 1
    return count

overlap([3, 5, 7, 6], [4, 5, 6, 5])
```
```
3
```
- We observe that the running time of `overlap` scales quadratically compared to the input size n.
```python
overlap_ranges = lambda n: overlap(list(range(n)), list(range(n)))
plot_times('overlap_ranges', range(20, 200, 10))
```
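- A hedged aside (overlap_fast is not from the original notes): the same counts can be computed in roughly linear time by tallying occurrences with a dictionary instead of comparing all pairs:

```python
from collections import Counter

def overlap_fast(a, b):
    """Count equal pairs in linear time using a dictionary of tallies."""
    counts = Counter(b)               # one pass over b
    return sum(counts[i] for i in a)  # one pass over a; missing keys count as 0

overlap_fast([3, 5, 7, 6], [4, 5, 6, 5])  # 3, the same answer as overlap
```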
Common Orders of Growth
- Exponential growth:
    - Ex: recursive `fib`
    - Incrementing n multiplies time by a constant.
    - Tree-recursive algorithms typically have this characteristic, unless we memoize.
    - Time for input n+1: \(a \cdot b^{n+1} = (a\cdot b^{n})\cdot b\)
- Quadratic growth:
    - Ex: `overlap`
    - Incrementing n increases time by n times a constant.
    - The amount of time added at each increment grows, but not as fast as exponential.
    - Time for input n+1: \(a \cdot (n+1)^2 = (a\cdot n^2) + a \cdot (2n+1)\)
- Linear growth:
    - Ex: slow `exp`
    - Incrementing n increases time by a constant.
    - This is typically what memoization helps us achieve.
    - Time for input n+1: \(a \cdot (n+1) = (a\cdot n) + a\)
- Logarithmic growth:
    - Ex: `exp_fast`
    - Really, really useful if n is large.
    - Doubling n only increments time by a constant.
- Constant growth:
    - Ex: looking up a value in a hashmap/dictionary
    - Increasing n doesn't affect time.
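- A minimal sketch of constant growth, assuming dictionary lookup is the operation being measured:

```python
small = {k: k for k in range(10)}
large = {k: k for k in range(10**6)}

# Both lookups take roughly the same (constant) time,
# regardless of how many keys the dictionary stores.
small[5]
large[999999]
```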