Algorithms to Live By, Brian Christian & Tom Griffiths



Computers are smart sometimes, but not smart other times. People are smart sometimes, but not smart other times. How can you get the best of both worlds?

Notes

Introduction

The broad definition of an algorithm: a finite sequence of steps used to solve a problem. Baking a cake is following an algorithm.

Some people reject the idea that we should let computer science dictate our decision-making, since it doesn’t feel very human. But if algorithms have been part of life for all of human history, why should it be so bad?

Computer science lets you derive provably optimal solutions to certain problems.

Optimal Stopping

Optimal stopping is about deciding on the right time to take a given action, balancing what you’ve already learned against the chance of better opportunities in the future.

It’s “committing to or forgoing a succession of options”.

  • Finding a secretary: How many applicants should you interview before committing to the next one who is better than everyone you’ve seen so far?
  • Buying a house: How long should you spend looking at houses before buying the one that beats all previous competition?
  • Selling a house: What offer should you accept?
  • Finding a partner: How many partners should you reject before accepting one?
  • Car parking: When should you stop looking for a parking place and just go for one?
  • Quitting: When should you “quit while you’re ahead”?

Solutions

  • Look-Then-Leap: Pass over the first 37% of the pool without committing, then leap at the next option that beats everything seen so far (a simulation follows this list). The 37% figure is 1/e.
  • Threshold: Always accept an applicant whose performance is above a certain percentile. This percentile can be lowered as you work through the pool and options run out.
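A minimal simulation of the Look-Then-Leap rule in Python (the pool size of 100 and the trial count are my illustrative choices, not from the book):

```python
import random

def look_then_leap(pool, look_fraction=0.37):
    """Pass over the first 37% of the pool, then take the first
    option that beats everything seen during the look phase."""
    cutoff = int(len(pool) * look_fraction)
    best_seen = max(pool[:cutoff], default=float("-inf"))
    for option in pool[cutoff:]:
        if option > best_seen:
            return option
    return pool[-1]  # ran out of options: forced to take the last one

# Estimate how often the rule lands the single best option (rank 99).
trials = 10_000
wins = sum(
    look_then_leap(random.sample(range(100), 100)) == 99
    for _ in range(trials)
)
print(f"Picked the best option in {wins / trials:.0%} of trials")  # ~37%
```

The rule succeeds about 37% of the time, which is also the best success rate any strategy can guarantee for this version of the problem.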

Real-Life

A lot of the solutions to these problems come from abstract descriptions and rely on several assumptions. In studies, people often stop surveying earlier than the mathematically optimal point. This could be because time is the real scarce resource: every moment spent deliberating carries a cost that the abstract problem ignores.

Exploit/Explore

The Exploit/Explore tradeoff is about deciding whether we should try new things or stick with our favourite ones.

Exploration is gathering information, and exploitation is using the information that you already have to get a good result.

  • Eating out: Should you go to the Italian restaurant that you know and love, or the new place that just opened up?
  • Listening to music: Should you put on an old favourite or listen to something new?
  • Selecting slot machines: Should you try to gather information about a new slot machine or play one whose payoff you know?
  • Selecting prices: Should you explore changing the price of a product or just stick with the one that’s given the most profit so far?

The value of exploitation can only increase, since the best option you have right now is at least as good as the best option you had a month ago.

The value of exploration can only decrease, since the window in which you could benefit from a new discovery shrinks over time.
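One standard way to act on this (my illustration, not a strategy the book presents at this point) is an epsilon-greedy rule whose exploration rate decays over time: explore a lot early, exploit more and more as the interval runs out.

```python
import random

def decaying_epsilon_greedy(payout_probs, pulls=1000):
    """Pick a random machine with probability epsilon, otherwise play
    the machine with the best observed win rate. Epsilon decays as
    1/t, so early pulls mostly explore and late pulls mostly exploit."""
    n = len(payout_probs)
    wins = [0] * n
    plays = [0] * n
    total = 0
    for t in range(1, pulls + 1):
        if random.random() < 1 / t or 0 in plays:
            arm = random.randrange(n)  # explore
        else:
            arm = max(range(n), key=lambda i: wins[i] / plays[i])  # exploit
        reward = random.random() < payout_probs[arm]
        wins[arm] += reward
        plays[arm] += 1
        total += reward
    return total

# Three machines with hidden payout probabilities; the best is 0.7.
print(decaying_epsilon_greedy([0.2, 0.5, 0.7]), "wins out of 1000 pulls")
```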

Sergeant?

This almost perfectly describes the ideal “difficulties” view of [[Sergeant, applying spaced application to A-levels]] – an exploit/explore tradeoff.

  • Exploit: Using the already gathered information to choose the topic that the computer thinks will be the most difficult.
  • Explore: Trying a topic which hasn’t been quizzed on much yet.

What made this apparent is the idealised “multi-armed bandit” problem, which is about maximising payoff by trying out different slot machines and favouring the ones that promise the largest reward. It’s a little bit backwards for Sergeant; the questions we’re aiming for don’t have the highest reward, but rather the largest chance of failure – unless you find learning by failure an intrinsically rewarding experience.
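A hypothetical sketch of that inversion (the function and data shapes here are mine, not Sergeant’s actual code): instead of the arm with the highest expected payoff, pick the topic with the highest estimated chance of failure, defaulting to unquizzed topics as pure exploration.

```python
def pick_topic(stats):
    """stats maps topic -> (times_failed, times_quizzed).
    Hypothetical names and shapes, for illustration only."""
    def failure_rate(topic):
        failed, quizzed = stats[topic]
        return 1.0 if quizzed == 0 else failed / quizzed  # unquizzed first
    return max(stats, key=failure_rate)

print(pick_topic({"mechanics": (3, 10), "calculus": (7, 10), "waves": (0, 0)}))
# -> waves (never quizzed); once all topics have data, calculus (70% failure)
```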

Potential Algorithms

  • Win-Stay, Lose-Shift: Stick with one slot machine while it’s doing well, and switch as soon as you start losing on it. It seems like a good choice, but it has no concept of the interval we’re optimising over (a sketch follows this list).
  • Gittins index: Used when there’s an endless amount of time ahead but a reward now is worth more than the same reward later, e.g. 10% more now than in the next round. An index combining each machine’s wins and losses is computed separately for each machine, and the machine with the highest index is played.
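A minimal sketch of Win-Stay, Lose-Shift on simulated machines (the payout probabilities are made up):

```python
import random

def win_stay_lose_shift(payout_probs, pulls=1000):
    """Stay on the current machine after every win; hop to a random
    other machine after every loss. Note there is no time horizon."""
    n = len(payout_probs)
    arm = random.randrange(n)
    total = 0
    for _ in range(pulls):
        won = random.random() < payout_probs[arm]
        total += won
        if not won:
            arm = random.choice([i for i in range(n) if i != arm])  # shift
    return total

print(win_stay_lose_shift([0.2, 0.5, 0.7]), "wins out of 1000 pulls")
```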


