Tutorial 1
Contents
Tutorial 1#
Probability & Statistics I: Intro to Probability
[insert your name]
Important reminders: Before starting, click “File -> Save a copy in Drive”. Produce a pdf for submission by “File -> Print” and then choose “Save to PDF”.
Imports
# @markdown Imports
# Imports
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets # interactive display
import math
from scipy.stats import binom
Plotting functions
# @markdown Plotting functions
import numpy
from numpy.linalg import inv, eig
from math import ceil
from matplotlib import pyplot, ticker, get_backend, rc
from mpl_toolkits.mplot3d import Axes3D
from itertools import cycle
%config InlineBackend.figure_format = 'retina'
plt.style.use("https://raw.githubusercontent.com/NeuromatchAcademy/course-content/master/nma.mplstyle")
Exercise 1: Probability Type Review#
Let’s say we’re interested in two discrete random variables X and Y. X can take on one of 3 values (1, 2, or 3) while Y can take on one of two (1 or 2). The joint probability distribution is summarized in the table below:
X = 1 |
X = 2 |
X = 3 |
|
---|---|---|---|
Y = 1 |
0.2 |
0.1 |
0.15 |
Y = 2 |
0.1 |
0.05 |
0.4 |
A) P(Y)#
Compute the marginal distribution of Y (so p(Y = 1) and p(Y = 2))
Answer#
Fill in your answer here
B) P(X)#
Compute the marginal distribution of X
Answer#
Fill in your answer here
C) P(Y | X = 2)#
Compute the conditional probability of Y given that X = 2.
Answer#
Fill in your answer here
D) Independence#
Are Y and X independent or not?
Answer#
Fill in your answer here
Exercise 2: Two armed bandit task#
You are a mouse. A giant plonks you down in a weird apparatus each morning when you’d rather have been sleeping in. You consider biting the giant in retaliation but decide against it. Eventually, because you’re a smart mouse, you realize that you can poke a central port with your nose. After this, if you go the left or right, there are ports that sometimes give you sugar water when you lick (YUM). It seems like one of the sides (left or right) tends to give you the reward of sugar water more often. Annoyingly, the side that gives rewards more often seems to change every day!
You decide that each day you will visit each side (left and right) two times (a total of 4 trials). You will then pick the side that gave you the most rewards and keep visiting that one. If they give you equal numbers of rewards, you choose randomly.
What is the probability you are wrong (pick the righthand port) for a given day if, unbeknownst to you, the probability of the left port giving water is 0.8 and the right port is 0.3?
(Optional) A) Simulation#
Let’s first simulate this situation to get an answer (we’ll compute it analytically next). We can get a feel for what’s happening and check the computed answer in part B!
We want to code this scenario where the rewards on each of the 4 trials are randomly chosen given the probabilities for each side. Out of 100000 simulations of the 4 trials, compute how many times the mouse picks the right port vs the left port.
The relevant probability distribution is the binomial distribution (https://en.wikipedia.org/wiki/Binomial_distribution). Relevant code might be:
binom (from scipy.stats.binom): binom(n, p) is the binomial distribution with n visits and probability of reward p. You can use binom(n, p).pmf(k) to get the probability mass function or binom(n, p).cdf(k) to get the cumulative distributive function. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html
np.random.binomial(n, p) draws samples from the binomial distribution with n trials and probability p. https://numpy.org/doc/stable/reference/random/generated/numpy.random.binomial.html
left_prob = 0.8
right_prob = 0.3
n_visits_per_side = 2
# your simulation code here
B) Analytical Computation#
Now we will actually analytically compute the probability that you are wrong. There are hints you can uncover as needed but try to work through things as much as possible before revealing each hint.
You have to put every thing together by hand but you can use binom(2, 0.8).pmf(k)
to compute bimomial probabilities. This example gives the probability of getting k rewards from the left side with 2 trials (where k is either 0, 1, or 2)
binom(2, 0.8).pmf(0)
0.03999999999999998
Answer#
Fill in the probability of being wrong here so it is green. (show your work above or below though!)
Hint 1
# @markdown Hint 1
"""
Think through each event where you would be wrong
(what the number of rewards on the left side vs the
right side would be).
Brute force this - write out every single possibility.
"""
'\nThink through each event where you would be wrong \n(what the number of rewards on the left side vs the \nright side would be). \n\nBrute force this - write out every single possibility.\n'
Hint 2
# @markdown Hint 2
"""
How would you compute the total probability that you're
wrong based on the probability of each of these events
occuring? You may want to treat ties separately
for ease. How would you compute the probability
of each event?
"""
"\nHow would you compute the total probability that you're \nwrong based on the probability of each of these events\noccuring? You may want to treat ties separately\nfor ease. How would you compute the probability\nof each event? \n"
Hint 3
# @markdown Hint 3
"""
P(wrong) = P(fewer rewards on left side) + 0.5P(tie)
"""
'\nP(wrong) = P(fewer rewards on left side) + 0.5P(tie)\n'
Hint 4
# @markdown Hint 4
"""
The probability of either event A or event B occuring
is P(A or B) = P(A) + P(B) if they are mutually exclusive
(they can't both happen at once). This should help
answer the first question in hint 2
"""
"\nThe probability of either event A or event B occuring\nis P(A or B) = P(A) + P(B) if they are mutually exclusive \n(they can't both happen at once). This should help\nanswer the first question in hint 2\n"
Hint 5
# @markdown Hint 5
"""
The probability of event A and B occuring if they are
independent is P(A and B) = P(A)*P(B). This should help
answer the second question in hint 2
"""
'\nThe probability of event A and B occuring if they are\nindependent is P(A and B) = P(A)*P(B). This should help \nanswer the second question in hint 2\n'
Optional Advanced Extension#
Can you write a succinct function for computing the probability that you are wrong given any number of visits to each side and any probabilities of reward on either side (assuming the left prob is higher)? So the inputs would be n_visits, left_prob, and right_prob.
Use it it compute the probability of being wrong with n_visits = 4, left_prob = 0.7, and right_prob = 0.2
# your code here
Exercise 3: Poisson model of spiking#
The probability distributions we use the most in neuroscience are Gaussian and Poisson distributions. Poisson distributions are often used to describe neural spiking. The output of the Poisson distribution is the number of spikes for that small time window.
Let’s say we have a neuron that’s responding to head direction. The head direction (\(\theta\)) ranges between 0 and 359 degrees. The underlying “firing rate” of the neuron to a particular head direction is \(\frac{1}{20} \theta\). Let’s say this is the response in a 10 ms bin during which the mouse is still and angled at that head direction \(\theta\). The number of spikes during that time window is modeled by a Poisson distribution where the input \(\lambda\) is that underlying firing rate. Specifically,
$\( P(S = k | H = \theta) = \frac{\lambda ^k e^{-\lambda}}{k!} \)\( where \)\lambda = \frac{1}{20}\theta\(, S stands for number of spikes, H stands for head direction, and \)P(S=k | H=\theta)\( is the probability that the number of spikes is k given that the head direction is \)\theta$.
Note that we have build an encoding model! Encoding models model neural responses to some stimulus or behavior - our \(P(S = k | H = \theta)\) is an encoding model.
A) Sampling spike counts#
In the code below, for \(\theta\) between 0 and 359, compute \(\lambda\). Sample the spike count for 10000 trials (10000 occurences of that head direction) using np.random.poisson
.
thetas = np.arange(0, 360, 1)
lam = ... # should be an array the same size as thetas with the lambdas
n_trials = 10000
# Get spike counts, an array of size len(theta) x n_trials
# YOUR CODE HERE
...
Ellipsis
B) Lambda vs mean spike count#
For each \(\theta\), compute the mean of the spike count over trials and plot vs \(\lambda\).
mean_spike_count = ...
fig, ax = plt.subplots(1, 1)
ax.plot(lam, mean_spike_count, 'ok')
ax.set(xlabel='Lambda', ylabel='Mean spike count');
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[12], line 4
1 mean_spike_count = ...
3 fig, ax = plt.subplots(1, 1)
----> 4 ax.plot(lam, mean_spike_count, 'ok')
5 ax.set(xlabel='Lambda', ylabel='Mean spike count');
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/axes/_axes.py:1664, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
1662 lines = [*self._get_lines(*args, data=data, **kwargs)]
1663 for line in lines:
-> 1664 self.add_line(line)
1665 if scalex:
1666 self._request_autoscale_view("x")
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/axes/_base.py:2340, in _AxesBase.add_line(self, line)
2337 if line.get_clip_path() is None:
2338 line.set_clip_path(self.patch)
-> 2340 self._update_line_limits(line)
2341 if not line.get_label():
2342 line.set_label(f'_child{len(self._children)}')
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/axes/_base.py:2363, in _AxesBase._update_line_limits(self, line)
2359 def _update_line_limits(self, line):
2360 """
2361 Figures out the data limit of the given line, updating self.dataLim.
2362 """
-> 2363 path = line.get_path()
2364 if path.vertices.size == 0:
2365 return
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/lines.py:1031, in Line2D.get_path(self)
1029 """Return the `~matplotlib.path.Path` associated with this line."""
1030 if self._invalidy or self._invalidx:
-> 1031 self.recache()
1032 return self._path
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/lines.py:659, in Line2D.recache(self, always)
657 if always or self._invalidx:
658 xconv = self.convert_xunits(self._xorig)
--> 659 x = _to_unmasked_float_array(xconv).ravel()
660 else:
661 x = self._x
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/cbook/__init__.py:1369, in _to_unmasked_float_array(x)
1367 return np.ma.asarray(x, float).filled(np.nan)
1368 else:
-> 1369 return np.asarray(x, float)
TypeError: float() argument must be a string or a number, not 'ellipsis'
What is the relationship between lambda and the mean spike count? Aka between lambda and the mean of the Poisson distribution
Answer#
Fill in your text answer here (if in latex) so it is green. Use
for line breaks
C) Mean vs variance of spike count#
Now, for each theta, compute the variance of the spike counts over trials. Plot the mean spike count vs the variance of the spike counts.
variance_spike_count = ...
fig, ax = plt.subplots(1, 1)
ax.plot(mean_spike_count, variance_spike_count, 'ok')
ax.set(xlabel='Mean spike count', ylabel='Variance of spike counts');
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[13], line 4
1 variance_spike_count = ...
3 fig, ax = plt.subplots(1, 1)
----> 4 ax.plot(mean_spike_count, variance_spike_count, 'ok')
5 ax.set(xlabel='Mean spike count', ylabel='Variance of spike counts');
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/axes/_axes.py:1664, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
1662 lines = [*self._get_lines(*args, data=data, **kwargs)]
1663 for line in lines:
-> 1664 self.add_line(line)
1665 if scalex:
1666 self._request_autoscale_view("x")
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/axes/_base.py:2340, in _AxesBase.add_line(self, line)
2337 if line.get_clip_path() is None:
2338 line.set_clip_path(self.patch)
-> 2340 self._update_line_limits(line)
2341 if not line.get_label():
2342 line.set_label(f'_child{len(self._children)}')
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/axes/_base.py:2363, in _AxesBase._update_line_limits(self, line)
2359 def _update_line_limits(self, line):
2360 """
2361 Figures out the data limit of the given line, updating self.dataLim.
2362 """
-> 2363 path = line.get_path()
2364 if path.vertices.size == 0:
2365 return
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/lines.py:1031, in Line2D.get_path(self)
1029 """Return the `~matplotlib.path.Path` associated with this line."""
1030 if self._invalidy or self._invalidx:
-> 1031 self.recache()
1032 return self._path
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/lines.py:659, in Line2D.recache(self, always)
657 if always or self._invalidx:
658 xconv = self.convert_xunits(self._xorig)
--> 659 x = _to_unmasked_float_array(xconv).ravel()
660 else:
661 x = self._x
File /opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/matplotlib/cbook/__init__.py:1369, in _to_unmasked_float_array(x)
1367 return np.ma.asarray(x, float).filled(np.nan)
1368 else:
-> 1369 return np.asarray(x, float)
TypeError: float() argument must be a string or a number, not 'ellipsis'
What is the relationship between the mean spike count and the variance of spike counts? Aka between the mean and variance of a Poisson distribution
Answer#
Fill in your text answer here (if in latex) so it is green. Use
for line breaks
D) Why Poisson?#
What are some features of the Poisson distribution that lend to modeling spiking well? What are some potential problems?
Answer#
Fill in your text answer here
Exercise 4: Bayes’ Theorem with COVID Tests#
We will assume that the COVID test has a true positive rate of 75%, meaning that the probability of a positive result if you are sick is 75%. It has a true negative rate of 97%, meaning that the probability of a negative result if you are healthy is 97%.
The probability of you being sick in general is equal to the prevalence of COVID in your community: 5% of people are sick so the probability of being sick is 5% and the probability of being healthy is 95%.
Let’s say you get a positive result from your COVID test. What is the probability you are sick given the positive test result? Hint: use Baye’s theorem
Answer#
Fill in the answer here so it is green. (show your work above or below though!)
(Optional Advanced Problem) Exercise 5: Birthday Paradox#
Compute the probability that at least two people share a birthday out of a class of 16.
Answer#
Fill in the answer here so it is green. (show your work above or below though!)