-
The absurdity of diverse democracies... via math!
Politics via math? When I learn a new thing, I like to see what sorts of unexpected things I can model with it. I’m dabbling in some less intuitive properties of high dimensional spaces right now and the concentration of measure is a pretty core part of that. This post is about a particularly amusing model with some interesting results I came up with earlier today. Concentration of Measure If you take a unit ball (the surface of a sphere) in N dimensions, nearly all the points lie on the equator if N is sufficiently large. Note that the equator is relative because I haven’t told you which side is north! That’s the concentration of measure. Now let’s go to our model. Model Let’s represent people’s political beliefs as vectors in some high dimensional space. It’ll look like [pro-life...
-
Another annotated transformer
If you’re anything like me, while trying to implement transformers, you’ve read the original attention is all you need paper, the annotated transformer, the updated version, d2l.ai, and had to cobble them all together to get something going. This post is an attempt to make that proccess easier for people like me in a short and to-the-point style. You can think of this as a bare-bones implementation with a whole lot of documentation. Notes: this post is about how transformers are implemented, not why they’re implemented the way they are. This post assumes the reader understands what is meant by training a model, parameters and hyperparameters, dense layers, and so on. We’ll be using jax and flax for this implementation. Overview The following is the transformer architecture di...
-
Intro To Algebraic Geometry, 1.2
Intro In the last post we covered the definition of places and valuations (sometimes called valuation rings). We covered how there is a one to one correspondence between valuations and places (up to equivalences). That was the entirety of chapter 1 section 1. This post will be about chapter 1 section 2. Valuations Group order Let $\Gamma$ be a multiplicative commutative group. If there exists a multiplicatively closed subset $S\subset \Gamma$ that does not contain the group’s $1$ element and has the property that for each $x$ in $\Gamma$ either $x\in S$ or $x^{-1}\in S$1, we can define a strict total order on $\Gamma$ as: \[a \lt b \iff ab^{-1}\in S\] Just to refresh everyone, a total order is a relation with 4 properties, The “total” part (fully connected): for all $a, b \in ...
-
Intro to Algebraic Geometry, 1.1
Intro I haven’t really studied any Math seriously since I graduated with my undergrad in 2015. It seems I kinda miss it, so I’ll be going over Introduction to Algebraic Geometry by Serge Lang today. It starts out with the theory of Places. The idea being that homomorphisms between fields are necessarily trivial. Before we prove that, we define a field homomorphism as a function $\varphi:F\to K$ such that: $\forall a,b \in F, \varphi(a + b) = \varphi(a) + \varphi(b)$ $\forall a,b \in F, \varphi(a \cdot b) = \varphi(a) \cdot \varphi(b)$ $\varphi(0) = 0$, and $\varphi(1) = 1$ Axiom 3 only guarantees that the homomorphism doesn’t send every element to 0. Field homomorphisms are injective Computational proof It’s been a long time since I’ve done any math, so let’s warm up a bit w...
-
A mile south, east, and north
The scenario You’re standing on the surface of the earth (as opposed to whatever your usual hangouts are), and walk a mile south, a mile east, and then a mile north and end up where you started. You see a bear. What color is the bear? You’ve heard this one before, but just in case, the answer is: White. You are at the north pole and it was a polar bear. But that’s not the only point that it could be. The riddle is, can you describe the full set of solutions? Solution Pick any line of longitude where the length of the line is $1/n$ miles long, then go a mile north of that and you have a solution. In fact, this set is dense around the ring with circumference of 1 mile.
-
The sequential shooters
There are N people indexed by $1,..,N$ (because down with the zero-indexers! Long live Peano!) standing in a circle. They’re standing such that $i$ is between $i-1$ and $i+1$ (unless $i \in{1,N}$, for obvious reasons). Then starting with $1$ and going around the circle in turn, each person shoots the person directly to their left. This stops when there’s only one person left alive. For example: if $N=5$, it would go: 1 shoots 2 3 shoots 4 5 shoots 1 3 shoots 5 and only 3 remains. The task Given $N$, what is the index (1-indexed, unfortunately) of the last shooter standing? For example: if $N=5$, the answer is $3$. Solution: The Theoretical Solution Every number $N$ can be written uniquely as $2^i + \lambda$, where $i,\lambda\in\mathbb{N}_{\geq 0}$ and $\lambda \lt...
-
The Sequential Doors and The Data Science Analogue
This post is going to be something of a combination riddle and blog post. The setup There are $N$ doors indexed by $1,..,N$ along a wall and they’re all closed. Then for each $i\in{1,2,\dots,N}$, you toggle every $i$th door. So at first ($i = 1$) you toggle every door. Then $i=2$ you toggle every second door – the even doors ($2, 4, 6, \dots,$), and so on. For example, if $N=4$, then the following is the sequence of events: Open every door. Close doors 2 and 4 Close door 3 Open door 4 It results in 1 and 4 are open, while 2 and 3 are closed. The task Given $N$, describe the “door is closed” function. By that I mean, which doors are open and which are closed at the end? Solution Only the perfect squares are open at the end. So $1,4,9,16,\dots$ are open, and $2,3,5,...
-
Quantile Regression Yields Quantiles
Background If you’ve ever read any papers referring to quantile regression (QR), you undoubtedly have read “MSE yields means, MAE yields medians, and QR yields quantiles”, but what does that even mean? Like not just hand-wavy nonsense, but really, what does that mean? After all, these are just loss functions and therefore don’t yield anything at all. Even the regressions yields coefficients, or, more generally, models; so what’s with the mean and median talk? It turns out they’re talking about the distribution of residuals1. In this post, we’ll show what they mean, how that’s the case, and we’ll investigate a bit more about QR. Quantile Regression Definition Given a scaler $q\in[0,1]$, let $l_q:\mathbb{R}\to\mathbb{R}_+$ be: \[\begin{align*} l_q(r) =& r\cdot\left(q - \mathbb{I...
-
Gradient Boosting Seen as Gradient Descent
tldr The short version of this whole thing is that we can see the gradient boosting algorithm as just gradient descent on black-box models. Typically gradient descent involves parameters and a closed form solution to the gradient of your learner as a way to update those parameters. In the case of trees, for instance, there are no parameters you want to update1, and somewhat more importantly they aren’t even continuous predictors, much less differentiable! But gradient boosting can be seen as gradient descent on black box models resulting in an additive black box model. GBM The naive algorithm A description of the algorithm I’m talking about can be found on wikipedia, but I’ll go over the algorithm somewhat quickly here just so we’re on the same page. The basic idea is that Gradien...
-
A little geometric walk down risk model lane
Background Earlier today I asked a colleague what the formal definition of a Unit Factor Portfolio is. He told me “it’s a portfolio with unit exposure to the factor, and orthogonalized to all the others.” That didn’t make sense to me, so I’m going to dig in here. We (quants) tend to use some form of a risk model to optimize our portfolios1 given a vector of alphas2. This post assumes that you know what a risk model is. Today we’ll be going over risk models from a geometric point of view. Mathematical Background Random Variables A continuous, real valued, random variable3 $X$ can be seen as a vector in $\mathbb{R}^n$, where $n$ is the number of observations you have. So if I take 7 samples of a normally distributed random variable, it might look like this: >>> import numpy...
-
Let's Cover The Fundamental Group (pt 1)
Jumping right in Prerequisites: Firstly, this post builds off of my previous post, so if you’re learning these things for the first time and find yourself kinda lost, start there. In addition to the previous post, the following are mathematical definitions you should probably know before reading this post. If you don’t know any of these, trust google. Group. Homomorphism Kernel Subgroup Normal subgroup Homotopy Retraction Star Convex Moving onward Definition: Given a path $\alpha$ from $x_0$ to $x_1$ in $X$, we define $\hat\alpha:\pi_1(X,x_0)\to\pi_1(X,x_1)$ as follows: \[\hat\alpha([f]) = [\bar\alpha]\star[f]\star[\alpha]\] And it’s a theorem that $\hat\alpha$ is a group isomorphism. Definition: Given a continuous map $h:(X,x_0)\to(Y,y_0)$, we define: $h_\...
-
Let's Cover Homotopy (an apologetic revisionary tale)
What am I doing? The structure of these posts This post is not really building off of the theory from my previous post (homology pt 1), but it’s about the same subject. It turns out we were using a book I didn’t really like too much, so I switched to something with a bit more of a familiar style for me. So I’ll be using Topology by Munkres for these posts (at least for now). I’ll not be covering the book in much detail because after all, Munkres did a great job at doing that himself. But I will summarize some things that I feel like summarizing, and do some of the exercises (just the ones I feel like doing because, well, I can – I know, this is sounding more useful by the second). So let’s dig in! Part II But where did part I go? That’s not Algebraic Topology, so we’re not gonna c...
-
Let's talk Homotopy and Algebra
So this post is going to be a bit more terse than most. In fact, the objective of this post will be to develop the theory of Homotopy very briefly, with the goal of proving the Fundamental Theorem of Algebra. Homotopy Given two continuous maps $f,g : \mathbb{R}^n\to\mathbb{R}m$, a homotopy between $f,$ and $g$ is a continuous map $h: \mathbb{I}\times\mathbb{R}^n\to\mathbb{R}^m$ such that $h(0,x)=f(x)$ and $h(1,x) = g(x)$. Basically it’s a continuous deformation of one map into the other. It’s a bit stronger than just homeomorphism of topological spaces. You can see this because two interlocked rings are topologically homeomorphic to non-interlocked rings, but they’re not homotopic because you’d have to split one of the rings. Enough of all that! Let’s get to the proof already! The F...
-
Getting across the river
Part 1 Foxes, Rabbits, and Lechuga You stand at the edge of a riverbed with your pet fox, a pet rabbit, and a head of lettuce. You need to cross the river, but can only take one of the three with you each time you cross. If the fox is ever left unsupervised with the rabbit (s)he’ll eat it, and the same for the rabbit and the head of lettuce. Just to clarify what you know: You have a fox, rabbit, and a head of lettuce The fox will eat the rabbit, and the rabbit will eat the lettuce (if given the opportunity) You can only take one across at a time How can you get all three to the other side of the river without any of them getting eaten? Solution: First let’s define the sides as $A$ and $B$ (we’re on side $A$). Take the rabbit to side $B$ (the fox is left with th...
-
Boys in the hood
This post is really several riddles that get progressively more difficult. So don’t be dissuaded if you find the first few too easy (or if you don’t). Baby crazy Let’s say we know a person named Pat who has two kids and at least one is a boy. Assuming that across the population the probability of having a boy is 50%. What is the probability that the other kid is a boy as well? It’s 1/3 (or $33.\bar3$%). To see this, let’s just work out the possible states (the following states will be written as younger,older): B,B B,G G,B G,G But since we know G,G is not a possibility (because Pat has at least one boy), and only one of the other three have the other child being a boy. This might be easier to see and believe if you consider this using coin flips. To m...
-
N people on a plane
The scenario: $n$ people are in line ready to board a full plane (so $n$ people in $n$ seats). The first person in line lost his boarding pass, so he just sits in any random old seat (but specifically not his own). From then on, the people boarding are so nice that they won’t confront someone sitting in their respective seats; instead, they’ll just take a random empty one. What’s the probability that the last person gets to sit in his or her own seat? What you know: There are $n$ seats on the plane There are $n$ passengers about to board the plane The first passenger sits in the wrong seat Every passenger after sits in his/her own seat if it’s available Every passenger sits in a random available seat if his/her seat is not available. You need to find out what the probabi...
-
The Monty Hall Problem
The scenario: You’re on a game show and they show you three doors (they’re all shut). They tell you behind two of these doors are goats, and behind one of them is a brand new car! They say you get to pick one door, then they’ll open up one of the remaining doors to reveal a goat, then ask you if you want to change your choice. What do you say? What you know: There are three doors Only one has the prize All doors have equal probability at the onset You choose one door The host shows a goat behind one of the other doors You’re asked if you want to change your choice to the last remaining door Hints (click to unblur) Just work out the probabilities Think of the microstates A solution A Strategy You do switch. The door you picked only has 1/3 proba...
-
Prisoners and The Lightblub
The scenario: One day, a drunk warden waltzes into her prison feeling particularly sadistic and decides to put on a little show for herself. She told her guards to gather 100 prisoners into a room so that she could talk to them. When in front of the group of prisoners, she tells them of her plan: All of their prison sentences have been extended indefinitely, but they have a chance to earn their freedom – through her game. She explained that she arranged for a room to be secured and left with just a single light bulb inside and that nobody but them would be allowed inside. They would then be brought into the room one prisoner per day and allowed to turn the light on or off. It’s guaranteed that nobody else would be given access to the room other than themselves, and that in no way woul...
-
The blue-eyed reconing!
The scenario: There’s an island far away from anything we would call recognizable with a particularly strange custom: in this town, any person who discovers his or her own eye color is compelled to go to the town square at 9:00AM the following morning and kill him or her self publicly, so it goes. It’s a part of their lives and they all obey this custom diligently. This island is so secluded that the inhabitants had not seen an outsider until the one fateful day Pat stopped by. On that day, they were so excited to see an outsider that they asked Pat to give a speech in front of the whole town. Not knowing the town’s customs, in Pat’s opening remarks, Pat mentioned, “it’s nice to see other blue eyed people out here”. The question is: what happens? What you know: The town is very s...
-
The 7 person game-show riddle
The scenario: You and six of your friends are going on a game show. When you get there, you’ll be placed in a green room and told the rules of the game so that you all can strategize. Once you all agree on a strategy, you’ll be taken on stage where you’ll each be put in your own respective isolation chambers. Once your whole team is in their respective chambers, you will each be assigned a number between 1 and 7 (inclusive and possibly repeating). You’ll all be shown everyone else’s numbers, but not your own. Your whole team wins if any one member can guess his or her own number correctly. What you know: Each member of your team is assigned a number between 1 and 7 inclusive and possibly repeating. Each member gets only one guess. There’s no penalty for wrong guesses. There’...
-
2 ropes and a matchbook
The scenario: You’re given 2 ropes and a matchbook. You know a priori that these two ropes each take an hour to burn, but they don’t burn at an even rate. For instance, the first rope might take 1 minute to burn through the first half and 59 minutes to burn the second, and the other rope might be completely different. Can you measure 45 minutes using only these two ropes and the matchbook? What you know: You have two ropes and a matchbook. The ropes take an hour to burn completely. The rate of burn is inconsistent. Hints (click to unblur) Can you measure 30 minutes? You may have to do two things at once A solution A description (and strategy) You can measure 60 minutes easily by just burning a rope. You can measure 30 minutes by burning both ends of...
-
Let's Cover Homology (pt 1)
Topology What is it? Topology is essentially the study of proximity, which is a close analogue for shape. The specifics of how that’s done are not really the point of this post. That’s all for this section. How is it related to analyzing data? Most data analysis techniques are already about quantifying some geometric attributes. For instance, when you fit a linear model, you’re really just imposing an affine model on the data, and computing the appropriate parameters (slope and intercept). But these sorts of things work best if you have an idea for what an appropriate shape would be. That’s one area where topology might be able to help. But to hear more about that, you’ll have to wait for my next post. There is a notion of equivalence (called isomorphism) of topological spaces (bi-...
-
Decomposition of Autoregressive Models
Autoregression Background This post will be a formal introduction into some of the theory of Autoregressive models. Specifically, we’ll tackle how to decompose an AR(p) model into a bunch of AR(1) models. We’ll discuss the interpretability of these models and howto therein. This post will develop the subject using what seems to be an atypical approach, but one that I find to be very elegant. The traditional way Let $x_t$ be an AR(p) process. So $x_t = \sum\limits_{i=1}^pa_ix_{t-i} + w_t$. We can express $x_t$ thusly: \[\begin{align*} x_t =& \sum\limits_{i=1}^pa_ix_{t-i} + w_t \\ x_t - \sum\limits_{i=1}^pa_ix_{t-i} =& w_t \\ \left(1 - \sum\limits_{i=1}^pa_iL^i\right)x_t =& w_t \end{align*}\] Where $L$ is the lag operator. We define the AR polynomial $\Phi$ as $\Phi(L)...
-
Markov Chain Monte Carlo
$\text{???} = (MC)^2$ Background What if we know the relative likelihood, but want the probability distribution? \[\mathbb{P}(X=x) = \frac{f(x)}{\int_{-\infty}^\infty f(x)dx}\] But what if $\int f(x)dx$ is hard, or you can’t sample from $f$ directly? This is the problem we will be trying to solve. First approach If space if bounded (integral is between $a,b$) we can use Monte Carlo to estimate $\int\limits_a^b f(x)dx$ Pick $\alpha \in (a,b)$ Compute $f(\alpha)/(b-a)$ Repeat as necessary Compute the expected value of the computed values The big bag of nope What if we can’t sample from $f(x)$ but can only determine likelihood ratios? \[\frac{f(x)}{f(y)}\] Enter Markov Chain Monte Carlo The obligatory basics A Markov Chain is a stochastic process (a collection of i...
-
Linear Regression -- The Basics
The basics Yeah. It’s not a good sign if I’m starting out already repeating myself. But that’s how things seem to be with linear regression, so I guess it’s fitting. It seems like every day one of my professors will talk about linear regression, and it’s not due to laziness or lack of coordination. Indeed, it’s an intentional part of the curriculum here at New College of Florida because of how ubiquitous linear regression is. Not only is it an extremely simple yet expressive formulation, it’s also the theoretical basis of a whole slew of other tactics. Let’s just get right into it, shall we? Linear Regression: Let’s say you have some data from the real world (and hence riddled with real-world error). A basic example for us to start with is this one: There’s clearly a linear tren...
-
Probability -- A Measure Theoretic Approach
Probability using Measure Theory A mathematically rigorous definition of probability, and some examples therein. The Traditional Definition: Consider a set $\Omega$ (called the sample space), and a function $X:\Omega\rightarrow\mathbb{R}$ (called a random variable. If $\Omega$ is countable (or finite), a function $\mathbb{P}:\Omega\rightarrow\mathbb{R}$ is called a probability distribution if it satisfies the following 2 conditions: For each $x \in \Omega$, $\mathbb{P}(x) \geq 0$ If $A_i\cap A_j = \emptyset$, then $\mathbb{P}(\bigcup\limits_0^\infty A_i) = \sum\limits_0^\infty\mathbb{P}(A_i)$ And if $\Omega$ is uncountable, a function $F:\mathbb{R}\rightarrow\mathbb{R}$ is called a probability distribution or a cumulative distribution function if it satisf...
-
A dirty little ditty on Finite Automata
This post builds on the previous post about Formal Languages. Some Formal Definitions A Deterministic Finite Automata (DFA) is A set $\mathcal{Q}$ called “states” A set $\Sigma$ called “symbols” or “alphabet” A function $\delta_F:\mathcal{Q}\times\Sigma \to \mathcal{Q}$ A designated state $q_0\in\mathcal{Q}$ called the start point A subset $F\subseteq\mathcal{Q}$ called the “accepting states” The DFA is then often referred to as the ordered quintuple $A=(\mathcal{Q},\Sigma,\delta_F,q_0,F)$. Defining how strings act on DFAs. Given a DFA, $A=(\mathcal{Q}, \Sigma, \delta, q_0, F)$, a state $q_i\in\mathcal{Q}$, and a string $w\in\Sigma^*$, we can define $\delta(q_i,w)$ like so: If $w$ only has one symbol, we can consider $w$ to be the symbol and define $\delta(q_i,w)...
-
What's in a language?
Languages in abstraction This post is about Languages from a mathematical and abstract linguistics point of view. Not much more to say about that, so let’s get right to it! The Rigorous definition: Let $\Sigma$ be an alphabet and let $\Sigma^k$ be the set of all strings of length k over that alphabet. Then, we define $\Sigma^*$ to be $\bigcup\limits_{k\in\mathbb{N}}\Sigma^k$ (the union of $\Sigma^k$ over all natural numbers k). If $L\subseteq\Sigma^*$ , we call $L$ a language. The Intuition Behind the Definition Consider an alphabet (some finite set of characters), for example we can consider the letters of the English language, the ASCII symbols, the symbols ${0, 1}$ (otherwise known as binary), or the symbols ${1, 2, 3, 4, 5, 6, 7, 8, 9, 0, +, \times , =}$ . We can then c...
-
First Post, An Explication
This is my first blog, so I guess I should start off by explaining my motives behind writing what will probably be a sporadically updated blog. Basically, as I learn things that I find pretty difficult to find online, I’ll try to explain them as best I can here. Also, since I enjoy learning math, I’m going to try to keep up a (semi) regular stream of math posts. If you have any questions, feel free to contact me. My up-to-date contact info can be found on my website https://aaron.niskin.org