Fermat’s Little Theorem

One of the first results in any introduction to number theory is Fermat’s Little Theorem, so called in contrast to the Last Theorem, or indeed to any of Fermat’s other results.

Theorem 1 (Fermat’s Little Theorem). Let {p} be a prime number and let {a} be any integer. Then {a^{p}\equiv a\pmod{p}.}

In this post — aimed at students taking an introductory course in number theory — I want to explain the two proofs that I find to be the simplest. Neither is deep, but the second requires a tiny bit of group theory, whereas the first does not.

1. First proof: by binomial coefficients

We shall take as given that the binomial coefficients \binom{n}{r}:=\frac{n!}{r!(n-r)!}, for r\in\{0,\ldots,n\} are natural numbers. We also take as given the Binomial Theorem:

\displaystyle {(X+Y)^{n}=\sum_{i=0}^{n}\binom{n}{i}X^{i}Y^{n-i}.}

Fix a prime number {p}.

Lemma 2. For {r\in\{1,\ldots,p-1\}}, the binomial coefficient {\binom{p}{r}} is divisible by {p}.

Proof: Since {\binom{p}{r}} is an integer, {r!} divides {\frac{p!}{(p-r)!}=p(p-1)\cdot\ldots\cdot(p-r+1)}. Since {p} is prime, and {r<p}, it follows that {p} and {r!} are coprime. By Euclid's Lemma, {r!\mid(p-1)\cdot\ldots\cdot(p-r+1)}. Therefore {\frac{(p-1)!}{r!(p-r)!}} is an integer. \Box

Lemma 3. For integers {a,b} we have {(a+b)^{p}\equiv a^{p}+b^{p}\pmod{p}}.

Proof: Apply the Binomial Theorem and Lemma 1. \Box

It’s now a simple matter to give the first proof of Fermat’s Little Theorem.

Proof: Let {p} be a prime number, and let {S:=\{a\in\mathbb{Z}\;|\;a^{p}\equiv a\pmod{p}\}} be the set of integers satisfying the conclusion of the theorem. If {p} is odd, then {(-1)^{p}=-1}. Otherwise, if {p=2} then {(-1)^{2}=1\equiv-1\pmod{2}}. Thus {-1\in S}. Lemma 3 shows that {S} is closed under addition. Therefore {S=\mathbb{Z}}. \Box

2. Second proof: by Lagrange’s Theorem

The second strategy is reasonably simple. I think of it as the proof ‘by Lagrange’s Theorem’, but that is perhaps unfair since its use of Bézout’s Lemma is just as key.

Let {a,b\in\mathbb{Z}}. Then there exist {x,y\in\mathbb{Z}} such that {ax+by=\mathrm{gcd}(a,b).} .

The proof goes by showing that the set

\displaystyle {\{xa+yb\;|\;x,y\in\mathbb{Z}\}},

of {\mathbb{Z}}-linear combinations of {a,b}, is also the set of integer multiples of the greatest common divisor {\mathrm{gcd}(a,b)}.

Let {G} be a finite group, and let {H\leq G} be a subgroup. Then {|G|=|H|\cdot(G:H)}.

The proof of Lagrange’s Theorem is a straightforward counting proof. One shows that each coset of {H} has the same order, and that the cosets partition {G}.

Okay! Using these two ingredients, we get the second proof of Fermat’s Little Theorem.

Proof: Let {p} be a prime number. It is a consequence of Bézout’s Lemma that {\{1,\ldots,p-1\}} forms a group under multiplication modulo {p}. Moreover it is a group of order {p-1}. Thus, for each {b\in\{1,\ldots,p-1\}}, we have {b^{p}\equiv b\pmod{p}}.

Now let {c\in\mathbb{Z}}. If {c} is divisible by {p}, then certainly {c^{p}\equiv c\pmod{p}}. Otherwise, there exists {a\in\{1,\ldots,p-1\}} and {k\in\mathbb{Z}} such that {c=a+pk}. Then {a^{p}\equiv c^{p}\equiv c\equiv a\pmod{p}}, as required. \Box

To be perfectly honest, I find this second proof less satisfactory than the first. Relying on Bézout seems reasonable enough (we are interested in integer arithmetic after all), but relying on Lagrange is a bit silly. Lagrange’s Theorem is a statement about all finite groups, which is just far too strong for what we need.

However you get there, Fermat’s Little Theorem is an important result of elementary number theory.


Saga hwæt ic hatte

Ic eom mare   þonne þes middangeard
læsse þonne hondwyrm   leohtre þonne mona
swiftre þonne suna   sæs me sind ealle
flodas on fæðmum   on þes foldan bearm
grene wongas   grundum ic hrine
helle underhnige   heofonas oferstige
wuldres eðel   wide ræce
ofer engla eard   eorðan gefylle
ealne middangeard   ond merestreamas
side mid me sylfum   saga hwæt ic hatte

Puzzle and meta-puzzle: Factorisation — SOLUTION

In an earlier post I posed the following problem:

Find the prime factorization of {12318876808768112319}.


Write z:=12318876808768112319. As a first attempt we shall ask some straightforward questions:

  1. Is {z} even? No.
  2. Is {z} divisible by {3}? We use the “sum the digits” trick, and find that — yes — {z} is divisible by {3^{4}}, but not divisible by {3^{5}}.
  3. Is {z} divisible by {5}? No.

Etc. We can directly calculate for other small primes, and if we push on we will find a factor of {11}.

Taking out these factors, namely {3^{4}} and {11}, we are still left with a large number:

\displaystyle 13825899897607309.

How should we proceed? We seek a more systematic approach, or at least some sort of shortcut.

Fortunately, in this special case, such a short cut is available: we exploit the repetition of various strings of digits in the base-{10} representation of {z}. Let {x:=12319} and let {y:=10^{5}}. Then {y-x=87681}, and z=y^{3}(x-1)+y^{2}(y-x-1)+y(y-x)+x. Rearranging, we get

\begin{array}{rl}  z &= y^{3}x-y^{2}x-yx+x\\  &= x(y^{3}-y^{2}-y+1)\\  &= x(y-1)^{2}(y+1). \end{array}

It now remains to find the prime factorizations of {x=12319}, {y-1=99999}, and {y+1=100001}. The most attractive place to begin is with {y-1=99999}:

\begin{array}{rl} y-1 &= 99999\\  &= 3^{2}\cdot11111\\  &= 3^{2}\cdot41\cdot271. \end{array}

I don’t know of any particularly neat way of finding the factors {41} and {271,} but it’s very easy by hand.

Next we consider {y+1=10^{5}+1}:

y+1 = 100001 = 11\cdot9091.

The factor of {11} is easy to find, but I don’t know a quick way to verify that {9091} is prime, other than trial division, by checking for prime factors up to {\lfloor\sqrt{9091}\rfloor=95}.

Finally, we consider {x=12319}:

x = 12319 = 97\cdot127.

Again, I don’t know an easy way to find these factors, other than trial division, by checking for prime factors up to {\lfloor\sqrt{12319}\rfloor=110}.

Putting all of this together, we have the prime factorization

12318876808768112319 = 3^{4}\cdot11\cdot41^{2}\cdot97\cdot127\cdot271^{2}\cdot9091.

As mentioned in the original post, perhaps the real challenge is to come up with a problem that retains the trickiness of this example, but avoids the cumbersome checking.

A fun little formula from Graph Theory

This note is aimed at those taking my fourth-year Graph Theory course.

Let {G=(V,E)} be a graph, where we view {E} as a set of {2}-element subsets of {V}. For {v\in V}, denote by {d(v)} the degree of {v}. (Notation and terminology as in my graph theory lecture notes.)

Lemma 1. {\sum_{uv\in E}d(u)+d(v) = \sum_{u\in V}d(u)^{2}}.

It’s easy to let this little formula go by without much thought. It gets used in the proof of Mantel’s Theorem. But I want to bring it to the fore, as an example of a simple but perhaps opaque statement, which becomes clearer with a little abstraction.

Consider the following stronger statement.

Lemma 2. Let {f:V\longrightarrow\mathbb{N}} be any function. Then

\displaystyle \sum_{uv\in E}f(u)+f(v)=\sum_{v\in V}d(u)f(u).

Proof: Given {v}, the term {f(v)} appears on the right hand side once for each edge of which {v} is an endpoint. \Box

Lemma 1 follows by taking {f=d}.

Let’s see how Lemma 1 is used in a proof of Mantel’s Theorem.

Theorem 3 (Mantel’s Theorem). Suppose that {G} is triangle-free, of order {n}, and of size {e}. Then

\displaystyle e\leq\Big\lfloor\frac{n^{2}}{4}\Big\rfloor.

Proof: Since {G} is triangle-free, for every {uv\in E} we have {N(u)\cap N(v)=\emptyset}. Therefore {d(u)+d(v)\leq n}. Summing across edges and using Lemma 1, we have

\displaystyle \begin{array}{rclr} 	\sum_{v\in V}d(v)^{2}=\sum_{uv\in E}d(u)+d(v) &\leq& ne. \end{array}

Combining these things, we have

\displaystyle \begin{array}{rclr} 	4e^{2} &=& \big(\sum_{v\in V}d(v)\big)^{2} &\text{by the degree formula}\\\\ 	&\leq& n\sum_{v\in V}d(v)^{2} &\text{by Cauchy--Schwarz}\\\\ 	&\leq& n^{2}e, \end{array}

as required. \Box

So there we have it. Lemma 1 has a straightforward application in this proof of Mantel’s Theorem.

Note also that the complete bipartite graph {K_{\lfloor\frac{n}{2}\rfloor,\lceil\frac{n}{2}\rceil}} is triangle-free and has size {\lfloor\frac{n^{2}}{4}\rfloor}.

Often bundled into the statement of Mantel’s Theorem is the additional claim:

Proposition 4. Let {G} be triangle-free, of order {n}, and of size {e(G)=\lfloor\frac{n^{2}}{4}\rfloor}. Then {G} is isomorphic to {K_{\lfloor\frac{n}{2}\rfloor,\lceil\frac{n}{2}\rceil}}.

Proof: If {e(G)=\lfloor\frac{n^{2}}{4}\rfloor}, then each inequality in the above proof of Mantel’s Theorem must be an equality. In particular, for each {uv\in E} we have {d(u)+d(v)=n}. Thus the neighbourhoods of {u} and of {v} form a partition of the vertex set of {G}. It follows that {G} is complete bipartite, and up to isomorphism the only such graph to have size {\lfloor\frac{n^{2}}{4}\rfloor} is {K_{\lfloor\frac{n}{2}\rfloor,\lceil\frac{n}{2}\rceil}}. \Box

Möbius transformations

Teaching note: Möbius Transformations

This short note is written for those taking my second-year analysis course.

Möbius transformations are brilliant. They are conformal (=angle preserving) maps from the extended complex plane to itself.

Definition 1 (The extended complex plane). The extended complex plane is {\overline{\mathbb{C}}:=\mathbb{C}\cup\{\infty\}}, i.e. {\mathbb{C}} together with an entirely new element, denoted {\infty}.

For now, we won’t extend the domains of {+} and {\cdot} to include {\infty}; so the extended complex plane has no more algebraic structure than the field of complex numbers.

Definition 2 (Möbius transformations). A Möbius transformation is a function {f:\overline{\mathbb{C}}\longrightarrow\overline{\mathbb{C}}} such that

\displaystyle  f(z)=\left\{\begin{array}{ll}\frac{az+b}{cz+d}&z\neq-\frac{d}{c},\infty\\\infty&z=-\frac{d}{c}\\\frac{a}{c}&z=\infty,\end{array}\right.

for some {a,b,c,d\in\mathbb{C}} such that {ad-bc\neq0}.

The second and third cases in the definition above are important, but we specify a Möbius transformation simply by writing e.g.

\displaystyle f(z):=\frac{az+b}{cz+d},

and leave the other two cases implicit.

Nevertheless, such a Möbius transformation does not determine the quadruple {(a,b,c,d)}, but does determine the tuple {[a;b;c;d]} in projective space. For a trivial example: the identity map {z} is a Möbius transformation, equal to {\frac{2z}{2}}.

Example 1. Consider the Möbius transformation {g:\overline{\mathbb{C}}\longrightarrow\overline{\mathbb{C}}} defined by {g(z):=\frac{1}{z-1}.} Note that {g} really is a Möbius transformation since {-1\neq0}. A few example evaluations: {g(0)=-1}, {g(1)=\infty}, and {g(\infty)=0}.

Proposition 3. The composition of two Möbius transformations is a Möbius transformation.

Proof: For {i=1,2}, consider a Möbius transformation {f_{i}(z):=\frac{a_{i}z+b_{i}}{c_{i}z+d_{i}}}, where {a_{i}d_{i}-b_{i}c_{i}\neq0}. The composition is \displaystyle \begin{array}{rcl} f_{2}\circ f_{1}(z)&=&\frac{(a_{1}a_{2}+c_{1}b_{2})z+b_{1}a_{2}+d_{1}b_{2}}{(a_{1}c_{2}+c_{1}d_{2})z+b_{1}c_{2}+d_{1}d_{2}},\end{array} which is a Möbius transformation since \displaystyle (a_{1}a_{2}+c_{1}b_{2})(b_{1}c_{2}+d_{1}d_{2}) - (a_{1}c_{2}+c_{1}d_{2})(b_{1}a_{2}+d_{1}b_{2}) = (a_{1}d_{1}-b_{1}c_{1})(a_{2}d_{2}-b_{2}c_{2}) is not zero. \Box

Proposition 4. Möbius transformations are bijections.

Proof: Let {f(z)=\frac{az+b}{cz+d}}, with {ad-bc\neq0}. We find the candidate for its inverse, and check it really is. Define {g:\overline{\mathbb{C}}\longrightarrow\overline{\mathbb{C}}}

\displaystyle g(w)=\frac{dw-b}{-cz+a}.

We compose:

\displaystyle \begin{array}{lll} 	g\circ f(z)	&=&	\frac{d\frac{az+b}{cz+d}-b}{-c\frac{az+b}{cz+d}+a}\\ 	&=&	\frac{(ad-bc)z}{ad-bc}\\ 	&=&	z, \end{array}

the identity! The composition {f\circ g} is the identity by the same calculation. \Box

Definition 5 (Möbius group). The set of Möbius transformations, equipped with composition, forms the Möbius group, denoted {\mathbf{M}}.

We are able to represent Möbius transformations as matrices, in the following way.

Definition 6. Define the map

\displaystyle \begin{array}{rcl} 	\Phi:\mathbf{M} &\longrightarrow& \mathrm{PGL}_{2}(\mathbb{C})\\ 	\frac{az+b}{cz+d} &\longmapsto& \begin{pmatrix} a & b \\ c & d \end{pmatrix} \end{array}

Naturally we wonder whether, under this representation of Möbius transformations as matrices, the composition of functions corresponds to the multiplication of matrices.

Proposition 7. {\Phi} is an isomorphism.

Proof: It is clear that {\Phi} is a bijection. For {i=1,2}, consider {f_{i}\in\mathbf{M}} defined by {f_{i}(z):=\frac{a_{i}z+b_{i}}{c_{i}z+d_{i}}}, where {a_{i}d_{i}-b_{i}c_{i}\neq0}.
The composition is

\displaystyle \begin{array}{rcl} f_{2}\circ f_{1}(z)&=&\frac{(a_{1}a_{2}+c_{1}b_{2})z+b_{1}a_{2}+d_{1}b_{2}}{(a_{1}c_{2}+c_{1}d_{2})z+b_{1}c_{2}+d_{1}d_{2}},\end{array}

and we have

\displaystyle \begin{array}{rcl} \Phi(f_{2})\Phi(f_{1})	&=&	\begin{pmatrix}a_{2} & b_{2} \\ c_{2} & d_{2} \end{pmatrix}\begin{pmatrix} a_{1} & b_{1} \\ c_{1} & d_{1} \end{pmatrix}\\&=& \begin{pmatrix} a_{1}a_{2}+c_{1}b_{2} & b_{1}a_{2}+d_{1}b_{2} \\ a_{1}c_{2}+c_{1}d_{2} & b_{1}c_{2}+d_{1}d_{2} \end{pmatrix}\\&=& \Phi(f_{2}\circ f_{1}),\end{array}

which shows that {\Phi} is a homomorphism. \Box

Proposition 8. Möbius transformations act {3}-transitively on {\overline{\mathbb{C}}}.

Proof sketch: To see this, let {w_{0},w_{1},w_{\infty}} be three distinct elements of the extended complex plane. We wish to choose {[a;b;c;d]} such that

\displaystyle f(z)=\frac{az+b}{cz+d}


\displaystyle f(0)=w_{0},\,f(1)=w_{1},\,f(\infty)=w_{\infty}.

Equivalently, such that the three equations

  • {\frac{b}{d}=w_{0}}
  • {\frac{a+b}{c+d}=w_{1}}
  • {\frac{a}{c}=w_{\infty}}

hold. It suffices to satisfy \displaystyle w_{\infty}c+w_{0}d=w_{1}c+w_{1}d, which can indeed be accomplished. \Box

Maths for Testing Times Along with colleagues all…

Maths for Testing Times.

Along with colleagues all over the world, I will be spending much of the next few weeks and months at home. Working — teaching, research, etc. — but detached. How best should I make use of the time? I will try to post more puzzles for students, and little notes that may be of interest to students or fellow educators.

For now, here is a little one: how many configurations are there of a 2x2x2 Rubik’s cube?

For our purposes, a ‘configuration’ is simply any position into which a ‘solved’ cube may be transformed by any finite sequence of the allowed physical twists.

Puzzle and meta-puzzle: Factorisation

(This was suggested to me by a friend.)

Find the prime factorisation of 12318876808768112319.

Factorisation puzzles are familiar, and obviously the point is to minimise the use of computer assistance, by making ‘clever’ observations.

Once you see how to do it, you’ll notice that there is a still a non-trivial amount of checking to do. Thus, a meta-puzzle suggests itself: to find a natural number N, so that the problem of factorising N by hand contains lots of interesting and clever observations, and very little checking.

I don’t know how to solve this meta-puzzle!

Puzzle: Cinema Trip

Three friends—Alsa, Bjorn, and Cally—want to go to the cinema.

There are five films:

  • Star Wars X,
  • Avengers Rereunite,
  • Frozen 3 – ice cold,
  • Fish and chips – the inside story,
  • Harry Potter and the Arithmancer of Andover.

There are three choices of snack:

  • Basil popcorn,
  • Chilli marshmallows,
  • Vegan burger.There are four choices of drink:
  • 8 down,
  • Diet Cake,
  • Popsi Minimum,
  • Ginger coffee.

They don’t mind going to separate films, but Cally won’t miss Star Wars if either of the others sees it (because she’s worried about spoilers). Each of them wants a drink of their own, and a snack of their own, except that the bags of marshmallows must be shared between two of them.

How many configurations of cinema trip may our three friends choose?

Extra Maths: Analysis, Darboux’s Theorem

As before, these questions are aimed at first-year or second-year undergraduates studying a course in Real Analysis.

Question 1. Let f:[a,b]\longrightarrow\mathbb{R} be differentiable, with derivative f'. Suppose that f'(a)>0>f'(b). Show there exists c\in(a,b) such that f'(c)=0.

Compare Question 1 with Rolle’s Theorem.

Question 2 (Darboux’s Theorem). Let f:\mathbb{R}\longrightarrow\mathbb{R} be differentiable, with derivative f'. Show that f' has the intermediate value property.

Compare Question 2 with the Mean Value Theorem.

Question 3. Let f:[a,b]\longrightarrow\mathbb{R} be differentiable, with derivative f'. Show that there exists c\in[a,b] such that f' is continuous at c.

This is very difficult! See this article. Nevertheless, I would like to expand on this argument in a subsequent post. Watch this space.

Question 4. Suppose that g:\mathbb{R}\longrightarrow\mathbb{R} satisfies the intermediate value property. Need g admit an anti-derivative?

Actually, this one is straightforward, so I’ll give the argument right away.

Proof. Certainly not. In fact we have just seen that every derivative (so, a function that admits an anti-derivative) is continuous somewhere. However, there are plenty of functions that satisfy the intermediate value property which are continuous nowhere. For example, there are functions whose graphs are dense in the plane. \square

Remark. For an in-depth discussions of the continuity of derivatives, see this article.

Extra Maths: Analysis, Accumulation

Here are three fun Analysis questions, really aimed at my students who are studying their first/second course on Real Analysis, but hopefully it’s of wider interest as well.

We begin at the beginning, with a couple of definitions.

Definition. A function f:\mathbb{R}\longrightarrow\mathbb{R} has the IVT Property if, for all real numbers a,b,x with a<b and \min\{f(a),f(b)\}<x<\max\{f(a),f(b)\} there exists \xi\in[a,b] such that f(\xi)=x.

So, using this terminology, the Intermediate Value Theorem is the statement that a continuous function \mathbb{R}\longrightarrow\mathbb{R} has the IVT Property.

Definition. Let f:\mathbb{R}\longrightarrow\mathbb{R} and let a,l\in\mathbb{R}. We say that l is a weak limit of f at i if
for all \varepsilon>0 there exists b\in(a-\varepsilon,a+\varepsilon)\setminus\{a\} such that f(b)\in(l-\varepsilon,l+\varepsilon). Naturally, we say that f has a weak limit somewhere if there exists a,l\in\mathbb{R} such that l is a weak-limit of f at a.

We say that l\in\mathbb{R} is an accumulation point of a real sequence (a_{n}) if for all \varepsilon>0 there is an infinite set N\subseteq\mathbb{N} such that for all n\geq N we have |a_{n}-l|<\varepsilon.

The Questions

Question 1.
Show the existence or non-existence of a function f:\mathbb{R}\longrightarrow\mathbb{R} which has the IVT Property but which is nowhere continuous.

Question 2.
Prove or disprove the claim that every function f:\mathbb{R}\longrightarrow\mathbb{R} has a weak limit somewhere.

Question 3.
Find a real sequence (a_{n}) such that every real number l\in\mathbb{R} is an accumulation point of (a_{n}).

Where should we start? Perhaps my favourite of the three is the first one: to show the existence or non-existence of a function with the IVT property but which is nowhere continuous. There are several approaches we might take, beginning with the following lemma.

Lemma 1. Suppose that a function f:\mathbb{R}\longrightarrow\mathbb{R} has the property that, for all a,b with a<b, the restriction of f to the interval (a,b) is surjective onto \mathbb{R}. Then f has the IVT property.

Hopefully the proof of this lemma is quite clear: the IVT property asks, of such a function restricted to such an interval, that it attain certain values. So if f attains all real values, then the property is evidently satisfied.

Okay…. So what?

Lemma 2. Suppose that a function f:\mathbb{R}\longrightarrow\mathbb{R} has the property that, for all a,b with a<b, the restriction of f to the interval (a,b) is surjective onto \mathbb{R}. Then f is continuous nowhere.

Proof. Let a\in\mathbb{R} be given, and consider, for example, \varepsilon=1. For any \delta>0, by assumption f restricted to the open interval (a-\delta,a+\delta) is surjective onto \mathbb{R}. In particular, the image under f of (a-\delta,a+\delta) is not a subset of (f(a)-1,f(a)+1). \square

Ah-ha! Now it’s clear why we’re interested in these sorts of function. Such a `locally surjective function‘ would be a nowhere continuous function which nevertheless has the IVT property. The big question now is: does any locally surjective function exist?

Well, YES. Such functions do exist, and there are several ways to find them. First we give a slightly abstract construction, which might seem a little unsatisfactory.

Method 1. Recall that the cardinality of \mathbb{R} is 2^{\aleph_{0}}, and the cardinality of \mathbb{R}/\mathbb{Q} is also 2^{\aleph_{0}}. We may choose a bijection


Also, let \phi:\mathbb{R}\longrightarrow\mathbb{R}/\mathbb{Q} be the natural map sending each real number r to its coset modulo \mathbb{Q}, namely r+\mathbb{Q}. Note that \phi is surjective. Thus, the composition F\circ\phi:\mathbb{R}\longrightarrow\mathbb{R} is also a surjection. Indeed, by the density of the rationals in the reals, each proper open interval (a,b) contains a representative of each coset of \mathbb{Q} in \mathbb{R}. That is, for each r\in\mathbb{R}, there exists q\in\mathbb{Q} such that r+q\in(a,b). A consequence of this is that the restriction to (a,b) of \phi is still surjective onto \mathbb{R}/\mathbb{Q}. Finally, the restriction of F\circ\phi to (a,b) is still surjective onto \mathbb{R}, which shows that F\circ\phi is locally surjective, as required.\square

Wonderful, that was easy. But it wasn’t particularly constructive. Perhaps it felt a little unsatisfactory to rely on the cardinalities of these sets. Is there a better way?

That’s where I’ll leave it for the moment. To be continued!

In the meantime, try to construct a locally surjective function by using the decimal expansion of each real number.