
41% of Roads Lead to Homotopical Algebra

By repeatedly clicking the first link on a Wikipedia article, you reach the article on philosophy 97% of the time. I wondered whether a similar principle applies to nLab, a wiki about mathematics and physics written from an abstract point of view.

nLab hosts a mirror of its source code on GitHub. I made a shallow clone of commit be209ad (revision 216776), which dates to the 10th of January, 2025. Here is the Python code I used to process the articles into a digraph:

import glob
import re
from pathlib import Path

import networkx as nx
from tqdm import tqdm

redirects = {}
data = {}
for p in tqdm(glob.glob('**/content.md', recursive=True)):
    p = Path(p)

    name = (p.parent / 'name').read_text().strip()
    data[name] = []
    redirects[name] = name

    lines = [
        line
        for line in p.read_text().splitlines()
        if not line.startswith('>') # remove blockquotes
    ]
    try:
        lines = lines[lines.index('{:toc}') :]
    except ValueError:
        pass

    for m in re.findall(r'\[\[.*?]]', '\n'.join(lines)):
        m = m[2:-2]
        if m.startswith('!redirects'):
            redirects[m.removeprefix('!redirects').strip()] = name
        elif not m.startswith('!'):
            if len(data[name]) >= 1:
                continue
            if m.lower().startswith('nlab:'):
                m = m[5:]
            if '#' in m:
                m = m.split('#')[0]
            data[name].append(m.split('|')[0].strip())

for k in list(data.keys()):
    # print links that do not exist, for debugging.
    # for link in data[k]:
    #     if link not in redirects:
    #         print(link)
    data[k] = set([redirects[link] for link in data[k] if link in redirects])
    if len(data[k]) == 0 or k in data[k]:  # remove empty nodes and loops
        del data[k]

G = nx.DiGraph(data)
nx.write_graphml(G, 'graph.gml')
nx.write_gexf(G, 'graph.gexf')
nx.nx_agraph.write_dot(G, 'graph.dot')

The code is complicated by the existence of "redirects" and other considerations. For example, the code does not count links before the table of contents or links inside block quotes.
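
To make those rules concrete, here is a minimal sketch of the first-link extraction applied to a single page. The helper name `first_link` and the sample page text are made up for illustration; the filtering steps mirror the script above.

```python
import re


def first_link(markdown):
    """Return the target of the first counted wiki link, or None."""
    # Drop blockquote lines, as the script above does.
    lines = [line for line in markdown.splitlines() if not line.startswith('>')]
    # Ignore everything before the table-of-contents marker, if present.
    if '{:toc}' in lines:
        lines = lines[lines.index('{:toc}'):]
    for m in re.findall(r'\[\[(.*?)]]', '\n'.join(lines)):
        if m.startswith('!'):  # directives such as !redirects are not links
            continue
        if m.lower().startswith('nlab:'):
            m = m[5:]
        # Strip the "#anchor" and "|display text" parts of the link.
        return m.split('#')[0].split('|')[0].strip()
    return None


# Hypothetical page source, not taken from nLab.
sample = """[[context link]]
> [[quoted link]]
{:toc}
[[!redirects groupoids]]
A groupoid is a [[category#definition|small category]] in which ..."""

print(first_link(sample))  # category
```

The link before `{:toc}`, the link in the blockquote, and the `!redirects` directive are all skipped, so the anchor and display text are stripped from the remaining link and `category` is returned.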

I wrote two functions. The first finds the longest simple path (simple in that the path contains only unique vertices). The second finds the strongly connected components that are sinks, in the sense that no vertex in the component links to a vertex outside it. For each sink, we can also enumerate all articles from which it is reachable.

import heapq

import networkx as nx

G = nx.read_graphml('graph.gml')

def find_longest_path(G):
    def dfs(path):
        best = path.copy()
        for n in G.neighbors(path[-1]):
            if n not in path:
                new = dfs(path + [n])
                if len(new) > len(best):
                    best = new
        return best

    h = []
    for node in G.nodes():
        path = dfs([node])
        heapq.heappush(h, (-len(path), path))

    print(heapq.heappop(h)[1])
    print(heapq.heappop(h)[1])
    print(heapq.heappop(h)[1])


def find_all_sinks(G):
    sinks = []
    for scc in nx.strongly_connected_components(G):
        for node in scc:
            if any(n not in scc for n in G.successors(node)):
                break
        else:  # no break
            sinks.append(scc)

    for sink in sinks:
        node = list(sink)[0]
        
        print(sink)
        print(reachable_nodes := {
            n for n in G.nodes()
            if nx.has_path(G, n, node)
        })
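
Since the first script keeps at most one link per article, every node has at most one outgoing edge, so the sink an article leads to can also be found by simply walking the chain of first links. Below is a minimal sketch of that walk. The function name `follow_first_links` is my own, and the toy mapping uses a few edges from the longest path listed further down rather than the real graph.

```python
def follow_first_links(first_link, start):
    """Walk first links from start; return (path, sink)."""
    path = [start]
    seen = {start}
    while path[-1] in first_link:
        nxt = first_link[path[-1]]
        if nxt in seen:
            # Closed a cycle: the sink is every article from nxt onward.
            return path, set(path[path.index(nxt):])
        path.append(nxt)
        seen.add(nxt)
    # Dead end: an article with no outgoing first link is a sink by itself.
    return path, {path[-1]}


# Toy mapping of article -> first link (a small excerpt, not the full graph).
toy = {
    'mathematics': 'philosophy of mathematics',
    'philosophy of mathematics': 'philosophy',
    'philosophy': 'higher algebra',
    'higher algebra': 'homotopical algebra',
    'homotopical algebra': 'higher algebra',
}

path, sink = follow_first_links(toy, 'mathematics')
print(len(path), sorted(sink))
```

On the full graph this would give the same sinks as `find_all_sinks`, just computed one starting article at a time.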

If we examine the sinks, we find that the top 10 are reachable from 93% of the 18,356 articles. That is, if you pick an nLab article at random and repeatedly click the first link, there is a 93% chance that you will end up at one of these groups of articles:

  1. homotopical algebra, higher algebra (7,593 articles lead to these)
  2. interaction, experimental observation, physics, classical mechanics, measurement, classical physics, observable universe, Lagrangian density (2,853 articles lead to these)
  3. point, edge, category, vertex, quiver, graph (2,614 articles lead to these)
  4. topological space, space (1,910 articles lead to these)
  5. logic, deduction (1,431 articles lead to these)
  6. computer science, program (243 articles lead to these)
  7. cartesian product (151 articles lead to these)
  8. measure space, measure theory, measurable space (124 articles lead to these)
  9. Alexander Grothendieck, EGA (90 articles lead to these)
  10. symmetry (88 articles lead to these)
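
The percentages quoted in this post follow directly from the counts in the list above; a quick check, using only those numbers:

```python
total_articles = 18_356
# Articles leading to each of the top 10 sinks, in the order listed above.
top_sinks = [7_593, 2_853, 2_614, 1_910, 1_431, 243, 151, 124, 90, 88]

print(f"top 10 sinks: {sum(top_sinks) / total_articles:.1%}")  # 93.1%
print(f"largest sink: {top_sinks[0] / total_articles:.1%}")    # 41.4%
```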

Notably, there is a surprising 41% chance that you end up on either homotopical algebra or higher algebra. I did not expect them to rank so highly, given that I had never heard of those subjects before. Apparently homotopical algebra is just homological algebra in a non-abelian setting.

Now, if we take a look at the longest simple path, it visits an impressive 21 articles, starting from the history of universal homotopy theories and ending at homotopical algebra.

 1. Universal homotopy theories > history
 2. Universal Homotopy Theories
 3. Daniel Dugger
 4. Michael Hopkins
 5. Mark Mahowald
 6. Haynes Miller
 7. elliptic Chern character
 8. higher chromatic Chern character
 9. Chern character
10. universal characteristic class
11. characteristic class
12. cohomology
13. (infinity,1)-category
14. (n,r)-category
15. higher category theory
16. category theory
17. mathematics
18. philosophy of mathematics
19. philosophy
20. higher algebra
21. homotopical algebra

In fact, there exists only one other path of the same length, which arrives at Michael Hopkins through equivariant homotopy theory -- table > Global Homotopy Theory and Cohesion > Charles Rezk > Michael Hopkins instead.

I tried visualizing the graph with Gephi and other tools, but it's nothing worth showing. No insights to be gleaned that way. It might be interesting to consider not just the first link but, say, the first 20 links; we could then analyze the popularity of an article (with PageRank?), run community detection algorithms, or analyze various centrality metrics.
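
As a rough sketch of what the PageRank idea could look like once each article keeps several links, here is a hand-rolled power iteration over a dict of article -> list of link targets. This is not the code used for this post and not networkx's implementation; the function name, the `damping` and `iterations` parameters, and the toy graph are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of targets."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dead ends (no outgoing links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in nodes
        }
        # Each article splits its rank evenly among the links it keeps.
        for v, targets in links.items():
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy graph: three pages link to 'c', and 'c' links to 'd'.
toy = {'a': ['c'], 'b': ['c'], 'c': ['d'], 'd': ['c']}
ranks = pagerank(toy)
print(max(ranks, key=ranks.get))  # c
```

With the real data, `links` would come from the first script with the one-link limit raised to 20, and the highest-ranked articles would be a natural measure of popularity.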