41% of Roads Lead to Homotopical Algebra
By repeatedly clicking the first link on a Wikipedia article, you reach the article on philosophy 97% of the time. I wondered whether a similar principle applies to nLab, a wiki that covers mathematics and physics from an abstract point of view.
nLab hosts a mirror of its source code on GitHub. I shallow-cloned a copy onto my computer, specifically commit be209ad (revision 216776), which dates to the 10th of January, 2025. Here is the Python code I used to process the articles into a digraph:
import glob
import re
from pathlib import Path

import networkx as nx
from tqdm import tqdm

redirects = {}
data = {}
for p in tqdm(glob.glob('**/content.md', recursive=True)):
    p = Path(p)
    name = (p.parent / 'name').read_text().strip()
    data[name] = []
    redirects[name] = name
    lines = [
        line
        for line in p.read_text().splitlines()
        if not line.startswith('>')  # remove blockquotes
    ]
    try:
        lines = lines[lines.index('{:toc}') :]
    except ValueError:
        pass
    for m in re.findall(r'\[\[.*?]]', '\n'.join(lines)):
        m = m[2:-2]
        if m.startswith('!redirects'):
            redirects[m.removeprefix('!redirects').strip()] = name
        elif not m.startswith('!'):
            if len(data[name]) >= 1:
                continue
            if m.lower().startswith('nlab:'):
                m = m[5:]
            if '#' in m:
                m = m.split('#')[0]
            data[name].append(m.split('|')[0].strip())

for k in list(data.keys()):
    # print links that do not exist, for debugging.
    # for link in data[k]:
    #     if link not in redirects:
    #         print(link)
    data[k] = set([redirects[link] for link in data[k] if link in redirects])
    if len(data[k]) == 0 or k in data[k]:  # remove empty nodes and loops
        del data[k]

G = nx.DiGraph(data)
nx.write_graphml(G, 'graph.gml')
nx.write_gexf(G, 'graph.gexf')
nx.nx_agraph.write_dot(G, 'graph.dot')
The code is complicated by the existence of "redirects" and other considerations. For example, the code does not count links before the table of contents or links inside block quotes.
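To make those link-selection rules concrete, here is a minimal sketch that applies them to a single made-up page. The function name `first_link` and the sample text are hypothetical, and the regex only mirrors the one in the script above; it is not nLab's actual parser.

```python
import re


def first_link(source):
    """Return the first wiki link after the table of contents, or None.

    Blockquote lines are dropped, anything before '{:toc}' is ignored,
    and '!'-prefixed directives (such as '!redirects') are skipped.
    """
    lines = [line for line in source.splitlines() if not line.startswith('>')]
    if '{:toc}' in lines:
        lines = lines[lines.index('{:toc}'):]
    for m in re.findall(r'\[\[(.*?)\]\]', '\n'.join(lines)):
        if m.startswith('!'):
            continue
        if m.lower().startswith('nlab:'):
            m = m[5:]
        # drop the '#anchor' part first, then the '|display text' part
        return m.split('#')[0].split('|')[0].strip()
    return None


# Hypothetical page source, written only for this illustration.
sample = """[[!include some sidebar]]
[[ignored link]] appears before the table of contents.
{:toc}
> A quote mentioning [[quoted link]].
[[!redirects example page]]
A sentence linking to [[nLab:category theory#idea|categories]] and [[set]].
"""

print(first_link(sample))  # category theory
```

The link before `{:toc}`, the link inside the blockquote, and the `!redirects` directive are all passed over, so the first counted link is the one to category theory.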
I wrote two functions: one to find the longest simple path (simple in that the path contains only unique vertices), and one to find the strongly connected components that are sinks, in the sense that no vertex in the component links to a vertex outside the component. We can also enumerate all articles from which one of these sinks is reachable.
import heapq

import networkx as nx

G = nx.read_graphml('graph.gml')


def find_longest_path(G):
    def dfs(path):
        best = path.copy()
        for n in G.neighbors(path[-1]):
            if n not in path:
                new = dfs(path + [n])
                if len(new) > len(best):
                    best = new
        return best

    # max-heap via negated lengths; print the three longest paths found
    h = []
    for node in G.nodes():
        path = dfs([node])
        heapq.heappush(h, (-len(path), path))
    print(heapq.heappop(h)[1])
    print(heapq.heappop(h)[1])
    print(heapq.heappop(h)[1])


def find_all_sinks(G):
    sinks = []
    for scc in nx.strongly_connected_components(G):
        for node in scc:
            if any(n not in scc for n in G.successors(node)):
                break
        else:  # no break
            sinks.append(scc)
    for sink in sinks:
        node = list(sink)[0]
        print(sink)
        print(reachable_nodes := {
            n for n in G.nodes()
            if nx.has_path(G, n, node)
        })
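Since the script above keeps only the first link of each article, every vertex has at most one outgoing edge, so a sink can also be found by following links until an article repeats or the chain dead-ends. The sketch below illustrates that idea on a made-up toy mapping; it is an alternative illustration, not the code used for the results, and `follow_first_links` and the names in `toy` are hypothetical.

```python
def follow_first_links(first_link, start):
    """Follow first links from `start`; return (path, sink).

    `first_link` maps an article to the article its first link points to.
    The sink is the cycle the walk falls into, or the single dead-end
    article if the chain runs out of links.
    """
    path, seen = [], {}
    node = start
    while node is not None and node not in seen:
        seen[node] = len(path)  # remember where each article sits in the path
        path.append(node)
        node = first_link.get(node)
    if node is None:
        sink = frozenset(path[-1:])  # dead end: one-article sink
    else:
        sink = frozenset(path[seen[node]:])  # the cycle we re-entered
    return path, sink


# Hypothetical toy data: article -> target of its first link.
toy = {
    'a': 'b',
    'b': 'c',
    'c': 'd',
    'd': 'c',  # c and d form a two-article sink
    'e': 'f',  # f has no outgoing link, so it is a one-article sink
}

for start in ('a', 'e'):
    path, sink = follow_first_links(toy, start)
    print(path, sorted(sink))
```

Starting from `a`, the walk visits `a`, `b`, `c`, `d` and settles in the cycle `{c, d}`; starting from `e`, it stops at the dead end `f`.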
If we examine the sinks, we find that the top 10 account for 93% of the total 18,356 articles. That means that if you pick an article on nLab at random, there is a 93% chance that repeatedly clicking the first link will land you on one of these articles:
- homotopical algebra, higher algebra (7,593 articles lead to these)
- interaction, experimental observation, physics, classical mechanics, measurement, classical physics, observable universe, Lagrangian density (2,853 articles lead to these)
- point, edge, category, vertex, quiver, graph (2,614 articles lead to these)
- topological space, space (1,910 articles lead to these)
- logic, deduction (1,431 articles lead to these)
- computer science, program (243 articles lead to these)
- cartesian product (151 articles lead to these)
- measure space, measure theory, measurable space (124 articles lead to these)
- Alexander Grothendieck, EGA (90 articles lead to these)
- symmetry (88 articles lead to these)
Notably, there is a (surprising) 41% chance that you end up on either homotopical algebra or higher algebra. I did not expect them to rank so highly, given that I had never heard of those subjects before. Apparently homotopical algebra is just homological algebra but in a non-abelian setting.
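As a quick check of the percentages, the arithmetic can be reproduced from the counts listed above. The variable names are just for illustration.

```python
total_articles = 18_356
# articles leading to each of the top 10 sinks, in the order listed above
sink_sizes = [7_593, 2_853, 2_614, 1_910, 1_431, 243, 151, 124, 90, 88]

top_ten_share = sum(sink_sizes) / total_articles
largest_share = sink_sizes[0] / total_articles

print(f"top 10 sinks: {sum(sink_sizes):,} articles ({top_ten_share:.0%})")
# top 10 sinks: 17,097 articles (93%)
print(f"largest sink: {largest_share:.0%}")
# largest sink: 41%
```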
Now, if we take a look at the longest simple path, it runs through an impressive 21 articles, starting from the history of universal homotopy theories and arriving at homotopical algebra:
1. Universal homotopy theories > history
2. Universal Homotopy Theories
3. Daniel Dugger
4. Michael Hopkins
5. Mark Mahowald
6. Haynes Miller
7. elliptic Chern character
8. higher chromatic Chern character
9. Chern character
10. universal characteristic class
11. characteristic class
12. cohomology
13. (infinity,1)-category
14. (n,r)-category
15. higher category theory
16. category theory
17. mathematics
18. philosophy of mathematics
19. philosophy
20. higher algebra
21. homotopical algebra
In fact, there exists only one other path of the same length, which instead arrives at Michael Hopkins through equivariant homotopy theory -- table > Global Homotopy Theory and Cohesion > Charles Rezk > Michael Hopkins.
I tried visualizing the graph with Gephi and other tools, but it's nothing worth showing. No insights to be gleaned that way. It might be interesting to consider not just the first link but, say, the first 20 links; we could then analyze the popularity of an article (with PageRank?), run community detection algorithms, or analyze various centrality metrics.