Every big project starts with a deceptively small question. For me, it was: how do you turn a carved letter into data?
APEX (Alphabetic Paleography Explorer) is my attempt to map how the Greek alphabet developed and spread—first across Greek-speaking regions, then into other scripts entirely. But before I can compare, model, or visualize anything, I need something more fundamental: a dataset that doesn’t just record letters, but understands them. That’s where things get tricky.
Step 0: Drawing the Inscriptions


Most corpora don’t offer clean, high-res images. They give us facsimiles—drawn reconstructions, often made by epigraphers decades ago. I tried using automated skeletonization on those, but the results were messy and inconsistent. So I went manual: scanning documents and tracing letters by hand on my iPad.
It’s slow. But it gives me clean, consistent vector forms that reflect how letters were actually drawn—and forces me to look closely at every curve, stroke, and variation. In a sense, this is my own kind of excavation.
What I Track
Each inscription gets logged with basic info: where it was found, what it was written on, when it was made (as best we can tell), and how damaged it is. But the real heart of the project is the letters.
For each character, I record:

- Visual traits (curvature, symmetry, stroke count, proportions)
- Layout (spacing, alignment, writing direction)
- Function (sound value, graphemic identity)
- Notes on ambiguity or damage
From this, I can start comparing how different regions handled the same letter—Did their rho have a loop? Was their epsilon closed?—and whether that tells us something about cultural contact or local invention.
The Workflow
The data entry pipeline looks like this:
- Scan + trace the letterform
- Enter the inscription’s metadata
- Manually mark letter positions and reading direction
- Extract geometric features automatically
- Save everything as structured, nestable JSON
It’s part computer vision, part field notes, and part quiet staring at a very old alpha until you start to feel like it’s looking back.
Why This Level of Detail?
Because I want to ask big questions—how alphabets travel, which paths are innovations vs. imitations—but I don’t want to ask them fuzzily. Too much work on writing systems either leans purely qualitative or strips out the messiness for the sake of clean data. APEX is an attempt to hold both: interpretive richness and formal structure.
This dataset—AlphaBase, soon to be expanded to other open-access museum collections and public domain corpora—is the scaffolding. It’s how I’ll test transmission models later on. But even on its own, it’s already revealing things—like which letterforms stay stable across centuries, and which are quick to splinter under pressure.
APEX begins here: not with theory, but with tracing. With building a system that doesn’t just store letterforms, but actually listens to what they’re doing. That’s what this first trench is for. Now I get to start digging.
Leave a comment