Tag: APEX Updates

  • APEX Updates, 4: The Pipeline Problem: Computer Vision and the Limits of Manual Tracing

    This inscription, the Poteidaia epigram (IG I³ 1179, CEG 1, no. 10), took me 1.5 hours to trace.
    It contains approximately 260 characters, which works out to roughly one character every 20 seconds.

    After weeks of tracing letterforms by hand—squinting at jagged facsimiles and smoothing them into curves—I’ve hit a bottleneck. Manual vectorization has given me precision and intimacy with the material, but it’s not sustainable. Each traced letter takes time, care, and a degree of interpretive judgment that can’t be scaled easily. I also have a slight hand tremor that sometimes forces me to rely on line straightening and curve smoothing, which inevitably distorts the measurement of features such as symmetry and curvature. As I move toward building a larger corpus, I’ve had to ask: What’s keeping me from working at the scale this project demands?

    The answer, in short, is the tracing pipeline. My current workflow looks like this:

    1. Scan the inscription facsimile (mostly from IG, plus some drawings from my 2022 semester in Athens)
    2. Import into Adobe Fresco on my iPad
    3. Trace each letter manually, often with correction enabled due to an unsteady hand
    4. Export as SVG
    5. Import into a Python program for analysis with OpenCV
    6. Export measured features to JSON

    This essentially works, but it’s fragile. It depends on my eyesight, my steadiness, and my judgment. More importantly, it doesn’t scale. To move beyond 50 or so well-documented instances, I need to automate at least part of this process.
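
    For concreteness, here is a minimal sketch of what steps 5 and 6 amount to, assuming each traced letter has already been rasterized from its SVG into a binary image (white glyph on a dark background). The filename and the particular features shown are illustrative, not my actual schema:

    ```python
    # Sketch of steps 5-6: measure a rasterized letter with OpenCV, write JSON.
    # Filenames and feature names are illustrative.
    import json
    import math

    import cv2

    def measure_letter(png_path: str) -> dict:
        img = cv2.imread(png_path, cv2.IMREAD_GRAYSCALE)
        _, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

        # Treat the largest external contour as the letter's outline.
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        outline = max(contours, key=cv2.contourArea)

        x, y, w, h = cv2.boundingRect(outline)
        area = cv2.contourArea(outline)
        perimeter = cv2.arcLength(outline, True)

        return {
            "source": png_path,
            "aspect_ratio": w / h,
            # Compactness is 1.0 for a circle and grows for jagged or elongated shapes.
            "compactness": (perimeter ** 2) / (4 * math.pi * area) if area else None,
            "contour_points": len(outline),
        }

    if __name__ == "__main__":
        features = measure_letter("alpha_poteidaia.png")
        with open("alpha_poteidaia.json", "w") as f:
            json.dump(features, f, indent=2)
    ```

    Even a toy version like this makes the fragility obvious: a single threshold value gets to decide what counts as ink.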

    I’ve brainstormed a few approaches:

    • Edge detection + curve fitting using OpenCV and Potrace
    • Image preprocessing to isolate ink
    • Eventually: Interactive labeling that lets a human confirm or correct bounding boxes and centerlines before full vectorization

    So far, nothing replaces the hand trace. However, I’m refining the steps—normalizing resolution, simplifying contours (overcounting has, surprisingly, been a major problem even with hand-vectorization), and reducing noise—so that a machine can at least propose a first draft. Once I trust the pipeline, I can begin comparing letters in bulk—but not until then.
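
    To make that concrete, here is a rough sketch of the kind of first draft I would want the machine to propose, stitching together the approaches listed above with OpenCV. The parameter values are placeholders to be tuned against real facsimiles, not settled choices:

    ```python
    # Sketch of an automated "first draft": normalize resolution, reduce noise,
    # isolate ink, then simplify the resulting contours. Parameters are guesses.
    import cv2

    def propose_outlines(facsimile_path: str, target_height: int = 1000):
        img = cv2.imread(facsimile_path, cv2.IMREAD_GRAYSCALE)

        # Normalize resolution so thresholds and tolerances mean the same thing
        # across scans made at different resolutions.
        scale = target_height / img.shape[0]
        img = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)

        # Knock down scanner noise while keeping stroke edges reasonably sharp.
        img = cv2.medianBlur(img, 5)

        # Isolate "ink" from background, then trace contours around it.
        binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                       cv2.THRESH_BINARY_INV, 31, 10)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

        # Fight point overcounting: simplify each contour with a tolerance set as
        # a small fraction of its own perimeter.
        simplified = []
        for contour in contours:
            epsilon = 0.01 * cv2.arcLength(contour, True)
            simplified.append(cv2.approxPolyDP(contour, epsilon, True))
        return simplified
    ```

    A human still has to accept or reject whatever comes out of this, which is where the interactive labeling idea comes in.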

    For now, I’m working under this model both as a proof of concept and because I have a hard deadline: an MVP (minimum viable product) is due on April 25th for my final project in my Data Science for Archaeology class. (That’s in NYU’s Anthro department, for anyone curious). That constraint is shaping my whole approach—what gets prioritized, what gets cut, and how I balance the methodological ideals with the practical demands of execution.

    In the next post, I’ll zoom out from metadata and back into morphology—not through computation just yet, but through design. This detour will help us begin to operationalize high-level concepts like complexity and similarity—ideas that seem intuitive at first glance, but quickly reveal their computational thorns. APEX Updates, 5 will explore what I’m calling the “Geometric Mindset”: the tendency toward symmetry, regularity, and visual balance that emerges in early Greek inscriptions. What kinds of shapes did Greek scribes favor? What does it mean to “correct” a letter? And how might a cultural aesthetic of order and legibility leave its mark on the alphabet itself?

  • APEX Updates, 2: What is the Transmission Problem? A Brief History of My Research Question

    If the first APEX post was about tracing letters, this one is about why those traces matter. Underneath every variant alpha or eccentric epsilon is a deeper question: when, how, and under what conditions did the Greek alphabet emerge from its West Semitic predecessor? This question, which is known in the scholarship as the transmission problem, lies at the core of alphabetic studies, and despite over a century of scholarship, it remains fiercely contested. To map alphabetic transmission is not just to track graphical similarity, but to reckon with how cultures borrow, adapt, forget, and reimagine the systems by which they make language visible.

    At its simplest, the transmission problem asks: When did the Greeks adopt the Phoenician script? But the real terrain is messier. Did the transfer happen once or multiple times? Was it sudden or gradual? Coordinated or ad hoc? Which region of the Greek-speaking world was first? Exactly which Semitic script was the donor—or was there a confluence of models? And what kind of evidence—linguistic, paleographic, archaeological—should we privilege when our sources conflict?

    Historically, the debate has followed disciplinary lines. Scholars trained in Semitic philology and Near Eastern studies tend to favor a high date for the transmission: sometime in the 11th or 10th century BCE, before the traditional Greek Geometric period (in older scholarship, referred to as the “Greek Dark Age”). This camp emphasizes the strong formal similarities between early Greek and Phoenician letterforms, arguing that Greek epichoric scripts most closely resemble Phoenician forms from around 1050 BCE, not the later shapes one would expect if transmission occurred in the 8th century. Joseph Naveh, for instance, in his landmark Early History of the Alphabet (1982), argued that the Greek system must have branched off before major innovations appear in the Phoenician script, such as the angular mem or evolved forms of shin. Naveh saw the Greek alphabet as a snapshot of an earlier Semitic system—evidence, in his view, of early contact and early borrowing.

    On the other side of the debate, Classicists and archaeologists tend to argue for a low date, favoring the 8th century BCE. Their reasoning draws primarily from stratified archaeological contexts: the earliest securely datable Greek inscriptions—such as the Dipylon oinochoe and the Nestor’s Cup from Pithekoussai—belong to the mid-to-late 8th century. Rhys Carpenter was among the earliest and most forceful voices in this camp. In a 1933 article, he wrote that “the argumentum a silentio grows every year more formidable and more conclusive,” referring to the continued absence of any Greek alphabetic inscriptions predating the eighth century (“The Antiquity of the Greek Alphabet,” AJA 37 [1933]: 8–29, at p. 27). For Carpenter, the lack of material evidence was not a gap to be explained away, but itself a powerful datum: if earlier use had existed, we would likely have found traces by now.

    This school is generally skeptical of typological comparison, pointing out that letterforms evolve unevenly and can be conservative in certain contexts. Archaeological absence, while never conclusive, is taken seriously—especially when paired with the sudden, near-simultaneous appearance of inscriptions across disparate sites in the 8th century, suggesting a relatively rapid uptake of a recently acquired script. Later scholars, such as Barry B. Powell, built on this foundation. In Homer and the Origin of the Greek Alphabet (1991), Powell controversially argued that the Greek alphabet was deliberately invented for the purpose of recording Homeric verse, dating the invention to around 750 BCE. Though widely criticized for its teleology and lack of evidence for such a top-down design, Powell’s theory exemplifies the kind of interdisciplinary crossfire that defines this problem: where linguistic function, archaeological data, and cultural ideology all collide.

    Roger D. Woodard, in Greek Writing from Knossos to Homer (1997), pushed back against Powell while still supporting a relatively late date. Woodard views the alphabet’s adaptation as a process shaped not only by contact with Phoenician traders but also by internal Greek developments—especially the memory of Linear B and broader shifts in literacy practices. He emphasizes the complex interplay between tradition and innovation, seeing the Greek vowel system as a structural solution that could only emerge in a linguistic environment receptive to phonological precision.

    The question remains open, but APEX offers a different kind of approach. Rather than anchoring the debate to a single origin point, I focus on regional trajectories and graphical evidence: how letterforms vary, travel, and settle. If the Semitic party line reads the Greek alphabet as a photograph of Phoenician forms from 1000 BCE, and the archaeological model sees it as an emergent public tool of the 8th century, then I want to understand how specific graphemes move through space and time. Which forms remain stable across centuries? Which mutate rapidly? And what can that tell us about the process of transmission, rather than the moment of origin?

    In fact, the most immediate goal of the APEX project is to evaluate whether the Greek letterforms do, in fact, most closely resemble the Semitic models from around 1000 BCE—as the high-date camp maintains—or if their nearest parallels lie elsewhere in the Phoenician typology. The intention is to move beyond qualitative comparisons and scholarly intuition, toward a quantitative, statistically grounded assessment of letterform similarity. By measuring and modeling these visual relationships systematically, APEX aims to provide a more objective foundation for dating the moment of greatest resemblance between the Greek and Phoenician scripts.
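
    As a deliberately toy illustration of where this is headed (the feature values below are invented, and the real work lies in choosing and validating the features), the comparison could eventually look something like asking which dated set of Phoenician forms sits nearest to a Greek exemplar in feature space:

    ```python
    # Toy illustration only: which period's (invented) mean feature vector lies
    # closest to a Greek exemplar? Features here: symmetry, curvature, stroke count.
    import numpy as np

    phoenician = {
        "c. 1050 BCE": np.array([0.82, 0.05, 3.0]),
        "c. 900 BCE":  np.array([0.76, 0.09, 3.0]),
        "c. 750 BCE":  np.array([0.64, 0.14, 4.0]),
    }
    greek_alpha = np.array([0.80, 0.06, 3.0])

    closest = min(phoenician, key=lambda period: np.linalg.norm(phoenician[period] - greek_alpha))
    print(f"Nearest Phoenician model: {closest}")
    ```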

    Rather than jumping straight into letterform similarity metrics, though, the next update will take a detour—one that’s no less crucial. Before the vectors can speak, they must be named, contextualized, and organized. APEX Updates, 3: Encoding Decisions will explore how I’m structuring the metadata that surrounds each traced letter: what counts as “context,” how information is tagged, and why every dataset is also a narrative. As it turns out, deciding how to describe a letter may be just as revealing as deciding how to compare it.

  • APEX Updates, 1: Building a Dataset

    Every big project starts with a deceptively small question. For me, it was: how do you turn a carved letter into data?

    APEX (Alphabetic Paleography Explorer) is my attempt to map how the Greek alphabet developed and spread—first across Greek-speaking regions, then into other scripts entirely. But before I can compare, model, or visualize anything, I need something more fundamental: a dataset that doesn’t just record letters, but understands them. That’s where things get tricky.

    Step 0: Drawing the Inscriptions

    Most corpora don’t offer clean, high-res images. They give us facsimiles—drawn reconstructions, often made by epigraphers decades ago. I tried using automated skeletonization on those, but the results were messy and inconsistent. So I went manual: scanning documents and tracing letters by hand on my iPad.

    It’s slow. But it gives me clean, consistent vector forms that reflect how letters were actually drawn—and forces me to look closely at every curve, stroke, and variation. In a sense, this is my own kind of excavation.
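
    For anyone curious what that automated attempt looked like, skeletonization amounts to only a few lines; this is a sketch using scikit-image, with a placeholder filename:

    ```python
    # Sketch of the skeletonization route I tried first: threshold a scanned
    # facsimile and reduce each stroke to a one-pixel-wide centerline.
    import cv2
    from skimage.morphology import skeletonize

    scan = cv2.imread("facsimile_page.png", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(scan, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    skeleton = skeletonize(binary > 0)  # boolean array of stroke centerlines
    ```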

    What I Track

    Each inscription gets logged with basic info: where it was found, what it was written on, when it was made (as best we can tell), and how damaged it is. But the real heart of the project is the letters.

    For each character, I record:

    • Visual traits (curvature, symmetry, stroke count, proportions)
    • Layout (spacing, alignment, writing direction)
    • Function (sound value, graphemic identity)
    • Notes on ambiguity or damage
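
    As a hypothetical example (field names and values here are illustrative, not the actual schema), a single character record might look something like this:

    ```python
    # Hypothetical character record; field names and values are illustrative only.
    import json

    record = {
        "grapheme": "alpha",
        "sound_value": "/a/",
        "inscription_id": "IG_I3_1179",  # placeholder identifier
        "visual": {
            "stroke_count": 3,
            "symmetry": 0.82,            # 0-1, higher = more symmetric
            "curvature": 0.05,           # near 0 = mostly straight strokes
            "width_height_ratio": 0.74,
        },
        "layout": {
            "direction": "retrograde",
            "line": 2,
            "position_in_line": 14,
        },
        "notes": "Right leg partially damaged; apex angle uncertain.",
    }

    print(json.dumps(record, indent=2, ensure_ascii=False))
    ```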

    From this, I can start comparing how different regions handled the same letter—Did their rho have a loop? Was their epsilon closed?—and whether that tells us something about cultural contact or local invention.

    The Workflow

    The data entry pipeline looks like this:

    1. Scan + trace the letterform
    2. Enter the inscription’s metadata
    3. Manually mark letter positions and reading direction
    4. Extract geometric features automatically
    5. Save everything as structured, nested JSON

    It’s part computer vision, part field notes, and part quiet staring at a very old alpha until you start to feel like it’s looking back.
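
    To give a flavor of the computer-vision half, here is one way the symmetry trait could be scored automatically: mirror the letter’s binary image about its vertical axis and measure how much of the ink overlaps with itself. A sketch of the idea rather than a final metric:

    ```python
    # Sketch of a vertical-symmetry score: fraction of ink pixels that coincide
    # with their mirror image when the letter is flipped left-right.
    import cv2
    import numpy as np

    def vertical_symmetry(binary: np.ndarray) -> float:
        mirrored = cv2.flip(binary, 1)  # flip about the vertical axis
        ink = binary > 0
        overlap = np.logical_and(ink, mirrored > 0)
        return float(overlap.sum() / ink.sum()) if ink.any() else 0.0
    ```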

    Why This Level of Detail?

    Because I want to ask big questions—how alphabets travel, which developments are innovations and which are imitations—but I don’t want to ask them fuzzily. Too much work on writing systems either leans purely qualitative or strips out the messiness for the sake of clean data. APEX is an attempt to hold both: interpretive richness and formal structure.

    This dataset—AlphaBase, soon to be expanded to other open-access museum collections and public domain corpora—is the scaffolding. It’s how I’ll test transmission models later on. But even on its own, it’s already revealing things—like which letterforms stay stable across centuries, and which are quick to splinter under pressure.

    APEX begins here: not with theory, but with tracing. With building a system that doesn’t just store letterforms, but actually listens to what they’re doing. That’s what this first trench is for. Now I get to start digging.

  • Introduction (Pinned)

    Welcome to To Wake the Dead — a public research journal by Theo Avedisian.

    I study linguistics & archaeology at NYU, where I also run the League of Linguistics. I’m interested in how ancient languages and scripts evolve—how they’re shaped by material practices and continue to speak across space and time. This blog is a place for me to think aloud and document as I work across Greek, Akkadian, Latin, Phoenician, and French; build tools for studying writing systems; and reflect on the messier, more personal side of learning things slowly and deeply. Generally a record of mind, not of life.

    All writing and research shared here represent my own independent work and views. They are not reviewed, endorsed, or representative of any institution with which I am, or have been, affiliated.

    If you’re new here

    These are a few posts that capture both halves of my project—how I think about things and what I’m trying to build.

    Personal reflections:

    Research & method:

    Series

    The Tritropic Line
    Reflections on reading Homer’s Odyssey in three languages—Greek, French (Bérard), and English (Loeb series, Murray). This combines language study and comparative poetics with the slow joy of close reading.

    Tablets and Tribulations
    A record of my work with Akkadian, now in its third semester. Named with as much reverence as chagrin.

    APEX Updates
    This is about my current research project on alphabetic transmission and paleography—mostly Greek and Phoenician. It includes progress notes, technical experiments, and the occasional map or dataset that cooperates. More process-oriented than the dedicated project site.

    Adventures in Materiality
    Here I document my experiments in carving, molding, inscribing, and replicating artifacts. The work is messy, and that’s the point.

    Linguistics for All
    Posts rooted in the events and conversations I help organize, especially through the NYU League of Linguistics. A mix of accessible theory, reflections on public linguistics, and notes on language’s role in community.

    Tools of the Trade
    Every so often, I write about a tool that has helped me read, write, map, or parse. This could be a corpus, a piece of software, or just a clever work-around I’ve devised. One upcoming project: online flashcards of Latin terms found in inscription commentary, making corpora more accessible for non-specialists.

    The Close Read
    Wherein I do a deep dive into a piece of literature, with the occasional work of nonfiction as well. A fair bit of poetry, as it lends itself to my style.

    Marginalia
    A space for stray thoughts, reflections on studying dead languages as a living person, and the emotional archaeology that sometimes comes with long-term projects.

    This site is where I work in public—testing ideas, gathering feedback, and learning how attention itself becomes a method. Thanks for reading.

    —T

    Picture: Athens, 2021. Birthplace of my epigraphic obsession.