APEX Updates, 4: The Pipeline Problem: Computer Vision and the Limits of Manual Tracing

This inscription, the Poteidaia epigram (IG I³ 1179, CEG 1, no. 10), took me 1.5 hours to trace.
It contains approximately 260 characters, which works out to roughly one character every 20 seconds.
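For a sense of scale, the rate arithmetic is easy to script. The sketch below only restates the Poteidaia figures (1.5 hours, about 260 characters). The helper names are hypothetical, and any projection assumes the per-character rate stays constant, which inscriptions in varying condition probably won't.

```python
# Figures from the Poteidaia tracing: 1.5 hours for roughly 260 characters.
TRACE_HOURS = 1.5
CHAR_COUNT = 260

def seconds_per_char(hours: float, chars: int) -> float:
    """Average manual tracing time per character, in seconds."""
    return hours * 3600 / chars

def projected_hours(chars: int, sec_per_char: float) -> float:
    """Hours needed to trace `chars` characters at a fixed per-character rate."""
    return chars * sec_per_char / 3600

rate = seconds_per_char(TRACE_HOURS, CHAR_COUNT)
print(f"{rate:.1f} s per character")  # 20.8 s per character
```

At that hypothetical fixed rate, `projected_hours(1000, rate)` comes to a little under six hours of tracing.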

After weeks of tracing letterforms by hand, squinting at jagged facsimiles and smoothing them into curves, I've hit a bottleneck. Manual vectorization has given me precision and intimacy with the material, but it isn't sustainable. Each traced letter takes time, care, and a degree of interpretive judgment that doesn't scale easily. I also have a slight hand tremor that has sometimes forced me to rely on line straightening and curve smoothing, which will distort measurements of features such as symmetry and curvature score. As I move toward building a larger corpus, I've had to ask: what's keeping me from working at the scale this project demands?
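To illustrate why smoothing matters for measurement: the post doesn't define how the curvature score is computed, so the sketch below uses a stand-in, the turning angle (change of heading) at each vertex of a traced polyline, and a hypothetical 3-point moving average in place of the app's smoothing. On a sharp right-angle corner, smoothing leaves the total turning unchanged but spreads it across several vertices, so the peak turn drops sharply.

```python
import math

def turning_angles(pts: list[tuple[float, float]]) -> list[float]:
    """Absolute change of heading, in degrees, at each interior vertex of a polyline."""
    angles = []
    for a, b, c in zip(pts, pts[1:], pts[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])  # heading of segment a->b
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])  # heading of segment b->c
        d = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
        angles.append(abs(math.degrees(d)))
    return angles

def moving_average(pts: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Hypothetical stand-in for app smoothing: 3-point moving average, endpoints fixed."""
    if len(pts) < 3:
        return list(pts)
    inner = [((a[0] + b[0] + c[0]) / 3, (a[1] + b[1] + c[1]) / 3)
             for a, b, c in zip(pts, pts[1:], pts[2:])]
    return [pts[0], *inner, pts[-1]]

# A sharp right-angle corner, like the junction of two strokes.
corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
raw = turning_angles(corner)
smoothed = turning_angles(moving_average(corner))

print(f"total turning: {sum(raw):.1f} vs {sum(smoothed):.1f} deg")
print(f"peak turning:  {max(raw):.1f} vs {max(smoothed):.1f} deg")
# total turning: 90.0 vs 90.0 deg
# peak turning:  90.0 vs 36.9 deg
```

Which number a curvature score tracks (total, peak, or something else) decides how badly smoothing distorts it.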

The answer, in short, is the tracing pipeline. My current workflow looks like this:

  1. Scan the inscription facsimile (mostly from IG, plus some drawings from my 2022 semester in Athens)
  2. Import into Adobe Fresco on my iPad
  3. Trace each letter manually, often with correction enabled due to an unsteady hand
  4. Export as SVG
  5. Import into a Python program for analysis with OpenCV
  6. Export measured features to JSON
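Steps 4 through 6 can be sketched end to end. The actual program uses OpenCV; the version below uses only the standard library, and it assumes (hypothetically) that each traced letter is exported as an SVG `<polyline>` with an `id`. A real Fresco export may instead use `<path>` elements with curve commands, which would need a proper path parser. The feature names are illustrative, not the project's schema.

```python
import json
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def parse_points(attr: str) -> list[tuple[float, float]]:
    """Parse an SVG `points` attribute such as '0,40 10,0 20,40'."""
    nums = [float(v) for v in attr.replace(",", " ").split()]
    return list(zip(nums[0::2], nums[1::2]))

def letter_features(pts: list[tuple[float, float]]) -> dict:
    """Hypothetical example features: vertex count and bounding-box geometry."""
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return {
        "n_points": len(pts),
        "width": width,
        "height": height,
        "aspect_ratio": width / height if height else None,
    }

def svg_to_features(svg_text: str) -> list[dict]:
    """Read every <polyline> in an SVG document and measure it."""
    root = ET.fromstring(svg_text)
    records = []
    for i, poly in enumerate(root.iter(f"{{{SVG_NS}}}polyline")):
        pts = parse_points(poly.get("points", ""))
        if not pts:
            continue  # skip empty strokes
        record = {"id": poly.get("id", f"stroke_{i}")}
        record.update(letter_features(pts))
        records.append(record)
    return records

svg = '<svg xmlns="http://www.w3.org/2000/svg"><polyline id="alpha_1" points="0,40 10,0 20,40"/></svg>'
features = svg_to_features(svg)
print(json.dumps(features, indent=2))  # step 6: features as JSON
```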

This essentially works, but it’s fragile. It depends on my eyesight, my steadiness, and my judgment. More importantly, it doesn’t scale. To move beyond 50 or so well-documented instances, I need to automate at least part of this process.

I’ve brainstormed a few approaches:

  • Edge detection + curve fitting using OpenCV and Potrace
  • Image preprocessing to isolate ink
  • Eventually, interactive labeling that lets a human confirm or correct bounding boxes and centerlines before full vectorization
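For the ink-isolation step, one common starting point is global thresholding with Otsu's method, which picks the gray level that best separates dark ink from a light background. In OpenCV this is `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)`. The pure-Python sketch below shows what that computes, assuming 8-bit grayscale values and dark ink on light paper. Worn or unevenly lit facsimiles may need adaptive thresholding instead.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the 0-255 threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum at or below t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2  # proportional to between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def ink_mask(pixels: list[int], t: int) -> list[bool]:
    """True where a pixel is dark enough to count as ink (at or below the threshold)."""
    return [p <= t for p in pixels]

# Toy data: two dark ink levels and two light paper levels.
pixels = [20] * 30 + [30] * 10 + [200] * 50 + [220] * 10
t = otsu_threshold(pixels)
mask = ink_mask(pixels, t)
print(f"threshold={t}, ink pixels={sum(mask)}")  # threshold=30, ink pixels=40
```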

So far, nothing replaces the hand trace. However, I'm refining the steps so that a machine can at least propose a first draft: normalizing resolution, simplifying contours (overcounting has surprisingly been a major problem, even with hand vectorization), and reducing noise. Once I trust the pipeline, I can begin comparing letters in bulk, but not until then.
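Two of those refinements can be sketched in a few lines. Normalizing to a common letter height makes measurements comparable across scans of different resolution. Ramer-Douglas-Peucker simplification drops vertices that lie within a tolerance `epsilon` of a straighter line, which is one way to curb surplus contour points (OpenCV's `cv2.approxPolyDP` implements the same algorithm). This is a pure-Python sketch, and the tolerance is a hypothetical value that would need tuning against hand traces.

```python
import math

def normalize_height(pts: list[tuple[float, float]], target: float = 1.0) -> list[tuple[float, float]]:
    """Translate to the bounding-box origin and scale so the letter's height equals `target`."""
    min_x = min(x for x, _ in pts)
    min_y = min(y for _, y in pts)
    height = max(y for _, y in pts) - min_y
    scale = target / height if height else 1.0
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in pts]

def perpendicular_distance(p, a, b) -> float:
    """Distance from point p to the infinite line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    return abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / math.dist(a, b)

def rdp(pts: list[tuple[float, float]], epsilon: float) -> list[tuple[float, float]]:
    """Ramer-Douglas-Peucker: recursively keep the vertex farthest from the chord if it exceeds epsilon."""
    if len(pts) < 3:
        return list(pts)
    a, b = pts[0], pts[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(pts) - 1):
        d = perpendicular_distance(pts[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [a, b]  # every intermediate vertex is within tolerance
    left = rdp(pts[: index + 1], epsilon)
    right = rdp(pts[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split vertex

# A slightly wobbly horizontal stroke that turns a corner.
stroke = [(0, 0), (1, 0.02), (2, -0.01), (3, 0.01), (4, 0), (4, 1), (4, 2)]
print(rdp(stroke, epsilon=0.1))            # [(0, 0), (4, 0), (4, 2)]
print(normalize_height([(2, 3), (2, 5)]))  # [(0.0, 0.0), (0.0, 1.0)]
```

Applied after `normalize_height`, `epsilon` becomes a fraction of letter height, so a single tolerance can serve scans of any resolution.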

For now, I'm working under this model, both as a proof of concept and because I have a hard deadline: an MVP (minimum viable product) is due on April 25th for my final project in my Data Science for Archaeology class. (That's in NYU's anthropology department, for anyone curious.) That constraint is shaping my whole approach: what gets prioritized, what gets cut, and how I balance methodological ideals against the practical demands of execution.

In the next post, I’ll zoom out from metadata and back into morphology—not through computation just yet, but through design. This detour will help us begin to operationalize high-level concepts like complexity and similarity—ideas that seem intuitive at first glance, but quickly reveal their computational thorns. APEX Updates, 5 will explore what I’m calling the “Geometric Mindset”: the tendency toward symmetry, regularity, and visual balance that emerges in early Greek inscriptions. What kinds of shapes did Greek scribes favor? What does it mean to “correct” a letter? And how might a cultural aesthetic of order and legibility leave its mark on the alphabet itself?
