  • APEX Updates, 14: Data, from FAIR to FRAIL

    Clarice from Calvino’s Invisible Cities, as drawn by Karina Puente.

    In the last decade the digital humanities have built an ethics of stewardship around two frameworks: FAIR and CARE.

    Data, we’re told, should be Findable, Accessible, Interoperable, Reusable; its use should uphold Collective benefit, Authority to control, Responsibility, and Ethics. These principles have given structure to a once-couture, even cowboy, practice. They taught us that visibility is a virtue, that openness can be an act of justice. They made data management legible—something one could rate, certify, or defend.

    Yet legibility is never neutral. FAIR presumes that clarity is the highest good; CARE assumes that control can be cleanly assigned. Both, however gently, rest on the dream of completeness: that if we organize our data well enough, we might finally see the whole.

    APEX lives where that dream dies. The inscriptions I trace resist closure. They are fragmentary, re-inscribed, half-lost. Every dataset carries the tremor of its source—a chipped delta, a missing ‘alep, a surface that refuses to yield. The data, like the stones themselves, is frail.

    I’ve begun to imagine a third paradigm: one that keeps FAIR’s discipline and CARE’s ethics but admits that in the humanities, stability is fictional. Call it FRAIL: Findable, Reproducible, Accountable, Interpretive, and Liminal.

    1. Findable—disappearance helps no one.
    2. Reproducible—others should be able to retrace our steps, even if they find another path.
    3. Accountable—provenance and responsibility cannot be dispensed with.
    4. Interpretive—ambiguity, when recorded, becomes part of the evidence itself.
    5. Liminal—some knowledge dwells on thresholds: certainty and speculation, artifact and idea.

    FRAIL doesn’t replace FAIR or CARE but grows from them. It asks what stewardship looks like when the object of study is itself uncertain, when our task is to hold the fragment without pretending it is whole.

    At this point I keep returning to Calvino’s Invisible Cities. In “Cities and Names 4,” he writes of Clarice, a city that forever rebuilds itself from the shards of its earlier selves:

    “Only this is known for sure: a given number of objects is shifted within a given space, at times submerged by a quantity of new objects, at times worn out and not replaced; the rule is to shuffle them each time, then try to assemble them. Perhaps Clarice has always been only a confusion of chipped gimcracks, ill-assorted, obsolete.”

    Clarice is every archive we have ever built. Its fragments persist, rearranged with each generation, their order provisional, their meaning renewed by use. FRAIL data lives in that same condition: never whole, yet never lost—structures of care built from what survives. The humanities have always been a discipline of rebuilding Clarice.

    To keep data FRAIL is therefore not to weaken it but to recognize its true strength: the capacity to bear transformation without disowning its past. Rigor becomes a form of tenderness. Reproducibility includes hesitation. The dataset, like the inscription, becomes layered, self-aware, and open to rereading.

    In APEX I try to move toward that kind of data: technically precise yet narratively honest, transparent about its mediation, willing to show its seams. The goal isn’t immortality but traceability—to make each decision legible without pretending it ends the story.

    Perhaps that is what stewardship finally means: not to eliminate fragility, but to hold it safely, as one holds a fragment of Clarice—knowing it has already been broken, and still believing it can be assembled again.

  • APEX Updates, 13: Designing in the Shadows

    An example of gold lacquerware: it loses something in the harsh light of the catalogue.

    When I was fifteen, I read Jun’ichirō Tanizaki’s In Praise of Shadows without knowing how to name the unease it stirred. I didn’t yet have a project or a discipline, only a sense that technology could be moral in its texture. Even a lamp carries a worldview. Every act of design answers a metaphysical question: what do we think knowledge should look like?

    What would a washing machine look like if designed from within a Japanese sensibility rather than imported Western logic? A fountain pen? A bathroom?

    Years later, as I train segmentation models and debug recursive loops, I keep thinking about those questions. I’ve started to wonder what a machine built from a Middle Eastern sensibility would look like—what it would mean to design software from within the epistemic lineage that produced the alphabets I study, and to work from my own heritage as a Middle Eastern researcher.

    Shadow-Aware Programming

    So much of American computer science assumes that clarity is the highest good: explicit is better than implicit, errors must be handled, uncertainty resolved. It’s a worldview of grids, graphs, and proofs, of light without shadow. Even our metaphors for computation—pipelines, flows, stacks—presume continuity and containment.

    The world APEX studies was never built that way. The alphabets that seeded Greek writing came from cultures that held multiplicity as a form of precision. Meaning could live in the margin, in the half-seen ligature, in the polyvalence of a single sign. In those traditions, opacity was structure. The text was not a window. Knowledge emerged through relation rather than reduction.

    When I look at my code through that lens, I see just how Western its instincts are. Every line by its nature insists on disambiguation; every model optimizes for convergence. What would it mean to build a system that allowed divergence to count as truth?

    A Patinated Algorithm

    APEX has been my laboratory for this question. Each stage of the pipeline forces me to choose between clarity and care.

    American frameworks reward closure: every function must return a value, every process must resolve. But paleography resists closure. The most honest state is, often, a definite maybe. I’ve built APEX’s schemas to accommodate that—to let “uncertain,” “variant,” and “disputed” be valid outputs, not placeholders for failure.
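
    A schema of this kind might be sketched as follows. This is a minimal illustration, not APEX's actual schema: the class names, the field names, the `CERTAIN` status, and the example identifier are all assumptions; only the "uncertain," "variant," and "disputed" labels come from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReadingStatus(Enum):
    """Hypothetical status vocabulary; CERTAIN is an added assumption."""
    CERTAIN = "certain"
    UNCERTAIN = "uncertain"
    VARIANT = "variant"
    DISPUTED = "disputed"


@dataclass
class Reading:
    letter: str            # proposed identification, e.g. "delta"
    status: ReadingStatus
    proposed_by: str       # who holds this reading
    note: str = ""


@dataclass
class GlyphRecord:
    glyph_id: str
    readings: list[Reading] = field(default_factory=list)

    def add_reading(self, reading: Reading) -> None:
        # Readings accumulate; a later reading never overwrites an earlier one.
        self.readings.append(reading)

    def is_settled(self) -> bool:
        # Settled only when every recorded reading is certain and they all agree.
        letters = {r.letter for r in self.readings}
        all_certain = all(r.status is ReadingStatus.CERTAIN for r in self.readings)
        return bool(self.readings) and all_certain and len(letters) == 1


# Hypothetical glyph with two competing readings, both kept as valid output.
glyph = GlyphRecord("example-001")
glyph.add_reading(Reading("delta", ReadingStatus.DISPUTED, "editor A"))
glyph.add_reading(Reading("rho", ReadingStatus.VARIANT, "editor B"))
print(glyph.is_settled())
```

    The design choice worth noticing is that nothing here forces a single answer: a record with two disagreeing readings is complete as it stands, and "settled" is just one state among several.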

    The result is a dataset that behaves more like a commentary tradition than a database: multiple voices, layered readings, recursive disagreements. It’s an architecture of coexistence, a Talmudic litigation of love. In its small way, APEX tries to reintroduce that Middle Eastern mode of knowledge: the one that assumes that understanding doesn’t replace mystery but deepens it.

    Inheritance and Interference

    This isn’t a manifesto against computation. It’s a recognition that computation, as I’ve encountered it in the American university system, carries unspoken moral premises: that a problem can be solved, that noise can be filtered, that ambiguity is a bug.

    But the alphabet itself was a product of a world where those premises didn’t hold. The Phoenician scribes who first shaped letters into repeatable forms were negotiating between sound and sign, god and stone. Their writing system was a compromise between the seen and the said.

    When I import their traces into Python scripts and JSON schemas, I feel that interference—the hum between two epistemologies. One seeks light, the other shadow. One builds toward universality, the other toward particularity. APEX lives in that interference pattern. It’s less a reconciliation than a coexistence.

    Learning to Code Otherwise

    Building a model from a Middle Eastern epistemology doesn’t mean using “Eastern data” or aestheticized, exoticized metaphors. It means rethinking what the model owes to its object. It means writing code that holds its own uncertainty—that treats silence, loss, and contradiction as data types.

    I’ve been experimenting with forms of graceful incompletion, so to speak: workflows that stop short rather than forcing a decision, algorithms that surface disagreement instead of averaging it out. I’ve even started thinking about whether uncertainty could be rendered visually—whether the model’s hesitation could be made visible, like a faint shadow under each bounding box. Confidence intervals are a start. I want to take it further.
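
    One possible shape for an algorithm that surfaces disagreement instead of averaging it out is sketched below. It is an illustration under stated assumptions, not the method used in APEX: the function name, the status labels, and the `min_agreement` threshold are hypothetical.

```python
from collections import Counter


def aggregate_readings(votes: list[str], min_agreement: float = 0.75) -> dict:
    """Return a consensus letter only when agreement is strong; otherwise keep every voice.

    `min_agreement` is a hypothetical threshold chosen for illustration.
    """
    if not votes:
        return {"status": "unread", "letter": None, "votes": {}}

    counts = Counter(votes)
    letter, top = counts.most_common(1)[0]
    share = top / len(votes)

    if share >= min_agreement:
        return {"status": "agreed", "letter": letter, "votes": dict(counts)}
    # Stop short: no letter is chosen, and the full tally is preserved.
    return {"status": "disputed", "letter": None, "votes": dict(counts)}


print(aggregate_readings(["delta", "delta", "delta", "delta"]))
print(aggregate_readings(["delta", "rho", "delta", "rho"]))
```

    In the second call the function declines to pick a winner and returns the whole tally, so the disagreement itself travels downstream as data.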

    It feels strange, almost perverse, to build a machine that admits it doesn’t know. But perhaps that’s what ethical technology for the ancient world should look like: something provisional, interpretive, humble enough to remain unfinished.

    The Long Continuity

    Looking back, this thread has been there since the beginning. Encoding Decisions asked how metadata carries ideology. The Pipeline Problem wrestled with the impossibility of full automation. Teaching the Machine to Read turned that impossibility into method—training the computer not just to recognize letters but to inherit human hesitation.

    “Designing in the Shadows” is another turn of that screw. The question now is not how to teach the machine to see, but how to teach it to doubt.

    Maybe that’s what it means to build from the epistemology of the alphabets themselves. Treat uncertainty as the condition of meaning.

    Coda

    Tanizaki wrote that “were it not for shadows, there would be no beauty.”

    In American computer science, we’re taught the opposite: that shadow is error, that beauty lies in perfect visibility.

    But my work lives somewhere in between. Every day I toggle between these grammars of knowing—between the brightness of the machine and the opacity of the inscription.

    If In Praise of Shadows sought a cultural continuity within modernization, perhaps APEX is seeking a moral continuity within computation. I hope, at least, to leave a trace of that other way of seeing—

    A system that illuminates, but never over-illuminates.

    A technology that leaves the dead a little privacy, while still letting them speak.

  • APEX Updates, 12: Teaching the Machine to Read

    The Vision and the Bottleneck

    A manually vectorized, and therefore easily segmented, inscription.

    Hours pass like this: bent over an iPad image, tracing one letter after another. Each alpha, each resh, becomes a small act of care—lines pulled taut, cleaned, bounded. It’s a ritual, somewhere between drawing and deciphering, between study and touch.

    But beneath that ritual lies a bottleneck—the invisible labor of segmentation. Before any computer can analyze a letter, someone has to isolate it: to draw a box, to declare this is glyph and that is background. It’s the unseen threshold of every computational paleography project. The machine cannot learn to read until it can first learn to see.

    That’s the problem and the promise. If I can teach the system to segment inscriptions automatically, with epigraphic precision, the entire pipeline changes. What has so far depended on human dexterity could become scalable without losing rigor. The ambition is simple to state, and enormous to achieve: to automate attention.

    Automating Attention

    Segmentation, in plain terms, means teaching the computer to draw invisible boxes around ancient letters. It sounds trivial, but it’s the hinge between archaeology and AI, between artifact and data. Every analysis I’ve ever run on symmetry, complexity, or transmission rests on that first act of demarcation. Without it, nothing else holds.

    Computational paleography has come far in feature extraction, clustering, and visualization. But segmentation remains human-bound, a couture craft disguised as mechanical preprocessing. To “close the loop”—to move from semi-automated annotation to a genuine vision pipeline—is to let the system begin where I do: by noticing.

    This, then, is the next frontier of APEX: from dexterity to detection.

    Teaching the Machine to See Like an Epigrapher

    Overenthusiastic reading.
    Underenthusiastic detection.

    Segmentation is difficult not because machines are bad at vision, but because ancient writing is not meant for them. Lighting varies; stone gleams or shadows; lead reflects; letters fade into corrosion or merge in ligature. Scripts mutate, overlap, and misbehave. Non-Latin alphabets, especially, resist the tidy categories on which modern OCR depends. The machine expects Helvetica. What it gets is Styra.

    Traditional OCR fails here because it assumes clean, printed forms—typographic regularity, not weathered intent. An epigrapher reads not just the mark but the gesture that made it. To approximate that sensitivity computationally is less about brute accuracy than about modeling discernment.

    At present, APEX’s dataset contains roughly 2,000 glyphs across 50 inscription photos—each traced and annotated by hand. Those drawings are not just data; they are training material. They encode what human attention looks like when applied to ancient form.

    The great tension ahead lies between fidelity and generalization: the need to preserve nuance while building a model that can scale. Building a dataset is a kind of digital fieldwork: slow, repetitive, and quietly devotional.

    Getting closer, but still showing the limits of simply preprocessing and segmenting without ML.

    Building Iteratively

    Phase 1 — Manual Baseline
    Create high-quality, human-annotated bounding boxes—each verified against the traced vectors. Establish a consistent schema: filenames, rights metadata, feature tables. This becomes the ground truth. Approximately 1/5 through my 10k glyph goal.

    Phase 2 — Model Prototyping
    Train a small (YOLOv8 perhaps) segmentation model on the initial 2k glyphs. Evaluate precision and recall. Visualize false positives and negatives. Adjust preprocessing—normalize lighting, sharpen contrast, calibrate thresholds.

    Phase 3 — Iterative Retraining
    Adopt a human-in-the-loop, active-learning cycle. The model proposes boxes and asks for clarification where it needs it; I correct its output, and those corrections return to the model as new training data. Each iteration adds about 1k glyphs in targeted passes and improves both speed and fidelity.

    Phase 4 — Integration with APEX
    Fold the trained segmenter into the APEX engine. Users will be able to run it locally or via API, generating structured, IIIF-ready outputs. The dashboard will visualize segmentation confidence and quality metrics in real time.

    Phase 5 — Cross-Script Generalization
    Extend beyond Greek: Phoenician, Aramaic, Lydian, Coptic, and others. Develop shared feature ontologies for cross-script comparison. The horizon: a universal segmentation model for alphabetic writing.
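
    The evaluation step in Phase 2 could be sketched roughly as below. This is a simplified illustration that scores predicted bounding boxes against hand-annotated ones using intersection over union (IoU) with greedy matching. The 0.5 threshold, the box format, and the function names are assumptions, and a real evaluation of a YOLO-style model would normally rely on the training framework's own metrics.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predicted: list[Box], truth: list[Box], threshold: float = 0.5):
    """Greedily match each predicted box to at most one unmatched ground-truth box."""
    unmatched = list(truth)
    true_pos = 0
    for p in predicted:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= threshold:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predicted) - true_pos   # proposals with no matching glyph
    false_neg = len(unmatched)              # glyphs the model missed
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall, false_pos, false_neg


# Toy boxes for illustration only: one good detection, one spurious, one missed glyph.
truth = [(0, 0, 10, 10), (20, 0, 30, 10)]
predicted = [(1, 0, 10, 10), (50, 50, 60, 60)]
print(precision_recall(predicted, truth))
```

    False positives and false negatives are returned as counts so that the "overenthusiastic" and "underenthusiastic" failure modes can be tracked separately rather than folded into one score.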

    What It Means to Eliminate a Bottleneck

    To automate segmentation is not to replace the human but to amplify what the human can attend to. The dream isn’t delegation but acceleration: letting the machine perform the gestures that would otherwise consume a lifetime, so that we can ask better questions.

    The act of teaching a model to see is, in a way, an act of translation. Between disciplines, between forms of attention. APEX began as a bridge between ancient and digital worlds; this next stage extends that bridge into vision itself.

    If segmentation is the bottleneck, then teaching the machine to see is the act of undamming the throat of the alphabet—to let the dead speak again, not in whispers, but in full computational voice.

  • Marginalia, 7: The Archive’s Great Secret

    These thoughts concern the evolution of knowledge and intellectual culture in the general and abstract. They are reflective of no particular environment.

    We receive our disciplines like walled gardens—beautiful, precise, and fenced. We pad through them carefully, tracing old paths, diligent to not disturb the moss. The longer I study, the more I believe those walls are historical habits, not inevitable truths; refined in their purpose yet always evolving.

    I’ve been getting teary-eyed lately thinking about the great askers in history, our Galileos, Hypatias, Wollstonecrafts, Woolfs. It moves me profoundly, how questioning has incited every movement to remake and renew.

    Knowledge, like water, seeps through citation, conversation, and the quiet generosity of people who share what they’ve made. The digital age didn’t invent that impulse—it just revealed how ancient it was. Copyists in monasteries, scribes at their tables, scholars passing marginalia hand to hand: they were already practicing open access, one leaf at a time. Preservation depends on circulation: a text survives because it’s passed along.

    The archive’s great secret is that it wants to be read.

  • The Close Read, 2: “One Art,” Elizabeth Bishop

    The art of losing isn’t hard to master;
    so many things seem filled with the intent
    to be lost that their loss is no disaster.

    Lose something every day. Accept the fluster
    of lost door keys, the hour badly spent.
    The art of losing isn’t hard to master.

    Then practice losing farther, losing faster:
    places, and names, and where it was you meant
    to travel. None of these will bring disaster.

    I lost my mother’s watch. And look! my last, or
    next-to-last, of three loved houses went.
    The art of losing isn’t hard to master.

    I lost two cities, lovely ones. And, vaster,
    some realms I owned, two rivers, a continent.
    I miss them, but it wasn’t a disaster.

    —Even losing you (the joking voice, a gesture
    I love) I shan’t have lied. It’s evident
    the art of losing’s not too hard to master
    though it may look like (Write it!) like disaster.

    Can loss be mastered, or merely rehearsed?

    Bishop’s villanelle proposes, in a way, that repetition is training: that loss can be practiced, and that form and control can make it bearable. Her composure in the face of grief is compelling; stoicism always tempts the wounded mind. Yet the paradox of tone and form—her unflappable cant, the neat tercets, the refrain that promises discipline where grief ought to exist—is impossible to ignore.

    The form is an argument, and its unraveling coherence speaks to a profound tension. Each recurrence of the refrain weakens its authority, until mastery itself begins to sound like mimicry. The poem’s structure mimics denial. Every return of “The art of losing isn’t hard to master” sounds more and more like self-persuasion than wisdom. There’s this extreme rhetoric of control that is increasingly overtaken by the tremor of what escapes it.

    Form as containment—that’s the key thread here. The villanelle is a form obsessed with return, which makes it an ironic vessel for a poem about moving on. Syntax becomes a kind of fate, in my opinion: by choosing this structure, Bishop cages herself in an inescapable neurosis, no doubt intentionally. As with all forms we cling to—habit, routine, scholarship—it becomes both ritual and trap.

    Her quasi-enjambment, too (“—Even losing you…”), stretches the villanelle almost to breaking—but never beyond, not in her hands. Whether it counts as enjambment at all is debatable. It’s nearly a continuous sentence across a stanza break, even though the prior sentence ends with a period. The em dash unsettles that finality: was the thought complete, or has the speaker decided, mid-breath, that it wasn’t?

    There’s a mirroring at work here. The poem’s discipline enacts the speaker’s composure, yet that same discipline exposes her desperation to stay intact—both for herself and for others. To write a poem is to face inward; to publish it, outward. It is a saving of face and a measured loss of it.

    “Lose something every day” sounds like a rule in a manual—domestic(ated), manageable—this impersonal, authoritative voice distances her from the wrenching-away that is loss, and puts her in the territory of disengagement from on high.

    But note the escalating scale of loss—keys, an hour, a mother’s watch, houses, cities, continents, you. We can read this as a kind of curriculum: a stoic pedagogy that keeps failing upward. First comes the loss of convenience and access—doors that briefly refuse to open—then time itself; then a representation of time, something that both measures duration and embodies continuity, that in two senses keeps time. After that, a door that will never open again; then larger places, perhaps soured by memory (it’s unclear to me what Bishop means by losing “cities” and “continents”); and finally, the addressee—the greatest loss of all, by the poem’s own logic.

    Each stanza revises the promise of “no disaster” until it barely, if at all, convinces. By the end, the loss of the addressee—one person—is weighed against irretrievable things: time, heirlooms, memory embodied in place. Bishop strains believability here; repetition becomes not comfort but corrosion, a gradual wearing away of self.

    When does style stop protecting the self and start exposing it? I’d argue that it is the last line: “though it may look like (Write it!) like disaster.”

    That parenthetical rupture is where the villanelle—and language itself—betrays its own extent. Here, language reaches its limit: the moment Wittgenstein warned of, when the boundary of speech becomes the boundary of world. The form insists on repetition even at the poem’s most tormenting point. It’s the poem’s scream under its breath, the instant Bishop forces herself to name what she cannot rationalize away.

    Even the italics matter. It isn’t “Write it!”—as it would be if rendered simply as an inward thought—but “Write it!”, a specific verbal (but not verb-phrase) imperative. The command exposes writing itself as an act of commitment: to inscribe the unbearable, to fix the truth she can no longer evade. And it’s fitting, as Wilde once said, that “a poet can survive anything but a misprint.”

    Yet we cannot ignore the poet’s agency in that line. It’s also an intrusion of authorial will, Bishop interrupting her own line to compel honesty. The command is both a confession and a flouting of form. It punctures the poem’s staid decorum, revealing that all the earlier composure was scaffolding for this climactic moment. The poem’s grammar at last fractures under the excruciating pressure of declaring that losing the addressee was de minimis.

    To return to the above question: Bishop’s control is exquisite, but her vulnerability is captured perfectly in the syntax: note the doubling of “like,” something readers may gloss over, “autocorrecting” in their brain, but it’s this very stutter that beautifully undoes her mastery. The imperative tone has turned to pleading.

    The art of losing proves to be an art only because it cannot be perfected.


    Close reading itself may be a kind of loss, losing the illusion (delusion?) that meaning is stable, that distance is comfortable. Bishop’s refrain mirrors the scholar’s: returning to the same line, across space or time, until it yields—or refuses to. The art of reading, like that of losing, “isn’t hard to master,” at least until it reaches something too-close, enough to resist impersonal analysis. At that point, one must remain—with the text, with the loss—awake to what cannot be understood without the self.

    Bishop’s villanelle doesn’t close; it circles. The refrain ends where it began, but altered by exposure. Our poet doesn’t teach detachment in this poem, despite appearing to. The didacticism is rather about endurance through excruciating pain—a theme I can’t help but connect to Homer’s Odyssey, to be driven far (ὃς μάλα πολλὰ πλάγχθη) but still moving, even after two decades of loss after loss. The villanelle, then, like the epic, becomes a ritual for staying with pain until it can be metabolized into form—not to escape it, but to give it shape.

  • Marginalia, 6: On the Doubt

    Temple of Apollo Zoster in Vouliagmeni, photographed near the excavation where I worked in 2022—my first experience of how much patience the ground demands. Author photo.

    The trouble of big projects isn’t the beginning, when the spark of the idea sustains you, nor the final stretch, when you have the promise of closure and the satisfaction of naming what you’ve completed. It’s the middle, at least for me. The quiet question arrives: is this worth the hours, the labor, the limited energy I have? I wonder whether I’m still uncovering something. Every metric, every dataset, every traced line starts to blur.

    That question finds me somewhat often—when a script won’t compile, when a line refuses to resolve, when a day’s tracing yields nothing new. It’s never some dramatic collapse, just a slow thinning of conviction.

    I used to take that as failure, as proof that I wasn’t cut out for long projects or that real scholars didn’t feel this way. But doubt, I’ve learned, is part of the method. It means I’ve reached the part of the work where certainty would be dishonest. Projects that matter eventually resist you—they stop confirming your brilliance and start asking what you actually believe in.

    Doubt is the test of whether the initial fire was real. To stay with something is the difference between driven and devoted. Ambition wants progress; devotion wants presence. The work I trust most emerges from that quieter side of the self, the kind that endures even without visible reward.

    When I’m in that fog, I do two things. The first is to find milestones to celebrate and share; that’s a major function of the APEX Updates series. The second is to think about field archaeologists in the trench. Most days, you don’t find anything spectacular. You scrape, record, bag, label, and go home covered in dust. Perhaps you down some Advil if the ground was especially uncooperative that day. (I’ve been there.) Meaning comes later, when the layers are mapped and the fragments aligned.

    The paradox of discovery: the ground never tells you you’re close.

    If Marginalia 3 was about the spark that begins a project, this is about what happens when the spark dies down. To keep going is to trust in slow revelation. That’s what this phase is: not failure, not even frustration, but the apprenticeship to patience.

  • APEX Updates, 11: APEX’s Six Layers

    Current GitHub repository structure. Detect layer to come.

    As APEX has grown, its structure has become clearer. What began as 3 files—a single .html, .css, and .py each—in virtually continuous script (perhaps mirroring ancient inscriptions themselves) has developed into a layered system. Each part carries its own method and rhythm, but together they form a complete research environment—a cycle that moves from inscription to dataset, from dataset to interpretation, and from interpretation to discovery.

    The division is not hierarchical. Each layer depends on the others and stays in motion with them. Separating them made the system intelligible, something that can scale and adapt, something that can be thought inside of, with, and through.

    0. Detect

    APEX-detect is the system’s first act of perception, currently in development with a machine learning protocol. It won’t interpret or analyze; it’ll isolate. From the continuous surface of a photograph, it will identify the discrete marks that will later become glyphs. Detection turns light and texture into segmented boundaries, carving the visual field into fragments the Engine can later measure, compare, and encode.

    1. Engine

    APEX-engine is the analytical component. It extracts form from image, turning traced/detected letters into measurable geometry. This layer defines how APEX perceives the world of writing, translating shape into information that can be analyzed, compared, and preserved.

    2. Database

    APEX-db (public-facing name: AlphaBase) gathers what APEX-engine records. It stores inscriptions, glyphs, and metadata in a coherent schema that keeps the material legible across regions, languages, and media. Each entry carries its archaeological, linguistic, and graphical context. The database ensures continuity and keeps every trace.

    3. Relate

    The APEX-relate layer will give structure to connection. It will join individual records into networks, revealing correspondences among letters, scripts, and regions. Relate will be the layer of inference, where data begins to show its meaning.

    4. Find

    APEX-find will make navigable the memory and relations established by the previous two layers. It will transform the corpus into a searchable field, allowing queries by inscription, feature, script, or drawn form. Each result will draw on the underlying relational structure, turning the accumulated data into an accessible landscape. Find will be the gateway between APEX and its users, the point where the internal logic of the system meets the act of discovery.

    5. Witness

    APEX-behold is going to make the system visible to itself. It will render “on-premises” (as opposed to in external environments) the processes and relationships established by the other layers into sight: plots, maps, and visual fields where patterns take shape. The interface will allow the eye to follow what the model perceives, aligning computation with human attention.

    The System as a Whole

    These six layers create a loop of perception and understanding. Detect isolates, the engine observes, the database retains, relate connects, find reveals, and witness visually consolidates. Each layer holds a distinct epistemology, shaping a system that can study writing as both artifact and behavior. APEX has reached a stage where its structure echoes its object of study: the layered evolution of form into meaning.

  • APEX Updates, 10: Glyph to System



    Complexity trends for three letters over 700 years on Euboea,
    from my forthcoming diachronic study.

    When I began APEX seven months ago, I wrote that before theory comes tracing—the act of turning old strokes into structured data. Half a year later, that small act has grown into something larger: a functioning, extensible research environment capable of analyzing thousands of letterforms across hundreds of inscriptions.

    If the last few months have been about proving the analytical potential of APEX, this one has been about deepening its usability—turning it from a powerful engine into a genuine workspace. The latest version, 1.9.1, focuses on the graphical user interface, which I only dreamed of last April. The idea was to give form to the human side of paleographemics: how scholars see, record, and reason through inscriptions.

    The platform now balances two goals that usually pull in opposite directions. It is rigorous enough to handle multilingual, multi-directional corpora across millennia, yet flexible enough to capture interpretive uncertainty and scholarly disagreement.

    1. The Corpus Grows

    APEX now contains its first completed regional corpus: 209 lead curse tablets from 5th-century BCE Styra (Euboea), encompassing 1,857 individual glyphs, each manually traced, annotated, and analyzed through the full APEX pipeline. Alongside this is a parallel dataset of another 99 Euboean inscriptions, spanning roughly eight centuries—from the Archaic through the Hellenistic period—processed through an exploratory workflow still in development.

    Together, these two datasets represent 4,990 glyphs from the island of Euboea alone, making this one of the largest and most detailed regional paleographic corpora currently in existence. This body of material allows APEX not only to test technical scalability but to examine a single region’s graphical traditions across a complete chronological arc.

    Selected gallery of “Most Typical Glyphs by Letter” from Styra lead tablet report.

    Within Styra alone, clear structural tendencies emerge. Across the 1,857 analyzed glyphs, symmetry and complexity show a strong inverse relationship for most letters, as expected, but now quantified. Others—especially theta and, unexpectedly, many iotas—deviate, showing that simplicity and circularity were not universal ideals but locally negotiated habits.
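
    One common way to put a number on an inverse relationship like this is a correlation coefficient. The sketch below computes Pearson's r from scratch. The post does not say which statistic APEX uses, so the choice of Pearson's r is an assumption, and the scores are placeholder values, not APEX measurements.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder scores for illustration only; these are not APEX measurements.
symmetry = [0.9, 0.8, 0.6, 0.4, 0.3]
complexity = [1.0, 2.0, 3.0, 4.5, 5.0]
r = pearson_r(symmetry, complexity)
print(f"r = {r:.3f}")
```

    A value near -1 would indicate that higher symmetry goes with lower complexity. Computing r per letter rather than across the whole corpus is one way the deviating letters could be made to stand out.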

    Classical intuitions (decreasing complexity through the Archaic and Early Classical periods, a plateau, then a late stylistic uptick) are confirmed here and, more importantly, are now quantified.

    Across the broader eight-century span, early tendencies toward angularity give way to smoother, more balanced forms, though not universally: delta stands out, evolving from a 2-stroke rounded D-shape to the familiar 3-stroke angular Δ, a shift that mirrors the broader transition from ductus-driven to design-driven writing. Nonetheless, the data broadly confirm long-standing epigraphic intuitions and, for the first time, make them concrete and measurable.

    Taken together, the data suggest that what epigraphers once described qualitatively as a “balanced hand” or “tidy style” can now be measured as a structural principle—evidence that writers (whoever they may be, trained scribes or so-called ordinary people) in 5th-century Styra pursued an underlying visual economy that blurred the boundary between mechanical habit and aesthetic intention.

    2. The Interface Takes Shape

    v1.9.1 particularly hinges on a comprehensive Inscription Metadata panel—a modular framework for recording everything an inscription can tell us: provenance, language, writing direction, translation, confidence, and context.

    The (very granular) metadata panel, designed for maximum precision. This will later allow high-dimensional, unsupervised machine learning (ML) to be performed.

    Furthermore, there’s an extensive rights and permissions panel just below that. This enables future rights-safe integration with public databases, protecting sensitive and restricted information from accidental reproduction, which is critical in heritage preservation and in preventing looting and destruction, especially in conflict zones. Now that I’m pivoting to working with data outside of the public domain, this is a non-negotiable feature, and I hope this is a practice that others replicate when fusing rights-diverse corpora. Below is the model of that.

    Each record can now be broken into sublines, allowing users to specify separate languages and writing directions within a single inscription. This makes it possible to manually encode boustrophedon layouts, alternating left-to-right and right-to-left lines without losing reading order. The same applies to multilingual inscriptions: the user isolates each language in its own subline and analyzes it individually. True schlangenschrift, the serpentine style of continuous directional change, remains a technical frontier, but the architecture for handling it is now in place.
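
    To make the subline idea concrete, here is a minimal sketch of how a boustrophedon inscription might be encoded so that reading order survives the change of direction. The class name `Subline`, its fields, and the `reading_sequence` helper are illustrative assumptions, not APEX's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subline:
    """One run of text with a single language and writing direction (hypothetical model)."""
    order: int       # position of this subline in reading order
    language: str    # free-form language label; "grc" here is only an example
    direction: str   # "ltr" or "rtl"
    glyphs: List[str] = field(default_factory=list)  # as they sit on the object, left to right


def reading_sequence(sublines: List[Subline]) -> List[str]:
    """Return glyphs in reading order, reversing right-to-left sublines."""
    sequence: List[str] = []
    for sub in sorted(sublines, key=lambda s: s.order):
        run = sub.glyphs if sub.direction == "ltr" else list(reversed(sub.glyphs))
        sequence.extend(run)
    return sequence


# Two-line boustrophedon: line 1 runs left to right, line 2 right to left.
boustrophedon = [
    Subline(order=1, language="grc", direction="ltr", glyphs=["alpha", "beta", "gamma"]),
    Subline(order=2, language="grc", direction="rtl", glyphs=["zeta", "epsilon", "delta"]),
]
print(" ".join(reading_sequence(boustrophedon)))
```

    Storing glyphs in their physical left-to-right position and applying direction only at read time keeps the record faithful to the object while still recovering the text.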

    Bounding boxes are direction-aware, indexed according to reading orientation, ensuring that extracted visual features align correctly with the direction of writing. Metadata imports from museum APIs are now supported, and flexible fields allow users to enter additional descriptors such as inscription purpose, formula, or archaeological context.
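
    One way to picture direction-aware indexing is a sort whose key depends on the writing direction. The sketch below assumes boxes are `(x, y, width, height)` tuples in pixel coordinates and handles a single line only; both the tuple layout and the `index_boxes` function are hypothetical, not the APEX implementation.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); an assumed layout


def index_boxes(boxes: List[Box], direction: str) -> List[Box]:
    """Order the bounding boxes of one line according to reading direction."""
    if direction == "ltr":
        # Left-to-right: smallest left edge is read first.
        return sorted(boxes, key=lambda b: b[0])
    if direction == "rtl":
        # Right-to-left: largest right edge (x + width) is read first.
        return sorted(boxes, key=lambda b: b[0] + b[2], reverse=True)
    raise ValueError(f"unsupported direction: {direction!r}")


boxes = [(120, 10, 30, 40), (10, 12, 28, 38), (60, 11, 32, 41)]
print(index_boxes(boxes, "ltr"))
print(index_boxes(boxes, "rtl"))
```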

    3. Encoding the Human Element

    Each glyph now carries its own metadata through a compact per-glyph panel. Users can record completeness, stroke count, and intersection data, and—critically—can flag alternative readings where forms are contested. The new Scholarship Mode attributes alternate identifications to specific scholars or corpora, creating a visible interpretive genealogy and turning disagreement into structured data.

    5-tier completeness flags now present.

    What results is a layered model of knowledge. APEX no longer treats the epigrapher’s uncertainty as noise; it considers it data. Each recorded disagreement becomes part of the historical record of how these inscriptions have been read.

    Users can now cite alternative readings and the reasoning for them.
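
    As a rough illustration of what "disagreement as structured data" could look like, the sketch below models a glyph record with attributed alternate readings. The class and field names, the 1 to 5 completeness scale, and the sources "Editor A" and "Corpus B" are all placeholders chosen for the example, not APEX's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AlternateReading:
    """A contested identification, attributed to whoever proposed it."""
    letter: str
    attributed_to: str   # scholar or corpus; values used below are placeholders
    reasoning: str = ""


@dataclass
class GlyphRecord:
    reading: str                        # the primary identification
    completeness: int                   # assumed scale: 1 (least) to 5 (most complete)
    stroke_count: Optional[int] = None
    intersections: Optional[int] = None
    alternates: List[AlternateReading] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= self.completeness <= 5:
            raise ValueError("completeness must be between 1 and 5")

    def is_contested(self) -> bool:
        return bool(self.alternates)

    def genealogy(self) -> Dict[str, List[str]]:
        """Map each proposed letter to the sources that proposed it."""
        by_letter: Dict[str, List[str]] = {}
        for alt in self.alternates:
            by_letter.setdefault(alt.letter, []).append(alt.attributed_to)
        return by_letter


glyph = GlyphRecord(reading="rho", completeness=3, stroke_count=2, intersections=1)
glyph.alternates.append(AlternateReading("delta", "Editor A", "loop closes at the baseline"))
glyph.alternates.append(AlternateReading("delta", "Corpus B"))
print(glyph.is_contested(), glyph.genealogy())
```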

    4. Intelligent Defaults

    Five editable, language-aware dictionaries now exist for the GCELL script cluster, i.e., Greek, Coptic, Etruscan, Latin, and Lydian. There are another five dictionaries on the way for the PASHA branch: Phoenician, Aramaic, Semitic, Hebrew, and Arabian. This capability autofills letter names with their expected stroke and intersection counts, cutting per-glyph processing time by ~70%. These default expectations provide baselines for feature extraction and make visible the subtle divergences that define local or experimental hands.

    The dynamic Greek dictionary.
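
    The autofill behavior can be sketched as a dictionary lookup with optional overrides. Apart from the 3-stroke delta mentioned earlier, the stroke and intersection counts here are placeholder values, and `GREEK_DEFAULTS` and `autofill` are hypothetical names rather than APEX's real dictionary or function.

```python
from typing import Dict, Optional

# Placeholder expectations for illustration only; not APEX's actual Greek dictionary.
GREEK_DEFAULTS: Dict[str, Dict[str, int]] = {
    "delta": {"strokes": 3, "intersections": 3},
    "iota": {"strokes": 1, "intersections": 0},
    "tau": {"strokes": 2, "intersections": 1},
}


def autofill(letter: str,
             strokes: Optional[int] = None,
             intersections: Optional[int] = None) -> Dict[str, object]:
    """Fill in expected counts for a letter, keeping any observed values the user supplies."""
    expected = GREEK_DEFAULTS[letter]
    record: Dict[str, object] = {
        "letter": letter,
        "strokes": expected["strokes"] if strokes is None else strokes,
        "intersections": expected["intersections"] if intersections is None else intersections,
    }
    # A glyph diverges when its recorded counts differ from the dictionary baseline.
    record["diverges"] = (
        record["strokes"] != expected["strokes"]
        or record["intersections"] != expected["intersections"]
    )
    return record


print(autofill("delta"))             # baseline values, no divergence
print(autofill("delta", strokes=2))  # a 2-stroke delta is flagged as divergent
```

    Keeping the baseline separate from the observed values is what lets a divergence, such as a 2-stroke delta, show up as a flag rather than silently overwriting the expectation.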

    5. From Interface to Insight

    The combination of robust metadata, per-glyph fields, and structured dictionaries has turned APEX into a living research environment. A researcher can now import an object from a museum API, record multilingual metadata, define directionality, tag individual glyphs, and export a ready-to-analyze JSON file—all within a single interface.

    Early exploratory notebooks using the full eight-century dataset are already visualizing regional drift and stylistic convergence over time. Though not yet publishable, these models provide a first view of how letterforms move within and between centuries, forming clusters of continuity and outliers of innovation.

    Critically, they also provide examples contrary to certain received wisdom. See the following chart, and note that the correlation has a p-value of 0.37, well above the 0.05 threshold conventionally used for statistical significance in the social sciences.

    In the eight-century dataset, letter frequency shows only a weak and statistically insignificant relationship to graphical stability. The trend line slopes slightly downward—more common letters like alpha, sigma, and omicron are somewhat more stable—but the effect is far from reliable. This suggests that the conventional linguistic expectation—that frequently used units remain more conservative—does not translate cleanly to letterforms. Here, stability may follow style and medium more than frequency.
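
    For readers who want to see how a correlation and its p-value can be checked, here is a minimal sketch using a Pearson coefficient and a permutation test. This is a swapped-in technique for illustration, not necessarily the test used in the APEX notebooks, and the frequency and instability numbers are made up rather than drawn from the dataset.

```python
import math
import random
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def permutation_p_value(x: Sequence[float], y: Sequence[float],
                        n_permutations: int = 2000, seed: int = 0) -> float:
    """Two-sided p-value: how often shuffled data correlates at least as strongly."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up values: relative letter frequency vs. a hypothetical instability score.
frequency = [0.12, 0.09, 0.08, 0.05, 0.03, 0.02]
instability = [0.30, 0.42, 0.25, 0.38, 0.33, 0.45]

r = pearson_r(frequency, instability)
p = permutation_p_value(frequency, instability)
print(f"r = {r:.2f}, p = {p:.2f}")
```

    A p-value above 0.05 from a check like this means the observed slope is the kind of pattern that shuffled, unrelated data produces fairly often, which is the sense in which the frequency effect is "far from reliable."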

    6. Reflection

    The major achievement of this phase is not simply scale—it’s integration. APEX has reached a point where drawing, data entry, and interpretation form a continuous loop. Each inscription is both a record of ancient writing and a record of modern reading.

    With nearly five thousand glyphs from one major region already processed, APEX is beginning to reveal what paleographemics promises: the ability to study writing as a cultural system that can be seen, measured, and compared without losing its human texture.

    Download a PDF of the abridged report (13 pages): A Synchronic Analysis of 5th-Century BCE Lead Tablet Inscriptions from Styra on Euboea

  • Marginalia, 5: To the First-Semester Student

    I get it. I was you. You’re nervous about starting college, and fair enough: who knows if this is going to be the right place for you?

    Chances are, you’re not going to be the perfect student. You won’t always love the assignments, but you’ll come alive in the margins: tracing things back, asking your own questions. You’ll get some praise, sure, but also enough silence and uncertainty to make you doubt your footing. College is hard: your GPA is probably not what it was, but that can be one of the best things to happen to you. Hopefully, you’ll learn other ways of measuring growth, and you’ll meet people who see what you’re made of even if the transcript doesn’t.

    In time, you’ll find your people. It takes a while, but when it clicks, it really clicks. Play your cards right, and you’ll get something that feels like home: mentors who consistently go to bat for you, friends who see the long game, and a self that doesn’t fold.

    No one ever quite knows how they keep landing on their feet. But somehow, most do—through some mix of the grace of others and a resilience of their own.

    You won’t always see the shape of what you’re building while you’re in the middle of it. But stay with it. Something durable, and fitting, is finding its way toward you.

    With affection,

    T

  • APEX Updates, 9: Lunar Letters: Gradation, Gradation—and Then a Sudden Leap

    Back in February, I thought two months was enough time to date the transmission of the Greek alphabet.

    That was my starting point: a delusional hope. And indeed I began APEX with scant technical ability—poor grasp of concepts and a month of coding background—just a sense that there was something doable at the intersection of computational method and ancient script. That if I could just find a way to measure the shapes of letters, I might be able to tell a new kind of story about how the alphabet traveled, evolved, settled into forms.

    It turns out: no, I can’t date the transmission of the alphabet, not at 21, not in 10 weeks. But what I can do is more interesting than I could’ve imagined.

    The first time my bounding boxes returned in the correct order, it felt like a miracle. Then came the multiline inscriptions, then symmetry, then even raw drawings, not traced. Every breakthrough brought a little more light. Eventually I wasn’t just copying, being derivative—I was quantifying, contributing. My complexity metric sharpened. The overlay lines grew more reliable. And like a planet in formation, the project developed a center of gravity—and it began to cohere.

    And I encountered some surprises, as in the section “Dipylon: When the Data Doesn’t Flinch.” A particular early inscription came back less complex than many of the later ones I tested. That result challenged not just my expectations, but my entire premise. It forced me to reckon with how many qualitative judgments still underpin every “quantitative” metric I generate. What counts as complexity? What gets weighted, and why? Those choices are human. They’re mine.

    That’s the most important thing I’ve learned: you don’t escape interpretation by adding math. You just make the interpretation a little more legible. Hopefully.

    But still: I measured something. I made a system that can trace chaos and extract structure, not to flatten but to honor. The alphabet, in its earliest known Greek forms, is no longer just a field of intuition or artistry or tradition. It’s data. It’s patterns. It’s beautiful.

    What does this moment feel like? It feels like I just landed a man on the moon, as in that grainy black-and-white footage: men in short-sleeved white shirts and skinny ties, erupting with joy when the impossible became real. That’s me right now—except I’m just an undergrad with an old laptop and the stubborn belief that I can make something for the field I love.

    After months of tracing, testing, and theorizing, APEX just did something real. AlphaBase is expanding. This is what it felt like to land the letters.