Tag: linguistics

  • Linguistics for All, 2: Rare Features of Select Endangered Languages

    Though this is technically Linguistics for All, 2, this post is about NYU League of Linguistics’ first discussion group of the semester—I’m posting it now because a few people asked for the recap, and I’m more than happy to oblige.

    The conversation focused on typological features found in endangered languages—many of them rare in the languages of the world and very unexpected (to English speakers, that is). We took a fast but focused world tour: Austronesian syntax, Mayan phonology, Bantu morphology, and more. The goal wasn’t comprehensiveness, but curiosity. What kinds of things can human languages do? And what’s at stake when we lose examples of those things?

    Some of the questions that came up:

    • How do syntactic constraints shift when the verb comes first? When the object comes before the subject?
    • Why might a language have a vast and highly irregular consonant inventory? Why might cross-linguistically rare sound changes emerge?
    • What’s it like to speak a language where every noun has to fit into one of twenty classes, each with its own agreement pattern?

    The point was to slow down and marvel at the extent of linguistic diversity, and at just what we’d lose if those languages went extinct. These features are beautiful, in my opinion, but they’re also systemically instructive. They tell us what’s possible in the “design space” of language, and they resist the tidy models that formalists sometimes prefer.

    For those who couldn’t attend, the slides are linked here along with a short primer on the pre-event readings & videos, plus a folder of journal articles and book chapters in a shared Google Drive. Unfortunately, we weren’t able to record the session, but I’m hoping to incorporate some of this material into future blog entries or curriculum tools.

    And if this is your first time hearing about the League, drop me a line at tfavdw@nyu.edu—we meet semi-regularly and welcome anyone curious about language in any form. NYU affiliates and non-NYU people can both attend.

    Stay tuned for our next session: a hands-on cryptography and forensic linguistics game using real linguistic data, running during midterms as a low-stakes puzzle night (with some surprise mechanics). It’ll be at 10 Washington Place, NY, NY, at 6:30pm on April 1st. More details are available at nyulol.org, and a recording of the presentation portion, delivered by a leading forensic linguist, will be posted shortly after the event.

  • Tools of the Trade, 1: Epigraphy: The Local Scripts of Archaic Greece by L.H. Jeffery

    Jeffery’s summary table of all epichoric scripts at the end of LSAG.
    It is foundational for any work on early regional Greek scripts.

    There are very few books I consider truly irreplaceable in my research. Lilian H. Jeffery’s The Local Scripts of Archaic Greece is one of them. First published in 1961 and revised in 1990 with A.W. Johnston, this book remains the reference for regional variations in the Greek alphabet during the archaic period. It’s where I first learned to read epichoric inscriptions with the eye of a paleographer rather than a Classicist alone.

    The book is very hard to find, and I only got my copy at an even remotely affordable price after months of scouring secondhand sellers. While copies still circulate in libraries and on the used book market, I wanted to make it more accessible to others working in this area. After some diligent hunting, I found it on the Internet Archive. You can read or download it here:
    The Local Scripts of Archaic Greece (1990 ed.) – Internet Archive

    Jeffery’s study remains foundational for any work on early Greek writing—not just in Athens or Ionia, but across the full spectrum of regional scripts: Corinthian, Euboian, Attic-Boeotian, Cretan, Cycladic, and others. It includes extensive commentary, maps, and an invaluable inscriptional catalogue organized by region, with drawings and typographic transcriptions. The 1990 revision added important corrections, expanded references, and additional illustrative material. For those of us studying alphabetic transmission, especially the Phoenician-Greek interface or the evolution of letterforms over time, this book is indispensable.

    What makes Local Scripts especially useful is that it bridges the gap between paleography, archaeology, and linguistics. Jeffery doesn’t just chart when and where a particular variant of alpha or epsilon shows up—she explains what those variations might imply for chronology, influence, and contact. And although her typology has been revised and challenged in places (especially with the discovery of new inscriptions), her system remains a critical baseline for almost every study that’s come after.

    Whether you’re interested in early Greek literacy, the transmission of the alphabet, the sociopolitical meaning of epigraphy, or just want to be able to tell the difference between Laconian and Euboian chi, this is the book to start with. I hope having it freely available will be helpful to others navigating this fragmentary and fascinating material.

    Do you have other resources you pair with Jeffery? I’d love to hear what we can supplement LSAG with.

  • APEX Updates, 1: Building a Dataset

    Every big project starts with a deceptively small question. For me, it was: how do you turn a carved letter into data?

    APEX (Alphabetic Paleography Explorer) is my attempt to map how the Greek alphabet developed and spread—first across Greek-speaking regions, then into other scripts entirely. But before I can compare, model, or visualize anything, I need something more fundamental: a dataset that doesn’t just record letters, but understands them. That’s where things get tricky.

    Step 0: Drawing the Inscriptions

    Most corpora don’t offer clean, high-res images. They give us facsimiles—drawn reconstructions, often made by epigraphers decades ago. I tried using automated skeletonization on those, but the results were messy and inconsistent. So I went manual: scanning documents and tracing letters by hand on my iPad.

    It’s slow. But it gives me clean, consistent vector forms that reflect how letters were actually drawn—and forces me to look closely at every curve, stroke, and variation. In a sense, this is my own kind of excavation.
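
    For readers who haven't met the term: skeletonization reduces a drawn stroke to a one-pixel-wide centerline. The sketch below uses Zhang-Suen thinning, one classic algorithm for this, on a tiny hand-made bitmap. It is an assumption for illustration only; the post doesn't say which method or library was actually tried, and the function name and toy image are invented.

```python
def zhang_suen_thin(image):
    """Thin a binary image (lists of 0/1) with Zhang-Suen; returns a new image."""
    img = [row[:] for row in image]  # work on a copy, leave the input untouched
    rows, cols = len(img), len(img[0])
    # Neighbours P2..P9, clockwise starting from north.
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

    def neighbours(r, c):
        # Pixels outside the image count as background.
        return [
            img[r + dr][c + dc] if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
            for dr, dc in offsets
        ]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(rows):
                for c in range(cols):
                    if not img[r][c]:
                        continue
                    n = neighbours(r, c)
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    filled = sum(n)  # B(P1): number of foreground neighbours
                    # A(P1): 0 -> 1 transitions walking once around the pixel
                    transitions = sum(
                        1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1
                    )
                    if not (2 <= filled <= 6 and transitions == 1):
                        continue
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_clear.append((r, c))
            # Delete only after the full scan so each sub-step sees one snapshot.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# Toy input: a bar three pixels thick, standing in for one drawn stroke.
bar = [[1 if 1 <= r <= 3 and 1 <= c <= 7 else 0 for c in range(9)] for r in range(5)]
skeleton = zhang_suen_thin(bar)
for row in skeleton:
    print("".join("#" if v else "." for v in row))
```

    On a clean bar like this the result is a single centerline. On real facsimiles, uneven line weight and stray marks tend to produce spurs and broken centerlines, which fits the messy results described above.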

    What I Track

    Each inscription gets logged with basic info: where it was found, what it was written on, when it was made (as best we can tell), and how damaged it is. But the real heart of the project is the letters.

    For each character, I record:

    • Visual traits (curvature, symmetry, stroke count, proportions)
    • Layout (spacing, alignment, writing direction)
    • Function (sound value, graphemic identity)
    • Notes on ambiguity or damage
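
    One way such a per-letter record could be modeled is sketched below in Python. The class name, field names, and allowed direction values are all assumptions made for illustration, not APEX's actual schema.

```python
from dataclasses import dataclass, asdict

# Hypothetical vocabulary for writing direction.
VALID_DIRECTIONS = {"ltr", "rtl", "boustrophedon"}


@dataclass
class LetterRecord:
    """One traced character. Every field name here is hypothetical."""
    grapheme: str        # graphemic identity, e.g. "rho"
    sound_value: str     # function: the sound the letter stands for
    stroke_count: int    # visual trait
    curvature: float     # visual trait: 0.0 means only straight strokes
    symmetric: bool      # visual trait
    aspect_ratio: float  # proportions: width divided by height
    direction: str       # layout: writing direction of the line
    notes: str = ""      # ambiguity or damage, free text

    def __post_init__(self):
        # Reject values that would silently corrupt later comparisons.
        if self.direction not in VALID_DIRECTIONS:
            raise ValueError(f"unknown writing direction: {self.direction!r}")
        if self.stroke_count < 1:
            raise ValueError("a letter needs at least one stroke")


# Placeholder values for illustration only, not a real measurement.
example = LetterRecord(
    grapheme="rho", sound_value="r", stroke_count=2, curvature=0.4,
    symmetric=False, aspect_ratio=0.6, direction="rtl",
    notes="lower stroke partly abraded",
)
print(asdict(example))
```

    Keeping the damage notes as free text next to the numeric traits is one way to hold interpretive detail and formal structure in the same record.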

    From this, I can start comparing how different regions handled the same letter—Did their rho have a loop? Was their epsilon closed?—and whether that tells us something about cultural contact or local invention.
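
    A comparison like this can be as simple as tallying a yes/no trait per region. The sketch below is a hedged illustration: the function name, the `has_loop` field, and the region labels are invented, and the toy records are placeholders rather than real observations.

```python
from collections import defaultdict


def trait_share_by_region(records, grapheme, trait):
    """For one grapheme, return the fraction of tokens per region where `trait` is true."""
    counts = defaultdict(lambda: [0, 0])  # region -> [tokens with trait, all tokens]
    for rec in records:
        if rec["grapheme"] != grapheme:
            continue
        tally = counts[rec["region"]]
        tally[0] += bool(rec[trait])
        tally[1] += 1
    return {region: hits / total for region, (hits, total) in counts.items()}


# Made-up tokens; "region_a" and "region_b" stand in for real find spots.
toy_records = [
    {"region": "region_a", "grapheme": "rho", "has_loop": True},
    {"region": "region_a", "grapheme": "rho", "has_loop": True},
    {"region": "region_b", "grapheme": "rho", "has_loop": False},
    {"region": "region_b", "grapheme": "rho", "has_loop": True},
    {"region": "region_b", "grapheme": "epsilon", "has_loop": False},
]

shares = trait_share_by_region(toy_records, "rho", "has_loop")
print(shares)
```

    A share per region is only a starting point: with few surviving tokens per site, any difference would need to be read alongside dating and damage notes before it says anything about contact or local invention.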

    The Workflow

    The data entry pipeline looks like this:

    1. Scan + trace the letterform
    2. Enter the inscription’s metadata
    3. Manually mark letter positions and reading direction
    4. Extract geometric features automatically
    5. Save everything as structured, nestable JSON

    It’s part computer vision, part field notes, and part quiet staring at a very old alpha until you start to feel like it’s looking back.

    Why This Level of Detail?

    Because I want to ask big questions—how alphabets travel, which paths are innovations vs. imitations—but I don’t want to ask them fuzzily. Too much work on writing systems either leans purely qualitative or strips out the messiness for the sake of clean data. APEX is an attempt to hold both: interpretive richness and formal structure.

    This dataset—AlphaBase, soon to be expanded to other open-access museum collections and public domain corpora—is the scaffolding. It’s how I’ll test transmission models later on. But even on its own, it’s already revealing things—like which letterforms stay stable across centuries, and which are quick to splinter under pressure.

    APEX begins here: not with theory, but with tracing. With building a system that doesn’t just store letterforms, but actually listens to what they’re doing. That’s what this first trench is for. Now I get to start digging.

  • Marginalia, 1: On The Texture of Dead Languages

    I’ve long wondered what it was about ancient languages—as opposed to modern ones—that so captivated me. For more than half my life now (21 years long), they’ve been at the center of my intellectual and emotional world. I’ve done much internal archaeology on this, and here’s where I’ve landed.

    What first drew me to ancient languages wasn’t beauty, or history, or even mystery—this much I knew. But I’ve figured it out, after much reflection: it was structure. At age ten, I was told that Latin had a “very mathematical” nature—and I had a very mathematical mind. That was the pitch that won me over when I was choosing between French, Mandarin, Spanish, and Latin in fourth grade. My friends—older kids who knew me from our accelerated math class—urged me to choose Latin. “It works the way you do,” they said.

    From the beginning, I had a knack for it. Parsing Latin felt like solving elegant equations: all those declension and conjugation charts, the case endings, the tightly constructed sentences. I found the clarity of it deeply satisfying. It’s also what got me into etymology, many a linguist’s bridge into the discipline. I was thrilled to learn that words, and thereby language itself, had discrete histories we could uncover and unlock.

    And, to be honest, I was also avoiding something. I’ve had a lifelong fear of being wrong—especially out loud. The thought of sounding like a toddler in French or Mandarin mortified me, even at that age. Ancient languages, by contrast, required no vocal performance—or at least none you could be substantially corrected on.2 As Mary Beard wrote, it’s a tremendous freedom to read a language without needing to order a pizza in it.

    But beyond the safety of silence and the comfort of structure, ancient languages offered me something stranger and deeper. They are, paradoxically, both rigid and wild—formally inflected, syntactically unruly. Their rich systems of agreement allow a kind of grammatical anarchy. That contradiction fascinated me. And then, there was the sheer alterity: the profound otherness I was only beginning to grasp. These languages came from far away—across centuries and empires—and they had nothing to do with me.

    What I didn’t expect was how intimate they would feel. There’s something magical about reaching across time and space to hear men (alas, mostly men) from millennia ago speak. I feel, in some small way, like I’m raising the dead (see blog title!), giving voice to what was nearly lost. There is mystery in this, in the impossibility of perfect translation, in the silence that always remains. But there is also joy. Sitting at a wooden table, poring over ancient texts with comrades-for-a-semester, I’ve never felt isolated. If anything, I’ve felt surrounded—by the dead, yes, but also by other living readers, deep in the muck of it all.

    Inscriptions are my great love: language not filtered through scribes or stylists, but carved directly, once, and then cast into the abyss memoriarum. To read an inscription is to hear a voice that was not supposed to last this long. It reminds me that people have always been this way: strange, familiar, brutal, kind, just like us.3 That realization has made me a softer person, I think. More attuned. You can’t spend your days in conversation with the past—and with the people who help you interpret it—without becoming more human.

    There’s also a tension I feel—quiet but insistent—between my deep love of ancient languages and my commitments to the present. Studying the dead can sometimes feel like retreat: a kind of sequestration in the library, the archive, the ivory tower. And yet, this is also the work that sharpens my ethics. It’s by looking at the long arc of language, power, and survival that I’ve come to understand how political language always is—what gets written down, whose names are preserved, whose voices fade. So while it might look like I’m hiding in the past, I don’t think I am. I’m studying the structures that built the world we live in now—and learning how fragile, and how remakeable, those structures are.

    It’s changed the questions I ask, too. I no longer want to know only what words mean, or what they do. I want to know how they came to be. Who made them. Why they changed. What pressures they both buckled under and resisted. That’s the kind of inquiry ancient languages have trained me for.

    Lately, though, something has changed. For the first time, I’m stepping away from silence. I’m learning a living language—French—and it’s bringing all kinds of old fears and new questions to the surface. But that’s a story for next time, once I’m deeper in it.

    1. To the point I got internet-famous for being a budding particle physicist, which got me invited to labs and observatories around the world, including NYU’s, Columbia’s, and CERN, home of the Large Hadron Collider. ↩︎
    2. After a while, you remember that a Latin “v” is pronounced [w], silly as it sounds aloud. [wɛni widi wiki] just doesn’t hit like [vɛni vidi vit͡ʃi]. ↩︎
    3. Read some of the archaic Theran graffiti (pp. 22-25 of this paper) if you want to see that teenage boys have, in fact, always been teenage boys. ↩︎

  • Linguistics for All, 1: A World Tour of Endangerment and Hope

    Map of endangered language prevalence worldwide.

    Over the past two months, I had the chance to lead two discussion groups for the NYU League of Linguistics. The first concerned endangered language typology, highlighting interesting features from languages across the world: Austronesian verb-initial sentences, unique Mayan phonologies, and rich Bantu noun classes & declensions.

    The second examined revitalization efforts (video here) around the world, focusing on Hawaiian, Welsh, and Hebrew. We explored how languages can be revived when intergenerational transmission is fading, and what gets negotiated along the way. Key questions were: What counts as success in revitalization? What has become of our own ancestral languages, and why? What trade-offs, like the loss of minority dialects, do we accept? And crucially: where do we, as linguists, fit in?

    Together, these sessions became a profound study in contrast: one examining how endangered languages structure and make sense of the world, the other how we might help them endure in it.


    As someone who works primarily with dead languages, this topic holds a particular fascination for me. I live among the dead—scripts etched into stone, grammars fossilized in treatises, phonologies inferred but never heard. Endangered languages, by contrast, are not yet dead. Revitalization efforts aim to preserve what I usually only encounter after the fact: living, breathing language. That shift in focus—from excavation to preservation—has reshaped how I think about what linguistic work can be.

    It also deepens our understanding of language change. Revitalization doesn’t freeze a language in time—it lets us see what happens next. Take Hawaiian, whose famously tiny consonant inventory (only eight!) allows a wide range of free variation. Will it continue to shrink? Will contrasts harden? Watching a language evolve under new sociopolitical pressures offers historical linguists like me something rare: the chance to witness change not as reconstruction, but as unfolding reality.


    One thing that became clear between the two sessions was the importance of community and agency. Typological structures are fascinating—but they exist amid complicated social structures. We came away with a deeper appreciation for linguistic diversity as a lived reality. Our role, ultimately, is not to fix or play savior—but to listen, support, and amplify the work already being done.