I make real languages, and you do too
2016-06-01

Recently, And Rosta — and much earlier, Jeffrey Henning — described conlangs as, at most, "model" languages. (Henning's "Model Languages" newsletter is down. If you have a reliable link to a specific essay, please let me know.)

To me, "model" here can mean two things:

  1. a model of some referent, trying to approximate it, and/or
  2. a not entirely functional miniature.

The first is, I think, quickly dismissed. With very few exceptions, conlangs do not explicitly attempt to approximate some referent. The closest I can think of are Basic English (which tries to whittle down English to a minimum-viable subset) and a posteriori conlangs (which have some quasi-diachronic relationship with one or more natlangs).

The real question is whether conlangs are fully functional, and whether lacunae in the descriptions of conlangs make them somehow worse than natlangs.


Let's suppose that you're an ordinary person learning English as a second language.

Let's call all the learning materials normally accessible to you — curricula, dictionaries, grammars (of the ordinary sort), K-12 English education, etc. — the "documentation" of English.


First, let's admit that any description of any language which has probe-able "speaker intuition" is incomplete.

In all the centuries spent documenting English, we have still never produced a complete description of even a single idiolect at a single point in time. Read all the monographs you want, do all the research you want, and you will never learn enough to fully model even a single speaker's linguistic behavior.

Since the gap exists under either reading, let's stick with the more modest, everyday sense of "documentation".


Let's call the gap between the documentation and speaker behavior "undefined behavior".

Let's call the actual complete linguistic behavior of any single person an "implementation" of that language.

Implementation variance is always present for any language with multiple speakers — be they jointly-raised twins (with minor idiolectal variances); native speakers with different dialects; second language learners with different native languages, whose respective native languages fill in the undefined behavior (to the extent they have not formed some other consensus), etc.

The variance will be in the aforementioned undefined behavior, to the extent that the speaker doesn't just reject the supposed documentation as incorrect.

So there is formal agreement on some parts of the language (the documentation), informal agreement on other parts (implementation similarity), and disagreement on still other parts (differently implemented undefined behavior).


This is of course an analogy to computer languages.

(Yes, I agree that computer languages are not the same as human languages, because they have a limited domain. That they tend to be simpler is irrelevant; give them time and they build up all sorts of crazy weirdness. Just look at JavaScript, Perl, or PHP across different implementations.)

Computer languages also have formal documentation, saying how the thing should work. With few exceptions (e.g. possibly Haskell and its ilk), they also have areas of undefined behavior.
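(For the non-programmers: here's a tiny C sketch of my own, not anything from a spec. The C standard deliberately leaves signed integer overflow undefined, so the documentation simply does not say what this program must do:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = INT_MAX;    /* the largest value an int can hold */
        x = x + 1;          /* undefined behavior: signed overflow */
        printf("%d\n", x);  /* any output, or none at all, conforms */
        return 0;
    }

Any given compiler will do something here, be it wrapping around, trapping, or optimizing the line away entirely, but the documentation doesn't commit to which.)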

Different implementations (compilers / interpreters) fill in those gaps in different ways. Some gaps get cross-implementation consensus, e.g. internet RFCs that get adopted; some get intentionally extended in some special-snowflake way, e.g. anything made by Microsoft; and some implementations just don't think about it and do something that may or may not be reliable (and will probably wind up in a vulnerability report eventually).
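(Again a toy C sketch of mine: each line below probes a gap the standard explicitly hands to the implementation, and real compilers and platforms legitimately disagree on the answers.

    #include <stdio.h>

    int main(void) {
        /* 8 on typical Linux/macOS (LP64), 4 on 64-bit Windows (LLP64) */
        printf("sizeof(long) = %zu\n", sizeof(long));

        /* plain char is signed on most x86 ABIs, unsigned on many ARM ABIs */
        printf("char is %s\n", (char)-1 < 0 ? "signed" : "unsigned");

        /* right-shifting a negative number is implementation-defined */
        printf("-1 >> 1 = %d\n", -1 >> 1);
        return 0;
    }

Run it under different compilers and you get different, individually self-consistent dialects.)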

Just like human speakers of languages — nat or con — implementations will often render judgments on whether a given utterance is "grammatical", i.e. whether they think it has a meaningful parse. Some interpreters/speakers are more lenient than others; some (correctly, by their own implementation/idiolect) will or won't have a meaningful way to interpret the utterance; some just won't know.
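(One more hypothetical C sketch, to show a grammaticality judgment in the wild. GCC and Clang accept the statement-expression below, a GNU extension; a strict ISO C compiler rejects it as ungrammatical.

    #include <stdio.h>

    int main(void) {
        int y = ({ int t = 6; t * 7; });  /* GNU extension, not ISO C */
        printf("%d\n", y);                /* prints 42 where accepted */
        return 0;
    }

And the judgment isn't boolean even within one implementation: gcc accepts this silently by default, grumbles about it under -pedantic, and rejects it outright under -pedantic-errors.)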

(Yes, I am rejecting the notion that grammaticality in languages is a boolean, or indeed that it is fully consistent even within one speaker, let alone across more than one.)


The exact nature of these undefined behaviors (in a given situation, for a given implementation) is of course interesting — to linguists, for human languages, and to hackers, for computer languages. It can tell you a lot about how the system really works, not just the documented version. It also provides both with gainful employment. ;-)


On And's expressed view, a conlang is "model" for a few reasons.

First, conlangs have undefined behavior when judged by their documentation. But this is true of any natlang as well; the difference is one of degree, not of kind.

(If you're a Chomskyan, insert the standard rant here about poverty of input, and the argument extends: even complete corpus knowledge, which you don't and can't have, would not be adequate to fully explain linguistic behavior.)

(If you're a Lakoffian, insert the standard rant here about embodied cognition, and the argument extends the other way: it is impossible in principle to document a language completely, because substantial parts of it are grounded in [non-uniform] human interaction with the physical world, rather than in a separate language faculty per se.)


Second, conlangs' undefined behavior is often implemented by a given interpreter in ways that borrow heavily from the defined behavior, or consensus undefined behavior, of that interpreter's more familiar (e.g. native) languages. But this too is true of any natlang; second language learners will generally apply what they already use to what they're learning, unless overridden by clear documentation or adopted consensus.

Indeed, some conlangers — notably Boudewijn Rempt, in his Apologia pro imaginatione — would say that this "sub-creation" is a thing to be honored.

True, natlangs have been more thoroughly documented — and have more thorough inter-speaker consensus. Linguists and broadcasters alike remain gainfully employed. This is not a universal, though; plenty of languages (e.g. sign languages) have had far less investigation, or less inter-speaker consensus, or borrow heavily from contact with some more dominant language. Nevertheless, they are not dependent on the grace of a field linguist's thesis, nor on complete differentiation from other languages, for their legitimacy.

Even a "dead" language, like spoken Latin, has these properties. Spoken Latin has changed over the last centuries despite no native speaker population at all (with the possible exception of a handful of dual-language clergy). Native speakers are not a requirement for Latin, in its current form, being significantly different from the last spoken native Latin (at least as much as inter-English dialectical variance) and yet still a full language.

So it is again only a distinction of degree, not of kind, to say that conlangs are more "model" than natlangs because their speakers import undefined behavior implementations from other languages.


And describes UNLWS as one of the least model languages he knows — essentially, because it deviates so drastically from extant languages that it gives the appearance of having less undefined behavior.

This appearance is false. UNLWS has plenty of undefined behavior, both because Alex & I suck at documenting our language, and because there is (very intentionally) a great deal of it that we punt to resolution via Grice… which is not reliable inter-speaker.

Both jointly and severally, we have intuitions about what has the "UNLWS nature", even for completely novel utterances, morphosyntax, lexemes, velc. Sometimes we agree, sometimes not; sometimes we have very strong reactions (on par with any natlang speaker's reaction like "this is obviously ungrammatical, and an abomination to boot"), sometimes weak ones (like "this is weird but maybe-I-guess grammatical").

The same, I would posit, is true of any conlanger. We all have some intuitive sense about the languages we make and speak.

I don't think there's much if any difference between a conlanger's intuition for aesthetics and a natlanger's intuition for "deep grammar". Neither is fully documented, nor can they be. Both affect linguistic behavior, including judgments of "grammaticality". Both are a source of the deeply complex "implementation" aspect of language.


In short, if Mandarin as learned by a native English speaker is a full language, so is Toki Pona.

Either all languages are "model", or none are. It matters not whether the language is natural or constructed, nor how much development has gone into it, so long as there is anyone who can use it with any degree of intuition about its unexplored areas.

Since the distinction is vacuous, let's just ditch the pejorative.

P.S. For non-programmers in the audience, take a look at, e.g. (in order of difficulty): Wikipedia on undefined behavior & implementation-defined behavior; John Regehr, A Guide to Undefined Behavior in C and C++ (et seq.) & Undefined Behavior Consequences Contest Winners; the Underhanded C Contest; and Wang et al., Undefined Behavior: What Happened to My Code?

P.P.S. Thanks and blames to Alex for the conversation that prompted this post, outlining it, and haranguing me to post it. :-p

Originally posted to CONLANG-L, where you may find responses.