Language Log has done a much more thorough beatdown of this story than I could, but it’s still worth mentioning.
English contains more words than any other language on the planet and will add its millionth word early Wednesday, according to the Global Language Monitor, a Web site that uses a math formula to estimate how often words are created.
It is a silly claim, especially when you realise that English already has an infinite number of words right this instant. You can paint a fence, and then you can re-paint the fence, and then if you have any paint left, you can re-re-paint the fence. In fact, you can keep adding re- as many times as you like, on to infinity, and each one of those will be a separate word.
Or you can have a great-grandmother, and a great-great-grandmother, and a great-great-great-grandmother, on and on to infinity.
That’s one of the things about English morphology: it allows some prefixes to be used recursively. Recursion is why English (or any other language, pace Everett) can have an infinite number of sentences. You can walk and walk, or you can walk and walk and walk, making the sentence longer and longer and longer, on to infinity.
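To make the recursion concrete, here is a minimal Python sketch. It models a rule of the form Word -> prefix + Word | stem; the function name `prefixed` is hypothetical and purely illustrative, not any real morphological analyser.

```python
def prefixed(base: str, prefix: str, n: int) -> str:
    """Attach `prefix` to `base` n times, one layer per recursive call.

    Hypothetical helper for illustration only: it models a rule of the form
    Word -> prefix + Word | base, not any real morphological analyser.
    """
    if n == 0:
        return base  # base case: the bare stem
    return prefix + prefixed(base, prefix, n - 1)  # one more layer of prefix


for n in range(4):
    print(prefixed("paint", "re-", n))
# paint
# re-paint
# re-re-paint
# re-re-re-paint

print(prefixed("grandmother", "great-", 3))  # great-great-great-grandmother
```

Every value of n yields a different, finite word, and there is no largest n, which is exactly why no count of English words can ever be final.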
Quibble about hyphens, if you like. I could argue that a hyphen, not being whitespace, does not constitute a word boundary, and thus words containing them are kosher. If you wanted to push it, you could even consider multi-word expressions as words themselves. After all, ‘ice cream’ contains a space, but it represents just one thing. It can be found variously with a space, a hyphen, or all smashed together. Your definition of ‘word’ will influence your count.
13 June 2009 at 6:14 am
One of the things that was a bit ambiguous when I did LING1101 was what the criteria are for accepting a production as part of the language. I think there was an informal definition (like so many of the terrible informal definitions we got) that it could be accepted if "it is something a native speaker could say".
If a word has the potential to be meaningfully decoded (à la some large but finite number of 're-' prefixes), but there is no evidence of such a production ever having been uttered, is it part of the language? How about an infinite prefixing of 're-'? It is impossible for such a production to be uttered. The same goes for large sentence productions; IMO you can prune a lot of them out of the grammar because there are finite limits on what kind of convoluted sentences a native speaker will actually accept/comprehend, even if they seem like a logical extension of existing patterns.
13 June 2009 at 1:59 pm
That is a little tangly. There are really two concepts here: grammaticality, and acceptability.
A sentence is grammatical if it can be generated by some grammar (some formal representation of the language), which implies that the grammar has been built already.
A sentence is acceptable, however, if a native speaker thinks it's acceptable.
There's some overlap between the two: we tend to think of acceptable utterances as being grammatical, but it's not 1:1. Sometimes native speakers have trouble parsing sentences that are actually okay according to even a fairly rudimentary grammar of English, like "People people left left." (See also the infamous "Buffalo buffalo" sentence.) So the very long sentences you're describing might fail on acceptability but still pass on grammaticality, provided that someone has written a formal grammar of the language that can generate them.
(See here and here for a little more.)
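Here's a rough Python sketch of that distinction. The grammar below is an assumption made up for illustration, with one noun and one verb; it reads "People people left left" as "People [that] people left, left". Because the relative-clause rule can nest inside itself, the toy grammar happily generates sentences at any depth, well past what most native speakers would find acceptable.

```python
# Hypothetical toy grammar (an assumption, not a real grammar of English):
#   S  -> NP V
#   NP -> N | N NP V     (a reduced object relative: "people [that] people left")
#   N  -> "people"
#   V  -> "left"

def parse_np(tokens, i):
    """Return every index at which an NP starting at position i could end."""
    ends = []
    if i < len(tokens) and tokens[i] == "people":
        ends.append(i + 1)                       # NP -> N
        for j in parse_np(tokens, i + 1):        # NP -> N NP V
            if j < len(tokens) and tokens[j] == "left":
                ends.append(j + 1)
    return ends


def grammatical(sentence):
    """True if the toy grammar can generate the whole sentence."""
    tokens = sentence.lower().rstrip(".").split()
    # S -> NP V: some NP, followed by "left", must span every token
    return any(j < len(tokens) and tokens[j] == "left" and j + 1 == len(tokens)
               for j in parse_np(tokens, 0))


for s in ["People left.", "People people left left.",
          "People people people left left left.", "People people left."]:
    print(s, grammatical(s))
# People left. True
# People people left left. True
# People people people left left left. True
# People people left. False
```

The grammar says yes to the three-deep version just as readily as to the one-deep version; whether anyone would accept it when they heard it is a separate, empirical question.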
If a word has the potential to be meaningfully decoded (à la some large but finite number of 're-' prefixes), but there is no evidence of such a production ever having been uttered, is it part of the language?
Yes, if by 'part of the language' you mean 'potentially acceptable'.
How about an infinite prefixing of 're-'? It is impossible for such a production to be uttered.
Because you'd never get to the 'paint' part, and the word would go eternally unfinished! Quite right.
But while I can't make an infinite string of 're-', I can make an infinite number of finite strings of 're-'. Given enough time, of course.
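The difference between "an infinite string" and "an infinite number of finite strings" maps neatly onto a lazy generator. A minimal Python sketch, assuming nothing beyond the standard library; `re_words` is a hypothetical name for illustration:

```python
from itertools import count, islice


def re_words(base: str = "paint"):
    """Lazily yield re-paint, re-re-paint, re-re-re-paint, ... without end.

    Every individual word produced is finite; it's the supply that never runs out.
    """
    for n in count(1):  # n = 1, 2, 3, ... forever
        yield "re-" * n + base


# We can only ever inspect a finite prefix of the stream.
print(list(islice(re_words(), 3)))
# ['re-paint', 're-re-paint', 're-re-re-paint']
```

No single call ever produces an endless word, yet there is no point at which the generator runs dry: given enough time, of course.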
The same goes for large sentence productions; IMO you can prune a lot of them out of the grammar because there are finite limits on what kind of convoluted sentences a native speaker will actually accept/comprehend, even if they seem like a logical extension of existing patterns.
Gram v. acc, again.
14 June 2009 at 5:41 am
The 'grammatical' criterion doesn't seem too useful for deciding what is part of 'English'. If a linguist does a lousy job of analysing English (ultimately based on acceptability) and then creates an English grammar that produces a junk word, he can't claim it's an English word just because his defective grammar says so.
Thanks for the terminology and references though. Wish I could fork my life to become a linguist. :/