Good Reason

It's okay to be wrong. It's not okay to stay wrong.


Talk the Talk – Uptalk

This was a fun episode today. I didn’t think I was going to have enough to talk about on the subject of ‘uptalking’, but there’s quite a lot to say. Plus it was fun to actually ‘uptalk’ because I knew it would annoy our producer Peter Barr.

‘Uptalking’, for the uninitiated, is where you use question-style intonation even when you’re not asking a question. Everyone has an opinion on what it means — you’re unconfident, you’re seeking approval — but I think of this as idle speculation. And then I add some of my own — hey, my idle speculation is as good as anyone else’s.

Subscribe to us on iTunes, or get into Talk the Talk any way you like on our show page.

Google’s contextual spell checker is cool

It was a Great Moment in Tech Support. The caller asked me how he could remove a word from his WordPerfect dictionary.

It was an unusual request, but we got a lot of those. “What word do you want to remove?” I asked.

He stumbled. “Um… ‘pubic’?”

I knew immediately what had happened. ‘Pubic’ is a real word, of course, but he hadn’t meant to use it in his document, and there he’d gone and given a presentation on ‘pubic works’ and how ‘pubic libraries’ operate for the ‘pubic good’. These things can happen when you speak in pubic.

His dumb spell checker had failed him. Spell checkers have been around so long that we’re used to their limitations, and one of them is that they’re insensitive to context. Well, no more. Google’s DocsBlog (via Lifehacker) has announced that it’s rolled its “do you mean” spell checker into GoogleDocs.

1. Suggestions are contextual. For example, the spell checker is now smart enough to know what you mean if you type “Icland is an icland.”
2. Contextual suggestions are made even if the misspelled word is in the dictionary. If you write “Let’s meat tomorrow morning for coffee” you’ll see a suggestion to change “meat” to “meet.”
3. Suggestions are constantly evolving. As Google crawls the web, we see new words, and if those new words become popular enough they’ll automatically be included in our spell checker—even pop culture terms, like Skrillex.

How do ordinary spell checkers work?

Spell checkers work by taking words that don’t appear in the dictionary (sometimes known as ‘out-of-vocabulary’ words, or OOV), and comparing the string to a list of known words in the dictionary. To figure out the most likely suggestion, they calculate an ‘edit distance’, or how many changes it would take to go from the malformed word to a known word.
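Here's a minimal sketch of that first step, finding the out-of-vocabulary words. The tiny `DICTIONARY` set and the `out_of_vocabulary` function are made up for illustration; a real spell checker would load a far bigger word list. Notice that 'pubic' sails straight through, because it's in the dictionary.

```python
import re

# Hypothetical mini dictionary, for illustration only.
# A real word list would hold tens of thousands of entries.
DICTIONARY = {"the", "public", "pubic", "library", "is", "open"}


def out_of_vocabulary(text):
    """Return the words in `text` that are not in DICTIONARY."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in DICTIONARY]


# 'libary' is flagged, but the real word 'pubic' is not.
print(out_of_vocabulary("The pubic libary is open"))
```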

So how do you calculate the edit distance? One easy measure is the Levenshtein distance. It’s pretty intuitive. Ask yourself: How many changes would it take to go from ‘pubic’ to ‘public’? Just one: add an ‘l’. So the edit distance is 1. But the computer calculates this using a grid. This is the cool part.

Start by putting the two words in a grid like so; one word down and the other across. Also, fill the second row and column with numbers. (This will make sense in a minute.)

      p  u  b  i  c
   0  1  2  3  4  5
p  1
u  2
b  3
l  4
i  5
c  6

Now, fill each of the inner boxes with one of three numbers, whichever is lowest:

  1. The number above plus 1
  2. The number to the left plus 1, or
  3. The number to the upper left, plus 1 if the two letters don’t match (that’s called the “cost”), or plus 0 if the two letters do match.

For our example, ‘p’ matches ‘p’, so the smallest number would be the 0 to the upper left. No cost.

      p  u  b  i  c
   0  1  2  3  4  5
p  1  0
u  2
b  3
l  4
i  5
c  6

On we go, down the column. None of the other letters are a ‘p’, so the lowest number for each box would be the one just above it, plus one. Notice how the numbers keep stacking up.

      p  u  b  i  c
   0  1  2  3  4  5
p  1  0
u  2  1
b  3  2
l  4  3
i  5  4
c  6  5

We start again at the next column. The ‘p’ and the ‘u’ aren’t a match, so we give it a 0 + 1 from the left, but the ‘u’ and the ‘u’ are a match, so that box gets a cost-free ‘0’ from the upper-left.

      p  u  b  i  c
   0  1  2  3  4  5
p  1  0  1
u  2  1  0
b  3  2
l  4  3
i  5  4
c  6  5

You can work out the rest of the table if you’re keen, but here it is in full.

      p  u  b  i  c
   0  1  2  3  4  5
p  1  0  1  2  3  4
u  2  1  0  1  2  3
b  3  2  1  0  1  2
l  4  3  2  1  1  2
i  5  4  3  2  1  2
c  6  5  4  3  2  1

Notice how everything’s going smoothly until the ‘l’ row, where the first real mismatch is. But the number to watch out for is that last one, in the lower right. When the whole table is filled out, that’s where your answer is. So the words ‘pubic’ and ‘public’ have a Levenshtein distance of 1, which matches our intuition about the number of changes we’d have to make to go from one to the other.

You can try this with any two words, either on paper, or using this handy website here. Having a play with it is a good way of getting a grip on this algorithm.
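Or you can let the computer fill in the grid. This is a minimal Python sketch of the three rules above; the function names `levenshtein_grid` and `levenshtein` are my own, and the grid is laid out the same way as the tables, with one word down the side and the other across the top.

```python
def levenshtein_grid(down, across):
    """Build the full grid: `down` labels the rows, `across` the columns."""
    # The row of numbers under the letters: 0, 1, 2, ...
    grid = [list(range(len(across) + 1))]
    for i, down_letter in enumerate(down, start=1):
        row = [i]  # the column of numbers beside the letters: 1, 2, 3, ...
        for j, across_letter in enumerate(across, start=1):
            cost = 0 if down_letter == across_letter else 1
            row.append(min(
                grid[i - 1][j] + 1,         # the number above, plus 1
                row[j - 1] + 1,             # the number to the left, plus 1
                grid[i - 1][j - 1] + cost,  # the upper left, plus the cost
            ))
        grid.append(row)
    return grid


def levenshtein(a, b):
    """The answer is the number in the lower right of the grid."""
    return levenshtein_grid(a, b)[-1][-1]


for row in levenshtein_grid("public", "pubic"):
    print(row)
print(levenshtein("public", "pubic"))
```

The printed rows should match the finished table, and the last line gives the distance between the two words.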

There are lots of ways we can tweak this spell-checker. We can adjust the cost so that near keys (and therefore more plausible typing mistakes) cost less than farther-away keys. We could adjust for frequency so that more common words float to the top of our suggestion list. But what we can’t do is look at nearby words to see what’s likely. That means the classic ‘form/from’ problem is beyond the reach of our spell checker.
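Here's a rough sketch of those two tweaks together. Everything in it is an assumption for illustration: the word list and its frequency counts are invented, `NEAR_KEYS` is a tiny hand-picked sample of neighbouring keys rather than a full keyboard map, and the half-price cost for a near-key slip is an arbitrary choice.

```python
# Hypothetical word list with made-up frequency counts.
WORD_FREQUENCIES = {
    "public": 900, "publish": 300, "cubic": 60, "pubs": 40, "pubic": 12,
}

# Hypothetical sample of neighbouring key pairs (not a full keyboard map).
NEAR_KEYS = {frozenset("io"), frozenset("op"), frozenset("nm"), frozenset("er")}


def substitution_cost(a, b):
    """Matching letters are free; near-key slips are cheaper than other swaps."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in NEAR_KEYS:
        return 0.5
    return 1.0


def weighted_distance(typed, word):
    """Same grid as before, but the 'cost' depends on which keys were swapped."""
    previous = [float(j) for j in range(len(word) + 1)]
    for i, typed_letter in enumerate(typed, start=1):
        current = [float(i)]
        for j, word_letter in enumerate(word, start=1):
            current.append(min(
                previous[j] + 1.0,
                current[j - 1] + 1.0,
                previous[j - 1] + substitution_cost(typed_letter, word_letter),
            ))
        previous = current
    return previous[-1]


def suggest(typed, limit=3):
    """Rank known words by distance, breaking ties with the more common word."""
    ranked = sorted(
        WORD_FREQUENCIES,
        key=lambda word: (weighted_distance(typed, word), -WORD_FREQUENCIES[word]),
    )
    return ranked[:limit]


print(suggest("publoc"))
```

Even with both tweaks, the function only ever looks at one word at a time, which is exactly the limitation described above.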

But not the one from GoogleDocs. It will flag words, even if they’re real words. Behold:

Note how throwing in a related word (‘pelvis’) in that last example is enough to calm the spell checker down.

How does it do it? It looks like it works by calculating the probability of other words appearing nearby. Articles like ‘the’ and ‘a’ are likely to appear before ‘island’, less likely before ‘Iceland’. The whole thing could be modelled with n-grams (nearby words) using a sufficiently large language corpus, which Google certainly has. And that huge corpus ensures that lots of words will be in the dictionary, including low-frequency or brand new terms.
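To make the n-gram idea concrete, here's a toy sketch. It is a guess at the general approach, not Google's actual method: the five-sentence corpus is invented, the names `context_score` and `best_candidate` are mine, and the score is just a crude product of bigram counts (plus one, so unseen pairs don't zero everything out).

```python
from collections import Counter

# Tiny made-up corpus standing in for a web-sized one.
CORPUS = (
    "let's meet tomorrow morning for coffee . "
    "we can meet for lunch . "
    "they meet every week . "
    "i eat meat for dinner . "
    "the meat was fresh ."
).split()

# Count every pair of neighbouring words.
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))


def context_score(previous_word, word, next_word):
    """Score how well `word` fits between its two neighbours."""
    left = BIGRAMS[(previous_word, word)] + 1
    right = BIGRAMS[(word, next_word)] + 1
    return left * right


def best_candidate(previous_word, candidates, next_word):
    """Pick the candidate that the neighbouring words make most likely."""
    return max(
        candidates,
        key=lambda word: context_score(previous_word, word, next_word),
    )


# "Let's meat tomorrow": both candidates are real words, but context decides.
print(best_candidate("let's", ["meat", "meet"], "tomorrow"))
print(best_candidate("eat", ["meat", "meet"], "for"))
```

A real system would presumably use longer n-grams, proper probabilities, and a vastly larger corpus, but the principle of letting nearby words vote is the same.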

It’s good to know that people are still adding to a technology that’s so seemingly mundane.

Talk the Talk: Carillon Shemozzle

I was talking about the word ‘shemozzle’ and the word ‘carillon’ on today’s “Talk the Talk” podcast. I shall never look at Perth’s Carillon City shopping centre in quite the same way again.

I forgot to include a shoutout to Laverne and Shirley, which is the first place most of us ever heard the word ‘shemozzle’ (or more probably, ‘shlemazel’).

Also: my computer doesn’t seem to recognise ‘shemozzle’, which is too bad. And when I type ‘shlemazel’, it suggests ‘schlemiel’. These computers don’t know from Yiddish!

Listen here, or subscribe via iTunes.

Does Romney drop his G’s in the South?

I’m a bit of a G-dropper. I have a habit of dropping my participial g’s sometimes. If I say “doing” and “working”, it can come out as “doin'” and “workin'”. (Although really, there’s no /g/ there in the first place. It’s alveolarisation of the velar /ŋ/. But I’m going to call it G-dropping anyway.)

This is a pretty common pattern that shows up in many dialects of English, be they British, Australian, or USAian. For me, it seems to get more pronounced the closer I am to the USA.

Nowadays, G-dropping is tied to lower socioeconomic status (but it used to be a high-prestige feature), or to certain regions. Which is why it interested me to see this little story:

Mitt Romney wishes Mobile ‘good mornin”

Although he didn’t mention grits or his growing like of the word “y’all,” Romney’s awkward bid to connect to Southern voters was still evident. He wished the crowd a “fine Alabama good mornin’’” — dropping the letter “g” at the end of some words.

So is Romney doing some linguistic pandering with the locals? I thought I’d check by watching stump speeches (one in the North, one in the South) and comparing the number of dropped g’s.

This meant watching videos of Romney on the stump, which is not entirely without risk.

When my boys have asked about Romney, I’ve said that although I don’t want him to be president of the USA, he’s not one of the crazy ones, and that there were loads of people in the race who were more stupid (Santorum, Perry, Bachmann, Cain) or evil (Gingrich, um… Cain) than Romney. But the weird thing about Romney is that he is capable of saying stupid, evil things while seeming perfectly sensible. Call it his gift.

So I’ve watched a bit of Romney doing the usual Republican schtick: bashing Europe, vowing to repeal health care, hammering away at unions, and claiming that the free market will fix everything. While watching these speeches, I was left with one over-arching impression: If you want to know what Romney’s stump speeches are like, just picture a giant penis in a suit, saying “I believe in freedom!” I’m sorry for that mental image, but tell me if you don’t find it accurate.

To the counts.

New Hampshire

-ing             -in’
saying
founding
enduring value
bringing
going
campaigning
overwhelming
saying
taking
choosing
distributing
pursuing
                 talkin’

Alabama

-ing             -in’
interesting
manufacturing
cutting
spending
coming
proposing
                 (ain’t that somethin’?)

Well, just from these two speeches, it seems like Romney doesn’t do a lot of G-dropping in either place. I have no doubt that he tried it out in Mobile, but it doesn’t seem to be a feature he uses often, no matter where.

I realise this is a small sample. I tried to watch more, but there’s only so much moral vacuity that one can stand.

Cyrillic meets Roman

It’s funny when people try to use Cyrillic letters as Roman ones. I understand why they do it — for English speakers, Russian has acquired connotations of militarism and toughness. And people have been tossing in the odd Cyrillic character for a long time.

(cue music)

But when you actually know how to read Cyrillic script, it’s a little jarring. Here’s a movie poster that sidled up next to me at a traffic light this morning.

More like ‘the dorkest hour’, amirite?

See, the Д that they have standing in for an ‘A’ is actually a /d/ sound, and the Я is a vowel that sounds like ‘ya’. Also, that Ц covering for the ‘U’ is the sound of /ts/ in ‘tsar’.

So really, the movie’s title should be pronounced ‘The Ddyakest Hotsr’. Or ‘Notsr’ if the ‘H’ has an /n/ sound, as Cyrillic Н does.

But let’s not be pedantic. We’re stuck with it now. We’ll be seeing posters, ads, and maybe even action figures in Toys ‘Ya’ Us.

Mice sing. Humans sing. Coincidence?

Singing makes you more attractive. (Singing well, anyway.) And you don’t even have to be a human. Even now, tiny mice are singing their ear-splitting ditties to impress potential mates.

Their initial studies, the first to study song in wild mice, confirmed that males emit songs when they encounter a females’ scent and that females are attracted to the songs. The scientists also found that females can tell apart their brothers from unrelated males by their songs – even though they had previously never heard their brothers sing.

We already know that birds use song to impress mates, and now mice. What about people?

There are two main hypotheses about how language began in humans. The one that gets the most play is the gestural (or mirror) hypothesis, as articulated by Michael Arbib, which goes something like this:

  • We have neurons in our brains that fire when we perform an action.
  • We also have ‘mirror neurons’ that fire when we see someone else performing the same action.
  • This allows us to recognise when someone is doing something.
  • From here, we can imitate others, and start to communicate using gestures, including pantomime.
  • This allows us to represent things that aren’t in the immediate vicinity, which is a precursor to language.

But it’s not clear from this how we make the move from gesture to speech.

The other main hypothesis is that human language started from music. This was Darwin’s favoured hypothesis, and it’s found a new advocate in W. Tecumseh Fitch (who I interviewed for an episode of ‘Talk the Talk’).

For this one,

  • People were able to vocalise (or sing), and if their singing was sumptuous enough, they got the mates.
  • At the same time, we can recognise people’s voices, and distinguish them from the voices of other people.
  • We can even do imitations of other people, which allows us to represent them when they’re not around.

This could have been the beginning of representing things that aren’t around, which, again, is necessary for language. And it explains the use of the vocal channel.

So, mice. They sing. They use their songs to attract mates. They can tell each other apart by voice. All very languagy. It’s not just birds.

Even though both gesture and music were probably big factors in human language at the same time, I think this tips things toward the music hypothesis.

The atheist temple

The big news in atheism this week: Alain de Botton wants to build an atheist temple. Which seems strange — atheism isn’t a religion, so why would it need to borrow religion’s trappings? I think de Botton tipped his hand, though, in this pronouncement:

The philosopher and writer Alain de Botton is proposing to build a 46-metre tower to celebrate a “new atheism” as an antidote to what he describes as Richard Dawkins’s “aggressive” and “destructive” approach to non-belief.

Rather than attack religion, Mr de Botton said he wants to borrow the idea of awe-inspiring buildings that give people a better sense of perspective on life.

“Normally a temple is to Jesus, Mary or Buddha but you can build a temple to anything that’s positive and good,” he said. “That could mean a temple to love, friendship, calm or perspective … Because of Richard Dawkins and Christopher Hitchens, atheism has become known as a destructive force.”

Destructive force? For me, Dawkins and Hitchens are two guys who have come to epitomise well-tempered reason, intelligence, and courage in the face of mortality, so de Botton’s criticism doesn’t ring true for me. I’d like to suggest a little test which I’ll call the S.E. Cupp test: When someone says they’re an atheist, do they spend more time promoting atheism, or castigating other atheists because of their tone? If the latter, then what’s the difference between them and a theist?

Dawkins has called the project a waste of funds; PZ says it’s a monument to hubris.

Me? I say it’s redundant. We already have a temple. I was there earlier this month. Or, at least, at one of them.

The atheist temple I went to was the Temple of Knowledge, and it’s better known as the New York Public Library.

It gots lions.

Why would I call it an atheist temple? Because it’s filled with the work of people. People; not gods. People (and you can see them there every day) engaged in the process of gathering knowledge and combining it to make new knowledge. This is the goal of science, which is an atheistic form of reasoning.

I walked along its halls of solid marble, where generations of humans have come to read and learn.

No gothic arches, these. How could you help but be in awe of not just the building, but the building’s purpose?

Like a temple, the magnificent Reading Room prompts a hush. 

And the people who built this place — yeah, they were tycoons who made their money from the skins of small furry animals. But they wanted to build a place where the knowledge of the world could be preserved, and they cared enough to make it amazing. And they inscribed this on the walls, in letters big enough for anyone to read:

“On the diffusion of education
among the people
rest the preservation
and perpetuation
of our free institutions.”

I read that, and I think, you know, they got it. They really got it! Even back then. Our society depends on education. Our freedom depends on it. You can’t preserve freedom in a population of ignoramuses; they’ll just tear it down again the instant they feel afraid. It’s such an alien concept in this age, when one political party has dedicated itself to the destruction of the Department of Education, and (through homeschooling) constantly works to undermine the public school system so that children will be protected from education. It seems like a quaint and noble sentiment, but we need to relearn this thinking that came from better minds than ours. Just as we need another quaint and antiquated notion symbolised by libraries: the public good.

But that’s not all I saw. There were treasures.

Holy shit! It’s a Gutenberg Fucking Bible! One of only 40 perfect ones left. Yes, it’s a bible because for some reason, people thought the Bible was important back then. But what this book did was make reading and publishing commonplace. That’s much more important than the book’s rather poor contents.

And check this out: it’s Christopher Robin’s toys! That’s not just Winnie a Pooh — it’s Winnie THE Pooh. And the others! It was great to see them there, even though it made me think of Toy Story 2. I look at Tigger and realise that Ernest Shepard really nailed it.

These are clay tokens with cuneiform on them, some of the earliest writing that people ever used. That made it possible for people to transmit knowledge over generations.

And while I was in this Library, I felt so connected to people in other ages and to the future. It was a feeling that I can only describe as spiritual, even though I don’t like that word. But it was the same feeling that I felt in the old religion but more intense and meaningful.

You can keep your paltry theist cathedrals. Do not copy Mormon temples — they are monuments to superstition and foolishness. Let St Patrick’s fall. Instead, build a library, Mr de Botton, or an observatory, or a university, or a museum. They’re the only temples that atheists have any business building.

Actually, St. Patrick’s will make a very nice reading room in about 100 years.

Pronounce that sign

I really like the bilingual signs in Canada. It’s good for English speakers to be reminded that English isn’t the only language in the entire world. (Remember: Republicans made fun of John Kerry because he spoke French. What kind of president would he be?!)

But while driving through British Columbia, I saw a bilingual sign, and French wasn’t the other language. Here’s the sign, snorfed from Wikipedia.

So what’s the language? Why the accents and lines? And what is a ‘7’ doing in the middle of a word?

The Wikipedia page for the Squamish language answered most of my questions. The language is known as ‘Sḵwx̱wú7mesh’ (or the more Anglicised ‘Squamish’). It was first documented by no less than the legendary anthropologist Franz Boas. Sadly, it appears that only about 15 native speakers remain. I don’t know if those 15 speakers do a lot of driving, but I’m glad the signs are up anyway.

So, to the characters:

The ‘7’ is a glottal stop. That’s the sound that Cockney speakers use in the middle of ‘bottle’ or ‘mental’. I use it in the middle of ‘uh-oh’ or (a little strangely) ‘hot water’. A real glottal stop looks like this: ʔ. I don’t see anything on my keyboard that looks more like a glottal stop than the 7 does, except the question mark, which would be even more confusing, so I guess 7 was a good choice.

The ‘k’ and the ‘x’ with lines under them are just like a regular ‘k’ or ‘x’ (the latter of which we don’t have in English; think ‘ch’ as in Scottish ‘loch’), but they’re farther back in the throat. You have to take it all the way back to your uvula, also known as ‘the hangy down thing in your throat’. Just make a ‘k’ sound as if you’re choking. (Why do they make sounds in such strange places? Oh, everyone does in one way or another. We have a ‘th’ sound in English, which other people think is weird.)

What about the ‘k’ with an apostrophe? That’s the exciting one for me. It’s an ejective. Usually we make the ‘k’ sound with a puff of air, but the ejective ‘k’ is different. To make an ejective ‘k’, just hold your breath, and without letting it out, make a ‘k’ sound as best you can. That’s ejective ‘k’.

Finally, if a vowel has an apostrophe after it, that just means that vowel takes the stress.

There you have it — Squamish phonology. Or more appropriately, Sḵwx̱wú7mesh phonology.

Prescriptivism with attitude

A graphic from Facebook. Honestly, some people get so touchy about correct usage.

Better do what they say, though. Looks like the writer of this has been driven to the brink by one too many “your”s. One more dropped apostrophe, and they might snap.

Milk: Lost in translation

If you use Google Translate to translate “Got milk?” into Spanish, and burrow into the ‘alternate translations’ it offers you, one of the choices is “bigote de leche”, which means “milk moustache”. I’m leaving it as an exercise for the reader to figure out how it arrived at that translation.

Of course, this is better than the tagline they went with in Spanish-speaking countries: “¿Tiene leche?” which sounds plausible enough to a non-native speaker, but which carries maternal associations, something along the lines of “Are you lactating?”


© 2024 Good Reason
