In the Language Technology community, we don’t talk enough about the ethical consequences of our work. We’re busy building speech recognisers and text classifiers — and they could be used to listen to your phone and flag an agent if you mention the wrong key words. I hope we don’t all someday have an Oppenheimer moment.
Data mining has been in the news lately, especially in connection with Bush’s illegal wiretapping operation, and it has acquired a vaguely sinister whiff, particularly in the hands of the CIA.
But have a look at this interview with CIA ‘Open Source Center’ director Douglas J. Naquin:
“A lot of blogs now have become very big on the Internet, and we’re getting a lot of rich information on blogs that are telling us a lot about social perspectives and everything from what the general feeling is to … people putting information on there that doesn’t exist anywhere else,” Mr. Naquin told The Washington Times.
Eliot A. Jardines, assistant deputy director of national intelligence for open source, said the amount of unclassified intelligence reaching Mr. Bush and senior policy-makers has increased as a result of the center’s creation in November.
Okay, so they’re pulling together a lot of opinions and stuff that’s already out there. That doesn’t sound so sinister. In fact, it sounds like the kind of thing I’d like to have.
I’d love a data mining news program similar to Google News, but with a few changes. It would watch the articles I clicked on and track those topics, pulling up similar stories as they arrived. Of course, I wouldn’t want only items similar to what I’ve already seen, so it would throw in an unrelated story every once in a while, just to see whether I was interested. Over time, it would learn my preferences and deliver related news and blog pages. Things like this are already in the works, and the tools are already available: just combine data mining, topic detection, and user modeling.
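For the curious, here is a minimal sketch of that loop in Python, under some loud assumptions: stories arrive already tagged with topics (in a real system that would be the topic-detection step), the user model is nothing more than a count of clicked topics, and the "unrelated story every once in a while" is an epsilon-greedy coin flip. The names `NewsRecommender`, `record_click`, `pick`, and `epsilon` are all made up for illustration; this is not how Google News or any real system works.

```python
import random
from collections import Counter


class NewsRecommender:
    """Toy epsilon-greedy recommender over topic-tagged stories (illustrative only)."""

    def __init__(self, epsilon=0.2, seed=None):
        # epsilon: assumed probability of showing an unrelated story on purpose.
        self.epsilon = epsilon
        # The whole "user model": how often each topic has been clicked.
        self.clicks = Counter()
        self.rng = random.Random(seed)

    def record_click(self, story):
        """Update the user model with the topics of a clicked story."""
        for topic in story["topics"]:
            self.clicks[topic] += 1

    def score(self, story):
        """Similarity to past clicks: sum of click counts over the story's topics."""
        return sum(self.clicks[topic] for topic in story["topics"])

    def pick(self, stories):
        """Return one story: usually the best match, sometimes an unrelated one."""
        if not stories:
            raise ValueError("no stories to choose from")
        unrelated = [s for s in stories if self.score(s) == 0]
        # Exploration: occasionally surface something the user has never clicked on.
        if unrelated and self.rng.random() < self.epsilon:
            return self.rng.choice(unrelated)
        # Exploitation: otherwise show the story closest to past clicks.
        return max(stories, key=self.score)


# Hypothetical incoming stories, already tagged with topics.
stories = [
    {"title": "CIA mines blogs", "topics": {"intelligence", "blogs"}},
    {"title": "New speech recogniser", "topics": {"speech", "nlp"}},
    {"title": "Avian flu update", "topics": {"health"}},
]

reader = NewsRecommender(epsilon=0.2, seed=0)
reader.record_click({"title": "Text classifiers 101", "topics": {"nlp"}})
print(reader.pick(stories)["title"])
```

Every click the reader makes on a surfaced story feeds back through `record_click`, so the exploratory picks are how new interests get into the model at all. A real version would swap the topic counts for something richer, but the shape of the loop would stay about the same.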
Here’s the problem: the sinister-ness of the technology goes up exponentially with the opacity of the government.
“We’re certainly scoring a number of wins with our ultimate customer,” said Mr. Jardines, who became the first high-level official in charge of the government’s nonsecret intelligence in December.
“I can’t get into detail of what, but I’ll just say the amount of open source reporting that goes into the president’s daily brief has gone up rather significantly,” Mr. Jardines said. “There has been a real interest at the highest levels of our government, and we’ve been able to consistently deliver products that are on par with the rest of the intelligence community.”
Mr. Naquin said recent OSC successes have included the discovery of a technology advance in a foreign country. Also, most data on avian flu outbreaks come from open sources, he said.
That was weird. Could they have been any less specific? “We can’t say too much, but we have information that something happened. Perhaps in a foreign country, perhaps not. Also someone said something to someone else, so we’re looking into that.” I guess it must be hard getting respect when you’re in non-secret intelligence.
Is this information good for anything? Yes, one thing. Assembling an enemies list.
This stuff has to be pursued as openly as possible, and right now I don’t trust the US government to do the right thing with the technology. Or anything else.