While I appreciate the compliment, I wish it were more specific about how it reached that assessment. I can make a few guesses.
It seems obvious that this uses some kind of nearest-neighbour search. Take a corpus of authors, break their works into good-sized chunks, and then find the closest match for whatever the user gives you.
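To make that guess concrete, here is a minimal sketch in Python. Everything in it is my assumption rather than the site's actual method: the chunk size, the helper names (chunks, word_counts, cosine, closest_author), and the use of plain word counts with cosine similarity as a stand-in for "closest".

```python
import math
import re
from collections import Counter


def chunks(text, size=200):
    """Split a text into consecutive chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def word_counts(text):
    """Bag of lowercased words; a placeholder for whatever features you prefer."""
    return Counter(re.findall(r"[a-z'’]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_author(sample, corpus, size=200):
    """corpus maps author name -> full text (hypothetical input format).

    Returns the author whose best chunk is most similar to the sample.
    """
    target = word_counts(sample)
    best_author, best_score = None, -1.0
    for author, text in corpus.items():
        for chunk in chunks(text, size):
            score = cosine(target, word_counts(chunk))
            if score > best_score:
                best_author, best_score = author, score
    return best_author
```

You would call closest_author(my_text, {"Orwell": orwell_text, "Faulkner": faulkner_text}) and get a name back. The only interesting decision is what goes inside word_counts and cosine.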
But what constitutes a match? We could use n-grams (words, and strings of words), as we do in many computational language tasks, but just matching the words in a book doesn’t mean you write like the author. Sure, Steinbeck and Faulkner wrote different words in their books just because of the topics they treated, but that’s not what we mean by writing style.
My guess is that writing style is more about patterns of words, especially function words like prepositions and conjunctions. (You may have noticed I start a lot of sentences with conjunctions like ‘but’ and ‘and’.) I’d try running all the words through a part-of-speech tagger, and see what matches that data best. Just a guess though.
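Here is roughly what I have in mind, as a sketch. A real part-of-speech tagger needs an external library, so this version cheats and uses a short hand-picked list of function words instead. The list, the function names, and the choice of Euclidean distance are all my own assumptions, and the sentence splitter is deliberately crude.

```python
import re
from collections import Counter

# A small hand-picked list, for illustration only; not a standard set.
FUNCTION_WORDS = [
    "the", "a", "an", "and", "but", "or", "of", "in", "on", "to",
    "with", "for", "that", "which", "if", "as", "at", "by", "from", "not",
]


def function_word_profile(text):
    """Relative frequency of each function word, in FUNCTION_WORDS order."""
    words = re.findall(r"[a-z'’]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def sentence_initial_conjunctions(text):
    """Fraction of sentences that open with 'and', 'but', or 'or'."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    openers = {"and", "but", "or"}
    starts = sum(
        1 for s in sentences
        if s.split()[0].lower().strip("\"'(‘“,;:") in openers
    )
    return starts / len(sentences)


def profile_distance(a, b):
    """Euclidean distance between two profiles; smaller means more alike."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Two texts on completely different topics can still land close together under profile_distance, which is the point: the topic words are ignored and only the connective tissue is compared.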
I wonder if Orwell writes like Orwell. Here are three adjacent passages from Orwell’s Down and Out in Paris and London, with the computer’s assessment.
Or there was Henri, who worked in the sewers. He was a tall, melancholy man with curly hair, rather romantic-looking in his long, sewer-man’s boots. Henri’s peculiarity was that he did not speak, except for the purposes of work, literally for days together. Only a year before he had been a chauffeur in good employ and saving money. One day he fell in love, and when the girl refused him he lost his temper and kicked her. On being kicked the girl fell desperately in love with Henri, and for a fortnight they lived together and spent a thousand francs of Henri’s money. Then the girl was unfaithful; Henri planted a knife in her upper arm and was sent to prison for six months. As soon as she had been stabbed the girl fell more in love with Henri than ever, and the two made up their quarrel and agreed that when Henri came out of jail he should buy a taxi and they would marry and settle down. But a fortnight later the girl was unfaithful again, and when Henri came out she was with child. Henri did not stab her again. He drew out all his savings and went on a drinking-bout that ended in another month’s imprisonment; after that he went to work in the sewers. Nothing would induce Henri to talk. If you asked him why he worked in the sewers he never answered, but simply crossed his wrists to signify handcuffs, and jerked his head southward, towards the prison. Bad luck seemed to have turned him half-witted in a single day.
Or there was R., an Englishman, who lived six months of the year in Putney with his parents and six months in France. During his time in France he drank four litres of wine a day, and six litres on Saturdays; he had once travelled as far as the Azores, because the wine there is cheaper than anywhere in Europe. He was a gentle, domesticated creature, never rowdy or quarrelsome, and never sober. He would lie in bed till midday, and from then till midnight he was in his corner of the bistro, quietly and methodically soaking. While he soaked he talked, in a refined, womanish voice, about antique furniture. Except myself, R. was the only Englishman in the quarter.
There were plenty of other people who lived lives just as eccentric as these: Monsieur Jules, the Roumanian, who had a glass eye and would not admit it, Furex the Limousin stonemason, Roucolle the miser — he died before my time, though — old Laurent the rag-merchant, who used to copy his signature from a slip of paper he carried in his pocket. It would be fun to write some of their biographies, if one had time. I am trying to describe the people in our quarter, not for the mere curiosity, but because they are all part of the story. Poverty is what I am writing about, and I had my first contact with poverty in this slum. The slum, with its dirt and its queer lives, was first an object-lesson in poverty, and then the background of my own experiences. It is for that reason that I try to give some idea of what life was like there.
No wonder Orwell had writer’s block: schizophrenia.
UPDATE: Thanks to Kuri for that link in comments. It seems the author used:
vocabulary (use of words), number of words, commas, and semicolons in sentences, number of sentences with quotation marks and dashes (direct speech).
I’d say this could be smartened up considerably. Just including some simple features would help, like the ratio of singletons (words appearing once) to other words, appearance of conjunctions, or ranking all the words by frequency and comparing lists.
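For what it's worth, two of those features take only a few lines each. This is a rough sketch with names of my own choosing. I read "ratio of singletons" as the share of distinct words that appear exactly once, which is one interpretation among several. For "comparing lists" I use a crude set overlap (Jaccard) of the top-n words, which ignores the order within each list.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z'’]+", text.lower())


def singleton_ratio(text):
    """Share of distinct words that occur exactly once in the text."""
    counts = Counter(tokens(text))
    if not counts:
        return 0.0
    singles = sum(1 for c in counts.values() if c == 1)
    return singles / len(counts)


def top_words(text, n=50):
    """The n most frequent words, most frequent first."""
    return [w for w, _ in Counter(tokens(text)).most_common(n)]


def rank_overlap(text_a, text_b, n=50):
    """Jaccard overlap of the two top-n word sets: 1.0 identical, 0.0 disjoint."""
    a, b = set(top_words(text_a, n)), set(top_words(text_b, n))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A rank-aware comparison (Spearman's rho over the shared words, say) would probably do better than the set overlap, but even this much separates authors more than counting commas.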
This kind of makes me want to try building a better system. I won’t (for lack of time), but I think I will keep in mind that if you can take interesting work in natural language processing and make a simple web implementation, people will think it is interesting. You can also have a lot of English major hotheads sniping at you because you snubbed Toni Morrison. Wouldn’t that be fun!