The bot dutifully recorded the U.S. president’s vulgar word choice two weeks ago
On 11 January, U.S. President Donald Trump reportedly used a choice word during an Oval Office meeting about immigration. Just like that, a term that’s usually not part of respectable public vernacular had been splashed on the front pages of newspapers and media websites.
That evening, a Twitter bot called New New York Times, running under the handle @NYT_first_said, tweeted the word: “shithole.” The bot, which has been running since March 2017, scans The New York Times for words that the esteemed newspaper uses for the first time. That tweet went viral and the Twitter account exploded from 300 to 30,000 followers.
We chatted with Max Bittker, the San Francisco-based software engineer who built the bot.
This interview has been edited and condensed for clarity.
IEEE Spectrum: What exactly does your Twitter bot do?
Max Bittker: Every day, New New York Times scrapes through new news articles posted on The New York Times website and breaks them down into a list of words. Then it searches through background logs of The New York Times to determine if each word has occurred before. And if it finds a word that has never been used before, it posts the word to Twitter.
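As a rough illustration of the loop Bittker describes (split an article into words, check each against what has appeared before, keep the ones that are new), here is a minimal Python sketch. It is not the bot's actual code: the function names are hypothetical, an in-memory set stands in for the newspaper's archive, and the simple letters-only tokenizer is an assumption.

```python
import re


def tokenize(text):
    """Split text into lowercase runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def first_time_words(article_text, seen_words):
    """Return words not yet in seen_words, in order of first appearance.

    seen_words is a set standing in for the archive (an assumption);
    it is updated in place so a word is only reported once.
    """
    new_words = []
    for word in tokenize(article_text):
        if word not in seen_words:
            seen_words.add(word)
            new_words.append(word)
    return new_words


# Hypothetical stand-in for everything the paper has printed so far.
archive = {"the", "president", "said", "a", "word"}
print(first_time_words("The president said a nanoraptor word", archive))
```

A real bot would replace the set with a lookup against the archive and would post each new word instead of printing it.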
Spectrum: How did you come up with the idea for @NYT_first_said?
Max Bittker: There are two other Twitter bots that planted the seed and were my main inspiration. One is called NewsDiffs. It scours the text of New York Times articles, looking for changes to online news articles after they are first posted to see how they evolve. And so the bot picks up a lot of typos but also picks up a lot of cases where, say, the headline is softened or facts are left out.
The other inspiration was an early Twitter bot that entered the public consciousness, called Everyword. All it did was go through the entire dictionary alphabetically and post a word every half hour. Reading the dictionary is boring, but put in the context of Twitter it actually took on a really cool life. Posting random words is the most minimal thing a robot can do, but this turned it into an interactive community. I kind of see @NYT_first_said as a spiritual descendant of Everyword.
Spectrum: Other than the president’s vulgar word choice, what other surprising or funny words has it brought up?
Max Bittker: The New York Times covers a really wide variety of subjects, including science research and food and culture. My favorite categories are science articles where the bot will pick up words like nanoraptor: words that for a scientist might just be daily jargon but taken out of context sound really intriguing. Also, I really like it when there are cultural articles about trends because it kind of juxtaposes The New York Times, which is this very uptight publication, with jargon like neckbeard. It’s this ‘Look, now that the NYT has published it, it’s here to stay’ kind of thing.
Spectrum: How did you build the bot?
Max Bittker: It was a weekend project I worked on at home. It’s really simple from the perspective of code. I found the open-source code for the NewsDiffs bot on GitHub and basically started deleting things because NewsDiffs does a lot of things like sorting that I didn’t need. I reduced the code down to just getting the text of articles, then started storing them in a list and cross-referencing them. I had an initial version done in one afternoon.
The New York Times publishes an application programming interface (API), a programmatic entry point for requesting information. The bot uses this API to search through all of the New York Times’ digital archives.
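To make the idea of an API lookup concrete, the sketch below builds a search URL for a single word. The interview does not say which endpoint the bot calls, so the use of the newspaper's public Article Search endpoint here is an assumption, as are the helper name and the placeholder key. The sketch only constructs the request; sending it and reading the result are left out.

```python
from urllib.parse import urlencode

# Assumed endpoint: the NYT's public Article Search API.
# The interview does not confirm that the bot uses this one.
ARTICLE_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(word, api_key):
    """Build an archive search URL for one word (hypothetical helper)."""
    params = {"q": word, "api-key": api_key}
    return ARTICLE_SEARCH_URL + "?" + urlencode(params)


# "DEMO_KEY" is a placeholder, not a working credential.
print(build_search_url("nanoraptor", "DEMO_KEY"))
```

A word whose search comes back with no earlier articles would be a candidate for a tweet.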
News articles are noisy so it would post lots of things that weren’t interesting, like proper nouns, URLs, numbers, etc. So there was this long refinement process of building rules and heuristics to leave out non-interesting things. That’s an ongoing process.
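The kind of rules Bittker mentions could look something like the following Python sketch. These particular checks are illustrative guesses, not the bot's real heuristics, and the function name is hypothetical. Treating any capitalized token as a proper noun is especially crude, since it also discards ordinary words that begin a sentence.

```python
def is_interesting(token):
    """Return True if a token looks like an ordinary word worth posting.

    Each rule is an assumed heuristic for one kind of noise named in the
    interview: URLs, numbers, and proper nouns.
    """
    if "://" in token or token.lower().startswith("www."):
        return False  # looks like a URL
    if any(ch.isdigit() for ch in token):
        return False  # contains a number
    if token[:1].isupper():
        return False  # probably a proper noun (or a sentence-initial word)
    if not token.isalpha():
        return False  # empty, or has punctuation, hyphens, or symbols
    return True


candidates = ["neckbeard", "Trump", "https://nytimes.com", "2018", "nanoraptor"]
print([t for t in candidates if is_interesting(t)])
```

Each rule trades recall for cleanliness, which fits the description of the filtering as an ongoing refinement process.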
Spectrum: What keeps you busy on weekdays?
Max Bittker: I work at a startup called Sentry that builds a tool for developers. Sentry tells you when the code you write has errors in it or is experiencing problems. I build interfaces and tools and alerts that present information to you in a useful way.
It has been nice to have my Twitter bot code running in the wild. I actually use the tools from my day job to make sure my Twitter bot is running well. It kinda gives me an excuse to poke at my Twitter bot when I’m at work. Working on the Twitter bot is a time to be a consumer of the Sentry tool versus building the tool.
Spectrum: What do you do in your free time besides build Twitter bots?
Max Bittker: I make a lot of programmatic visual art. Also, I’m kind of a programming language and human language enthusiast. So I like to learn new languages that aren’t necessarily practical but I like the process of learning them. Right now, I’m learning the human language German and the programming language Rust.
Source: IEEE Spectrum