
OpenAI has an inane text bot, and I still have a writing job

Testing the new, enlarged version of OpenAI's GPT-2 code on my computer, I found flashes of brilliance mixed with a lot of gibberish. Some other signal is needed to train this neural net to develop coherence in written language.
Written by Tiernan Ray, Senior Contributing Writer

Editors have said to me over the years, only half-jokingly, that I will someday be replaced by a robot. Many editors would rather deal with a machine than a reporter.

We're not quite there yet, however. 

I've been playing with "GPT-2," a program developed by the not-for-profit Silicon Valley company OpenAI. GPT-2 uses machine learning to automatically generate several consecutive paragraphs of what seems like human writing. This is a fresh batch of code, released on Friday by OpenAI, and it's more robust than what was first posted in February, when GPT-2 was announced.

Unfortunately, this new code is not that much more impressive. The occasional flash of brilliance is mixed in with a lot of gibberish and the creations quickly become tiresome.

What's the problem? It may be that a more powerful version of the software will make the difference. Or it may be that machine learning approaches fundamentally still have a lot of work ahead of them to incorporate forms of causal reasoning and logical structure. 

To try out GPT-2, I downloaded the code from GitHub. This is not the most powerful version of GPT-2. When OpenAI announced GPT-2, on Valentine's Day, they said the program was potentially dangerous, given its ability to generate massive amounts of fake writing. For that reason, they refused to release the most sophisticated version of GPT-2. The initial code release had only 117 million "parameters," the variables that GPT-2 learns in order to calculate the probability of word combinations. 

That's a fraction of the 1.5 billion parameters in the full version. More parameters are generally better. On Friday, OpenAI posted a version with 345 million parameters. 

(OpenAI notes, in an expanded version of the original blog post, that they are still studying the risks of GPT-2 before releasing the full version.)


On my computer, I installed Docker in order to run the container in which GPT-2 will operate. Once that's set up, it's very easy to go to the folder with the GPT-2 code and start the thing running in a terminal window, at the command prompt. 
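For reference, the setup looks roughly like this (a sketch based on the openai/gpt-2 repository; the Dockerfile name and the download_model.py script are the repo's as of this writing, though the exact steps may have changed):

git clone https://github.com/openai/gpt-2.git
cd gpt-2
docker build --tag gpt-2 -f Dockerfile.cpu .
docker run -it gpt-2 bash
python3 download_model.py 345M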

Note that when running the Python command that starts GPT-2, it's important to specify the larger model by using the "model_name" flag:

python3 src/interactive_conditional_samples.py --top_k 40 --model_name 345M

The phrase "345M" here refers to the 345-million-parameter model. 

There are some settings one can play with; I only explored one of them, known as "top_k," which controls the "diversity" of the generated text. Although setting this value lower made the text more coherent, I didn't find it changed my overall impression of what was created.
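To give a sense of what that setting does, here is a minimal sketch of top-k sampling in Python (my own illustration, not OpenAI's code): at each step, all but the k highest-scoring candidate words are discarded before one is drawn at random.

import numpy as np

def sample_top_k(logits, k=40):
    # logits: one unnormalized score per vocabulary token.
    # Keep only the k highest-scoring tokens; a smaller k discards
    # more long-shot words, making the output more conservative.
    logits = np.asarray(logits, dtype=float)
    top = np.argsort(logits)[-k:]
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()  # softmax over the surviving candidates
    return int(np.random.choice(top, p=probs))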

Note: If you want an easier way to try out GPT-2, head on over to researcher Adam King's Web-based demo, "Talk to Transformer." There you can simply type a prompt into the Web page.

My first impression was that it's incredible for a computer to assemble paragraphs with some form of continuity. My second was just how inane the results are. 

At the command prompt, one types in some text that acts as a kind of seed, to give GPT-2 something to work from. GPT-2 encodes that text into tokens and then repeatedly predicts the most probable next token, decoding the result back into text. After a period of perhaps a minute, while the neural network seems to be thinking, it produces several hundred words' worth of sentences. 
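Conceptually, the generation loop looks something like the following sketch (my own pseudocode-level illustration, not OpenAI's code; model and encoder are stand-ins for the released TensorFlow network and its byte-pair encoder):

def generate(model, encoder, prompt, n_tokens=300, k=40):
    # Autoregressive sampling: encode the prompt, then repeatedly
    # append whichever token the model scores as probable next.
    tokens = encoder.encode(prompt)      # text -> integer token IDs
    for _ in range(n_tokens):
        logits = model(tokens)           # a score for every possible next token
        tokens.append(sample_top_k(logits, k))  # reusing the sketch above
    return encoder.decode(tokens)        # token IDs -> text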

The text it produces isn't really a continuation of the text you type. Instead, GPT-2 seems to borrow the style or the voice of your text. 

The original GPT neural network from last year was adapted for GPT-2 with some modifications, says OpenAI. On the left is the neural network model; on the right are the various language tasks on which GPT was evaluated, such as sentence similarity. (Image: OpenAI)

GPT-2 is at its best when prompted with something like a news report. To test the prospect of dangerous text, I fed GPT-2 some fake sentences.

For example, I crafted the top of a news report stating that Apple is dumping the iPad, something that an unwitting investor might panic upon seeing:

Shares of Apple plunged in morning trading after the company's chief executive indicated to Wall Street that the company will discontinue its iPad tablet computers. The remarks came during an appearance Monday onstage at an investment banking conference in New York.

GPT-2 responded with several paragraphs in a convincing newswire style. But the nonsensical details came fast and furious. Apple shares were still at a high, despite the plunge, it said, and somehow this related to Microsoft, in no clear way: 

While this news certainly hurts market sentiment, it won't necessarily change the value of shares in Apple, where the company is trading at a healthy 52-week high.

It also puts pressure on Microsoft to show an end-of-year miracle. Apple last year posted better-than-expected quarterly financial results, but Microsoft is running its third-quarter earnings before interest, taxes, depreciation, and amortization (EBITDA) reports on Nov. 30.

Two fake sentences about White House press secretary Sarah Sanders ran into some factual trouble immediately, but, again, the style worked well and the loose structure of the news story suits the generation of paragraphs that are more or less consistent. First, my prompt, followed by part of the generated text:

Model prompt >>> White House press secretary Sarah Sanders will step down at the end of the month, the White House announced. A replacement has not yet been selected, the White House said.

======================================== SAMPLE 1 ========================================

Sanders' tenure with the White House ran from 2009 to 2013, when she was the only White House press secretary to not take a paid role on the president's daily radio program, Fox News.

Sanders' departure won't come as a complete surprise: The White House has said it looks at dozens of candidates for press secretary, and often selects between one to six people, all depending on the jobs facing the administration at the time.

Sanders' time in the job led to the "first female senior press assistant" moniker coined by the conservative American Enterprise Institute, which began promoting it as part of President Obama's 2014 re-election strategy.

In other types of text, the nonsense comes quicker even though the style of the writing continues to be captured aptly, as in this response to my made-up opening line of a story:

Model prompt >>> Once upon a time, there was a little seal who lived in the zoo.

======================================== SAMPLE 1 ========================================

She opened her mouth wide and the seal got really scared because she was shaking so much. She was scared because she was afraid of a big seal that would come out of the zoo and bite her.

She closed her mouth again and it just stood there looking at her. Now that the seal was frightened enough, she got up and walked to the zoo where the seal lived. She got some candy and started eating all of the candy left from the seal.

Sometimes, GPT-2 would pick up on a theme in my text and respond to it in an associative way. When I typed out the opening stanza of a poem by Indonesian poet W. S. Rendra, GPT-2 responded with what sounds like a singer describing his or her work:

Model prompt >>> As the moon sleeps with a rich old lady/I caress a maiden in the mango grove./My Heart is wild and fiery/trampling hunger and thirst underfoot./In our misery we reach out/In the dark and the shadows/roars the passion of our rebellion./And her fierce laughter/makes my heart glad.

======================================== SAMPLE 1 ========================================

The lyrics speak to a feeling that many, many people in the world may find as frustrating as mine, with our current circumstances. There are very few reasons that a song should have this many words but most songs have only a couple, so, I have created this to give people (especially those who have experienced the hardships of homelessness) something to be thankful for. It also works for people who have been through a lot of the same but now have new eyes on life, something I hope to provide. For the record, I am only trying to provide hope. We may not always have the same goals but we all strive the same to do our best and have the best time that we can.

Leaving aside factual errors, there's a kind of run-on quality to GPT-2's synthesized texts that reflects the task. The neural network in GPT-2 is tasked with producing a probable string of text in response to a given string of text. 

In that sense, the program can respond correctly by mirroring what it is receiving, but it tends to then run on and on without any point. That's why the news stories work well, given that news stories in the real world do, indeed, tend to simply run from start to finish as a bag of loosely connected sentences. 
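In symbols, this is the standard language-modeling objective described in OpenAI's paper: the probability of a passage factors into a chain of next-word predictions,

p(w1, w2, ..., wn) = p(w1) × p(w2 | w1) × ... × p(wn | w1, ..., wn-1)

and GPT-2 is trained solely to make each of those next-word factors as accurate as possible. Nothing in that objective rewards a passage for building to a point.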

OpenAI's big insight is that written language itself contains clues that allow a neural network to infer what "task" is to be performed. For example, in "natural language inference," where the task is to determine whether one sentence entails another, digesting huge amounts of text will probably convey a sense of which sentences probabilistically follow one another. In addition to OpenAI's blog post, you can read more about the details in the paper OpenAI posted online, "Language Models Are Unsupervised Multitask Learners."

Far be it from me to tell OpenAI how to do their job, but I suspect the principle of probable word assembly is not sufficient on its own to ensure coherence of sentences or consistency of reference. 


If you've read all the way to this point in the article, your mind is connecting this sentence with what came before. And not just as being similar, but as in some way responding to the preceding sentences, what we generally call advancing an argument. 

The writer, to create that flow of argument, is responding to their own sentences with questions. 

Put in machine learning terms, similarity alone is too "sparse" a reward on which to build a competent machine learner of writing. There are other signals, between the writer and the reader, real or imagined, that constrain writing in such a way as to lead to a coherent product. One constraint is, "don't be too run-on and repetitive," which may work against the principle of sameness. 

Be that as it may, I'll sum up by completing the thought I started at the beginning of this article: I'm looking over my shoulder, because this GPT-2 thing can capture a lot about syntax, as well as mood and voice, in written language. 

Here's how this article would have gone if I had just given GPT-2 my first paragraph and let it run:

Model prompt >>> Editors have said to me over the years, only half-jokingly, that I will someday be replaced by a robot. Many editors would rather deal with a machine than a reporter.

======================================== SAMPLE 1 ========================================

 However, I am certain that this is a matter of choice. And I can confirm that it is the choice of many who have worked at the Times who love the fact that they do something other than simply write.

So why should anyone change? I ask myself, now that we're out of election season. Is it fair that we continue the tradition of hiring journalists by giving up an editor who did so well during a time of relative stability? And to what extent — to what percentage, as all political editors do — do we deserve someone who did so well?

No answer comes readily. The truth of the matter is we're all better for getting our jobs through an honest vetting system than through some kind of patronage system. Journalists can find a way. There are several ways this process can be done, and, for me, and others like me, that makes it worth it.

One person's rejection won't be another person's job.<|endoftext|>A few weeks ago I wrote about my personal favorite video game of 2013, Dragon's Dogma: Dark Arisen. While the review of Dragon's Dogma is very different from my original review, it still contains many of the same criticisms that I had with that game, and with The Witcher 3 and Skyrim, in general. That said, here I am writing about my least favorite of those.

On this one I want to start with the things that struck me the most during that time. These were all things that I found to be obvious, yet that they didn't fit into their own little category, and yet they were all extremely obvious.

In Dragon's Dogma there are things that do not feel justified at times. Sure, there are areas in the game which do have plot beats, even though they're very minor, and yes, there are moments when everything is just plain terrible.

There are also moments where a character is really, really bad, and the only way to get through those parts and the ending is using a skill that can only function for one side of the story. Those are the things that, when you do them well, make Dragon's Dogma stand out from other video games out there. Those things are so subtle and subtle that I didn't notice until I got to playing them again, and the reason I stopped checking them every time was because I was like… damn, these things. What was I missing?

I've watched so many video games with such obvious plot-hole characters and so much

Have you tried the code for yourself? If so, let me know your impressions in the comments section.

