From the video of VPT pursuing the making of a diamond pickaxe in Minecraft. The computer program achieved the feat in ten minutes, half the time it would take a proficient human player.
How important is it to master the "diamond" tool in Minecraft?
Important enough to spend $160,000, according to OpenAI, the artificial intelligence startup.
That's the amount OpenAI spent to hire Minecraft players via the online job listings platform Upwork to record videos of themselves playing the game.
In a paper released this week, "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos," OpenAI researcher Bowen Baker and his team outline how they used large datasets to train a neural network to imitate human keystrokes in solving different tasks in the video game. OpenAI also published a blog post.
A variety of neural networks have conquered many types of games via reinforcement learning. These include DeepMind's AlphaZero, which mastered chess, Go, and Shogi, and the MuZero program, which added the ability to handle Atari games.
Baker and his team set out to develop a neural network for Minecraft's "open world" environment. The complex game environment offers players more freedom than Atari or chess.
According to the authors, there is a "vast" amount of research on Minecraft in the literature. The VPT work is, however, unique because of its scope and size. "To our best knowledge, there is no published work which operates in the full human action space, including drag-and-drop inventory management and item crafting."
VPT was built in stages. The first stage required human contractors to play the game, 4,500 hours of gameplay in total. The researchers later figured out that they only really needed about 2,000 hours.
Baker and his team describe their process:
We kept the applications open for one day, then randomly selected 10 applicants for the first round of contractors. Later in the project, as we needed more data and as some contractors asked to terminate their contracts, we added more applicants from the original pool as well as referrals from the currently working contractors. The contractors were paid $20 per hour (minus Upwork platform fees and applicable taxes). All the data used for the results in this paper amounts to about 4,500 hours. We also collected some data that we did not use due to bugs in the recorder and for ideas that we did not pursue. In total, we spent about $160k for contractor compensation over the course of the project. However, we could probably achieve most of our results, i.e. the foundation VPT model, BC fine-tuning to the earlygame_keyword dataset, and the RL fine-tuning results, using an IDM trained with only $2000 worth of data. The cost of collecting the contractor_house dataset was about $8000. Because we used the IDM trained on about 2000 hours of contractor data, the actual cost of contractor data for those results was around $40,000.
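The dollar figures in that passage line up with the quoted hourly rate. As a rough check, the sketch below assumes a flat $20 per hour and ignores platform fees and taxes; the function names are illustrative and do not come from the paper.

```python
HOURLY_RATE_USD = 20  # contractor rate quoted in the excerpt (assumed flat)


def cost_for_hours(hours, rate=HOURLY_RATE_USD):
    """Return the contractor cost in dollars for a number of recorded hours."""
    return hours * rate


def hours_for_budget(budget_usd, rate=HOURLY_RATE_USD):
    """Return how many hours of gameplay a budget buys at the given rate."""
    return budget_usd / rate


print(cost_for_hours(2_000))    # 40000, the "around $40,000" figure
print(cost_for_hours(4_500))    # 90000, below the roughly $160k total spent
print(hours_for_budget(2_000))  # 100.0 hours for the "$2000 worth" of data
print(cost_for_hours(70_000))   # 1400000, if the 70,000 hours of web video were recorded by contractors
```

The last line is only an order-of-magnitude estimate for the 70,000 hours of web video the article goes on to describe.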
The video frames were also labeled with the actions taken in each frame, such as "inventory," which opens the player's collection of items, using the "E" key; and "sneak," which moves the player carefully in the current direction, using the SHIFT key. These actions are recorded as JSON text strings during gameplay and stored along with the video frames.
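A per-frame action record of this kind might look like the following sketch. The field names (`frame`, `keys`, `mouse_dx`, `mouse_dy`) are hypothetical and chosen only for illustration; the article does not give the recorder's actual schema.

```python
import json


def encode_action(frame_index, keys, mouse_dx=0.0, mouse_dy=0.0):
    """Serialize one frame's inputs as a single JSON text string."""
    record = {
        "frame": frame_index,   # hypothetical field: index of the video frame
        "keys": sorted(keys),   # hypothetical field: keys held during the frame
        "mouse_dx": mouse_dx,   # hypothetical field: horizontal mouse movement
        "mouse_dy": mouse_dy,   # hypothetical field: vertical mouse movement
    }
    return json.dumps(record)


def decode_action(line):
    """Parse a JSON string back into a dictionary."""
    return json.loads(line)


# One JSON string per frame, stored alongside the video.
lines = [
    encode_action(0, {"E"}),                # "inventory"
    encode_action(1, {"SHIFT", "W"}, 2.5),  # "sneak" while holding a movement key
]
for line in lines:
    print(line)
```

Keeping one self-describing string per frame makes it straightforward to line up actions with video frames later, which is what the next training step depends on.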
The gameplay frames were used to train an inverse dynamics model (IDM), which learns what actions go along with which frames. The IDM is a mash-up of several kinds of neural nets, including a 3-D convolutional neural net and a ResNet to parse the video frames, and several Transformer attention layers to predict the action taken at each frame.
The trained IDM is then applied to a much larger set of video footage, a total of 70,000 hours of unlabeled Minecraft footage gathered from the Web. The IDM applies "pseudo labels" to this vastly larger collection. In other words, the IDM, and the contractor fees, are a way to bootstrap a huge video training set.
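The bootstrapping idea can be illustrated with a toy example. This is a minimal sketch under strong simplifying assumptions, not the authors' method: "frames" are integer positions on a line, actions are `left`, `stay`, and `right`, and a lookup table stands in for the neural network. All names are illustrative.

```python
from collections import Counter, defaultdict


def train_idm(labeled_transitions):
    """Learn which action most often accompanies each observed frame change."""
    counts = defaultdict(Counter)
    for before, after, action in labeled_transitions:
        counts[after - before][action] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in counts.items()}


def pseudo_label(idm, frames):
    """Attach an inferred action to each consecutive pair of unlabeled frames."""
    labeled = []
    for before, after in zip(frames, frames[1:]):
        labeled.append((before, idm.get(after - before, "unknown")))
    return labeled


# Small labeled set (stand-in for the contractor recordings).
contractor_data = [(0, 1, "right"), (1, 2, "right"), (2, 1, "left"), (1, 1, "stay")]
idm = train_idm(contractor_data)

# Larger unlabeled sequence (stand-in for web video): positions only, no actions.
web_frames = [5, 6, 7, 7, 6, 5, 4, 5]
print(pseudo_label(idm, web_frames))
```

The point of the toy is the data flow: a small amount of labeled data trains the labeler, and the labeler then turns a larger unlabeled collection into (frame, action) pairs.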
The training regimen for VPT.
The authors say that even though the contractor payment may seem costly, it is actually a huge cost savings. It would cost much more to collect contractor data equivalent to the 70,000 hours of Web videos.
"If we could collect a labeled contractor dataset of the same order of magnitude as web_clean, this would not be of any importance; however, it would have cost millions to collect that scale of data."
Using the 70,000 hours, the authors then train a second neural network, also made up of Transformer layers, to mimic the user actions in the videos, a common practice known as "behavioral cloning."
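In its simplest form, behavioral cloning is supervised learning on (state, action) pairs. The sketch below is a tabular toy, assuming a handful of hypothetical state names and actions; it is not the Transformer model described in the article.

```python
from collections import Counter, defaultdict


def clone_behavior(demonstrations):
    """Fit a policy that picks the action seen most often in each state."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


def act(policy, state, default="stay"):
    """Return the cloned action for a state, or a default for unseen states."""
    return policy.get(state, default)


# Hypothetical (state, action) pairs, e.g. frames with pseudo labels attached.
demos = [
    ("tree_ahead", "attack"), ("tree_ahead", "attack"), ("tree_ahead", "forward"),
    ("open_field", "forward"), ("open_field", "forward"),
]
policy = clone_behavior(demos)
print(act(policy, "tree_ahead"))  # attack
print(act(policy, "open_field"))  # forward
print(act(policy, "cave"))        # stay (state never seen in the demonstrations)
```

A neural network replaces the table when states are raw video frames, since it has to generalize to frames it has never seen rather than fall back to a default.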
The point of the work is to find a way to train a general-purpose computer "agent" that can use the wealth of unlabeled data on the Internet to solve tasks that involve causality, meaning sequences of actions that have a necessary relationship from one to the next.
"The results presented here help pave a path to utilizing the wealth of unlabeled data on the web for sequential decision domains," they write.
They suggest that this work could be used to perform a variety of computer tasks that require mouse clicks and other operator controls.
"While our experiments are limited to Minecraft, we believe VPT provides a general formula for training behavioral priors into hard, yet generic action spaces in any domain that contains a large quantity of unlabeled data like computer usage."
OpenAI's most famous product is GPT-3, a large language program that uses a pre-training approach based upon tons of Web data. The Minecraft work extends that approach to mimicry in the domain of sequential computer tasks captured via video.
The program's ultimate achievement is to sometimes beat the time a human needs to complete one of the game's most difficult tasks, obtaining the diamond pickaxe.
Diamond-based tools can do more damage and last longer in Minecraft. The diamond pickaxe is especially important to gamers: you need one to mine obsidian and a fictional material called netherite, both of which are important for endgame activities such as building enchanting tables and making netherite equipment.
After training VPT to perform various Minecraft tasks, the authors used a "fine-tuning" approach based on reinforcement learning so that the neural net could make a diamond pickaxe much faster than normal.
The diamond pickaxe goal was chosen, they write, "to demonstrate the efficacy of RL fine-tuning."
This is hard for humans. It takes them twice as long to complete, if at all.
Doing so involves acquiring a sequence of difficult-to-obtain items that require complex skills like mining, inventory management, crafting with and without a crafting table, tool use, operating a furnace, and mining at the lowest depths, where many hazards like enemies and lava exist (Fig. 6). Adding to the difficulty, progress can be easily lost by dropping items, destroying items, or dying. It takes a skilled human more than 20 minutes (24,000 steps) to get a diamond pickaxe.
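The step count in parentheses implies a fixed number of steps per second. A quick check, assuming "more than 20 minutes" is treated as exactly 20 minutes (the function name is illustrative):

```python
def steps_per_second(total_steps, minutes):
    """Return the implied step rate for a task of the given length."""
    return total_steps / (minutes * 60)


rate = steps_per_second(24_000, 20)
print(rate)                 # 20.0 steps per second
print(int(rate * 10 * 60))  # 12000 steps in the ten-minute window VPT achieved
```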
In assembling both the contractor data and the unlabeled 70,000 hours of Web video, the authors were mindful of the prospect of offensive content. "The contractors could theoretically use Minecraft's open-world property to generate personally identifiable information and/or offensive content (e.g., by using Minecraft blocks to write their name or offensive messages, and then finding a spot from which the message would be visible)," they write. However, the authors did not see such behavior in the contractor videos they watched.
"Of course we train our BC [behavioral-cloning] models based on videos of people playing Minecraft online, and if such behavior appears in those videos our model might also learn it, but we expect such behavior to be rare enough that our model wouldn't be able to reproduce it," they write.
Where is this general agent going next? The idea is that having conquered diamond pickaxes, VPT, or its offspring, can do all kinds of things that a person might do with a mouse and keyboard, including booking tickets, surfing social media, or navigating maps.