Book Review: GPT-3
Building Innovative NLP Products using LLMs (A Bite-Sized Review) + Around the Web GPT-3 Special.
Book Title:
GPT-3: Building Innovative NLP Products using LLMs
This book is about a technology that marks an important milestone in the history of AI: the generative pre-trained transformer (GPT), a type of large language model. It can write essays, code, and everything in between.
More specifically, it is a deep learning-based natural language processing model that is trained on a large corpus of text data in an unsupervised manner.
Only in the field of AI can a book be outdated the same year it was published. While this book is not outdated per se, it could use some additional chapters on, say, ChatGPT, which did not exist when the book was released.
It's a solid book nonetheless to get you started. You get a gentle walkthrough of the model and of the various ways you can make use of GPT-3, whether as a developer or a non-programmer.
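To give a flavor of the developer side the book covers, here is a minimal sketch of calling a GPT-3 model from Python with the openai package (the pre-1.0 Completion interface current when the book shipped); the model name and prompt are illustrative, and the API key is assumed to be set in the environment.

```python
# Minimal sketch: calling a GPT-3 model via the openai Python package
# (pre-1.0 Completion interface, the one current when the book shipped).
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumes the key is set

response = openai.Completion.create(
    model="text-davinci-003",  # a GPT-3 family model; illustrative choice
    prompt="Explain what a large language model is in one sentence.",
    max_tokens=60,
    temperature=0.7,
)
print(response["choices"][0]["text"].strip())
```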
So rather than writing more about the book, I think it’s fitting to merge an ‘around the web’ GPT-3 special with this micro book review.
Here are some of the pieces I have read and metabolized in the past few days.
Around the Web GPT-3 Special:
[I]: EdTech and GPT
A deep analysis of the state of EdTech and the impact of GPT, via Reach Capital.
[II]: Building an AI Chatbot.
Dan Shipper, CEO of Every, built an AI chatbot for the Huberman Lab podcast, using mostly code from the OpenAI Cookbook; a minimal sketch of that recipe follows the quote below.
“On the personal side, if you're trying to remember an idea from a book you read, or something a colleague said in a meeting, or a restaurant a friend recommended to you, you’re not going to dig through your second brain. Instead, you’re going to ask a chatbot that sits on top of all of your notes, and the chatbot will return the right answer to you.
On the organizational side, if you have a question about a new initiative at your company, you’re not going to consult the internal wiki or bother a colleague. You’re going to ask the internal chatbot, and it will return an up-to-date, trustworthy answer to you in seconds.
On the cultural side, if you want to know what your favorite podcaster says about a specific topic, you’re not going to have to Google them, sort through an episode list, and listen to a two-hour audio file to find the answer. Instead, you’ll just ask a chatbot trained on their content library, and get an answer instantly.”
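This is not Shipper's actual code, but a sketch of the general embed-retrieve-answer recipe from the OpenAI Cookbook that chatbots like this follow: embed your documents, retrieve the chunk most similar to the question, and let the model answer from it. The transcript chunks and model names below are illustrative stand-ins (again using the pre-1.0 openai interface).

```python
# Sketch of the embeddings-based Q&A recipe popularized by the
# OpenAI Cookbook (not Shipper's actual implementation).
import numpy as np
import openai

def embed(text: str) -> np.ndarray:
    """Embed a piece of text with OpenAI's embedding endpoint (pre-1.0 API)."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Hypothetical corpus: podcast transcript chunks, notes, wiki pages, etc.
chunks = [
    "Episode 42: Huberman discusses the effects of morning sunlight...",
    "Episode 57: A protocol for improving sleep quality...",
]
chunk_vectors = [embed(c) for c in chunks]

def answer(question: str) -> str:
    q = embed(question)
    # Cosine similarity against every chunk; keep the best match as context.
    sims = [v @ q / (np.linalg.norm(v) * np.linalg.norm(q)) for v in chunk_vectors]
    context = chunks[int(np.argmax(sims))]
    prompt = (
        f"Answer using only this context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    resp = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=150
    )
    return resp["choices"][0]["text"].strip()

print(answer("What does Huberman say about morning sunlight?"))
```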
[III]: Historical analogies for large language models
How will large language models (LLMs) change the world?
“No one knows. With such uncertainty, a good exercise is to look for historical analogies—to think about other technologies and ask what would happen if LLMs played out the same way.”
[IV]: How to Get the Most Out of ChatGPT
Learn to leverage the most disruptive AI of the year
This essay contains many, many GPT use cases.
“We shouldn’t qualify ChatGPT as more or less intelligent (the concept doesn’t even apply to it) but as more or less suited for a given task. As an autocomplete system with a high component of randomness, pure creativity, inspiration, ideation, etc. are the tasks for which it’s best suited.”
“Besides creativity, ChatGPT is also viable for tasks that enter the “factual quer[y]” territory—as long as the user has the knowledge to fact-check it afterward. If you manage to save time with ChatGPT, then that’s a great use case you got there.”
[V]: Hidden uses of ChatGPT
Nothing hidden about it, but here you go.
[VI]: PubMed GPT: a Domain-Specific Large Language Model for Biomedical Text
The dataset used for training the model “contains around 50B tokens and spans a collection of 16 million abstracts and 5 million full-text articles from the biomedical literature, as curated by the National Institute of Health.”
“The total training time for Pubmed GPT was ~ 6.25 days. Using placeholder pricing of $2/A100/hr, the total cost for this training run on MosaicML Cloud was ~ $38,000.”
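As a quick sanity check of those figures (my own back-of-the-envelope arithmetic, not from the post), the quoted cost and duration imply roughly 128 A100s running in parallel:

```python
# Back-of-the-envelope check of the quoted figures
# (my arithmetic, not from the MosaicML post).
days = 6.25
dollars_per_gpu_hour = 2.0   # the "placeholder pricing" of $2/A100/hr
total_cost = 38_000          # USD

hours = days * 24                              # 150 training hours
gpu_hours = total_cost / dollars_per_gpu_hour  # 19,000 A100-hours
gpus = gpu_hours / hours                       # ~127, i.e. roughly 128 A100s

print(f"{hours:.0f} h wall clock, {gpu_hours:,.0f} GPU-hours, ~{gpus:.0f} A100s")
```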
Here is their code: stanford-crfm/pubmedgpt
The work “demonstrates the capabilities of industry-specific large language models—specifically for the field of biomedicine. Using the MosaicML Cloud platform, CRFM trained a 2.7B parameter GPT on biomedical data from PubMed that achieves state-of-the-art results on medical question and answer text from the US Medical Licensing Exam (USMLE) — highlighting the promise of domain-specific language generation models in real-world applications.”
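The actual training stack lives in the stanford-crfm/pubmedgpt repo linked above. Purely to illustrate the idea of domain-specific pretraining, here is a toy sketch using Hugging Face Transformers on a small GPT-2 with stand-in biomedical text; everything here (model size, data, hyperparameters) is illustrative and not their setup.

```python
# Toy sketch of domain-specific causal LM training, in the spirit of
# PubMed GPT but at miniature scale (the real code is in
# stanford-crfm/pubmedgpt; this uses a small GPT-2 and stand-in data).
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Stand-ins for PubMed abstracts / full-text articles.
texts = [
    "Aspirin irreversibly inhibits cyclooxygenase, reducing prostaglandins.",
    "BRCA1 encodes a tumor suppressor protein involved in DNA repair.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pubmed-toy", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    # mlm=False gives the standard next-token (causal LM) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```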