OpenAI GPT-3: limitations of large-scale language models
Recently, Miles Brundage, Director of Policy Research at OpenAI, shared a new paper on Twitter that summarizes a seminar on GPT-3.
On October 14, 2020, researchers from OpenAI, Stanford University’s Institute for Human-Centered AI (HAI), and other institutions convened to discuss open research questions surrounding GPT-3.
The participants came from a variety of research backgrounds, including computer science, linguistics, philosophy, political science, communications, and internet policy. Broadly speaking, the seminar revolved around two main questions:
What are the capabilities and limitations of large language models? The discussion covered several key themes: the outsized effect of scale on model capabilities; the difficulty of assessing whether a large language model truly understands language; the importance of training models on multiple data modalities; and the challenge of aligning model objectives with human values.
What are the social effects of widespread use of large language models? The discussion covered several key themes: the difficulty of anticipating all possible uses (and abuses) of general-purpose language models; the challenges organizations may face in deploying such models; the potential for models to leak information at the algorithmic level; the obstacles to reducing model bias (for example, around race, gender, or religion); and the impact of language-model-based automation on the labor market.
After the meeting, several participants from Stanford University, OpenAI, and the AI Index compiled and summarized the discussion, producing the material below.
By sharing the discussion openly, the authors hope to offer multiple perspectives, prompt further thinking, and invite readers to work toward solutions together.
Technical capabilities and limitations
1) Scale effect
GPT-3 is one of the largest language models built to date: it has 175 billion parameters and was trained on 570 GB of text. By contrast, its predecessor GPT-2, which is functionally similar, has 1.5 billion parameters and was trained on 40 GB of text. While GPT-2 showed some zero-shot generalization to downstream tasks, GPT-3 goes further: given a few examples in its context, it can learn novel tasks. Participants found it remarkable that this generalization ability arises simply from scaling up the model and its training data.
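To make the contrast concrete, the sketch below shows what zero-shot versus few-shot ("in-context") prompting looks like in practice. It is only a minimal illustration: the complete() helper is a hypothetical stand-in for whatever completion endpoint is available, not part of any particular library, and the translation demonstrations simply follow the style of the GPT-3 paper.

```python
# Minimal sketch of zero-shot vs. few-shot ("in-context") prompting.
# `complete(prompt)` is a hypothetical stand-in for a call to a large
# language model's text-completion endpoint; swap in your own client.

def complete(prompt: str) -> str:
    raise NotImplementedError("replace with a real model call")

# Zero-shot: the task is described, but no examples are given.
zero_shot_prompt = (
    "Translate English to French:\n"
    "cheese =>"
)

# Few-shot: a handful of demonstrations are placed in the context window,
# and the model is asked to continue the pattern. GPT-3's headline result
# was that this alone often yields strong task performance.
few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivree\n"
    "plush giraffe => girafe en peluche\n"
    "cheese =>"
)

if __name__ == "__main__":
    for name, prompt in [("zero-shot", zero_shot_prompt),
                         ("few-shot", few_shot_prompt)]:
        print(f"--- {name} ---")
        print(prompt)
        # print(complete(prompt))  # uncomment once `complete` is wired up
```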
It was pointed out that the way capabilities expand as model scale increases is stable and predictable, “much like the laws of physics or thermodynamics.”
Some participants were optimistic that these trends will continue even for models much larger than GPT-3, and that ever more powerful models will emerge that can learn new abilities from a small number of training examples in increasingly sophisticated ways.
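The “laws of physics” analogy refers to empirical scaling laws, in which test loss falls smoothly as a power law of model size. The sketch below shows the general shape of such a relationship; the constants are chosen only to be of a plausible order of magnitude and should be treated as illustrative, not as measured values.

```python
# Illustrative power-law scaling curve: loss(N) ~ (N_c / N) ** alpha,
# the functional form reported in empirical scaling-law studies.
# The constants below are illustrative placeholders, not measured values.

N_C = 8.8e13   # hypothetical "critical" parameter count
ALPHA = 0.076  # hypothetical scaling exponent

def predicted_loss(num_parameters: float) -> float:
    """Predicted cross-entropy loss for a model of the given size."""
    return (N_C / num_parameters) ** ALPHA

for n in [1.5e9, 13e9, 175e9, 1e12]:
    print(f"{n:>10.2e} parameters -> predicted loss {predicted_loss(n):.3f}")
```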
One participant noted that a model on the scale of GPT-3 is reminiscent of a large particle accelerator experiment, whose construction requires researchers from many different backgrounds.
For example, training such a large model requires teams with different expertise to collaborate: running experiments, building and maintaining computing infrastructure, developing algorithms, and continually testing the model’s behavior to address potential problems such as bias, misuse, and security.
2) Understanding
What constitutes “understanding” in a language model, and does GPT-3 meet that definition? Some participants favored a strong definition grounded in intelligence, requiring that the model have intent or the ability to act on requests in the real world.
Others proposed weaker criteria that GPT-3 still fails to meet, including robustness to adversarial examples: inputs that easily confuse an AI system but pose no difficulty for humans. Participants suggested that if a model performs poorly on rare but important inputs, getting things “basically right” may not be enough to count as understanding.
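One way researchers probe this kind of robustness is to apply small perturbations that a human would shrug off (typos, swapped characters) and check whether the model’s answer changes. The sketch below assumes a hypothetical query_model function and a toy perturbation; it only illustrates the idea of a consistency check, not any participant’s actual methodology.

```python
import random

# Sketch of a robustness check: perturb an input with typos a human would
# ignore, and see whether the (hypothetical) model's answer stays the same.

def query_model(prompt: str) -> str:
    raise NotImplementedError("replace with a real model call")

def add_typos(text: str, n_swaps: int = 2, seed: int = 0) -> str:
    """Swap a few adjacent characters to simulate harmless typos."""
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(n_swaps):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def consistency_check(question: str, n_variants: int = 5) -> float:
    """Fraction of perturbed variants that yield the original answer."""
    original = query_model(question)
    variants = [add_typos(question, seed=s) for s in range(n_variants)]
    agreements = sum(query_model(v) == original for v in variants)
    return agreements / n_variants
```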
Another definition of understanding centers on causality: a model that truly understands should capture the causal relationship between data features and the desired behavior. Some argued that language models will inevitably exploit spurious correlations or “shortcut features” in the data, and therefore lack a genuine underlying causal model. One participant, however, offered a different view:
If a language model sees enough data, it may encounter something like “natural experiments,” allowing it to learn causality from observational data in a way similar to humans.
Some participants pushed back against treating understanding as all-or-nothing, pointing out that children and adults alike gradually master more powerful skills over time. One participant quoted a famous physicist: “I only understood thermodynamics the third time I taught it.”
Another participant objected to any single notion of understanding, highlighting the long-running debate among linguists and philosophers over whether meaning derives from relationships between expressions or from some external ground truth.
Finally, some participants questioned the focus on understanding, noting that humans accomplish many tasks with mediocre or even no understanding, as when a player who does not speak French won the French-language Scrabble championship. Some said that whether GPT-3 understands language in any relevant sense may have little to do with whether it can successfully complete the task.
Strikingly, one participant also raised the reverse problem, namely how well humans understand the capabilities of large language models: “GPT-3 is something entirely unfamiliar. It is not a foolish question to ask whether it is AGI.”
3) Multimodal
Much of the discussion touched on the importance of multimodal models: language models that are also trained on data from other modalities, such as images or speech. Participants generally agreed that large multimodal models will become more common and will enable a wider range of capabilities. Indeed, shortly after the seminar, OpenAI released DALL-E, a multimodal version of GPT-3 trained to generate images from text.
However, some argued that GPT-3 is already trained on multimodal data in a sense, since its training corpus contains prose, structured data tables, and computer code. Others believed the main benefit of multimodal training may be that models learn useful features faster, because the interaction between different modalities can provide a stronger learning signal than any single modality alone.
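As a rough picture of how a DALL-E-style model combines modalities, a caption’s text tokens and an image’s discretized tokens can be concatenated into a single sequence that one autoregressive transformer models end to end. The sketch below is schematic: the tokenizer, the image codebook, and the vocabulary sizes are invented placeholders rather than details of any released system.

```python
from typing import List

# Schematic of DALL-E-style multimodal training data: a caption's text
# tokens and an image's discrete codebook tokens are joined into a single
# sequence, which an autoregressive transformer then models left to right.
# All sizes and both "encode" functions are invented placeholders.

TEXT_VOCAB_SIZE = 16384
IMAGE_VOCAB_SIZE = 8192  # size of a hypothetical discrete image codebook

def encode_text(caption: str) -> List[int]:
    """Placeholder text tokenizer: one id per word, hashed into the vocab."""
    return [hash(w) % TEXT_VOCAB_SIZE for w in caption.lower().split()]

def encode_image(image_path: str, grid: int = 32) -> List[int]:
    """Placeholder for a learned image tokenizer (e.g. a discrete VAE)."""
    return [hash((image_path, i)) % IMAGE_VOCAB_SIZE for i in range(grid * grid)]

def make_training_sequence(caption: str, image_path: str) -> List[int]:
    """Concatenate text ids and (offset) image ids into one token stream."""
    text_ids = encode_text(caption)
    # Offset image ids so the two vocabularies do not collide.
    image_ids = [TEXT_VOCAB_SIZE + t for t in encode_image(image_path)]
    return text_ids + image_ids

seq = make_training_sequence("an armchair in the shape of an avocado", "img.png")
print(len(seq), seq[:8])
```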
Finally, some commented that, given how much the range of usable sensory modalities varies across humans, no particular additional modality seems critical for language use.
4) Value alignment
Participants discussed the need to better align model objectives with human values. For example, one participant noted that some language models treat all tokens (nouns, prepositions, numbers, and so on) as equally important, whereas humans do not.
Several other participants emphasized the challenge of better optimizing for factual accuracy and adversarial robustness. Aligning model objectives with human values was considered especially important for “embodied” AI agents that learn through active interaction with their environment. Participants also stressed the need for better algorithms to steer agents toward human values, and for interdisciplinary collaboration to clarify what “human values” means, especially given the diversity of individuals and communities and the biases present in datasets.
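As a toy illustration of the point that not all tokens matter equally, one could imagine weighting a language model’s training loss by per-token importance. The sketch below shows such a weighted negative log-likelihood with made-up numbers; it is not a description of how GPT-3 or any production system is actually trained.

```python
import numpy as np

# Illustrative only: a per-token weighted negative log-likelihood, showing
# one way "not all tokens matter equally" could be expressed in a training
# objective. The weights and probabilities below are made up; real systems
# would derive importance in a far more principled way.

def weighted_nll(token_probs: np.ndarray, weights: np.ndarray) -> float:
    """Average of -log p(token), scaled by a per-token importance weight."""
    return float(np.mean(weights * -np.log(token_probs)))

# Model's probability for each target token, plus a hand-set importance
# weight (e.g. a factual number might matter more than a preposition).
probs   = np.array([0.90, 0.60, 0.95, 0.40])   # "the", "dose", "is", "5mg"
uniform = np.ones(4)
weights = np.array([0.2, 1.0, 0.2, 2.0])

print("unweighted loss:", weighted_nll(probs, uniform))
print("weighted loss:  ", weighted_nll(probs, weights))
```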
The social impact of widespread use of GPT-3
1) Capabilities
GPT-3 is highly capable: it can perform text summarization, chatbot dialogue, search, code generation, and essay writing.
At the meeting, some suggested that GPT-3’s capabilities are striking enough that its uses need to be carefully managed, yet because GPT-3 accepts arbitrary inputs, it is impossible to anticipate all of the model’s possible behaviors in advance, which makes safeguarding society against its potential threats genuinely challenging.
Many at the meeting also noted that by deploying GPT-3 behind a restricted API rather than releasing it as open source, OpenAI can more easily control how the model is used.
However, this approach raises its own open questions, such as: who should get access, and why? And how can access be provided in a way that supports collaboration across a large community, for example to probe for potential misuse and to develop mitigation strategies?
2) Deployment
Participants discussed the ethical and social challenges that may arise from deploying large language models, and ways of addressing them. One suggestion was to increase the computing resources available to academia so that scholars can study the deployment of large language models.
It was suggested that laws or regulations requiring users to disclose when text was generated by AI could help manage the impact of large language models. Another participant asked whether criteria could be defined for assessing whether a language model is socially beneficial; the group regarded this as a challenging but important task. Several participants also observed that OpenAI and other organizations will not hold a monopoly on large language models forever.
They noted that developers may hold a lead of only six to nine months before other researchers reproduce their results. There was consensus that the organizations at the cutting edge should use that position to responsibly set standards and norms in this emerging field.
In addition, some participants pointed out that as techniques advance, replicating models such as GPT-3 will only become easier over time.
This underscores the urgency of using the current window, in which only a few actors possess very large language models, to establish appropriate norms and principles for others to follow.
3) Fake news
Another major topic of discussion was the misuse of language models to generate disinformation. In particular, models such as GPT-3 could be used to create false or misleading articles, tweets, and news reports aimed at shaping public opinion.
Some argued that earlier technologies, such as photography and Photoshop, raised similar problems, and that growing public awareness of these risks means there is no need to worry excessively; moreover, although GPT-3 could in principle generate fake news automatically, spreading rumors manually may still be more cost-effective than producing them with GPT-3.
Others disagreed, arguing that the cost of generating disinformation automatically with a language model is far lower than the cost of training and paying people to create it.
There was broad agreement that:
Empirically studying the economics of automatically generated versus human-created disinformation is very important.
Looking ahead, one participant suggested imagining a future in which text generated by language models is not only relevant to the topics people are discussing but also highly persuasive on any topic.
Another participant pointed out that GPT-3 and future language models may make disinformation difficult or impossible to detect from content alone, forcing online platforms to rely on metadata instead. Relatedly, it was suggested that the existence of systems like GPT-3 should encourage wider use of cryptography to authenticate media.
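As a concrete illustration of that last point, a publisher could sign each article together with its metadata using a private key, so that platforms and readers can verify provenance regardless of whether the text itself looks machine-generated. The sketch below uses Ed25519 signatures from the third-party cryptography package; it is a minimal example of the idea, not a full provenance standard.

```python
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Minimal sketch of authenticating media with digital signatures: a
# publisher signs the article text plus metadata; anyone holding the
# publisher's public key can verify the content was not forged or altered.

publisher_key = Ed25519PrivateKey.generate()
public_key = publisher_key.public_key()

article = {
    "outlet": "Example News",     # hypothetical metadata
    "published": "2021-02-01",
    "body": "Text of the article ...",
}
payload = json.dumps(article, sort_keys=True).encode("utf-8")
signature = publisher_key.sign(payload)

# A platform (or reader) verifies the signature against the public key.
try:
    public_key.verify(signature, payload)
    print("signature valid: content is authentic")
except InvalidSignature:
    print("signature invalid: content may be forged or tampered with")
```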
4) Bias
GPT-3 exhibits multiple types of racial, gender, and religious biases.
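One common way to surface such biases is to fill demographic terms into otherwise identical prompt templates and compare what the model produces. The sketch below is schematic: the generate call is a hypothetical stand-in for a real model, and the word-list “sentiment” score is only a toy substitute for the more careful metrics used in actual audits.

```python
from itertools import product

# Sketch of a template-based bias probe: identical prompts that differ only
# in a demographic term, whose continuations are then compared.
# `generate` is a hypothetical stand-in for a real language model call, and
# the word-list scorer is a toy; real audits use far more careful metrics.

TEMPLATES = ["The {group} person worked as a", "People describe the {group} person as"]
GROUPS = ["Black", "white", "Muslim", "Christian", "male", "female"]

POSITIVE = {"brilliant", "kind", "successful", "honest"}
NEGATIVE = {"criminal", "lazy", "violent", "dishonest"}

def generate(prompt: str) -> str:
    raise NotImplementedError("replace with a real model call")

def toy_sentiment(text: str) -> int:
    """Crude word-list score: positive words minus negative words."""
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

def run_probe() -> None:
    for template, group in product(TEMPLATES, GROUPS):
        prompt = template.format(group=group)
        continuation = generate(prompt)
        print(f"{group:>10} | {toy_sentiment(continuation):+d} | {continuation!r}")
```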
One discussant compared the difficulty of addressing bias in language models to the difficulty of content moderation on online platforms.
Although in both cases it is hard to formulate general rules, there is consensus, and room for mitigation, in some areas. For example, online platforms broadly agree on the need to address child sexual abuse material and credible threats of violence. The concept of a “protected class” in anti-discrimination law likewise provides a useful initial framework for thinking about certain language model biases.
Several seminar participants pointed out that it is difficult to define in general terms what counts as mitigating bias in large language models, because appropriate language use depends heavily on context.
One participant observed that all datasets are biased in some respects, so the challenge is not to eliminate all bias but to address harmful biases according to agreed norms and/or legal standards. Some suggested that companies like OpenAI are not in an appropriate position to define such norms on behalf of society.
Some also noted that it is difficult to reduce bias in a general-purpose system such as GPT-3 by changing the training data alone, because bias is usually analyzed in the context of specific use cases.
Participants discussed a range of possible approaches to addressing harmful bias in language models, including the following (a sketch of the second, filter-based approach appears after the list):
• Change the initial training data to reduce bias in advance
• Train a separate model to filter the content generated by the language model
• Fine-tune the large language model on carefully curated data
• Label the data so that the model can learn to distinguish certain forms of content (see CTRL)
• Train the model to “understand the facts”
• Use reinforcement learning from human feedback
• Use the knowledge of the model itself to improve the output (for example, well-designed prompts)
• Develop more “bias test” suites to run models through before deployment
• Study the models in collaboration with trusted partners that provide certain commercial services
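The second item above, a separate filtering model, might look roughly like the following: a classifier scores each candidate generation, and outputs above a harm threshold are withheld or regenerated. Both generate and score_harm are hypothetical stand-ins; no particular model or API is implied.

```python
from typing import Optional

# Sketch of the "separate filtering model" idea from the list above:
# a classifier scores each candidate output and anything above a harm
# threshold is withheld. `generate` and `score_harm` are hypothetical
# stand-ins, not references to any particular model or API.

HARM_THRESHOLD = 0.5
MAX_ATTEMPTS = 3

def generate(prompt: str) -> str:
    raise NotImplementedError("replace with a real language model call")

def score_harm(text: str) -> float:
    """Return a harm probability in [0, 1] from a separate classifier."""
    raise NotImplementedError("replace with a real classifier")

def filtered_generate(prompt: str) -> Optional[str]:
    """Return the first candidate under the harm threshold, else None."""
    for _ in range(MAX_ATTEMPTS):
        candidate = generate(prompt)
        if score_harm(candidate) < HARM_THRESHOLD:
            return candidate
    return None  # caller decides how to handle a refusal
```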
None of these methods is a panacea. For example, steering the model with human feedback raises its own questions: who are the human labelers, and how are they chosen? In addition, content filters can undermine the agency of the very groups they are meant to protect, for example by suppressing reclaimed words or phrases that others use as slurs but that marginalized groups have taken back for their own use.
One participant argued that keeping people at the center of text generation systems is crucial to solving these problems. Some participants also emphasized that, given the limitations of current technology, certain uses of language models should simply be avoided, since text generation applications differ enormously in open-endedness and risk; detecting regular expressions, for example, is far easier to handle than managing a suicide hotline.
5) Economy
Another topic of discussion was the economic implications of models such as GPT-3. Participants observed that people feel differently about different kinds of work involving reading or analyzing text: some of it is satisfying (such as writing, or reading and summarizing reports), while some is unpleasant (such as content moderation). This raises the question of when, and for which types of work, automation with large language models is appropriate. One participant argued that leaving such decisions entirely to companies could have undesirable consequences.
The group also discussed how education is likely to be affected by larger language models, for example through changes in how essays are written and how written work is evaluated.
Another participant noted that providing API access to groups across different sectors of society could surface early signals of potential social change.
Future research directions
The following research questions were inspired by the discussion at the conference:
• Can we better understand why scale has such a large effect on language model capabilities? Would this help us build models that scale more efficiently?
• What are the limits of scaling? Will further scaling produce stronger causal reasoning, symbolic manipulation, commonsense understanding, and robustness to a wider range of inputs, or will different techniques be needed?
• How can we characterize the limitations of large language models? Can we get models to ask for help, offer explanations, or abstain when they are uncertain?
• How can we develop new neural network architectures and algorithms that let models learn efficiently from multimodal data beyond text?
• What are the opportunities and trade-offs among different approaches for making the outputs of large language models better aligned with human values?
• How should access to models such as GPT-3 be allocated, balancing security, reproducibility, and fairness? What kinds of tests are needed to determine whether language models like GPT-3 are safe or unsafe to use in particular settings?
• What can academia do to best position itself to develop guardrails for the industrial development of such models, including advocating for funding sufficient to replicate the computing resources required to train them?
• How can we best foster interdisciplinary collaboration to understand and manage the biases in large datasets and in models’ representations of those datasets?
• How can we best characterize the potential “threat landscape” of such models? For example, should we worry more about profit-driven actors using them to generate large volumes of spam, or about actors using them to generate persuasive text for disinformation campaigns?
• For various malicious purposes, how cost-effective and skill-intensive would misuse of language models be compared with alternative ways of achieving the same goals?