[An Early Taste of GPT-5] A Detailed Look at Andrew Ng's AI Agent Workflows
Published: June 6, 2024
Last week, in my article "How GPT-3.5 Can Outperform GPT-4," I discussed Professor Andrew Ng's view that AI agent-driven workflows will become a major trend. He also said his team had shown that, with this technique, GPT-3.5 can far outperform GPT-4.
At the time he gave only a brief written overview. The video of his talk on the topic at Stanford University has since been released, so let's take a closer look.
The full talk runs 13:40. The original video is at https://www.youtube.com/watch?v=sal78ACtGTc&t=58s, and the version with Chinese subtitles comes from Twitter user @宝玉.
Four Design Patterns
In the video, Dr. Ng explains in detail four design patterns for implementing agent workflows: reflection, tool use, planning, and multi-agent collaboration.
Dr. Ng notes that the first two patterns have proven to be robust and effective techniques that noticeably improve the accuracy of the results an AI produces.
The latter two are newer approaches. They are not yet guaranteed to work every time, but they sometimes produce pleasantly surprising results, so they are well worth trying.
1. Reflection
Dr. Ng gives an example. Suppose you ask an AI agent to complete a coding task. After it returns a first version of the code, you hand that output back to it and ask it to check the code for bugs or for ways to optimize it further.
This is the simplest of the four methods, yet also the most effective. As described here it involves a single agent. Dr. Ng adds that it naturally suggests an extension from one agent to several agents with different functions, which is the idea behind the fourth design pattern.
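The draft, critique, and revise cycle described above can be sketched in a few lines of Python. This is a minimal illustration rather than Dr. Ng's implementation: `reflect`, the prompt prefixes, and the scripted `fake_llm` are all hypothetical names introduced here, and `fake_llm` stands in for a real model so the example runs offline.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: any function mapping a prompt to a reply


def reflect(llm: LLM, task: str, max_rounds: int = 3) -> str:
    """Draft an answer, then alternate critique and revision."""
    draft = llm(f"TASK: {task}")
    for _ in range(max_rounds):
        critique = llm(
            "REVIEW: check this code for bugs or optimizations; "
            f"reply OK if there is nothing to change.\n{draft}"
        )
        if critique.strip() == "OK":
            break  # the reviewer found nothing left to fix
        draft = llm(f"REVISE: apply this feedback.\nFeedback: {critique}\nCode:\n{draft}")
    return draft


def fake_llm(prompt: str) -> str:
    """Scripted stand-in for a real model (assumption, for illustration only)."""
    if prompt.startswith("TASK"):
        return "def add(a, b): return a - b"  # deliberately buggy first draft
    if prompt.startswith("REVIEW"):
        return "OK" if "a + b" in prompt else "add() subtracts instead of adding"
    return "def add(a, b): return a + b"  # reply to a REVISE prompt


final_code = reflect(fake_llm, "write add(a, b)")
print(final_code)
```

In a real setup the same model (or a second one) would play both the author and the reviewer, and `max_rounds` caps the cost when the reviewer never replies OK.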
2. Tool Use
Large language models are not all-powerful on their own. Dr. Ng points out, for example, that in early vision research, large models had to rely on visual processing tools to handle images, video, and similar content.
It is the wide range of tools now available that makes large language models seem capable of anything. For ChatGPT, the best-known third-party tool is surely Wolfram, which is what lets ChatGPT carry out all kinds of complex calculations.
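To make the idea concrete, here is a minimal sketch of tool dispatch, assuming the model signals a tool request with a reply of the form `CALL <tool>: <input>`. That reply format, the `TOOLS` registry, and `run_with_tools` are hypothetical conventions invented for this example. The local `calculator` is only a stand-in for an external computation service and does not use the Wolfram API.

```python
import ast
import operator
from typing import Callable, Dict

# Arithmetic operators the calculator tool is allowed to apply.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def calculator(expression: str) -> str:
    """Evaluate +, -, *, / and ** on numbers without calling eval()."""
    def ev(node: ast.AST):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")

    return str(ev(ast.parse(expression, mode="eval").body))


# Hypothetical tool registry: tool name -> function taking and returning text.
TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def run_with_tools(model_reply: str) -> str:
    """Run the requested tool if the reply is a tool call, else pass it through."""
    if model_reply.startswith("CALL "):
        name, _, arg = model_reply[len("CALL "):].partition(":")
        tool = TOOLS.get(name.strip())
        if tool is None:
            return f"unknown tool: {name.strip()}"
        return tool(arg.strip())
    return model_reply


print(run_with_tools("CALL calculator: 12 * (3 + 4)"))
```

In practice the tool's result would be fed back to the model so it can compose the final answer. The point of the sketch is only the division of labor: the model decides which tool to call, and ordinary code does the exact computation.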
3. Planning
Here the AI agent works out its execution plan or path in advance. For example, you give the AI a picture of a boy in a particular pose and ask it to generate a picture of a girl reading a book in that same pose, then to describe the generated picture aloud.
By planning, the agent arrives at the following steps. First, it identifies the boy's pose in the sample image and extracts it with a dedicated model. Next, it uses another suitable model to generate an image of a girl that incorporates the extracted pose. Finally, it obtains a text description of the image from an image-to-text model and converts that text to speech with a text-to-speech model.
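The steps above can be sketched as a plan that an executor runs in order. Everything here is an assumption for illustration: the four model functions are stubs that merely tag their input, and the plan is hard-coded, whereas in an actual agent the language model would produce the plan itself.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the four models; each just tags its input.
def extract_pose(image: str) -> str:
    return f"pose({image})"


def pose_to_image(pose: str) -> str:
    return f"image(girl reading, {pose})"


def image_to_text(image: str) -> str:
    return f"caption({image})"


def text_to_speech(text: str) -> str:
    return f"audio({text})"


TOOLS: Dict[str, Callable[[str], str]] = {
    "extract_pose": extract_pose,
    "pose_to_image": pose_to_image,
    "image_to_text": image_to_text,
    "text_to_speech": text_to_speech,
}

# A plan is an ordered list of (tool name, input key, output key).
PLAN: List[Tuple[str, str, str]] = [
    ("extract_pose", "boy_image", "pose"),
    ("pose_to_image", "pose", "girl_image"),
    ("image_to_text", "girl_image", "caption"),
    ("text_to_speech", "caption", "speech"),
]


def execute(plan: List[Tuple[str, str, str]], inputs: Dict[str, str]) -> Dict[str, str]:
    """Run each step in order, storing every intermediate result by name."""
    state = dict(inputs)
    for tool_name, source_key, target_key in plan:
        state[target_key] = TOOLS[tool_name](state[source_key])
    return state


result = execute(PLAN, {"boy_image": "boy.jpg"})
print(result["speech"])
```

Keeping every intermediate result in `state` means a later step can reuse any earlier output, which is what lets one plan chain several specialized models together.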
4. Multi-agent Collaboration
Two collaboration scenarios are mentioned. The first uses ChatDev as an example. If you want to develop a game, you can have the model play the different roles in a game development company, such as CEO, project manager, software engineer, and tester. The roles then collaborate and hold in-depth conversations with one another. Through repeated rounds of coding and testing, this can produce complex programs that work remarkably well.
The second scenario has different agents debate one another. For example, you can have ChatGPT and Gemini debate the same question over several rounds, which improves performance and leads to a better final result.
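A debate loop of this kind might look like the following sketch. The `debate` function and the two scripted debaters are hypothetical stand-ins, since no real ChatGPT or Gemini API is called, and stopping when both sides give the same answer is just one simple convergence rule assumed here.

```python
from typing import Callable, Tuple

# (question, opponent's latest answer) -> this debater's answer
Debater = Callable[[str, str], str]


def debate(a: Debater, b: Debater, question: str, max_rounds: int = 4) -> Tuple[str, int]:
    """Exchange answers until both debaters agree or the rounds run out."""
    answer_a = a(question, "")
    answer_b = b(question, "")
    for round_no in range(1, max_rounds + 1):
        if answer_a == answer_b:
            return answer_a, round_no
        # Each side sees the other's latest answer and may revise its own.
        answer_a, answer_b = a(question, answer_b), b(question, answer_a)
    # Round budget used up: fall back to the first debater's latest answer.
    return answer_a, max_rounds


def model_a(question: str, opponent_answer: str) -> str:
    """Scripted stand-in that holds its position."""
    return "17 is prime"


def model_b(question: str, opponent_answer: str) -> str:
    """Scripted stand-in that starts out wrong, then adopts the opponent's answer."""
    return opponent_answer or "17 is not prime"


answer, rounds = debate(model_a, model_b, "Is 17 prime?")
print(answer, rounds)
```

With real models each debater would also receive the opponent's reasoning, not just the final answer, and a separate judge could pick the winner when no agreement is reached.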
Conclusion
These four patterns are the ways Dr. Ng proposes for implementing agent workflows. He believes that agentic reasoning design patterns will be an important direction in the future, and that they will greatly expand what AI can do.
If you are still waiting for GPT-5 to be released, you might try agent workflows and reasoning in the meantime. A workflow built on GPT-4 may well exceed your expectations.
At the same time, because agent workflows by their nature always need some time to produce an answer, research into fast token generation will also become a trend. Even so, using agent workflows calls for some patience. After all, even the smartest person needs time to think, right?
END
Source: https://mp.weixin.qq.com/s/3EI_ICojniMNLAR6debnBw