Diagnosing and fixing slow responses when calling an LLM API in production

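1. Business flow:

User fills in the topic and generation elements --> API fetches the prompt configuration --> vector search retrieves sample essays --> fill in the prompt content --> call the LLM API --> process the LLM response and return it as an SSE response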

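2. The production problem:

Whenever many users were online, responses became very slow: AI content generation took about 1 minute before any content started to trickle back.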

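3. Locating the problem

Narrowing it down to the API

First, we checked the request timing in the Chrome Network panel and confirmed that the delay was on the API side.
Under normal load, "Waiting for server response" is only a little over 100 ms; with many users online, it reached 1 minute.

The pod's CPU, memory, and disk usage all looked normal, which ruled out a resource allocation problem.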

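Narrowing it down to the LLM request

We used the logs printed with each chatId to find which step of the business flow was slow.

The slow stretch turned out to be the time between establishing the connection and receiving the LLM API's response.
We confirmed with the capability team that the LLM API had not reached its concurrency limit.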
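The per-step timing described above can be sketched roughly as follows. This is only an illustration of the idea, not the service's actual logging code: the class name StageTimer, the stage labels, and the injectable clock are all assumptions introduced for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical helper: records how long each step of one request took, keyed by chatId.
class StageTimer {
    private final String chatId;
    private final LongSupplier nanoClock;
    private final Map<String, Long> elapsedMillis = new LinkedHashMap<>();
    private long lastMark;

    StageTimer(String chatId) {
        this(chatId, System::nanoTime);
    }

    // The clock is injectable so the timer can be exercised without real waiting.
    StageTimer(String chatId, LongSupplier nanoClock) {
        this.chatId = chatId;
        this.nanoClock = nanoClock;
        this.lastMark = nanoClock.getAsLong();
    }

    // Stores the time elapsed since the previous mark (or construction) under the given stage name.
    void mark(String stage) {
        long now = nanoClock.getAsLong();
        elapsedMillis.put(stage, (now - lastMark) / 1_000_000);
        lastMark = now;
    }

    // Returns the stage with the largest recorded duration, or null if nothing was marked.
    String slowestStage() {
        String slowest = null;
        long max = -1;
        for (Map.Entry<String, Long> entry : elapsedMillis.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                slowest = entry.getKey();
            }
        }
        return slowest;
    }

    // One log-friendly line, with stages in the order they were marked.
    String summary() {
        return "chatId=" + chatId + " " + elapsedMillis;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer("demo-chat-id");
        timer.mark("fetchPromptConfig"); // hypothetical stage labels
        timer.mark("callModel");
        System.out.println(timer.summary());
    }
}
```

Logging one summary line per chatId makes it easy to grep a slow request and see which stage dominates.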

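Narrowing it down to the client that sends the request

Next, at a moment when responses were slow, we used curl from inside the service's pod to call the LLM API directly, to see whether the LLM API itself was slow or whether there was a network connectivity problem. The direct call came back quickly.

This confirmed that the problem was in the client that sends the request.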

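4. Source-code investigation, SDK tuning, and local load testing

SDK source code

The LLM client SDK used by our Java service is an open-source project on GitHub.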

    <dependency>
        <groupId>com.theokanning.openai-gpt3-java</groupId>
        <artifactId>service</artifactId>
        <version>0.18.2</version>
    </dependency>

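We looked at the initialization logic of the service class in the openai-gpt3-java SDK.

It configures its connection pool with at most 5 idle connections and a 1 s keep-alive time for idle connections.
OpenAiService offers no parameters for configuring the connection pool when it is created, so we added a new class, DASOpenAiService, that extends OpenAiService and redefines the static defaultClient method.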

    // Shared pool: keeps up to 100 idle connections, each for at most 10 seconds.
    public static ConnectionPool connectionPool = new ConnectionPool(100, 10L, TimeUnit.SECONDS);

    public static OkHttpClient defaultClient(String token, Duration timeout) {
        return new OkHttpClient.Builder()
                .addInterceptor(new OkHttpAuthenticationInterceptor(token))
                .connectionPool(connectionPool)
                .readTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
                .build();
    }

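Monitoring the connection pool with a scheduled task

We wrote a scheduled task that logs the pool's total connection count and idle connection count once per second.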

@Component
@Slf4j
public class ConnectionTask {
    /**
     * Runs once per second (fixedRate is in milliseconds).
     */
    @Scheduled(fixedRate = 1000)
    public void executeInternal() {
        log.info("Connection pool: connections:{} idle connections:{}", DASOpenAiService.connectionPool.connectionCount(), DASOpenAiService.connectionPool.idleConnectionCount());
    }
}

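After we changed the connection pool parameters and deployed, the maximum connection count was still only 5.

Local load testing

We extracted the code that calls the LLM API and handles its response, then debugged it locally to check whether the connection pool settings were actually being applied. They were.

We then ran a concurrent load test locally with Apache Bench.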

ab -n 7 -c 7 http://localhost:8080/generateByElements/sse

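In the local console log, the maximum connection count was still only 5.

Changing the connection pool configuration

We went back to the source code to look at the configuration options related to okhttp3's ConnectionPool. The source is written in Kotlin, so we used the option in the top-right corner to convert it to Java.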

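One of the options is the Dispatcher, whose key parameters are maxRequests and maxRequestsPerHost. The Dispatcher caps how many asynchronous calls run at the same time: maxRequests (default 64) in total and maxRequestsPerHost (default 5) per host. That default of 5 matches the ceiling we kept seeing.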
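To make the effect of a per-host cap concrete, here is a small simulation in plain JDK code. It is a sketch under stated assumptions, not OkHttp's implementation: a Semaphore stands in for the per-host limit, and the class and method names (PerHostLimitDemo, peakInFlight) are made up for this example. Each simulated call holds its slot until the limit is saturated, so the measured peak is deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Simulation only: a Semaphore plays the role of a per-host concurrency cap.
class PerHostLimitDemo {
    // Starts `requests` simulated calls and returns the highest number that were in flight at once.
    static int peakInFlight(int requests, int perHostLimit) {
        Semaphore slots = new Semaphore(perHostLimit);
        // Reaches zero once as many calls as the cap allows are running simultaneously.
        CountDownLatch saturated = new CountDownLatch(Math.min(requests, perHostLimit));
        CountDownLatch finished = new CountDownLatch(requests);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService callers = Executors.newFixedThreadPool(requests);
        try {
            for (int i = 0; i < requests; i++) {
                callers.execute(() -> {
                    try {
                        slots.acquire(); // calls beyond the cap wait here, like queued requests
                        try {
                            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                            saturated.countDown();
                            saturated.await(); // hold the slot until the cap is fully used
                        } finally {
                            inFlight.decrementAndGet();
                            slots.release();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        finished.countDown();
                    }
                });
            }
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for simulated calls", e);
        } finally {
            callers.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("limit 5, 7 callers: peak = " + peakInFlight(7, 5));     // peak = 5
        System.out.println("limit 100, 7 callers: peak = " + peakInFlight(7, 100)); // peak = 7
    }
}
```

With 7 callers and a cap of 5, two callers always wait for a free slot, which mirrors the ceiling of 5 seen in the connection count.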

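So we added a Dispatcher to the client configuration, allowing up to 100 concurrent requests per host. We also adjusted the connectTimeout value so that abnormal connections do not hold resources indefinitely (the connectTimeout change is not shown in the snippet below).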

    public static OkHttpClient defaultClient(String token, Duration timeout) {
        // Raise the caps on concurrent asynchronous calls (defaults: 64 in total, 5 per host).
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.setMaxRequests(100);
        dispatcher.setMaxRequestsPerHost(100);
        return new OkHttpClient.Builder()
                .addInterceptor(new OkHttpAuthenticationInterceptor(token))
                .connectionPool(connectionPool)
                .dispatcher(dispatcher)
                .readTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
                .build();
    }

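We ran the Apache Bench load test again and verified that the maximum connection count could now reach 7 (the concurrency of the load test).

Production rollout and observation

After deploying the adjusted configuration, the connection count finally exceeded the default of 5, and production requests became very fast.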
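A rough back-of-the-envelope model shows why a small concurrency cap can turn into a wait on the order of a minute. It is a simplified sketch: it assumes all requests arrive at once and each holds its slot for the same time. The 30-second hold time and the name QueueWaitEstimate are illustrative assumptions, not measurements from our service.

```java
// Simplified queueing estimate; all numbers used with it are hypothetical.
class QueueWaitEstimate {
    // Seconds a request waits for a slot when `limit` requests run at once, each holding
    // its slot for `holdSeconds`, and `position` is its 0-based place in the burst.
    static long waitSeconds(int position, int limit, long holdSeconds) {
        return (long) (position / limit) * holdSeconds;
    }

    public static void main(String[] args) {
        // The 11th request in a burst, assuming each generation streams for 30 seconds.
        System.out.println(waitSeconds(10, 5, 30));   // prints 60
        System.out.println(waitSeconds(10, 100, 30)); // prints 0
    }
}
```

Under these assumptions, raising the cap from 5 to 100 removes the wait entirely for a burst of this size.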
