[Study] What impact has ChatGPT had on Q&A communities?

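Borrowing the title of a blog post by Prashanth Chandrasekar, CEO of Stack Overflow (the company behind the Stack Exchange network), "Community is the future of AI", this article argues that ChatGPT has had a disruptive impact on Q&A communities. These communities must tackle the problem at its root and reinvent themselves, yet we should remain convinced that "community is the future of AI".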

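Contents

I. Impact

(1) Fewer new questions are being created

1. Result data

2. Line chart

3. Comparison

4. Conclusion

(2) New answers

1. Result data

2. Line chart

3. Comparison

4. Conclusion

(3) New user registrations

1. Result data

2. Line chart

3. Comparison

4. Conclusion

II. Reflections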


I. Impact

The content below draws on an answer (by user starball) to the following Stack Exchange question: Did Stack Exchange's traffic go down since ChatGPT? https://meta.stackexchange.com/questions/387278/did-stack-exchanges-traffic-go-down-since-chatgpt

Note: the results come from Stack Exchange Data Explorer (SEDE) queries. The data runs from 2018 through the month before this post was written (June 2023). To borrow the answerer's words: "2018 chosen somewhat arbitrarily - I just wanted some more context than looking at just 2022-2023".

(1) Fewer new questions are being created

The query is shown below. Unlike the original answer, it adds an upper bound on the date range:

AND P.CreationDate < DATEFROMPARTS(2023, 7, 1) -- Exclude July 2023 onward (data ends with June 2023)

/*-- INSTRUCTIONS:
    1)  Set the columns of #AllSiteResults to what you need in the final query.
    2)  Set the @seSiteQuery text (inside the WHILE loop) to the query that will run on each site to build
        the #AllSiteResults table.
    3)  Comment out the `WHERE       (dadn.dbName = 'StackExchange.Meta'...` line if site metas are desired.
    4)  Adjust the final query if post processing is desired (optional).
*/
DECLARE @seDbName       AS NVARCHAR (max)
DECLARE @seSiteURL      AS NVARCHAR (max)
DECLARE @sitePrettyName AS NVARCHAR (max)
DECLARE @seSiteQuery    AS NVARCHAR (max)

CREATE TABLE #AllSiteResults (
      -- PUT THE COLUMNS YOU WILL USE, HERE
      -- vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      [Date] DATE,
      [NewQuestions] INT
      -- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)

DECLARE seSites_crsr CURSOR FOR
WITH dbsAndDomainNames AS (
    SELECT      dbL.dbName
                , STRING_AGG (dbL.domainPieces, '.')    AS siteDomain
    FROM (
        SELECT      TOP 50000   -- Never be that many sites and TOP is needed for order by, below
                    name        AS dbName
                    , value     AS domainPieces
                    , row_number ()  OVER (ORDER BY (SELECT 0)) AS [rowN]
        FROM        sys.databases
        CROSS APPLY STRING_SPLIT (name, '.')
        WHERE       CASE    WHEN state_desc = 'ONLINE'
                            THEN OBJECT_ID (QUOTENAME (name) + '.[dbo].[PostNotices]', 'U') -- Pick a table unique to SE data
                    END
                    IS NOT NULL
        ORDER BY    dbName, [rowN] DESC
    ) AS dbL
    GROUP BY    dbL.dbName
)
SELECT      REPLACE (REPLACE (dadn.dbName, 'StackExchange.', ''), '.', ' ' )  AS [Site Name]
            , dadn.dbName
            , CASE  -- See https://meta.stackexchange.com/q/215071
                    WHEN dadn.dbName = 'StackExchange.Mathoverflow.Meta'
                    THEN 'https://meta.mathoverflow.net/'
                    -- Some AVP/Audio/Video/Sound kerfuffle?
                    WHEN dadn.dbName = 'StackExchange.Audio'
                    THEN 'https://video.stackexchange.com/'
                    -- Ditto
                    WHEN dadn.dbName = 'StackExchange.Audio.Meta'
                    THEN 'https://video.meta.stackexchange.com/'
                    -- Normal site
                    ELSE 'https://' + LOWER (siteDomain) + '.com/'
            END AS siteURL
FROM        dbsAndDomainNames dadn
WHERE       (dadn.dbName = 'StackExchange.Meta'  OR  dadn.dbName NOT LIKE '%Meta%')

-- Step through cursor
OPEN    seSites_crsr
FETCH   NEXT FROM seSites_crsr INTO @sitePrettyName, @seDbName, @seSiteURL
WHILE   @@FETCH_STATUS = 0
BEGIN
    -- QUERY THAT YOU WANT TO RUN ON EACH SITE, GOES HERE
    -- For example:
    -- vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    SET @seSiteQuery = '
        USE [' + @seDbName + ']
        INSERT INTO #AllSiteResults
        SELECT
            DATEFROMPARTS(YEAR(P.CreationDate), MONTH(P.CreationDate), 1) AS Date,
            COUNT(*) AS NewQuestions
        FROM Posts P
        WHERE P.PostTypeId = 1 -- Questions
            AND YEAR(P.CreationDate) >= 2018
            AND P.CreationDate < DATEFROMPARTS(2023, 7, 1) -- Exclude July 2023 onward (data ends with June 2023)
        GROUP BY DATEFROMPARTS(YEAR(P.CreationDate), MONTH(P.CreationDate), 1)
    '
    -- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    EXEC sp_executesql @seSiteQuery

    FETCH NEXT FROM seSites_crsr INTO @sitePrettyName, @seDbName, @seSiteURL
END
CLOSE       seSites_crsr
DEALLOCATE  seSites_crsr

-- ADJUST THIS QUERY IF ANY POST PROCESSING IS DESIRED.
SELECT      MAX([Date]) AS Date, SUM([NewQuestions]) AS NewQuestions
FROM        #AllSiteResults
GROUP BY    [Date]
ORDER BY    [Date]

1. Result data

The results are as follows:

# New questions per month, network-wide
Date	NewQuestions
2018/1/1 0:00	237404
2018/2/1 0:00	224143
2018/3/1 0:00	251429
2018/4/1 0:00	239636
2018/5/1 0:00	246762
2018/6/1 0:00	222615
2018/7/1 0:00	228026
2018/8/1 0:00	228371
2018/9/1 0:00	210906
2018/10/1 0:00	232536
2018/11/1 0:00	219300
2018/12/1 0:00	196715
2019/1/1 0:00	223493
2019/2/1 0:00	215514
2019/3/1 0:00	236543
2019/4/1 0:00	226815
2019/5/1 0:00	225382
2019/6/1 0:00	201296
2019/7/1 0:00	217943
2019/8/1 0:00	201189
2019/9/1 0:00	199442
2019/10/1 0:00	219626
2019/11/1 0:00	214303
2019/12/1 0:00	194623
2020/1/1 0:00	212131
2020/2/1 0:00	208687
2020/3/1 0:00	223977
2020/4/1 0:00	261901
2020/5/1 0:00	266095
2020/6/1 0:00	242444
2020/7/1 0:00	232102
2020/8/1 0:00	209958
2020/9/1 0:00	202312
2020/10/1 0:00	206767
2020/11/1 0:00	197732
2020/12/1 0:00	196899
2021/1/1 0:00	205239
2021/2/1 0:00	191901
2021/3/1 0:00	215623
2021/4/1 0:00	196910
2021/5/1 0:00	193489
2021/6/1 0:00	183171
2021/7/1 0:00	174020
2021/8/1 0:00	171779
2021/9/1 0:00	168816
2021/10/1 0:00	168644
2021/11/1 0:00	169028
2021/12/1 0:00	158997
2022/1/1 0:00	169548
2022/2/1 0:00	158851
2022/3/1 0:00	171107
2022/4/1 0:00	160153
2022/5/1 0:00	163367
2022/6/1 0:00	156387
2022/7/1 0:00	185296
2022/8/1 0:00	189989
2022/9/1 0:00	176730
2022/10/1 0:00	184835
2022/11/1 0:00	189896
2022/12/1 0:00	168000
2023/1/1 0:00	170610
2023/2/1 0:00	155088
2023/3/1 0:00	160060
2023/4/1 0:00	131668
2023/5/1 0:00	130464
2023/6/1 0:00	140041

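2. Line chart

The line chart is shown below:

[Figure 1: line chart of new questions per month across the Stack Exchange network]

3. Comparison

Stack Overflow is the largest site in the Stack Exchange network, so the same statistic is computed for Stack Overflow alone for comparison.

The query is as follows: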

SELECT
  DATEFROMPARTS(YEAR(P.CreationDate), MONTH(P.CreationDate), 1) AS Date,
  COUNT(*) AS NewQuestions
FROM Posts P
WHERE P.PostTypeId = 1 -- Questions
  AND YEAR(P.CreationDate) >= 2018
  AND P.CreationDate < DATEFROMPARTS(2023, 7, 1)
GROUP BY DATEFROMPARTS(YEAR(P.CreationDate), MONTH(P.CreationDate), 1)
ORDER BY Date

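The chart is shown below:

[Figure 2: line chart of new questions per month on Stack Overflow]

As the chart shows, the trend on Stack Overflow closely follows the overall trend across Stack Exchange.

4. Conclusion

Activity usually drops sharply every December. The drop is quite pronounced in 2018 and 2019, less so in 2020 and 2021, and pronounced again in 2022. So it is hard to say whether the December 2022 drop is the usual winter-holiday dip or the result of people asking ChatGPT instead of Stack Exchange.

However, in recent years activity has consistently picked up again in January (observable from 2018 through 2022), whereas January 2023 did not follow suit: new-question activity stayed almost level with December 2022 and kept declining in the months after.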
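The January rebound can be checked directly against the result table. The following is a minimal Python sketch, not part of the original answer: it copies the December and January counts from the network-wide table above, and the helper name `dec_to_jan_change` is made up for illustration.

```python
# December and following-January counts copied from the
# network-wide "new questions per month" table above.
new_questions = {
    "2018-12": 196715, "2019-01": 223493,
    "2019-12": 194623, "2020-01": 212131,
    "2020-12": 196899, "2021-01": 205239,
    "2021-12": 158997, "2022-01": 169548,
    "2022-12": 168000, "2023-01": 170610,
}


def dec_to_jan_change(counts, dec_year):
    """Percent change from December of dec_year to the following January."""
    december = counts[f"{dec_year}-12"]
    january = counts[f"{dec_year + 1}-01"]
    return (january - december) / december * 100


# One rounded rebound figure per December, keyed by the December's year.
rebounds = {
    year: round(dec_to_jan_change(new_questions, year), 1)
    for year in range(2018, 2023)
}

for year, pct in rebounds.items():
    print(f"Dec {year} -> Jan {year + 1}: {pct:+.1f}%")
```

With these figures the rebound shrinks from about 13.6% (January 2019) to about 1.6% (January 2023), which is the "almost level" behaviour described above.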

(2) New answers

The query is as follows:

/*-- INSTRUCTIONS:
    1)  Set the columns of #AllSiteResults to what you need in the final query.
    2)  Set the @seSiteQuery text (inside the WHILE loop) to the query that will run on each site to build
        the #AllSiteResults table.
    3)  Comment out the `WHERE       (dadn.dbName = 'StackExchange.Meta'...` line if site metas are desired.
    4)  Adjust the final query if post processing is desired (optional).
*/
DECLARE @seDbName       AS NVARCHAR (max)
DECLARE @seSiteURL      AS NVARCHAR (max)
DECLARE @sitePrettyName AS NVARCHAR (max)
DECLARE @seSiteQuery    AS NVARCHAR (max)

CREATE TABLE #AllSiteResults (
      -- PUT THE COLUMNS YOU WILL USE, HERE
      -- vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      [Date] DATE,
      [Type] NVARCHAR(max),
      [NewAnswers] REAL,
      [NewQuestions] REAL
      -- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)

DECLARE seSites_crsr CURSOR FOR
WITH dbsAndDomainNames AS (
    SELECT      dbL.dbName
                , STRING_AGG (dbL.domainPieces, '.')    AS siteDomain
    FROM (
        SELECT      TOP 50000   -- Never be that many sites and TOP is needed for order by, below
                    name        AS dbName
                    , value     AS domainPieces
                    , row_number ()  OVER (ORDER BY (SELECT 0)) AS [rowN]
        FROM        sys.databases
        CROSS APPLY STRING_SPLIT (name, '.')
        WHERE       CASE    WHEN state_desc = 'ONLINE'
                            THEN OBJECT_ID (QUOTENAME (name) + '.[dbo].[PostNotices]', 'U') -- Pick a table unique to SE data
                    END
                    IS NOT NULL
        ORDER BY    dbName, [rowN] DESC
    ) AS dbL
    GROUP BY    dbL.dbName
)
SELECT      REPLACE (REPLACE (dadn.dbName, 'StackExchange.', ''), '.', ' ' )  AS [Site Name]
            , dadn.dbName
            , CASE  -- See https://meta.stackexchange.com/q/215071
                    WHEN dadn.dbName = 'StackExchange.Mathoverflow.Meta'
                    THEN 'https://meta.mathoverflow.net/'
                    -- Some AVP/Audio/Video/Sound kerfuffle?
                    WHEN dadn.dbName = 'StackExchange.Audio'
                    THEN 'https://video.stackexchange.com/'
                    -- Ditto
                    WHEN dadn.dbName = 'StackExchange.Audio.Meta'
                    THEN 'https://video.meta.stackexchange.com/'
                    -- Normal site
                    ELSE 'https://' + LOWER (siteDomain) + '.com/'
            END AS siteURL
FROM        dbsAndDomainNames dadn
WHERE       (dadn.dbName = 'StackExchange.Meta'  OR  dadn.dbName NOT LIKE '%Meta%')

-- Step through cursor
OPEN    seSites_crsr
FETCH   NEXT FROM seSites_crsr INTO @sitePrettyName, @seDbName, @seSiteURL
WHILE   @@FETCH_STATUS = 0
BEGIN
    -- QUERY THAT YOU WANT TO RUN ON EACH SITE, GOES HERE
    -- For example:
    -- vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    SET @seSiteQuery = '
        USE [' + @seDbName + ']
        INSERT INTO #AllSiteResults
        -- Ignore the datapoints for the current month. That data is not yet complete. Ex. Things like roomba have "lag"
        SELECT
          DATEFROMPARTS(YEAR(P.CreationDate), MONTH(P.CreationDate), 1) AS Date,
          ''average new non-deleted answers per new question'' AS Type,
          CAST(SUM(CASE WHEN P.PostTypeId = 2 THEN 1 ELSE 0 END) AS REAL) AS NewAnswerCount,
          CAST(SUM(CASE WHEN P.PostTypeId = 1 THEN 1 ELSE 0 END) AS REAL) AS NewQuestionCount
        FROM PostsWithDeleted P
        WHERE P.PostTypeId IN (1,2)
          AND DATEFROMPARTS(2017, 12, 1) < P.CreationDate
          AND P.CreationDate < DATEFROMPARTS(2023, 7, 1)
          AND (P.PostTypeId = 1 OR P.DeletionDate IS NULL)
        GROUP BY DATEFROMPARTS(YEAR(P.CreationDate), MONTH(P.CreationDate), 1)
        UNION
        SELECT
          DATEFROMPARTS(YEAR(P.CreationDate), MONTH(P.CreationDate), 1) AS Date,
          ''average new deleted answers per new question'' AS Type,
          CAST(SUM(CASE WHEN P.PostTypeId = 2 THEN 1 ELSE 0 END) AS REAL) AS NewAnswerCount,
          CAST(SUM(CASE WHEN P.PostTypeId = 1 THEN 1 ELSE 0 END) AS REAL) AS NewQuestionCount
        FROM PostsWithDeleted P
        WHERE P.PostTypeId IN (1,2)
          AND DATEFROMPARTS(2017, 12, 1) < P.CreationDate
          AND P.CreationDate < DATEFROMPARTS(2023, 7, 1)
          AND (P.PostTypeId = 1 OR P.DeletionDate IS NOT NULL)
        GROUP BY DATEFROMPARTS(YEAR(P.CreationDate), MONTH(P.CreationDate), 1)
        --ORDER BY Date
    '
    -- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    EXEC sp_executesql @seSiteQuery

    FETCH NEXT FROM seSites_crsr INTO @sitePrettyName, @seDbName, @seSiteURL
END
CLOSE       seSites_crsr
DEALLOCATE  seSites_crsr

-- ADJUST THIS QUERY IF ANY POST PROCESSING IS DESIRED.
SELECT      MAX([Date]) AS [Date], MAX([Type]) AS [Type], SUM([NewAnswers]) / SUM([NewQuestions]) AS [AverageNewAnswersPerNewQuestion]
FROM        #AllSiteResults
GROUP BY    [Date], [Type]
ORDER BY    [Date]
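
The final SELECT above divides the summed answer counts by the summed question counts, so large sites weigh far more than small ones in the result. The Python sketch below illustrates the difference between this pooled ratio and a plain average of per-site ratios. The site names and counts are invented purely for illustration and are not SEDE output.

```python
# Hypothetical per-site monthly counts (made up for illustration only).
site_counts = {
    "big_site": {"answers": 90_000, "questions": 120_000},
    "small_site": {"answers": 300, "questions": 200},
}


def pooled_ratio(counts):
    """Mirror SUM(NewAnswers) / SUM(NewQuestions): sum first, then divide."""
    total_answers = sum(site["answers"] for site in counts.values())
    total_questions = sum(site["questions"] for site in counts.values())
    return total_answers / total_questions


def mean_of_site_ratios(counts):
    """Alternative: compute each site's ratio, then average the ratios."""
    ratios = [site["answers"] / site["questions"] for site in counts.values()]
    return sum(ratios) / len(ratios)


print(f"pooled ratio:        {pooled_ratio(site_counts):.3f}")
print(f"mean of site ratios: {mean_of_site_ratios(site_counts):.3f}")
```

In this made-up case the pooled ratio stays close to the large site's own ratio, while the unweighted mean is pulled up by the small site. This is why the network-wide curve is expected to track Stack Overflow closely.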

1. Result data

# Average number of answers per new question, network-wide
Date	Type	AverageNewAnswersPerNewQuestion
2017/12/1 0:00	average new deleted answers per new question	0.183502
2017/12/1 0:00	average new non-deleted answers per new question	0.89314
2018/1/1 0:00	average new non-deleted answers per new question	0.905366
2018/1/1 0:00	average new deleted answers per new question	0.183494
2018/2/1 0:00	average new deleted answers per new question	0.183517
2018/2/1 0:00	average new non-deleted answers per new question	0.89258
2018/3/1 0:00	average new deleted answers per new question	0.183375
2018/3/1 0:00	average new non-deleted answers per new question	0.884344
2018/4/1 0:00	average new non-deleted answers per new question	0.888751
2018/4/1 0:00	average new deleted answers per new question	0.179318
2018/5/1 0:00	average new non-deleted answers per new question	0.88925
2018/5/1 0:00	average new deleted answers per new question	0.175033
2018/6/1 0:00	average new deleted answers per new question	0.173389
2018/6/1 0:00	average new non-deleted answers per new question	0.8859
2018/7/1 0:00	average new deleted answers per new question	0.176872
2018/7/1 0:00	average new non-deleted answers per new question	0.906406
2018/8/1 0:00	average new non-deleted answers per new question	0.931269
2018/8/1 0:00	average new deleted answers per new question	0.180633
2018/9/1 0:00	average new deleted answers per new question	0.180515
2018/9/1 0:00	average new non-deleted answers per new question	0.919448
2018/10/1 0:00	average new non-deleted answers per new question	0.897453
2018/10/1 0:00	average new deleted answers per new question	0.176562
2018/11/1 0:00	average new deleted answers per new question	0.170907
2018/11/1 0:00	average new non-deleted answers per new question	0.869077
2018/12/1 0:00	average new deleted answers per new question	0.181469
2018/12/1 0:00	average new non-deleted answers per new question	0.89007
2019/1/1 0:00	average new non-deleted answers per new question	0.901642
2019/1/1 0:00	average new deleted answers per new question	0.180455
2019/2/1 0:00	average new deleted answers per new question	0.172495
2019/2/1 0:00	average new non-deleted answers per new question	0.901606
2019/3/1 0:00	average new non-deleted answers per new question	0.884096
2019/3/1 0:00	average new deleted answers per new question	0.169558
2019/4/1 0:00	average new deleted answers per new question	0.164192
2019/4/1 0:00	average new non-deleted answers per new question	0.862334
2019/5/1 0:00	average new deleted answers per new question	0.160113
2019/5/1 0:00	average new non-deleted answers per new question	0.860274
2019/6/1 0:00	average new non-deleted answers per new question	0.867573
2019/6/1 0:00	average new deleted answers per new question	0.16234
2019/7/1 0:00	average new deleted answers per new question	0.163545
2019/7/1 0:00	average new non-deleted answers per new question	0.875567
2019/8/1 0:00	average new non-deleted answers per new question	0.887144
2019/8/1 0:00	average new deleted answers per new question	0.170328
2019/9/1 0:00	average new deleted answers per new question	0.16466
2019/9/1 0:00	average new non-deleted answers per new question	0.882504
2019/10/1 0:00	average new deleted answers per new question	0.162603
2019/10/1 0:00	average new non-deleted answers per new question	0.872577
2019/11/1 0:00	average new non-deleted answers per new question	0.87025
2019/11/1 0:00	average new deleted answers per new question	0.158717
2019/12/1 0:00	average new deleted answers per new question	0.166101
2019/12/1 0:00	average new non-deleted answers per new question	0.870477
2020/1/1 0:00	average new non-deleted answers per new question	0.875078
2020/1/1 0:00	average new deleted answers per new question	0.160808
2020/2/1 0:00	average new deleted answers per new question	0.153799
2020/2/1 0:00	average new non-deleted answers per new question	0.849026
2020/3/1 0:00	average new non-deleted answers per new question	0.7816
2020/3/1 0:00	average new deleted answers per new question	0.146717
2020/4/1 0:00	average new deleted answers per new question	0.147589
2020/4/1 0:00	average new non-deleted answers per new question	0.779073
2020/5/1 0:00	average new non-deleted answers per new question	0.800361
2020/5/1 0:00	average new deleted answers per new question	0.151292
2020/6/1 0:00	average new deleted answers per new question	0.151263
2020/6/1 0:00	average new non-deleted answers per new question	0.806941
2020/7/1 0:00	average new deleted answers per new question	0.157548
2020/7/1 0:00	average new non-deleted answers per new question	0.825327
2020/8/1 0:00	average new deleted answers per new question	0.15776
2020/8/1 0:00	average new non-deleted answers per new question	0.824721
2020/9/1 0:00	average new non-deleted answers per new question	0.790762
2020/9/1 0:00	average new deleted answers per new question	0.154112
2020/10/1 0:00	average new non-deleted answers per new question	0.764484
2020/10/1 0:00	average new deleted answers per new question	0.157146
2020/11/1 0:00	average new deleted answers per new question	0.153182
2020/11/1 0:00	average new non-deleted answers per new question	0.750182
2020/12/1 0:00	average new deleted answers per new question	0.158255
2020/12/1 0:00	average new non-deleted answers per new question	0.794868
2021/1/1 0:00	average new non-deleted answers per new question	0.798135
2021/1/1 0:00	average new deleted answers per new question	0.157774
2021/2/1 0:00	average new deleted answers per new question	0.153559
2021/2/1 0:00	average new non-deleted answers per new question	0.779712
2021/3/1 0:00	average new non-deleted answers per new question	0.758168
2021/3/1 0:00	average new deleted answers per new question	0.143849
2021/4/1 0:00	average new deleted answers per new question	0.147879
2021/4/1 0:00	average new non-deleted answers per new question	0.754431
2021/5/1 0:00	average new deleted answers per new question	0.148288
2021/5/1 0:00	average new non-deleted answers per new question	0.755367
2021/6/1 0:00	average new non-deleted answers per new question	0.765796
2021/6/1 0:00	average new deleted answers per new question	0.146673
2021/7/1 0:00	average new non-deleted answers per new question	0.776896
2021/7/1 0:00	average new deleted answers per new question	0.146639
2021/8/1 0:00	average new non-deleted answers per new question	0.792048
2021/8/1 0:00	average new deleted answers per new question	0.15187
2021/9/1 0:00	average new deleted answers per new question	0.150652
2021/9/1 0:00	average new non-deleted answers per new question	0.785845
2021/10/1 0:00	average new deleted answers per new question	0.147064
2021/10/1 0:00	average new non-deleted answers per new question	0.755685
2021/11/1 0:00	average new non-deleted answers per new question	0.744011
2021/11/1 0:00	average new deleted answers per new question	0.141775
2021/12/1 0:00	average new non-deleted answers per new question	0.764816
2021/12/1 0:00	average new deleted answers per new question	0.148233
2022/1/1 0:00	average new deleted answers per new question	0.146973
2022/1/1 0:00	average new non-deleted answers per new question	0.780104
2022/2/1 0:00	average new deleted answers per new question	0.137886
2022/2/1 0:00	average new non-deleted answers per new question	0.753624
2022/3/1 0:00	average new deleted answers per new question	0.131572
2022/3/1 0:00	average new non-deleted answers per new question	0.729329
2022/4/1 0:00	average new non-deleted answers per new question	0.734639
2022/4/1 0:00	average new deleted answers per new question	0.13609
2022/5/1 0:00	average new non-deleted answers per new question	0.728135
2022/5/1 0:00	average new deleted answers per new question	0.131584
2022/6/1 0:00	average new deleted answers per new question	0.137936
2022/6/1 0:00	average new non-deleted answers per new question	0.743591
2022/7/1 0:00	average new deleted answers per new question	0.138265
2022/7/1 0:00	average new non-deleted answers per new question	0.764211
2022/8/1 0:00	average new deleted answers per new question	0.140399
2022/8/1 0:00	average new non-deleted answers per new question	0.766521
2022/9/1 0:00	average new non-deleted answers per new question	0.757918
2022/9/1 0:00	average new deleted answers per new question	0.134554
2022/10/1 0:00	average new non-deleted answers per new question	0.738382
2022/10/1 0:00	average new deleted answers per new question	0.13114
2022/11/1 0:00	average new non-deleted answers per new question	0.718665
2022/11/1 0:00	average new deleted answers per new question	0.130706
2022/12/1 0:00	average new non-deleted answers per new question	0.73153
2022/12/1 0:00	average new deleted answers per new question	0.182844
2023/1/1 0:00	average new deleted answers per new question	0.163113
2023/1/1 0:00	average new non-deleted answers per new question	0.755667
2023/2/1 0:00	average new non-deleted answers per new question	0.743234
2023/2/1 0:00	average new deleted answers per new question	0.157383
2023/3/1 0:00	average new deleted answers per new question	0.156859
2023/3/1 0:00	average new non-deleted answers per new question	0.741384
2023/4/1 0:00	average new non-deleted answers per new question	0.720417
2023/4/1 0:00	average new deleted answers per new question	0.151795
2023/5/1 0:00	average new non-deleted answers per new question	0.732252
2023/5/1 0:00	average new deleted answers per new question	0.143678
2023/6/1 0:00	average new deleted answers per new question	0.120529
2023/6/1 0:00	average new non-deleted answers per new question	0.759169

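2. Line chart

[Figure 3: line chart of average new answers per new question across the Stack Exchange network, deleted vs. non-deleted]

Across the Stack Exchange network, the two curves move almost in step, which suggests that ChatGPT's arrival has had little effect on established users (who tend to contribute answers to the community). This conclusion draws on a paper, which I believe is "Reading Answers on Stack Overflow: Not Enough!" (DOI: 10.1109/tse.2019.2954319).

More striking is the unprecedented rise in deleted answers from November 2022 to December 2022. It is worth asking what caused so many more answers to be deleted. Could it be the platform's restrictions on ChatGPT-generated answers? That is possible, but it still needs to be verified.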
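
To put a number on that jump, the sketch below computes the month-over-month change in deleted answers per new question around December 2022. It is a small Python illustration using four values copied from the network-wide table above; the function name `month_over_month` is hypothetical.

```python
# Deleted answers per new question, copied from the network-wide table above.
deleted_per_question = [
    ("2022-10", 0.13114),
    ("2022-11", 0.130706),
    ("2022-12", 0.182844),
    ("2023-01", 0.163113),
]


def month_over_month(series):
    """Return (month, percent change versus the previous month) pairs."""
    changes = []
    for (_, previous), (month, current) in zip(series, series[1:]):
        changes.append((month, (current - previous) / previous * 100))
    return changes


changes = month_over_month(deleted_per_question)
for month, pct in changes:
    print(f"{month}: {pct:+.1f}%")
```

On these values the ratio is essentially flat from October to November 2022, rises by roughly 40% in December 2022, and falls back by about 11% in January 2023.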

3. Comparison

For comparison with Stack Overflow, the query is as follows:

SELECT
  DATEFROMPARTS(YEAR(P.CreationDate), MONTH(P.CreationDate), 1) AS Date,
  CASE WHEN P.DeletionDate IS NULL THEN 'non-deleted' ELSE 'deleted' END AS Status,
  COUNT(*) AS NewAnswers
FROM PostsWithDeleted P
WHERE P.PostTypeId = 2 -- Answers
  AND YEAR(P.CreationDate) >= 2018
  AND P.CreationDate < DATEFROMPARTS(2023, 7, 1)
GROUP BY
  DATEFROMPARTS(YEAR(P.CreationDate), MONTH(P.CreationDate), 1),
  CASE WHEN P.DeletionDate IS NULL THEN 'non-deleted' ELSE 'deleted' END
ORDER BY Date

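The resulting chart is shown below:

[Figure 4: line chart of new answers per month on Stack Overflow, deleted vs. non-deleted]

Although the overall trend has been downward in recent years, the decline has become steeper since December 2022. Stack Overflow, which makes up the bulk of Stack Exchange, has been hit hard by ChatGPT. The trend in this chart resembles that of new questions.

Modify the network-wide query above as follows: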

FROM        dbsAndDomainNames dadn
-- WHERE       (dadn.dbName = 'StackExchange.Meta'  OR  dadn.dbName NOT LIKE '%Meta%')
WHERE       (dadn.dbName = 'StackOverflow') -- Only select Stack Overflow
-- Comment out the following line if site metas are desired
-- AND dadn.dbName NOT LIKE '%Meta%'
-- Step through cursor

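This retrieves the data for Stack Overflow only. The chart is shown below:

[Figure 5: line chart of average new answers per new question on Stack Overflow]

The chart shows that the number of answers each question receives has risen noticeably.

4. Conclusion

Stack Overflow makes up the bulk of the network and mainly serves technical users. With ChatGPT available, many questions no longer need to be asked on the forum, so overall activity on Stack Exchange has dropped. The number of new answers is falling along the same trend. How well questions get answered, however, does not seem to have been affected. On Stack Overflow, the average number of answers per new question has even trended upward since the start of 2023, and answer deletions have clearly decreased (have we entered a post-ChatGPT era?). The reasons remain to be investigated.

(3) New user registrations

The query is as follows: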

SELECT
  DATEFROMPARTS(YEAR(U.CreationDate), MONTH(U.CreationDate), 1) AS Date,
  COUNT(*) AS NewUsers
FROM Users U
WHERE YEAR(U.CreationDate) >= 2018
  AND U.CreationDate < DATEFROMPARTS(2023, 7, 1)
GROUP BY
  DATEFROMPARTS(YEAR(U.CreationDate), MONTH(U.CreationDate), 1)
ORDER BY Date

1. Result data

# New user registrations per month
Date	NewUsers
2018/1/1 0:00	303064
2018/2/1 0:00	277977
2018/3/1 0:00	323594
2018/4/1 0:00	298977
2018/5/1 0:00	304450
2018/6/1 0:00	262515
2018/7/1 0:00	274961
2018/8/1 0:00	278287
2018/9/1 0:00	272128
2018/10/1 0:00	304007
2018/11/1 0:00	301353
2018/12/1 0:00	274968
2019/1/1 0:00	319298
2019/2/1 0:00	291886
2019/3/1 0:00	315879
2019/4/1 0:00	312456
2019/5/1 0:00	312536
2019/6/1 0:00	280361
2019/7/1 0:00	296166
2019/8/1 0:00	287561
2019/9/1 0:00	284323
2019/10/1 0:00	316200
2019/11/1 0:00	291018
2019/12/1 0:00	294891
2020/1/1 0:00	318868
2020/2/1 0:00	307204
2020/3/1 0:00	344408
2020/4/1 0:00	468303
2020/5/1 0:00	384514
2020/6/1 0:00	344951
2020/7/1 0:00	323656
2020/8/1 0:00	305353
2020/9/1 0:00	307687
2020/10/1 0:00	332683
2020/11/1 0:00	322922
2020/12/1 0:00	345420
2021/1/1 0:00	363082
2021/2/1 0:00	339855
2021/3/1 0:00	394545
2021/4/1 0:00	420813
2021/5/1 0:00	667550
2021/6/1 0:00	617327
2021/7/1 0:00	421782
2021/8/1 0:00	444984
2021/9/1 0:00	535107
2021/10/1 0:00	568112
2021/11/1 0:00	509507
2021/12/1 0:00	372878
2022/1/1 0:00	420221
2022/2/1 0:00	389280
2022/3/1 0:00	446719
2022/4/1 0:00	480977
2022/5/1 0:00	371546
2022/6/1 0:00	345780
2022/7/1 0:00	395201
2022/8/1 0:00	359470
2022/9/1 0:00	368080
2022/10/1 0:00	382385
2022/11/1 0:00	400144
2022/12/1 0:00	373079
2023/1/1 0:00	345450
2023/2/1 0:00	307249
2023/3/1 0:00	359899
2023/4/1 0:00	359313
2023/5/1 0:00	343078
2023/6/1 0:00	285215

 

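2. Line chart

[Figure 6: line chart of new user registrations per month]

In both 2018 and 2019, registrations dipped in December and then clearly recovered the following January. 2020 does not show this pattern, probably because of the pandemic; the marked growth in new users in 2021 and 2022 may also be pandemic-related. In 2021, registrations fell in December and recovered the following January, so we assume the pattern holds. After the drop in December 2022, however, January 2023 fell even further.

3. Comparison

[Figure 7: line chart of new user registrations per month on Stack Overflow]

Compared with the whole network, looking only at Stack Overflow: in December 2018 new users fell by 13.1% and recovered the following January. From November to December 2019 they rose by 7.6%, and rose a further 8.0% the following January. In 2020 there was little change going into December, and the following January saw a 9.1% increase. In December 2021 they fell by 6.1% and recovered by 15.4% the following January. In December 2022 they fell by 5.8%, and the following January they fell a further 12.9%.

4. Conclusion

After December 2022, new-question activity did not recover in January 2023 and new-user registrations fell further; both are down relative to the pattern of previous years. Answers to new questions, however, were barely affected.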
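
As a rough cross-check of this conclusion, the sketch below compares January-June 2023 with January-June 2022 for both series. It is an illustrative Python calculation only: the counts are copied from the new-question and new-user tables above, and the function name `year_over_year` is made up.

```python
# January-June monthly counts copied from the two result tables above.
first_half = {
    "new questions": {
        2022: [169548, 158851, 171107, 160153, 163367, 156387],
        2023: [170610, 155088, 160060, 131668, 130464, 140041],
    },
    "new users": {
        2022: [420221, 389280, 446719, 480977, 371546, 345780],
        2023: [345450, 307249, 359899, 359313, 343078, 285215],
    },
}


def year_over_year(by_year, base=2022, target=2023):
    """Percent change of the target year's total relative to the base year."""
    base_total = sum(by_year[base])
    target_total = sum(by_year[target])
    return (target_total - base_total) / base_total * 100


yoy = {name: round(year_over_year(series), 1) for name, series in first_half.items()}

for name, pct in yoy.items():
    print(f"{name}: {pct:+.1f}% (Jan-Jun 2023 vs. Jan-Jun 2022)")
```

On the tabulated figures, new questions are down about 9.3% and new user registrations about 18.5% for the first half of 2023 compared with the first half of 2022. A half-year comparison like this does not by itself separate the ChatGPT effect from the other factors listed below.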

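Possible contributing factors:

People are using ChatGPT instead of Stack Exchange. This is likely an important factor.

People who use ChatGPT to write answers on sites where it is banned (such as Stack Overflow) get suspended and are therefore unable to ask or answer questions.

The drop in activity coincides with the major tech layoffs of January 2023 and the following months. starball believes this cannot be ruled out as a factor, especially since most of the largest sites in the Stack Exchange network are technology-related.

II. Reflections

To close, here is a passage (in translation) from Stack Overflow CEO Prashanth Chandrasekar's blog post Community is the future of AI https://stackoverflow.blog/2023/04/17/community-is-the-future-of-ai/:

I've been talking with developers of various experience levels, and I keep hearing anecdotes of novice programmers building simple web apps with the help of AI. Most of these stories, however, don't begin and end with an AI prompt. Rather, the AI provides a starting point and some initial momentum, and the human does additional research and learning to finish the job. The AI can debug some errors but is stymied by others. It can suggest a good backend service, but often can't resolve all the friction points that come up when integrating different services. And of course, when the problem stems not from machine instructions but from human error, the best answers come from other people who have experienced the same issue.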
