V2EX = way to explore
V2EX 是一个关于分享和探索的地方
现在注册
已注册用户请  登录
MySQL 5.5 Community Server
MySQL 5.6 Community Server
Percona Configuration Wizard
XtraBackup 搭建主从复制
Great Sites on MySQL
Percona
MySQL Performance Blog
Severalnines
推荐管理工具
Sequel Pro
phpMyAdmin
推荐书目
MySQL Cookbook
MySQL 相关项目
MariaDB
Drizzle
参考文档
http://mysql-python.sourceforge.net/MySQLdb.html
killva4624
V2EX  ›  MySQL

如何按天统计前 7 天周活?

  •  
  •   killva4624 · 2023-03-02 15:02:55 +08:00 · 1617 次点击
    这是一个创建于 414 天前的主题,其中的信息可能已经有所发展或是发生改变。

    比如下表,记录了日期和对应用户:

    datetime,username
    2023-01-01,UserA
    2023-01-01,UserB
    2023-01-02,UserA
    2023-01-03,UserC
    ...
    

    需求是按天统计出该天起前七天的周活,当天没有对应用户的话补 0 ,如:

    2023-01-08(2023-01-01~2023-01-07), 7
    2023-01-09(2023-01-02~2023-01-08), 10
    

    ....

    14 条回复    2023-03-06 10:27:40 +08:00
    pota
        1
    pota  
       2023-03-02 15:15:55 +08:00
    这种我感觉还是将每天的统计数据存起来直接拿好点
    oRbceGJ
        2
    oRbceGJ  
       2023-03-02 15:31:16 +08:00
    用偏移类开窗函数?
    Ericcccccccc
        3
    Ericcccccccc  
       2023-03-02 15:34:17 +08:00
    没必要强行用纯 sql 做这个事情吧.
    aw2350
        4
    aw2350  
       2023-03-02 15:35:36 +08:00
    parttition by ,按照日期汇总
    yuhu96
        5
    yuhu96  
       2023-03-02 15:41:12 +08:00
    能不能按照当前[date ,date - 7 ]作为一个分组字段来进行聚合啊。窗口函数感觉不适用在这里。
    byty18768
        6
    byty18768  
       2023-03-02 15:42:45 +08:00
    搞一张存储自然日的日期表,excel 生成很快的
    `
    datetime,start_date,end_date
    2023-01-08,2023-01-01,2023-01-07
    2023-01-09,2023-01-02,2023-01-08
    2023-01-10,2023-01-03,2023-01-09
    `
    笛卡尔积
    `
    select a. datetime ,count(distinct username)
    from 日期表 a
    join 数据表 b on b. datetime between a. start_date and a.end_date
    group by 1
    `
    Chad0000
        7
    Chad0000  
       2023-03-02 15:43:23 +08:00
    记录日活,主键:
    日期_用户 Id 。没活的不记。然后周活就是最近七天的,Group By 用户 Id 即可。
    JKeita
        8
    JKeita  
       2023-03-02 15:50:02 +08:00
    按天取数据存到一份文本文件中,然后直接用 linux 命令进行去重计数。
    killva4624
        9
    killva4624  
    OP
       2023-03-02 17:31:43 +08:00
    @byty18768 直接用子查询生成这个这个自然日表也可以吧?
    YYYYMMDDHHSS
        10
    YYYYMMDDHHSS  
       2023-03-02 17:40:33 +08:00
    with data as (

    select '2023-01-01' as dt ,'UserA' as user_name union all
    select '2023-01-01' as dt ,'UserB' as user_name union all
    select '2023-01-02' as dt ,'UserA' as user_name union all
    select '2023-01-03' as dt ,'UserC' as user_name union all
    select '2023-01-04' as dt ,'UserA' as user_name union all
    select '2023-01-05' as dt ,'UserB' as user_name union all
    select '2023-01-06' as dt ,'UserA' as user_name union all
    select '2023-01-07' as dt ,'UserC' as user_name union all
    select '2023-01-09' as dt ,'UserA' as user_name union all
    select '2023-01-10' as dt ,'UserB' as user_name union all
    select '2023-01-11' as dt ,'UserA' as user_name union all
    select '2023-01-12' as dt ,'UserC' as user_name union all
    select '2023-01-13' as dt ,'UserA' as user_name union all
    select '2023-01-14' as dt ,'UserB' as user_name union all
    select '2023-01-15' as dt ,'UserA' as user_name union all
    select '2023-01-16' as dt ,'UserC' as user_name union all
    select '2023-01-17' as dt ,'UserA' as user_name union all
    select '2023-01-18' as dt ,'UserB' as user_name
    )
    select
    t.dt,
    date_add(t.dt, 7) dt_to,
    count(distinct t.user_name) as cnt
    from
    (
    select
    d1.dt,
    d2.user_name
    from
    data d1
    join data d2
    where
    1 = 1
    --and d1.dt > d2.dt
    and datediff(d1.dt, d2.dt) <= 7
    ) t
    group by
    dt
    ----

    大数据量不建议这样做
    sunny1688
        11
    sunny1688  
       2023-03-02 18:30:58 +08:00
    @Chad0000 老哥的方案是最轻松还解耦的,一条简单的 sql 就行了
    vitoliu
        12
    vitoliu  
       2023-03-02 18:35:13 +08:00
    SQL 挺难想的,我反正比较难硬加出来。通过代码来构建逻辑简单清晰,灵活性更强。
    统计某一个时段的周活,可以按天、小时统计,定时任务去跑,还可以无限加条件。比如用户注册时间小于 3 个月,判断新用户留存情况。
    Morii
        13
    Morii  
       2023-03-02 18:41:42 +08:00
    这种不都是定时调度 ETL 做成指标放到数仓里吗?
    byty18768
        14
    byty18768  
       2023-03-06 10:27:40 +08:00
    @killva4624 主要是你不是说可能会存在整天都没有用户的嘛,所以考虑导入表来补 0 。如果底层每天都是全的肯定没问题
    关于   ·   帮助文档   ·   博客   ·   API   ·   FAQ   ·   我们的愿景   ·   实用小工具   ·   3000 人在线   最高记录 6543   ·     Select Language
    创意工作者们的社区
    World is powered by solitude
    VERSION: 3.9.8.5 · 26ms · UTC 11:08 · PVG 19:08 · LAX 04:08 · JFK 07:08
    Developed with CodeLauncher
    ♥ Do have faith in what you're doing.