Day 35 Superset Dashboard

๐Ÿ“‹ย ๊ณต๋ถ€ ๋‚ด์šฉ

์‹œ๊ฐํ™” ํˆด

KPI, ์ง€ํ‘œ, ์ค‘์š”ํ•œ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋“ค์„ ๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ณ„์‚ฐ/๋ถ„์„ ํ‘œ์‹œํ•ด์ฃผ๋Š” ํˆด
Dashboard or BI(Business Intelligence) Tool

๊ฒฐ์ •๊ถŒ์ž : ๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜ ์˜์‚ฌ๊ฒฐ์ •์„ ํ•  ์ˆ˜ ์žˆ์Œ
ํ˜„์—… ์ข…์‚ฌ์ž : ๋ฐ์ดํ„ฐ ๋ถ„์„์„ ์‰ฝ๊ฒŒ ํ•  ์ˆ˜ ์žˆ์Œ

  • KPI : Key Performance Indicator

  • ๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜ ์˜์‚ฌ๊ฒฐ์ •์˜ ์ข…๋ฅ˜

    • ๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜ ๊ฒฐ์ • (Data-Driven Decision)
    • ๋ฐ์ดํ„ฐ ์ฐธ๊ณ  ๊ฒฐ์ • (Data-Informed Decision)
  • Citizen Data Analyst / Scientist

  • EDA : Exploratory Data Analysis

    • ๋ฐ์ดํ„ฐ ํŠน์„ฑ ๋ถ„์„

์–ด๋–ค ์‹œ๊ฐํ™” ํˆด์ด ์žˆ์„๊นŒ?

๊ธฐ์—…ToolsํŠน์ง•
Excel
Google Spreadsheet
๊ฐ€์žฅ ๋งŽ์ด ์“ฐ์ด๋Š” ์‹œ๊ฐํ™” ํˆด
PythonEDA์— ๋” ์ ํ•ฉ
GoogleLooker
SalesforceTableau
MSPower BI
Apache Superset
ReDash
์˜คํ”ˆ์†Œ์Šค
Mode Analytics
Google Studio
AWS Quicksight
์ž์‚ฌ ํด๋ผ์šฐ๋“œ ๊ธฐ๋ฐ˜์˜ Dashboard
๊ธฐ๋Šฅ์ด ๋น„๊ต์  ๋–จ์–ด์ง
  • Excel, Python ๋“ฑ์€ ์ฝ”๋”ฉ์ด ๊ฐ€๋Šฅํ•ด์•ผ ํ™œ์šฉ ๊ฐ€๋Šฅ -> ๊ธฐ๋Šฅ์ƒ์˜ ์ œ์•ฝ ์กด์žฌ

Looker

  • 2019.06 ๊ตฌ๊ธ€์— ์ธ์ˆ˜๋จ
  • ํŠน์ง•
    • LookML์ด ์ž์ฒด ์–ธ์–ด๋กœ ๋ฐ์ดํ„ฐ ๋ชจ๋ธ์„ ๋งŒ๋“ค์–ด์คŒ
    • ๋‚ด๋ถ€๊ณ ๊ฐ ๋ฟ ์•„๋‹ˆ๋ผ ์™ธ๋ถ€ ๊ณ ๊ฐ์„ ์œ„ํ•œ ๋Œ€์‹œ๋ณด๋“œ ์ž‘์„ฑ ๊ฐ€๋Šฅ
    • ๋‹ค๋ฅธ ์‚ฌ๋žŒ์ด ์ž‘์„ฑํ•œ Dashboard๋ฅผ ์ฐธ๊ณ ํ•˜์—ฌ ๋‚ด๊ฐ€ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์Œ(Template ์ฒ˜๋Ÿผ)
    • setup์€ ํž˜๋“  ํŽธ, backend์— ๋ถ€ํ•˜๊ฐ€ ์ข€ ์žˆ๋Š” ํŽธ

Tableau

  • 2019.06 Salesforce์— ์ธ์ˆ˜๋จ
  • ํŠน์ง•
    • ๋‹ค์–‘ํ•œ ์ œํ’ˆ๊ตฐ, ์ผ๋ถ€๋Š” ๋ฌด๋ฃŒ๋กœ ์‚ฌ์šฉ ๊ฐ€๋Šฅ
    • ๋ฐฐ์šฐ๊ธฐ ์–ด๋ ต์ง€๋งŒ ๊ฐ•๋ ฅํ•œ ๋Œ€์‹œ๋ณด๋“œ -> ์ผ๋ถ€ ์ „๋ฌธ๊ฐ€๋งŒ ๋Œ€์‹œ๋ณด๋“œ ์ž‘์„ฑ ๊ฐ€๋Šฅ
    • Looker ๋œจ๊ธฐ ์ „๊นŒ์ง€ ์˜ค๋žซ๋™์•ˆ ๋งˆ์ผ“ ๋ฆฌ๋”๋กœ ๊ตฐ๋ฆผํ•ด์˜ด

ReDash

  • 2020 Databricks์— ์ธ์ˆ˜๋จ

  • ํŠน์ง•

    • ์˜คํ”ˆ์†Œ์Šค๋กœ ์‹œ์ž‘, Superset๊ณผ ํก์‚ฌ
    • SQL ์—๋””ํ„ฐ ์กด์žฌ -> Dashboad์™€ ์—ฐ๊ฒฐ๋œ ๊ณณ์— Query๋ฅผ ๋ณด๋‚ผ ์ˆ˜ ์žˆ์Œ
  • Superset์ด Redash์™€ ๋‹ค๋ฅธ ์ 

    • role ๊ธฐ๋ฐ˜ ์‚ฌ์šฉ์ž ์—ญํ• , ๊ถŒํ•œ ์ง€์ • ๊ฐ€๋Šฅ
    • dashboard์— ์—ญํ•  ์ง€์ • ๊ฐ€๋Šฅ

Mode Anlytics

  • ํŠน์ง•
    • ์กฐ๊ธˆ ๋” ๊ธฐ์ˆ ์ ์ธ ์ธ๋ ฅ์„ ๋Œ€์ƒ์œผ๋กœ ํ•˜๋Š” ๋Œ€์‹œ๋ณด๋“œ
    • SQL, R, Python ๋“ฑ์œผ๋กœ ๋ถ„์„ ๊ฐ€๋Šฅ
    • KPI ๋Œ€์‹œ๋ณด๋“œ ๋ณด๋‹ค๋Š” EDA ํˆด์— ๋” ๊ฐ€๊นŒ์›€

์‹œ๊ฐํ™” ํˆด ์„ ํƒ?

  • Looker vs. Tableau

    • ๋‘˜ ๋‹ค ์ดˆ๋ฐ˜ learning curve ์กด์žฌ
    • Tableau: ๊ฐ€๊ฒฉ์ ์ธ ๋ถ€๋ถ„์— ์ด์ 
  • Self Service Dashboard์˜ ์ค‘์š”์„ฑ

    • ๋งค๋ฒˆ ์‚ฌ๋žŒ์˜ ๋…ธ๋™์„ ํ•„์š”๋กœ ํ•˜์ง€ ์•Š์Œ
    • ์‚ฌ์šฉํ•˜๊ธฐ ์‰ฌ์›Œ์•ผ ๋” ๋งŽ์€ ์ธ๋ ฅ๋“ค์ด ์ง์ ‘ ๋Œ€์‹œ๋ณด๋“œ๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Œ

Supserset

  • Airbnb์—์„œ ์‹œ์ž‘๋œ ์˜คํ”ˆ์†Œ์Šค, Maxim(Airflow ๊ฐœ๋ฐœ์ž)๊ณผ ๊ฐ™์ด ์‹œ์ž‘

ํŠน์ง•

  • ๋‹ค์–‘ํ•œ ํ˜•ํƒœ์˜ ์‹œ๊ฐํ™”
  • ์‰ฌ์šด ์ธํ„ฐํŽ˜์ด์Šค
  • ๋Œ€์‹œ๋ณด๋“œ ๊ณต์œ  ์ง€์›
  • ์—”๋”ํ”„๋ผ์ด์ฆˆ ์ˆ˜์ค€์˜ ๋ณด์•ˆ, ๊ถŒํ•œ ์ œ์–ด ๊ธฐ๋Šฅ ์ œ๊ณต
  • SQLALchemy ์—ฐ๋™ -> ๋‹ค์–‘ํ•œ db(SQLAlchemy์™€ ์—ฐ๋™๋˜๋Š”) ์ง€์›
  • Druid.io(streaming db)์™€ ์—ฐ๋™ํ•œ ์‹ค์‹œ๊ฐ„ ๋ฐ์ดํ„ฐ์˜ ์‹œ๊ฐํ™” ๊ฐ€๋Šฅ
  • API, ํ”Œ๋Ÿฌ๊ทธ์ธ ์•„ํ‚คํ…์ณ ์ œ๊ณต -> ๊ธฐ๋Šฅ ํ™•์žฅ์ด ์‰ฌ์›€

๊ตฌ์กฐ

  • Python์œผ๋กœ ์ œ์ž‘๋จ
  • Web Interface -> Flask & React JS
  • metadata db : sqlite
    • ๋ณ‘๋ ฌ์„ฑ์ด ๋–จ์–ด์ง€๋Š” ๋‹จ์  -> postgresql or mysql ์‚ฌ์šฉ
  • Redis๋ฅผ caching layer๋กœ ์‚ฌ์šฉํ•˜์—ฌ ์„ฑ๋Šฅ ์ตœ์ ํ™”

์šฉ์–ด

  • Database : backend db (Redshift, Druid, …)
  • Dataset : table
  • Dashboard - Chart : Dashboard๋Š” ํ•˜๋‚˜ ์ด์ƒ์˜ chart๋กœ ๊ตฌ์„ฑ๋จ

์‹ค์Šต Dashboard

๊ตฌ์„ฑ

  • DB : Redshift
  • 2 charts in 1 Dashboard
    • ์ฑ„๋„๋ณ„ MAU(Monthly Active User) chart
      • Dataset : analytics.user_session_summary
    • Monthly Cohort chart
      • Dataset : analytics.cohort_summary

MAU chart

  • session๋‹จ์œผ๋กœ ์™„์ „ํ•œ ์ •๋ณด๋ฅผ ๊ฐ–๊ฒŒ ๋งŒ๋“  ํ…Œ์ด๋ธ”
1
2
3
4
CREATE TABLE analytics.user_session_summary AS
SELECT usc.*, t.ts
FROM raw_data.user_session_channel usc
LEFT JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid

Monthly Cohort chart

  • Cohort?

    • ํŠน์ • ์†์„ฑ(๋ณดํ†ต์€ ์‚ฌ์šฉ์ž์˜ ์„œ๋น„์Šค ๋“ฑ๋ก์›”)์„ ๋ฐ”ํƒ•์œผ๋กœ ๋‚˜๋‰œ ์‚ฌ์šฉ์ž ๊ทธ๋ฃน
  • Cohort ๋ถ„์„?

    • Cohort ๊ธฐ๋ฐ˜์œผ๋กœ ์‚ฌ์šฉ์ž์˜ ์ดํƒˆ๋ฅ , ์ž”์กด๋ฅ , ์ด ์†Œ๋น„๊ธˆ์•ก ๋“ฑ์„ ๊ณ„์‚ฐ
    • ์‚ฌ์šฉ์ž ์ž”์กด๋ฅ (Retention) : ๋ณดํ†ต ์›” ๊ธฐ๋ฐ˜์œผ๋กœ ์‹œ๊ฐํ™”ํ•ด์„œ ๋ด„
  • analytics.cohort_summary

1
2
3
4
5
6
7
8
CREATE TABLE analytics.cohort_summary as SELECT cohort_month, visited_month, cohort.userid FROM (
SELECT userid, date_trunc('month', MIN(ts)) cohort_month
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid GROUP BY 1
) cohort JOIN (
SELECT DISTINCT userid, date_trunc('month', ts) visited_month FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid
) visit ON cohort.cohort_month <= visit.visited_month and cohort.userid = visit.userid;

Google Spreadsheet๋ฅผ ํ™œ์šฉํ•œ ์‹œ๊ฐํ™” ์‹ค์Šต

  • +) Python gspread module์„ ํ™œ์šฉํ•˜๋ฉด Python์œผ๋กœ ์Šคํ”„๋ ˆ๋“œ์‹œํŠธ ์กฐ์ž‘ ๊ฐ€๋Šฅ

  • MAU chart

  • Monthly Cohort chart

Superset ์‚ฌ์šฉ ๋ฐฉ๋ฒ•

preset.io

  1. ํšŒ์›๊ฐ€์ž…
  2. workspace ์ƒ์„ฑ
  3. database ์—ฐ๊ฒฐ

docker

  1. docker ์„ค์น˜ ๋ฐ ํ™œ์šฉ ram ํฌ๊ธฐ ์„ค์ • (mac: 6GB ์ด์ƒ)

  2. Superset Github repo๋ฅผ ํด๋ก 

    1
    
    git clone https://github.com/apache/superset.git
    
  3. superset ํด๋”๋กœ ๋“ค์–ด๊ฐ€์„œ ๋‹ค์Œ ๋‘ command ์‹คํ–‰

    1
    2
    3
    4
    5
    
    cd superset
    # ํŠน์ • ๋ฒ„์ „์„ ๋‹ค์šด๋กœ๋“œ ํ•˜๋ ค๋ฉด ์•„๋ž˜ command ์‹คํ–‰
    # git checkout 1.4.0
    docker-compose -f docker-compose-non-dev.yml pull
    docker-compose -f docker-compose-non-dev.yml up
    
  4. http://localhost:8088์œผ๋กœ ์›น UI ๋กœ๊ทธ์ธ (id : admin, pw : admin)

Redshift db ์—ฐ๊ฒฐ

  • SQLAlchemy URI ๋ฅผ ํ†ตํ•ด ์—ฐ๊ฒฐ ๊ฐ€๋Šฅ

    1
    
    postgresql://admin:xxxx@default-workgroup.705556746971.ap-northeast-2.redshift-serverless.amazonaws.com:5439/dev
    

  • ์—ฐ๊ฒฐ๋œ db ์ •๋ณด

SQL Lab

  • Redshift๋กœ SQL ์ฟผ๋ฆฌ ๋ณด๋‚ผ ์ˆ˜ ์žˆ์Œ

  • MAU, Monthly Cohort chart๋ฅผ ์œ„ํ•œ table์„ ์ƒ์„ฑ

  • ์„ค์ •

    • ์‚ฌ์šฉํ•˜๋Š” database์— DML ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•ด์•ผ ํ•จ

MAU

  • dataset & chart

Cohort

  • dataset & chart

(์ถ”๊ฐ€) snowflake db ์—ฐ๊ฒฐ

  • ์ž…๋ ฅ ์ •๋ณด

  • ACCOUNT ํ™•์ธ ๋ฐฉ๋ฒ• (in Snowflake)

HW - nps chart

  • dataset

    ์ด์ „ ๋‹จ๊ณ„์—์„œ ์—ฐ๊ฒฐ์‹œํ‚จ snowflake db๋ฅผ ์‚ฌ์šฉ

  • chart

    Edit dataset > Metrics > overal_nps item ์ถ”๊ฐ€

    chart ์„ค์ • ๋ฐ ๊ฒฐ๊ณผ ํ™”๋ฉด

result dashboard

๐Ÿ‘€ย CHECK

์šฉ์–ด

๋ฐ์ดํ„ฐ ๊ฑฐ๋ฒ„๋„Œ์Šค

๊ธฐํƒ€

preset.io ๊ฐ€์ž…

  • google account๋กœ ๊ฐ€์ž… ์‹œ 14์ผ๋™์•ˆ Professional plan ๋ฌด๋ฃŒ ์ฒดํ—˜ ๊ฐ€๋Šฅ

  • 14์ผ ์ดํ›„์—๋Š” starter plan(free)๋กœ ๋ณ€๊ฒฝ๋˜๋Š” ๋“ฏ ํ•จ

โ— ๋Š๋‚€ ์ 

superset ๋Œ€์‹œ๋ณด๋“œ๋ฅผ ์จ ๋ณด๊ฒŒ ๋๋‹ค. ๋‚ด๊ฐ€ ๊ฑฑ์ •? ์šฐ๋ คํ–ˆ๋˜ ๊ฒƒ ๋ณด๋‹ค๋Š” ์‰ฌ์› ๋‹ค. snowflake๋„ ๊ทธ๋ ‡๊ณ  redshift๋„ ๊ทธ๋ ‡๊ณ  ๋‹ค๋“ค UI๊ฐ€ ํŽธํ•˜๊ณ  ๊น”๋”ํ•ด์„œ ์ข‹์•˜๋‹ค. docker๋ฅผ ์“ฐ๊ณ  ์‹ถ์—ˆ๋Š”๋ฐ ๋งฅ๋ถ ๋žจ์ด 8gb.. ๋‹ค๋ฅธ ํ”„๋กœ๊ทธ๋žจ์„ ๋‹ค ๋„๊ณ  ๋„์ปค๋งŒ ์‚ฌ์šฉํ•œ๋‹ค๋ฉด ๊ฐ€๋Šฅํ•  ์ˆ˜๋„ ์žˆ๋Š”๋ฐ ๊ทธ๊ฑด ์•„๋‹ˆ๋ผ ๊ฒฐ๊ตญ preset.io ์‚ฌ์ดํŠธ์—์„œ ํ•˜๊ฒŒ ๋๋‹ค. preset.io ์‚ฌ์ดํŠธ๋Š” ์–ด์ฐจํ”ผ superset ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ์‚ฌ์ดํŠธ๊ณ  docker๋ฅผ ํ†ตํ•ด ๊ตฌ๋™ํ•˜๋Š”๊ฒƒ๊ณผ UI๊ฐ€ ๊ฐ™์•„์„œ ๋ฌธ์ œ๋Š” ์—†์„ ๊ฒƒ ๊ฐ™๋‹ค.

๋ชจ๊ฐ์ฝฉ ์ฐธ๊ฐ€์ž๋ถ„๋“คํ•˜๊ณ  ํ•œ์‹œ๊ฐ„์ •๋„ ๋Œ€ํ™”ํ•˜๋Š” ์‹œ๊ฐ„์„ ๊ฐ€์กŒ๋Š”๋ฐ ๊ฐ์ž ๊ฐ–๊ณ  ์žˆ๋Š” ์ •๋ณด๋ฅผ ๊ณต์œ ํ•  ์ˆ˜ ์žˆ์–ด์„œ ์ข‹์•˜๋‹ค. ๋‹ค๋“ค ์–ด๋””์„œ ๊ทธ๋ ‡๊ฒŒ ์ •๋ณด๋ฅผ ๋งŽ์ด ์–ป๋Š”์ง€ ์‹ ๊ธฐํ•˜๋‹ค๋Š” ์ƒ๊ฐ๋„ ํ–ˆ๋‹ค. ใ…‹ใ…‹ใ…‹

์ฃผ๋ง์—๋Š” ์ฃผ์ค‘์— ๋ฐฐ์šด๊ฒƒ๋“ค ๋ณต์Šต ํ•  ๊ฒธ ๋ฏธ๋ฆฌ ์„ธํŒ…์„ ํ•˜๋ ค๊ณ  ํ•œ๋‹ค. ๊ฐ™์€ ํŒ€ ๋ถ„๋“คํ•˜๊ณ  ์ƒ์˜๋ฅผ ํ•ด ๋ด์•ผ ํ•˜๊ฒ ์ง€๋งŒ, superset-snowflake-s3(or google cloud) ๋ฅผ ํ™œ์šฉํ•˜๋Š” ๊ฒŒ AWS์˜ ์š”์ƒํ•œ ์ฒญ๊ตฌ ์‹œ์Šคํ…œ ๋•Œ๋ฌธ์— ๋ฐœ์ƒํ•˜๋Š” ๋ถ€์ฐจ์ ์ธ ๊ณผ๊ธˆ์—†์ด ๋ฌด๋ฃŒ ํ”Œ๋žœ์œผ๋กœ๋งŒ ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ์–ด ๋ณด์ธ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ snowflake credit ์ดˆ๊ณผํ•˜๋ฉด ๊ฒฐ๊ตญ์€ ๊ณผ๊ธˆ์„ ํ•ด์•ผ๊ฒ ์ง€..?

Hugo๋กœ ๋งŒ๋“ฆ
Jimmy์˜ Stack ํ…Œ๋งˆ ์‚ฌ์šฉ ์ค‘