Day 34 Snowflake

๐Ÿ“‹ย ๊ณต๋ถ€ ๋‚ด์šฉ

Snowflake

์†Œ๊ฐœ

ํŠน์ง•

  • ๊ฐ€๋ณ€๋น„์šฉ ๋ชจ๋ธ

    • storage, computing infra ๋ณ„๋„ ์„ค์ • ๊ฐ€๋Šฅ
    • ๋…ธ๋“œ ์ˆ˜ ์กฐ์ •, distkey(๋ฐ์ดํ„ฐ skew ๋ฌธ์ œ) ๋“ฑ ์กฐ์ ˆ ๋ถˆํ•„์š”
  • ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ

    • SQL ๊ธฐ๋ฐ˜ ๋น…๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ๊ฐ€๋Šฅ
    • ๋น„๊ตฌ์กฐํ™” ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ, ML ๊ธฐ๋Šฅ ์ œ๊ณต
  • ๋ฉ€ํ‹ฐํด๋ผ์šฐ๋“œ

    • AWS, GCP, Azure ๊ณผ ๊ฐ™์€ ๊ธ€๋กœ๋ฒŒ ํด๋ผ์šฐ๋“œ ์œ„์—์„œ ๋ชจ๋‘ ๋™์ž‘(๋ฉ€ํ‹ฐํด๋ผ์šฐ๋“œ)
    • ๋‹ค์–‘ํ•œ ํฌ๋งท, ๋‹ค์–‘ํ•œ ํ”Œ๋žซํผ์˜ ํด๋ผ์šฐ๋“œ ์Šคํ† ๋ฆฌ์ง€(S3, GC Cloud Storage ๋“ฑ) ์ง€์›
  • ๋‹ค๋ฅธ ์ง€์—ญ ๋ฐ์ดํ„ฐ ๊ณต์œ  (Cross-Region Replication) ๊ธฐ๋Šฅ ์ง€์›

  • Time travel

    • ์ •์˜๋œ ๊ธฐ๊ฐ„ ๋‚ด์˜ ๋ชจ๋“  ์‹œ์ ์—์„œ ๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์— ์•ก์„ธ์Šคํ•  ์ˆ˜ ์žˆ๋Š” ๊ธฐ๋Šฅ
    • ์˜ค๋ธŒ์ ํŠธ ๋ณต์›, ๊ณผ๊ฑฐ ์‹œ์  ๋ฐ์ดํ„ฐ ๋ณต์ œ ํ˜น์€ ๋ฐฑ์—…, ์ง€์ •๋œ ๊ธฐ๊ฐ„ ๋™์•ˆ ๋ฐ์ดํ„ฐ ์‚ฌ์šฉ/์กฐ์ž‘ ๋ถ„์„ ๋“ฑ์œผ๋กœ ํ™œ์šฉ ๊ฐ€๋Šฅ
  • Python API ์ œ๊ณต

  • Snowflake ๊ณ„์ • ๊ตฌ์„ฑ

    • Organizations
      • ํ•˜๋‚˜ ํ˜น์€ ๊ทธ ์ด์ƒ์˜ Account๋กœ ๊ตฌ์„ฑ๋จ
    • Accounts
      • ํ•˜๋‚˜ ํ˜น์€ ๊ทธ ์ด์ƒ์˜ DB๋กœ ๊ตฌ์„ฑ๋จ
    • Databases
      • Account์— ์†Œ์†๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃจ๋Š” ์ปจํ…Œ์ด๋„ˆ
      • Warehouse(์ปดํ“จํŒ…๋ฆฌ์†Œ์Šค)์™€ ์ผ๋Œ€์ผ ๊ด€๊ณ„๊ฐ€ ์•„๋‹˜ (1 Warehouse - 4 Databases)
  • Data Marketplace

  • Data Sharing

    “Share, Don’t Move”
    Dataset์„ Storage level์—์„œ ๊ณต์œ ํ•˜๋Š” ๋ฐฉ์‹

  • Activity

    • Query/Copy/Task History

๋น„์šฉ

์ปดํ“จํŒ… ๋น„์šฉ : Credit

  • Credit : ์ฟผ๋ฆฌ ์‹คํ–‰, ๋ฐ์ดํ„ฐ ๋กœ๋“œ ๋“ฑ ์ž‘์—… ์ˆ˜ํ–‰์— ์†Œ๋น„๋˜๋Š” ๋ฆฌ์†Œ์Šค ๋‹จ์œ„
  • 1 Credit = $2~$4
  • Snowflake Warehouse : ์‚ฌ์šฉ์‹œ๊ฐ„ ๋‹น ํฌ๋ ˆ๋”ง ์ฒญ๊ตฌ

์Šคํ† ๋ฆฌ์ง€ ๋น„์šฉ

  • TB ๋‹น ๋น„์šฉ ์ฒญ๊ตฌ

๋„คํŠธ์›Œํฌ ๋น„์šฉ

  • ์ง€์—ญ๊ฐ„, ๋˜๋Š” (๋‹ค๋ฅธ)ํด๋ผ์šฐ๋“œ๊ฐ„ ๋ฐ์ดํ„ฐ ์ „์†ก ์‹œ TB ๋‹น ๋น„์šฉ ์ฒญ๊ตฌ

Data Governance

ํ•„์š”ํ•œ ๋ฐ์ดํ„ฐ๊ฐ€ ์ ์žฌ์ ์†Œ์— ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ์‚ฌ์šฉ๋จ์„ ๋ณด์žฅํ•˜๊ธฐ ์œ„ํ•œ ๋ฐ์ดํ„ฐ ๊ด€๋ฆฌ ํ”„๋กœ์„ธ์Šค

Object Tagging

Snowflake object์— ํƒœ๊ทธ๋ฅผ ์ง€์ •ํ•  ์ˆ˜ ์žˆ๋Š” ๊ธฐ๋Šฅ
Object : Organization, Account, Schema, View, …

  • ์ƒ์„ฑ : CREATE TAG
  • Snowflake์—์„œ ๊ธฐ๋ณธ์ ์œผ๋กœ ์ œ๊ณตํ•˜๋Š” ์‹œ์Šคํ…œ ํƒœ๊ทธ ์กด์žฌ
  • ์ง€์ •๋œ tag๋Š” ๊ตฌ์กฐ๋ฅผ ๋”ฐ๋ผ ๊ณ„์Šน๋จ

Data Classification

Snowflake๊ฐ€ ๋ฐ์ดํ„ฐ๋ฅผ ์ž๋™์œผ๋กœ ๋ถ„๋ฅ˜ํ•˜์—ฌ ํƒœ๊ทธ๋ฅผ ์ง€์ •ํ•ด์ฃผ๋Š” ๊ธฐ๋Šฅ
‘๋งค๋‰ด์–ผํ•˜๊ฒŒ ๊ด€๋ฆฌํ•˜๊ธฐ๊ฐ€ ์–ด๋ ค์šด’ Object Tagging์˜ ๋‹จ์ ์„ ๋ณด์™„ํ•ด์ฃผ๊ธฐ ์œ„ํ•ด ๋“ฑ์žฅ

  1. Analyze : ํ…Œ์ด๋ธ”์—์„œ ๊ฐœ์ธ์ •๋ณด๋‚˜ ๋ฏผ๊ฐ์ •๋ณด๊ฐ€ ์žˆ๋Š” ์ปฌ๋Ÿผ๋“ค์„ ๋ถ„๋ฅ˜
  2. Review : ๋ถ„๋ฅ˜ํ•œ ๊ฒฐ๊ณผ๋ฅผ ์‚ฌ๋žŒ(๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋“ฑ)์ด ์ˆ˜์ • ๋“ฑ ๋ฆฌ๋ทฐ
  3. Apply : ์ตœ์ข… ๊ฒฐ๊ณผ๋ฅผ System Tag๋กœ ์ ์šฉ
    • SNOWFLAKE.CORE.PRIVACY_CATEGORY (์ƒ์œ„๋ ˆ๋ฒจ)
      • IDENTIFIER, QUASI_IDENTIFIER, SENSITIVE
    • SNOWFLAKE.CORE.SEMANTIC_CATEGORY (ํ•˜์œ„๋ ˆ๋ฒจ - ๋” ์„ธ๋ถ€์ •๋ณด)
  • ์‹๋ณ„์ž์™€ ์ค€์‹๋ณ„์ž
    • ์‹๋ณ„์ž(Identifier) : ๊ฐœ์ธ์„ ๋ฐ”๋กœ ์ง€์นญํ•  ์ˆ˜ ์žˆ๋Š” ์ •๋ณด
    • ์ค€์‹๋ณ„์ž(Quasi Identifier) : ์กฐํ•ฉ์„ ํ†ตํ•ด ๊ฐœ์ธ์„ ์ง€์นญํ•  ์ˆ˜ ์žˆ๋Š” ์ •๋ณด
      PRIVACY_CATEGORYSEMANTIC_CATEGORY
      IDENTIFIEREMAIL, NAME, PHONE_NUMBER, …
      QUASI_IDENTIFIERAGE, GENDER, …

Tag based Masking Policies

ํƒœ๊ทธ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์œ ์ €์™€ ๊ทธ ๊ถŒํ•œ์„ ์ง€์ •ํ•˜๋Š” ๊ธฐ๋Šฅ

  • Tag์— ์•ก์„ธ์Šค ๊ถŒํ•œ์„ ์ง€์ •
  • Tag๊ฐ€ ์ง€์ •๋œ Snowflake Object์˜ ์•ก์„ธ์Šค ๊ถŒํ•œ์„ ๊ทธ์— ๋งž์ถฐ ์ œํ•œํ•˜๋Š” ๋ฐฉ์‹
  • ๊ฐœ์ธ์ •๋ณด์™€ ๊ฐ™์€ Tag์— ์•ก์„ธ์Šค ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ๋งŽ์ด ์‚ฌ์šฉ๋จ

Access History

๋ฐ์ดํ„ฐ ์•ก์„ธ์Šค์— ๋Œ€ํ•œ ๊ธฐ๋ก์„ ์ œ๊ณตํ•˜์—ฌ ๊ฐ์‚ฌ ์ถ”์ ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•จ์œผ๋กœ์จ ๋ณด์•ˆ๊ณผ ๊ทœ์ • ์ค€์ˆ˜
๋ชจ๋“  ํด๋ผ์šฐ๋“œ ๋ฐ์ดํ„ฐ ์›จ์–ดํ•˜์šฐ์Šค์—์„œ ์ œ๊ณต๋˜๊ณ  ์žˆ๋Š” ๊ธฐ๋Šฅ

  • 'Access History'
    • ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ๋กœ๊ทธ์ธ, ์‹คํ–‰๋œ ์ฟผ๋ฆฌ, ํ…Œ์ด๋ธ” ๋ฐ ๋ทฐ ์•ก์„ธ์Šค, ๋ฐ์ดํ„ฐ ์กฐ์ž‘ ์ž‘์—…
  • ์ž ์žฌ์ ์ธ ๋ณด์•ˆ ์œ„๋ฐ˜์ด๋‚˜ ๋ฌด๋‹จ ์•ก์„ธ์Šค ์‹œ๋„์˜ ์กฐ์‚ฌ๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ด์คŒ
  • ์บก์ฒ˜๋œ ์ •๋ณด : ์‚ฌ์šฉ์ž ์‹ ์›, IP ์ฃผ์†Œ, ํƒ€์ž„์Šคํƒฌํ”„ ๋ฐ ๊ธฐํƒ€ ๊ด€๋ จ ์„ธ๋ถ€ ์ •๋ณด ํฌํ•จ

Object Dependencies

ํ…Œ์ด๋ธ”์ด๋‚˜ ๋ทฐ ๋“ฑ Object๋ฅผ ์ˆ˜์ •ํ•  ๋•Œ ์ด๋กœ ์ธํ•œ ์˜ํ–ฅ์„ ์ž๋™์œผ๋กœ ์‹๋ณ„ํ•˜๋Š” ๊ธฐ๋Šฅ
๋ฐ์ดํ„ฐ ๊ฑฐ๋ฒ„๋„Œ์Šค์™€ ์‹œ์Šคํ…œ ๋ฌด๊ฒฐ์„ฑ ์œ ์ง€๋ฅผ ๋ชฉ์ ์œผ๋กœ ํ•จ

  • ex: ํ…Œ์ด๋ธ” ์ด๋ฆ„์ด๋‚˜ ์ปฌ๋Ÿผ ์ด๋ฆ„์„ ๋ณ€๊ฒฝํ•˜๊ฑฐ๋‚˜ ์‚ญ์ œํ•˜๋Š” ๊ฒฝ์šฐ
  • ๊ณ„์Šน ๊ด€๊ณ„ ๋ถ„์„์„ ํ†ตํ•œ ๋” ์„ธ๋ฐ€ํ•œ ๋ณด์•ˆ ๋ฐ ์•ก์„ธ์Šค ์ œ์–ด
    • ์–ด๋–ค ํ…Œ์ด๋ธ”์˜ ๊ฐœ์ธ์ •๋ณด ์ปฌ๋Ÿผ์ด ์ƒˆ๋กœ์šด ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ค๋•Œ ์‚ฌ์šฉ๋œ๋‹ค๋ฉด?
      • ์›๋ณธ ํ…Œ์ด๋ธ”์—์„œ์˜ ๊ถŒํ•œ ์„ค์ •์ด ๊ทธ๋Œ€๋กœ ์ „ํŒŒ๋จ (Tag ํฌํ•จ)

Snowflake ํ™œ์šฉ ์‹ค์Šต

๊ณ„์ • ์ƒ์„ฑ

  • ๋ฌด๋ฃŒ ์‹œํ—˜ํŒ ๊ณ„์ • ์ƒ์„ฑ

  • ๋ฌด๋ฃŒ๊ณ„์ •์€ ๋ณ„๋„์˜ ๋กœ๊ทธ์ธ ๋งํฌ๋ฅผ ํ†ตํ•ด ์ ‘์†ํ•ด์•ผ ํ•จ

์‹ค์Šต ์ฝ”๋“œ

Schema

1
2
3
4
5
SNOWFLAKE DB & Schema
dev
โ”œโ”€ raw_data
โ”œโ”€ analytics
โ””โ”€ adhoc

db & schema ์ƒ์„ฑ

1
2
3
4
5
6
-- create db and schema
CREATE DATABASE dev;

CREATE SCHEMA dev.raw_data;
CREATE SCHEMA dev.analytics;
CREATE SCHEMA dev.adhoc;

table ์ƒ์„ฑ

  • session_transaction, user_session_channel, session_timestamp ์„ธ ๊ฐœ์˜ ํ…Œ์ด๋ธ” ์ƒ์„ฑ
1
2
3
4
5
6
-- create tables
CREATE OR REPLACE TABLE dev.raw_data.session_transaction (
 sessionid varchar(32) primary key,
 refunded boolean,
 amount int
);

s3 data ์—ฐ๊ฒฐ

  • S3 read ๊ถŒํ•œ์„ ๊ฐ€์ง„ IAM User ์ƒ์„ฑ & ACCESS KEY๋ฅผ ๋ฐœ๊ธ‰
  • COPY command๋กœ ๋ฐ์ดํ„ฐ ํŒŒ์ผ์—์„œ ์ถ”์ถœํ•˜์—ฌ ์ €์žฅ
1
2
3
4
COPY INTO dev.raw_data.session_transaction
FROM 's3://{s3-bucket-path}/session_transaction.csv'
credentials=(AWS_KEY_ID='' AWS_SECRET_KEY='')
FILE_FORMAT = (type='CSV' skip_header=1 FIELD_OPTIONALLY_ENCLOSED_BY='"');
  • analytics schema์— ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•˜๊ณ  ๋ฐ์ดํ„ฐ๊ฐ€ ์ž˜ ๋ถˆ๋Ÿฌ์™€์กŒ๋Š”์ง€ ํ™•์ธ
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
-- create a table using CTAS
CREATE TABLE dev.analytics.mau_summary AS
SELECT
 TO_CHAR(A.ts, 'YYYY-MM') AS month,
 COUNT(DISTINCT B.userid) AS mau
FROM raw_data.session_timestamp A
JOIN raw_data.user_session_channel B ON A.sessionid = B.sessionid
GROUP BY 1
ORDER BY 1 DESC;

SELECT * FROM dev.analytics.mau_summary LIMIT 10;

Role & User ์ƒ์„ฑ

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
-- create 3 roles
CREATE ROLE analytics_users;
CREATE ROLE analytics_authors;
CREATE ROLE pii_users;

-- create a user
CREATE USER seorim PASSWORD='xx';

-- grant role to user
GRANT ROLE analytics_users to USER seorim;

๊ฐ ROLE์˜ ๊ถŒํ•œ ์„ค์ •

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
-- set up analytics_users
GRANT USAGE on schema dev.raw_data to ROLE analytics_users;
GRANT SELECT on all tables in schema dev.raw_data to ROLE analytics_users;
GRANT USAGE on schema dev.analytics to ROLE analytics_users;
GRANT SELECT on all tables in schema dev.analytics to ROLE analytics_users;
GRANT ALL on schema dev.adhoc to ROLE analytics_users;
GRANT ALL on all tables in schema dev.adhoc to ROLE analytics_users;

-- set up analytics_authors
-- authors role์ด users role์˜ ๊ถŒํ•œ ์„ค์ •๋„ ์ƒ์†๋ฐ›๊ฒŒ ๋จ
GRANT ROLE analytics_users TO ROLE analytics_authors;
GRANT ALL on schema dev.analytics to ROLE analytics_authors;
GRANT ALL on all tables in schema dev.analytics to ROLE analytics_authors;

๐Ÿ‘€ย CHECK

(์–ด๋ ต๊ฑฐ๋‚˜ ์ƒˆ๋กญ๊ฒŒ ์•Œ๊ฒŒ ๋œ ๊ฒƒ ๋“ฑ ๋‹ค์‹œ ํ™•์ธํ•  ๊ฒƒ๋“ค)

Snowflake SQL

“Snowflake supports standard SQL”, including a subset of ANSI SQL:1999 and the SQL:2003 analytic extensions. Snowflake also supports common variations for a number of commands where those variations do not conflict with each other.

Snowflake Object Hierachy

โ— ๋Š๋‚€ ์ 

๊ฐ•์˜๋Š” Snowflake์— ๋Œ€ํ•ด ๋ฐฐ์šฐ๊ณ , ๊ฐ„๋‹จํ•˜๊ฒŒ ํ™œ์šฉํ•ด๋ณด๋Š” ๋‚ด์šฉ์ด์—ˆ๋‹ค. ์ „๋ฐ˜์ ์œผ๋กœ ์–ด๋ ค์šด ๊ฑด ์—†์—ˆ๋‹ค. notebook์„ ๋”ฐ๋กœ ์จ์•ผํ•˜๋Š”(Colab or Jupyter) Redshift์™€๋Š” ๋‹ค๋ฅด๊ฒŒ, ์ž์ฒด์ ์œผ๋กœ notebook ์—ญํ• ์„ ํ•˜๋Š” worksheet๋ฅผ ์ œ๊ณตํ•œ๋‹ค. worksheet๋ฅผ ํฌํ•จํ•œ UI๊ฐ€ redshift๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ์™€ ๋น„๊ตํ•˜๋ฉด ๋” ํŽธํ•˜๊ณ  ๊น”๋”ํ•ด์„œ ์ข‹์•˜๋‹ค.

์˜ค๋Š˜์€ ๊ฒŒ๋”์—์„œ ๋ชจ๊ฐ์ฝ”(๋ชจ๊ฐ๊ณต)์„ ํ•ด๋ดค๋‹ค. ํ™”๋ฉด ๊ณต์œ ๋ฅผ ํ•˜๋ฉด์„œ ๊ฐ•์˜๋ฅผ ๋“ค์œผ๋‹ˆ๊นŒ ์›๋ž˜๋ณด๋‹ค ์ง‘์ค‘์€ ์ž˜ ๋๋Š”๋ฐ, ์•„์‰ฌ์šด์ ์€ ๊ฐ™์ด ๊ณต๋ถ€ํ•˜๋Š” ๋Š๋‚Œ์ด ์ž˜ ์•ˆ๋“ค๋”๋ผ. ์–ด๋–ป๊ฒŒ ์ง„ํ–‰ํ•ด์•ผํ• ์ง€ ๊ณ ๋ฏผ์ด ์ข€ ๋œ๋‹ค. ์‹œ๊ฐ„์„ ์ •ํ•ด์„œ ๋ง์„ ๊ฑธ๊ฑฐ๋‚˜ ์ž˜ ์•ˆ๋˜๋Š” ๊ฑธ ๋ฌผ์–ด๋ณด๋ฉด ์–ด๋–จ๊นŒ? ๋ชจ๊ฐ์ฝ” ์ฐธ์—ฌํ•˜์‹œ๋Š” ๋ถ„๋“คํ•˜๊ณ  ์นœํ•ด์ง€๊ณ  ์‹ถ์€๋ฐ ๋ง ๊ฑธ๊ธฐ๊ฐ€ ์‰ฝ์ง€ ์•Š์•„์„œ ์–ด๋–ค ์ฃผ์ œ๋กœ ์–ด๋–ป๊ฒŒ ์–˜๊ธฐํ•˜๋ฉด ์ข‹์„์ง€ ๊ณ ๋ฏผ์„ ์ข€ ํ•ด๋ด์•ผ๊ฒ ๋‹ค…. ใ… ใ…œ

Hugo๋กœ ๋งŒ๋“ฆ
Jimmy์˜ Stack ํ…Œ๋งˆ ์‚ฌ์šฉ ์ค‘