Study Notes
GCS (Google Cloud Storage)
Create a bucket
Scraping using requests
Set up a session
```python
import requests

s = requests.Session()

headers = {
    # ...
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
    # ...
}

# Login form fields: replace the placeholders with your own account and password
payload = {
    "account": "<myaccount>",
    "password": "<mypassword>",
}
```
Login with Session
```python
# Log in by sending a POST request through the session
res = s.post("https://kdx.kr/auth/autoLogin", headers=headers, data=payload)
```
Download file with login authentication
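```python
# Send a GET request for the file URL through the logged-in session and receive the data
response = s.get(file_url, stream=True)
response.raise_for_status()  # stop here if the server refused the download
```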
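Passing `stream=True` only defers downloading the body; reading `response.content` afterwards still pulls the whole file into memory. A minimal sketch, assuming the file is large enough for that to matter, of writing it to disk chunk by chunk with `response.iter_content()` instead. The helper name `save_stream` and the local filename are hypothetical.

```python
from typing import BinaryIO, Iterable


def save_stream(chunks: Iterable[bytes], dest: BinaryIO) -> int:
    """Write byte chunks to an open binary file and return the bytes written.

    Hypothetical helper: `chunks` is meant to be
    `response.iter_content(chunk_size=...)` from a streamed requests response.
    """
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks instead of writing nothing
            continue
        dest.write(chunk)
        total += len(chunk)
    return total


# Usage with the streamed response (hypothetical local filename):
# with open("download.csv", "wb") as f:
#     save_stream(response.iter_content(chunk_size=8192), f)
```

Only one chunk is held in memory at a time, so peak memory stays near `chunk_size` regardless of the file size.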
Set download path to GCS bucket
```python
# Download the file directly into GCS
from google.cloud import storage

# Set up the client and the target bucket by name
storage_client = storage.Client()
bucket_name = "<mybucket>"  # enter your actual bucket name here
bucket = storage_client.bucket(bucket_name)

# Save the downloaded bytes as an object named <filename>.csv
# (response.content reads the whole body into memory)
blob = bucket.blob(f"{filename}.csv")
blob.upload_from_string(response.content, content_type="text/csv")
```
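`upload_from_string(response.content)` buffers the entire file in memory before uploading. A sketch, assuming a google-cloud-storage version recent enough to provide `Blob.open()`, that streams the chunks straight into the object through a file-like writer. The function name is hypothetical; `bucket` is meant to be the `Bucket` obtained from `storage_client.bucket(...)`.

```python
from typing import Iterable


def upload_stream_to_gcs(bucket, blob_name: str, chunks: Iterable[bytes]) -> int:
    """Stream byte chunks into a GCS object and return the number of bytes sent.

    `bucket` is assumed to be a google.cloud.storage.Bucket; this helper
    is hypothetical, not part of the library.
    """
    blob = bucket.blob(blob_name)
    total = 0
    # Blob.open("wb") returns a file-like writer that uploads data as it is written
    with blob.open("wb") as writer:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                writer.write(chunk)
                total += len(chunk)
    return total


# Usage (filename and response come from the earlier steps):
# upload_stream_to_gcs(bucket, f"{filename}.csv",
#                      response.iter_content(chunk_size=1024 * 1024))
```

The function only relies on `bucket.blob()` and `blob.open()`, so it can be exercised with stand-in objects without touching GCS.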
Google Cloud credentials
IAM & Admin > Service Accounts > Create service account
- Set access permissions (grant a role that can create objects in GCS)
Create a key and save it locally
~/.zshrc
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"
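`storage.Client()` with no arguments finds the key through Application Default Credentials, which read `GOOGLE_APPLICATION_CREDENTIALS`. A small sanity check, assuming the variable was exported as above (after `source ~/.zshrc` or in a new shell), that fails early with a clear message instead of an authentication error later. `credentials_path` is a hypothetical helper.

```python
import os
from pathlib import Path


def credentials_path() -> Path:
    """Return the service-account key path from GOOGLE_APPLICATION_CREDENTIALS.

    Hypothetical helper: raises if the variable is unset or the file is missing.
    """
    raw = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS is not set (did you run `source ~/.zshrc`?)"
        )
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"service account key not found: {path}")
    return path


# Usage: call before storage.Client() to fail fast.
# print(credentials_path())
```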
Snowflake
Connect with GCS
Write SQL code
GCS Integration
- creates a GCS service account object for Snowflake
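A sketch of the SQL behind this step. The snippet only assembles the statements as strings to paste into a Snowflake worksheet (or pass to a cursor); the integration name `gcs_int`, stage `gcs_stage`, bucket `mybucket`, and the CSV-with-header file format are all hypothetical. `CREATE STORAGE INTEGRATION` requires the ACCOUNTADMIN role or the CREATE INTEGRATION privilege, and the `STORAGE_GCP_SERVICE_ACCOUNT` row returned by `DESC STORAGE INTEGRATION` is the service account Snowflake created.

```python
from typing import List


def gcs_integration_sql(integration: str, stage: str, bucket: str,
                        prefix: str = "") -> List[str]:
    """Build Snowflake statements for a GCS storage integration and stage.

    All names are placeholders; nothing here connects to Snowflake.
    """
    location = f"gcs://{bucket}/{prefix}"
    return [
        (
            f"CREATE STORAGE INTEGRATION {integration}\n"
            "  TYPE = EXTERNAL_STAGE\n"
            "  STORAGE_PROVIDER = 'GCS'\n"
            "  ENABLED = TRUE\n"
            f"  STORAGE_ALLOWED_LOCATIONS = ('{location}');"
        ),
        # The STORAGE_GCP_SERVICE_ACCOUNT row of this output is Snowflake's service account
        f"DESC STORAGE INTEGRATION {integration};",
        (
            f"CREATE STAGE {stage}\n"
            f"  URL = '{location}'\n"
            f"  STORAGE_INTEGRATION = {integration}\n"
            "  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);"
        ),
    ]


for statement in gcs_integration_sql("gcs_int", "gcs_stage", "mybucket"):
    print(statement, end="\n\n")
```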
Create a custom IAM role
- can read the bucket and create, read, update, and delete objects in it
- assign this custom role to Snowflake's service account
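Snowflake's GCS setup guide lists these permissions for a custom role used for both loading and unloading: `storage.buckets.get`, `storage.objects.get`, `storage.objects.list`, `storage.objects.create`, and `storage.objects.delete`. Treat the list as an assumption to re-check against the current documentation. The sketch below (hypothetical helper name) compares a role's permissions, for example copied from the role's page in the Google Cloud console, against that set.

```python
from typing import FrozenSet, Iterable, List

# Loading + unloading permissions, per Snowflake's GCS configuration guide
# (assumption: verify against the current docs before relying on this list)
REQUIRED_PERMISSIONS: FrozenSet[str] = frozenset({
    "storage.buckets.get",
    "storage.objects.get",
    "storage.objects.list",
    "storage.objects.create",
    "storage.objects.delete",
})


def missing_permissions(granted: Iterable[str]) -> List[str]:
    """Return the required permissions the custom role does not grant, sorted."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


print(missing_permissions(["storage.objects.get", "storage.objects.list"]))
```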
Bulk Load with COPY
Command
- Copies the entire file into a table.
- You can also pick only some of the file's columns to copy.
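The two ways of using COPY can be sketched as statement builders. The names (`gcs_stage`, a `sales` table with `id` and `amount` columns) and the CSV-with-header format are hypothetical. The column-subset form uses Snowflake's COPY-with-transformation syntax, which selects file columns by 1-based position (`$1`, `$3`, ...) from an aliased stage.

```python
from typing import Sequence

CSV_FORMAT = "(TYPE = CSV SKIP_HEADER = 1)"  # assumed file format


def copy_all_sql(table: str, stage: str, file_format: str = CSV_FORMAT) -> str:
    """COPY every column of the staged files into the table, in file order."""
    return f"COPY INTO {table} FROM @{stage} FILE_FORMAT = {file_format};"


def copy_columns_sql(table: str, columns: Sequence[str], positions: Sequence[int],
                     stage: str, file_format: str = CSV_FORMAT) -> str:
    """COPY only the file columns at `positions` (1-based) into `columns`."""
    if len(columns) != len(positions):
        raise ValueError("columns and positions must have the same length")
    target = ", ".join(columns)
    source = ", ".join(f"t.${p}" for p in positions)
    return (f"COPY INTO {table} ({target}) "
            f"FROM (SELECT {source} FROM @{stage} t) FILE_FORMAT = {file_format};")


print(copy_all_sql("sales", "gcs_stage"))
print(copy_columns_sql("sales", ["id", "amount"], [1, 3], "gcs_stage"))
```

The second statement loads file columns 1 and 3 into `id` and `amount` and ignores the rest of each row.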
How to check the remaining free credits
Superset (preset.io)
Connect with Snowflake
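Superset connects through SQLAlchemy, and the snowflake-sqlalchemy dialect accepts a URI of the form `snowflake://{user}:{password}@{account}/{database}/{schema}?warehouse={warehouse}&role={role}`. A sketch that assembles it, percent-encoding the user and password so characters like `@` or `/` do not break the URI. Every value in the example call is a hypothetical placeholder.

```python
from urllib.parse import quote, urlencode


def snowflake_uri(user: str, password: str, account: str, database: str,
                  schema: str, warehouse: str, role: str) -> str:
    """Build a snowflake-sqlalchemy URI; all arguments here are placeholders."""
    # safe="" also encodes "/" so the password cannot be mistaken for a path
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = urlencode({"warehouse": warehouse, "role": role})
    return f"snowflake://{creds}@{account}/{database}/{schema}?{query}"


print(snowflake_uri("preset_user", "p@ss/word", "myorg-myaccount",
                    "ANALYTICS", "PUBLIC", "COMPUTE_WH", "REPORTER"))
```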
Security > Network policy
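If this step is about letting Preset reach Snowflake, a network policy lists the IP addresses allowed to connect. A sketch under that assumption: Preset's published egress IPs would go into the list (the addresses below are documentation placeholders from 203.0.113.0/24, not Preset's), each entry is validated with `ipaddress`, and the policy is attached to one user rather than the whole account so other logins are not locked out. The policy and user names are hypothetical, and the IPv4-only check is an assumption about what Snowflake network policies accept.

```python
import ipaddress
from typing import List, Sequence


def network_policy_sql(policy: str, user: str, allowed: Sequence[str]) -> List[str]:
    """Build statements that create a network policy and attach it to one user."""
    checked = []
    for entry in allowed:
        net = ipaddress.ip_network(entry)  # accepts "1.2.3.4" or "1.2.3.0/24"
        if net.version != 4:  # assumption: IPv4 addresses and ranges only
            raise ValueError(f"expected an IPv4 address or range: {entry}")
        checked.append(f"'{entry}'")
    return [
        f"CREATE NETWORK POLICY {policy} ALLOWED_IP_LIST = ({', '.join(checked)});",
        f"ALTER USER {user} SET NETWORK_POLICY = {policy};",
    ]


for stmt in network_policy_sql("preset_access", "preset_user",
                               ["203.0.113.10", "203.0.113.0/28"]):
    print(stmt)
```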
CHECK
(Things to revisit: anything that was difficult or newly learned)