Day 36

📋 Study notes

GCS (Google Cloud Storage)

Creating a bucket

Scraping using requests

  1. Setting Session

    import requests

    # A Session persists cookies across requests, so the login carries over
    s = requests.Session()
    headers = {
        # ...
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        # ...
    }

    # Placeholders: substitute your own account and password
    payload = {
        "account": "{myaccount}",
        "password": "{mypassword}",
    }
    
  2. Login with Session

    # Log in by POSTing the credentials through the session
    res = s.post("https://kdx.kr/auth/autoLogin", headers=headers, data=payload)
    
  3. Download file with login authentication

    # Send a GET for the file URL through the logged-in session and receive the data
    response = s.get(file_url, stream=True)
    
  4. Upload the downloaded file directly to a GCS bucket

    # Import the Google Cloud Storage client library
    from google.cloud import storage

    # Set up the client and the target bucket
    storage_client = storage.Client()
    bucket_name = "{mybucket}"  # enter your actual bucket name here
    bucket = storage_client.bucket(bucket_name)

    # Upload the downloaded bytes as a CSV object
    blob = bucket.blob(f"{filename}.csv")
    blob.upload_from_string(response.content)
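Steps 3 and 4 read the whole file into memory through `response.content`, even though the request was made with `stream=True`. For large files, one possible variation is to copy the response to the blob chunk by chunk. The sketch below assumes a `google-cloud-storage` version that provides `Blob.open("wb")`; the helper name `stream_response_to_blob` and the 1 MiB chunk size are illustrative choices, not part of the original notes.

```python
def stream_response_to_blob(response, blob, chunk_size=1024 * 1024):
    """Copy a streamed requests response into a GCS blob chunk by chunk.

    `response` is expected to come from s.get(file_url, stream=True) and
    `blob` from bucket.blob(...). Returns the number of bytes written.
    """
    # Fail early if the download itself did not succeed (e.g. 401 or 404)
    response.raise_for_status()

    written = 0
    # Blob.open("wb") returns a file-like writer that uploads as it is written
    with blob.open("wb") as out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                out.write(chunk)
                written += len(chunk)
    return written


# Usage with the names from the steps above (requires network and credentials):
#   response = s.get(file_url, stream=True)
#   stream_response_to_blob(response, bucket.blob(f"{filename}.csv"))
```

For small CSV files, `upload_from_string(response.content)` as in step 4 is simpler and works just as well.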
    

Google Cloud credentials

  • IAM & Admin > Service Accounts > Create Service Account

    • Set access permissions (grant a role that can create objects in GCS)

  • Create a key and save it locally

  • ~/.zshrc

    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"
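
Before calling `storage.Client()`, it can help to confirm that the variable is visible to Python and points at a service account key. The following is a minimal sketch; the function name `check_credentials` is hypothetical, and it only inspects the `type` and `client_email` fields that service account key files contain.

```python
import json
import os


def check_credentials(env=None):
    """Return a short status message about GOOGLE_APPLICATION_CREDENTIALS."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set (restart the shell after editing ~/.zshrc)"
    if not os.path.isfile(path):
        return f"key file not found: {path}"
    try:
        with open(path, encoding="utf-8") as f:
            info = json.load(f)
    except json.JSONDecodeError:
        return f"key file is not valid JSON: {path}"
    if not isinstance(info, dict) or info.get("type") != "service_account":
        return f"not a service account key: {path}"
    return f"ok: {info.get('client_email')}"


if __name__ == "__main__":
    print(check_credentials())
```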
    

Snowflake

Connect with GCS

Write SQL code

GCS Integration

  • Snowflake's GCS account object is created
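
As a rough illustration of the SQL involved, the sketch below builds a `CREATE STORAGE INTEGRATION` statement and the `DESC` statement whose output includes the `STORAGE_GCP_SERVICE_ACCOUNT` property, that is, the service account Snowflake creates for the integration. The integration name `gcs_int` and the bucket path are placeholders, not values from these notes.

```python
def storage_integration_sql(name, bucket_path):
    """Build the statement that creates a GCS storage integration."""
    return (
        f"CREATE STORAGE INTEGRATION {name}\n"
        "  TYPE = EXTERNAL_STAGE\n"
        "  STORAGE_PROVIDER = 'GCS'\n"
        "  ENABLED = TRUE\n"
        f"  STORAGE_ALLOWED_LOCATIONS = ('gcs://{bucket_path}');"
    )


def describe_integration_sql(name):
    """Build the statement that shows the integration's properties."""
    return f"DESC STORAGE INTEGRATION {name};"


if __name__ == "__main__":
    # Placeholder names; paste the printed SQL into a Snowflake worksheet
    print(storage_integration_sql("gcs_int", "mybucket/"))
    print(describe_integration_sql("gcs_int"))
```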

Make custom IAM Role

  • Can read the bucket and create, read, update, and delete (CRUD) objects in it
  • Bind Snowflake's account to this custom role
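
One way to express such a role is as a role definition file. The sketch below assumes the permission set that Snowflake's documentation lists for loading and unloading data, and prints it as JSON in the shape that `gcloud iam roles create --file` accepts; the title and description are placeholders.

```python
import json

# Assumed permission set for loading and unloading data through Snowflake
SNOWFLAKE_GCS_PERMISSIONS = [
    "storage.buckets.get",      # read bucket metadata
    "storage.objects.create",   # write objects (unload)
    "storage.objects.delete",   # overwrite or purge objects
    "storage.objects.get",      # read objects (load)
    "storage.objects.list",     # list objects in the bucket
]


def custom_role_definition(title):
    """Return a role definition dict for a custom IAM role."""
    return {
        "title": title,
        "description": "Bucket read and object CRUD for a Snowflake integration",
        "stage": "GA",
        "includedPermissions": list(SNOWFLAKE_GCS_PERMISSIONS),
    }


if __name__ == "__main__":
    print(json.dumps(custom_role_definition("Snowflake GCS access"), indent=2))
```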

Bulk Update with COPY Command

  • Copy the entire file into a table
  • Alternatively, copy only selected columns of the file
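
The two approaches might look like the following. This sketch assumes an external stage already exists on top of the storage integration and that the files are CSV with a header row; the table, column, and stage names are hypothetical.

```python
CSV_FORMAT = "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"


def copy_all_sql(table, stage_path):
    """COPY every column of the staged file into the table."""
    return f"COPY INTO {table}\n  FROM @{stage_path}\n  {CSV_FORMAT};"


def copy_columns_sql(table, columns, positions, stage_path):
    """COPY only selected file columns (1-based positions) into named table columns."""
    if len(columns) != len(positions):
        raise ValueError("columns and positions must have the same length")
    # $1, $2, ... refer to the file's columns by position
    select_list = ", ".join(f"t.${p}" for p in positions)
    return (
        f"COPY INTO {table} ({', '.join(columns)})\n"
        f"  FROM (SELECT {select_list} FROM @{stage_path} t)\n"
        f"  {CSV_FORMAT};"
    )


if __name__ == "__main__":
    print(copy_all_sql("sales", "my_gcs_stage/sales.csv"))
    print(copy_columns_sql("sales", ["sold_at", "amount"], [1, 3], "my_gcs_stage/sales.csv"))
```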

How to check remaining free credits

Superset ( preset.io )

Connect with Snowflake
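
Superset connects to databases through a SQLAlchemy URI. A minimal sketch of building one for Snowflake follows, assuming the `snowflake-sqlalchemy` URI format; every argument value shown is a placeholder, and the helper name `snowflake_uri` is illustrative.

```python
from urllib.parse import quote_plus, urlencode


def snowflake_uri(user, password, account, database, warehouse, role):
    """Build a SQLAlchemy URI for Snowflake, escaping special characters."""
    query = urlencode({"warehouse": warehouse, "role": role})
    return (
        f"snowflake://{quote_plus(user)}:{quote_plus(password)}"
        f"@{account}/{database}?{query}"
    )


if __name__ == "__main__":
    # Placeholder values; paste the result into the database connection form
    print(snowflake_uri("myuser", "p@ss", "myaccount", "mydb", "mywh", "myrole"))
```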

Security > Network policy

👀 CHECK

(Things to revisit, such as difficult or newly learned topics)

❗ Reflections

Licensed under CC BY-NC-SA 4.0