All posts
-
Kaggle advanced_sql/03-nested-and-repeated-data<Kaggle-Course> 2023. 3. 15. 19:46
pop_lang_query = """ SELECT language.name AS language_name, COUNT(*) AS num_repos FROM `bigquery-public-data.github_repos.languages`, UNNEST(language) AS language GROUP BY language_name ORDER BY num_repos DESC """ all_langs_query = """ SELECT language.name AS name, language.bytes AS bytes FROM `bigquery-public-data.github_repos.languages`, UNNEST(language) AS language WHERE repo_name = 'polyrab..
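As a rough illustration of what `UNNEST(language)` does in `pop_lang_query`, the sketch below flattens a repeated field in plain Python and then counts repositories per language. The rows are hypothetical toy data shaped like the `languages` table, and the `unnest` helper is an illustrative name, not part of BigQuery or the Kaggle course.

```python
from collections import Counter

# Hypothetical rows: one row per repo, with a repeated `language` field
# holding {name, bytes} records (toy data, not taken from BigQuery).
rows = [
    {"repo_name": "example/a",
     "language": [{"name": "Python", "bytes": 1200}, {"name": "C", "bytes": 300}]},
    {"repo_name": "example/b",
     "language": [{"name": "Python", "bytes": 50}]},
    {"repo_name": "example/c", "language": []},
]


def unnest(rows):
    """Yield one (repo_name, language record) pair per array element.

    Repos with an empty array yield nothing, which mirrors the comma
    (cross) join with UNNEST in the query.
    """
    for row in rows:
        for lang in row["language"]:
            yield row["repo_name"], lang


# Equivalent of COUNT(*) ... GROUP BY language_name ORDER BY num_repos DESC
num_repos = Counter(lang["name"] for _, lang in unnest(rows))
pop_langs = num_repos.most_common()
print(pop_langs)
```

With this toy data the result is `[('Python', 2), ('C', 1)]`: each array element becomes its own row before grouping, so a repo contributes one count per language it contains.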
-
Kaggle advanced_sql/01-joins-and-unions<Kaggle-Course> 2023. 3. 15. 16:22
first_query = """ SELECT q.id AS q_id, MIN(TIMESTAMP_DIFF(a.creation_date, q.creation_date, SECOND)) as time_to_answer FROM `bigquery-public-data.stackoverflow.posts_questions` AS q INNER JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a ON q.id = a.parent_id WHERE q.creation_date >= '2018-01-01' and q.creation_date < '2018-02-01' GROUP BY q_id ORDER BY time_to_answer """ correct_query = """ ..
-
Kaggle intro_to_sql/03-group-by-having-count<Kaggle-Course> 2023. 3. 15. 15:37
prolific_commenters_query = """SELECT author, COUNT(id) AS NumPosts FROM `bigquery-public-data.hacker_news.comments` GROUP BY author HAVING COUNT(id) > 10000""" deleted_posts_query = """SELECT COUNT(1) AS num_deleted_posts FROM `bigquery-public-data.hacker_news.comments` WHERE deleted = True """
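The `GROUP BY ... HAVING` pattern in these two queries can be tried locally without BigQuery. The following is a minimal sketch that swaps in an in-memory SQLite database; the `comments` table contents are hypothetical, `deleted` is stored as 0/1 rather than a boolean, and the threshold is lowered from 10000 to 2 so the toy data produces a result.

```python
import sqlite3

# In-memory SQLite stand-in for the hacker_news.comments table (toy data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER, author TEXT, deleted INTEGER)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [(1, "alice", 0), (2, "alice", 0), (3, "alice", 1), (4, "bob", 0), (5, "carol", 1)],
)

# HAVING filters groups after aggregation; WHERE would filter rows before it.
prolific = conn.execute(
    """
    SELECT author, COUNT(id) AS NumPosts
    FROM comments
    GROUP BY author
    HAVING COUNT(id) > 2
    ORDER BY author
    """
).fetchall()

# Count the rows flagged as deleted (1 plays the role of True here).
num_deleted = conn.execute(
    "SELECT COUNT(1) AS num_deleted_posts FROM comments WHERE deleted = 1"
).fetchone()[0]

print(prolific, num_deleted)
```

Here `prolific` is `[('alice', 3)]` and `num_deleted` is `2`; only the author whose group count exceeds the threshold survives the `HAVING` clause.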
-
Kaggle intro_to_sql/05-as-with<Kaggle-Course> 2023. 3. 15. 15:36
rides_per_month_query = """WITH cte AS ( SELECT EXTRACT(MONTH FROM trip_start_timestamp) AS month FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips` WHERE EXTRACT(YEAR FROM trip_start_timestamp) = 2017 ) SELECT month, COUNT(1) AS num_trips FROM cte GROUP BY month ORDER BY month """ rides_per_year_query = """WITH cte AS ( SELECT EXTRACT(YEAR FROM trip_start_timestamp) AS year FROM `bigq..
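The `WITH cte AS (...)` structure of `rides_per_month_query` can be sketched the same way on SQLite. This is an approximation under stated assumptions: the `taxi_trips` rows are hypothetical, timestamps are stored as text, and SQLite's `strftime` stands in for BigQuery's `EXTRACT`, which SQLite does not provide.

```python
import sqlite3

# In-memory SQLite stand-in for chicago_taxi_trips.taxi_trips (toy data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxi_trips (trip_start_timestamp TEXT)")
conn.executemany(
    "INSERT INTO taxi_trips VALUES (?)",
    [
        ("2017-01-05 08:00:00",),
        ("2017-01-20 09:30:00",),
        ("2017-03-02 18:15:00",),
        ("2016-12-31 23:59:00",),  # excluded by the year filter
    ],
)

# The CTE derives the month for 2017 trips; the outer query aggregates it.
rides_per_month = conn.execute(
    """
    WITH cte AS (
        SELECT CAST(strftime('%m', trip_start_timestamp) AS INTEGER) AS month
        FROM taxi_trips
        WHERE strftime('%Y', trip_start_timestamp) = '2017'
    )
    SELECT month, COUNT(1) AS num_trips
    FROM cte
    GROUP BY month
    ORDER BY month
    """
).fetchall()

print(rides_per_month)
```

For the toy rows this yields `[(1, 2), (3, 1)]`. Splitting the derivation into a CTE keeps the outer query a plain `GROUP BY` over a named column.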
-
Kaggle intro_to_sql/04-order-by<Kaggle-Course> 2023. 3. 15. 15:35
code_count_query = """ SELECT indicator_code, indicator_name, COUNT(1) AS num_rows FROM `bigquery-public-data.world_bank_intl_education.international_education` WHERE year = 2016 GROUP BY indicator_code, indicator_name HAVING num_rows >= 175 ORDER BY num_rows DESC """ country_spend_pct_query = """ SELECT country_name, AVG(value) AS avg_ed_spending_pct FROM `bigquery-public-data.world_bank_intl_..
-
Kaggle intro_to_sql/06-joining-data<Kaggle-Course> 2023. 3. 15. 15:26
questions_query = \ """ SELECT id, title, owner_user_id FROM `bigquery-public-data.stackoverflow.posts_questions` WHERE tags LIKE '%bigquery%' """ answers_query = \ """ SELECT pa.id, pa.body, pa.owner_user_id FROM `bigquery-public-data.stackoverflow.posts_questions` AS pq INNER JOIN `bigquery-public-data.stackoverflow.posts_answers` AS pa ON pq.id = pa.parent_id WHERE pq.tags LIKE '%bigquery%' "..
-
What are cuDF and cuML?<Research>/[Terminology] 2023. 3. 15. 13:45
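cuDF and cuML are GPU-accelerated data processing libraries developed by NVIDIA. cuDF (CUDA DataFrame) is a GPU-accelerated dataframe library with a Pandas-like API, so data scientists and engineers can process and analyze large datasets through a Pandas-style interface. Because it runs on the GPU, however, cuDF can process data faster than CPU-based processing. cuML (CUDA Machine Learning) is a GPU-accelerated machine learning library. It provides a scikit-learn-like API and supports common machine learning algorithms such as linear regression, logistic regression, KNN, SVM, and K-means clustering. This enables data scientists and engineers, on large-scale..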