Access Kaggle Public API with REST

Tram Ho

1 Introduction

Kaggle, which is owned by Google, is a community for data science and machine learning. It lets users find and publish datasets, find and build models, take part in data science and machine learning competitions, and more.

Kaggle’s official API documentation covers only the Python CLI. The CLI is useful, but when I want to write a crawler, a REST API is more convenient, and there is very little official documentation for the public Kaggle REST API. So here is a guide to accessing the Kaggle Public API with REST.

2 Generate API Key

To use the Kaggle Public API, you need to generate an API token. Go to your Account page, scroll down to the API section, and click the Create New API Token button.

A file named kaggle.json is then downloaded; it contains your username and key.
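As a minimal sketch of how a script might read those two values, the snippet below parses the file with the standard library. The function name `load_kaggle_credentials` is hypothetical, and the field names `username` and `key` are taken from the description above; the file location is whatever path you saved the download to.

```python
import json
from pathlib import Path


def load_kaggle_credentials(path):
    """Return (username, key) read from a downloaded kaggle.json file.

    Hypothetical helper: it assumes the file is a JSON object with
    "username" and "key" entries, as described in the article.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["username"], data["key"]
```

For example, `username, key = load_kaggle_credentials("kaggle.json")` would read the file from the current directory. Treat the file like a password and keep it out of version control.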

3 Interacting with APIs

Luckily, the Kaggle team has already created a Swagger YAML file, so we don’t have to dig through the URLs ourselves.

  1. Go here to get the Swagger YAML file.
  2. Open any Swagger UI; I use the online editor at https://editor.swagger.io/
  3. Import the YAML file into the Swagger Editor.
  4. The endpoints are then displayed along with their query parameters.

  5. Kaggle uses HTTP Basic authentication: you just need to enter your username and API key in the two fields, username and password, respectively. When you write the REST request in code, you need to Base64-encode both values as a single string in the format {username}:{api key} (this is an encoding, not encryption) and send the result as the token in the Authorization header.
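The header construction described above can be sketched in a few lines of Python. The helper name `basic_auth_header` is hypothetical; the `Basic ` prefix and the Base64 step follow the standard HTTP Basic authentication scheme.

```python
import base64


def basic_auth_header(username, api_key):
    """Build an HTTP Basic Authorization header from a username and API key.

    Hypothetical helper: joins the two values as "username:api_key",
    Base64-encodes the bytes, and prefixes the result with "Basic ".
    """
    raw = f"{username}:{api_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Base64 is reversible by anyone who sees the header, so requests carrying it should only be sent over HTTPS.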

  6. When you write a crawler, what you are usually interested in is the datasets. Kaggle’s base API URL is https://www.kaggle.com/api/v1/. Dataset search accepts 11 parameters in total, ranging from search keywords to pagination. It uses the GET method, and the URL is https://www.kaggle.com/api/v1/datasets/list. The simplest request lists datasets without passing any parameters.
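A dataset listing request along these lines might look like the sketch below, which uses only the Python standard library. The function names are hypothetical, and the sketch assumes the endpoint returns JSON; any query parameter names you pass (for example `search` or `page`) should be checked against the Swagger file rather than taken from this example.

```python
import base64
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.kaggle.com/api/v1/"


def build_dataset_list_request(username, api_key, **params):
    """Prepare a GET request for datasets/list (hypothetical helper)."""
    url = BASE_URL + "datasets/list"
    if params:
        # Optional query parameters; their names are assumptions here.
        url += "?" + urllib.parse.urlencode(params)
    raw = f"{username}:{api_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    headers = {"Authorization": f"Basic {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def list_datasets(username, api_key, **params):
    """Send the request and decode the body, assuming a JSON response."""
    request = build_dataset_list_request(username, api_key, **params)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `list_datasets(username, key)` with no extra arguments corresponds to the parameterless request described above. Splitting request construction from sending keeps the URL and header logic easy to inspect without touching the network.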

  7. Operations the Kaggle API provides for each object:
  • Competitions: List competitions, List files in competitions, Download files in competitions, Submit entries to competitions, List submissions in competitions, Get rankings in competitions
  • Datasets: List datasets, List files for a dataset, Download dataset files, Initialize a metadata file to create a dataset, Create a new dataset, Create a new dataset instance, Download metadata for an existing dataset, Get dataset creation status
  • Kernels: List kernels, Initialize metadata for a kernel, Push a kernel, Pull a kernel, Retrieve kernel output, Get the status of the latest kernel run