A fast way to upload a dataset to Kaggle

How it works

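From mainland China, first upload the files to a host that sits behind a CDN (such as GitHub). Then download them inside a Kaggle kernel and upload them from there straight into a dataset.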

Method

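First you need to know how to use kaggle-api. kaggle-api is Kaggle's official command-line tool; from the command line it can download competition data, download and upload datasets, fetch leaderboards, and so on.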

https://github.com/Kaggle/kaggle-api

Local install: pip install kaggle

Kaggle kernels already have it installed, so there is no need to install it there.
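To confirm that the command-line tool is actually available in the current environment, a quick check like the following can help. This is only a sketch: the helper name `kaggle_cli_path` is made up for illustration, and it relies on nothing more than the executable being named `kaggle` and being on PATH.

```python
import shutil


def kaggle_cli_path():
    """Return the full path of the `kaggle` executable, or None if it is not on PATH."""
    return shutil.which("kaggle")


path = kaggle_cli_path()
if path is None:
    print("kaggle CLI not found; run `pip install kaggle` first")
else:
    print(f"kaggle CLI found at {path}")
```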

Step 1: Download your account's API JSON file

https://www.kaggle.com/me/account
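The downloaded file holds a username and a key, and the CLI looks for it as kaggle.json in the .kaggle folder of the home directory (/root/.kaggle in a kernel). A minimal sketch of writing that file from Python is shown here; the function name `write_kaggle_credentials` and its `config_dir` parameter are hypothetical, and the placeholder values must be replaced with your own.

```python
import json
import os
from pathlib import Path


def write_kaggle_credentials(username, key, config_dir=None):
    """Write kaggle.json into config_dir (default: ~/.kaggle), readable by the owner only."""
    config_dir = Path(config_dir) if config_dir is not None else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    cred_path = config_dir / "kaggle.json"
    cred_path.write_text(json.dumps({"username": username, "key": key}))
    # The kaggle CLI prints a warning when the key file is readable by other users.
    os.chmod(cred_path, 0o600)
    return cred_path


# Example (placeholders, not real credentials):
# write_kaggle_credentials("your-username", "your-key")
```

Building the JSON with `json.dumps` avoids quoting mistakes that are easy to make when pasting the file contents into a string by hand.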

Step 2: Create a dataset on the website

https://www.kaggle.com/datasets

Step 3: Download the dataset's metadata

Run: kaggle datasets metadata shopee-models
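The command saves a dataset-metadata.json file, and the upload step expects a file with that name in the folder being uploaded. As a rough sketch, the same file can be recreated inside a kernel with the json module instead of a hand-written string. The helper `write_dataset_metadata` is hypothetical, and the field list simply mirrors the metadata shown in this post; copy any other fields (such as id_no) from your own downloaded file.

```python
import json
from pathlib import Path


def write_dataset_metadata(folder, dataset_id, title):
    """Create folder/dataset-metadata.json with the fields that appear in this post."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    metadata = {
        "id": dataset_id,  # "<owner>/<dataset-slug>", e.g. "finlay/shopee-models"
        "title": title,
        "subtitle": "",
        "description": "",
        "keywords": [],
        "resources": [],
    }
    meta_path = folder / "dataset-metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2))
    return meta_path


# Example:
# write_dataset_metadata("hubmapkidneysegmentation", "finlay/shopee-models", "shopee_models")
```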

Step 4: Download the files and upload them to the dataset

Full code:

# Write the contents of your API JSON file here
!mkdir -p /root/.kaggle
lines = '''{"username":"your-username","key":"your-key"}'''
with open('/root/.kaggle/kaggle.json', 'w') as up:
    up.write(lines)
# The kaggle CLI warns if the key file is readable by other users
!chmod 600 /root/.kaggle/kaggle.json
# Create a folder and write the dataset's metadata into it
!mkdir -p hubmapkidneysegmentation
lines = '''{
  "id": "finlay/shopee-models",
  "id_no": 122348,
  "title": "shopee_models",
  "subtitle": "",
  "description": "",
  "keywords": [],
  "resources": []
}'''
with open('hubmapkidneysegmentation/dataset-metadata.json', 'w') as up:
    up.write(lines)
# Download the file; axel is used here for a multi-connection download, but plain wget works too
!apt-get install -y axel
!axel -n 12 https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b7-dcc49843.pth -o hubmapkidneysegmentation/baseline_fold0_densenet_224_epoch50.pth
# Upload the files; this replaces the previous upload by creating a new dataset version
!kaggle datasets version -p ./hubmapkidneysegmentation -m "Updated data fcn"
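If neither axel nor wget is at hand, the download step can also be done with the Python standard library. The sketch below is a single-connection alternative, not the method used above; `download_file` and `version_command` are hypothetical helpers, and the latter only assembles the same `kaggle datasets version` command line shown in the full code.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest):
    """Stream url to dest in chunks and return the number of bytes written."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest.stat().st_size


def version_command(folder, message):
    """Build the argument list for `kaggle datasets version -p <folder> -m <message>`."""
    return ["kaggle", "datasets", "version", "-p", str(folder), "-m", message]


# Example (needs network access and valid credentials):
# download_file("https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b7-dcc49843.pth",
#               "hubmapkidneysegmentation/efficientnet-b7-dcc49843.pth")
# import subprocess
# subprocess.run(version_command("./hubmapkidneysegmentation", "Updated data fcn"), check=True)
```

Streaming with `shutil.copyfileobj` keeps memory use flat for large model files, at the cost of the parallel connections that axel provides.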