ICode9


Goodreads-books (a book-related dataset)

2020-12-29 17:57:21  Source: Internet

Tags: good, had, dataset, books, data, books, Goodreads


Original text:

Goodreads-books

A comprehensive list of all books listed on Goodreads

The primary reason for creating this dataset is the need for a good, clean dataset of books. Being a bookie myself (see what I did there?), I had searched Kaggle itself for datasets on books, and I found that while most of them listed a good number of books, they either a) were missing major columns or b) had grossly unclean data. I mean, you can't determine how good a book is from just a few text reviews, come on! What I needed were numbers: solid integers and floats that say how many people liked or hated the book, how much they liked it, and so on. Even the one good dataset I found, though well cleaned, was split across a number of interlinked files, which added to the hassle. This prompted me to use the Goodreads API to build a well-cleaned dataset with only the promising features (minus the redundant ones), and the result is the dataset you're looking at now.

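You can download the dataset from the official site. I have also shared a copy on Baidu Netdisk: follow my WeChat official account and reply "2020122901" to get the download link.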

Source: https://blog.csdn.net/ISWZY/article/details/111934099
