Tags: python pandas google-bigquery
Suppose I send the following query to BQ:
SELECT shipmentID, category, quantity
FROM [myDataset.myTable]
Suppose also that the query returns data like this:
shipmentID category quantity
1 shoes 5
1 hats 3
2 shirts 1
2 hats 2
3 toys 3
2 books 1
3 shirts 1
How can I pivot the results in BQ to produce output like the following?
shipmentID shoes hats shirts toys books
1 5 3 0 0 0
2 0 2 1 0 1
3 0 0 1 3 0
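As some additional background, I actually have 2,000 categories that need to be pivoted, and the data volume is such that I cannot do this directly in a Pandas DataFrame in Python (it uses all the memory and then slows to a crawl). I tried using a relational database but ran into the column limit, so I would like to do it directly in BQ, even if I have to build the query itself in Python. Any suggestions?
**Edit 1
I should mention that pivoting the data itself can be done in chunks, so that part is not the problem. The real trouble is the aggregation afterwards, which is needed so that each shipmentID ends up with only one row. That is what eats all the RAM.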
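To make the aggregation step concrete, here is a minimal sketch of combining chunks without materialising the full 2,000-column frame. It is not what the asker actually ran: the `accumulate` and `dense_row` helpers are hypothetical names, and the sketch assumes that the sparse totals (only the non-zero category sums per shipmentID) fit in memory, which may or may not hold for the real data.

```python
from collections import defaultdict

def accumulate(totals, rows):
    """Add one chunk of (shipmentID, category, quantity) rows to the running totals."""
    for shipment_id, category, quantity in rows:
        totals[shipment_id][category] += quantity
    return totals

def dense_row(totals, shipment_id, categories):
    """Expand one shipment into a pivoted row, filling missing categories with 0."""
    sums = totals.get(shipment_id, {})
    return [shipment_id] + [sums.get(category, 0) for category in categories]

# Sparse store: only categories that actually occur for a shipment take up space.
totals = defaultdict(lambda: defaultdict(int))

# The sample data from the question, split into two chunks for illustration.
chunk_1 = [(1, "shoes", 5), (1, "hats", 3), (2, "shirts", 1), (2, "hats", 2)]
chunk_2 = [(3, "toys", 3), (2, "books", 1), (3, "shirts", 1)]
for chunk in (chunk_1, chunk_2):
    accumulate(totals, chunk)

categories = ["shoes", "hats", "shirts", "toys", "books"]
pivoted = [dense_row(totals, sid, categories) for sid in sorted(totals)]
```

Each dense row is built only when it is written out, so the wide form never has to exist for all shipments at once.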
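**Edit 2
After trying the accepted answer below, I found that using it to build a pivot table with 2k columns caused a "Resources exceeded" error. My BQ team was able to restructure the query to break it into smaller chunks so that it would go through. The basic structure of the query is as follows: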
SELECT
  SetA.*,
  SetB.*,
  SetC.*
FROM (
  SELECT
    shipmentID,
    SUM(IF (category="Rocks", quantity, 0)),
    SUM(IF (category="Paper", quantity, 0)),
    SUM(IF (category="Scissors", quantity, 0))
  FROM (
    SELECT
      a.shipmentid shipmentid,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetA
INNER JOIN EACH (
  SELECT
    shipmentID,
    SUM(IF (category="Jello Molds", quantity, 0)),
    SUM(IF (category="Torque Wrenches", quantity, 0))
  FROM (
    SELECT
      a.shipmentID shipmentID,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetB
ON
  SetA.shipmentid = SetB.shipmentid
INNER JOIN EACH (
  SELECT
    shipmentID,
    SUM(IF (category="Deep Thoughts", quantity, 0)),
    SUM(IF (category="Rainbows", quantity, 0)),
    SUM(IF (category="Ponies", quantity, 0))
  FROM (
    SELECT
      a.shipmentid shipmentid,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetC
ON
  SetB.shipmentID = SetC.shipmentID
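The pattern above can be continued indefinitely by appending INNER JOIN EACH segments one after another. For my application, BQ was able to handle about 500 columns per chunk.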
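Since the question mentions building the query from Python, the chunked pattern could be generated along these lines. This is only a sketch under stated assumptions, not the query the BQ team wrote: `build_chunked_pivot_query`, the `Set0`, `Set1`, ... names and the positional `col_N` aliases are all made up for illustration, the outer SELECT lists columns explicitly instead of using `SetA.*`, and the backslash escaping of string literals is assumed to be what legacy SQL expects.

```python
def build_chunked_pivot_query(table, categories, chunk_size=500):
    """Return (query, alias_to_category) for a legacy-SQL pivot split into joined chunks."""
    if not categories:
        raise ValueError("need at least one category")
    alias_to_category = {}
    outer_columns = ["  Set0.shipmentID AS shipmentID"]
    blocks = []
    for start in range(0, len(categories), chunk_size):
        set_name = "Set{}".format(start // chunk_size)
        sums = []
        for offset, category in enumerate(categories[start:start + chunk_size]):
            # Positional aliases avoid having to turn category names into valid column names.
            alias = "col_{}".format(start + offset)
            alias_to_category[alias] = category
            # Assumption: backslash escaping is sufficient for the string literal.
            literal = category.replace("\\", "\\\\").replace("'", "\\'")
            sums.append("    SUM(IF(category = '{}', quantity, 0)) AS {}".format(literal, alias))
            outer_columns.append("  {0}.{1} AS {1}".format(set_name, alias))
        subquery = (
            "(\n  SELECT\n    shipmentID,\n{}\n  FROM\n    [{}]\n"
            "  GROUP EACH BY\n    shipmentID ) {}"
        ).format(",\n".join(sums), table, set_name)
        blocks.append((set_name, subquery))
    lines = ["SELECT", ",\n".join(outer_columns), "FROM " + blocks[0][1]]
    # Join each chunk to the previous one on shipmentID, as in the pattern above.
    for (prev_name, _), (name, subquery) in zip(blocks, blocks[1:]):
        lines.append("INNER JOIN EACH " + subquery)
        lines.append("ON {}.shipmentID = {}.shipmentID".format(prev_name, name))
    return "\n".join(lines), alias_to_category

query, names = build_chunked_pivot_query(
    "myDataset.myTable",
    ["Rocks", "Paper", "Scissors", "Jello Molds", "Torque Wrenches"],
    chunk_size=2,
)
```

The returned `names` mapping lets you rename `col_0`, `col_1`, ... back to the real category names after the result is downloaded. The chunk size of 500 is simply the figure reported above and would need tuning for other data.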
Solution:
Here is one way to do it:
select shipmentID,
sum(IF (category='shoes', quantity, 0)) AS shoes,
sum(IF (category='hats', quantity, 0)) AS hats,
sum(IF (category='shirts', quantity, 0)) AS shirts,
sum(IF (category='toys', quantity, 0)) AS toys,
sum(IF (category='books', quantity, 0)) AS books,
from
(select 1 as shipmentID, 'shoes' as category, 5 as quantity),
(select 1 as shipmentID, 'hats' as category, 3 as quantity),
(select 2 as shipmentID, 'shirts' as category, 1 as quantity),
(select 2 as shipmentID, 'hats' as category, 2 as quantity),
(select 3 as shipmentID, 'toys' as category, 3 as quantity),
(select 2 as shipmentID, 'books' as category, 1 as quantity),
(select 3 as shipmentID, 'shirts' as category, 1 as quantity),
group by shipmentID
This returns:
+-----+------------+-------+------+--------+------+-------+
| Row | shipmentID | shoes | hats | shirts | toys | books |
+-----+------------+-------+------+--------+------+-------+
| 1   | 1          | 5     | 3    | 0      | 0    | 0     |
| 2   | 2          | 0     | 2    | 1      | 0    | 1     |
| 3   | 3          | 0     | 0    | 1      | 3    | 0     |
+-----+------------+-------+------+--------+------+-------+
See the manual for another pivot table example.
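With 2,000 categories the SUM(IF(...)) columns are impractical to type by hand, so the query would normally be generated. The following is a rough sketch of doing that in Python. `build_pivot_query` and `column_alias` are hypothetical helpers, the table name is the one from the question, and the list of categories is assumed to have been fetched beforehand (for example with a SELECT category ... GROUP BY category query). Note that two different categories can map to the same alias after sanitising, which this sketch does not detect.

```python
import re

def column_alias(category):
    """Turn a category name into a column name made of letters, digits and underscores."""
    alias = re.sub(r"[^A-Za-z0-9_]", "_", category)
    # Column names must not start with a digit, so prefix an underscore if needed.
    return alias if re.match(r"[A-Za-z_]", alias) else "_" + alias

def build_pivot_query(table, categories):
    """Build a legacy-SQL pivot query with one SUM(IF(...)) column per category."""
    columns = []
    for category in categories:
        # Assumption: backslash escaping is sufficient for the string literal.
        literal = category.replace("\\", "\\\\").replace("'", "\\'")
        columns.append(
            "  SUM(IF(category = '{}', quantity, 0)) AS {}".format(literal, column_alias(category))
        )
    return "SELECT\n  shipmentID,\n{}\nFROM\n  [{}]\nGROUP BY\n  shipmentID".format(
        ",\n".join(columns), table
    )

query = build_pivot_query(
    "myDataset.myTable", ["shoes", "hats", "shirts", "toys", "books"]
)
print(query)
```

The resulting string can then be submitted through whichever BigQuery client you already use.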
Source: https://codeday.me/bug/20190519/1137387.html