How to import JSON from a file on Cloud Storage into BigQuery

2019-06-29 07:43:44  Views: 249  Source: Internet

Tags: json python import ruby google-bigquery

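I am trying to import a file (json.txt) from Cloud Storage into BigQuery through the API, and the job throws errors. When I do the same import through the web UI, it works without errors (I even set maxBadRecords = 0). Can someone tell me what I am doing wrong here? Is the code wrong, or do I need to change a setting somewhere in BigQuery?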

The file is a plain-text UTF-8 file with the contents shown below. I have followed the BigQuery documentation on JSON imports.

{"person_id":225,"person_name":"John","object_id":1}
{"person_id":226,"person_name":"John","object_id":1}
{"person_id":227,"person_name":"John","object_id":null}
{"person_id":229,"person_name":"John","object_id":1}
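One way to rule out the file itself is to check that every line parses as a standalone JSON object, which is what the newline-delimited JSON format requires. The sketch below is only an illustration: the helper name `invalid_ndjson_lines` is hypothetical, and the rows are pasted inline rather than read from json.txt.

```ruby
require 'json'

# Sample rows copied from the question; in practice this would be File.read('json.txt').
data = <<~NDJSON
  {"person_id":225,"person_name":"John","object_id":1}
  {"person_id":226,"person_name":"John","object_id":1}
  {"person_id":227,"person_name":"John","object_id":null}
  {"person_id":229,"person_name":"John","object_id":1}
NDJSON

# Hypothetical helper: returns [line_number, message] pairs for every
# non-blank line that is not a standalone JSON object.
def invalid_ndjson_lines(text)
  bad = []
  text.each_line.with_index(1) do |line, number|
    next if line.strip.empty?
    begin
      row = JSON.parse(line)
      bad << [number, 'not a JSON object'] unless row.is_a?(Hash)
    rescue JSON::ParserError => e
      bad << [number, e.message]
    end
  end
  bad
end

problems = invalid_ndjson_lines(data)
puts problems.empty? ? 'every line is a valid JSON object' : problems.inspect
```

If this reports no problems, the file is well-formed and the failure is more likely to come from how the load job is configured.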

The load job fails with the following error, "Value cannot be converted to expected type.", reported for every line:

   {
    "reason": "invalid",
    "location": "Line:15 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
   {
    "reason": "invalid",
    "location": "Line:16 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
   {
    "reason": "invalid",
    "location": "Line:17 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
   {
    "reason": "invalid",
    "location": "Line:18 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
   {
    "reason": "invalid",
    "message": "Too many errors encountered. Limit is: 10."
   }
  ]
 },
 "statistics": {
  "creationTime": "1384484132723",
  "startTime": "1384484142972",
  "endTime": "1384484182520",
  "load": {
   "inputFiles": "1",
   "inputFileBytes": "960",
   "outputRows": "0",
   "outputBytes": "0"
  }
 }
}
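When a job returns a long error list like this, grouping the entries by message makes the pattern easier to see. The following is a minimal sketch, assuming the error entries sit under `status.errors` in the job resource (the output above is cut off before that key, so this is an assumption); `summarize_errors` is a hypothetical helper and the sample status is abbreviated.

```ruby
require 'json'

# Abbreviated job status modelled on the output above. The status.errors
# nesting is an assumption, because the pasted output starts mid-structure.
job_status = JSON.parse(<<~STATUS)
  {"status": {"state": "DONE", "errors": [
    {"reason": "invalid", "location": "Line:15 / Field:1",
     "message": "Value cannot be converted to expected type."},
    {"reason": "invalid", "location": "Line:16 / Field:1",
     "message": "Value cannot be converted to expected type."},
    {"reason": "invalid",
     "message": "Too many errors encountered. Limit is: 10."}
  ]}}
STATUS

# Hypothetical helper: counts how often each error message occurs.
def summarize_errors(job_status)
  errors = job_status.dig('status', 'errors') || []
  errors.group_by { |error| error['message'] }.transform_values(&:count)
end

summarize_errors(job_status).each do |message, count|
  puts "#{count} x #{message}"
end
```

Seeing the same conversion error on every row suggests a problem with how the whole file is interpreted, not with individual records.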

The file is available here:
http://www.sendspace.com/file/7q0o37

My code and schema are as follows:

def insert_and_import_table_in_dataset(tar_file, table, dataset = DATASET)
  # Load job configuration: source file, schema, destination table and input format.
  config = {
    'configuration' => {
      'load' => {
        'sourceUris' => ["gs://test-bucket/#{tar_file}"],
        'schema' => {
          'fields' => [
            { 'name' => 'person_id',   'type' => 'INTEGER', 'mode' => 'nullable' },
            { 'name' => 'person_name', 'type' => 'STRING',  'mode' => 'nullable' },
            { 'name' => 'object_id',   'type' => 'INTEGER', 'mode' => 'nullable' }
          ]
        },
        'destinationTable' => {
          'projectId' => @project_id.to_s,
          'datasetId' => dataset,
          'tableId' => table
        },
        'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
        'createDisposition' => 'CREATE_IF_NEEDED',
        'maxBadRecords' => 10
      }
    }
  }

  result = @client.execute(
    :api_method => @bigquery.jobs.insert,
    :parameters => {
      # 'uploadType' => 'resumable',
      :projectId => @project_id.to_s,
      :datasetId => dataset
    },
    :body_object => config
  )

  # upload = result.resumable_upload
  # @client.execute(upload) if upload.resumable?

  puts result.response.body
  json = JSON.parse(result.response.body)

  # Poll every 5 seconds until the job reaches the DONE state.
  loop do
    job_status = get_job_status(json['jobReference']['jobId'])
    if job_status['status']['state'] == 'DONE'
      # DONE only means the job has finished; a failed job carries status.errorResult.
      if job_status['status']['errorResult']
        puts job_status['status']['errorResult']
        return false
      end
      puts "DONE"
      return true
    else
      puts job_status['status']['state']
      puts job_status
      sleep 5
    end
  end
end

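Can someone tell me what I am doing wrong? What should I fix, and where?

Also, at some point in the future I would like to import from compressed files. Is "tar.gz" OK for that, or does it have to be a plain ".gz"?

Thanks in advance for your help. I appreciate it.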

Solution:

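Many people (including me) have been bitten by the same thing that hit you:
you are importing a JSON file but not specifying an import format, so it defaults to CSV.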

If you set configuration.load.sourceFormat to NEWLINE_DELIMITED_JSON, you should be good to go.
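To make the CSV fallback concrete, here is a small sketch that reports which format a load configuration would end up with. `effective_source_format` is a hypothetical helper written for illustration, not part of any client library, and the bucket path is a placeholder. It shows that the key only takes effect when it sits directly under `configuration.load`.

```ruby
# Hypothetical helper: the format a load job would use for this configuration.
# When configuration.load.sourceFormat is absent, the input is treated as CSV.
def effective_source_format(config)
  config.dig('configuration', 'load', 'sourceFormat') || 'CSV'
end

# No format specified at all: falls back to CSV.
without_format = {
  'configuration' => {
    'load' => { 'sourceUris' => ['gs://test-bucket/json.txt'] }
  }
}

# Format set one level too high: still not seen under "load".
misplaced_format = {
  'configuration' => {
    'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
    'load' => { 'sourceUris' => ['gs://test-bucket/json.txt'] }
  }
}

# Format set where the answer says it belongs.
with_format = {
  'configuration' => {
    'load' => {
      'sourceUris' => ['gs://test-bucket/json.txt'],
      'sourceFormat' => 'NEWLINE_DELIMITED_JSON'
    }
  }
}

puts effective_source_format(without_format)   # CSV
puts effective_source_format(misplaced_format) # CSV
puts effective_source_format(with_format)      # NEWLINE_DELIMITED_JSON
```

A check like this could run just before submitting the job, so that a missing or misplaced key is caught locally instead of surfacing as per-line conversion errors.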

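We have a bug filed to make this mistake harder to make, or at least to detect when a file is of the wrong type, and I will bump its priority.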
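The answer does not address the follow-up question about compressed input. As far as I know, BigQuery load jobs accept gzip-compressed newline-delimited JSON, whereas a tar archive is a different container format, so a plain ".gz" file is the safer choice. The sketch below shows one way to produce such a file with Ruby's standard library; `gzip_ndjson` is a hypothetical helper and the rows are sample data from the question.

```ruby
require 'json'
require 'stringio'
require 'zlib'

# Sample rows in the same shape as the question's data.
rows = [
  { 'person_id' => 225, 'person_name' => 'John', 'object_id' => 1 },
  { 'person_id' => 227, 'person_name' => 'John', 'object_id' => nil }
]

# Hypothetical helper: serialises rows as newline-delimited JSON and returns
# the gzip-compressed bytes (a plain gzip stream, with no tar wrapper).
def gzip_ndjson(rows)
  buffer = StringIO.new(String.new) # binary, mutable backing string
  gz = Zlib::GzipWriter.new(buffer)
  rows.each { |row| gz.write(JSON.generate(row) + "\n") }
  gz.finish # flushes the gzip trailer without closing the StringIO
  buffer.string
end

compressed = gzip_ndjson(rows)
File.binwrite('json.txt.gz', compressed)
puts "wrote #{compressed.bytesize} bytes"
```

The resulting json.txt.gz could then be uploaded to the bucket and referenced in sourceUris in place of the uncompressed file.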

Source: https://codeday.me/bug/20190629/1324451.html
