Debugging BigQuery Problems

Imports into BigQuery tend to fail with errors that get gobbled somewhere along the way and come out incomprehensible. We can use the bq command-line tool to find the job that failed and see what happened:

bq ls --jobs=true --all=true
 
              jobId                 Job Type    State      Start Time         Duration     
 ---------------------------------- ---------- --------- ----------------- ---------------- 
  job_v3I9eP.....<REDACTED>   query      SUCCESS   25 Jul 14:20:05   0:00:00.425000  
  job_WEYIoL.....<REDACTED>   query      SUCCESS   25 Jul 14:20:02   0:00:00.169000  
  job_pz0LrU.....<REDACTED>   query      SUCCESS   25 Jul 14:20:01   0:00:00.416000  
  job_Gm5t7s.....<REDACTED>   query      SUCCESS   25 Jul 14:20:00   0:00:00.102000  
  job_gUQPMv.....<REDACTED>   query      SUCCESS   25 Jul 14:15:24   0:00:00.338000
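On a busy project this listing runs to hundreds of jobs, so it helps to capture it once and grep for failures. A minimal sketch — the listing below is simulated stand-in data with placeholder job ids, and `--max_results` caps how many jobs bq returns:

```shell
# In practice: bq ls --jobs=true --all=true --max_results=100 > jobs.txt
# Here we simulate the listing with placeholder job ids
cat > jobs.txt <<'EOF'
  job_aaa111   load    FAILURE   25 Jul 13:51:30   0:00:42.551000
  job_bbb222   query   SUCCESS   25 Jul 14:20:05   0:00:00.425000
  job_ccc333   query   SUCCESS   25 Jul 14:20:02   0:00:00.169000
EOF

# Keep only the jobs that failed
grep FAILURE jobs.txt
```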
 

We can inspect a particular job and get the logs:

 
bq show --job=true hSi...

Job redacted:hSi...

  Job Type    State      Start Time         Duration         User Email      Bytes Processed   Bytes Billed   Billing Tier   Labels
 ---------- --------- ----------------- ---------------- ----------------- ----------------- -------------- -------------- --------
  load       FAILURE   25 Jul 13:51:30   0:00:42.551000   user@redacted

Error encountered during job execution:

Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 626045; errors: 2; max bad: 0; error percent: 0

Failure details:

 - gs://bucketname/temp/`schema`.`articles_tmp`.csv/part.01.0001.csv.gz:
   Error while reading data, error message: Bad character (ASCII 0)
   encountered.; line_number: 141935
   byte_offset_to_start_of_line: 66762918
   column_index: 3 column_name: "summary" column_type: STRING
   value: "Co mnie bardzo za..."
   File: gs://bucketname/temp/`schema`.`articles_tmp`.csv/part.01.0001.csv.gz
 - gs://bucketname/temp/`schema`.`articles_tmp`.csv/part.01.0001.csv.gz:
   Error while reading data, error message: Bad character (ASCII 0)
   encountered.; line_number: 166345
   byte_offset_to_start_of_line: 78934854
   column_index: 3 column_name: "summary" column_type: STRING
   value: "Bill Smith joins..."
   File: gs://bucketname/temp/`schema`.`articles_tmp`.csv/part.01.0001.csv.gz
 - You are loading data without specifying data format, data will be
   treated as CSV format by default. If this is not what you mean,
   please specify data format by --source_format.
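The same details are also available in machine-readable form: bq supports `--format=prettyjson`, which dumps the full job resource, with the failure under `status.errorResult` (and the per-row errors under `status.errors`). A sketch, where the JSON below is a hand-written stand-in for real bq output and the job id is a placeholder:

```shell
# In practice: bq show --format=prettyjson --job=true JOBID > job.json
# Here we use a stand-in for the job resource bq would return
cat > job.json <<'EOF'
{
  "status": {
    "state": "DONE",
    "errorResult": {
      "reason": "invalid",
      "message": "Bad character (ASCII 0) encountered."
    }
  }
}
EOF

# Pull out just the error message
python3 -c 'import json; print(json.load(open("job.json"))["status"]["errorResult"]["message"])'
```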

Now we can see the cause of the error, which in this case is an encoding problem: stray NUL bytes (ASCII 0) in the summary column of the source CSV.
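Since the bad rows contain NUL bytes, one fix is to strip them with tr before reloading. A minimal sketch, using a small generated file in place of the real part.01.0001.csv.gz (for the real objects you would copy them down with gsutil, gunzip, clean, re-gzip, and re-upload):

```shell
# Simulate a CSV row containing a stray NUL byte
printf 'id,summary\n1,Co mnie\000 bardzo\n' > part.csv

# Delete every NUL (ASCII 0) byte; BigQuery's CSV reader rejects them
tr -d '\000' < part.csv > part.clean.csv

# The cleaned file is exactly one byte smaller: the NUL is gone
wc -c < part.csv
wc -c < part.clean.csv
```

As the last failure detail suggests, it is also worth passing `--source_format=CSV` explicitly on the load so the format is not left to the default; and if your bq version supports it, the load flag `--preserve_ascii_control_characters` tells BigQuery to accept embedded NULs instead of rejecting the row.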