Debugging BigQuery Problems
Imports into BigQuery tend to generate errors that get gobbled up along the way and surface as something incomprehensible. We can use the bq command-line tool to find the job that failed and see what actually happened:
bq ls --jobs=true --all=true
              jobId                Job Type    State      Start Time        Duration
 --------------------------------- ---------- --------- ----------------- ----------------
  job_v3I9eP..... <REDACTED>        query      SUCCESS   25 Jul 14:20:05   0:00:00.425000
  job_WEYIoL..... <REDACTED>        query      SUCCESS   25 Jul 14:20:02   0:00:00.169000
  job_pz0LrU..... <REDACTED>        query      SUCCESS   25 Jul 14:20:01   0:00:00.416000
  job_Gm5t7s..... <REDACTED>        query      SUCCESS   25 Jul 14:20:00   0:00:00.102000
  job_gUQPMv..... <REDACTED>        query      SUCCESS   25 Jul 14:15:24   0:00:00.338000
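If there are many jobs to wade through, the same listing can be scripted. Here is a minimal sketch using the google-cloud-bigquery Python client (assuming application default credentials and a default project are configured) that lists recent jobs and prints only the failures:

from google.cloud import bigquery

client = bigquery.Client()

# Walk the most recent jobs across all users and surface the failed ones.
for job in client.list_jobs(all_users=True, max_results=100):
    if job.error_result:
        print(job.job_id, job.job_type, job.error_result["message"])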
We can inspect a particular job and get the logs:
bq show --job=true hSi...
Job redacted:hSi...
  Job Type    State      Start Time        Duration         User Email      Bytes Processed   Bytes Billed   Billing Tier   Labels
 ---------- --------- ----------------- ---------------- --------------- ----------------- -------------- -------------- --------
  load       FAILURE   25 Jul 13:51:30   0:00:42.551000   user@redacted
Error encountered during job execution:
Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 626045; errors: 2; max bad: 0; error percent: 0
Failure details:
- gs://bucketname/temp/`schema`.`articles_tmp`.csv/part.01.0001.csv.gz:
  Error while reading data, error message: Bad character (ASCII 0)
  encountered.; line_number: 141935
  byte_offset_to_start_of_line: 66762918 column_index: 3
  column_name: "summary" column_type: STRING value: "Co mnie bardzo za..."
  File: gs://bucketname/temp/`schema`.`articles_tmp`.csv/part.01.0001.csv.gz
- gs://bucketname/temp/`schema`.`articles_tmp`.csv/part.01.0001.csv.gz:
  Error while reading data, error message: Bad character (ASCII 0)
  encountered.; line_number: 166345
  byte_offset_to_start_of_line: 78934854 column_index: 3
  column_name: "summary" column_type: STRING value: "Bill Smith joins..."
  File: gs://bucketname/temp/`schema`.`articles_tmp`.csv/part.01.0001.csv.gz
- You are loading data without specifying data format, data will be
  treated as CSV format by default. If this is not what you mean,
  please specify data format by --source_format.
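The same failure details are exposed through the Python client, which is handy if you want to script this step. A minimal sketch (the truncated job ID stands in for the one passed to bq show above):

from google.cloud import bigquery

client = bigquery.Client()

# Fetch the failed job by ID; error_result holds the summary and
# errors holds the per-file details printed above.
job = client.get_job("hSi...")
print(job.job_type, job.state, job.error_result["message"])
for err in job.errors or []:
    print(err.get("location"), err["message"])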
Now we can see the cause of the error, which in this case is an encoding problem: the summary column contains NUL bytes (ASCII 0) that BigQuery's CSV reader rejects.
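One way to unblock the load is to strip those NUL bytes before the file reaches BigQuery. A minimal sketch, assuming a local copy of the offending part file (the filenames here are illustrative):

import gzip

SRC = "part.01.0001.csv.gz"        # local copy of the bad part file
DST = "part.01.0001.clean.csv.gz"  # cleaned output to re-upload

# Stream the gzipped CSV in 1 MiB chunks, dropping the NUL bytes
# (ASCII 0) that BigQuery's CSV reader rejected.
with gzip.open(SRC, "rb") as f_in, gzip.open(DST, "wb") as f_out:
    for chunk in iter(lambda: f_in.read(1 << 20), b""):
        f_out.write(chunk.replace(b"\x00", b""))

Alternatively, bq load accepts --max_bad_records to tolerate a bounded number of bad rows, at the cost of silently dropping them.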