What is BigQuery?

Google BigQuery is a fully-managed, serverless data warehouse that enables super-fast SQL queries using the processing power of Google’s infrastructure. It’s designed for analyzing large datasets and is commonly used for business intelligence, data analytics, and machine learning.

BigQuery Connection Requirements

Unlike traditional databases that use username/password authentication, BigQuery uses Google Cloud Service Accounts for secure API access. This means you’ll need:
  1. A Google Cloud Project with BigQuery enabled
  2. A Service Account with appropriate permissions
  3. A Service Account Key (JSON file) for authentication
The sections below walk through the minimum BigQuery permissions Julius needs to work.

Providing the minimum required permissions

  1. In the Google Cloud Console, navigate to IAM & Admin > Roles
  2. Create a new role with the following permissions:

Minimum Required Permissions

Your service account needs at least the following permissions to work with BigQuery:
  • bigquery.jobs.create - Required to execute a BigQuery job
  • bigquery.readsessions.create - Required to create a read session
  • bigquery.readsessions.getData - Required to get data from a read session
  • bigquery.tables.get - Required to access metadata about a table
  • bigquery.tables.getData - Required to get data from a table
  • bigquery.tables.list - Required to list available tables in a dataset
  • bigquery.routines.get - Required to access metadata about a routine
  • bigquery.routines.list - Required to list available routines in a dataset
  3. Assign the new role to your service account

More Granular Permissions

The following permissions allow Julius to access information about available tables. Currently, BigQuery does not provide a way to restrict metadata visibility to specific tables; however, you can restrict access to the data within specific tables.
  • bigquery.tables.get - Required to access metadata about a table
  • bigquery.tables.list - Required to list available tables in a dataset
  • bigquery.routines.get - Required to access metadata about a routine
  • bigquery.routines.list - Required to list available routines in a dataset
You can view Google’s documentation on BigQuery INFORMATION_SCHEMA tables to understand why these permissions are needed. With this split, you can create two roles: one that allows Julius to access high-level metadata about all tables, and another that allows Julius to access the data within a particular subset of tables. The second role would contain the following permissions:
  • bigquery.tables.getData - Required to get data from a table
  • bigquery.readsessions.getData - Required to get data from a read session
You can then grant this second role at the table level, for each table you want Julius to be able to access.
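As an illustration, BigQuery’s SQL DCL supports table-level grants. A minimal sketch using the predefined roles/bigquery.dataViewer role; the project, dataset, table, and service account names below are placeholders:
-- Grant read access on a single table to the Julius service account
-- (all names here are placeholders; substitute your own)
GRANT `roles/bigquery.dataViewer`
ON TABLE `my-project.sales.orders`
TO "serviceAccount:julius-connector@my-project.iam.gserviceaccount.com";
Note that roles/bigquery.dataViewer bundles both metadata and data access; table-level grants can also be managed from the Google Cloud console if you prefer.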

Connecting Julius to BigQuery

Navigate to Data Connectors

  1. Go to your Julius Data Connectors Settings
  2. Click Create new Data Connector
  3. Select BigQuery from the available options

Configure Connection Details

You’ll see a form with the following fields:
Fields marked with an asterisk (*) are required to establish a connection.
Connection Name*
string
  • What it is: A friendly name to identify this BigQuery connection
  • Example: “Production Analytics” or “Sales Data Warehouse”
  • Tip: Choose a name that helps you remember which BigQuery project/datasets this connects to
SERVICE_ACCOUNT_JSON*
object
  • What it is: The complete JSON content from your downloaded service account key file
  • How to use: Open the downloaded JSON file in a text editor and copy the entire contents
  • Security: Julius encrypts and securely stores these credentials
Make sure to copy the entire JSON content including the opening and closing curly braces { }. Missing any part will cause authentication to fail.
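For reference, a downloaded service account key file generally looks like this (all values below are placeholders):
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "0123456789abcdef",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "julius-connector@my-project.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}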
MFA_TYPE
string
  • What it is: Multi-Factor Authentication type if your organization requires additional security
  • When needed: Only if your Google Cloud organization has additional authentication requirements
  • Most users: Can leave this blank unless specifically required by your organization’s security policy

Test and Save Connection

  1. Click Add Connection to test the connection
  2. Julius will validate your service account credentials and access permissions
  3. If successful, your connector will be saved and ready to use
  4. If there’s an error, check the common issues section below

Troubleshooting Common Issues

Using Your BigQuery Connector

Once your BigQuery connector is set up:
  1. Start a conversation with Julius
  2. Ask about your data using natural language:
    • “Show me sales data from the last quarter”
    • “What’s the average order value by region?”
    • “Create a chart showing user growth over time”
  3. Julius will automatically:
    • Connect to your BigQuery project
    • Write and execute SQL queries
    • Handle BigQuery’s specific syntax and functions
    • Present results in easy-to-understand formats
    • Create visualizations when requested
Julius understands BigQuery’s unique features like nested/repeated fields, array functions, and standard SQL syntax. You don’t need to know BigQuery-specific SQL!

Query optimization for large-scale BigQuery warehouses

Basic mechanics - How BigQuery pricing works

  1. With on-demand pricing, BigQuery charges per TB of data scanned by queries (storage is billed separately)
  2. When you run SELECT * on a 100M record table, you pay for scanning every column
  3. Unfiltered queries scan entire tables regardless of result size
  4. Partitioned tables allow BigQuery to skip irrelevant data chunks
  5. Clustered tables organize data for faster retrieval within partitions

Key characteristics of cost-effective queries

  • Column selectivity: Only request needed columns (SELECT specific_cols vs SELECT *)
  • Row filtering: Use WHERE clauses with partitioned/clustered columns
  • Partition pruning: Filter on partition keys (usually date/timestamp)
  • Early aggregation: Group/summarize before joining large tables
  • Query caching: Identical queries reuse previous results

Concrete examples

Expensive query (might scan 3TB)
SELECT * FROM sales_data 
WHERE customer_id = '12345'
Optimized query (might scan 50GB)
SELECT customer_id, order_date, total_amount 
FROM sales_data 
WHERE DATE(order_date) >= '2024-01-01' 
  AND customer_id = '12345'
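The optimized query can only skip data if the table is actually partitioned (and ideally clustered) on the filtered columns. A sketch of such a table definition, with illustrative names and types:
-- Illustrative DDL: partition on the order timestamp, cluster on customer_id
CREATE TABLE my_dataset.sales_data (
  customer_id  STRING,
  order_date   TIMESTAMP,
  total_amount NUMERIC
)
PARTITION BY DATE(order_date)
CLUSTER BY customer_id;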

BigQuery smart prompting framework on Julius

1. Always include time boundaries

Why this matters: Tables can contain years of data. Without time limits, Julius might scan everything.
Costly approach: “What are our sales trends?”
Cost-effective approach: “Show me daily sales trends for the past 3 months”
Why it’s better: The first query forces Julius to scan the entire sales history (potentially years of data). The second query uses date partitioning to scan only 3 months of data - potentially reducing processing by 80-90%.
Pro tip: Start with recent data for exploration, then expand timeframes only when needed.
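A query of the kind the cost-effective prompt encourages might look roughly like this (table and column names are illustrative):
-- Daily sales for the past ~3 months; the date filter enables partition pruning
SELECT
  DATE(order_date) AS day,
  SUM(total_amount) AS daily_sales
FROM my_dataset.sales_data
WHERE order_date >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
GROUP BY day
ORDER BY day;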

2. Be specific about what you need

Why this matters: Julius will include relevant columns and metrics based on your request. Vague requests lead to broader data pulls.
Costly approach: “Analyze our customer data”
Cost-effective approach: “Show me customer purchase frequency and average order value for active customers in Q4”
Why it’s better: The first query might pull every customer attribute across all time periods. The second targets specific metrics (purchase frequency, AOV) for a defined segment (active customers) in a specific period (Q4). This could reduce data processing by 70-95%.
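As a sketch, the targeted prompt translates into a narrow aggregate like the following; the table, the columns, and the “at least 2 orders” definition of an active customer are all illustrative assumptions:
-- Purchase frequency and AOV per customer for Q4, limited to needed columns
SELECT
  customer_id,
  COUNT(*) AS q4_orders,
  AVG(total_amount) AS avg_order_value
FROM my_dataset.sales_data
WHERE DATE(order_date) BETWEEN '2024-10-01' AND '2024-12-31'
GROUP BY customer_id
HAVING COUNT(*) >= 2;  -- placeholder definition of "active"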

3. Use smart sampling for exploration

Why this matters: When you’re exploring patterns or testing hypotheses, perfect precision often isn’t necessary.
Costly approach: “What patterns do you see in user behavior?”
Cost-effective approach: “Analyze login patterns and session duration using a 20% sample of users from the past month”
Why it’s better: The first query processes every user interaction ever recorded. The second processes only 20% of recent data, which is typically sufficient for pattern recognition and costs 80% less while still providing statistically valid insights.
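One way such a sample can be expressed is with BigQuery’s TABLESAMPLE clause. Note that TABLESAMPLE samples storage blocks, so it approximates (rather than exactly equals) 20% of users; all names below are illustrative:
-- Sample ~20% of the table's data blocks, then restrict to the past month
SELECT
  user_id,
  COUNT(*) AS logins,
  AVG(session_duration_sec) AS avg_session_seconds
FROM my_dataset.user_sessions TABLESAMPLE SYSTEM (20 PERCENT)
WHERE login_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY user_id;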

4. Layer your analysis

Why this matters: Start broad with cost-effective queries, then drill down based on what you discover.
Costly approach: “Give me a complete breakdown of all our metrics by every possible dimension”
Cost-effective approach: “Show me our top 5 product categories by revenue this quarter, then I’ll dive deeper into the most interesting one”
Why it’s better: The first approach processes massive amounts of data upfront. The layered approach finds the most important areas first (small data scan), then focuses detailed analysis only where it matters most.
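The first pass of a layered analysis is typically a cheap, narrow aggregate, something like this (illustrative names):
-- Small first pass: top 5 categories by revenue for the current quarter
SELECT
  product_category,
  SUM(total_amount) AS revenue
FROM my_dataset.sales_data
WHERE DATE(order_date) >= DATE_TRUNC(CURRENT_DATE(), QUARTER)
GROUP BY product_category
ORDER BY revenue DESC
LIMIT 5;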

5. Reference existing summaries when available

Why this matters: We have pre-computed summary tables and views that contain aggregated data.
Costly approach: “What’s our monthly revenue growth?”
Cost-effective approach: “Using our monthly_revenue_summary table, show me growth rates for the past year”
Why it’s better: The first query might aggregate daily transaction data from scratch. The second uses pre-computed monthly summaries, processing 12 rows instead of potentially millions.
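Against a pre-aggregated table, the growth calculation is a handful of rows and a window function. The monthly_revenue_summary table comes from the example above; its month and revenue columns are assumed here:
-- Month-over-month growth from a pre-computed summary (12 rows, not millions)
SELECT
  month,
  revenue,
  SAFE_DIVIDE(
    revenue - LAG(revenue) OVER (ORDER BY month),
    LAG(revenue) OVER (ORDER BY month)
  ) AS mom_growth
FROM my_dataset.monthly_revenue_summary
WHERE month >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)
ORDER BY month;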

Quick check: The smart prompting checklist

Before submitting your query, check:
  • Time Range: Did you specify when? (past month, Q4, last 90 days)
  • Scope: Are you asking for specific metrics or segments?
  • Purpose: Is this exploration (use sampling) or precision analysis?
  • Building Blocks: Can you start with a summary or build on previous work?

Reach out to team@julius.ai for support or to ask questions not answered in our documentation.