Project: Snowquery - R Package for Database Querying
snowquery
is an R package I developed to simplify running SQL queries against Snowflake, Redshift, and PostgreSQL databases directly from R.
The Motivation
I love R for data analysis. I chose Snowflake as the data warehouse for my startup. Snowflake only had a python connector. Nail: meet Hammer.
I wanted:
- A unified package for querying databases we were using from R.
- Abstract away the complexities and python requirements for the Snowflake connection.
- A simple and local credential management system.
How It Works
For Snowflake:
snowquery
uses the reticulate
package to run the official Snowflake Python connector in the background. This bypasses the need for users to manually configure ODBC drivers or deal with other R-specific Snowflake connection hurdles. The results come back as Pandas DataFrames from the Python connector which are seamlessly converted into R dataframes.
For Redshift & PostgreSQL:
For these databases, snowquery
uses the standard DBI
interface with the RPostgres
driver, making for a normal and efficient connection method.
Key Features
- Simplified Querying: A single function,
queryDB()
, is the main entry point for executing queries. - Credential Management:
- The package looks for a
snowquery_creds.yaml
file in the user's home directory (~
). This file can store connection details (account, username, password, warehouse, role, etc.) for multiple named connections. - Credentials can also be passed directly to the
queryDB()
function, overriding or supplementing the YAML file.
- The package looks for a
- Supported Databases: Snowflake, Redshift, and PostgreSQL.
- Python Dependency Management: For Snowflake connections, it checks for a Python installation and the
snowflake-connector-python[pandas]
library, providing guidance through error messages if they are missing. - Connection Timeout: Users can specify a timeout for database connections. Kind of important if you want to run very long-running queries.
Technical Stack
- R
- Python (for Snowflake via
reticulate
) snowflake-connector-python[pandas]
DBI
RPostgres
yaml
(for credential file parsing)
Installation & Usage
The package can be installed from CRAN or the development version from GitHub:
# From CRAN
install.packages("snowquery")
Typical usage would look like this:
library(snowquery)
# Using a named connection from snowquery_creds.yaml
my_data <- queryDB("SELECT * FROM my_awesome_table", conn_name = 'my_snowflake_dwh')
# Manually passing credentials
more_data <- queryDB(
"SELECT * FROM another_table",
db_type = 'postgres',
host = 'localhost',
port = 5432,
database = 'mydb',
username = 'user',
password = 'password'
)
Project Links & Status
- Status: Active and available on CRAN.
- CRAN: https://cran.r-project.org/package=snowquery
- GitHub Repository: https://github.com/mermelstein/snowquery
- Package Website: https://snowquery.org