Snowflake

Snowflake

Snowflake is a powerful cloud-based data warehouse and analytics solution with unlimited, seamless scalability allowing users to blend, analyze, and transform data with standardized ANSI SQL. Similar to Modzy, Snowflake offers dynamic, scalable computing power with charges based purely on usage. We've seen customers use Snowflake with Alteryx, Tableau, Excel and many other tools to great effect.

Here, using Snowflake's "no-tool integration" using Python, we'll pull data from one of Snowflake's Data Exchange markets, process it using Modzy-managed AI models, and write the data back into the Snowflake database for seamless use in the existing pipeline.

Set up the Snowflake connection

Snowflake is a fully supported data source and target and offers a seamless experience connecting data into any AI/ML model managed by Modzy.

First, create or find a relevant dataset in Snowflake. We created a database called 'MODZY_TEST' with a schema called 'AMAZON' and populated it with data from the AWS reviews dataset for this example.

Using the details of your account and dataset, let's set up a connection in Python using the Snowflake library (pip install snowflake):

import snowflake.connector

conn = snowflake.connector.connect(
    user='<your_username',
    password='<your_password',
    account='<account_id>.us-east-2.aws',

    schema='AMAZON',
    database='MODZY_TEST',
    role = 'ACCOUNTADMIN',
    
    application='modzy’
)

The final line, application= allows Snowflake to identify and optimize the connection made to Modzy instances.

Fetch data

With a simple SQL query and using Pandas, we can directly fetch data into a Pandas DataFrame for easy manipulation.

import pandas as pd

query = '''SELECT * FROM REVIEWS'''
df = pd.read_sql(query, conn)

Note for large data sets you may query more limited data using standard SQL queries like SELECT TOP 100 * FROM REVIEWS;

We will use the dataframe to build our inputs object and submit it to our Modzy instance for processing, replacing API_URL and API_KEY with your respective instance and key:

from modzy import ApiClient
client = ApiClient(base_url=API_URL, api_key=API_KEY)

inputs = {}
for index, row in df.iterrows():
    inputs[row['REVIEW_ID']] = {
        'input.txt':  row['REVIEW_BODY']
    }

job = client.jobs.submit_text('ed542963de', '1.0.1', inputs)

You may have noticed small changes from our other tutorials using this same data set: by default Snowflake converts all object names to UPPERcase (if quotes are used, objects will be case-sensitive), so appropriate changes have been made here.

Let's retrieve our processed results and push them back into the dataframe:

results = client.results.block_until_complete(job, timeout=None)

for r in results['results']:
    for c in results['results'][r]["results.json"]["data"]["result"]["classPredictions"]:
        if c['class'] == 'negative':
            df.loc[df['REVIEW_ID'] == r, 'NEG'] = c["score"]
        elif c['class'] == 'neutral':
            df.loc[df['REVIEW_ID'] == r, 'NEU'] = c["score"]
        elif c['class'] == 'positive':
            df.loc[df['REVIEW_ID'] == r, 'POS'] = c["score"]

The final step is to use Snowflake's direct writeback of the dataframe to write the scoring of each review directly from the Pandas Dataframe. Here we are writing to a new table, REVIEWS_CLASSIFIED, to avoid altering our source data:

from snowflake.connector.pandas_tools import write_pandas

write_pandas(
    conn=conn,
    df=df,
    table_name="REVIEWS_CLASSIFIED",
    database="MODZY_TEST",
    schema="AMAZON",
)

That's all there is to it! Now log into Snowflake to access your augmented data table:

31043104 31043104