A Pandas-inspired data analysis project with lazy semantics and query-offloading to SQLite

rohankumar42/pandaSQL

pandaSQL

License: GPL v3

pandaSQL is a data-analysis library inspired by pandas and designed to use existing database optimization techniques. It provides a familiar pandas-like API while internally offloading queries to SQLite to get you results faster.

Install

pandaSQL can be installed from source with pip as follows:

git clone https://github.com/rohankumar42/pandaSQL.git
cd pandaSQL
python3 -m pip install .

How to Use

pandaSQL uses the same syntax that pandas does.

> import pandasql as ps
> df = ps.read_csv('my_data.csv')    # or ps.DataFrame(pandas_df)

A crucial difference between pandaSQL and pandas is that pandaSQL is lazy. This means that when you say:

> filtered = df[df['speed'] == 'fast']

filtered does not actually have any filtered results yet. Results are computed automatically when they are needed. For example, if you try to print filtered:

> print(filtered)
       name speed
0  pandaSQL  fast
1   SQLite3  fast

The results are automatically computed for you.
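
To make the lazy behaviour concrete, here is a minimal sketch of the general idea using only Python's built-in sqlite3 module. It is not pandaSQL's actual implementation: the LazyTable class, its filter_eq and rows methods, and the computed flag are hypothetical names invented for this illustration. The sketch only records a filter when it is applied, and runs the SQL query the first time the results are needed.

```python
import sqlite3


class LazyTable:
    """Hypothetical stand-in for a lazy DataFrame (not pandaSQL's real class)."""

    def __init__(self, conn, table, where=None, params=()):
        self.conn = conn
        self.table = table
        self.where = where          # SQL predicate recorded so far, if any
        self.params = tuple(params)
        self._rows = None           # cached results; None until first computed

    def filter_eq(self, column, value):
        # Only records the predicate; no query runs here.
        # The column name is interpolated directly, so it must be trusted.
        clause = f'"{column}" = ?'
        where = clause if self.where is None else f"({self.where}) AND {clause}"
        return LazyTable(self.conn, self.table, where, self.params + (value,))

    @property
    def computed(self):
        return self._rows is not None

    def rows(self):
        # The query is executed on first access and cached afterwards.
        if self._rows is None:
            sql = f'SELECT * FROM "{self.table}"'
            if self.where:
                sql += f" WHERE {self.where}"
            self._rows = self.conn.execute(sql, self.params).fetchall()
        return self._rows

    def __str__(self):
        # Printing needs the results, so it triggers the computation.
        return "\n".join(str(row) for row in self.rows())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (name TEXT, speed TEXT)")
conn.executemany(
    "INSERT INTO df VALUES (?, ?)",
    [("pandaSQL", "fast"), ("SQLite3", "fast"), ("snail", "slow")],
)

df = LazyTable(conn, "df")
filtered = df.filter_eq("speed", "fast")   # nothing is computed yet
print(filtered.computed)
print(filtered)                            # the query runs here
print(filtered.computed)
```

Deferring the query this way lets several operations be combined into a single SQL statement before anything is executed, which is what gives the database a chance to optimize the work.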

Development Note

pandaSQL is a fun project that I have been working on in my spare time. If you run into any issues, let me know!
