pandaSQL

pandaSQL is a data-analysis library inspired by pandas, but designed to take advantage of existing database optimization techniques. It provides the familiar pandas-like API while using SQLite internally to get you results faster.

Install

pandaSQL can be installed from source with pip as follows:

git clone https://github.com/rohankumar42/pandaSQL.git
cd pandaSQL
python3 -m pip install .

How to Use

pandaSQL uses the same syntax that pandas does.

> import pandasql as ps
> df = ps.read_csv('my_data.csv')    # or ps.DataFrame(pandas_df)

A crucial difference between pandaSQL and pandas is that pandaSQL is lazy. This means that when you say:

> filtered = df[df['speed'] == 'fast']

filtered does not actually hold any results yet. Results are computed automatically when they are needed, for example when you print filtered:

> print(filtered)
       name speed
0  pandaSQL  fast
1   SQLite3  fast

No explicit call is needed to trigger the computation.
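To make the idea of laziness concrete, here is a minimal sketch of how a lazy filter over SQLite could work. It is not pandaSQL's actual implementation: the `LazyTable` class, its `filter_eq` and `compute` methods, and the extra `snail` row are all hypothetical and exist only for illustration. The sketch uses nothing but Python's standard `sqlite3` module.

```python
import sqlite3


class LazyTable:
    """Hypothetical stand-in for a lazy DataFrame; not pandaSQL's real class."""

    def __init__(self, conn, table, where=None, params=()):
        self.conn = conn
        self.table = table
        self.where = where      # SQL condition recorded so far, if any
        self.params = params    # values bound to the condition
        self._cache = None      # filled in the first time results are needed

    def filter_eq(self, column, value):
        # Only records the condition; no query is sent to SQLite here.
        # This sketch supports a single condition and does not validate
        # the column name.
        return LazyTable(self.conn, self.table, f'"{column}" = ?', (value,))

    @property
    def computed(self):
        return self._cache is not None

    def compute(self):
        # Build and run the query on first use, then reuse the cached rows.
        if self._cache is None:
            sql = f'SELECT * FROM "{self.table}"'
            if self.where:
                sql += f" WHERE {self.where}"
            sql += " ORDER BY rowid"
            self._cache = self.conn.execute(sql, self.params).fetchall()
        return self._cache

    def __str__(self):
        # Printing needs the rows, so it triggers the computation.
        return "\n".join(" ".join(map(str, row)) for row in self.compute())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (name TEXT, speed TEXT)")
conn.executemany(
    "INSERT INTO df VALUES (?, ?)",
    [("pandaSQL", "fast"), ("SQLite3", "fast"), ("snail", "slow")],  # made-up rows
)

df = LazyTable(conn, "df")
filtered = df.filter_eq("speed", "fast")        # nothing has run yet
was_computed_before_print = filtered.computed   # still False at this point
print(filtered)                                 # the query runs here
```

Deferring the query this way means the database sees the whole expression at once, which is what lets it apply its own optimizations before any rows are produced.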

Development Note

pandaSQL is a fun project that I have been working on in my spare time. If you run into any issues, let me know!
