pandas-appender

Have you ever wanted to append a bunch of rows to a Pandas DataFrame? Turns out that it's extremely inefficient to do! For a large dataframe, you're supposed to make multiple dataframes and pd.concat() them instead.

Also, Pandas deprecated dataframe.append() in version 1.4 and intends to remove it in 2.0.

So... helper function? Pandas doesn't have one. Roll your own? Ugh. OK then: here's that helper function. It can append around 1 million very small rows per cpu-second. It has a modest additional memory usage of around 5 megabytes, dynamically growing with the number of rows appended.

Install

pip install pandas-appender

Usage

from pandas_appender import DF_Appender

dfa = DF_Appender(ignore_index=True)  # note that ignore_index moves to the init
for i in range(1_000_000):
    dfa = dfa.append({'i': i})

df = dfa.finalize()  # must call .finalize() before you can use the results

Type hints and category detection

Using narrower types and categories can often dramatically reduce the size of a DataFrame. There are two ways to do this in pandas-appender. One is to append to an existing dataframe:

dfa = DF_Appender(df, ignore_index=True)

and the second is to pass in a dtypes= argument:

dfa = DF_Appender(ignore_index=True, dtypes=another_dataframe.dtypes)

pandas-appender also offers a way to infer which columns would be smaller if they were categories. This code will either analyze an existing dataframe that you're appending to:

dfa = DF_Appender(df, ignore_index=True, infer_categories=True)

or it will analyze the first chunk of appended lines:

dfa = DF_Appender(ignore_index=True, infer_categories=True)

These inferred categories will override existing types or a dtypes= argument.

Incompatibilities with pandas.DataFrame.append()

DF_Appender must be finalized before use

Pandas: df_new = df.append() # df_new is a dataframe
DF_Appender: dfa_new = dfa.append() # must do df = dfa.finalize() to get a DataFrame

pandas.DataFame.append is idempotent, DF_Appender is not

Pandas: df_new = df.append() # df is not changed
DF_Appender: dfa_new = dfa.append() # modifies dfa, and dfa_new == dfa

pandas.DataFrame.append will promote types, while DF_Appender is strict

Pandas: append 0.1 to an integer column, and the column will be promoted to float
DF_Appender: when initialized with dtypes= or an existing DataFrame, appending 0.1 to an integer column causes 0.1 to be cast to an integer, i.e. 0.

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
.github/workflows		.github/workflows
pandas_appender		pandas_appender
scripts		scripts
test		test
.editorconfig		.editorconfig
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pandas-appender

Install

Usage

Type hints and category detection

Incompatibilities with pandas.DataFrame.append()

DF_Appender must be finalized before use

pandas.DataFame.append is idempotent, DF_Appender is not

pandas.DataFrame.append will promote types, while DF_Appender is strict

About

Releases

Packages

Languages

License

wumpus/pandas-appender

Folders and files

Latest commit

History

Repository files navigation

pandas-appender

Install

Usage

Type hints and category detection

Incompatibilities with pandas.DataFrame.append()

DF_Appender must be finalized before use

pandas.DataFame.append is idempotent, DF_Appender is not

pandas.DataFrame.append will promote types, while DF_Appender is strict

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages