Describe the bug
We found that the ORC scan performs poorly when running the TPC-DS benchmark:
the same scan operator is multiple times slower than with Parquet (from TPC-DS q3).
To Reproduce
Steps to reproduce the behavior:
1. Generate Parquet and ORC datasets using /tpcds/datagen.
2. Run benchmarks on both datasets using /tpcds/benchmark-runner.
3. Compare the performance of NativeParquetScan and NativeOrcScan.
Expected behavior
ORC should perform similarly to Parquet.
Screenshots
Edit
The main reason is that orc-rust reads all data without column pruning or predicate filtering. After applying column pruning with datafusion-contrib/datafusion-orc#133, performance is much better.
Currently ORC is still 20%~30% slower than Parquet, which may be related to predicate filtering not being supported yet.
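To illustrate why column pruning matters for a query like q3, which touches only a few columns of a wide table, here is a minimal toy sketch in Rust. It is not the orc-rust or datafusion-orc API: `Stripe`, `ScanResult`, `scan`, and `sample_stripe` are hypothetical names, and the "file" is just one byte buffer per column. It only shows that a scan with a projection decodes fewer bytes than a scan without one; predicate filtering is not modeled.

```rust
/// Toy columnar stripe: every column lives in its own encoded byte buffer.
/// Hypothetical type for illustration only, not part of orc-rust.
pub struct Stripe {
    /// (column name, little-endian encoded i64 values)
    pub columns: Vec<(String, Vec<u8>)>,
}

/// Decoded columns plus the number of bytes the scan had to decode.
pub struct ScanResult {
    pub columns: Vec<(String, Vec<i64>)>,
    pub bytes_decoded: usize,
}

/// Encode i64 values as consecutive 8-byte little-endian chunks.
fn encode(values: &[i64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 8);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Decode consecutive 8-byte little-endian chunks back into i64 values.
fn decode(bytes: &[u8]) -> Vec<i64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            i64::from_le_bytes(buf)
        })
        .collect()
}

/// Scan a stripe. With `projection = None` every column is decoded (the
/// behavior described before column pruning); with `Some(cols)` only the
/// named columns are decoded and the rest are skipped entirely.
pub fn scan(stripe: &Stripe, projection: Option<&[&str]>) -> ScanResult {
    let mut result = ScanResult { columns: Vec::new(), bytes_decoded: 0 };
    for (name, bytes) in &stripe.columns {
        let wanted = match projection {
            Some(cols) => cols.contains(&name.as_str()),
            None => true,
        };
        if !wanted {
            continue; // pruned: this column's bytes are never decoded
        }
        result.bytes_decoded += bytes.len();
        result.columns.push((name.clone(), decode(bytes)));
    }
    result
}

/// Build a 4-column, 1000-row stripe with illustrative store_sales column names.
pub fn sample_stripe() -> Stripe {
    let rows: Vec<i64> = (0..1000).collect();
    let names = ["ss_sold_date_sk", "ss_item_sk", "ss_ext_sales_price", "ss_customer_sk"];
    Stripe {
        columns: names.iter().map(|n| (n.to_string(), encode(&rows))).collect(),
    }
}
```

Calling `scan(&sample_stripe(), None)` decodes all four columns, while passing `Some(&["ss_item_sk", "ss_ext_sales_price"][..])` decodes only two, halving the decoded bytes in this toy setup. Real ORC files add compression, encodings, and stripe indexes, so the actual savings depend on the data.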