## Compatibility

This project is just a wrapper, so it inherits the limitations and bugs of the frameworks it wraps. Sorry for that.

### Limitations of pandas-like frameworks

#### pandas

All data must fit in DRAM.

#### modin

See the modin documentation for the pandas APIs it does and does not support.

#### cudf

- All data must fit in VRAM.
- All data types in cuDF are nullable.
- Iterating over a cuDF `Series`, `DataFrame` or `Index` is not supported.
- Join (or merge) and groupby operations in cuDF do not guarantee output ordering; sort explicitly if you depend on row order (see the sketch below).
- The order of operations is not always deterministic.
- cuDF does not support duplicate column names.
- cuDF also supports `.apply()`, but it relies on Numba to JIT-compile the UDF and execute it on the GPU.
- `.apply(result_type=...)` is not supported.
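For example, when row order matters after a merge, sort explicitly. A minimal sketch using the wrapper's pandas-style API (the dict-based `VDataFrame` constructor is assumed here):

```python
import virtual_dataframe as vdf

left = vdf.VDataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = vdf.VDataFrame({"key": [3, 2, 1], "r": ["x", "y", "z"]})

# In cuDF mode, merge() gives no ordering guarantee, so force a
# deterministic order before any order-sensitive step.
merged = left.merge(right, on="key").sort_values("key")
```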

#### dask

- `transpose()` and `MultiIndex` are not implemented.
- Column assignment doesn't support values of type `list` (see the sketch below for the workaround).
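A minimal sketch of the workaround for list assignment (hypothetical column names; any aligned series-like expression works):

```python
import virtual_dataframe as vdf

df = vdf.VDataFrame({"a": [1, 2, 3]})

# df["b"] = [10, 20, 30]   # not supported in dask mode (list assignment)
df["b"] = df["a"] * 10     # a Series expression works everywhere
```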

#### dask_cudf

- See cudf and dask above.
- Categories with strings are not implemented.

#### pyspark

See the pyspark documentation (pandas API on Spark) for its limitations.

### Limitations of numpy-like frameworks

#### numpy

All data must fit in RAM.

#### cupy

See the CuPy documentation on the differences between CuPy and NumPy. In addition, the following are not implemented (a workaround is sketched below):

- `block()`
- `delete()`
- `insert()`
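Where `insert()` and `delete()` are missing, boolean masks and `concatenate()` cover most uses. A minimal sketch (`xp` stands for whichever array module is active, numpy or cupy):

```python
import numpy as xp  # or: import cupy as xp

a = xp.arange(10)

# Equivalent of np.delete(a, 3): keep everything except index 3.
mask = xp.ones(a.shape[0], dtype=bool)
mask[3] = False
a_without = a[mask]

# Equivalent of np.insert(a, 3, 99): concatenate the pieces.
a_with = xp.concatenate([a[:3], xp.asarray([99]), a[3:]])
```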

#### dask.array

See the Dask documentation for its NumPy API coverage. Among others, the following functions are not implemented:

- `identity()`
- `asfarray()`
- `asfortranarray()`
- `ascontiguousarray()`
- `asarray_chkfinite()`
- `require()`
- `column_stack()`
- `row_stack()`
- `*split*()`
- `resize()`
- `trim_zeros()`
- `in1d()`
- `intersect1d()`
- `setdiff1d()`
- `setxor1d()`
- `fromiter()`

For compatibility between numpy and cupy, see the CuPy documentation ("Differences between CuPy and NumPy").

### File format compatibility

To stay compatible with every framework, use only the common features. The wrapper accepts several functions to read or write files, but it emits a warning if you use a function that is not compatible with the other frameworks (see the example after the lists below).

The following read/write functions are wrapped. Their support differs between pandas, cudf, modin, dask, dask_modin, dask_cudf and pyspark:

- `vdf.read_csv()`, `VDataFrame.to_csv()`, `VSeries.to_csv()`
- `vdf.read_excel()`, `VDataFrame.to_excel()`, `VSeries.to_excel()`
- `vdf.read_feather()`, `VDataFrame.to_feather()`
- `vdf.read_fwf()`
- `vdf.read_hdf()`, `VDataFrame.to_hdf()`, `VSeries.to_hdf()`
- `vdf.read_json()`, `VDataFrame.to_json()`, `VSeries.to_json()`
- `vdf.read_orc()`, `VDataFrame.to_orc()`
- `vdf.read_parquet()`, `VDataFrame.to_parquet()`
- `vdf.read_sql_table()`, `VDataFrame.to_sql()`, `VSeries.to_sql()`

The array load/save functions, whose support differs between numpy, cupy and dask.array:

- `vpd.load()` (`.npy`)
- `vpd.save()` (`.npy`)
- `vpd.savez()` (`.npz`)
- `vpd.loadtxt()`
- `vpd.savetxt()`
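For example, a round trip with two of the wrapped functions (a minimal sketch; the file names are hypothetical, and the distributed modes may expect glob patterns or directories rather than a single file):

```python
import virtual_dataframe as vdf

# The wrapper emits a warning if the function you call is not
# supported by every framework.
df = vdf.read_csv("input*.csv")
df.to_parquet("output.parquet")
```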

### Cross framework compatibility

|       | small data | middle data | big data |
|-------|------------|-------------|----------|
| 1-CPU | pandas, numpy (Limits: +) | | |
| n-CPU | modin, numpy (Limits: +) | dask, dask_modin or pyspark, and dask.array (Limits: ++) | dask, dask_modin or pyspark, and dask.array (Limits: ++) |
| GPU   | cudf, cupy (Limits: ++) | dask_cudf, pyspark+spark-rapids, and dask.array (Limits: +++) | dask_cudf, pyspark+spark-rapids, and dask.array (Limits: +++) |

When you develop, choose the level of compatibility you need with the other frameworks. Each cell is strongly compatible with the cells above it and to its left.
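The same code can then be moved between cells without rewriting. A minimal sketch, assuming the backend is selected with the `VDF_MODE` environment variable before the wrapper is imported:

```python
import os

# Assumed selection mechanism; adapt to your configuration.
os.environ["VDF_MODE"] = "pandas"  # or "modin", "cudf", "dask", "dask_cudf", ...

import virtual_dataframe as vdf

df = vdf.VDataFrame({"a": [1, 2, 3]})
total = df["a"].sum()  # in the dask-based modes this stays lazy until computed
```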

#### No need for a GPU?

If you don't need a GPU, develop for dask and use the modes shown in bold.

|       | small data | middle data | big data |
|-------|------------|-------------|----------|
| 1-CPU | **pandas, numpy** (Limits: +) | | |
| n-CPU | **modin, numpy** (Limits: +) | **dask, dask_modin or pyspark, and dask.array** (Limits: ++) | **dask, dask_modin or pyspark, and dask.array** (Limits: ++) |
| GPU   | cudf, cupy (Limits: ++) | dask_cudf, pyspark+spark-rapids, and dask.array (Limits: +++) | dask_cudf, pyspark+spark-rapids, and dask.array (Limits: +++) |

You can ignore this API (see the note below):

- `VDataFrame.apply_rows()`
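`apply_rows()` is the cuDF-style entry point for row-wise UDFs compiled with Numba; on the CPU frameworks, a plain pandas-style `apply()` expresses the same computation. A minimal sketch (hypothetical column names):

```python
import virtual_dataframe as vdf

df = vdf.VDataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Row-wise UDF without apply_rows(): fine when no GPU is involved.
df["c"] = df.apply(lambda row: row["a"] + row["b"], axis=1)
```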

#### No need for big data?

If you don't need big data, develop for cudf and use the modes shown in bold.

|       | small data | middle data | big data |
|-------|------------|-------------|----------|
| 1-CPU | **pandas, numpy** (Limits: +) | | |
| n-CPU | **modin, numpy** (Limits: +) | dask, dask_modin or pyspark, and dask.array (Limits: ++) | dask, dask_modin or pyspark, and dask.array (Limits: ++) |
| GPU   | **cudf, cupy** (Limits: ++) | dask_cudf, pyspark+spark-rapids, and dask.array (Limits: +++) | dask_cudf, pyspark+spark-rapids, and dask.array (Limits: +++) |

You can ignore these APIs (illustrated below):

- `@delayed`
- `map_partitions()`
- `categorize()`
- `compute()`
- `npartitions=...`
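For reference, those APIs look roughly like this when you do target the big-data modes (a minimal sketch; it assumes the wrapper re-exports `delayed` and `compute`, and that the extra keywords are ignored by the small-data backends):

```python
import virtual_dataframe as vdf

@vdf.delayed                     # assumed re-export; no-op outside the dask modes
def add_one(data):
    return data + 1

df = vdf.VDataFrame({"a": [1, 2, 3]}, npartitions=2)
result, = vdf.compute(add_one(df))  # dask-style compute, assumed tuple result
```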

#### Need all the possibilities?

To be compatible with all modes, develop for dask_cudf...

|       | small data | middle data | big data |
|-------|------------|-------------|----------|
| 1-CPU | **pandas, numpy** (Limits: +) | | |
| n-CPU | **modin, numpy** (Limits: +) | **dask, dask_modin or pyspark, and dask.array** (Limits: ++) | **dask, dask_modin or pyspark, and dask.array** (Limits: ++) |
| GPU   | **cudf, cupy** (Limits: ++) | **dask_cudf, pyspark+spark-rapids, and dask.array** (Limits: +++) | **dask_cudf, pyspark+spark-rapids, and dask.array** (Limits: +++) |

...and accept all the limitations.