## Technical point of view
The `virtual_dataframe` framework patches the underlying frameworks to unify their API. The behaviour depends on the selected `VDF_MODE`:
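For example, the target backend is typically selected before importing the library. A minimal sketch; the `VDF_MODE` environment variable and the `vdf.VDataFrame` constructor are assumptions about the public API, which is not described in this section:

```python
# Minimal sketch: choose the backend before importing virtual_dataframe.
# VDF_MODE and vdf.VDataFrame are assumed from the project's public API.
import os
os.environ["VDF_MODE"] = "pandas"   # or "cudf", "modin", "dask", "dask_cudf", "pyspark", ...

import virtual_dataframe as vdf

df = vdf.VDataFrame({"a": [1, 2, 3]})   # backed by vdf.BackEndDataFrame for the chosen mode
print(type(df))                         # <class 'pandas.core.frame.DataFrame'> in pandas mode
```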
### Pandas-like frameworks
#### pandas
- Add `vdf.BackEndDataFrame` = `pandas.DataFrame`
- Add `vdf.BackEndSeries` = `pandas.Series`
- Add `vdf.BackEndArray` = `numpy.ndarray`
- Add `vdf.BackEndPandas` = `pandas`
- Add `vdf.FrontEndPandas` = `pandas`
- Add `vdf.FrontEndNumpy` = `numpy`
- Add `vdf.compute()` to return a tuple of its args and be compatible with `dask.compute()`
- Add `vdf.concat()`, an alias of `pandas.concat()`
- Add `vdf.delayed()` to delay a call and be compatible with `dask.delayed()`
- Add `vdf.persist()` to return its parameters and be compatible with `dask.persist()`
- Add `vdf.visualize()` to return an empty image and be compatible with `dask.visualize()`
- Add `vdf.from_pandas()` to return the DataFrame and be compatible with `dask.from_pandas()`
- Add `vdf.from_backend()`, an alias of `from_pandas()`
- Add `vdf.numpy`, an alias of the `numpy` module
- Remove the extra parameters used by Dask in `*.to_csv()`, `*.to_excel()`, `*.to_feather()`, `*.to_hdf()`, `*.to_json()`
- Update the pandas API to accept glob filenames in `vdf.read_csv()`, `vdf.read_excel()`, `vdf.read_feather()`, `vdf.read_fwf()`, `vdf.read_hdf()`, `vdf.read_json()`, `vdf.read_orc()`, `vdf.read_parquet()`, `vdf.read_sql_table()`, `DF.to_csv()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_hdf()`, `DF.to_json()`, `Series.to_csv()`, `Series.to_excel()`, `Series.to_hdf()`, `Series.to_json()`
- Add `_not_implemented` methods for `DF.to_fwf()`
- Add `DF.to_pandas()` to return `self`
- Add `DF.to_backend()` to return `self`
- Add `DF.to_ndarray()`, an alias of `to_numpy()`
- Add `DF.apply_rows()` to be compatible with `cudf.apply_rows()`
- Add `DF.map_partitions()` to be compatible with `dask.map_partitions()`
- Add `DF.compute()` to return `self` and be compatible with `dask.DataFrame.compute()`
- Add `DF.repartition()` to return `self` and be compatible with `dask.DataFrame.repartition()`
- Add `DF.visualize()` to return `visualize(self)` and be compatible with `dask.DataFrame.visualize()`
- Add `DF.categorize()` to return `self` and be compatible with `dask.DataFrame.categorize()`
- Add `Series.to_pandas()` to return `self`
- Add `Series.to_backend()` to return `self`
- Add `Series.to_ndarray()`, an alias of `to_numpy()`
- Add `Series.compute()` to return `self` and be compatible with `dask.Series.compute()`
- Add `Series.map_partitions()` to return `self.map()` and be compatible with `dask.Series.map_partitions()`
- Add `Series.persist()` to return `self` and be compatible with `dask.Series.persist()`
- Add `Series.repartition()` to return `self` and be compatible with `dask.Series.repartition()`
- Add `Series.visualize()` to return `visualize(self)` and be compatible with `dask.Series.visualize()`
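The net effect is that Dask-style code runs unchanged in `pandas` mode, with the added methods acting as no-ops. A minimal sketch, assuming the signatures stay aligned with their Dask counterparts:

```python
import pandas
import virtual_dataframe as vdf          # with VDF_MODE=pandas

df = vdf.from_pandas(pandas.DataFrame({"a": [1, 2, 3]}), npartitions=2)  # returns the DataFrame as-is
df = df.repartition(npartitions=4)       # returns self
(df,) = vdf.compute(df)                  # returns a tuple of its arguments
assert df.to_pandas() is df              # to_pandas() returns self
assert df.to_backend() is df             # to_backend() returns self
```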
#### cudf
- Add `vdf.BackEndDataFrame` = `cudf.DataFrame`
- Add `vdf.BackEndSeries` = `cudf.Series`
- Add `vdf.BackEndArray` = `cupy.ndarray`
- Add `vdf.BackEndPandas` = `cudf`
- Add `vdf.FrontEndPandas` = `cudf`
- Add `vdf.FrontEndNumpy` = `cupy`
- Add `vdf.compute()` to return a tuple of its args and be compatible with `dask.compute()`
- Add `vdf.concat()`, an alias of `pandas.concat()`
- Add `vdf.delayed()` to delay a call and be compatible with `dask.delayed()`
- Add `vdf.persist()` to return its parameters and be compatible with `dask.persist()`
- Add `vdf.visualize()` to return an empty image and be compatible with `dask.visualize()`
- Add `vdf.from_pandas()` to return the DataFrame and be compatible with `dask.from_pandas()`
- Add `vdf.from_backend()`, an alias of `from_pandas()`
- Add `vdf.numpy`, an alias of the `cupy` module
- Remove the extra parameters used by Dask in `*.to_csv()`, `*.to_excel()`, `*.to_feather()`, `*.to_hdf()`, `*.to_json()`
- Update the pandas API to accept glob filenames in `vdf.read_csv()`, `vdf.read_feather()`, `vdf.read_json()`, `DF.to_csv()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_hdf()`, `DF.to_json()`, `Series.to_hdf()`, `Series.to_json()`
- Add `_not_implemented` methods for `vdf.read_excel()`, `vdf.read_fwf()`, `vdf.read_sql_table()`, `DF.to_csv()`, `DF.to_excel()`
- Add `pandas.DataFrame.to_pandas()` to return `self`
- Add `DF.to_backend()` to return `self`
- Add `DF.to_ndarray()` to convert the DataFrame to a `cupy.ndarray`
- Add `DF.map_partitions()` to be compatible with `dask.map_partitions()`
- Add `DF.compute()` to return `self` and be compatible with `dask.DataFrame.compute()`
- Add `DF.repartition()` to return `self` and be compatible with `dask.DataFrame.repartition()`
- Add `DF.visualize()` to return `visualize(self)` and be compatible with `dask.DataFrame.visualize()`
- Add `DF.categorize()` to return `self` and be compatible with `dask.DataFrame.categorize()`
- Add `pandas.Series.to_pandas()` to return `self`
- Add `Series.to_backend()` to return `self`
- Add `Series.to_ndarray()`, an alias of `to_numpy()`
- Add `Series.compute()` to return `self` and be compatible with `dask.Series.compute()`
- Add `Series.map_partitions()` to return `self.map()` and be compatible with `dask.Series.map_partitions()`
- Add `Series.persist()` to return `self` and be compatible with `dask.Series.persist()`
- Add `Series.repartition()` to return `self` and be compatible with `dask.Series.repartition()`
- Add `Series.visualize()` to return `visualize(self)` and be compatible with `dask.Series.visualize()`
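Under `VDF_MODE=cudf`, the same front-end code therefore yields GPU objects: for example, `DF.to_ndarray()` returns a `cupy.ndarray` and `vdf.numpy` resolves to `cupy`. A hedged sketch (requires a CUDA environment; `vdf.VDataFrame` is again an assumption):

```python
import virtual_dataframe as vdf          # with VDF_MODE=cudf

df = vdf.VDataFrame({"a": [1.0, 4.0, 9.0]})   # backed by cudf.DataFrame
arr = df.to_ndarray()                          # cupy.ndarray in this mode
roots = vdf.numpy.sqrt(arr)                    # vdf.numpy is an alias of cupy here
print(type(arr), roots)
```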
#### modin or dask_modin
- Set `MODIN_ENGINE=dask` for `dask_modin`
- Set `MODIN_ENGINE=python` for `modin`
- Add `vdf.BackEndDataFrame` = `modin.pandas.DataFrame`
- Add `vdf.BackEndSeries` = `modin.pandas.Series`
- Add `vdf.BackEndArray` = `numpy.ndarray`
- Add `vdf.BackEndPandas` = `modin.pandas`
- Add `vdf.FrontEndPandas` = `modin.pandas`
- Add `vdf.FrontEndNumpy` = `numpy`
- Add `vdf.compute()` to return a tuple of its args and be compatible with `dask.compute()`
- Add `vdf.concat()`, an alias of `modin.pandas.concat()`
- Add `vdf.delayed()` to delay a call and be compatible with `dask.delayed()`
- Add `vdf.persist()` to return its parameters and be compatible with `dask.persist()`
- Add `vdf.visualize()` to return an empty image and be compatible with `dask.visualize()`
- Add `vdf.from_pandas()` to return a modin DataFrame or Series and be compatible with `dask.from_pandas()`
- Add `vdf.from_backend()`, an alias of `from_pandas()`
- Add `vdf.numpy`, an alias of the `numpy` module
- Remove the extra parameters used by Dask in `*.to_csv()`, `*.to_excel()`, `*.to_feather()`, `*.to_hdf()`, `*.to_json()`
- Add a warning when using `read_excel()`, `read_feather()`, `read_fwf()`, `read_hdf()`, `read_sql_table()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_hdf()`, `DF.to_sql()`, `Series.to_csv()`, `Series.to_excel()`, `Series.to_hdf()`, `Series.to_json()`
- Update the pandas API to accept glob filenames in `vdf.read_excel()`, `vdf.read_feather()`, `vdf.read_fwf()`, `vdf.read_hdf()`, `vdf.read_orc()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_hdf()`, `DF.to_sql()`, `Series.to_csv()`, `Series.to_excel()`, `Series.to_hdf()`, `Series.to_json()`
- Add `_not_implemented` methods for `DF.to_orc()`
- Add `DF.to_pandas()` to convert to a `pandas.DataFrame`
- Add `DF.to_backend()` to return `self`
- Add `DF.to_ndarray()`, an alias of `to_numpy()`
- Add `DF.apply_rows()` to be compatible with `cudf.apply_rows()`
- Add `DF.map_partitions()` to be compatible with `dask.map_partitions()`
- Add `DF.compute()` to return `self` and be compatible with `dask.DataFrame.compute()`
- Add `DF.repartition()` to return `self` and be compatible with `dask.DataFrame.repartition()`
- Add `DF.visualize()` to return `visualize(self)` and be compatible with `dask.DataFrame.visualize()`
- Add `DF.categorize()` to return `self` and be compatible with `dask.DataFrame.categorize()`
- Add `Series.to_pandas()` to return `modin.pandas.Series.to_pandas()`
- Add `Series.to_backend()` to return `self`
- Add `Series.to_ndarray()`, an alias of `to_numpy()`
- Add `Series.compute()` to return `self` and be compatible with `dask.Series.compute()`
- Add `Series.map_partitions()` to return `self.map()` and be compatible with `dask.Series.map_partitions()`
- Add `Series.persist()` to return `self` and be compatible with `dask.Series.persist()`
- Add `Series.repartition()` to return `self` and be compatible with `dask.Series.repartition()`
- Add `Series.visualize()` to return `visualize(self)` and be compatible with `dask.Series.visualize()`
- And all the patches applied in pandas mode
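So the only difference between the two Modin modes is which engine the framework configures; the code written against `vdf` is identical. A sketch of the engine selection described by the two `Set` items above (the real selection happens inside the library):

```python
import os

# Reproduce the documented behaviour: dask_modin uses the Dask engine,
# plain modin falls back to the sequential "python" engine.
mode = os.environ.get("VDF_MODE", "modin")
os.environ["MODIN_ENGINE"] = "dask" if mode == "dask_modin" else "python"

import modin.pandas as mpd               # vdf.FrontEndPandas in these modes
df = mpd.DataFrame({"a": [1, 2, 3]})
print(df.sum())
```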
#### dask
- Add `vdf.BackEndDataFrame` = `pandas.DataFrame`
- Add `vdf.BackEndSeries` = `pandas.Series`
- Add `vdf.BackEndArray` = `numpy.ndarray`
- Add `vdf.BackEndPandas` = `pandas`
- Add `vdf.FrontEndPandas` = `dask.dataframe`
- Add `vdf.FrontEndNumpy` = `dask.array`
- Add `vdf.concat()`, an alias of `dask.dataframe.multi.concat()`
- Add `vdf.from_pandas()`, an alias of `dask.dataframe.from_pandas()`
- Add `vdf.from_backend()`, an alias of `from_pandas()`
- Add `vdf.numpy`, an alias of the `numpy` module
- Add a warning in `read_fwf()`, `read_hdf()`, `read_sql_table()`
- Add `_not_implemented` methods for `read_excel()`, `read_feather()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_fwf()`
- Add `DF.to_pandas()` to return `self.compute()`
- Add `DF.to_backend()`, an alias of `to_pandas()`
- Add `DF.to_numpy()` to return `self.compute().to_numpy()`
- Add `DF.to_ndarray()`, an alias of `dask.DataFrame.to_dask_array()`
- Add `DF.apply_rows()` to be compatible with `cudf.apply_rows()`
- Patch `DF.to_sql()` and `Series.to_sql()` to accept `con` or `uri`
- Add `Series.to_pandas()` to return `self.compute()`
- Add `Series.to_backend()`, an alias of `to_pandas()`
- Add `Series.to_numpy()` to return `self.compute().to_numpy()`
- Add `Series.to_ndarray()`, an alias of `dask.dataframe.Series.to_dask_array()`
- And all the patches applied in pandas mode
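In `dask` mode, `to_pandas()` and `to_backend()` are therefore the points where the lazy graph is actually computed. A sketch, assuming `vdf.from_pandas()` keeps the `dask.dataframe.from_pandas()` signature:

```python
import pandas
import virtual_dataframe as vdf          # with VDF_MODE=dask

pdf = pandas.DataFrame({"a": range(10)})
ddf = vdf.from_pandas(pdf, npartitions=2)    # alias of dask.dataframe.from_pandas()
out = ddf.to_pandas()                        # equivalent to ddf.compute()
print(type(out), len(out))                   # a concrete pandas.DataFrame
```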
#### dask_cudf
- Add `vdf.BackEndDataFrame` = `cudf.DataFrame`
- Add `vdf.BackEndSeries` = `cudf.Series`
- Add `vdf.BackEndArray` = `cudf`
- Add `vdf.BackEndPandas` = `pandas`
- Add `vdf.FrontEndPandas` = `dask_cudf`
- Add `vdf.FrontEndNumpy` = `cupy`
- Add `vdf.compute()`, an alias of `dask.compute()`
- Add `vdf.concat()`, an alias of `dask.dataframe.multi.concat()`
- Add `vdf.delayed()`, an alias of `dask.delayed()`
- Add `vdf.persist()`, an alias of `dask.persist()`
- Add `vdf.visualize()`, an alias of `dask.visualize()`
- Add `vdf.from_pandas()`, mapped to `dask_cudf.from_cudf()`
- Add `vdf.from_backend()`, mapped to `dask_cudf.from_cudf()`
- Add `vdf.numpy`, an alias of the `cupy` module
- Add a warning in `Series.to_hdf()`, `Series.to_json()`
- Add `_not_implemented` methods for `read_excel()`, `read_feather()`, `read_fwf()`, `read_hdf()`, `read_sql_table()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_fwf()`, `DF.to_hdf()`, `DF.to_sql()`, `Series.to_csv()`, `Series.to_excel()`
- Add `DF.to_pandas()` to return `self.compute().to_pandas()`
- Add `DF.to_backend()` to return `self.compute()` (a `cudf.DataFrame`)
- Add `DF.to_numpy()` to return `self.compute().to_numpy()`
- Add `DF.to_ndarray()` to return `self.compute()` (a `cudf.DataFrame`)
- Add `Series.to_pandas()` to return `self.compute().to_pandas()`
- Add `Series.to_backend()` to return `self.compute()` (a `cudf.Series`)
- Add `Series.to_numpy()` to return `self.compute().to_numpy()`
- Add `Series.to_ndarray()` to return a `cudf.Series`
- Add `Series.compute()` to return `self` and be compatible with `dask.Series.compute()`
- Add `Series.map_partitions()` to return `self.map()` and be compatible with `dask.Series.map_partitions()`
- Add `Series.persist()` to return `self` and be compatible with `dask.Series.persist()`
- Add `Series.repartition()` to return `self` and be compatible with `dask.Series.repartition()`
- Add `Series.visualize()` to return `visualize(self)` and be compatible with `dask.Series.visualize()`
- And all the patches applied in cudf mode
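In `dask_cudf` mode, `to_backend()` thus computes the partitions and returns a `cudf` object, while `to_pandas()` additionally converts the result to pandas. A hedged sketch (GPU required; whether `vdf.from_backend()` forwards `npartitions` is an assumption):

```python
import cudf
import virtual_dataframe as vdf          # with VDF_MODE=dask_cudf

gdf = cudf.DataFrame({"a": [1, 2, 3]})
vddf = vdf.from_backend(gdf, npartitions=2)  # mapped to dask_cudf.from_cudf()
print(type(vddf.to_backend()))               # cudf.DataFrame (self.compute())
print(type(vddf.to_pandas()))                # pandas.DataFrame
```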
#### pyspark
- Add `vdf.BackEndDataFrame` = `pandas.DataFrame`
- Add `vdf.BackEndSeries` = `pandas.Series`
- Add `vdf.BackEndArray` = `numpy.ndarray`
- Add `vdf.BackEndPandas` = `pandas`
- Add `vdf.FrontEndPandas` = `pyspark.pandas`
- Add `vdf.FrontEndNumpy` = `numpy`
- Add `vdf.compute()` to return a tuple of its args and be compatible with `dask.compute()`
- Add `vdf.concat()`, an alias of `pyspark.pandas.concat()`
- Add `vdf.delayed()` to delay a call and be compatible with `dask.delayed()`
- Add `vdf.persist()` to persist the current DataFrame
- Add `vdf.visualize()` to return an empty image and be compatible with `dask.visualize()`
- Add `vdf.from_backend()`, an alias of `from_pandas()`
- Add `vdf.numpy`, an alias of the `numpy` module
- Remove the extra parameters used by Dask in `*.to_csv()`, `*.to_excel()`, `*.to_feather()`, `*.to_hdf()`, `*.to_json()`, `from_pandas()`
- Add a warning in `read_excel()`, `read_sql_table()`
- Update the pandas API to accept glob filenames in `vdf.read_csv()`, `vdf.read_excel()`, `vdf.read_json()`, `vdf.read_orc()`, `DF.to_csv()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_hdf()`, `DF.to_json()`, `Series.to_csv()`, `Series.to_excel()`, `Series.to_hdf()`, `Series.to_json()`
- Add `_not_implemented` methods for `vdf.read_feather()`, `vdf.read_fwf()`, `vdf.read_hdf()`, `DF.to_sql()`, `Series.to_sql()`
- Add `DF.to_backend()`, an alias of `to_pandas()`
- Add `DF.to_ndarray()`, an alias of `to_numpy()`
- Add `DF.apply_rows()` to be compatible with `cudf.apply_rows()`
- Add `DF.categorize()` to return `self` and be compatible with `dask.DataFrame.categorize()`
- Add `DF.compute()` to return `self` and be compatible with `dask.DataFrame.compute()`
- Add `DF.map_partitions()` to be compatible with `dask.map_partitions()`
- Add `DF.persist()` to return `self` and be compatible with `dask.DataFrame.persist()`
- Add `DF.repartition()` to return `self` and be compatible with `dask.DataFrame.repartition()`
- Add `DF.visualize()` to return `visualize(self)` and be compatible with `dask.DataFrame.visualize()`
- Add `Series.to_backend()`, an alias of `to_pandas()`
- Add `Series.to_ndarray()`, an alias of `to_numpy()`
- Add `Series.compute()` to return `self` and be compatible with `dask.Series.compute()`
- Add `Series.map_partitions()` to return `self.map()` and be compatible with `dask.Series.map_partitions()`
- Add `Series.persist()` to return `self` and be compatible with `dask.Series.persist()`
- Add `Series.repartition()` to return `self` and be compatible with `dask.Series.repartition()`
- Add `Series.visualize()` to return `visualize(self)` and be compatible with `dask.Series.visualize()`
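One notable difference from the pandas-backed modes: `vdf.persist()` really persists the DataFrame here instead of being a no-op. A short sketch of the same front-end code on the `pyspark.pandas` backend (Spark session setup omitted; `vdf.VDataFrame` remains an assumption):

```python
import virtual_dataframe as vdf          # with VDF_MODE=pyspark

df = vdf.VDataFrame({"a": [1, 2, 3]})    # backed by pyspark.pandas.DataFrame
vdf.persist(df)                          # actually persists the DataFrame in this mode
(df,) = vdf.compute(df)                  # still returns a tuple, like the other modes
print(type(df.to_backend()))             # to_backend() is an alias of to_pandas()
```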
### Numpy-like family
#### numpy
It is not possible to patch some methods of `numpy.ndarray`.
- `vdf.numpy` is an alias of `numpy`
- Add `vdf.numpy.asnumpy(ar)` to return `ar`
- Add `vdf.numpy.asndarray(ar)` to return `ar.to_numpy()`
- Add `vdf.numpy.compute(...)` to return a tuple with its parameters
- Add `vdf.numpy.compute_chunk_sizes(ar)` to return `ar`
- Add `vdf.numpy.rechunk(ar)` to return `ar`
- Add `vdf.numpy.arange()` to remove the `chunks` parameter, invoke `numpy.arange()` and return a `Vndarray` view
- Add `vdf.numpy.from_array()` to remove the `chunks` parameter and return a `Vndarray` view
- Add `vdf.numpy.load()` to remove the `chunks` parameter
- Add `vdf.numpy.save()` to remove the `chunks` parameter
- Add `vdf.numpy.savez()` to remove the `chunks` parameter
- Add `vdf.numpy.random.*` to remove the `chunks` parameter
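Concretely, Dask-style array code keeps working against plain numpy because `chunks` is stripped and the chunk helpers degrade to identity functions. A sketch using only the functions listed above:

```python
import virtual_dataframe as vdf          # with a numpy-backed VDF_MODE (e.g. pandas)

a = vdf.numpy.arange(6, chunks=3)        # 'chunks' is removed, numpy.arange() is invoked
a = vdf.numpy.rechunk(a)                 # identity: returns a
a = vdf.numpy.compute_chunk_sizes(a)     # identity: returns a
(a,) = vdf.numpy.compute(a)              # returns a tuple with its parameters
print(vdf.numpy.asnumpy(a))              # identity in numpy mode
```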
#### cupy
- `vdf.numpy` is an alias of `cupy`
- Add `vdf.numpy.asndarray(ar)` to return `ar.to_numpy()`
- Add `vdf.numpy.compute(...)` to return a tuple with its parameters
- Add `vdf.numpy.compute_chunk_sizes(ar)` to return `ar`
- Add `vdf.numpy.rechunk(ar)` to return `ar`
- Add `vdf.numpy.arange()` to remove the `chunks` parameter, invoke `numpy.arange()` and return a `Vndarray` view
- Add `vdf.numpy.from_array()` to remove the `chunks` parameter and return a `Vndarray` view
- Add `vdf.numpy.load()` to remove the `chunks` parameter
- Add `vdf.numpy.save()` to remove the `chunks` parameter
- Add `vdf.numpy.savez()` to remove the `chunks` parameter
- Add `vdf.numpy.random.*` to remove the `chunks` parameter
#### dask_array
- `vdf.numpy` is an alias of `dask.array`
- Add `vdf.numpy.asarray(ar)` to return an array of numpy or cupy
- Add `vdf.numpy.asndarray(ar)` to return `ar.to_numpy()`
- Add `vdf.numpy.compute(...)` to return a tuple with its parameters
- Add `vdf.numpy.compute_chunk_sizes(ar)` to return `ar`
- Add `vdf.numpy.rechunk(ar)` to return `ar`
- Add `vdf.numpy.arange()` to remove the `chunks` parameter, invoke `numpy.arange()` and return a `Vndarray` view
- Add `vdf.numpy.from_array()` to remove the `chunks` parameter and return a `Vndarray` view
- Add `vdf.numpy.load()` to remove the `chunks` parameter
- Add `vdf.numpy.save()` to remove the `chunks` parameter
- Add `vdf.numpy.savez()` to remove the `chunks` parameter
- Add `vdf.numpy.random.*` to remove the `chunks` parameter
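Because `asndarray()` and the chunk helpers are provided in each mode, the same array-handling code runs whether `vdf.numpy` points to `numpy`, `cupy` or `dask.array`. A last sketch (the `VDataFrame` constructor and the exact return types per mode remain assumptions):

```python
import virtual_dataframe as vdf          # any VDF_MODE

df = vdf.VDataFrame({"a": [1.0, 2.0, 3.0]})
arr = vdf.numpy.asndarray(df["a"])       # Series -> array via to_numpy()
arr = vdf.numpy.rechunk(arr)             # no-op outside the Dask-backed modes
(arr,) = vdf.numpy.compute(arr)          # tuple of materialized results/parameters
print(arr)
```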