## Technical point of view

The `virtual_dataframe` framework patches the other frameworks to unify their API.
The adjustments applied for each `VDF_MODE` are listed below.
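As a quick orientation before the per-mode details, here is a minimal sketch of how a mode is selected. It assumes that `VDF_MODE` is read from the environment when the package is imported, and that the package is imported under the conventional alias `vdf`:

```python
# Minimal sketch: pick a backend before importing the package.
# Assumption: VDF_MODE is read from the environment at import time.
import os

os.environ["VDF_MODE"] = "pandas"   # or "cudf", "modin", "dask", "dask_cudf", "pyspark", ...

import virtual_dataframe as vdf

# These aliases are installed by the patches described below.
print(vdf.BackEndDataFrame)   # concrete DataFrame class for the selected mode
print(vdf.FrontEndPandas)     # pandas-like module used by the user code
```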
### Pandas-like frameworks

#### Pandas
- Add `vdf.BackEndDataFrame = pandas.DataFrame`
- Add `vdf.BackEndSeries = pandas.Series`
- Add `vdf.BackEndArray = numpy.ndarray`
- Add `vdf.BackEndPandas = pandas`
- Add `vdf.FrontEndPandas = pandas`
- Add `vdf.FrontEndNumpy = numpy`
- Add `vdf.compute()` to return a tuple of its arguments and be compatible with `dask.compute()`
- Add `vdf.concat()`, an alias of `pandas.concat()`
- Add `vdf.delayed()` to delay a call and be compatible with `dask.delayed()`
- Add `vdf.persist()` to return its parameters and be compatible with `dask.persist()`
- Add `vdf.visualize()` to return an empty image and be compatible with `dask.visualize()`
- Add `vdf.from_pandas()` to return `df` and be compatible with `dask.from_pandas()`
- Add `vdf.from_backend()`, an alias of `from_pandas()`
- Add `vdf.numpy`, an alias of the `numpy` module
- Remove the extra parameters used by Dask in `*.to_csv()`, `*.to_excel()`, `*.to_feather()`, `*.to_hdf()` and `*.to_json()`
- Update the pandas API to accept glob filenames in `vdf.read_csv()`, `vdf.read_excel()`, `vdf.read_feather()`, `vdf.read_fwf()`, `vdf.read_hdf()`, `vdf.read_json()`, `vdf.read_orc()`, `vdf.read_parquet()`, `vdf.read_sql_table()`, `DF.to_csv()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_hdf()`, `DF.to_json()`, `Series.to_csv()`, `Series.to_excel()`, `Series.to_hdf()` and `Series.to_json()`
- Add methods with `_not_implemented`: `DF.to_fwf()`
- Add `DF.to_pandas()` to return `self`
- Add `DF.to_backend()` to return `self`
- Add `DF.to_ndarray()`, an alias of `to_numpy()`
- Add `DF.apply_rows()` to be compatible with `cudf.apply_rows()`
- Add `DF.map_partitions()` to be compatible with `dask.map_partitions()`
- Add `DF.compute()` to return `self` and be compatible with `dask.DataFrame.compute()`
- Add `DF.repartition()` to return `self` and be compatible with `dask.DataFrame.repartition()`
- Add `DF.visualize()` to return `visualize(self)` and be compatible with `dask.DataFrame.visualize()`
- Add `DF.categorize()` to return `self` and be compatible with `dask.DataFrame.categorize()`
- Add `Series.to_pandas()` to return `self`
- Add `Series.to_backend()` to return `self`
- Add `Series.to_ndarray()`, an alias of `to_numpy()`
- Add `Series.compute()` to return `self` and be compatible with `dask.Series.compute()`
- Add `Series.map_partitions()` to return `self.map()` and be compatible with `dask.Series.map_partitions()`
- Add `Series.persist()` to return `self` and be compatible with `dask.Series.persist()`
- Add `Series.repartition()` to return `self` and be compatible with `dask.Series.repartition()`
- Add `Series.visualize()` to return `visualize(self)` and be compatible with `dask.Series.visualize()`
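To make the effect of these patches concrete, here is a small, hedged sketch of Dask-style code running unchanged in `VDF_MODE=pandas`. It assumes the package is imported as `vdf`, that `vdf.delayed` can be used as a decorator like `dask.delayed`, and that `npartitions` is accepted (and ignored) for Dask compatibility:

```python
import pandas

import virtual_dataframe as vdf  # assumes VDF_MODE=pandas is already selected


@vdf.delayed                     # no-op wrapper here, a real delayed call in Dask modes
def add_one(df):
    return df + 1


df = vdf.from_pandas(pandas.DataFrame({"a": [1, 2, 3]}), npartitions=2)
(result,) = vdf.compute(add_one(df))  # compute() returns its arguments as a tuple
print(result.to_ndarray())            # to_ndarray() is the patched alias of to_numpy()
```

The same lines are intended to run against a real scheduler when `VDF_MODE` is switched to a Dask-based mode.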
#### cudf
- Add `vdf.BackEndDataFrame = cudf.DataFrame`
- Add `vdf.BackEndSeries = cudf.Series`
- Add `vdf.BackEndArray = cupy.ndarray`
- Add `vdf.BackEndPandas = cudf`
- Add `vdf.FrontEndPandas = cudf`
- Add `vdf.FrontEndNumpy = cupy`
- Add `vdf.compute()` to return a tuple of its arguments and be compatible with `dask.compute()`
- Add `vdf.concat()`, an alias of `pandas.concat()`
- Add `vdf.delayed()` to delay a call and be compatible with `dask.delayed()`
- Add `vdf.persist()` to return its parameters and be compatible with `dask.persist()`
- Add `vdf.visualize()` to return an empty image and be compatible with `dask.visualize()`
- Add `vdf.from_pandas()` to return `df` and be compatible with `dask.from_pandas()`
- Add `vdf.from_backend()`, an alias of `from_pandas()`
- Add `vdf.numpy`, an alias of the `cupy` module
- Remove the extra parameters used by Dask in `*.to_csv()`, `*.to_excel()`, `*.to_feather()`, `*.to_hdf()` and `*.to_json()`
- Update the pandas API to accept glob filenames in `vdf.read_csv()`, `vdf.read_feather()`, `vdf.read_json()`, `DF.to_csv()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_hdf()`, `DF.to_json()`, `Series.to_hdf()` and `Series.to_json()`
- Add methods with `_not_implemented`: `vdf.read_excel()`, `vdf.read_fwf()`, `vdf.read_sql_table()`, `DF.to_csv()` and `DF.to_excel()`
- Add `pandas.DataFrame.to_pandas()` to return `self`
- Add `DF.to_backend()` to return `self`
- Add `DF.to_ndarray()` to convert the DataFrame to a `cupy.ndarray`
- Add `DF.map_partitions()` to be compatible with `dask.map_partitions()`
- Add `DF.compute()` to return `self` and be compatible with `dask.DataFrame.compute()`
- Add `DF.repartition()` to return `self` and be compatible with `dask.DataFrame.repartition()`
- Add `DF.visualize()` to return `visualize(self)` and be compatible with `dask.DataFrame.visualize()`
- Add `DF.categorize()` to return `self` and be compatible with `dask.DataFrame.categorize()`
- Add `pandas.Series.to_pandas()` to return `self`
- Add `Series.to_backend()` to return `self`
- Add `Series.to_ndarray()`, an alias of `to_numpy()`
- Add `Series.compute()` to return `self` and be compatible with `dask.Series.compute()`
- Add `Series.map_partitions()` to return `self.map()` and be compatible with `dask.Series.map_partitions()`
- Add `Series.persist()` to return `self` and be compatible with `dask.Series.persist()`
- Add `Series.repartition()` to return `self` and be compatible with `dask.Series.repartition()`
- Add `Series.visualize()` to return `visualize(self)` and be compatible with `dask.Series.visualize()`
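A small, hedged sketch of what these patches allow in `VDF_MODE=cudf`. It assumes a CUDA GPU with cudf installed and the usual `vdf` import alias:

```python
import virtual_dataframe as vdf  # assumes VDF_MODE=cudf is already selected

df = vdf.BackEndDataFrame({"a": [1.0, 2.0, 3.0]})  # a cudf.DataFrame in this mode
arr = df.to_ndarray()      # patched: converts the DataFrame to a cupy.ndarray
(df2,) = vdf.compute(df)   # patched: compute() simply returns its arguments as a tuple
```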
#### modin or dask_modin

- Set `MODIN_ENGINE=dask` for `dask_modin`
- Set `MODIN_ENGINE=python` for `modin`
- Add `vdf.BackEndDataFrame = modin.pandas.DataFrame`
- Add `vdf.BackEndSeries = modin.pandas.Series`
- Add `vdf.BackEndArray = numpy.ndarray`
- Add `vdf.BackEndPandas = modin.pandas`
- Add `vdf.FrontEndPandas = modin.pandas`
- Add `vdf.FrontEndNumpy = numpy`
- Add `vdf.compute()` to return a tuple of its arguments and be compatible with `dask.compute()`
- Add `vdf.concat()`, an alias of `modin.pandas.concat()`
- Add `vdf.delayed()` to delay a call and be compatible with `dask.delayed()`
- Add `vdf.persist()` to return its parameters and be compatible with `dask.persist()`
- Add `vdf.visualize()` to return an empty image and be compatible with `dask.visualize()`
- Add `vdf.from_pandas()` to return a modin DataFrame or Series and be compatible with `dask.from_pandas()`
- Add `vdf.from_backend()`, an alias of `from_pandas()`
- Add `vdf.numpy`, an alias of the `numpy` module
- Remove the extra parameters used by Dask in `*.to_csv()`, `*.to_excel()`, `*.to_feather()`, `*.to_hdf()` and `*.to_json()`
- Add a warning when using `read_excel()`, `read_feather()`, `read_fwf()`, `read_hdf()`, `read_sql_table()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_hdf()`, `DF.to_sql()`, `Series.to_csv()`, `Series.to_excel()`, `Series.to_hdf()` and `Series.to_json()`
- Update the pandas API to accept glob filenames in `vdf.read_excel()`, `vdf.read_feather()`, `vdf.read_fwf()`, `vdf.read_hdf()`, `vdf.read_orc()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_hdf()`, `DF.to_sql()`, `Series.to_csv()`, `Series.to_excel()`, `Series.to_hdf()` and `Series.to_json()`
- Add methods with `_not_implemented`: `DF.to_orc()`
- Add `DF.to_pandas()` to convert to a `pandas.DataFrame`
- Add `DF.to_backend()` to return `self`
- Add `DF.to_ndarray()`, an alias of `to_numpy()`
- Add `DF.apply_rows()` to be compatible with `cudf.apply_rows()`
- Add `DF.map_partitions()` to be compatible with `dask.map_partitions()`
- Add `DF.compute()` to return `self` and be compatible with `dask.DataFrame.compute()`
- Add `DF.repartition()` to return `self` and be compatible with `dask.DataFrame.repartition()`
- Add `DF.visualize()` to return `visualize(self)` and be compatible with `dask.DataFrame.visualize()`
- Add `DF.categorize()` to return `self` and be compatible with `dask.DataFrame.categorize()`
- Add `Series.to_pandas()` to return `modin.pandas.Series.to_pandas()`
- Add `Series.to_backend()` to return `self`
- Add `Series.to_ndarray()`, an alias of `to_numpy()`
- Add `Series.compute()` to return `self` and be compatible with `dask.Series.compute()`
- Add `Series.map_partitions()` to return `self.map()` and be compatible with `dask.Series.map_partitions()`
- Add `Series.persist()` to return `self` and be compatible with `dask.Series.persist()`
- Add `Series.repartition()` to return `self` and be compatible with `dask.Series.repartition()`
- Add `Series.visualize()` to return `visualize(self)` and be compatible with `dask.Series.visualize()`
- And all the patches applied for pandas
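A short, hedged sketch of the round trip between modin and pandas objects in `VDF_MODE=modin` (or `dask_modin`). The `vdf` import alias and the `npartitions` argument are assumptions carried over from the Dask-compatible signature:

```python
import pandas

import virtual_dataframe as vdf  # assumes VDF_MODE=modin or dask_modin

mdf = vdf.from_pandas(pandas.DataFrame({"a": [1, 2, 3]}), npartitions=1)
backend_df = mdf.to_backend()    # patched: returns self (still a modin object)
plain_df = mdf.to_pandas()       # patched: converts back to a pandas.DataFrame
```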
#### dask

- Add `vdf.BackEndDataFrame = pandas.DataFrame`
- Add `vdf.BackEndSeries = pandas.Series`
- Add `vdf.BackEndArray = numpy.ndarray`
- Add `vdf.BackEndPandas = pandas`
- Add `vdf.FrontEndPandas = dask.dataframe`
- Add `vdf.FrontEndNumpy = dask.array`
- Add `vdf.concat()`, an alias of `dask.dataframe.multi.concat()`
- Add `vdf.from_pandas()`, an alias of `dask.dataframe.from_pandas()`
- Add `vdf.from_backend()`, an alias of `from_pandas()`
- Add `vdf.numpy`, an alias of the `numpy` module
- Add a warning in `read_fwf()`, `read_hdf()` and `read_sql_table()`
- Add methods with `_not_implemented`: `read_excel()`, `read_feather()`, `DF.to_excel()`, `DF.to_feather()` and `DF.to_fwf()`
- Add `DF.to_pandas()` to return `self.compute()`
- Add `DF.to_backend()`, an alias of `to_pandas()`
- Add `DF.to_numpy()` to return `self.compute().to_numpy()`
- Add `DF.to_ndarray()`, an alias of `dask.DataFrame.to_dask_array()`
- Add `DF.apply_rows()` to be compatible with `cudf.apply_rows()`
- Patch `DF.to_sql()` and `Series.to_sql()` to accept `con` or `uri`
- Add `Series.to_pandas()` to return `self.compute()`
- Add `Series.to_backend()`, an alias of `to_pandas()`
- Add `Series.to_numpy()` to return `self.compute().to_numpy()`
- Add `Series.to_ndarray()`, an alias of `dask.dataframe.Series.to_dask_array()`
- And all the patches applied for pandas
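A hedged sketch of how these patches are meant to be used in `VDF_MODE=dask`. It assumes an implicit local scheduler and the usual `vdf` import alias:

```python
import pandas

import virtual_dataframe as vdf  # assumes VDF_MODE=dask is already selected

ddf = vdf.from_pandas(pandas.DataFrame({"a": range(6)}), npartitions=2)
lazy = ddf[ddf.a % 2 == 0]   # still a lazy dask.dataframe.DataFrame
local = lazy.to_pandas()     # patched: equivalent to lazy.compute()
arr = lazy.to_ndarray()      # patched: alias of dask.DataFrame.to_dask_array()
```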
#### dask_cudf

- Add `vdf.BackEndDataFrame = cudf.DataFrame`
- Add `vdf.BackEndSeries = cudf.Series`
- Add `vdf.BackEndArray = cudf`
- Add `vdf.BackEndPandas = pandas`
- Add `vdf.FrontEndPandas = dask_cudf`
- Add `vdf.FrontEndNumpy = cupy`
- Add `vdf.compute()`, mapped to `dask.compute()`
- Add `vdf.concat()`, mapped to `dask.dataframe.multi.concat()`
- Add `vdf.delayed()`, mapped to `dask.delayed()`
- Add `vdf.persist()`, mapped to `dask.persist()`
- Add `vdf.visualize()`, mapped to `dask.visualize()`
- Add `vdf.from_pandas()`, mapped to `dask_cudf.from_cudf()`
- Add `vdf.from_backend()`, mapped to `dask_cudf.from_cudf()`
- Add `vdf.numpy`, an alias of the `cupy` module
- Add a warning in `Series.to_hdf()` and `Series.to_json()`
- Add methods with `_not_implemented`: `read_excel()`, `read_feather()`, `read_fwf()`, `read_hdf()`, `read_sql_table()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_fwf()`, `DF.to_hdf()`, `DF.to_sql()`, `Series.to_csv()` and `Series.to_excel()`
- Add `DF.to_pandas()` to return `self.compute().to_pandas()`
- Add `DF.to_backend()` to return `self.compute()` (a `cudf.DataFrame`)
- Add `DF.to_numpy()` to return `self.compute().to_numpy()`
- Add `DF.to_ndarray()` to return `self.compute()` (a `cudf.DataFrame`)
- Add `Series.to_pandas()` to return `self.compute().to_pandas()`
- Add `Series.to_backend()` to return `self.compute()` (a `cudf.Series`)
- Add `Series.to_numpy()` to return `self.compute().to_numpy()`
- Add `Series.to_ndarray()` to return a `cudf.Series`
- Add `Series.compute()` to return `self` and be compatible with `dask.Series.compute()`
- Add `Series.map_partitions()` to return `self.map()` and be compatible with `dask.Series.map_partitions()`
- Add `Series.persist()` to return `self` and be compatible with `dask.Series.persist()`
- Add `Series.repartition()` to return `self` and be compatible with `dask.Series.repartition()`
- Add `Series.visualize()` to return `visualize(self)` and be compatible with `dask.Series.visualize()`
- And all the patches applied for cudf
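A hedged sketch for `VDF_MODE=dask_cudf`, assuming GPU-enabled Dask workers (or a local CUDA machine), the `vdf` import alias, and that `npartitions` is forwarded to `dask_cudf.from_cudf()`:

```python
import virtual_dataframe as vdf  # assumes VDF_MODE=dask_cudf is already selected

gdf = vdf.BackEndDataFrame({"a": [1, 2, 3]})   # a cudf.DataFrame
ddf = vdf.from_backend(gdf, npartitions=1)     # wrapped with dask_cudf.from_cudf()
gpu_df = ddf.to_backend()   # patched: self.compute(), i.e. a cudf.DataFrame
cpu_df = ddf.to_pandas()    # patched: self.compute().to_pandas(), a pandas.DataFrame
```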
#### pyspark

- Add `vdf.BackEndDataFrame = pandas.DataFrame`
- Add `vdf.BackEndSeries = pandas.Series`
- Add `vdf.BackEndArray = numpy.ndarray`
- Add `vdf.BackEndPandas = pandas`
- Add `vdf.FrontEndPandas = pyspark.pandas`
- Add `vdf.FrontEndNumpy = numpy`
- Add `vdf.compute()` to return a tuple of its arguments and be compatible with `dask.compute()`
- Add `vdf.concat()`, an alias of `pyspark.pandas.concat()`
- Add `vdf.delayed()` to delay a call and be compatible with `dask.delayed()`
- Add `vdf.persist()` to persist the current DataFrame
- Add `vdf.visualize()` to return an empty image and be compatible with `dask.visualize()`
- Add `vdf.from_backend()`, an alias of `from_pandas()`
- Add `vdf.numpy`, an alias of the `numpy` module
- Remove the extra parameters used by Dask in `*.to_csv()`, `*.to_excel()`, `*.to_feather()`, `*.to_hdf()`, `*.to_json()` and `from_pandas()`
- Add a warning in `read_excel()` and `read_sql_table()`
- Update the pandas API to accept glob filenames in `vdf.read_csv()`, `vdf.read_excel()`, `vdf.read_json()`, `vdf.read_orc()`, `DF.to_csv()`, `DF.to_excel()`, `DF.to_feather()`, `DF.to_hdf()`, `DF.to_json()`, `Series.to_csv()`, `Series.to_excel()`, `Series.to_hdf()` and `Series.to_json()`
- Add methods with `_not_implemented`: `vdf.read_feather()`, `vdf.read_fwf()`, `vdf.read_hdf()`, `DF.to_sql()` and `Series.to_sql()`
- Add `DF.to_backend()`, an alias of `to_pandas()`
- Add `DF.to_ndarray()`, an alias of `to_numpy()`
- Add `DF.apply_rows()` to be compatible with `cudf.apply_rows()`
- Add `DF.categorize()` to return `self` and be compatible with `dask.DataFrame.categorize()`
- Add `DF.compute()` to return `self` and be compatible with `dask.DataFrame.compute()`
- Add `DF.map_partitions()` to be compatible with `dask.map_partitions()`
- Add `DF.persist()` to return `self` and be compatible with `dask.DataFrame.persist()`
- Add `DF.repartition()` to return `self` and be compatible with `dask.DataFrame.repartition()`
- Add `DF.visualize()` to return `visualize(self)` and be compatible with `dask.DataFrame.visualize()`
- Add `Series.to_backend()`, an alias of `to_pandas()`
- Add `Series.to_ndarray()`, an alias of `to_numpy()`
- Add `Series.compute()` to return `self` and be compatible with `dask.Series.compute()`
- Add `Series.map_partitions()` to return `self.map()` and be compatible with `dask.Series.map_partitions()`
- Add `Series.persist()` to return `self` and be compatible with `dask.Series.persist()`
- Add `Series.repartition()` to return `self` and be compatible with `dask.Series.repartition()`
- Add `Series.visualize()` to return `visualize(self)` and be compatible with `dask.Series.visualize()`
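A hedged sketch for `VDF_MODE=pyspark`, assuming a Spark session can be created implicitly and the usual `vdf` import alias:

```python
import virtual_dataframe as vdf  # assumes VDF_MODE=pyspark is already selected

psdf = vdf.FrontEndPandas.DataFrame({"a": [1, 2, 3]})  # a pyspark.pandas.DataFrame
same = psdf.compute()      # patched: returns self, for Dask compatibility
pdf = psdf.to_backend()    # patched: alias of to_pandas(), a pandas.DataFrame
```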
### Numpy-like family

#### Numpy

It is not possible to patch some of the methods of `numpy.ndarray`.

`vdf.numpy` is an alias of the `numpy` module.
- Add `vdf.numpy.asnumpy(ar)` to return `ar`
- Add `vdf.numpy.asndarray(ar)` to return `ar.to_numpy()`
- Add `vdf.numpy.compute(...)` to return a tuple with the parameters
- Add `vdf.numpy.compute_chunk_sizes(ar)` to return `ar`
- Add `vdf.numpy.rechunk(ar)` to return `ar`
- Add `vdf.numpy.arange()` to remove the parameter `chunks`, invoke `numpy.arange()` and return a view with `Vndarray`
- Add `vdf.numpy.from_array()` to remove the parameter `chunks`, invoke `numpy.arange()` and return a view with `Vndarray`
- Add `vdf.numpy.load()` to remove the parameter `chunks`
- Add `vdf.numpy.save()` to remove the parameter `chunks`
- Add `vdf.numpy.savez()` to remove the parameter `chunks`
- Add `vdf.numpy.random.*` to remove the parameter `chunks`
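A hedged sketch of these no-op wrappers in `VDF_MODE=numpy`, assuming the usual `vdf` import alias:

```python
import virtual_dataframe as vdf  # assumes VDF_MODE=numpy is already selected

vnp = vdf.numpy                 # alias of numpy in this mode
a = vnp.arange(6, chunks=3)     # the 'chunks' parameter is accepted and dropped
a = vnp.compute_chunk_sizes(a)  # no-op: returns the array unchanged
plain = vnp.asnumpy(a)          # no-op: returns the array itself
```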
#### cupy

`vdf.numpy` is an alias of the `cupy` module.
- Add `vdf.numpy.asndarray(ar)` to return `ar.to_numpy()`
- Add `vdf.numpy.compute(...)` to return a tuple with the parameters
- Add `vdf.numpy.compute_chunk_sizes(ar)` to return `ar`
- Add `vdf.numpy.rechunk(ar)` to return `ar`
- Add `vdf.numpy.arange()` to remove the parameter `chunks`, invoke `numpy.arange()` and return a view with `Vndarray`
- Add `vdf.numpy.from_array()` to remove the parameter `chunks`, invoke `numpy.arange()` and return a view with `Vndarray`
- Add `vdf.numpy.load()` to remove the parameter `chunks`
- Add `vdf.numpy.save()` to remove the parameter `chunks`
- Add `vdf.numpy.savez()` to remove the parameter `chunks`
- Add `vdf.numpy.random.*` to remove the parameter `chunks`
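The same pattern, sketched for `VDF_MODE=cupy`. It assumes a CUDA GPU and the usual `vdf` import alias:

```python
import virtual_dataframe as vdf  # assumes VDF_MODE=cupy is already selected

vnp = vdf.numpy                 # alias of cupy in this mode
a = vnp.arange(6, chunks=3)     # the 'chunks' parameter is accepted and dropped
a = vnp.compute_chunk_sizes(a)  # no-op: returns the array unchanged
a = vnp.rechunk(a)              # no-op: returns the array unchanged
```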
#### dask_array

`vdf.numpy` is an alias of the `dask.array` module.
- Add `vdf.numpy.asarray(ar)` to return a numpy or cupy array
- Add `vdf.numpy.asndarray(ar)` to return `ar.to_numpy()`
- Add `vdf.numpy.compute(...)` to return a tuple with the parameters
- Add `vdf.numpy.compute_chunk_sizes(ar)` to return `ar`
- Add `vdf.numpy.rechunk(ar)` to return `ar`
- Add `vdf.numpy.arange()` to remove the parameter `chunks`, invoke `numpy.arange()` and return a view with `Vndarray`
- Add `vdf.numpy.from_array()` to remove the parameter `chunks`, invoke `numpy.arange()` and return a view with `Vndarray`
- Add `vdf.numpy.load()` to remove the parameter `chunks`
- Add `vdf.numpy.save()` to remove the parameter `chunks`
- Add `vdf.numpy.savez()` to remove the parameter `chunks`
- Add `vdf.numpy.random.*` to remove the parameter `chunks`
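And a last hedged sketch for `VDF_MODE=dask_array`, again assuming the usual `vdf` import alias:

```python
import numpy

import virtual_dataframe as vdf  # assumes VDF_MODE=dask_array is already selected

vnp = vdf.numpy                      # alias of dask.array in this mode
a = vnp.from_array(numpy.arange(8))  # the patched from_array() described above
a = vnp.rechunk(a)                   # the patched wrapper described above
plain = vnp.asarray(a)               # patched: returns a plain numpy (or cupy) array
```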