Installation

Installing with conda

$ conda install -c conda-forge "virtual_dataframe"

Installing with pip

Use

$ pip install "virtual_dataframe"

Installing from the GitHub main branch

$ pip install "virtual_dataframe@git+https://github.com/pprados/virtual-dataframe"

Dependencies

You must install the other frameworks yourself (pandas, cudf, Dask, pyspark, ...) to use them with virtual_dataframe.
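
For example, to target the pandas and Dask backends you could install those packages alongside virtual_dataframe (a minimal sketch; pick the frameworks and versions that match your platform, and use the RAPIDS conda channels for cudf or dask_cudf):

$ pip install pandas "dask[dataframe,distributed]"
$ pip install pyspark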

You can create a set of conda virtual environments with the provided tool:

$ build-conda-vdf-env --help

For example:

$ build-conda-vdf-env pandas cudf dask_cudf pyspark pyspark_gpu-local
$ conda env list
$ conda activate vdf-cudf
$ conda activate vdf-dask_cudf-local

The VDF_MODE variable is set in each environment. If you create an environment for a Dask, Spark or PySpark framework, two environments are created: one vdf-XXX, in which you must set the VDF_CLUSTER variable yourself, and one vdf-XXX-local, with VDF_CLUSTER preset to dask://.local or spark://.local to use a local cluster.
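
For example, with the environments created above (the scheduler address below is only a hypothetical placeholder; replace it with your own cluster's):

$ conda activate vdf-dask_cudf                   # remote cluster: set VDF_CLUSTER yourself
$ export VDF_CLUSTER=dask://my-scheduler:8786    # hypothetical scheduler address
$ conda deactivate
$ conda activate vdf-dask_cudf-local             # local cluster: VDF_CLUSTER=dask://.local is preset
$ echo $VDF_CLUSTER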

For pyspark_gpu, some environment variables are set to reference the rapids-4-spark_2.12-22.10.0.jar file. You must place this file at the root of your project.
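
One way to download it, assuming the standard Maven Central location of the RAPIDS Accelerator for Apache Spark:

$ # run from the root of your project
$ wget https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar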

You can find all the environment YAML files here.

You can remove all or specific environments with:

$ build-conda-vdf-envs --remove
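
Presumably the framework names used at creation can also be passed to remove only the matching environments (an assumption; check the --help output for the exact syntax):

$ build-conda-vdf-envs --remove cudf dask_cudf   # assumption: removes only the listed environments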