
Project Jupyter
The open-source standard for interactive computing, data science, and scientific research.
Vaex democratizes big data, making it accessible to anyone, on any machine, at any scale.

Vaex is a high-performance Python library for lazy, out-of-core DataFrames (similar to Pandas) that lets you explore and visualize tabular datasets of up to billions of rows on a single machine. It combines memory mapping, out-of-core algorithms, and a lazy expression system to handle datasets far larger than available RAM. Vaex interoperates with the Python data science ecosystem, including Pandas, NumPy, Scikit-learn, and Apache Arrow, and offers a Pandas-like API for data manipulation and analysis. It is aimed at data scientists, analysts, and engineers who need to work with large datasets without a distributed computing framework, so that prototyping and deployment can stay on one machine.
Vaex uses memory mapping to open datasets directly from disk without first loading them into RAM, so it can work with datasets larger than available memory. The operating system pages in only the parts of the file that a computation actually touches, which keeps memory usage low and makes opening a file fast regardless of its size.
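To make the idea concrete, here is a minimal sketch of memory-mapped, chunked access using only the Python standard library. It is not Vaex's implementation: the raw float64 file layout and the helper names `write_column` and `mapped_sum` are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("d")  # one native-endian float64 per row (assumed layout)


def write_column(path, n_rows):
    """Write the values 0.0, 1.0, ... as a raw binary float64 column."""
    with open(path, "wb") as f:
        for i in range(n_rows):
            f.write(ITEM.pack(float(i)))


def mapped_sum(path, chunk_rows=1024):
    """Sum the column through a memory map, decoding one chunk at a time."""
    total = 0.0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_rows = len(mm) // ITEM.size
        for start in range(0, n_rows, chunk_rows):
            count = min(chunk_rows, n_rows - start)
            # Only the bytes of this chunk are read from the mapping.
            values = struct.unpack_from(f"{count}d", mm, start * ITEM.size)
            total += sum(values)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "x.f64")
        write_column(path, 10_000)
        print(mapped_sum(path))
```

At most `chunk_rows` decoded values are held in memory at once, however large the file is.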
Vaex employs lazy evaluation: calculations are performed only when a result is needed. This avoids unnecessary computation and keeps the memory footprint small, because intermediate columns are not materialized.
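The following toy class sketches what deferring work looks like in practice. It is a conceptual illustration only: `LazyColumn` and its `computed` counter are hypothetical names, not part of Vaex.

```python
import itertools


class LazyColumn:
    """A column defined by a recipe; values are produced only on request."""

    def __init__(self, rows, steps=()):
        self._rows = rows            # underlying sequence, never copied
        self._steps = tuple(steps)   # deferred element-wise functions
        self.computed = 0            # number of elements actually computed

    def map(self, func):
        """Return a new lazy column; no data is touched here."""
        return LazyColumn(self._rows, self._steps + (func,))

    def __iter__(self):
        for value in self._rows:
            for step in self._steps:
                value = step(value)
            self.computed += 1
            yield value

    def head(self, n):
        """Compute only the first n values."""
        return list(itertools.islice(self, n))


if __name__ == "__main__":
    x = LazyColumn(range(1_000_000))
    z = x.map(lambda v: v * v).map(lambda v: v + 1)
    print(z.computed)   # nothing has been computed yet
    print(z.head(3))    # only three elements are evaluated
    print(z.computed)
```

Defining `z` costs nothing; the work happens when `head` pulls values, and only for the rows it needs.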
Vaex's expression system allows users to perform complex calculations and transformations on DataFrames using a concise and intuitive syntax. Expressions are automatically optimized and executed efficiently.
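As a rough sketch of how such an expression system can be built, the class below records arithmetic on columns as a symbolic expression and evaluates it only on request. The `Expr` class and its methods are hypothetical and far simpler than what Vaex does internally.

```python
import operator


class Expr:
    """A symbolic expression over named columns, built by operator overloading."""

    def __init__(self, text, fn):
        self.text = text   # human-readable form of the expression
        self.fn = fn       # function mapping a row dict to a value

    @staticmethod
    def column(name):
        return Expr(name, lambda row: row[name])

    @staticmethod
    def _wrap(value):
        if isinstance(value, Expr):
            return value
        return Expr(repr(value), lambda row: value)  # constant

    def _binary(self, other, symbol, op):
        other = Expr._wrap(other)
        return Expr(
            f"({self.text} {symbol} {other.text})",
            lambda row: op(self.fn(row), other.fn(row)),
        )

    def __add__(self, other):
        return self._binary(other, "+", operator.add)

    def __mul__(self, other):
        return self._binary(other, "*", operator.mul)

    def __pow__(self, other):
        return self._binary(other, "**", operator.pow)

    def __gt__(self, other):
        return self._binary(other, ">", operator.gt)

    def evaluate(self, rows):
        """Evaluate the expression for each row (a dict of column values)."""
        return [self.fn(row) for row in rows]


if __name__ == "__main__":
    x, y = Expr.column("x"), Expr.column("y")
    z = x**2 + y
    rows = [{"x": 2, "y": 1}, {"x": 3, "y": 4}]
    print(z.text)
    print(z.evaluate(rows))
    print((z > 10).evaluate(rows))
```

Because the expression is kept as a description rather than a result, a library is free to inspect it, optimize it, and run it chunk by chunk.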
Vaex provides optimized visualization routines for creating histograms, scatter plots, and other visualizations of large datasets. These routines leverage memory mapping and lazy evaluation for fast and interactive exploration.
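One way to plot data that does not fit in memory is to reduce it to counts on a fixed grid in a single pass, then draw only the counts. The sketch below shows that reduction for a 1D histogram; `chunked_histogram` is a hypothetical helper written for illustration, not a Vaex function.

```python
def chunked_histogram(chunks, lo, hi, bins):
    """Count values in [lo, hi) into equal-width bins, one chunk at a time."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for chunk in chunks:
        for value in chunk:
            if lo <= value < hi:
                # min() guards against rounding pushing a value into bin `bins`.
                index = min(int((value - lo) / width), bins - 1)
                counts[index] += 1
    return counts


if __name__ == "__main__":
    chunks = ([0.5, 1.5, 2.5], [2.9, 9.99, 10.0, -1.0])
    print(chunked_histogram(chunks, lo=0.0, hi=10.0, bins=5))
```

The result has a fixed size (`bins` integers) no matter how many rows were scanned, which is what makes the plotting step cheap.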
Vaex integrates with Scikit-learn and other machine learning libraries to enable out-of-core model training. This allows users to build models on datasets larger than available RAM.
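To illustrate what out-of-core training means, the sketch below fits a one-variable linear regression by accumulating running sums over batches, so that no more than one batch is in memory at a time. This is a swapped-in textbook technique (closed-form least squares from sufficient statistics), not the method Vaex or Scikit-learn uses; `batches` and `fit_line_in_chunks` are hypothetical names.

```python
def batches(n_rows, batch_size):
    """Yield (xs, ys) batches of synthetic data following y = 3x + 2."""
    for start in range(0, n_rows, batch_size):
        xs = [float(i) for i in range(start, min(start + batch_size, n_rows))]
        ys = [3.0 * x + 2.0 for x in xs]
        yield xs, ys


def fit_line_in_chunks(chunks):
    """Least-squares fit of y = slope * x + intercept from streamed batches."""
    n = sx = sy = sxx = sxy = 0.0
    for xs, ys in chunks:
        for x, y in zip(xs, ys):
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    print(fit_line_in_chunks(batches(n_rows=10, batch_size=3)))
```

Only five running totals persist between batches, so memory use is independent of the number of rows.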
Install Vaex using pip: `pip install vaex`.
Import Vaex in your Python environment: `import vaex`.
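Load a large dataset into a Vaex DataFrame: `df = vaex.open('your_data.hdf5')` memory-maps the file, whereas `df = vaex.from_csv('your_data.csv', sep=',')` parses the CSV into memory by default; pass `convert=True` to `from_csv` to convert the CSV to HDF5 once and memory-map the result.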
Explore the DataFrame using familiar Pandas-like syntax: `df.head()`, `df.describe()`.
Visualize data using Vaex's fast plotting capabilities: `df.plot1d(df.x, selection=df.y > 10)` (Vaex 4.x also exposes this plot as `df.viz.histogram`).
Perform calculations and transformations on the DataFrame using expressions: `df['z'] = df.x**2 + df.y`.
Build and train machine learning models with Scikit-learn: `from sklearn.linear_model import LinearRegression; model = LinearRegression(); model.fit(df.x.values.reshape(-1, 1), df.y.values); df['y_pred'] = model.predict(df.x.values.reshape(-1, 1))`. Note that `.values` materializes each column as an in-memory NumPy array, so this form only suits columns that fit in RAM.
Verified feedback from other users.
"Vaex is known for its speed and efficiency in handling large datasets, making it a powerful tool for data exploration and analysis."


The collaborative workspace for data science and analytics, combining notebooks, data apps, and AI assistance in one platform.
Google Earth Engine is a planetary-scale platform for Earth science data and analysis, providing access to a multi-petabyte catalog of satellite imagery and geospatial datasets.

End-to-end AI data development platform for frontier AI and agentic systems.

End-to-end data science platform for faster insights and greater impact.

Accelerate data science workflows with open-source libraries on GPUs.