
Trino
Fast distributed SQL query engine for big data analytics.

Liberating data tables locked inside PDF files.

Tabula is an open-source tool that extracts data tables from PDF documents into CSV, Microsoft Excel, or JSON files. It addresses the common problem of getting at tabular data embedded in PDF files, particularly text-based PDFs. Users work through a browser-based interface: upload a PDF, select the table region by clicking and dragging a box, preview the extracted data, and export it in the preferred format. Tabula runs on Windows, Mac, and Linux; Windows and Linux users need Java installed. Its design favors a simple, intuitive workflow for turning clunky documents into usable data, and because it is free and open source it is accessible to a wide range of users.
Allows users to define the area of the PDF page containing the table to be extracted, ignoring surrounding text and formatting.
Supports exporting extracted data in CSV, Microsoft Excel (.xlsx), and JSON formats.
Supports processing multiple PDF files in a batch, automating the extraction process for large datasets.
Provides a preview of the extracted data before exporting, enabling users to verify the accuracy of the extraction and adjust table selection if necessary.
Tabula is an open-source project, which allows users to contribute to its development, customize it to their needs, and use it without licensing fees.
Offers a graphical user interface for ease of use, allowing users without programming skills to perform data extraction tasks.
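To make the area-selection and export features above more concrete, here is a minimal conceptual sketch in Python. It is not Tabula's implementation: the `Fragment` structure, the function names, and the sample coordinates are all hypothetical, and it assumes the page text is already available as positioned fragments. It only illustrates the idea of keeping what falls inside a selection box, grouping it into rows, and writing the same rows out as CSV or JSON.

```python
import csv
import io
import json
from typing import NamedTuple


class Fragment(NamedTuple):
    """A piece of page text with its position (hypothetical structure)."""
    text: str
    x: float  # distance from the left edge of the page
    y: float  # distance from the top edge of the page


def select_area(fragments, top, left, bottom, right):
    """Keep only the fragments positioned inside the selection box."""
    return [f for f in fragments if left <= f.x <= right and top <= f.y <= bottom]


def group_rows(fragments, tolerance=2.0):
    """Group fragments with similar y into rows, cells ordered left to right."""
    rows = []
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if rows and abs(rows[-1][0].y - frag.y) <= tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return [[f.text for f in sorted(row, key=lambda f: f.x)] for row in rows]


def to_csv(rows):
    """Serialize rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize rows as a JSON array of arrays."""
    return json.dumps(rows)


# Made-up sample page: a heading, a two-row table, and a footer.
page = [
    Fragment("Quarterly report", 50, 40),
    Fragment("Region", 50, 100),
    Fragment("Sales", 200, 100),
    Fragment("North", 50, 120),
    Fragment("1200", 200, 120),
    Fragment("Page 1", 50, 700),
]

# The selection box covers only the table, so heading and footer are ignored.
table = group_rows(select_area(page, top=90, left=40, bottom=130, right=300))
print(to_csv(table))
print(to_json(table))
```

In this sketch the heading and footer fall outside the box and are dropped, which mirrors the "ignoring surrounding text" behavior described in the feature list.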
Download the appropriate version of Tabula for your operating system (Windows, Mac, or Linux).
If using Windows or Linux, ensure Java is installed. Download Java if necessary.
Extract the downloaded zip file to a local directory.
Navigate to the extracted folder and run the Tabula program.
If your web browser does not open automatically, open it manually and go to http://localhost:8080.
Upload a PDF file containing a data table to Tabula.
Browse to the page containing the table, then select the table by clicking and dragging to draw a box around it.
Click 'Preview & Export Extracted Data'.
Inspect the data preview to ensure it looks correct. Adjust the selection if data is missing.
Click the 'Export' button and choose the desired file format (CSV, Excel, or JSON).
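After exporting, the inspection step can be complemented with a quick automated check. The following is a minimal sketch, assuming a CSV export; the function name and the sample data are hypothetical and not part of Tabula. It flags rows whose cell count differs from the most common row width, which is one possible sign that the selection box cut off part of the table.

```python
import csv
import io
from collections import Counter


def find_ragged_rows(csv_text):
    """Return (row_number, cell_count) for rows whose width differs
    from the most common width in the file. Row numbers start at 1."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    # Treat the most frequent cell count as the expected table width.
    expected = Counter(len(row) for row in rows).most_common(1)[0][0]
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected
    ]


# Made-up export in which the third row lost its second cell.
exported = "Region,Sales\nNorth,1200\nSouth\nEast,900\n"
print(find_ragged_rows(exported))
```

To run the check on a real export, read the file's text (for example with `open(path, newline="").read()`, where `path` is wherever you saved the export) and pass it to `find_ragged_rows`. An empty result only means the rows are the same width; it does not prove that the cell contents are correct, so the visual preview is still worth doing.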
Verified feedback from other users.
"Generally positive reviews highlight its ease of use and effectiveness for extracting tabular data from PDFs, though some users experience issues with complex layouts."

Unlocking insights from unstructured data.

A visual data science platform combining visual analytics, data science, and data wrangling.

Open Source OCR Engine capable of recognizing over 100 languages.

Move your data easily, securely, and efficiently with Stitch, now part of Qlik Talend Cloud.

Open Source High-Performance Data Warehouse delivering Sub-Second Analytics for End Users and Agents at Scale.