Test in a live environment
Test in production without watermarks.
Works wherever you need it to.
fastparquet is a Python library designed to handle the Parquet file format, which is commonly used in big data workflows. It integrates well with other Python-based data processing tools like Dask and Pandas. Let’s explore its features and see some code examples. Later in this article, we will also learn about IronPDF, a PDF generation library from Iron Software.
fastparquet is efficient and supports a wide range of Parquet features. Some of its key features include:
Easily read from and write to Parquet files and other data files.
Seamlessly work with Pandas DataFrames and Dask for parallel processing.
Supports various compression algorithms like gzip, snappy, brotli, lz4, and zstandard in data files.
Optimized for both storage and retrieval of large datasets or data files using parquet columnar file format and metadata file pointing to file.
You can install fastparquet using pip:
pip install fastparquet
Or using conda:
conda install -c conda-forge fastparquet
Here’s a simple example to get you started with fastparquet.
You can write a Pandas DataFrame to a Parquet file:
import pandas as pd
import fastparquet
# Create a sample DataFrame
df = {
'name': ['Alice', 'Bob', 'Charlie'],
'age': [25, 30, 35],
'city': ['New York', 'Los Angeles', 'Chicago']
}
# Write the DataFrame to single output file using single file path
df.to_parquet('example.parquet', engine='fastparquet')
# Display message
print("DataFrame successfully written to 'example.parquet'.")
You can read a Parquet file into a Pandas DataFrame:
import pandas as pd
import fastparquet
# Read a Parquet file
df = pd.read_parquet('example.parquet', engine='fastparquet')
# Display the DataFrame
print(df.head())
import fastparquet as fp
# Reading metadata from Parquet file
meta = fp.ParquetFile('example.parquet').metadata
print("Parquet file metadata:")
print(meta)
fastparquet python integrates well with Dask for handling large datasets in parallel:
import dask.dataframe as dd
# Read a Parquet file into a Dask DataFrame
ddf = dd.read_parquet('example.parquet', engine='fastparquet')
# Perform operations on the Dask DataFrame
result = ddf.groupby('name').mean().compute()
# Display the result for simple data types
print(result)
You can specify different compression algorithms when writing Parquet files:
import pandas as pd
import fastparquet
# Create a sample DataFrame
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Charlie'],
'age': [25, 30, 35]
})
# Write the DataFrame to a Parquet file with gzip compression
df.to_parquet('example.parquet', engine='fastparquet', compression='gzip')
IronPDF is a robust Python library crafted for generating, modifying, and digitally signing PDF documents derived from HTML, CSS, images, and JavaScript. It excels in performance while maintaining a minimal memory footprint. Here are its key features:
Convert HTML files, HTML strings, and URLs into PDF documents with IronPDF. For instance, effortlessly render webpages into PDFs using the Chrome PDF renderer.
Compatible with Python 3+ across Windows, Mac, Linux, and various Cloud Platforms. IronPDF is also accessible for .NET, Java, Python, and Node.js environments.
Modify document properties, enhance security with password protection and permissions, and integrate digital signatures into your PDFs using IronPDF.
Tailor PDFs with customized headers, footers, page numbers, and adjustable margins. It supports responsive layouts and accommodates custom paper sizes.
Conforms to PDF standards like PDF/A and PDF/UA. It handles UTF-8 character encoding and manages assets such as images, CSS stylesheets, and fonts effectively.
# install latest version of the libraries
pip install fastparquet
pip install pandas
pip install ironpdf
The following code example demonstrates the use of fastparquet and IronPDF together in Python:
import pandas as pd
import fastparquet as fp
from ironpdf import *
# Apply your license key
License.LicenseKey = "your Key"
# Sample DataFrame
data = {
'name': ['Alice', 'Bob', 'Charlie'],
'age': [25, 30, 35],
'city': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Writing DataFrame to a Parquet file
fp.write('example.parquet', df)
# Reading from Parquet file into DataFrame
df_read = fp.ParquetFile('example.parquet').to_pandas()
# Displaying the read DataFrame
print("Original DataFrame:")
print(df)
print("\nDataFrame read from Parquet file:")
print(df_read)
renderer = ChromePdfRenderer()
# Create a PDF from a HTML string using Python
content = "<h1>Awesome Iron PDF with FastParquet</h1>"
content += "<p> Original DataFrame:"+"</p>"
content += "<p>"+f"{str(df)}"+"</p>"
content += "<p> DataFrame read from Parquet file:"+"</p>"
content += "<p>"+f"{str(df_read)}"+"</p>"
pdf = renderer.RenderHtmlAsPdf(content)
# Export to a file or Stream
pdf.SaveAs("Demo-FastParquet.pdf")
This code snippet demonstrates how to utilize several Python libraries to manipulate data and generate a PDF document from HTML content.
`) displaying the original DataFrame (`df`) and the DataFrame read from the Parquet file (`df_read`).
Code demonstrates a sample code for FastParquet, and then it seamlessly integrates data processing capabilities with PDF generation, making it useful for creating reports or documents based on data stored in parquet files.
IronPDF page.
Place the License Key at the start of the script before using IronPDF package:
from ironpdf import *
# Apply your license key
License.LicenseKey = "key"
fastparquet is a powerful and efficient library for working with parquet files in Python. Its integration with Pandas and Dask makes it a great choice for handling large datasets in a Python-based big data workflow. IronPDF is a robust Python library that facilitates the creation, manipulation, and rendering of PDF documents directly from Python applications. It simplifies tasks such as converting HTML content into PDF documents, creating interactive forms, and performing various PDF manipulations like merging files or adding watermarks. IronPDF integrates seamlessly with existing Python frameworks and environments, providing developers with a versatile solution for generating and customizing PDF documents dynamically. Together with fastparquet and IronPDF data, manipulation of parquet file format and PDF generation can be done seamlessly.
IronPDF offers a comprehensive documentation and code examples to help developers make the best of its features. For more information, please refer to the documentation and code example pages.
9 .NET API products for your office documents