
HTML to PDF Conversion: The Complete Developer Guide

Master HTML to PDF conversion with this in-depth guide covering techniques, libraries, and best practices for developers.

By Sarah Kim | 2026-03-10 | 10 min read

Converting HTML to PDF is one of the most common requirements in web development. From generating invoices and receipts to creating printable reports and documentation, the ability to transform web content into well-formatted PDF documents is essential for many applications. This guide covers the main conversion approaches, print-oriented CSS, headers and footers, common use cases, performance tuning, and hosted APIs as they stand in 2026.

Understanding the PDF Generation Landscape

PDF generation from HTML can be approached in several ways, each with distinct tradeoffs. The three main categories are:

**Browser-based rendering**: Using a headless browser, driven by a library such as Puppeteer or Playwright, to render HTML and print to PDF
**Dedicated PDF libraries**: Using libraries like wkhtmltopdf, Prince XML, or WeasyPrint
**API services**: Using cloud-based APIs that handle rendering and return PDF documents

Each approach has its strengths. Browser-based rendering provides the most accurate representation of modern web content, including JavaScript-rendered elements, CSS Grid, Flexbox, and web fonts. Dedicated libraries are often faster for simple documents but may not support cutting-edge CSS features. API services offer the convenience of browser-based rendering without the infrastructure overhead.

Browser-Based PDF Generation

Browser automation libraries like Playwright provide a built-in method for generating PDFs that uses the browser's own print functionality. In Playwright, `page.pdf()` is available only for Chromium. Here is an example:

javascript

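const { chromium } = require('playwright');

// Render an HTML string to a PDF and return it as a Buffer.
async function generatePdf(html, options = {}) {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    // Wait for network activity to settle so fonts and images have loaded.
    await page.setContent(html, { waitUntil: 'networkidle' });

    return await page.pdf({
      format: options.format || 'A4',
      margin: options.margin || {
        top: '20mm',
        bottom: '20mm',
        left: '15mm',
        right: '15mm'
      },
      printBackground: true, // include CSS background colors and images
      displayHeaderFooter: true,
      // An empty element keeps the header blank; an empty string may make
      // Chromium fall back to its default header.
      headerTemplate: options.headerTemplate || '<span></span>',
      footerTemplate: options.footerTemplate || '<div style="font-size:10px;text-align:center;width:100%"><span class="pageNumber"></span> / <span class="totalPages"></span></div>',
    });
  } finally {
    // Close the browser even if rendering fails, so no process is leaked.
    await browser.close();
  }
}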

This approach produces high-quality PDFs that closely match how the HTML appears in a browser. However, there are important considerations for print media.

Designing for Print

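When generating PDFs from HTML, you need to think about the document in terms of printed pages rather than scrollable web content. Playwright's `page.pdf()` renders with print media styles by default, so rules inside `@media print` apply. CSS provides specific features for print media: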

css

@media print {
  /* Control page breaks */
  .chapter { break-before: page; }   /* legacy alias: page-break-before: always */
  .no-break { break-inside: avoid; } /* legacy alias: page-break-inside: avoid */

  /* Hide navigation and interactive elements */
  nav, .sidebar, button { display: none; }

  /* Adjust typography for print */
  body {
    font-size: 12pt;
    line-height: 1.5;
    color: #000;
  }

  /* Show URLs after absolute links only (skips #anchors and mailto: links) */
  a[href^="http"]::after {
    content: " (" attr(href) ")";
    font-size: 0.8em;
    color: #666;
  }
}

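The @page rule sets page-level properties such as size and margins, and can define named pages for different sections of your document. An element opts into a named page through the `page` property (for example, `page: landscape`). With Playwright, the `format` option takes priority over a CSS page size unless you pass `preferCSSPageSize: true` to `page.pdf()`. A typical set of rules looks like this: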

css

@page {
  size: A4;
  margin: 25mm 15mm;
}

@page :first {
  margin-top: 50mm; /* Extra space for cover page */
}

@page landscape {
  size: A4 landscape;
}

Headers and Footers

PDF headers and footers are a common requirement for professional documents. When using browser-based rendering, you can use HTML templates with special CSS classes for dynamic content:

html

<!-- Header template -->
<div style="font-size: 10px; padding: 5mm 15mm; width: 100%; display: flex; justify-content: space-between;">
  <span>Company Name</span>
  <span>Confidential</span>
</div>

<!-- Footer template -->
<div style="font-size: 10px; padding: 5mm 15mm; width: 100%; display: flex; justify-content: space-between;">
  <span>Generated on <span class="date"></span></span>
  <span>Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>
</div>

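These templates support special CSS classes (date, title, url, pageNumber, and totalPages), which the PDF renderer populates automatically. In Playwright and Puppeteer, page styles are not visible inside templates and scripts are not evaluated, so style the templates inline and set an explicit font size. The header and footer are drawn inside the page margins, so leave margins large enough to fit them.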
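Because templates are plain strings, it can help to build them with a small function instead of hand-editing markup. The following is a minimal sketch; `buildFooterTemplate`, `escapeHtml`, and the `label` and `fontSize` options are hypothetical names chosen for this example, and only the `pageNumber` and `totalPages` classes come from the renderer.

```javascript
// Escape text so it is safe to interpolate into an HTML template.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a footer template string (hypothetical helper, not a library API).
// The renderer fills in the pageNumber and totalPages spans.
function buildFooterTemplate({ label = '', fontSize = '10px' } = {}) {
  return (
    `<div style="font-size: ${fontSize}; padding: 0 15mm; width: 100%; ` +
    'display: flex; justify-content: space-between;">' +
    `<span>${escapeHtml(label)}</span>` +
    '<span>Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>' +
    '</div>'
  );
}

const footer = buildFooterTemplate({ label: 'Acme & Co. <Confidential>' });
console.log(footer);
```

The resulting string can be passed as the `footerTemplate` option. Escaping the label matters when it comes from user data such as a customer name.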

Common Use Cases and Templates

Invoice Generation

Invoices are perhaps the most common use case for HTML to PDF conversion. A well-structured invoice template should include:

Company logo and branding
Invoice number and date
Billing and shipping addresses
Itemized line items with quantities, prices, and totals
Tax calculations
Payment terms and bank details
QR code for quick payment
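As an illustration of the line items and tax calculation, here is a minimal sketch that turns invoice data into HTML ready for conversion. The data shape (`number`, `date`, `taxRate`, `items` with `unitCents`) and the function names are assumptions made for this example; amounts are kept in integer cents to avoid floating-point rounding surprises.

```javascript
// Escape text before interpolating it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Format integer cents as a dollar string, e.g. 9900 -> "$99.00".
function formatCents(cents) {
  return `$${(cents / 100).toFixed(2)}`;
}

// Sum line items and apply a flat tax rate, rounding tax to the nearest cent.
function computeTotals(items, taxRate) {
  const subtotal = items.reduce((sum, item) => sum + item.quantity * item.unitCents, 0);
  const tax = Math.round(subtotal * taxRate);
  return { subtotal, tax, total: subtotal + tax };
}

// Render a minimal invoice body. Each row carries the no-break class so a
// print stylesheet can keep it from splitting across pages.
function renderInvoice(invoice) {
  const { subtotal, tax, total } = computeTotals(invoice.items, invoice.taxRate);
  const rows = invoice.items
    .map(item =>
      `<tr class="no-break"><td>${escapeHtml(item.description)}</td>` +
      `<td>${item.quantity}</td><td>${formatCents(item.unitCents)}</td>` +
      `<td>${formatCents(item.quantity * item.unitCents)}</td></tr>`)
    .join('\n');
  return [
    `<h1>Invoice #${escapeHtml(invoice.number)}</h1>`,
    `<p>Date: ${escapeHtml(invoice.date)}</p>`,
    `<table>${rows}</table>`,
    `<p>Subtotal: ${formatCents(subtotal)}</p>`,
    `<p>Tax: ${formatCents(tax)}</p>`,
    `<p><strong>Total: ${formatCents(total)}</strong></p>`,
  ].join('\n');
}

const invoice = {
  number: '1234',
  date: '2026-03-10',
  taxRate: 0.1,
  items: [
    { description: 'Widget', quantity: 2, unitCents: 1999 },
    { description: 'Setup fee', quantity: 1, unitCents: 5000 },
  ],
};
const invoiceHtml = renderInvoice(invoice);
console.log(invoiceHtml);
```

A real template would add the logo, addresses, payment terms, and QR code around this core; the totals logic stays the same.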

Report Generation

Business reports often combine narrative text with data tables, charts, and visualizations. When converting reports to PDF:

Use page breaks between sections for readability
Include a table of contents for longer documents
Ensure charts and graphs render at sufficient resolution
Consider landscape orientation for wide data tables

Documentation Export

Technical documentation exports benefit from:

Syntax-highlighted code blocks with print-friendly colors
Proper table formatting with page break handling
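Cross-reference links that show page numbers in print (this relies on CSS `target-counter()`, which print-focused engines such as Prince and WeasyPrint support; check your renderer before depending on it)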
An index or glossary section

Performance Optimization

PDF generation can be resource-intensive. Here are strategies to improve performance:

Template Caching: Pre-compile and cache your HTML templates. If you are using a templating engine, compile templates once and reuse them for each PDF generation.
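The compile-once idea can be sketched with a tiny placeholder engine and a `Map`. The `{{key}}` syntax and the `compileTemplate` and `getTemplate` names are assumptions for this example, standing in for whatever templating engine you actually use. Note that this sketch does not escape values.

```javascript
const templateCache = new Map();

// Compile a template once: split on {{ key }} placeholders so rendering
// only has to join precomputed parts. Odd-indexed parts are the keys.
// Values are inserted as-is (no HTML escaping in this sketch).
function compileTemplate(source) {
  const parts = source.split(/\{\{\s*(\w+)\s*\}\}/);
  return data =>
    parts.map((part, i) => (i % 2 === 1 ? String(data[part] ?? '') : part)).join('');
}

// Return the cached render function for a template, compiling on first use.
function getTemplate(name, source) {
  if (!templateCache.has(name)) {
    templateCache.set(name, compileTemplate(source));
  }
  return templateCache.get(name);
}

const source = '<h1>Invoice {{number}}</h1><p>For {{customer}}</p>';
const render = getTemplate('invoice', source);
console.log(render({ number: '1234', customer: 'Ada' }));
```

Later calls to `getTemplate('invoice', source)` return the same compiled function, so the parsing cost is paid once per process rather than once per PDF.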

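Browser Instance Pooling: Instead of launching a new browser for each PDF, maintain a pool of browser instances that can be reused. This avoids the startup overhead, which can range from roughly 500 ms to 2 seconds per launch depending on the environment. Open a fresh page for each job and close it afterward so state does not leak between documents.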
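A pool can be sketched independently of any browser library by passing in a factory function. `ResourcePool` below is a hypothetical class written for this example, not part of Playwright; it does not replace crashed instances or handle a factory failure while callers are waiting, which a production pool would need.

```javascript
// A minimal fixed-size pool (hypothetical helper for this sketch).
class ResourcePool {
  constructor(factory, size) {
    this.factory = factory; // async function that creates one resource
    this.size = size;       // maximum number of resources alive at once
    this.idle = [];         // resources ready for reuse
    this.created = 0;       // resources created so far
    this.waiters = [];      // resolvers for callers waiting on a free resource
  }

  async acquire() {
    if (this.idle.length > 0) return this.idle.pop();
    if (this.created < this.size) {
      this.created += 1;
      try {
        return await this.factory();
      } catch (err) {
        this.created -= 1; // creation failed, free the slot again
        throw err;
      }
    }
    // Pool is at capacity: wait until another caller releases a resource.
    return new Promise(resolve => this.waiters.push(resolve));
  }

  release(resource) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(resource);
    else this.idle.push(resource);
  }

  // Run a task with a pooled resource and always hand it back.
  async use(task) {
    const resource = await this.acquire();
    try {
      return await task(resource);
    } finally {
      this.release(resource);
    }
  }
}

// With Playwright, usage might look like this (assumption, shown as comments):
//   const pool = new ResourcePool(() => chromium.launch(), 2);
//   const pdf = await pool.use(async browser => {
//     const page = await browser.newPage();
//     try {
//       await page.setContent(html);
//       return await page.pdf({ format: 'A4' });
//     } finally {
//       await page.close();
//     }
//   });
```

Keeping the pool generic makes it easy to test with a fake factory and to swap in a different browser library later.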

Minimal Dependencies: Keep your HTML templates lean. Avoid loading large JavaScript frameworks or unnecessary stylesheets. Include only the CSS needed for the PDF layout.

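Image Optimization: Resize large images to the dimensions they will occupy on the page before including them in PDFs. Embedding critical images as Base64 data URIs avoids additional network requests during rendering, but Base64 inflates the data by roughly a third, so reserve it for images that are small.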
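Embedding an image as a data URI needs only Node's standard library. This is a minimal sketch; `bufferToDataUri`, `fileToDataUri`, and the `MIME_TYPES` table are names invented for this example, and the table covers only a few common formats.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Map file extensions to MIME types (deliberately small for this sketch).
const MIME_TYPES = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.gif': 'image/gif',
  '.svg': 'image/svg+xml',
  '.webp': 'image/webp',
};

// Encode raw bytes as a data URI usable in <img src="..."> or CSS url(...).
function bufferToDataUri(buffer, mimeType) {
  return `data:${mimeType};base64,${buffer.toString('base64')}`;
}

// Read an image from disk and return it as a data URI.
function fileToDataUri(filePath) {
  const mimeType = MIME_TYPES[path.extname(filePath).toLowerCase()];
  if (!mimeType) throw new Error(`Unsupported image type: ${filePath}`);
  return bufferToDataUri(fs.readFileSync(filePath), mimeType);
}

// Example with in-memory bytes instead of a real file:
console.log(bufferToDataUri(Buffer.from('hello'), 'image/png'));
```

In a template, the result goes straight into the `src` attribute, for example `<img src="${fileToDataUri('logo.png')}">`, where `logo.png` is a placeholder path.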

Parallel Processing: When generating multiple PDFs, process them in parallel using worker threads or a job queue. Set concurrency limits based on your available memory and CPU.
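A concurrency limit can be sketched without extra dependencies by running a fixed number of worker loops over a shared index. `mapWithConcurrency` is a hypothetical helper for this example; the `worker` callback would wrap your own PDF generation function.

```javascript
// Run worker(item, index) for every item with at most `limit` calls in
// flight at once. Results keep the order of the input array.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  async function runWorker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` loops; each pulls new work as soon as it is free.
  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => runWorker()));
  return results;
}

// Usage might look like this (generatePdf stands for your own function):
//   const pdfs = await mapWithConcurrency(htmlDocs, 4, html => generatePdf(html));
```

Because rendering happens in the browser process, plain promises are usually enough on the Node side; the limit mainly protects memory and CPU on the machine running the browsers.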

Using an API Service

For teams that want to avoid managing browser infrastructure, API services like CaptureAPI provide a straightforward alternative:

bash

curl -X POST "https://captureapi.dev/api/v1/pdf" \
  -H "X-API-Key: your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "html": "<h1>Invoice #1234</h1><p>Amount: $99.00</p>",
    "format": "A4",
    "margin": {"top": "25mm", "bottom": "25mm", "left": "15mm", "right": "15mm"}
  }' \
  -o invoice.pdf

API services take on browser management, font rendering, and scaling, so burst traffic does not require infrastructure changes on your side. The tradeoffs are that document content is sent to a third party, each request adds network latency, and usage is typically metered.
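The same request can be made from Node with the built-in `fetch` (Node 18 or later). This sketch mirrors the curl call above; the endpoint, header, and body fields are taken from that example, while `buildPdfRequest`, `requestPdf`, and the assumption that a non-2xx status signals failure and that the response body is the raw PDF are mine, so check the service's documentation before relying on them.

```javascript
// Build the URL and fetch options for a PDF request (hypothetical helper).
function buildPdfRequest(apiKey, html, options = {}) {
  return {
    url: 'https://captureapi.dev/api/v1/pdf',
    init: {
      method: 'POST',
      headers: {
        'X-API-Key': apiKey,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        html,
        format: options.format || 'A4',
        margin: options.margin || { top: '25mm', bottom: '25mm', left: '15mm', right: '15mm' },
      }),
    },
  };
}

// Send the request and return the PDF bytes as a Buffer.
// Assumes errors are reported through the HTTP status code.
async function requestPdf(apiKey, html, options) {
  const { url, init } = buildPdfRequest(apiKey, html, options);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`PDF request failed with status ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}

// Usage (inside an async function):
//   const pdf = await requestPdf(process.env.CAPTURE_API_KEY, '<h1>Invoice #1234</h1>');
//   require('node:fs').writeFileSync('invoice.pdf', pdf);
```

Separating request construction from the network call keeps the payload easy to unit test without contacting the service. The `CAPTURE_API_KEY` variable name is a placeholder.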

Best Practices Summary

Always design with print in mind from the start, using `@media print` styles alongside your screen styles.
Test your PDFs across different page sizes (A4, Letter) and orientations.
Handle page breaks explicitly to prevent awkward content splitting.
Use web-safe fonts or embed custom fonts to ensure consistent rendering.
Implement error handling for timeouts, memory limits, and malformed HTML.
Cache generated PDFs when the source content does not change frequently.
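The last two practices, timeouts and caching, can be combined in a small wrapper. This is a minimal sketch with hypothetical names (`withTimeout`, `cachedPdf`); the cache is an in-memory `Map` keyed by a SHA-256 hash of the HTML, and `render` stands for whichever generation function you use. The timeout rejects the caller's promise but does not cancel the underlying render.

```javascript
const { createHash } = require('node:crypto');

// Reject if the promise does not settle within `ms` milliseconds.
// The underlying work keeps running; only the caller stops waiting.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

const pdfCache = new Map();

// Return a cached PDF for identical HTML, rendering only on a cache miss.
// Failed or timed-out renders are not cached.
async function cachedPdf(html, render, timeoutMs = 30000) {
  const key = createHash('sha256').update(html).digest('hex');
  if (pdfCache.has(key)) return pdfCache.get(key);
  const pdf = await withTimeout(render(html), timeoutMs, 'PDF render');
  pdfCache.set(key, pdf);
  return pdf;
}

// Usage (inside an async function; generatePdf is your own renderer):
//   const pdf = await cachedPdf(html, generatePdf, 15000);
```

A long-running service would also need an eviction policy or an external store, since this `Map` grows without bound.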

HTML to PDF conversion in 2026 is more reliable and accessible than ever, whether you choose to self-host or use a managed API. The key is choosing the right approach for your specific needs and following established best practices for print-ready HTML.