Advanced publication process customization

Staatic is a WordPress plugin that converts dynamic WordPress sites into static versions. The static version is faster and more secure, and the conversion itself can be customized extensively. This guide covers Staatic’s core components, its customization options, and the hooks it offers for fine-tuning content inclusion and handling complex site architectures.

How Staatic works under the hood

Staatic’s core functionality is driven by a purpose-built crawler module. This module scans your WordPress site and captures all rendered HTML and assets. It rewrites links to match the destination URL you’ve configured, so that the static site functions correctly regardless of where it’s hosted.

The Staatic settings screen offers a user-friendly, no-code solution for specifying which URLs should be included in or excluded from the static version of your site. This feature is crucial for quickly adjusting the scope of your site’s static version without delving into complex code modifications.

However, the unique structure of a website or specific performance optimization goals may require a deeper dive into customization. Perhaps your site integrates WordPress content with a custom-built application, or you’re looking to maintain dynamic functionality for certain elements within the static version. In these cases, adjusting behavior through code may become necessary.

Fine-tuning content inclusion

Staatic distinguishes between content that is part of your site and content that isn’t by analyzing the site’s origin URL. For instance, if your dynamic WordPress site resides at https://wordpress.example.com and you’re publishing the static version to https://www.example.com, Staatic’s crawler will recognize any link starting with the WordPress site’s URL as internal content to be crawled and included in the static site. This approach ensures that all relevant content is captured during the static conversion process.
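
The origin-based distinction can be pictured with a small sketch. This is not Staatic’s actual implementation: the function names is_internal_url and to_destination_url are hypothetical, and the URLs are the example values used above. It only illustrates the idea of matching on the origin URL and swapping it for the destination URL.

```php
<?php

// Hypothetical helpers illustrating origin-based matching; not part of Staatic.

// Returns true when $url lives under the origin (dynamic WordPress) URL.
function is_internal_url( string $url, string $originUrl ): bool {
    $origin = rtrim( $originUrl, '/' );

    // Match the origin itself, or the origin followed by a path, query, or fragment.
    return $url === $origin
        || str_starts_with( $url, $origin . '/' )
        || str_starts_with( $url, $origin . '?' )
        || str_starts_with( $url, $origin . '#' );
}

// Swaps the origin prefix for the destination URL; other URLs pass through unchanged.
function to_destination_url( string $url, string $originUrl, string $destinationUrl ): string {
    if ( ! is_internal_url( $url, $originUrl ) ) {
        return $url;
    }

    $origin = rtrim( $originUrl, '/' );

    return rtrim( $destinationUrl, '/' ) . substr( $url, strlen( $origin ) );
}

$origin      = 'https://wordpress.example.com';
$destination = 'https://www.example.com';

var_dump( is_internal_url( 'https://wordpress.example.com/en/blog/', $origin ) ); // bool(true)
var_dump( is_internal_url( 'https://cdn.example.net/app.js', $origin ) );         // bool(false)

echo to_destination_url( 'https://wordpress.example.com/en/blog/', $origin, $destination ), "\n";
// https://www.example.com/en/blog/
```

Matching on the origin plus a following "/", "?", or "#" (rather than on the bare string) keeps a lookalike host such as https://wordpress.example.com.evil.test from being treated as internal.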

However, modern websites often have complex architectures, sometimes integrating WordPress with other systems or platforms. These integrations can blur the lines between what is considered part of the WordPress site and what isn’t, especially when these systems share content or functionality without a clear boundary. To address these complexities, Staatic offers the staatic_should_crawl_url filter hook.

This filter acts as a gatekeeper, evaluating each URL the crawler encounters. By hooking into this filter, developers can specify which URLs should be considered part of the site (and thus included in the crawl) and which should be excluded.

<?php

add_filter( 'staatic_should_crawl_url', function ( $value, $url, $context ) {
    $path = $url->getPath();

    $acceptedPrefixes = array(
        // WordPress
        '/wp-admin',
        '/wp-content',
        '/wp-includes',

        // Content
        '/en/blog',
        '/nl/blog',
        // ...
    );

    foreach ( $acceptedPrefixes as $prefix ) {
        if ( str_starts_with( $path, $prefix ) ) {
            return true;
        }
    }

    // Always include XML sitemaps.
    if ( str_contains( $path, 'sitemap' ) && str_ends_with( $path, '.xml' ) ) {
        return true;
    }

    // Everything else is excluded from the crawl.
    return false;
}, 10, 3 );
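
One thing to be aware of with plain prefix checks: '/en/blog' also matches paths such as '/en/blogroll'. If that matters for your site, the comparison can be restricted to whole path segments. The following is a minimal sketch; path_has_prefix is a hypothetical helper and not part of Staatic.

```php
<?php

// Hypothetical helper: true when $path equals $prefix or continues it at a "/" boundary.
function path_has_prefix( string $path, string $prefix ): bool {
    $prefix = rtrim( $prefix, '/' );

    return $path === $prefix || str_starts_with( $path, $prefix . '/' );
}

var_dump( path_has_prefix( '/en/blog', '/en/blog' ) );          // bool(true)
var_dump( path_has_prefix( '/en/blog/my-post/', '/en/blog' ) ); // bool(true)
var_dump( path_has_prefix( '/en/blogroll', '/en/blog' ) );      // bool(false)
```

Inside the filter callback above, str_starts_with( $path, $prefix ) could then be swapped for path_has_prefix( $path, $prefix ).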

Beyond controlling which URLs are crawled, this filter can also be used to manage how URLs are presented in the static version of your site. For example, if you configure Staatic to use relative URLs for portability, some URLs may still need to stay absolute. Canonical URL references and links in XML sitemaps are common cases, since absolute URLs matter there for SEO purposes. The example below returns false for the href of canonical link elements:

<?php

add_filter( 'staatic_should_crawl_url', function ( $value, $url, $context ) {
    // Skip the href of <link> elements whose markup mentions "canonical".
    if ( ( $context['htmlTagName'] ?? '' ) === 'link' &&
        ( $context['htmlAttributeName'] ?? '' ) === 'href' &&
        str_contains( $context['htmlElement'] ?? '', 'canonical' ) ) {
        return false;
    }

    return $value;
}, 10, 3 );

Context-based rules like this one require extended URL context to be enabled. This feature provides additional information about each encountered URL, which allows for more sophisticated logic when deciding whether a URL should be followed or how it should be transformed in the static site. It is enabled with the following filter:

<?php

add_filter( 'staatic_extended_url_context', '__return_true' );
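
When writing context-based rules, it can help to first look at what the crawler actually passes in. The following is a minimal sketch for that, under two assumptions: that $context is an array (as the example above suggests) and that the URL object can be cast to a string, as PSR-7 URI objects can. It writes each URL with its context to the PHP error log and returns the incoming value unchanged, so crawl decisions are not affected. format_url_context is a hypothetical helper name.

```php
<?php

// Hypothetical helper: renders a URL and its context as a single log line.
function format_url_context( string $url, array $context ): string {
    $parts = array();

    foreach ( $context as $key => $value ) {
        // JSON-encode every value so strings, numbers, and nested arrays all fit on one line.
        $parts[] = $key . '=' . json_encode( $value );
    }

    return $url . ' [' . implode( ', ', $parts ) . ']';
}

// Only register the hook inside WordPress; elsewhere add_filter() does not exist.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'staatic_should_crawl_url', function ( $value, $url, $context ) {
        // Assumption: $url supports string casting and $context is array-like.
        error_log( format_url_context( (string) $url, (array) $context ) );

        return $value; // Leave the crawl decision untouched.
    }, 20, 3 );
}
```

Because this logs once per encountered URL, it is best kept to a test environment and removed once the rules are in place.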

Handling live URLs

When working with Staatic to transform your dynamic WordPress site into a static version, you might encounter a common scenario: your site, hosted at https://wordpress.example.com, includes links that point directly to what will be your live static site, such as https://www.example.com. Staatic, by default, focuses on processing links that originate from your WordPress site’s domain. This means any link that doesn’t start with https://wordpress.example.com could be overlooked during the static site generation process.

In many situations, this default behavior works well because the content linked via your live site URL (https://www.example.com) is likely also accessible through a link that Staatic recognizes and processes. However, complications can arise, especially when you’ve tailored your site’s destination URL format, like opting for relative URLs over absolute ones. In such cases, ensuring that links to your live site are treated with the same importance as those from your WordPress domain becomes crucial.

To address this need, Staatic provides the staatic_replace_live_urls filter hook. This feature allows you to specify URLs that, although not directly associated with your WordPress site’s domain, should be considered part of your site during the static conversion process. Essentially, it lets you tell Staatic, "Hey, treat these links as if they’re coming from my WordPress site, even though they’re not."

Here’s a practical example of how to implement this filter:

<?php

add_filter( 'staatic_replace_live_urls', function ( $value ) {
    // List URLs that should be treated as originating from the WordPress site.
    return array( 'https://example.com', 'https://www.example.com' );
} );

This code snippet effectively instructs Staatic to recognize links pointing to https://example.com and https://www.example.com as internal links, ensuring they’re included in the static site generation. This adjustment is particularly useful for maintaining consistent HTML output and ensuring that all parts of your site are accurately represented in its static form.
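
The example above replaces whatever value the filter receives. If other code on your site also hooks into this filter, you may prefer to extend the incoming value instead. The sketch below does that with a hypothetical helper, merge_live_urls. Whether Staatic passes an existing list as $value is an assumption here, so the helper falls back to an empty list when it receives anything other than an array.

```php
<?php

// Hypothetical helper: appends extra live URLs to an existing list,
// trimming trailing slashes and dropping empty entries and duplicates.
function merge_live_urls( $existing, array $extra ): array {
    // Assumption: the incoming filter value may or may not be an array.
    $urls = array_merge( is_array( $existing ) ? $existing : array(), $extra );
    $urls = array_map( static fn ( $url ) => rtrim( (string) $url, '/' ), $urls );

    return array_values( array_unique( array_filter( $urls ) ) );
}

// Only register the hook inside WordPress; elsewhere add_filter() does not exist.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'staatic_replace_live_urls', function ( $value ) {
        return merge_live_urls( $value, array( 'https://example.com', 'https://www.example.com' ) );
    } );
}
```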

Tailoring responses during crawling

Staatic’s process for turning your WordPress site into a static version involves crawling your site as an unauthenticated user, capturing pages and assets as they are publicly presented. However, there might be scenarios where you want specific content to be displayed differently or certain elements to be excluded when Staatic is doing the crawling.

The key to customizing how your site responds to the Staatic crawler is identifying it. The crawler’s user agent string includes StaaticWordPress, so you can detect when Staatic is accessing your site and tailor the content specifically for static site generation.

This functionality is particularly useful for optimizing what gets included in the static version of your site. For instance, you might want to hide admin-specific notices or user-specific content that doesn’t need to be part of the static site. By detecting the Staatic crawler, you can conditionally adjust the content, ensuring that the static version of your site includes only what is necessary for public consumption.

Implementing these customizations typically involves adding conditions to your site’s code that check for the Staatic user agent. When detected, these conditions trigger modifications to the site’s response, such as excluding non-essential scripts or dynamic elements that are irrelevant to static viewers.

This can be implemented as follows:

<?php

if ( isset( $_SERVER['HTTP_USER_AGENT'] ) && str_contains( $_SERVER['HTTP_USER_AGENT'], 'StaaticWordPress' ) ) {
    // Do something special here...
}
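
As a slightly fuller sketch of the same idea, the check can be wrapped in a small helper and used to switch off output that only makes sense on the dynamic site. The helper name is_staatic_crawler and the script handle 'live-chat-widget' are hypothetical placeholders; show_admin_bar, wp_enqueue_scripts, and wp_dequeue_script() are standard WordPress APIs.

```php
<?php

// Hypothetical helper: true when the user agent string identifies the Staatic crawler.
function is_staatic_crawler( ?string $userAgent ): bool {
    return $userAgent !== null && str_contains( $userAgent, 'StaaticWordPress' );
}

// Only register hooks inside WordPress, and only for requests made by the crawler.
if ( function_exists( 'add_action' ) && is_staatic_crawler( $_SERVER['HTTP_USER_AGENT'] ?? null ) ) {
    // Never render the admin toolbar into crawled pages.
    add_filter( 'show_admin_bar', '__return_false' );

    // Drop a script that is only useful on the dynamic site.
    // 'live-chat-widget' is a placeholder; use a handle your theme or plugins register.
    add_action( 'wp_enqueue_scripts', function () {
        wp_dequeue_script( 'live-chat-widget' );
    }, 100 );
}
```

The late priority (100) on wp_enqueue_scripts is there so the dequeue runs after the script has been enqueued at the default priority.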

Understanding the Staatic publication process

So far we’ve focused a lot on the crawler component, which is crucial for gathering the content of your WordPress site to create a static version. However, crawling is just one part of a broader sequence known as the publication process. This process is organized into distinct tasks, each designed to handle specific aspects of turning your dynamic site into a static one.

Breakdown of the publication tasks

The publication process is divided into several key tasks, grouped under different phases for clarity and systematic execution. Here’s an overview of these tasks:

- SetupTask (setup): This initial task sets up the environment and prepares the system for the subsequent phases of the publication process.
- Crawler phase:
  - InitiateCrawlerTask (initialize_crawler): Starts the crawler, setting up necessary parameters and states.
  - CrawlTask (crawl): This is where the actual crawling of your WordPress site happens, capturing all necessary data and content.
  - FinishCrawlerTask (finish_crawler): Cleans up and finalizes the crawling phase, ensuring all data is properly stored and ready for the next phase.
- PostProcessTask (post_process): Performs any final tweaks or optimizations to the static site before deployment.
- Deployer phase:
  - InitiateDeploymentTask (initiate_deployment): Prepares for deploying the static content, setting up deployment configurations and resources.
  - DeployTask (deploy): Executes the deployment of the static files to your chosen hosting environment.
  - FinishDeploymentTask (finish_deployment): Concludes the deployment, ensuring everything is correctly deployed and operational.
- FinishTask (finish): Officially ends the publication process, marking the static site as complete and ready for access.

Customizing the publication tasks

The publication process can be customized to meet specific user needs using two main approaches:

Using action hooks: For simpler customizations, you can utilize the staatic_publication_task_before and staatic_publication_task_after hooks. These hooks allow you to inject custom actions either before or after any of the publication tasks. This is ideal for adding quick modifications or integrations without needing to alter the core functionality of the tasks.

<?php

use Staatic\WordPress\Publication\Publication;
use Staatic\WordPress\Publication\Task\TaskInterface;

add_action( 'staatic_publication_task_before', function ( Publication $publication, TaskInterface $task ) {
    // Do something special here...
}, 10, 2 );
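
As an illustration of combining both hooks, the sketch below measures how long each task takes and writes the result to the PHP error log. TaskTimer is a hypothetical helper class. The sketch assumes that staatic_publication_task_after receives the same two arguments as staatic_publication_task_before, and it relies on the static name() method that tasks expose. The callbacks omit type hints so the block stays self-contained.

```php
<?php

// Hypothetical helper: tracks wall-clock time per task name.
final class TaskTimer {
    /** @var array<string, float> Start timestamps keyed by task name. */
    private array $startedAt = array();

    public function start( string $taskName ): void {
        $this->startedAt[ $taskName ] = microtime( true );
    }

    // Returns elapsed seconds, or null when start() was not called for this task.
    public function stop( string $taskName ): ?float {
        if ( ! isset( $this->startedAt[ $taskName ] ) ) {
            return null;
        }

        $elapsed = microtime( true ) - $this->startedAt[ $taskName ];
        unset( $this->startedAt[ $taskName ] );

        return $elapsed;
    }
}

// Only register hooks inside WordPress; elsewhere add_action() does not exist.
if ( function_exists( 'add_action' ) ) {
    $timer = new TaskTimer();

    add_action( 'staatic_publication_task_before', function ( $publication, $task ) use ( $timer ) {
        $timer->start( $task::name() );
    }, 10, 2 );

    // Assumption: the "after" hook receives the same arguments as the "before" hook.
    add_action( 'staatic_publication_task_after', function ( $publication, $task ) use ( $timer ) {
        $elapsed = $timer->stop( $task::name() );

        if ( $elapsed !== null ) {
            error_log( sprintf( 'Staatic task %s took %.2f s', $task::name(), $elapsed ) );
        }
    }, 10, 2 );
}
```

The timer lives in memory, so it only covers work done within a single PHP process. If the two hooks for a task fire in different requests, stop() returns null and nothing is logged for that run.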

Registering a custom task: For more comprehensive or specialized modifications, you can introduce custom tasks into the process. This involves creating a PHP class that implements the Staatic\WordPress\Publication\Task\TaskInterface interface and integrating this custom task into the workflow using the staatic_publication_tasks filter hook. This method provides the greatest level of customization, enabling you to add entirely new functionalities or alter existing ones to better suit your site’s requirements.

<?php

use Staatic\WordPress\Publication\Publication;
use Staatic\WordPress\Publication\Task\InitiateDeploymentTask;
use Staatic\WordPress\Publication\Task\TaskCollection;
use Staatic\WordPress\Publication\Task\TaskInterface;

add_filter( 'staatic_publication_tasks', function ( TaskCollection $tasks ) {
    $customTask = new class implements TaskInterface {
        public static function name(): string {
            return 'custom_task';
        }

        public function description(): string {
            return 'This is a custom task.';
        }

        public function supports( Publication $publication ): bool {
            return true;
        }

        public function execute( Publication $publication, bool $limitedResources ): bool {
            // Do something special here...

            // Indicate that this task has finished.
            return true;
        }
    };

    return $tasks->addBefore( $customTask, InitiateDeploymentTask::name() );
} );

Summary

In this article, we’ve detailed how Staatic transforms dynamic WordPress sites into static versions, focusing on its powerful crawler and structured publication process. Staatic not only enhances site performance and security but also offers significant flexibility for customization, with numerous hooks that enable modifications at nearly every stage of the publication process.

Whether it’s tweaking how content is crawled, altering deployment strategies, or modifying the final output, Staatic equips users with the tools needed for detailed customization. For those seeking deeper customization, Staatic’s source code is rich with apply_filters and do_action calls, offering further possibilities for refining the static generation process.

For a deeper dive into how action and filter hooks can enhance your WordPress development, please explore our detailed documentation on action and filter hooks. This guide provides comprehensive insights and practical examples to help you effectively utilize these powerful features in your projects.