Robust Links

EXPERIMENTAL DRAFT REVISION OF https://robustlinks.mementoweb.org/spec/

Last updated: March 17, 2025


Authored by:
   Sawood Alam - Internet Archive
   Shawn M. Jones - Los Alamos National Laboratory
   Martin Klein - Pacific Northwest National Laboratory
   Michael L. Nelson - Old Dominion University
   Herbert Van de Sompel - Data Archiving and Networking Services (DANS)

Abstract

A Robust Link offers multiple pathways to revisit the content of a linked resource, even a long time after the link to the resource was put in place. It combats problems that are prevalent with links on the Web: linked resources vanish (link rot; 404 Not Found) or their content changes over time (content drift). This document describes how to provide Robust Links in HTML. It recommends creating one or more snapshots of a resource around the time of linking to it, for example, in a web archive or a versioning system. And it recommends creating a Robust Link by conveying the following information as attributes on the link to a resource: the URI of the resource, the intended linking datetime, and the URIs of one or more snapshots of the linked resource as well as their respective datetimes.

1. Why provide Robust Links?

The provision of a Robust Link for a resource is motivated by the desire to offer multiple pathways to revisit the content of that resource as it existed around the time it was linked. This can be important because it is well known that, as time goes by, the content of any web resource changes (content drift) or the resource vanishes altogether (link rot; 404 Not Found).

Robust Links combat link rot and content drift by annotating a link to a resource with information that supports revisiting various states of the linked resource. In the example below, the URI of the resource that motivates the provision of a Robust Link is conveyed as the value of data-originalurl, the URI of a snapshot of that resource created around the time of linking is provided as the value of data-versionurl, and the linking date is the value of data-versiondate:
<a href="https://www.w3.org/"
   data-originalurl="https://www.w3.org/"
   data-versiondate="2024-11-21" 
   data-versionurl="https://web.archive.org/web/20241121100333/https://www.w3.org/">
Robust Link to the W3C home page that supports revisiting the way it looked on November 21st 2024
</a>
The data-* attributes provided on the link can be leveraged by client applications (or manually) to visit the linked resource, both in its current state (using data-originalurl) and in the state it was around the time it was linked (respectively using data-versionurl, and the combination of data-originalurl and data-versiondate). Meanwhile, the URI provided in href remains the default target of the link as was intended by the linker. As shown in the examples, depending on the motivation for linking, that can be the URI of the resource itself or the URI of a snapshot of the resource.
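
As an illustration of how a client application might read these attributes, the following minimal sketch collects them from anchor elements using Python's standard html.parser module. The names RobustLinkExtractor and extract_robust_links are hypothetical and not part of this specification.

```python
from html.parser import HTMLParser

ROBUST_ATTRS = ("data-originalurl", "data-versiondate", "data-versionurl")


class RobustLinkExtractor(HTMLParser):
    """Collect href and Robust Links attributes from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Only keep anchors that carry at least one Robust Links attribute.
        if any(name in attr_map for name in ROBUST_ATTRS):
            self.links.append(
                {name: attr_map.get(name) for name in ("href",) + ROBUST_ATTRS}
            )


def extract_robust_links(html_text):
    """Return one dict per Robust Link found in html_text."""
    parser = RobustLinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = '''<a href="https://www.w3.org/"
   data-originalurl="https://www.w3.org/"
   data-versiondate="2024-11-21"
   data-versionurl="https://web.archive.org/web/20241121100333/https://www.w3.org/">
W3C home page</a>'''

for link in extract_robust_links(sample):
    print(link["data-originalurl"], link["data-versiondate"])
```

Attributes that are absent on an anchor are reported as None, so a caller can decide how to handle missing information.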

2. How to provide Robust Links?

As is the case with any link in HTML, the default link target is provided by the href attribute for the HTML anchor (<a>) element. But a link to a resource can be turned into a Robust Link by including three extension attributes for that element: data-originalurl, data-versiondate, and data-versionurl. The use and value of each attribute is as follows:
  • data-originalurl: REQUIRED - The URI of the resource that motivates the provision of a Robust Link. In common cases, the default target resource (in href) is also the resource that motivates the provision of a Robust Link. But, as illustrated by the Examples, interesting scenarios exist in which this is not the case.
  • data-versiondate: REQUIRED - The intended date of linking to the resource, i.e. the date of the state of the resource that the linker wants the link visitor to be able to experience.
  • data-versionurl: RECOMMENDED - The URI of one or more snapshots of the resource, optionally accompanied by the date those snapshots were created. Creation of snapshots of the linked resource is recommended and can, for example, be achieved as follows:
    • Some web archives provide services that allow creating such snapshots. For example, the Internet Archive provides "Save Page Now" and perma.cc's main service is the on-demand creation of snapshots (requires an account).
    • Versioning systems automatically create snapshots of each version of a hosted resource and each is assigned a dedicated URI.
When these data-* attributes are provided in a machine-actionable manner, as detailed in the below sub-sections, client applications can be devised that support several pathways to visit a linked resource:
  • Using data-originalurl: Visiting the resource in its current state. Cases exist in which that is not possible because the resource no longer exists. In such cases, data-originalurl still provides relevant provenance information.
  • Using data-originalurl in combination with data-versiondate: Finding snapshots of the resource that are temporally close to the intended linking date, in case no snapshots were created around the time of linking, or in case snapshots that were created become temporarily or permanently inaccessible.
  • Using data-versionurl: Visiting each snapshot of the resource that is listed.
  • Using data-originalurl in combination with data-versionurl: Finding snapshots of the resource that are temporally close to the dates on which the listed snapshots were created, in case those snapshots become temporarily or permanently inaccessible.
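
The pathways that combine data-originalurl with a datetime could, for instance, be realized through datetime negotiation as defined in RFC7089. The following minimal sketch formats the Accept-Datetime request header for such a request. The TimeGate prefix https://archive.example/timegate/ and both function names are hypothetical; actual archives may expose a different URI layout.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when):
    """Format an aware datetime as an RFC 7089 Accept-Datetime value (GMT)."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def timegate_request(timegate_prefix, original_url, when):
    """Return (request URI, headers) for a datetime negotiation request.

    timegate_prefix is a hypothetical TimeGate endpoint that accepts the
    original URI appended to it.
    """
    headers = {"Accept-Datetime": accept_datetime_header(when)}
    return timegate_prefix + original_url, headers


uri, headers = timegate_request(
    "https://archive.example/timegate/",  # hypothetical endpoint
    "https://www.w3.org/",                # value of data-originalurl
    datetime(2024, 11, 21, 12, 0, 0, tzinfo=timezone.utc),
)
print(uri)
print(headers["Accept-Datetime"])
```

The sketch only prepares the request; issuing it and following the response to a snapshot is left to the client.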

3. Robust Links attributes

This section enumerates the attributes used on a Robust Link, specifies their respective values, provides information for their use, and indicates the use requirement for each.

3.1. href

Use of the href attribute is REQUIRED and its value is as detailed by the pertinent HTML specification.

3.2. data-originalurl

Use of the data-originalurl attribute is REQUIRED. Its value is the URI of the resource that motivates the provision of a Robust Link. The URI must be absolute, not relative.

Example data-originalurl="https://www.w3.org/"

3.3. data-versiondate

Use of the data-versiondate attribute is REQUIRED. Its value is the intended datetime of linking expressed in UTC. The value can be provided at date or datetime granularity using a choice of two syntaxes: one aligned with ISO8601 and the other following a convention to express datetimes in URIs that is commonly used by web archives. Valid values are mnemonically shown in the table below and are formalized in the ABNF for data-versiondate.

                  date          datetime
ISO8601           YYYY-MM-DD    YYYY-MM-DDThh:mm:ssZ
Web Archive URI   YYYYMMDD      YYYYMMDDhhmmss


Examples data-versiondate="2024-11-21" ; data-versiondate="20241121" ; data-versiondate="2024-11-21T16:22:07Z" ; data-versiondate="20241121162207"

A value provided for data-versiondate at date granularity must be interpreted as noon UTC of the indicated date, i.e. YYYY-MM-DD must be interpreted as YYYY-MM-DDT12:00:00Z and YYYYMMDD must be interpreted as YYYYMMDD120000.
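
The four syntaxes and the noon UTC rule can be sketched as follows. This is a minimal illustration rather than a normative parser; the name parse_versiondate is hypothetical, and the regular expressions only check the overall shape before the value is handed to strptime for calendar validation.

```python
import re
from datetime import datetime, timezone

# (shape, strptime format, date granularity?) for each allowed syntax.
_SYNTAXES = (
    (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z"),
     "%Y-%m-%dT%H:%M:%SZ", False),   # ISO8601 datetime
    (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"), "%Y-%m-%d", True),  # ISO8601 date
    (re.compile(r"[0-9]{14}"), "%Y%m%d%H%M%S", False),  # web archive datetime
    (re.compile(r"[0-9]{8}"), "%Y%m%d", True),          # web archive date
)


def parse_versiondate(value):
    """Parse a data-versiondate value into an aware UTC datetime.

    Values at date granularity are interpreted as noon UTC.
    """
    for shape, fmt, date_only in _SYNTAXES:
        if shape.fullmatch(value):
            parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
            if date_only:
                parsed = parsed.replace(hour=12)
            return parsed
    raise ValueError(f"not a valid data-versiondate: {value!r}")


for example in ("2024-11-21", "20241121", "2024-11-21T16:22:07Z", "20241121162207"):
    print(example, "->", parse_versiondate(example).isoformat())
```

Values that match none of the four shapes, or that name an impossible calendar date, raise ValueError.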

3.4. data-versionurl

Use of the data-versionurl attribute is RECOMMENDED. Its value is a list of one or more URIs of snapshots of the resource that motivates the provision of a Robust Link (see data-originalurl), whereby each snapshot URI may optionally be followed by that snapshot's datetime. URIs must be absolute, not relative. The provided information elements (URIs and datetimes) must be space-separated. The attribute's value is formalized in the ABNF for data-versionurl.

Example data-versionurl="https://web.archive.org/web/20241121100333/https://www.w3.org/ https://perma.cc/44TF-9JXB 20241120164333"

A snapshot datetime provided in data-versionurl at date granularity must be interpreted as noon UTC of the indicated date, i.e. YYYY-MM-DD must be interpreted as YYYY-MM-DDT12:00:00Z and YYYYMMDD must be interpreted as YYYYMMDD120000.
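
A client could split a data-versionurl value into snapshot URIs and their optional datetimes along the following lines. This is a sketch under assumptions: the name parse_versionurl is hypothetical, and the code splits on any whitespace, which is more lenient than a single space separator. Datetimes are returned as the strings that were provided.

```python
import re

# Shapes of the four date/datetime syntaxes that may follow a snapshot URI.
_VERSIONDATE = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}(T[0-9]{2}:[0-9]{2}:[0-9]{2}Z)?"
    r"|[0-9]{8}([0-9]{6})?"
)


def parse_versionurl(value):
    """Split a data-versionurl value into (snapshot URI, datetime or None) pairs."""
    snapshots = []
    for token in value.split():
        if _VERSIONDATE.fullmatch(token):
            # A datetime must directly follow a URI that has none yet.
            if not snapshots or snapshots[-1][1] is not None:
                raise ValueError(f"datetime without a snapshot URI: {token!r}")
            snapshots[-1] = (snapshots[-1][0], token)
        else:
            snapshots.append((token, None))
    return snapshots


value = (
    "https://web.archive.org/web/20241121100333/https://www.w3.org/ "
    "https://perma.cc/44TF-9JXB 20241120164333"
)
for uri, when in parse_versionurl(value):
    print(uri, when)
```

For the example value above, the first snapshot has no datetime and the second carries 20241120164333.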

3.5. Missing attribute information

Values for the data-versiondate attribute must only be provided at the granularity available to the linker, i.e. a datetime value should not be artificially generated on the basis of a known date value.

In case no value is provided for the data-versiondate attribute, client applications may attempt to determine a plausible value, taking the linking context into account. For example, a client application might decide to use the creation date or last modification date of the HTML page in which the Robust Link is provided as the value for data-versiondate.

In case no value is provided for the data-originalurl attribute, client applications may attempt to determine a plausible value, taking the linking context into account. For example, as shown in the Examples, the URI provided in data-originalurl commonly, but not always, identifies the same resource as the URI provided for the href attribute. Therefore, in cases in which data-originalurl is not provided, client applications might decide to use (the absolute URI rendition of) the URI in href as the value for data-originalurl.
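
The fallbacks described above might be sketched as follows, using urljoin from the Python standard library to obtain the absolute rendition of href. The function names, the attrs dictionary, and the page_date parameter are hypothetical; how a client obtains the page's creation or last modification date is outside the scope of this sketch.

```python
from urllib.parse import urljoin


def effective_originalurl(attrs, base_url):
    """Return data-originalurl, or the absolute form of href as a fallback."""
    original = attrs.get("data-originalurl")
    if original:
        return original
    href = attrs.get("href")
    return urljoin(base_url, href) if href else None


def effective_versiondate(attrs, page_date=None):
    """Return data-versiondate, or a caller-supplied page date as a fallback."""
    return attrs.get("data-versiondate") or page_date


# An anchor that lacks both attributes, found in a page at base_url.
attrs = {"href": "/standards/"}
base_url = "https://www.w3.org/about/"
print(effective_originalurl(attrs, base_url))
print(effective_versiondate(attrs, page_date="2024-11-20"))
```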

4. Examples

This section provides examples that differ with regard to the nature of the link target: in Section 4.1., the default link target (URI in href) is a live web resource, whereas in Section 4.2. it is a snapshot of a web resource.

4.1. Linking to a live web resource

In a common case, the main intent is to link to a live web resource and to also allow future users of the link to see the state of the linked resource around the date that was intended by the linker. In this case, the values for the attributes on the link are as follows:
  • href: the URI of the live web resource.
  • data-originalurl: the URI of the live web resource.
  • data-versiondate: the intended date of linking.
  • data-versionurl: the URI of one or more snapshots of the live web resource and optionally the respective snapshot dates, provided as a list as described above.
Example 4.1.1. Common case, expressive

Assume creating a Robust Link to https://www.w3.org/ on November 20th 2024. On that day, a snapshot of the resource is created in perma.cc. Its URI is https://perma.cc/44TF-9JXB and the snapshot datetime provided by perma.cc is 2024-11-20T16:43:33Z. The next day, an additional snapshot is created in the Internet Archive. Its URI is https://web.archive.org/web/20241121100333/https://www.w3.org/ and its snapshot datetime is 2024-11-21T10:03:33Z. A Robust Link to https://www.w3.org/ would be:
<a href="https://www.w3.org/"
   data-originalurl="https://www.w3.org/"
   data-versiondate="2024-11-20"
   data-versionurl="https://perma.cc/44TF-9JXB 2024-11-20T16:43:33Z 
                    https://web.archive.org/web/20241121100333/https://www.w3.org/ 2024-11-21T10:03:33Z">
   Robust Link to the W3C home page
</a>
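
A tool that generates such links might serialize the attributes as in the following sketch. The function name robust_link and its parameters are hypothetical; snapshot datetimes are passed as already formatted strings, and attribute values are escaped with html.escape from the Python standard library.

```python
from html import escape


def robust_link(href, original_url, version_date, snapshots=(), text=""):
    """Serialize an <a> element carrying the Robust Links attributes.

    snapshots is a sequence of (URI, datetime string or None) pairs.
    """
    attrs = [
        ("href", href),
        ("data-originalurl", original_url),
        ("data-versiondate", version_date),
    ]
    if snapshots:
        # Space-separated list: each URI optionally followed by its datetime.
        parts = [uri if when is None else f"{uri} {when}" for uri, when in snapshots]
        attrs.append(("data-versionurl", " ".join(parts)))
    rendered = " ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attrs
    )
    return f"<a {rendered}>{escape(text)}</a>"


print(robust_link(
    "https://www.w3.org/",
    "https://www.w3.org/",
    "2024-11-20",
    [
        ("https://perma.cc/44TF-9JXB", "2024-11-20T16:43:33Z"),
        ("https://web.archive.org/web/20241121100333/https://www.w3.org/",
         "2024-11-21T10:03:33Z"),
    ],
    "Robust Link to the W3C home page",
))
```

When no snapshots are passed, the data-versionurl attribute is omitted.
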
Example 4.1.2. Common case, minimal

Assume creating a Robust Link to https://www.w3.org/ on November 20th 2024, a very busy day that leaves you with no time to create any snapshots of the linked resource. A Robust Link to https://www.w3.org/ that is less expressive than the one in Example 4.1.1., but still supports the creation of alternative pathways to visit the linked resource, would be:
<a href="https://www.w3.org/"
   data-originalurl="https://www.w3.org/"
   data-versiondate="2024-11-20">
   Robust Link to the W3C home page
</a>
Example 4.1.3. Resource moved

On January 20th 2017, the US administration changed hands. Around 17:00 CET on that day, the domain https://whitehouse.gov was taken over by the new administration. By that time, the content of the previous administration's White House pages had been moved to https://obamawhitehouse.archives.gov. A Robust Link to the previous administration's White House page that expresses provenance and, additionally, conveys snapshots of the very last versions that were available at https://whitehouse.gov would be:
<a href="https://obamawhitehouse.archives.gov"
   data-originalurl="https://whitehouse.gov"
   data-versiondate="2017-01-20T12:00:00Z"
   data-versionurl="https://web.archive.org/web/20170120160218/https://www.whitehouse.gov/ 2017-01-20T16:02:18Z
                    https://perma.cc/39FJ-5K7L 2017-01-20T14:26:00Z">
   Robust Link to the previous administration's White House page
</a>
Example 4.1.4. Resource format migrated

In October 1999, a research paper was published in D-Lib Magazine that contained various screencams created using Lotus Notes. The paper links to the screencams that are provided as executable files for Windows computers. Since, by 2024, these files could no longer be rendered on any common computer, they were migrated from the original exe format to mp4. One such migrated screencam is https://www.dlib.org/dlib/october99/van_de_sompel/lanlxxx.mp4, which was derived from the original https://www.dlib.org/dlib/october99/van_de_sompel/lanlxxx.exe. A Robust Link to the migrated screencam would be:
<a href="https://www.dlib.org/dlib/october99/van_de_sompel/lanlxxx.mp4"
   data-originalurl="https://www.dlib.org/dlib/october99/van_de_sompel/lanlxxx.exe"
   data-versiondate="19991015">
   Robust Link to the migrated screencam
</a>
Example 4.1.5. Resource resurrected from archive

In early 2025, the Belgian experimental computer music duo Young Farmers Claim Future decided to bring their mid-90s website back online. Originally, it had been hosted on a Ghent University server at the long-gone URL http://dewey.rug.ac.be/barn/tex/yfcf.html. Since they didn't have any copies, they resurrected the site from the Internet Archive, which provides the entry point at https://web.archive.org/web/19990220153326/http://dewey.rug.ac.be/barn/tex/yfcf.html. They downloaded the entire site from the Archive, providing a new entry point at https://youngfarmersclaimfuture.info/barn/tex/yfcf.html. In that page, Robust Links are provided pointing at other pages of the web site, conveying pertinent provenance information. An example of such a Robust Link to a resource resurrected from a web archive is:
<a href="https://youngfarmersclaimfuture.info/barn/tex/max.html"
   data-originalurl="http://dewey.rug.ac.be/barn/tex/max.html"
   data-versiondate="20150101"
   data-versionurl="https://web.archive.org/web/19990220013212/http://dewey.rug.ac.be/barn/tex/max.html 19990220013212">
   Robust Link to a resource resurrected from a web archive
</a>

4.2. Linking to a snapshot - Live web resource provides motivation

Sometimes a live web resource motivates the provision of a Robust Link, yet the intent is for the default link to target a specific version thereof, for example a snapshot of the resource in a web archive or one of its versions in a version control system. In this case, the values for the attributes on the link are as follows:
  • href: the URI of the snapshot/version that captures the desired state of the resource.
  • data-originalurl: the URI of the web resource for which the URI of a snapshot/version is provided in href.
  • data-versiondate: the intended date of linking.
  • data-versionurl: the URI of one or more snapshots of the live web resource other than the one provided in href and optionally the respective snapshot dates, provided as a list as described above.
Example 4.2.1. Specific resource state in a web archive

Assume creating a Robust Link on November 21st 2024 that is primarily intended to convey the state of https://www.w3.org/ on that day. In order to do so, the snapshot https://web.archive.org/web/20241121100333/https://www.w3.org/ is created and its URI is conveyed in the href attribute on the link. On November 20th 2024, another snapshot had been created in perma.cc; its URI is https://perma.cc/44TF-9JXB and its snapshot datetime is 2024-11-20T16:43:33Z. A Robust Link to the November 21st 2024 version of the W3C home page would be:
<a href="https://web.archive.org/web/20241121100333/https://www.w3.org/"
   data-originalurl="https://www.w3.org/"
   data-versiondate="2024-11-21"
   data-versionurl="https://perma.cc/44TF-9JXB 2024-11-20T16:43:33Z">
   Robust Link to the November 21st 2024 version of the W3C home page
</a>
Example 4.2.2. Specific resource state in a version control system, take 1

Assume creating a Robust Link on November 22nd 2024 that is primarily intended to point to the version of https://en.wikipedia.org/wiki/Web_archiving that is current on that day, which is https://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=1258145565. In this case, a Robust Link to the November 22nd 2024 version of the Wikipedia page would be:
<a href="https://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=1258145565"
   data-originalurl="https://en.wikipedia.org/wiki/Web_archiving"
   data-versiondate="2024-11-22">
   Robust Link to the November 22nd 2024 version of the Wikipedia page
</a>
Example 4.2.3. Specific resource state in a version control system, take 2

Assume creating a Robust Link on November 22nd 2024 that is primarily intended to point to the version of https://en.wikipedia.org/wiki/Web_archiving that was current on April 10th 2012, which is https://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=485347845. In this case, the Robust Link to the April 10th 2012 version of the Wikipedia page looks like this:
<a href="https://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=485347845"
   data-originalurl="https://en.wikipedia.org/wiki/Web_archiving"
   data-versiondate="2012-04-10">
   Robust Link to the April 10th 2012 version of the Wikipedia page
</a>

4.3. Linking to a snapshot of a web resource - Snapshot provides motivation

The Robust Links approach can be further illustrated by means of an example that might be appealing to frequent users of web archives. In this example, a snapshot of a web resource motivates the provision of a Robust Link and, for example, in order to prove that a specific snapshot/version existed, a snapshot of that snapshot is created in a system other than the one that hosts the original snapshot. In this case, values for the attributes on the link are as follows:
  • href: the URI of the snapshot/version that captures the desired state of the resource.
  • data-originalurl: the URI of the snapshot/version that captures the desired state of the resource.
  • data-versiondate: the intended date of linking.
  • data-versionurl: the URI of one or more snapshots of the snapshot/version provided in href and data-originalurl, and optionally the respective snapshot dates, provided as a list as described above.
Example 4.3.1. Proving existence of a snapshot

Assume creating a Robust Link on December 9th 2024 that is intended to illustrate that the November 21st 2024 snapshot of the W3C home page https://web.archive.org/web/20241121100333/https://www.w3.org/ existed on that day in the Internet Archive. In order to do so, a snapshot of the snapshot is created in archive.today. Its URI is https://archive.ph/T9xD2. Although the snapshot is created on December 9th, archive.today recognizes it as being a snapshot of a snapshot and maintains the snapshot datetime of the original snapshot, i.e. 2024-11-21T10:03:33Z. In this case, the Robust Link to prove the November 21st 2024 snapshot of the W3C page existed in the Internet Archive on December 9th 2024 looks like this:
<a href="https://web.archive.org/web/20241121100333/https://www.w3.org/"
   data-originalurl="https://web.archive.org/web/20241121100333/https://www.w3.org/"
   data-versiondate="2024-12-09"
   data-versionurl="https://archive.ph/T9xD2 2024-11-21T10:03:33Z">
   Robust Link to prove the November 21st 2024 snapshot of the W3C page existed in the Internet Archive on December 9th 2024
</a>

5. Archiving considerations

Web Archives that ingest, store, and replay web pages that contain Robust Links must take the following directions into account:
  • For links in pages for which data-versiondate, data-versionurl, and/or data-originalurl are provided, web archives must leave the provided attributes and values untouched.
  • For links for which data-versiondate, data-versionurl, and/or data-originalurl are not provided, web archives must not add them.
  • For links for which the URI of a snapshot is provided as the value of the href attribute, web archives that create a snapshot of that snapshot must adhere to the Sticky Memento-Datetime and original Link provision of RFC7089: the Memento-Datetime and the original link provided in the HTTP header of the former snapshot should be maintained and be provided for the snapshot of that snapshot.
  • For links in pages for which data-versiondate, data-versionurl, and/or data-originalurl are provided, web archives should consider leveraging the provided values for replay.

6.1. ABNF for data-versiondate

Valid values for the data-versiondate attribute are defined by the below ABNF that reuses the following constructs:
  • date-fullyear, date-month, date-mday, time-hour, time-minute, time-second from the ABNF in Section 5.6 of RFC3339
 
data-versiondate = versiondate 
versiondate = date / datetime
date = iso8601-date / web-archive-date
datetime = iso8601-datetime / web-archive-datetime
iso8601-date = date-fullyear "-" date-month "-" date-mday
web-archive-date = date-fullyear date-month date-mday
iso8601-datetime = date-fullyear "-" date-month "-" date-mday "T" time-hour ":" time-minute ":" time-second "Z"
web-archive-datetime = date-fullyear date-month date-mday time-hour time-minute time-second
        

6.2. ABNF for data-versionurl

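Valid values for the data-versionurl attribute are defined by the below ABNF that reuses the following constructs:
  • versiondate from the ABNF for data-versiondate
  • scheme, authority, path-absolute, query, fragment from the ABNF in Appendix A of RFC3986
  • SP from the ABNF in Appendix B.1 of RFC5234
 
data-versionurl = snapshot-URI *1(SP versiondate) *(SP snapshot-URI *1(SP versiondate))
snapshot-URI = scheme ":" "//" authority path-absolute [ "?" query ] [ "#" fragment ]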
        

7. Acknowledgments

The original version of the specification also included the following authors: Harihar Shankar (Los Alamos National Laboratory), Richard Wincewicz (University of Edinburgh).

To Discuss

  • Section "Archiving Considerations" to discuss some recommendations on how archival replay systems should or should not rewrite URLs in data-* attributes.
  • Client support for attributes without data- prefix
  • Examples, web site resurrection context: What to do with links for which original is dead and no mementos exist?
  • Section "Security Considerations" for clients that consume RL annotations
  • Where to publish?
  • Reference implementation: DOI-ed CSS and JS
  • Updating existing tools and documentation