Los Alamos National Laboratory
The notion of persistence of persistent identifiers (PIDs) has been studied in the past, and it has been shown that digital object identifiers (DOIs) in particular do not necessarily point to the same content over time. The reasons range from human error to network outages. We approach the question of how reliably PIDs identify web content from a slightly different angle: we investigate scholarly publishers and their responses to simple DOI requests issued with different HTTP clients and HTTP methods. Given the web standards and best practices developed in the community, we would expect a publisher’s response to be the same regardless of which client or method is used. How else can we trust in the persistence of such identifiers? In this talk, I will present preliminary results of our experimental investigation into scholarly publishers’ behavior on the web. Our findings indicate that HTTP clients resembling human and machine behavior indeed experience a different scholarly web, and that the network location from which DOI requests are sent makes a significant difference. These results hint at a lack of adherence to web best practices by the (scholarly) community and therefore have real-world implications for web (crawling) engineers who rely on those standards and best practices.
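The kind of experiment described above can be sketched as follows. This is a minimal illustration, not the authors' actual setup: the DOI, the User-Agent strings, and the `consistent` helper are all assumptions chosen to show the idea of resolving the same DOI with clients that resemble a human browser versus a script, using both GET and HEAD, and then comparing the outcomes.

```python
import urllib.error
import urllib.request

# Illustrative User-Agent strings: one browser-like, one script-like.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/120.0"
SCRIPT_UA = "python-urllib/3"


def fetch(url, user_agent, method="GET"):
    """Issue one request, follow redirects, and return (status, final URL)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent}, method=method)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.geturl()
    except urllib.error.HTTPError as err:
        return err.code, err.geturl()


def consistent(responses):
    """True if every client/method combination saw the same status and landing URL."""
    return len(set(responses)) == 1


# Hypothetical usage against a placeholder DOI (network access required):
# responses = [fetch("https://doi.org/10.1000/demo", ua, method)
#              for ua in (BROWSER_UA, SCRIPT_UA)
#              for method in ("GET", "HEAD")]
# print(consistent(responses))
```

If publishers honored the expectation stated above, `consistent` would return `True` for every DOI regardless of client or method; the talk's finding is that, in practice, it often would not.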