A ColdFusion user asked me for a way to programmatically determine if a URL exists, so I threw together this UDF. It uses <cfhttp> to make a HEAD request for the specified URL (without resolving any internal URLs, and without throwing an error upon failure). If no status code comes back (for example, because the host does not resolve) or if the status code is 404, the UDF returns false; otherwise it returns true. The code deliberately does not check just for status code 200 (or 2xx, for that matter), which means that codes like 401 (unauthorized) and 403 (forbidden) return true, since these status codes do not necessarily mean that the URL does not exist.
<!---
Does a URL exist? Checks that host exists and for
404 status code.
--->
<cffunction name="URLExists" output="no" returntype="boolean">
<!--- Accepts a URL --->
<cfargument name="u" type="string" required="yes">
<!--- Initialize result --->
<cfset var result=true>
<!--- Attempt to retrieve the URL --->
<cfhttp method="head" url="#ARGUMENTS.u#"
resolveurl="no" throwonerror="no" />
<!--- Check if no status code or status code 404 --->
<cfif NOT IsDefined("cfhttp.responseheader.status_code")
OR cfhttp.responseheader.status_code EQ "404">
<!--- Does not exist, return FALSE --->
<cfset result=false>
</cfif>
<cfreturn result>
</cffunction>
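A quick way to try the UDF is to call it from a template that includes it. This is only a sketch: the variable name testURL and the example.com address are placeholders, not part of the original function.

```cfml
<!--- testURL is a hypothetical placeholder; substitute the URL you want to check --->
<cfset testURL = "http://www.example.com/">

<!--- URLExists() returns false for a 404 or when no status code comes back --->
<cfif URLExists(testURL)>
    <cfoutput>#testURL# appears to exist.</cfoutput>
<cfelse>
    <cfoutput>#testURL# returned a 404 or did not respond.</cfoutput>
</cfif>
```

Keep in mind that a true result only means the server answered with something other than a 404, so a 401 or 403 also counts as "exists".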
Updated as per Steve's suggestion and Gus' important feedback.
This function is not quite correct. You need to first check that a status code is actually returned. If you run your code against a domain that doesn't resolve, you won't get a status code to check and it will throw an error. Try it with http://www.benfrta.com
The corrected code should be:
<!--- Does a URL exist? Checks for 404 status code. --->
<cffunction name="URLExists" output="no" returntype="boolean">
<!--- Accepts a URL --->
<cfargument name="u" type="string" required="yes">
<!--- Initialize result --->
<cfset var result=true>
<!--- Attempt to retrieve the URL --->
<cfhttp url="#ARGUMENTS.u#" resolveurl="no" throwonerror="no" />
<!--- Check That a Status Code is Returned --->
<cfif isDefined('cfhttp.responseheader.status_code')>
<cfif cfhttp.responseheader.status_code EQ "404">
<!--- If 404, return FALSE --->
<cfset result=false>
</cfif>
<cfelse>
<!--- No Status Code Returned --->
<cfset result=false>
</cfif>
<cfreturn result>
</cffunction>
Ray, sure, go for it.
I think the 'NOT IsDefined("cfhttp.responseheader.status_code")' should be in parentheses. Maybe I am just a paranoid programmer (I haven't actually tested it), but I think that having the 'NOT' at the beginning of the statement without parentheses might make the NOT apply to the whole statement. For instance, if the status_code actually returned 404, the 'NOT' would make the whole statement false and the URL would be deemed valid, even though it is not.
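The precedence question is easy to settle with a small test. As far as I know, CFML's documented operator precedence places NOT above AND and OR, so the NOT should bind only to the first operand, but it does no harm to confirm this on your own server. The variable names below are hypothetical stand-ins for the two halves of the UDF's condition.

```cfml
<!--- Hypothetical stand-ins for the two halves of the condition --->
<cfset hasStatus = true>  <!--- pretend a status code came back --->
<cfset is404 = true>      <!--- pretend that status code was 404 --->

<!---
    If NOT applied to the whole expression, this would be NOT (true OR true), i.e. false.
    If NOT binds only to hasStatus, it is (NOT true) OR true, i.e. true.
--->
<cfif NOT hasStatus OR is404>
    <cfoutput>NOT applies to the first operand only</cfoutput>
<cfelse>
    <cfoutput>NOT applies to the whole expression</cfoutput>
</cfif>
```

If the first message is shown, the unparenthesized condition in the UDF behaves as intended. Adding the parentheses anyway is harmless and arguably makes the intent clearer.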
I ran into another issue on my server. Its DNS provider still sends a 200 status code if the URL does not exist. Maybe it is a bug in their code, but I added a check for it. Other DNS providers may have similar issues, so check a known bad URL first and adapt as necessary. I checked whether cfhttp.responseheader.server is "OpenDNS Guide".
<cfif (NOT IsDefined("cfhttp.responseheader.status_code")) OR cfhttp.responseheader.status_code EQ "404" OR (IsDefined("cfhttp.responseheader.server") AND cfhttp.responseheader.server EQ "OpenDNS Guide")>
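Dropped into the UDF from the post, that condition might look something like the sketch below. It simply combines the original function with the extra server-header check. Whether your DNS provider's landing page sets the same Server header, or answers HEAD requests the same way, is an assumption you would need to verify against a known bad URL.

```cfml
<!--- Does a URL exist? Checks for a missing status code, a 404,
      and a DNS provider's catch-all landing page. --->
<cffunction name="URLExists" output="no" returntype="boolean">
    <!--- Accepts a URL --->
    <cfargument name="u" type="string" required="yes">
    <!--- Initialize result --->
    <cfset var result=true>
    <!--- Attempt to retrieve the URL --->
    <cfhttp method="head" url="#ARGUMENTS.u#"
            resolveurl="no" throwonerror="no" />
    <!--- No status code, a 404, or the provider's landing page all count as "does not exist" --->
    <cfif (NOT IsDefined("cfhttp.responseheader.status_code"))
          OR cfhttp.responseheader.status_code EQ "404"
          OR (IsDefined("cfhttp.responseheader.server")
              AND cfhttp.responseheader.server EQ "OpenDNS Guide")>
        <cfset result=false>
    </cfif>
    <cfreturn result>
</cffunction>
```

The "OpenDNS Guide" string is specific to the provider mentioned above; other providers would need their own marker.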