Tuesday, March 31, 2009

Finding broken links using HttpWebRequest/HttpWebResponse in C#


HttpWebRequest and HttpWebResponse can be used to find broken links in a website; see the isBrokenLink function below.
The StatusCode of the response tells us whether the link is broken. In practice, though, GetResponse usually throws an exception for a broken link (for example on a 404 response or when the request times out), so the Timeout property of the request plays an important role here. A larger timeout makes the whole run take longer, while a smaller timeout may cause a valid but slow link to be reported as broken. If anyone knows how to handle this appropriately, please mention it in the comments.

// Requires: using System; using System.Net;
private bool isBrokenLink(string url)
{
    try
    {
        WebRequest http = WebRequest.Create(url);
        http.Timeout = 5000; // milliseconds

        // Dispose the response so the underlying connection is released.
        using (HttpWebResponse httpresponse = (HttpWebResponse)http.GetResponse())
        {
            // Anything other than 200 OK is treated as broken.
            return httpresponse.StatusCode != HttpStatusCode.OK;
        }
    }
    catch (Exception)
    {
        // GetResponse throws for error statuses (such as 404),
        // timeouts and unreachable hosts.
        return true;
    }
}
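
One possible way to approach the timeout question raised above is to look at why the request failed instead of treating every exception the same way. WebException exposes a Status property, so a protocol error (the server answered with something like 404) can be separated from a timeout, and only timeouts need a retry with a longer wait. The sketch below is just an illustration under some assumptions: the names LinkStatus, LinkChecker, Classify and Check, and the parameters timeoutMs and maxAttempts, are made up for this example, and the URL is assumed to be a well-formed absolute http or https address.

```csharp
using System;
using System.Net;

// Hypothetical result type for this sketch.
public enum LinkStatus { Ok, Broken, TimedOut, Unreachable }

public static class LinkChecker
{
    // Maps the reason a request failed to a link status.
    public static LinkStatus Classify(WebExceptionStatus status)
    {
        switch (status)
        {
            case WebExceptionStatus.ProtocolError:
                return LinkStatus.Broken;      // server answered with an error status
            case WebExceptionStatus.Timeout:
                return LinkStatus.TimedOut;    // no answer within the timeout
            default:
                return LinkStatus.Unreachable; // DNS failure, refused connection, ...
        }
    }

    // Tries the URL up to maxAttempts times, retrying only after a timeout.
    public static LinkStatus Check(string url, int timeoutMs, int maxAttempts)
    {
        LinkStatus last = LinkStatus.Unreachable;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
                // Assumption: give slow servers more time on each retry.
                request.Timeout = timeoutMs * attempt;
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    return response.StatusCode == HttpStatusCode.OK
                        ? LinkStatus.Ok
                        : LinkStatus.Broken;
                }
            }
            catch (WebException ex)
            {
                last = Classify(ex.Status);
                if (last != LinkStatus.TimedOut)
                {
                    return last; // a definite answer, no point in retrying
                }
            }
        }
        return last; // still timing out after all attempts
    }
}
```

With something like LinkChecker.Check(url, 2000, 3), a quick first attempt keeps the total run short for healthy links, and a link is only reported as TimedOut after the longer retries have also failed. Whether TimedOut should count as broken is left to the caller.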


Update added on April 21:

Making the two changes below to the above code may improve performance.

HttpWebRequest http = (HttpWebRequest) WebRequest.Create(url);
http.UserAgent = "Mozilla/9.0 (compatible; MSIE 6.0; Windows 98)";
http.Method = "HEAD";



The HEAD method lets us verify a link without downloading the response body, because the server returns only the headers. This can improve performance significantly, particularly when checking for missing images.
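
One caveat with HEAD is that some servers do not support it and answer with 405 (Method Not Allowed) or 501 (Not Implemented) even though the resource exists. A rough sketch of falling back to GET in that case is shown below. The names HeadLinkChecker, ShouldRetryWithGet and Probe, and the timeoutMs parameter, are hypothetical and only used for illustration; the URL is again assumed to be a well-formed absolute http or https address.

```csharp
using System;
using System.Net;

public static class HeadLinkChecker
{
    // True when a HEAD failure suggests the server rejects HEAD itself,
    // rather than the resource being missing.
    public static bool ShouldRetryWithGet(HttpStatusCode code)
    {
        return code == HttpStatusCode.MethodNotAllowed  // 405
            || code == HttpStatusCode.NotImplemented;   // 501
    }

    // Sends one request and returns the HTTP status code,
    // or null when no HTTP response was received at all (timeout, DNS failure, ...).
    private static HttpStatusCode? Probe(string url, string method, int timeoutMs)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = method;
        request.Timeout = timeoutMs;
        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode;
            }
        }
        catch (WebException ex)
        {
            // For error statuses the response is attached to the exception.
            HttpWebResponse errorResponse = ex.Response as HttpWebResponse;
            if (errorResponse == null)
            {
                return null;
            }
            using (errorResponse)
            {
                return errorResponse.StatusCode;
            }
        }
    }

    public static bool IsBrokenLink(string url, int timeoutMs)
    {
        HttpStatusCode? status = Probe(url, "HEAD", timeoutMs);
        if (status.HasValue && ShouldRetryWithGet(status.Value))
        {
            // The server may simply not allow HEAD, so check again with GET.
            status = Probe(url, "GET", timeoutMs);
        }
        return status != HttpStatusCode.OK;
    }
}
```

The extra GET is only sent for the two status codes above, so links on servers that handle HEAD properly still get the cheaper check.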