Many websites use URL redirection to forward a request from one location to another, sometimes several times in a row, for reasons such as domain forwarding, URL shortening, privacy protection, or pointing several similar domain names at a single website. In this post, I'll demonstrate how to get all the redirections of a URL using Apache HttpComponents HttpClient, with a live demo as a bonus.
Tools and Technologies used in this article
1. Add Maven dependency
Create a Maven project (maven-archetype-quickstart) and add the Apache HttpClient dependency to pom.xml.
File: pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.srccodes.tools.domain</groupId>
    <artifactId>tool-url-redirection-checker</artifactId>
    <version>1.0</version>
    <packaging>jar</packaging>

    <name>tool-url-redirection-checker</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <httpclient-version>4.3.1</httpclient-version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>${httpclient-version}</version>
        </dependency>
    </dependencies>
</project>
2. Code
- Create and configure a CloseableHttpClient with a custom configuration. CloseableHttpClient is thread-safe, so a single instance can execute multiple HTTP requests. The client follows redirects automatically unless redirect handling is explicitly disabled with HttpClientBuilder.disableRedirectHandling().
- Create an HttpGet instance for the link whose redirections should be fetched.
- Create a local HTTP execution context (HttpClientContext).
- Execute the HttpGet request with the client, passing in the local HttpClientContext instance.
- After the request completes, call getRedirectLocations() on the context to get all the redirect locations.
- Close the response (CloseableHttpResponse) to release its resources.
File: UrlRedirectionLocationsFetcher.java
package com.srccodes.tools.domain;

import java.io.IOException;
import java.net.URI;
import java.util.List;

import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

/**
 * Utility class to get all redirection locations of a URL.
 *
 * @author Abhijit Ghosh
 * @version 1.0
 */
public class UrlRedirectionLocationsFetcher {

    // Web browser user agent, sent instead of HttpClient's default
    public static final String USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19";

    // Create and configure a single, thread-safe HttpClient (redirects are followed by default)
    private static final CloseableHttpClient httpClient = HttpClients.custom()
            .setUserAgent(USER_AGENT)
            .setDefaultRequestConfig(RequestConfig.custom().setCookieSpec(CookieSpecs.BROWSER_COMPATIBILITY).build())
            .build();

    /**
     * Gets all the redirect locations of the supplied link.
     *
     * @param link the URL whose redirections should be fetched
     * @return the redirect locations in the order they were followed, or null if the request was not redirected
     * @throws ClientProtocolException in case of an HTTP protocol error
     * @throws IOException in case of a connection problem
     */
    public List<URI> getAllRedirectLocations(String link) throws ClientProtocolException, IOException {
        List<URI> redirectLocations = null;
        CloseableHttpResponse response = null;
        try {
            // Local execution context; HttpClient records the redirect locations in it
            HttpClientContext context = HttpClientContext.create();
            HttpGet httpGet = new HttpGet(link);
            response = httpClient.execute(httpGet, context);

            // get all redirection locations
            redirectLocations = context.getRedirectLocations();
        } finally {
            // release the resources held by the response
            if (response != null) {
                response.close();
            }
        }
        return redirectLocations;
    }

    public static void main(String[] args) throws ClientProtocolException, IOException {
        // Input URL
        String link = "http://bit.ly/1c1mBAI";

        UrlRedirectionLocationsFetcher urlRedirectionLocationsFetcher = new UrlRedirectionLocationsFetcher();
        List<URI> allRedirectLocations = urlRedirectionLocationsFetcher.getAllRedirectLocations(link);

        if (allRedirectLocations != null) {
            System.out.println(link);
            for (URI uri : allRedirectLocations) {
                System.out.println("|\nv\n" + uri.toASCIIString());
            }
        } else {
            System.out.println("Not found!");
        }
    }
}
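The program above needs Apache HttpClient and live shortener URLs. For readers on Java 11 or later, the following is a minimal sketch of the same idea using the JDK's built-in java.net.http.HttpClient instead (a swapped-in library, not the one this post uses). With redirects enabled, the final HttpResponse links back to every intermediate redirect response through previousResponse(), which plays roughly the role of HttpClientContext.getRedirectLocations(). The class RedirectChainDemo, the local server and its paths (/short, /expanded, /naked, /final) are hypothetical stand-ins so the sketch runs offline.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class RedirectChainDemo {

    /** Returns the URIs reached through redirects, oldest first (empty if none). */
    public static List<URI> redirectLocations(HttpClient client, URI start) {
        try {
            HttpResponse<Void> last = client.send(
                    HttpRequest.newBuilder(start).GET().build(),
                    HttpResponse.BodyHandlers.discarding());
            // Walk back from the final response; each intermediate redirect
            // response is reachable through previousResponse().
            List<URI> chain = new ArrayList<>();
            Optional<HttpResponse<Void>> current = Optional.of(last);
            while (current.isPresent()) {
                chain.add(current.get().uri());
                current = current.get().previousResponse();
            }
            Collections.reverse(chain);                             // oldest first
            return new ArrayList<>(chain.subList(1, chain.size())); // drop the start URI
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical local chain: /short -> /expanded -> /naked -> /final (200). */
    static HttpServer startLocalChain() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        String[][] hops = {{"/short", "/expanded"}, {"/expanded", "/naked"}, {"/naked", "/final"}};
        for (String[] hop : hops) {
            String target = base + hop[1];
            server.createContext(hop[0], exchange -> {
                exchange.getResponseHeaders().set("Location", target);
                exchange.sendResponseHeaders(301, -1); // -1: no response body
                exchange.close();
            });
        }
        server.createContext("/final", exchange -> {
            exchange.sendResponseHeaders(200, -1);
            exchange.close();
        });
        server.start();
        return server;
    }

    public static List<URI> demo() {
        HttpServer server;
        try {
            server = startLocalChain();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            HttpClient client = HttpClient.newBuilder()
                    .followRedirects(HttpClient.Redirect.NORMAL) // the JDK default is NEVER
                    .version(HttpClient.Version.HTTP_1_1)
                    .build();
            URI start = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/short");
            return redirectLocations(client, start);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        for (URI uri : demo()) {
            System.out.println("|\nv\n" + uri.toASCIIString());
        }
    }
}
```

Unlike getAllRedirectLocations(), this sketch returns an empty list rather than null when no redirect happens. Note that HttpClient.Redirect.NORMAL does not follow a redirect from an HTTPS URL to an HTTP URL; Redirect.ALWAYS does.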
User Agent
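Several websites responded with a 500 status code when presented with HttpClient's default User-Agent header. One website returned a 200 status code, but the HTML content of the page was truncated with a "500 server error" message. For maximum compatibility, use a standard web browser user-agent string, as the code above does with USER_AGENT.
Cookie Policies
Very few websites support anything other than basic Netscape-style cookies, so the client is configured with the browser-compatible cookie policy (CookieSpecs.BROWSER_COMPATIBILITY).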
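To check which User-Agent header a server actually receives, a small local echo server helps. This is a minimal sketch using the JDK's java.net.http client (Java 11+), not Apache HttpClient; in the JDK client the counterpart of setUserAgent(USER_AGENT) is setting the User-Agent header on each request. UserAgentCheck and the /ua endpoint are hypothetical names chosen for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class UserAgentCheck {

    public static final String USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19";

    /** Returns the User-Agent a local echo server received; null keeps the client's default. */
    public static String receivedUserAgent(String userAgent) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            // Echo the incoming User-Agent header back as the response body
            server.createContext("/ua", exchange -> {
                String seen = exchange.getRequestHeaders().getFirst("User-Agent");
                byte[] body = String.valueOf(seen).getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            try {
                HttpClient client = HttpClient.newBuilder()
                        .version(HttpClient.Version.HTTP_1_1)
                        .build();
                HttpRequest.Builder request = HttpRequest.newBuilder(
                        URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/ua"));
                if (userAgent != null) {
                    request.header("User-Agent", userAgent); // override the JDK default
                }
                return client.send(request.build(), HttpResponse.BodyHandlers.ofString()).body();
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default:  " + receivedUserAgent(null));
        System.out.println("override: " + receivedUserAgent(USER_AGENT));
    }
}
```

Running main prints the client's own Java-specific default first and the browser string second, which makes it easy to confirm the override before pointing the client at a picky site.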
3. Run
For testing purposes, I shortened http://srccodes.com/tools/domain/url-redirection-checker using Google URL Shortener and then shortened the result again with bit.ly. Naked domain forwarding is enabled on this site, so requests to http://srccodes.com are forwarded to http://www.srccodes.com. As a result, the URL http://bit.ly/1c1mBAI goes through three consecutive redirections.
Console Output
http://bit.ly/1c1mBAI
|
v
http://goo.gl/WS8WBn
|
v
http://srccodes.com/tools/domain/url-redirection-checker
|
v
http://www.srccodes.com/tools/domain/url-redirection-checker
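Each arrow in the output above is one redirect response. To also see the status code of every hop, one option is to turn automatic redirect handling off (in Apache HttpClient, HttpClientBuilder.disableRedirectHandling()) and follow the Location headers yourself. The sketch below does this with the JDK's java.net.http client rather than Apache HttpClient; the class ManualRedirectFollower, the Hop type, the hop limit of 5 and the local three-hop server are illustrative assumptions, not part of the original tool.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ManualRedirectFollower {

    /** One request in the chain: the URI asked for and the status it answered with. */
    public static final class Hop {
        public final URI uri;
        public final int status;
        Hop(URI uri, int status) { this.uri = uri; this.status = status; }
        @Override public String toString() { return status + " " + uri; }
    }

    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303 || status == 307 || status == 308;
    }

    /** Follows Location headers by hand, giving up after maxRedirects redirects. */
    public static List<Hop> follow(HttpClient client, URI start, int maxRedirects) {
        List<Hop> hops = new ArrayList<>();
        URI current = start;
        try {
            for (int i = 0; i <= maxRedirects; i++) {
                HttpResponse<Void> response = client.send(
                        HttpRequest.newBuilder(current).GET().build(),
                        HttpResponse.BodyHandlers.discarding());
                hops.add(new Hop(current, response.statusCode()));
                Optional<String> location = response.headers().firstValue("Location");
                if (!isRedirect(response.statusCode()) || location.isEmpty()) {
                    return hops; // final destination reached
                }
                // Location may be relative, so resolve it against the current URI
                current = current.resolve(location.get());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        throw new IllegalStateException("more than " + maxRedirects + " redirects from " + start);
    }

    static void redirect(HttpServer server, String path, int status, String location) {
        server.createContext(path, exchange -> {
            exchange.getResponseHeaders().set("Location", location);
            exchange.sendResponseHeaders(status, -1); // -1: no response body
            exchange.close();
        });
    }

    /** Runs follow() against a hypothetical local chain /short -> /expanded -> /naked -> /final. */
    public static List<Hop> demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            redirect(server, "/short", 301, base + "/expanded");
            redirect(server, "/expanded", 302, "/naked"); // relative Location
            redirect(server, "/naked", 301, base + "/final");
            server.createContext("/final", exchange -> {
                exchange.sendResponseHeaders(200, -1);
                exchange.close();
            });
            server.start();
            try {
                HttpClient client = HttpClient.newBuilder()
                        .followRedirects(HttpClient.Redirect.NEVER) // we follow hops ourselves
                        .version(HttpClient.Version.HTTP_1_1)
                        .build();
                return follow(client, URI.create(base + "/short"), 5);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Resolving each Location against the current URI matters because servers may send relative values such as /naked, and the hop limit guards against redirect loops.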
4. Live Demo
For a live demo, try the URL Redirection Checker tool.
Download SrcCodes
All code samples shown in this post are available on GitHub.