My name is Jon Skeet

Java: how to download a webpage dynamically created by a servlet

I want to download the source of a webpage to a file (*.htm), i.e. the entire content including all the HTML markup, from this URL:

http://isap.sejm.gov.pl/DetailsServlet?id=WDU20061831353

Downloading that page works perfectly fine with the FileUtils.copyURLToFile method from Apache Commons IO.
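For reference, here is a JDK-only sketch of roughly what FileUtils.copyURLToFile does: open the URL's input stream and copy its bytes into a file. It is not the Commons IO implementation itself, and the output name details.htm is a placeholder. Like the original approach, it sends Java's default request headers, so it should behave the same way on the problem URL below.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopyUrlToFile {
    // Open the URL's stream and copy every byte into the target file,
    // overwriting it if it already exists. Returns the number of bytes copied.
    public static long copy(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create("http://isap.sejm.gov.pl/DetailsServlet?id=WDU20061831353").toURL();
        // "details.htm" is a placeholder output file name.
        long bytes = copy(url, Path.of("details.htm"));
        System.out.println("Saved " + bytes + " bytes");
    }
}
```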

However, that page also contains some links, including one I'm particularly interested in:

http://isap.sejm.gov.pl/RelatedServlet?id=WDU20061831353&type=9&isNew=true

This link works fine if I open it in a regular browser, but when I try to download it in Java using FileUtils, I get only an empty page with the single message "trwa ladowanie danych" (Polish for "loading data..."), and then nothing happens: the target page is never loaded.

Could anyone help me with this? From the URL I can see that the page is generated by servlets. Is there a special way to download pages created by servlets?

Regards

This isn't a servlet issue; servlets just happen to be the technology used to implement the server, and clients generally don't need to care about that. I strongly suspect the server is simply responding with different data depending on the request headers (e.g. User-Agent). For example, I see a very different response when I fetch the page with curl than when I load it in Chrome.
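If the User-Agent theory is right, one way to test it from Java is to send browser-like headers instead of Java's defaults. The sketch below uses java.net.http.HttpClient (Java 11+). The header values, including the Chrome-style User-Agent string, are illustrative assumptions rather than values known to satisfy this server, and related.htm is a placeholder file name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class BrowserLikeFetch {
    // Assumed browser-like User-Agent; which value (if any) the server checks is unknown.
    public static final String BROWSER_UA =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    // Build a GET request that imitates a browser's headers (values are assumptions).
    public static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("User-Agent", BROWSER_UA)
            .header("Accept", "text/html,application/xhtml+xml")
            .header("Accept-Language", "pl,en;q=0.8")
            .GET()
            .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        String url = "http://isap.sejm.gov.pl/RelatedServlet?id=WDU20061831353&type=9&isNew=true";
        // Stream the response body straight into a file.
        HttpResponse<Path> response = client.send(
            buildRequest(url), HttpResponse.BodyHandlers.ofFile(Path.of("related.htm")));
        System.out.println("HTTP " + response.statusCode() + " saved to " + response.body());
    }
}
```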

I suggest you experiment with curl: make a request that looks as close as possible to one sent by a browser, then remove or change headers until you find out exactly which ones matter. You might want to use Wireshark or Fiddler to see the exact requests and responses involved.
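The same trial and error can be scripted in Java instead of curl: start from the full set of headers a browser sends, drop one header at a time, and compare each response with the full-header baseline. The header names and values below are hypothetical stand-ins for whatever Fiddler or Wireshark shows your browser actually sending (the Referer in particular is a guess). Comparing body length and equality is only a rough signal, since pages containing timestamps or session IDs may differ on every request.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderBisect {
    // For each header name, return a copy of the header set with just that header removed.
    public static Map<String, Map<String, String>> leaveOneOut(Map<String, String> headers) {
        Map<String, Map<String, String>> variants = new LinkedHashMap<>();
        for (String name : headers.keySet()) {
            Map<String, String> copy = new LinkedHashMap<>(headers);
            copy.remove(name);
            variants.put(name, copy);
        }
        return variants;
    }

    // Fetch the URL with exactly the given headers and return the body as a string.
    static String fetch(HttpClient client, String url, Map<String, String> headers)
            throws IOException, InterruptedException {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url));
        headers.forEach(builder::header);
        return client.send(builder.build(), HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String url = "http://isap.sejm.gov.pl/RelatedServlet?id=WDU20061831353&type=9&isNew=true";
        // Hypothetical values: replace them with the headers your browser actually sends.
        Map<String, String> browser = new LinkedHashMap<>();
        browser.put("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        browser.put("Accept", "text/html,application/xhtml+xml");
        browser.put("Accept-Language", "pl,en;q=0.8");
        browser.put("Referer", "http://isap.sejm.gov.pl/DetailsServlet?id=WDU20061831353");

        HttpClient client = HttpClient.newHttpClient();
        String baseline = fetch(client, url, browser);
        System.out.println("all headers: " + baseline.length() + " chars");
        for (Map.Entry<String, Map<String, String>> e : leaveOneOut(browser).entrySet()) {
            String body = fetch(client, url, e.getValue());
            // A large change suggests the server depends on the dropped header.
            System.out.println("without " + e.getKey() + ": " + body.length() + " chars"
                + (body.equals(baseline) ? "" : " (differs)"));
        }
    }
}
```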

Of course, even if you can fetch the original HTML correctly, there's still all the JavaScript: it would be entirely feasible for the HTML to contain none of the data and instead include JavaScript that does the actual data fetching. I don't believe that's the case for this particular page, but you may well find it happens for
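To tell the two failure modes apart, it can help to check whether the HTML you fetched is just a script-driven shell. The following is a crude heuristic sketch, not a real HTML parser: it strips script and style blocks and then all tags using regular expressions, and flags pages that contain scripts but very little visible text. The class name and the character threshold are illustrative assumptions.

```java
import java.util.regex.Pattern;

public class PlaceholderCheck {
    private static final Pattern SCRIPT = Pattern.compile("(?is)<script\\b.*?</script>");
    private static final Pattern STYLE = Pattern.compile("(?is)<style\\b.*?</style>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern SCRIPT_OPEN = Pattern.compile("(?i)<script\\b");

    // Approximate the text a user would see: drop scripts, styles and tags, collapse whitespace.
    public static String visibleText(String html) {
        String noScripts = SCRIPT.matcher(html).replaceAll(" ");
        String noStyles = STYLE.matcher(noScripts).replaceAll(" ");
        return TAG.matcher(noStyles).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    // True if the page contains at least one <script> element.
    public static boolean hasScripts(String html) {
        return SCRIPT_OPEN.matcher(html).find();
    }

    // Heuristic: scripts present but almost no visible text suggests the data is loaded by JavaScript.
    public static boolean probablyScriptRendered(String html, int minTextChars) {
        return hasScripts(html) && visibleText(html).length() < minTextChars;
    }
}
```

For example, calling PlaceholderCheck.probablyScriptRendered(html, 200) on a page that has a script element and only a short loading message as visible text returns true; if that happens even with browser-like headers, the data is probably being fetched by JavaScript, and you would need to find the request that script makes.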
