Fetching a web page with a space in the URL



I tried to fatch a wab page in java as following

	public static String downloadWWWPage(String pageAddr) throws IOException {
....

			URL url = new URL(pageAddr);
			String websiteAddress = url.getHost();
			String file = url.getFile();
....
			Socket clientSocket = new Socket(websiteAddress, 80);
			System.out.println("Socket opened to " + websiteAddress + "\n");

			// creating a BufferReader object using the input stream reader
			// this will read the content send by the webserver
			BufferedReader inFromServer = new BufferedReader(
					new InputStreamReader(clientSocket.getInputStream()));

			// Need to create a output stream writer
			// that will talk to the webserver of the website
			OutputStreamWriter outWriter = new OutputStreamWriter(clientSocket
					.getOutputStream());

			// make the GET call to the webserver with the desired url or the
			// file name
			// which you intent to get, also mention the protocol type, which is
			// HTTP/1.0
			// This call will trigger the webserver to throw this page, which
			// will be read
			// by the input stream

			// making a get call to the file
			outWriter.write("GET " + file + " HTTP/1.0\r\n\n");

===============================

The problem is that when the variable "file" above contains a space character
(even when I encode the space as %20 or  ), I get a 404, but the same URL
works in the address field of a browser.

Is there any way around this?

thanks

mark

Reply coolshare (39) 8/15/2006 5:38:33 PM

Dude,
You need to encode the URI ("file" in your example).
Try java.net.URLEncoder.encode(String s, String enc);
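
One caveat on URLEncoder: it implements form encoding (application/x-www-form-urlencoded), so it turns a space into "+" and also escapes "/", which is usually not what a request path needs. Below is a minimal sketch of one alternative, assuming the path is available in unencoded form: the multi-argument java.net.URI constructor percent-encodes illegal characters such as spaces. The class name PathEncoding, the helper names, and the host example.com are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
final class PathEncoding {

    // Percent-encodes an UNENCODED absolute path (it must start with "/").
    // The multi-argument URI constructor quotes illegal characters, so a
    // space becomes %20. It also quotes '%', so do not pass an already
    // encoded path or it will be encoded twice.
    static String encodePath(String host, String rawPath) {
        try {
            return new URI("http", host, rawPath, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad path: " + rawPath, e);
        }
    }

    // Form encoding, shown only for comparison with encodePath.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints /docs/my%20file.html
        System.out.println(encodePath("example.com", "/docs/my file.html"));
        // prints %2Fdocs%2Fmy+file.html
        System.out.println(formEncode("/docs/my file.html"));
    }
}
```

The result of encodePath is the kind of string that could go between "GET " and " HTTP/1.0" in the request line.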

But you sure do it in a complicated way. Why not:

InputStream in = url.openConnection().getInputStream();

  or

Object o = url.openConnection().getContent();
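
A slightly fuller sketch of the URLConnection route, in case it helps. It reads the whole response body into a String and assumes the content is UTF-8 text; the class name UrlFetch and the method name fetch are invented for this example. Note that java.net.URL does not escape anything itself, so a space in the path still has to be written as %20 in the URL string.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Hypothetical helper class, for illustration only.
final class UrlFetch {

    // Opens the URL, reads the body until end of stream, and returns it
    // decoded as UTF-8 (an assumption; a real page may use another charset).
    static String fetch(URL url) throws IOException {
        try (InputStream in = url.openConnection().getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            return out.toString("UTF-8");
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java UrlFetch <URL>
        System.out.println(fetch(new URL(args[0])));
    }
}
```

With this approach the URL classes build the request line and headers, so there is no hand-written "GET ..." string to get wrong.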

-a

Reply anon (388) 8/15/2006 5:45:48 PM


cool2005 wrote:
> I tried to fatch a wab

'fatch a wab' ..is that code?

>..page ..

What URL?

( And do you have the site owner's consent to fetch and
use the information from a Java (or other) program?  Many
site owners do not allow it, and take measures to prevent it. )

>...in java as following


> 			URL url = new URL(pageAddr);

...please don't post 'tab' characters to Usenet.
I don't know how wide that is in your newsclient,
but on Google it's this wide..
                                               URL url = new
URL(pageAddr);

Andrew T.

Reply andrewthommo (2516) 8/15/2006 7:55:37 PM


The code below works.

/*

 2006/08/14 eric.leung Work at Home. Need to check with proxy Authenticator.


 * Copyright (c) 2000 David Flanagan.  All rights reserved.
 * This code is from the book Java Examples in a Nutshell, 2nd Edition.
 * It is provided AS-IS, WITHOUT ANY WARRANTY either expressed or implied.
 * You may study, use, and modify it for any non-commercial purpose.
 * You may distribute it non-commercially as long as you retain this notice.
 * For a commercial use license, or to purchase the book (recommended),
 * visit http://www.davidflanagan.com/javaexamples2.
 */
// package com.davidflanagan.examples.net;
import java.io.*;
import java.net.*;

/**
 * This program connects to a Web server and downloads the specified URL
 * from it.  It uses the HTTP protocol directly.
 **/
public class HttpClient {
    public static void main(String[] args) {
        try {
            // Check the arguments
            if ((args.length != 1) && (args.length != 2))
                throw new IllegalArgumentException("Wrong number of args");

            // Get an output stream to write the URL contents to
            OutputStream to_file;
            if (args.length == 2) to_file = new FileOutputStream(args[1]);
            else to_file = System.out;

            // Now use the URL class to parse the user-specified URL into
            // its various parts.
            URL url = new URL(args[0]);
            String protocol = url.getProtocol();
            if (!protocol.equals("http")) // Check that we support the protocol
               throw new IllegalArgumentException("Must use 'http:' protocol");
            String host = url.getHost();
            int port = url.getPort();
            if (port == -1) port = 80; // if no port, use the default HTTP port
            String filename = url.getFile();

            // Open a network socket connection to the specified host and port
            Socket socket = new Socket(host, port);

            // Get input and output streams for the socket
            InputStream from_server = socket.getInputStream();
            PrintWriter to_server = new PrintWriter(socket.getOutputStream());

            // Send the HTTP GET command to the Web server, specifying the file
            // This uses an old and very simple version of the HTTP protocol
            to_server.print("GET " + filename + "\n\n");
            to_server.flush();  // Send it right now!

            // Now read the server's response, and write it to the file
            byte[] buffer = new byte[4096];
            int bytes_read;
            while((bytes_read = from_server.read(buffer)) != -1)
                to_file.write(buffer, 0, bytes_read);

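            // When the server closes the connection, we close our stuff
            socket.close();
            to_file.flush();
            // Only close and report the file when one was given; with a single
            // argument, args[1] does not exist and the output is System.out
            if (args.length == 2) {
                to_file.close();
                System.out.println("Output to " + args[1]);
            }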
        }
        catch (Exception e) {    // Report any errors that arise
            System.err.println(e);
            System.err.println("\n2006/08/11\n");
            System.err.println("Usage: java HttpClient <URL> [<filename>]");
            System.err.println("e.g. java HttpClient http://hk.yahoo.com abc.txt");
        }
    }
}
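
The HttpClient above sends the old "simple request" form (a bare GET line with no protocol version). The code in the original question sends an HTTP/1.0 request line instead, but ends it with "\r\n\n" and sends no Host header. As a hedged sketch: HTTP specifies CRLF line endings, and although HTTP/1.0 does not require a Host header, servers that host several sites on one address may answer 404 without it. Whether that is what happened here is an assumption. The class name RequestBuilder and the method buildGet are made up for illustration.

```java
// Hypothetical helper class, for illustration only.
final class RequestBuilder {

    // Builds the text of an HTTP/1.0 GET request. encodedPath must already
    // be percent-encoded (e.g. "/my%20file.html"); a raw space would split
    // the request line into the wrong number of fields.
    static String buildGet(String host, String encodedPath) {
        String path = encodedPath.isEmpty() ? "/" : encodedPath;
        if (path.indexOf(' ') >= 0) {
            throw new IllegalArgumentException("Path contains a raw space: " + path);
        }
        return "GET " + path + " HTTP/1.0\r\n"   // request line
             + "Host: " + host + "\r\n"          // needed by many virtual hosts
             + "\r\n";                           // blank line ends the headers
    }

    public static void main(String[] args) {
        System.out.print(buildGet("example.com", "/my%20file.html"));
    }
}
```

The returned string can be written to the socket's output stream as-is, followed by a flush.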

Reply moon_ils-se (53) 8/16/2006 7:46:59 AM
