Fetching a web page with a space in the URL
I tried to fatch a wab page in java as following
public static String downloadWWWPage(String pageAddr) throws IOException {
    ....
    URL url = new URL(pageAddr);
    String websiteAddress = url.getHost();
    String file = url.getFile();
    ....
    Socket clientSocket = new Socket(websiteAddress, 80);
    System.out.println("Socket opened to " + websiteAddress + "\n");

    // Create a BufferedReader on the socket's input stream;
    // it will read the content sent by the web server.
    BufferedReader inFromServer = new BufferedReader(
            new InputStreamReader(clientSocket.getInputStream()));

    // Create an output stream writer that will talk to the web server.
    OutputStreamWriter outWriter = new OutputStreamWriter(
            clientSocket.getOutputStream());

    // Send the GET request for the desired file, naming the protocol
    // version (HTTP/1.0). The web server responds with the page, which
    // is then read from the input stream.
    outWriter.write("GET " + file + " HTTP/1.0\r\n\n");
===============================
The problem is that when the variable "file" above contains a space
character (even if I encode the space as %20 or ), I get a 404, but the
same URL works in the address field of a browser.
Is there any way to get around this?
Thanks,
mark
coolshare (39)
8/15/2006 5:38:33 PM
Dude,
You need to encode the URI ("file" in your example).
Try java.net.URLEncoder.encode(String s, String enc);
But you sure are doing it in a complicated way. Why not:
InputStream in = url.openConnection().getInputStream();
or
Object o = url.openConnection().getContent();
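One caveat on the encoding step: URLEncoder is meant for HTML form data, so it turns a space into "+" and also escapes the "/" separators, which is usually not what a request path needs. A minimal sketch of an alternative is below, assuming the path and query come unencoded from url.getPath() and url.getQuery() and that the path starts with "/". The class name PathEncoder and method encodeTarget are made up for illustration. It relies on the multi-argument java.net.URI constructor, which quotes illegal characters in each component.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the original post.
final class PathEncoder {

    // Builds the request target for a GET line from an UNENCODED path and
    // optional query. Assumes the path is empty or starts with "/".
    static String encodeTarget(String path, String query) {
        if (path == null || path.isEmpty()) {
            path = "/"; // a request line needs at least "/"
        }
        try {
            // The multi-argument constructor percent-encodes characters that
            // are illegal in the path and query (a space becomes %20).
            URI uri = new URI(null, null, null, -1, path, query, null);
            // toASCIIString() additionally escapes non-ASCII characters.
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeTarget("/my docs/a b.html", null));
    }
}
```

Note that these constructors always quote a literal "%", so passing in a path that already contains %20 would encode it a second time (to %2520). Feed it the raw path with the spaces still in it.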
-a
anon (388)
8/15/2006 5:45:48 PM
cool2005 wrote:
> I tried to fatch a wab
'fatch a wab' ..is that code?
>..page ..
What URL?
( And do you have the site owner's consent to fetch and
use the information from a Java (or other) program? Many
site owners do not allow it, and take measures to prevent it. )
>...in java as following
> URL url = new URL(pageAddr);
...please don't post 'tab' characters to usenet.
I don't know how wide that is in your newsclient,
but on Google it's this wide..
URL url = new
URL(pageAddr);
Andrew T.
andrewthommo (2516)
8/15/2006 7:55:37 PM
The code below works.
/*
 * 2006/08/14 eric.leung Work at Home. Need to check with proxy Authenticator.
 *
 * Copyright (c) 2000 David Flanagan. All rights reserved.
 * This code is from the book Java Examples in a Nutshell, 2nd Edition.
 * It is provided AS-IS, WITHOUT ANY WARRANTY either expressed or implied.
 * You may study, use, and modify it for any non-commercial purpose.
 * You may distribute it non-commercially as long as you retain this notice.
 * For a commercial use license, or to purchase the book (recommended),
 * visit http://www.davidflanagan.com/javaexamples2.
 */
// package com.davidflanagan.examples.net;
import java.io.*;
import java.net.*;

/**
 * This program connects to a Web server and downloads the specified URL
 * from it. It uses the HTTP protocol directly.
 **/
public class HttpClient {
    public static void main(String[] args) {
        try {
            // Check the arguments
            if ((args.length != 1) && (args.length != 2))
                throw new IllegalArgumentException("Wrong number of args");

            // Get an output stream to write the URL contents to
            OutputStream to_file;
            if (args.length == 2) to_file = new FileOutputStream(args[1]);
            else to_file = System.out;

            // Now use the URL class to parse the user-specified URL into
            // its various parts.
            URL url = new URL(args[0]);
            String protocol = url.getProtocol();
            if (!protocol.equals("http")) // Check that we support the protocol
                throw new IllegalArgumentException("Must use 'http:' protocol");
            String host = url.getHost();
            int port = url.getPort();
            if (port == -1) port = 80; // if no port, use the default HTTP port
            String filename = url.getFile();
            if (filename.length() == 0) filename = "/"; // no path given: ask for the root

            // Open a network socket connection to the specified host and port
            Socket socket = new Socket(host, port);

            // Get input and output streams for the socket
            InputStream from_server = socket.getInputStream();
            PrintWriter to_server = new PrintWriter(socket.getOutputStream());

            // Send the HTTP GET command to the Web server, specifying the file.
            // This uses an old and very simple version of the HTTP protocol.
            to_server.print("GET " + filename + "\n\n");
            to_server.flush(); // Send it right now!

            // Now read the server's response, and write it to the file
            byte[] buffer = new byte[4096];
            int bytes_read;
            while ((bytes_read = from_server.read(buffer)) != -1)
                to_file.write(buffer, 0, bytes_read);
            to_file.flush();

            // When the server closes the connection, we close our stuff
            socket.close();
            if (args.length == 2) {
                // Only close and report when writing to a real file;
                // args[1] does not exist otherwise, and System.out stays open.
                to_file.close();
                System.out.println("Output to " + args[1]);
            }
        }
        catch (Exception e) { // Report any errors that arise
            System.err.println(e);
            System.err.println("\n2006/08/11\n");
            System.err.println("Usage: java HttpClient <URL> [<filename>]");
            System.err.println("e.g. java HttpClient http://hk.yahoo.com abc.txt");
        }
    }
}
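
For the URL-with-a-space case from the original question, the request line matters: a raw space in the file name splits "GET <file> HTTP/1.0" into the wrong parts, so the server may look up a different resource. Below is a rough sketch of building an HTTP/1.0 request as a string, assuming the target passed in is already percent-encoded. The class name RequestBuilder and method buildGetRequest are made up for illustration. It ends every line with CRLF and adds a Host header, which HTTP/1.0 does not require but which servers hosting several sites on one address generally need in order to pick the right site.

```java
// Hypothetical helper, not part of the book's example.
final class RequestBuilder {

    // Returns the full text of a GET request. encodedTarget is assumed to
    // be an already percent-encoded path such as "/my%20docs/a%20b.html".
    static String buildGetRequest(String host, String encodedTarget) {
        return "GET " + encodedTarget + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n"; // the blank line ends the header section
    }

    public static void main(String[] args) {
        // www.example.com is a placeholder host name.
        System.out.print(buildGetRequest("www.example.com", "/my%20docs/a%20b.html"));
    }
}
```

The returned string can be written to the socket's output stream in place of the "GET " + filename + "\n\n" line, followed by a flush.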
moon_ils-se (53)
8/16/2006 7:46:59 AM