finding subdirectories from starting URL



     I want to find subdirectories from a starting URL.  For example,
if I start with http://www.someplace.net, I want to be able to find
the subdirectories there, e.g.:

http://www.someplace.net/documentation/
http://www.someplace.net/about/
http://www.someplace.net/images/

    Are there Java methods that facilitate this?

Thanks, Alan
Reply jalanthomas (125) 11/12/2007 12:20:17 AM

Alan wrote:
>      I want to find subdirectories from a starting URL.  For example,
> if I start with http://www.someplace.net, I want to be able to find
> the subdirectories there, e.g.:
> 
> http://www.someplace.net/documentation/
> http://www.someplace.net/about/
> http://www.someplace.net/images/
> 
>     Are there Java methods that facilitate this?
> 
> Thanks, Alan
> 

There is no easy way to do that, unless someplace.net gives you a
listing page.  Generally, in order to do that, you have to have either
direct access to the disk, access to an FTP account on that machine,
or you have to crawl the web site and parse out the URLs.

Note, this is not a limitation of Java, but simply a result of the way
HTTP works.

There are plenty of web-crawling libraries/programs out there; I
suggest you Google for them.

Good luck,
Daniel

-- 
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Reply Daniel 11/12/2007 12:32:19 AM


Alan wrote:
>I want to find subdirectories from a starting URL.  For example,
>if I start with http://www.someplace.net, I want to be able to find
>the subdirectories there, ..

Why?  What business is that of yours?
What does this ability offer to the end user?

-- 
Andrew Thompson
http://www.athompson.info/andrew/

Message posted via JavaKB.com
http://www.javakb.com/Uwe/Forums.aspx/java-general/200711/1
Reply Andrew 11/12/2007 3:11:31 AM

On Mon, 12 Nov 2007 00:20:17 -0000, Alan <jalanthomas@verizon.net>
wrote, quoted or indirectly quoted someone who said :

>http://www.someplace.net/documentation/
>http://www.someplace.net/about/
>http://www.someplace.net/images/
>
>    Are there Java methods that facilitate this?

In general, no. It is considered confidential information.  Sometimes a
server will give you a directory listing in HTML if you give it a URL
of a directory without an index.html file in it.
-- 
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Reply Roedy 11/12/2007 3:15:24 AM

Daniel Pitts <newsgroup.spamfilter@virtualinfinity.net> writes:

> Alan wrote:
>>      I want to find subdirectories from a starting URL.  For example,
>> if I start with http://www.someplace.net, I want to be able to find
>> the subdirectories there, e.g.:
>>
>> http://www.someplace.net/documentation/
>> http://www.someplace.net/about/
>> http://www.someplace.net/images/
>>
>>     Are there Java methods that facilitate this?
>>
>> Thanks, Alan
>>
> There is no easy way to do that, unless someplace.net gives you a
> listing page.  Generally, in order to do that, you have to have either
> direct access to the disk, access to an FTP account on that machine,
> or you have to crawl the web site and parse out the urls.
>
> Note, this is not a limitation of Java, but simply a result of the way
> http works.

In particular, note that documentation, about, and images may not even be
directories at all. A content-management system could use them as keys
into a database of managed documents, for example, or as category keywords
that are used to dynamically assemble a list of documents in the specified
category.

sherm--

-- 
WV News, Blogging, and Discussion: http://wv-www.com
Cocoa programming in Perl: http://camelbones.sourceforge.net
Reply Sherman 11/12/2007 4:40:06 AM

Roedy Green wrote:
> On Mon, 12 Nov 2007 00:20:17 -0000, Alan <jalanthomas@verizon.net>
> wrote, quoted or indirectly quoted someone who said :
> 
>> http://www.someplace.net/documentation/
>> http://www.someplace.net/about/
>> http://www.someplace.net/images/
>>
>>    Are there Java methods that facilitate this?
> 
> In general, no. It is considered confidential information.  Sometimes a
> server will give you a directory listing in HTML if you give it a URL
> of a directory without an index.html file in it.

In addition, many directories on the server hard drive, while they are 
subdirectories of the web document directory, will not be accessible to public 
clients.  A classic example is the WEB-INF/ directory tree in Java EE apps, 
but Apache .htaccess can also restrict directories.

-- 
Lew
Reply Lew 11/12/2007 4:45:43 AM

Roedy Green wrote:
>>http://www.someplace.net/documentation/
...
>>    Are there Java methods that facilitate this?
>
>In general, no. It is considered confidential information.  Sometimes a
>server will give you a directory listing in HTML if you give it a URL
>of a directory without an index.html file in it.

An ironic side note is that I'd like my server to *allow*
automatic 'directory listing' for dirs with no index.html,
but cannot figure out how to achieve it using the arcane
..control panel ..thingy the host offers.

If directory indexing *is* turned on, it is a perhaps tedious
but mundane task to parse the resulting HTML, looking for 
links to sub-dirs and resources.
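
As a rough illustration of that parsing step, here is a minimal sketch. It assumes the listing page marks sub-directories with relative hrefs ending in "/", which is common for auto-generated listings but is not guaranteed. The class and method names (`ListingLinks`, `subDirectories`) are made up for this example, and a regular expression is only a shortcut here, not a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingLinks {
    // Matches href="..." or href='...'. Values containing '?' or '#'
    // (e.g. the sort links some listings add) do not match at all.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#?]+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the relative hrefs that end in "/" (assumed to be sub-directories). */
    public static List<String> subDirectories(String listingHtml) {
        List<String> dirs = new ArrayList<>();
        Matcher m = HREF.matcher(listingHtml);
        while (m.find()) {
            String href = m.group(1);
            // Skip the parent link, server-absolute paths and absolute URLs.
            if (href.endsWith("/") && !href.startsWith("/")
                    && !href.startsWith("..") && !href.contains(":")) {
                dirs.add(href);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Hypothetical listing fragment, not taken from any real server.
        String html = "<a href=\"../\">Parent</a> <a href=\"images/\">images/</a> "
                    + "<a href=\"about/\">about/</a> <a href=\"readme.txt\">readme.txt</a>";
        System.out.println(subDirectories(html));
    }
}
```

For anything beyond a quick experiment, a proper HTML parser would be the safer choice, since listing formats differ between servers.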

-- 
Andrew Thompson
http://www.athompson.info/andrew/

Message posted via JavaKB.com
http://www.javakb.com/Uwe/Forums.aspx/java-general/200711/1

Reply Andrew 11/12/2007 6:11:09 AM

On Nov 12, 5:11 pm, "Andrew Thompson" <u32984@uwe> wrote:
> Roedy Green wrote:
> >>http://www.someplace.net/documentation/
> ..
> >>    Are there Java methods that facilitate this?
>
> >In general, no. It is considered confidential information.  Sometimes a
> >server will give you a directory listing in HTML if you give it a URL
> >of a directory without an index.html file in it.
>
> An ironic side note is that I'd like my server to *allow*
> automatic 'directory listing' for dirs with no index.html,
> but cannot figure out how to achieve it using the arcane
> ..control panel ..thingy the host offers.

[snip]

If it is an IIS web server, you will usually find a checkbox
in the IIS server properties, or, from memory, you can even get
to it by right-clicking on the virtual directory and editing
its properties.

Tomcat for example has the following setting in its web.xml file:

    <init-param>
        <param-name>listings</param-name>
        <param-value>true</param-value>
    </init-param>

But I think it's only good whilst developing.

--
Chris

Reply Chris 11/12/2007 6:33:22 AM

    Thanks for the information.  I think I shall just follow href
links instead of finding directories.

Thanks, Alan
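
Following href links mostly comes down to resolving each href against the URL of the page it was found on and deciding which results to visit. A minimal sketch of that step, assuming the hrefs have already been extracted from the page; the names `LinkResolver` and `sameHostLinks` are invented for this example, and staying on the same host is just one possible policy.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LinkResolver {
    /** Resolves each href against the page URL and keeps only links on the same host. */
    public static Set<URI> sameHostLinks(URI page, List<String> hrefs) {
        Set<URI> result = new LinkedHashSet<>();   // keeps order, drops duplicates
        for (String href : hrefs) {
            URI target;
            try {
                target = page.resolve(href.trim());
            } catch (IllegalArgumentException e) {
                continue;                          // malformed href, skip it
            }
            String host = target.getHost();        // null for mailto: and similar
            if (host != null && host.equalsIgnoreCase(page.getHost())) {
                result.add(target);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        URI page = URI.create("http://www.someplace.net/");
        List<String> hrefs = List.of("about/", "/images/logo.png",
                                     "http://other.example/x", "mailto:a@b.c");
        System.out.println(sameHostLinks(page, hrefs));
    }
}
```

Note the trailing slash on the page URL in `main`: `java.net.URI.resolve` can produce odd results when the base URI has an empty path, so it is worth normalising the starting URL first.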
Reply Alan 11/12/2007 2:37:42 PM

Alan wrote:
>Thanks for the information.  I think I shall just follow href
>links instead of finding directories.

In that case, as Daniel mentioned, search "Web crawler"/
"web crawling".  There have been some interesting discussions
about crawlers in these groups, across the ages.  As I vaguely 
recall there was a source posted for one by Mr Omar Khan ..
ahh yes.
<http://groups.google.com/group/comp.lang.java.programmer/msg/df4a6f43d57e3e6a>

But please (please, please) respect the directions of the 
site's robots.txt (if it has one).

-- 
Andrew Thompson
http://www.athompson.info/andrew/

Message posted via JavaKB.com
http://www.javakb.com/Uwe/Forums.aspx/java-general/200711/1
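
To make the robots.txt point concrete, here is a deliberately simplified sketch of checking a path against a site's rules before fetching it. It assumes the robots.txt text has already been downloaded, and it only honours `Disallow` lines in `User-agent: *` groups; `Allow` lines, wildcards and crawler-specific groups are ignored. `RobotsRules` and `isAllowed` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    /** Parses robots.txt text, keeping Disallow prefixes from "User-agent: *" groups. */
    public RobotsRules(String robotsTxt) {
        boolean appliesToUs = false;   // are we inside a "User-agent: *" group?
        boolean inAgentLines = false;  // was the previous field a User-agent line?
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw.replaceFirst("#.*", "").trim();   // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;                            // blank or junk line
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines belong to one group.
                if (!inAgentLines) appliesToUs = false;
                inAgentLines = true;
                if (value.equals("*")) appliesToUs = true;
            } else {
                inAgentLines = false;
                if (field.equals("disallow") && appliesToUs && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
    }

    /** Returns false if the path starts with any collected Disallow prefix. */
    public boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical robots.txt content for illustration only.
        RobotsRules rules = new RobotsRules("User-agent: *\nDisallow: /private/\n");
        System.out.println(rules.isAllowed("/about/"));
        System.out.println(rules.isAllowed("/private/notes.html"));
    }
}
```

A crawler would call `isAllowed` with the path of each candidate URL and skip anything that comes back false. An existing crawling library that already handles robots.txt is probably the better option for real use.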
Reply Andrew 11/12/2007 4:02:14 PM
