An Executor-like structure providing more than threads

  • Follow


Holla yalls,

This is a slightly confused question, for which i apologise. I've been on 
a minor quick-and-dirty hacking run, and haven't really thought this 
through properly.

An Executor is a thing that has a pool of Threads, accepts a stream of 
Runnables (or Callables), pairs them up one after the other with Threads, 
then sends the pairs off together to do some work, and accepts the Threads 
back afterwards.

What if the resource needed to perform a task, the thing you wanted to 
maintain a limited pool of, reuse, and provide shared access to, was more 
than a Thread?

In my case, it was a Downloader, which was a Thread plus an instance of 
Apache's HttpClient and a buffer. The tasks were URLs to download. URLs 
come in, are assigned to a Downloader which downloads them, and when the 
download is done, the Downloader goes back to be assigned another URL.

I could have written this with the Downloaders being the active party, 
going to a queue of URLs, pulling one off, downloading it, then going back 
for another. Or i could have put Downloaders in a pool, and had the 
submission mechanism pull them out and hand URLs to them. But i really 
wanted to be able to use all the cool stuff in ExecutorService, like 
getting Futures and having orderly shutdown, and a properly controllable 
thread pool and so on. So what i did was subclass Thread to add the other 
bits (the HttpClient and so on), and then, in the tasks, do something 
like:

((DownloadThread)Thread.currentThread()).getHttpClient()

And so on. I thought that was quite clever, although it is clearly also 
entirely bletcherous.

Any thoughts? What's the right way to do this?

tom

-- 
the themes of time-travel, dreams, madness, and destiny are inextricably
confused
0
Reply twic (2083) 1/17/2010 1:11:36 AM

On Sun, 17 Jan 2010 01:11:36 +0000, Tom Anderson
<twic@urchin.earth.li> wrote, quoted or indirectly quoted someone who
said :

>
>Any thoughts? What's the right way to do this?

Does Executor dispense just raw Threads or you can you get it to
dispense your custom Thread class? If so, your thread constructor
could allocate the extra goodies.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
I decry the current tendency to seek patents on algorithms. There are better ways to earn a living than to prevent other people from making use of one�s contributions to computer science. 
~ Donald Ervin Knuth (born: 1938-01-10 age: 72)
0
Reply Roedy 1/17/2010 2:10:55 AM


In article <alpine.DEB.1.10.1001170040570.6852@urchin.earth.li>,
 Tom Anderson <twic@urchin.earth.li> wrote:

> An Executor is a thing that has a pool of Threads, accepts a stream 
> of Runnables (or Callables), pairs them up one after the other with 
> Threads, then sends the pairs off together to do some work, and 
> accepts the Threads back afterwards.
> 
> What if the resource needed to perform a task, the thing you wanted 
> to maintain a limited pool of, reuse, and provide shared access to, 
> was more than a Thread?
> 
> In my case, it was a Downloader, which was a Thread plus an instance 
> of Apache's HttpClient and a buffer. The tasks were URLs to download. 
> URLs come in, are assigned to a Downloader which downloads them, and 
> when the download is done, the Downloader goes back to be assigned 
> another URL.
> 
> I could have written this with the Downloaders being the active 
> party, going to a queue of URLs, pulling one off, downloading it, 
> then going back for another. Or i could have put Downloaders in a 
> pool, and had the submission mechanism pull them out and hand URLs to 
> them. But i really wanted to be able to use all the cool stuff in 
> ExecutorService, like getting Futures and having orderly shutdown, 
> and a properly controllable thread pool and so on. So what i did was 
> subclass Thread to add the other bits (the HttpClient and so on), and 
> then, in the tasks, do something like:
> 
> ((DownloadThread)Thread.currentThread()).getHttpClient()
> 
> And so on. I thought that was quite clever, although it is clearly also 
> entirely bletcherous.
> 
> Any thoughts? What's the right way to do this?

As DownloadThread implements Runnable, I'd think you can just execute() 
them from, or submit() them for Future reference to, an Executor 
created by Executors.newFixedThreadPool(), as shown here:

<http://groups.google.com/group/comp.lang.java.programmer/msg/1d9c821dcda040f6>

-- 
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
0
Reply John 1/17/2010 2:11:17 AM

Tom Anderson wrote:
> [...]
> I could have written this with the Downloaders being the active party, 
> going to a queue of URLs, pulling one off, downloading it, then going 
> back for another. Or i could have put Downloaders in a pool, and had the 
> submission mechanism pull them out and hand URLs to them. But i really 
> wanted to be able to use all the cool stuff in ExecutorService, like 
> getting Futures and having orderly shutdown, and a properly controllable 
> thread pool and so on. So what i did was subclass Thread to add the 
> other bits (the HttpClient and so on), and then, in the tasks, do 
> something like:
> 
> ((DownloadThread)Thread.currentThread()).getHttpClient()
> 
> And so on. I thought that was quite clever, although it is clearly also 
> entirely bletcherous.

I don't see anything fundamentally wrong with the idea.  So long as you 
really have a one-to-one relationship between the "downloader" and the 
"thread" parts, maybe it does make sense to have a "downloader thread".

As far as the specific line of code you posted, the one change I would 
have made is to have an appropriate static method in DownloadThread:

   class DownloadThread extends Thread
   {
     public static DownloadThread currentThread() throws 
UnsupportedOperationException
     {
       Thread thread = Thread.currentThread();

       if (thread instanceof DownloadThread)
       {
         return (DownloadThread)thread;
       }

       // Alternative to the below would be to just make the
       // cast and let the bad cast exception propagate up
       throw new UnsupportedOperationException("current thread is not a 
DownloadThread");
     }
   }

That way you don't have a bunch of explicit casting all over the place.

Though, that said it seems to me that most if not all of the time, you 
should not need the currentThread() method anyway, because you should be 
executing methods within the DownloadThread instance already.  Cases 
where client code needs to get specific output of the DownloadThread 
class would be handled instead via some mechanism where you don't need 
to concern yourself with the current thread.  For example, a 
listener/event API, or even just a simple completion callback, either 
approach in a Future implementation where the relevant instance of the 
DownloadThread class is delivered via some mechanism other than checking 
the current Thread instance.

> Any thoughts? What's the right way to do this?

I guess the more I think about it, the more I wonder why you should ever 
need to make the assumption that the DownloadThread object of interest 
is the same as the current Thread object.

I also think that you could implement a Downloader pool that is not so 
explicitly tied to a Thread pool.  It seems to me that the Thread pool 
dependency is an implementation detail, and should be abstracted such 
that the client of the Downloader pool should not need to know or 
concern their self with that detail.

I would think that the Downloader pool client would simply submit the 
URLs, and receive notification via some mechanism of completion defined 
by the Downloader pool.  That the Downloader pool is itself depending on 
a Thread pool behind the scenes should be irrelevant and unknowable from 
the API alone.  To make the Downloader and Thread one and the same 
object seems like a leaky abstraction to me.

That said, I try to be pragmatic.  There are certainly others who are 
more OOP-Nazi than I am (and I preemptively reject the rampant 
over-application of Godwin's law that goes on in this newsgroup) and who 
might insist against an unnecessary dependency in the object hierarchy 
like that.

But while I personally probably wouldn't implement it that way, I'm not 
comfortable asserting that your own choice is wrong.  Especially not 
having a full view of the entire system you're implementing, and 
especially having experience with your other contributions to the 
newsgroup (granted, sometimes having a positive reputation can interfere 
with your ability to receive good, critical feedback because people make 
too broad an assumption that you know what you're doing :) ).  Maybe for 
your particular system, this really is a good design.

Pete
0
Reply Peter 1/17/2010 4:37:14 AM

Tom Anderson wrote:
> Holla yalls,
> 
> This is a slightly confused question, for which i apologise. I've been 
> on a minor quick-and-dirty hacking run, and haven't really thought this 
> through properly.
> 
> An Executor is a thing that has a pool of Threads, accepts a stream of 
> Runnables (or Callables), pairs them up one after the other with 
> Threads, then sends the pairs off together to do some work, and accepts 
> the Threads back afterwards.
> 
> What if the resource needed to perform a task, the thing you wanted to 
> maintain a limited pool of, reuse, and provide shared access to, was 
> more than a Thread?
> 
> In my case, it was a Downloader, which was a Thread plus an instance of 
> Apache's HttpClient and a buffer. The tasks were URLs to download. URLs 
> come in, are assigned to a Downloader which downloads them, and when the 
> download is done, the Downloader goes back to be assigned another URL.
> 
> I could have written this with the Downloaders being the active party, 
> going to a queue of URLs, pulling one off, downloading it, then going 
> back for another. Or i could have put Downloaders in a pool, and had the 
> submission mechanism pull them out and hand URLs to them. But i really 
> wanted to be able to use all the cool stuff in ExecutorService, like 
> getting Futures and having orderly shutdown, and a properly controllable 
> thread pool and so on. So what i did was subclass Thread to add the 
> other bits (the HttpClient and so on), and then, in the tasks, do 
> something like:
> 
> ((DownloadThread)Thread.currentThread()).getHttpClient()
> 
> And so on. I thought that was quite clever, although it is clearly also 
> entirely bletcherous.
> 
> Any thoughts? What's the right way to do this?
> 
> tom
> 
The pattern you are asking about is called a Resource Pool.  I wouldn't 
combine the concept of Thread Pool with HttpClient Pool.  They are 
orthogonal concepts, so they should be accessible orthogonally.

-- 
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
0
Reply Daniel 1/17/2010 5:08:38 AM

On 17/01/10 01:11, Tom Anderson wrote:
> What if the resource needed to perform a task, the thing you wanted to
> maintain a limited pool of, reuse, and provide shared access to, was
> more than a Thread?
>
> [...] But i really wanted to be able to use all the cool stuff in
> ExecutorService, like getting Futures and having orderly shutdown, and
> a properly controllable thread pool and so on. So what i did was
> subclass Thread to add the other bits (the HttpClient and so on), and
> then, in the tasks, do something like:
>
> ((DownloadThread)Thread.currentThread()).getHttpClient()

Would a ThreadLocal<HttpClient> be appropriate?

-- 
ss at comp dot lancs dot ac dot uk

0
Reply Steven 1/17/2010 11:02:57 AM

Steven Simpson wrote:
> On 17/01/10 01:11, Tom Anderson wrote:
>> What if the resource needed to perform a task, the thing you wanted to
>> maintain a limited pool of, reuse, and provide shared access to, was
>> more than a Thread?
>>
>> [...] But i really wanted to be able to use all the cool stuff in
>> ExecutorService, like getting Futures and having orderly shutdown, and
>> a properly controllable thread pool and so on. So what i did was
>> subclass Thread to add the other bits (the HttpClient and so on), and
>> then, in the tasks, do something like:
>>
>> ((DownloadThread)Thread.currentThread()).getHttpClient()
> 
> Would a ThreadLocal<HttpClient> be appropriate?

My noion based on the question and some of the answers given is that the 
Runnable or Callable running from the thread pool should instantiate a client 
locally from the separate client pool.  It won't need 'ThreadLocal' because 
the client reference will already be local.  The tangle comes from the idea 
that the client must be associated with the thread at construction of the 
thread.  I propose that the client be acquired in the run method of the thread.

I am not fond of 'ThreadLocal'; I'm not a fan of global objects generally.

-- 
Lew
0
Reply Lew 1/17/2010 2:52:25 PM

Lew wrote:
> Steven Simpson wrote:
>> On 17/01/10 01:11, Tom Anderson wrote:
>>> What if the resource needed to perform a task, the thing you wanted to
>>> maintain a limited pool of, reuse, and provide shared access to, was
>>> more than a Thread?
>>>
>>> [...] But i really wanted to be able to use all the cool stuff in
>>> ExecutorService, like getting Futures and having orderly shutdown, and
>>> a properly controllable thread pool and so on. So what i did was
>>> subclass Thread to add the other bits (the HttpClient and so on), and
>>> then, in the tasks, do something like:
>>>
>>> ((DownloadThread)Thread.currentThread()).getHttpClient()
>>
>> Would a ThreadLocal<HttpClient> be appropriate?
> 
> My noion based on the question and some of the answers given is that the 
> Runnable or Callable running from the thread pool should instantiate a 
> client locally from the separate client pool.  It won't need 
> 'ThreadLocal' because the client reference will already be local.  The 
> tangle comes from the idea that the client must be associated with the 
> thread at construction of the thread.  I propose that the client be 
> acquired in the run method of the thread.
> 
> I am not fond of 'ThreadLocal'; I'm not a fan of global objects generally.

Used with care, though, used with care. :-) The canonical examples for 
good use of ThreadLocals are probably best called "thread globals", 
which is what I think you're getting at. However, this makes sense for 
things like JDBC connections and transaction contexts.

In some of the applications I maintain the mechanism for keeping an 
application-managed JPA EntityManager around for each request, 
furthermore in a long-running transaction implementation, includes 
ThreadLocal storage. To me this is defensible - the EM is part of the 
"context" or "environment" of each thread.

The criticism you're making is well-founded. What is stored *in* the 
ThreadLocal is not immune to being abused.

AHS
0
Reply Arved 1/17/2010 3:29:50 PM

On 17/01/10 14:52, Lew wrote:
> Steven Simpson wrote:
>> On 17/01/10 01:11, Tom Anderson wrote:
>>> What if the resource needed to perform a task, the thing you wanted to
>>> maintain a limited pool of, reuse, and provide shared access to, was
>>> more than a Thread?
>>>
>>> [...] But i really wanted to be able to use all the cool stuff in
>>> ExecutorService, like getting Futures and having orderly shutdown, and
>>> a properly controllable thread pool and so on. So what i did was
>>> subclass Thread to add the other bits (the HttpClient and so on), and
>>> then, in the tasks, do something like:
>>>
>>> ((DownloadThread)Thread.currentThread()).getHttpClient()
>>
>> Would a ThreadLocal<HttpClient> be appropriate?
>
> My noion based on the question and some of the answers given is that
> the Runnable or Callable running from the thread pool should
> instantiate a client locally from the separate client pool.

I thought the point was that you need at least one HttpClient per thread
(or you will have no more threads running than HttpClients), and you do
not need more than one per thread (spares are wasted).  (This is what I
take from the items of the pool being 'more than a Thread'.)  The
threads are already in a pool, so you may as well exploit the thread
pool policy for maintaining HttpClients, rather than trying to build a
parallel pool.

> It won't need 'ThreadLocal' because the client reference will already
> be local.  The tangle comes from the idea that the client must be
> associated with the thread at construction of the thread.  I propose
> that the client be acquired in the run method of the thread.

I am supposing that the OP is using one of the Executors statics to
build a thread pool, so he'll have to pass in a ThreadFactory, in order
to inject his DownloadThread subclass.  The thread pool determines what
Runnable to pass to the thread, so he can't inject his own
client-acquiring code there.  (He could supply his own Runnable to the
thread to create the client, and then run the pool's Runnable:

new ThreadFactory() {
  public Thread createThread(final Runnable action) {
    Runnable myAction = new Runnable() {
      public void run() {
        HttpClient client = new HttpClient(...);
        action.run();
      }
    };
    return new DownloadThread(myAction);
  }
}

....but he still can't get that client reference into action.run() or
indeed the tasks' run()/call() methods, except by the way he's doing it
now (making it a field in DownloadThread).)

With the ThreadLocal, the first task that each thread executes will set
up the HttpClient, and subsequent tasks will re-use it:

// field with same lifetime as the ExecutorService
ThreadLocal<HttpClient> clientPool = new ThreadLocal<HttpClient>();

// in method creating new task
Runnable action = new Runnable() {
  public void run() {
    HttpClient client = clientPool.get();
    if (client == null) {
      client = new HttpClient(...);
      clientPool.set(client);
    }

    // Do stuff with client...
  }
};

I think that's cleaner, as it doesn't require a Thread subclass, so it
doesn't require a ThreadFactory, nor the 'bletcherous' cast to the subclass.

> I am not fond of 'ThreadLocal'; I'm not a fan of global objects
> generally.

What is global here?  The ThreadLocal instance need not be any more
global than the thread pool it's associated with.

The value returned by ThreadLocal#get is also not a global - in terms of
its lifetime, it exists (approximately) for as long as the Thread's own
Runnable#run() is executing, just as a local variable of that method
would, except it is also visible to anything that is both called by that
method and has access to clientPool.

-- 
ss at comp dot lancs dot ac dot uk

0
Reply Steven 1/17/2010 7:50:41 PM

On Sat, 16 Jan 2010, Roedy Green wrote:

> On Sun, 17 Jan 2010 01:11:36 +0000, Tom Anderson
> <twic@urchin.earth.li> wrote, quoted or indirectly quoted someone who
> said :
>
>> Any thoughts? What's the right way to do this?
>
> Does Executor dispense just raw Threads or you can you get it to 
> dispense your custom Thread class? If so, your thread constructor could 
> allocate the extra goodies.

I can, and that's exactly what it does:

ExecutorService executor = Executors.newFixedThreadPool(numThreads);
((ThreadPoolExecutor)executor).setThreadFactory(new DownloadThreadFactory());

public class DownloadThreadFactory implements ThreadFactory {
 	HttpConnectionManager connMgr = new MultiThreadedHttpConnectionManager();

 	public Thread newThread(Runnable r) {
 		return new DownloadThread(connMgr, r);
 	}
}

public class DownloadThread extends Thread {
 	private final HttpClient client;
 	private byte[] buffer;

 	public DownloadThread(HttpConnectionManager connMgr, Runnable r) {
 		super(r);
 		client = new HttpClient(connMgr);
 		buffer = new byte[1024];
 	}
}

tom

-- 
Why did one straw break the camel's back? Here's the secret: the million
other straws underneath it - it's all mathematics. -- Mos Def
0
Reply Tom 1/21/2010 11:06:28 PM

On Sat, 16 Jan 2010, Peter Duniho wrote:

> Tom Anderson wrote:
>> [...]
>> I could have written this with the Downloaders being the active party, 
>> going to a queue of URLs, pulling one off, downloading it, then going back 
>> for another. Or i could have put Downloaders in a pool, and had the 
>> submission mechanism pull them out and hand URLs to them. But i really 
>> wanted to be able to use all the cool stuff in ExecutorService, like 
>> getting Futures and having orderly shutdown, and a properly controllable 
>> thread pool and so on. So what i did was subclass Thread to add the other 
>> bits (the HttpClient and so on), and then, in the tasks, do something like:
>> 
>> ((DownloadThread)Thread.currentThread()).getHttpClient()
>> 
>> And so on. I thought that was quite clever, although it is clearly also 
>> entirely bletcherous.
>
> I don't see anything fundamentally wrong with the idea.  So long as you 
> really have a one-to-one relationship between the "downloader" and the 
> "thread" parts, maybe it does make sense to have a "downloader thread".
>
> As far as the specific line of code you posted, the one change I would have 
> made is to have an appropriate static method in DownloadThread:
>
>  class DownloadThread extends Thread
>  {
>    public static DownloadThread currentThread() throws  UnsupportedOperationException
>    {
>      Thread thread = Thread.currentThread();
>
>      if (thread instanceof DownloadThread)
>      {
>        return (DownloadThread)thread;
>      }
>
>      // Alternative to the below would be to just make the
>      // cast and let the bad cast exception propagate up
>      throw new UnsupportedOperationException("current thread is not a  DownloadThread");
>    }
>  }
>
> That way you don't have a bunch of explicit casting all over the place.

That's certainly an improvement.

> Though, that said it seems to me that most if not all of the time, you should 
> not need the currentThread() method anyway, because you should be executing 
> methods within the DownloadThread instance already.

Ah, no, because the idea is that the DownloadThread is infrastructure to 
support all sorts of generic downloading tasks. One might be to get a file 
from a given URL; another might be to get a series of files from the same 
server, yet another might download one thing, parse it to find the URL to 
another thing, then download that, etc. Moreover, a single DownloadThread 
might do different tasks in succession. That means having code outside the 
DownloadThread itself.

Although i admit that in this app, there was only one type of task. I 
would like to turn this into a generic utility, though.

> Cases where client code needs to get specific output of the 
> DownloadThread class would be handled instead via some mechanism where 
> you don't need to concern yourself with the current thread.  For 
> example, a listener/event API, or even just a simple completion 
> callback, either approach in a Future implementation where the relevant 
> instance of the DownloadThread class is delivered via some mechanism 
> other than checking the current Thread instance.
>
>> Any thoughts? What's the right way to do this?
>
> I guess the more I think about it, the more I wonder why you should ever need 
> to make the assumption that the DownloadThread object of interest is the same 
> as the current Thread object.
>
> I also think that you could implement a Downloader pool that is not so 
> explicitly tied to a Thread pool.  It seems to me that the Thread pool 
> dependency is an implementation detail, and should be abstracted such that 
> the client of the Downloader pool should not need to know or concern their 
> self with that detail.
>
> I would think that the Downloader pool client would simply submit the URLs, 
> and receive notification via some mechanism of completion defined by the 
> Downloader pool.  That the Downloader pool is itself depending on a Thread 
> pool behind the scenes should be irrelevant and unknowable from the API 
> alone.  To make the Downloader and Thread one and the same object seems like 
> a leaky abstraction to me.

Could be. If by 'thread pool', you mean something like an executor, where 
there is explicitly a pool of threads, then i can't see how you'd 
implement the Downloader pool - would Downloaders fetch threads from the 
pool when they need to run or something? Store the download task in 
themselves, then submit themselves to the executor, having a run method 
which runs the download task? Yes, that would work.

I wonder if there's a design pattern there - objects which submit 
themselves to be executed to service a request from another object, 
converting method parameters into instance fields. Sort of a variation of 
Active Object. I haven't seen it enough times to say so, really, but it 
has the feel of a pattern, somehow.

Thinking about it, i could probably do away with pools altogether and just 
have active threads pulling tasks from a queue and executing them; these 
tasks could have a wider interface than Runnable, through which a 
HttpClient and buffer could be passed:

interface Download {
 	public void perform(HttpClient client, byte[] buffer);
}

class DownloadWorker implements Runnable {
 	private BlockingQueue<Download> downloads; // shared
 	private HttpClient client;
 	private byte[] buffer;

 	public DownloadWorker(BlockingQueue<Download> downloads, HttpConnectionManager connMgr) {
 		this.downloads = downloads;
 		client = new HttpClient(connMgr);
 		buffer = new byte[1024];
 	}
 	public void run() {
 		// exception handling not shown
 		while (true) {
 			Download dl = downloads.take();
 			dl.perform(client, buffer);
 		}
 	}
}

BlockingQueue<Download> downloads;
HttpConnectionManager connMgr;
// pool management also not shown
for (int i = 0; i < numThreads; ++i) {
 	new Thread(new DownloadWorker(downloads, connMgr)).start();
}

Look ma, no executor!

> That said, I try to be pragmatic.  There are certainly others who are 
> more OOP-Nazi than I am (and I preemptively reject the rampant 
> over-application of Godwin's law that goes on in this newsgroup) and who 
> might insist against an unnecessary dependency in the object hierarchy 
> like that.
>
> But while I personally probably wouldn't implement it that way, I'm not 
> comfortable asserting that your own choice is wrong.  Especially not 
> having a full view of the entire system you're implementing, and 
> especially having experience with your other contributions to the 
> newsgroup (granted, sometimes having a positive reputation can interfere 
> with your ability to receive good, critical feedback because people make 
> too broad an assumption that you know what you're doing :) ).  Maybe for 
> your particular system, this really is a good design.

I refer you to the timestamp on my original message - it was getting on 
for one in the morning when i wrote that. I wouldn't for a second dream of 
trying to defend that code!

tom

-- 
Why did one straw break the camel's back? Here's the secret: the million
other straws underneath it - it's all mathematics. -- Mos Def
0
Reply Tom 1/21/2010 11:34:25 PM

On Sat, 16 Jan 2010, Daniel Pitts wrote:

> Tom Anderson wrote:
>
>> What if the resource needed to perform a task, the thing you wanted to 
>> maintain a limited pool of, reuse, and provide shared access to, was 
>> more than a Thread?
>
> The pattern you are asking about is called a Resource Pool.  I wouldn't 
> combine the concept of Thread Pool with HttpClient Pool.  They are 
> orthogonal concepts, so they should be accessible orthogonally.

Except there's a 1:1 relationship between threads and HttpClients. Yes, 
you could write a task which looked like:

public void run() {
 	HttpClient client = httpClientPool.acquire();
 	// do stuff
 	httpClientPool.release(client);
}

But that seems like unnecessary boilerplate. Still, it is actually less 
code than my smartarse solution, so maybe it's the right answer.

It would deal gracefully with the case where a task needs no HttpClient, 
or more than one, which mine doesn't. I wouldn't anticipate those being 
common, though.

tom

-- 
Why did one straw break the camel's back? Here's the secret: the million
other straws underneath it - it's all mathematics. -- Mos Def
0
Reply Tom 1/21/2010 11:36:52 PM

On Sun, 17 Jan 2010, Steven Simpson wrote:

> On 17/01/10 01:11, Tom Anderson wrote:
>> What if the resource needed to perform a task, the thing you wanted to
>> maintain a limited pool of, reuse, and provide shared access to, was
>> more than a Thread?
>>
>> [...] But i really wanted to be able to use all the cool stuff in
>> ExecutorService, like getting Futures and having orderly shutdown, and
>> a properly controllable thread pool and so on. So what i did was
>> subclass Thread to add the other bits (the HttpClient and so on), and
>> then, in the tasks, do something like:
>>
>> ((DownloadThread)Thread.currentThread()).getHttpClient()
>
> Would a ThreadLocal<HttpClient> be appropriate?

That would certainly do it. ThreadLocals make me uneasy, though. Don't 
really know why - intellectually, i know they're no different to a normal 
variable with the same scope.

tom

-- 
Why did one straw break the camel's back? Here's the secret: the million
other straws underneath it - it's all mathematics. -- Mos Def
0
Reply Tom 1/21/2010 11:38:00 PM

Tom Anderson wrote:
> On Sat, 16 Jan 2010, Daniel Pitts wrote:
> 
>> Tom Anderson wrote:
>>
>>> What if the resource needed to perform a task, the thing you wanted 
>>> to maintain a limited pool of, reuse, and provide shared access to, 
>>> was more than a Thread?
>>
>> The pattern you are asking about is called a Resource Pool.  I 
>> wouldn't combine the concept of Thread Pool with HttpClient Pool.  
>> They are orthogonal concepts, so they should be accessible orthogonally.
> 
> Except there's a 1:1 relationship between threads and HttpClients. Yes, 
> you could write a task which looked like:
> 
> public void run() {
>     HttpClient client = httpClientPool.acquire();
>     // do stuff
>     httpClientPool.release(client);
> }
> 
> But that seems like unnecessary boilerplate. Still, it is actually less 
> code than my smartarse solution, so maybe it's the right answer.
> 
> It would deal gracefully with the case where a task needs no HttpClient, 
> or more than one, which mine doesn't. I wouldn't anticipate those being 
> common, though.
> 
> tom
> 
Are HttpClient objects really that expensive to create? I noticed your 
solution is re-using the same connection manager...  Perhaps you're 
anticipating problems where there are non, and you should just use "new 
HttpClient(connectionManager)" where it is needed, and forget about 
HttpClient pooling.

*then*, consider profiling your code. If it turns out that there is a 
lot of time spent allocating and releasing HttpClient objects, then you 
should consider pooling.

-- 
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
0
Reply Daniel 1/21/2010 11:42:08 PM

On Sun, 17 Jan 2010, Steven Simpson wrote:

> On 17/01/10 14:52, Lew wrote:
>> Steven Simpson wrote:
>>> On 17/01/10 01:11, Tom Anderson wrote:
>>>> What if the resource needed to perform a task, the thing you wanted to
>>>> maintain a limited pool of, reuse, and provide shared access to, was
>>>> more than a Thread?
>>>>
>>>> [...] But i really wanted to be able to use all the cool stuff in
>>>> ExecutorService, like getting Futures and having orderly shutdown, and
>>>> a properly controllable thread pool and so on. So what i did was
>>>> subclass Thread to add the other bits (the HttpClient and so on), and
>>>> then, in the tasks, do something like:
>>>>
>>>> ((DownloadThread)Thread.currentThread()).getHttpClient()
>>>
>>> Would a ThreadLocal<HttpClient> be appropriate?
>>
>> My noion based on the question and some of the answers given is that
>> the Runnable or Callable running from the thread pool should
>> instantiate a client locally from the separate client pool.
>
> I thought the point was that you need at least one HttpClient per thread 
> (or you will have no more threads running than HttpClients), and you do 
> not need more than one per thread (spares are wasted).  (This is what I 
> take from the items of the pool being 'more than a Thread'.)  The 
> threads are already in a pool, so you may as well exploit the thread 
> pool policy for maintaining HttpClients, rather than trying to build a 
> parallel pool.

Exactly.

>> It won't need 'ThreadLocal' because the client reference will already 
>> be local.  The tangle comes from the idea that the client must be 
>> associated with the thread at construction of the thread.  I propose 
>> that the client be acquired in the run method of the thread.
>
> I am supposing that the OP is using one of the Executors statics to 
> build a thread pool, so he'll have to pass in a ThreadFactory, in order 
> to inject his DownloadThread subclass.  The thread pool determines what 
> Runnable to pass to the thread, so he can't inject his own 
> client-acquiring code there.  (He could supply his own Runnable to the 
> thread to create the client, and then run the pool's Runnable:
>
> new ThreadFactory() {
>  public Thread createThread(final Runnable action) {
>    Runnable myAction = new Runnable() {
>      public void run() {
>        HttpClient client = new HttpClient(...);
>        action.run();
>      }
>    };
>    return new DownloadThread(myAction);
>  }
> }
>
> ...but he still can't get that client reference into action.run() or 
> indeed the tasks' run()/call() methods, except by the way he's doing it 
> now (making it a field in DownloadThread).)

Right. The whole DownloadThread thing was really an attempt to pass the 
HttpClient around the side of the Runnable interface.

> With the ThreadLocal, the first task that each thread executes will set
> up the HttpClient, and subsequent tasks will re-use it:
>
> // field with same lifetime as the ExecutorService
> ThreadLocal<HttpClient> clientPool = new ThreadLocal<HttpClient>();
>
> // in method creating new task
> Runnable action = new Runnable() {
>  public void run() {
>    HttpClient client = clientPool.get();
>    if (client == null) {
>      client = new HttpClient(...);
>      clientPool.set(client);
>    }
>
>    // Do stuff with client...
>  }
> };

You could remove the uncertainty by putting the setup in the factory:

public Thread newThread(Runnable r) {
 	return new Thread(new Runnable() {
 		public void run() {
 			clientPool.set(new HttpClient(connMgr));
 			r.run();
 		}
 	});
}

That will run the HttpClient setup before heading into the code supplied 
by the Executor, which is presumably some kind of internal class which 
pulls Runnables off the Executor's queue.

> I think that's cleaner, as it doesn't require a Thread subclass, so it
> doesn't require a ThreadFactory, nor the 'bletcherous' cast to the subclass.

But it does require a ThreadLocal, which is on about the same level as 
Thread subclasses and bletcherous casts, ie correct and legal, but 
aesthetically displeasing.

tom

-- 
Why did one straw break the camel's back? Here's the secret: the million
other straws underneath it - it's all mathematics. -- Mos Def
0
Reply Tom 1/21/2010 11:46:52 PM

Tom Anderson wrote:
> On Sun, 17 Jan 2010, Steven Simpson wrote:
> 
>> On 17/01/10 14:52, Lew wrote:
>>> Steven Simpson wrote:
>>>> On 17/01/10 01:11, Tom Anderson wrote:
>>>>> What if the resource needed to perform a task, the thing you wanted to
>>>>> maintain a limited pool of, reuse, and provide shared access to, was
>>>>> more than a Thread?
>>>>>
>>>>> [...] But i really wanted to be able to use all the cool stuff in
>>>>> ExecutorService, like getting Futures and having orderly shutdown, and
>>>>> a properly controllable thread pool and so on. So what i did was
>>>>> subclass Thread to add the other bits (the HttpClient and so on), and
>>>>> then, in the tasks, do something like:
>>>>>
>>>>> ((DownloadThread)Thread.currentThread()).getHttpClient()
>>>>
>>>> Would a ThreadLocal<HttpClient> be appropriate?
>>>
>>> My noion based on the question and some of the answers given is that
>>> the Runnable or Callable running from the thread pool should
>>> instantiate a client locally from the separate client pool.
>>
>> I thought the point was that you need at least one HttpClient per 
>> thread (or you will have no more threads running than HttpClients), 
>> and you do not need more than one per thread (spares are wasted).  
>> (This is what I take from the items of the pool being 'more than a 
>> Thread'.)  The threads are already in a pool, so you may as well 
>> exploit the thread pool policy for maintaining HttpClients, rather 
>> than trying to build a parallel pool.
> 
> Exactly.
> 
>>> It won't need 'ThreadLocal' because the client reference will already 
>>> be local.  The tangle comes from the idea that the client must be 
>>> associated with the thread at construction of the thread.  I propose 
>>> that the client be acquired in the run method of the thread.
>>
>> I am supposing that the OP is using one of the Executors statics to 
>> build a thread pool, so he'll have to pass in a ThreadFactory, in 
>> order to inject his DownloadThread subclass.  The thread pool 
>> determines what Runnable to pass to the thread, so he can't inject his 
>> own client-acquiring code there.  (He could supply his own Runnable to 
>> the thread to create the client, and then run the pool's Runnable:
>>
>> new ThreadFactory() {
>>  public Thread createThread(final Runnable action) {
>>    Runnable myAction = new Runnable() {
>>      public void run() {
>>        HttpClient client = new HttpClient(...);
>>        action.run();
>>      }
>>    };
>>    return new DownloadThread(myAction);
>>  }
>> }
>>
>> ...but he still can't get that client reference into action.run() or 
>> indeed the tasks' run()/call() methods, except by the way he's doing 
>> it now (making it a field in DownloadThread).)
> 
> Right. The whole DownloadThread thing was really an attempt to pass the 
> HttpClient around the side of the Runnable interface.
> 
>> With the ThreadLocal, the first task that each thread executes will set
>> up the HttpClient, and subsequent tasks will re-use it:
>>
>> // field with same lifetime as the ExecutorService
>> ThreadLocal<HttpClient> clientPool = new ThreadLocal<HttpClient>();
>>
>> // in method creating new task
>> Runnable action = new Runnable() {
>>  public void run() {
>>    HttpClient client = clientPool.get();
>>    if (client == null) {
>>      client = new HttpClient(...);
>>      clientPool.set(client);
>>    }
>>
>>    // Do stuff with client...
>>  }
>> };
> 
> You could remove the uncertainty by putting the setup in the factory:
> 
> public Thread newThread(Runnable r) {
>     return new Thread(new Runnable() {
>         public void run() {
>             clientPool.set(new HttpClient(connMgr));
>             r.run();
>         }
>     });
> }
> 
> That will run the HttpClient setup before heading into the code supplied 
> by the Executor, which is presumably some kind of internal class which 
> pulls Runnables off the Executor's queue.
> 
>> I think that's cleaner, as it doesn't require a Thread subclass, so it
>> doesn't require a ThreadFactory, nor the 'bletcherous' cast to the 
>> subclass.
> 
> But it does require a ThreadLocal, which is on about the same level as 
> Thread subclasses and bletcherous casts, ie correct and legal, but 
> aesthetically displeasing.
> 
> tom
> 
Actually, the appropriate construct would be:

private final ThreadLocal<HttpClient> pool = ThreadLocal<HttpClient>() {
         @Override
         protected HttpClient initialValue() {
            return new HttpClient(connMgr);
         }
     };

-- 
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
0
Reply Daniel 1/21/2010 11:51:40 PM

Tom Anderson wrote:
> [...]
>> Though, that said it seems to me that most if not all of the time, you 
>> should not need the currentThread() method anyway, because you should 
>> be executing methods within the DownloadThread instance already.
> 
> Ah, no, because the idea is that the DownloadThread is infrastructure to 
> support all sorts of generic downloading tasks. One might be to get a 
> file from a given URL; another might be to get a series of files from 
> the same server, yet another might download one thing, parse it to find 
> the URL to another thing, then download that, etc. Moreover, a single 
> DownloadThread might do different tasks in succession. That means having 
> code outside the DownloadThread itself.

Granted, this is strictly intuition talking.  But, IMHO if your proposed 
implementation really does rely on assuming the current thread is in 
fact a DownloadThread instance, and it has to retrieve that instance via 
the static method, that is to me strongly suggestive that the 
implementation's design is awkward at best, and flawed at worst.

I'm not comfortable making a blanket critique without seeing a full 
implementation, but as a general rule, that sort of thing seems fishy to me.

> [...]
>> I would think that the Downloader pool client would simply submit the 
>> URLs, and receive notification via some mechanism of completion 
>> defined by the Downloader pool.  That the Downloader pool is itself 
>> depending on a Thread pool behind the scenes should be irrelevant and 
>> unknowable from the API alone.  To make the Downloader and Thread one 
>> and the same object seems like a leaky abstraction to me.
> 
> Could be. If by 'thread pool', you mean something like an executor, 
> where there is explicitly a pool of threads,

Yes, exactly.  Sorry for the vague terminology�I program in multiple 
platforms, all of which have their own terminology for "thread pool". 
So I just use the generic description.  The basic feature common to all 
is that there's an abstraction away from the threads themselves, with 
the "thread pool" class taking care of mapping specific asynchronous 
operations submitted in some form (function pointers in some cases, 
interface implementations in others) and managing how many threads are 
running, which threads get the next task, etc.

> then i can't see how you'd 
> implement the Downloader pool - would Downloaders fetch threads from the 
> pool when they need to run or something? Store the download task in 
> themselves, then submit themselves to the executor, having a run method 
> which runs the download task? Yes, that would work.

So, by the end of the above paragraph, I assume you _do_ see how you'd 
implement the Downloader pool?  :)

> I wonder if there's a design pattern there - objects which submit 
> themselves to be executed to service a request from another object, 
> converting method parameters into instance fields. Sort of a variation 
> of Active Object. I haven't seen it enough times to say so, really, but 
> it has the feel of a pattern, somehow.

I'm not current on the _names_ of the various design patterns.  However, 
yes�there's definitely a pattern there.  Note that at the very least, 
the conversion of method parameters to instance fields is a way of 
representing a given method call as a "task" ("Future" in Java).  The 
abstraction being some unit of work that can be done in isolation.

In some environments (e.g. C#), the conversion can actually be done 
implicitly, using closures.  The method arguments wind up wrapped in a 
class instance that the compiler generates, making the actual code 
somewhat more concise and simpler to write.

In Java, you'd just write a specific class to represent the task, 
probably implementing Future.  Or possibly just make an anonymous Future 
implementation, with the method parameters being declared as final so 
that they can be used in the anonymous Future implementation.

> Thinking about it, i could probably do away with pools altogether and 
> just have active threads pulling tasks from a queue and executing them; 
> these tasks could have a wider interface than Runnable, through which a 
> HttpClient and buffer could be passed: [...]

Sure.  That's more "producer/consumer" and IMHO would work nicely here 
too.  Though, to some extent that's exactly what a thread pool is 
anyway.  It's a consumer of tasks, with producers submitting those tasks 
to the thread pool.  :)

Pete
0
Reply Peter 1/22/2010 12:36:15 AM

16 Replies
159 Views

(page loaded in 0.178 seconds)

Similiar Articles:
















7/4/2012 6:21:08 PM


Reply: