Hello, I would like to have your opinion about a network application I am currently designing/developing. The aim of this application is to receive data-counting information (e.g. the number of bytes received from TCP clients) from several hosts, aggregate it (sum the counts), and finally send the results to other hosts.

Let's say that this application has to handle data counting from hostA and hostB. Both hosts store a counter named 'bcount' for each TCP client, identified by the hostname or IP address the requests come from. Each time one of these hosts receives a TCP request from a client, it sends its updated 'bcount' value to my application.

I have already implemented some helpful data structures: a generic singly linked list, and generic hash tables implemented as arrays of singly linked lists with powers of two as sizes.

Here is my design for this application. The main thread stores the same duplicated data for each host in a hash table (one hash table per host). Each entry of these hash tables has a hostname (TCP client hostname or IP address) as key and the associated 'bcount' value as value. Each time the main thread receives an update for a client hostname from hostA or hostB, it updates another hash table whose entries use the same key, the TCP client hostname, but whose value is the aggregate (sum) of the 'bcount' values found in each per-host (hostA and hostB) hash table for that client hostname. The main thread then informs a pool of threads of this arrival by adding a pointer to the aggregated entry to a queue (a doubly linked list), together with a counter initialized to the number of threads. Each awakened thread consumes the entry from the queue and decrements the counter. The last one removes the entry from the queue and from the hash table of aggregated entries, so that this latter hash table does not consume too much memory.

What do you think about this design?
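To make the last step concrete, here is a minimal sketch of the per-entry consumer counter I have in mind (the names, the fixed-size hostname buffer, and the single-threaded decrement are illustrative assumptions only; the real code would protect the counter with a mutex or an atomic operation, and the last consumer would also unlink the entry from the queue and the aggregate hash table):

```c
#include <stdlib.h>
#include <string.h>

/* Aggregated entry: client hostname -> summed bcount, plus a
 * consumer counter initialized to the number of worker threads. */
struct agg_entry {
    char host[64];
    unsigned long bcount;
    int remaining;      /* consumers that have not yet seen this entry */
};

struct agg_entry *agg_entry_new(const char *host, unsigned long bcount,
                                int nthreads)
{
    struct agg_entry *e = malloc(sizeof *e);
    if (!e)
        return NULL;
    strncpy(e->host, host, sizeof e->host - 1);
    e->host[sizeof e->host - 1] = '\0';
    e->bcount = bcount;
    e->remaining = nthreads;
    return e;
}

/* Called by each worker after it has processed the entry.
 * Returns 1 if the caller was the last consumer and freed the entry,
 * 0 otherwise.  NOTE: not thread-safe as written; the decrement needs
 * a mutex or an atomic in the multithreaded application. */
int agg_entry_consume(struct agg_entry *e)
{
    if (--e->remaining == 0) {
        free(e);
        return 1;
    }
    return 0;
}
```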
I have not found any easier or more efficient way to implement these features. I am looking for the simplest implementation that is fast and efficient without consuming too much memory. Regards.
On Thu, 20 Oct 2016 14:06:33 +0200 root_at_home <email@example.com> wrote:

> The aim of this application consists in receiving data counting
> information (e.g. the number of bytes received from TCP clients) from
> several hosts, then aggregating them (sum them) and finally sending
> the results to other hosts.

The design you describe is quite complex: one thread for reading, a pool for writing. Why do you assume a single thread updating your statistics can't keep up with N hosts sending updates? Especially because you have to arbitrate access to the store (your hash table).

I suggest:

* a single thread, multiplexed with select, poll, or what have you.
* tsearch(3)

Read the message, tsearch the key, add the retrieved bcount to the read bcount, and update the record. (There's no "put"; tsearch returns a pointer into the tree.) That will be a lot less code and machinery, and will run nearly as fast as one processor will go.

If you need more, use two processes connected by a Unix domain socket or POSIX message queue: one to read the network and enqueue the message, one to dequeue the message and update the store. If that's not fast enough, I bet contention for the store will be the hot spot, and you'll have to use more than one.

--jkl
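For what it's worth, the tsearch-then-update pattern jkl describes might look like this (the record layout and the update() helper are my own illustration, not from the post; error handling is minimal):

```c
#include <search.h>   /* tsearch(3), POSIX */
#include <stdlib.h>
#include <string.h>

struct rec {
    char host[64];
    unsigned long bcount;
};

static int rec_cmp(const void *a, const void *b)
{
    return strcmp(((const struct rec *)a)->host,
                  ((const struct rec *)b)->host);
}

static void *root = NULL;   /* tsearch tree root */

/* Look the host up in the tree, add delta to its running total,
 * inserting a new record if the host is not yet present.
 * Returns the updated total.  tsearch returns a pointer to the
 * stored item pointer, so there is no separate "put" step. */
unsigned long update(const char *host, unsigned long delta)
{
    struct rec key, *r, **slot;

    strncpy(key.host, host, sizeof key.host - 1);
    key.host[sizeof key.host - 1] = '\0';
    key.bcount = 0;

    slot = tsearch(&key, &root, rec_cmp);
    if (!slot)
        abort();            /* out of memory */
    if (*slot == &key) {    /* not found: tsearch inserted our stack key */
        r = malloc(sizeof *r);
        if (!r)
            abort();
        *r = key;
        *slot = r;          /* replace the stack pointer with a heap copy */
    }
    r = *slot;
    r->bcount += delta;
    return r->bcount;
}
```

The point of the idiom is that tsearch both searches and inserts: when the key is missing it stores the pointer you passed, so you swap it for a heap copy before updating in place.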