Hi all,
I would like to know if there are any ways to increase the rate at
which packets can be read/copied into a buffer.
I'm writing a program that reads a large pcap file using pcap_loop,
puts the packets into a 100MB buffer and then processes this 100MB
buffer on a GPU. As the program is currently single-threaded, it will
fill up the buffer, copy the buffer to GPU, process on GPU, copy the
results back to CPU, then repeat the process again by filling up the
buffer until no more packets are read. The way the buffer is filled is
by memcpy-ing every packet into this buffer.
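
A minimal sketch of the fill step described above, assuming each packet is stored as a 4-byte length followed by its bytes. The record layout and the names (struct batch, batch_append and friends) are hypothetical and not taken from the original program. In a pcap_loop() callback, the packet pointer and the caplen field of the pcap_pkthdr would be the inputs to batch_append().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical batch buffer: packets are appended back to back, each
 * preceded by a native-endian 4-byte length so a consumer can walk them. */
struct batch {
    uint8_t *data;  /* allocated once, reused for every batch */
    size_t   cap;   /* capacity in bytes (100 MB in the post) */
    size_t   used;  /* bytes filled so far */
};

/* Allocate the buffer once. Returns 0 on success, -1 on failure. */
int batch_init(struct batch *b, size_t cap)
{
    b->data = malloc(cap);
    b->cap  = cap;
    b->used = 0;
    return b->data ? 0 : -1;
}

/* Copy one packet in. Returns 0 on success, or -1 if the packet does not
 * fit, which is the cue to hand the batch to the GPU and reset. */
int batch_append(struct batch *b, const uint8_t *pkt, uint32_t len)
{
    assert(b->used <= b->cap);
    size_t room = b->cap - b->used;
    if (room < sizeof len || room - sizeof len < len)
        return -1;
    memcpy(b->data + b->used, &len, sizeof len);      /* length prefix */
    memcpy(b->data + b->used + sizeof len, pkt, len); /* packet bytes  */
    b->used += sizeof len + len;
    return 0;
}

/* Start a new batch without reallocating. */
void batch_reset(struct batch *b) { b->used = 0; }

/* Release the buffer at program exit. */
void batch_free(struct batch *b) { free(b->data); b->data = NULL; }
```

Each packet costs two memcpy calls here, so for small packets the per-call overhead rather than raw memory bandwidth may dominate.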
Transferring data between the CPU and GPU, and the GPU processing
itself, are very fast (multi-Gbps). The bottleneck is memcpy-ing the
packets into the buffer, which reaches at most 1 Gbps. Therefore, the
whole program can never exceed 1 Gbps.
Is there any way I can speed up this process?
Thank you.
Regards,
Rayne
Posted by lancer6238 on 3/25/2011 9:53:19 AM

I forgot to add: I allocate memory for the buffer once at the start of
the program, overwrite its existing contents on each round of
memcpy-ing, and free the memory at the end of the program.
Posted by lancer6238 on 3/25/2011 10:33:28 AM

lancer6238@yahoo.com <lancer6238@yahoo.com> wrote:
> Hi all,
> I would like to know if there are any ways to increase the rate at
> which packets can be read/copied into a buffer.
Don't copy them :) (Or at least, not as often)
> I'm writing a program that reads a large pcap file using pcap_loop,
> puts the packets into a 100MB buffer and then processes this 100MB
> buffer on a GPU. As the program is currently single thread, it will
> fill up the buffer, copy the buffer to GPU, process on GPU, copy the
> results back to CPU, then repeat the process again by filling up the
> buffer until no more packets are read. The way the buffer is filled
> is by memcpy-ing every packet into this buffer.
> The transferring of data to/from CPU/GPU and the GPU processing is
> very fast (multi-Gbps). The bottleneck lies in the memcpy-ing
> packets to buffer portion, which can reach only up to 1
> Gbps. Therefore, the whole program can never exceed 1 Gbps.
It seems very odd that you can transfer data to/from the GPU very fast
but can copy internally via the CPU at only 1 Gbps.
> Is there anyway I can speed up this process?
Can your GPU do DMA? Or must you copy data into/out of it with the
processor? If it can do DMA, I believe that there is an mmap()
interface for libpcap and you would use that, and "double buffer."
Allocate two buffers, point libpcap at one, fill it, point libpcap at
the second one, hand the first to the GPU for it to DMA and process,
and then taking at face value the assertion about the speed of the
GPU, check the completion status once the second buffer has filled via
libpcap. Hand the second buffer to the GPU and go back to libpcap
with the first. Lather, rinse, repeat.
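
The double-buffer loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the fill, submit, and wait hooks are hypothetical stand-ins for the libpcap read loop, an asynchronous hand-off to the GPU, and a completion check. The demo_* functions exist only to make the sketch self-contained and do no real capture or GPU work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hooks (not a libpcap or GPU API). */
struct pipeline {
    size_t (*fill)(void *buf, size_t cap, void *ctx);  /* bytes filled, 0 at end of input */
    void   (*submit)(const void *buf, size_t len, void *ctx); /* start async processing */
    void   (*wait)(void *ctx);  /* block until the submitted buffer is done */
    void   *ctx;
};

/* Ping-pong between two buffers. Returns the number of batches submitted. */
size_t run_double_buffered(struct pipeline *p, void *bufs[2], size_t cap)
{
    assert(p && p->fill && p->submit && p->wait);
    size_t batches = 0;
    int cur = 0;        /* buffer currently being filled */
    int in_flight = 0;  /* is the other buffer still being processed? */

    for (;;) {
        /* Filling overlaps with processing of the other buffer. */
        size_t len = p->fill(bufs[cur], cap, p->ctx);
        if (in_flight) {
            p->wait(p->ctx);  /* other buffer finished, safe to reuse */
            in_flight = 0;
        }
        if (len == 0)
            break;            /* no more packets */
        p->submit(bufs[cur], len, p->ctx);
        in_flight = 1;
        batches++;
        cur ^= 1;             /* switch to the other buffer */
    }
    return batches;
}

/* Demo hooks: produce a fixed number of full buffers and count calls. */
struct demo_ctx { int remaining; int submitted; int waited; };

size_t demo_fill(void *buf, size_t cap, void *ctx)
{
    struct demo_ctx *c = ctx;
    if (c->remaining == 0)
        return 0;
    c->remaining--;
    memset(buf, 0xAB, cap);
    return cap;
}

void demo_submit(const void *buf, size_t len, void *ctx)
{
    (void)buf; (void)len;
    ((struct demo_ctx *)ctx)->submitted++;
}

void demo_wait(void *ctx) { ((struct demo_ctx *)ctx)->waited++; }
```

The overlap only pays off if submit() returns before the processing finishes. With a blocking hand-off the loop degenerates to the original fill-then-process sequence.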
rick jones
--
portable adj, code that compiles under more than one compiler
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Posted by Rick on 3/25/2011 5:03:57 PM

On Mar 26, 1:03 am, Rick Jones <rick.jon...@hp.com> wrote:
> Can your GPU do DMA? Or must you copy data into/out of it with the
> processor? If it can do DMA, I believe that there is an mmap()
> interface for libpcap and you would use that, and "double buffer."
> Allocate two buffers, point libpcap at one, fill it, point libpcap at
> the second one, hand the first to the GPU for it to DMA and process,
> and then taking at face value the assertion about the speed of the
> GPU, check the completion status once the second buffer has filled via
> libpcap. Hand the second buffer to the GPU and go back to libpcap
> with the first. lather, rinse, repeat.
I'm using an Nvidia GTX 580, which I believe can do DMA.
So would I allocate the buffers using mmap()?
Looking at the mmap() function, void *mmap(void *start, size_t length,
int prot, int flags, int fd, off_t offset), what would I use for the
file descriptor, fd? I don't want to read the file directly, but want
to simulate live packet processing where packets are received one by
one.
Posted by lancer6238 on 3/28/2011 7:05:34 AM

lancer6238@yahoo.com <lancer6238@yahoo.com> wrote:
> On Mar 26, 1:03 am, Rick Jones <rick.jon...@hp.com> wrote:
> > Can your GPU do DMA? Or must you copy data into/out of it with the
> > processor? If it can do DMA, I believe that there is an mmap()
> > interface for libpcap and you would use that, and "double buffer."
> > Allocate two buffers, point libpcap at one, fill it, point libpcap at
> > the second one, hand the first to the GPU for it to DMA and process,
> > and then taking at face value the assertion about the speed of the
> > GPU, check the completion status once the second buffer has filled via
> > libpcap. Hand the second buffer to the GPU and go back to libpcap
> > with the first. lather, rinse, repeat.
> I'm using Nvidia GTX 580, which I believe can do DMA.
> So the buffer allocation would be using mmap()?
> Looking at the mmap() function, void *mmap(void *start, size_t length,
> int prot, int flags, int fd, off_t offset), what would I use for the
> file descriptor, fd? I don't want to read the file directly, but want
> to simulate live packet processing where packets are received one by
> one.
You need to check that you indeed have a libpcap which supports mmap()
and see how it expects things to be set up. Per the mmap manpage, if
one passes MAP_ANONYMOUS in flags, the fd argument is ignored (though
portable code should still pass -1).
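
The MAP_ANONYMOUS case mentioned above can be sketched as follows. This is a minimal illustration of an anonymous mapping only. It does not show how libpcap's own mmap support is set up, and the helper names alloc_anon and free_anon are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of zero-filled, read/write memory that is not backed by
 * any file. Returns NULL on failure. */
void *alloc_anon(size_t len)
{
    assert(len > 0);  /* mmap rejects zero-length mappings */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS,
                   -1,   /* fd is ignored with MAP_ANONYMOUS; -1 is the portable value */
                   0);
    return p == MAP_FAILED ? NULL : p;
}

/* Unmap a region obtained from alloc_anon(). Returns 0 on success. */
int free_anon(void *p, size_t len)
{
    return munmap(p, len);
}
```

Calling alloc_anon() twice would give the two buffers for the double-buffer scheme. Whether the GPU can DMA directly from such memory depends on the driver and is not something this sketch addresses.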
Actually, in live packet processing - down at the NIC level anyway -
packets are not received one by one. Interrupt coalescing is used and
multiple packets are "received" by the driver in one swell foop.
rick jones
--
It is not a question of half full or empty - the glass has a leak.
The real question is "Can it be patched?"
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Posted by Rick on 3/28/2011 4:23:52 PM