Why can't my serial driver keep up?

Hi,
I'm trying to write a serial driver for Linux 2.6.7 on an Intel XScale
PXA255 for a custom UART implemented in an FPGA.  The UART is accessed
through the PCI bus and seems to be working correctly.  I'm testing the
driver by sending it 260,000 bytes at 19200 baud, no parity, 8 data
bits, 1 stop bit.  Unfortunately it doesn't always receive all the
data: it is frequently missing a block of about 30-100 bytes.  I set up
the UART's FIFO to be 64 bytes deep and to interrupt the processor when
it is 14 bytes full.  I think something is delaying my ISR, so the FIFO
overflows and the received data is lost.  My ISR is just a tight loop
that copies no more than 64 bytes of data from the UART over the PCI
bus to a tty flip buffer using tty_insert_flip_char().  The driver
seems to work perfectly all the time at 9600.  The processor is running
at 400MHz and not running any other code.  I'm having a hard time
seeing why it can't keep up with data at 19200, and unfortunately the
device that sends the data doesn't understand flow control.
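
A quick back-of-the-envelope check of the timing described above, as a
minimal sketch in C.  It assumes that 8N1 framing puts 10 bits on the
wire per byte (1 start + 8 data + 1 stop) and that bytes arrive back to
back with no idle time; the function names are made up for illustration
and are not part of any driver API.

```c
#include <assert.h>

/* Assumption: 8N1 framing = 1 start bit + 8 data bits + 1 stop bit. */
#define BITS_PER_BYTE_8N1 10

/* Time for one byte to arrive on the line, in microseconds. */
double byte_time_us(unsigned baud)
{
    assert(baud > 0);
    return 1e6 * BITS_PER_BYTE_8N1 / baud;
}

/*
 * Time between the FIFO-threshold interrupt being raised and the FIFO
 * becoming completely full, in microseconds.  The ISR has to start
 * draining the FIFO within this window or bytes are lost.
 */
double overflow_headroom_us(unsigned baud, unsigned fifo_depth,
                            unsigned irq_threshold)
{
    assert(irq_threshold <= fifo_depth);
    return (fifo_depth - irq_threshold) * byte_time_us(baud);
}
```

With the numbers from the post (64-byte FIFO, interrupt at 14 bytes),
overflow_headroom_us(19200, 64, 14) comes to roughly 26 ms and
overflow_headroom_us(9600, 64, 14) to roughly 52 ms, so an ISR that
starts within microseconds has a very wide margin at either rate.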

I'm relatively new at driver writing so any suggestions on how to help
debug this problem would be greatly appreciated. 

Thanks

Ryan

ryan
12/16/2005 4:02:29 PM
comp.linux.development.system

Here's a little follow up on this problem, I put in some code to
trigger an FPGA line on entry and exit of the ISR.  I'm watching that
along with the actual Interrupt line on my logic analyzer to get the
following timing information.

Typically the ISR responds to the interrupt in 4-6 us and takes a total
of around 22 us to complete. This is what I would expect.  The problem
is when data is missing I see a point where it took 345ms to respond to
the ISR.  I know that Linux is not a RTOS but should it ever take that
long to respond to an interrupt?  What could the kernel be doing that
would change the response time from tens of microseconds to hundreds of
milliseconds?

Thanks for any help

Ryan

ryan
12/27/2005 8:05:12 PM
"ryan.stowell@gmail.com" wrote:
> 
> The problem
> is when data is missing I see a point where it took 345ms to respond to
> the ISR.  I know that Linux is not a RTOS but should it ever take that
> long to respond to an interrupt?

Clearly that shouldn't be happening. Typical Linux systems
handle timer interrupts every 1-10ms. If the kernel doesn't
start servicing an IRQ within a few microseconds, they must
be disabled somehow. They could be disabled in the CPU or
in the IRQ controller.

> What could the kernel be doing that
> would change the response time from tens of microseconds to hundreds of
> milliseconds?

The kernel does disable interrupts in certain critical parts
of the code. But those shouldn't be spending lots of time.
Unfortunately I cannot give you an easy way to find out
which one spends too much time.

It could actually be convenient to have some debugging
facility reading the TSC when interrupts are disabled and
enabled such that excessive amounts of time can be noticed
and logged. (I don't know if this is any help to you).
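
The idea above could be sketched roughly like this.  This is a
user-space illustration only, not kernel code: the struct and function
names are hypothetical, and the caller passes in the timestamps, which
in a real kernel would come from a free-running cycle counter (the TSC
is an x86 feature, so on a PXA255 some other counter would have to
stand in for it).  Nested disable/enable pairs are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical tracker for how long interrupts stay disabled. */
struct irqoff_tracker {
    uint64_t disabled_at;   /* counter value when interrupts went off */
    uint64_t worst;         /* longest interrupts-off interval so far */
    uint64_t limit;         /* intervals longer than this are logged  */
    unsigned long reports;  /* number of over-limit intervals seen    */
    int active;             /* nonzero while interrupts are disabled  */
};

void irqoff_init(struct irqoff_tracker *t, uint64_t limit)
{
    t->disabled_at = 0;
    t->worst = 0;
    t->limit = limit;
    t->reports = 0;
    t->active = 0;
}

/* Call right after disabling interrupts; 'now' is a counter reading. */
void irqoff_enter(struct irqoff_tracker *t, uint64_t now)
{
    assert(!t->active);     /* nesting is not supported in this sketch */
    t->disabled_at = now;
    t->active = 1;
}

/*
 * Call just before re-enabling interrupts.  Returns the length of the
 * interval in counter ticks and logs it if it exceeds the limit.
 * Unsigned subtraction keeps the result correct across counter wrap.
 */
uint64_t irqoff_exit(struct irqoff_tracker *t, uint64_t now,
                     const char *where)
{
    uint64_t held;

    assert(t->active);
    held = now - t->disabled_at;
    t->active = 0;

    if (held > t->worst)
        t->worst = held;
    if (held > t->limit) {
        t->reports++;
        fprintf(stderr, "interrupts off for %llu ticks at %s\n",
                (unsigned long long)held, where);
    }
    return held;
}
```

The 'where' string is there so that a log line identifies which
critical section held interrupts off; choosing the limit is left to
whoever runs the experiment.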

If the kernel has disabled interrupts in the controller,
it may be trying to avoid an IRQ storm. I don't know if
you have given the kernel any reason to think there is an
IRQ storm.

-- 
Kasper Dupont
Note to self: Don't try to allocate
256000 pages with GFP_KERNEL on x86.
Kasper
12/27/2005 8:36:37 PM
On Tue, 27 Dec 2005 21:36:37 +0100, Kasper Dupont <64459405747538527765@expires.07.feb.2006.kasperd.net.invalid> wrote:

> "ryan.stowell@gmail.com" wrote:
>>
>> The problem
>> is when data is missing I see a point where it took 345ms to respond to
>> the ISR.

I don't understand the clause "when data is missing".  What data?
Missing where?  Is there any correlation between the state of affairs
(presence of data) in the device, and the kernel's response time?

-Enrique
Enrique
12/28/2005 4:10:02 AM
> I don't understand the clause "when data is missing"

"When data is missing" refers to the failure of the UART to deliver all
the transferred data because its FIFO is allowed to overflow due to the
time it takes the ISR to be called in some cases (as I guessed in the
original post on this subject). I have unfortunately been unable to
correlate this delay to any other factors in the kernel.

ryan
12/28/2005 3:21:16 PM
On Tue, 27 Dec 2005 21:36:37 +0100, Kasper Dupont
<64459405747538527765@expires.07.feb.2006.kasperd.net.invalid> wrote:

> "ryan.stowell@gmail.com" wrote:
>> 
>> The problem is when data is missing I see a point where it took 345ms
>> to respond to the ISR.  I know that Linux is not a RTOS but should it
>> ever take that long to respond to an interrupt?

That's a third of a second!  The kernel should never disable interrupts
for that long.  For one thing, he'll miss hundreds of timer ticks.


> Clearly that shouldn't be happening. Typical Linux systems handle
> timer interrupts every 1-10ms. 

> If the kernel doesn't start servicing an IRQ within a few
> microseconds, they must be disabled somehow. They could be disabled in
> the CPU or in the IRQ controller.

Or somewhere in the BIOS SMI or ACPI code, if there is some sort of
power management activated.  Also, things like "legacy USB" support in
the BIOS may result in SMI taking over the machine once in a while,
depending on how they're implemented.  The RTAI folk have documented
some instances where the SMI/ACPI effect causes delays in the range of
tens of milliseconds every few seconds.

Could also be a priority thing.  Is a higher-priority interrupt
preventing yours from being serviced?  Is your interrupt shared with any
other device that may not be working correctly if the IRQ is shared?

Just to point out that the problem may not be in the kernel.


>> What could the kernel be doing that would change the response time
>> from tens of microseconds to hundreds of milliseconds?
>
> The kernel does disable interrupts in certain critical parts
> of the code. But those shouldn't be spending lots of time.

Yeah, 345 ms is just ridiculous.  Even the old problems with IDE masking
interrupts weren't that bad.


-- 
 -| Bob Hauck
 -| A proud member of the reality-based community.
 -| http://www.haucks.org/
Bob
12/28/2005 4:44:59 PM
> Could also be a priority thing.  Is a higher-priority interrupt
> preventing yours from being serviced?  Is your interrupt shared with any
> other device that may not be working correctly if the IRQ is shared?

The interrupt isn't shared with anything else. I'm not sure how
interrupts are prioritized, but I'm not making this a "fast interrupt",
so other interrupts can still interrupt me.

I think I have found a cause of the problem, although I'm not sure why
it should be the cause.  I removed the NAND flash driver that my system
uses to access its file system and mounted the file system off of NFS
instead. This completely resolved the problem of long interrupt
latency.
I'm not sure why the flash driver would be hogging the kernel though
since I'm not using any swap space or doing any file I/O when running
my test. Unfortunately the driver is not open source so I'm going to
have to take this up with the manufacturer or adapt another driver to
work on my system.

Thanks for all the help.

Ryan

ryan
12/28/2005 9:27:43 PM