error detection rate with crc-16 CCITT

Hi

We're using the 68302 micro with DDCMP serial protocol over two wire
RS485.  According to the user manual, this uses CRC16-CCITT: X**16 +
X**12 + X**5 + 1.
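
For reference, a minimal bitwise CRC with this generator (0x1021 in the usual MSB-first notation) can be sketched in Python. The 0xFFFF preset and the non-reflected bit order are assumptions (the common "CCITT-FALSE" convention). The 68302's DDCMP mode may preset, reflect or invert differently, so check the user manual before comparing against hardware.

```python
POLY_CCITT = 0x1021  # x^16 + x^12 + x^5 + 1, with the x^16 term implicit


def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise, MSB-first CRC-16 over `data` (no final XOR)."""
    crc = init
    for byte in data:
        crc ^= byte << 8                      # bring the next byte into the top 8 bits
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and subtract the generator
                crc = ((crc << 1) ^ POLY_CCITT) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


print(hex(crc16_ccitt(b"123456789")))          # 0x29b1 with the 0xFFFF preset
print(hex(crc16_ccitt(b"123456789", init=0)))  # 0x31c3 with a zero preset ("XMODEM" variant)
```

The standard "123456789" check string is a quick way to find out which preset and bit order a given implementation uses.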

Does anyone have any idea what the chance of getting an undetected
error is with this protocol?  I know all single bit errors are
detected.  Supposing we run a point to point connection at slightly
faster than it's really capable of and we get 10% of messages with
more than a single bit error.  What percentage of these will go
undetected by the CRC check?

Suppose we run the connection at a "normal" baud rate with almost no
errors.  What is the likelihood of getting undetected errors now?

Thanks for any help.
Reply shane.2471958 (30) 3/27/2011 8:58:32 AM

On Sun, 27 Mar 2011 01:58:32 -0700 (PDT), Shane williams
<shane.2471958@gmail.com> wrote:

>Hi
>
>We're using the 68302 micro with DDCMP serial protocol over two wire
>RS485.  According to the user manual, this uses CRC16-CCITT: X**16 +
>X**12 + X**5 + 1.
>
>Does anyone have any idea what the chance of getting an undetected
>error is with this protocol?  I know all single bit errors are
>detected.  Supposing we run a point to point connection at slightly
>faster than it's really capable of and we get 10% of messages with
>more than a single bit error.  What percentage of these will go
>undetected by the CRC check?
>
>Suppose we run the connection at a "normal" baud rate with almost no
>errors.  What is the likelihood of getting undetected errors now?

The Wikipedia article on the "Mathematics of CRC" is short and a good
place to start. The paper it references
<http://www.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf>
has the analysis you are looking for. Note (as mentioned in the
Wikipedia article) that the paper's convention for representing the
polynomial differs from the usual method.
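
To make the notation difference concrete, a small Python helper (hypothetical, not from the paper) can print the same generator in the three forms you are likely to meet. These are the usual "normal" form with the x^16 term implicit, the bit-reversed form used by LSB-first implementations, and the Koopman form, which keeps the x^16 term and drops the implicit +1.

```python
def poly_notations(exponents, width=16):
    """Return (normal, reversed, Koopman) forms of a CRC generator polynomial.

    `exponents` lists the powers of x with nonzero coefficients, e.g. [16, 12, 5, 0].
    """
    full = sum(1 << e for e in exponents)                   # all width+1 coefficients
    normal = full & ((1 << width) - 1)                      # drop the implicit x^width term
    reflected = int(format(normal, f"0{width}b")[::-1], 2)  # bit-reversed normal form
    koopman = full >> 1                                     # drop the implicit +1 term instead
    return normal, reflected, koopman


normal, reflected, koopman = poly_notations([16, 12, 5, 0])
print(hex(normal), hex(reflected), hex(koopman))            # 0x1021 0x8408 0x8810
```

So when the paper's tables list 0x8810, that is the same CRC-CCITT generator the 68302 manual describes.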

-- 
Rich Webb     Norfolk, VA
Reply Rich 3/27/2011 10:35:02 AM


In article <13c95ff0-d9ca-4f0b-92a4-d21fe6c36c55@j35g2000prb.googlegroups.com>, shane.2471958@gmail.com says...
> 
> Hi
> 
> We're using the 68302 micro with DDCMP serial protocol over two wire
> RS485.  According to the user manual, this uses CRC16-CCITT: X**16 +
> X**12 + X**5 + 1.
> 
> Does anyone have any idea what the chance of getting an undetected
> error is with this protocol?  I know all single bit errors are
> detected.  Supposing we run a point to point connection at slightly
> faster than it's really capable of and we get 10% of messages with
> more than a single bit error.  What percentage of these will go
> undetected by the CRC check?
> 
> Suppose we run the connection at a "normal" baud rate with almost no
> errors.  What is the likelihood of getting undetected errors now?
> 
> Thanks for any help.


The CRC-16 will be able to detect errors in 99.9984 percent of cases.
This stems from the 16-bit check value: a corrupted message has only one
chance in 65536 of producing a matching value.

65535 / 65536 = 0.999984, i.e. 99.9984 percent

See:
http://automationwiki.com/index.php?title=CRC-16-CCITT

for some implementation ideas.
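
The figure above corresponds to the simplest model: if a frame is so badly damaged that its error pattern looks like random bits, a 16-bit check value matches by chance about 1 time in 2^16. A short Python sketch of that arithmetic, applied to the OP's hypothetical "10% of frames have multi-bit errors" case, is below. Treat it as a rough figure for heavily corrupted frames only. Errors of just a few bits are a different case, since CRC-CCITT is guaranteed to catch some of those outright.

```python
def undetected_fraction_random(width: int = 16) -> float:
    """Chance that a frame whose error pattern looks like random bits still passes."""
    return 2.0 ** -width


def expected_undetected(frames: int, frac_badly_corrupted: float, width: int = 16) -> float:
    """Crude estimate of bad frames accepted, under the random-pattern assumption."""
    return frames * frac_badly_corrupted * undetected_fraction_random(width)


print(1 - undetected_fraction_random())       # about 0.99998
print(expected_undetected(1_000_000, 0.10))   # about 1.5 bad frames accepted per million sent
```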

-------------

Are you getting some of the errors in your transmission path
because the RS485 waveform is distorted by unequal propagation
delays through your logic on the "0"-->"1" transition versus the
"1"-->"0" transition? Common problem with certain optocouplers. ;-)
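
The delay mismatch can be put in numbers. The difference between the rising and falling propagation delays removes a fixed amount of time from every bit, so the same skew becomes a larger fraction of the bit as the baud rate goes up. Below is a rough Python sketch with made-up delay values and an assumed 10% distortion budget. The receiver's real tolerance depends on how the UART samples each bit.

```python
def distortion_fraction(t_plh: float, t_phl: float, baud: float) -> float:
    """Pulse-width distortion as a fraction of one bit time.

    t_plh / t_phl are the low-to-high and high-to-low propagation delays in seconds.
    """
    bit_time = 1.0 / baud
    return abs(t_plh - t_phl) / bit_time


def max_baud(t_plh: float, t_phl: float, tolerable_fraction: float) -> float:
    """Highest baud rate keeping the distortion below `tolerable_fraction` of a bit."""
    skew = abs(t_plh - t_phl)
    if skew == 0:
        return float("inf")          # symmetric delays: this effect alone sets no limit
    return tolerable_fraction / skew


# Hypothetical delays, NOT taken from any datasheet: 10 us rising, 2 us falling.
print(distortion_fraction(10e-6, 2e-6, 9600))   # about 0.077 of a bit
print(max_baud(10e-6, 2e-6, 0.10))              # about 12500 baud for a 10% budget
```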


-- 

Michael Karas
Carousel Design Solutions
http://www.carousel-design.com
Reply Michael 3/27/2011 10:53:26 AM

On Mar 27, 11:53 pm, Michael Karas <mka...@carousel-design.com> wrote:
> In article <13c95ff0-d9ca-4f0b-92a4-d21fe6c36c55
> @j35g2000prb.googlegroups.com>, shane.2471...@gmail.com says...
>
> [snip]
>
> The CRC-16 will be able to detect errors in 99.9984 percent of cases.
> This stems from the code being one value off out of 16-bits of
> error code count.
>
> 65535 / 65536 = 0.999984 percent
>
> See: http://automationwiki.com/index.php?title=CRC-16-CCITT
>
> for some implementation ideas.
>
> -------------
>
> Are you getting some of the errors in your transmission path
> due to distortion of the RS485 waveform due to non-equal propagation
> delays through your logic on the "0"-->"1" transition versus the
> one from "1"-->"0"? Common problem with certain optocouplers. ;-)
>
> --
>
> Michael Karas
> Carousel Design Solutions
> http://www.carousel-design.com

Thanks. I'm trying to figure out whether it's possible/viable to
dynamically determine the fastest baud rate we can use by checking the
error rate.  The cable lengths and types of wire used when our systems
are installed vary, and I was hoping we could automatically work out
what speed a particular connection can run at.  The spec for the
MOC5007 optocoupler seems a bit vague, so I was trying to find a better
one.


Reply Shane 3/27/2011 11:39:15 AM


Shane williams wrote:


> Thanks. I'm trying to figure out whether it's possible/ viable to
> dynamically determine the fastest baud rate we can use by checking the
> error rate.

Yes. But:

1) It is easier, faster and more reliable to evaluate the channel by 
transmitting a known pseudo-random test pattern rather than the actual data.

2) If the baud rate is changed dynamically, how would the receivers know 
the baud rate of the transmitters?

3) Since the system is intended to be operable even at the lowest baud, 
why not always use the lowest baud?
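
A minimal sketch of the first suggestion, assuming both ends can generate the same test sequence. A 7-bit maximal-length LFSR (period 127) produces the known pattern, and the receiver counts mismatches to estimate the bit error rate directly instead of inferring it from CRC failures. The tap choice and seed are illustrative. Longer sequences such as PRBS-15 work the same way.

```python
def prbs7(n_bits: int, seed: int = 0x7F) -> list[int]:
    """Maximal-length 7-bit LFSR sequence (period 127); seed must be nonzero."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be nonzero")
    out = []
    for _ in range(n_bits):
        bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of the two tap positions
        state = ((state << 1) | bit) & 0x7F       # shift the new bit in
        out.append(bit)
    return out


def bit_errors(sent: list[int], received: list[int]) -> int:
    """Count positions where the received pattern differs (compares the common length)."""
    return sum(a != b for a, b in zip(sent, received))


pattern = prbs7(127 * 8)
rx = pattern.copy()
rx[100] ^= 1                                   # simulate one flipped bit on the wire
print(bit_errors(pattern, rx) / len(pattern))  # measured bit error rate
```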


Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
Reply Vladimir 3/27/2011 2:21:18 PM

> I'm trying to figure out whether it's possible/ viable to
> dynamically determine the fastest baud rate we can use by checking the
> error rate. 

Packet length for a 16-bit CRC should be limited to about 4 kbyte.
The CRC doesn't know at which baud rate the packets are coming.
Your assumption (which may well be true) is that the error pattern
shifts from single-bit errors to bursts, and more errors will go undetected.
But the detected error rate would go way up too. By counting
retransmissions now and later at the higher baud rate, one could
easily see if that has happened and switch to a 24-bit or 32-bit CRC.
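
One way to see where a figure of roughly 4 kbyte comes from, using a property of the generator rather than anything stated in this post: a two-bit error whose flipped bits are d positions apart goes undetected exactly when x^d = 1 modulo the generator. So the multiplicative order of x bounds the frame length over which all two-bit errors are caught. A brute-force Python sketch:

```python
def order_of_x(poly: int = 0x1021, width: int = 16) -> int:
    """Smallest n > 0 with x^n == 1 (mod g(x)), g given in normal notation."""
    top = 1 << width
    s, n = 1, 0
    while True:
        s <<= 1                    # multiply by x
        if s & top:
            s ^= top | poly        # reduce modulo g(x), clearing the x^width bit
        n += 1
        if s == 1:
            return n


period = order_of_x()
print(period)                      # 32767 codeword bits
print((period - 16) // 8)          # 4093 data bytes before some two-bit errors can slip through
```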

Regards, JRD




Reply Rafael 3/27/2011 2:46:09 PM

On 03/27/2011 03:53 AM, Michael Karas wrote:
> In article<13c95ff0-d9ca-4f0b-92a4-d21fe6c36c55
> @j35g2000prb.googlegroups.com>, shane.2471958@gmail.com says...
>> [snip]
>
>
> The CRC-16 will be able to detect errors in 99.9984 percent of cases.
> This stems from the code being one value off out of 16-bits of
> error code count.
>
> 65535 / 65536 = 0.999984 percent

It isn't that simple.  CRC-16 will be able to detect _all_ 1, 2 and 3 
bit errors, and some 4-bit errors.  How many 'cases' of four bit errors 
in a message depends on the message length and your error rate, so right 
there your fixed percentage of errors detected goes right out the window.
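
This can be checked by brute force using the linearity of the CRC: an error pattern escapes detection exactly when its polynomial is divisible by the generator. The Python sketch below uses hypothetical helper names and an assumed 50-byte message plus 16-bit CRC. It checks whether every 1-, 2- and 3-bit error is caught, and counts how many 4-bit patterns are not.

```python
from collections import Counter
from math import comb

POLY = 0x1021   # CRC-CCITT generator in normal notation (x^16 implicit)
WIDTH = 16


def position_syndromes(n_bits: int) -> list[int]:
    """x^i mod g(x) for every bit position i of an n_bits-long codeword.

    An error pattern goes undetected exactly when the XOR of the syndromes
    of its flipped positions is zero.
    """
    top = 1 << WIDTH
    out, s = [], 1
    for _ in range(n_bits):
        out.append(s)
        s <<= 1
        if s & top:
            s ^= top | POLY
    return out


def all_up_to_3_bit_errors_detected(syn: list[int]) -> bool:
    if 0 in syn or len(set(syn)) != len(syn):       # some 1-bit or 2-bit error escapes
        return False
    seen = set(syn)
    for i in range(len(syn)):
        for j in range(i + 1, len(syn)):
            if (syn[i] ^ syn[j]) in seen:           # a third position cancels the pair
                return False
    return True


def undetected_4_bit_patterns(syn: list[int]) -> int:
    """Count 4-bit error patterns that pass the CRC (assumes distinct syndromes)."""
    pair_xor = Counter(syn[i] ^ syn[j]
                       for i in range(len(syn)) for j in range(i + 1, len(syn)))
    # Each escaping 4-set {a,b,c,d} appears as 3 pairings of equal-XOR pairs.
    return sum(comb(c, 2) for c in pair_xor.values()) // 3


n = (50 + 2) * 8
syn = position_syndromes(n)
print(all_up_to_3_bit_errors_detected(syn))
print(undetected_4_bit_patterns(syn), "of", comb(n, 4), "four-bit patterns escape")
```

The ratio in the last line depends on the frame length, which is the point being made: there is no single fixed detection percentage.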

Read the article cited by Rich Webb.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
Reply Tim 3/27/2011 5:44:00 PM

On 03/27/2011 04:39 AM, Shane williams wrote:
> On Mar 27, 11:53 pm, Michael Karas<mka...@carousel-design.com>  wrote:
>> In article<13c95ff0-d9ca-4f0b-92a4-d21fe6c36c55
>> @j35g2000prb.googlegroups.com>, shane.2471...@gmail.com says...
>>
>> [snip]
>>
>> The CRC-16 will be able to detect errors in 99.9984 percent of cases.
>> This stems from the code being one value off out of 16-bits of
>> error code count.
>>
>> 65535 / 65536 = 0.999984 percent
>>
>> See: http://automationwiki.com/index.php?title=CRC-16-CCITT
>>
>> for some implementation ideas.
>>
>> -------------
>>
>> Are you getting some of the errors in your transmission path
>> due to distortion of the RS485 waveform due to non-equal propagation
>> delays through your logic on the "0"-->"1" transition versus the
>> one from "1"-->"0"? Common problem with certain optocouplers. ;-)
>>
>> --
>>
>> Michael Karas
>> Carousel Design Solutions
>> http://www.carousel-design.com
>
> Thanks. I'm trying to figure out whether it's possible/ viable to
> dynamically determine the fastest baud rate we can use by checking the
> error rate.  The cable lengths and types of wire used when our systems
> are installed varies and I was hoping we could automatically work out
> what speed a particular connection can run at.  The spec for the
> MOC5007 Optocoupler seems a bit vague so I was trying to find a better
> one.

If you creep up on things, looking for one or two bit errors per packet 
and backing off, then you should do OK.  I'm with Vladimir, however, 
that if you can you should consider just sending pseudo-random 
sequences.  Error counting with those is easy-peasy, and if you know 
it's coming down the pike you don't have to worry about corrupting data 
that you depend on.
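
A minimal sketch of the "creep up and back off" approach in Python. It assumes the link can report how many frames passed and failed the CRC in each observation window. The rate list and thresholds are placeholders, and the policy (one step at a time, immediate back-off) is just one reasonable choice.

```python
class RateLadder:
    """Step up after enough clean frames, back off as soon as errors appear.

    `rates` and the thresholds are hypothetical; tune them to the real link.
    """

    def __init__(self, rates, clean_frames_to_step_up=1000, max_bad_per_window=1):
        self.rates = sorted(rates)
        self.index = 0                              # always start at the slowest rate
        self.clean_needed = clean_frames_to_step_up
        self.max_bad = max_bad_per_window
        self.clean_run = 0

    @property
    def rate(self) -> int:
        return self.rates[self.index]

    def report_window(self, good: int, bad: int) -> int:
        """Feed frame counts from one observation window; return the rate to use."""
        if bad > self.max_bad:
            self.index = max(self.index - 1, 0)     # back off one step
            self.clean_run = 0
        elif bad == 0:
            self.clean_run += good
            if self.clean_run >= self.clean_needed and self.index < len(self.rates) - 1:
                self.index += 1                     # creep up one step
                self.clean_run = 0
        else:
            self.clean_run = 0                      # tolerated error: hold, restart the count
        return self.rate


ladder = RateLadder([9600, 19200, 38400, 57600, 115200])
print(ladder.report_window(good=1000, bad=0))   # 19200
print(ladder.report_window(good=500, bad=3))    # 9600
```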

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
Reply Tim 3/27/2011 5:47:05 PM

On 03/27/2011 07:21 AM, Vladimir Vassilevsky wrote:
>
>
> Shane williams wrote:
>
>
>> Thanks. I'm trying to figure out whether it's possible/ viable to
>> dynamically determine the fastest baud rate we can use by checking the
>> error rate.
>
> Yes. But:
>
> 1) It is easier, faster and more reliable to evaluate the channel by
> transmitting a known pseudo-random test pattern rather then the actual
> data.

I've done this -- and it is.

> 2) If the baud rate is changed dynamically, how would the receivers know
> the baud rate of the transmitters?

There are ways.  Any good embedded programmer should be able to figure out 
half a dozen before they even put pen to napkin.

> 3) Since the system is intended to be operable even at the lowest baud,
> why not always use the lowest baud?

If it's like ones that I've worked with, the data over the link is a 
combination of high-priority "gotta haves" like operational data, and 
lower-priority "dang this would be nice" things like diagnostics, faster 
status updates, and that sort of thing.

So the advantages of going up in speed are obvious.  For that matter, 
there may be advantages to being able to tell a maintenance guy what 
not-quite-fast-enough speed can be achieved, so he can make an informed 
choice about what faults to look for.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
Reply Tim 3/27/2011 5:51:29 PM

Tim Wescott wrote:

> It isn't that simple. CRC-16 will be able to detect _all_ 1, 2 and 3 bit
> errors, and some 4-bit errors.

I've often wondered about that statement.  Suppose
you get a 1 bit error in the message and an error
in the crc remainder that results in a "good" message?

Is there an implicit guarantee in the algorithm
that it will take more than 3 bits to "fix" the
remainder?

My apologies if this is covered in the Webb article;
I'm running late today and don't have time to read it.
Reply Jim 3/27/2011 6:36:18 PM

Hi Shane,

On 3/27/2011 4:39 AM, Shane williams wrote:
> On Mar 27, 11:53 pm, Michael Karas<mka...@carousel-design.com>  wrote:

[8<]

>> Are you getting some of the errors in your transmission path
>> due to distortion of the RS485 waveform due to non-equal propagation
>> delays through your logic on the "0"-->"1" transition versus the
>> one from "1"-->"0"? Common problem with certain optocouplers. ;-)

And some devices degrade with age.

> Thanks. I'm trying to figure out whether it's possible/ viable to
> dynamically determine the fastest baud rate we can use by checking the
> error rate.  The cable lengths and types of wire used when our systems
> are installed varies and I was hoping we could automatically work out
> what speed a particular connection can run at.  The spec for the
> MOC5007 Optocoupler seems a bit vague so I was trying to find a better
> one.

<frown>  You might, instead, want to think of this from the
"engineering" standpoint -- what are the likely/expected
*sources* of your errors?  I.e., how is the channel typically [1]
going to be corrupted.

First, think of the medium by itself.  With a given type of
cable (including "crap" that someone might fabricate on-the-spot),
how will your system likely behave (waveform distortions,
sampling skew in the receiver, component aging, etc.).

Then, think of the likely noise sources that might interfere
with your signal.  Is there some synchronous source nearby that
will periodically be bouncing your grounds or coupling directly
to your signals (i.e., will your cable be routed alongside
something noisy)?  [this assumes you have identified any
sources of "noise" that your system imposes on *itself*!  e.g.,
each time *you* command the VFD to engage the 10HP motor you
might notice glitches in your data...]

Then, think of what aperiodic/transient/"random" disturbances
are likely to be encountered in your environment.

In each case, think of the impact on the data stream AT ALL
THE DATA RATES YOU *MIGHT* BE LIKELY TO HAVE IN USE.  Are
you likely to see lots of dispersed single bit errors?  How
far apart (temporally) are they likely to be (far enough
that two different code words can cover them?)  Or, will
you encounter a burst of consecutive errors?  (if so, how
wide?)

Finally, regarding your hinted algorithm:  note that the
time constant you use in determining when/if to change rates
has to take into consideration these observations on the likely
environment.  E.g., if errors are likely to creep in "slowly"
(beginning with low probability, low error rate), then you
can "notice" the errors and start anticipating more (?) and
back off on your data rate -- hopefully, quick enough that the
error rate doesn't grow to exceed your *continued* ability
for your CRC to remain effective.
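
The time-constant trade-off can be made concrete with an exponentially weighted error-rate estimate, a common choice though not one named here. In this Python sketch (hypothetical class name), a large `alpha` reacts within a few frames but is noisy. A small one smooths out isolated glitches but may be too slow if the error rate jumps suddenly.

```python
class FrameErrorRate:
    """Exponentially weighted frame-error-rate estimate.

    alpha sets the time constant: roughly 1/alpha frames of memory.
    """

    def __init__(self, alpha: float = 0.05):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.estimate = 0.0

    def update(self, frame_bad: bool) -> float:
        sample = 1.0 if frame_bad else 0.0
        self.estimate += self.alpha * (sample - self.estimate)  # move part way toward the sample
        return self.estimate


fer = FrameErrorRate(alpha=0.5)
print(fer.update(True), fer.update(False))   # 0.5 0.25
```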

OTOH, if the error rate ever "grows" (instantaneously) faster
than your CRC is able to detect the increased error rate,
you run the risk of accepting bad data "as good".  And, sitting
"fat, happy and glorious" all the while you are doing so!
(i.e., sort of like a PLL locking on a harmonic outside the
intended capture range).

Can you, instead, figure out how to *ensure* a reliable channel?

--------------------
[1] and *atypically*!
Reply D 3/27/2011 7:29:50 PM

On Mar 28, 6:51 am, Tim Wescott <t...@seemywebsite.com> wrote:
> On 03/27/2011 07:21 AM, Vladimir Vassilevsky wrote:
>
> > Shane williams wrote:
>
> >> Thanks. I'm trying to figure out whether it's possible/ viable to
> >> dynamically determine the fastest baud rate we can use by checking the
> >> error rate.
>
> > Yes. But:
>
> > 1) It is easier, faster and more reliable to evaluate the channel by
> > transmitting a known pseudo-random test pattern rather then the actual
> > data.
>
> I've done this -- and it is.
>
> > 2) If the baud rate is changed dynamically, how would the receivers know
> > the baud rate of the transmitters?
>
> There's ways.  Any good embedded programmer should be able to figure out
> half a dozen before they even put pen to napkin.
>
> > 3) Since the system is intended to be operable even at the lowest baud,
> > why not always use the lowest baud?
>
> If it's like ones that I've worked with, the data over the link is a
> combination of high-priority "gotta haves" like operational data, and
> lower-priority "dang this would be nice" things like diagnostics, faster
> status updates, and that sort of thing.
>
> So the advantages of going up in speed are obvious.  For that matter,
> there may be advantages to being able to tell the a maintenance guy what
> not-quite-fast-enough speed can be achieved, so he can make an informed
> choice about what faults to look for.
>

Didn't think about that.

You're exactly right about the need for speed.  Background data is
fine at the slower rate but when an operator is doing something on the
system we want the response to be faster than the slowest rate gives
us.

Switching rates seems fairly easy to me.  One end tells the other what
rate they're switching to, the other acknowledges, if no ack then
retry a couple of times.  If one end switches and the other doesn't,
after one second or so of no communication, they both switch back to
the slowest rate.
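
This switching scheme might look like the following Python sketch. The callback that sends the request and waits for the acknowledgement, the retry count and the 9600 baud fallback are all assumptions. The important property is that a missing acknowledgement or a silent link always brings both ends back to the same known rate.

```python
from typing import Callable


class RateSwitcher:
    """Propose/ack rate change with fall-back to the slowest rate on silence."""

    def __init__(self, slowest: int = 9600, retries: int = 3, silence_s: float = 1.0):
        self.slowest = slowest          # hypothetical fallback rate
        self.rate = slowest
        self.retries = retries
        self.silence_s = silence_s
        self.last_rx = 0.0

    def propose(self, new_rate: int, send_and_wait_ack: Callable[[int], bool]) -> int:
        """Ask the peer to switch; switch locally only once it has acknowledged."""
        for _ in range(self.retries):
            if send_and_wait_ack(new_rate):
                self.rate = new_rate
                break
        return self.rate

    def frame_received(self, now: float) -> None:
        self.last_rx = now

    def poll(self, now: float) -> int:
        """Call periodically: after too long without traffic, revert to the slowest rate."""
        if now - self.last_rx > self.silence_s:
            self.rate = self.slowest
        return self.rate
```

If the peer switches but its acknowledgement is lost, the two ends disagree; neither then hears valid frames, and the silence timeout puts both back at the slowest rate.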

Reply Shane 3/27/2011 10:01:53 PM

On Mar 28, 8:29 am, D Yuniskis <not.going.to...@seen.com> wrote:
> Hi Shane,
>
> [snip]
>
> Can you, instead, figure out how to *ensure* a reliable channel?

Interesting points, thanks.  The environment can be just about
anything.  I suspect we'll back off the baud rate fairly quickly once
errors start occurring.  I'm also thinking we could raise the security
for some of the critical messages, like double transmissions perhaps.


Reply Shane 3/27/2011 10:31:34 PM

On Mar 28, 3:46 am, Rafael Deliano <Rafael_DelianoENTFER...@t-online.de> wrote:
> > I'm trying to figure out whether it's possible/ viable to
> > dynamically determine the fastest baud rate we can use by checking the
> > error rate.
>
> Packet length for a 16 bit CRCs should be limited to 4kbyte.
> The CRC doesn't know at which baud rate the packets are coming.
> Your assumption ( which may well be true ) is that the error-pattern
> shifts from singlebit to bursts and more errors will go undetected.
> But the detected error rate would go way up too. By counting
> retransmissions now and later at the higher baud rate one could
> easily see if that has happened and switch to a 24 bit or 32 bit CRC.
>
> MfG JRD

Maximum packet length is 270 bytes (about 2700 bits on the wire), but critical
messages are more like 50 bytes (about 500 bits).
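
For the "normal baud rate" half of the question, here is a bound that follows from the statement that CRC-CCITT catches every error of three bits or fewer in frames this short. Assuming independent bit errors at rate `ber`, the chance of an undetected error in a frame is at most the chance of four or more errors. The Python sketch below computes that bound. Only a small fraction of those 4+-bit patterns actually escape, so the true figure is lower. Bursty errors violate the independence assumption, and `n_bits` should really be the number of bits the CRC covers rather than bits on the wire.

```python
import math


def p_at_least_k_errors(n_bits: int, ber: float, k: int) -> float:
    """P(k or more bit errors in n_bits), assuming independent errors with rate ber.

    Summed in the log domain so tiny probabilities are not lost to rounding.
    """
    if k <= 0:
        return 1.0
    if ber <= 0.0:
        return 0.0
    if ber >= 1.0:
        return 1.0 if k <= n_bits else 0.0
    log_p, log_q = math.log(ber), math.log1p(-ber)
    log_n_fact = math.lgamma(n_bits + 1)
    return math.fsum(
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n_bits - i + 1)
                 + i * log_p + (n_bits - i) * log_q)
        for i in range(k, n_bits + 1)
    )


for frame_bits in (500, 2700):               # the approximate frame sizes quoted above
    for ber in (1e-6, 1e-4):
        print(frame_bits, ber, p_at_least_k_errors(frame_bits, ber, 4))
```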
Reply Shane 3/27/2011 10:34:53 PM

On 03/27/2011 11:36 AM, Jim Stewart wrote:
> Tim Wescott wrote:
>
>> It isn't that simple. CRC-16 will be able to detect _all_ 1, 2 and 3 bit
>> errors, and some 4-bit errors.
>
> I've often wondered about that statement. Suppose
> you get a 1 bit error in the message and an error
> in the crc remainder that results in a "good" message?
>
> Is there an implicit guarantee in the algorithm
> that it will take more than 3 bits to "fix" the
> remainder.
>
> My apologies if this is covered in the Webb article,
> running late today and don't have time to read it.

One bit error in the message and one in the CRC counts as two bit 
errors.  It's the number of bit errors in _both_ the CRC _and_ the 
message that you need to count.
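
This answer can be illustrated numerically. Because the CRC is linear, flipping one message bit changes the check value by a fixed pattern that depends only on the bit's position. The number of 1s in that pattern is how many CRC bits would also have to flip for the damaged frame to look good. The Python sketch below reuses the same bitwise CRC (0xFFFF preset assumed) and finds the minimum over all positions of an arbitrary 50-byte message.

```python
POLY_CCITT = 0x1021


def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY_CCITT) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def crc_bits_to_hide_one_flip(message: bytes) -> int:
    """Fewest CRC bits that must also flip to hide any single flipped message bit."""
    good = crc16_ccitt(message)
    fewest = 16
    for pos in range(len(message) * 8):
        damaged = bytearray(message)
        damaged[pos // 8] ^= 0x80 >> (pos % 8)      # flip one message bit
        delta = crc16_ccitt(bytes(damaged)) ^ good  # CRC bits that would have to change too
        fewest = min(fewest, bin(delta).count("1"))
    return fewest


print(crc_bits_to_hide_one_flip(bytes(range(50))))  # at least 3 for CRC-CCITT
```

A result of 3 or more means "one bit in the message plus a few in the CRC" is still at least a 4-bit error, so it falls outside the guaranteed 1-, 2- and 3-bit cases rather than contradicting them.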

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
Reply Tim 3/27/2011 10:49:22 PM

Hi Shane,

On 3/27/2011 3:31 PM, Shane williams wrote:

> Interesting points, thanks.  The environment can be just about
> anything.  I suspect we'll back off the baud rate fairly quickly once
> errors start occurring.  I'm also thinking we could raise the security
> for some of the critical messages, like double transmissions perhaps.

Consider carefully what sort of "encoding" you use.  E.g.,
"double transmissions" might add lots of overhead for very
little gain in "reliability".

You can [1] also consider dynamically varying the data rate in
a TDM sort of scheme -- so, in this timeslot, you run at a slow,
reliable rate transferring critical messages; then, in this other
timeslot, you run "flat out" pushing data that would be "nice to
have" but not critical to proper operation.
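
The TDM idea mostly comes down to budgeting time per slot. Below is a rough Python sketch with an invented two-slot plan. It assumes asynchronous 8N1 framing (10 bits per character on the wire) and uses the 50- and 270-byte frame sizes mentioned in this thread.

```python
# Hypothetical slot plan: (traffic class, baud rate, slot length in seconds)
SLOTS = [("critical", 9600, 0.100), ("bulk", 115200, 0.400)]


def frame_time(n_bytes: int, baud: int, bits_per_char: int = 10) -> float:
    """Seconds to send n_bytes of async data (10 bits/char assumes 8N1 framing)."""
    return n_bytes * bits_per_char / baud


def frames_per_slot(n_bytes: int, baud: int, slot_s: float) -> int:
    """Whole frames that fit in one slot."""
    return int(slot_s // frame_time(n_bytes, baud))


for name, baud, slot_s in SLOTS:
    size = 50 if name == "critical" else 270
    print(name, frames_per_slot(size, baud, slot_s), "frames per slot")
```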

Again, you really need to look hard at what you are likely to
encounter "in the field" before you can come to any expectations
regarding likely performance.  I've seen (and have been guilty,
myself!) some pretty mangled patches to deployed systems "just
to get by until the FedEx replacement parts delivery arrives".
If you *might* be running on the bleeding edge in some configuration,
the last thing you want is a guy in the field to *think* things
are OK when, in fact, they are not.

[e.g., you might want to add a switch that forces communications
to stay in the "degraded/secure" mode if you suspect you are not
catching all the communication errors in a particular installation...
because the tech made a cable out of "bell wire"]

----------------------------

[1] Depends on what is on the other end of the link, of course.
But, if you can autobaud dynamically, then that suggests you have
some control over both ends of the link!
Reply D 3/27/2011 11:22:15 PM

In article <14a46afd-a5a4-4d6b-be24-de552c289027
@l14g2000pre.googlegroups.com>, shane.2471958@gmail.com says...
> Subject: Re: error detection rate with crc-16 CCITT
> Date: Sun, 27 Mar 2011 15:01:53 -0700 (PDT)
> From: Shane williams <shane.2471958@gmail.com>
> Newsgroups: comp.arch.embedded
> 
> On Mar 28, 6:51 am, Tim Wescott <t...@seemywebsite.com> wrote:
> > On 03/27/2011 07:21 AM, Vladimir Vassilevsky wrote:
> >
> > > Shane williams wrote:
> >
> > >> Thanks. I'm trying to figure out whether it's possible/ viable to
> > >> dynamically determine the fastest baud rate we can use by checking the
> > >> error rate.
> >
> > > Yes. But:
> >
> > > 1) It is easier, faster and more reliable to evaluate the channel by
> > > transmitting a known pseudo-random test pattern rather then the actual
> > > data.
> >
> > I've done this -- and it is.
> >
> > > 2) If the baud rate is changed dynamically, how would the receivers know
> > > the baud rate of the transmitters?
> >
> > There's ways.  Any good embedded programmer should be able to figure out
> > half a dozen before they even put pen to napkin.
> >
> > > 3) Since the system is intended to be operable even at the lowest baud,
> > > why not always use the lowest baud?
> >
> > If it's like ones that I've worked with, the data over the link is a
> > combination of high-priority "gotta haves" like operational data, and
> > lower-priority "dang this would be nice" things like diagnostics, faster
> > status updates, and that sort of thing.
> >
> > So the advantages of going up in speed are obvious.  For that matter,
> > there may be advantages to being able to tell the a maintenance guy what
> > not-quite-fast-enough speed can be achieved, so he can make an informed
> > choice about what faults to look for.
> >
> 
> Didn't think about that.
> 
> You're exactly right about the need for speed.  Background data is
> fine at the slower rate but when an operator is doing something on the
> system we want the response to be faster than the slowest rate gives
> us.

> Switching rates seems fairly easy to me.  One end tells the other what
> rate they're switching to, the other acknowledges, if no ack then
> retry a couple of times.  If one end switches and the other doesn't,
> after one second or so of no communication, they both switch back to
> the slowest rate.

Have you thought about simple heartbeat loopback data packets?

If you get to the situation where too many error bits cannot be detected,
how will you know everything is all right?

Every once in a while, send a small, varying pseudo-random data packet at 
the highest speed to various nodes, which will just echo the packet back if 
it was decoded correctly. Once the echo is received, check that every bit is correct.

This way you are less likely to have false positives about data being
correct when it is not.

You can change speeds and retry on failures. If you don't see an echo 
back, you have more problems to resolve.

Sending larger data packets at higher speeds checks data integrity more
thoroughly and gives a better chance of exercising more of the data
switching frequencies that may or may not be affected.
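
The heartbeat check compares the echo bit for bit, so it catches corruption the CRC might miss. Below is a Python sketch of the comparison side. It assumes the far end simply returns the payload unchanged; the transport, seeds and packet size are placeholders.

```python
import random


def make_probe(n_bytes: int, seed: int) -> bytes:
    """Pseudo-random probe payload; a varying seed gives a different packet each time."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))


def echo_bit_errors(sent: bytes, echoed):
    """Bit errors between a probe and its echo, or None if the echo is lost or truncated."""
    if echoed is None or len(echoed) != len(sent):
        return None
    return sum(bin(a ^ b).count("1") for a, b in zip(sent, echoed))


probe = make_probe(64, seed=1)
damaged = bytes([probe[0] ^ 0x03]) + probe[1:]   # simulate two flipped bits
print(echo_bit_errors(probe, probe), echo_bit_errors(probe, damaged))  # 0 2
```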

-- 
Paul Carpenter          | paul@pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/>    PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/>  GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate
Reply Paul 3/27/2011 11:22:59 PM

On Mar 28, 12:22 pm, D Yuniskis <not.going.to...@seen.com> wrote:
> Hi Shane,
>
> On 3/27/2011 3:31 PM, Shane williams wrote:
>
> > Interesting points, thanks.  The environment can be just about
> > anything.  I suspect we'll back off the baud rate fairly quickly once
> > errors start occurring.  I'm also thinking we could raise the security
> > for some of the critical messages, like double transmissions perhaps.
>
> Consider carefully what sort of "encoding" you use.  E.g.,
> "double transmissions" might add lots of overhead for very
> little gain in "reliability".
>
> You can [1] also consider dynamically varying the data rate in
> a TDM sort of scheme -- so, in this timeslot, you run at a slow,
> reliable rate transfering critical messages; then, in this other
> timeslot, you run "flat out" pushing data that would be "nice to
> have" but not critical to proper operation.
>
> Again, you really need to look hard at what you are likely to
> encounter "in the field" before you can come to any expectations
> regarding likely performance.  I've seen (and have been guilty,
> myself!) some pretty mangled patches to deployed systems "just
> to get by until the FedEx replacement parts delivery arrives".
> If you *might* be running on the bleeding edge in some configuration,
> the last thing you want is a guy in the field to *think* things
> are OK when, in fact, they are not.
>
> [e.g., you might want to add a switch that forces communications
> to stay in the "degraded/secure" mode if you suspect you are not
> catching all the communication errors in a particular installation...
> because the tech made a cable out of "bell wire"]
>
> ----------------------------
>
> [1] Depends on what is on the other end of the link, of course.
> But, if you can autobaud dynamically, then that suggests you have
> some control over both ends of the link!

Yep, it's the same device at both ends.

Regarding double transmissions, what do you mean by "encoding"?  We
could complement all bits in the second transmission, I guess.
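
The complemented double transmission could look like this Python sketch (hypothetical helper names). It assumes each transmitted frame still carries its own CRC from the 68302, with this check applied on top. Sending the complement rather than an identical copy means a fault that forces the same bits the same way in both copies shows up as a mismatch. The cost is that every critical message takes twice the bytes, which is the overhead concern raised above.

```python
def complement(frame: bytes) -> bytes:
    return bytes(b ^ 0xFF for b in frame)


def encode_double(payload: bytes) -> bytes:
    """Payload followed by its bitwise complement."""
    return payload + complement(payload)


def decode_double(received: bytes):
    """Return the payload if the second half is the exact complement of the first, else None."""
    if len(received) % 2:
        return None
    half = len(received) // 2
    first, second = received[:half], received[half:]
    return first if complement(second) == first else None


print(decode_double(encode_double(b"VALVE 3 OPEN")))   # b'VALVE 3 OPEN'
```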

TDM might not be viable, and is probably too much hassle, I suspect.  The
baud rate behavior will be user-configurable, probably with a system-wide
switch to allow the faster baud rate.

Thanks
Reply Shane 3/28/2011 12:36:45 AM

On Mar 28, 12:22 pm, Paul <p...@pcserviceselectronics.co.uk> wrote:
> In article <14a46afd-a5a4-4d6b-be24-de552c289027
> @l14g2000pre.googlegroups.com>, shane.2471...@gmail.com says...
>
>
> Have you thought about simple heartbeat loopback data packets?
>
> If you get to the situation where too many error bits cannot be detected,
> how will you know everything is alright?
>
> Every once in a while send a small varying pseudo-random data packet at
> highest speed to various nodes, which will just echo the packet back if
> decoded correctly. Once received check every bit is correct.
>
> This way you are less likely to have false-positives about data being
> correct when it is not.
>
> You can change speeds and retry on failures. If you don't see an echo
> back, you have more problems to resolve.
>
> Sending larger data packets at higher speeds helps to check data
> integrity thoroughly, and gives more chance of exercising the data
> switching frequencies that may or may not be affected.

Thanks for the idea about loop-back data packets.  That sounds useful.

The system is a ring of devices with each connection point to point
with one device at each end.
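Paul's loopback heartbeat can be sketched in C roughly as follows. This is an illustration under assumptions, not code from the thread: it uses a PRBS-15 sequence (x^15 + x^14 + 1, a common test pattern) as the "varying pseudo-random" payload, and the names `prbs15_fill` and `count_bit_errors` are hypothetical. The sender fills a test packet, the far node echoes it unchanged, and the sender counts every bit that differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill buf with bytes from a PRBS-15 sequence (x^15 + x^14 + 1).
 * *state is the 15-bit LFSR state and must be nonzero; starting each
 * heartbeat from a different seed gives a varying test pattern. */
void prbs15_fill(uint16_t *state, uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = 0;
        for (int b = 0; b < 8; b++) {
            /* feedback = bit 14 XOR bit 13 (taps 15 and 14) */
            uint16_t fb = (uint16_t)(((*state >> 14) ^ (*state >> 13)) & 1u);
            *state = (uint16_t)(((*state << 1) | fb) & 0x7FFFu);
            byte = (uint8_t)((byte << 1) | fb);
        }
        buf[i] = byte;
    }
}

/* Count the bits that differ between the sent packet and its echo. */
size_t count_bit_errors(const uint8_t *sent, const uint8_t *echoed, size_t len)
{
    size_t errors = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t diff = (uint8_t)(sent[i] ^ echoed[i]);
        while (diff) {
            errors += diff & 1u;
            diff >>= 1;
        }
    }
    return errors;
}
```

A non-zero count on a heartbeat measures the raw bit error rate at that speed directly, whether or not the CRC would have caught those errors, which is the main attraction of echoing a known pattern.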

Reply Shane 3/28/2011 12:41:41 AM

On Sun, 27 Mar 2011 17:36:45 -0700 (PDT), Shane williams
<shane.2471958@gmail.com> wrote:

<snippety snip>
>Regarding double transmissions, what do you mean by "encoding"?  We
>could complement all bits in the second transmission I guess.

One approach that I've used in the past is to require an ack/nak for
each message sent. If the ack includes the CRC portion of the message
that's being acknowledged, then a simple match by the originator against
the CRC that it sent gives pretty good confidence that the receiver got
a correct message.

The returned CRC is, of course, part of the message body that the remote
unit sends which is in its turn used to build that message's CRC.
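A sketch of the ack-with-CRC check in C. The assumptions are mine, not Rich's. The CRC is computed MSB-first with polynomial 0x1021 and an initial value chosen by the caller. 0xFFFF is the common "CCITT-FALSE" convention, but the 68302's DDCMP mode may use a different initial value or bit order, so check the manual. The hypothetical ack layout carries the acknowledged CRC, big-endian, in its first two bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16, polynomial x^16 + x^12 + x^5 + 1 (0x1021), MSB first.
 * 'crc' is the initial value (e.g. 0xFFFF or 0x0000). */
uint16_t crc16_ccitt(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int b = 0; b < 8; b++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical ack layout: first two bytes of the ack body are the CRC
 * of the message being acknowledged, most significant byte first. */
bool ack_matches(uint16_t sent_crc, const uint8_t *ack_body, size_t ack_len)
{
    if (ack_len < 2)
        return false;
    uint16_t echoed = (uint16_t)((ack_body[0] << 8) | ack_body[1]);
    return echoed == sent_crc;
}
```

For the usual check string "123456789", `crc16_ccitt(0xFFFF, ...)` gives 0x29B1 and `crc16_ccitt(0x0000, ...)` gives 0x31C3. This is a quick way to confirm that the bit ordering matches another implementation before trusting the comparison.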

-- 
Rich Webb     Norfolk, VA
Reply Rich 3/28/2011 1:22:00 AM

On 3/27/2011 5:36 PM, Shane williams wrote:
> On Mar 28, 12:22 pm, D Yuniskis<not.going.to...@seen.com>  wrote:
>> Hi Shane,
>>
>> On 3/27/2011 3:31 PM, Shane williams wrote:
>>
>>> Interesting points, thanks.  The environment can be just about
>>> anything.  I suspect we'll back off the baud rate fairly quickly once
>>> errors start occurring.  I'm also thinking we could raise the security
>>> for some of the critical messages, like double transmissions perhaps.
>>
>> Consider carefully what sort of "encoding" you use.  E.g.,
>> "double transmissions" might add lots of overhead for very
>> little gain in "reliability".
>>
>> You can [1] also consider dynamically varying the data rate in
>> a TDM sort of scheme -- so, in this timeslot, you run at a slow,
>> reliable rate transferring critical messages; then, in this other
>> timeslot, you run "flat out" pushing data that would be "nice to
>> have" but not critical to proper operation.
>>
>> Again, you really need to look hard at what you are likely to
>> encounter "in the field" before you can come to any expectations
>> regarding likely performance.  I've seen (and have been guilty,
>> myself!) some pretty mangled patches to deployed systems "just
>> to get by until the FedEx replacement parts delivery arrives".
>> If you *might* be running on the bleeding edge in some configuration,
>> the last thing you want is a guy in the field to *think* things
>> are OK when, in fact, they are not.
>>
>> [e.g., you might want to add a switch that forces communications
>> to stay in the "degraded/secure" mode if you suspect you are not
>> catching all the communication errors in a particular installation...
>> because the tech made a cable out of "bell wire"]
>>
>> ----------------------------
>>
>> [1] Depends on what is on the other end of the link, of course.
>> But, if you can autobaud dynamically, then that suggests you have
>> some control over both ends of the link!
>
> Yep, it's the same device at both ends.
>
> Regarding double transmissions, what do you mean by "encoding"?  We
> could complement all bits in the second transmission I guess.

You are sending 2*n bits to encode n bits of data.
Yet, that encoding will only *detect* a single bit
error.  Won't *correct* ANY errors.  Won't *see*
(certain) two bit errors.  etc.

I.e., your choice of message encoding has lots of
overhead (twice as many bits!) but doesn't give you
a corresponding increase in "reliability".

Without understanding what sorts of errors you are likely
to encounter, it is hard to design a protocol and encoding
scheme that will be resilient to *those* errors.
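A small C sketch makes the point concrete. It assumes the frame layout Shane suggested: n data bytes followed by the same n bytes complemented. The function name is hypothetical. A single flipped bit makes the two copies disagree. Flipping the same bit position in both copies, however, leaves them consistent, so that two-bit error is accepted as good.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "double transmission" check: frame holds n data bytes
 * followed by the same n bytes complemented (2*n bytes in total). */
bool dup_complement_ok(const uint8_t *frame, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((uint8_t)~frame[i] != frame[n + i])
            return false;   /* copies disagree: error detected */
    }
    return true;            /* copies agree: frame accepted */
}
```

That is 100% overhead for guaranteed detection of single-bit errors only, compared with the 16 bits of overhead for the CRC the link already uses.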

> TDM might not be viable and probably too much hassle I suspect.  The
> baud rate behavior will be user configurable with probably a system
> wide switch to allow the faster baud rate.

You can also opt to run at the slower (more reliable) rate
ALL THE TIME and encode command messages more robustly than
"less important messages".  I.e., so command messages have
greater Hamming distances (require more bandwidth per bit,
so to speak) while less important messages are *compressed*
so there is more "data" per bit -- and less protection against
corrupted transmission.  As such, the compressed data appears
to have a higher bandwidth -- at reduced reliability -- even
though it is being sent over the same "bit rate" channel.
Reply D 3/28/2011 2:15:51 AM

On 3/27/2011 5:41 PM, Shane williams wrote:

> The system is a ring of devices with each connection point to point
> with one device at each end.

Do you *literally* mean a ring topology?  I.e., (excuse the
crappy ASCII art)


  AAAA ----> BBBB ----> CCCC ----> DDDD
  AAAA       BBBB       CCCC       DDDD
  AAAA <----------<----------<---- DDDD

So, for A to send to D, B and C act as intermediaries?

Now, hold that thought...

How does C send to A?  I.e., is the "bottom" connection
simply a pass-thru connection from the downstream node?
Or, is it an active connection (like a second comm channel)?
Asked another way, can C send to A *without* going through
D (i.e., by going through B, instead)?

Regardless...  consider that if you twiddle with the baud rate
on any link, you will either need to make sure *all* links
"simultaneously" update their baud-rates (taking into
consideration any packets "in the pipe")

-- or --

you have to provide an elastic store in each node and some
smarts to decide what data that node can *drop* (since its
outbound connection may not be at the same rate as its
inbound connection)

[this last bit applies iff there is a real second channel
in each node like:

  AAAA ----> BBBB ----> CCCC ----> DDDD
  AAAA       BBBB       CCCC       DDDD
  AAAA <---- BBBB <---- CCCC <---- DDDD

Reply D 3/28/2011 5:23:49 AM

On Mar 27, 5:31 pm, Shane williams <shane.2471...@gmail.com> wrote:
> On Mar 28, 8:29 am, D Yuniskis <not.going.to...@seen.com> wrote:
>
> > Hi Shane,
>
> > On 3/27/2011 4:39 AM, Shane williams wrote:
>
> > > On Mar 27, 11:53 pm, Michael Karas <mka...@carousel-design.com> wrote:
>
> > [8<]
>
> > >> Are you getting some of the errors in your transmission path
> > >> due to distortion of the RS485 waveform due to non-equal propagation
> > >> delays through your logic on the "0"-->"1" transition versus the
> > >> one from "1"-->"0"? Common problem with certain optocouplers. ;-)
>
> > And some devices degrade with age.
>
> > > Thanks. I'm trying to figure out whether it's possible/viable to
> > > dynamically determine the fastest baud rate we can use by checking the
> > > error rate.  The cable lengths and types of wire used when our systems
> > > are installed vary and I was hoping we could automatically work out
> > > what speed a particular connection can run at.  The spec for the
> > > MOC5007 optocoupler seems a bit vague so I was trying to find a better
> > > one.
>
> > <frown>  You might, instead, want to think of this from the
> > "engineering" standpoint -- what are the likely/expected
> > *sources* of your errors?  I.e., how is the channel typically [1]
> > going to be corrupted?
>
> > First, think of the medium by itself.  With a given type of
> > cable (including "crap" that someone might fabricate on-the-spot),
> > how will your system likely behave (waveform distortions,
> > sampling skew in the receiver, component aging, etc.).
>
> > Then, think of the likely noise sources that might interfere
> > with your signal.  Is there some synchronous source nearby that
> > will periodically be bouncing your grounds or coupling directly
> > to your signals (i.e., will your cable be routed alongside
> > something noisy)?  [this assumes you have identified any
> > sources of "noise" that your system imposes on *itself*!  e.g.,
> > each time *you* command the VFD to engage the 10HP motor you
> > might notice glitches in your data...]
>
> > Then, think of what aperiodic/transient/"random" disturbances
> > are likely to be encountered in your environment.
>
> > In each case, think of the impact on the data stream AT ALL
> > THE DATA RATES YOU *MIGHT* BE LIKELY TO HAVE IN USE.  Are
> > you likely to see lots of dispersed single bit errors?  How
> > far apart (temporally) are they likely to be (far enough
> > that two different code words can cover them?)  Or, will
> > you encounter a burst of consecutive errors?  (if so, how
> > wide?)
>
> > Finally, regarding your hinted algorithm:  note that the
> > time constant you use in determining when/if to change rates
> > has to take into consideration these observations on the likely
> > environment.  E.g., if errors are likely to creep in "slowly"
> > (beginning with low probability, low error rate), then you
> > can "notice" the errors and start anticipating more (?) and
> > back off on your data rate -- hopefully, quick enough that the
> > error rate doesn't grow to exceed your *continued* ability
> > for your CRC to remain effective.
>
> > OTOH, if the error rate ever "grows" (instantaneously) faster
> > than your CRC is able to detect the increased error rate,
> > you run the risk of accepting bad data "as good".  And, sitting
> > "fat, happy and glorious" all the while you are doing so!
> > (i.e., sort of like a PLL locking on a harmonic outside the
> > intended capture range).
>
> > Can you, instead, figure out how to *ensure* a reliable channel?
>
> > --------------------
> > [1] and *atypically*!
>
> Interesting points, thanks.  The environment can be just about
> anything.  I suspect we'll back off the baud rate fairly quickly once
> errors start occurring.  I'm also thinking we could raise the security
> for some of the critical messages, like double transmissions perhaps.


Use a proper forward error correction scheme.  You'll be able to
monitor the increase in error rate while still getting most packets
through.  A Reed-Solomon code will allow you to (for example) add 20
bytes to a 235 byte message and correct any 10 bad bytes (or, if used
purely for detection, detect all bad messages with no more than 19 bad
bytes).  If you're
getting a bit corrected every few dozen packets, it's probably safe to
bump up the data rate.  If it's a couple dozen bits in every packet,
it's time to back off.  In fact, this can substantially increase your
effective data rate, as you can continue to run in the presence of a
moderate number of errors (disk drives, for instance, run well into
that region, and it's relatively rare these days that *any* sector
actually reads "clean," and a very heavy duty ECC code is used to
compensate).

You can also improve things by using a multi level scheme, which could
be a simple duplication (think disk RAID-1), or some combined code
over multiple packets (simple parity like RAID-5, or Reed-Solomon-ish
like RAID-6), which would provide added recovery, at the expense of
added latency (mainly in the presence of errors).  Since you mentioned
that you have at least two classes of data (critical and nice to
have), apply the second level of FEC to just the critical data (after
protecting each packet with an appropriate RS code), and even with a
substantial spike in error rate, you're likely to get the critical
stuff through.
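The RAID-5-style option can be sketched with plain XOR parity. For each group of critical packets the sender adds one parity packet. A receiver that finds exactly one bad packet in the group (by its per-packet CRC) rebuilds it from the others. This is the simple-parity variant, not the Reed-Solomon one, and the fixed packet length and function names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_LEN 50   /* assumed fixed (padded) packet length, in bytes */

/* pkts points to 'count' packets stored back to back (count * PKT_LEN
 * bytes). parity[i] = XOR of byte i of every packet in the group. */
void make_parity(const uint8_t *pkts, size_t count, uint8_t *parity)
{
    memset(parity, 0, PKT_LEN);
    for (size_t p = 0; p < count; p++)
        for (size_t i = 0; i < PKT_LEN; i++)
            parity[i] ^= pkts[p * PKT_LEN + i];
}

/* Rebuild packet number 'lost' (flagged bad by its own CRC) from the
 * parity packet and the other packets. Only one packet per group can
 * be recovered this way. */
void rebuild_packet(const uint8_t *pkts, size_t count, size_t lost,
                    const uint8_t *parity, uint8_t *out)
{
    memcpy(out, parity, PKT_LEN);
    for (size_t p = 0; p < count; p++) {
        if (p == lost)
            continue;
        for (size_t i = 0; i < PKT_LEN; i++)
            out[i] ^= pkts[p * PKT_LEN + i];
    }
}
```

The cost is one extra packet per group, plus the latency of waiting for the whole group before a bad packet can be rebuilt.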
Reply robertwessel2 3/28/2011 5:54:59 AM

On Mar 28, 6:23 pm, D Yuniskis <not.going.to...@seen.com> wrote:
> On 3/27/2011 5:41 PM, Shane williams wrote:
>
> > The system is a ring of devices with each connection point to point
> > with one device at each end.
>
> Do you *literally* mean a ring topology?  I.e., (excuse the
> crappy ASCII art)
>
>   AAAA ----> BBBB ----> CCCC ----> DDDD
>   AAAA       BBBB       CCCC       DDDD
>   AAAA <----------<----------<---- DDDD
>
> So, for A to send to D, B and C act as intermediaries?
>
> Now, hold that thought...
>
> How does C send to A?  I.e., is the "bottom" connection
> simply a pass-thru connection from the downstream node?
> Or, is it an active connection (like a second comm channel)?
> Asked another way, can C send to A *without* going through
> D (i.e., by going through B, instead)?
>
> Regardless...  consider that if you twiddle with the baud rate
> on any link, you will either need to make sure *all* links
> "simultaneously" update their baud-rates (taking into
> consideration any packets "in the pipe")
>
> -- or --
>
> you have to provide an elastic store in each node and some
> smarts to decide what data that node can *drop* (since its
> outbound connection may not be at the same rate as its
> inbound connection)
>
> [this last bit applies iff there is a real second channel
> in each node like:
>
>   AAAA ----> BBBB ----> CCCC ----> DDDD
>   AAAA       BBBB       CCCC       DDDD
>   AAAA <---- BBBB <---- CCCC <---- DDDD

It's physically a 2 wire half duplex ring with messages going in both
directions around the ring to provide redundancy.  Say 8 nodes 1 to
8.  Node 1 talks to nodes 2 and 8, node 2 talks to nodes 1 and 3 etc.

However we may end up with 3 ports per node making it a collection of
rings or a mesh.  The loading at the slowest baud rate is approx 10%
for 64 nodes.  If we decide to allow mixed baud rates, each node will
have the ability to tell its adjacent nodes to slow down when its
message queue gets to a certain level, allowing it to cope with a
brief surge in messages.  Also to help the propagation delay, we might
split long messages to a max of 50 bytes or so.
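The "tell adjacent nodes to slow down" idea usually needs some hysteresis. Without it, a queue sitting right at the threshold generates a stream of alternating requests. The sketch below uses hypothetical watermark values and names. It leaves open what the slow-down and resume messages look like on the wire.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thresholds, in queued messages. */
#define QUEUE_HIGH_WATER 48
#define QUEUE_LOW_WATER  16

typedef enum { FC_NONE, FC_SEND_SLOW_DOWN, FC_SEND_RESUME } fc_action;

typedef struct {
    size_t depth;    /* messages currently queued */
    bool throttled;  /* true once neighbours have been asked to slow down */
} flow_ctl;

/* Call after every enqueue/dequeue; returns what, if anything, to tell
 * the adjacent nodes. The gap between the two watermarks is the
 * hysteresis band. */
fc_action flow_update(flow_ctl *fc)
{
    if (!fc->throttled && fc->depth >= QUEUE_HIGH_WATER) {
        fc->throttled = true;
        return FC_SEND_SLOW_DOWN;
    }
    if (fc->throttled && fc->depth <= QUEUE_LOW_WATER) {
        fc->throttled = false;
        return FC_SEND_RESUME;
    }
    return FC_NONE;
}
```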

Reply Shane 3/28/2011 10:28:58 AM

Shane williams wrote:

> It's physically a 2 wire half duplex ring with messages going in both
> directions around the ring to provide redundancy.  Say 8 nodes 1 to
> 8.  Node 1 talks to nodes 2 and 8, node 2 talks to nodes 1 and 3 etc.
> 
> However we may end up with 3 ports per node making it a collection of
> rings or a mesh.  The loading at the slowest baud rate is approx 10%
> for 64 nodes.  If we decide to allow mixed baud rates, each node will
> have the ability to tell its adjacent nodes to slow down when its
> message queue gets to a certain level, allowing it to cope with a
> brief surge in messages.  Also to help the propagation delay, we might
> split long messages to a max of 50 bytes or so.
> 

I think DDCMP dates to the mid-70s and was originally designed by
Digital (DEC) for their DECnet network, then updated later for Ethernet.
From what I recall, it is a connection-oriented protocol implemented as a
multilayer stack that provided reliable comms between nodes. It had error
detection, retries, etc., much as TCP/IP does. It's a long time since I
used DECnet, but I know that there are DDCMP protocol specs and other
docs out there which describe the whole stack. There is, I think, even a
Linux DECnet protocol driver which might be a useful bit of code to
look at, even if the complete stack is too much for the application...

Regards,

Chris

Reply ChrisQ 3/28/2011 11:04:58 AM


Shane williams wrote:
> On Mar 28, 6:51 am, Tim Wescott <t...@seemywebsite.com> wrote:
> 
>>On 03/27/2011 07:21 AM, Vladimir Vassilevsky wrote:
>>
>>
>>
>>
>>>Shane williams wrote:
>>
>>>>Thanks. I'm trying to figure out whether it's possible/ viable to
>>>>dynamically determine the fastest baud rate we can use by checking the
>>>>error rate.
>>
>>>Yes. But:
>>
>>>1) It is easier, faster and more reliable to evaluate the channel by
>>>transmitting a known pseudo-random test pattern rather than the actual
>>>data.
>>
>>I've done this -- and it is.
>>
>>
>>>2) If the baud rate is changed dynamically, how would the receivers know
>>>the baud rate of the transmitters?
>>
>>There's ways.  Any good embedded programmer should be able to figure out
>>half a dozen before they even put pen to napkin.
>>
>>
>>>3) Since the system is intended to be operable even at the lowest baud,
>>>why not always use the lowest baud?
>>
>>If it's like ones that I've worked with, the data over the link is a
>>combination of high-priority "gotta haves" like operational data, and
>>lower-priority "dang this would be nice" things like diagnostics, faster
>>status updates, and that sort of thing.
>>
>>So the advantages of going up in speed are obvious.  For that matter,
>>there may be advantages to being able to tell a maintenance guy what
>>not-quite-fast-enough speed can be achieved, so he can make an informed
>>choice about what faults to look for.
>>
> 
> 
> Didn't think about that.
> 
> You're exactly right about the need for speed.  Background data is
> fine at the slower rate but when an operator is doing something on the
> system we want the response to be faster than the slowest rate gives
> us.
> 
> Switching rates seems fairly easy to me.  One end tells the other what
> rate they're switching to, the other acknowledges, if no ack then
> retry a couple of times.  If one end switches and the other doesn't,
> after one second or so of no communication, they both switch back to
> the slowest rate.

Some people are just looking to find trouble for their ass. Perhaps, 
they are masochists; they like to be fucked. Good luck with that; there 
are almost limitless possibilities for the protocol malfunctioning.

VLV




Reply Vladimir 3/28/2011 3:04:29 PM

On 2011-03-28, ChrisQ <meru@devnull.com> wrote:
>
> I think DDCMP dates to the mid-70s and was originally designed by
> Digital (DEC) for their DECnet network, then updated later for Ethernet.

Yes, it was before the VAX days. (VMS is a part of my day job, so I am
familiar with DEC history.)

> From what I recall, it is a connection-oriented protocol implemented as a
> multilayer stack that provided reliable comms between nodes. It had error
> detection, retries, etc., much as TCP/IP does. It's a long time since I
> used DECnet, but I know that there are DDCMP protocol specs and other
> docs out there which describe the whole stack. There is, I think, even a
> Linux DECnet protocol driver which might be a useful bit of code to
> look at, even if the complete stack is too much for the application...
>

The Phase IV documents can be found at:

http://linux-decnet.sourceforge.net/docs/doc_index.html

I don't know what the current status of the DECnet code in Linux is however
as I never use it.

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Reply Simon 3/28/2011 5:17:47 PM

Simon Clubley wrote:

> 
> The Phase IV documents can be found at:
> 
> http://linux-decnet.sourceforge.net/docs/doc_index.html
> 
> I don't know what the current status of the DECnet code in Linux is however
> as I never use it.
> 
> Simon.
> 

A DEC document describing the low-level protocol, CRC, retries, states,
etc. can be found at:

http://decnet.ipv7.net/docs/dundas/aa-d599a-tc.pdf

I had a previous life working with DEC kit and thought I recognised the
name, perhaps from the VMS group, but were you by any chance a
contractor in the mid to late 80s?...

Regards,

Chris
Reply ChrisQ 3/28/2011 8:51:41 PM

On Mar 28, 6:54 pm, "robertwess...@yahoo.com"
<robertwess...@yahoo.com> wrote:
> On Mar 27, 5:31 pm, Shane williams <shane.2471...@gmail.com> wrote:
>
> > Interesting points, thanks.  The environment can be just about
> > anything.  I suspect we'll back off the baud rate fairly quickly once
> > errors start occurring.  I'm also thinking we could raise the security
> > for some of the critical messages, like double transmissions perhaps.
>
> Use a proper forward error correction scheme.  You'll be able to
> monitor the increase in error rate while still getting most packets
> through.  A Reed-Solomon code will allow you to (for example) add 20
> bytes to a 235 byte message and correct any 10 bad bytes (or, if used
> purely for detection, detect all bad messages with no more than 19 bad
> bytes).  If you're getting a bit corrected every few dozen packets,
> it's probably safe to bump up the data rate.  If it's a couple dozen
> bits in every packet, it's time to back off.  In fact, this can
> substantially increase your effective data rate, as you can continue
> to run in the presence of a moderate number of errors (disk drives,
> for instance, run well into that region, and it's relatively rare
> these days that *any* sector actually reads "clean," and a very heavy
> duty ECC code is used to compensate).
>
> You can also improve things by using a multi level scheme, which could
> be a simple duplication (think disk RAID-1), or some combined code
> over multiple packets (simple parity like RAID-5, or Reed-Solomon-ish
> like RAID-6), which would provide added recovery, at the expense of
> added latency (mainly in the presence of errors).  Since you mentioned
> that you have at least two classes of data (critical and nice to
> have), apply the second level of FEC to just the critical data (after
> protecting each packet with an appropriate RS code), and even with a
> substantial spike in error rate, you're likely to get the critical
> stuff through.

Thanks.  Error correction sounds like it would be too CPU intensive.
I'd be happy just to detect errors.

Do you have any idea how many bytes we would have to add to a 60 byte
message to detect 19 bad bytes or less and how CPU intensive it is?
Reply shane.2471958 (30) 3/28/2011 10:48:53 PM

On Mar 29, 4:04 am, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
> Shane williams wrote:
> > On Mar 28, 6:51 am, Tim Wescott <t...@seemywebsite.com> wrote:
>
> > Switching rates seems fairly easy to me.  One end tells the other what
> > rate they're switching to, the other acknowledges, if no ack then
> > retry a couple of times.  If one end switches and the other doesn't,
> > after one second or so of no communication, they both switch back to
> > the slowest rate.
>
> Some people are just looking to find trouble for their ass. Perhaps,
> they are masochists; they like to be fucked. Good luck with that; there
> are almost limitless possibilities for the protocol malfunctioning.
>

Can you describe just one possibility?


Reply shane.2471958 (30) 3/28/2011 11:19:26 PM

On 2011-03-28, ChrisQ <meru@devnull.com> wrote:
> Simon Clubley wrote:
>> 
>> The Phase IV documents can be found at:
>> 
>> http://linux-decnet.sourceforge.net/docs/doc_index.html
>> 
>> I don't know what the current status of the DECnet code in Linux is however
>> as I never use it.
>> 
>
> A DEC document describing the low-level protocol, CRC, retries, states,
> etc. can be found at:
>
> http://decnet.ipv7.net/docs/dundas/aa-d599a-tc.pdf
>

I'll have a read through it thanks; it's been a long time since I really
did anything with DECnet Phase IV.

> I had a previous life working with DEC kit and thought I recognised the
> name, perhaps from the VMS group, but were you by any chance a
> contractor in the mid to late 80s?...
>

No, but late 80s/early 90s was the start of my career and I was writing
code for the PDP-11 before moving onto VAX then Alpha and taking in a
range of other environments along the way.

It's quite possible you ran across me as part of that, especially if
you attended the annual DECUS conferences.

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Reply clubley (1187) 3/28/2011 11:25:20 PM

On Mar 28, 5:48 pm, Shane williams <shane.2471...@gmail.com> wrote:
> On Mar 28, 6:54 pm, "robertwess...@yahoo.com"
> <robertwess...@yahoo.com> wrote:
> > On Mar 27, 5:31 pm, Shane williams <shane.2471...@gmail.com> wrote:
>
> > > Interesting points, thanks.  The environment can be just about
> > > anything.  I suspect we'll back off the baud rate fairly quickly once
> > > errors start occurring.  I'm also thinking we could raise the security
> > > for some of the critical messages, like double transmissions perhaps.
>
> > Use a proper forward error correction scheme.  You'll be able to
> > monitor the increase in error rate while still getting most packets
> > through.  A Reed-Solomon code will allow you to (for example) add 20
> > bytes to a 235 byte message and correct any 10 bad bytes (or, if used
> > purely for detection, detect all bad messages with no more than 19 bad
> > bytes).  If you're getting a bit corrected every few dozen packets,
> > it's probably safe to bump up the data rate.  If it's a couple dozen
> > bits in every packet, it's time to back off.  In fact, this can
> > substantially increase your effective data rate, as you can continue
> > to run in the presence of a moderate number of errors (disk drives,
> > for instance, run well into that region, and it's relatively rare
> > these days that *any* sector actually reads "clean," and a very heavy
> > duty ECC code is used to compensate).
>
> > You can also improve things by using a multi level scheme, which could
> > be a simple duplication (think disk RAID-1), or some combined code
> > over multiple packets (simple parity like RAID-5, or Reed-Solomon-ish
> > like RAID-6), which would provide added recovery, at the expense of
> > added latency (mainly in the presence of errors).  Since you mentioned
> > that you have at least two classes of data (critical and nice to
> > have), apply the second level of FEC to just the critical data (after
> > protecting each packet with an appropriate RS code), and even with a
> > substantial spike in error rate, you're likely to get the critical
> > stuff through.
>
> Thanks.  Error correction sounds like it would be too CPU intensive.
> I'd be happy just to detect errors.
>
> Do you have any idea how many bytes we would have to add to a 60 byte
> message to detect 19 bad bytes or less and how CPU intensive it is?


To detect (but not correct) all errors of 152 (19*8) bits or fewer, you'd
have to add at least 152 bits of check code.  If you're only looking
to detect errors occurring in no more than 19 bytes of the message, it
would be a bit less, but not hugely so.  Remember that to detect n
bits of error, the block has to be different enough from any other
valid block that errors in n bits do not make it look like a different
valid block.
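Robert's bound can be written out. Let n be the total bits, k the data bits, r = n - k the check bits, and d the minimum distance between valid blocks. Guaranteed detection of every pattern of up to e errors needs d >= e + 1. No code of any kind can beat the Singleton bound, d <= r + 1. Counting errors in 8-bit symbols gives the byte-level figure; a Reed-Solomon code over GF(2^8) meets that bound with equality. Nontrivial binary codes cannot meet it, so 152 bits is only a floor for the bit-level case.

```latex
\begin{aligned}
d &\ge e + 1 &&\text{detect every pattern of up to } e \text{ errors}\\
d &\le r + 1 &&\text{Singleton bound, } r = n - k\\
\therefore\ r &\ge e\\[4pt]
e = 152\ \text{bits} &\implies r \ge 152\ \text{bits}\\
e = 19\ \text{bytes} &\implies r \ge 19\ \text{bytes, met by RS}(79, 60)\text{ with } d = 79 - 60 + 1 = 20
\end{aligned}
```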

If you're asking about an RS code as I described above, the short
message really doesn't buy you anything, since you need about twice
the number of bits worth of RS symbols as the number of error bits you
hope to correct.

RS is moderately computationally intensive, but that clearly depends
on your data rates and the hardware you're running on.  In fact it
has a worse reputation than it really deserves.  But to toss some
numbers out there, a decent implementation in C, on a 1GHz x86, for an
RS(255, 239) encoding (239 bytes of data plus 16 bytes of check code,
a bit weaker than what was discussed above; that's a commonly used
code in broadcasting, so it is well studied and you should be able to
find plenty of benchmarks and samples and whatnot), should come in at
100-200Mb/s for encoding (or 10K-20K cycles per block), about half
that for decoding blocks without errors, and about a fifth the
encoding rate for decoding blocks with the maximum correctable amount
of error.  Shorter blocks require less work to process, but it's
sublinear, so your net data rate for a fixed CPU load will go down as
block size decreases.  And note that 255 bytes is the longest possible
block for RS with 8 bit symbols.

On something like an ARM 9, quadruple the cycle counts.
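As a sanity check on those figures: an RS(255, 239) block carries 239 × 8 = 1912 data bits. At clock rate f and throughput R, the cost per block is therefore f × 1912 / R cycles.

```latex
\begin{aligned}
\text{cycles per block} &= \frac{f_{\text{clk}} \times 1912\ \text{bits}}{R}\\
R = 200\ \text{Mb/s}: &\quad \frac{10^9 \times 1912}{2 \times 10^8} = 9\,560\\
R = 100\ \text{Mb/s}: &\quad \frac{10^9 \times 1912}{10^8} = 19\,120
\end{aligned}
```

That matches the quoted 10K-20K cycles per block. Quadrupling for an ARM 9 gives roughly 38K-76K cycles per block.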
Reply robertwessel2 (1339) 3/29/2011 3:54:28 AM


Shane williams wrote:

> On Mar 29, 4:04 am, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
> 
>>Shane williams wrote:
>>
>>>On Mar 28, 6:51 am, Tim Wescott <t...@seemywebsite.com> wrote:
>>
>>>Switching rates seems fairly easy to me.  One end tells the other what
>>>rate they're switching to, the other acknowledges, if no ack then
>>>retry a couple of times.  If one end switches and the other doesn't,
>>>after one second or so of no communication, they both switch back to
>>>the slowest rate.
>>
>>Some people are just looking to find trouble for their ass. Perhaps,
>>they are masochists; they like to be fucked. Good luck with that; there
>>are almost limitless possibilities for the protocol malfunctioning.
>>
> 
> 
> Can you describe just one possibility?

For starters: for efficient operation, the transmit and the receive
chains should be buffered. In order to change the rate, you have to make
sure the buffers are flushed. If you are planning on switching the rate
back and forth, that would incur a significant efficiency penalty.

VLV
Reply nospam (2546) 3/29/2011 5:03:01 AM

On Mar 29, 6:03 pm, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
>
> For starters: for efficient operation, the transmit and the receive
> chains should be buffered. In order to change the rate, you have to make
> sure the buffers are flushed. If you are planning on switching the rate
> back and forth, that would incur a significant efficiency penalty.

The hardware handles the sending of a whole message at a time.  The
software gives the hardware a whole message to send and gets told when
a whole message has been received.  This is done by an interrupt
routine.  The interrupt routine will decide when to switch baud rates
or check when the other end is asking to switch so the only penalty is
a couple of extra messages and a short delay if the switch works.  If
the switch doesn't work there's a slightly bigger penalty but we won't
be switching often enough for it to matter.
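Shane's switching procedure maps onto a small tick-driven state machine. The steps are: request the new rate, wait for an ack, retry a couple of times, and drop back to the slowest rate after about a second without valid traffic. This sketch covers the initiating side only, under assumptions not stated in the thread: a 10 ms tick, a 100 ms ack timeout, three retries, a placeholder fallback rate, and hypothetical names throughout. The responder (send the ack at the old rate, then reprogram its UART) is not shown. The sketch also assumes, as Shane describes, that a whole message has finished sending before `baud` is applied to the hardware, which addresses Vladimir's buffer-flush concern.

```c
#include <stdint.h>

#define TICKS_PER_SECOND  100u    /* assumed 10 ms tick */
#define ACK_TIMEOUT_TICKS 10u     /* assumed 100 ms wait per request */
#define MAX_RETRIES       3u      /* "retry a couple of times" */
#define SLOWEST_BAUD      9600u   /* placeholder fallback rate */

typedef enum { LINK_STEADY, LINK_WAIT_ACK } link_state;
typedef enum { ACT_NONE, ACT_SEND_REQUEST } link_action;

typedef struct {
    link_state state;
    uint32_t baud;           /* rate to program into the UART */
    uint32_t pending_baud;   /* rate requested from the peer */
    unsigned retries;
    unsigned ticks_waiting;  /* ticks since the last request was sent */
    unsigned ticks_silent;   /* ticks since the last valid frame */
} link_ctl;

/* Start a switch; the caller sends the request at the *current* rate. */
link_action link_request_switch(link_ctl *l, uint32_t new_baud)
{
    l->state = LINK_WAIT_ACK;
    l->pending_baud = new_baud;
    l->retries = 0;
    l->ticks_waiting = 0;
    return ACT_SEND_REQUEST;
}

/* Peer acknowledged at the old rate: both ends now change rate. */
void link_on_ack(link_ctl *l)
{
    if (l->state == LINK_WAIT_ACK) {
        l->baud = l->pending_baud;
        l->state = LINK_STEADY;
    }
    l->ticks_silent = 0;
}

void link_on_good_frame(link_ctl *l)
{
    l->ticks_silent = 0;
}

/* Call once per tick; may ask the caller to resend the request. */
link_action link_tick(link_ctl *l)
{
    if (++l->ticks_silent >= TICKS_PER_SECOND) {
        /* About 1 s with nothing valid: assume the ends disagree. */
        l->baud = SLOWEST_BAUD;
        l->state = LINK_STEADY;
        l->ticks_silent = 0;
        return ACT_NONE;
    }
    if (l->state == LINK_WAIT_ACK && ++l->ticks_waiting >= ACK_TIMEOUT_TICKS) {
        l->ticks_waiting = 0;
        if (++l->retries > MAX_RETRIES) {
            l->state = LINK_STEADY;   /* give up; keep the current rate */
            return ACT_NONE;
        }
        return ACT_SEND_REQUEST;      /* resend the request */
    }
    return ACT_NONE;
}
```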

Reply shane.2471958 (30) 3/29/2011 5:49:57 AM

On Mon, 28 Mar 2011 10:04:29 -0500, Vladimir Vassilevsky
<nospam@nowhere.com> wrote:

>
>
>Shane williams wrote:
>> On Mar 28, 6:51 am, Tim Wescott <t...@seemywebsite.com> wrote:
>> 
>>>On 03/27/2011 07:21 AM, Vladimir Vassilevsky wrote:
>>>
>>>
>>>
>>>
>>>>Shane williams wrote:
>>>
>>>>>Thanks. I'm trying to figure out whether it's possible/ viable to
>>>>>dynamically determine the fastest baud rate we can use by checking the
>>>>>error rate.
>>>
>>>>Yes. But:
>>>
>>>>1) It is easier, faster and more reliable to evaluate the channel by
>>>>transmitting a known pseudo-random test pattern rather than the actual
>>>>data.
>>>
>>>I've done this -- and it is.
>>>
>>>
>>>>2) If the baud rate is changed dynamically, how would the receivers know
>>>>the baud rate of the transmitters?
>>>
>>>There's ways.  Any good embedded programmer should be able to figure out
>>>half a dozen before they even put pen to napkin.
>>>
>>>
>>>>3) Since the system is intended to be operable even at the lowest baud,
>>>>why not always use the lowest baud?
>>>
>>>If it's like ones that I've worked with, the data over the link is a
>>>combination of high-priority "gotta haves" like operational data, and
>>>lower-priority "dang this would be nice" things like diagnostics, faster
>>>status updates, and that sort of thing.
>>>
>>>So the advantages of going up in speed are obvious.  For that matter,
>>>there may be advantages to being able to tell a maintenance guy what
>>>not-quite-fast-enough speed can be achieved, so he can make an informed
>>>choice about what faults to look for.
>>>
>> 
>> 
>> Didn't think about that.
>> 
>> You're exactly right about the need for speed.  Background data is
>> fine at the slower rate but when an operator is doing something on the
>> system we want the response to be faster than the slowest rate gives
>> us.
>> 
>> Switching rates seems fairly easy to me.  One end tells the other what
>> rate they're switching to, the other acknowledges, if no ack then
>> retry a couple of times.  If one end switches and the other doesn't,
>> after one second or so of no communication, they both switch back to
>> the slowest rate.
>
>Some people are just looking to find trouble for their ass. Perhaps, 
>they are masochists; they like to be fucked. Good luck with that; there 
>are almost limitless possibilities for the protocol malfunctioning.
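The switch procedure Shane describes above (announce the new rate, wait for an acknowledgement, retry a couple of times, fall back to the slowest rate after about a second of silence) can be sketched as a small state machine. This is a minimal Python model under stated assumptions, not the poster's implementation: the baud values, the 100 ms ACK timeout, the three-attempt limit and the message tuples are all hypothetical, and time is passed in as integer milliseconds so the logic can be exercised without hardware. It does not address the failure modes Vladimir alludes to, such as messages still in flight elsewhere in a ring.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SLOWEST = 9600          # hypothetical fallback rate
ACK_TIMEOUT_MS = 100    # hypothetical: wait this long for an ACK before retrying
MAX_ATTEMPTS = 3        # "retry a couple of times"
SILENCE_MS = 1000       # "after one second or so of no communication"

Msg = Tuple[str, int]   # ("SWITCH", rate) or ("ACK", rate)


@dataclass
class RateSwitcher:
    rate: int = SLOWEST
    pending: Optional[int] = None    # rate we asked for that is not yet ACKed
    attempts: int = 0
    sent_at: int = 0
    last_rx: int = 0
    outbox: List[Msg] = field(default_factory=list)

    def request(self, new_rate: int, now: int) -> None:
        self.pending, self.attempts = new_rate, 0
        self._send_switch(now)

    def _send_switch(self, now: int) -> None:
        self.outbox.append(("SWITCH", self.pending))
        self.attempts += 1
        self.sent_at = now

    def on_message(self, msg: Msg, now: int) -> None:
        self.last_rx = now
        kind, value = msg
        if kind == "SWITCH":
            # Queue the ACK at the old rate; real hardware must finish sending
            # it before the UART is reprogrammed.
            self.outbox.append(("ACK", value))
            self.rate = value
        elif kind == "ACK" and value == self.pending:
            self.rate, self.pending = value, None

    def tick(self, now: int) -> None:
        if self.pending is not None and now - self.sent_at >= ACK_TIMEOUT_MS:
            if self.attempts < MAX_ATTEMPTS:
                self._send_switch(now)
            else:
                self.pending = None      # give up and stay at the current rate
        if self.rate != SLOWEST and now - self.last_rx >= SILENCE_MS:
            self.rate = SLOWEST          # nobody is talking to us: drop back


# Example: A asks B to go to 38400 and B acknowledges.
a, b = RateSwitcher(), RateSwitcher()
a.request(38400, now=0)
for msg in a.outbox:
    b.on_message(msg, now=10)
a.outbox.clear()
for msg in b.outbox:
    a.on_message(msg, now=20)
b.outbox.clear()
print(a.rate, b.rate)   # 38400 38400
```

If one side switches and the other never hears the request, the side that did switch stops hearing anything and its silence timer returns it to `SLOWEST`, which is the recovery path Shane proposes.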


In the CAN environment (at least when using some sensible controllers
like the SJA1000 in listen-only mode) autobauding is trivial.

Adding some Modbus RTU slaves to some existing RS-485 Modbus network
only requires listening for the traffic for a second or two.
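Vladimir's first suggestion in the exchange quoted above is to probe the link with a known pseudo-random pattern instead of live traffic. Below is a minimal Python sketch of that idea using the common PRBS-7 generator (x^7 + x^6 + 1, period 127); the choice of PRBS-7, the seed, and the assumption that the receiver is already aligned to the start of the pattern are illustrative, not from the thread. Because both ends can generate the same sequence, every received bit can be checked, so an error count and a bit error rate fall straight out, rather than the one-bit pass/fail per frame that a CRC gives.

```python
from itertools import islice


def prbs7(seed: int = 0x7F):
    """Yield the PRBS-7 bit stream (polynomial x^7 + x^6 + 1); repeats every 127 bits."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("an all-zero LFSR state never leaves zero")
    while True:
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        yield bit


def count_bit_errors(received, seed: int = 0x7F) -> int:
    """Compare received bits with a locally generated reference.
    Assumes the receiver is already aligned to the start of the pattern."""
    return sum(r != e for r, e in zip(received, prbs7(seed)))


sent = list(islice(prbs7(), 10_000))
received = sent.copy()
for pos in (17, 4_000, 9_999):          # pretend the line corrupted three bits
    received[pos] ^= 1

errors = count_bit_errors(received)
print(errors, errors / len(received))   # 3 errors, i.e. a BER of 3e-4
```

Real pattern checkers usually self-synchronise by loading the receive LFSR from the incoming bits instead of assuming alignment; that refinement is left out here.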

Reply upsidedown (128) 3/29/2011 8:43:18 AM

Shane williams <shane.2471958@gmail.com> wrote:

>Packet length is max 270 bytes / 2700 bits or so but critical messages
>are more like about 50 bytes / 500 bits.

As someone else has previously noted you can get CRC performance data
from this paper:
http://www.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf

You are interested in CRC CCITT-16   x^16 + x^12 + x^5 + 1
At 2700 or fewer bits you get a Hamming Distance of 4, which means it detects
all 1, 2, & 3 bit errors but not all 4 bit errors. Because of the
factorization of this polynomial it also detects all odd numbers of bit
errors, but at the price that all even numbers of bit errors are twice
as likely to escape detection (roughly 1 - 2^-15 probability of
detection). It also detects any burst error that corrupts any
combination of 16 or fewer bits in a row, assuming you get endian-ness
and order of the CRC right.
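These properties are easy to check empirically on a short frame. The sketch below is a plain bitwise CRC with the polynomial from the thread (0x1021), assuming the common "CCITT-FALSE" parameters (initial value 0xFFFF, MSB first, CRC appended high byte first). The 68302's actual initial value and bit order may differ, and an asynchronous UART sends each byte LSB first, which is exactly the endian-ness caveat above. The check confirms that every 1-, 2- and 3-bit error in a 48-bit frame is caught, that random bursts of up to 16 bits are caught, and that a 4-bit error shaped like the generator polynomial itself is not.

```python
import itertools
import random

POLY = 0x1021  # x^16 + x^12 + x^5 + 1 with the implicit x^16 term dropped


def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """MSB-first CRC-16 with polynomial 0x1021 (the "CCITT-FALSE" parameter set)."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ POLY) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def make_frame(msg: bytes) -> bytes:
    """Append the CRC high byte first; a clean frame then checks to 0."""
    return msg + crc16_ccitt(msg).to_bytes(2, "big")


def frame_ok(frame: bytes) -> bool:
    return crc16_ccitt(frame) == 0


def flip(frame: bytes, positions) -> bytes:
    """Invert the given bit positions, counting MSB-first from the frame start."""
    buf = bytearray(frame)
    for p in positions:
        buf[p // 8] ^= 0x80 >> (p % 8)
    return bytes(buf)


frame = make_frame(b"TEST")          # 32 data bits + 16 CRC bits
nbits = len(frame) * 8

# Every 1-, 2- and 3-bit error in this short frame is caught (HD = 4).
for k in (1, 2, 3):
    assert all(not frame_ok(flip(frame, c))
               for c in itertools.combinations(range(nbits), k))

# Bursts of 16 or fewer bits: first and last bit flipped, interior random.
rng = random.Random(1)
for _ in range(2000):
    length = rng.randint(1, 16)
    start = rng.randrange(nbits - length + 1)
    inner = [start + i for i in range(1, length - 1) if rng.random() < 0.5]
    assert not frame_ok(flip(frame, {start, start + length - 1, *inner}))

# ...but not every 4-bit error: flipping bits in the shape of the generator
# (offsets 0, 4, 11, 16) leaves the CRC check passing.
assert frame_ok(flip(frame, [3, 3 + 4, 3 + 11, 3 + 16]))
print("1-3 bit errors and <=16-bit bursts caught; generator-shaped 4-bit error missed")
```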

At 500 bits the properties are pretty much the same.

To clarify an earlier discussion point, the number of errors you are
guaranteed to detect depends on the polynomial and the message length.
So you can't just say any particular polynomial detects all x-number of
bit errors without giving a maximum length. In the case of CCITT-16
you get this performance (HD=4) up to 32751 data bits (not counting CRC
bits) and after that it only detects odd number of bit errors (2-bit
errors will be undetected if they are more than about 32 K bits apart).
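The 32751-bit limit follows from the factorization x^16 + x^12 + x^5 + 1 = (x + 1)(x^15 + x^14 + x^13 + x^12 + x^4 + x^3 + x^2 + x + 1): the degree-15 factor is primitive, so x^n = 1 (mod g) first happens at n = 2^15 - 1 = 32767. A 2-bit error x^i(x^d + 1) is therefore missed exactly when the spacing d is a multiple of 32767, and 32767 minus 16 CRC bits leaves 32751 data bits. The Python check below computes that order directly, then flips two bits exactly 32767 apart in a roughly 4 KB frame; the frame size and CRC parameters (init 0xFFFF, MSB first) are illustrative assumptions.

```python
G = 0x11021          # x^16 + x^12 + x^5 + 1, including the x^16 term
POLY = G & 0xFFFF    # 0x1021, as used by the bitwise CRC


def order_of_x() -> int:
    """Smallest n >= 1 with x^n = 1 (mod g) over GF(2)."""
    r, n = 0b10, 1   # r holds x^n mod g
    while r != 1:
        r <<= 1
        if r & 0x10000:
            r ^= G
        n += 1
    return n


def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ POLY) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def flip(frame: bytes, positions) -> bytes:
    """Invert bit positions, counting MSB-first from the frame start."""
    buf = bytearray(frame)
    for p in positions:
        buf[p // 8] ^= 0x80 >> (p % 8)
    return bytes(buf)


period = order_of_x()
print(period, period - 16)           # 32767 32751

msg = bytes(4100)                    # 32800 data bits, longer than one period
frame = msg + crc16_ccitt(msg).to_bytes(2, "big")
assert crc16_ccitt(flip(frame, [10, 10 + period])) == 0       # missed
assert crc16_ccitt(flip(frame, [10, 10 + period - 1])) != 0   # caught
```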

-- Phil Koopman
http://betterembsw.blogspot.com/

Phil Koopman -- koopman@cmu.edu -- http://www.ece.cmu.edu/~koopman
Reply koopman1 (11) 3/29/2011 12:16:33 PM


upsidedown@downunder.com wrote:


> In the CAN environment (at least when using some sensible controllers
> like SJA1000 listen only mode) autobauding is trivial.

The whole point of using CAN is the hardware arbitration and collision 
avoidance of the bus. This won't work with autobauding.


Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com


Reply nospam (2546) 3/29/2011 1:15:17 PM


Shane williams wrote:

> On Mar 29, 6:03 pm, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
> 
>>For starters: for the efficient operation, the transmit and the receive
>>chains should be buffered. In order to change the rate, you have to make
>>sure the buffers are flushed. If you are planning switching the rate
>>back and forth, that would incur significant penalty in efficiency.
> 
> The hardware handles the sending of a whole message at a time.  The
> software gives the hardware a whole message to send and gets told when
> a whole message has been received.  This is done by an interrupt
> routine.  The interrupt routine will decide when to switch baud rates
> or check when the other end is asking to switch so the only penalty is
> a couple of extra messages and a short delay if the switch works.  If
> the switch doesn't work there's a slightly bigger penalty but we won't
> be switching often enough for it to matter.

* Having protocol logic in the Tx/Rx ISRs is a bad idea already.
* How would a hub-type device work?
* What if a node somehow missed the correct baud rate, receiving garbage 
  and responding to it?
* How would you verify, troubleshoot and prove the operation?

VLV
Reply nospam (2546) 3/29/2011 1:49:21 PM

Vladimir Vassilevsky wrote:
> 

> 
> * Having protocol logic in the Tx/Rx ISRs is a bad idea already.

Absolutely. Lower level comms drivers should always be transparent to
data. You build protocol layers on top of that.

Maybe I'm missing something, but I don't understand what all the fuss is
about in this thread. All this kind of thing has been done to death in
the past. It would help the OP to have a look at one of the original
DDCMP protocol specs to see how it should be done, with message flow,
state transitions etc. Why keep on reinventing the wheel?...

Regards,

Chris
Reply meru (356) 3/29/2011 2:20:46 PM

Simon Clubley wrote:

> 
> No, but late 80s/early 90s was the start of my career and I was writing
> code for the PDP-11 before moving onto VAX then Alpha and taking in a
> range of other environments along the way.
> 
> It's quite possible you ran across me as part of that, especially if
> you attended the annual DECUS conferences.
> 
> Simon.
> 

I just thought the name sounded familiar. I too spent several years
doing systems engineering, programming MACRO and C on PDP and VAX. Never
attended DECUS meetings, but was a member and still have some tapes. Worked
at DEC Park, Racal, Smiths and others during the good old 80s...

Regards,

Chris
Reply meru (356) 3/29/2011 2:28:56 PM

Hi Shane,

On 3/28/2011 3:28 AM, Shane williams wrote:
> On Mar 28, 6:23 pm, D Yuniskis<not.going.to...@seen.com>  wrote:
>> Regardless...  consider that if you twiddle with the baud rate
>> on any link, you will either need to make sure *all* links
>> "simultaneously" update their baud-rates (taking into
>> consideration any packets "in the pipe")
>>
>> -- or --
>>
>> you have to provide an elastic store in each node and some
>> smarts to decide what data that node can *drop* (since its
>> outbound connection may not (?) be at the same rate as its
>> inbound connection)
>>
>> [this last bit applies iff there is a real second channel
>> in each node like:
>>
>>    AAAA ----> BBBB ----> CCCC ----> DDDD
>>    AAAA       BBBB       CCCC       DDDD
>>    AAAA <---- BBBB <---- CCCC <---- DDDD
>
> It's physically a 2 wire half duplex ring with messages going in both
> directions around the ring to provide redundancy.  Say 8 nodes 1 to
> 8.  Node 1 talks to nodes 2 and 8, node 2 talks to nodes 1 and 3 etc.

Is this a synchronous protocol?  Or, are you just using a pair
of UARTs on each device to implement the CW & CCW links?

If that's the case, you have to remember to include all the
"overhead bit(-time)s" in your evaluation of the error rate
and your performance thereunder.

E.g., a start bit error is considerably different than a
*data* bit error (think about it).

> However we may end up with 3 ports per node making it a collection of
> rings or a mesh.  The loading at the slowest baud rate is approx 10%

[scratches head] then why are you worrying about running at
a higher rate?  Latency might be a reason -- assuming you
don't circulate messages effectively as they pass *through*
a node.  But, recall that you only have to pass through
32 nodes, worst case, to get *a* copy of a message to any
other node...

> for 64 nodes.  If we decide to allow mixed baud rates, each node will
> have the ability to tell its adjacent nodes to slow down when its
> message queue gets to a certain level, allowing it to cope with a
> brief surge in messages.

Depending on how you choose to allocate the Tx&Rx devices in each
link -- and, whether or not your baudrate generator allows
the Tx and Rx to run at different baudrates -- you have to:
* make sure your Tx FIFO (hardware and software) is empty before
   changing Tx baudrate
* make sure your "neighbor" isn't sending data to you when you
   change your Rx baudrate (!)

Consider that a link (a connection to *a* neighbor) that "gives you
problems" will probably (?) cause problems in all communications
with that neighbor (Tx & Rx).  So, you probably want to tie the
Tx and Rx channels of *one* device to that neighbor (vs. splitting
the Rx with the upstream and Tx with the downstream IN A GIVEN RING)

[this may seem intuitive -- or not!  For the *other* case, see end]

Now, when you change the Rx baudrate for the upstream CW neighbor,
you are also (?) changing the Tx baudrate for the downstream CCW
neighbor (the "neighbor" is the same physical node in each case).
Also, you have to consider if you will be changing the baudrate
for the "other" ring simultaneously (so you have to consider the
RTT in your switching calculations).

Chances are (bet dollars to donuts?), the two rings are in different
points of their message exchange (since the distance from message
originator to that particular node is different in the CW ring
vs. the CCW ring).  I.e., this may be a convenient time to change
the baudrate (thereby INTERRUPTING the flow of data around the ring)
for the CW ring -- but, probably *not* for the CCW ring.

[recall, changing baudrate is probably going to result in lots
of errors for the communications to/from the affected neighbor(s)]

So, you really have to wait for the entire ring to become idle
before you change baudrates -- and then must have all nodes do
so more or less concurrently (for that ring).  If you've split the
Tx and Rx like I described, then this must also happen on the
"other" ring at the same time.

Regarding the "other" way to split the Tx&Rx... have the Tx
always talk to the downstream neighbor and Rx the upstream
IN THE SAME RING.  In this case, changes to Tx+Rx baudrates
apply only to a certain ring.  So, you can change baudrate
when it is convenient (temporally) for that *ring*.

But, now the two rings are potentially operating at different
rates.  So, the "other" ring will eventually ALSO have to
have its baudrate adjusted to match (or, pass different traffic)

> Also to help the propagation delay, we might
> split long messages to a max of 50 bytes or so.

Reply not.going.to.be (525) 3/29/2011 3:09:43 PM

On Mar 30, 4:09 am, D Yuniskis <not.going.to...@seen.com> wrote:
> Hi Shane,
>
> On 3/28/2011 3:28 AM, Shane williams wrote:
>
> > On Mar 28, 6:23 pm, D Yuniskis<not.going.to...@seen.com>  wrote:
> >> Regardless...  consider that if you twiddle with the baud rate
> >> on any link, you will either need to make sure *all* links
> >> "simultaneously" update their baud-rates (taking into
> >> consideration any packets "in the pipe")
>
> >> -- or --
>
> >> you have to provide an elastic store in each node and some
> >> smarts to decide what data that node can *drop* (since it's
> >> outbound connection may not? be at the same rate as it's
> >> inbound connection)
>
> >> [this last bit applies iff there is a real second channel
> >> in each node like:
>
> >>    AAAA ----> BBBB ----> CCCC ----> DDDD
> >>    AAAA       BBBB       CCCC       DDDD
> >>    AAAA <---- BBBB <---- CCCC <---- DDDD
>
> > It's physically a 2 wire half duplex ring with messages going in both
> > directions around the ring to provide redundancy.  Say 8 nodes 1 to
> > 8.  Node 1 talks to nodes 2 and 8, node 2 talks to nodes 1 and 3 etc.
>
> Is this a synchronous protocol?  Or, are you just using a pair
> of UARTs on each device to implement the CW & CCW links?


Asynchronous with a pair of uarts, one for clockwise, one for counter-
clockwise.

>
> If that's the case, you have to remember to include all the
> "overhead bit(-time)s" in your evaluation of the error rate
> and your performance thereunder.
>
> E.g., a start bit error is considerably different than a
> *data* bit error (think about it).

Hmm.  I forgot about that.  A start or stop bit error means the whole
message is rejected which is good.


>
> > However we may end up with 3 ports per node making it a collection of
> > rings or a mesh.  The loading at the slowest baud rate is approx 10%
>
> [scratches head] then why are you worrying about running at
> a higher rate?

Because not all sites can wire as a mesh.  The third port is optional
but helps the propagation delay a lot.


> Latency might be a reason -- assuming you
> don't circulate messages effectively as they pass *through*
> a node.  But, recall that you only have to pass through
> 32 nodes, worst case, to get *a* copy of a message to any
> other node...
>
> > for 64 nodes.  If we decide to allow mixed baud rates, each node will
> > have the ability to tell its adjacent nodes to slow down when its
> > message queue gets to a certain level, allowing it to cope with a
> > brief surge in messages.
>
> Depending on how you choose to allocate the Tx&Rx devices in each
> link -- and, whether or not your baudrate generator allows
> the Tx and Rx to run at different baudrates -- you have to:
> * make sure your Tx FIFO (hardware and software) is empty before
>    changing Tx baudrate
> * make sure your "neighbor" isn't sending data to you when you
>    change your Rx baudrate (!)

This is assured.  It's half duplex and the hardware sends a whole
message at a time.


>
> Consider that a link (a connection to *a* neighbor) that "gives you
> problems" will probably (?) cause problems in all communications
> with that neighbor (Tx & Rx).  So, you probably want to tie the
> Tx and Rx channels of *one* device to that neighbor (vs. splitting
> the Rx with the upstream and Tx with the downstream IN A GIVEN RING)

Not sure I follow but a single uart does both the tx and rx to the
same neighbor.

>
> [this may seem intuitive -- or not!  For the *other* case, see end]
>
> Now, when you change the Rx baudrate for the upstream CW neighbor,
> you are also (?) changing the Tx baudrate for the downstream CCW
> neighbor (the "neighbor" is the same physical node in each case).

Yes

> Also, you have to consider if you will be changing the baudrate
> for the "other" ring simultaneously (so you have to consider the
> RTT in your switching calculations).

What is RTT?


>
> Chances are (bet dollars to donuts?), the two rings are in different
> points of their message exchange (since the distance from message
> originator to that particular node is different in the CW ring
> vs. the CCW ring).  I.e., this may be a convenient time to change
> the baudrate (thereby INTERRUPTING the flow of data around the ring)
> for the CW ring -- but, probably *not* for the CCW ring.


I'm lost here.

>
> [recall, changing baudrate is probably going to result in lots
> of errors for the communications to/from the affected neighbor(s)]
>
> So, you really have to wait for the entire ring to become idle
> before you change baudrates -- and then must have all nodes do
> so more or less concurrently (for that ring).  If you've split the
> Tx and Rx like I described, then this must also happen on the
> "other" ring at the same time.
>
> Regarding the "other" way to split the Tx&Rx... have the Tx
> always talk to the downstream neighbor and Rx the upstream
> IN THE SAME RING.  In this case, changes to Tx+Rx baudrates
> apply only to a certain ring.  So, you can change baudrate
> when it is convenient (temporally) for that *ring*.
>
> But, now the two rings are potentially operating at different
> rates.  So, the "other" ring will eventually ALSO have to
> have its baudrate adjusted to match (or, pass different traffic)
>

I think there must be a misunderstanding somewhere  - not sure where.

Reply shane.2471958 (30) 3/30/2011 12:12:03 PM

On Mar 31, 1:12 am, Shane williams <shane.2471...@gmail.com> wrote:
>
> > > It's physically a 2 wire half duplex ring with messages going in both
> > > directions around the ring to provide redundancy.  Say 8 nodes 1 to
> > > 8.  Node 1 talks to nodes 2 and 8, node 2 talks to nodes 1 and 3 etc.
>
> > Is this a synchronous protocol?  Or, are you just using a pair
> > of UARTs on each device to implement the CW & CCW links?
>
> Asynchronous with a pair of uarts, one for clockwise, one for counter-
> clockwise.
>

oops, I made a mistake here  - one uart for neighbour one, another
uart for neighbour 2.  Tx to neighbour one is the CW data (say) and tx
to neighbour 2 is the CCW data.


Reply shane.2471958 (30) 3/30/2011 12:20:09 PM

Hi Shane,

On 3/30/2011 5:12 AM, Shane williams wrote:
> On Mar 30, 4:09 am, D Yuniskis<not.going.to...@seen.com>  wrote:

>> Is this a synchronous protocol?  Or, are you just using a pair
>> of UARTs on each device to implement the CW & CCW links?
>
> Asynchronous with a pair of uarts, one for clockwise, one for counter-
> clockwise.

OK.  Been there, done that, T-shirt to prove it...

>> If that's the case, you have to remember to include all the
>> "overhead bit(-time)s" in your evaluation of the error rate
>> and your performance thereunder.
>>
>> E.g., a start bit error is considerably different than a
>> *data* bit error (think about it).
>
> Hmm.  I forgot about that.  A start or stop bit error means the whole
> message is rejected which is good.

My point was that if you *miss* a start bit, then you have -- at
the very least -- missed the "first" bit of the message (because,
if it was MARKING, the UART just ignored it and, if it was SPACING,
the UART thought *it* was the start bit).  If you are pushing
bytes (characters) down the wire at the maximum data rate (minimal
stop time between characters), then you run the risk of part of
the *next* character being "shifted" into this "misaligned" first
character.  I.e., it gets really difficult to figure out *if*
your code will be able to detect an error (because the received
byte "looks wrong") or if, BY CHANCE, the bit patterns can conspire
to look like a valid "something else".
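A toy model makes the point concrete. The Python sketch below frames bytes as 8N1 characters sent back to back (start bit 0, eight data bits LSB first, stop bit 1, idle line at 1) and decodes them with a deliberately naive receiver that hunts for a 0, takes the next eight bits and checks the stop bit. It ignores mid-bit sampling and oversampling, so it is an illustration under those assumptions, not a model of the 68302's serial controller. Turning the start bit of 'A' into MARK makes the receiver resynchronise on a data bit: the first byte comes out wrong with a framing error, but the second comes out as 0xE8 with a perfectly good stop bit.

```python
def uart_encode(data: bytes, idle_before: int = 2, idle_after: int = 10) -> list:
    """8N1 framing, characters back to back; the idle line is 1 (MARK)."""
    bits = [1] * idle_before
    for byte in data:
        bits += [0] + [(byte >> k) & 1 for k in range(8)] + [1]
    return bits + [1] * idle_after


def uart_decode(bits: list) -> list:
    """Naive receiver: returns (byte, stop_bit_ok) pairs."""
    out, i = [], 0
    while i + 10 <= len(bits):
        if bits[i] == 1:             # line idle: keep hunting for a start bit
            i += 1
            continue
        value = sum(bit << k for k, bit in enumerate(bits[i + 1:i + 9]))
        out.append((value, bits[i + 9] == 1))
        i += 10                      # resume hunting right after the stop bit
    return out


clean = uart_encode(b"AB")
print(uart_decode(clean))            # [(65, True), (66, True)]

corrupt = clean.copy()
corrupt[2] = 1                       # start bit of 'A' received as MARK
print(uart_decode(corrupt))          # [(80, False), (232, True)]
```

Whether a slip like this produces a framing error at all depends on the data that follows, which is why the CRC (and any header sanity checks) still has to do the real work.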

>>> However we may end up with 3 ports per node making it a collection of
>>> rings or a mesh.  The loading at the slowest baud rate is approx 10%
>>
>> [scratches head] then why are you worrying about running at
>> a higher rate?
>
> Because not all sites can wire as a mesh.  The third port is optional
> but helps the propagation delay a lot.

Sorry, the subject wasn't clear in my question  <:-(
I mean, if you were to stick with the slowest rate, your
"10%" number *suggests* you have lots of margin -- why
push for a higher rate with the potential for more
problems?

>> Latency might be a reason -- assuming you
>> don't circulate messages effectively as they pass *through*
>> a node.  But, recall that you only have to pass through
>> 32 nodes, worst case, to get *a* copy of a message to any
>> other node...
>>
>>> for 64 nodes.  If we decide to allow mixed baud rates, each node will
>>> have the ability to tell its adjacent nodes to slow down when its
>>> message queue gets to a certain level, allowing it to cope with a
>>> brief surge in messages.
>>
>> Depending on how you choose to allocate the Tx&Rx devices in each
>> link -- and, whether or not your baudrate generator allows
>> the Tx and Rx to run at different baudrates -- you have to:
>> * make sure your Tx FIFO (hardware and software) is empty before
>>     changing Tx baudrate
>> * make sure your "neighbor" isn't sending data to you when you
>>     change your Rx baudrate (!)
>
> This is assured.  It's half duplex and the hardware sends a whole
> message at a time.

So, for each ring, you WON'T receive a message until you have
transmitted any previous message?  Alternatively, you won't
transmit a message until your receiver is finished?

What prevents two messages from being "in a ring" at the same
time (by accident)?  I.e., without violating the above, it
seems possible that node 18 can be sending to node 19 (while
19 is NOT sending to 20 and 17 is not sending to 18) at the
same time that node 3 is sending to node 4 (while neither 2
nor 4 are actively transmitting).

Since this *seems* possible, how can you be sure one message
doesn't get delayed slightly so that the second message ends
up catching up to it?  (i.e., node 23 has no way of knowing
that node 24 is transmitting to 25 so 23 *could* start sending
a message to 24 that 24 fails to notice -- in whole or in
part -- because 24 is preoccupied with its outbound message)

>> Consider that a link (a connection to *a* neighbor) that "gives you
>> problems" will probably (?) cause problems in all communications
>> with that neighbor (Tx & Rx).  So, you probably want to tie the
>> Tx and Rx channels of *one* device to that neighbor (vs. splitting
>> the Rx with the upstream and Tx with the downstream IN A GIVEN RING)
>
> Not sure I follow but a single uart does both the tx and rx to the
> same neighbor.
>
>> [this may seem intuitive -- or not!  For the *other* case, see end]
>
>> Now, when you change the Rx baudrate for the upstream CW neighbor,
>> you are also (?) changing the Tx baudrate for the downstream CCW
>> neighbor (the "neighbor" is the same physical node in each case).
>
> Yes
>
>> Also, you have to consider if you will be changing the baudrate
>> for the "other" ring simultaneously (so you have to consider the
>> RTT in your switching calculations).
>
> What is RTT?

Round Trip Time (sorry :< )  I.e., you (each of your nodes) has
to be aware of the time it takes a message to (hopefully) make
it around the ring.

>> Chances are (bet dollars to donuts?), the two rings are in different
>> points of their message exchange (since the distance from message
>> originator to that particular node is different in the CW ring
>> vs. the CCW ring).  I.e., this may be a convenient time to change
>> the baudrate (thereby INTERRUPTING the flow of data around the ring)
>> for the CW ring -- but, probably *not* for the CCW ring.
>
> I'm lost here.

Number the nodes 1 - 10 (sequentially).
The CW ring has 1 sending to 2, 2 sending to 3, ... 10 sending to 1.
The CCW ring has 10 sending to 9, 9 sending to 8, ... 1 sending to 10.
The nodes operate concurrently.

So, assume 7 originates a message -- destined for 3.  In the CW ring,
it is routed as 7, 8, 9, 10, 1, 2, 3.  In the CCW ring, it is routed
(simultaneously) as 7, 6, 5, 4, 3.

*If* it progresses node to node at the exact same rates in each
ring (this isn't guaranteed but "close enough for gummit work"),
then it arrives in 8 & 6 at the same time, 9 & 5, 10 & 4, 1 & 3,
2 & 2 (though different "rings"), 3 & 1, etc. (note I have assumed,
here, that it continues around until reaching its originator...
but, that's not important).
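The 10-node example can be reproduced with a few lines of Python. The sketch assumes nodes numbered 1..n, one hop per time step in both rings at the same rate (the "close enough" assumption above), and that both copies keep circulating back to the originator; `ring_walk` is a hypothetical helper, not part of any library. It also shows which links are busy at the same moment, which matters for the baudrate-change scenario that follows.

```python
def ring_walk(src: int, n: int, direction: int, hops: int) -> list:
    """Nodes visited on a ring numbered 1..n; direction=+1 is CW, -1 is CCW."""
    return [(src - 1 + direction * k) % n + 1 for k in range(hops + 1)]


N, SRC, DST = 10, 7, 3
cw = ring_walk(SRC, N, +1, (DST - SRC) % N)    # [7, 8, 9, 10, 1, 2, 3]
ccw = ring_walk(SRC, N, -1, (SRC - DST) % N)   # [7, 6, 5, 4, 3]

# Where each copy is after the same number of hops, letting both copies
# circulate all the way back to the originator.
arrivals = list(zip(ring_walk(SRC, N, +1, N), ring_walk(SRC, N, -1, N)))[1:]
print(arrivals)
# [(8, 6), (9, 5), (10, 4), (1, 3), (2, 2), (3, 1), (4, 10), (5, 9), (6, 8), (7, 7)]

# Links in use during the same hop: while 9 sends to 10 (CW),
# 5 is sending to 4 (CCW).
cw_links = list(zip(cw, cw[1:]))
ccw_links = list(zip(ccw, ccw[1:]))
print(cw_links[2], ccw_links[2])               # (9, 10) (5, 4)
```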

Now, at node 9, if the CW ring decides that the baudrate needs to be
changed and it thinks "now is a good time to do so" (because it has
*just* passed its CW message on to node 10), that action effectively
interrupts any traffic in the CW ring (until the other nodes make
the similar baudrate adjustment in the CW direction).

But, there is a message circulating in the CCW ring -- it was just
transmitted from node 5 to 4 (while 9 was sending to 10).  It will
eventually be routed to node 9 as it continues its way around the
CCW ring.  But, *it* is moving at the original baudrate (in the CCW
ring) while node 9 is now operating at the *new* baudrate (in the
CW ring).  So, any new traffic in the CW ring will run around
that ring at a different rate than the CCW traffic.  If you only
allow one message to be active in each ring at any given time, then
this will "resolve itself" one RTT later.  But, if the "other"
ring never decides to change baudrates... ?

And, if it *does* change baudrates at the same time as the "first"
ring, then you have to wait for the CW message to have been
completely propagated *and* the CCW message as well before making
the change.  I.e., you have to let both rings go idle before
risking the switch (or, take considerable care to ensure that
a switch doesn't happen DOWNstream of a circulating message)

>> [recall, changing baudrate is probably going to result in lots
>> of errors for the communications to/from the affected neighbor(s)]
>>
>> So, you really have to wait for the entire ring to become idle
>> before you change baudrates -- and then must have all nodes do
>> so more or less concurrently (for that ring).  If you've split the
>> Tx and Rx like I described, then this must also happen on the
>> "other" ring at the same time.
>>
>> Regarding the "other" way to split the Tx&Rx... have the Tx
>> always talk to the downstream neighbor and Rx the upstream
>> IN THE SAME RING.  In this case, changes to Tx+Rx baudrates
>> apply only to a certain ring.  So, you can change baudrate
>> when it is convenient (temporally) for that *ring*.
>>
>> But, now the two rings are potentially operating at different
>> rates.  So, the "other" ring will eventually ALSO have to
>> have its baudrate adjusted to match (or, pass different traffic)
>
> I think there must be a misunderstanding somewhere  - not sure where.

You can wire two UARTs to give you two rings in TWO DIFFERENT WAYS.
Look at a segment of the ring with three nodes:

   ------> 1 AAAA 1 --------> 1 BBBB 1 --------> 1 CCCC 1 ------->
             AAAA               BBBB               CCCC
   <------ 2 AAAA 2 <-------- 2 BBBB 2 <-------- 2 CCCC 2 <-------

vs.

   ------> 1 AAAA 2 --------> 1 BBBB 2 --------> 1 CCCC 2 ------->
             AAAA               BBBB               CCCC
   <------ 1 AAAA 2 <-------- 1 BBBB 2 <-------- 1 CCCC 2 <-------

where the numbers identify the UARTs associated with each signal.

[assume tx and rx baudrates are driven by the same baudrate generator
so there is a BRG1 and BRG2 in each node]

In the first case, when you change the baudrate of a UART at some
particular node, the baudrate for that segment in *the* ring that
the UART services (left-to-right ring vs right-to-left ring) changes.
So, you must change *after* you have finished transmitting and you
will no longer be able to receive until the node upstream from you
also changes baudrate.

In the second case, when you change the baudrate of a UART at some
particular node, the baudrate for all communications with that
particular neighbor (to the left or to the right) changes.  So,
*both* rings are "broken" until that neighbor makes the comparable
change.
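The difference between the two wirings can be tabulated mechanically. In the Python sketch below (hypothetical names; ring directions labelled L2R/R2L as in the drawings, with the node list wrapping round at the ends), reprogramming one UART in the first wiring touches two links in a single ring, while in the second wiring it touches both directions to one neighbour, i.e. one link in each ring. Shane's correction earlier in the thread (one UART per neighbour) describes the second wiring.

```python
def affected_links(scheme: str, nodes: list, node: str, uart: int) -> set:
    """Directed links (ring, sender, receiver) whose baud rate changes when
    `uart` (1 or 2) on `node` is reprogrammed."""
    i = nodes.index(node)
    left, right = nodes[i - 1], nodes[(i + 1) % len(nodes)]
    if scheme == "per-ring":
        # First drawing: UART 1 carries the left-to-right ring,
        # UART 2 carries the right-to-left ring.
        if uart == 1:
            return {("L2R", left, node), ("L2R", node, right)}
        return {("R2L", right, node), ("R2L", node, left)}
    if scheme == "per-neighbour":
        # Second drawing: UART 1 talks only to the left neighbour,
        # UART 2 only to the right neighbour, for both rings.
        if uart == 1:
            return {("L2R", left, node), ("R2L", node, left)}
        return {("L2R", node, right), ("R2L", right, node)}
    raise ValueError(f"unknown scheme {scheme!r}")


nodes = ["A", "B", "C", "D"]
for scheme in ("per-ring", "per-neighbour"):
    links = affected_links(scheme, nodes, "B", 1)
    rings = sorted({ring for ring, _, _ in links})
    print(scheme, sorted(links), "rings touched:", rings)
# per-ring touches only L2R; per-neighbour touches both L2R and R2L
```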

Look at each scenario and its consequences while messages are
circulating (in both rings!).  Changing data rates can be a very
disruptive thing as it forces the ring(s) to be emptied; some
minimum guaranteed quiescent period to provide a safety factor
(that no messages are still in transit); the actual change
to be effected; a quiescent period to ensure all nodes are
at the new speed; *then* you can start up again.

While it sounds "child-like", you might find it helpful to make a drawing
and move some coins (tokens) around the rings as if they were
messages.  It helps to picture what changes to the rings'
operation you can make and *when*.

Either try NOT to change baudrates *or* change them at times
that can be determined a priori.
Reply not.going.to.be (525) 3/30/2011 5:14:52 PM

On Mar 31, 6:14 am, D Yuniskis <not.going.to...@seen.com> wrote:
> Hi Shane,
>
> On 3/30/2011 5:12 AM, Shane williams wrote:
>
> > On Mar 30, 4:09 am, D Yuniskis<not.going.to...@seen.com>  wrote:
> >> Is this a synchronous protocol?  Or, are you just using a pair
> >> of UARTs on each device to implement the CW & CCW links?
>
> > Asynchronous with a pair of uarts, one for clockwise, one for counter-
> > clockwise.
>
> OK.  Been there, done that, T-shirt to prove it...

Hi, I'm out of time today.  I'll get back to this tomorrow.
Thanks.

Reply shane.2471958 (30) 3/31/2011 12:56:51 AM

On Mar 31, 6:14=A0am, D Yuniskis <not.going.to...@seen.com> wrote:
> Hi Shane,
>
> On 3/30/2011 5:12 AM, Shane williams wrote:
>
>
> >> E.g., a start bit error is considerably different than a
> >> *data* bit error (think about it).
>
> > Hmm.  I forgot about that.  A start or stop bit error means the whole
> > message is rejected which is good.
>
> My point was that if you *miss* a start bit, then you have -- at
> the very least -- missed the "first" bit of the message (because,
> if it was MARKING, the UART just ignored it and, if it was SPACING,
> the UART thought *it* was the start bit).  If you are pushing
> bytes (characters) down the wire at the maximum data rate (minimal
> stop time between characters), then you run the risk of part of
> the *next* character being "shifted" into this "misaligned" first
> character.  I.e., it gets really difficult to figure out *if*
> your code will be able to detect an error (because the received
> byte "looks wrong") or if, BY CHANCE, the bit patterns can conspire
> to look like a valid "something else".

Hmm, ok, if the byte count goes wrong as well I guess it could - I
didn't think of that.  The DDCMP protocol actually has a 10 byte
header (can't remember if I mentioned this) with a separate CRC for
the header.  The count byte for the data is in the header.  I suspect
the chance of it morphing into something valid would be pretty low in
our case - e.g. one particular byte must always have the value 0x01.
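Each plausibility check layered on top of the header CRC shrinks the odds that a misaligned or garbled header is accepted: a uniformly random byte matches a fixed 0x01 only 1 time in 256, and an impossible count is rejected outright. The Python sketch below shows that layering. The byte offsets, the little-endian count and the 0x81 class byte are invented for illustration and are not taken from the DDCMP specification; only the 10-byte header with its own CRC, the always-0x01 byte and the 270-byte maximum packet come from this thread. The CRC parameters (polynomial 0x1021, init 0xFFFF, appended high byte first) are also assumptions.

```python
POLY = 0x1021
HEADER_LEN = 10
COUNT_SLICE = slice(1, 3)      # hypothetical: 16-bit data count, little-endian
FIXED_OFFSET = 3               # hypothetical position of the always-0x01 byte
FIXED_VALUE = 0x01
MAX_PACKET = 270               # from the thread: largest packet in bytes
MAX_DATA = MAX_PACKET - HEADER_LEN - 2   # leave room for the data block's CRC


def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ POLY) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def header_ok(hdr: bytes) -> bool:
    """Accept a header only if its CRC, fixed byte and count all look sane."""
    if len(hdr) != HEADER_LEN or crc16_ccitt(hdr) != 0:
        return False
    if hdr[FIXED_OFFSET] != FIXED_VALUE:
        return False
    return int.from_bytes(hdr[COUNT_SLICE], "little") <= MAX_DATA


def build_header(count: int, fixed: int = FIXED_VALUE) -> bytes:
    """Hypothetical test helper: class byte, count, fixed byte, 4 filler bytes, CRC."""
    body = bytes([0x81]) + count.to_bytes(2, "little") + bytes([fixed]) + bytes(4)
    return body + crc16_ccitt(body).to_bytes(2, "big")


print(header_ok(build_header(50)))               # True
print(header_ok(build_header(MAX_DATA + 1)))     # False: CRC fine, count impossible
print(header_ok(build_header(50, fixed=0x02)))   # False: CRC fine, fixed byte wrong
```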

So the detection of 1, 2 and 3 bit errors by CRC16-CCITT doesn't allow
for start bit and stop bit errors?  I never thought of that either.


>
> >>> However we may end up with 3 ports per node making it a collection of
> >>> rings or a mesh.  The loading at the slowest baud rate is approx 10%
>
> >> [scratches head] then why are you worrying about running at
> >> a higher rate?
>
> > Because not all sites can wire as a mesh.  The third port is optional
> > but helps the propagation delay a lot.
>
> Sorry, the subject wasn't clear in my question  <:-(
> I mean, if you were to stick with the slowest rate, your
> "10%" number *suggests* you have lots of margin -- why
> push for a higher rate with the potential for more
> problems?


To get faster request/ response - a shorter propagation delay.
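To put rough numbers on the propagation delay: with copies sent both ways round a 64-node ring, the farthest node is 32 hops away, and with store-and-forward each hop costs at least one full message time. The Python sketch below assumes 10 bit times per byte (start + 8 data + stop, matching the 270 bytes / 2700 bits figure earlier in the thread), 50-byte critical messages, zero processing time per hop, and a few example baud rates that are not from the thread.

```python
def worst_case_hops(nodes: int) -> int:
    """Copies travel both ways round the ring, so the farthest node is nodes // 2 hops away."""
    return nodes // 2


def ring_latency_s(nodes: int, msg_bytes: int, baud: int,
                   bits_per_byte: int = 10, per_hop_s: float = 0.0) -> float:
    """Store-and-forward: each hop receives the whole message before passing it on."""
    hop_s = msg_bytes * bits_per_byte / baud + per_hop_s
    return worst_case_hops(nodes) * hop_s


for baud in (9600, 38400, 115200):   # example rates, not from the thread
    print(f"{baud:>6} baud: {ring_latency_s(64, 50, baud):.3f} s")
# roughly 1.667 s, 0.417 s and 0.139 s
```

Forwarding characters as they arrive instead of whole messages would shrink the per-hop cost towards a few character times plus processing, which is the "circulate messages effectively" point raised above; splitting long messages into short ones helps for the same reason.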


>
> >> Latency might be a reason -- assuming you
> >> don't circulate messages effectively as they pass *through*
> >> a node.  But, recall that you only have to pass through
> >> 32 nodes, worst case, to get *a* copy of a message to any
> >> other node...
>
> >>> for 64 nodes.  If we decide to allow mixed baud rates, each node will
> >>> have the ability to tell its adjacent nodes to slow down when its
> >>> message queue gets to a certain level, allowing it to cope with a
> >>> brief surge in messages.
>
> >> Depending on how you choose to allocate the Tx&Rx devices in each
> >> link -- and, whether or not your baudrate generator allows
> >> the Tx and Rx to run at different baudrates -- you have to:
> >> * make sure your Tx FIFO (hardware and software) is empty before
> >>     changing Tx baudrate
> >> * make sure your "neighbor" isn't sending data to you when you
> >>     change your Rx baudrate (!)
>
> > This is assured.  It's half duplex and the hardware sends a whole
> > message at a time.
>
> So, for each ring, you WON'T receive a message until you have
> transmitted any previous message?  Alternatively, you won't
> transmit a message until your receiver is finished?

This is true for each uart.


>
> What prevents two messages from being "in a ring" at the same
> time (by accident)?  I.e., without violating the above, it
> seems possible that node 18 can be sending to node 19 (while
> 19 is NOT sending to 20 and 17 is not sending to 18) at the
> same time that node 3 is sending to node 4 (while neither 2
> nor 4 are actively transmitting).

I don't follow this.  It's not a bus.  18 and 19 can talk to each
other and no-one else hears.


>
> Since this *seems* possible, how can you be sure one message
> doesn't get delayed slightly so that the second message ends
> up catching up to it?  (i.e., node 23 has no way of knowing
> that node 24 is transmitting to 25 so 23 *could* start sending
> a message to 24 that 24 fails to notice -- in whole or in
> part -- because 24 is preoccupied with its outbound message)


I have a feeling there's a misunderstanding here - not sure what
though.


>
> >> Consider that a link (a connection to *a* neighbor) that "gives you
> >> problems" will probably (?) cause problems in all communications
> >> with that neighbor (Tx & Rx).  So, you probably want to tie the
> >> Tx and Rx channels of *one* device to that neighbor (vs. splitting
> >> the Rx with the upstream and Tx with the downstream IN A GIVEN RING)
>
> > Not sure I follow but a single uart does both the tx and rx to the
> > same neighbor.
>
> >> [this may seem intuitive -- or not!  For the *other* case, see end]
>
> >> Now, when you change the Rx baudrate for the upstream CW neighbor,
> >> you are also (?) changing the Tx baudrate for the downstream CCW
> >> neighbor (the "neighbor" is the same physical node in each case).
>
> > Yes
>
> >> Also, you have to consider if you will be changing the baudrate
> >> for the "other" ring simultaneously (so you have to consider the
> >> RTT in your switching calculations).
>
> > What is RTT?
>
> Round Trip Time (sorry :< )  I.e., you (each of your nodes) has
> to be aware of the time it takes a message to (hopefully) make
> it around the ring.
>
> >> Chances are (bet dollars to donuts?), the two rings are in different
> >> points of their message exchange (since the distance from message
> >> originator to that particular node is different in the CW ring
> >> vs. the CCW ring).  I.e., this may be a convenient time to change
> >> the baudrate (thereby INTERRUPTING the flow of data around the ring)
> >> for the CW ring -- but, probably *not* for the CCW ring.
>
> > I'm lost here.
>
> Number the nodes 1 - 10 (sequentially).
> The CW ring has 1 sending to 2, 2 sending to 3, ... 10 sending to 1.
> The CCW ring has 10 sending to 9, 9 sending to 8, ... 1 sending to 10.
> The nodes operate concurrently.


Yes, they do.

>
> So, assume 7 originates a message -- destined for 3.  In the CW ring,
> it is routed as 7, 8, 9, 10, 1, 2, 3. =A0In the CCW ring, it is routed
> (simultaneously) as 7, 6, 5, 4, 3.
>
> *If* it progresses node to node at the exact same rates in each
> ring (this isn't guaranteed but "close enough for gummit work"),
> then it arrives in 8 & 6 at the same time, 9 & 5, 10 & 4, 1 & 3,
> 2 & 2 (though different "rings"), 3 & 1, etc. (note I have assumed,
> here, that it continues around until reaching its originator...
> but, that's not important).

ok  - it actually dies at around about the 2&2 , 3&1 stage
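A minimal Python sketch of the routing in the example above, assuming nodes numbered 1..10 and, as the post does, that both copies advance one node per hop in lock step. The function names `route` and `lockstep` are hypothetical illustrations, not part of DDCMP or any driver:

```python
def route(src, dst, n, clockwise):
    """Nodes visited from src to dst (inclusive) in a ring numbered 1..n.

    Clockwise means 1 -> 2 -> ... -> n -> 1; counter-clockwise is the reverse.
    """
    step = 1 if clockwise else -1
    path = [src]
    node = src
    while node != dst:
        node = (node - 1 + step) % n + 1   # wrap n -> 1 (CW) or 1 -> n (CCW)
        path.append(node)
    return path

def lockstep(src, n, hops):
    """(CW node, CCW node) after each hop, assuming both copies of a message
    leave src together and advance one node per hop at the same rate."""
    return [((src - 1 + k) % n + 1, (src - 1 - k) % n + 1) for k in range(1, hops + 1)]

cw = route(7, 3, 10, clockwise=True)     # [7, 8, 9, 10, 1, 2, 3]
ccw = route(7, 3, 10, clockwise=False)   # [7, 6, 5, 4, 3]
pairs = lockstep(7, 10, 6)               # [(8, 6), (9, 5), (10, 4), (1, 3), (2, 2), (3, 1)]
```

The CW copy needs 6 hops to reach node 3 and the CCW copy only 4, so the CCW copy arrives first; the lock-step pairs show both copies sitting at node 2 after five hops, which is roughly where the message is said to die.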


>
> Now, at node 9, if the CW ring decides that the baudrate needs to be
> changed and it thinks "now is a good time to do so" (because it has
> *just* passed its CW message on to node 10), that action effectively
> interrupts any traffic in the CW ring (until the other nodes make
> the similar baudrate adjustment in the CW direction).


No, the baud rate between any two nodes is independent of any other
two nodes.  I'm missing something here.


>
> But, there is a message circulating in the CCW ring -- it was just
> transmitted from node 5 to 4 (while 9 was sending to 10).  It will
> eventually be routed to node 9 as it continues its way around the
> CCW ring.  But, *it* is moving at the original baudrate (in the CCW
> ring) while node 9 is now operating at the *new* baudrate (in the
> CW ring).  So, any new traffic in the CW ring will run around
> that ring at a different rate than the CCW traffic. =A0If you only
> allow one message to be active in each ring at any given time, then
> this will "resolve itself" one RTT later. =A0But, if the "other"
> ring never decides to change baudrates... ?
>
> And, if it *does* change baudrates at the same time as the "first"
> ring, then you have to wait for the CW message to have been
> completely propagated *and* the CCW message as well before making
> the change.  I.e., you have to let both rings go idle before
> risking the switch (or, take considerable care to ensure that
> a switch doesn't happen DOWNstream of a circulating message)
>
> >> [recall, changing baudrate is probably going to result in lots
> >> of errors for the communications to/from the affected neighbor(s)]
>
> >> So, you really have to wait for the entire ring to become idle
> >> before you change baudrates -- and then must have all nodes do
> >> so more or less concurrently (for that ring).  If you've split the
> >> Tx and Rx like I described, then this must also happen on the
> >> "other" ring at the same time.
>
> >> Regarding the "other" way to split the Tx&Rx... have the Tx
> >> always talk to the downstream neighbor and Rx the upstream
> >> IN THE SAME RING.  In this case, changes to Tx+Rx baudrates
> >> apply only to a certain ring.  So, you can change baudrate
> >> when it is convenient (temporally) for that *ring*.
>
> >> But, now the two rings are potentially operating at different
> >> rates.  So, the "other" ring will eventually ALSO have to
> >> have its baudrate adjusted to match (or, pass different traffic)
>
> > I think there must be a misunderstanding somewhere - not sure where.
>
> You can wire two UARTs to give you two rings in TWO DIFFERENT WAYS.
> Look at a segment of the ring with three nodes:
>
>     ------> 1 AAAA 1 --------> 1 BBBB 1 --------> 1 CCCC 1 ------->
>               AAAA               BBBB               CCCC
>     <------ 2 AAAA 2 <-------< 2 BBBB 2 <-------- 2 CCCC 2 <-------
>
> vs.
>
>     ------> 1 AAAA 2 --------> 1 BBBB 2 --------> 1 CCCC 2 ------->
>               AAAA               BBBB               CCCC
>     <------ 1 AAAA 2 <-------< 1 BBBB 2 <-------- 1 CCCC 2 <-------
>
> where the numbers identify the UARTs associated with each signal.


We do the second case, except sometimes they mis-wire it so that uart
2 on B connects to uart 2 on C when it should connect to uart 1 on C -
but this doesn't matter (much) currently.


>
> [assume tx and rx baudrates are driven by the same baudrate generator
> so there is a BRG1 and BRG2 in each node]

Yes, there is.


>
> In the first case, when you change the baudrate of a UART at some
> particular node, the baudrate for that segment in *the* ring that
> the UART services (left-to-right ring vs right-to-left ring) changes.
> So, you must change *after* you have finished transmitting and you
> will no longer be able to receive until the node upstream from you
> also changes baudrate.
>
> In the second case, when you change the baudrate of a UART at some
> particular node, the baudrate for all communications with that
> particular neighbor (to the left or to the right) changes.  So,
> *both* rings are "broken" until that neighbor makes the comparable
> change.

Yes.


>
> Look at each scenario and its consequences while messages are
> circulating (in both rings!).  Changing data rates can be a very
> disruptive thing as it forces the ring(s) to be emptied; some
> minimum guaranteed quiescent period to provide a safety factor
> (that no messages are still in transit); the actual change
> to be effected; a quiescent period to ensure all nodes are
> at the new speed; *then* you can start up again.


Why do all nodes have to be at the same speed?


>
> While it sounds "child-like", you might find it helpful to make a drawing
> and move some coins (tokens) around the rings as if they were
> messages.  It helps to picture what changes to the rings'
> operation you can make and *when*.
>
> Either try NOT to change baudrates *or* change them at times
> that can be determined a priori.
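The change-over sequence described above (drain the ring, guard time, switch, guard time, restart) can be sketched as a simple schedule. This is a hypothetical illustration, not code from the thread: `Phase`, `baud_change_schedule` and all parameters are assumptions, and times are integer milliseconds to keep the arithmetic exact.

```python
from enum import Enum, auto

class Phase(Enum):
    DRAINING = auto()      # stop originating; let circulating messages complete
    QUIET_BEFORE = auto()  # guard time: make sure nothing is still in transit
    SWITCHING = auto()     # reprogram the baud-rate generator(s)
    QUIET_AFTER = auto()   # guard time: let every affected node reach the new rate
    RUNNING = auto()       # resume normal traffic

def baud_change_schedule(t_request, ring_idle_at, guard_before, switch_time, guard_after):
    """Return (phase, start_ms) pairs for one coordinated baud-rate change.

    ring_idle_at is when the last message already circulating has been
    delivered, e.g. t_request plus one round-trip time (RTT) if no new
    messages are originated after the request.
    """
    t_quiet = max(t_request, ring_idle_at)
    t_switch = t_quiet + guard_before
    t_after = t_switch + switch_time
    t_resume = t_after + guard_after
    return [
        (Phase.DRAINING, t_request),
        (Phase.QUIET_BEFORE, t_quiet),
        (Phase.SWITCHING, t_switch),
        (Phase.QUIET_AFTER, t_after),
        (Phase.RUNNING, t_resume),
    ]

# Example (made-up numbers): 50 ms RTT, 10 ms guard times, 1 ms to reprogram.
schedule = baud_change_schedule(0, 50, 10, 1, 10)
outage_ms = schedule[-1][1] - schedule[0][1]   # 71 ms with no new traffic
```

Even with these invented numbers, the link carries no new traffic for an RTT plus both guard times. That cost is why the advice is to avoid rate changes or to make them at times known in advance.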

shane.2471958 3/31/2011 12:45:59 PM

Hi Shane,

On 3/31/2011 5:45 AM, Shane williams wrote:

[much elided]

> Hmm, ok, if the byte count goes wrong as well I guess it could - I

It can change -- or *not*!  When you miss a start bit, all
bets are off because your receiver is no longer in sync with
your data.  E.g., if you miss a start bit and are transmitting
the value 0xFF (with no parity), then the line just looks
COMPLETELY IDLE for one whole character time.  OTOH, if you
miss the start bit and are sending 0x55, then you could
"receive" any of a number of different values in place
of that 55...
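To see why a missed start bit can change the byte count (or not), here is a minimal Python sketch of an idealised 8N1 receiver that takes one sample per bit time, LSB first. Real UARTs oversample and resynchronise on the falling edge, so this is a simplification. `frame_8n1` and `receive_8n1` are hypothetical helpers, and "missing the start bit" is modelled by starting the receiver one bit late:

```python
IDLE, START, STOP = 1, 0, 1   # async line levels: idle/mark = 1, start = 0, stop = 1

def frame_8n1(data):
    """Line level for each bit time when sending bytes as 8N1 (LSB first)."""
    bits = []
    for byte in data:
        bits.append(START)
        bits.extend((byte >> i) & 1 for i in range(8))
        bits.append(STOP)
    return bits

def receive_8n1(line, begin=0):
    """Idealised receiver, one sample per bit time.

    Hunts for a 0 (start bit), takes the next 8 samples as data (LSB first),
    then checks for a 1 (stop bit).  Returns (received_bytes, framing_errors).
    """
    out, framing_errors = [], 0
    i = begin
    while i + 9 < len(line):
        if line[i] != START:
            i += 1            # looks like idle line: keep hunting for a start bit
            continue
        out.append(sum(line[i + 1 + k] << k for k in range(8)))
        if line[i + 9] != STOP:
            framing_errors += 1
        i += 10               # skip to where the next start bit should be
    return out, framing_errors

pad = [IDLE] * 10
ok = receive_8n1(frame_8n1([0x55, 0x55]) + pad)                  # ([0x55, 0x55], 0)
late_55 = receive_8n1(frame_8n1([0x55, 0x55]) + pad, begin=1)    # ([0x55, 0xD5], 0)
late_ff = receive_8n1(frame_8n1([0xFF]) + pad, begin=1)          # ([], 0)
```

Starting one bit late, two 0x55 characters come out as 0x55, 0xD5: the same count but wrong data. A single 0xFF disappears entirely because the rest of its frame looks like idle line. Neither case produces a framing error, so only the CRC (or the header sanity checks) is left to catch it.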

> didn't think of that.  The ddcmp protocol actually has a 10 byte
> header (can't remember if I mentioned this) with a separate crc for
> the header.  The count byte for the data is in the header.  I suspect
> the chance of it morphing into something valid would be pretty low in
> our case - e.g. one particular byte must always have the value 0x01.

When you are operating in an environment in which errors are
not The Exception, it is hard to make *any* assumptions.

> So the detection of 1,2 and 3 bit errors by crc16-ccitt doesn't allow
> for start bit and stop bit errors?  I never thought of that either.

Right.  The start and stop bits are "out of band": the CRC is computed
over the data bits only, so it never sees them (unless missing
one puts them *in* band -- for another character time!)

>>>>> However we may end up with 3 ports per node making it a collection of
>>>>> rings or a mesh.  The loading at the slowest baud rate is approx 10%
>>
>> Sorry, the subject wasn't clear in my question<:-(
>> I mean, if you were to stick with the slowest rate, your
>> "10%" number *suggests* you have lots of margin -- why
>> push for a higher rate with the potential for more
>> problems?
>
> To get faster request/ response - a shorter propagation delay.

But there are other ways to do that.  E.g., passing along the
message before it is completely received, etc.
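A rough comparison of store-and-forward with forwarding before the whole message has arrived. This sketch makes several assumptions: 8N1 framing (10 bit times per character), the same baud rate on every link, no processing delay, and each relay starting to forward once it holds the 10-byte header (which carries the count and its own CRC). The 60-byte message and six hops are example values, and the function names are hypothetical:

```python
BITS_PER_CHAR = 10   # 8N1: 1 start + 8 data + 1 stop bit

def store_and_forward_chars(msg_len, hops):
    """Every relay receives the whole message before it starts sending it on."""
    return hops * msg_len

def cut_through_chars(msg_len, hops, header_len):
    """Every relay starts forwarding once it holds the header.

    Assumes each outgoing link is at least as fast as the incoming one, so a
    relay never runs out of data mid-message.
    """
    return (hops - 1) * header_len + msg_len

def chars_to_ms(chars, baud):
    return chars * BITS_PER_CHAR * 1000 / baud

saf = store_and_forward_chars(60, 6)      # 360 character times
ct = cut_through_chars(60, 6, 10)         # 110 character times
saf_ms = chars_to_ms(saf, 57600)          # 62.5 ms
ct_ms = chars_to_ms(ct, 57600)            # about 19.1 ms
```

The catch is that a relay forwards data before the data CRC has been checked. The scheme also only works if no downstream link is slower than the one feeding it.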

>> So, for each ring, you WON'T receive a message until you have
>> transmitted any previous message?  Alternatively, you won't
>> transmit a message until your receiver is finished?
>
> This is true for each uart.

But, is it true for each *ring*?  I.e., in A->B->C->D->E->
you have stated that D won't be receiving while it is *sending*
to E.  This implies C won't be sending (to D) in this time.
But, that doesn't preclude *B* from sending to C in this time!
I.e., can there be more than one message circulating in each
ring?  If so, and the baud rate can be changed, how can you
guarantee that messages don't start "rear ending" the ones
ahead of them?  I.e., if D downgrades its baudrate, any
message that B is sending (to C) looks like it is "speeding"...
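The "rear ending" hazard can be checked with a small timing model. This Python sketch makes the following assumptions: a chain A->B->C->D->E, store-and-forward relaying with no back-pressure (each node forwards as soon as it has the whole message, without knowing whether its neighbour is busy), 10 bit times per character, and exact fractional seconds. The message size, the gap and the function names are hypothetical:

```python
from fractions import Fraction

BITS_PER_CHAR = 10   # 8N1 framing

def hop_times(links_baud, msg_chars, t0):
    """(start, end) of each hop for one message leaving the first node at t0.

    Store-and-forward: a node starts sending as soon as it has received the
    whole message, with no check on whether the next node is still busy.
    """
    times, t = [], Fraction(t0)
    for baud in links_baud:
        end = t + Fraction(msg_chars * BITS_PER_CHAR, baud)
        times.append((t, end))
        t = end
    return times

def rear_end_points(links_baud, msg_chars, gap):
    """Indices of nodes where message 2 starts arriving while that node is
    still transmitting message 1 onward (message 2 'rear-ends' message 1)."""
    m1 = hop_times(links_baud, msg_chars, 0)
    m2 = hop_times(links_baud, msg_chars, gap)
    hits = []
    for j in range(len(links_baud) - 1):
        # node j + 1 receives on link j and transmits on link j + 1
        if m2[j][0] < m1[j + 1][1]:
            hits.append(j + 1)
    return hits

# Nodes 0..4 = A..E; message 2 leaves A 10 ms after message 1.
uniform = rear_end_points([57600] * 4, 20, Fraction(1, 100))                  # []
slow_de = rear_end_points([57600, 57600, 57600, 1200], 20, Fraction(1, 100))  # [3], i.e. D
```

With every link at 57600 nothing overlaps. When only D->E is slowed to 1200, C starts sending message 2 into D while D is still clocking message 1 out to E, and C has no way to detect that on its own.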

>> What prevents two messages from being "in a ring" at the same
>> time (by accident)?  I.e., without violating the above, it
>> seems possible that node 18 can be sending to node 19 (while
>> 19 is NOT sending to 20 and 17 is not sending to 18) at the
>> same time that node 3 is sending to node 4 (while neither 2
>> nor 4 are actively transmitting).
>
> I don't follow this.  It's not a bus.  18 and 19 can talk to each
> other and no-one else hears.

See above.

>> Since this *seems* possible, how can you be sure one message
>> doesn't get delayed slightly so that the second message ends
>> up catching up to it?  (i.e., node 23 has no way of knowing
>> that node 24 is transmitting to 25 so 23 *could* start sending
>> a message to 24 that 24 fails to notice -- in whole or in
>> part -- because 24 is preoccupied with its outbound message)
>
> I have a feeling there's a misunderstanding here - not sure what
> though.

See above.  This is where the use of coins/tokens on a graph
can be useful -- you can see how the messages can potentially
interact with each other.

>> Number the nodes 1 - 10 (sequentially).
>> The CW ring has 1 sending to 2, 2 sending to 3, ... 10 sending to 1.
>> The CCW ring has 10 sending to 9, 9 sending to 8, ... 1 sending to 10.
>> The nodes operate concurrently.
>
> Yes, they do.
>
>> So, assume 7 originates a message -- destined for 3.  In the CW ring,
>> it is routed as 7, 8, 9, 10, 1, 2, 3.  In the CCW ring, it is routed
>> (simultaneously) as 7, 6, 5, 4, 3.
>>
>> *If* it progresses node to node at the exact same rates in each
>> ring (this isn't guaranteed but "close enough for gummit work"),
>> then it arrives in 8 & 6 at the same time, 9 & 5, 10 & 4, 1 & 3,
>> 2 & 2 (though different "rings"), 3 & 1, etc. (note I have assumed,
>> here, that it continues around until reaching its originator...
>> but, that's not important).
>
> ok  - it actually dies at around about the 2&2 , 3&1 stage

So there is no way of a sender knowing that a recipient got
a message intended for it?

>> Now, at node 9, if the CW ring decides that the baudrate needs to be
>> changed and it thinks "now is a good time to do so" (because it has
>> *just* passed its CW message on to node 10), that action effectively
>> interrupts any traffic in the CW ring (until the other nodes make
>> the similar baudrate adjustment in the CW direction).
>
> No, the baud rate between any two nodes is independent of any other
> two nodes.  I'm missing something here.

When the baud rate changes between two particular (adjacent)
nodes, there is effectively a discontinuity introduced.
As you said, "the baud rate between any two nodes is independent
of any other two nodes" so other nodes can be talking at FASTER
(or slower) speeds.  The time it takes to pass a message between
any two nodes can then vary.  Time is universally shared among
all nodes.  If D->E runs at 1200 baud and all other nodes are
running at 57600 baud, then a message from A can get to B and
then to C and then ... in the time it takes D to push a
similarly sized message out to *E*.  I.e., C has no way of
knowing if D is ready to *listen* to C, yet, since C has
no way of knowing if D has finished transmitting to E.
C can't rely on the time it took to receive its incoming
message (from B) being sufficient
for D to have passed *its* message along!
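The arithmetic behind the 1200 vs 57600 example, as a small Python sketch assuming 8N1 framing (10 bit times per character). The 20-character message length is arbitrary and cancels out of the ratio, and the function names are hypothetical:

```python
from fractions import Fraction

BITS_PER_CHAR = 10  # 8N1: 1 start + 8 data + 1 stop bit

def hop_time(msg_chars, baud):
    """Exact time in seconds to clock one message over one link."""
    return Fraction(msg_chars * BITS_PER_CHAR, baud)

def fast_hops_during_slow_hop(msg_chars, slow_baud, fast_baud):
    """Complete fast-link hops that fit in the time of one slow-link hop."""
    return hop_time(msg_chars, slow_baud) // hop_time(msg_chars, fast_baud)

slow = hop_time(20, 1200)    # 1/6 s, about 167 ms
fast = hop_time(20, 57600)   # 1/288 s, about 3.5 ms
hops = fast_hops_during_slow_hop(20, 1200, 57600)   # 48
```

A message can therefore cross up to 48 links at 57600 in the time D needs for one hop at 1200. C's own receive timing tells it nothing about whether D is ready yet.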

>>>> Regarding the "other" way to split the Tx&Rx... have the Tx
>>>> always talk to the downstream neighbor and Rx the upstream
>>>> IN THE SAME RING.  In this case, changes to Tx+Rx baudrates
>>>> apply only to a certain ring.  So, you can change baudrate
>>>> when it is convenient (temporally) for that *ring*.
>>
>>>> But, now the two rings are potentially operating at different
>>>> rates.  So, the "other" ring will eventually ALSO have to
>>>> have its baudrate adjusted to match (or, pass different traffic)
>>
>>> I think there must be a misunderstanding somewhere  - not sure where.
>>
>> You can wire two UARTs to give you two rings in TWO DIFFERENT WAYS.
>> Look at a segment of the ring with three nodes:
>>
>>     ------> 1 AAAA 1 --------> 1 BBBB 1 --------> 1 CCCC 1 ------->
>>               AAAA               BBBB               CCCC
>>     <------ 2 AAAA 2 <-------< 2 BBBB 2 <-------- 2 CCCC 2 <-------
>>
>> vs.
>>
>>     ------> 1 AAAA 2 --------> 1 BBBB 2 --------> 1 CCCC 2 ------->
>>               AAAA               BBBB               CCCC
>>     <------ 1 AAAA 2 <-------< 1 BBBB 2 <-------- 1 CCCC 2 <-------
>>
>> where the numbers identify the UARTs associated with each signal.
>
> We do the second case, except sometimes they mis-wire it so that uart
> 2 on B connects to uart 2 on C when it should connect to uart 1 on C -
> but this doesn't matter (much) currently.

OK, so when you change the baudrate on a UART, you interrupt
traffic in *both* rings between that node and its neighbor.
E.g., when the *one* UART that connects B to C changes baudrate,
then nothing can be flowing from B to C *or* C to B (i.e.,
*both* rings are involved)

>> [assume tx and rx baudrates are driven by the same baudrate generator
>> so there is a BRG1 and BRG2 in each node]
>
> Yes, there is.
>
>> In the first case, when you change the baudrate of a UART at some
>> particular node, the baudrate for that segment in *the* ring that
>> the UART services (left-to-right ring vs right-to-left ring) changes.
>> So, you must change *after* you have finished transmitting and you
>> will no longer be able to receive until the node upstream from you
>> also changes baudrate.
>>
>> In the second case, when you change the baudrate of a UART at some
>> particular node, the baudrate for all communications with that
>> particular neighbor (to the left or to the right) changes.  So,
>> *both* rings are "broken" until that neighbor makes the comparable
>> change.
>
> Yes.
>
>> Look at each scenario and its consequences while messages are
>> circulating (in both rings!).  Changing data rates can be a very
>> disruptive thing as it forces the ring(s) to be emptied; some
>> minimum guaranteed quiescent period to provide a safety factor
>> (that no messages are still in transit); the actual change
>> to be effected; a quiescent period to ensure all nodes are
>> at the new speed; *then* you can start up again.
>
> Why do all nodes have to be at the same speed?

They don't!  But, if they aren't, then it's more difficult to
ensure that messages don't "collide".  I.e., if someone
downstream from you starts operating at a slower rate,
then messages that *you* are sending can end up "there"
before it is ready for them.

You *can* make this work.  But, there are lots of ways it
can *break*.  That was Vladimir's point (elsewhere, up-thread).
Especially if you are *expecting* to be operating (even
temporarily) at the fringe of reliable communication!

>> While it sounds "child-like", you might find it helpful to make a drawing
>> and move some coins (tokens) around the rings as if they were
>> messages.  It helps to picture what changes to the rings'
>> operation you can make and *when*.
>>
>> Either try NOT to change baudrates *or* change them at times
>> that can be determined a priori.
not.going.to.be 4/4/2011 9:13:35 PM
