well, in my mind I have been going around in circles some recently:
considering ideas, weighting ideas, ...
so, here is another idea I have come (back) to: mini-x86.
basic idea:
it would be essentially a subset of x86, designed mostly to simplify the
creation of interpreters.
it could be kept binary compatible with conventional x86, this being a
boundary restriction (no simplifications allowed beyond which would break
compatibility, and no extensions which could not be similarly handled within
'proper' x86).
if a feature is desired, and comparable functionality exists within standard
x86, the relevant features from standard x86 should be implemented instead.
so, as noted, the goal will be to keep this as a proper subset, rather than
an incompatible branch.
the goal then would be to create compilers (such as C compilers), capable of
producing code compatible with this subset (such that an interpreter need
not implement the entire x86 ISA to be able to use the compiler output).
the 'risk', however, is that these restrictions "could" hinder the ability
to produce as efficient code, so it is not required that the compiler
produce "only" valid output. similarly, such a compiler should not produce
code which could fail to operate correctly on a true processor.
major feature removals:
Ring 0 / OS-level functionality;
x87 (likely) and MMX.
however, SSE and SSE2 may be assumed present.
general restrictions:
assume a single flat 32-bit address model, and assume that no other
possibilities exist within this context;
assume that paging is either absent, or is confined to 4kB pages;
assume that segmentation does not exist (including FS and GS, alternative
means will be needed to access TLS, such as via function calls).
deprecate AF, PF, DF, and a few others (instead, the flags will be assumed
to hold 'undefined' values, except DF for which it will be assumed invalid
to have DF!=0).
'rep movsb' would be the only valid 'rep' form (the only reason even this
being kept is that it provides a means for an interpreter to do efficient
memory->memory copies).
aas/aam/aad/... enter/leave, ... will be deprecated.
certain instruction forms may be deprecated (direct inc/dec/push/pop forms,
only the ModRM forms will be valid).
the state of eflags may be relaxed following generic arithmetic
instructions, and as such only following certain defined instructions (such
as 'cmp' and 'test') will it be allowed to depend on the state of these
flags (granted, this disallows such tricks as "and eax, eax; jz foo", ...).
direct self-modifying code would be disallowed.
....
the goal would then be to define such an x86 subset, as well as which
behaviors are valid to depend on, and which are undefined.
possibly useful would be a tool to statically validate conformance to this
subset.
this would imply having as another constraint that the code be statically
decodable / verifiable.
partial semantic validation could be attempted as well (in a manner similar
to the Java VM), although in a relaxed form (I suspect static probability is
beyond the reach of x86). as such, the semantic validation would be more an
attempt to point out likely bugs or conformance issues, rather than as an
attempt to prove that the code is valid.
a specific implementation / ABI could add these restrictions:
that certain forms of annotative metadata be present;
that the stack remain aligned between a label and all jumps to said label;
areas of stack memory accessed as a given type only be accessed as this
type;
....
these restrictions could aid in the creation of static binary translation
(likely to be unworkable for free-form x86).
thoughts?...
|
|
0
|
|
|
|
Reply
|
BGB
|
10/14/2009 8:17:36 AM |
|
"BGB / cr88192" <cr88192@MUNGED.microcosmotalk.com> wrote:
> possibly useful would be a tool to statically validate conformance to this
> subset. this would imply having as another constraint that the code be
> statically decodable / verifiable.
>
> partial semantic validation could be attempted as well (in a manner
> similar to the Java VM), although in a relaxed form (I suspect static
> probability is beyond the reach of x86). as such, the semantic validation
> would be more an attempt to point out likely bugs or conformance issues,
> rather than as an attempt to prove that the code is valid.
>
>
> a specific implementation / ABI could add these restrictions: that certain
> forms of annotative metadata be present; that the stack remain aligned
> between a label and all jumps to said label; areas of stack memory
> accessed as a given type only be accessed as this type; ...
>
> these restrictions could aid in the creation of static binary translation
> (likely to be unworkable for free-form x86).
>
>
> thoughts?...
>
Take a look at NativeClient http://code.google.com/p/nativeclient/
Although the goal of NativeClient is to define a subset of x86 where we can
guarantee security, some of the ideas are similar. There is also a research
paper you can read here: http://tinyurl.com/yjq7uvm
And a video presentation:
http://www.youtube.com/watch?v=L8m9U7p_Ntk
--
-------------------------------------
taviso@sdf.lonestar.org | finger me for my pgp key.
-------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
Tavis
|
10/14/2009 10:48:24 AM
|
|
"BGB / cr88192" <cr88192@MUNGED.microcosmotalk.com> wrote in message
news:4ad58920$0$4984$9a6e19ea@unlimited.newshosting.com...
> basic idea:
> it would be essentially a subset of x86, designed mostly to simplify the
> creation of interpreters.
>
> the goal then would be to create compilers (such as C compilers), capable
> of
> producing code compatible with this subset (such that an interpreter need
> not implement the entire x86 ISA to be able to use the compiler output).
I take it that the real purpose is create x86 assembler source in a simple
enough format to enable an in-memory assembler stage to be written easily?
Although (not having written any assemblers for a few years, and not for
obj/pe output) I would have thought that once you had the basic stuff
covered (labels, forward jumps, relocations, main segment types, etc), then
adding extra opcodes is routine work.
> the 'risk', however, is that these restrictions "could" hinder the ability
> to produce as efficient code, so it is not required that the compiler
> produce "only" valid output. similarly, such a compiler should not produce
> code which could fail to operate correctly on a true processor.
>
>
> major feature removals:
> x87 (likely) and MMX.
> deprecate AF, PF, DF, and a few others (instead, the flags will be assumed
> to hold 'undefined' values, except DF for which it will be assumed invalid
> to have DF!=0).
>
> 'rep movsb' would be the only valid 'rep' form (the only reason even this
> aas/aam/aad/... enter/leave, ... will be deprecated.
> certain instruction forms may be deprecated (direct inc/dec/push/pop
> forms,
> only the ModRM forms will be valid).
These are quite serious-sounding omissions! No conventional floating point
instructions, no flags, no REP MOVSD (without having measured, I have to
assume that REP MOVSD would be faster, but the compiler can't generate
this...)
And then there's code that might have inline assembler, presumably the same
restrictions must apply.
I have done my own survey, and one of my interpreters does use a subset of
only a few dozen of the hundreds of opcodes, but it does rely on high-level
code/library routines/OS code which probably do make use of much of the
rest.
If I had to leave out instructions, it would be the boring OS ones to do
with permissions and task-switching and virtual memory tables, since I never
do any of that stuff. But other people might!
What are the disadvantages of just allowing a full assembler? Ie. one
implementing just implementing the full range of opcodes the processor is
capable of, and ignoring the few dozen complicated things that all
assemblers seem to do yet I've never had a need of.
--
bartc
|
|
0
|
|
|
|
Reply
|
bartc
|
10/14/2009 12:50:59 PM
|
|
"BGB / cr88192" <cr88192@MUNGED.microcosmotalk.com> wrote in message
news:4ad58920$0$4984$9a6e19ea@unlimited.newshosting.com...
>
> so, here is another idea I have come (back) to: mini-x86.
>
> basic idea:
> it would be essentially a subset of x86, designed mostly to simplify the
> creation of interpreters.
Interpreters are very easy with x86, esp. using the older string
instructions:
mov si,code
interp:
lodsb
xchg ax,bx
xor bh,bh
shl bl,1
call [bx+table]
jmp interp
Exit equ 0
exit:
pop ax
ret
doX equ 1
dox:
mov ah,02h
mov dl,'x'
int 21h
ret
doA equ 2
doa:
mov ah,02h
mov dl,'a'
int 21h
ret
table:
dw exit
dw dox
dw doa
code:
db doA
db doX
db Exit
You can throw in an "aam 16", e.g., db 0d4h,10h, and misc. code to use
4-bits per interpreter instruction for a smaller instruction set.
More info on small x86 interpreters:
http://www.ddj.com/184408206?pgno=3
> so, as noted, the goal will be to keep this as a proper subset, rather
> than an incompatible branch.
Are you also wanting the exact same byte sequence for x86 code to work in
16-bit PM, 32-bit PM, and 64-bit PM? This is probably something you haven't
thought about.
Multi-mode code is possible on x86. If interested, then I think the primary
choice of instructions will be the older single byte instructions, and
string instructions, since most of them function identically across cpu
modes with exception for integer size... This is because they don't have
offsets encoded into the instructions. E.g., push ax, pop ax, lodsb, stosb,
movsb, xlatb, etc.
It's not possible to do memory addressing that's compatible with both 16-bit
PM and 32-bit PM. AFAIK, there are no instructions that work identically in
both modes. _But_, you can write code that works for both, like so, using
bx and edi:
;32-bit code
00000028 89FB mov ebx,edi
0000002A 8B4720 mov eax,[edi+32]
;16-bit code
00000028 89FB mov bx,di
0000002A 8B4720 mov ax,[bx+0x20]
I haven't looked at 64-bit, so I'm not sure what's needed there, or if a
3-mode solution is available.
> the goal then would be to create compilers (such as C compilers), capable
> of producing code compatible with this subset (such that an interpreter
> need not implement the entire x86 ISA to be able to use the compiler
> output).
The x86 ports of Ron Cain's SmallC uses two registers and a stack to
implement a useful subset of C. However, you should be able to define a
subset of functionality needed to implement C. I've done so. I'm not sure
how accurate it actually is. Of course, what follows ignores sizes of
types. I.e., you'll need multiple "add" routines for various integer sizes.
With your understanding of C's typesystem, you might have a better
perspective. But, here they are:
get address &
save address =
get data from address *
save data to address =
subscript operator [] *((a)+(b))
sizeof data
offsetof data
operators on data
- 6 conditions - eq, ge, g, le, l, ne
- 6 bitwise - shr, shl, and, or, xor, not
- 5 arithmetic - mul, div, add, sub, mod
- 3 logical - and, or, not
stackframe
- prolog - create local data
- epilog - destroy local data
procedures
- call address (*)()
- return from call
infinite loop
exit loop
continue loop
conditional branch
absolute branch
> the 'risk', however, is that these restrictions "could" hinder the ability
> to produce as efficient code,
Yes, if you use a two register model like SmallC, your code won't be all
that efficient. And, if you use string instructions to implement an
interpreter-like routine, then it'll be slow too.
> 'rep movsb' would be the only valid 'rep' form (the only reason even this
> being kept is that it provides a means for an interpreter to do efficient
> memory->memory copies).
> aas/aam/aad/... enter/leave, ... will be deprecated.
cbw/cdq/cwde/cwd ?
These are useful for coding in small size or working with nybbles. They are
also useful for working with signed integers, ala a C compiler, instead of
using movsx or movzx.
> the state of eflags may be relaxed following generic arithmetic
> instructions, and as such only following certain defined instructions
> (such as 'cmp' and 'test') will it be allowed to depend on the state
> of these flags (granted, this disallows such tricks as "and eax, eax;
> jz foo", ...).
You'll probably need CF and ZF at a minimum for C's arithmetic operations.
HTH,
Rod Pemberton
|
|
0
|
|
|
|
Reply
|
Rod
|
10/14/2009 3:16:38 PM
|
|
"bartc" <bartc@MUNGED.microcosmotalk.com> wrote in message
news:4ad5c933$0$5672$9a6e19ea@unlimited.newshosting.com...
>
>
> "BGB / cr88192" <cr88192@MUNGED.microcosmotalk.com> wrote in message
> news:4ad58920$0$4984$9a6e19ea@unlimited.newshosting.com...
>
>> basic idea:
>> it would be essentially a subset of x86, designed mostly to simplify the
>> creation of interpreters.
>>
>> the goal then would be to create compilers (such as C compilers), capable
>> of
>> producing code compatible with this subset (such that an interpreter need
>> not implement the entire x86 ISA to be able to use the compiler output).
>
> I take it that the real purpose is create x86 assembler source in a simple
> enough format to enable an in-memory assembler stage to be written easily?
>
not really.
I have long since had an assembler which works on (more or less) the full
ISA, and this effort would actually add effort here because I would have to
flag a number of (generally redundant) opcode forms as N/E for this mode...
> Although (not having written any assemblers for a few years, and not for
> obj/pe output) I would have thought that once you had the basic stuff
> covered (labels, forward jumps, relocations, main segment types, etc),
> then
> adding extra opcodes is routine work.
>
adding extra opcodes is easy...
but, this is not the point here.
the point would be to make the x86 itself a lot more processable, and to
simplify interpreters, translators, ...
for example, if I were simplifying an assembler, what reason would I have to
care about, for example, eflags behavior?... I would not, but I have seen
from my interpreter effort that eflags is a hassle.
even then, my interpreter still uses a much larger subset of x86 than I am
considering (I support nearly the entire basic ISA, x87, ...).
>> the 'risk', however, is that these restrictions "could" hinder the
>> ability
>> to produce as efficient code, so it is not required that the compiler
>> produce "only" valid output. similarly, such a compiler should not
>> produce
>> code which could fail to operate correctly on a true processor.
>>
>>
>> major feature removals:
>
>> x87 (likely) and MMX.
>
>> deprecate AF, PF, DF, and a few others (instead, the flags will be
>> assumed
>> to hold 'undefined' values, except DF for which it will be assumed
>> invalid
>> to have DF!=0).
>>
>> 'rep movsb' would be the only valid 'rep' form (the only reason even this
>
>> aas/aam/aad/... enter/leave, ... will be deprecated.
>> certain instruction forms may be deprecated (direct inc/dec/push/pop
>> forms,
>> only the ModRM forms will be valid).
>
> These are quite serious-sounding omissions! No conventional floating point
> instructions, no flags, no REP MOVSD (without having measured, I have to
> assume that REP MOVSD would be faster, but the compiler can't generate
> this...)
>
well, potentially.
in a partial spec listing, I have actually stripped it down fairly severely,
and only a rare few 'convinience' forms have been left.
as for 'rep movsd', I am uncertain about this one, but AFAIK in modern
processors there does not seem to be that much of a difference between 'rep
mosb' and 'rep movsd'.
however, I did notice while writing an interpreter that these rep forms do
contribute some complexity (expecially the repe and repnz forms). since to
be handled effectively, they require special treatment and simulation
(although, the 'rep movs*' forms often degenerated to direct memory->memory
copies).
I am left thinking that if "I" had the idea of turning 'rep movsb' into a
direct (internal) memory -> memory copy, possibly the processor vendors have
had this idea as well.
although, to really know would require actual testing, which has not been
done in this case.
> And then there's code that might have inline assembler, presumably the
> same
> restrictions must apply.
>
yep.
however, it can be noted that inline ASM is generally considered
non-portable, and its use is almost invariably something which would violate
either the security model or the restrictions placed on the processing
model.
as far as 'security' goes, even a lot of C code would have to be careful (I
may need to include something analogous to an "unsafe" modifier, although it
would likely operate at the compiler level rather than the source level, and
essentially serve as a cautionary hint to a translator).
> I have done my own survey, and one of my interpreters does use a subset of
> only a few dozen of the hundreds of opcodes, but it does rely on
> high-level
> code/library routines/OS code which probably do make use of much of the
> rest.
>
this would likely apply at the level of entire PE/COFF images.
code which did not follow this model could probably not be validly
statically linked.
> If I had to leave out instructions, it would be the boring OS ones to do
> with permissions and task-switching and virtual memory tables, since I
> never
> do any of that stuff. But other people might!
>
my existing interpreter mostly does this.
however, this idea is a far more severe in terms of the features it axes.
this is why I wrote that compilers would have to be written to support this
subset:
generic compiler output will almost invariably not strictly follow such a
subset.
this subset then would then not be a general replacement for
'general-purpose' x86.
for more general-purpose uses, your idea makes more sense, but I didn't say
necessarily that this was for general-purpose uses.
> What are the disadvantages of just allowing a full assembler? Ie. one
> implementing just implementing the full range of opcodes the processor is
> capable of, and ignoring the few dozen complicated things that all
> assemblers seem to do yet I've never had a need of.
>
you can't "verify" full x86 (in the JVM sense), you can't dynamically
translate it at the image level, ...
however, sufficient restriction and model formalization may be sufficient to
allow these uses...
with conventional interpretation, very likely one would be forced to rely on
a low-level simulation of the address space, specific processor-like
behaviors (such as fiddling with eflags), whereas if the model is
formalized, it can allow operating directly in a different model (say, for
example, a 64-bit system with big-endian byte ordering), with the translator
being able to know the code wont blow up due to having viewed memory as a
flat array of bytes...
this would be sort of like how, with JBC, it conceptually assumes a 32-bit
world internally, but it is not particularly difficult to translate it into
a 64-bit world (or to C, ...).
note that translation is possible with a full ISA, but it is a little more
limited (likely, it is limited to translating short spans of instructions,
as well as simulating an x86-like address space, ...).
> --
> bartc
|
|
0
|
|
|
|
Reply
|
BGB
|
10/14/2009 3:17:07 PM
|
|
"Tavis Ormandy" <taviso@MUNGED.microcosmotalk.com> wrote in message
news:4ad5ac78$0$4987$9a6e19ea@unlimited.newshosting.com...
>
> "BGB / cr88192" <cr88192@MUNGED.microcosmotalk.com> wrote:
>> possibly useful would be a tool to statically validate conformance to
>> this
>> subset. this would imply having as another constraint that the code be
>> statically decodable / verifiable.
>>
>> partial semantic validation could be attempted as well (in a manner
>> similar to the Java VM), although in a relaxed form (I suspect static
>> probability is beyond the reach of x86). as such, the semantic validation
>> would be more an attempt to point out likely bugs or conformance issues,
>> rather than as an attempt to prove that the code is valid.
>>
>>
>> a specific implementation / ABI could add these restrictions: that
>> certain
>> forms of annotative metadata be present; that the stack remain aligned
>> between a label and all jumps to said label; areas of stack memory
>> accessed as a given type only be accessed as this type; ...
>>
>> these restrictions could aid in the creation of static binary translation
>> (likely to be unworkable for free-form x86).
>>
>>
>> thoughts?...
>>
>
> Take a look at NativeClient http://code.google.com/p/nativeclient/
>
> Although the goal of NativeClient is to define a subset of x86 where we
> can
> guarantee security, some of the ideas are similar. There is also a
> research
> paper you can read here: http://tinyurl.com/yjq7uvm
>
> And a video presentation:
>
> http://www.youtube.com/watch?v=L8m9U7p_Ntk
>
interesting, although its goes and mode of operation are somewhat
different...
currently I am considering generally using 32-bit x86 on an x86-64 target,
and without having to dig into OS-level machinery (segments, ...).
hence, my goal would instead be to allow for lightweight interpretation
and/or direct JIT, rather than more heavy-weight interpretation (where many
CPU and address-space specifics also need to be simulated).
as such, it would seem that different sets of restrictions are being placed
on x86.
however, I am left thinking that my subset may be "too" restrictive for some
uses, and may partly use a layered approach.
for example:
Layer Aleph:
very restricted subset of x86 core instructions and sequences (no x87 or
SSE, ...).
Layer Bet:
adds SSE, but not x87.
Layer Gimel:
may add x87, reduces addressing/model restrictions, ...
Layer Dalet:
essentially unmodified x86 userspace.
Aleph and Bet could be to allow direct machine translation and abstract
interpretation, as well as only allowing a limited set of opcode forms and
constructions.
Gimel and Dalet would in-turn require full interpretation or virtualization
(and would, as such, run in a simulated address space, ...).
there could be an issue of B and G are used together, since B could be
translated to native, but G could not, potentially creating an "impedence
mismatch" if handled poorly.
then again, I am quickly having some doubts (all this may be a bit of extra
effort, and I may as well make everything operate in more or less the
equivalent of D).
the issue, then, is that the only real translation approach I know of for
this is to use a 'tracing' approach (basically, translating in terms of
sequences of instructions generally terminated by a 'ret' or an indirect
jump). here, recognizing sequences could help, but more as an optimization
hint (such as proving that updating eflags will never matter following a
given opcode, and so may be skipped, ...).
or such...
> --
> -------------------------------------
> taviso@sdf.lonestar.org | finger me for my pgp key.
> -------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
BGB
|
10/14/2009 4:38:38 PM
|
|
"Rod Pemberton" <do_not_have@nohavenot.cmm> wrote in message
news:4ad5eb56$0$5122$9a6e19ea@unlimited.newshosting.com...
>
> "BGB / cr88192" <cr88192@MUNGED.microcosmotalk.com> wrote in message
> news:4ad58920$0$4984$9a6e19ea@unlimited.newshosting.com...
>>
>> so, here is another idea I have come (back) to: mini-x86.
>>
>> basic idea:
>> it would be essentially a subset of x86, designed mostly to simplify the
>> creation of interpreters.
>
> Interpreters are very easy with x86, esp. using the older string
> instructions:
>
> mov si,code
> interp:
> lodsb
> xchg ax,bx
> xor bh,bh
> shl bl,1
> call [bx+table]
> jmp interp
>
<snip>
>
> You can throw in an "aam 16", e.g., db 0d4h,10h, and misc. code to use
> 4-bits per interpreter instruction for a smaller instruction set.
>
> More info on small x86 interpreters:
> http://www.ddj.com/184408206?pgno=3
>
several people have confused this, I meant interpreters "for" x86, not
interpreters "in" x86...
AKA: to simplify the process of interpreting x86, and not requiring nearly
so much processor-level simulation (for example, eliminating the need to
simulate such things as eflags, ...).
>> so, as noted, the goal will be to keep this as a proper subset, rather
>> than an incompatible branch.
>
> Are you also wanting the exact same byte sequence for x86 code to work in
> 16-bit PM, 32-bit PM, and 64-bit PM? This is probably something you
> haven't
> thought about.
>
this much is probably not actually possible in the general case.
the major reason here being that 64-bit ops generally require a REX prefix,
which would likely mess up cross 32/64 code much more than cross 16/32.
<snip, too much divergent>
>
>> 'rep movsb' would be the only valid 'rep' form (the only reason even this
>> being kept is that it provides a means for an interpreter to do efficient
>> memory->memory copies).
>
>> aas/aam/aad/... enter/leave, ... will be deprecated.
>
> cbw/cdq/cwde/cwd ?
>
> These are useful for coding in small size or working with nybbles. They
> are
> also useful for working with signed integers, ala a C compiler, instead of
> using movsx or movzx.
>
and, I left them out in favor of movsx and movzx...
here is the current list of opcode nmonics considered valid for "layer
aleph":
adc add and
bswap
call cmp cpuid
dec div
hlt
idiv imul inc intxt
j* jmp
lea
mov movsxb movsxw movzxb movzxw mul
neg nop not
or
pop push
rol ror ret
sar sbb shl shr shld shrd sub
test
xchg xor
note that, within these, many individual (typically redundant) forms have
been removed.
for example:
op eax, imm32
op r32, imm32
op rm32, imm32
have usually been reduced down to a single form.
op rm32, imm32
....
(this likely being an issue for general purpose assemblers, which would tend
to emit disallowed opcode forms...).
(although, I decided to leave single byte push/pop forms, since these are
useful in forming prologues and epilogues, which may be required in this
scheme, and it would likely be wasteful to use 2-byte forms here).
sequence restrictions will be in place, for example, adc would only be
allowed after add (essentially as a means of producing a 64-bit add), and
similar for sub/sbb.
conditional jumps would only be allowed following certain operations
(specific arithmetic ops, cmp, test, ...).
>> the state of eflags may be relaxed following generic arithmetic
>> instructions, and as such only following certain defined instructions
>> (such as 'cmp' and 'test') will it be allowed to depend on the state
>> of these flags (granted, this disallows such tricks as "and eax, eax;
>> jz foo", ...).
>
> You'll probably need CF and ZF at a minimum for C's arithmetic operations.
>
CF and ZF exist only as "inter-opcode transients", but would not exist
otherwise in this scheme.
this still allows their use for arithmetic and jumps, but not for "general
purpose" use.
the main reason for such dislike of eflags is that damn near every core
opcode in x86 fiddles with them, but very rarely are they actually useful.
> HTH,
>
>
> Rod Pemberton
>
>
|
|
0
|
|
|
|
Reply
|
BGB
|
10/14/2009 4:39:27 PM
|
|
"BGB / cr88192" <cr88192@MUNGED.microcosmotalk.com> wrote in message
news:4ad5febf$0$5666$9a6e19ea@unlimited.newshosting.com...
> several people have confused this, I meant interpreters "for" x86, not
> interpreters "in" x86...
Me too.
An interpreter to interpret binary x86 (assuming good reasons not to use an
actual x86 for this), I would have thought would be better to at least
recognise any instruction sequence, even if it doesn't execute it, rather
than only allow a subset.
Presumably you need to allow the code being interpreted to call into the OS
and external libraries for real...
But if the purpose of the subset is just an internal language to be used
between your compiler and your interpreter, then why x86 at all? Is this to
simplify then running the code directly on the processor?
And what format will the code be in? A serious, general purpose interpreter
I suppose would have to work from exe files and decoding those will be an
extra headache.
Also, what execution model would it use, would it attempt to to model a
complete x86 processor, which sounds a huge job, or just emulate the
environment of a typical exe application, which would be far simpler (no
need to worry about external hardware for example)?
I still don't see the need to disallow many of the opcodes you mentioned.
Why no FPU instructions? Don't forget you have a real FPU to help out!
Unless you're planning to run this on a totally different processor. But I
guess your compiler uses a different scheme for floating point.
I don't know, emulating even exe level programs still sounds a big, unwieldy
task, full of loose ends, and probably there are already products which
might do this. If I was doing this, I would wave goodbye to compatibility
with x86 and use a special purpose target language, chosen so that it could
map easily to actual x86 codes when the time came to execute it properly.
--
Bartc
|
|
0
|
|
|
|
Reply
|
bartc
|
10/14/2009 6:25:42 PM
|
|
Tavis Ormandy wrote:
> Take a look at NativeClient http://code.google.com/p/nativeclient/
>
> Although the goal of NativeClient is to define a subset of x86 where we=
can
> guarantee security, some of the ideas are similar. There is also a rese=
arch
> paper you can read here: http://tinyurl.com/yjq7uvm
>
> And a video presentation:
>
> http://www.youtube.com/watch?v=3DL8m9U7p_Ntk
It is of course impossible to guarantee safety here: x86 machine code is=20
most certainly Turing-complete, and that is far more than you need in=20
order to invoke G=F6del's theorem.
I.e. you can write programs which are impossible to prove either safe or=20
unsafe.
This is related to the Halting Theorem, but the NC plug-in could handle=20
that by simply terminate the client code if the user loses patience.
Terje
--=20
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
Terje
|
10/14/2009 7:25:27 PM
|
|
"Terje Mathisen" <Terje.Mathisen@MUNGED.microcosmotalk.com> wrote in message
news:4ad625a7$0$5112$9a6e19ea@unlimited.newshosting.com...
>
> Tavis Ormandy wrote:
>> Take a look at NativeClient http://code.google.com/p/nativeclient/
>>
>> Although the goal of NativeClient is to define a subset of x86 where we=
> can
>> guarantee security, some of the ideas are similar. There is also a rese=
> arch
>> paper you can read here: http://tinyurl.com/yjq7uvm
>>
>> And a video presentation:
>>
>> http://www.youtube.com/watch?v=3DL8m9U7p_Ntk
>
> It is of course impossible to guarantee safety here: x86 machine code
> is=20
> most certainly Turing-complete, and that is far more than you need in=20
> order to invoke G=F6del's theorem.
>
> I.e. you can write programs which are impossible to prove either safe
> or=20
> unsafe.
>
> This is related to the Halting Theorem, but the NC plug-in could handle=20
> that by simply terminate the client code if the user loses patience.
>
errm, this depends mostly on how one defines "safe"...
for practical definitions of "safe", it is sufficient (you can't prove, for
example, that it will run in finite time and bounded memory, but it can be
easily enough terminated if it does exceed time and memory constraints...).
similarly, another major aspect of "safety" is to validate that it does not
do anything unsafe, and anything which can't be validated is likely to be
rejected...
so, this is the thing:
if something can't be proven to be safe, assume it is unsafe.
problems then mostly go away...
> Terje
>
> --=20
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
BGB
|
10/14/2009 8:58:58 PM
|
|
Terje Mathisen <Terje.Mathisen@MUNGED.microcosmotalk.com> wrote:
>
> Tavis Ormandy wrote:
> > Take a look at NativeClient http://code.google.com/p/nativeclient/
> >
> > Although the goal of NativeClient is to define a subset of x86 where we=
> can
> > guarantee security, some of the ideas are similar. There is also a rese=
> arch
> > paper you can read here: http://tinyurl.com/yjq7uvm
> >
> > And a video presentation:
> >
> > http://www.youtube.com/watch?v=3DL8m9U7p_Ntk
>
> It is of course impossible to guarantee safety here: x86 machine code
> is=20 most certainly Turing-complete, and that is far more than you need
> in=20 order to invoke G=F6del's theorem.
>
It is most definitely possible, and in fact a working implementation exists
that you can download and examine. I think you are misunderstanding the
problem, it is not a goal to allow every safe program to execute, only a
subset of safe programs that obey the rules that have been defined that make
validation possible.
A program such as
if (0) {
evil();
}
Is obviously safe, yet it would be rejected for containing unsafe code. This
way, We do not have to 'solve' every program, only guarantee that it could
never execute unsafe code via a simple start-to-finish disassembly.
--
-------------------------------------
taviso@sdf.lonestar.org | finger me for my pgp key.
-------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
Tavis
|
10/14/2009 9:00:37 PM
|
|
"bartc" <bartc@MUNGED.microcosmotalk.com> wrote in message
news:4ad617a6$0$5639$9a6e19ea@unlimited.newshosting.com...
>
>
> "BGB / cr88192" <cr88192@MUNGED.microcosmotalk.com> wrote in message
> news:4ad5febf$0$5666$9a6e19ea@unlimited.newshosting.com...
>
>> several people have confused this, I meant interpreters "for" x86, not
>> interpreters "in" x86...
>
> Me too.
>
> An interpreter to interpret binary x86 (assuming good reasons not to use
> an
> actual x86 for this), I would have thought would be better to at least
> recognise any instruction sequence, even if it doesn't execute it, rather
> than only allow a subset.
>
it does not make a whole lot of difference if either way it will #UD or
cause the code to be rejected...
> Presumably you need to allow the code being interpreted to call into the
> OS
> and external libraries for real...
>
OS and libraries will not run from within the interpreted code, so this is
not an issue.
any libraries within this space will be necessary either to follow these
restrictions, or hook into the interpreter.
> But if the purpose of the subset is just an internal language to be used
> between your compiler and your interpreter, then why x86 at all? Is this
> to
> simplify then running the code directly on the processor?
>
direct execution is one option, when avaiable.
another option is to allow me to use a much more general purpose (AKA: full
x86) interpreter, without necessarily having all code in a "raw" form.
I have consistently opted against bytecode on the grounds of realizing that
it would be too much work to modify my compiler/... to support it, but x86
related stuff is all over the place...
> And what format will the code be in? A serious, general purpose
> interpreter
> I suppose would have to work from exe files and decoding those will be an
> extra headache.
>
I am using PE/COFF EXE's and DLL's.
for dynamically generated code, COFF will be the norm.
note that, although PE/COFF is horridly ugly, it is not all that difficult
to load (or, at least vs the cost of writing an x86 interpreter...).
> Also, what execution model would it use, would it attempt to to model a
> complete x86 processor, which sounds a huge job, or just emulate the
> environment of a typical exe application, which would be far simpler (no
> need to worry about external hardware for example)?
>
the full interpreter (what in the 'mini-x86' approach I am calling 'layer
dalet'), is basically just an interpreter faking x86 userspace.
I have no intention of emulating the HW.
the 'aleph' and 'bet' layers will, instead, only partially model even the
semantics of the processor, so it will relate to real x86 about as much as
three-address-code relates to C (as in, the model is somewhat abstracted,
and one no-longer exactly knows such things as the exact size and endianess
of the registers, physical memory layout, ...).
however, when directly executed (in an interpreter or CPU) these layers will
remain as valid x86.
> I still don't see the need to disallow many of the opcodes you mentioned.
> Why no FPU instructions? Don't forget you have a real FPU to help out!
> Unless you're planning to run this on a totally different processor. But I
> guess your compiler uses a different scheme for floating point.
>
'bet' will use SSE for floating point.
'aleph' will not have floating point.
the reason to disable the FPU is that it is both complicated and ugly, as
well as problematic to abstract.
the others would get in the way of simplicity and the ability to abstract
over things.
'bet' may include a lot of the 'redundant' opcode encodings I had dropped
from aleph (which is now down to around 38 opcode nmonics...).
I also made an opcode map for aleph, which is notable in that much of the
single-byte opcode space is now empty...
> I don't know, emulating even exe level programs still sounds a big,
> unwieldy
> task, full of loose ends, and probably there are already products which
> might do this. If I was doing this, I would wave goodbye to compatibility
> with x86 and use a special purpose target language, chosen so that it
> could
> map easily to actual x86 codes when the time came to execute it properly.
>
this is an option, but I have 'other' reasons...
doing weirdness with x86 is currently the path of least resistence in my
case (since, after all, I am hardly 'starting clean', and essentially have
to work things around in what is essentially now a big 350 kloc or so mass
of low-level compiler and VM related code...).
my project also includes JBC support, but JBC does not quite address my
needs in this case...
> --
> Bartc
>
|
|
0
|
|
|
|
Reply
|
BGB
|
10/15/2009 1:38:11 AM
|
|
Tavis Ormandy wrote:
> Terje Mathisen<Terje.Mathisen@MUNGED.microcosmotalk.com> wrote:
>> It is of course impossible to guarantee safety here: x86 machine code
>> is=20 most certainly Turing-complete, and that is far more than you need
>> in=20 order to invoke G=F6del's theorem.
>>
>
> It is most definitely possible, and in fact a working implementation exists
> that you can download and examine. I think you are misunderstanding the
I have now watched the video, and I agree that the severely limited
subset of x86 opcodes they allow, does make it much easier to verify
that programs are safe.
The main problems they avoid are arbitrary return addresses, but I don't
feel quite confident that you could not locate a 32-byte aligned address
within some library/system code, and jump to that via a calculated address.
As long as you allow any form of indirect jump (i.e. via some form of
writable data), it becomes quite expensive to verify that it is
impossible to ever jump out of the sandbox.
When the target of NC is to run at "native code speed", then you should
not slow the code down to the point where it runs slower than JIT'ed
Java. :-)
> problem, it is not a goal to allow every safe program to execute, only a
> subset of safe programs that obey the rules that have been defined that make
> validation possible.
I agree.
It is a very fun project, and a nice intellectual challenge to figure
out how to do it. :-)
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
Terje
|
10/15/2009 7:21:54 AM
|
|
Terje Mathisen <Terje.Mathisen@munged.microcosmotalk.com> wrote in part:
> The main problems they avoid are arbitrary return addresses,
> but I don't feel quite confident that you could not locate a
> 32-byte aligned address within some library/system code, and
> jump to that via a calculated address.
Or even an all-data exploit: use an unbounded input function
like gets() to load the stack with hostile data and a return
address pointing at some library code like exec() or int0x80 .
Perhaps stacks should grow upwards so overflows run into new
data rather than clobbering old data.
-- Robert
|
|
0
|
|
|
|
Reply
|
Robert
|
10/15/2009 1:52:37 PM
|
|
Terje Mathisen <Terje.Mathisen@MUNGED.microcosmotalk.com> wrote:
> I have now watched the video, and I agree that the severely limited subset
> of x86 opcodes they allow, does make it much easier to verify that
> programs are safe.
>
> The main problems they avoid are arbitrary return addresses, but I don't
> feel quite confident that you could not locate a 32-byte aligned address
> within some library/system code, and jump to that via a calculated
> address.
That's true, this is solved using segmentation, only validated code is
visible from within cs, and attempting to jump to non-validated code results
in #GP.
> As long as you allow any form of indirect jump (i.e. via some form of
> writable data), it becomes quite expensive to verify that it is impossible
> to ever jump out of the sandbox.
Well i wouldnt say expensive, direct branches can be verified at load time
and are totally legal, indirect branches do take a minor performance hit but
it's pretty bearable imho..
So instead of jmp eax, you need and eax, 0xfffffe; jmp eax..just to prove to
the validator you're not trying to branch to an unvalidated isntruction
boundary, cs limits take care of making sure the code is validated...not too
bad :-)
> When the target of NC is to run at "native code speed", then you should
> not slow the code down to the point where it runs slower than JIT'ed Java.
> :-)
Hopefully not even within an order of magnitude :-)
>
> > problem, it is not a goal to allow every safe program to execute, only a
> > subset of safe programs that obey the rules that have been defined that
> > make validation possible.
>
> I agree.
>
> It is a very fun project, and a nice intellectual challenge to figure out
> how to do it. :-)
>
> Terje
>
--
-------------------------------------
taviso@sdf.lonestar.org | finger me for my pgp key.
-------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
Tavis
|
10/15/2009 1:53:19 PM
|
|
"BGB / cr88192" <cr88192@MUNGED.microcosmotalk.com> wrote in message
news:4ad5febf$0$5666$9a6e19ea@unlimited.newshosting.com...
>
> several people have confused this, I meant interpreters "for" x86, not
> interpreters "in" x86...
>
Well, of course we did...
Why would you want an interpreter "for" x86 when you only want a very small
subset of x86? Once you've reduced the size of the instruction set below
8-bit encoding, an interpreter "in" x86 is much easier to implement.
From the list below, you've got roughly 47 instructions. With some more
reductions in that list, you could get 32 instructions for 5-bit encoding,
and perhaps 16 instructions for 4-bit encoding. When the instruction set
size is small enough, you could just as easily come up with your own
instructions instead.
> here is the current list of opcode nmonics considered valid for "layer
> aleph":
>
> adc add and
> bswap
> call cmp cpuid
> dec div
> hlt
> idiv imul inc intxt
> j* jmp
> lea
> mov movsxb movsxw movzxb movzxw mul
> neg nop not
> or
> pop push
> rol ror ret
> sar sbb shl shr shld shrd sub
> test
> xchg xor
>
If you need to reduce the list further, some possible suggestions:
drop test - use and instead
drop cmp - use sub instead
drop neg - use sub instead
drop not - use cmp, sbb instead
drop inc, dec - use add, sub instead
drop cpuid - should the code have access to this?
drop hlt, nop - not needed
drop rol, ror - not used in C...
maybe drop shrd, shld - could use other shift instructions
maybe drop xchg - could use push-mov-pop or triple xor sequence
maybe drop jmp - could use call, perhaps with pop
perhaps drop push, pop - use mov against stack pointer
pehaps drop add, sub - use clc prior to adc, sbb
perhaps chose one only: signed or unsigned
- either: idiv, imul, movsx, sar
- or: div, mul, movzx, shr
HTH,
Rod Pemberton
|
|
0
|
|
|
|
Reply
|
Rod
|
10/15/2009 1:53:59 PM
|
|
Robert Redelmeier <redelm@ev1.net.invalid> wrote:
>
> Terje Mathisen <Terje.Mathisen@munged.microcosmotalk.com> wrote in part:
> > The main problems they avoid are arbitrary return addresses, but I don't
> > feel quite confident that you could not locate a 32-byte aligned address
> > within some library/system code, and jump to that via a calculated
> > address.
>
> Or even an all-data exploit: use an unbounded input function like gets()
> to load the stack with hostile data and a return address pointing at some
> library code like exec() or int0x80 .
There is no exploit necessary you can legally do this. The attack you are
describing can be simplified to 'how do you prevent an indirect branch (ret)
to an unaligned address?', this is solved using several methods that are
invisible to the programmer (the assembler handles it, but you can do it
manually if you prefer).
1) indirect branches must be proven safe, so jmp eax is only permitted if it
exists in a sequence like and eax, 0xffffffe; jmp eax, thus proving it is to
an aligned address.
2) ret is not allowed, instead you would do an equivalent statement like pop
eax; and eax...jmp eax (check the source for actual implementation).
3) code segment limits prevent non validated code from being executed, and
of course the validator prevents you from touching cs with far calls, and so
on.
> Perhaps stacks should grow upwards so overflows run into new data rather
> than clobbering old data.
>
> -- Robert
>
Actually, this attack is not really useful in this context, there is no need
to find an exploit to 'overwrite' the return address, as you are permitted
to just mov there. Or just, mover your stack somewhere else with mov esp,
whatever. Or if you dont like stacks, xor esp, esp :-)
--
-------------------------------------
taviso@sdf.lonestar.org | finger me for my pgp key.
-------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
Tavis
|
10/15/2009 3:22:48 PM
|
|
Tavis Ormandy wrote:
> Terje Mathisen<Terje.Mathisen@MUNGED.microcosmotalk.com> wrote:
>> I have now watched the video, and I agree that the severely limited subset
>> of x86 opcodes they allow, does make it much easier to verify that
>> programs are safe.
>>
>> The main problems they avoid are arbitrary return addresses, but I don't
>> feel quite confident that you could not locate a 32-byte aligned address
>> within some library/system code, and jump to that via a calculated
>> address.
>
> That's true, this is solved using segmentation, only validated code is
> visible from within cs, and attempting to jump to non-validated code results
> in #GP.
Which of course means that you'll never run x64 code this way. OTOH,
this is probably OK for the next 5 years, at which time a lightweight
(hypervisor) VM host is the way to sandbox any kind of process. :-)
>
>> As long as you allow any form of indirect jump (i.e. via some form of
>> writable data), it becomes quite expensive to verify that it is impossible
>> to ever jump out of the sandbox.
>
> Well i wouldnt say expensive, direct branches can be verified at load time
> and are totally legal, indirect branches do take a minor performance hit but
> it's pretty bearable imho..
>
> So instead of jmp eax, you need and eax, 0xfffffe; jmp eax..just to prove to
That's AND EAX, 0xfffffff80 or AND EAX,0xfffffff0 afaik? All branch
targets are at least 16-byte and by default 32-byte aligned according to
the video.
BTW, the AND'ing of all addresses does cost at least one extra cycle, on
top of the AND opcode, due to the AGI stall.
Using POP / AND / JMP to return from functions is probably the most
expensive safeguard, since that messes up the return stack predictor as
well.
> the validator you're not trying to branch to an unvalidated isntruction
> boundary, cs limits take care of making sure the code is validated...not too
> bad :-)
>
>> When the target of NC is to run at "native code speed", then you should
>> not slow the code down to the point where it runs slower than JIT'ed Java.
>> :-)
>
> Hopefully not even within an order of magnitude :-)
Order of magnitude?
Fast interpreters run at approx 10% of native speed, while JIT code can
get into 50-90%, depending upon the type of code.
Since you want at least an order of magnitude speedup, the relevant Java
code you comapre with must be running at around 2-5% or less?
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
Terje
|
10/15/2009 3:23:56 PM
|
|
"Rod Pemberton" <do_not_have@nohavenot.cmm> wrote in message
news:4ad72977$0$4987$9a6e19ea@unlimited.newshosting.com...
>
> "BGB / cr88192" <cr88192@MUNGED.microcosmotalk.com> wrote in message
> news:4ad5febf$0$5666$9a6e19ea@unlimited.newshosting.com...
>>
>> several people have confused this, I meant interpreters "for" x86, not
>> interpreters "in" x86...
>>
>
> Well, of course we did...
>
> Why would you want an interpreter "for" x86 when you only want a very
> small
> subset of x86? Once you've reduced the size of the instruction set below
> 8-bit encoding, an interpreter "in" x86 is much easier to implement.
>
> From the list below, you've got roughly 47 instructions. With some more
> reductions in that list, you could get 32 instructions for 5-bit encoding,
> and perhaps 16 instructions for 4-bit encoding. When the instruction set
> size is small enough, you could just as easily come up with your own
> instructions instead.
>
I have not been changing the opcode encodings, only reducing their number.
however, the vast majority in this set use either single-byte forms, or
SingleByte ModRM forms.
a few use the 0F escape prefix, and there is 66 which I will probably regard
as part of the instruction rather than as a prefix.
however, the 'aleph' set has been reduced a little more since this list.
>> here is the current list of opcode nmonics considered valid for "layer
>> aleph":
>>
>> adc add and
>> bswap
>> call cmp cpuid
>> dec div
>> hlt
>> idiv imul inc intxt
>> j* jmp
>> lea
>> mov movsxb movsxw movzxb movzxw mul
>> neg nop not
>> or
>> pop push
>> rol ror ret
>> sar sbb shl shr shld shrd sub
>> test
>> xchg xor
>>
>
> If you need to reduce the list further, some possible suggestions:
>
> drop test - use and instead
> drop cmp - use sub instead
these are 'traditionally' used, and/sub are register-destructive, would
require compiler alteration.
> drop neg - use sub instead
> drop not - use cmp, sbb instead
> drop inc, dec - use add, sub instead
could reduce performance, would require compiler alteration.
> drop cpuid - should the code have access to this?
> drop hlt, nop - not needed
cpuid and hlt have been moved to 'bet'. hlt had been used in my interpreter
mostly as a means of breaking the interpreter loop (or, IOW, ending the
timeslice).
nop actually carries meaning in my case, since I have been using the
multi-byte nop as an escape-code to encode metadata references (such as
exception handling info, ...).
'intxt' has been moved to 'bet' (actually, secretly the assembler still
allows it, but its use is mostly for internal communication with the
interpreter, and it is not entirely clear how to deal with this issue).
> drop rol, ror - not used in C...
could be useful in special-purpose fragments though.
> maybe drop shrd, shld - could use other shift instructions
these are useful for 64-bit shifts.
otherwise, I would need several shifts, lots of register use, and some
arithmetic ops, which is not nearly so useful. it would also require
altering my compiler to compensate for this loss.
> maybe drop xchg - could use push-mov-pop or triple xor sequence
less efficient, may impact compiler.
> maybe drop jmp - could use call, perhaps with pop
errm... no.
> perhaps drop push, pop - use mov against stack pointer
would impact compiler and hurt prologues/epilogues.
foo:
push ebp
mov ebp, esp
push esi
push edi
....
lea esp, [ebp-8]
pop edi
pop esi
pop ebp
ret
> pehaps drop add, sub - use clc prior to adc, sbb
no clc (since conveptually I have dropped eflags).
add/adc are used mostly for implementing 64-bit integer math.
> perhaps chose one only: signed or unsigned
> - either: idiv, imul, movsx, sar
> - or: div, mul, movzx, shr
>
not sure the point, both sets have uses.
> HTH,
>
>
> Rod Pemberton
>
>
|
|
0
|
|
|
|
Reply
|
BGB
|
10/15/2009 4:08:06 PM
|
|
Terje Mathisen <Terje.Mathisen@MUNGED.microcosmotalk.com> wrote:
> > So instead of jmp eax, you need and eax, 0xfffffe; jmp eax..just to
> > prove to
>
> That's AND EAX, 0xfffffff80 or AND EAX,0xfffffff0 afaik? All branch
> targets are at least 16-byte and by default 32-byte aligned according to
> the video.
Right, it was just to illustrate the mechanism by which you prove safety.
> BTW, the AND'ing of all addresses does cost at least one extra cycle, on
> top of the AND opcode, due to the AGI stall.
It's not all addresses, just the ones that can't be verified at load time,
but yes, you're correct - it's sub-optimal.
> Using POP / AND / JMP to return from functions is probably the most
> expensive safeguard, since that messes up the return stack predictor as
> well.
Very true, there's no way around this, but the impact is largely minimal.
> > the validator you're not trying to branch to an unvalidated isntruction
> > boundary, cs limits take care of making sure the code is validated...not
> > too bad :-)
> >
> > > When the target of NC is to run at "native code speed", then you
> > > should not slow the code down to the point where it runs slower than
> > > JIT'ed Java. :-)
> >
> > Hopefully not even within an order of magnitude :-)
>
> Order of magnitude?
This statement was not intended to be taken literally ;-)
> Fast interpreters run at approx 10% of native speed, while JIT code can
> get into 50-90%, depending upon the type of code.
>
JIT code can get 50% speed of what native code? gcj-like compiled code? This
seems like a mismatched comparison.
I'm not familiar with the detailed performance measurements, but I'm told
that the impact of meeting validation requirements for most real gcc
compiled code is negligible. But personally, I'm not really interested in
performance, I think it's a cool technology that allows x86 programmers to
do fun things :-) (other people are interested in performance though, and
study it in detail, I'm just a security guy).
--
-------------------------------------
taviso@sdf.lonestar.org | finger me for my pgp key.
-------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
Tavis
|
10/15/2009 5:30:40 PM
|
|
Robert Redelmeier wrote:
> Terje Mathisen<Terje.Mathisen@munged.microcosmotalk.com> wrote in part:
>> The main problems they avoid are arbitrary return addresses,
>> but I don't feel quite confident that you could not locate a
>> 32-byte aligned address within some library/system code, and
>> jump to that via a calculated address.
>
> Or even an all-data exploit: use an unbounded input function
> like gets() to load the stack with hostile data and a return
> address pointing at some library code like exec() or int0x80 .
>
> Perhaps stacks should grow upwards so overflows run into new
> data rather than clobbering old data.
That doesn't really help:
Whenever you call something and pass it a buffer on your own stack, it
can overwrite that, all the way into its own return address, and then
return anywhere at all.
This means that stack direction doesn't make any serious difference to
how hard/easy it is to take advantage of security holes.
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
Terje
|
10/15/2009 5:32:05 PM
|
|
Tavis Ormandy <taviso@munged.microcosmotalk.com> wrote in part:
> Actually, this attack is not really useful in this context, there is no need
> to find an exploit to 'overwrite' the return address, as you are permitted
> to just mov there. Or just, mover your stack somewhere else with mov esp,
> whatever. Or if you dont like stacks, xor esp, esp :-)
Fine, then just _what_ are you protecting? If arbitrary code
can open any outbound socket, that is useful for propagation
or exploit. If it can open an inbound socket or read a file,
the exploit can be more detailed.
If you just want to prevent hangs & crashes, just set watchdog
timers and maybe mark pages CoW. Modern OSes routine do this.
-- Robert
>
|
|
0
|
|
|
|
Reply
|
Robert
|
10/15/2009 7:23:15 PM
|
|
"Tavis Ormandy" <taviso@MUNGED.microcosmotalk.com> wrote in message
news:4ad73e48$0$4967$9a6e19ea@unlimited.newshosting.com...
>
> Robert Redelmeier <redelm@ev1.net.invalid> wrote:
>
>>
>> Terje Mathisen <Terje.Mathisen@munged.microcosmotalk.com> wrote in part:
>> > The main problems they avoid are arbitrary return addresses, but I
>> > don't
>> > feel quite confident that you could not locate a 32-byte aligned
>> > address
>> > within some library/system code, and jump to that via a calculated
>> > address.
>>
>> Or even an all-data exploit: use an unbounded input function like gets()
>> to load the stack with hostile data and a return address pointing at some
>> library code like exec() or int0x80 .
>
> There is no exploit necessary you can legally do this. The attack you are
> describing can be simplified to 'how do you prevent an indirect branch
> (ret)
> to an unaligned address?', this is solved using several methods that are
> invisible to the programmer (the assembler handles it, but you can do it
> manually if you prefer).
>
> 1) indirect branches must be proven safe, so jmp eax is only permitted if
> it
> exists in a sequence like and eax, 0xffffffe; jmp eax, thus proving it is
> to
> an aligned address.
> 2) ret is not allowed, instead you would do an equivalent statement like
> pop
> eax; and eax...jmp eax (check the source for actual implementation).
> 3) code segment limits prevent non validated code from being executed, and
> of course the validator prevents you from touching cs with far calls, and
> so
> on.
>
yeah, or in my case, I am handling this sort of thing by running 'untrusted'
code ('gimel' and 'dalet' in my naming scheme), in a virtual address space
(as well as simulating the processor in a much more conventional sense).
note that, sadly, even with a JIT and some optimizations, a virtual address
space is likely to have some performance hit.
one idea for x86-64 (and a high-speed VA space) could be to map a largish
region, and essentially place anything inside this region. the JIT could
then be designed to be unable to reach outside this space.
I did not use this approach in my interpreter, since using the interpreter
'may' be sensible on x86, and even on x86-64 (at least, Win64) the space is
not "that" cheap (Win64 only uses a subset of the somewhat larger space, so
the pointer-space is much larger than that of usable VA's).
granted, something like:
mov r8, ... //get pointer into r8
and r8, 0x0FFFFFFF //256MB sandbox
add r8, r15 //r15 here being the sandbox base
mov rax, [r8] //load from r8
....
would not exactly be free either (probably a notable hit for indirect memory
access).
note: my interpreters current VA scheme is a good deal more expensive, and
may be partly forsaken/reworked at some point (I have actually written much
of the interpreter machinery such that it doesn't know how the VA's work,
giving me a little more reason for if/when I have a better idea).
so, as is (typical interpreter handling):
it will attempt first to 'resolve' an address to a pointer given an address
and a range;
failing this, it will try to "read" or "write" the range using a call (this
being intended mostly for the possibility of a software pagetable setup);
failing this, it will assume that the memory is accessible, and raise a
simulated #GP.
I may consider adopting some of the ideas from NaCl in my 'gimel' layer, in
particular, 'gimel' may still require that all instructions be decodable
('dalet' would not require this, only the absence of system instructions).
the alignment requirements seem a little odd to me though, so I have mixed
feelings (granted, I typically do align functions, usually to 16, just I
feel uncertain of forcing it for every jump target).
I would probably force this internally though, rather than changing the code
itself (so, from the POV of the machine code, it would seem like "just a
little silly validator rule"). does not make as much sense though when the
address space is simulated, so I don't know...
>> Perhaps stacks should grow upwards so overflows run into new data rather
>> than clobbering old data.
>>
>> -- Robert
>>
>
> Actually, this attack is not really useful in this context, there is no
> need
> to find an exploit to 'overwrite' the return address, as you are permitted
> to just mov there. Or just, mover your stack somewhere else with mov esp,
> whatever. Or if you dont like stacks, xor esp, esp :-)
>
yep...
it is probable in my case that my 'aleph' and 'bet' modes would include
memory-access validation, and so would attempt to prove that memory accesses
are typesafe and correct.
the issue here though is that it could reject a good amount of otherwise
sane C code, since it could not entirely prove the memory safety (that, or
force emitting bounds-checked output).
however, it is worth noting that 'aleph' and 'bet' would apply mostly to
dynamic translation of the machine code to run natively within the host
addess space (and for very "light and fast" interpreters, since many things
I have seen which would add overhead could be reasonably avoided).
'gimel' and 'dalet' would be much closer to traditional sandboxing or
virtualization (and will, by extension, not run within the host's address
space).
NaCl is an interesting technology, so I may be interested to see where it
goes...
I don't know if any of my ideas are applicable, however...
> --
> -------------------------------------
> taviso@sdf.lonestar.org | finger me for my pgp key.
> -------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
BGB
|
10/15/2009 7:23:46 PM
|
|
Terje Mathisen <Terje.Mathisen@munged.microcosmotalk.com> wrote in part:
> Robert Redelmeier wrote:
>> Perhaps stacks should grow upwards so overflows run into new
>> data rather than clobbering old data.
>
> That doesn't really help:
>
> Whenever you call something and pass it a buffer on your own
> stack, it can overwrite that, all the way into its own return
> address, and then return anywhere at all.
True enough, for nested calls where buffer is in the callers local area.
But if the buffer is in the callee's local area, then direction helps.
-- Robert
|
|
0
|
|
|
|
Reply
|
Robert
|
10/15/2009 7:24:43 PM
|
|
Tavis Ormandy wrote:
> 1) indirect branches must be proven safe, so jmp eax is only permitted if it
> exists in a sequence like and eax, 0xffffffe; jmp eax, thus proving it is to
> an aligned address.
How about setting the code up so that the JMP EAX is 32-byte aligned?
Unless EAX is the only possible register used for branching, that would
allow you to do an indirect branch to this instruction, using a
secondary register, which would then branch to the desired (misaligned)
address.
However, if only EAX (or some other register) is dedicated to all such
jumps, then it would have to be aligned before the first branch and the
second one would simply be an infinite loop.
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
Terje
|
10/15/2009 7:25:25 PM
|
|
"Tavis Ormandy" <taviso@MUNGED.microcosmotalk.com> wrote in message
news:4ad7294f$0$4980$9a6e19ea@unlimited.newshosting.com...
>
> Terje Mathisen <Terje.Mathisen@MUNGED.microcosmotalk.com> wrote:
>> I have now watched the video, and I agree that the severely limited
>> subset
>> of x86 opcodes they allow, does make it much easier to verify that
>> programs are safe.
>>
>> The main problems they avoid are arbitrary return addresses, but I don't
>> feel quite confident that you could not locate a 32-byte aligned address
>> within some library/system code, and jump to that via a calculated
>> address.
>
> That's true, this is solved using segmentation, only validated code is
> visible from within cs, and attempting to jump to non-validated code
> results
> in #GP.
>
except...
that this will not exactly be prone to work on Win64 and friends (even in
32-bit mode, since AFAIK segmentation is still sort of, errm, broke...).
granted, emulation and/or JIT are still options, granted, they are not as
efficient...
either way, it doesn't presently work on my system due to lacking python...
then again, I am left doubting it would work ever then as, errm, I am
running 64-bit Windows...
>> As long as you allow any form of indirect jump (i.e. via some form of
>> writable data), it becomes quite expensive to verify that it is
>> impossible
>> to ever jump out of the sandbox.
>
> Well i wouldnt say expensive, direct branches can be verified at load time
> and are totally legal, indirect branches do take a minor performance hit
> but
> it's pretty bearable imho..
>
yep.
depending on the means of implementing a VA scheme, this case could also
still allow "free" use.
it could be optimized for both a big flat space, or for 'spans' (my current
primary VA scheme), but would be a problem with simulated paging (which
remains a present source of uncertainty... so, for the time being, my
interpreter does not have page tables...).
> So instead of jmp eax, you need and eax, 0xfffffe; jmp eax..just to prove
> to
> the validator you're not trying to branch to an unvalidated isntruction
> boundary, cs limits take care of making sure the code is validated...not
> too
> bad :-)
>
yep.
my interpreter internally does things differently:
for pure interpretation, it does not bother.
for the trace-translation (not yet started), all indirect jumps (such as
ret) would essentially have to go through a trace-table (which worries me by
adding an O(log n) overhead...).
using an address hash could probably allow an O(1) average case though
(falling back to an O(log n) binary search...).
direct calls/jumps in this scheme would use fixed trace indices.
(this is N/A for 'aleph' and 'bet', which could use a different translation
strategy).
>> When the target of NC is to run at "native code speed", then you should
>> not slow the code down to the point where it runs slower than JIT'ed
>> Java.
>> :-)
>
> Hopefully not even within an order of magnitude :-)
>
yep.
well, I am looking at JIT'ed x86, since I tend not to assume that the host
and client are even running the same ISA...
I did look at qemu, briefly, but was not feeling inclined to dig around in
its internals enough to where it would do what I wanted.
basically, I find it frustrating at times that most code does not conform to
my expectations / design sensibilities, which in this case would mean that:
I could get it to easily get it to build as a library;
I could easily get it to build on Windows;
I could easily get it to build via MSVC.
projects using 'configure' and piles of GCC'isms, external library
dependencies, ..., are not so useful here.
that it works, fine, I will not debate this, but I am not going near the
code, and will not expend much time or effort trying to get it to work,
should it 'just happen' to be broken and not work...
one could argue, maybe I am too fussy, or endlessly 'reinventing the wheel',
but I remember the long past, where I did try to leverage a lot of existing
code, and often ended up with a pain and a mess far worse than had I just
started clean.
so, now, I have limited patience for what I see as poorly
written/organized/... code.
it is also sort of like how it is often sensible to start clean on a new
piece of functionality, even if it may seem externally "similar" to existing
code (even within the same project), as in many cases, the cost of
maintainence may exceed that of clean code (except in the sad case where one
ends up with parts which are not easily replaced).
granted, it is the common intuition that it is better to centralize and
reuse systems, this being what would allow simpler and more easily
maintainable code, but as I see it, this in not always the case, as
centralization tends to lead to project crystalization, which may in turn
may cause it to become overly brittle.
NiH may then be, at times, an effective strategy. (although, by this I mean
to reingineer the implementation, but not necessarily the interface or
design).
I am partly unhappy in that my project, itself, managing to solidify in some
less than ideal ways on the larger scale.
the project is developing too many dependencies, on itself...
(likely a concept rather alien to many opensource developers, judging from a
lot of the code I encounter on a regular basis...).
but, on the smaller scale, an interpreter is an interpreter, a smaller piece
of a much bigger puzzle.
|
|
0
|
|
|
|
Reply
|
BGB
|
10/15/2009 7:25:51 PM
|
|
Tavis Ormandy wrote:
> Terje Mathisen<Terje.Mathisen@MUNGED.microcosmotalk.com> wrote:
>> Fast interpreters run at approx 10% of native speed, while JIT code can
>> get into 50-90%, depending upon the type of code.
>>
>
> JIT code can get 50% speed of what native code? gcj-like compiled code? This
> seems like a mismatched comparison.
Just-In-Time compilation, with automatic runtime profiling (from the
initial interpreted iterations) can get very close to normal C(++) -O2
optimized speed.
>
> I'm not familiar with the detailed performance measurements, but I'm told
> that the impact of meeting validation requirements for most real gcc
> compiled code is negligible. But personally, I'm not really interested in
> performance, I think it's a cool technology that allows x86 programmers to
> do fun things :-) (other people are interested in performance though, and
> study it in detail, I'm just a security guy).
I agree that it is cool, as I said it must have been fun figuring out
exactly what's needed to make it work. :-)
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
Terje
|
10/15/2009 7:26:19 PM
|
|
"Terje Mathisen" <Terje.Mathisen@MUNGED.microcosmotalk.com> wrote in message
news:4ad77725$0$5637$9a6e19ea@unlimited.newshosting.com...
>
> Tavis Ormandy wrote:
>> 1) indirect branches must be proven safe, so jmp eax is only permitted if
>> it
>> exists in a sequence like and eax, 0xffffffe; jmp eax, thus proving it is
>> to
>> an aligned address.
>
> How about setting the code up so that the JMP EAX is 32-byte aligned?
>
> Unless EAX is the only possible register used for branching, that would
> allow you to do an indirect branch to this instruction, using a
> secondary register, which would then branch to the desired (misaligned)
> address.
>
> However, if only EAX (or some other register) is dedicated to all such
> jumps, then it would have to be aligned before the first branch and the
> second one would simply be an infinite loop.
>
alternatively:
disallow the jump itself being aligned...
me partly imagining how one would have hacked GAS to pull off all these
alignments, but oh well...
> Terje
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
BGB
|
10/15/2009 8:15:41 PM
|
|
Robert Redelmeier <redelm@ev1.net.invalid> wrote:
>
> Tavis Ormandy <taviso@munged.microcosmotalk.com> wrote in part:
> > Actually, this attack is not really useful in this context, there is no
> > need to find an exploit to 'overwrite' the return address, as you are
> > permitted to just mov there. Or just, mover your stack somewhere else
> > with mov esp, whatever. Or if you dont like stacks, xor esp, esp :-)
>
> Fine, then just _what_ are you protecting?
Most safe operations are permitted. Manipulating your call stack is clearly
safe, nobody would consider this malicious.
> If arbitrary code can open any
> outbound socket, that is useful for propagation or exploit.
Of course this is an unsafe operation, and so is not permitted.
> If it can
> open an inbound socket or read a file, the exploit can be more detailed.
Again, this is an example of an unsafe operation, and so would not be
permitted. Although these are really two examples of the same unsafe
operation, directly requesting a system service. This is impossible as int
or sysenter (for example) are not permitted by the validator, and any
examples of these instructions that have not been validated are not visible
thanks to segmentation. Additionally, alignment restrictions prevent you
from doing mov bx, 0xcd80; jmp $-2. I suspect this will be clearer if you
read the paper.
Requesting system services is possible via a transition to trusted code,
which of course validates the request to make sure it is permitted.
> If you just want to prevent hangs & crashes, just set watchdog timers and
> maybe mark pages CoW. Modern OSes routine do this.
Thank you for your valuable insight. Hangs and crashes are considered safe
oparations, you can cause them if you like (of course a user can terminate
your application at any time, and will most likely do so if you are not
doing anything useful except spinning).
--
-------------------------------------
taviso@sdf.lonestar.org | finger me for my pgp key.
-------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
Tavis
|
10/16/2009 3:31:18 AM
|
|
On Oct 14, 10:17=A0am, "BGB / cr88192"
<cr88...@MUNGED.microcosmotalk.com> wrote:
> well, in my mind I have been going around in circles some recently:
> considering ideas, weighting ideas, ...
>
> so, here is another idea I have come (back) to: mini-x86.
>
> basic idea:
> it would be essentially a subset of x86, designed mostly to simplify the
> creation of interpreters.
>
> it could be kept binary compatible with conventional x86, this being a
> boundary restriction (no simplifications allowed beyond which would break
> compatibility, and no extensions which could not be similarly handled wit=
hin
> 'proper' x86).
>
> if a feature is desired, and comparable functionality exists within stand=
ard
> x86, the relevant features from standard x86 should be implemented instea=
d.
>
> so, as noted, the goal will be to keep this as a proper subset, rather th=
an
> an incompatible branch.
>
> the goal then would be to create compilers (such as C compilers), capable=
of
> producing code compatible with this subset (such that an interpreter need
> not implement the entire x86 ISA to be able to use the compiler output).
>
> the 'risk', however, is that these restrictions "could" hinder the abilit=
y
> to produce as efficient code, so it is not required that the compiler
> produce "only" valid output. similarly, such a compiler should not produc=
e
> code which could fail to operate correctly on a true processor.
>
> major feature removals:
> Ring 0 / OS-level functionality;
> x87 (likely) and MMX.
>
> however, SSE and SSE2 may be assumed present.
>
> general restrictions:
> assume a single flat 32-bit address model, and assume that no other
> possibilities exist within this context;
> assume that paging is either absent, or is confined to 4kB pages;
> assume that segmentation does not exist (including FS and GS, alternative
> means will be needed to access TLS, such as via function calls).
>
> deprecate AF, PF, DF, and a few others (instead, the flags will be assume=
d
> to hold 'undefined' values, except DF for which it will be assumed invali=
d
> to have DF!=3D0).
>
> 'rep movsb' would be the only valid 'rep' form (the only reason even this
> being kept is that it provides a means for an interpreter to do efficient
> memory->memory copies).
>
> aas/aam/aad/... enter/leave, ... will be deprecated.
> certain instruction forms may be deprecated (direct inc/dec/push/pop form=
s,
> only the ModRM forms will be valid).
>
> the state of eflags may be relaxed following generic arithmetic
> instructions, and as such only following certain defined instructions (su=
ch
> as 'cmp' and 'test') will it be allowed to depend on the state of these
> flags (granted, this disallows such tricks as "and eax, eax; jz foo", ...=
).
>
> direct self-modifying code would be disallowed.
>
> ...
>
> the goal would then be to define such an x86 subset, as well as which
> behaviors are valid to depend on, and which are undefined.
>
> possibly useful would be a tool to statically validate conformance to thi=
s
> subset.
> this would imply having as another constraint that the code be statically
> decodable / verifiable.
>
> partial semantic validation could be attempted as well (in a manner simil=
ar
> to the Java VM), although in a relaxed form (I suspect static probability=
is
> beyond the reach of x86). as such, the semantic validation would be more =
an
> attempt to point out likely bugs or conformance issues, rather than as an
> attempt to prove that the code is valid.
>
> a specific implementation / ABI could add these restrictions:
> that certain forms of annotative metadata be present;
> that the stack remain aligned between a label and all jumps to said label=
;
> areas of stack memory accessed as a given type only be accessed as this
> type;
> ...
>
> these restrictions could aid in the creation of static binary translation
> (likely to be unworkable for free-form x86).
>
> thoughts?...
To disallow self-modifying code? Man, are you crazy?
This is only technique how to find out the exact length of fields in
binary data file.
And it is the most powerful technique in LL programing ever.
Btw, in standard 16b programming variables are just part of code, how
can you change them without "self-modifying" code?
Well... but it would be good, if your CPU could allow writing into
memory just to the particular range (something like current buffer-
overflow protection). This one would buy my voice... :)
|
|
0
|
|
|
|
Reply
|
Processor
|
10/16/2009 8:07:03 AM
|
|
"Tavis Ormandy" <taviso@MUNGED.microcosmotalk.com> wrote in message
news:4ad73e48$0$4967$9a6e19ea@unlimited.newshosting.com...
>
> The attack you are
> describing can be simplified to 'how do you prevent an indirect branch
> (ret) to an unaligned address?'
Simply by not allowing any x86 control flow instructions (ones which change
eip non-sequentially): jcc, jmp, call, ret. You should concoct a safe
method to transfer to different addresses, e.g., only allowing call
instruction for "branching", trapping ret, perform hidden checks on return
value. Or, you could concoct an aritificial or escape instruction sequence
for "branching". Such emulator escape instructions are used in Virtual PC,
NTVDM, or DOSBox.
Rod Pemberton
|
|
0
|
|
|
|
Reply
|
Rod
|
10/16/2009 10:25:32 AM
|
|
"Robert Redelmeier" <redelm@ev1.net.invalid> wrote in message
news:4ad72925$0$4954$9a6e19ea@unlimited.newshosting.com...
>
> Perhaps stacks should grow upwards so overflows run into new
> data rather than clobbering old data.
>
The solution is to separate control flow from data so that data operations
can't affect control flow. FORTH does this by using two stacks, albeit not
to solve this issue. One stack is for data. The other stack is for control
flow. On x86, this would mean, e.g., esp and ebp, are used for two stacks.
You'd have to insert "xchg esp, ebp" instructions in appropriate locations:
1) prior to a call instruction (set control flow stack)
2) first instruction in a called routine (set data stack)
3) prior to a ret/retf instruction (set control flow stack)
4) after a call instruction (set data stack)
Rod Pemberton
|
|
0
|
|
|
|
Reply
|
Rod
|
10/16/2009 10:25:45 AM
|
|
On 16 Oct, 09:07, Processor-Dev1l
<processor.de...@MUNGED.microcosmotalk.com> wrote:
> On Oct 14, 10:17=3DA0am, "BGB / cr88192"
>
>
>
> <cr88...@MUNGED.microcosmotalk.com> wrote:
> > well, in my mind I have been going around in circles some recently:
> > considering ideas, weighting ideas, ...
>
> > so, here is another idea I have come (back) to: mini-x86.
>
> > basic idea:
> > it would be essentially a subset of x86, designed mostly to simplify th=
e
> > creation of interpreters.
>
> > it could be kept binary compatible with conventional x86, this being a
> > boundary restriction (no simplifications allowed beyond which would bre=
ak
> > compatibility, and no extensions which could not be similarly handled w=
it=3D
> hin
> > 'proper' x86).
>
> > if a feature is desired, and comparable functionality exists within sta=
nd=3D
> ard
> > x86, the relevant features from standard x86 should be implemented inst=
ea=3D
> d.
>
> > so, as noted, the goal will be to keep this as a proper subset, rather =
th=3D
> an
> > an incompatible branch.
>
> > the goal then would be to create compilers (such as C compilers), capab=
le=3D
> =A0of
> > producing code compatible with this subset (such that an interpreter ne=
ed
> > not implement the entire x86 ISA to be able to use the compiler output)=
..
>
> > the 'risk', however, is that these restrictions "could" hinder the abil=
it=3D
> y
> > to produce as efficient code, so it is not required that the compiler
> > produce "only" valid output. similarly, such a compiler should not prod=
uc=3D
> e
> > code which could fail to operate correctly on a true processor.
>
> > major feature removals:
> > Ring 0 / OS-level functionality;
> > x87 (likely) and MMX.
>
> > however, SSE and SSE2 may be assumed present.
>
> > general restrictions:
> > assume a single flat 32-bit address model, and assume that no other
> > possibilities exist within this context;
> > assume that paging is either absent, or is confined to 4kB pages;
> > assume that segmentation does not exist (including FS and GS, alternati=
ve
> > means will be needed to access TLS, such as via function calls).
>
> > deprecate AF, PF, DF, and a few others (instead, the flags will be assu=
me=3D
> d
> > to hold 'undefined' values, except DF for which it will be assumed inva=
li=3D
> d
> > to have DF!=3D3D0).
>
> > 'rep movsb' would be the only valid 'rep' form (the only reason even th=
is
> > being kept is that it provides a means for an interpreter to do efficie=
nt
> > memory->memory copies).
>
> > aas/aam/aad/... enter/leave, ... will be deprecated.
> > certain instruction forms may be deprecated (direct inc/dec/push/pop fo=
rm=3D
> s,
> > only the ModRM forms will be valid).
>
> > the state of eflags may be relaxed following generic arithmetic
> > instructions, and as such only following certain defined instructions (=
su=3D
> ch
> > as 'cmp' and 'test') will it be allowed to depend on the state of these
> > flags (granted, this disallows such tricks as "and eax, eax; jz foo", .=
...=3D
> ).
>
> > direct self-modifying code would be disallowed.
>
> > ...
>
> > the goal would then be to define such an x86 subset, as well as which
> > behaviors are valid to depend on, and which are undefined.
>
> > possibly useful would be a tool to statically validate conformance to t=
hi=3D
> s
> > subset.
> > this would imply having as another constraint that the code be statical=
ly
> > decodable / verifiable.
>
> > partial semantic validation could be attempted as well (in a manner sim=
il=3D
> ar
> > to the Java VM), although in a relaxed form (I suspect static probabili=
ty=3D
> =A0is
> > beyond the reach of x86). as such, the semantic validation would be mor=
e =3D
> an
> > attempt to point out likely bugs or conformance issues, rather than as =
an
> > attempt to prove that the code is valid.
>
> > a specific implementation / ABI could add these restrictions:
> > that certain forms of annotative metadata be present;
> > that the stack remain aligned between a label and all jumps to said lab=
el=3D
> ;
> > areas of stack memory accessed as a given type only be accessed as this
> > type;
> > ...
>
> > these restrictions could aid in the creation of static binary translati=
on
> > (likely to be unworkable for free-form x86).
>
> > thoughts?...
>
> To disallow self-modifying code? Man, are you crazy?
> This is only technique how to find out the exact length of fields in
> binary data file.
> And it is the most powerful technique in LL programing ever.
> Btw, in standard 16b programming variables are just part of code, how
> can you change them without "self-modifying" code?
> Well... but it would be good, if your CPU could allow writing into
> memory just to the particular range (something like current buffer-
> overflow protection). This one would buy my voice... :)
Self-modifying code should have been prohibited years ago....
What did you mean that it's the only way to find out the exact length
of fields in a binary data file?
What is standard 16b?
James
|
|
0
|
|
|
|
Reply
|
James
|
10/16/2009 11:42:17 AM
|
|
Rod Pemberton wrote:
> "Robert Redelmeier"<redelm@ev1.net.invalid> wrote in message
> news:4ad72925$0$4954$9a6e19ea@unlimited.newshosting.com...
>>
>> Perhaps stacks should grow upwards so overflows run into new
>> data rather than clobbering old data.
>>
>
> The solution is to separate control flow from data so that data operations
> can't affect control flow. FORTH does this by using two stacks, albeit not
> to solve this issue. One stack is for data. The other stack is for control
> flow. On x86, this would mean, e.g., esp and ebp, are used for two stacks.
> You'd have to insert "xchg esp, ebp" instructions in appropriate locations:
>
> 1) prior to a call instruction (set control flow stack)
> 2) first instruction in a called routine (set data stack)
> 3) prior to a ret/retf instruction (set control flow stack)
> 4) after a call instruction (set data stack)
It would probably be significantly more efficient to simply leave ESP as
the CALL/RET pointer, and use another register for all data stack operation.
With EDI your could still have one-byte PUSH operations from EAX with
STOSD. :-)
Anyway, assuming EBP, your stack-based parameter calls would look like
this (let the EBP stack grow up):
MOV [EBP+4],EAX ; Param 3
MOV [EBP+8],ECX ; Param 2
MOV [EBP+12],EDX ; Param 1
ADD EBP,12
CALL BlockMove
and in BlockMove which is a leaf procedure:
BlockMove proc
MOV [EBP+4],ESI ; Save working regs
MOV [EBP+8],EDI
MOV EDI[EBP]
MOV ESI,[EBP-4]
MOV ECX,[EBP-8]
REP MOVSD
MOV ESI,[EBP+4] ; Restore
MOV EDI,[EBP+8]
SUB EBP,12 ; Get rid of params
RET
BlockMove endp
The actual runtime of BlockMove would be pretty much identical to code
compiled to use ESP as the data stack, i.e. the MOVs to save/restore
ESI/EDI will be at least as fast as PUSH/POP operations. The number of
opcode bytes go from 1 to 3 for each instruction, but we avoid the
potential AGIs from back-to-back stack operations.
Terje
PS. You would of course prefer to use register-based calling conventions
here!
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
Terje
|
10/16/2009 2:01:03 PM
|
|
Tavis Ormandy <taviso@munged.microcosmotalk.org> wrote in part:
> Robert Redelmeier <redelm@ev1.net.invalid> wrote:
>> Fine, then just _what_ are you protecting?
>
> Most safe operations are permitted. Manipulating your call
> stack is clearly safe, nobody would consider this malicious.
??? most current exploits involve stack manipulation via
input overflows. Clearly unsafe, unless you separate return
addresses as Rod suggests, or restore the return address as part
of a function epilog. Do not overly blame the input overflows,
because any ability to overwrite return addresses is exploitable.
If the call stack contains return addresses (as is hardwired
on x86) then stack manipulation can change control flow,
specifically so the next "RET" instruction executed returns
unexpectedly to some exploitable address outside the validated
code. With exploit data since the library/system address
was certainly not written as an exploit itself.
Any ability to jump is also exploitable -- just put the necessary
exploit code (short) _inside_ an instruction `mov eax, 0x80cd`
and jmp to the middle of the instruction.
Clean code is a bit of a pipe-dream, although people have been
trying with things like HLLs and interpreters. This becomes
a varient of the halting theorum. Current security is
achieved by resource control although it would be nice to
have orthogonal means.
-- Robert
|
|
0
|
|
|
|
Reply
|
Robert
|
10/16/2009 4:49:05 PM
|
|
"James Harris" <james.harris.1@MUNGED.microcosmotalk.com> wrote in message
news:4ad85c19$0$5671$9a6e19ea@unlimited.newshosting.com...
>
> On 16 Oct, 09:07, Processor-Dev1l
> <processor.de...@MUNGED.microcosmotalk.com> wrote:
>> On Oct 14, 10:17=3DA0am, "BGB / cr88192"
>>
<snip>
>>
>> To disallow self-modifying code? Man, are you crazy?
>> This is only technique how to find out the exact length of fields in
>> binary data file.
>> And it is the most powerful technique in LL programing ever.
>> Btw, in standard 16b programming variables are just part of code, how
>> can you change them without "self-modifying" code?
>> Well... but it would be good, if your CPU could allow writing into
>> memory just to the particular range (something like current buffer-
>> overflow protection). This one would buy my voice... :)
>
> Self-modifying code should have been prohibited years ago....
>
it has its uses, although rarely do I do it in the traditional sense of the
word.
more often, it is producing code and dynamically linking it into the image,
then calling it via function pointers.
however, this need not be "self-modifying" in the strict sense...
> What did you mean that it's the only way to find out the exact length
> of fields in a binary data file?
>
I have little idea...
> What is standard 16b?
>
I think he means 16 bit, and the idea of placing variables in the CS
segment.
personally, in my stuff I am likely to disallow this as well, and require
that all data be placed in '.data' (or maybe '.rodata'/... for read-only,
but I usually just use '.data'...).
I may generally require that '.text' can be disassembled, which means no
placing of arbitrary data in this section...
> James
|
|
0
|
|
|
|
Reply
|
BGB
|
10/16/2009 4:49:42 PM
|
|
Robert Redelmeier <redelm@ev1.net.invalid> wrote:
>
> Tavis Ormandy <taviso@munged.microcosmotalk.org> wrote in part:
> > Robert Redelmeier <redelm@ev1.net.invalid> wrote:
> > > Fine, then just _what_ are you protecting?
> >
> > Most safe operations are permitted. Manipulating your call stack is
> > clearly safe, nobody would consider this malicious.
>
> ??? most current exploits involve stack manipulation via input overflows.
You're thinking too high level.
> Clearly unsafe, unless you separate return addresses as Rod suggests, or
> restore the return address as part of a function epilog.
I think perhaps you're not reading my posts where I explain how control flow
and validation is restricted. Perhaps you should download the code, and try
to execute a system call.
I've already described this, but here is what I think you will try, in
order, and what will happen.
A: You will try to execute int 0x80.
R: This will fail at load time, the int instruction is not permitted.
A: You will try to modify the return address to point to an occurrence of cd
80 somewhere else.
R: This will fail, 'ret' is not permitted, an equivalent statement is
required.
A: You will try to find an occurrence of cd 80 outside of your code segment
(some library mapped into the address space)
R: This will fail, the processor will raise #GP.
A: You will try to 'hide' an occurrence of cd 80 inside another instruction
and jmp to it directly:
B: This will fail, branch targets are validated.
A: You will try to 'hide' an occurrence of cd 80 inside another instruction
and jmp to it indirectly:
B: This will fail at runtime time, there are alignment restrictions, so you
can only jmp to a valid instruction boundary.
> Do not overly
> blame the input overflows, because any ability to overwrite return
> addresses is exploitable.
You are describing an entirely different concept. This is an untrusted user
attacking a trusted application, the purpose of nativeclient is to safely
allow untrusted applications.
> If the call stack contains return addresses (as is hardwired on x86) then
> stack manipulation can change control flow, specifically so the next "RET"
> instruction executed returns unexpectedly to some exploitable address
> outside the validated code.
Impossible for two reasons:
- There is no 'outside the validated code', you need to refresh your
understanding of segmentation.
- ret is not permitted, you must use an equivalent sequence that proves
safety.
> With exploit data since the library/system
> address was certainly not written as an exploit itself.
Again, you do not understand segmentation.
>> Additionally, alignment restrictions prevent you
>> from doing mov bx, 0xcd80; jmp $-2. I suspect this will be clearer if you
>> read the paper.
> Any ability to jump is also exploitable -- just put the necessary exploit
> code (short) _inside_ an instruction `mov eax, 0x80cd` and jmp to the
> middle of the instruction.
I see you read my posts as closely as you read the intel documentation on
segmentation :-)
> Clean code is a bit of a pipe-dream, although people have been trying with
> things like HLLs and interpreters.
Wrong, a working implementation exists that you can download right now. It
supports Linux, Windows and Mac.
> This becomes a varient of the halting
> theorum. Current security is achieved by resource control although it
> would be nice to have orthogonal means.
>
Completely wrong, you are confusing multiple ideas. It is not necessary to
solve anything, a proof is included in the paper I linked to.
I realise you won't read it, but it's worth a shot.
--
-------------------------------------
taviso@sdf.lonestar.org | finger me for my pgp key.
-------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
Tavis
|
10/16/2009 6:10:06 PM
|
|
"Tavis Ormandy" <taviso@MUNGED.microcosmotalk.com> wrote in message
news:4ad8b6fe$0$4940$9a6e19ea@unlimited.newshosting.com...
>
> A: You will try to execute int 0x80.
> R: This will fail at load time, the int instruction is not permitted.
I'm thoroughly confused by that statement. As I understood both the video
and .pdf, "salt", e.g., NaCl or Native Client, statically checks x86 code
which can call functions using a number of APIs: SRPC, NPAPI, etc. What C
compiler generates x86 code without generating calls to a critical API,
e.g., BIOS, Linux, Windows, etc. without using an "int" instruction? AFAIK,
none do. Is there an undocumented requirement for special compiler options,
e.g., gcc's -nostdlib?
Other comments on "salt":
1) From what I gathered, NaCl uses x86 fault mechanisms and segmentation
which are dependent on cpu mode. I.e., it'll work for 32-bit, maybe 16-bit,
but probably won't work on 64-bit.
2) The video claims future support for other platforms such as ARM. This
will require an x86 cpu emulator for those platforms. In which case, the
claimed advantages for native x86 (speed, instruction access, etc.) are
moot. An entirely different design will be needed for those platforms.
This make NaCl only useful on x86.
3) Given that one has to implement an alternate sequence for "ret"
(mentioned below...). Why isn't NaCl using dynamic binary translation? Why
should the user have to rewrite x86 assembly compiled by some C compiler to
delete and replace a "ret" instruction with another sequence to produce code
which is "safe" according to NaCl's static checks? Shouldn't NaCl do the
translation itself and then do the checking itself to confirm the accuracy
of the translation? Haven't you guys studied the operation of Dec's FX!32?
Papers are available on Citeseer. Also, having the user replace "ret" is
likely to introduce offset errors if the "ret" is replaced by a sequence of
larger size in bytes.
x86 code --> dynamic binary translation --> static code analysis --> native
x86 execution
> - There is no 'outside the validated code', you need to refresh your
> understanding of segmentation.
Even so, you are executing x86 directly on an x86 cpu, i.e., natively. If
the call stack is manipulatable and that incorrect information is used for
generating control flow information such as a return address either "inside"
or "outside validated code," then the application can restart execution
somewhere other than where intended. Look up ret2libc attacks on Wikipedia.
> - ret is not permitted, you must use an equivalent sequence that proves
> safety.
This is inline with my comments not to allow any x86 instructions which
affect eip non-sequentially.
> you must use an equivalent sequence
It seems I missed that... But, I'm quite sure neither the video nor the
..pdf mentions that anywhere. I do not recall mention of the need to use
"equivalent sequences".
As I understood it, NaCl is supposed to execute _existing_ x86 instructions
*natively* on x86. The video AFAIR doesn't mention anything about how
specific instruction "sandboxing" and "checking" occurs. And, the .pdf,
only covers a few trivial examples...
> Again, you do not understand segmentation.
Segmentation doesn't exist in 64-bit mode... How does NaCl intend to
operate on newer AA-64 OSes? How does NaCl intend to implement segmentation
on ARM? Isn't NaCL locking itself into 16/32-bit x86 hardware dependent
functionality (segmentation) - which will need to be emulated via software
for other platforms?
> >> Additionally, alignment restrictions prevent you
> >> from doing mov bx, 0xcd80; jmp $-2. I suspect this will be clearer if
you
> >> read the paper.
Uh, no offense, but that .pdf paper was virtually devoid of any useful
explanation of what NaCl actually does... Unfortunately, the list of found
errors was more informative.
Rod Pemberton
|
|
0
|
|
|
|
Reply
|
Rod
|
10/17/2009 10:43:37 AM
|
|
"Rod Pemberton" <do_not_have@nohavenot.cmm> wrote:
Thank you for looking a little deeper before responding.
>
> "Tavis Ormandy" <taviso@MUNGED.microcosmotalk.com> wrote in message
> news:4ad8b6fe$0$4940$9a6e19ea@unlimited.newshosting.com...
> >
> > A: You will try to execute int 0x80. R: This will fail at load time, the
> > int instruction is not permitted.
>
> Is there an undocumented requirement for special compiler
> options, e.g., gcc's -nostdlib?
No, all support code is also validated.
> Other comments on "salt":
>
> 1) From what I gathered, NaCl uses x86 fault mechanisms and segmentation
> which are dependent on cpu mode. I.e., it'll work for 32-bit, maybe
> 16-bit, but probably won't work on 64-bit.
Yep, correct.
> 2) The video claims future support for other platforms such as ARM. This
> will require an x86 cpu emulator for those platforms. In which case, the
> claimed advantages for native x86 (speed, instruction access, etc.) are
> moot. An entirely different design will be needed for those platforms.
> This make NaCl only useful on x86.
Correct, just as x86 binaries are only useful on x86, but hopefully you
agree that x86 binaries are useful (or why are you subscribed to this group?
:-))
> 3) Given that one has to implement an alternate sequence for "ret"
> (mentioned below...). Why isn't NaCl using dynamic binary translation?
Speed.
> Why should the user have to rewrite x86 assembly compiled by some C
> compiler to delete and replace a "ret" instruction with another sequence
> to produce code which is "safe" according to NaCl's static checks?
A user doesnt, this is handled by the toolchain.
> Haven't you guys studied the
> operation of Dec's FX!32? Papers are available on Citeseer.
In fact, I maintained the Linux/Alpha port for several years.
> Also, having
> the user replace "ret" is likely to introduce offset errors if the "ret"
> is replaced by a sequence of larger size in bytes.
Of course you're not expected to do this manually, the toolchain does it.
>
> > - There is no 'outside the validated code', you need to refresh your
> > understanding of segmentation.
>
> Even so, you are executing x86 directly on an x86 cpu, i.e., natively. If
> the call stack is manipulatable and that incorrect information is used for
> generating control flow information such as a return address either
> "inside" or "outside validated code," then the application can restart
> execution somewhere other than where intended. Look up ret2libc attacks
> on Wikipedia.
I work in information security, it's unlikely you're familiar with an attack
that I don't have to study every day. I'm happy to explain how this works,
but please don't assume I'm clueless and just not familiar with an attack
you've read about.
Please take my word for it, you're thinking at too high a level - there is
no need to attack the program, because you wrote it - just rewrite it to do
what you want. The program is assumed to be hostile and cannot be trusted,
it has to prove that it's safe.
> > - ret is not permitted, you must use an equivalent sequence that proves
> > safety.
>
> This is inline with my comments not to allow any x86 instructions which
> affect eip non-sequentially.
And it is also inline with how i described that control flow has to be
PROVEN safe. GUARANTEED to be to a validated instruction - there is no way
to "overwrite" it, the machnism must be able to GUARANTEE that it can only
be to validated code.
> > Again, you do not understand segmentation.
>
> Segmentation doesn't exist in 64-bit mode...
Correct.
--
-------------------------------------
taviso@sdf.lonestar.org | finger me for my pgp key.
-------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
Tavis
|
10/17/2009 12:31:13 PM
|
|
Tavis Ormandy wrote:
> A: You will try to 'hide' an occurrence of cd 80 inside another instruction
> and jmp to it indirectly:
> B: This will fail at runtime time, there are alignment restrictions, so you
> can only jmp to a valid instruction boundary.
So how do you handle the case where that CD 80 is the immediate data
inside another instruction, and the CD 80 bytes happens to be 32-byte
aligned?
I guess you could insert NOPs in the code wherever this happens?
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
Terje
|
10/18/2009 7:25:07 AM
|
|
Terje Mathisen <Terje.Mathisen@MUNGED.microcosmotalk.com> wrote:
>
> Tavis Ormandy wrote:
> > A: You will try to 'hide' an occurrence of cd 80 inside another
> > instruction and jmp to it indirectly: B: This will fail at runtime time,
> > there are alignment restrictions, so you can only jmp to a valid
> > instruction boundary.
>
> So how do you handle the case where that CD 80 is the immediate data
> inside another instruction, and the CD 80 bytes happens to be 32-byte
> aligned?
>
> I guess you could insert NOPs in the code wherever this happens?
>
Yep, you got it :-)
--
-------------------------------------
taviso@sdf.lonestar.org | finger me for my pgp key.
-------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
Tavis
|
10/18/2009 10:08:45 AM
|
|
"Tavis Ormandy" <taviso@MUNGED.microcosmotalk.com> wrote in message
news:4ad9b911$0$5680$9a6e19ea@unlimited.newshosting.com...
> "Rod Pemberton" <do_not_have@nohavenot.cmm> wrote:
> > "Tavis Ormandy" <taviso@MUNGED.microcosmotalk.com> wrote in message
> > news:4ad8b6fe$0$4940$9a6e19ea@unlimited.newshosting.com...
> > >
> > 3) Given that one has to implement an alternate sequence for "ret"
> > (mentioned below...). Why isn't NaCl using dynamic binary translation?
>
> Speed.
So, you're claiming that static checking for "ret" or worse an alternate
sequence for "ret" is actually faster? (cough) That's a bit hard for me to
believe.
> > Why should the user have to rewrite x86 assembly compiled by some C
> > compiler to delete and replace a "ret" instruction with another sequence
> > to produce code which is "safe" according to NaCl's static checks?
>
> A user doesnt, this is handled by the toolchain.
Ah, that's another hidden requirement: You must use our modified toolchain.
Like it or not, that's a really big deal. It really neutralizes the
adoption of NaCl by others. Who wants to install and use yet another f...
toolchain? We've already got our x86 toolchains.
Since one is effectively being forced to use NaCl toolchain to generate code
for NaCl, the *only* purpose of the static checking of x86 code by NaCl is
to confirm the x86 code was produced with the NaCl toolchain... It's code
from the NaCl toolchain: safe. It's code not from the NaCl toolchain:
not-safe. Given that, what's the point of actually statically checking x86?
All one needs to do is determine if the code came from the NaCl toolchain.
There are simpler ways to do that: public-key encryption, digital
steganography, heuristics detection like for adware, malware, virii... You
could add PKE to the customized code header. With proper selection of
useable instructions and binary steganography, code from the NaCl toolchain
could be easy to confirm by frequency analysis. Maybe try this one:
Hydan
http://www.crazyboy.com/hydan/
> There is
> no need to attack the program, because you wrote it - just rewrite it to
do
> what you want.
That's clueless... What if that is *exactly* what I want? What if I
believe your theories and implementation are flawed and desire to test them?
What if I believe it's acceptable to create anarchist, self-replicating,
self-rewriting applications that test all possible ways to bypass NaCl's
sandbox? You claim trust: "no need to attack" and " just rewrite it to do
what you want," but then disregard trust and embrace the opposite
immediately afterwards: "program is assumed to be hostile". Choose one.
> GUARANTEED to be to a validated instruction
That doesn't mean that there aren't valid instructions encoded into other
instructions or data. In x86, there is NO METHOD to determine the start of
an x86 instruction if you don't ALREADY know where it is. So, how can you
verify that the location of the instruction you are about to execute is
correct? (Can't.) Due to the density of the single byte x86 instructions,
every byte can be used to start an instruction sequence. So, who cares if
the instruction pointed to is "validated"? You can't determine if the
pointed to instruction is the CORRECT pointed to instruction - only that
it's a LEGAL pointed to instruction. It's a worthless claim - unless you
eliminate all instructions with offsets or data, and all two or three byte
instructions. In which case, dynamic binary translation is likely to be
faster since you no longer have any useful and quick x86 instructions left
to work with...
Let's take the control flow instructions, e.g., jcc, jmp, call, ret, iret,
that can be corrupted. The jcc instructions can be blocked from
modification if the code segment is non-writable. The jmp instruction, IF
it doesn't use a register value, can be blocked from modification with a
non-writable code segment. If it uses a register value, you're completely
out-of-luck with static analysis. There is NO METHOD to determine the value
of the register without dynamic analysis. The call, ret, and iret control
flow data is on a stack. A stack which must be writable - in order to place
the data on the stack in the first place. That data is then used...
> The program is assumed to be hostile and cannot be trusted,
> it has to prove that it's safe.
Given that logic... There is no need to not trust the program - since I
wrote and/or rewrote it to do exactly what I wanted it to do as you said I
should do previously, therfore it's proven.
> I'm happy to explain how this works,
> but please don't assume I'm clueless and just not familiar with an attack
> you've read about.
Ah, well, your presentation of your knowledge in these areas here is clearly
totally flawless. It's so flawless that no one here, like me, assumed that
you're totally incompetent in regards to x86.
BFN,
Rod Pemberton
|
|
0
|
|
|
|
Reply
|
Rod
|
10/18/2009 10:09:01 AM
|
|
Tavis Ormandy wrote:
> Terje Mathisen<Terje.Mathisen@MUNGED.microcosmotalk.com> wrote:
>
>>
>> Tavis Ormandy wrote:
>>> A: You will try to 'hide' an occurrence of cd 80 inside another
>>> instruction and jmp to it indirectly: B: This will fail at runtime time,
>>> there are alignment restrictions, so you can only jmp to a valid
>>> instruction boundary.
>>
>> So how do you handle the case where that CD 80 is the immediate data
>> inside another instruction, and the CD 80 bytes happens to be 32-byte
>> aligned?
>>
>> I guess you could insert NOPs in the code wherever this happens?
>>
>
> Yep, you got it :-)
>
OK, at this point you have a _very_ special tool chain, with no option
at all to using anything else to produce code for it.
After all, it isn't only OS calls that must be disallowed on any 32-byte
boundary: The same goes for _any_ kind of branch opcode embedded inside
another instruction, i.e. RET, CALL, IRET, JMP, Jcc etc.
This is very similar to the Burroughs (?) machine with a Trusted
Compiler which would sign all executables produced by it. Nothing else
was allowed by the OS to run.
Turned out to be less useful in the real world.
I'm afraid NaCl is similar: Nice project, fun research, but not really
useful. :-(
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
Terje
|
10/18/2009 1:23:32 PM
|
|
"Rod Pemberton" <do_not_have@nohavenot.cmm> wrote:
> So, you're claiming that static checking for "ret" or worse an alternate
> sequence for "ret" is actually faster? (cough) That's a bit hard for me
> to believe.
Yes, a single load time check is faster than dynamic binary translation.
> > > Why should the user have to rewrite x86 assembly compiled by some C
> > > compiler to delete and replace a "ret" instruction with another
> > > sequence to produce code which is "safe" according to NaCl's static
> > > checks?
> >
> > A user doesnt, this is handled by the toolchain.
>
> Ah, that's another hidden requirement: You must use our modified
> toolchain.
No, you misunderstand. An assembler is provided that understands these
requirements and makes them easier to use, you can just do it manually if
you prefer.
> All one needs to do is determine if the code came from the NaCl toolchain.
Please, let's assume that we're both familiar with simple ideas like proof
carrying code. It is not appropriate for a project like this.
> > GUARANTEED to be to a validated instruction
>
> That doesn't mean that there aren't valid instructions encoded into other
> instructions or data.
Correct, but you cannot transfer control to them.
> In x86, there is NO METHOD to determine the start
> of an x86 instruction if you don't ALREADY know where it is.
> So, how can you verify that the location of the instruction you are about
> to execute is correct? (Can't.)
Actually, you can. I've tried to explain this previously, but we apply some
alignment restrictions to code, so we verify the following things at load
time:
- No instruction spans a 32byte boundary, that is at every 32byte boundary
within cs, there is the first byte of a validated instruction.
- You may _only_ transfer control flow to the follwing instruction OR a
32byte boundary, this is enforced by scanning for branch instructions.
If it's a direct branch, we just validate the target and allow it (i.e. it's
to an instruction boundary we've seen).
If it's indirect, it MUST be in a sequence like this:
and eax, MASK
jmp eax
Thus proving it is to a 32byte boundary. I've already mentioned this before.
Therefore, I've just done what you said is impossible.
<snip>
> > I'm happy to explain how this works, but please don't assume I'm
> > clueless and just not familiar with an attack you've read about.
>
> Ah, well, your presentation of your knowledge in these areas here is
> clearly totally flawless. It's so flawless that no one here, like me,
> assumed that you're totally incompetent in regards to x86.
>
Incompetent? ouch. My presentation may well be poor, I'm not a teacher or
writer. I'm dissapointed you're not willing to trust I'm familiar with x86
so that we can have a reasonable discussion, but as you wish.
--
-------------------------------------
taviso@sdf.lonestar.org | finger me for my pgp key.
-------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
Tavis
|
10/18/2009 5:14:08 PM
|
|
"Terje Mathisen" <Terje.Mathisen@MUNGED.microcosmotalk.com> wrote in message
news:4adb16d4$0$5112$9a6e19ea@unlimited.newshosting.com...
>
> Tavis Ormandy wrote:
>> Terje Mathisen<Terje.Mathisen@MUNGED.microcosmotalk.com> wrote:
>>
>>>
>>> Tavis Ormandy wrote:
>>>> A: You will try to 'hide' an occurrence of cd 80 inside another
>>>> instruction and jmp to it indirectly: B: This will fail at runtime
>>>> time,
>>>> there are alignment restrictions, so you can only jmp to a valid
>>>> instruction boundary.
>>>
>>> So how do you handle the case where that CD 80 is the immediate data
>>> inside another instruction, and the CD 80 bytes happens to be 32-byte
>>> aligned?
>>>
>>> I guess you could insert NOPs in the code wherever this happens?
>>>
>>
>> Yep, you got it :-)
>>
> OK, at this point you have a _very_ special tool chain, with no option
> at all to using anything else to produce code for it.
>
> After all, it isn't only OS calls that must be disallowed on any 32-byte
> boundary: The same goes for _any_ kind of branch opcode embedded inside
> another instruction, i.e. RET, CALL, IRET, JMP, Jcc etc.
>
> This is very similar to the Burroughs (?) machine with a Trusted
> Compiler which would sign all executables produced by it. Nothing else
> was allowed by the OS to run.
>
> Turned out to be less useful in the real world.
>
> I'm afraid NaCl is similar: Nice project, fun research, but not really
> useful. :-(
>
yeah...
for my effort, I have ended up deciding on, for now, a mostly unmodified x86
ISA. the only real changes I am at present allowing are those that are
outside of what is accessible to userspace code (well, along with the
official absence of MMX, mostly me assuming that people/compilers are not
likely to use it, but a few "bits and pieces" of it exist, such as SSE2
instructions which just happen to also work with MMX registers, ...).
actually, in a way, it is sort of like DosBox vs QEMU...
DosBox does an imperfect, but sufficiently close, emulation of the HW and of
DOS, such that DOS apps (including Win3.11) will still work. at the same
time though, it allows sharing the filesystem with the host OS, which makes
copying files into/out-of the simulated DOS easier.
QEMU, OTOH, does a much better emulation, and can run things like Win2K or
XP, but is unable to (directly) share the filesystem (although, apparently
QEMU puts the simulated OS in a network which includes the host system, so
maybe it might be possible that someone could mount a network share or FTP
server or similar...).
so, using existing toolchains is possible in my approach, although there are
downsides:
as-is, for now, images will need to be PE/COFF (I "may" consider ELF if
needed, since on Linux it would mean having to set up a cross-compiler, or
my tools, to produce PE/COFF, but using PE/COFF and ELF in the same place
could result in wackiness...).
it is unlikely that C++ will be "integrated" with my project;
I suspect a C++ compiler may be beyond my abilities and resources;
however, virtual x86 will still allow C++, just it is likely that
integration will be poor (C-level).
the hard part will be making it not suck...
as for security, my strategy will revolve around "securing the borders"
(hopefully...), since whatever happens in the simulated CPU should not be
allowed to impact the host.
when things are more developed, a possible line of testing could be to throw
garbage at the interpreter and see if I can get it to crash, but sadly, it
will not likely hold together at this point...
as-is, beyond bugs and implementation holes, the interpreter is likely to be
unusably slow...
but, I have a strategy:
I will probably begin implementing trace-translation (where a "trace" will
be a sequence of instructions starting at a jump target, and likely ending
at either an indirect jump, or in a few other cases). if done well, this
could largely compensate for its overhead, since there should be close to a
1:1 mapping in many cases.
similar, traces may use "register caching", where simulated regs may be
mapped to native regs, but there is not necessarily a direct mapping (for
example, 'eax' could easily end up in 'r11d', for example...).
the trace-translator will probably not use my existing compiler logic, and
so would likely be a more specialized translator.
even without JIT, this could still help optimize the interpreter (since it
could help avoid decoding opcodes).
the major downside:
self-modifying code would either not behave well, or could cause a "severe"
performance impact (as in, thrashing the trace cache).
another possible issue is managing cache-space (for example, if I limited to
64k, 256k, or 1M decoded opcodes, and maybe 64k or 128k active traces).
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
BGB
|
10/18/2009 9:23:27 PM
|
|
Terje Mathisen <Terje.Mathisen@MUNGED.microcosmotalk.com> wrote:
> > > I guess you could insert NOPs in the code wherever this happens?
> >>
> >
> > Yep, you got it :-)
> >
> OK, at this point you have a _very_ special tool chain, with no option at
> all to using anything else to produce code for it.
There are options, you're not required to use a trusted toolchain, and you
can meet the constraints any way you like.
Nativeclient is really a solution to a different problem, I want to be able
to run applications you've written, but without having to make a trust
decision.
> After all, it isn't only OS calls that must be disallowed on any 32-byte
> boundary: The same goes for _any_ kind of branch opcode embedded inside
> another instruction, i.e. RET, CALL, IRET, JMP, Jcc etc.
Correct, although we ban any instruction, your way would work too.
> This is very similar to the Burroughs (?) machine with a Trusted Compiler
> which would sign all executables produced by it. Nothing else was allowed
> by the OS to run.
I think the real difference is that there are no toolchain restrictions -
you're free to implement the constraints any way you like.
> Turned out to be less useful in the real world.
>
> I'm afraid NaCl is similar: Nice project, fun research, but not really
> useful. :-(
>
Well, I hope you're wrong :-)
--
-------------------------------------
taviso@sdf.lonestar.org | finger me for my pgp key.
-------------------------------------------------------
|
|
0
|
|
|
|
Reply
|
Tavis
|
10/18/2009 9:25:42 PM
|
|
On Oct 18, 1:14=A0pm, Tavis Ormandy <tav...@MUNGED.microcosmotalk.com>
wrote:
>
> --
> -------------------------------------
> tav...@sdf.lonestar.org | finger me for my pgp key.
> -------------------------------------------------------
No offense intended, but I believe I will pass on the opportunity to
"finger" you. :)
Just wanted to say that SDF is an interesting organization. Have you
visited alt.folklore.computers much?
Nathan.
http://delicious.com/evenbit
http://www.frontiernet.net/~fys/faq/
|
|
0
|
|
|
|
Reply
|
Nathan
|
10/18/2009 11:14:43 PM
|
|
|
46 Replies
166 Views
(page loaded in 0.286 seconds)
Similiar Articles: simulate jump process - comp.soft-sys.matlabinput & output in assembly - comp.lang.asm.x86 je EndOfPrint ; Yes, then jump out of loop ... (Linux has a syscall to move a specific process into V86 ... or C++ ... HP-UX 11.11 overwriting core file(s) - How to prevent? - comp.sys ...thought: "Mini-x86"... - comp.lang.asm.x86... Layer Aleph: very restricted subset of x86 core ... you are describing can be simplified to 'how do you prevent ... Sub Pixel Accuracy From Non Integer Values - comp.soft-sys.matlab ...So I was thinking some sort of sub pixel accuracy conversion needs to occur. ... thought: "Mini-x86"... - comp.lang.asm.x86... the flags will be assumed to hold ... solaris 10 64 bit on hp a1540n ... ideas and thoughts - comp.unix ...thought: "Mini-x86"... - comp.lang.asm.x86... some recently: considering ideas ... in 16-bit PM, 32-bit PM, and 64-bit PM? This is probably something you haven't thought ... FAQ -- assembly-language/x86/general/part1 - comp.lang.asm.x86 ...> Assembly language needs a wiki, but I'd prefer an x86 one. Just thinking > out loud as I go through this FAQ.) > I *am* leaning toward a smaller "Welcome" style mini ... Floating point problem (again) - comp.lang.asm.x86... double. float operations are not faster than double on the x86 chip. The only exception I can think of ... and executing these loops on a variety of supercomputers, mini ... Safari: show URL when mouse moves over a link? - comp.sys.mac.apps ...I've been using Firefox for a while, but I thought I'd give ... like TF Cavalier - comp.fonts... the scrolling mini ... FAQ -- assembly-language/x86/general/part1 - comp.lang ... Too many arguments in call to 'shape' - comp.lang.fortran ...... dummy arguments and so on fixed up, and in this mini ... string lengths, while it uses C_SIZE_T (I think) to ... x86_64-w64-mingw32/4.6.0/ lto-wrapper.exe Target: x86 ... How to get a .lib from .dll - comp.lang.asm.x86... too many people saw URLs end in ".com" and wrongly thought ... and a "permanently in fog" London in a magical Mini car ... How to get a .lib from .dll - comp.lang.asm.x86 Hello ... Best Assembler for 8-bit Apples? - comp.sys.apple2.programmer ...Up until now, I've been using the Apple Monitor and Mini ... count leading zero - comp.lang.asm.x86 Best Assembler ... loops in BASIC - comp.sys.apple2.programmer I don't think ... how long for a rebuild? (IBM ServeRAID) - comp.periphs.scsi ...... are all new, and I pressed CTRL + i to launch the mini ... You can configure ... I m trying to get Solaris 10 x86 ... ... comp.lang.fortran From the comments of all of you, I think ... Moveable containers on a web page like iGoogle? - comp.lang.java ...... portletfunctionality like security, but I don't think there is much boilerplate code for making quick widgets/mini ... input & output in assembly - comp.lang.asm.x86... as ... if A&&B else optimization and profile question - comp.soft-sys ...... if A && B do else do another thing I am thinking of ... encountering the follwoing problem now (which is a mini ... fpu code optimisation request - comp.lang.asm.x86 ... NMEA ref.clock better than my ISP's timeserver? - comp.protocols ...Running ntp4.2.5p175-win-x86-bin configured with a ... I think much of that is a) ntp's > handling of rate ... Garmin GPS18X-LVC PC Mini-ITX ME6000 600MHz C3 NetBSD-5 ... MC: All pairs of factors of an integer - comp.sys.hp48Mini-challenge for RPL enthusiasts by Joe Horn It's ... Virgil, great job - I think you won. My program is over ... How to perform division ? - comp.lang.asm.x86 I do not ... Re: Link error, module machine type conflicts with target machine ...Re: thought: "Mini-x86"..... not implement the entire x86 ISA to be able to use the compiler output).... from my interpreter effort that eflags is a hassle. Running z/OS on x86: Should IBM release a mini-mainframe?The idea is to create "mini-mainframes," or z/OS running on x86, aimed at a departmental or office ... What did you think of this feature? Write to SearchDataCenter.com's ... 7/17/2012 9:16:46 PM
|