Demonstration of Continuation based C
on GCC

Shinji KONO

University of the Ryukyus

Motivation of Continuation on GCC

   Architecture independent Assembler
Controlling Stack
Continuation
Programming Unit
Progress from Presented one in Continuation Festa 2008

Idea

    Eliminate function call from C
goto a code segment with parameter

       __code f() { goto g() ; }
__code f(__code (*g)()) { goto g() ; }


Co-exists with normal C


    int main1(int n)
{
goto carg1(0,1,2,3,4,
exit___code=__return,
exit_env=__environment);
return n;
}
goto (*exit1)(0,env);

Return to the caller of main1 (return-continuation)

Good point of code segment

 
Stateless and depends on Input and output only
if we have no global variable
functional in some sense
but it is not a function
no return
splitting code segment is easy
C compatible
any function call in code segment

2 Implementations

Micro C base

    full implementation of C
MIPS, PPC, IA32, Intel64, ARM
GCC base Implementation

    gcc version 4.4,4.5,4.6 http://hg.sourceforge.jp/view/cbc/GCC/

Implementation detail

Force fixed stack size
 
Arguments copy

    to ensure TCE
removed by optimization
Fast call option

    non standard ABI of arguments
conventional one is very slow (especially in i386)
so gcc does inter-functional optimization
less arguments on stack

Implementation of return-continuation

        __return generates something like...

        ({
__label__ _cbc_exit0;
int retval;
void _cbc_internal_return(int retval_, void *_envp){
retval = retval_;
goto _cbc_exit0;
}
if (0) {
_cbc_exit0:
return retval;
}
_cbc_internal_return;
});

Applications

 
Simulation and Model checking

    Scheduling code segment
Enumerate all possible execution
Kernel API description

    read write suspension select socket communication
Game Program

grep

grep

 
describing state machine
    compiling CbC from regular expression
quite fast (My student said it is the world fastest grep)

__code state_2_3_9_13_23_24_25(char* s) {
        switch(*s++) {
case 'y': goto state_14_15(s); break;
case 'H': goto state_10_11(s); break;
case 'e': goto state_4_5(s); break;
case '\0': goto accept(); break;
default: goto reject();
}
}

It works as I expected

 
Happy?

    So far so good, but easy implementation has its own...

Problems

    quite difficult to write
definitely not for human
gcc language issue
recursive type
fast call
code quality
undebuggable (no stack trace)
type conflict on reconnection

GCC type system

    no recursive type on C
some arguments have to be omitted

    __code
a4(int i,__code conv())
{
goto (*conv)(i+1,a6);
}
__code
a6(int i,int j,int k,__code conv())
{
goto conv(i+1,j,k,a8);
}

Argument and prototype

    We have automatic prototype generator, but
we have to write a lot of function types

    __code (*conv)(int,__code (*)());
__code a2(int i,__code conv());
__code a3(int i,__code conv());
__code a4(int i,__code conv());
__code a5(int i,__code conv());
__code a6();
__code a7();
__code a8();
__code a9();

My compiler is faster than gcc

 
What are you doing, GCC?
copy remains
bad argument ABI
some architecture prohibit TCE (Sparc, PowerPC)
Bug on indirect jump

My compiler is faster than gcc (mc code)


    ## 1702 ## 15:  result *= n;
movq %r15,%rsi
imull %esi,%r14d
## 1703 ## 16: n--;
addl $-1,%r15d
## 1704 ## 17: goto factorial(n,result,orig,print,exit1,exit1env);
jmp _factorial
....
## 1700 ## 13: goto (*print)(n,result,orig,print,exit1,exit1env);
movq %r12,%rdi
jmp *%rdi

My compiler is faster than gcc (gcc code)


    L12:    imull   %edi, %eax
testl %esi, %esi
.p2align 4,,6
je L5
movl %esi, %edi
L11: leal -1(%rdi), %esi
cmpl $-1, %esi
jne L12
...
L5: movl %eax, %esi
movq %r10, %rcx
xorl %edi, %edi
xorl %eax, %eax
addq $24, %rsp
LCFI7:
jmp *%r10

Implementation difficulty

 
curses on gcc implementer

        drastic change on each version  hg graph 
Gimple, SAA, RTL
buggy (4.4, 4.5)
not enough information on non optimization mode -O2 is necessary
gcc easyly give up on TCE (Tail call Elimination)
They hate TCE?
Fastcall is eliminated on 4.6!

Return-continuation difficulty

    handling environment
thread safe
trampoline
some architecture does not support
gcc dependent
setjmp?
slow but acceptable?

undebuggable

    no stack trace
inserting debug segment
It is not C++

    industry people said so
CbC++?

impossibility of reconnection

    type miss match
fix argument type on applications


New direction

 
data segment
new syntax
new implementation

Data segment

    Dual of code segment
code segment accepts data segments only
no type conflict on reconnection
Describing memory allocation
Data segment is typed

Pipeline Execution

    Pipeline 

new syntax

    D
go
completely new something
typedefrec on C
or no syntax

    code data structure only

new implementation

    LLVM
hand made JIT
gcc
Java

Thank you

Come to JSSST 2011 on Okinawa! 9/26-9/29


CbC libc/crt0

    continuation base system call
who create it?
should be gcc built-in

    no so popular
TCE forcing may be useful

Simulation and Model checking

 
code segment scheduler
state enumerator, state memory database

Simulation and Model checking

 
First, make single process code segments.


Simulation and Model checking

 
Second, put scheduler among them.


Simulation and Model checking

 
Enumerate all states and keep it.


Game Program

 

Shooting Game on PS2

Code Segment as movement and collision

Intended to parallel execution in PS3

We have task manager in PS3 Cell Many Core, but
it is not written in CbC, but in C++.

Programming Unit

    Smaller programming unit than function
Larger programming unit than statement
As Implementation language
no more machine code generation

Co-exists with normal C

 
If you want to back here, use normal function.


int main( int ac, char *av[])
{
    int n;
n = main1(123);
printf("#0089:321=%d\n",n);
return n;
}

Translation from C

    We know how to do, but
c2cbc is not yet written
Translation results is rather large
We cannot write large code segments by hands

Concurrent Execution

    code segment is atomic
no other way
if code segments are executed in parallel,
atomicity should be maintained by code itself
do it in manually
no hidden type less stack