- How do you recommend getting started, to write my first high performance runtime system using proto-runtime?
Good question. A high performance runtime system that interoperates with the runtimes of other languages within the same application is a complicated thing by its very nature. The proto-runtime modularization is subject to this inherent complexity. I have done what I can to make things clean and give guidance, but it's still a complex undertaking. The good news is that, despite the complexity, the learning curve takes only a few days! Those will be an intense few days of wrapping your head around the concepts, but in the end, the volume of material is small.
The first thing to do is read the TACO paper about proto-runtime. It gives the concepts and code fragments for each part of implementing your own language's behavior.
The second thing to do is copy one of the existing language runtimes that is close to what you want to write. Get it working on the provided test case, then modify it piece by piece, testing as you go. The best way to modify it is to simply leave the original in place and add new constructs. Copy the wrapper library and request handler of an existing construct, then modify the code, line by line, into what you want. The comments should help explain things as you go. The paper should provide guidance on the code as well.
To make things extra simple, start by adding some simple constructs, such as mutex acquire and release. This will simply get your feet wet, raising questions about various aspects of the system, which will be easier to learn about when implementing something simple.
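To give a feel for the logic involved, here's a minimal sketch of what the request handlers for mutex acquire and release might do. Every name in it (SketchVP, SketchMutex, sketch_make_ready, and so on) is made up for illustration. It is not proto-runtime's actual API, and the real handlers have a different signature. It assumes the requesting VP is already suspended when its handler runs, so each handler must decide which VPs to make ready again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real proto-runtime types and names differ. */
typedef struct SketchVP {
    int id;
    int isReady;             /* 1 once the VP has been made ready to resume */
    struct SketchVP *next;   /* link used while queued on a mutex */
} SketchVP;

typedef struct {
    SketchVP *owner;         /* NULL when the mutex is free */
    SketchVP *waitHead;      /* FIFO queue of VPs waiting to acquire */
    SketchVP *waitTail;
} SketchMutex;

/* Stand-in for the runtime primitive that makes a suspended VP ready. */
static void sketch_make_ready(SketchVP *vp) { vp->isReady = 1; }

/* Handler for "acquire": the requester is suspended when this runs. */
static void sketch_handle_acquire(SketchMutex *m, SketchVP *requester)
{
    requester->isReady = 0;
    requester->next = NULL;
    if (m->owner == NULL) {
        /* Mutex is free: grant it and let the requester continue. */
        m->owner = requester;
        sketch_make_ready(requester);
    } else {
        /* Mutex is held: leave the requester suspended, in the queue. */
        if (m->waitTail) m->waitTail->next = requester;
        else             m->waitHead = requester;
        m->waitTail = requester;
    }
}

/* Handler for "release": hands the mutex to the oldest waiter, if any. */
static void sketch_handle_release(SketchMutex *m, SketchVP *requester)
{
    assert(m->owner == requester);   /* only the owner may release */
    SketchVP *waiter = m->waitHead;
    if (waiter) {
        m->waitHead = waiter->next;
        if (m->waitHead == NULL) m->waitTail = NULL;
        waiter->next = NULL;
        m->owner = waiter;
        sketch_make_ready(waiter);   /* the waiter now holds the mutex */
    } else {
        m->owner = NULL;
    }
    sketch_make_ready(requester);    /* the releaser always continues */
}
```

Notice that every path through a handler has to account for the requesting VP: either it is made ready, or it is deliberately parked somewhere that a later request will find it. Leaving out one of the sketch_make_ready calls is exactly the "forget to resume it" kind of bug.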
Next, try breaking your runtime in specific places: resume a VP twice, forget to resume it, or use the wrong logic inside the request handler. To investigate each failure, connect your language's shared library to the sequential debug version of the proto-runtime instance library. This gives you repeatable scheduling, so you can single-step through the code and see the same behavior on every run. Also, check out the wiki page on debugging.
With that under your belt, it should be much more straightforward to implement the functionality of your own constructs.
- Why do I have to read that long, complicated paper? Why can't I just read a simple man page and start implementing the parallel behavior?
A runtime system is complicated by its fundamental nature. The learning curve is a tradeoff. It does, indeed, require some head-smoke to wrap one's thinking around this stuff, but if you can handle it, it cuts months of effort down to days, and provides services and portability. It's not as simple as you might expect or demand, but try implementing one of these high performance runtimes from scratch, then come back and read the paper again. It won't seem so complicated after all!
- Why do you recommend copying an existing language runtime and modifying it? I want to just roll my own, get my hands dirty, figure it all out down to the bare metal!
Nice initiative, I admire that desire. I'm the same way, actually. These FAQs might be a bit of help, but you will experience an exceptionally complicated thing, that will leave you feeling bewildered and frustrated and hating me for not making it easier for you, and not providing you detailed explanations of each aspect of the workings of the system. I've done what I can to help people who are willing to follow the suggested path of copying and incrementally modifying. Unfortunately, I have too few resources to make it easy for the ambitious who want to do things their own way.
- Where is the list of all API calls?
- Why can't I just do things that make sense?
It turns out that the internals of any high performance runtime system are very low level and complex (simplifying loses performance, portability, and interoperability). Certain patterns have been built into the decomposition into modules, in order to make plugins possible. It is not immediately obvious why a given pattern within the code has to be followed, or even clear that there is a pattern underlying the way the code has been organized! At first glance, many patterns don't make sense, seem clunky, or look stupid. "Why can't I just do what looks simpler and straightforward?" Because the complexity makes the implementation full of traps for those who try to do things their own way. That is unfortunate, but inescapable given the desire to be modular, and to have multiple languages interoperate, all while being high performance.
As one example of such a pattern, the language has to provide a "resume" function. BUT, request handlers cannot call it directly! Instead, it has to be registered with proto-runtime, and then when a request handler wants to resume a VP, it calls the proto-runtime primitive. The PR primitive in turn calls the registered "resume" function.
Why? Because internally, multiple runtimes from multiple languages share the same proto-runtime, and they each have their own Assigner function. In addition, proto-runtime offers the feature of creating an Assigner overlord, which is used instead of the individual language assigners. The individual languages don't know whether the overlord is there or not. Only proto-runtime internally knows this. When an overlord is present, proto-runtime switches out the language-supplied resume functions and replaces them with an overlord resume.
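The indirection can be pictured with a small sketch. This is not the real proto-runtime code: SketchRuntime, sketch_register_resume, sketch_install_overlord, and sketch_make_slave_ready are names invented here, and the actual signatures will differ. It only shows why a request handler that calls its own resume function directly would bypass an overlord it cannot see.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. */
typedef struct { int id; } SketchVP;
typedef void (*SketchResumeFn)(SketchVP *vp, void *langEnv);

typedef struct {
    SketchResumeFn langResume;      /* registered by the language in its init */
    SketchResumeFn overlordResume;  /* NULL unless the application set an overlord */
    void *langEnv;
} SketchRuntime;

/* Called once by the language's init code. */
static void sketch_register_resume(SketchRuntime *rt, SketchResumeFn fn, void *langEnv)
{
    rt->langResume = fn;
    rt->langEnv = langEnv;
}

/* Called by the application, without the language's knowledge. */
static void sketch_install_overlord(SketchRuntime *rt, SketchResumeFn fn)
{
    rt->overlordResume = fn;
}

/* The primitive a request handler calls instead of its own resume. */
static void sketch_make_slave_ready(SketchRuntime *rt, SketchVP *vp)
{
    assert(rt->langResume != NULL);  /* the language must have registered one */
    SketchResumeFn fn = rt->overlordResume ? rt->overlordResume : rt->langResume;
    fn(vp, rt->langEnv);
}

/* Example resume functions that just count their invocations. */
static int langResumeCalls = 0;
static int overlordResumeCalls = 0;

static void my_lang_resume(SketchVP *vp, void *langEnv)
{
    (void)vp; (void)langEnv;
    langResumeCalls++;
}

static void my_overlord_resume(SketchVP *vp, void *langEnv)
{
    (void)vp; (void)langEnv;
    overlordResumeCalls++;
}
```

With only my_lang_resume registered, sketch_make_slave_ready ends up in the language's function. Once my_overlord_resume is installed, the same call from the same handler lands in the overlord instead, and the handler code never changes.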
There are many "hidden" complexities like this, and most implementers lack the patience to learn them all before they start writing code! It's also a lot of work explaining them all, and keeping the explanations up to date. Therefore, the easiest way to get started is to copy an existing language that is close to the one you want, then start adding new constructs by copying the wrapper libraries and request handlers of existing constructs and modifying them. This way, you start with a working system and incrementally modify it, checking each time that it continues to work. What forces this style of development is the underlying, fundamental complexity of high performance parallel runtime systems interoperating. It's not a matter of design choices. At some point, tools can be provided that automate such patterns and simplify development.
- Is there some special order to things in the code? In some places I see things that seem like they have to be done in a particular order. Will I break things if I get the order wrong?
Good question! I have no good answer. It's best to copy an existing language that is close to the one you want to create, get it working on the provided test cases, then modify it piece by piece, testing as you go. This will tell you if there was some subtle pattern in the code that you've inadvertently changed.
- For the data structures I create for languages, are there required fields in them that PR expects to be there? For example, in my langEnv or my request structure?
In general, the language-defined data structures are free-form. However, there are some subtleties. For example, the request handlers have a fixed signature, so the code has to accept a void * and then cast it to the language-specific type. See "Is there some special order to things in the code?"
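As a rough illustration of the void * pattern, here's a sketch. The handler signature shown (SketchReqHandler) is an assumption made for this example, as are MyLangReq, MyLangEnv, and my_handle_add. Check an existing language's handlers for the real signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the VP type. */
typedef struct { int id; } SketchVP;

/* Assumed shape of a fixed handler signature: the runtime only sees void *. */
typedef void (*SketchReqHandler)(void *req, SketchVP *requestingVP, void *langEnv);

/* Language-defined structures: the language is free to lay these out
 * however it likes, because the runtime never looks inside them. */
typedef struct { int amount; } MyLangReq;
typedef struct { int total;  } MyLangEnv;

/* A handler matching the fixed signature. Its first job is to cast the
 * opaque pointers back to the language's own types. */
static void my_handle_add(void *req, SketchVP *requestingVP, void *langEnv)
{
    assert(req != NULL && langEnv != NULL);
    MyLangReq *myReq = (MyLangReq *)req;
    MyLangEnv *myEnv = (MyLangEnv *)langEnv;
    (void)requestingVP;              /* unused in this tiny example */
    myEnv->total += myReq->amount;
}
```

Because the handler is stored and invoked through the generic function-pointer type, the compiler cannot check that the request really is a MyLangReq. Getting the cast wrong compiles cleanly and fails at run time, which is one more reason to modify a working handler rather than write one cold.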
- What is langData?
It is a hook, by which a language can attach things to a VP. Not all languages need this. Assume yours doesn't, at first, then it will become apparent if you hit some functionality that does need it.
- What is dataRetFromReq?
Sometimes a request handler needs to send data back to the wrapper library. It does so by placing it into this field of the VP data struct.
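A sketch of the round trip might look like the following. Only the field name dataRetFromReq comes from proto-runtime. The struct around it, the LookupReq type, and both functions are invented for this example, and the handler is called directly where a real wrapper would send the request and suspend.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical VP struct; only the dataRetFromReq name is from the FAQ. */
typedef struct {
    int   id;
    void *dataRetFromReq;   /* handler places its return data here */
} SketchVP;

typedef struct { int key; } LookupReq;

/* Some state owned by the runtime side of the language. */
static int sketch_table[4] = {10, 20, 30, 40};

/* Request handler side: runs inside the runtime, fills in the field. */
static void sketch_handle_lookup(void *req, SketchVP *requestingVP)
{
    LookupReq *lookup = (LookupReq *)req;
    assert(lookup->key >= 0 && lookup->key < 4);
    requestingVP->dataRetFromReq = &sketch_table[lookup->key];
}

/* Wrapper library side: runs in the application VP. */
static int sketch_lookup(SketchVP *animatingVP, int key)
{
    LookupReq req = { key };
    /* In a real runtime the request would be sent here and the VP would
     * suspend; the handler is called directly as a stand-in. */
    sketch_handle_lookup(&req, animatingVP);
    /* After the VP resumes, the wrapper reads what the handler left. */
    return *(int *)animatingVP->dataRetFromReq;
}
```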
- What are all those functions I have to register in the init?
The proto-runtime instance (AKA the core controller) needs to perform operations that involve language details, such as creating a langData (see "What is langData?"), which contains language-specific data, inside a VP data struct, or making a VP or task ready to be assigned. The proto-runtime internals are responsible for these things, but need help from the language on the exact details. So, the language implementor registers "helper" functions that proto-runtime invokes as it needs them.
- Why can't I call my language's own "resume" handler?
See "Where should I call my language's resume function?" and "Why can't I just do things that make sense?"
- Where should I call my language's resume function?
Nowhere, ever. Instead, register it in the init. Then, call PR_PI__make_slave_ready, which will in turn call the resume function (unless an override assigner has been defined for the application). For the reason, see "Why can't I just do things that make sense?"
- What is the distinction between PR_WL__ and PR_PI__ functions?
WL is for the wrapper library. PI is for the plugin, meaning the request handlers and assigner (the paper defines the request handlers plus the assigner as the plugin).
The "PI" functions will break things if used in the wrapper library, and vice versa! It's a matter of synchronization and protection from shared access. The WL version is created to run in the application VP context, and either never operates on runtime internal shared data, or else includes its own separate protection. The PI version assumes that it is running inside the proto-runtime context, and that the instance (core controller) is already providing protection of shared internal runtime data against data races.
- Why can't I use a "PR_PI__" function in a wrapper library or application code?
Because wrapper library code and application code run outside the proto-runtime context. PR_PI__ functions assume that the proto-runtime context is currently running on the CPU, and that the proto-runtime instance is providing protection of shared internal runtime data. See "What is the distinction between PR_WL__ and PR_PI__ functions?"
- Why can't I call a top-level "PR_Main__" function from inside the seed VP or other application code that runs inside a VP?
The PR_Main__ functions use native OS constructs, such as pthreads mutexes, to interface to the proto-runtime system. They are written to be run inside the OS's threads, from outside proto-runtime. So, calling them from inside the seed VP or an application VP breaks things! For example, "PR_Main__wait_for_activity_to_end" uses pthreads constructs to suspend the "main()" OS thread, and expects to be signalled by the proto-runtime system when the system shuts down. If it's used inside the seed VP, it breaks! The seed VP has to end before proto-runtime can shut down and send the pthreads signal that unblocks the "PR_Main__wait.." function, but the seed VP would itself be stuck waiting for that signal.
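For readers less familiar with pthreads, here's a minimal sketch of that kind of wait, using a standard mutex plus condition variable. The pthreads calls are real. Everything else (SketchProcess and the sketch_ functions) is invented for illustration, and I'm not claiming this is how PR_Main__wait_for_activity_to_end is actually written.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical process record shared by main() and the runtime. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ended;
    int             hasEnded;
} SketchProcess;

static void sketch_process_init(SketchProcess *p)
{
    int rc = pthread_mutex_init(&p->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&p->ended, NULL);
    assert(rc == 0);
    (void)rc;
    p->hasEnded = 0;
}

/* Called from the main() OS thread: blocks that OS thread until the
 * runtime reports shutdown. */
static void sketch_wait_for_activity_to_end(SketchProcess *p)
{
    pthread_mutex_lock(&p->lock);
    while (!p->hasEnded)                      /* loop guards against spurious wakeups */
        pthread_cond_wait(&p->ended, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Called by the runtime once the seed VP and all other work have ended. */
static void sketch_signal_shutdown(SketchProcess *p)
{
    pthread_mutex_lock(&p->lock);
    p->hasEnded = 1;
    pthread_cond_broadcast(&p->ended);
    pthread_mutex_unlock(&p->lock);
}

/* Stand-in for the runtime's own OS thread. */
static void *sketch_runtime_thread(void *arg)
{
    sketch_signal_shutdown((SketchProcess *)arg);
    return NULL;
}
```

The waiting side blocks a whole OS thread. If that OS thread is the one animating the seed VP, nothing is left to finish the seed VP, so the shutdown signal can never be sent.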
- Why do I have to call endVP or endTask?
The runtime needs to be informed when a VP or task completes, because it performs internal bookkeeping at that point. If these aren't called, the runtime never knows that the VP or task has ended, the bookkeeping is messed up, and things break.
- At the end of my seed function, should I use PR__end_seed_VP or PR__end_process_from_inside?
The proto-runtime instance has a way of detecting when no more work is possible: it counts the live VPs and live tasks. So, it is acceptable to end the seed VP before all the work of the process is complete. Proto-runtime will not end the process until all possibility of work has ended. Or, if you are certain that all work is done, you can use PR__end_process_from_inside to force the process to shut down. In fact, end_process can be called from an application VP, anywhere in the code, and it will force shutdown. However, it will not interrupt a long-running work unit that has already started. It simply cancels all pending work, and shuts down as soon as the work currently in progress ends.
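The counting idea can be sketched in a few lines. All names here (SketchProcess, sketch_vp_created, and so on) are hypothetical, and the real bookkeeping is more involved. In particular, this sketch does not model letting in-progress work finish after a forced end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-process bookkeeping. */
typedef struct {
    int  liveVPs;
    int  liveTasks;
    bool forcedEnd;   /* set by an end_process-style call */
} SketchProcess;

static void sketch_vp_created(SketchProcess *p)   { p->liveVPs++; }
static void sketch_task_created(SketchProcess *p) { p->liveTasks++; }

/* What an endVP-style call would trigger inside the runtime. */
static void sketch_vp_ended(SketchProcess *p)
{
    assert(p->liveVPs > 0);
    p->liveVPs--;
}

/* What an endTask-style call would trigger inside the runtime. */
static void sketch_task_ended(SketchProcess *p)
{
    assert(p->liveTasks > 0);
    p->liveTasks--;
}

/* What an end_process-style call would trigger. */
static void sketch_force_end(SketchProcess *p) { p->forcedEnd = true; }

/* The process may end when nothing live remains, or when forced. */
static bool sketch_process_should_end(const SketchProcess *p)
{
    return p->forcedEnd || (p->liveVPs == 0 && p->liveTasks == 0);
}
```

If the seed VP creates a child VP and then ends, the count drops from two to one and the process stays alive until the child also ends. If a VP finishes without its end call, the count never reaches zero and the process never shuts down on its own.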