[stci] Re: Architecture

  • From: Richard Graham <rlgraham@xxxxxxxx>
  • To: stci@xxxxxxxxxxxxx
  • Date: Wed, 26 Sep 2007 11:09:49 -0400

Totally agree


On 9/26/07 10:46 AM, "Greg Watson" <g.watson@xxxxxxxxxxxx> wrote:

> I think infrastructure control and user data streams should be
> logically distinct in the design. The former exists for the life of
> the infrastructure, while the latter exist only for the duration of a
> session. However, the design should allow an implementation to
> provide these logical channels in the optimal manner for the
> platform, which may include shared resources. Does this address both
> Steve and Graham's issues?
> 
> Greg
> 
> On Sep 26, 2007, at 10:18 AM, Richard Graham wrote:
> 
>> I am not sure that we need a separate and persistent control network.
>> What we do need is a way to satisfy QoS requirements.  This may mean
>> that we need to use two separate sets of h/w and s/w resources, but
>> there are also cases where this is not possible.  What we need is a
>> design that will accommodate either case, w/o the user having to know
>> or worry about such details.  There are already well-defined protocols
>> for distinct data flows using the same set of resources, and we should
>> feel free to use those ideas.  For example, with TCP and MPI
>> communications (not an apples-to-apples comparison, I know), distinct
>> flows can share the same resources, and the user rarely even thinks
>> about this.
>> 
>> Rich
>> 
>> 
>> On 9/26/07 9:54 AM, "Steve Cooper" <coopers@xxxxxxxxxx> wrote:
>> 
>>>> 
>>>>> 4. There is a separate control network for communication with
>>>>> infrastructure components.
>>>> 
>>>> Yes, I agree.  The stci components would have to have a way to
>>>> communicate with each other out-of-band.
>>>> 
>>>> I think this makes sense, I'll combine the diagrams.
>>> 
>>> This notion of a separate (and persistent) control network was what
>>> I was getting at in my email on September 19th. See below.
>>> 
>>> Steve
>>> 
>>> 
>>> 
>>> The STCI is a meta-tool whose purpose is to facilitate the deployment
>>> of, and utilization by, other tools.
>>> 
>>> If we think of the STCI as a tool, then the most effective way to
>>> deploy the STCI tool would be to use the STCI; however, you can't use
>>> the STCI until it's deployed. We have a classic chicken-and-egg
>>> problem.
>>> 
>>> The first state in the Session Lifecycle diagram says 'Stage agent,
>>> plug-in, etc..'. Doesn't this assume a pre-existing communication
>>> infrastructure? We need to specify what this pre-existing
>>> infrastructure looks like, when it is deployed, and how tool-specific
>>> utilization takes place.
>>> 
>>> The STCI needs to have the following characteristics to fulfill its
>>> role as the tool to deploy and configure the external tools.
>>>       One instance sharable by all tools
>>>       Facilitates the scalable deployment of other tools
>>>          A 'dedicated' stream that can be utilized in configuring
>>>          tool-specific infrastructure and components.
>>>       Scalable start-up
>>>          Hardcoded interconnect (topology) established during STCI
>>>          install/setup/configure.
>>>          Interconnects all nodes within the administration domain.
>>>       Supports infrastructure staging
>>>          Tool-specific topology setup.
>>>          Tool-specific components (agents, plug-ins, etc.).
>>>       Supports data staging
>>>          Includes libraries and executables.
>>>          On-demand
>>>          Caching
>>>       Interconnect that is intrinsic to the STCI tool
>>>          Includes STCI plug-ins and agents to support efficient
>>>          external tool configuration.
>>>          Looks and operates a lot like a normal external tool
>>>          stream/session.
>>> 
>>> Does this make sense to you?
>>> 
>>> How does this relate to the STCI Deployment Lifecycle state diagram?
>>> 
>>> Install
>>>       Load all the STCI software onto the HPC head node.
>>>       Run the configuration scripts.
>>> Configure
>>>       Enumerate the nodes inside the administration domain to be
>>>       included in the STCI.
>>>       Establish a topology to be used to interconnect said nodes.
>>>       Identify the job scheduler.
>>>       Establish the session manager.
>>>       Configure the nodes.
>>>          Distribute the STCI software, including STCI plug-ins and
>>>          STCI agents.
>>>          Distribute the configuration instructions (who connects to
>>>          whom).
>>>       Start up the STCI.
>>> Enabled
>>>       Bootstrap the STCI tool interconnect.
>>>       Enable the session manager.
>>>       Start listening for STCI requests from external tools.
>>> Disabled
>>>       Disable the session manager.
>>>       Tear down the STCI tool interconnect.
>>> 
>>> 
>>> Steve
>>> 
>>> 
>>> 
>> 
>> 
>> 
> 
> 
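
The idea in the quoted text of keeping infrastructure control and user data streams logically distinct, while letting them share resources where the platform requires it, might be sketched as below. This is only an illustration under stated assumptions: `SharedTransport`, `ChannelMux`, the `CONTROL` channel id, and the "control frames first" delivery rule are all hypothetical names and choices, not part of any STCI design.

```python
from collections import deque
from dataclasses import dataclass, field

CONTROL = 0  # hypothetical channel id reserved for infrastructure control


@dataclass
class SharedTransport:
    """Stand-in for one shared link; frames are (channel_id, payload) tuples."""
    frames: deque = field(default_factory=deque)

    def send(self, channel_id, payload):
        self.frames.append((channel_id, payload))


class ChannelMux:
    """Logical channels multiplexed over a single shared transport."""

    def __init__(self, transport):
        self.transport = transport
        self.open_channels = {CONTROL}   # control exists from the start
        self.inboxes = {CONTROL: []}
        self._next_id = 1

    def open_session(self):
        """Open a user data channel that lasts only for one session."""
        cid = self._next_id
        self._next_id += 1
        self.open_channels.add(cid)
        self.inboxes[cid] = []
        return cid

    def close_session(self, cid):
        if cid == CONTROL:
            raise ValueError("control channel lives as long as the infrastructure")
        self.open_channels.discard(cid)
        self.inboxes.pop(cid, None)

    def send(self, cid, payload):
        if cid not in self.open_channels:
            raise KeyError(f"channel {cid} is not open")
        self.transport.send(cid, payload)

    def deliver(self):
        """Drain the transport, control frames first (a crude QoS stand-in)."""
        pending = list(self.transport.frames)
        self.transport.frames.clear()
        # Stable sort: control frames move to the front, order otherwise kept.
        pending.sort(key=lambda frame: frame[0] != CONTROL)
        for cid, payload in pending:
            if cid in self.inboxes:
                self.inboxes[cid].append(payload)
```

A caller only ever sees channel ids; whether the channels share one link (as here) or map onto separate h/w and s/w resources would be an implementation detail hidden behind the same interface.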

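The deployment lifecycle listed in the quoted text (Install, Configure, Enabled, Disabled) can be read as a small state machine. The sketch below is a hedged illustration: the four state names come from the thread, but the allowed transitions, in particular re-enabling or reconfiguring from Disabled, are assumptions, as are the `Deployment` class and its method names.

```python
from enum import Enum


class State(Enum):
    INSTALL = "Install"
    CONFIGURE = "Configure"
    ENABLED = "Enabled"
    DISABLED = "Disabled"


# Assumed transitions; the thread names the states but not the edges.
ALLOWED = {
    State.INSTALL: {State.CONFIGURE},
    State.CONFIGURE: {State.ENABLED},
    State.ENABLED: {State.DISABLED},
    State.DISABLED: {State.ENABLED, State.CONFIGURE},
}


class Deployment:
    """Tracks which lifecycle state a (hypothetical) STCI instance is in."""

    def __init__(self):
        self.state = State.INSTALL
        self.history = [State.INSTALL]

    def advance(self, target):
        """Move to `target`, rejecting transitions not listed in ALLOWED."""
        if target not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {target.value}"
            )
        self.state = target
        self.history.append(target)

    def accepts_tool_requests(self):
        """Only an Enabled instance listens for requests from external tools."""
        return self.state is State.ENABLED
```

For example, a fresh `Deployment()` would have to pass through `State.CONFIGURE` before `advance(State.ENABLED)` succeeds, which mirrors the point that external tools cannot use the infrastructure until it has been deployed.
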
