WebTP Application Programming Interface (API)
1. Overview
This section gives an overview of the transport services provided by WebTP, with special emphasis on the semantics of reliability, duplicate detection, and in-order delivery.
Connections & Pipes
Before transferring any data using WebTP, an application must first open a connection to the remote application. Unlike a TCP connection, a WebTP connection is a lightweight object whose only purpose is to give the application a handle to the underlying transparent communication channel. The communication channel between two IP hosts is called a pipe, which is an abstraction of the network path between the two hosts. From the application programmer's point of view, only connections are visible; pipes are internal to WebTP implementations. Connections and pipes are introduced in order to partition the transport protocol into application-support functions and network control functions. Application-support functions such as automatic retransmission are implemented at the connection level; network control functions such as congestion control are implemented at the pipe level. As a result of this separation, different connections sharing the same network path between two end hosts can cooperate to adapt to congestion and probe for available bandwidth, instead of competing with each other. The major advantage of separating the transport into two layers is that it puts the application in firm control of how best to use the bandwidth, while integrating and reusing congestion control across connections.
Reliability
WebTP is aware of ADU boundaries and supports ADU-level reliability. The reliability requirement of each ADU within a single connection can be specified individually. When the sender-side application requests transmission of ADUs, it indicates whether each ADU is reliable or unreliable. A reliable ADU is guaranteed to be delivered from the sender to the receiver without error, which necessarily involves retransmitting any lost packets of that ADU. Lost packets of an unreliable ADU are not retransmitted. At the receiver side, if all packets of an unreliable ADU are received correctly, they are assembled into the ADU and delivered. If the receiver decides that some packets of the unreliable ADU are lost, the complete ADU is dropped and the transport notifies the application that an unreliable ADU has been dropped. Because a single packet loss causes the entire unreliable ADU to be dropped, the application should size its ADUs so that each fits into no more than a few transport packets.
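The sizing advice above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of WebTP: the function names are hypothetical, the per-packet payload size is whatever the application assumes for its path, and the delivery estimate assumes that packets are lost independently, which real networks only approximate.

```c
#include <assert.h>
#include <stddef.h>

/* Number of transport packets needed to carry an ADU of adu_bytes when each
 * packet carries at most payload_bytes of ADU data (an assumed value). */
size_t adu_packet_count(size_t adu_bytes, size_t payload_bytes)
{
    assert(payload_bytes > 0);
    return (adu_bytes + payload_bytes - 1) / payload_bytes;  /* ceiling */
}

/* Estimated probability that an unreliable ADU is delivered, assuming each of
 * its packets is lost independently with probability loss: (1 - loss)^n. */
double adu_delivery_probability(size_t packets, double loss)
{
    double p = 1.0;
    for (size_t i = 0; i < packets; i++)
        p *= 1.0 - loss;
    return p;
}
```

Under this model, with a 2% packet loss rate a 3-packet unreliable ADU arrives intact about 94% of the time, while a 50-packet ADU arrives intact only about 36% of the time.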
Duplicate Detection
There are two types of WebTP connections (and correspondingly sockets): WebTP and WebTPFast. WebTP guarantees that no more than one copy of each ADU is delivered to a receiving socket of the regular type. This is currently achieved by doing a 3-way handshake when a pipe is first opened. Although WebTPFast sockets do not provide such a guarantee, WebTP still tries to detect and minimize the number of duplicates delivered to the application.
Sequencing
WebTP does not provide transport support for sequencing the ADUs in the order they are sent at the sender-side application. When sequencing is necessary, it can be done at the application level with the help of library functions and application level framing.
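As an illustration of application-level sequencing, the sketch below shows one way a receiver might restore sender order. It assumes the application places its own 32-bit sequence number in each ADU; the structure, the function names, and the window size are all hypothetical and are not part of WebTP. A small integer id stands in for the buffered ADU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REORDER_WINDOW 8   /* hypothetical window size, in ADUs */

struct reorder {
    uint32_t next_seq;                 /* next sequence number to release */
    int      present[REORDER_WINDOW];  /* slot holds an ADU awaiting release */
    int      value[REORDER_WINDOW];    /* stand-in for the buffered ADU (an id) */
};

void reorder_init(struct reorder *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof *r);
}

/* Record the arrival of the ADU carrying application sequence number seq.
 * Returns 0 if it was buffered, or -1 if it is old, a duplicate, or too far
 * ahead of the next expected ADU to fit in the window. */
int reorder_insert(struct reorder *r, uint32_t seq, int adu_id)
{
    uint32_t ahead = seq - r->next_seq;   /* modular distance from next_seq */
    uint32_t slot = seq % REORDER_WINDOW;

    if (ahead >= REORDER_WINDOW || r->present[slot])
        return -1;
    r->present[slot] = 1;
    r->value[slot] = adu_id;
    return 0;
}

/* Release the next in-order ADU if it has arrived.
 * Returns 1 and stores its id in *adu_id, or 0 if it is still missing. */
int reorder_pop(struct reorder *r, int *adu_id)
{
    uint32_t slot = r->next_seq % REORDER_WINDOW;

    if (!r->present[slot])
        return 0;
    *adu_id = r->value[slot];
    r->present[slot] = 0;
    r->next_seq++;
    return 1;
}
```

The receiver would call reorder_insert() for every ADU returned by the transport and then call reorder_pop() repeatedly until it returns 0.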
Priority
WebTP supports two priority levels at the granularity of an ADU: normal and high. A high-priority ADU is given scheduling priority at the sender, and is delivered to the application without delay at the receiver side. The details are left to the implementation. However, a WebTP implementation should make sure that an application which always sends high-priority ADUs does not starve other applications sending normal-priority ADUs. Applications should use the high priority level sparingly.
2. Connection Management
The WebTP application programming interface consists of three components: connection management, bandwidth management, and ADU management. Connection management deals with the setup and teardown of application-to-application communication channels. Bandwidth management deals with the sharing of bandwidth among the connections that share the same network-level pipe. ADU management deals with the sending and receiving of ADUs. Because pipes are internal to the WebTP implementation, they do not show up in the API.
When an application wants to communicate with remote hosts, it first opens a socket with the UNIX-style socket() and bind() calls [BSD]. Each socket is in direct correspondence with a WebTP connection.
NAME
socket - create an endpoint for communication
SYNOPSIS
int
socket(int domain, int type, int protocol)
DESCRIPTION
Socket() creates an endpoint for communication and returns a descriptor.
domain parameter:
PF_INET (ARPA Internet Protocols)
type parameter:
SOCK_ADU
SOCK_ADU_NO_DUP
A SOCK_ADU (aka WebTPFast) type socket provides unsequenced ADU-based communications. An ADU can be arbitrarily large; WebTP will fragment an ADU into smaller packets before transmission if necessary. An ADU is either reliable or unreliable. If a packet belonging to a reliable ADU is lost during transmission, it will be retransmitted automatically. A lost packet that belongs to an unreliable ADU will not be retransmitted automatically. Instead, the sender application (and possibly the receiver application) will be notified of the event. Regardless of the reliability option, the order in which ADUs are received is not guaranteed to be the same as the order in which they were sent.
A SOCK_ADU_NO_DUP (aka WebTP) type socket provides the same service as a SOCK_ADU type socket, except that the transport layer guarantees that an ADU is delivered no more than once. For reliable ADUs, this implies an ADU is delivered exactly once. WebTP currently guarantees SOCK_ADU_NO_DUP semantics by doing a 3-way handshake.
protocol parameter:
webtp 100
RETURN VALUES
A -1 is returned if an error occurs, otherwise the return value is a descriptor referencing the socket.
NAME
bind - bind a name (an address) to a socket
SYNOPSIS
int
bind(int s, const struct sockaddr *name, int namelen)
DESCRIPTION
Bind() assigns a name to an unnamed socket. When a socket is created with socket(2) it exists in a name space (address family) but has no name assigned. Bind() requests that name be assigned to the socket.
RETURN VALUES
If the bind is successful, a 0 value is returned. A return value of -1 indicates an error, which is further specified in the global errno.
NAME
connect - initiate a connection on a socket
SYNOPSIS
int
connect(int s, const struct sockaddr *name, int namelen)
DESCRIPTION
s parameter: a valid socket returned by calling socket()
name parameter: the socket address of the server
namelen parameter: size of sockaddr in number of bytes
RETURN VALUES
A 0 return value indicates success; -1 indicates an error.
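The client-side calls described so far can be combined as in the following sketch. It is an illustration under stated assumptions, not a reference implementation: the text names SOCK_ADU and SOCK_ADU_NO_DUP but gives no numeric values, so the values below are placeholders, and the helper names are hypothetical. Only the protocol number 100 comes from the text. On a host without WebTP support, socket() is expected to fail and the helper returns -1.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Placeholder values: the socket type constants are named in the text, but
 * their numeric values are not given, so these are assumptions. */
#define SOCK_ADU         11
#define SOCK_ADU_NO_DUP  12
#define WEBTP_PROTO      100   /* protocol number given in the text */

/* Fill an IPv4 socket address from dotted-quad text.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address. */
int webtp_fill_addr(struct sockaddr_in *sa, const char *ip, uint16_t port)
{
    assert(sa != NULL && ip != NULL);
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}

/* Open a socket of the given WebTP type and connect it to ip:port.
 * Returns the connected descriptor, or -1 on error. */
int webtp_client_open(int type, const char *ip, uint16_t port)
{
    struct sockaddr_in peer;
    int s;

    if (webtp_fill_addr(&peer, ip, port) < 0)
        return -1;
    s = socket(PF_INET, type, WEBTP_PROTO);
    if (s < 0)
        return -1;
    if (connect(s, (struct sockaddr *)&peer, sizeof peer) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

A caller would write, for example, webtp_client_open(SOCK_ADU_NO_DUP, "192.0.2.1", 8080) and check the result against -1 before sending any ADUs.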
NAME
listen - listen for incoming connections
SYNOPSIS
int
listen(int s, int backlog);
DESCRIPTION
To accept connections, an application first calls socket() to get a listening socket, binds it with bind(), and then calls listen() to indicate its willingness to accept incoming connections.
NAME
accept - accept a connection on a socket
SYNOPSIS
int
accept(int s, struct sockaddr *addr, int *addrlen)
DESCRIPTION
s is a socket that has been prepared by calling socket(), bind(), and listen().
addr is filled in with the address of the connecting client.
addrlen points to the length of the data structure addr.
RETURN VALUES
The call returns -1 on error. If it succeeds, it returns a non-negative integer that is a descriptor for the accepted socket.
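The server-side sequence of socket(), bind(), listen(), and accept() might look like the sketch below. The helper names are hypothetical and the value of SOCK_ADU_NO_DUP is a placeholder, since the text does not give one. The synopses above declare the length arguments as int; the sketch uses socklen_t, as current POSIX headers do.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define SOCK_ADU_NO_DUP  12    /* placeholder value, not given in the text */
#define WEBTP_PROTO      100   /* protocol number given in the text */

/* Build the wildcard IPv4 address for a given local port. */
void webtp_any_addr(struct sockaddr_in *sa, uint16_t port)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    sa->sin_addr.s_addr = htonl(INADDR_ANY);
}

/* socket() + bind() + listen(). Returns the listening descriptor or -1. */
int webtp_listen_on(int type, uint16_t port, int backlog)
{
    struct sockaddr_in local;
    int s;

    webtp_any_addr(&local, port);
    s = socket(PF_INET, type, WEBTP_PROTO);
    if (s < 0)
        return -1;
    if (bind(s, (struct sockaddr *)&local, sizeof local) < 0 ||
        listen(s, backlog) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Accept one connection and report the client's address in *peer.
 * Returns the descriptor for the accepted socket, or -1 on error. */
int webtp_accept_one(int listener, struct sockaddr_in *peer)
{
    socklen_t len = sizeof *peer;
    return accept(listener, (struct sockaddr *)peer, &len);
}
```

A server would call webtp_listen_on(SOCK_ADU_NO_DUP, port, backlog) once and then call webtp_accept_one() in a loop, closing each accepted descriptor with close() when done.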
NAME
close - closes a connection.
SYNOPSIS
int
close(int d)
DESCRIPTION
The close() call deletes a descriptor from the per-process object reference table. If this is the last reference to the underlying object, the object will be deactivated. On the last close of a socket, associated naming information and queued data are discarded.
3. Bandwidth Management
NAME
getsockopt, setsockopt - get and set options on sockets
SYNOPSIS
int
getsockopt(int s, int level, int optname, void *optval, int *optlen)
int
setsockopt(int s, int level, int optname, const void *optval, int optlen)
DESCRIPTION
Getsockopt() and setsockopt() manipulate the options associated with a socket. Options may exist at multiple protocol levels; they are always present at the uppermost ``socket'' level. When manipulating socket options the level at which the option resides and the name of the option must be specified. To manipulate options at the socket level, level is specified as SOL_SOCKET. To manipulate options at any other level the protocol number of the appropriate protocol controlling the option is supplied. For example, to indicate that an option is to be interpreted by the WebTP protocol, level should be set to the protocol number of WebTP, which is 100. For more information see getprotoent(3).
optname | optval type | optval
0 | char* | Name of the traffic class. All outgoing WebTP traffic is scheduled according to a class name. Currently four classes are defined: "interactive", "bulk", "realtime_stream", and "buffered_stream". Class names can be hierarchical, such as "realtime_stream.elastic" or "realtime_stream.inelastic". The host administrator defines the traffic classes and the corresponding policies for bandwidth allocation. All available traffic class names are listed in /etc/webtp_classes.
1 | float* | Rate in bits per second. The currently available rate can be queried by calling getsockopt(). If the application wants to specify a constant rate at which it wishes to send, it can call setsockopt(). If such a rate cannot be guaranteed by the scheduler, setsockopt() will fail.
2 | float* | Round-trip time, measured in microseconds.
RETURN VALUES
If the call is successful, a 0 value is returned. A return value of -1 indicates an error, which is further specified in the global errno.
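The options in the table above could be wrapped as in the following sketch. The option numbers 0, 1, and 2 and the level 100 come from the text; the macro and function names are hypothetical. Whether the class name length should include the terminating NUL is not specified, so passing it is an assumption. The sketch uses socklen_t for the length arguments, as current POSIX headers do.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>

#define WEBTP_LEVEL      100  /* WebTP protocol number, from the text */
/* Option numbers come from the table; the macro names are hypothetical. */
#define WEBTP_OPT_CLASS  0
#define WEBTP_OPT_RATE   1
#define WEBTP_OPT_RTT    2

/* Assign the socket to a traffic class such as "bulk".
 * Returns 0 on success, -1 on error with errno set. */
int webtp_set_class(int s, const char *name)
{
    assert(name != NULL);
    return setsockopt(s, WEBTP_LEVEL, WEBTP_OPT_CLASS,
                      name, (socklen_t)(strlen(name) + 1));
}

/* Read the currently available rate in bits per second. */
int webtp_get_rate(int s, float *bps)
{
    socklen_t len = sizeof *bps;
    return getsockopt(s, WEBTP_LEVEL, WEBTP_OPT_RATE, bps, &len);
}

/* Request a constant sending rate in bits per second. According to the text,
 * the call fails if the scheduler cannot guarantee the rate. */
int webtp_set_rate(int s, float bps)
{
    return setsockopt(s, WEBTP_LEVEL, WEBTP_OPT_RATE, &bps, sizeof bps);
}

/* Read the round-trip time in microseconds. */
int webtp_get_rtt(int s, float *usec)
{
    socklen_t len = sizeof *usec;
    return getsockopt(s, WEBTP_LEVEL, WEBTP_OPT_RTT, usec, &len);
}
```

An application sending a media stream might call webtp_set_class(s, "realtime_stream") right after opening the socket and consult webtp_get_rate() before choosing an encoding rate.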
4. ADU Management
Blocks of data are sent through the socket interface via UNIX style send() and recv() calls.
NAME
send, recv - send or receive ADU
SYNOPSIS
int
send(int sockfd, const char *buff, int nbytes, int flags)
int
recv(int sockfd, char *buff, int nbytes, int flags)
DESCRIPTION
The parameter sockfd specifies the connection to operate on. For send(), buff points to the ADU(s) to be sent, each prepended with its header. For recv(), it points to an empty buffer used for storing incoming ADU(s) and their headers. Please refer to the section on ADU Framing for a detailed description of the ADU header format. When sending, nbytes indicates the number of bytes of ADU contents, including the prepended headers. When receiving, nbytes indicates the capacity of the buffer. The parameter flags is not used at this moment and should be set to 0.
[Note: Although the transport tries to respect ADU boundaries, we believe that sending and receiving data across the socket interface should be counted in bytes rather than in units of ADUs. The ADU framing is hidden in the stream of data (please refer to the section on ADU Framing). First, the resulting API has semantics similar to those of TCP and UDP, and resembles that of the Berkeley Packet Filter (BPF). Second, since an ADU can be as large as 16MB, we have to allow partial ADUs to be passed across the interface boundary.]
RETURN VALUES
These calls return the number of bytes sent or received if the operation is successful. Otherwise, -1 is returned, and a more detailed reason is available in the global variable errno.
ERRORS
[EAGAIN] | send(): The transport layer cannot send the ADUs immediately (e.g. it is limited by congestion control). Try calling send() later.
[ENOBUFS] | The transport does not have enough buffer space to buffer the first ADU to be sent at this moment. Try sending smaller ADUs.
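One way an application might react to these errors is sketched below. The helper name is hypothetical, and the sketch assumes that a WebTP socket can be waited on with poll() like other UNIX sockets, which the text does not state. EAGAIN is handled by waiting until the socket is writable and retrying; ENOBUFS and all other errors are returned to the caller, which may respond to ENOBUFS by framing smaller ADUs.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send exactly nbytes from buff, retrying after partial sends and waiting
 * for writability on EAGAIN. Returns nbytes on success, or -1 with errno
 * set by the failing call (for example ENOBUFS). */
int webtp_send_all(int sockfd, const char *buff, int nbytes)
{
    int done = 0;

    assert(buff != NULL && nbytes >= 0);
    while (done < nbytes) {
        ssize_t n = send(sockfd, buff + done, (size_t)(nbytes - done), 0);
        if (n >= 0) {
            done += (int)n;          /* partial send: continue with the rest */
            continue;
        }
        if (errno == EINTR)
            continue;                /* interrupted by a signal: just retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* The transport cannot take more data now; wait, then retry. */
            struct pollfd p = { .fd = sockfd, .events = POLLOUT };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;                   /* ENOBUFS and other errors go to the caller */
    }
    return done;
}
```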
ADU Framing
Complex ADU framing and naming should be left to the application, as the name Application Level Framing (ALF) implies, possibly with the help of libraries for different styles of ADU frames. For example, a video or audio frame may look like an RTP frame. However, there should be an agreement between the transport and the applications on the common parts of the ADU framing scheme. We propose a scheme in which a four-byte ADU header is prepended to the ADU data. The first byte of the ADU header is the option field containing the following information.
The last three bytes of the ADU header are interpreted as a number indicating the length of the ADU in number of bytes.
+--------+--------+--------+--------+
|SERU0000|        ADU Length        |
+--------+--------+--------+--------+
|             ADU Data              |
+--------+--------+--------+--------+
|                                   |
|                ...                |
|                                   |
+--------+--------+--------+--------+
|SER00000|        ADU Length        |
+--------+--------+--------+--------+
|             ADU Data              |
+--------+--------+--------+--------+
|                                   |
|                ...                |
|                                   |
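The four-byte header can be packed and unpacked as in the sketch below. The function names are hypothetical. Two points are assumptions because the text does not settle them: the 24-bit length is written in network (big-endian) byte order, and it counts the ADU data only, not the header itself. The option byte is passed through uninterpreted.

```c
#include <assert.h>
#include <stdint.h>

#define ADU_HDR_LEN  4
#define ADU_MAX_LEN  0xFFFFFFu   /* largest value a three-byte length can hold */

/* Write a four-byte ADU header: the option byte, then the 24-bit length.
 * Big-endian byte order is an assumption; the text does not specify it. */
void adu_hdr_pack(uint8_t hdr[ADU_HDR_LEN], uint8_t options, uint32_t length)
{
    assert(length <= ADU_MAX_LEN);
    hdr[0] = options;
    hdr[1] = (uint8_t)(length >> 16);
    hdr[2] = (uint8_t)(length >> 8);
    hdr[3] = (uint8_t)length;
}

/* Read the option byte and the 24-bit length back out of a header. */
void adu_hdr_unpack(const uint8_t hdr[ADU_HDR_LEN],
                    uint8_t *options, uint32_t *length)
{
    *options = hdr[0];
    *length = ((uint32_t)hdr[1] << 16) | ((uint32_t)hdr[2] << 8) | hdr[3];
}
```

A three-byte length field can describe at most 16,777,215 bytes, which matches the 16MB upper bound on ADU size mentioned in the note on send() and recv().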
ADU 1 Header | ADU 1 Data | ADU 2 Header | ADU 2 Data | ADU 3 Header | ADU 3 Data
The first ADU, the last ADU or both can be partial ADUs.
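A receiver can walk a buffer filled by recv() one ADU at a time, as in the sketch below. The structure and function names are hypothetical. The sketch assumes a big-endian length that counts data bytes only, and it assumes the buffer begins on a header boundary, so it only recognizes a partial ADU at the end of the buffer; how a partial first ADU is represented is not specified in the text and is not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct adu_view {
    uint8_t        options;  /* first header byte, left uninterpreted */
    uint32_t       length;   /* ADU length announced by the header */
    const uint8_t *data;     /* start of this ADU's bytes in the buffer */
    size_t         present;  /* how many of those bytes the buffer holds */
};

/* Parse the ADU whose header starts at buf[*off] and advance *off past it.
 * Returns 1 when a header was read (the ADU is partial if present < length),
 * or 0 when fewer than four header bytes remain in the buffer. */
int adu_next(const uint8_t *buf, size_t len, size_t *off, struct adu_view *out)
{
    const uint8_t *h;
    size_t avail;

    assert(*off <= len);
    if (len - *off < 4)
        return 0;
    h = buf + *off;
    out->options = h[0];
    out->length = ((uint32_t)h[1] << 16) | ((uint32_t)h[2] << 8) | h[3];
    out->data = h + 4;
    avail = len - *off - 4;
    out->present = avail < out->length ? avail : out->length;
    *off += 4 + out->present;
    return 1;
}
```

When the last ADU in the buffer is partial, the application would keep the bytes received so far and append the remainder from the next recv() call.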
Rationale for ADU Sending and Receiving Interface
Our send() API takes a form similar to that of the send() system call in UNIX.
send(int sockfd, const char *buff, int nbytes, int flags)
When sending data, our scheme allows the application to determine the appropriate number of bytes to send each time, based on a number of considerations. For instance, for delay-sensitive data the application can choose a small number of bytes to send. In other cases, the application can choose a large chunk of data to send in order to reduce the frequency of the send() system call. The application may also want to buffer most of its unsent data in its own address space for dynamic rendering. For instance, it can reorder ADUs, change the reliability requirement of an ADU, or drop outdated data. An appropriate scheme should balance three goals: the transport should have enough data to send, so that it does not idle while network capacity is available; the application should keep most unsent data for dynamic rendering; and the number of system calls should stay at a level that does not overburden the CPU. Note that the transport always sends the ADUs to the network in the order it receives them from the application.
Receiving data from the transport uses the recv() API, which takes the same form as send(). The difference from the sending case is that the transport tries to deliver data as quickly as possible, subject to the overhead of system calls.
Alternatives to API Design
The API design needs to be considered more carefully. There should be some flow control between the application and the transport. Alternatively, popdata() upcalls could be used for sending data: WebTP calls popdata(), a function implemented by the application, to retrieve a few packets to be sent immediately whenever the scheduler decides to send packets from this connection. The advantage of this approach is that the transport layer takes care of flow control without adding any extra procedures to the interface. Popdata() could be passed the maximum and minimum number of bytes that can be accepted by the transport. The minimum indicates the number of bytes needed to keep the transport busy, and can be ignored by the application. The maximum is determined by the buffer space allocated at the transport.
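To make the upcall idea concrete, the sketch below shows one possible shape for popdata(). Everything in it is hypothetical: the text proposes popdata() but does not define its signature beyond a minimum and a maximum byte count, and transport_pull() is only a stand-in for the point at which the scheduler selects this connection.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical upcall signature: fill out with at most max_bytes of framed
 * ADU data and return the number of bytes written. */
typedef int (*popdata_fn)(void *app_state, char *out,
                          int min_bytes, int max_bytes);

/* Example application state: a flat queue of framed, unsent bytes. */
struct app_queue {
    const char *bytes;
    int         len;
    int         sent;
};

/* Application-side popdata(): hand over up to max_bytes of unsent data.
 * min_bytes is only a hint and is ignored here, as the text allows. */
int app_popdata(void *app_state, char *out, int min_bytes, int max_bytes)
{
    struct app_queue *q = app_state;
    int n = q->len - q->sent;

    (void)min_bytes;
    if (n > max_bytes)
        n = max_bytes;
    memcpy(out, q->bytes + q->sent, (size_t)n);
    q->sent += n;
    return n;
}

/* Stand-in for the transport scheduler: pull data for this connection once,
 * limited by the buffer space the transport has available. */
int transport_pull(popdata_fn pop, void *app_state, char *pkt, int space)
{
    assert(space > 0);
    return pop(app_state, pkt, 1, space);
}
```

Because the transport asks for data only when it is ready to send, the application keeps its unsent ADUs in its own address space until the last moment and remains free to reorder or drop them.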
Another approach is to encourage applications to use asynchronous I/O. After opening a socket, the application can call fcntl() to set the socket to be asynchronous. A user-level signal handler is used to handle any SIGIO signals raised by WebTP when an ADU can be sent or received. This approach is attractive because it can be implemented on top of our current synchronous interface without introducing any new procedures to the interface. It also conforms to the standard asynchronous I/O interface of UNIX and so is more readily understood by UNIX programmers.
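The asynchronous setup described above uses only standard UNIX calls and might look like the following sketch. The function and variable names are hypothetical, and whether a WebTP socket would raise SIGIO in exactly this way is an assumption. The handler only sets a flag; the application's main loop would test the flag and then call send() or recv().

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes O_ASYNC on glibc in strict -std modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set by the handler; the main loop checks it and then performs the I/O. */
volatile sig_atomic_t webtp_io_ready = 0;

static void on_sigio(int signo)
{
    (void)signo;
    webtp_io_ready = 1;   /* only async-signal-safe work belongs here */
}

/* Install a SIGIO handler and ask the kernel to signal this process when
 * fd becomes ready for I/O. Returns 0 on success, -1 on error. */
int webtp_enable_sigio(int fd)
{
    struct sigaction sa;
    int flags;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)   /* deliver SIGIO to this process */
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC) < 0 ? -1 : 0;
}
```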