[gameprogrammer] Gamasutra Article Rough Draft (Long)

Gamasutra hired me to do a feature article for them, most likely to be 
published at the end of this month.  I've finished the first draft and 
wanted to get people's opinions on it.

It's pretty long but they have a 3000 word requirement.

Thanks in advance.

---------------------

Title: Designing secure, flexible, and high performance game network 
architectures

Multiplayer support in console games has traditionally been a secondary 
feature, appealing to the hardcore market and generating relatively few 
extra sales.  This has been changing as online gaming goes mainstream, 
especially since consoles began supporting broadband.  Many developers 
find themselves required to add multiplayer for the first time and are 
in the position of developing multiplayer systems from scratch.  The 
straightforward solution is to implement a reliable packet system and 
use this to encode and transmit game-level messages.  This is penny-wise 
and pound-foolish, requiring more time in the long run and limiting 
player and feature support.  A game-level network architecture can save 
time, improve flexibility, and can address many cheating issues that 
can't be solved easily elsewhere.  While every architecture is game 
specific, this article will present high level ideas that can be used in 
part or whole along with optimization ideas that can benefit existing 
systems.

Any networking architecture deals with three major issues: security, 
flexibility, and performance.  The first thing to consider is what 
topology to use, as this is probably the single most important issue to 
resolve.  Traditionally, games take one of two approaches: peer-to-peer 
or client-server.  In a peer-to-peer system data is stored on the client 
and authorized by a consensus system whereas with client-server the 
server is trusted and stores most to all game data.  Peer-to-peer avoids 
putting network load on a single system.  This can be advantageous for 
data that doesn't need to be routed through a server, such as directed 
chat messages.  Another situation where peer-to-peer is beneficial is 
when you don't have a single system that would be a suitable as a server 
- such as when all systems are consoles.  Peer to peer communications 
can provide better performance but a client-server system is more secure 
and flexible.

One architecture commonly used is a client-server object replication 
system.  In this system, specific objects and data members are 
transmitted on creation or changes.  This can happen explicitly, with a 
call to serialize and replicate, or implicitly, where data is polled 
every update cycle and compared to the value it had the last update 
cycle.  This is a good approach.  It is straightforward and easy to 
understand.  It fits into existing single player systems.  With 
implementations that replicate objects automatically, scripts can modify 
data without knowledge of the underlying system and still achieve 
network synchronization.  However, it is less efficient than a 
hand-tweaked system, sending data independent of context and often 
redundantly.  It is also less secure, creating a strong coupling between 
game-code and network activity.  As game-code changes are made, 
especially by programmers that aren't familiar with network security, 
vulnerabilities arise.  We're going to expand on this to address these 
problems efficiently.

The architecture suggestions described herein are intended to provide 
the following qualities:
1. Cheat-resistant.  When applicable, data and some game-code is 
maintained server side, optionally supplemented with peer-to-peer 
communications to avoid unnecessary server traffic.
2. Flexible and extensible.  Intentional abstraction achieves loose 
coupling of game code with network code.
3. Low bandwidth utilization.  Low level context related optimizations 
are exposed to higher level routines so scripters can define algorithms 
and rules by which to send data.
4. High performance.  Optimizations and features described should be 
fast, require low memory utilization, and be optional improvements to 
existing proven algorithms.

SECTION HEADER: Security and cheat-resistance

If you plan to sell more than a few thousand copies of your game then 
security should be considered just as important as performance.  The 
more popular your game is, the more people will try to hack it.  As soon 
as one hack is found it will spread through the ranks of would-be 
cheaters like wildfire with negative press soon to follow.  This article 
doesn't cover cheating in its entirety, which deserves a whole book of 
its own (Applied Cryptology by Bruce Schneier is excellent), but we can 
stop some cheats by making wise architectural decisions.

Some of the most common methods used by cheaters are listed in the 
following table:
<TABLE HERE>
Exploiting bugs or bad design - Game programming level - UO - PKing 
without gaining notoriety with firefields.
In-game network attacks - Game-level network architecture - 
Counterstrike - Intentionally causing lag to slow down the game and kill 
people when they can't move
Modified memory or EXEs - Both game-level network architecture and 
client security layer - America's Army - Able to see through teargas, 
bushes, through third party memory modifier.  Also sending invalid 
packets to crash the server.
Hacked packets - Network level security - Quake - Changing the direction 
field in a shoot packet as an aimbot
Falsified connections - Network level security - <Find example of this> 
   - Stealing passwords

Cheating can be addressed through implementation and topology.  Your low 
level networking implementation should include packet tampering 
detection, encryption, and falsified connections.  Executable and system 
level protection involves data encryption, obfuscation, and validation. 
   Game network-system issues will be covered throughout this article, 
beginning with network topologies.

Sensitive data on the server:

The traditional game topology is client-server, with most to all 
sensitive data kept server-side.  This works well in most cases.  I'm 
going to go one step further, and also suggest that game algorithms 
insofar as practical be kept server-side as well.  It's easiest to 
illustrate this with an example:

Typical server-side system:
Client has a Capture the Flag (CTF) script or game mode.
Client requests start of a CTF game.
Server sends flag location, player scores, player names, and starting teams.
The contents of client packets deal with players, flags, names, and teams.

Server-side system with data on the server, communicating through a 
generic command system.
Server sends a message "Show text 'Starting CTF' at <x,y>' for <n> seconds"
Server sends a message "Spawn object <flag with ID> at position 
<x1,y1,z1> with properties <animation, can be picked up, transfers to 
killer, reset to position at point x2,y2,z2>"
Client packets deal with abstract IDs, strings, and groups.

Abstract programming is much more difficult to design and implement than 
concrete programming.  In practice what you do will depend largely on 
the existing system, advance requirements knowledge, time, and reuse 
goals.  The intent one should strive for is to keep game-specific data 
out of the client.  The game code, on both server and client, should 
never assume the type of an object sent over the network as an ID.  Data 
must be validated upon reception, by verifying ID types, and bounding 
numeric and other data.  For example, if the server programmer thinks in 
advance an ID is a flag, it's easy to make a blind cast from an invalid 
ID and crash the server.  If the server programmer is only dealing in 
generics, such an assumption would be much less likely.

Often you don't want one client to know the IP address of another 
client.  Allowing players to know one another’s IP’s can lead to denial 
of service attacks and spoofed connections. However, in certain 
contexts, such as setting up chat servers or with small games, it can be 
reasonably safe for one client to know the IP of another client.  In 
that case it's worth considering supporting a peer-to-peer system to 
transmit non-sensitive data directly, rather than through the server. 
Even with a client-server architecture, in some cases it would still be 
necessary to perform server migration, for example a 4 player real time 
strategy game.

Sensitive data on the client:

When data is on the client, such as with a peer to peer topology, you 
have to rely on a trust metric based on certain assumptions:
1. Cheaters are likely to cheat right away, if for no other reason than 
to check if the cheat works.
2. Cheaters are likely to perform the same cheat many times in a row.  A 
speed hack will speed up your character every frame, rather than one frame.
3. A cheater's data is likely to be widely divergent from the data on 
the other systems - For example, gain 100 HP instead of gain 1 HP.
4. It's likely that only one or two systems will be doing the same cheat 
at the same time.

The process is straightforward.  A new peer/client is assigned an 
initial trust rating.  This value is continuously increased over time. 
Every time there is suspicious data from that client, subtract from the 
trust rating.  For a client-server based system, you are done.  If x <= 
0, or if x drops too quickly, kick that client.  With a peer to peer 
system, if x <= 0, or if x drops too quickly, then compare this value to 
that of other systems.  If a majority of systems all report similar 
changes to the trust rating, kick the offending system out of the group. 
  Otherwise set the trust rating back to some average of the systems. 
These values are all contextual and I have no specific numbers to 
recommend.  A good rule of thumb is to make unintentional latency cause 
decreases in the trust rating that are marginally less than what is 
actually considered a cheat.

SECTION HEADER: EFFICIENT MESSENGING
With cheating behind us, it's time to address messaging.  Broadly 
speaking, messaging is used for:

1. Object replication
2. Remote Method Invocation (RMI)
3. Data synchronization

Object replication involves sending a message when an object is created 
or destroyed by an authoritative system so that this same object is 
created or destroyed on other systems.  Data members of that object may 
be synchronized as well and inherit the update rules of the object. 
Invoking procedures on remote systems involves parsing commands and data 
out of messages, ensuring correct encoding, and mapping the command to a 
function on the remote machine with the data in the message passed as 
arguments to that function.  Data synchronization involves determining 
which memory address to update given a system independent identifier, 
ensuring that this memory address is valid to write to in the given 
context, and changing the value.  As this article is about architecture, 
I won't go into specific implementations but will instead explore 
improvements to these operations.

SUBSECTION HEADER: RELEVANCE BASED MESSENGING
Relevance based messaging is the practice of sending only relevant data 
to relevant systems.  In this case we are referring to object creation 
and destruction, and variable updates be they members of that object or 
globals.  Culling can be performed in both these cases.  The most 
typical culling method for objects is distance based, where objects will 
be created at range X, and destroyed at range X+Y from the target 
system's field of interest.  The most typical culling method for 
variables is to not send if the data is identical.  When those variables 
are class variables and the class is culled, then those variables can 
also be culled.

The reason more advanced culling methods are not usually used is that 
they are contextual and we don't know the context at the network system 
level.  The solution I propose is simple: provide the standard solutions 
as default behaviors, and expose the culling method or method properties 
to the higher level in the form of callbacks, overridable functions, a 
parameter list, or some other method.  Architecturally, this means that 
each system which broadcasts updates needs to know the prior value 
per-system for each synchronized object or variable.  If memory is a 
concern, we can limit saving prior values to only variables that change 
frequently.

SUBSECTION HEADER: ENCODING NETWORK COMMANDS, TEXT, AND DATA
<ITALICS>Encoding network commands</ITALICS>
Games often send the same command frequently.  A common solution is to 
use the first byte of a packet to represent a command identifier.  This 
works but we can shave off a few bits by creating a frequency table of 
network commands which indicate likely or previously recorded session 
averages.  For example:

Per system:
Movement: Called 84 times
Press trigger: Called 22 times
New player joined: Called 5 times.

We can use this frequency table as input for arithmetic or Huffman 
encoding, which will then give us an optimal bit-wise representation of 
each command.  Table generation can be performed beforehand so as to not 
slow down the game at runtime.  This will need to be done twice - once 
for server commands and once for client commands.

<ITALICS>Encoding text commands</ITALICS>
We may also apply this technique to commonly sent strings.  On 
connection, a server can send a string table to the client of all 
strings that are mapped to a unique identifier.  More complex 
implementations may encode special characters in the string for printf 
style format specifiers.  From that point on, one bit can be sent, 
indicating a string lookup (or not), followed by a string identifier. 
If desired, this can also be based off some optimal encoding method as 
was done with commands.

<ITALICS>Encoding data</ITALICS>
Memory and processing permitting, we may also wish to perform global 
compression.  The technique is the same.  Generate a frequency table 
based on previously recorded server output and client output sent from 
the parsing system.  In the case of messages sent reliable and ordered, 
adaptive run-time compression is also possible for data sent reliably 
and ordered.

A more useful and practical approach is to write data as bit streams 
that perform simple run-time contextual compression.  The approach used 
by the network API RakNet is to provide WriteCompressed and 
ReadCompressed calls that recursively write a <1> if the high half of 
the data is all 0's  Otherwise write a 0, then the rest of the data. 
The approach used by the Torque network library is to specify the min 
and max values of a data element by which the minimum number of bits 
will be used to write that data.  The first technique is safer and more 
flexible, while the second technique provides better compression.  If 
you decide it is worthwhile to use the second technique, this can be 
extended by a further parameter: granularity.  It's often the case that 
we don't care whether a position is 503.135224 or 503.  By specifying 
the bits of granularity, we can perform this reduction automatically.

In either case, the option to specify compression parameters should be 
exposed to the higher level so that scripters or implementers can take 
advantage of it.  In the case of specifying bits, it would be a very 
good idea to use a utility class to specify parameters and share this 
class between systems to reduce maintenance and the chance of mistakes.

SECTION HEADER: PACKET CONTENT

One way to reduce the number of client network commands as well as 
reduce the likelihood of cheating is to have the client send inputs, 
rather than the results of inputs.  This is straightforward and can make 
programming easier in some ways.

Send:
Movement commands
Use key command
Jump key command
Switch to weapon #5 command.

Don't send
"Pickup red team's flag"
"Kill player X"

The difference is that the set of actions has a one-to-many relationship 
with the set of input commands.  Once you implement the input commands 
you gain the actions for free, and actions are a moving target while 
commands tend to stay fixed.  This is easier to program.  From the 
context of cheating, it limits the amount of code you have to deal with 
and in the worst-case scenario the client won't be able to do anything 
they couldn't have done within the normal context of the game.  In 
practice it's not possible to achieve this level of abstraction and 
still have good game play because entropy will cause clients to get out 
of synch.

We can address entropy by including relevant data whenever we send a 
client command.  For example, by including position and orientation 
along with a movement command.  This layer should be exposed to 
scripters or gameplay programmers that can identify variables either 
directly or by some identifier.

While the server doesn't send input commands the same principle applies. 
  Exposing the command to let scripters include the associated variables 
can make optimization easier.

SECTION HEADER: Summary of data to maintain.

Throughout this article I have referred to applying properties to 
commands, variables, objects, systems, and strings.  It may be more 
clear in certain implementations to aggregate these properties into a 
single system, whereby you can define all these properties up-front. 
This table summaries the properties and extra data I've covered:

Per command:
- A unique identifier that can be used to find the command in a look-up 
table.
- A bitwise encoding representing this command.
- A list of variables and objects that need to be synchronized when this 
command is used.

Per variable:
- A unique networkable identifier that is two way mappable.  From the 
variable we should be able to get the identifier, and from the 
identifier we should be able to get the variable, given the context of 
the enclosing struct or class, if any.
- What properties or method to use to interpolate this variable, if it 
is interpolated at all.
- If this variable is written with compression, and if so the number of 
bits, min, and max values
- What culling function to call or properties to use, if any, to 
determine if a variable has changed enough to be worth sending.
- What object this variable is a member of, if any.  Used to determine 
if a send is not necessary because the object itself is culled.
- Which system(s) are allowed to update this variable.

Per object:
- A unique networkable identifier that is two way mappable.  From the 
object we should be able to get the identifier and from the identifier 
we should be able to get the object.
- What culling algorithm to call or properties to use, if any, to 
determine if an object is in the area of interest (to send object 
creation and updates).  Also what algorithm to process, if any, to 
determine if the object is in the area of disinterest (to send object 
destruction).
- The list of synchronized member variables.
- Which system(s) are allowed to create and destroy this class.

Per remote system, on a server or peer
- Which objects this system has instantiated
- The field of interest for this system (usually the camera position)
- The last values for the variables that were sent to this system.

Per encoded string:
- A unique identifier by which programmers or scripters can refer to 
this string.
- A bitwise encoding of this identifier that can be transmitted over the 
network
- Optionally, what fields each string expects to take, if format 
specification is used.

SECTION HEADER: Synchronizing non-critical data

Some data isn't critical but it would be nice if it was synchronized 
among systems.  For example, showing the same type of particle effects 
or playing the same animation frame at the same time.  Novice developers 
may try to synchronize this data with messages, which is wasteful.  A 
better and somewhat well-known method is to synchronize one or more 
integers which can be used as keyframe indices, random number generator 
seeds, or similar purposes.

SECTION HEADER: Low level network system requirements

It's important to point out that I didn't cover the low level network 
system that this system relies on.  At a minimum the low level network 
layer should support, in order of most important to least:

Reliable and unreliable messaging
Ordering and sequencing with multiple channel support
Secure connections with strong packet encryption and tampering detection
A wide range of statistical reports
Automatic message to packet aggreggation and splitting
Remote method invocation
Automatic flow control
Message priority level support
Native internet simulation capabilities (packetloss, spikes, lag, out of 
order packets)
Path MTU discovery with the ability to set this explicitly
Native bitstream support

Optional but helpful features include:
Multithreading support
Packet level compression
IO completion ports (for Windows based games)
Cross platform

SECTION HEADER: Conclusion
This article is intended to provide architectural level ideas that can 
be used to make your game-level network system programming easier and 
less prone to cheats.  It doesn't provide specific implementations, 
which are game specific.  For further information, I recommend searching 
Gamasutra, Gamedev.net, Google, Flipcode.  Excellent mailing lists 
include the GameProgrammer.com list, and GD-Algorithms-list.

(TODO - I probably need more here!)


---------------------
To unsubscribe go to http://gameprogrammer.com/mailinglist.html


Other related posts: