[gameprogrammer] Gamasutra Article Rough Draft (Long)
- From: Kevin Jenkins <gameprogrammer@xxxxxxxxxx>
- To: gameprogrammer@xxxxxxxxxxxxx
- Date: Tue, 09 Nov 2004 18:35:11 -0800
Gamasutra hired me to do a feature article for them, most likely to be
published at the end of this month. I've finished the first draft and
wanted to get people's opinions on it.
It's pretty long but they have a 3000 word requirement.
Thanks in advance.
---------------------
Title: Designing secure, flexible, and high performance game network
architectures
Multiplayer support in console games has traditionally been a secondary
feature, appealing to the hardcore market and generating relatively few
extra sales. This has been changing as online gaming goes mainstream,
especially since consoles began supporting broadband. Many developers
find themselves required to add multiplayer for the first time and are
in the position of developing multiplayer systems from scratch. The
straightforward solution is to implement a reliable packet system and
use this to encode and transmit game-level messages. This is penny-wise
and pound-foolish, requiring more time in the long run and limiting
player and feature support. A game-level network architecture can save
time, improve flexibility, and can address many cheating issues that
can't be solved easily elsewhere. While every architecture is game
specific, this article will present high level ideas that can be used in
part or whole along with optimization ideas that can benefit existing
systems.
Any networking architecture deals with three major issues: security,
flexibility, and performance. The first thing to consider is what
topology to use, as this is probably the single most important issue to
resolve. Traditionally, games take one of two approaches: peer-to-peer
or client-server. In a peer-to-peer system data is stored on the client
and authorized by a consensus system whereas with client-server the
server is trusted and stores most to all game data. Peer-to-peer avoids
putting network load on a single system. This can be advantageous for
data that doesn't need to be routed through a server, such as directed
chat messages. Another situation where peer-to-peer is beneficial is
when you don't have a single system that would be a suitable as a server
- such as when all systems are consoles. Peer to peer communications
can provide better performance but a client-server system is more secure
and flexible.
One architecture commonly used is a client-server object replication
system. In this system, specific objects and data members are
transmitted on creation or changes. This can happen explicitly, with a
call to serialize and replicate, or implicitly, where data is polled
every update cycle and compared to the value it had the last update
cycle. This is a good approach. It is straightforward and easy to
understand. It fits into existing single player systems. With
implementations that replicate objects automatically, scripts can modify
data without knowledge of the underlying system and still achieve
network synchronization. However, it is less efficient than a
hand-tweaked system, sending data independent of context and often
redundantly. It is also less secure, creating a strong coupling between
game-code and network activity. As game-code changes are made,
especially by programmers that aren't familiar with network security,
vulnerabilities arise. We're going to expand on this to address these
problems efficiently.
The architecture suggestions described herein are intended to provide
the following qualities:
1. Cheat-resistant. When applicable, data and some game-code is
maintained server side, optionally supplemented with peer-to-peer
communications to avoid unnecessary server traffic.
2. Flexible and extensible. Intentional abstraction achieves loose
coupling of game code with network code.
3. Low bandwidth utilization. Low level context related optimizations
are exposed to higher level routines so scripters can define algorithms
and rules by which to send data.
4. High performance. Optimizations and features described should be
fast, require low memory utilization, and be optional improvements to
existing proven algorithms.
SECTION HEADER: Security and cheat-resistance
If you plan to sell more than a few thousand copies of your game then
security should be considered just as important as performance. The
more popular your game is, the more people will try to hack it. As soon
as one hack is found it will spread through the ranks of would-be
cheaters like wildfire with negative press soon to follow. This article
doesn't cover cheating in its entirety, which deserves a whole book of
its own (Applied Cryptology by Bruce Schneier is excellent), but we can
stop some cheats by making wise architectural decisions.
Some of the most common methods used by cheaters are listed in the
following table:
<TABLE HERE>
Exploiting bugs or bad design - Game programming level - UO - PKing
without gaining notoriety with firefields.
In-game network attacks - Game-level network architecture -
Counterstrike - Intentionally causing lag to slow down the game and kill
people when they can't move
Modified memory or EXEs - Both game-level network architecture and
client security layer - America's Army - Able to see through teargas,
bushes, through third party memory modifier. Also sending invalid
packets to crash the server.
Hacked packets - Network level security - Quake - Changing the direction
field in a shoot packet as an aimbot
Falsified connections - Network level security - <Find example of this>
- Stealing passwords
Cheating can be addressed through implementation and topology. Your low
level networking implementation should include packet tampering
detection, encryption, and falsified connections. Executable and system
level protection involves data encryption, obfuscation, and validation.
Game network-system issues will be covered throughout this article,
beginning with network topologies.
Sensitive data on the server:
The traditional game topology is client-server, with most to all
sensitive data kept server-side. This works well in most cases. I'm
going to go one step further, and also suggest that game algorithms
insofar as practical be kept server-side as well. It's easiest to
illustrate this with an example:
Typical server-side system:
Client has a Capture the Flag (CTF) script or game mode.
Client requests start of a CTF game.
Server sends flag location, player scores, player names, and starting teams.
The contents of client packets deal with players, flags, names, and teams.
Server-side system with data on the server, communicating through a
generic command system.
Server sends a message "Show text 'Starting CTF' at <x,y>' for <n> seconds"
Server sends a message "Spawn object <flag with ID> at position
<x1,y1,z1> with properties <animation, can be picked up, transfers to
killer, reset to position at point x2,y2,z2>"
Client packets deal with abstract IDs, strings, and groups.
Abstract programming is much more difficult to design and implement than
concrete programming. In practice what you do will depend largely on
the existing system, advance requirements knowledge, time, and reuse
goals. The intent one should strive for is to keep game-specific data
out of the client. The game code, on both server and client, should
never assume the type of an object sent over the network as an ID. Data
must be validated upon reception, by verifying ID types, and bounding
numeric and other data. For example, if the server programmer thinks in
advance an ID is a flag, it's easy to make a blind cast from an invalid
ID and crash the server. If the server programmer is only dealing in
generics, such an assumption would be much less likely.
Often you don't want one client to know the IP address of another
client. Allowing players to know one another’s IP’s can lead to denial
of service attacks and spoofed connections. However, in certain
contexts, such as setting up chat servers or with small games, it can be
reasonably safe for one client to know the IP of another client. In
that case it's worth considering supporting a peer-to-peer system to
transmit non-sensitive data directly, rather than through the server.
Even with a client-server architecture, in some cases it would still be
necessary to perform server migration, for example a 4 player real time
strategy game.
Sensitive data on the client:
When data is on the client, such as with a peer to peer topology, you
have to rely on a trust metric based on certain assumptions:
1. Cheaters are likely to cheat right away, if for no other reason than
to check if the cheat works.
2. Cheaters are likely to perform the same cheat many times in a row. A
speed hack will speed up your character every frame, rather than one frame.
3. A cheater's data is likely to be widely divergent from the data on
the other systems - For example, gain 100 HP instead of gain 1 HP.
4. It's likely that only one or two systems will be doing the same cheat
at the same time.
The process is straightforward. A new peer/client is assigned an
initial trust rating. This value is continuously increased over time.
Every time there is suspicious data from that client, subtract from the
trust rating. For a client-server based system, you are done. If x <=
0, or if x drops too quickly, kick that client. With a peer to peer
system, if x <= 0, or if x drops too quickly, then compare this value to
that of other systems. If a majority of systems all report similar
changes to the trust rating, kick the offending system out of the group.
Otherwise set the trust rating back to some average of the systems.
These values are all contextual and I have no specific numbers to
recommend. A good rule of thumb is to make unintentional latency cause
decreases in the trust rating that are marginally less than what is
actually considered a cheat.
SECTION HEADER: EFFICIENT MESSENGING
With cheating behind us, it's time to address messaging. Broadly
speaking, messaging is used for:
1. Object replication
2. Remote Method Invocation (RMI)
3. Data synchronization
Object replication involves sending a message when an object is created
or destroyed by an authoritative system so that this same object is
created or destroyed on other systems. Data members of that object may
be synchronized as well and inherit the update rules of the object.
Invoking procedures on remote systems involves parsing commands and data
out of messages, ensuring correct encoding, and mapping the command to a
function on the remote machine with the data in the message passed as
arguments to that function. Data synchronization involves determining
which memory address to update given a system independent identifier,
ensuring that this memory address is valid to write to in the given
context, and changing the value. As this article is about architecture,
I won't go into specific implementations but will instead explore
improvements to these operations.
SUBSECTION HEADER: RELEVANCE BASED MESSENGING
Relevance based messaging is the practice of sending only relevant data
to relevant systems. In this case we are referring to object creation
and destruction, and variable updates be they members of that object or
globals. Culling can be performed in both these cases. The most
typical culling method for objects is distance based, where objects will
be created at range X, and destroyed at range X+Y from the target
system's field of interest. The most typical culling method for
variables is to not send if the data is identical. When those variables
are class variables and the class is culled, then those variables can
also be culled.
The reason more advanced culling methods are not usually used is that
they are contextual and we don't know the context at the network system
level. The solution I propose is simple: provide the standard solutions
as default behaviors, and expose the culling method or method properties
to the higher level in the form of callbacks, overridable functions, a
parameter list, or some other method. Architecturally, this means that
each system which broadcasts updates needs to know the prior value
per-system for each synchronized object or variable. If memory is a
concern, we can limit saving prior values to only variables that change
frequently.
SUBSECTION HEADER: ENCODING NETWORK COMMANDS, TEXT, AND DATA
<ITALICS>Encoding network commands</ITALICS>
Games often send the same command frequently. A common solution is to
use the first byte of a packet to represent a command identifier. This
works but we can shave off a few bits by creating a frequency table of
network commands which indicate likely or previously recorded session
averages. For example:
Per system:
Movement: Called 84 times
Press trigger: Called 22 times
New player joined: Called 5 times.
We can use this frequency table as input for arithmetic or Huffman
encoding, which will then give us an optimal bit-wise representation of
each command. Table generation can be performed beforehand so as to not
slow down the game at runtime. This will need to be done twice - once
for server commands and once for client commands.
<ITALICS>Encoding text commands</ITALICS>
We may also apply this technique to commonly sent strings. On
connection, a server can send a string table to the client of all
strings that are mapped to a unique identifier. More complex
implementations may encode special characters in the string for printf
style format specifiers. From that point on, one bit can be sent,
indicating a string lookup (or not), followed by a string identifier.
If desired, this can also be based off some optimal encoding method as
was done with commands.
<ITALICS>Encoding data</ITALICS>
Memory and processing permitting, we may also wish to perform global
compression. The technique is the same. Generate a frequency table
based on previously recorded server output and client output sent from
the parsing system. In the case of messages sent reliable and ordered,
adaptive run-time compression is also possible for data sent reliably
and ordered.
A more useful and practical approach is to write data as bit streams
that perform simple run-time contextual compression. The approach used
by the network API RakNet is to provide WriteCompressed and
ReadCompressed calls that recursively write a <1> if the high half of
the data is all 0's Otherwise write a 0, then the rest of the data.
The approach used by the Torque network library is to specify the min
and max values of a data element by which the minimum number of bits
will be used to write that data. The first technique is safer and more
flexible, while the second technique provides better compression. If
you decide it is worthwhile to use the second technique, this can be
extended by a further parameter: granularity. It's often the case that
we don't care whether a position is 503.135224 or 503. By specifying
the bits of granularity, we can perform this reduction automatically.
In either case, the option to specify compression parameters should be
exposed to the higher level so that scripters or implementers can take
advantage of it. In the case of specifying bits, it would be a very
good idea to use a utility class to specify parameters and share this
class between systems to reduce maintenance and the chance of mistakes.
SECTION HEADER: PACKET CONTENT
One way to reduce the number of client network commands as well as
reduce the likelihood of cheating is to have the client send inputs,
rather than the results of inputs. This is straightforward and can make
programming easier in some ways.
Send:
Movement commands
Use key command
Jump key command
Switch to weapon #5 command.
Don't send
"Pickup red team's flag"
"Kill player X"
The difference is that the set of actions has a one-to-many relationship
with the set of input commands. Once you implement the input commands
you gain the actions for free, and actions are a moving target while
commands tend to stay fixed. This is easier to program. From the
context of cheating, it limits the amount of code you have to deal with
and in the worst-case scenario the client won't be able to do anything
they couldn't have done within the normal context of the game. In
practice it's not possible to achieve this level of abstraction and
still have good game play because entropy will cause clients to get out
of synch.
We can address entropy by including relevant data whenever we send a
client command. For example, by including position and orientation
along with a movement command. This layer should be exposed to
scripters or gameplay programmers that can identify variables either
directly or by some identifier.
While the server doesn't send input commands the same principle applies.
Exposing the command to let scripters include the associated variables
can make optimization easier.
SECTION HEADER: Summary of data to maintain.
Throughout this article I have referred to applying properties to
commands, variables, objects, systems, and strings. It may be more
clear in certain implementations to aggregate these properties into a
single system, whereby you can define all these properties up-front.
This table summaries the properties and extra data I've covered:
Per command:
- A unique identifier that can be used to find the command in a look-up
table.
- A bitwise encoding representing this command.
- A list of variables and objects that need to be synchronized when this
command is used.
Per variable:
- A unique networkable identifier that is two way mappable. From the
variable we should be able to get the identifier, and from the
identifier we should be able to get the variable, given the context of
the enclosing struct or class, if any.
- What properties or method to use to interpolate this variable, if it
is interpolated at all.
- If this variable is written with compression, and if so the number of
bits, min, and max values
- What culling function to call or properties to use, if any, to
determine if a variable has changed enough to be worth sending.
- What object this variable is a member of, if any. Used to determine
if a send is not necessary because the object itself is culled.
- Which system(s) are allowed to update this variable.
Per object:
- A unique networkable identifier that is two way mappable. From the
object we should be able to get the identifier and from the identifier
we should be able to get the object.
- What culling algorithm to call or properties to use, if any, to
determine if an object is in the area of interest (to send object
creation and updates). Also what algorithm to process, if any, to
determine if the object is in the area of disinterest (to send object
destruction).
- The list of synchronized member variables.
- Which system(s) are allowed to create and destroy this class.
Per remote system, on a server or peer
- Which objects this system has instantiated
- The field of interest for this system (usually the camera position)
- The last values for the variables that were sent to this system.
Per encoded string:
- A unique identifier by which programmers or scripters can refer to
this string.
- A bitwise encoding of this identifier that can be transmitted over the
network
- Optionally, what fields each string expects to take, if format
specification is used.
SECTION HEADER: Synchronizing non-critical data
Some data isn't critical but it would be nice if it was synchronized
among systems. For example, showing the same type of particle effects
or playing the same animation frame at the same time. Novice developers
may try to synchronize this data with messages, which is wasteful. A
better and somewhat well-known method is to synchronize one or more
integers which can be used as keyframe indices, random number generator
seeds, or similar purposes.
SECTION HEADER: Low level network system requirements
It's important to point out that I didn't cover the low level network
system that this system relies on. At a minimum the low level network
layer should support, in order of most important to least:
Reliable and unreliable messaging
Ordering and sequencing with multiple channel support
Secure connections with strong packet encryption and tampering detection
A wide range of statistical reports
Automatic message to packet aggreggation and splitting
Remote method invocation
Automatic flow control
Message priority level support
Native internet simulation capabilities (packetloss, spikes, lag, out of
order packets)
Path MTU discovery with the ability to set this explicitly
Native bitstream support
Optional but helpful features include:
Multithreading support
Packet level compression
IO completion ports (for Windows based games)
Cross platform
SECTION HEADER: Conclusion
This article is intended to provide architectural level ideas that can
be used to make your game-level network system programming easier and
less prone to cheats. It doesn't provide specific implementations,
which are game specific. For further information, I recommend searching
Gamasutra, Gamedev.net, Google, Flipcode. Excellent mailing lists
include the GameProgrammer.com list, and GD-Algorithms-list.
(TODO - I probably need more here!)
---------------------
To unsubscribe go to http://gameprogrammer.com/mailinglist.html
- Follow-Ups:
- [gameprogrammer] Re: Gamasutra Article Rough Draft (Long)
- From: Ray Gomez-Bravo
- References:
- [gameprogrammer] Re: Vector Graphics (seconded request)
- From: Jon Harrop
Other related posts:
- » [gameprogrammer] Gamasutra Article Rough Draft (Long)
- » [gameprogrammer] Re: Gamasutra Article Rough Draft (Long)
- » [gameprogrammer] Re: Gamasutra Article Rough Draft (Long)
- » [gameprogrammer] Re: Gamasutra Article Rough Draft (Long)
- [gameprogrammer] Re: Gamasutra Article Rough Draft (Long)
- From: Ray Gomez-Bravo
- [gameprogrammer] Re: Vector Graphics (seconded request)
- From: Jon Harrop