DBD2 Glossary
(Definitions of Terms & Concepts)
A host's 'canonical name' can be thought of as the name that applications running on it see as its own. Alternatively, the 'canonical hostname'
is the name returned by the 'hostname' command.
The term 'canonical' is defined by dictionary.com as:
adjective Also ca·non·ic.
For those not familiar with the concept of compiler symbols, a compiler symbol is a variable used to give instructions
to the compiler about how and what to compile in any given project.
In the "C" and C++ programming languages, these variables can be created with the '#define' preprocessor directive or deleted with '#undef'.
Compiler symbols can hold values or they can act like boolean variables where 'on' means they have been declared and 'off' means they have not.
DBD2 uses a series of compiler-symbols with names in all-caps and starting with 'CAP_' to manage optional features. The value of these
variables is meaningless - all that matters is whether they were declared or not during compilation (at compile-time).
Other compiler-symbols used by DBD2 hold important configuration values (such as buffer sizes or the number of 'extra' facilities) that the
compiler uses during compilation to calculate the size of various buffers as well as program stack sizes.
The DBD2 code uses 'CAP_*' compiler-symbols to tell the compiler to omit or substitute 'chunks' of code and data-structures depending on
whether they have been declared.
This allows the single DBD2 'code-base' to produce compiled binary images with very different characteristics based on what pieces of code
were selected by the 'CAP_*' symbols used to compile the individual binary image.
This approach also simplifies development because the same exact code (and bug-fixes) are used for all non-optional code in all binary images.
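The mechanism above can be sketched in a few lines of C. The symbol names here (CAP_TCP_INPUT, EXTRA_FACILITIES) are illustrative assumptions, not necessarily the symbols DBD2 actually defines:

```c
#include <stddef.h>

/* Hypothetical CAP_* symbol: its value (if any) is meaningless; only
 * whether it is declared matters. */
#define CAP_TCP_INPUT

/* Hypothetical configuration symbol: its value is used by the
 * compiler to size a buffer at compile-time. */
#define EXTRA_FACILITIES 8

#ifdef CAP_TCP_INPUT
static const int have_tcp_input = 1;   /* this 'chunk' was compiled in */
#else
static const int have_tcp_input = 0;   /* this 'chunk' was omitted */
#endif

/* Buffer whose size is calculated during compilation. */
static int facilities[24 + EXTRA_FACILITIES];

size_t facility_slots(void)
{
    return sizeof facilities / sizeof facilities[0];
}
```

Removing the '#define CAP_TCP_INPUT' line (or compiling with a different set of -D options) produces a binary with different characteristics from the same source file.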
A filter-file is simply a text-file that contains comments and a list of filter-elements that define the filtering
to be applied to each message in an input or output stream of syslog events in DBD2.
The result of 'applying a filter' is that DBD2 will take one of the following actions:
Note that when an input filter is executed, there is still only one 'copy' of the message. When output-filters are executed, there are multiple 'copies' (one for each output destination to which it might be sent).
'Inheritance' is the property of a variable that causes it to take a default value from another variable (usually of the same or similar name) that has a larger scope and
that may have been changed from its default value at a different point in the configuration.
For example, the threadpool parameter for the number of reader-threads will take the compiled-in value if not otherwise changed.
'The scope of the compiled-in value is all configurations on all systems.'
A global value exists that can globally change this default setting for a specific configuration and run-time environment.
A 'threadmaps' value also exists that can change the global setting for an individual threadpool within a specific run-time environment.
A connection-specification option also exists that can change the threadpool value to meet the minimum needs of that particular connection.
At each step, the value is 'inherited' from a larger scope and may be modified to affect the default values used by variables with smaller scopes.
This entry is an attempt to avoid or resolve any confusion that may be caused by similar-sounding terms such as 'input type', 'input connection', 'input specification', etc. For virtually all 'input-*' terms, there is a corresponding 'output-*' term.
A source (destination) is an external (to DBD2) entity that generates (receives)
traffic, thereby looking (to DBD2) like a 'traffic generator' ('traffic sink'). Input examples include: an instance of
Syslogd2, an instance of dbcl or a user-program that 'talks' directly to dbd2.
Output examples include each defined database or output file.
An input connection is a conceptual 'connection' from an input-source (to a destination) that allows the
flow of data into (or out of) DBD2.
An 'output-connection' in this context may exist to a file as easily as to a socket or remote database host.
An input or output type is a 'class' of inputs or outputs, such as output-files vs output-databases, or input TCP sockets vs UDP or Linux sockets.
The input keyword is a specific instance of an input type and introduces an input specification.
This keyword defines a socket-type of input.
An input specification (or Input-spec) is a configuration-file expression describing an input connection,
including all specified options.
The term 'input-spec' is used to refer to the general concept (or format) of input-configuration entries.
It should be noted that the term input specification is itself a form of the more general term
connection specification which encompasses both input and output specifications.
To 'latch' a value is to record the fact that it has already been seen (set) in order to detect (and prevent) subsequent attempts to set that same variable at a later time.
In DBD2, virtually all variables are latched as is the configuration-file line-number on which they are first encountered.
This is to support the 'first-come-first-serve' policy that states "only the first instance of a variable or setting encountered will be used.
All subsequent encounters will be ignored (with or without an error report)".
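The first-come-first-serve policy can be sketched with a minimal latch structure. This is an illustration of the concept only, not DBD2's actual implementation:

```c
#include <stdio.h>

/* A latched configuration value: once set, later attempts are ignored. */
struct latched_int {
    int value;
    int set;    /* non-zero once the value has been latched */
    int line;   /* configuration-file line on which it was first encountered */
};

/* Returns 1 if the value was accepted, 0 if it was already latched. */
int latch_set(struct latched_int *v, int value, int line)
{
    if (v->set) {
        /* first-come-first-serve: report and ignore the later setting */
        fprintf(stderr, "line %d: value ignored, already set at line %d\n",
                line, v->line);
        return 0;
    }
    v->value = value;
    v->set = 1;
    v->line = line;
    return 1;
}
```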
FIFO stands for "First In First Out". A queue is any collection of events to be serviced where additions are made at the 'tail' end of the queue and events are processed from the 'head' of the queue. Queues are therefore (by definition) FIFO structures.
A buffer is any sequential 'chunk' of memory of a given size.
Imagine a buffer being used to implement a queue using two pointers: a head-pointer and a tail-pointer:
When first initialized the two pointers are equal and the queue is empty.
After each incoming data-element is added to the queue, the head-pointer is incremented.
After each element is removed from the queue, the tail-pointer is incremented.
It is easy to imagine that the end of the buffer will be reached very quickly and our 'program' would crash, but what if after each pointer was incremented, they were then tested to see if they had 'walked off the end' of the allocated buffer and if so, they were set back to the beginning of the buffer?
Doing this check-and-reset would have the effect of 'joining' the two ends of our buffer together, creating a 'ring-buffer' where our imaginary pointers can chase each other as long as they observe a few basic rules:
If the tail-pointer ever becomes equal to the head-pointer after incrementing, the buffer is empty and the tail-pointer should stop processing.
If the head-pointer is ever just one element less than the tail-pointer before adding data, DO NOT ADD THE DATA because the queue is full and the additional data will over-write the next entry the tail-pointer should process (by 'one element less', I mean after adjusting both pointers for the 'joint' where the two ends of our buffer meet to form the 'ring').
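The rules above can be written out directly, using array indices as the 'pointers' and the modulo operator as the check-and-reset at the 'joint'. Note that one slot is sacrificed so that 'full' and 'empty' remain distinguishable:

```c
#define RING_SIZE 4   /* illustrative size: usable capacity is RING_SIZE - 1 */

struct ring {
    int slot[RING_SIZE];
    int head;   /* where the next element is added; incremented after adding */
    int tail;   /* next element to process; incremented after removing */
};

/* Pointers equal: the queue is empty and processing should stop. */
int ring_empty(const struct ring *r) { return r->head == r->tail; }

/* Head one element 'less than' tail (after adjusting for the joint):
 * the queue is full. */
int ring_full(const struct ring *r)  { return (r->head + 1) % RING_SIZE == r->tail; }

int ring_add(struct ring *r, int v)
{
    if (ring_full(r))
        return 0;                            /* DO NOT over-write unread data */
    r->slot[r->head] = v;
    r->head = (r->head + 1) % RING_SIZE;     /* check-and-reset via modulo */
    return 1;
}

int ring_remove(struct ring *r, int *v)
{
    if (ring_empty(r))
        return 0;                            /* tail caught up with head */
    *v = r->slot[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    return 1;
}
```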
A schedule in DBD2 is just an array of integer time intervals that is combined with either a global or individualized index value to provide a graduated sequence of intervals at which a process is automatically run.
The concept is based on the recognition that events for which some timers are assigned will behave predictably, but not on a constant schedule.
For example, if a network connection failure occurs, there may be many reasons. If the remote box is being rebooted or the application restarted, the failure should be resolved within minutes.
If a network switch, router or gateway has failed and there is a redundant configuration, either seconds or minutes would be required for the backup to detect the failure and assume the primary load.
If an internet link has failed, the time might be minutes to hours, while more serious issues (such as a fiber-optic link being cut by an excavation) may take days to recover or provide alternate routing for.
From the standpoint of a host-to-host connection, the above possibilities (and others) look like a continuum where many (most?) outages will be repaired fairly quickly (within minutes) but the longer the link stays down, the less likely it is to get fixed within the next (short) time-interval (5-10 minutes). This is the concept behind a schedule. Typically a schedule starts with small increments that will re-test frequently, but will move to longer time intervals if the link remains down.
This concept also applies to the start-up environment. Initial failures should be expected - especially if (unbeknownst to DBD2) the entire environment is rebooting after a power outage, but over a (relatively short) period of time, systems will come on-line and connections can be re-established. If (however), connections cannot be established within the first few minutes of startup, there's a good chance something more serious (and longer-lasting) has occurred, so frequent re-testing will do little good.
All DBD2 schedules can be reduced to simple timers by specifying only one entry. Schedules are configured by specifying a space-separated list of interval time-strings. The last entry of the schedule has special meaning. If it is a positive value, a zero is added to the schedule as a terminator.
Schedules work on a concept of 'success' or 'failure'. If we are talking about re-opening a failed connection, a 'failure' occurs when the connection cannot be restored immediately. This results in the index value incrementing. A 'success' occurs when the connection is restored. The index is reset to zero (which restarts the graduated recovery-schedule).
When (if) a schedule reaches the last entry, various actions will be taken based on the value of the last entry.
Because schedules are used for more than just re-opening failed connections, the terminology below is going to sound somewhat cumbersome.
Last entry is 0: The schedule will repeat the last positive entry until it is reset by a 'success'. Each entry covered by this schedule will maintain its own index value.
Last entry is -1: The schedule will terminate and deactivate itself. Each entry covered by this schedule will maintain its own index value.
Last entry is -2: The schedule will 'repeat' the last positive entry until it is reset by a 'success'.
A single global index will be used for this schedule, causing each entry to be checked with each run.
'Success' is defined as ANY element reporting success or all elements already in a 'successful' state.
Last entry is -3: The schedule will terminate and deactivate itself.
A single global index will be used for this schedule, causing each entry to be checked with each run.
'Success' is defined as ANY element reporting success or all elements already in a 'successful' state.
The scheduled process always runs based on the 1st schedule entry (usually the smallest). For larger individual time intervals, the entry maintains a 'next-check-time' that will cause it to be skipped if it is not yet time to check that value. This can actually save time by not waiting on connection-timeouts if the schedule says it is too soon to retest an individual failed connection. However, the process must still run frequently in case other (more recent failures with smaller intervals) are ready to be re-tested.
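A graduated schedule with a zero terminator can be sketched as an interval array plus an index. The interval values below are made-up examples; DBD2's internals are not shown here:

```c
/* Illustrative graduated schedule: retry quickly at first, then back
 * off. A trailing 0 means 'repeat the last positive entry forever'. */
static const int schedule[] = {5, 15, 60, 300, 0};   /* seconds */

/* On 'failure' the index advances to the next (longer) interval,
 * stopping at the last positive entry when the terminator is 0.
 * On 'success' the index resets, restarting the graduated schedule. */
int next_interval(int *index, int succeeded)
{
    if (succeeded) {
        *index = 0;                   /* restart from the smallest interval */
    } else if (schedule[*index + 1] != 0) {
        (*index)++;                   /* back off to the next interval */
    }                                 /* else: keep repeating the last entry */
    return schedule[*index];
}
```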
Syslogd2 has borrowed an idea from the MariaDB log file(s) that it refers to as the 'SD-String'. The name is derived from a similar field defined in the rfc for Syslog Version 1.
-->Syslogd2's sister project (DBD2) is nearing its first production release
and a method was needed to pass database fields from Syslogd2 (or other sources such as log-files or syslog-client applications)
into DBD2.
-->The approach selected is to create (at the front of the message-string) a new 'field' called the 'sdstring' that can be (and will
usually be) null [the empty string ("")].
When non-null, this string will consist of a white-space separated list of individual, square-bracketed name=value pairs
(or potentially just a value). This new 'field' starts and stops with the square brackets:
<facpri-value>time-field hostname-field [name=value] [name=value] ... [name=value] original message string here
It is planned to update Syslogd2 at some future time to recognize the difference between its current sdstring definition and the Syslog Version 1 rfc via parsing (and if that fails to change the delimiter from square brackets to something else), but the need is still present to have a way for syslog-generators to create fields in a syslog message that DBD2 can recognize, read and act on.
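A scanner for the bracketed pairs described above might look like the following sketch. The field names in the usage example and the buffer sizes are assumptions for illustration, not DBD2's actual code:

```c
#include <stdio.h>
#include <string.h>

/* Scans leading [name=value] (or bare [value]) groups off the front of
 * a message, advancing *msg past them and any separating whitespace.
 * Returns the number of pairs found. */
int scan_sdstring(const char **msg,
                  char names[][32], char values[][128], int max_pairs)
{
    const char *p = *msg;
    int n = 0;

    while (n < max_pairs && *p == '[') {
        const char *close = strchr(p, ']');
        if (close == NULL)
            break;                              /* malformed: stop scanning */
        const char *eq = memchr(p + 1, '=', (size_t)(close - p - 1));
        if (eq != NULL) {                       /* [name=value] form */
            snprintf(names[n], 32, "%.*s", (int)(eq - p - 1), p + 1);
            snprintf(values[n], 128, "%.*s", (int)(close - eq - 1), eq + 1);
        } else {                                /* bare [value] form */
            names[n][0] = '\0';
            snprintf(values[n], 128, "%.*s", (int)(close - p - 1), p + 1);
        }
        n++;
        p = close + 1;
        while (*p == ' ' || *p == '\t')         /* skip separating whitespace */
            p++;
    }
    *msg = p;                                   /* now at the original message */
    return n;
}
```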
A Selector-String-Component is simply a fragment of a selector-string that itself is a component of an output-line in traditional syslog configuration files.
For example, in the output line:
kern.*;user.*;mail.warn;*.err;ftp.none - /var/log/file.log
the selector-string is everything before the '-' character (not including the whitespace). A selector-string-component is a string composed of one or more of the semi-colon-separated elements that make up the selector-string.
A size-string is a format for inputting large numeric values representing (almost exclusively) file-sizes. The size-string is composed of an integer value followed by a single letter denoting a 'unit-value'. The entire purpose of a size-string is to provide a human-readable, mistake-resistant, shortcut method of specifying large file sizes.
The default unit is byte if not otherwise specified.
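A size-string parser might look like the sketch below. The accepted unit letters (K/M/G as powers of 1024) are assumptions for illustration; the document does not list DBD2's exact unit table:

```c
#include <stdlib.h>

/* Parses a size-string such as "10M": an integer followed by an
 * optional single unit letter. No letter means bytes. */
long long parse_size_string(const char *s)
{
    char *end;
    long long n = strtoll(s, &end, 10);

    switch (*end) {
    case 'k': case 'K': return n * 1024LL;
    case 'm': case 'M': return n * 1024LL * 1024LL;
    case 'g': case 'G': return n * 1024LL * 1024LL * 1024LL;
    default:            return n;   /* default unit: bytes */
    }
}
```

The single-letter suffix is what makes the format mistake-resistant: "100M" is much harder to get wrong than "104857600".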
DBD2 uses a concept referred to as 'soft comments' in the syntax of its primary (and included) configuration files (soft comments are not supported in ancillary files).
A soft comment 'looks' like a comment to other parsers, but is read as a configuration command by DBD2's more complex parser.
Syslog considers that the configuration file consists of 3 types of lines:
To allow DBD2 to ignore lines that it does not understand, use the --skip <n> command-line option.
This command tells the parser to ignore the next 'n' physical lines in the file (or to end-of-file whichever comes first).
Note that when softcomment is disabled, DBD2 treats output lines exactly as traditional configuration files do (a single hashtag is a comment), but when softcomment is enabled, DBD2 will still read and process any output-line preceded by a single hashtag.
Also, for all command-line parameters in the configuration file, 2 hashtags are required to 'comment-out' a line.
An sql-index-variable is an index into a conceptual array of variables (variables that have not yet been provided to DBD2). In database templates an sql-index-variable is represented by a placeholder value consisting of a dollar-sign ('$') followed by a positive numeric value. An sql-variable is configured using the 'var' keyword (example: var 3 username,user,name). In this example, sql-index number '3' receives its value from incoming data (in 'name=value' format) when the name matches one of 'username', 'user' or 'name'. Required placeholders that have no matching data elements are filled with an empty (null) string ("").
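Expanding '$n' placeholders from a table of resolved values can be sketched as follows. The template text and variable table are hypothetical; only the placeholder convention comes from the entry above:

```c
#include <ctype.h>
#include <stddef.h>

/* Expands $n placeholders in tpl using vars[n]; indices with no
 * matching data are filled with the empty string, per the rule above. */
void expand_template(const char *tpl, const char *vars[], int nvars,
                     char *out, size_t outlen)
{
    size_t o = 0;

    while (*tpl && o + 1 < outlen) {
        if (*tpl == '$' && isdigit((unsigned char)tpl[1])) {
            int idx = 0;
            tpl++;
            while (isdigit((unsigned char)*tpl))
                idx = idx * 10 + (*tpl++ - '0');
            const char *v = (idx < nvars && vars[idx]) ? vars[idx] : "";
            while (*v && o + 1 < outlen)
                out[o++] = *v++;
        } else {
            out[o++] = *tpl++;
        }
    }
    out[o] = '\0';
}
```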
DBD2 uses several different types of templates in its configuration. Here is an overview:
Examples of template-ids are 'a1' and 'z400'.
Examples of routing templates are 'a1z4' and 'a3a8a9y13z8q400'. (Each routing template component unambiguously identifies a destination database and a
corresponding template within that database to process.)
A DBD2 threadpool is a 'working unit' consisting of some number of execution-threads, a subroutine to execute and a data-structure that usually includes a FIFO buffer consisting of some number of 'message-slots' (lines) to store (as yet) unprocessed messages.
Input-threadpools (socket, kernel, named-pipe and file reading threadpools) do not use FIFO buffers since their input comes from external system buffers. The Housekeeping threadpool gets its input from the scheduling code in the parent thread.
Threadpool configuration consists of specifying how many execution-threads to create and how many 'storage slots' for unprocessed input to allocate in the FIFO buffer.
When CAP_WORKERTHREADS is declared, the worker-threadpool parameters can be optionally configured from the input-specifications instead of a dedicated (worker-thread-specific) keyword since worker-threadpool settings are directly related to the estimated amount of work submitted by the input-threadpools.
Every threadpool is identified by a non-negative integer and each threadpool-type maintains a distinct sequence of numeric values.
The support of most threadpool types other than socket-input threadpools depends on the declaration of optional capabilities.
The default threadpool number for any given threadpool type is always 0 [zero].
Specialty threadpools are limited to a single threadpool instance.
Specialty threadpools are 'kernel', 'user' and 'housekeeping' threadpools.
Multiple instances of non-specialty threadpools may be created simply by referencing them.
Unused threadpools (those that are defined but never assigned connections to service) are deleted before being created.
Technical Note: There is one (partial) exception to the statement that all configured threadpools use index 0 as their default. Because the subroutine algorithms for 'socket' and 'tailfile' types of input threadpools are incompatible, tailfiles and sockets cannot share the same threadpool id-number even though they share the same numeric sequence set. They are both still configured as if threadpool number 0 was the default, but internally, DBD2 will move the tailfile threadpool number 0 to a negative value to deconflict the two.
All DBD2 timers and schedules that define time intervals for recurring operations measure their intervals from the termination of one 'pass' to the scheduled time of the next pass. This is a subtle but significant difference from the common perception that processes will always start at the scheduled time. The DBD2 approach allows for the realities of its multi-threaded design by allowing a scheduled process-start to be delayed until the required resources are released by all competing processes.
This approach leads directly to the observation that 'intervals' as set in the configuration file are just minimum values and may vary by several seconds based on resource-conflict wait-times.
It also leads to the visible result that 'mark' time stamps in local files will 'drift' over time since the process that executes the 'mark' function takes finite time to execute.
Early development of DBD2 found that when timed processes are scheduled for execution, they will (at some point) inevitably come into conflict with each other.
The result (if uncontrolled) is resource conflicts and unexpected results as scheduled processes skip a scheduled run-time or delay their execution (backing up other scheduled processes), or (worst case) interfere with each other, resulting in a catastrophic application abort.
Several of DBD2's background processes are potentially time-consuming and lock data-structure resources for (comparatively) long time periods.
This tendency is most easily understood by imagining a configuration with 20 or 30 output files and sockets trying to check the entire list (including unspooling of any spooled data) while a file-rotation cycle is also scheduled to be on-going.
Who gets to control the integrity of the output-data-structures to ensure the executing output-routines do not crash the system?
Similar to a size-string and for mostly the same reasons, a time-string is also an integer value followed by a single letter:
The default unit is seconds if not otherwise specified.