DBD2 Glossary
(Definitions of Terms & Concepts)
A host's 'canonical name' can be thought of as the name that applications running on it see as its own. Alternatively, the 'canonical hostname'
is the name returned by the 'hostname' command.
The term 'canonical' is defined by dictionary.com as:
adjective Also ca·non·ic.
For those not familiar with the concept of compiler symbols, a compiler symbol is a variable used to give instructions
to the compiler about how and what to compile in any given project.
In the "C" and C++ programming languages, these variables can be created with the '#define' preprocessor directive or deleted with '#undef'.
Compiler symbols can hold values or they can act like boolean variables where 'on' means they have been declared and 'off' means they have not.
DBD2 uses a series of compiler-symbols with names in all-caps and starting with 'CAP_' to manage optional features. The value of these
variables is meaningless - all that matters is whether they were declared or not during compilation (at compile-time).
Other compiler-symbols used by DBD2 hold important configuration values (such as buffer sizes or the number of 'extra' facilities) that the
compiler uses during compilation to calculate the size of various buffers as well as program stack sizes.
The DBD2 code uses 'CAP_*' compiler-symbols to tell the compiler to omit or substitute 'chunks' of code and data-structures depending on
whether they have been declared.
This allows the single DBD2 'code-base' to produce compiled binary images with very different characteristics based on what pieces of code
were selected by the 'CAP_*' symbols used to compile the individual binary image.
This approach also simplifies development because the same exact code (and bug-fixes) are used for all non-optional code in all binary images.
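The mechanism above can be sketched in a few lines of C. The symbol names here (CAP_TCP_INPUT, EXTRA_FACILITIES) are illustrative assumptions, not necessarily the symbols DBD2 actually defines:

```c
#include <stddef.h>

/* Hypothetical CAP_* symbol: its value (if any) is meaningless; only
 * whether it is declared matters. */
#define CAP_TCP_INPUT

/* Hypothetical configuration symbol: its value is used by the
 * compiler to size a buffer at compile-time. */
#define EXTRA_FACILITIES 8

#ifdef CAP_TCP_INPUT
static const int have_tcp_input = 1;   /* this 'chunk' was compiled in */
#else
static const int have_tcp_input = 0;   /* this 'chunk' was omitted */
#endif

/* Buffer whose size is calculated during compilation. */
static int facilities[24 + EXTRA_FACILITIES];

size_t facility_slots(void)
{
    return sizeof facilities / sizeof facilities[0];
}
```

Removing the '#define CAP_TCP_INPUT' line (or compiling with a different set of -D options) produces a binary with different characteristics from the same source file.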
A filter-file is simply a text-file that contains comments and a list of filter-elements that define the filtering
to be applied to each message in an input or output stream of syslog events in DBD2.
The result of 'applying a filter' is that DBD2 will take one of the following actions:
Note that when an input filter is executed, there is still only one 'copy' of the message. When output-filters are executed, there are multiple 'copies' (one for each output destination to which it might be sent).
'Inheritance' is the property of a variable that causes it to take a default value from another variable (usually of the same or similar name) that has a larger scope and
that may have been changed from its default value at a different point in the configuration.
For example, the threadpool parameter for the number of reader-threads will take the compiled-in value if not otherwise changed.
'The scope of the compiled-in value is all configurations on all systems.'
A global value exists that can globally change this default setting for a specific configuration and run-time environment.
A 'threadmaps' value also exists that can change the global setting for an individual threadpool within a specific run-time environment.
A connection-specification option also exists that can change the threadpool value to meet the minimum needs of that particular connection.
At each step, the value is 'inherited' from a larger scope and may be modified to affect the default values used by variables with smaller scopes.
This entry is an attempt to avoid or resolve any confusion that may be caused by similar-sounding terms such as 'input type', 'input connection', 'input specification', etc. For virtually all 'input-*' terms, there is a corresponding 'output-*' term.
A source (destination) is an external (to DBD2) entity that generates (receives)
traffic, thereby looking (to DBD2) like a 'traffic generator' ('traffic sink'). Input examples include: an instance of
Syslogd2, an instance of dbcl or a user-program that 'talks' directly to dbd2.
Output examples include each defined database or output file.
An input connection is a conceptual 'connection' from an input-source (to a destination) that allows the
flow of data into (or out of) DBD2.
An 'output-connection' in this context may exist to a file as easily as to a socket or remote database host.
An input or output type is a 'class' of inputs or outputs, such as output-files vs output-databases, or input TCP sockets vs UDP or Linux sockets.
The input keyword is a specific instance of an input type and introduces an input specification.
This keyword defines a socket-type of input.
An input specification (or Input-spec) is a configuration-file expression describing an input connection,
including all specified options.
The term 'input-spec' is used to refer to the general concept (or format) of input-configuration entries.
It should be noted that the term input specification is itself a form of the more general term
connection specification which encompasses both input and output specifications.
To 'latch' a value is to record the fact that it has already been seen (set) in order to detect (and prevent) subsequent attempts to set that same variable at a later time.
In DBD2, virtually all variables are latched as is the configuration-file line-number on which they are first encountered.
This is to support the 'first-come-first-serve' policy that states "only the first instance of a variable or setting encountered will be used.
All subsequent encounters will be ignored (with or without an error report)".
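The first-come-first-serve policy can be sketched with a minimal latch structure. This is an illustration of the concept only, not DBD2's actual implementation:

```c
#include <stdio.h>

/* A latched configuration value: once set, later attempts are ignored. */
struct latched_int {
    int value;
    int set;    /* non-zero once the value has been latched */
    int line;   /* configuration-file line on which it was first encountered */
};

/* Returns 1 if the value was accepted, 0 if it was already latched. */
int latch_set(struct latched_int *v, int value, int line)
{
    if (v->set) {
        /* first-come-first-serve: report and ignore the later setting */
        fprintf(stderr, "line %d: value ignored, already set at line %d\n",
                line, v->line);
        return 0;
    }
    v->value = value;
    v->set = 1;
    v->line = line;
    return 1;
}
```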
FIFO stands for "First In First Out". A queue is any collection of events to be serviced where additions are made at the 'tail' end of the queue and events are processed from the 'head' of the queue. Queues are therefore (by definition) FIFO structures.
A buffer is any sequential 'chunk' of memory of a given size.
Imagine a buffer being used to implement a queue using two pointers: a head-pointer and a tail-pointer:
When first initialized the two pointers are equal and the queue is empty.
After each incoming data-element is added to the queue, the head-pointer is incremented.
After each element is removed from the queue, the tail-pointer is incremented.
It is easy to imagine that the end of the buffer will be reached very quickly and our 'program' would crash, but what if after each pointer was incremented, they were then tested to see if they had 'walked off the end' of the allocated buffer and if so, they were set back to the beginning of the buffer?
Doing this check-and-reset would have the effect of 'joining' the two ends of our buffer together, creating a 'ring-buffer' where our imaginary pointers can chase each other as long as they observe a few basic rules:
If the tail-pointer ever becomes equal to the head-pointer after incrementing, the buffer is empty and the tail-pointer should stop processing.
If the head-pointer is ever just one element less than the tail-pointer before adding data, DO NOT ADD THE DATA because the queue is full and the additional data will over-write the next entry the tail-pointer should process (by 'one element less', I mean after adjusting both pointers for the 'joint' where the two ends of our buffer meet to form the 'ring').
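The rules above can be written out directly, using array indices as the 'pointers' and the modulo operator as the check-and-reset at the 'joint'. Note that one slot is sacrificed so that 'full' and 'empty' remain distinguishable:

```c
#define RING_SIZE 4   /* illustrative size: usable capacity is RING_SIZE - 1 */

struct ring {
    int slot[RING_SIZE];
    int head;   /* where the next element is added; incremented after adding */
    int tail;   /* next element to process; incremented after removing */
};

/* Pointers equal: the queue is empty and processing should stop. */
int ring_empty(const struct ring *r) { return r->head == r->tail; }

/* Head one element 'less than' tail (after adjusting for the joint):
 * the queue is full. */
int ring_full(const struct ring *r)  { return (r->head + 1) % RING_SIZE == r->tail; }

int ring_add(struct ring *r, int v)
{
    if (ring_full(r))
        return 0;                            /* DO NOT over-write unread data */
    r->slot[r->head] = v;
    r->head = (r->head + 1) % RING_SIZE;     /* check-and-reset via modulo */
    return 1;
}

int ring_remove(struct ring *r, int *v)
{
    if (ring_empty(r))
        return 0;                            /* tail caught up with head */
    *v = r->slot[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    return 1;
}
```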
A schedule in DBD2 is just an array of integer time intervals that is combined with either a global or individualized index value to provide a graduated sequence of intervals at which a process is automatically run.
The concept is based on the recognition that events for which some timers are assigned will behave predictably, but not on a constant schedule.
For example, if a network connection failure occurs, there may be many reasons. If the remote box is being rebooted or the application restarted, the failure should be resolved within minutes.
If a network switch, router or gateway has failed and there is a redundant configuration, either seconds or minutes would be required for the backup to detect the failure and assume the primary load.
If an internet link has failed, the time might be minutes to hours, while more serious issues (such as a fiber-optic link being cut by an excavation) may take days to recover or provide alternate routing for.
From the standpoint of a host-to-host connection, the above possibilities (and others) look like a continuum where many (most?) outages will be repaired fairly quickly (within minutes) but the longer the link stays down, the less likely it is to get fixed within the next (short) time-interval (5-10 minutes). This is the concept behind a schedule. Typically a schedule starts with small increments that will re-test frequently, but will move to longer time intervals if the link remains down.
This concept also applies to the start-up environment. Initial failures should be expected - especially if (unbeknownst to DBD2) the entire environment is rebooting after a power outage, but over a (relatively short) period of time, systems will come on-line and connections can be re-established. If (however), connections cannot be established within the first few minutes of startup, there's a good chance something more serious (and longer-lasting) has occurred, so frequent re-testing will do little good.
All DBD2 schedules can be reduced to simple timers by specifying only one entry. Schedules are configured by specifying a space-separated list of interval time-strings. The last entry of the schedule has special meaning. If it is a positive value, a zero is added to the schedule as a terminator.
Schedules work on a concept of 'success' or 'failure'. If we are talking about re-opening a failed connection, a 'failure' occurs when the connection cannot be restored immediately. This results in the index value incrementing. A 'success' occurs when the connection is restored. The index is reset to zero (which restarts the graduated recovery-schedule).
When (if) a schedule reaches the last entry, various actions will be taken based on the value of the last entry.
Because schedules are used for more than just re-opening failed connections, the terminology below is going to sound somewhat cumbersome.
Last entry is 0: The schedule will repeat the last positive entry until it is reset by a 'success'. Each entry covered by this schedule will maintain its own index value.
Last entry is -1: The schedule will terminate and deactivate itself. Each entry covered by this schedule will maintain its own index value.
Last entry is -2: The schedule will 'repeat' the last positive entry until it is reset by a 'success'.
A single global index will be used for this schedule, causing each entry to be checked with each run.
'Success' is defined as ANY element reporting success or all elements already in a 'successful' state.
Last entry is -3: The schedule will terminate and deactivate itself.
A single global index will be used for this schedule, causing each entry to be checked with each run.
'Success' is defined as ANY element reporting success or all elements already in a 'successful' state.
The scheduled process always runs based on the 1st schedule entry (usually the smallest). For larger individual time intervals, the entry maintains a 'next-check-time' that will cause it to be skipped if it is not yet time to check that value. This can actually save time by not waiting on connection-timeouts if the schedule says it is too soon to retest an individual failed connection. However, the process must still run frequently in case other (more recent failures with smaller intervals) are ready to be re-tested.
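A graduated schedule with a zero terminator can be sketched as an interval array plus an index. The interval values below are made-up examples; DBD2's internals are not shown here:

```c
/* Illustrative graduated schedule: retry quickly at first, then back
 * off. A trailing 0 means 'repeat the last positive entry forever'. */
static const int schedule[] = {5, 15, 60, 300, 0};   /* seconds */

/* On 'failure' the index advances to the next (longer) interval,
 * stopping at the last positive entry when the terminator is 0.
 * On 'success' the index resets, restarting the graduated schedule. */
int next_interval(int *index, int succeeded)
{
    if (succeeded) {
        *index = 0;                   /* restart from the smallest interval */
    } else if (schedule[*index + 1] != 0) {
        (*index)++;                   /* back off to the next interval */
    }                                 /* else: keep repeating the last entry */
    return schedule[*index];
}
```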
Syslogd2 has borrowed an idea from the MariaDB log file(s) that it refers to as the 'SD-String'. The name is derived from a similar field defined in the rfc for Syslog Version 1.
-->Syslogd2's sister project (DBD2) is nearing its first production release
and a method was needed to pass database fields from Syslogd2 (or other sources such as log-files or syslog-client applications)
into DBD2.
-->The approach selected is to create (at the front of the message-string) a new 'field' called the 'sdstring' that can be (and will
usually be) null [the empty string ("")].
When non-null, this string will consist of a white-space separated list of individual, square-bracketed name=value pairs
(or potentially just a value). This new 'field' starts and stops with the square brackets:
<facpri-value>time-field hostname-field [name=value] [name=value] ... [name=value] original message string here
It is planned to update Syslogd2 at some future time to recognize the difference between its current sdstring definition and the Syslog Version 1 rfc via parsing (and if that fails to change the delimiter from square brackets to something else), but the need is still present to have a way for syslog-generators to create fields in a syslog message that DBD2 can recognize, read and act on.
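A scanner for the bracketed pairs described above might look like the following sketch. The field names in the usage example and the buffer sizes are assumptions for illustration, not DBD2's actual code:

```c
#include <stdio.h>
#include <string.h>

/* Scans leading [name=value] (or bare [value]) groups off the front of
 * a message, advancing *msg past them and any separating whitespace.
 * Returns the number of pairs found. */
int scan_sdstring(const char **msg,
                  char names[][32], char values[][128], int max_pairs)
{
    const char *p = *msg;
    int n = 0;

    while (n < max_pairs && *p == '[') {
        const char *close = strchr(p, ']');
        if (close == NULL)
            break;                              /* malformed: stop scanning */
        const char *eq = memchr(p + 1, '=', (size_t)(close - p - 1));
        if (eq != NULL) {                       /* [name=value] form */
            snprintf(names[n], 32, "%.*s", (int)(eq - p - 1), p + 1);
            snprintf(values[n], 128, "%.*s", (int)(close - eq - 1), eq + 1);
        } else {                                /* bare [value] form */
            names[n][0] = '\0';
            snprintf(values[n], 128, "%.*s", (int)(close - p - 1), p + 1);
        }
        n++;
        p = close + 1;
        while (*p == ' ' || *p == '\t')         /* skip separating whitespace */
            p++;
    }
    *msg = p;                                   /* now at the original message */
    return n;
}
```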
A Selector-String-Component is simply a fragment of a selector-string that itself is a component of an output-line in traditional syslog configuration files.
For example, in the output line:
kern.*;user.*;mail.warn;*.err;ftp.none - /var/log/file.log
the selector-string is everything before the '-' character (not including the whitespace). A selector-string-component is a string composed of one or more of the semi-colon-separated elements that make up the selector-string.
A size-string is a format for inputting large numeric values representing (almost exclusively) file-sizes. The size-string is composed of an integer value followed by a single letter denoting a 'unit-value'. The entire purpose of a size-string is to provide a human-readable, mistake-resistant, shortcut method of specifying large file sizes.
The default unit is byte if not otherwise specified.
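A size-string parser might look like the sketch below. The accepted unit letters (K/M/G as powers of 1024) are assumptions for illustration; the document does not list DBD2's exact unit table:

```c
#include <stdlib.h>

/* Parses a size-string such as "10M": an integer followed by an
 * optional single unit letter. No letter means bytes. */
long long parse_size_string(const char *s)
{
    char *end;
    long long n = strtoll(s, &end, 10);

    switch (*end) {
    case 'k': case 'K': return n * 1024LL;
    case 'm': case 'M': return n * 1024LL * 1024LL;
    case 'g': case 'G': return n * 1024LL * 1024LL * 1024LL;
    default:            return n;   /* default unit: bytes */
    }
}
```

The single-letter suffix is what makes the format mistake-resistant: "100M" is much harder to get wrong than "104857600".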
DBD2 uses a concept referred to as 'soft comments' in the syntax of its primary (and included) configuration files (soft comments are not supported in ancillary files).
A soft comment 'looks' like a comment to other parsers, but is read as a configuration command by DBD2's more complex parser.
Syslog considers that the configuration file consists of 3 types of lines:
To allow DBD2 to ignore lines that it does not understand, use the --skip <n> command-line option.
This command tells the parser to ignore the next 'n' physical lines in the file (or to end-of-file whichever comes first).
Note that when softcomment is disabled, DBD2 treats output lines exactly as traditional configuration files do (a single hashtag is a comment), but when softcomment is enabled, DBD2 will still read and process any output-line preceded by a single hashtag.
Also, for all command-line parameters in the configuration file, 2 hashtags are required to 'comment-out' a line.
An sql-index-variable is an index into a conceptual array of variables (variables that have not yet been provided to DBD2). In database templates an sql-index-variable is represented by a placeholder value consisting of a dollar-sign ('$') followed by a positive numeric value. An sql-variable is configured using the 'var' keyword (example: var 3 username,user,name). In this example, sql-index number '3' receives its value from incoming data (in 'name=value' format) when the name matches one of 'username', 'user' or 'name'. Required placeholders that have no matching data elements are filled with an empty (null) string ("").
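Expanding '$n' placeholders from a table of resolved values can be sketched as follows. The template text and variable table are hypothetical; only the placeholder convention comes from the entry above:

```c
#include <ctype.h>
#include <stddef.h>

/* Expands $n placeholders in tpl using vars[n]; indices with no
 * matching data are filled with the empty string, per the rule above. */
void expand_template(const char *tpl, const char *vars[], int nvars,
                     char *out, size_t outlen)
{
    size_t o = 0;

    while (*tpl && o + 1 < outlen) {
        if (*tpl == '$' && isdigit((unsigned char)tpl[1])) {
            int idx = 0;
            tpl++;
            while (isdigit((unsigned char)*tpl))
                idx = idx * 10 + (*tpl++ - '0');
            const char *v = (idx < nvars && vars[idx]) ? vars[idx] : "";
            while (*v && o + 1 < outlen)
                out[o++] = *v++;
        } else {
            out[o++] = *tpl++;
        }
    }
    out[o] = '\0';
}
```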
DBD2 uses several different types of templates in its configuration. Here is an overview:
Examples of template-ids are 'a1' and 'z400'.
Examples of routing templates are 'a1z4' and 'a3a8a9y13z8q400'. (Each routing template component unambiguously identifies a destination database and a
corresponding template within that database to process.)
A DBD2 threadpool is a 'working unit' consisting of some number of execution-threads, a subroutine to execute and a data-structure that usually includes a FIFO buffer consisting of some number of 'message-slots' (lines) to store (as yet) unprocessed messages.
Input-threadpools (socket, kernel, named-pipe and file reading threadpools) do not use FIFO buffers since their input comes from external system buffers. The Housekeeping threadpool gets its input from the scheduling code in the parent thread.
Threadpool configuration consists of specifying how many execution-threads to create and how many 'storage slots' for unprocessed input to allocate in the FIFO buffer.
When CAP_WORKERTHREADS is declared, the worker-threadpool parameters can be optionally configured from the input-specifications instead of a dedicated (worker-thread-specific) keyword since worker-threadpool settings are directly related to the estimated amount of work submitted by the input-threadpools.
Every threadpool is identified by a non-negative integer and each threadpool-type maintains a distinct sequence of numeric values.
The support of most threadpool types other than socket-input threadpools depends on the declaration of optional capabilities.
The default threadpool number for any given threadpool type is always 0 [zero].
Specialty threadpools are limited to a single threadpool instance.
Specialty threadpools are 'kernel', 'user' and 'housekeeping' threadpools.
Multiple instances of non-specialty threadpools may be created simply by referencing them.
Unused threadpools (those that are defined but never assigned connections to service) are deleted before being created.
Technical Note: There is one (partial) exception to the statement that all configured threadpools use index 0 as their default. Because the subroutine algorithms for 'socket' and 'tailfile' types of input threadpools are incompatible, tailfiles and sockets cannot share the same threadpool id-number even though they share the same numeric sequence set. They are both still configured as if threadpool number 0 was the default, but internally, DBD2 will move the tailfile threadpool number 0 to a negative value to deconflict the two.
All DBD2 timers and schedules that define time intervals for recurring operations measure their intervals from the termination of one 'pass' to the scheduled time of the next pass. This is a subtle but significant difference from the common perception that processes will always start at the scheduled time. The DBD2 approach allows for the realities of its multi-threaded design by allowing a scheduled process-start to be delayed until the required resources are released by all competing processes.
This approach leads directly to the observation that 'intervals' as set in the configuration file are just minimum values and may vary by several seconds based on resource-conflict wait-times.
It also leads to the visible result that 'mark' time stamps in local files will 'drift' over time since the process that executes the 'mark' function takes finite time to execute.
Early development of DBD2 found that when timed processes are scheduled for execution, they will (at some point) inevitably come into conflict with each other.
The result (if uncontrolled) is resource conflicts and unexpected results as scheduled processes skip a scheduled run-time or delay their execution (backing up other scheduled processes), or (worst case) interfere with each other, resulting in a catastrophic application abort.
Several of DBD2's background processes are potentially time-consuming and lock data-structure resources for (comparatively) long time periods.
This tendency is most easily understood by imagining a configuration with 20 or 30 output files and sockets trying to check the entire list (including unspooling of any spooled data) while a file-rotation cycle is also scheduled to be on-going.
Who gets to control the integrity of the output-data-structures to ensure the executing output-routines do not crash the system?
Similar to a size-string and for mostly the same reasons, a time-string is also an integer value followed by a single letter:
The default unit is seconds if not otherwise specified.