When you start XL-Parser for the first time, you are asked if you want to set default configuration. If you click Yes,
you will be asked to select a directory where:
1 The GeoIP database (GeoLite2-City.mmdb) will be located if it already exists.
2 The OUI database (oui.txt) will be downloaded and converted (OUI.db) if it doesn't already
exist. It could exist in the XL-Tools user directory;
3 The IIN database (IIN.zip) will be downloaded and converted (IIN.db) if it
doesn't already exist. It could exist in the XL-Tools user directory;
4 The TLD Database (effective_tld_names.dat) will be downloaded if it doesn't already exist. It
could exist in the XL-Whois user directory;
8 The XL-Whois Database is searched and selected. If it can't been found, you will be asked
if XL-Whois is installed on the system and where is the database. The XL-Whois Database is created and is only updated when
you are using XL-Whois. If you don't have this database (or don't need it), just click on Cancel;
9 The Resolved TLD Database (Resolve TLD.db) will be downloaded if it doesn't already exist. It
could exist in the XL-Tools user directory;
10 The Datetime Database (DT.db) will be downloaded if it doesn't already exist. It
could exist in the XL-Tools user directory;
11 Finally, following default options will be selected or set:
Check for update at startup for the tool is selected;
Tool will try to figure out what is your local timezone. In case of failure, default timezone is set to
America/New York;
A XL-Parser.ini configuration file will also be created to keep your preferences in memory. You can also
set elements above manually, in case you already have some of these databases.
Settings
General
In Tool section, we have the following functions and options:
Export Lang.ini: Use this function to translate XL-Parser GUI. See Translation for help about this functionality.
Open user dir: This is the directory where settings are saved. Copy the content of
this directory if you want to backup your settings.
Check Update: Check on le-tools.com
if a tool update is available.
Check for update at startup: When XL-Parser starts, check on website for available
update of the tool.
In Functions section, you have the following options:
Choose to start XL-Parser in Full Screen;
Choose to Remember position of the GUI (usefull if you have multiple monitors);
GeoIP language: The GeoIP2 database from Maxmind provides data in
different language. You can set your preferred language here.
Set the NsLookup timeout. Default is 10 seconds.
Set the Local timezone. Default is America/New_York.
Set the Default language. Default is en-US.
Databases
In this tab, you have to set the location of four databases:
OUI Database: This database is the OUI (Organizationally Unique Identifier) that comes from the IEEE Standards Association. Direct link to this database is http://standards-oui.ieee.org/oui.txt. The downloaded database
(oui.txt) can't be used directly by XL-Parser. It must be converted to SQLite format. The import function (oui.txt to OUI.db)
can be used separately.
GeoIP Database: This database is the GeoLite City that comes from Maxmind.
To get this database and updates, you should follow these instructions:
Download the GeoIP update tool and install it
in the default directory (C:\ProgramData\MaxMind\GeoIPUpdate\).
Get a license key from your Maxmind account,
download the config file (GeoIP.conf), copy it to the default directory (C:\ProgramData\MaxMind\GeoIPUpdate\) and create
a "GeoIP" subfolder.
Execute the command line tool (geoipupdate.exe).
Create an automated task with Task Scheduler on Windows to update the database regularly.
IIN Database: This database is the Issuer Identification Number (IIN) database for credit and other payment card numbers. It
is downloaded from le-tools.com. Most of data come from the (now deleted) page List of Issuer Identification Numbers on
Wikipedia. It was mixed with another database from a banker association.
In this tab, you have to set the location of four databases:
Expression history and database: The History database is used to keep an history of the expressions you used when extracting.
The Expression database is used when you manually save your expressions. See Expressions window
for more details.
Log format database: This database contains patterns and regexs related to the different log formats supported by XL-Parser for
Web log analysis. See Log format database window for more details.
Log analysis filters database: This database contains your filters for Web log analysis. See
Log analysis filters database window for more details.
Saved queries database: This database contains your saved SQL queries for Web log analysis. See
Query database for more details.
XL-Toolkit Databases
In this tab, you have to set the location of three databases:
XL-Whois Database: This database is the one created by XL-Whois while you're
using it.
Resolve TLD Database: This database is used to provide details about a TLD (ex.: United States for .us).
There are four types of input: folder, file (or list of files), database and clipboard. Each type has its own options:
Folder: With folder, you have access to File Filters and
File Formats. You can also check Subfolders to include
subfolders in the search.
File(s): You can select one or multiples files. Once you've made your selection, XL-Parser will
examines it and determine which File Formats should be selected.
Database: This type is associated with Web log analysis. The
database must have been created by XL-Parser. When you select this type, the Functions section
will be modified to show available functions. When you select a valid database, XL-Parser will gather informations about it and
show these informations in the Current database tab of the
Functions section.
Clipboard: If you select this type, search will applied to the clipboard text content.
File Filters Window
When you select a Folder as input, you can use filters to restrict the scope of the search to files that match your
criteria. You can add, edit (by double clicking a filter in the grid) or delete filter. You can also save your filters for a later use.
File Filters can be shared between XL-FileTools and XL-Parser. When you click on the add button, the
following window will show up:
You have four types of file filters:
Contains
By default, the keyword or regular expression applies to the relative path (folder or file, but excluding input directory).
You can select Filename only option to exclude folders. If you select Regex option, a colored
border around the textfield will indicate if your regex is ok. If not, you can press the ? button to see the
error message.
File size
With Equal operator, you have the possibility to include a list of sizes, but only one size unit. You must
also know that if you select a size unit other than bytes, file size will be rounded down to the nearest integer.
Last accessed
With Equal operator, you have the possibility to include a list of dates (format must be the same as in the
control, ex.: 2017-04-22).
Last modified
Similar to previous filter.
File Formats window
XL-Parser can read many file formats including: text, unicode, Doc and Docx, Xls and Xlsx, Evt and Evtx, Pdf, Msg and Zip (text files inside
the archive). A cool feature here is that you can set the file extension to apply to the File format parser. For example, you can search a docx
file as it was a zip file by selecting Zip and adding .docx in the extension list. This will allows you to search in metadata of the docx file.
When you select Doc or Docx format, you can choose to parse the file with MS Word (if installed on your
system) or not. The legacy format (doc) will be search as unicode as the new format (docx) will be searched as zip.
For unicode format, you must select between two types: HTML and Other. With
HTML, XL-Parser will search the charset in the HTML file header. With other, it will try to determine
the appropriate encoding by analysing the file header.
Any change to extension list or options are saved in a file (name: FileFormats.json) in the user directory. The
Reset button restore options to default values.
Important: XL-Parser is not a searching tool which mean that you should not check all formats, select a full volume and hope
that it will find all the things.
Functions section
The functions section is divided with tabs. Shown tabs will depends on input type:
You can add, edit (by double clicking an expression in the grid) or delete as much expression as you want. If more than one expression is
used, priority depends on the order in the grid. You cannot set hierarchy of operators but you can change order of the expressions by
using the up and down button on the right. If one expression contains capture groups (with an AND operator), it should be placed at end. All
the expressions in the grid can be saved for a later used. There are also Extraction options available (see further down).
On top right corner, you have a Load results button. It can be use to open saved results. On bottom right corner, you
have a Rebuild list of files button. When you do a search, XL-Parser has to browse the input and list the files that will
be searched. The list of files is cached, so if you do another search without changing the input, XL-Parser doesn't have to rebuild this
list, so it will be faster. If content of the directory (file(s) added or deleted), you should click on the Rebuild list of
files button.
In the Expressions tab, you can set or edit a single, or a list of expressions. Expressions can be a string or a regular expression. Operator
must be set if you have more than one expression. For regular expression, capture groups are supported. Note that you should not use capture
groups with Invert option, as it cannot capture something on the a line that it doesn't keep. Comment (optional) can be used
to identify what the expression is supposed to extract. You can use the regex shortcuts to help you build your regular expression. On the right
side of the window, you have two button. The first one is used to add or edit the expression in the Expression database. The
second one is used to open the Expression tool window:
The Expression tool window can be used to test your expression. If this one contains capture groups, the extracted data will be in the
results section.
Expression history
The Expression history contains expressions you have used. This feature can be disabled. You can also also set how many expressions
must be keep in the history. When this limit is reached, older expressions are replaced by the new ones. You can select any of this expression
by double clicking on it or you can also right click to access the context menu.
Expression database
The Expression database contains expressions you have saved. You can select any of this expression by double clicking on it or you can also
right click to access the context menu.
Extraction options
Here are some explanations about the options:
Mode: By default, XL-Parser search (and extract) your expressions on each line* of a file. You can change this mode to
Search by file, so the whole file will have to meet all your expressions. With this mode, result will be a list of files instead a list of
extracted lines.
Max results: This option is used to limit the number of extracted lines for each file or for the whole process.
Include context: By default, only lines that contains your expression will be extracted. But sometimes, for chat logs
as an example, you want to see what is going one before and/or after the line that matches your expression.
No duplicates: If you have a list of expressions with the OR operator, lines are tested for each of the expression
and duplicates are allowed. This allows you to get results for each expression, but this also leads to a longer processing time. If you
don't need to get results for each expressions and want to accelerate the process, you can check this option.
* For some file formats, a line has a different meaning. For a doc(x), a line is a paragraph. For a Pdf, a line is a page.
Special objects
Special objects are expressions that XL-Parser can extract and gives informations about results. Each object has its own set of options
that can be selected. Special objects are:
IPv4 addresses: An IPv4 address is made up of four series of one to three digits, separated by dots. Each serie
has a value range from 0 to 255. To avoid mistake, XL-Parser ignore result followed by a dot. For example, if a line contains the following
sequence: 207.46.0.0.207.46.255.255, XL-Parser will only keep the IP address 207.46.255.255. This object has the following options:
Resolve ISP (XL-Whois): With this option, XL-Parser will match the IPv4 address to the known ISP, stored in your
XL-Whois database.
Resolve GeoIP: With this option, XL-Parser will gather GeoIP details about the IPv4 address in the GeoIP database.
NS Lookup: With this option, XL-Parser will do a lookup on every IPv4 addresses. You must be aware that lookup
require a network request, so it might take a while if you have a lot of results. You should use this option with caution.
Resolve TLD: If you check the NS Lookup option and if the lookup gives a result, XL-Parser
can also gather informations about the domain name TLD, in the Resolve TLD database.
IPv6 addresses: An IPv6 address is made up of eight series of four hexadecimal digits, separated by colons. The
notation can be compressed (ex.: leading zero may be omitted). XL-Parser should find them all. Options are the same as for IPv4
addresses, except that Resolve GeoIP is not supported.
URLs: XL-Parser finds expressions starting with http:, ftp:, www.,
./ or ../ as well as links in HTML tags: src= and href=. It
rejects internal links (starting by #) and the mailto:. XL-Parser accepts the following characters
inside the URL: +, /, ,, %, @, $,
., #, -, ;, &, ,
\, =, ?, : and _. Available options are:
Remove parameters: With this option, XL-Parser will remove parameters that follow the ? sign.
For example, if XL-Parser finds this: http://10.10.10.10/index.php?param1=1¶m2=2, it will
extract http://10.10.10.10/index.php.
Resolve TLD: With this option, XL-Parser will gather informations about the domain name TLD, in the
Resolve TLD database.
NS Lookup: With this option, XL-Parser will do a lookup on the extracted hostname inside the URL.
Resolve ISP (XL-Whois): If you check the NS Lookup option and if the lookup gives a result,
XL-Parser will match the IP address to the known ISP, stored in your XL-Whois database.
Emails: An email address follows many syntax rules. By default, XL-Parser will extract any expression that looks
like an email address. To reduce the number of false positives, you should use the Validate email domain option that
will use the TLD database (public suffix list) to validate the right part of the address. Other options are
similar to the ones in the URLs extraction.
Hostnames: A hostname is a domain name, preceded by a variable number of expressions that allows to designate a
particular host and to associate it with an IP address. The path hierarchy to this host is represented in reverse order. For example,
the hostname host22-44.adsl.domain.ca could designate the host host22-44, from the group ADSL, from the domain domain.ca. XL-Parser validate the domain name (by using the TLD database)
of all extracted hostnames. Options are similar to the ones in the URLs extraction.
Domain names: Similar of the hostname extraction, except only the domain name is extracted. Thereafter, this
domain name is validated (by using the TLD database). To reduce the number of false positives and errors,
domain names followed by a dot (.), a hyphen (-) or another letter are not extracted. Options are similar to the ones in the URLs
extraction.
MAC addresses: A MAC address is made up of six groups of two hexadecimal digits (lower case or upper case)
separated by hyphens (-) or colons (:). XL-Parser will extract the MAC address with both separator
as long as it not preceded or followed by the same separator. The Resolve MAC Address OUI will use the OUI database to identify the vendor (Organizationally Unique Identifier) asscoiated to the extracted MAC
address.
Credit cards: This includes credit cards, but also bank cards and other numbers related to magnetic cards. This
kind of numbers are composed from 12 to 19 digits. For the 16-digits numbers (most usual), XL-Parser consider also the format where
the numbers are grouped by four and separated by spaces (XXXX XXXX XXXX XXXX) or hyphens
(XXXX-XXXX-XXXX-XXXX). If you select Resolve Issuing company, XL-Parser
will use the IIN database to identify the issuying company asscoiated to the extracted number.
When you start extraction (Expressions or Special objects), a pop-up window appears. You can see the progress of the extraction
(updated after each searched file). As the process runs, it stores extracted data in a temp file (.XL-Parser_tmp.txt), in the report
directory (default is user directory, see report options further). When the process is done, the
grid will contains stats about extracted expressions (or objects). You can see results by file or by
expression/object.
Double click on a row to see results for each file or for each expression/object. This will create and open the results in a text
file. If you right click on a row in the by file grid, you have access to the following functions:
Show results for selected item(s): Show results for one or multiple files.
Show results for selected item(s) (No duplicates): Same as previous, but without duplicate. An additonnal
column is added that indicates how much duplicates have been found. Results will be presented by number of repetitions in
descending order.
Show results for selected item(s) (Results only): By default, when XL-Parser shows results from
by file grid, each line begins with the expression (or object) that matched. From by
expression/object grid, it begins with the file where the result has been found. This function show results
without any prefix.
Show results for selected item(s) (Results only - No Duplicates): Same as previous, but without
duplicate.
Open file(s): Open the file with the associated viewer in Windows (based on file extension).
Copy file(s): You will have to select a destination folder.
Select all: Select all rows.
Copy selected rows: Selected row values go to clipboard (same as regular text copy).
If you right click on a row in the by expression/object grid, available are Show results for selected
item(s), Show results for selected item(s) (No duplicates), Show results for selected item(s)
(Results only), Show results for selected item(s) (Results only - No Duplicates), Select
all and Copy selected rows.
You can Save results for a later use. If you click on the Build Report button, the following
window will show up:
Available options and default values will depend on the type of search and the number of results. For an extraction by expressions, default report format is TXT (TSV) as for an extraction by
special objects, default report format is XLSX. If many
expressions/special objects have been extracted, you have an additional option: One sheet per expression/object
for XLSX, One page per expression/object for HTML or One file per expression/object for
TXT (TSV). Include Source (the file where the result has been found) is only available for
extraction by expressions.
Whatever report format you have choosen, you can include Column headers, Source (for
extraction by expressions) and Expression/Object. These last two options
will add a column containing the corresponding value.
Web log analysis for files
Web log analysis tab interface is different, based on the input. Above is what it's look like, if input is
Folder/File(s). This function is for web logs only. The
purpose of this function is to create (or update) a SQLite database to store web logs, so it will be possible to do advanced
query on it.
When you create a database, informations about log files are stored in the database, including number of lines. For example, you
create the database in the middle of April, including log files from January. When you update the database in june, XL-Parser examine
the folder. Log files for January to March won't be added. The first part of April won't be added (based on number of lines), but
all new lines will be added, as well as the other log files (May and June).
The log format is a regular expression that will match every fields of every lines in the log file. Access logs of a web server
can be customized, so log formats in XL-Parser can also be customized. The first button on the right opens the Log format database Window and the second button call a function that will examine the input and
try to determine the appropriate log format.
By default, XL-Parser parses log files by last-modified time ascending order. If you want to verify or change this, you can click
on File order button. A window like the one below will open:
You can sort the files by clicking on the header of the column you wanna sort with, or you can order files manually by clicking
on the up and down button.
Set source files to readonly option will apply read-only attribute to every extracted log files. This will prevent
any accidental modification.
When you create (or update) the database, you can add a lot of informations to the database including ISP
(based on your XL-Whois database), GeoIP, Useragent
details and Weekdays. You can also limit the number of added lines by using Filters. If you have a big bunch of log files (~1 Gb or bigger), you should filter additionnal resources
if possible. This will reduce the size of the destination database. It's not unusual that requests for images represent more than
half of all requests in the log files.
If you use filters at the creation of the database, you must be aware that some filters may increase the whole processing time.
These filters are the ones that require resolving like ISP, GeoIP, useragent details or the filters based on time. If you have filters
that don't require resolving, you better use them before. Filters based on existing fields are faster. For example, filtering useragent with the string Windows NT 6.1 will be faster than
filtering UA_OS with the value Windows 7. That will
produce the same result, but to get the UA_OS detail, the useragent will have to be resolved
before testing the filter.
Resolving of ISP, GeoIP and useragent are done at the end (if you don't have filter that require these infos). Resolving Weekday
is done while processing. Because each entry have a datetime field, the number of resolved weekdays should be exactly the same as the
number of entries.
When you use filters, filtered lines (lines that don't matched the selected Filters) will be logged
in a file ([Database filename].FilteredLines.log), except if you unselect the option.
When a parsed line doesn't match the Log format, it will be logged in a file ([Database
filename].RejectedLines.log), except if you unselect the option. You may also ask XL-Parser to ignore comments
while logging rejected lines (starting with #).
Finally, you must enter the output database file name and directory. The Select database when finished option
will select the database as input when finished.
When you start the process, the main program will create the database, examine web logs and store all settings. Then the main
program will start XL-Parser-process.exe (in the program folder). A window will appears and shows progress and stats:
Log format database window
The Log format database contains the regular expressions that XL-Parser will use to parse log files. Every format has a name, a sample,
the pattern and the regex. You can add, edit or delete any log format.
In this window, you can add or edit a log format. Here are some explanations about each field:
Name: This is the name of format. It must be unique for each format in the database.
Sample: The sample is a line that match the regex. To see if it matches, look at the Parsed field.
It must contain the extracted data. Extract from input will extract a line from one of the log files.
Pattern: The pattern helps you create a log format without knowing regular expression. The pattern is also
important to correctly identify each field. The available field are: remoteIP, datetime,
http_method, http_request, http_params, http_protocol,
http_status, size, referer, useragent and other.
You should use the field list to help you select correct field and option, but you must add separator and syntax in the textfield.
For example, suppose you have "GET /index.html HTTP/1.1" in the sample. The pattern should be
"http_method http_request http_protocol". The fields useragent and
referer have an option about spaces. If it can contain spaces as Mozilla/4.5 [en]
(WinNT; I), you must check the option. The IIS log format style use + sign instead of spaces. The only
field that require particular settings is the datetime.
The datetime format use the Datetime database that could be the same that you use within
XL-Tools. The list correspond to the datetime marked as input (Use As). See following
next.
Fields may contain spaces ("Apache Style"): As space is used to be the default field separator in web log files,
one field should not contain spaces. With Microsoft IIS that uses Extended
Log Format, spaces are replaced by plus (+) sign. On Apache, that uses Common Log Format, field with spaces are surrounded by double-quotes, so instead of spaces, the field cannot contain
double-quotes. If you check this option, double-quotes will be rejected from any field. You must explicitly include double-quotes (used
as delimiter) in your pattern. Ex. : "useragent-s"
Use IIS header: If log files are related to IIS and/or contains at least one line that defines header as comment
(ex.: #Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version
cs(User-Agent) cs(Referer) sc-status sc-bytes cs-bytes, you could use this button. XL-Parser will extract this line and
build the pattern with it. Important: Be sure to check if the datetime has the correct timezone, because IIS log
doesn't indicate the timezone inside its log files (defaut is UTC).
Regex: By default, you doesn't have to do anything for this field. With the Match pattern
option checked, XL-Parser will create the regular expression that will match the log format pattern.
Parsed: If everything is ok, you should see every field and its value in this grid.
Datetime database window
The Datetime database is used to store and manage all datetime formats that will be available in the tool. This database is used with
Log format (input), with Split logs (input) and with
report options when Query database. You can add, edit or delete
any datetime object in the database. If you select same datetime database that you use with XL-Tools, new objects will be usable in both tools.
When you add or edit a datetime object, you must provide the following details:
Sample: The sample is an exemple of the datetime object. The sample will be
used to identify the datetime object so it must be unique.
Pattern: The pattern is based on the well known strftime
function. You can use the combobox on right side of the textfield to select appropriate symbol. The
most important symbols are listed in the combobox. You should also be able to use any symbol supported
by strftime, even if not listed.
Regex: The regex must match the pattern and is used to extract the datetime object
from any string. If the Match pattern option is selected, it will be created automatically
while you're entering the pattern.
Timezone: This is the default timezone to be used if no timezone is provided in the
datetime object (ex.: timezone is provided if pattern contains %Z, %z or %s symbol).
Parsed: If regex and pattern match the sample (which should indicate that everything
is ok), the parsed datetime object will be shown here.
Use as: The datetime object could be used as input format, output format or both. It
can also be set to none so it will be stored in the database, but not available for input or output in
XL-Tools main interface.
Comment: This is optional. The comment is used to give more information about the
datetime object.
If the Add or Edit button are still disabled after you filled all the
fileds, you can click on the ? to see what is missing or is incorrect.
Ambiguous timezone abbreviations
Some timezone abbreviations are ambiguous, because there are matching more than one timezone. For example,
CST could be Central Standard Time (USA), China Standard Time, or Cuba Standard Time. You can take a look here. So, I decided to set those timezone
with the following values:
ADT = -0300 (Atlantic Daylight Time)
AST = -0400 (Atlantic Standard Time)
BST = +0100 (British Summer Time)
BT = +0600 (Baghdad Time)
CAT = +0200 (Central Africa Time)
CCT = +0630 (Cocos Islands Time)
CDT = -0500 (Central Daylight Time)
CST = -0600 (Central Standard Time)
EAST = -0600 (Easter Island Standard Time)
ECT = -0500 (Ecuador Time)
EDT = -0400 (Eastern Daylight Time)
EST = -0500 (Eastern Standard Time)
FDT = -0100 (Fernando de Noronha Daylight Time)
FST = -0200 (Fernando de Noronha Standard Time)
GST = -0200 (South Georgia Time)
IDT = +0300 (Israel Daylight Time)
IST = +0200 (Israel Standard Time)
MET = +0100 (Middle European Time)
NFT = -0230 (Newfoundland Daylight Time)
NST = -0330 (Newfoundland Standard Time)
PST = -0800 (Pacific Standard Time)
SAST = +0200 (South Africa Standard Time)
SST = +0800 (Singapore Standard Time)
WAST = +0200 (West Africa Summer Time)
WST = +0800 (Western Standard Time)
I have try to set it to the most popular value, but I could be wrong. If the datetime you need to parse have
a different offset, you can parse the datetime without the timezone and set the appropriate offset in the input
format.
Filters window
Filters are divided in tree groups: White List, Regular and IP List. When you use
White List filters, every entries that will match the filters will be ignored (or removed). Using Regular
filters does the opposite. Every entries that match those filters will be selected and others will be ignored. IP List is
just a type of regular filters that will match the remoteIP field only.
Filters are grouped by category. If you want to create a new category, you must use the Add button without
selecting any category. If you want to create a new filter in an existing category, you must select the category first, then push
the Add button. You can edit or delete any filter.
Filters are in fact a set of filters. You can, edit or delete any filter in the set.
Here are some explanations about each field:
Operator: Value are OR or AND.
Field: Field name in the database (or logs).
Condition: Available conditions depend on the selected field and the type of values. For text, available values
are usually is, is not, contains or does not contain You have
different conditions for datetime objects or numeric values.
Value: The type of value depends on the selected field and the condition. For contains condition
for example, you have the Regex option. For Weekday, you can choose the value from a list. For datetime, you can set
a date and/or a time.
A field filter rule can be selected/added to the Field Filter database tab.
Split logs
This function can be used to split logs (or any text files) into smaller pieces. Files can be splitted by size, by
number of lines or by time.
By size
You have to provide a size for the chunks. Files that exceed this size will be splitted. The final size of each chunks may be a bit
different because XL-Parser don't split lines. If you select the Use datetime in filename option (log files only), XL-Parser
will extract the first timestamp of each file and use it to name the file. Otherwise, it will use a simple counter.
By number of lines
This function is similar to the previous, except it will simply count the lines instead of evaluating the size of each line. The Use
datetime in filename option is also available.
By time
This function is for log files only. With this function, XL-Parser will split the logs based on the timestamp of each line. The log files
can be splitted by hour, by day, by week or by month. Each file will be
named using the first timestamp of the file.
Web log analysis for XL-Parser database
Once you've created a log database with XL-Parser (see Web log analysis for files) and you select
the database as input (don't forget to select the Database radio button), you have access to the following functions (tabs):
Current database
When you select a database as input, XL-Parser opens it and gather informations about this database. In this tab, you can see these
informations:
General:
Last update: Date and time when the database have been created or updated.
Period: Dates of the first and last entries of the database.
Number of:
Entries: Number of entries in the log table.
Filtered: Number of lines filtered (lines that doesn't matched the selected Filters).
Rejected: Number of line rejected (lines that doesn't matched the Log
format).
Source:
Log format: The log format that was used to parse the logs.
Source file(s)...: Details about source log file(s). This will open a pop-up window:
In the source file(s) grid, you can see informations about the log files that have been parsed to build the database. The
Exists column indicates if the file can still be found on the drive. You can change the path by right clicking on a
line. If the path contains other log files, all the grid will be updated at the same time.
Rejected: This will open the rejected lines file ([Database filename].RejectedLines.log)
created when database was created. This file should be in the same directory of the database.
Filtered: This will open the filtered lines file ([Database filename].FilteredLines.log)
created when database was created. This file should be in the same directory of the database.
Resolved:
ISP: If the corresponding option has been used when creating/updating the database, this value indicates how
much Internet Service Providers (ISP) have been resolved even if the result was unknown. So this may also indicate how much distinct IP
addresses are in the logs.
GeoIP: Same as previous, except GeoIP only applies to IPv4 addresses in XL-Parser.
UAs: Number of resolved useragents.
Weekdays: Number of resolved weekdays. It should be the same value as the number of entries because each
entry must contain a valid timestamp.
Update database
You can update a log database or create a new one with different settings. For example, you may want to update resolved ISPs after doing
some whois or you may want to create a filtered database, based on the full database you've created in a first time.
When you use update, the estimated time left is about the current task. For example, when it resolve IP addresses, the
estimated time left represents the time to resolve all IP addresses.
Query database
In this tab, you can query the database. XL-Parser provides some tools to help you build a SQL Query. If you're familiar with SQL language,
you can create (and save) your own queries. XL-Parser uses SQLite for database format. A database contains five tables:
INFO: This table contains informations about the database (shown in Current Database tab).
FILES: This table contains informations about the source files that have been parsed to create the database (also shown
in Current Database tab).
LOG: This table contains all the requests.
IP: This table contains resolved ISP and GeoIP related to any distinct IP addresses found in the log (if option has
been used when creating/updating the database).
UA: This table contains details related to any distinct useragents found in the log (if option has been used when
creating/updating the database). Available details include: type (ex.: pc, smartphone, tablet, etc.),
os, browser, device and lang.
Note that all timestamps inside the database are in unixtime format (UTC timezone). They will be converted when building the report (see
report options further). One exception: Time of the day is stored based on your timezone at the time the
database was created which means time is not ajusted with daylight saving time (or inverse).
To build a query, you must first select which columns will be extracted in the output:
You can select all columns or any column(s) from the LOG table. DISTINCT option can be selected to
remove duplicates. If you select the Count option, output will not be the value but the number of values that match the
query. For example, SELECT COUNT(*) FROM LOG will give you the number of rows in the
LOG table. SELECT DISTINCT remoteIP FROM LOG will give you the list of all distinct
remote IP addresses in the LOG table. After selecting the column(s), you may select one or multiple filters. Then, you may want to add some clauses:
Group by: This clause will arrange identical results into groups. For example, SELECT
remoteIP FROM LOG GROUP BY remoteIP will give you the list of all unique remote IP addresses in the LOG table.
Order by: This clause will sort the results by ascending (ASC) or descending (DESC)
order. For example, SELECT DISTINCT http_request,http_status FROM LOG ORDER BY http_status ASC will
give you the list of all unique web requests, ordered by the http status code (200, 302, 404, 503, etc.)
Having count: This clause allows you to specify a count condition to apply to the results. For example, SELECT remoteIP FROM LOG GROUP BY remoteIP HAVING COUNT(remoteIP) > 100 will give you the list of unique IP
adresses that send more than 100 requests.
The most interesting things can be done if you know how to build your own SQL queries. For more informations about SQLite syntax, I suggest
you this SQLite Tutorial. The query can be tested at any time
by pressing the Test button.
When you're satisfied with your query, you can save it in a database for a later use.
You can use an existing category or create a new one.
Queries are stored by category and name in the database. They can be edited or deleted at any time. The date correspond to the time the
entry was created or the last time the entry was edited. The used value correspond to the number of the time the query was used. You can
sort the grid using by clicking any column header.
When you're done, you can process your query. Report will be produced using default options that you can change at any time:
Here are some explanations about each option:
Folder: This is the folder where report will be saved.
Format: Report can be saved in tree different formats: XLSX, HTML and TXT
(TSV). You should be very careful with HTML format because web logs could contain malicious code. By default, XL-Parser sanitize
output when using this format.
Datetime format: As mentionned before, timestamps are stored in unixtime format (UTC) in the database. This option
allows you to change the output format and timezone using your datetime format database. The list of choices includes all datetime objects
that have been set to Use as output (or both).
Includes:
Column headers: First row (or line) will be the name of each field.
Source: Use this option if you want to know where (original log file) the request is coming from. You can only use
this option if your query return all rows.
ISP: Add a column that return resolved ISP from the IP table. You can only use this option if your
query return the remoteIP column and the database contains resolved ISP data.
GeoIP: Same as previous, but for the GeoIP data.
Useragent details: Add a column that return useragent details from the UA table. You can only use
this option if your query return the useragent column and the database contains resolved Useragent data.
Weekday: Based on the timestamp, this will add a column with the weekday name, based on your default
language.
For example, you can use a SQL query to extract all unique IP addresses from the log and add the column ISP and GeoIP.
Suspicious activities
This function is experimental. Suspicious activities are abnormal activities. Detecting abnormal activities without
knowing the normal activities can be hazardous, but most of abnormal activities come from Web scanners and hacking tools. Most of these
tools are built to test and obtain results and not to evade detection. To detect suspicious activities, XL-Parser uses 14 indicators
that you can select or not. Here are some explanations about each indicator:
High number of requests: Web scanners and hacking tools are used to generate many more requests than regular users.
XL-Parser determine the average number of requests for each user and get the list of IP addresses that generate the higher number of
requests in this list.
Request length (nbr): Web scanners and hacking tools are used to inject a lot of things in their requests, so
they will produce longer requests than normal. XL-Parser determine the average length of requests in web logs and get the list of IP
addresses that generate the highest number of requests above average. The option for this indicator is the minimum length to consider
when establishing the average length of requests in web logs.
Request length (max): Same as previous, but result are sorted by length, which mean the IP address that generate
the request with the greatest length comes first.
URI Encoding: To inject their malicious code or to detect vulnerabilities on your website, web scanners and
hacking tools use special chars that are not supported (or cannot be pass) in HTTP or as URL. URI encoding (Percent-encoding) will then be used to bypass
this limitation or to bypass filtering. By default, XL-Parser will only search for the % symbol, but you can
reduce the number of false positive by providing a list specific characters. Each characters must be separated by the |
symbol. Ex.: %22|%27|%28|%29|%5B|%5D|%2B.
HTTP Method: Common (HTTP method are GET, POST, HEAD, TRACE,
OPTIONS and CONNECT. Regular users will probably only use GET or
POST, but malicious users, web scanners and hacking tools will use other methods while probing your website.
One of the most interesting target for malicious users are your forms, because it's one of the most common place where vulnerabilities
can be found. GET method can be used for forms, but it may be associated with other kind of requests as POST is always associated
with forms. When POST is used, parameters will not be stored in your web logs, so XL-Parser will not detect the other malicious
behavior related to the injected parameters. So, a high number of POST from a specific user can be a really good indicator, but it
can also generate a lot of false positives if POST is a common method used on your website. You can provide any method as option.
Each method must be separated by the | symbol. Ex.: POST|TRACE|HEAD|OPTIONS|CONNECT.
High number of errors: The use of web scanners and hacking tools can produce a lot of errors. As example,
forced browsing technique will produce a lot of 404 errors. You can provide any status code as option. Each code must be separated
by the | symbol. Ex.: 40|50 will consider all 40x and 50x errors.
SQL query: If you are using a SQL database on your website, you will be targeted. The presence of SQL queries
in your web logs is certainly suspicious, because SQL queries are supposed to be built and sent by your scripts to the database and
not by your visitors, so there are not supposed to be found in web logs. Most common SQL instructions that can be used by malicious
users, web scanners or hacking tools are UNION and SELECT but you can edit this by providing
any keyword, in the same way as other indicators.
Use of quotes: Quotes and double quotes are used with several injection techniques.
Directory traversal: Malicious users, web scanners and hacking tools often try to get access to files that are
outside of the website space. A common way to do this is using directory traversal, so you will find the ../ (dot
dot slash) in their requests. The expression can be found in many encoded ways like ..%2f,
%2e%2e/, %2e%2e%2f, or you may also find the Windows
variant ..\.
Remote file inclusion: A web address in the request may indicate a remote file inclusion attempt.
Admin or login scan: Web scanners will try to locate your login pages or your admin sections. Those pages may
also be targeted by brute forcing tools.
Web scanner: Web scanners can be detected in many ways. Here you can set the list of all keywords that can be
found in request, parameters or in the useragent field. A list is provided with the tool, but this list is not complete.
No useragent: Regular users, as legitimate robots, use the useragent field to identify themselves and/or to
indicate what kind of system they are running and technologies they can accept. Not providing a useragent is not a normal behavior
and can indicate some bad intentions. To be able to use this indicator, useragents must be found in the logs for regular users. As
option, you must provide the maximum length of the useragent, to be considered as not provided. This indicator is the slowest because
the length of each useragent must be calculated.
Many useragents: Web scanners and hacking tools can try to inject data in the useragent field, so a high number
of different useragents for a single user can be something suspicious. For this indicator, XL-Parser will sort IP addresses by the
number of different useragents there are associated with. Be aware that shared IP (as used with Carrier-grade NAT (CGN) for example),
may produce false positives.
Any change to indicator options are saved in a file (name: SA_Indicators.json) in the user directory. The
Reset button restore options to default values. You can also Save and Load different
set of options.
XL-Parser use each indicator and produce two list of results: By IP address and Activities only.
The By IP address list represent the list of IP addresses that are associated with the most suspicious things inside
the logs. A score is attributed to each IP address and the list is sorted using the value. A higher score means that the IP address
is associated with many suspicious activities. Here it how it works:
1 For each indicator, XL-Parser get a list of IP address that match the indicator. The
limit option indicates the maximum number of IP addresses to keep for each indicator. If the value is
10, it means that XL-Parser will only consider the top 10 of the list. The list is sorted
by the number of matches. As example, if the indicator is the High number of errors,
the first IP address in list will be the one associated with higher
number of errors.
2 The rank of the IP address in each list is multiplied by the score
associated with the indicator. As examples, if the indicator is the High number of errors
with a limit of 10 and a score of 4, the IP address that will have generated the higher number of errors will receive a score of
40 (10*4) for this indicator. The second IP address in list will receive a score of
36 (9*4).
The limit option and the number of selected indicators will determined the maximum results you can get. If all
limits are set to 10 and you use 10 indicators, the
maximum of results you can get is 100 (10*10). But the IP addresses associated with the most
suspicious activities will be found in many indicator lists, so you won't probably get as many results. Anyway, you can set the maximum
of results (globaly).
Select columns and Report options options will be used when querying database from the result
window (see further). To change these options without having to reanalyze, you just have to select different options in the Report
options window and click on the Save button. Also, you can open previously saved results by using the
Load results button. When you are viewing previously saved results, you can still modify report options, but you will
have to select the saved result file from the Open Filename window.
Here is an example of results:
In the result grid, you can double-click on any row to get the requests for this particular IP address, or select more than one row
and right-click on the selection. The right-click open a context menu with the following functions:
Get request from selected IP(s): Launch a new process and get all requests for the selected IP address(es). Report
options are the ones that were selected in the main interface.
Get useragent for selected IP(s): Launch a new process and get all useragent(s) for the selected IP address(es).
Resolve/update all IPs (add columns): Add or update ISP and GeoIP columns. If the columns don't exist, they will be
created. If the columns already exist, data will be updated for each IP address. Note that the data is not gathered from the log database,
but resolved from the original databases (XL-Whois database and GeoIP database). So data will be added to grid even it doesn't exist in
the log database. Also, if the data is updated in the database where it comes from, you can update the data in the grid by executing this
function. This is particulary interesting if some ISP values are marked as unknown.
Send unknown IPs to clipboard (ISP): This function will make a list of all unknown IP address(es) based on the ISP
column and push it to the clipboard. After that, you can use XL-Whois to identify the unknown ISP(s). Then you can execute the
Resolve/update all IPs function to update the data.
Select All: This will select all the grid content, including the headers.
Copy selected row(s): This will copy the selected row(s) to the clipboard.
The Activities only results reprsents the number of requests that match each indicator. Double-click on a row (or many
rows) to get the requests for this indicator(s).
Reports produced from the result grids will use the options in the main window.