XL-Parser Documentation

First start

When you start XL-Parser for the first time, you are asked if you want to set the default configuration. If you click Yes, you will be asked to select a directory where:

  1. The GeoIP database (GeoLiteCity.dat) will be downloaded if it doesn't already exist. It may already exist in the XL-Tools user directory;
  2. The OUI database (oui.txt) will be downloaded and converted (OUI.db) if it doesn't already exist. It may already exist in the XL-Tools user directory;
  3. The IIN database (IIN.zip) will be downloaded and converted (IIN.db) if it doesn't already exist. It may already exist in the XL-Tools user directory;
  4. The TLD database (effective_tld_names.dat) will be downloaded if it doesn't already exist. It may already exist in the XL-Whois user directory;
  5. An Expression database and an Expression history database will be created;
  6. The Log format database (LF.db) will be downloaded;
  7. A Log analysis filters database will be created;
  8. The XL-Whois database is searched for and selected. If it can't be found, you will be asked whether XL-Whois is installed on the system and where its database is located. The XL-Whois database is created and updated only while you are using XL-Whois. If you don't have this database (or don't need it), just click Cancel;
  9. The Resolved TLD database (Resolve TLD.db) will be downloaded if it doesn't already exist. It may already exist in the XL-Tools user directory;
  10. The Datetime database (DT.db) will be downloaded if it doesn't already exist. It may already exist in the XL-Tools user directory;
  11. Finally, the following default options will be selected or set:
    • Check for update at startup (for the tool) is selected;
    • The NsLookup timeout is set to 10 seconds;
    • Check for update at startup for the GeoIP database and the TLD database is selected;
    • The default language is set to en-US;
    • The tool will try to detect your local timezone. If it fails, the default timezone is set to America/New_York.

An XL-Parser.ini configuration file will also be created to store your preferences. You can also set the elements above manually if you already have some of these databases.


Config Window

XL-Parser Config Window - General

General

The Tool section provides the following functions:

  • Export Lang.ini: Use this function to translate the XL-Parser GUI. See Translation for help with this functionality.
  • Open user dir: This is the directory where all user databases and settings are saved. Copy this whole directory if you want to back up your settings.
  • Check Update: Check on le-tools.com whether a tool update is available.
  • Check for update at startup: Check whether an update is available every time XL-Parser starts.

In the Functions section, you can:

  • Choose to start XL-Parser in Full Screen;
  • Choose to Remember position of the GUI (useful if you have multiple monitors);
  • Set the NsLookup timeout. Default is 10 seconds;
  • Set the Local timezone. Default is America/New_York;
  • Set the Default language. Default is en-US.


Databases

XL-Parser Config Window - Databases

In this tab, you have to set the location of three databases:



XL-Parser Databases

XL-Parser Config Window - XL-Parser Databases

In this tab, you have to set the location of the following databases:

  • Expression history and database: The History database is used to keep a history of the expressions you used when extracting. The Expression database is used when you manually save your expressions. See Expressions Window for more details.
  • Log format database: This database contains patterns and regexes related to the different log formats supported by XL-Parser for Web log analysis. See Log format database Window for more details.
  • Log analysis filters database: This database contains your filters for Web log analysis. See Log analysis filters database Window for more details.
  • Saved queries database: This database contains your saved SQL queries for Web log analysis. See Query database for more details.

XL-Toolkit Databases

XL-Parser Config Window - XL-Toolkit Databases

In this tab, you have to set the location of the following databases:

  • XL-Whois Database: This database is the one created by XL-Whois while you're using it.
  • Resolve TLD Database: This database is used to provide details about a TLD (ex.: United States for .us).
  • Datetime Database: This database contains different datetime formats to be used as input in Web log analysis and as output when using Query Database. See Datetime database Window for more details.

Input section

XL-Parser - Input section

There are four types of input: folder, file (or list of files), database and clipboard. Each type has its own options:

  • Folder: With a folder, you have access to File Filters and File Formats. You can also check Subfolders to include subfolders in the search.
  • File(s): You can select one or multiple files. Once you've made your selection, XL-Parser examines it and determines which File Formats should be selected.
  • Database: This type is associated with Web log analysis. The database must have been created by XL-Parser. When you select this type, the Functions section is modified to show the available functions. When you select a valid database, XL-Parser gathers information about it and shows it in the Current database tab of the Functions section.
  • Clipboard: If you select this type, the search is applied to the text content of the clipboard.

File Filters Window

XL-Parser - File Filters Window

When you select a Folder as input, you can use filters to restrict the scope of the search to files that match your criteria. You can add, edit (by double clicking a filter in the grid) or delete filters. You can also save your filters for later use. File Filters can be shared between XL-FileTools and XL-Parser. When you click the add button, the following window will show up:

XL-Parser - Select File Filters Window

You have four types of file filters:

  • Contains: By default, the keyword or regular expression applies to the relative path (folder or file, but excluding the input directory). You can select the Filename only option to exclude folders. If you select the Regex option, a colored border around the textfield will indicate whether your regex is valid. If not, you can press the ? button to see the error message.

  XL-Parser - Select File Filters Window

  • File size: With the Equal operator, you can include a list of sizes, but only one size unit. Note that if you select a size unit other than bytes, the file size will be rounded down to the nearest integer.

  XL-Parser - Select File Filters Window

  • Last accessed: With the Equal operator, you can include a list of dates (the format must be the same as in the control, ex.: 2017-04-22).
  • Last modified: Similar to the previous filter.

File Formats Window

XL-Parser - File Formats Window

XL-Parser can read many file formats including: text, unicode, Doc and Docx, Xls and Xlsx, Evt and Evtx, Pdf, Msg and Zip (text files inside the archive). A handy feature here is that you can choose which file extensions a File format parser applies to. For example, you can search a docx file as if it were a zip file by selecting Zip and adding .docx to the extension list. This allows you to search the metadata of the docx file.

When you select the Doc or Docx format, you can choose whether to parse the file with MS Word (if installed on your system). The legacy format (doc) will be searched as unicode, while the new format (docx) will be searched as zip.

For the unicode format, you must select between two types: HTML and Other. With HTML, XL-Parser will search for the charset in the HTML file header. With Other, it will try to determine the appropriate encoding by analysing the file header.
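As an illustration of the HTML branch above, a charset declaration can be read from the file header roughly like this (a minimal sketch, assuming a simple meta-style declaration; XL-Parser's actual detection logic may differ):

```python
import re

def html_charset(raw_bytes, default="utf-8"):
    """Look for a charset declaration near the top of an HTML file.
    Sketch only: real-world detection (BOMs, XML prologs, HTTP headers)
    is more involved."""
    head = raw_bytes[:1024].decode("ascii", errors="ignore")
    m = re.search(r'charset\s*=\s*["\']?\s*([\w-]+)', head, re.IGNORECASE)
    return m.group(1).lower() if m else default

header = b'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
print(html_charset(header))   # iso-8859-1
```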

Any change to the extension lists or options is saved in a file (FileFormats.json) in the user directory. The Reset button restores the options to their default values.

Important: XL-Parser is not a search tool, which means you should not check all formats, select a full volume and hope it will find everything.


Functions section

The Functions section is divided into tabs. The tabs shown depend on the input type. For Folder, File(s) and Clipboard, you have access to the Extraction, Web Log analysis and Split logs tabs:

XL-Parser - Functions section

With an XL-Parser log database as input, you have access to the Current Database, Update database, Query Database and Suspicious activities tabs:

XL-Parser - Functions section

Extraction

XL-Parser provides two types of extraction: by Expressions and by Special objects.

Expressions

XL-Parser - Functions Extraction Expression

You can add, edit (by double clicking an expression in the grid) or delete as many expressions as you want. If more than one expression is used, priority depends on the order in the grid. You cannot set a hierarchy of operators, but you can change the order of the expressions using the up and down buttons on the right. If an expression contains capture groups (with an AND operator), it should be placed at the end. All the expressions in the grid can be saved for later use. Extraction options are also available (see further down).

In the top right corner, you have a Load results button. It can be used to open saved results. In the bottom right corner, you have a Rebuild list of files button. When you run a search, XL-Parser has to browse the input and list the files that will be searched. This list of files is cached, so if you run another search without changing the input, XL-Parser doesn't have to rebuild the list and the search is faster. If the content of the directory has changed (files added or deleted), you should click the Rebuild list of files button.

Expressions Window

The Expression Window is divided in three tabs: Expressions, Expression history and Expression database.

XL-Parser - Functions Extraction Expression Window

In the Expressions tab, you can set or edit a single expression or a list of expressions. An expression can be a string or a regular expression. An operator must be set if you have more than one expression. For regular expressions, capture groups are supported. Note that you should not use capture groups with the Invert option, as nothing can be captured on a line that is not kept. A comment (optional) can be used to identify what the expression is supposed to extract. You can use the regex shortcuts to help you build your regular expression. On the right side of the window, you have two buttons. The first is used to add or edit the expression in the Expression database. The second opens the Expression tool window:

XL-Parser - Functions Extraction Expression Tool Window

The Expression tool window can be used to test your expression. If the expression contains capture groups, the extracted data will appear in the results section.

XL-Parser - Functions Extraction Expression Window

The Expression history contains expressions you have used. This feature can be disabled. You can also set how many expressions must be kept in the history. When this limit is reached, older expressions are replaced by new ones. You can select any of these expressions by double clicking on it, or right click to access the context menu.

XL-Parser - Functions Extraction Expression Window

The Expression database contains expressions you have saved. You can select any of these expressions by double clicking on it, or right click to access the context menu.

XL-Parser - Functions Extraction Options Window

Here are some explanations about the options:

  • Mode: By default, XL-Parser searches for (and extracts) your expressions on each line* of a file. You can change this mode to Search by file, so the whole file has to meet all your expressions. With this mode, the result is a list of files instead of a list of extracted lines.
  • Max results: This option is used to limit the number of extracted lines, per file or for the whole process.
  • Include context: By default, only lines that contain your expression are extracted. But sometimes, for chat logs for example, you want to see what is going on before and/or after the line that matches your expression.
  • No duplicates: If you have a list of expressions with the OR operator, lines are tested against each expression and duplicates are allowed. This gives you results for each expression, but also leads to a longer processing time. If you don't need results for each expression and want to speed up the process, you can check this option.

* For some file formats, a line has a different meaning. For a doc(x), a line is a paragraph. For a Pdf, a line is a page.
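The Include context option described above can be sketched as a rolling window over the lines (a hypothetical illustration; `extract_with_context` and its parameters are not part of XL-Parser):

```python
import re
from collections import deque

def extract_with_context(lines, pattern, before=2, after=2):
    """Yield matching lines together with surrounding context lines."""
    regex = re.compile(pattern)
    prev = deque(maxlen=before)   # rolling window of preceding lines
    pending = 0                   # lines of trailing context still owed
    for line in lines:
        if regex.search(line):
            yield from prev       # flush the "before" context
            prev.clear()
            yield line
            pending = after
        elif pending > 0:
            yield line
            pending -= 1
        else:
            prev.append(line)

chat = ["hi", "how are you", "send the password", "ok here it is", "thanks"]
print(list(extract_with_context(chat, r"password", before=1, after=1)))
# ['how are you', 'send the password', 'ok here it is']
```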


Special objects

XL-Parser - Functions Extraction Special objects

Special objects are expressions that XL-Parser can extract while providing information about the results. Each object has its own set of options. The special objects are:

  • IPv4 addresses: An IPv4 address is made up of four series of one to three digits, separated by dots. Each series has a value range from 0 to 255. To avoid mistakes, XL-Parser ignores results followed by a dot. For example, if a line contains the sequence 207.46.0.0.207.46.255.255, XL-Parser will only keep the IP address 207.46.255.255. This object has the following options:
    • Resolve ISP (XL-Whois): With this option, XL-Parser will match the IPv4 address to the known ISP stored in your XL-Whois database.
    • Resolve GeoIP: With this option, XL-Parser will gather GeoIP details about the IPv4 address from the GeoIP database.
    • NS Lookup: With this option, XL-Parser will do a lookup on every IPv4 address. Be aware that each lookup requires a network request, so it might take a while if you have a lot of results. Use this option with caution.
    • Resolve TLD: If you check the NS Lookup option and the lookup gives a result, XL-Parser can also gather information about the domain name TLD from the Resolve TLD database.
  • IPv6 addresses: An IPv6 address is made up of eight series of four hexadecimal digits, separated by colons. The notation can be compressed (ex.: leading zeros may be omitted). XL-Parser should find them all. Options are the same as for IPv4 addresses, except that Resolve GeoIP is not supported.
  • URLs: XL-Parser finds expressions starting with http:, ftp:, www., ./ or ../ as well as links in HTML tags: src= and href=. It rejects internal links (starting with #) and mailto: links. XL-Parser accepts the following characters inside a URL: +, /, ,, %, @, $, ., #, -, ;, &,  , \, =, ?, : and _. Available options are:
    • Remove parameters: With this option, XL-Parser will remove the parameters that follow the ? sign. For example, if XL-Parser finds http://10.10.10.10/index.php?param1=1&param2=2, it will extract http://10.10.10.10/index.php.
    • Resolve TLD: With this option, XL-Parser will gather information about the domain name TLD from the Resolve TLD database.
    • NS Lookup: With this option, XL-Parser will do a lookup on the hostname extracted from the URL.
    • Resolve ISP (XL-Whois): If you check the NS Lookup option and the lookup gives a result, XL-Parser will match the IP address to the known ISP stored in your XL-Whois database.
  • Emails: An email address follows many syntax rules. By default, XL-Parser will extract any expression that looks like an email address. To reduce the number of false positives, you should use the Validate email domain option, which uses the TLD database (public suffix list) to validate the right part of the address. Other options are similar to the ones in the URLs extraction.
  • Hostnames: A hostname is a domain name preceded by a variable number of labels that designate a particular host and associate it with an IP address. The path hierarchy to the host is represented in reverse order. For example, the hostname host22-44.adsl.domain.ca could designate the host host22-44, from the group ADSL, from the domain domain.ca. XL-Parser validates the domain name (using the TLD database) of every extracted hostname. Options are similar to the ones in the URLs extraction.
  • Domain names: Similar to the hostname extraction, except that only the domain name is extracted. The domain name is then validated (using the TLD database). To reduce the number of false positives and errors, domain names followed by a dot (.), a hyphen (-) or another letter are not extracted. Options are similar to the ones in the URLs extraction.
  • MAC addresses: A MAC address is made up of six groups of two hexadecimal digits (lower case or upper case) separated by hyphens (-) or colons (:). XL-Parser will extract MAC addresses with either separator, as long as the address is not preceded or followed by the same separator. The Resolve MAC Address OUI option uses the OUI database to identify the vendor (Organizationally Unique Identifier) associated with the extracted MAC address.
  • Credit cards: This includes credit cards, but also bank cards and other numbers related to magnetic cards. These numbers are composed of 12 to 19 digits. For 16-digit numbers (the most common), XL-Parser also considers the formats where the digits are grouped by four and separated by spaces (XXXX XXXX XXXX XXXX) or hyphens (XXXX-XXXX-XXXX-XXXX). If you select Resolve Issuing company, XL-Parser will use the IIN database to identify the issuing company associated with the extracted number.
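The IPv4 rule above (octets 0-255, candidates followed by a dot are ignored) can be approximated with a regular expression. This is a sketch only, not XL-Parser's actual pattern:

```python
import re

# Each octet is 0-255; a candidate followed by a digit or another dot is
# rejected, so in a fused run such as 207.46.0.0.207.46.255.255 only the
# last address survives (sketch; XL-Parser's real pattern may differ).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(r"(?<!\d)(?:{o}\.){{3}}{o}(?![\d.])".format(o=OCTET))

line = "src 10.0.0.1 dst 207.46.0.0.207.46.255.255"
print(IPV4.findall(line))   # ['10.0.0.1', '207.46.255.255']
```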


Extraction Results

XL-Parser - Functions Extraction Results

When you start an extraction (Expressions or Special objects), a pop-up window appears. You can see the progress of the extraction (updated after each searched file). As the process runs, it stores extracted data in a temp file (.XL-Parser_tmp.txt) in the report directory (default is the user directory, see report options). When the process is done, the grid will contain statistics about the extracted expressions (or objects). You can see results by file or by expression/object.

Double click on a row to see results for each file or for each expression/object. This will create and open the results in a text file. If you right click on a row in the by file grid, you have access to the following functions:

  • Show results for selected item(s): Show results for one or multiple files.
  • Show results for selected item(s) (No duplicates): Same as the previous, but without duplicates. An additional column indicates how many duplicates have been found. Results are presented by number of repetitions in descending order.
  • Show results for selected item(s) (Results only): By default, when XL-Parser shows results from the by file grid, each line begins with the expression (or object) that matched. From the by expression/object grid, it begins with the file where the result has been found. This function shows results without any prefix.
  • Show results for selected item(s) (Results only - No Duplicates): Same as previous, but without duplicate.
  • Open file(s): Open the file with the associated viewer in Windows (based on file extension).
  • Copy file(s): You will have to select a destination folder.
  • Select all: Select all rows.
  • Copy selected rows: Selected row values go to clipboard (same as regular text copy).

If you right click on a row in the by expression/object grid, the available functions are Show results for selected item(s), Show results for selected item(s) (No duplicates), Show results for selected item(s) (Results only), Show results for selected item(s) (Results only - No Duplicates), Select all and Copy selected rows.

You can Save results for later use. If you click the Build Report button, the following window will show up:

XL-Parser - Functions Extraction Report options Window

Available options and default values depend on the type of search and the number of results. For an extraction by expressions, the default report format is TXT (TSV), while for an extraction by special objects, the default report format is XLSX. If many expressions/special objects have been extracted, you have an additional option: One sheet per expression/object for XLSX, One page per expression/object for HTML or One file per expression/object for TXT (TSV). Include Source (the file where the result has been found) is only available for extraction by expressions.

Whatever report format you have chosen, you can include Column headers, Source (for extraction by expressions) and Expression/Object. These last two options add a column containing the corresponding value.


Web log analysis for files

XL-Parser - Functions Web log analysis

The Web log analysis tab interface differs based on the input. Above is what it looks like when the input is Folder/File(s). This function is for web logs only. Its purpose is to create (or update) a SQLite database to store web logs, so you can run advanced queries on it.

When you create a database, information about the log files is stored in the database, including the number of lines. For example, suppose you create the database in mid-April, including log files since January. When you update the database in June, XL-Parser examines the folder. Log files for January to March won't be added again. The first part of April won't be added either (based on the number of lines), but all new lines will be added, as well as the newer log files (May and June).
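The incremental update described above can be sketched as follows (a hypothetical `new_lines` helper; in reality the tool keeps the line counts inside the SQLite database it creates):

```python
import os
import tempfile

def new_lines(path, already_ingested):
    """Return the lines of `path` that are not yet in the database, given
    the line count recorded at the last run. Hypothetical helper: the real
    tool keeps these counts inside the SQLite database it creates."""
    with open(path, errors="replace") as fh:
        return fh.readlines()[already_ingested:]

# A file ingested in April (3 lines) has gained 2 lines by June:
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as fh:
    fh.write("jan1\njan2\napr1\nmay1\njun1\n")
print(new_lines(fh.name, 3))   # ['may1\n', 'jun1\n']
os.unlink(fh.name)
```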

The log format is a regular expression that matches every field of every line in the log file. Access logs of a web server can be customized, so log formats in XL-Parser can also be customized. The first button on the right opens the Log format database Window, and the second button calls a function that examines the input and tries to determine the appropriate log format.

By default, XL-Parser parses log files in ascending order of last-modified time. If you want to verify or change this, you can click the File order button. A window like the one below will open:

XL-Parser - File Order Window

You can sort the files by clicking on the header of the column you want to sort by, or you can order files manually using the up and down buttons.

The Set source files to readonly option applies the read-only attribute to every parsed log file. This prevents any accidental modification.

When you create (or update) the database, you can add a lot of information to it, including ISP (based on your XL-Whois database), GeoIP, Useragent details and Weekdays. You can also limit the number of added lines by using Filters. If you have a large set of log files (~1 GB or bigger), you should filter out additional resources (such as image requests) if possible. This will reduce the size of the destination database. It's not unusual for requests for images to represent more than half of all requests in the log files.

If you use filters at the creation of the database, be aware that some filters may increase the overall processing time. These are the filters that require resolving, such as ISP, GeoIP, useragent details, or filters based on time. If you have filters that don't require resolving, it is better to apply them first. Filters based on existing fields are faster. For example, filtering the useragent with the string Windows NT 6.1 will be faster than filtering UA_OS with the value Windows 7. Both produce the same result, but to get the UA_OS detail, the useragent has to be resolved before the filter is tested.

Resolution of ISP, GeoIP and useragent is done at the end (if no filter requires this information). Weekday resolution is done during processing. Because each entry has a datetime field, the number of resolved weekdays should be exactly the same as the number of entries.

When you use filters, filtered lines (lines that don't match the selected Filters) are logged in a file ([Database filename].FilteredLines.log), unless you unselect the option.

When a parsed line doesn't match the Log format, it is logged in a file ([Database filename].RejectedLines.log), unless you unselect the option. You may also ask XL-Parser to ignore comment lines (starting with #) when logging rejected lines.

Finally, you must enter the output database file name and directory. The Select database when finished option will select the database as input when finished.

When you start the process, the main program creates the database, examines the web logs and stores all settings. Then the main program starts XL-Parser-process.exe (in the program folder). A window will appear and show progress and stats:

XL-Parser - Log analysis progress Window

Log format database Window

XL-Parser - Log format Database Window

The Log format database contains the regular expressions that XL-Parser uses to parse log files. Every format has a name, a sample, a pattern and a regex. You can add, edit or delete any log format.

XL-Parser - Log format database - Add Window

In this window, you can add or edit a log format. Here are some explanations about each field:

  • Name: This is the name of the format. It must be unique for each format in the database.
  • Sample: The sample is a line that matches the regex. To see if it matches, look at the Parsed field; it must contain the extracted data. Extract from input will extract a line from one of the log files.
  • Pattern: The pattern helps you create a log format without knowing regular expressions. The pattern is also important to correctly identify each field. The available fields are: remoteIP, datetime, http_method, http_request, http_params, http_protocol, http_status, size, referer, useragent and other. You should use the field list to help you select the correct field and option, but you must add separators and syntax in the textfield. For example, suppose you have "GET /index.html HTTP/1.1" in the sample. The pattern should be "http_method http_request http_protocol". The useragent and referer fields have an option about spaces. If the field can contain spaces, as in Mozilla/4.5 [en] (WinNT; I), you must check the option. The IIS log format style uses the + sign instead of spaces. The only field that requires particular settings is datetime. XL-Parser - Log format database - Add datetime The datetime format uses the Datetime database, which should be the same one you use with XL-Tools. The list corresponds to the datetime formats marked as input (Use As). See the next section.
  • Use IIS header: If the log files are related to IIS and/or contain at least one comment line that defines the header (ex.: #Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Referer) sc-status sc-bytes cs-bytes), you can use this button. XL-Parser will extract this line and build the pattern from it. Important: Be sure to check that the datetime has the correct timezone, because IIS doesn't indicate the timezone inside its log files (default is UTC).
  • Regex: By default, you don't have to do anything for this field. With the Match pattern option checked, XL-Parser creates the regular expression that matches the log format pattern.
  • Parsed: If everything is ok, you should see every field and its value in this grid.
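To illustrate how a pattern like the one above can become a capturing regex, here is a simplified sketch. The `pattern_to_regex` helper, the `FIELD_RE` table and the reduced field set are hypothetical; the real Log format database stores its own hand-tuned regexes:

```python
import re

# Hypothetical per-field sub-regexes (reduced field set for the sketch).
FIELD_RE = {
    "remoteIP":      r"(?P<remoteIP>\S+)",
    "datetime":      r"(?P<datetime>[^\]]+)",
    "http_method":   r"(?P<http_method>[A-Z]+)",
    "http_request":  r"(?P<http_request>\S+)",
    "http_protocol": r"(?P<http_protocol>HTTP/[\d.]+)",
    "http_status":   r"(?P<http_status>\d{3})",
    "size":          r"(?P<size>\d+|-)",
}

def pattern_to_regex(pattern):
    out = re.escape(pattern)                 # escape literal separators
    for field, rx in FIELD_RE.items():
        out = out.replace(re.escape(field), rx)
    return re.compile(out)

rx = pattern_to_regex('remoteIP [datetime] "http_method http_request http_protocol" http_status size')
m = rx.match('10.0.0.1 [22/Apr/2017:10:00:00 -0500] "GET /index.html HTTP/1.1" 200 5120')
print(m.group("http_request"))   # /index.html
```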

Datetime database Window

XL-Parser - Datetime Database Window

The Datetime database is used to store and manage all datetime formats that will be available in the tool. This database is used with Log format (input), with Split logs (input) and with report options when using Query Database. You can add, edit or delete any datetime object in the database. If you select the same datetime database that you use with XL-Tools, new objects will be usable in both tools.

XL-Parser - Datetime Database - Edit

When you add or edit a datetime object, you must provide the following details:

  • Sample: The sample is an example of the datetime object. The sample is used to identify the datetime object, so it must be unique.
  • Pattern: The pattern is based on the well-known strftime function. You can use the combobox on the right side of the textfield to select the appropriate symbol. The most important symbols are listed in the combobox. You should also be able to use any symbol supported by strftime, even if not listed.
  • Regex: The regex must match the pattern and is used to extract the datetime object from any string. If the Match pattern option is selected, it will be created automatically while you're entering the pattern.
  • Timezone: This is the default timezone to be used if no timezone is provided in the datetime object (the timezone is provided if the pattern contains the %Z, %z or %s symbol).
  • Parsed: If the regex and the pattern match the sample (which should indicate that everything is ok), the parsed datetime object will be shown here.
  • Use as: The datetime object can be used as an input format, an output format or both. It can also be set to none, so it will be stored in the database but not available for input or output in the XL-Tools main interface.
  • Comment: This is optional. The comment is used to give more information about the datetime object.

If the Add or Edit button is still disabled after you have filled in all the fields, you can click the ? button to see what is missing or incorrect.
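To illustrate how the Sample, Pattern, Regex and Timezone fields work together, here is a sketch of one datetime object using Python's strptime (the dict layout is hypothetical; the real database is stored in SQLite):

```python
import re
from datetime import datetime

# Hypothetical layout of one datetime object; field names mirror the
# Add/Edit window described above.
dt_object = {
    "sample":   "22/Apr/2017:10:00:00 -0500",
    "pattern":  "%d/%b/%Y:%H:%M:%S %z",
    "regex":    r"\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}",
    "timezone": "-0500",   # default, used only if the string has no offset
}

line = '10.0.0.1 - - [22/Apr/2017:10:00:00 -0500] "GET / HTTP/1.1" 200 512'
raw = re.search(dt_object["regex"], line).group(0)     # extract with the regex
parsed = datetime.strptime(raw, dt_object["pattern"])  # parse with the pattern
print(parsed.isoformat())   # 2017-04-22T10:00:00-05:00
```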

Ambiguous timezone abbreviations

Some timezone abbreviations are ambiguous because they match more than one timezone. For example, CST could be Central Standard Time (USA), China Standard Time or Cuba Standard Time. So, I decided to set those timezones to the following values:

  • ADT = -0300 (Atlantic Daylight Time)
  • AST = -0400 (Atlantic Standard Time)
  • BST = +0100 (British Summer Time)
  • BT = +0600 (Baghdad Time)
  • CAT = +0200 (Central Africa Time)
  • CCT = +0630 (Cocos Islands Time)
  • CDT = -0500 (Central Daylight Time)
  • CST = -0600 (Central Standard Time)
  • EAST = -0600 (Easter Island Standard Time)
  • ECT = -0500 (Ecuador Time)
  • EDT = -0400 (Eastern Daylight Time)
  • EST = -0500 (Eastern Standard Time)
  • FDT = -0100 (Fernando de Noronha Daylight Time)
  • FST = -0200 (Fernando de Noronha Standard Time)
  • GST = -0200 (South Georgia Time)
  • IDT = +0300 (Israel Daylight Time)
  • IST = +0200 (Israel Standard Time)
  • MET = +0100 (Middle European Time)
  • NFT = -0230 (Newfoundland Daylight Time)
  • NST = -0330 (Newfoundland Standard Time)
  • PST = -0800 (Pacific Standard Time)
  • SAST = +0200 (South Africa Standard Time)
  • SST = +0800 (Singapore Standard Time)
  • WAST = +0200 (West Africa Summer Time)
  • WST = +0800 (Western Standard Time)

I have tried to use the most popular value for each, but I could be wrong. If the datetime you need to parse has a different offset, you can parse the datetime without the timezone and set the appropriate offset in the input format.
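The table above can be expressed as a lookup that substitutes the chosen offset before parsing (a sketch with a hypothetical `parse_with_abbrev` helper; only a few abbreviations are shown):

```python
from datetime import datetime, timedelta

# Excerpt of the fixed offsets chosen above, as a lookup table.
AMBIGUOUS = {"ADT": "-0300", "AST": "-0400", "BST": "+0100", "CST": "-0600",
             "EST": "-0500", "IST": "+0200", "NST": "-0330", "PST": "-0800"}

def parse_with_abbrev(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Replace a trailing ambiguous abbreviation by its chosen offset,
    then parse with %z. Hypothetical helper, not part of XL-Parser."""
    stamp, abbrev = text.rsplit(" ", 1)
    return datetime.strptime(stamp + " " + AMBIGUOUS[abbrev], fmt + " %z")

print(parse_with_abbrev("2017-04-22 10:00:00 CST").isoformat())
# 2017-04-22T10:00:00-06:00
```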


Filters Window

XL-Parser - Filters Window

Filters are divided into three groups: White List, Regular and IP List. When you use White List filters, every entry that matches the filters is ignored (or removed). Regular filters do the opposite: every entry that matches those filters is selected and the others are ignored. IP List is simply a type of regular filter that matches the remoteIP field only.

Filters are grouped by category. If you want to create a new category, you must use the Add button without selecting any category. If you want to create a new filter in an existing category, you must select the category first, then push the Add button. You can edit or delete any filter.

XL-Parser - Filters - Edit

A filter is in fact a set of filter rules. You can add, edit or delete any rule in the set.

XL-Parser - Filters - Add a filter rule

Here are some explanations about each field:

  • Operator: Possible values are OR and AND.
  • Field: Field name in the database (or logs).
  • Condition: Available conditions depend on the selected field and the type of values. For text, available conditions are usually is, is not, contains or does not contain. Different conditions are available for datetime objects and numeric values.
  • Value: The type of value depends on the selected field and the condition. For the contains condition, for example, you have the Regex option. For Weekday, you can choose the value from a list. For datetime, you can set a date and/or a time.
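To illustrate how such a set of rules could combine, here is a minimal Python sketch. The rule layout, field names and sample entry are assumptions for illustration only; they do not mirror XL-Parser's internal format:

```python
# Hypothetical sketch of evaluating a set of filter rules against one
# log entry. Each rule after the first carries its own OR/AND operator.

def match_rule(entry, rule):
    value = str(entry.get(rule["field"], ""))
    cond, target = rule["condition"], rule["value"]
    if cond == "is":
        return value == target
    if cond == "is not":
        return value != target
    if cond == "contains":
        return target in value
    if cond == "does not contain":
        return target not in value
    raise ValueError(f"unknown condition: {cond}")

def match_filter(entry, rules):
    # The first rule stands alone; each following rule is combined
    # with the running result using its own operator.
    result = match_rule(entry, rules[0])
    for rule in rules[1:]:
        if rule["operator"] == "AND":
            result = result and match_rule(entry, rule)
        else:  # OR
            result = result or match_rule(entry, rule)
    return result

entry = {"remoteIP": "203.0.113.7", "http_status": "404"}
rules = [
    {"field": "http_status", "condition": "is", "value": "404"},
    {"operator": "AND", "field": "remoteIP",
     "condition": "contains", "value": "203.0.113."},
]
print(match_filter(entry, rules))  # True
```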

A field filter rule can be selected/added to the Field Filter database tab.

Top of the page

Split logs

This function can be used to split logs (or any text files) into smaller pieces. Files can be split by size, by number of lines or by time.

By size

XL-Parser - Split logs by size

You have to provide a size for the chunks. Files that exceed this size will be split. The final size of each chunk may differ slightly because XL-Parser doesn't split lines. If you select the Use datetime in filename option (log files only), XL-Parser will extract the first timestamp of each file and use it to name the file. Otherwise, it will use a simple counter.

By number of lines

XL-Parser - Split logs by number of lines

This function is similar to the previous one, except that it simply counts lines instead of evaluating the size of each line. The Use datetime in filename option is also available.
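The line-based split can be sketched in a few lines of Python. This illustrates the principle only, not XL-Parser's actual implementation; the counter-based output naming is an assumption:

```python
# Split a text file into chunks of at most `lines_per_chunk` lines,
# naming each chunk with a simple counter (the non-datetime mode).

def split_by_lines(path, lines_per_chunk):
    chunk, count, written = [], 0, []

    def flush():
        nonlocal count
        count += 1
        out = f"{path}.{count:03d}.log"
        with open(out, "w", encoding="utf-8") as dst:
            dst.writelines(chunk)
        written.append(out)

    with open(path, encoding="utf-8", errors="replace") as src:
        for line in src:
            chunk.append(line)
            if len(chunk) == lines_per_chunk:
                flush()
                chunk = []
    if chunk:  # remainder shorter than lines_per_chunk
        flush()
    return written
```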

By time

XL-Parser - Split logs by time

This function is for log files only. With this function, XL-Parser splits the logs based on the timestamp of each line. The log files can be split by hour, by day, by week or by month. Each file will be named using the first timestamp of the file.

Top of the page

Web log analysis for XL-Parser database

Once you've created a log database with XL-Parser (see Web log analysis for files) and selected the database as input (don't forget to select the Database radio button), you have access to the following functions (tabs): Current Database, Update Database, Query Database and Suspicious activities.

XL-Parser - Log Analysis - Database - Current Database

Current Database

When you select a database as input, XL-Parser opens it and gathers information about it. In this tab, you can see this information:

  • General:
    • Last update: Date and time when the database was created or last updated.
    • Period: Dates of the first and last entries in the database.
    • Number of:
      • Entries: Number of entries in the log table.
      • Filtered: Number of lines filtered (lines that didn't match the selected Filters).
      • Rejected: Number of lines rejected (lines that didn't match the Log format).
  • Source:
    • Log format: The log format that was used to parse the logs.
    • Filters: The filters that were used (if so).
    • Source file(s)...: Details about source log file(s). This will open a pop-up window:

    • XL-Parser - Source File(s)

      In the source file(s) grid, you can see information about the log files that were parsed to build the database. The Exists column indicates whether the file can still be found on the drive. You can change the path by right-clicking on a line. If the new path contains other log files, the whole grid will be updated at the same time.

    • Rejected: This will open the rejected lines file ([Database filename].RejectedLines.log) created when the database was built. This file should be in the same directory as the database.
    • Filtered: This will open the filtered lines file ([Database filename].FilteredLines.log) created when the database was built. This file should be in the same directory as the database.
  • Resolved:
    • ISP: If the corresponding option was used when creating/updating the database, this value indicates how many Internet Service Providers (ISPs) have been resolved, even if the result was unknown. It may therefore also indicate how many distinct IP addresses are in the logs.
    • GeoIP: Same as the previous, except that GeoIP only applies to IPv4 addresses in XL-Parser.
    • UAs: Number of resolved useragents.
    • Weekdays: Number of resolved weekdays. It should be the same value as the number of entries, because each entry must contain a valid timestamp.

Top of the page

Update Database

XL-Parser - Log Analysis - Database - Update Database

You can update a log database or create a new one with different settings. For example, you may want to update the resolved ISPs after doing some whois lookups, or you may want to create a filtered database based on the full database you created initially.

When you use update, the estimated time left refers to the current task only. For example, when XL-Parser resolves IP addresses, the estimated time left represents the time needed to resolve all IP addresses.

Top of the page

Query Database

XL-Parser - Log Analysis - Database - Query Database

In this tab, you can query the database. XL-Parser provides some tools to help you build a SQL query. If you're familiar with the SQL language, you can create (and save) your own queries. XL-Parser uses the SQLite database format. A database contains five tables:

  • INFO: This table contains information about the database (shown in the Current Database tab).
  • FILES: This table contains information about the source files that were parsed to create the database (also shown in the Current Database tab).
  • LOG: This table contains all the requests.
  • IP: This table contains the resolved ISP and GeoIP data related to each distinct IP address found in the logs (if the option was used when creating/updating the database).
  • UA: This table contains details related to each distinct useragent found in the logs (if the option was used when creating/updating the database). Available details include: type (ex.: pc, smartphone, tablet, etc.), os, browser, device and lang.

Note that all timestamps inside the database are in unixtime format (UTC timezone). They will be converted when building the report (see report options). One exception: the time of the day is stored based on your timezone at the time the database was created, which means that the time is not adjusted for daylight saving time (or the reverse).
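The conversion at report time is a plain unixtime-to-timezone operation; a Python sketch (the timestamp and the -0500 offset are illustrative values, not defaults of the tool):

```python
from datetime import datetime, timezone, timedelta

# The stored value is unixtime (UTC) and never changes; a report in
# another timezone is just a conversion at output time.
unixtime = 1489302849
utc = datetime.fromtimestamp(unixtime, tz=timezone.utc)
local = utc.astimezone(timezone(timedelta(hours=-5)))
print(utc.isoformat())    # 2017-03-12T07:14:09+00:00
print(local.isoformat())  # 2017-03-12T02:14:09-05:00
```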

To build a query, you must first select which columns will be extracted in the output:

XL-Parser - Log Analysis - Database - Query Database - Select Columns Window

You can select all columns or any column(s) from the LOG table. The DISTINCT option can be selected to remove duplicates. If you select the Count option, the output will not be the values themselves but the number of values that match the query. For example, SELECT COUNT(*) FROM LOG will give you the number of rows in the LOG table, and SELECT DISTINCT remoteIP FROM LOG will give you the list of all distinct remote IP addresses in the LOG table. After selecting the column(s), you may select one or multiple filters. Then, you may want to add some clauses:

XL-Parser - Log Analysis - Database - Query Database - Clauses
  • Group by: This clause arranges identical results into groups. For example, SELECT remoteIP FROM LOG GROUP BY remoteIP will give you the list of all unique remote IP addresses in the LOG table.
  • Order by: This clause sorts the results in ascending (ASC) or descending (DESC) order. For example, SELECT DISTINCT http_request,http_status FROM LOG ORDER BY http_status ASC will give you the list of all unique web requests, ordered by HTTP status code (200, 302, 404, 503, etc.).
  • Having count: This clause allows you to specify a count condition to apply to the results. For example, SELECT remoteIP FROM LOG GROUP BY remoteIP HAVING COUNT(remoteIP) > 100 will give you the list of unique IP addresses that sent more than 100 requests.
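Since XL-Parser databases are plain SQLite files, the HAVING COUNT example above can be tried with Python's built-in sqlite3 module against a toy LOG table (the two columns used here are only a fragment of the real schema):

```python
import sqlite3

# Build a minimal in-memory LOG table: one IP with 150 requests,
# another with 20, then keep only IPs with more than 100 requests.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LOG (remoteIP TEXT, http_status TEXT)")
rows = [("10.0.0.1", "200")] * 150 + [("10.0.0.2", "404")] * 20
con.executemany("INSERT INTO LOG VALUES (?, ?)", rows)

noisy = con.execute(
    "SELECT remoteIP FROM LOG GROUP BY remoteIP "
    "HAVING COUNT(remoteIP) > 100"
).fetchall()
print(noisy)  # [('10.0.0.1',)]
```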

The most interesting things can be done if you know how to build your own SQL queries. For more information about the SQLite syntax, I suggest this SQLite Tutorial. Your query is tested as you type it. If everything is OK, the text field has a green border around it. If there is an error, the border turns red. You can get details about the error by clicking the Not Ready link at the bottom of the window.

When you're satisfied with your query, you can save it in a database for later use.

XL-Parser - Log Analysis - Database - Query Database - Save query window

You can use an existing category or create a new one.

XL-Parser - Log Analysis - Database - Query Database - Saved queries

Queries are stored by category and name in the database. They can be edited or deleted at any time. The date corresponds to the time the entry was created or last edited. The used value corresponds to the number of times the query has been used. You can sort the grid by clicking any column header.

When you're done, you can process your query. The report will be produced using default options that you can change at any time:

XL-Parser - Log Analysis - Database - Query Database - Report options

Here are some explanations about each option:

  • Folder: This is the folder where the report will be saved.
  • Format: The report can be saved in three different formats: XLSX, HTML and TXT (TSV). You should be very careful with the HTML format because web logs could contain malicious code. By default, XL-Parser sanitizes the output when using this format.
  • Datetime format: As mentioned before, timestamps are stored in unixtime format (UTC) in the database. This option allows you to change the output format and timezone using your datetime format database. The list of choices includes all datetime objects that have been set to Use as output (or both).
  • Includes:
    • Column headers: The first row (or line) will contain the name of each field.
    • Source: Use this option if you want to know where (which original log file) the request is coming from. You can only use this option if your query returns all rows.
    • ISP: Add a column that returns the resolved ISP from the IP table. You can only use this option if your query returns the remoteIP column and the database contains resolved ISP data.
    • GeoIP: Same as the previous, but for the GeoIP data.
    • Useragent details: Add a column that returns useragent details from the UA table. You can only use this option if your query returns the useragent column and the database contains resolved useragent data.
    • Weekday: Based on the timestamp, this adds a column with the weekday name, in your default language.

For example, you can use a SQL query to extract all unique IP addresses from the logs and add the ISP and GeoIP columns.
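As a sketch of that report, here is a join between the LOG and IP tables using sqlite3. The isp and geoip column names and the join key are assumptions for illustration; the real schema may differ:

```python
import sqlite3

# Toy database: LOG holds requests, IP holds resolved data per address.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LOG (remoteIP TEXT)")
con.execute("CREATE TABLE IP (ip TEXT PRIMARY KEY, isp TEXT, geoip TEXT)")
con.executemany("INSERT INTO LOG VALUES (?)",
                [("10.0.0.1",), ("10.0.0.1",), ("10.0.0.2",)])
con.executemany("INSERT INTO IP VALUES (?, ?, ?)",
                [("10.0.0.1", "ExampleNet", "US"),
                 ("10.0.0.2", "unknown", "CA")])

# Unique IP addresses with their resolved ISP and GeoIP attached.
report = con.execute(
    "SELECT DISTINCT l.remoteIP, i.isp, i.geoip "
    "FROM LOG l LEFT JOIN IP i ON i.ip = l.remoteIP"
).fetchall()
for row in report:
    print(row)
```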

Top of the page

Suspicious activities

XL-Parser - Log Analysis - Database - Suspicious activities

This function is experimental. Suspicious activities are abnormal activities. Detecting abnormal activities without knowing the normal activities can be hazardous, but most abnormal activities come from web scanners and hacking tools, and most of these tools are built to test and obtain results, not to evade detection. To detect suspicious activities, XL-Parser uses 13 indicators that you can select individually. Here are some explanations about each indicator:

  • High number of requests: Web scanners and hacking tools tend to generate many more requests than regular users. XL-Parser determines the average number of requests per user and lists the IP addresses that generate the highest number of requests.
  • Request length (nbr): Web scanners and hacking tools inject a lot of data into their requests, so they produce longer requests than normal. XL-Parser determines the average length of the requests in the web logs and lists the IP addresses that generate the highest number of requests above average. The option for this indicator is the minimum length to consider when establishing the average request length.
  • Request length (max): Same as the previous, but results are sorted by length, which means the IP address that generated the longest request comes first.
  • URI Encoding: To inject malicious code or detect vulnerabilities on your website, web scanners and hacking tools use special characters that are not supported (or cannot be passed) in HTTP or in a URL. URI encoding (percent-encoding) is then used to bypass this limitation or to bypass filtering. By default, XL-Parser only searches for the % symbol, but you can reduce the number of false positives by providing a list of specific characters. Characters must be separated by the | symbol. Ex.: %22|%27|%28|%29|%5B|%5D|%2B.
  • HTTP Method: Common HTTP methods are GET, POST, HEAD, TRACE, OPTIONS and CONNECT. Regular users will probably only use GET or POST, but malicious users, web scanners and hacking tools will use other methods while probing your website. One of the most interesting targets for malicious users is your forms, because they are one of the most common places where vulnerabilities can be found. The GET method can be used for forms, but it may be associated with other kinds of requests, while POST is almost always associated with forms. When POST is used, the parameters are not stored in your web logs, so XL-Parser cannot detect the other malicious behaviors related to the injected parameters. Thus, a high number of POST requests from a specific user can be a really good indicator, but it can also generate a lot of false positives if POST is a common method on your website. You can provide any method as an option. Methods must be separated by the | symbol. Ex.: POST|TRACE|HEAD|OPTIONS|CONNECT.
  • High number of errors: The use of web scanners and hacking tools can produce a lot of errors. For example, the forced browsing technique produces a lot of 404 errors. You can provide any status code as an option. Codes must be separated by the | symbol. Ex.: 40|50 will match all 40x and 50x errors.
  • SQL query: If you are using a SQL database on your website, you will be targeted. The presence of SQL queries in your web logs is certainly suspicious: SQL queries are supposed to be built and sent to the database by your scripts, not by your visitors, so they are not supposed to be found in web logs. The most common SQL instructions used by malicious users, web scanners and hacking tools are UNION and SELECT, but you can edit this list by providing any keyword, in the same way as for the other indicators.
  • Use of quotes: Quotes and double quotes are used with several injection techniques.
  • Directory traversal: Malicious users, web scanners and hacking tools often try to get access to files that are outside of the website space. A common way to do this is directory traversal, so you will find ../ (dot dot slash) in their requests. The expression can be found in many encoded forms, like ..%2f, %2e%2e/ or %2e%2e%2f, and you may also find the Windows variant ..\.
  • Remote file inclusion: A web address in the request may indicate a remote file inclusion attempt.
  • Admin or login scan: Web scanners will try to locate your login pages or your admin sections. Those pages may also be targeted by brute forcing tools.
  • Web scanner: Web scanners can be detected in many ways. Here you can set the list of keywords that can be found in the request, the parameters or the useragent field. A list is provided with the tool, but it is not complete.
  • No useragent: Regular users, like legitimate robots, use the useragent field to identify themselves and/or to indicate what kind of system they are running and which technologies they accept. Not providing a useragent is not a normal behavior and can indicate bad intentions. To be able to use this indicator, useragents must be present in the logs for regular users. As an option, you must provide the maximum length under which a useragent is considered as not provided. This indicator is the slowest because the length of each useragent must be calculated.
  • Many useragents: Web scanners and hacking tools can try to inject data into the useragent field, so a high number of different useragents for a single user can be suspicious. For this indicator, XL-Parser sorts IP addresses by the number of different useragents they are associated with. Be aware that shared IPs (as used with Carrier-grade NAT (CGN), for example) may produce false positives.
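As an illustration of the pipe-separated option format used by the URI Encoding indicator, here is a hypothetical Python sketch that counts matching requests per IP address (the sample requests are invented):

```python
import re

# Build one case-insensitive pattern from the pipe-separated option
# list, exactly as described for the URI Encoding indicator.
option = "%22|%27|%28|%29|%5B|%5D|%2B"
pattern = re.compile(
    "|".join(re.escape(p) for p in option.split("|")), re.IGNORECASE)

requests = [
    ("203.0.113.7", "/index.php?id=1%27%20OR%20%271%27=%271"),
    ("198.51.100.3", "/images/logo.png"),
    ("203.0.113.7", "/search?q=%28test%29"),
]

# Count how many requests per IP contain an encoded special character.
hits = {}
for ip, uri in requests:
    if pattern.search(uri):
        hits[ip] = hits.get(ip, 0) + 1
print(hits)  # {'203.0.113.7': 2}
```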

Any changes to the indicator options are saved in a file (name: SA_Indicators.json) in the user directory. The Reset button restores the options to their default values.

XL-Parser uses each indicator and produces two lists of results: By IP address and Activities only. The By IP address list represents the IP addresses that are associated with the most suspicious things inside the logs. A score is attributed to each IP address and the list is sorted by this value. A higher score means that the IP address is associated with many suspicious activities. Here is how it works:

  1. For each indicator, XL-Parser gets the list of IP addresses that match the indicator. The limit option indicates the maximum number of IP addresses to keep for each indicator. If the value is 10, XL-Parser will only consider the top 10 of the list. The list is sorted by the number of matches. For example, if the indicator is High number of errors, the first IP address in the list will be the one associated with the highest number of errors.
  2. The rank of the IP address in each list is multiplied by the score associated with the indicator. For example, if the indicator is High number of errors with a limit of 10 and a score of 4, the IP address that generated the highest number of errors will receive a score of 40 (10*4) for this indicator. The second IP address in the list will receive a score of 36 (9*4).
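The two steps above can be sketched as follows. The indicator data below is invented; only the rank-times-score arithmetic mirrors the description (the top entry of a limit-10 list with score 4 earns 10*4 = 40 points, the second 9*4 = 36):

```python
# Rank-based scoring: each indicator yields a sorted list of IPs
# (worst offender first); an IP at zero-based rank r in a list with
# limit N and indicator score S earns (N - r) * S points.

def score_ips(indicator_lists):
    totals = {}
    for ips, limit, score in indicator_lists:
        for rank, ip in enumerate(ips[:limit]):
            totals[ip] = totals.get(ip, 0) + (limit - rank) * score
    # Highest total score first, as in the By IP address list.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# "High number of errors": limit 10, score 4 -> top IP gets 40,
# second gets 36, exactly as in the example above.
errors = (["203.0.113.7", "198.51.100.3"], 10, 4)
length = (["203.0.113.7"], 10, 2)
print(score_ips([errors, length]))
# [('203.0.113.7', 60), ('198.51.100.3', 36)]
```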

The limit option and the number of selected indicators determine the maximum number of results you can get. If all limits are set to 10 and you use 10 indicators, the maximum number of results is 100 (10*10). But the IP addresses associated with the most suspicious activities will be found in many indicator lists, so you probably won't get as many results. In any case, you can set the maximum number of results (globally).

The Select columns and Report options settings will be used when querying the database from the result window (see below). To change these options without having to reanalyze, just select different options and use the Save options button. You can also open previously saved results by using the Load results button. When you are viewing previously saved results, you can still modify the report options, but you will have to select the saved result file from the Open Filename window.

Here is an example of results:

XL-Parser - Log Analysis - Database - Suspicious activities - Results Window

In the result grid, you can double-click on any row to get the requests for this particular IP address, or select more than one row and right-click on the selection. The right-click opens a context menu with the following functions:

  • Get request from selected IP(s): Launch a new process and get all requests for the selected IP address(es). Report options are the ones that were selected in the main interface.
  • Get useragent for selected IP(s): Launch a new process and get all useragent(s) for the selected IP address(es).
  • Resolve/update all IPs (add columns): Add or update the ISP and GeoIP columns. If the columns don't exist, they will be created. If the columns already exist, the data will be updated for each IP address. Note that the data is not gathered from the log database, but resolved from the original databases (XL-Whois database and GeoIP database). So data will be added to the grid even if it doesn't exist in the log database. Also, if the data is updated in the database it comes from, you can refresh the grid by executing this function again. This is particularly interesting if some ISP values are marked as unknown.
  • Send unknown IPs to clipboard (ISP): This function will make a list of all unknown IP address(es) based on the ISP column and push it to the clipboard. After that, you can use XL-Whois to identify the unknown ISP(s). Then you can execute the Resolve/update all IPs function to update the data.
  • Select All: This will select all the grid content, including the headers.
  • Copy selected row(s): This will copy the selected row(s) to the clipboard.

The Activities only results represent the number of requests that match each indicator. Double-click on a row (or select many rows) to get the requests for those indicators.

Reports produced from the result grids will use the options in the main window.

Top of the page

 

Credits to Free Website Templates for the template of this Website