When you start XL-Parser for the first time, you are asked whether you want to set up the default configuration. If you click Yes, you will be asked to select a directory where:
An XL-Parser.ini configuration file will also be created to store your preferences. You can also set the elements above manually, in case you already have some of these databases.
The Tool section provides the following functions and options:
The Functions section provides the following options:
There are four types of input: folder, file (or list of files), database and clipboard. Each type has its own options:
When you select a Folder as input, you can use filters to restrict the scope of the search to files that match your criteria. You can add, edit (by double-clicking a filter in the grid) or delete a filter. You can also save your filters for later use. File filters can be shared between XL-FileTools and XL-Parser. When you click the Add button, the following window will show up:
You have four types of file filters:
By default, the keyword or regular expression applies to the relative path (folder or file, excluding the input directory). You can select the Filename only option to exclude folders. If you select the Regex option, a colored border around the text field will indicate whether your regex is valid. If it is not, you can press the ? button to see the error message.
XL-Parser can read many file formats, including text, Unicode, Doc and Docx, Xls and Xlsx, Evt and Evtx, Pdf, Msg and Zip (text files inside the archive). A useful feature here is that you can set which file extensions a given file format parser applies to. For example, you can search a docx file as if it were a zip file by selecting Zip and adding .docx to the extension list. This allows you to search the metadata of the docx file.
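The docx-as-zip trick works because the Office Open XML formats are ordinary Zip archives. A minimal sketch of what a Zip parser would see inside a docx (the file name and member list here are illustrative, not taken from XL-Parser):

```python
import zipfile

# A .docx file is a Zip archive: treating it as Zip exposes its internal
# parts, including metadata such as docProps/core.xml.
def list_docx_parts(path):
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```

Searching those member names and their contents as text is how metadata inside the archive becomes reachable.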
When you select the Doc or Docx format, you can choose whether to parse the file with MS Word (if installed on your system). Otherwise, the legacy format (doc) will be searched as Unicode, while the new format (docx) will be searched as a Zip archive.
For the Unicode format, you must select between two types: HTML and Other. With HTML, XL-Parser will look for the charset in the HTML file header. With Other, it will try to determine the appropriate encoding by analysing the file header.
Any change to the extension list or options is saved in a file (FileFormats.json) in the user directory. The Reset button restores options to their default values.
Important: XL-Parser is not a general-purpose search tool, which means you should not check all formats, select a full volume and hope that it will find everything.
The Functions section is divided into tabs. The tabs shown depend on the input type:
XL-Parser provides two types of extraction:
You can add, edit (by double-clicking an expression in the grid) or delete as many expressions as you want. If more than one expression is used, priority depends on the order in the grid. You cannot set a hierarchy of operators, but you can change the order of the expressions using the up and down buttons on the right. If an expression contains capture groups (with an AND operator), it should be placed at the end. All the expressions in the grid can be saved for later use. There are also Extraction options available (see further down).
In the top right corner, you have a Load results button. It can be used to open saved results. In the bottom right corner, you have a Rebuild list of files button. When you do a search, XL-Parser has to browse the input and list the files that will be searched. This list of files is cached, so if you do another search without changing the input, XL-Parser doesn't have to rebuild the list, making the search faster. If the content of the directory has changed (files added or deleted), you should click the Rebuild list of files button.
The Expression window is divided into three tabs:
In the Expressions tab, you can set or edit a single expression or a list of expressions. An expression can be a string or a regular expression. An operator must be set if you have more than one expression. For regular expressions, capture groups are supported. Note that you should not use capture groups with the Invert option, as nothing can be captured from a line that is not kept. A comment (optional) can be used to identify what the expression is supposed to extract. You can use the regex shortcuts to help you build your regular expression. On the right side of the window, there are two buttons. The first one adds or edits the expression in the Expression database. The second one opens the Expression tool window:
The Expression tool window can be used to test your expression. If it contains capture groups, the extracted data will appear in the results section.
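The capture-group behavior follows standard regex semantics. A short sketch with a hypothetical expression (email extraction), showing that the captured group, not the whole line, is what ends up as extracted data:

```python
import re

# Hypothetical expression with one capture group: extract an email address.
expr = re.compile(r'\b(\w[\w.]*@[\w.]+\.\w+)\b')

line = 'Contact: alice@example.com for details'
m = expr.search(line)
if m:
    print(m.group(1))  # prints "alice@example.com": only the captured part
```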
Here are some explanations about the options:
* For some file formats, a line has a different meaning. For a doc(x), a line is a paragraph. For a Pdf, a line is a page.
When you start an extraction (Expressions or Special objects), a pop-up window appears, showing the progress of the extraction (updated after each searched file). As the process runs, it stores extracted data in a temp file (.XL-Parser_tmp.txt) in the report directory (default is the user directory; see the report options further down). When the process is done, the grid will contain stats about the extracted expressions (or objects). You can see results by file or by expression/object.
Double-click a row to see results for each file or for each expression/object. This will create a text file with the results and open it. If you right-click a row in the by file grid, you have access to the following functions:
If you right-click a row in the by expression/object grid, the available functions are Show results for selected item(s), Show results for selected item(s) (No duplicates), Show results for selected item(s) (Results only), Show results for selected item(s) (Results only - No Duplicates), Select all and Copy selected rows.
You can Save results for a later use. If you click on the Build Report button, the following window will show up:
Available options and default values depend on the type of search and the number of results. For an extraction by expressions, the default report format is TXT (TSV), while for an extraction by special objects, the default report format is XLSX. If many expressions/special objects have been extracted, you have an additional option: One sheet per expression/object for XLSX, One page per expression/object for HTML or One file per expression/object for TXT (TSV). Include Source (the file where the result has been found) is only available for extraction by expressions.
Whatever report format you have chosen, you can include Column headers, Source (for extraction by expressions) and Expression/Object. The last two options will add a column containing the corresponding value.
The Web log analysis tab interface differs depending on the input. Above is what it looks like when the input is Folder/File(s). This function is for web logs only. Its purpose is to create (or update) a SQLite database to store web logs, making it possible to run advanced queries against them.
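Once such a database exists, advanced queries are ordinary SQL. A minimal sketch in Python; the table name logs and the column remote_ip are assumptions for illustration, since the actual schema is defined by XL-Parser and should be inspected first:

```python
import sqlite3

# Hypothetical query: top remote IPs by request count. The 'logs' table
# and 'remote_ip' column are assumed names, not XL-Parser's documented schema.
def top_ips(db_path, limit=5):
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            'SELECT remote_ip, COUNT(*) AS hits FROM logs '
            'GROUP BY remote_ip ORDER BY hits DESC LIMIT ?', (limit,)
        ).fetchall()
    finally:
        con.close()
```

The same kind of aggregation is what the query functions described later make available through the interface.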
When you create a database, information about the log files is stored in the database, including the number of lines. For example, suppose you create the database in the middle of April, including log files since January. When you update the database in June, XL-Parser examines the folder. Log files for January to March won't be added again. The already-ingested part of April won't be added either (based on the stored number of lines), but all new lines will be added, as well as the other log files (May and June).
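The update behavior described above can be sketched as follows (assumed logic for illustration, not XL-Parser's actual code):

```python
def lines_to_add(stored_counts, filename, current_lines):
    """Return the slice of new lines to append for one log file.

    stored_counts maps filename -> number of lines already in the database.
    Fully ingested files contribute nothing, partially ingested files
    contribute only their new tail, and unknown files are added whole.
    (Assumed logic mirroring the described behavior.)
    """
    done = stored_counts.get(filename, 0)
    return current_lines[done:]
```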
The log format is a regular expression that matches every field of every line in the log file. Access logs of a web server can be customized, so log formats in XL-Parser can also be customized. The first button on the right opens the Log format database window, and the second button calls a function that will examine the input and try to determine the appropriate log format.
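As an example of a field-matching log format, here is a generic regex for the Apache Combined Log Format, with one named group per field. This is an illustrative sketch; the patterns stored in XL-Parser's Log format database may differ:

```python
import re

# One named group per field of the Combined Log Format (illustrative).
COMBINED = re.compile(
    r'^(?P<remote_ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"$'
)

line = ('203.0.113.7 - - [10/Oct/2021:13:55:36 -0700] "GET /index.html HTTP/1.1" '
        '200 2326 "http://example.com/" "Mozilla/5.0"')
m = COMBINED.match(line)
print(m.group('remote_ip'), m.group('status'))  # prints "203.0.113.7 200"
```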
By default, XL-Parser parses log files in ascending last-modified time order. If you want to verify or change this, you can click the File order button. A window like the one below will open:
You can sort the files by clicking the header of the column you want to sort by, or you can order the files manually using the up and down buttons.
The Set source files to read-only option will apply the read-only attribute to every source log file. This prevents accidental modification.
When you create (or update) the database, you can add a lot of information to it, including ISP (based on your XL-Whois database), GeoIP, user agent details and weekdays. You can also limit the number of added lines by using Filters. If you have a large set of log files (~1 GB or bigger), you should filter out ancillary resources if possible. This will reduce the size of the destination database. It's not unusual for requests for images to represent more than half of all requests in the log files.
If you use filters at database creation time, be aware that some filters may increase the overall processing time. These are the filters that require resolving, such as ISP, GeoIP, user agent details or filters based on time. Prefer filters that don't require resolving, since filters based on existing fields are faster. For example, filtering the useragent field with the string Windows NT 6.1 will be faster than filtering UA_OS with the value Windows 7. Both produce the same result, but to get the UA_OS detail, the user agent has to be resolved before the filter can be tested.
Resolving of ISP, GeoIP and user agent is done at the end (unless you have filters that require this information). Resolving of weekdays is done during processing. Because each entry has a datetime field, the number of resolved weekdays should be exactly the same as the number of entries.
When you use filters, filtered lines (lines that don't match the selected Filters) will be logged in a file ([Database filename].FilteredLines.log), unless you unselect the option.
When a parsed line doesn't match the Log format, it will be logged in a file ([Database filename].RejectedLines.log), unless you unselect the option. You may also ask XL-Parser to ignore comment lines (starting with #) when logging rejected lines.
Finally, you must enter the output database file name and directory. The Select database when finished option will automatically select the new database as input once the process completes.
When you start the process, the main program will create the database, examine the web logs and store all settings. Then it will start XL-Parser-process.exe (in the program folder). A window will appear, showing progress and stats:
The Log format database contains the regular expressions that XL-Parser will use to parse log files. Every format has a name, a sample, the pattern and the regex. You can add, edit or delete any log format.
In this window, you can add or edit a log format. Here are some explanations about each field:
The Datetime database is used to store and manage all datetime formats available in the tool. This database is used with Log format (input), with Split logs (input) and with the report options when querying a database. You can add, edit or delete any datetime object in the database. If you select the same datetime database as the one you use with XL-Tools, new objects will be usable in both tools.
When you add or edit a datetime object, you must provide the following details:
If the Add or Edit button is still disabled after you have filled in all the fields, you can click the ? button to see what is missing or incorrect.
Some timezone abbreviations are ambiguous because they match more than one timezone. For example, CST could be Central Standard Time (USA), China Standard Time, or Cuba Standard Time. You can take a look here. So, I decided to set those timezones to the following values:
I have tried to set each one to its most common value, but I could be wrong. If the datetime you need to parse has a different offset, you can parse the datetime without the timezone and set the appropriate offset in the input format.
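The workaround can be illustrated in Python: strip the ambiguous abbreviation, parse the naive datetime, and attach the offset you know applies. The sample string, format and offset below are assumptions for the sketch:

```python
from datetime import datetime, timedelta, timezone

# Parse a timestamp carrying an ambiguous 'CST' by dropping the abbreviation
# and attaching an explicit offset (here assumed US Central: UTC-6).
raw = '01/15/2021 10:30:00 CST'
naive = datetime.strptime(raw.replace(' CST', ''), '%m/%d/%Y %H:%M:%S')
aware = naive.replace(tzinfo=timezone(timedelta(hours=-6)))
print(aware.isoformat())  # prints "2021-01-15T10:30:00-06:00"
```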
Filters are divided into three groups: White List, Regular and IP List. When you use White List filters, every entry that matches the filters will be ignored (or removed). Regular filters do the opposite: every entry that matches will be selected and the others will be ignored. IP List is just a type of regular filter that matches the remoteIP field only.
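The two behaviors can be summed up in a few lines (a sketch of the described semantics, not the tool's code):

```python
def apply_filters(entries, matchers, whitelist):
    """White List filters drop matching entries; Regular (and IP List)
    filters keep only matching entries. (Sketch of described semantics.)"""
    def matches(e):
        return any(m(e) for m in matchers)
    if whitelist:
        return [e for e in entries if not matches(e)]
    return [e for e in entries if matches(e)]
```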
Filters are grouped by category. If you want to create a new category, use the Add button without selecting any category. If you want to create a new filter in an existing category, select the category first, then click the Add button. You can edit or delete any filter.
A filter is in fact a set of filters. You can add, edit or delete any filter in the set.
Here are some explanations about each field:
A field filter rule can be selected/added to the Field Filter database tab.
This function can be used to split logs (or any text files) into smaller pieces. Files can be split by size, by number of lines or by time.
You have to provide a size for the chunks. Files that exceed this size will be split. The final size of each chunk may differ slightly from the target because XL-Parser doesn't split lines. If you select the Use datetime in filename option (log files only), XL-Parser will extract the first timestamp of each file and use it to name the file. Otherwise, it will use a simple counter.
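The size-based split can be sketched as follows (assumed logic illustrating why chunk sizes can slightly exceed the target when lines are kept whole):

```python
def split_by_size(lines, max_bytes):
    """Group lines into chunks of roughly max_bytes, never breaking a line.

    A chunk may exceed max_bytes by at most one line, which mirrors the
    note above that final chunk sizes can differ from the requested size.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        if current and size + len(line) > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append(current)
    return chunks
```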
Once you've created a log database with XL-Parser (see Web log analysis for files) and you select the database as input (don't forget to select the Database radio button), you have access to the following functions (tabs):
When you select a database as input, XL-Parser opens it and gathers information about it. In this tab, you can see this information:
Copyright (c) 2015-2021 Alain Rioux