Introduction
Logpoint is a platform consisting of multiple products. Each product is made up of interconnected components. These individual components, each with its own functionality, are called Services.
A Service is designed to perform a specific function and works with other services to provide a fully functional Logpoint. If any one of the services is not performing efficiently, it can impact other dependent services, leading to issues in Logpoint's functionalities. For that reason, it is essential for these services to always function efficiently.
To ensure the overall efficiency of Logpoint, these services can be tuned. For example, the share of CPU and RAM a service can use is configurable. Adjusting these shares prevents bottlenecks for the service while avoiding underutilization of the allocated CPU and RAM. If one of the services in your Logpoint doesn’t use many resources, its CPU and RAM allocation can be lowered so that other services can use that memory and CPU. However, if that service is under heavy load, it can be tuned to use more CPU and RAM so that it does not become a system bottleneck. For that reason, service parameters are configurable to match system load and requirements.
Every service has its own parameters, stored in its Service Config File. Configuring these parameters can optimize the performance of a service, leading to an overall efficient system.
Service Config File
Each service has its own config file that stores all the configuration parameters the service needs to run. These config files are generated every time the system boots up or when there are changes in the MongoDB collection that need to be reflected in the config files. If the configuration changes and the service does not respond to the change, the config file may be the cause. In such cases, Logpoint recommends manually regenerating the config file. Regenerating the config file ensures that a service responds to changes in its configuration parameters.
Service config regeneration command:
/opt/immune/installed/config_updater/apps/config_updater/regenerate_config.sh
Autotuner
Autotuner is a service that automatically tunes service parameters depending on system requirements. It uses a threshold to determine whether a parameter needs tuning. For example, if the Premerger service is using all of its allocated RAM and is running out of memory, Autotuner can increase the allocated RAM to keep the service running smoothly. From Logpoint v7.2.0 onwards, Autotuner is significantly enhanced to tune a service based on its requirements and the complexity of the task it is performing. If a service is not delivering results at the set efficiency, Autotuner can add memory to the service to improve its performance and the overall system performance. Autotuner can now dynamically restart a service or allocate additional memory to it if its performance falls below a defined threshold, preventing service crashes and ensuring stable performance.
Manual Tuning of Service
Autotuner can tune service parameters to prevent service crashes and increase service efficiency, but it is not perfect. Depending on the use case and system requirements, human intervention through manual tuning is essential to ensure every service works efficiently. Manual tuning is performed from the lp_services_config.json file.
LP Services Config
You can also tune the services from a single config file, lp_services_config.json, located at /opt/immune/storage.
To do so (a complete example session is shown after the syntax and examples below):
- Access Logpoint via the terminal using ssh support@<Machine IP> and enter the support password
- Go to /opt/immune/storage and check whether lp_services_config.json exists; it is not present by default
- Create the file lp_services_config.json and add the parameters in valid JSON format
- Regenerate the config using the command: /opt/immune/bin/lido /opt/immune/installed/config-updater/apps/config_updater/regenerate_all.sh
- Restart the corresponding service: sudo sv restart /opt/immune/etc/service/<service_name>
Syntax:
{
"<service_name>": {"<parameter>": "<value>"}
}
Tuning one service parameter:
{
"merger": {"heap_size": 10240}
}
Tuning multiple service parameters:
{
"premerger": {"heap_size": 4096, "jsonThreads":4},
"index_searcher_PaloAlto":{"heap_size":10240, "num_of_indexing_threads": 5, "num_index_cache": 32},
"normalizer": {"no_of_services": 20},
"file_keeper":{"no_of_threads": 3,"heap_size": 2048}
}
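Putting the steps and syntax above together, a complete terminal session might look like the following sketch. The merger service and heap_size value are illustrative, <Machine IP> is a placeholder, and depending on file permissions you may need elevated privileges to write the file; the paths and commands are the ones listed in the steps above.
ssh support@<Machine IP>
cd /opt/immune/storage
echo '{"merger": {"heap_size": 10240}}' > lp_services_config.json
/opt/immune/bin/lido /opt/immune/installed/config-updater/apps/config_updater/regenerate_all.sh
sudo sv restart /opt/immune/etc/service/merger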
Frequently Tuned Services
These are the commonly tuned services:
- Premerger
- Merger
- Analyzer
- Index Searcher
- Normalizer
- File Keeper
- Syslog Collector
- Enrich_db_populator
- Autotuner
- Batch Processor
- Enrichment Service
Premerger
Premerger is responsible for processing the queries used in alerts and dashboards. It sends the query request, receives the query results, aggregates them, and returns the response. Based on the response, the dashboard is populated or an alert incident is triggered.
If Premerger is not running efficiently, it affects alerts and dashboards. For that reason, it is crucial to tune its parameters to ensure optimal performance.
Example
{
"premerger": {"heap_size": 1024}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_size | Amount of heap memory allocated to Premerger | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches | Overall system memory increases or decreases |
max_free_heap_ratio | Percentage of available heap the service can hold after running GC | Only for G1GC; only tuned if heap_size is tuned | 60 (Recommended) | Excess memory is released back to the OS |
maxClauseCount | Maximum number of clauses permitted per BooleanQuery | When there is a large number of entries in a search query list | Default: 1024. Recommended: number of entries in the biggest list * 2.5 | Search may be slow if the value is increased |
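As a worked example of the maxClauseCount recommendation above: if the largest list used in a search query has 2,000 entries, the recommended value is roughly 2000 * 2.5 = 5000. A hypothetical lp_services_config.json entry combining this with the other Premerger parameters (the values are illustrative, not prescriptive):
{
"premerger": {"heap_size": 4096, "max_free_heap_ratio": 60, "maxClauseCount": 5000}
}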
Merger
Merger takes a request from the requester, forwards it to Index Searcher, merges all the responses, and returns the response to the requester. The requester can be any other service, such as Premerger, Websearcher, or the API.
If Merger is not running efficiently, it affects alerts, dashboards, search, and reports.
Example
{
"merger": {"heap_size": 1024}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_size | Amount of heap memory allocated to Merger | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches | Overall system memory increases or decreases |
max_free_heap_ratio | Percentage of the available heap that the service can hold after running GC | Only for G1GC; only tuned if heap_size is tuned | 60 (Recommended) | Excess memory is released back to the OS |
no_of_threads | Number of Merger services to run in parallel in different threads | When Logpoint has enough resources and there is a large number of searches | Default: depends on the number of cores. Min: 1. Max: number of cores / 2 (Recommended) | CPU/Memory usage increases |
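For reference, a hypothetical lp_services_config.json entry tuning several Merger parameters at once, assuming an 8-core system (so no_of_threads = 8 / 2 = 4); the values are illustrative only:
{
"merger": {"heap_size": 8192, "max_free_heap_ratio": 60, "no_of_threads": 4}
}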
Index Searcher
Index Searcher performs two functions: indexing and searching. It indexes the normalized log and searches the indexed log. It is a repo-dependent service, which means each repo has its own index searcher.
Indexing
It receives the normalized log forwarded by the file keeper, indexes it based on log_ts, adds index_ts to the key-value pair, and stores it in the storage. index_ts is the time at which the log is indexed by the index searcher.
Searching
When a search request comes to the index searcher, it searches the index and returns the log as per the search query.
If IndexSearcher is down or performing poorly, it can affect alerts, search, and dashboards. At worst, incidents aren’t generated, search doesn’t work and dashboards aren’t populated.
Example
{
"index_searcher_default": {"heap_size": 1024}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_size | Amount of heap memory allocated to Index Searcher | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches. Min: 512 MB. Max: 32 GB (Recommended) | Overall system memory increases or decreases |
max_free_heap_ratio | Percentage of the available heap that the service can hold after running GC | Only for G1GC; only tuned if heap_size is tuned | 60 (Recommended) | Excess memory is released back to the OS |
merge_factor | Defines how often segments are merged. The default value is 10: a new segment is created for every 10 documents, and when the number of segments reaches 10, the segments themselves are merged into a single segment | When there are a lot of active merging threads or when merging takes significant time | Default: 10. Increase when there are a lot of active merging threads and decrease when there are fewer; increase when the merge time is high and decrease when it is low | Increasing it increases CPU and Disk Read/Write |
num_of_indexing_threads | Number of indexing threads to run for indexing logs | When indexing MPS is high and a queue is seen in the Index Searcher | Default: Max(1, No. of cores / 8). Suggested tuning: Min(No. of cores / 2, 20) | CPU usage increases or decreases |
num_live_threads | Number of searching threads to run for search | When there is a large number of search requests in the Index Searcher's benchmarker log, or if the number of responses is significantly less than the number of requests in the Premerger benchmarker log | Default: Max(10, No. of cores / 2). Suggested tuning: Max(No. of cores) | CPU usage and heap size increase or decrease |
num_index_cache | Maximum number of indexes that the cache can hold | If Logpoint has a large number of live searches with a large time range and there is sufficient system memory available | Depends on the average time range/search interval of the alerts. Default: Max(5, No. of cores / 2) | Heap memory of Index Searcher increases |
maxClauseCount | Maximum number of clauses permitted per BooleanQuery | When there is a large number of entries in the search query list | Default: 1024. Recommended: number of entries in the largest list * 2.5 | Search may be slow if the value is increased |
num_of_db_indexing_threads | Number of indexing threads to run for indexing delayed logs (logs older than maxNormalTimePeriod defined in File Keeper) | Increase when indexing MPS is high and a queue is seen in the Index Searcher; decrease when indexing MPS is low and there is no queue | Depends on system load. Default: Max(1, No. of cores / 8). Suggested tuning: Min(No. of cores / 2, 20) | CPU usage increases or decreases |
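Because Index Searcher is repo-dependent, parameters are set per repo (the default repo name below is only an example). A hypothetical entry for a 16-core system following the suggested tunings above (num_of_indexing_threads = Min(16 / 2, 20) = 8); all values are illustrative:
{
"index_searcher_default": {"heap_size": 16384, "num_of_indexing_threads": 8, "num_live_threads": 16, "num_of_db_indexing_threads": 8, "maxClauseCount": 5000}
}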
Analyzer
Analyzer processes pattern finding queries. When Merger receives a pattern finding query, it forwards it to Analyzer, which can process a maximum of 10 such queries at a time.
If the Analyzer service performs poorly, alerts and dashboards that use pattern finding queries are affected.
Example
{
"analyzer": {"heap_size": 1024}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_size | Amount of heap memory allocated to Analyzer | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches. Min: 512 MB. Max: 32 GB (Recommended) | Overall system memory increases or decreases |
max_free_heap_ratio | Percentage of the available heap that the service can hold after running GC | Only for G1GC; only tuned if heap_size is tuned | 60 (Recommended) | Excess memory is released back to the OS |
allowable_concurrent_analysis | Number of correlation queries that can run concurrently | When the system is processing a large number of correlation queries, or when it is processing far fewer correlation queries than the set value of allowable_concurrent_analysis | Default: 10. Increase if a large number of correlation queries run in the system; decrease if fewer correlation queries run than the set value, or if you want to run the Analyzer service without allocating too much memory | CPU, Memory, and Disk Read/Write increase or decrease |
max_num_queuable_analysis | Number of correlation queries that can be queued in the buffer | When the system is processing a large number of correlation queries | Default: 100. Increase if a large number of correlation queries run in the system | CPU, Memory, and Disk Read/Write increase or decrease |
searcher_response_timeout | Number of seconds after which the searcher/merger response times out | When a large number of queries are timing out | Default: 300 s | CPU, Memory, and Disk Read/Write increase or decrease |
maxClauseCount | Maximum number of clauses permitted per BooleanQuery | When there is a large number of entries in a list in a search query | Default: 1024. Recommended: number of entries in the largest list * 2.5 | Search may be slow if the value is increased |
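A hypothetical lp_services_config.json entry tuning the Analyzer concurrency and timeout parameters together; the values are illustrative and should be matched to the number of correlation queries running in your system:
{
"analyzer": {"heap_size": 4096, "allowable_concurrent_analysis": 15, "max_num_queuable_analysis": 150, "searcher_response_timeout": 600}
}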
Normalizer
Normalizer checks the normalization_policy attached to the log and uses it to extract key-value pairs from the raw log in order to normalize it. It also adds log_ts to the key-value pairs and checks whether the log needs to be enriched. If enrichment is required, it forwards the log to the enrichment service; otherwise, it sends the log to the store handler. Normalizer can also forward the normalized log to a remote Logpoint if it is configured as a Raw Syslog Forwarder.
If Normalizer performs poorly, logs build up in the log collection pipeline and log collection ultimately halts.
Example
{
"normalizer": {"no_of_services": 8}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
no_of_services | Number of normalizer services to run | When there is a queue in the Normalization Layer (port 5502) and each individual normalizer has large enough throughput (500 MPS) | Default: No. of cores / 4. Min: 1. Max: No. of cores / 2 (Recommended) | CPU and Memory usage increases |
File Keeper
It is a repo-dependent service, which means each repo has its own file keeper. It receives the normalized log from the store handler, stores the log in the repo (according to the routing_policy attached to the log), adds an _offset (location of the stored raw logs) to the key-value pair, and forwards it to the index searcher.
If File Keeper performs poorly, it can cause a buffer build-up in the previous layer and eventually halt log collection.
Example
{
"file_keeper_default": {"no_of_threads": 16}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
no_of_threads | Number of File Keeper threads to run for storing logs | When the indexing MPS is high and there is a queue in File Keeper | Depends on system load. Default: Max(1, No. of cores / 8). Suggested tuning: Min(No. of cores / 2, 20) | CPU usage increases |
heap_size | Amount of heap memory allocated to File Keeper | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches. Min: 512 MB. Max: 32 GB (Recommended) | Overall system memory increases or decreases |
max_free_heap_ratio | Percentage of the available heap that the service can hold after running GC | Only for G1GC; only tuned if heap_size is tuned | 60 (Recommended) | Excess memory is released back to the OS |
storage.base.path | Main repo path; determines where the primary and the buffered logs are stored | When File Keeper buffers are consistently high at the default location /opt/makalu/storage | Default: /opt/makalu/storage. The new path must be writable by the log inspect user | Changes the location of the repo that stores logs |
max_normal_time_period | Maximum interval (in hours) up to which the difference between log_ts and the current time is considered "real-time". If the difference between the current time and log_ts is greater than this value, logs are treated as delayed logs and sent to the OldLogsKeeper DB | When you want to redefine real-time and delayed logs for File Keeper, or when you need to process old logs as real-time logs | Default: 15 | If set low, most logs are treated as delayed logs; if set high, most logs are treated as real-time logs |
no_of_db_storage_threads | Number of File Keeper threads to run for handling delayed logs | When there is a large buffer in OldLogsKeeper | Default: 2. Min: 1. Max: 5 (Recommended) | CPU and Disk Read/Write increase or decrease |
max_open_files | Number of files that File Keeper can open at one time. If the number of open files exceeds this value, the oldest entry is added to the CompressionQueue | When you want to change the number of files File Keeper can open at one time. If the value is set too high, it can cause the "max open files reached" error; if set too low, it can cause the "Compression Queue Size Limit Reached" error | Default: 75 | Number of open files increases |
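File Keeper is also repo-dependent, so parameters are set per repo (the default repo name below is only an example). A hypothetical entry tuning storage threads and delayed-log handling; the values are illustrative:
{
"file_keeper_default": {"no_of_threads": 8, "heap_size": 2048, "max_normal_time_period": 24, "no_of_db_storage_threads": 3}
}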
Syslog Collector
Syslog Collector collects logs from external devices or other Logpoint machines, parses the collected logs, and forwards them to either the normfront or the store handler. If Syslog Collector is performing poorly, syslog collection is impacted and can eventually stop entirely.
Example
{
"syslog_collector": {"no_of_threads": 16}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
ssl_ciphers | List of supported cipher suites accepted by the Syslog Collector for TLS communication | When you need to add, remove, or change the supported TLS ciphers | Default: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_256_CCM, TLS_ECDHE_ECDSA_WITH_AES_128_CCM, TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 | Changes the cipher suites used for TLS communication |
no_of_threads | Number of processing threads for the UDP server | When EPS is high but the UDP server is unable to process the logs; only if enough CPU resources are available | Default: Min(No. of cores, 8) | CPU usage increases |
queue_size | Queue size for incoming UDP logs | If EPS is high and a "UDP Task pool full" warning is seen in the Syslog Collector service | Default: 1000000. Max: 5000000 (Recommended) | Memory consumption increases |
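A hypothetical lp_services_config.json entry increasing the UDP processing capacity of the Syslog Collector; the values are illustrative and assume spare CPU and memory are available:
{
"syslog_collector": {"no_of_threads": 8, "queue_size": 5000000}
}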
Enrich_db_populator
Enrich_db_populator takes information from enrichment sources such as CSV, ThreatIntelligence, or LDAP and populates the enrichment database. The enrichment database contains additional data that can be added to logs. If enrich_db_populator performs poorly, logs may not be enriched.
Example
{
"enrich_db_populator": {"max_total_enrich_db_size": 5}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
max_total_enrich_db_size | Size limit of enrichment.db | When the amount of enrichment data fills the default database size | Default: 4 GB | Disk usage increases |
Enrichment Service
Enrichment Service enriches logs using data from the enrichment database. It adds new information from the enrichment database to the log and forwards the log to the store handler. It is also called the enrichment layer and is an optional layer.
If it is not performing well, it can impact the whole log collection pipeline and can even halt log collection.
Example
{
"enrichment_service": {"heap_size": 1024}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_size | Amount of heap memory allocated to Enrichment Service | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches. Min: 512 MB. Max: 32 GB (Recommended) | Overall system memory increases or decreases |
number_of_primary_threads | Number of threads used by the Enrichment Service | If EPS and the number of enrichment requests are very high; only if a queue exists on port 5540 | Default: 4 | CPU usage increases |
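A hypothetical lp_services_config.json entry tuning both Enrichment Service parameters; the values are illustrative:
{
"enrichment_service": {"heap_size": 2048, "number_of_primary_threads": 8}
}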
Batch Processor
Batch Processor processes the files collected and forwarded by collectors/fetchers. Some collectors/fetchers send files to the Batch Processor, which parses the logs inside each file by applying the parsing rule attached to the log. If it performs poorly, the files a collector or fetcher retrieves are not processed.
Example
{
"batch_processor": {"max_workers": 8}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
max_workers | Number of worker processes for the Batch Processor | When it lags behind during log collection | Default: 4. Min: 1. Max: No. of CPU cores / 4 (Recommended) | CPU and Memory usage increase |
Autotuner
Autotuner is a service responsible for automatically tuning service parameters depending on the requirements of the system. It uses a threshold to determine whether a parameter needs tuning.
Example
{
"autotuner": {"min_allowed_throughput": 60}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_increment_pct | Percentage by which the heap is increased during one up-tuning | When a service constantly faces a lack-of-heap-memory issue | Default: 20%, with a minimum of 512 MB (systems with < 64 GB RAM) or 1024 MB (systems with > 64 GB RAM) | Increases the heap memory by the applied percentage |
min_allowed_throughput | Minimum threshold for service throughput, below which Autotuner tunes the parameters | When a service is not performing efficiently | Default: 50% | Increases the heap if service efficiency is below the set threshold |
restart_action | Whether Autotuner restarts the service when its throughput falls below min_allowed_throughput | To allow Autotuner to restart the service | Values: Enable or Disable | Allows Autotuner to restart the service |
restart_count_threshold | Minimum number of times a service is restarted before its heap size is increased | When you want to increase the heap size immediately after a service restarts once, or only after it restarts multiple times | Default: 5 | Increases the heap size after a service restarts the set number of times |
run_interval | How frequently the Autotuner service loop runs | To change how frequently Autotuner checks service throughput | Default: 300 s | Autotuner runs at the specified interval |
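A hypothetical lp_services_config.json entry combining several Autotuner parameters, using values close to the defaults listed above; adjust them to your environment:
{
"autotuner": {"min_allowed_throughput": 60, "restart_count_threshold": 3, "run_interval": 300, "heap_increment_pct": 20}
}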