Introduction
Logpoint is a platform consisting of multiple products. Each product is made up of interconnected components. These individual components, each with its own functionality, are called Services.
A Service is designed to perform a specific function and works with other services to provide a fully functional Logpoint. If any one of the services is not performing efficiently, it can impact other dependent services, leading to issues in Logpoint's functionalities. For that reason, it is essential for these services to always function efficiently.
To ensure the overall efficiency of Logpoint, these services can be tuned. For example, the share of CPU and RAM a service can use is configurable. Adjusting these shares prevents bottlenecks for the service while avoiding underutilization of the allocated CPU and RAM. If one of the services in your Logpoint doesn’t use many resources, its CPU and RAM allocation can be lowered so that other services can use that memory and CPU. However, if that service is under heavy load, it can be tuned to use more CPU and RAM so that it does not become a system bottleneck. For that reason, service parameters are configurable to match system load and requirements.
Every service has its own parameters, stored in its Service Config File. Configuring these parameters can optimize the performance of a service, leading to an overall efficient system.
Service Config File
Each service has its own config file that stores all the configuration parameters the service needs to run. These config files are generated every time the system boots up or when there are changes in the MongoDB collection that need to be reflected in the config files. If the configuration changes and the service does not respond to the change, the config file may be the cause. In such cases, Logpoint recommends manually regenerating the config file. Regenerating the config file ensures that a service responds to changes in its configuration parameters.
Service config regeneration command:
/opt/immune/installed/config_updater/apps/config_updater/regenerate_config.sh
Autotuner
Autotuner is a service that automatically tunes service parameters depending on system requirements. It uses a threshold to determine whether a parameter needs tuning. For example, if the Premerger service is using all of its allocated RAM and is running out of memory, Autotuner can increase the allocated RAM to keep the service running smoothly. From Logpoint v7.2.0 onwards, Autotuner is significantly enhanced to tune a service based on its requirements and the complexity of the task it is performing. If a service is not delivering results at the set efficiency, Autotuner can add memory to the service to improve its performance and the overall system performance. Autotuner can now dynamically restart a service or allocate additional memory to it if its performance falls below a defined threshold, preventing service crashes and ensuring stable performance.
Manual Tuning of Service
Autotuner can tune service parameters to prevent service crashes and increase service efficiency, but it is not perfect. Depending on the use case and system requirements, human intervention through manual tuning is essential to ensure every service works efficiently. Manual tuning is performed from the lp_services_config.json file.
LP Services Config
You can also tune the services from a single config file, lp_services_config.json, located at /opt/immune/storage.
To do so (a complete example session is shown after the syntax and examples below):
- Access Logpoint via the terminal using ssh support@<Machine IP> and enter the support password
- Go to /opt/immune/storage and check whether lp_services_config.json exists; it is not present by default
- Create the file lp_services_config.json and add the parameters in valid JSON format
- Regenerate the config using the command: /opt/immune/bin/lido /opt/immune/installed/config-updater/apps/config_updater/regenerate_all.sh
- Restart the corresponding service: sudo sv restart /opt/immune/etc/service/<service_name>
Syntax:
{
"<service_name>": {"<parameter>": "<value>"}
}
Tuning one service parameter:
{
"merger": {"heap_size": 10240}
}
Tuning multiple service parameters:
{
"premerger": {"heap_size": 4096, "jsonThreads":4},
"index_searcher_PaloAlto":{"heap_size":10240, "num_of_indexing_threads": 5, "num_index_cache": 32},
"normalizer": {"no_of_services": 20},
"file_keeper":{"no_of_threads": 3,"heap_size": 2048}
}
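Putting the steps and syntax above together, a complete terminal session might look like the following sketch. The merger service and heap_size value are illustrative, <Machine IP> is a placeholder, and depending on file permissions you may need elevated privileges to write the file; the paths and commands are the ones listed in the steps above.
ssh support@<Machine IP>
cd /opt/immune/storage
echo '{"merger": {"heap_size": 10240}}' > lp_services_config.json
/opt/immune/bin/lido /opt/immune/installed/config-updater/apps/config_updater/regenerate_all.sh
sudo sv restart /opt/immune/etc/service/merger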
Frequently Tuned Services
These are the commonly tuned services:
- Premerger
- Merger
- Analyzer
- Index Searcher
- Normalizer
- File Keeper
- Syslog Collector
- Enrich_db_populator
- Autotuner
- Batch Processor
- Enrichment Service
Premerger
Premerger is responsible for processing the queries used in alerts and dashboards. It sends the query request, receives the query results, aggregates them, and returns the response. Based on the response, the dashboard is populated or an alert incident is triggered.
If Premerger is not running efficiently, it affects alerts and dashboards. For that reason, it is crucial to tune its parameters to ensure optimal performance.
Example
{
"premerger": {"heap_size": 1024}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_size | Amount of heap memory allocated to Premerger | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches | Overall system memory increases or decreases |
max_free_heap_ratio | Percentage of available heap the service can hold after running GC | Only for G1GC; only tuned if heap_size is tuned | 60 (Recommended) | Excess memory is released back to the OS |
maxClauseCount | Maximum number of clauses permitted per BooleanQuery | When there is a large number of entries in a search query list | Default: 1024. Recommended: number of entries in the biggest list * 2.5 | Search may be slow if the value is increased |
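As a worked example of the maxClauseCount recommendation above: if the largest list used in a search query has 2,000 entries, the recommended value is roughly 2000 * 2.5 = 5000. A hypothetical lp_services_config.json entry combining this with the other Premerger parameters (the values are illustrative, not prescriptive):
{
"premerger": {"heap_size": 4096, "max_free_heap_ratio": 60, "maxClauseCount": 5000}
}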
Merger
Merger takes a request from the requester, forwards it to Index Searcher, merges all the responses, and returns the response to the requester. The requester can be any other service, such as Premerger, Websearcher, or the API.
If Merger is not running efficiently, it affects alerts, dashboards, search, and reports.
Example
{
"merger": {"heap_size": 1024}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_size | Amount of heap memory allocated to Merger | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches | Overall system memory increases or decreases |
max_free_heap_ratio | Percentage of the available heap that the service can hold after running GC | Only for G1GC; only tuned if heap_size is tuned | 60 (Recommended) | Excess memory is released back to the OS |
no_of_threads | Number of Merger services to run in parallel in different threads | When Logpoint has enough resources and there is a large number of searches | Default: depends on the number of cores. Min: 1. Max: number of cores / 2 (Recommended) | CPU/Memory usage increases |
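For reference, a hypothetical lp_services_config.json entry tuning several Merger parameters at once, assuming an 8-core system (so no_of_threads = 8 / 2 = 4); the values are illustrative only:
{
"merger": {"heap_size": 8192, "max_free_heap_ratio": 60, "no_of_threads": 4}
}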
Index Searcher
Index Searcher performs two functions: indexing and searching. It indexes the normalized log and searches the indexed log. It is a repo-dependent service, which means each repo has its own index searcher.
Indexing
It receives the normalized log forwarded by the file keeper, indexes it based on log_ts, adds index_ts to the key-value pair, and stores it in the storage. index_ts is the time at which the log is indexed by the index searcher.
Searching
When a search request comes to the index searcher, it searches the index and returns the log as per the search query.
If IndexSearcher is down or performing poorly, it can affect alerts, search, and dashboards. At worst, incidents aren’t generated, search doesn’t work and dashboards aren’t populated.
Example
{
"index_searcher_default": {"heap_size": 1024}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_size | Amount of heap memory allocated to Index Searcher | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches. Min: 512 MB. Max: 32 GB (Recommended) | Overall system memory increases or decreases |
max_free_heap_ratio | Percentage of the available heap that the service can hold after running GC | Only for G1GC; only tuned if heap_size is tuned | 60 (Recommended) | Excess memory is released back to the OS |
merge_factor | Defines how often segments are merged. The default value is 10: a new segment is created for every 10 documents, and when the number of segments reaches 10, the segments themselves are merged into a single segment | When there are a lot of active merging threads or when merging takes significant time | Default: 10. Increase when there are a lot of active merging threads and decrease when there are fewer; increase when the merge time is high and decrease when it is low | Increasing it increases CPU and Disk Read/Write |
num_of_indexing_threads | Number of indexing threads to run for indexing logs | When indexing MPS is high and a queue is seen in the Index Searcher | Default: Max(1, No. of cores / 8). Suggested tuning: Min(No. of cores / 2, 20) | CPU usage increases or decreases |
num_live_threads | Number of searching threads to run for search | When there is a large number of search requests in the Index Searcher's benchmarker log, or if the number of responses is significantly less than the number of requests in the Premerger benchmarker log | Default: Max(10, No. of cores / 2). Suggested tuning: Max(No. of cores) | CPU usage and heap size increase or decrease |
num_index_cache | Maximum number of indexes that the cache can hold | If Logpoint has a large number of live searches with a large time range and there is sufficient system memory available | Depends on the average time range/search interval of the alerts. Default: Max(5, No. of cores / 2) | Heap memory of Index Searcher increases |
maxClauseCount | Maximum number of clauses permitted per BooleanQuery | When there is a large number of entries in the search query list | Default: 1024. Recommended: number of entries in the largest list * 2.5 | Search may be slow if the value is increased |
num_of_db_indexing_threads | Number of indexing threads to run for indexing delayed logs (logs older than maxNormalTimePeriod defined in File Keeper) | Increase when indexing MPS is high and a queue is seen in the Index Searcher; decrease when indexing MPS is low and there is no queue | Depends on system load. Default: Max(1, No. of cores / 8). Suggested tuning: Min(No. of cores / 2, 20) | CPU usage increases or decreases |
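Because Index Searcher is repo-dependent, parameters are set per repo (the default repo name below is only an example). A hypothetical entry for a 16-core system following the suggested tunings above (num_of_indexing_threads = Min(16 / 2, 20) = 8); all values are illustrative:
{
"index_searcher_default": {"heap_size": 16384, "num_of_indexing_threads": 8, "num_live_threads": 16, "num_of_db_indexing_threads": 8, "maxClauseCount": 5000}
}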
Analyzer
Analyzer processes pattern finding queries. When Merger receives a pattern finding query, it forwards it to Analyzer, which can process a maximum of 10 such queries at a time.
If the Analyzer service performs poorly, alerts and dashboards that use pattern finding queries are affected.
Example
{
"analyzer": {"heap_size": 1024}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_size | Amount of heap memory allocated to Analyzer | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches. Min: 512 MB. Max: 32 GB (Recommended) | Overall system memory increases or decreases |
max_free_heap_ratio | Percentage of the available heap that the service can hold after running GC | Only for G1GC; only tuned if heap_size is tuned | 60 (Recommended) | Excess memory is released back to the OS |
allowable_concurrent_analysis | Number of correlation queries that can run concurrently | When the system is processing a large number of correlation queries, or when it is processing far fewer correlation queries than the set value of allowable_concurrent_analysis | Default: 10. Increase if a large number of correlation queries run in the system; decrease if fewer correlation queries run than the set value, or if you want to run the Analyzer service without allocating too much memory | CPU, Memory, and Disk Read/Write increase or decrease |
max_num_queuable_analysis | Number of correlation queries that can be queued in the buffer | When the system is processing a large number of correlation queries | Default: 100. Increase if a large number of correlation queries run in the system | CPU, Memory, and Disk Read/Write increase or decrease |
searcher_response_timeout | Number of seconds after which the searcher/merger response times out | When a large number of queries are timing out | Default: 300 s | CPU, Memory, and Disk Read/Write increase or decrease |
maxClauseCount | Maximum number of clauses permitted per BooleanQuery | When there is a large number of entries in a list in a search query | Default: 1024. Recommended: number of entries in the largest list * 2.5 | Search may be slow if the value is increased |
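A hypothetical lp_services_config.json entry tuning the Analyzer concurrency and timeout parameters together; the values are illustrative and should be matched to the number of correlation queries running in your system:
{
"analyzer": {"heap_size": 4096, "allowable_concurrent_analysis": 15, "max_num_queuable_analysis": 150, "searcher_response_timeout": 600}
}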
Normalizer
Normalizer checks the normalization_policy attached to the log and uses it to extract key-value pairs from the raw log in order to normalize it. It also adds log_ts to the key-value pairs and checks whether the log needs to be enriched. If enrichment is required, it forwards the log to the enrichment service; otherwise, it sends the log to the store handler. Normalizer can also forward the normalized log to a remote Logpoint if it is configured as a Raw Syslog Forwarder.
If Normalizer performs poorly, logs build up in the log collection pipeline and log collection ultimately halts.
Example
{
"normalizer": {"no_of_services": 8}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
no_of_services | Number of normalizer services to run | When there is a queue in the Normalization Layer (port 5502) and each individual normalizer has large enough throughput (500 MPS) | Default: No. of cores / 4. Min: 1. Max: No. of cores / 2 (Recommended) | CPU and Memory usage increases |
File Keeper
It is a repo-dependent service, which means each repo has its own file keeper. It receives the normalized log from the store handler, stores the log in the repo (according to the routing_policy attached to the log), adds an _offset (location of the stored raw logs) to the key-value pair, and forwards it to the index searcher.
If File Keeper performs poorly, it can cause a buffer build-up in the previous layer and eventually halt log collection.
Example
{
"file_keeper_default": {"no_of_threads": 16}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
no_of_threads | Number of File Keeper threads to run for storing logs | When the indexing MPS is high and there is a queue in File Keeper | Depends on system load. Default: Max(1, No. of cores / 8). Suggested tuning: Min(No. of cores / 2, 20) | CPU usage increases |
heap_size | Amount of heap memory allocated to File Keeper | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches. Min: 512 MB. Max: 32 GB (Recommended) | Overall system memory increases or decreases |
max_free_heap_ratio | Percentage of the available heap that the service can hold after running GC | Only for G1GC; only tuned if heap_size is tuned | 60 (Recommended) | Excess memory is released back to the OS |
storage.base.path | Main repo path; determines where the primary and the buffered logs are stored | When File Keeper buffers are consistently high at the default location /opt/makalu/storage | Default: /opt/makalu/storage. The new path must be writable by the log inspect user | Changes the location of the repo that stores logs |
max_normal_time_period | Maximum interval (in hours) up to which the difference between log_ts and the current time is considered "real-time". If the difference between the current time and log_ts is greater than this value, logs are treated as delayed logs and sent to the OldLogsKeeper DB | When you want to redefine real-time and delayed logs for File Keeper, or when you need to process old logs as real-time logs | Default: 15 | If set low, most logs are treated as delayed logs; if set high, most logs are treated as real-time logs |
no_of_db_storage_threads | Number of File Keeper threads to run for handling delayed logs | When there is a large buffer in OldLogsKeeper | Default: 2. Min: 1. Max: 5 (Recommended) | CPU and Disk Read/Write increase or decrease |
max_open_files | Number of files that File Keeper can open at one time. If the number of open files exceeds this value, the oldest entry is added to the CompressionQueue | When you want to change the number of files File Keeper can open at one time. If the value is set too high, it can cause the "max open files reached" error; if set too low, it can cause the "Compression Queue Size Limit Reached" error | Default: 75 | Number of open files increases |
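File Keeper is also repo-dependent, so parameters are set per repo (the default repo name below is only an example). A hypothetical entry tuning storage threads and delayed-log handling; the values are illustrative:
{
"file_keeper_default": {"no_of_threads": 8, "heap_size": 2048, "max_normal_time_period": 24, "no_of_db_storage_threads": 3}
}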
Syslog Collector
Syslog Collector collects logs from external devices or other Logpoint machines, parses the collected logs, and forwards them to either the normfront or the store handler. If Syslog Collector is performing poorly, syslog collection is impacted and can eventually stop entirely.
Example
{
"syslog_collector": {"no_of_threads": 16}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
ssl_ciphers | List of supported cipher suites accepted by the Syslog Collector for TLS communication | When you need to add, remove, or change the supported TLS ciphers | Default: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_256_CCM, TLS_ECDHE_ECDSA_WITH_AES_128_CCM, TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 | Changes the cipher suites used for TLS communication |
no_of_threads | Number of processing threads for the UDP server | When EPS is high but the UDP server is unable to process the logs; only if enough CPU resources are available | Default: Min(No. of cores, 8) | CPU usage increases |
queue_size | Queue size for incoming UDP logs | If EPS is high and a "UDP Task pool full" warning is seen in the Syslog Collector service | Default: 1000000. Max: 5000000 (Recommended) | Memory consumption increases |
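A hypothetical lp_services_config.json entry increasing the UDP processing capacity of the Syslog Collector; the values are illustrative and assume spare CPU and memory are available:
{
"syslog_collector": {"no_of_threads": 8, "queue_size": 5000000}
}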
Enrich_db_populator
Enrich_db_populator takes information from enrichment sources such as CSV, ThreatIntelligence, or LDAP and populates the enrichment database. The enrichment database contains additional data that can be added to logs. If enrich_db_populator performs poorly, logs may not be enriched.
Example
{
"enrich_db_populator": {"max_total_enrich_db_size": 5}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
max_total_enrich_db_size | Size limit of enrichment.db | When the amount of enrichment data fills the default database size | Default: 4 GB | Disk usage increases |
Enrichment Service
Enrichment Service enriches logs using data from the enrichment database. It adds new information from the enrichment database to the log and forwards the log to the store handler. It is also called the enrichment layer and is an optional layer.
If it is not performing well, it can impact the whole log collection pipeline and can even halt log collection.
Example
{
"enrichment_service": {"heap_size": 1024}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_size | Amount of heap memory allocated to Enrichment Service | G1 Garbage Collector (G1GC): if Full GC is running continuously. Shenandoah GC: if memory consumption by the service equals the allocated heap size and Out Of Memory (OOM) occurs frequently for the service | Depends on system load, EPS, and the number of searches. Min: 512 MB. Max: 32 GB (Recommended) | Overall system memory increases or decreases |
number_of_primary_threads | Number of threads used by the Enrichment Service | If EPS and the number of enrichment requests are very high; only if a queue exists on port 5540 | Default: 4 | CPU usage increases |
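A hypothetical lp_services_config.json entry tuning both Enrichment Service parameters; the values are illustrative:
{
"enrichment_service": {"heap_size": 2048, "number_of_primary_threads": 8}
}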
Batch Processor
Batch Processor processes the files collected and forwarded by collectors/fetchers. Some collectors/fetchers send files to the Batch Processor, which parses the logs inside each file by applying the parsing rule attached to the log. If it performs poorly, the files a collector or fetcher retrieves are not processed.
Example
{
"batch_processor": {"max_workers": 8}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
max_workers | Number of worker processes for the Batch Processor | When it lags behind during log collection | Default: 4. Min: 1. Max: No. of CPU cores / 4 (Recommended) | CPU and Memory usage increase |
Autotuner
Autotuner is a service responsible for automatically tuning service parameters depending on the requirements of the system. It uses a threshold to determine whether a parameter needs tuning.
Example
{
"autotuner": {"min_allowed_throughput": 60}
}
Parameter | What is it? | When to tune? | Values | Effect |
---|---|---|---|---|
heap_increment_pct | Percentage by which the heap is increased during one up-tuning | When a service constantly faces a lack-of-heap-memory issue | Default: 20%, with a minimum of 512 MB (systems with < 64 GB RAM) or 1024 MB (systems with > 64 GB RAM) | Increases the heap memory by the applied percentage |
min_allowed_throughput | Minimum threshold for service throughput, below which Autotuner tunes the parameters | When a service is not performing efficiently | Default: 50% | Increases the heap if service efficiency is below the set threshold |
restart_action | Whether Autotuner restarts the service when its throughput falls below min_allowed_throughput | To allow Autotuner to restart the service | Values: Enable or Disable | Allows Autotuner to restart the service |
restart_count_threshold | Minimum number of times a service is restarted before its heap size is increased | When you want to increase the heap size immediately after a service restarts once, or only after it restarts multiple times | Default: 5 | Increases the heap size after a service restarts the set number of times |
run_interval | How frequently the Autotuner service loop runs | To change how frequently Autotuner checks service throughput | Default: 300 s | Autotuner runs at the specified interval |
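A hypothetical lp_services_config.json entry combining several Autotuner parameters, using values close to the defaults listed above; adjust them to your environment:
{
"autotuner": {"min_allowed_throughput": 60, "restart_count_threshold": 3, "run_interval": 300, "heap_increment_pct": 20}
}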