Setting Alarm


Alarm

The application's status is checked periodically, and an alarm is triggered if certain pre-configured conditions (rules) are satisfied.

The pinpoint-batch server runs a check every 3 minutes, based on the last 5 minutes of data. If the conditions are satisfied, it sends an sms/email/webhook to the users listed in the user group.

We felt that sending an email/sms/webhook every time a threshold is exceeded would amount to spam. Therefore, when an alarm keeps occurring, the interval between transmissions is gradually increased: it doubles each time. ex) 3 min -> 6 min -> 12 min -> 24 min
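The doubling schedule can be illustrated with a short calculation. This is only a sketch of the arithmetic, not Pinpoint's implementation: the class and method names are hypothetical, and it assumes the interval keeps doubling from the 3-minute base with no upper bound.

```java
// Illustrative only: AlarmBackoffSketch is a hypothetical name, not a Pinpoint class.
// Assumes a 3-minute base interval that doubles for every consecutive
// notification, with no upper bound.
final class AlarmBackoffSketch {
    static final int BASE_INTERVAL_MINUTES = 3;

    // Interval in minutes that precedes the n-th consecutive notification (n >= 1).
    static long intervalMinutes(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        return (long) BASE_INTERVAL_MINUTES << (n - 1);
    }

    public static void main(String[] args) {
        // Prints the intervals for the first four consecutive notifications.
        for (int n = 1; n <= 4; n++) {
            System.out.println("notification " + n + ": " + intervalMinutes(n) + " min");
        }
    }
}
```

Under these assumptions the first four intervals are 3, 6, 12 and 24 minutes, matching the example above.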

NOTICE!

Until v2.2.0, the batch ran in the background of the pinpoint-web server. From v2.2.1, it is handled by the pinpoint-batch server. Since the batch logic (code) in pinpoint-web will be deprecated in the future, we advise you to move batch execution to the pinpoint-batch server.

1. User Guide

Alarm Rules

SLOW COUNT
   Triggered when the number of slow requests sent to the application exceeds the configured threshold.

SLOW RATE
   Triggered when the percentage(%) of slow requests sent to the application exceeds the configured threshold.

ERROR COUNT
   Triggered when the number of failed requests sent to the application exceeds the configured threshold.

ERROR RATE
   Triggered when the percentage(%) of failed requests sent to the application exceeds the configured threshold.

TOTAL COUNT
   Triggered when the number of all requests sent to the application exceeds the configured threshold.

SLOW COUNT TO CALLEE
   Triggered when the number of slow requests sent by the application exceeds the configured threshold.
   You must specify the domain or the address(ip, port) in the configuration UI's "Note..." box 
   ex) www.naver.com, 127.0.0.1:8080

SLOW RATE TO CALLEE
   Triggered when the percentage(%) of slow requests sent by the application exceeds the configured threshold.
   You must specify the domain or the address(ip, port) in the configuration UI's "Note..." box 
   ex) www.naver.com, 127.0.0.1:8080

ERROR COUNT TO CALLEE
   Triggered when the number of failed requests sent by the application exceeds the configured threshold.
   You must specify the domain or the address(ip, port) in the configuration UI's "Note..." box 
   ex) www.naver.com, 127.0.0.1:8080

ERROR RATE TO CALLEE
   Triggered when the percentage(%) of failed requests sent by the application exceeds the configured threshold.
   You must specify the domain or the address(ip, port) in the configuration UI's "Note..." box 
   ex) www.naver.com, 127.0.0.1:8080

TOTAL COUNT TO CALLEE
   Triggered when the number of all requests sent by the application exceeds the configured threshold.
   You must specify the domain or the address(ip, port) in the configuration UI's "Note..." box 
   ex) www.naver.com, 127.0.0.1:8080

HEAP USAGE RATE
   Triggered when the application's heap usage(%) exceeds the configured threshold.

JVM CPU USAGE RATE
   Triggered when the application's CPU usage(%) exceeds the configured threshold.

SYSTEM CPU USAGE RATE
   Triggered when the CPU usage(%) of the server (system) exceeds the configured threshold.

DATASOURCE CONNECTION USAGE RATE
   Triggered when the application's DataSource connection usage(%) exceeds the configured threshold.

FILE DESCRIPTOR COUNT
   Triggered when the number of open file descriptors exceeds the configured threshold.

2. Configuration & Implementation

Alarms generated by Pinpoint can be configured to be sent over email, sms and webhook.

Sending alarms over email is simple: you only need to configure the property file. Sending alarms over sms requires some implementation; read on to find out how to do this. Alarms over webhook require a webhook receiver service that accepts the webhook messages. Pinpoint does not provide this service, so you should either implement it yourself or use the sample project.

A few modifications are required in pinpoint-batch and pinpoint-web to use the alarm feature. Add the required settings and implementations to pinpoint-batch, and configure pinpoint-web so that users can set up alarms.

2.1 Configuration & Implementation in pinpoint-batch

2.1.1) Email configuration, sms and webhook implementation

A. Email alarm service

To use the mailing feature, you need to configure the SMTP server information and information to be included in the email in the batch-root.properties file.

# pinpoint-web server url
pinpoint.url=
# smtp server address
alarm.mail.server.url=
# smtp server port
alarm.mail.server.port=
# username for smtp server authentication
alarm.mail.server.username=
# password for smtp server authentication
alarm.mail.server.password=
# sender's email address
alarm.mail.sender.address=

ex)
pinpoint.url=http://pinpoint.com
alarm.mail.server.url=smtp.server.com
alarm.mail.server.port=587
alarm.mail.server.username=pinpoint
alarm.mail.server.password=pinpoint
alarm.mail.sender.address=pinpoint_operator@pinpoint.com

The class that sends emails is already registered as a Spring bean in applicationContext-batch-sender.xml.

    <bean id="mailSender" class="com.navercorp.pinpoint.batch.alarm.SpringSmtpMailSender">
        <constructor-arg ref="batchConfiguration"/>
        <constructor-arg ref="userGroupService"/>
        <constructor-arg ref="javaMailSenderImpl"/>
    </bean>

    <bean id="javaMailSenderImpl" class="org.springframework.mail.javamail.JavaMailSenderImpl">
        <property name="host" value="${alarm.mail.server.url:}" />
        <property name="port" value="${alarm.mail.server.port:587}" />
        <property name="username" value="${alarm.mail.server.username:}" />
        <property name="password" value="${alarm.mail.server.password:}" />
        <property name="javaMailProperties">
            <props>
                <prop key="mail.transport.protocol">${alarm.mail.transport.protocol:}</prop>
                <prop key="mail.smtp.port">${alarm.mail.smtp.port:}</prop>
                <prop key="mail.smtp.from">${alarm.mail.sender.address:}</prop>
                <prop key="mail.smtp.auth">${alarm.mail.smtp.auth:false}</prop>
                <prop key="mail.smtp.starttls.enable">${alarm.mail.smtp.starttls.enable:false}</prop>
                <prop key="mail.smtp.starttls.required">${alarm.mail.smtp.starttls.required:false}</prop>
                <prop key="mail.debug">${alarm.mail.debug:false}</prop>
            </props>
        </property>
    </bean>

If you would like to implement your own mail sender, simply replace the SpringSmtpMailSender and JavaMailSenderImpl beans above with your own implementation of the com.navercorp.pinpoint.web.alarm.MailSender interface.

public interface MailSender {
   void sendEmail(AlarmChecker checker, int sequenceCount);
}

B. Sms alarm service

To send alarms over sms, you need to write your own sms sender by implementing the com.navercorp.pinpoint.batch.alarm.SmsSender interface and registering it as a Spring bean. If there is no SmsSender implementation, alarms will not be sent over sms.

public interface SmsSender {
    public void sendSms(AlarmChecker checker, int sequenceCount);
}
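As a rough illustration of what such an implementation might look like, the sketch below records the text it would send instead of calling a real SMS gateway. To stay self-contained it declares its own stand-in AlarmChecker and SmsSender types; in a real deployment these come from Pinpoint. The describe() accessor, the RecordingSmsSender class and the message format are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types so the sketch compiles on its own. In Pinpoint these are the
// project's own AlarmChecker and SmsSender; describe() is a hypothetical
// accessor invented for this example.
interface AlarmChecker {
    String describe();
}

interface SmsSender {
    void sendSms(AlarmChecker checker, int sequenceCount);
}

// Hypothetical implementation: formats a message and records it.
final class RecordingSmsSender implements SmsSender {
    private final List<String> sent = new ArrayList<>();

    @Override
    public void sendSms(AlarmChecker checker, int sequenceCount) {
        // Replace this line with a call to your own SMS gateway client.
        sent.add("[PINPOINT] " + checker.describe() + " (sequence " + sequenceCount + ")");
    }

    // Messages handed to the (simulated) gateway so far.
    List<String> sentMessages() {
        return sent;
    }
}
```

The sequenceCount argument is passed through to the message here so that a recipient can tell how many times in a row the alarm has fired.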

C. Webhook alarm service

The webhook alarm service transmits Pinpoint's alarm messages through a webhook API.

You need to implement the webhook receiver service that receives the webhook messages yourself, or use the provided sample project (a Slack receiver, in this case).

The alarm messages (referred to as payloads) sent to the webhook receiver have different schemas depending on the Alarm Checker type. The payload schemas are described in 3. Others - The Specification of webhook payloads and the examples.
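For readers who write their own receiver, a minimal sketch using only the JDK's built-in HTTP server is shown here. It assumes the payload arrives as the body of an HTTP POST; the class name, the /alarm/ path, port 8080 and the 204/405 status codes are choices made for this example, not something Pinpoint prescribes.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative only: WebhookReceiverSketch is a hypothetical name.
final class WebhookReceiverSketch {
    // Raw JSON payloads received so far.
    static final List<String> received = new CopyOnWriteArrayList<>();

    // Handling logic kept apart from the HTTP plumbing so it is easy to test.
    // Returns the HTTP status code to send back.
    static int accept(String method, String body) {
        if (!"POST".equals(method)) {
            return 405; // Method Not Allowed
        }
        received.add(body);
        return 204; // No Content
    }

    // Starts a receiver on the given port (0 picks any free port).
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/alarm/", exchange -> {
            try {
                String body = new String(
                        exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
                // -1 means the response carries no body.
                exchange.sendResponseHeaders(accept(exchange.getRequestMethod(), body), -1);
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

A real receiver would parse the JSON and branch on the checker type, since the shape of detectedValue differs between checker types.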

To enable the webhook alarm service, you need to configure webhook.enable and webhook.receiver.url in the batch-root.properties file.

# webhook config
webhook.enable=true
webhook.receiver.url=http://www.webhookexample.com/alarm/

NOTICE!

The webhook alarm service is available from Pinpoint 2.1.1. If you upgraded from a release earlier than 2.1.1, you must add the 'webhook_send' column to the 'alarm_rule' table in Pinpoint's MySQL database.

SQL : ALTER TABLE alarm_rule ADD COLUMN webhook_send CHAR(1) DEFAULT NULL;

WebhookSenderImpl, which Pinpoint provides, is the class in charge of sending webhooks.

It is registered as a bean in applicationContext-batch-sender.xml of pinpoint-batch.

    <bean id="webHookSender" class="com.navercorp.pinpoint.web.alarm.WebhookSenderImpl">
        <constructor-arg ref="batchConfiguration"/>
        <constructor-arg ref="userServiceImpl"/>
        <constructor-arg ref="restTemplate" />
    </bean>

2.1.2) Configuring MYSQL

step 1

Prepare a MySQL instance to persist the alarm service metadata.

step 2

Configure the MySQL connection information in the jdbc-root.properties file of pinpoint-batch.

jdbc.driverClassName=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://localhost:13306/pinpoint
jdbc.username=admin
jdbc.password=admin

step 3

Create the tables for the alarm service. Use the DDL files below.

2.1.3) How to execute pinpoint-batch

The pinpoint-batch project is based on Spring Boot and can be executed with the following command. After the build, the executable file is placed under the target/deploy folder of pinpoint-batch.

java -Dspring.profiles.active=XXXX -jar pinpoint-batch-VERSION.jar 

ex) java -Dspring.profiles.active=local -jar pinpoint-batch-2.1.1.jar

2.2 How to configure pinpoint-web

2.2.1) Configuring MYSQL Server IP

In order to persist user alarm settings, set the mysql connection information in jdbc-root.properties file in pinpoint-web.

jdbc.driverClassName=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://localhost:13306/pinpoint
jdbc.username=admin
jdbc.password=admin

2.2.2) Enabling Webhook Alarm Service

Set webhook.enable to true in batch-root.properties so that users can configure webhook alarms in the Alarm menu.

# webhook config
webhook.enable=true

Once the webhook alarm service is enabled, you can select webhook as an alarm type in the alarm settings screen.

3. Others

3.1 Configuration, Execution, Performance.

1) You may change the batch execution period by modifying the cron expression in the applicationContext-batch-schedule.xml file.

<task:scheduled-tasks scheduler="scheduler">
    <task:scheduled ref="batchJobLauncher" method="alarmJob" cron="0 0/3 * * * *" />
</task:scheduled-tasks>
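The six fields of this cron expression are second, minute, hour, day of month, month and day of week, so "0 0/3 * * * *" fires at second 0 of every third minute, counted from minute 0 of each hour. The sketch below reproduces that schedule with plain java.time to show when the next run falls. It does not use Spring's scheduler, and the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Illustrative only: mirrors the schedule "0 0/3 * * * *" without using Spring.
// AlarmCronSketch is a hypothetical name.
final class AlarmCronSketch {
    static final int STEP_MINUTES = 3;

    // First firing time strictly after the given moment.
    static LocalDateTime nextRun(LocalDateTime after) {
        // Move to the start of the next whole minute, then advance until the
        // minute-of-hour is a multiple of the step.
        LocalDateTime candidate = after.truncatedTo(ChronoUnit.MINUTES).plusMinutes(1);
        while (candidate.getMinute() % STEP_MINUTES != 0) {
            candidate = candidate.plusMinutes(1);
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(nextRun(LocalDateTime.now()));
    }
}
```

For example, a call with 10:04:30 yields 10:06:00, and a call with 10:58:00 yields 11:00:00.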

2) Ways to improve alarm batch performance

The alarm batch was designed to run concurrently. If you have a lot of applications with alarms registered, you may increase the size of the executor's thread pool by modifying the pool-size of poolTaskExecutorForPartition in the applicationContext-alarmJob.xml file.

Note that increasing this value will result in higher resource usage.

<task:executor id="poolTaskExecutorForPartition" pool-size="1" />

If a lot of alarms are registered for each application, you may set the alarmStep declared in the applicationContext-alarmJob.xml file to run concurrently.

<step id="alarmStep" xmlns="http://www.springframework.org/schema/batch">
    <tasklet task-executor="poolTaskExecutorForStep" throttle-limit="3">
        <chunk reader="reader" processor="processor" writer="writer" commit-interval="1"/>
    </tasklet>
</step>
<task:executor id="poolTaskExecutorForStep" pool-size="10" />

3) Using quickstart's web

Pinpoint Web uses MySQL to persist users, user groups, and alarm configurations. However, Quickstart uses MockDAO to reduce memory usage. Therefore, if you want to use MySQL with Quickstart, please refer to Pinpoint Web's applicationContext-dao-config.xml and jdbc.properties.

3.2 Details on Webhook

3.2.1) webhook receiver sample project

Slack-Receiver is an example project of the webhook receiver. The project receives Pinpoint webhook alarms and sends the messages to Slack. For more details, see the project repository.

3.2.2) The Specification of webhook payloads and the examples

The Schemas of webhook payloads

Key

UserGroup

Checker

UserMember

DetectedAgent

DataSourceAlarm

The Examples of the webhook Payload

LongValueAlarmChecker

{
 "pinpointUrl": "http://pinpoint.com",
 "batchEnv": "release",
 "applicationId": "TESTAPP",
 "serviceType": "TOMCAT",
 "userGroup": {
   "userGroupId": "Group-1",
   "userGroupMembers": [
     {
       "id": "msk1111",
       "name": "minsookim",
       "email": "pinpoint@naver.com",
       "department": "Platform",
       "phoneNumber": "01012345678",
       "phoneCountryCode": 82
     }
   ]
 },
 "checker": {
   "name": "TOTAL COUNT",
   "type": "LongValueAlarmChecker",
   "detectedValue": 33
 },
 "unit": "",
 "threshold": 15,
 "notes": "Note Example",
 "sequenceCount": 4
}

LongValueAgentChecker

{
 "pinpointUrl": "http://pinpoint.com",
 "batchEnv": "release",
 "applicationId": "TESTAPP",
 "serviceType": "TOMCAT",
 "userGroup": {
   "userGroupId": "Group-1",
   "userGroupMembers": [
     {
       "id": "msk1111",
       "name": "minsookim",
       "email": "pinpoint@naver.com",
       "department": "Platform",
       "phoneNumber": "01012345678",
       "phoneCountryCode": 82
     }
   ]
 },
 "checker": {
   "name": "HEAP USAGE RATE",
   "type": "LongValueAgentChecker",
   "detectedValue": [
     {
       "agentId": "test-agent",
       "agentValue": 8
     }
   ]
 },
 "unit": "%",
 "threshold": 5,
 "notes": "Note Example",
 "sequenceCount": 4
}

BooleanValueAgentChecker

{
 "pinpointUrl": "http://pinpoint.com",
 "batchEnv": "release",
 "applicationId": "TESTAPP",
 "serviceType": "TOMCAT",
 "userGroup": {
   "userGroupId": "Group-1",
   "userGroupMembers": [
     {
       "id": "msk1111",
       "name": "minsookim",
       "email": "pinpoint@naver.com",
       "department": "Platform",
       "phoneNumber": "01012345678",
       "phoneCountryCode": 82
     }
   ]
 },
 "checker": {
   "name": "DEADLOCK OCCURRENCE",
   "type": "BooleanValueAgentChecker",
   "detectedValue": [
     {
       "agentId": "test-agent",
       "agentValue": true
     }
   ]
 },
 "unit": "BOOLEAN",
 "threshold": 1,
 "notes": "Note Example",
 "sequenceCount": 4
}

DataSourceAlarmListValueAgentChecker

{
 "pinpointUrl": "http://pinpoint.com",
 "batchEnv": "release",
 "applicationId": "TESTAPP",
 "serviceType": "TOMCAT",
 "userGroup": {
   "userGroupId": "Group-1",
   "userGroupMembers": [
     {
       "id": "msk1111",
       "name": "minsookim",
       "email": "pinpoint@naver.com",
       "department": "Platform",
       "phoneNumber": "01012345678",
       "phoneCountryCode": 82
     }
   ]
 },
 "checker": {
   "name": "DATASOURCE CONNECTION USAGE RATE",
   "type": "DataSourceAlarmListValueAgentChecker",
   "detectedValue": [
     {
       "agentId": "test-agent",
       "agentValue": [
         {
           "databaseName": "test",
           "connectionValue": 32
         }
       ]
     }
   ]
 },
 "unit": "%",
 "threshold": 16,
 "notes": "Note Example",
 "sequenceCount": 4
}

