[Repost] Scribe Configuration

2024-07-06 07:41:30, 227 views

Scribe can be configured with:

the file specified in the -c command line option
the file at DEFAULT_CONF_FILE_LOCATION in env_default.h

Global Configuration Variables

port: assigned to variable “port”

which port the scribe server will listen on
default 0, passed at command line with -p, can also be set in conf file

max_msg_per_second:

used in scribeHandler::throttleDeny
default 0; the parameter is ignored when the value is 0
With recent changes this parameter has become less relevant; max_queue_size should be used for throttling instead

max_queue_size: in bytes

used in scribeHandler::Log
default 5,000,000 bytes

check_interval: in seconds

used to control how often to check each store
default 5

new_thread_per_category: yes/no

If yes, will create a new thread for every category seen. Otherwise, will only create a single thread for every store defined in the configuration.
For prefix stores or the default store, setting this parameter to “no” will cause all messages that match this category to get processed by a single store. Otherwise, a new store will be created for each unique category name.
default yes

num_thrift_server_threads:

Number of threads listening for incoming messages
default 3

Example:

port=1463
max_msg_per_second=2000000
max_queue_size=10000000
check_interval=3
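
As a rough illustration of how these settings combine with their defaults, the Python sketch below parses a global section like the one above. The helper name parse_global_settings and the treatment of blank lines and '#' comment lines are assumptions made for this example, not Scribe's actual parser; the default values are the ones listed above.

```python
# Defaults as listed in the variable descriptions above.
DEFAULTS = {
    "port": 0,
    "max_msg_per_second": 0,
    "max_queue_size": 5_000_000,
    "check_interval": 5,
    "new_thread_per_category": "yes",
    "num_thrift_server_threads": 3,
}


def parse_global_settings(text):
    """Hypothetical helper: read key=value lines and fill in the defaults."""
    settings = dict(DEFAULTS)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' comment lines carry no settings.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {line!r}")
        key, value = key.strip(), value.strip()
        # Numeric settings become ints; everything else stays a string.
        if isinstance(DEFAULTS.get(key), int):
            settings[key] = int(value)
        else:
            settings[key] = value
    return settings


example = """
port=1463
max_msg_per_second=2000000
max_queue_size=10000000
check_interval=3
"""
settings = parse_global_settings(example)
print(settings["port"], settings["check_interval"], settings["num_thrift_server_threads"])
```

Settings that the file does not mention, such as num_thrift_server_threads here, keep their default values.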

Store Configuration

Scribe Server determines how to log messages based on the Stores defined in the configuration. Every store must specify what message category it handles with three exceptions:

default store: The ‘default’ category handles any category that is not handled by any other store. There can only be one default store.

category=default

prefix stores: If the specified category ends in a *, the store will handle all categories that begin with the specified prefix.

category=web*

multiple categories: Can use ‘categories=’ to create multiple stores with a single store definition.

categories=rock paper* scissors

In the above three cases, Scribe will create a subdirectory for each unique category in File Stores (unless new_thread_per_category is set to “no”).
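
The three cases above can be sketched in Python as a small category matcher. The function names are hypothetical, and the precedence used here (exact match, then the longest matching prefix, then the default store) is an assumption for illustration; the text above does not spell out the order in which Scribe resolves overlapping definitions.

```python
def expand_categories(spec):
    """Split a 'categories=' value into individual category patterns."""
    return spec.split()


def pick_store(category, patterns):
    """Return the pattern of the store that would handle `category`, or None.

    Assumed precedence: exact match first, then the longest matching
    prefix pattern, then the default store.
    """
    if category in patterns:
        return category
    prefixes = [
        p for p in patterns
        if p.endswith("*") and category.startswith(p[:-1])
    ]
    if prefixes:
        return max(prefixes, key=len)
    return "default" if "default" in patterns else None


patterns = expand_categories("rock paper* scissors") + ["web*", "default"]
for name in ("rock", "paperclip", "web_clicks", "billing"):
    print(name, "->", pick_store(name, patterns))
```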

Store Configuration Variables

category: Determines which messages are handled by this store
type:

file
buffer
network
bucket
thriftfile
null
multi

target_write_size: 16,384 bytes by default

determines how large to let the message queue grow for a given category before processing the messages

max_batch_size: 1,024,000 bytes by default (may not be in open-source yet)

determines the amount of data from the in-memory store queue to be handled at a time. In practice, this (together with buffer file rotation size) controls how big a thrift call can be.

max_write_interval: 10 seconds by default

determines how long to let the messages queue for a given category before processing the messages

must_succeed: yes/no

Whether to requeue messages and retry if a store failed to process messages.
If set to ‘no’, messages will be dropped if the store cannot process them.
Note: We recommend using Buffer Stores to specify a secondary store to handle logging failures.
default yes

Example:

<store>
category=statistics
type=file
target_write_size=20480
max_write_interval=2
</store>
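
One way to read target_write_size and max_write_interval together is that a category's queue is processed as soon as either threshold is reached. The sketch below models that reading in Python; the CategoryQueue class is hypothetical and the "whichever comes first" rule is an assumption drawn from the two descriptions above, not a copy of Scribe's implementation.

```python
class CategoryQueue:
    """Hypothetical in-memory queue for one category."""

    def __init__(self, target_write_size=16384, max_write_interval=10, now=0.0):
        self.target_write_size = target_write_size    # bytes
        self.max_write_interval = max_write_interval  # seconds
        self.messages = []
        self.queued_bytes = 0
        self.last_flush = now

    def add(self, message):
        self.messages.append(message)
        self.queued_bytes += len(message)

    def should_flush(self, now):
        # Nothing queued means nothing to process.
        if not self.messages:
            return False
        # Assumption: process when either threshold is reached.
        return (self.queued_bytes >= self.target_write_size
                or now - self.last_flush >= self.max_write_interval)

    def flush(self, now):
        batch, self.messages = self.messages, []
        self.queued_bytes = 0
        self.last_flush = now
        return batch


queue = CategoryQueue(target_write_size=20480, max_write_interval=2)
queue.add(b"x" * 100)
print(queue.should_flush(now=1.0))  # small queue, interval not yet reached
print(queue.should_flush(now=2.0))  # interval reached
```

Time is passed in explicitly so the behaviour can be checked without waiting on a real clock.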

File Store Configuration

File Stores write messages to a file.

file_path: defaults to “/tmp”

base_filename: defaults to category name

use_hostname_sub_directory: yes/no, default no

Create a subdirectory using the server’s hostname

sub_directory: string

Create a subdirectory with the specified name

rotate_period: “hourly”, “daily”, “never”, or number[suffix]; “never” by default

determines how often to create new files
suffix may be “s”, “m”, “h”, “d”, “w” for seconds (the default), minutes, hours, days and weeks, respectively

rotate_hour: 0-23, 1 by default

if rotate_period is daily, determines what hour of day to rotate

rotate_minute: 0-59, 15 by default

if rotate_period is daily or hourly, determines how many minutes after the hour to rotate

max_size: 1,000,000,000 bytes by default

determines approximately how large to let a file grow before rotating to a new file

write_meta: “yes” or anything else; false by default

if the file was rotated, the last line will contain "scribe_meta: " followed by the next filename

fs_type: supports two types “std” and “hdfs”. “std” by default

chunk_size: 0 by default. If a chunk size is specified, no messages within the file will cross chunk boundaries unless there are messages larger than the chunk size

add_newlines: 0 or 1, 0 by default

if set to 1, will write a newline after every message

create_symlink: “yes” or anything else; “yes” by default

if true, will maintain a symlink that points to the most recently written file

write_stats: yes/no, yes by default

whether to create a scribe_stats file for each store to keep track of files written

max_write_size: 1,000,000 bytes by default

the file store tries to flush data to the file system in chunks of max_write_size bytes; max_write_size cannot be more than max_size
for example, if target_write_size caused a number of messages to be buffered and the file store is then called to save them, it writes them at least max_write_size bytes at a time; only the last write may be smaller than max_write_size

Example:

<store>
category=sprockets
type=file
file_path=/tmp/sprockets
base_filename=sprockets_log
max_size=1000000
add_newlines=1
rotate_period=daily
rotate_hour=0
rotate_minute=10
max_write_size=4096
</store>
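
The rotate_period syntax described above (a keyword, or a number with an optional unit suffix) can be illustrated with a short Python sketch. The function parse_rotate_period and its return shape are hypothetical and only follow the rules as written in this section.

```python
# Seconds per unit suffix, as listed for rotate_period.
SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
KEYWORDS = ("hourly", "daily", "never")


def parse_rotate_period(value):
    """Return (keyword, None) for a keyword, or ("interval", seconds)."""
    value = value.strip()
    if value in KEYWORDS:
        return (value, None)
    number, suffix = value, "s"  # seconds are the default unit
    if value and value[-1] in SUFFIX_SECONDS:
        number, suffix = value[:-1], value[-1]
    if not number.isdigit():
        raise ValueError(f"invalid rotate_period: {value!r}")
    return ("interval", int(number) * SUFFIX_SECONDS[suffix])


for text in ("daily", "90", "30m", "2h", "1w"):
    print(text, "->", parse_rotate_period(text))
```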

Network Store Configuration

Network Stores forward messages to other Scribe Servers. Scribe keeps persistent connections open as long as it is able to send messages. (It will only re-open a connection on error or if the downstream machine is overloaded). Scribe will send messages in batches during normal operation based on how many messages are currently sitting in the queue waiting to be sent. (If Scribe is backed up and buffering messages to local disk, Scribe will send messages in chunks based on the buffer file sizes.)

remote_host: name or ip of remote host to forward messages
remote_port: port number on remote host
timeout: socket timeout, in milliseconds; defaults to DEFAULT_SOCKET_TIMEOUT_MS, which is set to 5000 in store.h
use_conn_pool: “yes” or anything else; defaults to false

whether to use connection pooling instead of opening up multiple connections to each remote host

Example:

<store>
category=default
type=network
remote_host=hal
remote_port=1465
</store>

Buffer Store Configuration

Buffer Stores must have two sub-stores named “primary” and “secondary”. Buffer Stores will first attempt to log messages to the primary store and only log to the secondary if the primary is not available. Once the primary store comes back online, a Buffer Store will read messages out of the secondary store and send them to the primary store (unless replay_buffer=no). Only stores that are readable (stores that implement the readOldest() method) may be used as the secondary store. Currently, the only readable stores are File Stores and Null Stores.

max_queue_length: 2,000,000 messages by default

if the number of messages in the queue exceeds this value, the buffer store will switch to writing to the secondary store

buffer_send_rate: 1 by default

determines, for each check_interval, how many times to read a group of messages from the secondary store and send them to the primary store

retry_interval: 300 seconds by default

how long to wait to retry sending to the primary store after failing to write to the primary store

retry_interval_range: 60 seconds by default

will randomly pick a retry interval that is within this range of the specified retry_interval

replay_buffer: yes/no, default yes

If set to ‘no’, Buffer Store will not remove messages from the secondary store and send them to the primary store

Example:

<store>
category=default
type=buffer
buffer_send_rate=1
retry_interval=30
retry_interval_range=10
  <primary>
    type=network
    remote_host=wopr
    remote_port=1456
  </primary>
  <secondary>
    type=file
    file_path=/tmp
    base_filename=thisisoverwritten
    max_size=10000000
  </secondary>
</store>
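
To make retry_interval and retry_interval_range concrete, the Python sketch below picks a randomized wait time. It assumes the random window is centered on retry_interval, which is one plausible reading of "within this range of the specified retry_interval"; the function name and the rng parameter are hypothetical.

```python
import random


def pick_retry_interval(retry_interval=300, retry_interval_range=60, rng=random):
    """Return a retry delay in seconds.

    Assumption: the delay is drawn uniformly from a window of width
    retry_interval_range centered on retry_interval.
    """
    if retry_interval_range <= 0:
        return retry_interval
    low = retry_interval - retry_interval_range // 2
    return low + rng.randrange(retry_interval_range)


# Values from the example above: retry_interval=30, retry_interval_range=10.
rng = random.Random(0)
samples = [pick_retry_interval(30, 10, rng) for _ in range(5)]
print(samples)
```

Randomizing the delay keeps many Buffer Stores from retrying the same primary at the same moment after it comes back.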

Bucket Store Configuration

Bucket Stores will hash messages to multiple files using a prefix of each message as the key.
You can define each bucket implicitly (using a single ‘bucket’ definition) or explicitly (using a bucket definition for every bucket). Bucket Stores that are defined implicitly must have a substore named “bucket” that is either a File Store, Network Store or ThriftFile Store (see examples).

num_buckets: defaults to 1

number of buckets to hash into
messages that cannot be hashed into any bucket will be put into a special bucket number 0

bucket_type: “key_hash”, “key_modulo”, or “random”

delimiter: must be an ascii code between 1 and 255; otherwise the default delimiter is ‘:’

The message prefix up to (but not including) the first occurrence of the delimiter will be used as the key to do the hash/modulo. ‘random’ hashing does not use a delimiter.

remove_key: yes/no, defaults to no

whether to remove the key prefix from each message.

bucket_subdir: the name of each subdirectory will be this name followed by the bucket number if a single ‘bucket’ definition is used

Example:

<store>
category=bucket_me
type=bucket
num_buckets=5
bucket_subdir=bucket
bucket_type=key_hash
delimiter=58
  <bucket>
    type=file
    fs_type=std
    file_path=/tmp/scribetest
    base_filename=bucket_me
  </bucket>
</store>

Instead of using a single ‘bucket’ definition for all buckets, you can specify each bucket explicitly:

<store>
category=bucket_me
type=bucket
num_buckets=2
bucket_type=key_hash
  <bucket0>
    type=file
    fs_type=std
    file_path=/tmp/scribetest/bucket0
    base_filename=bucket0
  </bucket0>
  <bucket1>
    ...
  </bucket1>
  <bucket2>
    ...
  </bucket2>
</store>

You can also bucket into network stores:

<store>
category=bucket_me
type=bucket
num_buckets=2
bucket_type=random
  <bucket0>
    type=file
    fs_type=std
    file_path=/tmp/scribetest/bucket0
    base_filename=bucket0
  </bucket0>
  <bucket1>
    type=network
    remote_host=wopr
    remote_port=1463
  </bucket1>
  <bucket2>
    type=network
    remote_host=hal
    remote_port=1463
  </bucket2>
</store>
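
The bucketing rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not Scribe's code: regular buckets are numbered 1 to num_buckets (inferred from the explicit examples, where num_buckets=2 defines bucket0, bucket1 and bucket2), zlib.crc32 stands in for whatever hash function Scribe uses for key_hash, and a non-numeric key under key_modulo is treated as unhashable.

```python
import random
import zlib


def bucketize(message, num_buckets, bucket_type="key_hash", delimiter=":", rng=random):
    """Return the bucket number for `message`; 0 is the special bucket."""
    if num_buckets < 1:
        return 0
    if bucket_type == "random":
        # 'random' ignores the delimiter and the key entirely.
        return rng.randrange(num_buckets) + 1
    # The key is the prefix up to (not including) the first delimiter.
    key, sep, _rest = message.partition(delimiter)
    if not sep or not key:
        return 0
    if bucket_type == "key_modulo":
        # Assumption: a non-numeric key cannot be bucketed by modulo.
        if not key.isdigit():
            return 0
        return int(key) % num_buckets + 1
    # key_hash: crc32 is a stand-in hash chosen for this sketch.
    return zlib.crc32(key.encode("utf-8")) % num_buckets + 1


print(bucketize("7:payload", 5, "key_modulo"))
print(bucketize("no delimiter here", 5))
print(bucketize("user42:payload", 5, "key_hash"))
```

Messages with the same key always land in the same bucket under key_hash and key_modulo, which is the property that makes per-key files or per-key downstream servers possible.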

Null Store Configuration

Null Stores can be used to tell Scribe to ignore all messages of a given category.

(no configuration parameters)

Example:

<store>
category=tps_report*
type=null
</store>

Multi Store Configuration

A Multi Store is a store that will forward all messages to multiple sub-stores.

A Multi Store may have any number of substores named “store0”, “store1”, “store2”, etc

report_success: “all” or “any”, defaults to “all”

whether all substores or any substores must succeed in logging a message in order for the Multi Store to report the message logging as successful

Example:

<store>
category=default
type=multi
target_write_size=20480
max_write_interval=1
  <store0>
    type=file
    file_path=/tmp/store0
  </store0>
  <store1>
    type=file
    file_path=/tmp/store1
  </store1>
</store>
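
The effect of report_success can be shown with a very small Python sketch. The function name and the use of a list of booleans to represent substore outcomes are assumptions made for illustration.

```python
def multi_store_result(substore_results, report_success="all"):
    """Combine per-substore outcomes the way report_success describes."""
    if report_success == "all":
        return all(substore_results)
    if report_success == "any":
        return any(substore_results)
    raise ValueError(f"report_success must be 'all' or 'any', got {report_success!r}")


# store0 succeeded, store1 failed.
outcomes = [True, False]
print(multi_store_result(outcomes, "all"))
print(multi_store_result(outcomes, "any"))
```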

Thriftfile Store Configuration

A Thriftfile store is similar to a File store except that it stores messages in a Thrift TFileTransport file.

file_path: defaults to “/tmp”
base_filename: defaults to category name
rotate_period: “hourly”, “daily”, “never”, or number[suffix]; “never” by default

determines how often to create new files
suffix may be “s”, “m”, “h”, “d”, “w” for seconds (the default), minutes, hours, days and weeks, respectively

rotate_hour: 0-23, 1 by default

if rotate_period is daily, determines what hour of day to rotate

rotate_minute: 0-59, 15 by default

if rotate_period is daily or hourly, determines how many minutes after the hour to rotate

max_size: 1,000,000,000 bytes by default

determines approximately how large to let a file grow before rotating to a new file

fs_type: currently only “std” is supported; “std” by default

chunk_size: 0 by default

if a chunk size is specified, no messages within the file will cross chunk boundaries unless there are messages larger than the chunk size

create_symlink: “yes” or anything else; “yes” by default

if true, will maintain a symlink that points to the most recently written file

flush_frequency_ms: milliseconds, will use TFileTransport default of 3000ms if not specified

determines how frequently to sync the Thrift file to disk

msg_buffer_size: in bytes, will use TFileTransport default of 0 if not specified

if non-zero, store will reject any writes larger than this size

Example:

<store>
category=sprockets
type=thriftfile
file_path=/tmp/sprockets
base_filename=sprockets_log
max_size=1000000
flush_frequency_ms=2000
</store>
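
The chunk_size guarantee (no message crosses a chunk boundary unless it is larger than a chunk) can be met by padding the file up to the next boundary whenever the next message would not fit in the current chunk. The Python sketch below shows that arithmetic; the text above only states the guarantee, so padding as the mechanism, and the function name, are assumptions.

```python
def bytes_to_pad(next_message_length, current_file_size, chunk_size):
    """Return how many padding bytes to write before the next message."""
    if chunk_size <= 0:
        return 0  # chunking disabled
    space_left = chunk_size - current_file_size % chunk_size
    # Pad to the boundary only when the message does not fit in the chunk.
    return space_left if next_message_length > space_left else 0


# Write three 400-byte messages into 1000-byte chunks and record
# the offset at which each message starts.
offset = 0
offsets = []
for size in (400, 400, 400):
    offset += bytes_to_pad(size, offset, 1000)
    offsets.append(offset)
    offset += size
print(offsets)
```

The third message does not fit in the 200 bytes left in the first chunk, so it starts at the next boundary.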

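1.2 Architecture

Scribe's architecture is fairly simple and consists of three parts: the scribe agent, scribe itself, and the storage system.

(1) scribe agent

The scribe agent is actually a Thrift client. The only way to send data to scribe is through a Thrift client: scribe defines a Thrift interface internally, and users send data to the server through that interface.

(2) scribe

scribe receives the data sent by the Thrift client and, according to the configuration file, routes data of different topics (categories) to different destinations. scribe provides a variety of stores, such as file and HDFS, and can load data into them.

(3) Storage system

The storage system is simply the set of stores inside scribe. scribe currently supports many store types: file (writes to a file), buffer (two-tier storage with a primary store and a secondary store), network (another scribe server), bucket (contains multiple stores and hashes data into them), null (discards data), thriftfile (writes to a Thrift TFileTransport file), and multi (writes data to several stores at once).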

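2 Reliability

2.1 Fault tolerance

Scribe is designed to tolerate network and machine failures. If a scribe instance on a client machine cannot send messages to the central server, it saves them to local disk and resends them once the central server or the network recovers. To avoid overloading the central server when it restarts, the resender waits a random amount of time before sending. If the central server is close to its processing limit, it returns TRY_LATER, which tells the resender to try again after a few minutes.

The central server handles failures in a similar way.

2.2 Data loss

The following failures lead to data loss:

1) If a client cannot connect to either the local or the central server, messages are lost.

2) If a scribe server goes down, the small number of messages held in memory are lost; data on disk is not lost.

3) If a scribe server cannot connect to the central server and its local disk fills up, messages are lost.

4) Timeouts can result in duplicate messages.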

3 Configuration

The configuration file consists of a global section and one or more store sections.
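
The layout of a global section followed by nested store sections can be sketched with a small stack-based Python parser. This is a minimal illustration, not Scribe's parser: the function name, the dictionary layout of the result, and the skipping of '#' comment lines are assumptions.

```python
def parse_config(text):
    """Parse Scribe-style config text into (global_settings, stores)."""
    root = {"name": "", "settings": {}, "children": []}
    stack = [root]
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("</") and line.endswith(">"):
            # Closing tag: it must match the innermost open section.
            node = stack.pop()
            if not stack or node["name"] != line[2:-1]:
                raise ValueError(f"unexpected closing tag: {line}")
        elif line.startswith("<") and line.endswith(">"):
            # Opening tag: start a new section nested in the current one.
            node = {"name": line[1:-1], "settings": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {line!r}")
            stack[-1]["settings"][key.strip()] = value.strip()
    if len(stack) != 1:
        raise ValueError("unclosed section: " + stack[-1]["name"])
    return root["settings"], root["children"]


config_text = """
port=1463

<store>
category=default
type=buffer
  <primary>
    type=network
    remote_host=wopr
    remote_port=1456
  </primary>
  <secondary>
    type=file
    file_path=/tmp
  </secondary>
</store>
"""
global_settings, stores = parse_config(config_text)
print(global_settings)
print(stores[0]["settings"]["type"], [c["name"] for c in stores[0]["children"]])
```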

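3.1 Global configuration variables

port

The port the scribe server listens on. Default 0.

max_msg_per_second

The maximum number of messages the scribe server handles per second. Default 0, which means the parameter is ignored.

max_queue_size

The maximum size of the queue, in bytes. Default 5,000,000.

check_interval

In seconds. Controls how often each store is checked.

new_thread_per_category

If yes, a thread is created to handle each category.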

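3.2 Store configuration

The scribe server decides how to log messages based on the stores defined in the configuration. Every store must specify the category it handles.

Default store: the default category handles any category that is not handled by another store.

category=default

Prefix stores: the store handles every category that begins with the specified prefix.

category=web*

3.2.1 Store configuration variables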

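3.2.2 File store

A file store writes messages to a file.

file_path: defaults to /tmp.

If the file_path directory does not exist when the scribe server starts, an exception is thrown.

base_filename: defaults to the category name.

rotate_period: hourly, daily, or never; defaults to never.

Determines how often a new file is created.

rotate_hour: 0-23; defaults to 1.

If rotate_period is daily, determines the hour of the day at which to rotate.

rotate_minute: 0-59; defaults to 15.

If rotate_period is daily or hourly, determines how many minutes after the hour to rotate.

max_size: defaults to 1,000,000,000 bytes.

The maximum size a file may reach before it is rotated.

write_meta: yes or anything else; defaults to false.

Whether to record metadata in the file: the length of each message, and the name of the next file (on the last line).

fs_type: std (the default) or hdfs (see 3.2.8).

chunk_size: defaults to 0.

If a chunk size is specified, no message in the file crosses a chunk boundary unless the message is larger than the chunk size.

add_newlines: 0 or 1; defaults to 0.

If set to 1, a newline is appended after every message.

create_symlink: yes or anything else; defaults to yes.

Maintains a symlink that points to the most recently written file.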

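3.2.3 Network store

A network store forwards messages to another scribe server.

remote_host: the IP address or name of the remote host.

remote_port: the port on the remote host.

timeout: the socket timeout; defaults to DEFAULT_SOCKET_TIMEOUT_MS, which is set to 5000 milliseconds in store.h.

use_conn_pool: yes or anything else; defaults to false.

Whether to use connection pooling instead of opening multiple connections to each remote host.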

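3.2.4 Buffer Store

A buffer store must have two sub-stores: primary and secondary. It first tries to log messages to the primary store and logs to the secondary store when the primary is unreachable. Once the primary store recovers, all messages in the secondary store are read out and sent to the primary store.

max_queue_length: defaults to 2,000,000 messages.

If the number of messages in the queue exceeds this value, the buffer store switches to the secondary store.

buffer_send_rate: defaults to 1.

For each check_interval, how many times to read a group of messages from the secondary store and send them to the primary store.

retry_interval: defaults to 300 seconds.

How long to wait before retrying the primary store after a write to it fails.

retry_interval_range: defaults to 60 seconds.

The actual retry interval is picked at random within this range of retry_interval.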

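3.2.5 Bucket store

A bucket store hashes messages into multiple files, using a prefix of each message as the key.

A bucket store defined with a single bucket definition must have a substore named bucket.

num_buckets: defaults to 1.

The number of buckets.

Messages that cannot be hashed into any bucket are placed in the special bucket 0.

bucket_type: key_hash, key_modulo, or random.

delimiter: must be an ASCII code between 1 and 255; otherwise the default delimiter ':' is used.

Message prefix: the string before the first occurrence of the delimiter is used as the key.

bucket_subdir: each subdirectory is named with this name followed by the bucket number.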

3.2.6 Null store

A null store tells scribe to ignore all messages of a given category.

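3.2.7 Multi store

A multi store sends messages to multiple sub-stores. The substores are named store0, store1, and so on.

report_success: all or any; defaults to all.

Whether all substores, or any one substore, must succeed for the message to be reported as logged successfully.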

3.2.8 hdfs store

Scribe supports writing files to a distributed file system.

type=file
fs_type=hdfs
file_path=hdfs://myhadoopserver:9000/scribedata

4 scribe interface

Scribe implements the following Thrift interface:

enum ResultCode
{
  OK,
  TRY_LATER
}

struct LogEntry
{
  1: string category,
  2: string message
}

service scribe extends fb303.FacebookService
{
  ResultCode Log(1: list<LogEntry> messages);
}
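
To show how a client might react to the two result codes, the Python sketch below mirrors ResultCode and LogEntry as plain Python types and retries a batch while the server answers TRY_LATER. It does not use the Thrift-generated client: send_with_retry, make_flaky_server, and the linear back-off are hypothetical, and log_fn stands in for a real call to Log.

```python
import enum
import time
from dataclasses import dataclass


class ResultCode(enum.Enum):
    OK = 0
    TRY_LATER = 1


@dataclass
class LogEntry:
    category: str
    message: str


def send_with_retry(log_fn, entries, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call log_fn(entries) until it returns OK.

    Returns the number of attempts used, or None if every attempt
    came back as TRY_LATER.
    """
    for attempt in range(1, max_attempts + 1):
        if log_fn(entries) is ResultCode.OK:
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * attempt)  # assumed linear back-off
    return None


def make_flaky_server(failures):
    """Fake server for the demo: answers TRY_LATER `failures` times, then OK."""
    state = {"remaining": failures, "received": []}

    def log(entries):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return ResultCode.TRY_LATER
        state["received"].extend(entries)
        return ResultCode.OK

    return log, state


log, state = make_flaky_server(failures=2)
batch = [LogEntry("web", "GET /index.html"), LogEntry("web", "GET /about.html")]
attempts = send_with_retry(log, batch, base_delay=0)
print(attempts, len(state["received"]))
```

Because a whole batch is resent after TRY_LATER, a client written this way should expect that duplicates are possible if an earlier attempt was in fact processed.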

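What makes Scribe distinctive is that each log entry from a client consists of two strings: a category and a message. The category is a high-level description of the intended destination of the message. It can be configured in the Scribe server, which allows data to be redirected by changing a configuration file rather than the code.

The Scribe server also allows configuration based on category prefixes, and by default inserts the category name into the file path. Flexibility and extensibility are provided through the "store" abstraction. Stores can be configured statically through a configuration file, and can also be changed at runtime without stopping the server.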
