Flume

System environment: CentOS 7 (1804) x64

Java version: 1.8.0_171

Flume binary release: 1.9.0

Install Java:

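Download jdk-8u171-linux-x64.tar.gz, then unpack it and move it into place:

tar -zxf jdk-8u171-linux-x64.tar.gz
mv jdk1.8.0_171 /usr/local/

Append the following to /etc/profile (vi /etc/profile). The variables are exported so that child processes can see them:

export JAVA_HOME=/usr/local/jdk1.8.0_171
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

Reload the profile in the current shell:

source /etc/profile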

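Flume download location: https://www-eu.apache.org/dist/flume/

The commands below assume the archive is unpacked under /usr/src, which is the path the link command expects:

cd /usr/src
wget https://www-eu.apache.org/dist/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
tar -zxf apache-flume-1.9.0-bin.tar.gz
ln -s /usr/src/apache-flume-1.9.0-bin/bin/flume-ng /usr/bin/flume-ng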

A simple configuration file for collecting nginx logs
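
A minimal sketch of such a configuration, not necessarily the author's original file. The agent name a1, the component names r1/c1/k1, and the log path /var/log/nginx/access.log are all assumptions for illustration. It uses an Exec source for brevity and prints events through a Logger sink so the pipeline can be checked quickly.

```properties
# Hypothetical agent "a1": one source, one channel, one sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Follow the nginx access log (the path is an assumption).
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/nginx/access.log
a1.sources.r1.channels = c1

# Buffer events in memory between source and sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Write events to the Flume log for inspection.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

If this were saved as nginx.conf (a hypothetical file name), the agent could be started with `flume-ng agent --conf /usr/src/apache-flume-1.9.0-bin/conf --conf-file nginx.conf --name a1 -Dflume.root.logger=INFO,console`.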


TAILDIR + Kafka configuration
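
A sketch of how a Taildir source might be wired to a Kafka sink, under stated assumptions: the agent name a1, the position file path, the nginx log path, the broker address localhost:9092, and the topic name nginx-access are all placeholders, not values from the original notes.

```properties
# Hypothetical agent "a1".
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Tail the nginx access log and remember the read offset across restarts.
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /var/lib/flume/taildir_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /var/log/nginx/access.log

# In-memory buffer; transactionCapacity should be at least the batch sizes used.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# Publish each event to a Kafka topic (broker and topic are assumptions).
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.channel = c1
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = nginx-access
a1.sinks.k1.flumeBatchSize = 100
a1.sinks.k1.kafka.producer.acks = 1
```

A memory channel loses buffered events if the agent stops; a file channel is the usual alternative when that matters.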

ElasticSearchSink

Property        Default
channel         -
type            -
hostNames       -
indexName       flume
indexType       logs
clusterName     elasticsearch
batchSize       100
ttl             -
serializer      org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
serializer.*    -

Example:
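
A minimal sketch of an ElasticSearchSink definition, assuming an agent named a1 with an existing channel c1. The host address, index name, and cluster name are placeholders chosen for illustration.

```properties
a1.channels = c1
a1.sinks = k1

# "elasticsearch" is the short type alias used in the Flume user guide.
a1.sinks.k1.type = elasticsearch
a1.sinks.k1.channel = c1
# Comma-separated host:port list (addresses here are assumptions).
a1.sinks.k1.hostNames = 127.0.0.1:9300
a1.sinks.k1.indexName = nginx_index
a1.sinks.k1.indexType = logs
a1.sinks.k1.clusterName = elasticsearch
a1.sinks.k1.batchSize = 500
```

The Elasticsearch client jars matching the target cluster generally have to be placed on the Flume classpath for this sink to load.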

File Roll Sink

Property                      Default
channel                       -
type                          -
sink.directory                -
sink.pathManager              DEFAULT
sink.pathManager.extension    -
sink.pathManager.prefix       -
sink.rollInterval             30
sink.serializer               TEXT
sink.batchSize                100

Example:
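
A minimal sketch, assuming an agent named a1 with a channel c1. The output directory /var/log/flume is a placeholder and needs to exist before the agent starts.

```properties
a1.channels = c1
a1.sinks = k1

a1.sinks.k1.type = file_roll
a1.sinks.k1.channel = c1
# Directory where rolled files are written (an assumption).
a1.sinks.k1.sink.directory = /var/log/flume
# Start a new file every 60 seconds; 0 disables time-based rolling.
a1.sinks.k1.sink.rollInterval = 60
```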

Logger Sink

Property         Default
channel          -
type             -
maxBytesToLog    16

Example:
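
A minimal sketch, assuming an agent named a1 with a channel c1. The maxBytesToLog value is only an illustration of raising the default so that more of each event body appears in the log.

```properties
a1.channels = c1
a1.sinks = k1

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
# Log up to 256 bytes of each event body instead of the default.
a1.sinks.k1.maxBytesToLog = 256
```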

Avro Sink

Property                     Default
channel                      -
type                         -
hostname                     -
port                         -
batch-size                   100
connect-timeout              20000
request-timeout              20000
reset-connection-interval    none
compression-type             none
compression-level            6
ssl                          false
trust-all-certs              false
truststore                   -
truststore-password          -
truststore-type              JKS
exclude-protocols            SSLv3
maxIoWorkers                 2 * the number of available processors in the machine

Example:
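
A minimal sketch, assuming an agent named a1 with a channel c1 that forwards events to an Avro source on another Flume agent. The address 10.10.10.10 and port 4545 are placeholders.

```properties
a1.channels = c1
a1.sinks = k1

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
# Host and port of the receiving Avro source (both are assumptions).
a1.sinks.k1.hostname = 10.10.10.10
a1.sinks.k1.port = 4545
```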

HDFS Sink

Property                  Default
channel                   -
type                      -
hdfs.path                 -
hdfs.filePrefix           FlumeData
hdfs.fileSuffix           -
hdfs.inUsePrefix          -
hdfs.inUseSuffix          .tmp
hdfs.emptyInUseSuffix     false
hdfs.rollInterval         30
hdfs.rollSize             1024
hdfs.rollCount            10
hdfs.idleTimeout          0
hdfs.batchSize            100
hdfs.codeC                -
hdfs.fileType             SequenceFile
hdfs.maxOpenFiles         5000
hdfs.minBlockReplicas     -
hdfs.writeFormat          Writable
hdfs.threadsPoolSize      10
hdfs.rollTimerPoolSize    1
hdfs.kerberosPrincipal    -
hdfs.kerberosKeytab       -
hdfs.proxyUser            -
hdfs.round                false
hdfs.roundValue           1
hdfs.roundUnit            second
hdfs.timeZone             Local Time
hdfs.useLocalTimeStamp    false
hdfs.closeTries           0
hdfs.retryInterval        180
serializer                TEXT
serializer.*              -

Example:
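
A minimal sketch, assuming an agent named a1 with a channel c1 and a reachable HDFS. The target path and file prefix are placeholders. Because the path uses time escape sequences, the sketch turns on hdfs.useLocalTimeStamp so that events without a timestamp header can still be bucketed.

```properties
a1.channels = c1
a1.sinks = k1

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# Bucket events by date and time (the base path is an assumption).
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M
a1.sinks.k1.hdfs.filePrefix = events-
# Round the timestamp down to the nearest 10 minutes when building the path.
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# Use the agent's clock instead of a timestamp header on the event.
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```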

Taildir Source

Property                               Default
channels                               -
type                                   -
filegroups                             -
filegroups.<filegroupName>             -
positionFile                           ~/.flume/taildir_position.json
headers.<filegroupName>.<headerKey>    -
byteOffsetHeader                       false
skipToEnd                              false
idleTimeout                            120000
writePosInterval                       3000
batchSize                              100
maxBatchCount                          Long.MAX_VALUE
backoffSleepIncrement                  1000
maxBackoffSleep                        5000
cachePatternMatching                   true
fileHeader                             false
fileHeaderKey                          file

Example:
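
A minimal sketch, assuming an agent named a1 with a channel c1. The file group names f1 and f2, the header key, and all paths are placeholders chosen for illustration.

```properties
a1.sources = r1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
# JSON file that records the last read offset of every tailed file.
a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
# Two file groups: a single file and a regex over file names in a directory.
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /var/log/test1/example.log
a1.sources.r1.filegroups.f2 = /var/log/test2/.*log.*
# Attach a custom header to every event read from group f1.
a1.sources.r1.headers.f1.headerKey1 = value1
# Add the absolute file path to each event under the "file" header.
a1.sources.r1.fileHeader = true
```

The regex in a file group applies to the file name only, not to the directory part of the path.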

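Exec Source

Exec Source is fine for testing, but the official documentation does not recommend it. tail -F does not stop when the destination cannot accept data (for example, when the channel is full), and it does not record the current read position, so data integrity cannot be guaranteed. Prefer TAILDIR or spooldir.

Property           Default
channels           -
type               -
command            -
shell              -
restartThrottle    10000
restart            false
logStdErr          false
batchSize          20
batchTimeout       3000
selector.type      replicating
selector.*         -
interceptors       -
interceptors.*     -

Example: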
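
A minimal sketch, assuming an agent named a1 with a channel c1. The tailed file /var/log/secure is a placeholder.

```properties
a1.sources = r1
a1.channels = c1

a1.sources.r1.type = exec
a1.sources.r1.channels = c1
# Command whose standard output becomes the event stream (path is an assumption).
a1.sources.r1.command = tail -F /var/log/secure
# Restart the command if it exits, waiting restartThrottle milliseconds first.
a1.sources.r1.restart = true
a1.sources.r1.restartThrottle = 10000
```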
