ClickHouse is an open-source, column-oriented database management system for online analytical processing (OLAP).
In a "normal" row-oriented database management system, data is stored in the following order:
id  | name     | age
----|----------|----
1   | Zhangsan | 18
2   | GlonHo   | 20
3   | Lisi     | 22
... | ...      | ...
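In other words, all the values that belong to one row are stored next to each other. Examples of row-oriented DBMSs are MySQL, Postgres and MS SQL Server.
In a column-oriented DBMS, data is stored like this: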
id:   1 2 3 ...
name: Zhangsan GlonHo Lisi ...
age:  18 20 22 ...
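To make the two layouts concrete, here is a minimal Python sketch that turns the sample rows above into a row-oriented sequence and a column-oriented mapping. It is a toy model only: real engines lay out bytes on disk, and the function names are hypothetical.

```python
# Toy model of the two storage layouts, using the sample rows from the tables above.
rows = [
    (1, "Zhangsan", 18),
    (2, "GlonHo", 20),
    (3, "Lisi", 22),
]
COLUMN_NAMES = ("id", "name", "age")


def to_row_layout(rows):
    """Row store: all values of one row sit next to each other."""
    return [value for row in rows for value in row]


def to_column_layout(rows, names=COLUMN_NAMES):
    """Column store: all values of one column sit next to each other."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


if __name__ == "__main__":
    print(to_row_layout(rows))
    print(to_column_layout(rows))
    # A query that only needs "age" touches one list instead of every row.
    print(to_column_layout(rows)["age"])
```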
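Column-oriented DBMSs are better suited to OLAP scenarios (they are at least 100 times faster for most queries) for the following reasons:
An analytical query usually needs only a small number of a table's columns, and a column-oriented DBMS can read just the data it needs. For example, if you need 5 columns out of 100, you can expect a 20-fold reduction in I/O.
Data stored by column is easier to compress, which reduces I/O further.
Because I/O is reduced, more of the relevant data fits in the system cache.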
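The 20-fold figure is simple arithmetic. The sketch below assumes, hypothetically, that every column takes the same space on disk and ignores compression; under that assumption a row store reads all columns while a column store reads only the requested ones.

```python
def io_reduction(total_columns: int, needed_columns: int) -> float:
    """I/O saving factor of a column store over a row store for one query.

    Assumes equally sized, uncompressed columns (a simplification).
    """
    if not 0 < needed_columns <= total_columns:
        raise ValueError("needed_columns must be between 1 and total_columns")
    # Row store: reads every column of every row.
    # Column store: reads only the needed columns.
    return total_columns / needed_columns


if __name__ == "__main__":
    print(io_reduction(100, 5))  # 20.0
```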
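Executing a query means processing a large number of rows, so it helps to dispatch every operation on whole vectors rather than on individual rows, or to implement the query engine so that there is almost no dispatch cost. Without this, on any half-decent disk subsystem, the query interpreter inevitably stalls the CPU. It therefore makes sense both to store data in columns and to process it by columns.
There are two ways to do this:
A vector engine. All operations are written for vectors instead of individual values, so operations are called far less often and the dispatch cost becomes negligible.
Code generation. The code generated for a query has all the indirect calls built into it.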
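The difference between per-value dispatch and a vector engine can be illustrated with a toy interpreter. This is a hedged sketch, not ClickHouse code: `ToyInterpreter` and its methods are hypothetical, and the "dispatch cost" is modelled as a dictionary lookup that is counted every time it happens.

```python
import operator

# The operations our toy interpreter knows about.
OPS = {"add": operator.add, "mul": operator.mul}


class ToyInterpreter:
    """Counts how many times an operation has to be looked up (dispatched)."""

    def __init__(self):
        self.dispatches = 0

    def _dispatch(self, op_name):
        self.dispatches += 1
        return OPS[op_name]

    def eval_value_at_a_time(self, op_name, left, right):
        # One dispatch for every pair of values.
        return [self._dispatch(op_name)(a, b) for a, b in zip(left, right)]

    def eval_vectorized(self, op_name, left, right):
        # One dispatch for the whole vector; the inner loop does no lookups.
        fn = self._dispatch(op_name)
        return [fn(a, b) for a, b in zip(left, right)]


if __name__ == "__main__":
    ages, ones = [18, 20, 22], [1, 1, 1]
    per_value = ToyInterpreter()
    print(per_value.eval_value_at_a_time("add", ages, ones), per_value.dispatches)
    vectorized = ToyInterpreter()
    print(vectorized.eval_vectorized("add", ages, ones), vectorized.dispatches)
```

Both calls compute the same result; only the number of dispatches differs (one per value versus one per vector), which is the overhead the vector engine is meant to remove.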
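Not every column-oriented DBMS compresses data; InfiniDB CE and MonetDB, for example, do not. Data compression, however, really does improve performance.
Some column-oriented DBMSs can work only in RAM (Random Access Memory), such as SAP HANA and Google PowerDrill. For very large data volumes, however, RAM is too expensive.
ClickHouse supports a non-standard dialect of SQL: NULL and correlated subqueries are not supported; JOIN is supported, as are subqueries in FROM, IN and JOIN clauses, and scalar subqueries.
Data is not only stored by columns but also processed by vectors (parts of columns), which yields high CPU performance.
ClickHouse supports tables with a primary key. To make range queries on the primary key fast, data is sorted incrementally using a merge tree. As a result, data can be added to a table continuously, and no locks are taken while adding it.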
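The merge-tree idea can be approximated by a small model: every insert becomes its own sorted, immutable part, and a later step merges the sorted parts in one linear pass. `ToyMergeTree` is a hypothetical simplification; ClickHouse's actual MergeTree engine also maintains a sparse primary-key index, partitions and background merge scheduling, none of which is modelled here.

```python
import heapq
from bisect import bisect_left, bisect_right


class ToyMergeTree:
    """Simplified model: sorted immutable parts plus an explicit merge step."""

    def __init__(self):
        self.parts = []  # each part: list of (key, value) tuples sorted by key

    def insert(self, rows):
        # Each batch is sorted once and stored as a new part. Existing parts
        # are never modified, so an insert does not touch older data.
        self.parts.append(sorted(rows))

    def merge_parts(self):
        # k-way merge of already-sorted parts in a single linear pass.
        merged = list(heapq.merge(*self.parts))
        self.parts = [merged] if merged else []

    def range_query(self, lo, hi):
        # Locate the key range in each sorted part by binary search.
        # (Building the key list is itself linear; a real engine keeps
        # a sparse index instead.)
        result = []
        for part in self.parts:
            keys = [key for key, _ in part]
            result.extend(part[bisect_left(keys, lo):bisect_right(keys, hi)])
        return sorted(result)


if __name__ == "__main__":
    tree = ToyMergeTree()
    tree.insert([(3, "c"), (1, "a")])
    tree.insert([(2, "b"), (5, "e")])
    print(len(tree.parts), tree.range_query(2, 3))
    tree.merge_parts()
    print(tree.parts)
```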
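A table can have only one primary key.
This makes ClickHouse usable as the backend of a web application: low latency means queries are processed promptly.
ClickHouse uses multi-master replication. Data written to any available replica is then distributed to the other replicas, and the system keeps identical data on all of them. After a failure, data is recovered automatically or, in complicated cases, with the press of a "button".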
ClickHouse is not cross-platform: it requires Linux (Ubuntu 12.04 or later) on an x86_64 CPU that supports the SSE 4.2 instruction set.
To check whether SSE 4.2 is supported:
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
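If you would rather run the check from a script, the following Python function does roughly the same thing. It assumes the Linux /proc/cpuinfo format and, unlike the grep one-liner, only accepts sse4_2 as a whole token on the flags lines; `has_sse42` is a hypothetical name.

```python
from pathlib import Path


def has_sse42(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in cpuinfo text lists sse4_2."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            if "sse4_2" in flags.split():
                return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print("SSE 4.2 supported" if has_sse42(text) else "SSE 4.2 not supported")
```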
Ubuntu is recommended, and the terminal must use UTF-8 encoding (the Ubuntu default).
Add the repository to /etc/apt/sources.list or to a separate /etc/apt/sources.list.d/clickhouse.list file.
On Ubuntu Trusty (14.04):
deb http://repo.yandex.ru/clickhouse/trusty stable main
For other Ubuntu releases, replace trusty with xenial or precise.
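The choice of codename can be captured in a small helper. This sketch only knows the three codenames mentioned above (precise = 12.04, trusty = 14.04, xenial = 16.04); `repo_line` is a hypothetical name, and writing the result into /etc/apt/sources.list.d/ still needs root.

```python
# Ubuntu release numbers mapped to the codenames the text mentions.
CODENAMES = {"12.04": "precise", "14.04": "trusty", "16.04": "xenial"}


def repo_line(ubuntu_release: str) -> str:
    """Return the apt source line for one of the supported Ubuntu releases."""
    try:
        codename = CODENAMES[ubuntu_release]
    except KeyError:
        raise ValueError(f"no repository codename known for Ubuntu {ubuntu_release}") from None
    return f"deb http://repo.yandex.ru/clickhouse/{codename} stable main"


if __name__ == "__main__":
    # As root, the line could be written out like this:
    # with open("/etc/apt/sources.list.d/clickhouse.list", "w") as f:
    #     f.write(repo_line("14.04") + "\n")
    print(repo_line("14.04"))
```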
Then run:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4    # optional
sudo apt-get update
sudo apt-get install clickhouse-client clickhouse-server-common
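You can also download the packages and install them manually:
Access restrictions are configured in the users.xml file (located next to config.xml).
By default, the default user can connect from anywhere without a password.
To build from source, follow the instructions in build.md on Linux, or in build_osx.md on Mac OS X.
You can build packages and install them, or run the programs without installing packages.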
Item | Path |
---|---|
Client | src/dbms/src/Client/ |
Server | src/dbms/src/Server/ |
For the server, create the data directories, for example:
/var/lib/clickhouse/data/default/
/var/lib/clickhouse/metadata/default/
Set these paths in the server config, then chown them to the user the server runs as.
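A Python sketch of these two steps, under stated assumptions: the base path /var/lib/clickhouse comes from the example above, while the service account name `clickhouse` is an assumption (the top output later in this article shows a user abbreviated as clickho+). `prepare_data_dirs` is a hypothetical helper, and changing ownership normally requires root.

```python
import os
import shutil
import tempfile


def prepare_data_dirs(base, user=None):
    """Create <base>/data/default and <base>/metadata/default.

    If user is given, chown every directory under base to that user.
    """
    created = []
    for sub in ("data/default", "metadata/default"):
        path = os.path.join(base, sub)
        os.makedirs(path, exist_ok=True)
        created.append(path)
    if user is not None:
        for dirpath, _dirnames, _filenames in os.walk(base):
            shutil.chown(dirpath, user=user)
    return created


if __name__ == "__main__":
    # Dry run in a throwaway directory. On a real server, as root, this might be
    # prepare_data_dirs("/var/lib/clickhouse", user="clickhouse")  # assumed user name
    with tempfile.TemporaryDirectory() as tmp:
        for path in prepare_data_dirs(os.path.join(tmp, "clickhouse")):
            print(path, os.path.isdir(path))
```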
Note: in the source tree, the server config file is located at src/dbms/src/Server/config.xml
Docker image: https://hub.docker.com/r/yandex/clickhouse-server/
Gentoo: https://github.com/kmeaw/clickhouse-overlay
Start the server as a daemon:
sudo service clickhouse-server start
Log directory:
/var/log/clickhouse-server/
If the server fails to start, check the config file:
/etc/clickhouse-server/config.xml
You can also run the server from the console:
clickhouse-server --config-file=/etc/clickhouse-server/config.xml
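In this case the log is printed to the console, which is convenient during development. If the config file is in the current directory, you don't need the --config-file parameter: ./config.xml is read by default.
You can connect to the server with the command-line client: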
clickhouse-client
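clickhouse-client options:
Option | Description |
---|---|
--host, -h | Server name; defaults to localhost |
--port | Port to connect to; defaults to 9000 |
--user, -u | User name; defaults to default |
--password | Password; defaults to an empty string |
--query, -q | Query to execute in non-interactive mode |
--database, -d | Current database; defaults to the value from the server config (the default database) |
--multiline, -m | If set, allows multi-line queries |
--multiquery, -n | If set, allows processing several semicolon-separated queries. Works only in non-interactive mode. |
--format, -f | Outputs results in the specified default format |
--vertical, -E | If set, uses the Vertical format by default, the same as --format=Vertical. In this format each value is printed on its own line, which helps when displaying wide tables. |
--time, -t | If set, prints the query execution time to stderr in non-interactive mode. |
--stacktrace | If set, also prints the stack trace when an exception occurs. |
--config-file | Name of a config file with additional settings, or with changed defaults for the settings listed above. |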
By default, the client looks for a config file in this order:
./clickhouse-client.xml
~/.clickhouse-client/config.xml
/etc/clickhouse-client/config.xml
If more than one of these files exists, the first one found is used.
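The lookup order above amounts to "the first existing file wins". A minimal sketch of that rule follows; the helper names are hypothetical, and the `exists` parameter is there only so the logic can be tried without touching the real filesystem.

```python
from pathlib import Path


def client_config_candidates():
    """The three locations listed above, in search order."""
    return [
        Path("./clickhouse-client.xml"),
        Path("~/.clickhouse-client/config.xml").expanduser(),
        Path("/etc/clickhouse-client/config.xml"),
    ]


def find_client_config(candidates, exists=lambda p: Path(p).is_file()):
    """Return the first candidate that exists, or None if none does."""
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    found = find_client_config(client_config_candidates())
    print(found if found else "no client config file found")
```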
The client can also connect to a remote server:
clickhouse-client --host=example.com
You can also pass any setting to be used for query processing; for example, clickhouse-client --max_threads=1 limits query processing to a single thread.
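When calling the client from scripts, it can help to assemble the arguments programmatically. The sketch below uses only options from the table above, plus settings passed as --name=value as in the --max_threads=1 example; `client_args` and `run_query` are hypothetical helpers, and `run_query` assumes clickhouse-client is on PATH.

```python
import shlex
import subprocess


def client_args(query=None, host="localhost", port=9000, user="default",
                password="", database=None, settings=None):
    """Build a clickhouse-client command line from the documented options."""
    args = ["clickhouse-client", f"--host={host}", f"--port={port}", f"--user={user}"]
    if password:
        args.append(f"--password={password}")
    if database:
        args.append(f"--database={database}")
    if query is not None:
        args.append(f"--query={query}")  # non-interactive mode
    for name, value in (settings or {}).items():
        args.append(f"--{name}={value}")  # e.g. --max_threads=1
    return args


def run_query(query, **kwargs):
    """Run one query non-interactively and return its standard output."""
    result = subprocess.run(client_args(query=query, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_query(...) against a real server.
    print(shlex.join(client_args(query="SELECT 1", host="example.com",
                                 settings={"max_threads": 1})))
```

Passing a list to `subprocess.run` (rather than a shell string) avoids quoting problems when the query contains spaces or quotes.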
root@GlonHo:~# clickhouse-client
ClickHouse client version 1.1.54198.
Connecting to localhost:9000.
Connected to ClickHouse server version 1.1.54198.
:)
:) select 1
SELECT 1
┌─1─┐
│ 1 │
└───┘
1 rows in set. Elapsed: 0.023 sec.
:)
Congratulations, it works!
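Besides the native protocol on port 9000, the server log later in this article shows an HTTP listener on port 8123 (`Listening http://127.0.0.1:8123`). A minimal check over HTTP could look like the sketch below; it assumes the default port and passes the SQL text in the `query` URL parameter of a GET request, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def http_query_url(query, host="localhost", port=8123):
    """Build a GET URL for the HTTP interface, with the SQL in the 'query' parameter."""
    return f"http://{host}:{port}/?{urlencode({'query': query}, quote_via=quote)}"


def http_query(query, **kwargs):
    """Send a query via GET and return the response body as text."""
    with urlopen(http_query_url(query, **kwargs), timeout=5) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires a running server listening on localhost:8123.
    print(http_query("SELECT 1"))
```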
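If you are a Yandex employee, you can use the Yandex.Metrica test data to explore the system's features and performance; instructions for using the test data can be found here. Alternatively, you can use one of the publicly available datasets, see here.
If you are a Yandex employee, you can use the internal ClickHouse mailing list; subscribe to it for announcements, news about new developments, and questions from other users.
Otherwise, you can ask questions on Stack Overflow, join the discussion on Google Groups, or email the developers at clickhouse-feedback@yandex-team.com.
Officially, ClickHouse currently supports only Ubuntu; Red Hat is not mentioned at all. Building and installing it on CentOS 6.9 was very troublesome, and I finally gave up. Later I found a package-based installation method for Red Hat on the official Google Groups (it is also available on GitHub), but I still could not find the matching versions of the dependencies. After some searching it seemed I might need to recompile the kernel, so I gave up on that too. In the end I ran the experiment on Ubuntu 14.04 LTS.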
CentOS:
[root@GlonHo ~]# grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
SSE 4.2 supported
[root@GlonHo ~]# yum-config-manager --add-repo http://repo.red-soft.biz/repos/clickhouse/repo/clickhouse-el6.repo
bash: yum-config-manager: command not found
[root@GlonHo ~]# yum -y install yum-utils
...
[root@GlonHo ~]# yum-config-manager --add-repo http://repo.red-soft.biz/repos/clickhouse/repo/clickhouse-el6.repo
Loaded plugins: fastestmirror
adding repo from: http://repo.red-soft.biz/repos/clickhouse/repo/clickhouse-el6.repo
grabbing file http://repo.red-soft.biz/repos/clickhouse/repo/clickhouse-el6.repo to /etc/yum.repos.d/clickhouse-el6.repo
clickhouse-el6.repo | 165 B 00:00
repo saved to /etc/yum.repos.d/clickhouse-el6.repo
[root@GlonHo ~]# yum install clickhouse-server clickhouse-client clickhouse-server-common clickhouse-compressor -y
Loaded plugins: fastestmirror
Setting up Install Process
Loading mirror speeds from cached hostfile
 * base: mirrors.zju.edu.cn
 * epel: mirrors.tuna.tsinghua.edu.cn
 * extras: mirrors.zju.edu.cn
 * updates: mirrors.163.com
base | 3.7 kB 00:00
clickhouse | 2.9 kB 00:02
extras | 3.4 kB 00:00
updates | 3.4 kB 00:00
Package clickhouse-server-common-1.1.54198-3.el6.x86_64 already installed and latest version
Package clickhouse-compressor-1.1.54198-3.el6.x86_64 already installed and latest version
Resolving Dependencies
--> Running transaction check
---> Package clickhouse-client.x86_64 0:1.1.54198-3.el6 will be installed
---> Package clickhouse-server.x86_64 0:1.1.54198-3.el6 will be installed
--> Processing Dependency: libbfd-2.20.51.0.2-5.44.el6.so()(64bit) for package: clickhouse-server-1.1.54198-3.el6.x86_64
--> Finished Dependency Resolution
Error: Package: clickhouse-server-1.1.54198-3.el6.x86_64 (clickhouse)
 Requires: libbfd-2.20.51.0.2-5.44.el6.so()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
Ubuntu:
root@GlonHo:~# grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
SSE 4.2 supported
root@GlonHo:/etc/apt/sources.list.d# vim clickhouse.list
deb http://repo.yandex.ru/clickhouse/trusty stable main
root@GlonHo:~# apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4
Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --homedir /tmp/tmp.IoJhY8ePkd --no-auto-check-trustdb --trust-model always --keyring /etc/apt/trusted.gpg --primary-keyring /etc/apt/trusted.gpg --keyserver keyserver.ubuntu.com --recv E0C56BD4
gpg: requesting key E0C56BD4 from hkp server keyserver.ubuntu.com
gpg: key E0C56BD4: public key "ClickHouse Repository Key" imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
root@GlonHo:~# apt-get update
Hit http://security.ubuntu.com trusty-security InRelease
Ign http://archive.ubuntu.com trusty InRelease
...
Reading package lists... Done
root@GlonHo:~#
root@GlonHo:~# apt-get install clickhouse-client clickhouse-server-common
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  acl at-spi2-core colord dconf-gsettings-backend dconf-service fontconfig fontconfig-config
  fonts-dejavu-core hicolor-icon-theme libasound2 libasound2-data libatk-bridge2.0-0 libatk1.0-0
  libatk1.0-data libatspi2.0-0 libavahi-client3 libavahi-common-data libavahi-common3
  libcairo-gobject2 libcairo2 libcanberra-gtk3-0 libcanberra-gtk3-module libcanberra0 libcolord1
  libcolorhug1 libcups2 libdatrie1 libdconf1 libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libexif12
  libfontconfig1 libfontenc1 libgd3 libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-common libgl1-mesa-dri
  libgl1-mesa-glx libglapi-mesa libgphoto2-6 libgphoto2-l10n libgphoto2-port10 libgraphite2-3
  libgtk-3-0 libgtk-3-bin libgtk-3-common libgudev-1.0-0 libgusb2 libharfbuzz0b libice6
  libieee1284-3 libjasper1 libjbig0 libjpeg-turbo8 libjpeg8 liblcms2-2 libllvm3.4 libltdl7
  libnotify-bin libnotify4 libogg0 libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0
  libpciaccess0 libpixman-1-0 libsane libsane-common libsm6 libtdb1 libthai-data libthai0 libtiff5
  libtxc-dxtn-s2tc0 libv4l-0 libv4lconvert0 libvorbis0a libvorbisfile3 libvpx1 libwayland-client0
  libwayland-cursor0 libx11-xcb1 libxaw7 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0
  libxcb-render0 libxcb-shm0 libxcb-sync1 libxcomposite1 libxcursor1 libxdamage1 libxfixes3
  libxfont1 libxi6 libxinerama1 libxkbcommon0 libxkbfile1 libxmu6 libxpm4 libxrandr2 libxrender1
  libxshmfence1 libxt6 libxtst6 libxxf86vm1 notification-daemon sound-theme-freedesktop x11-common
  x11-xkb-utils xfonts-base xfonts-encodings xfonts-utils xserver-common xserver-xorg-core
Use 'apt-get autoremove' to remove them.
The following extra packages will be installed:
  clickhouse-server-base
The following NEW packages will be installed:
  clickhouse-client clickhouse-server-base clickhouse-server-common
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
Need to get 198 MB of archives.
After this operation, 632 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://repo.yandex.ru/clickhouse/trusty/ stable/main clickhouse-server-base amd64 1.1.54198 [198 MB]
Get:2 http://repo.yandex.ru/clickhouse/trusty/ stable/main clickhouse-client amd64 1.1.54198 [2,674 B]
Get:3 http://repo.yandex.ru/clickhouse/trusty/ stable/main clickhouse-server-common amd64 1.1.54198 [7,578 B]
Fetched 198 MB in 18min 7s (182 kB/s)
Selecting previously unselected package clickhouse-server-base.
(Reading database ... 62852 files and directories currently installed.)
Preparing to unpack .../clickhouse-server-base_1.1.54198_amd64.deb ...
Unpacking clickhouse-server-base (1.1.54198) ...
Selecting previously unselected package clickhouse-client.
Preparing to unpack .../clickhouse-client_1.1.54198_amd64.deb ...
Unpacking clickhouse-client (1.1.54198) ...
Selecting previously unselected package clickhouse-server-common.
Preparing to unpack .../clickhouse-server-common_1.1.54198_amd64.deb ...
Unpacking clickhouse-server-common (1.1.54198) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up clickhouse-server-base (1.1.54198) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up clickhouse-client (1.1.54198) ...
Setting up clickhouse-server-common (1.1.54198) ...
root@GlonHo:~#
root@GlonHo:~# top
Tasks: 90 total, 2 running, 88 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.2 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 2049856 total, 1147260 used, 902596 free, 16004 buffers
KiB Swap: 0 total, 0 used, 0 free. 960556 cached Mem
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 1262 root 20 0 112712 34552 1112 S 0.0 1.7 0:00.00 ruby
 1196 root 20 0 182364 34428 2728 S 0.0 1.7 0:00.77 puppet
 2368 clickho+ 20 0 241336 12524 3504 S 0.0 0.6 0:00.04 clickhouse-serv
 1973 root 20 0 107720 4216 3220 S 0.0 0.2 0:00.02 sshd
root@GlonHo:~# ls /var/log/clickhouse-server/
clickhouse-server.err.log clickhouse-server.log stderr stdout
root@GlonHo:~# cat /var/log/clickhouse-server/clickhouse-server.err.log
2017.04.24 09:30:03.783101 [ 1 ] ConfigProcessor: Include not found: networks
2017.04.24 09:30:03.783125 [ 1 ] ConfigProcessor: Include not found: networks
2017.04.24 09:30:05.803298 [ 2 ] ConfigProcessor: Include not found: clickhouse_remote_servers
2017.04.24 09:30:05.803353 [ 2 ] ConfigProcessor: Include not found: clickhouse_compression
root@GlonHo:~#
root@GlonHo:~# cat /var/log/clickhouse-server/clickhouse-server.log
2017.04.24 09:30:03.708578 [ 1 ] : Starting daemon with revision 54198
2017.04.24 09:30:03.781176 [ 1 ] Application: starting up
2017.04.24 09:30:03.781650 [ 1 ] Application: rlimit on number of file descriptors is 262144
2017.04.24 09:30:03.781664 [ 1 ] Application: Initializing DateLUT.
2017.04.24 09:30:03.781670 [ 1 ] Application: Initialized DateLUT with time zone `UTC'.
2017.04.24 09:30:03.782226 [ 1 ] Application: Configuration parameter 'interserver_http_host' doesn't exist or exists and empty. Will use 'ubuntu' as replica host.
2017.04.24 09:30:03.782338 [ 1 ] ConfigReloader: Loading config `/etc/clickhouse-server/users.xml'
2017.04.24 09:30:03.783093 [ 1 ] ConfigProcessor: Include not found: networks
2017.04.24 09:30:03.783121 [ 1 ] ConfigProcessor: Include not found: networks
2017.04.24 09:30:03.783472 [ 1 ] Application: Loading metadata.
2017.04.24 09:30:03.783610 [ 1 ] DatabaseOrdinary (default): Total 0 tables.
2017.04.24 09:30:03.783734 [ 1 ] Application: Loaded metadata.
2017.04.24 09:30:03.783848 [ 1 ] DatabaseOrdinary (system): Total 0 tables.
2017.04.24 09:30:03.784376 [ 1 ] Application: Listening http://[::1]:8123
2017.04.24 09:30:03.784420 [ 1 ] Application: Listening tcp: [::1]:9000
2017.04.24 09:30:03.784448 [ 1 ] Application: Listening interserver: [::1]:9009
2017.04.24 09:30:03.784473 [ 1 ] Application: Listening http://127.0.0.1:8123
2017.04.24 09:30:03.784491 [ 1 ] Application: Listening tcp: 127.0.0.1:9000
2017.04.24 09:30:03.784507 [ 1 ] Application: Listening interserver: 127.0.0.1:9009
2017.04.24 09:30:03.784621 [ 1 ] Application: Ready for connections.
2017.04.24 09:30:05.801608 [ 2 ] ConfigReloader: Loading config `/etc/clickhouse-server/config.xml'
2017.04.24 09:30:05.803274 [ 2 ] ConfigProcessor: Include not found: clickhouse_remote_servers
2017.04.24 09:30:05.803348 [ 2 ] ConfigProcessor: Include not found: clickhouse_compression
root@GlonHo:~# cat /var/log/clickhouse-server/stderr
Should logs to /var/log/clickhouse-server/clickhouse-server.log
Should error logs to /var/log/clickhouse-server/clickhouse-server.err.log
root@GlonHo:~# clickhouse-client
ClickHouse client version 1.1.54198.
Connecting to localhost:9000.
Connected to ClickHouse server version 1.1.54198.
:)
:) select 1
SELECT 1
┌─1─┐
│ 1 │
└───┘
1 rows in set. Elapsed: 0.023 sec.
:)
:) select now()
SELECT now()
┌───────────────now()─┐
│ 2017-04-24 09:37:31 │
└─────────────────────┘
1 rows in set. Elapsed: 0.005 sec.
:)
:) Bye.    (press Ctrl+D to exit the client)
root@GlonHo:~# service clickhouse-server stop
Stop clickhouse-server service: DONE
Changes in the server log:
root@GlonHo:~# tail -f /var/log/clickhouse-server/clickhouse-server.log
2017.04.24 09:36:59.286669 [ 3 ] TCPConnectionFactory: TCP Request. Address: 127.0.0.1:36942
2017.04.24 09:36:59.288258 [ 3 ] TCPHandler: Connected ClickHouse client version 1.1.54198, user: default.
2017.04.24 09:37:15.669268 [ 3 ] executeQuery: (from 127.0.0.1:36942) select 1
2017.04.24 09:37:15.678877 [ 3 ] InterpreterSelectQuery: FetchColumns -> Complete
2017.04.24 09:37:15.679000 [ 3 ] executeQuery: Query pipeline:
Expression
Expression
One
2017.04.24 09:37:15.679459 [ 3 ] executeQuery: Read 1 rows, 1.00 B in 0.010 sec., 98 rows/sec., 98.89 B/sec.
2017.04.24 09:37:15.679521 [ 3 ] MemoryTracker: Peak memory usage (for query): 1.00 MiB.
2017.04.24 09:37:15.679541 [ 3 ] MemoryTracker: Peak memory usage (for user): 1.00 MiB.
2017.04.24 09:37:15.679548 [ 3 ] MemoryTracker: Peak memory usage (total): 1.00 MiB.
2017.04.24 09:37:15.679559 [ 3 ] TCPHandler: Processed in 0.011 sec.
2017.04.24 09:37:31.497405 [ 3 ] executeQuery: (from 127.0.0.1:36942) select now()
2017.04.24 09:37:31.497653 [ 3 ] InterpreterSelectQuery: FetchColumns -> Complete
2017.04.24 09:37:31.497976 [ 3 ] executeQuery: Query pipeline:
Expression
Expression
One
2017.04.24 09:37:31.500776 [ 3 ] executeQuery: Read 1 rows, 1.00 B in 0.003 sec., 313 rows/sec., 313.78 B/sec.
2017.04.24 09:37:31.500856 [ 3 ] MemoryTracker: Peak memory usage (for query): 1.00 MiB.
2017.04.24 09:37:31.500872 [ 3 ] MemoryTracker: Peak memory usage (for user): 1.00 MiB.
2017.04.24 09:37:31.500880 [ 3 ] MemoryTracker: Peak memory usage (total): 1.00 MiB.
2017.04.24 09:37:31.500893 [ 3 ] TCPHandler: Processed in 0.004 sec.
2017.04.24 10:04:11.313863 [ 3 ] TCPHandler: Done processing connection.
2017.04.24 10:04:36.359834 [ 4 ] Application: Received termination signal (Terminated)
2017.04.24 10:04:36.359978 [ 1 ] Application: Received termination signal.
2017.04.24 10:04:36.360021 [ 1 ] Application: Waiting for current connections to close.
2017.04.24 10:04:37.000043 [ 1 ] Application: Closed all connections.
2017.04.24 10:04:37.004456 [ 1 ] Application: Shutting down storages.
2017.04.24 10:04:37.005499 [ 1 ] Application: Shutted down storages.
2017.04.24 10:04:37.010125 [ 1 ] Application: Destroyed global context.
2017.04.24 10:04:37.010483 [ 1 ] Application: shutting down
2017.04.24 10:04:37.011206 [ 1 ] Application: Uninitializing subsystem: Logging Subsystem
2017.04.24 10:04:37.011572 [ 4 ] BaseDaemon: Stop SignalListener thread
Author: GlonHo