Method | Oracle | SQL Server | Notes |
Backup & restore | Backup & restore | Backup & restore | Simple and crude; neither real-time nor incremental. |
Log shipping | Standby database (Data Guard) | Database Mirroring, Log Shipping | Read/write access on the standby is restricted. |
Cluster | RAC (Real Application Clusters) | Database Cluster | RAC is complex to configure; a SQL Server cluster has only one active node, and there is only a single copy of the underlying storage. |
View | Materialized View | Indexed Views | The table structure cannot be changed, e.g. no columns can be added. |
Change data capture | CDC (Change Data Capture) | CDC (Change Data Capture) | Not flexible enough: you cannot subscribe to only the events you care about, and the data volume is large. |
Publish/subscribe | OGG (Oracle GoldenGate), Stream Replication, Oracle Advanced Replication | Database Replication, Publish and Subscribe | The most flexible approach, but still constrained: add a column at the OGG source, or let a publication snapshot expire, and you are in for a bad time. |
ElasticSearch 2 Basic Operations (06: Query Conditions and Filters)
Filters
match_all | Matches everything; no filtering is applied. The default. |
term | Exact match on a single term. |
terms | Exact match on any of several terms. |
range | Range match. |
exists | The document contains the given field. |
missing | The document lacks the given field. |
bool | A combination of several filter clauses. |
For the bool filter, the following combining clauses are available:
must | All listed conditions must match; equivalent to AND. |
must_not | None of the listed conditions may match; equivalent to NOT. |
should | At least one of the conditions must match; equivalent to OR. |
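To make the combination concrete, here is a minimal sketch of a bool filter wrapped in a filtered query, reusing the myindex/user data set up later in this series (the field values are only illustrative):

curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must":     { "term": { "性别": "男" } },
                    "must_not": { "term": { "年龄": "25" } },
                    "should":   { "term": { "用户ID": "u001" } }
                }
            }
        }
    }
}'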
Queries
match_all | Matches everything. The default. |
match | Analyzes the query text first, then scores matches with TF/IDF. |
multi_match | Like match, but queries several fields at once. |
bool | A combination of several query clauses. |
For the bool query, the following combining clauses are available:
must | All listed conditions must match; equivalent to AND. |
must_not | None of the listed conditions may match; equivalent to NOT. |
should | At least one of the conditions must match; equivalent to OR. |
# Users whose gender is 男, whose age is not 25, and whose home address ideally mentions 魔都
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "bool": {
            "must":     { "term":  { "性别": "男" } },
            "must_not": { "match": { "年龄": "25" } },
            "should":   { "match": { "家庭住址": "魔都" } }
        }
    }
}'

# Users who registered between 2015-04-01 and 2016-04-01
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "bool": {
            "must": {
                "range": {
                    "注册时间": {
                        "gte": "2015-04-01 00:00:00",
                        "lt":  "2016-04-01 00:00:00"
                    }
                }
            }
        }
    }
}'

# Records that have no 年龄 field
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "bool": {
            "must": { "missing": { "field": "年龄" } }
        }
    }
}'

# Users whose home address or work address mentions 北京
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "multi_match": {
            "query": "北京",
            "type": "most_fields",
            "fields": [ "家庭住址", "工作地址" ]
        }
    }
}'

# Male users (a match_all query plus a term filter)
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "filtered": {
            "query":  { "match_all": {} },
            "filter": { "term": { "性别": "男" } }
        }
    }
}'

# Users who registered within the last two years
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "filtered": {
            "query":  { "match_all": {} },
            "filter": { "range": { "注册时间": { "gt": "now-2y" } } }
        }
    }
}'
Sorting
# All users, sorted by registration time, newest first
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": { "match_all": {} },
    "sort":  { "注册时间": { "order": "desc" } }
}'
Pagination
# The first three records
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": { "match_all": {} },
    "from": 0,
    "size": 3
}'
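The next page is just the next offset: bump from by size. A usage sketch for the second page of three:

curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": { "match_all": {} },
    "from": 3,
    "size": 3
}'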
Pagination with a scroll cursor
# Start a scan/scroll search (quote the URL: it contains "&")
curl -XPOST "http://127.0.0.1:9200/myindex/user/_search?search_type=scan&scroll=5m" -d'
{ "query": { "match_all": {} }, "size": 10 }'

# The response returns a _scroll_id (and no hits yet)
{"_scroll_id":"c2Nhbjs1OzE1MzE6NVR2MmE1WWFRRHFtelVGYlRwNGlhdzsxNTMzOjVUdjJhNVlhUURxbXpVRmJUcDRpYXc7MTUzNDo1VHYyYTVZYVFEcW16VUZiVHA0aWF3OzE1MzU6NVR2MmE1WWFRRHFtelVGYlRwNGlhdzsxNTMyOjVUdjJhNVlhUURxbXpVRmJUcDRpYXc7MTt0b3RhbF9oaXRzOjc7","took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":7,"max_score":0.0,"hits":[]}}

# Post the _scroll_id back to fetch the next batch
curl -XPOST "http://127.0.0.1:9200/_search/scroll?scroll=5m" -d'c2Nhbjs1OzE1MzE6NVR2MmE1WWFRRHFtelVGYlRwNGlhdzsxNTMzOjVUdjJhNVlhUURxbXpVRmJUcDRpYXc7MTUzNDo1VHYyYTVZYVFEcW16VUZiVHA0aWF3OzE1MzU6NVR2MmE1WWFRRHFtelVGYlRwNGlhdzsxNTMyOjVUdjJhNVlhUURxbXpVRmJUcDRpYXc7MTt0b3RhbF9oaXRzOjc7'
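Note that search_type=scan was deprecated in ES 2.1; a scroll sorted by _doc is the recommended equivalent. A sketch against the same index (unlike scan, the first response already carries hits):

curl -XPOST "http://127.0.0.1:9200/myindex/user/_search?scroll=5m" -d'
{ "query": { "match_all": {} }, "sort": [ "_doc" ], "size": 10 }'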
ElasticSearch 2 Basic Operations (05: Search)
ES search is not the LIKE of a relational database: it works by scoring how relevant each document is to the search conditions.
For a given search, every matching document carries a floating-point field, _score, expressing its relevance to the search topic; the higher the _score, the more relevant the document.
How that score is computed depends on the query type:
a fuzzy query scores how closely the text matches the search term's spelling;
a terms query scores what percentage of the search terms were found in the content;
and full-text search scores how similar the content is to the search terms overall.
ES uses TF/IDF (Term Frequency/Inverse Document Frequency) as its relevance measure, which boils down to three factors:
Term frequency (TF): within one record, the more often the search terms appear in the queried field, the higher the relevance. Say there are five search terms: if four of them appear in the first record and three in the second, the first record's TF is higher.
Inverse document frequency (IDF): the more documents (in that field) a search term appears in, the lower that term's weight. Of five search terms, one that occurs in every document counts for almost nothing, while one that occurs in just a single document carries a lot of weight.
Field-length norm: the longer the queried field, the lower the relevance. If one record's field is 10 words long and another's is 100, and a search term appears once in each, the 10-word record scores far higher.
Understanding TF/IDF lets you explain results that look wrong at first sight. Just remember that this is not an exact-matching algorithm but a scoring one: results are ordered by relevance.
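For reference, Lucene's classic similarity (the scorer behind ES 2.x) combines these three factors roughly as follows; this is a simplified sketch that leaves out boosts and a few normalization details:

\[
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot\sum_{t \in q}\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{norm}(t,d)
\]
\[
\mathrm{tf}(t,d)=\sqrt{\mathrm{freq}(t,d)},\qquad \mathrm{idf}(t)=1+\ln\frac{N}{n_t+1},\qquad \mathrm{norm}(t,d)\propto\frac{1}{\sqrt{\text{field length}}}
\]

Here N is the total number of documents and n_t the number of documents containing term t; you can spot exactly these tf, idf and fieldNorm factors multiplied together in the explain output below.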
If a score seems unreasonable, the following request shows how it was computed:
# Explain how the query was scored
curl -XPOST http://127.0.0.1:9200/myindex/user/_search?explain -d'
{ "query" : { "match" : { "家庭住址" : "魔都大街" }} }'

# The response:
{ "took": 7, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 4, "max_score": 4, "hits": [
{ "_shard": 4, "_node": "5Tv2a5YaQDqmzUFbTp4iaw", "_index": "myindex", "_type": "user", "_id": "u002", "_score": 4, "_source": { "用户ID": "u002", "姓名": "李四", "性别": "男", "年龄": "25", "家庭住址": "上海市闸北区魔都大街007号", "注册时间": "2015-02-01 08:30:00" }, "_explanation": { "value": 4, "description": "sum of:", "details": [ { "value": 4, "description": "sum of:", "details": [ { "value": 1, "description": "weight(家庭住址:魔 in 0) [PerFieldSimilarity], result of:", "details": [ { "value": 1, "description": "score(doc=0,freq=1.0), product of:", "details": [ { "value": 0.5, "description": "queryWeight, product of:", "details": [ { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 0.5, "description": "queryNorm", "details": [] } ] }, { "value": 2, "description": "fieldWeight in 0, product of:", "details": [ { "value": 1, "description": "tf(freq=1.0), with freq of:", "details": [ { "value": 1, "description": "termFreq=1.0", "details": [] } ] }, { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 2, "description": "fieldNorm(doc=0)", "details": [] } ] } ] } ] }, { "value": 1, "description": "weight(家庭住址:都 in 0) [PerFieldSimilarity], result of:", "details": [ { "value": 1, "description": "score(doc=0,freq=1.0), product of:", "details": [ { "value": 0.5, "description": "queryWeight, product of:", "details": [ { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 0.5, "description": "queryNorm", "details": [] } ] }, { "value": 2, "description": "fieldWeight in 0, product of:", "details": [ { "value": 1, "description": "tf(freq=1.0), with freq of:", "details": [ { "value": 1, "description": "termFreq=1.0", "details": [] } ] }, { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 2, "description": "fieldNorm(doc=0)", "details": [] } ] } ] } ] }, { "value": 1, "description": "weight(家庭住址:大街 in 0) [PerFieldSimilarity], result of:", "details": [ { "value": 1, "description": "score(doc=0,freq=1.0), product of:", "details": [ { "value": 0.5, "description": "queryWeight, product of:", "details": [ { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 0.5, "description": "queryNorm", "details": [] } ] }, { "value": 2, "description": "fieldWeight in 0, product of:", "details": [ { "value": 1, "description": "tf(freq=1.0), with freq of:", "details": [ { "value": 1, "description": "termFreq=1.0", "details": [] } ] }, { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 2, "description": "fieldNorm(doc=0)", "details": [] } ] } ] } ] }, { "value": 1, "description": "weight(家庭住址:街 in 0) [PerFieldSimilarity], result of:", "details": [ { "value": 1, "description": "score(doc=0,freq=1.0), product of:", "details": [ { "value": 0.5, "description": "queryWeight, product of:", "details": [ { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 0.5, "description": "queryNorm", "details": [] } ] }, { "value": 2, "description": "fieldWeight in 0, product of:", "details": [ { "value": 1, "description": "tf(freq=1.0), with freq of:", "details": [ { "value": 1, "description": "termFreq=1.0", "details": [] } ] }, { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 2, "description": "fieldNorm(doc=0)", "details": [] } ] } ] } ] } ] }, { "value": 0, "description": "match on required clause, product of:", "details": [ { "value": 0, "description": "# clause", "details": [] }, { "value": 0.5, "description": "_type:user, product of:", "details": [ { "value": 1, "description": "boost", "details": [] }, { "value": 0.5, "description": "queryNorm", "details": [] } ] } ] } ] } },
{ "_shard": 0, "_node": "5Tv2a5YaQDqmzUFbTp4iaw", "_index": "myindex", "_type": "user", "_id": "u003", "_score": 0.71918744, "_source": { "用户ID": "u003", "姓名": "王五", "性别": "男", "年龄": "26", "家庭住址": "广州市花都区花城大街010号", "注册时间": "2015-03-01 08:30:00" }, "_explanation": { "value": 0.71918744, "description": "sum of:", "details": [ { "value": 0.71918744, "description": "product of:", "details": [ { "value": 1.4383749, "description": "sum of:", "details": [ { "value": 0.71918744, "description": "weight(家庭住址:大街 in 0) [PerFieldSimilarity], result of:", "details": [ { "value": 0.71918744, "description": "score(doc=0,freq=1.0), product of:", "details": [ { "value": 0.35959372, "description": "queryWeight, product of:", "details": [ { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 0.35959372, "description": "queryNorm", "details": [] } ] }, { "value": 2, "description": "fieldWeight in 0, product of:", "details": [ { "value": 1, "description": "tf(freq=1.0), with freq of:", "details": [ { "value": 1, "description": "termFreq=1.0", "details": [] } ] }, { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 2, "description": "fieldNorm(doc=0)", "details": [] } ] } ] } ] }, { "value": 0.71918744, "description": "weight(家庭住址:街 in 0) [PerFieldSimilarity], result of:", "details": [ { "value": 0.71918744, "description": "score(doc=0,freq=1.0), product of:", "details": [ { "value": 0.35959372, "description": "queryWeight, product of:", "details": [ { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 0.35959372, "description": "queryNorm", "details": [] } ] }, { "value": 2, "description": "fieldWeight in 0, product of:", "details": [ { "value": 1, "description": "tf(freq=1.0), with freq of:", "details": [ { "value": 1, "description": "termFreq=1.0", "details": [] } ] }, { "value": 1, "description": "idf(docFreq=1, maxDocs=2)", "details": [] }, { "value": 2, "description": "fieldNorm(doc=0)", "details": [] } ] } ] } ] } ] }, { "value": 0.5, "description": "coord(2/4)", "details": [] } ] }, { "value": 0, "description": "match on required clause, product of:", "details": [ { "value": 0, "description": "# clause", "details": [] }, { "value": 0.35959372, "description": "_type:user, product of:", "details": [ { "value": 1, "description": "boost", "details": [] }, { "value": 0.35959372, "description": "queryNorm", "details": [] } ] } ] } ] } },
...... ] } }
As you can see, not only the 魔都大街 record came back; anything containing 大街 matched as well. The explanation also shows exactly why "u002" ranks first.
There is one more use: ES can tell you where a query is wrong:
curl -XPOST http://127.0.0.1:9200/myindex/user/_validate/query?explain -d'
{ "query" : { "matchA" : { "家庭住址" : "魔都大街" }} }'

{ "valid": false, "_shards": { "total": 1, "successful": 1, "failed": 0 }, "explanations": [ { "index": "myindex", "valid": false, "error": "org.elasticsearch.index.query.QueryParsingException: No query registered for [matchA]" } ] }
ES points straight at matchA as the problem.
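For a valid query, the same endpoint describes how the query is rewritten at the Lucene level (a usage sketch; the exact explanation text depends on the analyzer):

curl -XPOST http://127.0.0.1:9200/myindex/user/_validate/query?explain -d'
{ "query" : { "match" : { "家庭住址" : "魔都大街" }} }'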
ElasticSearch 2 Basic Operations (04: Analysis)
Getting a feel for it by now? Then let's step back and look at the most basic building blocks.
The common data types in ES:
Type | Data types |
String | string |
Integer | byte, short, integer, long |
Floating point | float, double |
Boolean | boolean |
Date | date |
Object | object |
Nested structure | nested |
Geo point (latitude/longitude) | geo_point |
The common field analysis modes:
Analysis mode | Meaning |
analyzed | Analyze the string first, then index it; in other words, index the field as full text. |
not_analyzed | Index the field so it is searchable, but index exactly the value given, without analysis. |
no | Do not index the field at all; it cannot be searched. |
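In a mapping, the three modes look like this; a sketch in which the sample type and its field names are made up for illustration:

curl -XPOST http://localhost:9200/myindex/sample/_mapping -d'
{
    "sample": {
        "properties": {
            "title":    { "type": "string", "index": "analyzed" },
            "tag":      { "type": "string", "index": "not_analyzed" },
            "internal": { "type": "string", "index": "no" }
        }
    }
}'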
Now let's put the analyzers to the test.
1. First, analyze some text with the standard analyzer:
curl -XPOST "http://localhost:9200/_analyze?analyzer=standard&text=小明同学大吃一惊"

{ "tokens": [ { "token": "小", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 }, { "token": "明", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 }, { "token": "同", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 }, { "token": "学", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 }, { "token": "大", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4 }, { "token": "吃", "start_offset": 5, "end_offset": 6, "type": "<IDEOGRAPHIC>", "position": 5 }, { "token": "一", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6 }, { "token": "惊", "start_offset": 7, "end_offset": 8, "type": "<IDEOGRAPHIC>", "position": 7 } ] }
2. Then compare with the IK analyzer:
curl -XGET "http://localhost:9200/_analyze?analyzer=ik&text=小明同学大吃一惊"

{ "tokens": [ { "token": "小明", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 }, { "token": "同学", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1 }, { "token": "大吃一惊", "start_offset": 4, "end_offset": 8, "type": "CN_WORD", "position": 2 }, { "token": "大吃", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 3 }, { "token": "吃", "start_offset": 5, "end_offset": 6, "type": "CN_WORD", "position": 4 }, { "token": "一惊", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 5 }, { "token": "一", "start_offset": 6, "end_offset": 7, "type": "TYPE_CNUM", "position": 6 }, { "token": "惊", "start_offset": 7, "end_offset": 8, "type": "CN_CHAR", "position": 7 } ] }
3. Analyze text the way the 家庭住址 field does:
curl -XGET "http://localhost:9200/myindex/_analyze?field=家庭住址&text=我爱北京天安门"

{ "tokens": [ { "token": "我", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0 }, { "token": "爱", "start_offset": 1, "end_offset": 2, "type": "CN_CHAR", "position": 1 }, { "token": "北京", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2 }, { "token": "京", "start_offset": 3, "end_offset": 4, "type": "CN_WORD", "position": 3 }, { "token": "天安门", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 4 }, { "token": "天安", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 5 }, { "token": "门", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 6 } ] }
4. Analyze text the way the 性别 field does:
curl -XGET "http://localhost:9200/myindex/_analyze?field=性别&text=我爱北京天安门"

{ "tokens": [ { "token": "我爱北京天安门", "start_offset": 0, "end_offset": 7, "type": "word", "position": 0 } ] }
As you can see, different analyzers target different scenarios and languages, so choose one that fits your data.
Beyond that, picking the right analysis mode and analyzer for each field gets you twice the result for half the effort.
ElasticSearch 2 Basic Operations (03: CRUD over REST)
Continuing from the previous post:
11. Bulk operations
The _bulk API packs several operations into one request, each action line followed by its optional request body:
curl -XPOST http://localhost:9200/_bulk -d'
{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
'
Action | Description |
create | Create the document only when it does not already exist. |
index | Create a new document or replace an existing one. |
update | Partially update a document. |
delete | Delete a document. |
For example, the following request:
- first deletes a document
- then creates it anew
- then replaces it wholesale
- and finally applies a partial update
curl -XPOST http://localhost:9200/_bulk -d'
{ "delete": { "_index": "myindex", "_type": "user", "_id": "u004" }}
{ "create": { "_index": "myindex", "_type": "user", "_id": "u004" }}
{"用户ID": "u004","姓名":"赵六","性别":"男","年龄":"27","家庭住址":"深圳市龙岗区特区大街011号","注册时间":"2015-04-01 08:30:00"}
{ "index": { "_index": "myindex", "_type": "user", "_id": "u004" }}
{"用户ID": "u004","姓名":"赵六","性别":"男","年龄":"28","家庭住址":"深圳市龙岗区特区大街012号","注册时间":"2015-04-01 08:30:00"}
{ "update": { "_index": "myindex", "_type": "user", "_id": "u004"} }
{ "doc" : {"年龄" : "28"}}
'
The result follows. Note that the partial update was not executed; the most likely cause is that the bulk body must end with a trailing newline, and a final action pair that is not newline-terminated gets silently dropped. Ending the body with a newline, as shown above, avoids this.
{ "took": 406, "errors": false, "items": [ { "delete": { "_index": "myindex", "_type": "user", "_id": "u004", "_version": 10, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 200, "found": true } }, { "create": { "_index": "myindex", "_type": "user", "_id": "u004", "_version": 11, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 201 } }, { "index": { "_index": "myindex", "_type": "user", "_id": "u004", "_version": 12, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 200 } } ] }
ElasticSearch 2 Basic Operations (02: CRUD over REST)
Continuing from the previous post:
7. Create and update documents
# Create u004
curl -XPUT http://localhost:9200/myindex/user/u004 -d'
{
    "用户ID": "u004",
    "姓名":"赵六",
    "性别":"男",
    "年龄":"27",
    "家庭住址":"深圳市龙岗区特区大街011号",
    "注册时间":"2015-04-01 08:30:00"
}'

# Update u004 (the same PUT replaces the whole document)
curl -XPUT http://localhost:9200/myindex/user/u004 -d'
{
    "用户ID": "u004",
    "姓名":"赵六",
    "性别":"男",
    "年龄":"27",
    "家庭住址":"深圳市龙岗区特区大街011号",
    "注册时间":"2015-04-01 08:30:00"
}'

# Force-create u004; this fails if the document already exists
curl -XPUT http://localhost:9200/myindex/user/u004/_create -d'
{
    "用户ID": "u004",
    "姓名":"赵六",
    "性别":"男",
    "年龄":"27",
    "家庭住址":"深圳市龙岗区特区大街012号",
    "注册时间":"2015-04-01 08:30:00"
}'
The responses:
# Created; version is 1
{ "_index": "myindex", "_type": "user", "_id": "u004", "_version": 1, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": true }

# Updated; version is now 2
{ "_index": "myindex", "_type": "user", "_id": "u004", "_version": 2, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": false }

# The forced create failed
Http Error: Conflict
8. Delete the document, and note how the version number changes:
# Delete the document
curl -XDELETE http://localhost:9200/myindex/user/u004
9. Create it again, apply a partial update, and watch the version change:
# Create the document again
curl -XPUT http://localhost:9200/myindex/user/u004 -d'
{
    "用户ID": "u004",
    "姓名":"赵六",
    "性别":"男",
    "年龄":"27",
    "家庭住址":"深圳市龙岗区特区大街011号",
    "注册时间":"2015-04-01 08:30:00"
}'

# Partial update
curl -XPOST http://localhost:9200/myindex/user/u004/_update -d'
{ "doc": { "家庭住址": "深圳市龙岗区特区大街013号" } }'

# Retrieve it
curl -XGET http://localhost:9200/myindex/user/u004
10. Multi-get (_mget)
# Spell out index, type and id for each document
curl -XGET http://localhost:9200/_mget -d'
{
    "docs" : [
        { "_index" : "myindex", "_type" : "user", "_id" : "u001" },
        { "_index" : "myindex", "_type" : "user", "_id" : "u002", "_source": "家庭住址" }
    ]
}'

# Same index for all documents
curl -XGET http://localhost:9200/myindex/_mget -d'
{
    "docs" : [
        { "_type" : "user", "_id" : "u001" },
        { "_type" : "user", "_id" : "u002" }
    ]
}'

# Same index and type
curl -XGET http://localhost:9200/myindex/user/_mget -d'
{ "ids" : [ "u001", "u002" ] }'
ElasticSearch 2 Basic Operations (01: CRUD over REST)
First, recalibrate your concepts. Coming from a relational database, you can map things roughly like this for now:
Relational DB | Elasticsearch |
Databases | Indexes |
Tables | Types |
Rows | Documents |
Columns | Fields |
1. Create the index myindex
curl -XPUT http://localhost:9200/myindex
2. Create the type user
curl -XPOST http://localhost:9200/myindex/user/_mapping -d'
{
    "user": {
        "_all": {
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "用户ID":   { "type": "string",  "store": "no", "analyzer": "keyword",     "search_analyzer": "keyword",     "include_in_all": "true", "boost": 8 },
            "姓名":     { "type": "string",  "store": "no", "term_vector": "with_positions_offsets", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word", "include_in_all": "true", "boost": 8 },
            "性别":     { "type": "string",  "store": "no", "analyzer": "keyword",     "search_analyzer": "keyword",     "include_in_all": "true", "boost": 8 },
            "年龄":     { "type": "integer", "store": "no", "index": "not_analyzed",   "include_in_all": "true", "boost": 8 },
            "家庭住址": { "type": "string",  "store": "no", "term_vector": "with_positions_offsets", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word", "include_in_all": "true", "boost": 8 },
            "注册时间": { "type": "date",    "format": "yyyy-MM-dd HH:mm:ss", "store": "no", "index": "not_analyzed", "include_in_all": "true", "boost": 8 }
        }
    }
}'
The type user mixes several indexing configurations:
Field | Type | Analysis |
用户ID | string | keyword |
姓名 | string | ik_max_word |
性别 | string | keyword |
年龄 | integer | not_analyzed |
家庭住址 | string | ik_max_word |
注册时间 | date | not_analyzed |
Here,
ik_max_word analyzes the value with the IK analyzer and indexes the resulting tokens as terms; fields that need tokenized full-text search should be handled this way.
keyword skips analysis and indexes the whole value as a single term; IDs and dictionary-style values are a natural fit.
not_analyzed applies no analysis at all; for numbers and timestamps there is nothing to tokenize.
3. Upload documents
curl -XPUT http://localhost:9200/myindex/user/u001 -d'
{
    "用户ID": "u001",
    "姓名":"张三",
    "性别":"男",
    "年龄":"25",
    "家庭住址":"北京市崇文区天朝大街001号",
    "注册时间":"2015-01-01 08:30:00"
}'
curl -XPUT http://localhost:9200/myindex/user/u002 -d'
{
    "用户ID": "u002",
    "姓名":"李四",
    "性别":"男",
    "年龄":"25",
    "家庭住址":"上海市闸北区魔都大街007号",
    "注册时间":"2015-02-01 08:30:00"
}'
curl -XPUT http://localhost:9200/myindex/user/u003 -d'
{
    "用户ID": "u003",
    "姓名":"王五",
    "性别":"男",
    "年龄":"26",
    "家庭住址":"广州市花都区花城大街010号",
    "注册时间":"2015-03-01 08:30:00"
}'
4. Check whether a document exists
# Check whether the document with id u003 exists
curl -XHEAD http://localhost:9200/myindex/user/u003
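HEAD returns no body, so the interesting part is the status line; curl's -I flag sends a HEAD request and prints the response headers (200 = exists, 404 = not found):

curl -I http://localhost:9200/myindex/user/u003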
5. Fetch a document
# Fetch the document with id u003
curl -XGET http://localhost:9200/myindex/user/u003

# Fetch only the 姓名 and 性别 fields of document u003
curl -XGET "http://localhost:9200/myindex/user/u003?_source=姓名,性别"
6. Search for documents
# Search; by default the first 10 hits are returned
curl -XGET http://localhost:9200/myindex/user/_search

# Query-string searches
# Records with age 25
curl -XGET "http://localhost:9200/myindex/user/_search?q=年龄:25"
# Records with name 王五
curl -XGET "http://localhost:9200/myindex/user/_search?q=姓名:王五"
# Records with name 王五 and age 26
curl -XGET "http://localhost:9200/myindex/user/_search?q=+姓名:王五+年龄:26"

# Users aged 25
curl -XGET http://localhost:9200/myindex/user/_search -d'
{ "query" : { "match" : { "年龄" : "25" } } }'

# Male users older than 25
curl -XGET http://localhost:9200/myindex/user/_search -d'
{
    "query": {
        "filtered": {
            "filter": { "range": { "年龄": { "gt": 25 } } },
            "query":  { "match": { "性别": "男" } }
        }
    }
}'

# Users whose home address mentions 北京 or 上海
curl -XGET http://localhost:9200/myindex/user/_search -d'
{ "query" : { "match" : { "家庭住址" : "北京 上海" } } }'

# Phrase query
curl -XGET http://localhost:9200/myindex/user/_search -d'
{ "query" : { "match_phrase" : { "家庭住址" : "北京 崇文" } } }'

# Aggregate: group by age and count
curl -XGET http://localhost:9200/myindex/user/_search -d'
{ "aggs": { "all_interests": { "terms": { "field": "年龄" } } } }'

# Male users only, grouped by age and counted
curl -XGET http://localhost:9200/myindex/user/_search -d'
{
    "query": { "match": { "性别": "男" } },
    "aggs":  { "all_interests": { "terms": { "field": "年龄" } } }
}'
Handy ElasticSearch 2 Plugins
1. Installing common plugins online
# head
bin\plugin install mobz/elasticsearch-head
# gui
bin\plugin install jettro/elasticsearch-gui
# bigdesk
#bin\plugin install lukas-vlcek/bigdesk
bin\plugin install hlstudio/bigdesk
# kopf
bin\plugin install lmenezes/elasticsearch-kopf
# carrot2
bin\plugin install org.carrot2/elasticsearch-carrot2/2.2.1
# inquisitor
bin\plugin install polyfractal/elasticsearch-inquisitor
2. Installing plugins offline
# Every plugin above can also be downloaded by hand and installed offline from the command line
bin\plugin install file:///PATH_TO_PLUGIN/PLUGIN.zip
3. Installing the analysis plugins by hand
# Download a release build from one of the repositories below, unzip it into ES's plugins directory, then restart ES
https://github.com/medcl/elasticsearch-analysis-ik
https://github.com/medcl/elasticsearch-analysis-pinyin
https://github.com/medcl/elasticsearch-analysis-mmseg
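To confirm what actually got loaded, the _cat API lists the plugins on each node (node names will differ):

curl http://localhost:9200/_cat/plugins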
A Simple eXistDB Trigger Example (03)
- point the XCONF file at an XQuery file path
- embed the XQuery inside the XCONF file
- point the XCONF file at a Java class

The third approach: an XCONF file tells eXistDB which operations on which collection fire the trigger, and the trigger itself points at a Java class.
1. First, write the trigger's Java class, package it as a jar, and place it under %existdb_home%\lib\user
TriggerTest.java
package com.neohope.existdb.test;

import org.exist.collections.Collection;
import org.exist.collections.triggers.DocumentTrigger;
import org.exist.collections.triggers.SAXTrigger;
import org.exist.collections.triggers.TriggerException;
import org.exist.dom.DocumentImpl;
import org.exist.dom.NodeSet;
import org.exist.security.PermissionDeniedException;
import org.exist.security.xacml.AccessContext;
import org.exist.storage.DBBroker;
import org.exist.storage.txn.Txn;
import org.exist.xmldb.XmldbURI;
import org.exist.xquery.CompiledXQuery;
import org.exist.xquery.XPathException;
import org.exist.xquery.XQueryContext;

import java.util.ArrayList;
import java.util.Map;

/**
 * Logs every document event on the configured collection into
 * /db/Triggers/logj.xml (the file name can be overridden via the
 * LogFileName trigger parameter).
 */
public class TriggerTest extends SAXTrigger implements DocumentTrigger {

    private String logCollection = "xmldb:exist:///db/Triggers";
    private String logFileName = "logj.xml";
    private String logUri;

    @Override
    public void configure(DBBroker broker, Collection parent, Map parameters) throws TriggerException {
        super.configure(broker, parent, parameters);
        // Pick up the optional LogFileName parameter from collection.xconf
        ArrayList<String> objList = (ArrayList<String>) parameters.get("LogFileName");
        if (objList != null && objList.size() > 0) {
            logFileName = objList.get(0);
        }
        logUri = logCollection + "/" + logFileName;
    }

    @Override
    public void beforeCreateDocument(DBBroker broker, Txn transaction, XmldbURI uri) throws TriggerException {
        LogEvent(broker, uri.toString(), "beforeCreateDocument");
    }

    @Override
    public void afterCreateDocument(DBBroker broker, Txn transaction, DocumentImpl document) throws TriggerException {
        LogEvent(broker, document.getDocumentURI(), "afterCreateDocument");
    }

    @Override
    public void beforeUpdateDocument(DBBroker broker, Txn transaction, DocumentImpl document) throws TriggerException {
        LogEvent(broker, document.getDocumentURI(), "beforeUpdateDocument");
    }

    @Override
    public void afterUpdateDocument(DBBroker broker, Txn transaction, DocumentImpl document) throws TriggerException {
        LogEvent(broker, document.getDocumentURI(), "afterUpdateDocument");
    }

    @Override
    public void beforeMoveDocument(DBBroker broker, Txn transaction, DocumentImpl document, XmldbURI newUri) throws TriggerException {
        LogEvent(broker, document.getDocumentURI(), "beforeMoveDocument");
    }

    @Override
    public void afterMoveDocument(DBBroker broker, Txn transaction, DocumentImpl document, XmldbURI newUri) throws TriggerException {
        LogEvent(broker, document.getDocumentURI(), "afterMoveDocument");
    }

    @Override
    public void beforeCopyDocument(DBBroker broker, Txn transaction, DocumentImpl document, XmldbURI newUri) throws TriggerException {
        LogEvent(broker, document.getDocumentURI(), "beforeCopyDocument");
    }

    @Override
    public void afterCopyDocument(DBBroker broker, Txn transaction, DocumentImpl document, XmldbURI newUri) throws TriggerException {
        LogEvent(broker, document.getDocumentURI(), "afterCopyDocument");
    }

    @Override
    public void beforeDeleteDocument(DBBroker broker, Txn transaction, DocumentImpl document) throws TriggerException {
        LogEvent(broker, document.getDocumentURI(), "beforeDeleteDocument");
    }

    @Override
    public void afterDeleteDocument(DBBroker broker, Txn transaction, XmldbURI uri) throws TriggerException {
        LogEvent(broker, uri.toString(), "afterDeleteDocument");
    }

    @Override
    public void beforeUpdateDocumentMetadata(DBBroker broker, Txn txn, DocumentImpl document) throws TriggerException {
        LogEvent(broker, document.getDocumentURI(), "beforeUpdateDocumentMetadata");
    }

    @Override
    public void afterUpdateDocumentMetadata(DBBroker broker, Txn txn, DocumentImpl document) throws TriggerException {
        LogEvent(broker, document.getDocumentURI(), "afterUpdateDocumentMetadata");
    }

    // Appends one <trigger/> element per event to the log document
    private void LogEvent(DBBroker broker, String uriFile, String logContent) throws TriggerException {
        String xQuery = "update insert <trigger event=\"" + logContent + "\" uri=\"" + uriFile
                + "\" timestamp=\"{current-dateTime()}\"/> into doc(\"" + logUri + "\")/TriggerLogs";
        try {
            XQueryContext context = broker.getXQueryService().newContext(AccessContext.TRIGGER);
            CreateLogFile(broker, context);
            CompiledXQuery compiled = broker.getXQueryService().compile(context, xQuery);
            broker.getXQueryService().execute(compiled, NodeSet.EMPTY_SET);
        } catch (XPathException e) {
            e.printStackTrace();
        } catch (PermissionDeniedException e) {
            e.printStackTrace();
        }
    }

    // Creates the log document with a <TriggerLogs/> root if it does not exist yet
    private void CreateLogFile(DBBroker broker, XQueryContext context) {
        String xQuery = "if (not(doc-available(\"" + logUri + "\"))) then xmldb:store(\"" + logCollection
                + "\", \"" + logFileName + "\", <TriggerLogs/>) else ()";
        try {
            CompiledXQuery compiled = broker.getXQueryService().compile(context, xQuery);
            broker.getXQueryService().execute(compiled, NodeSet.EMPTY_SET);
        } catch (XPathException e) {
            e.printStackTrace();
        } catch (PermissionDeniedException e) {
            e.printStackTrace();
        }
    }
}
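2. Then register the class in the watched collection's configuration collection. A sketch of what that collection.xconf could look like, following the same conventions as example 02 below; the LogFileName parameter matches what configure() reads above:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <triggers>
        <trigger class="com.neohope.existdb.test.TriggerTest">
            <parameter name="LogFileName" value="logj.xml"/>
        </trigger>
    </triggers>
</collection>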
A Simple eXistDB Trigger Example (02)
- point the XCONF file at an XQuery file path
- embed the XQuery inside the XCONF file
- point the XCONF file at a Java class

The second approach: an XCONF file tells eXistDB which operations on which collection fire the trigger, with the XQuery embedded directly in the XCONF file.
1. In the configuration collection that corresponds to the collection you want to trigger on, add an xconf file; any name works, but the official recommendation is collection.xconf. The configuration collection mirrors the data collection: under /db/system/config/db, create the same collection path as under /db.

For example, to watch the /db/cda02 path, create a collection.xconf under /db/system/config/db/cda02.
collection.xconf
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <triggers>
        <trigger event="create" class="org.exist.collections.triggers.XQueryTrigger">
            <parameter name="query" value="
                xquery version '3.0';

                module namespace trigger='http://exist-db.org/xquery/trigger';
                declare namespace xmldb='http://exist-db.org/xquery/xmldb';

                declare function trigger:before-create-document($uri as xs:anyURI) {
                    local:log-event('before', 'create', 'document', $uri)
                };

                declare function trigger:after-create-document($uri as xs:anyURI) {
                    local:log-event('after', 'create', 'document', $uri)
                };

                declare function trigger:before-delete-document($uri as xs:anyURI) {
                    local:log-event('before', 'delete', 'document', $uri)
                };

                declare function trigger:after-delete-document($uri as xs:anyURI) {
                    local:log-event('after', 'delete', 'document', $uri)
                };

                declare function local:log-event($type as xs:string, $event as xs:string, $object-type as xs:string, $uri as xs:string) {
                    let $log-collection := '/db/Triggers'
                    let $log := 'log02.xml'
                    let $log-uri := concat($log-collection, '/', $log)
                    return (
                        (: util:log does not work at all util:log('warn', 'trigger fired'), :)
                        (: create the log file if it does not exist :)
                        if (not(doc-available($log-uri))) then
                            xmldb:store($log-collection, $log, <triggers/>)
                        else ()
                        ,
                        (: log the trigger details to the log file :)
                        update insert
                            <trigger event='{string-join(($type, $event, $object-type), '-')}' uri='{$uri}' timestamp='{current-dateTime()}'/>
                        into doc($log-uri)/triggers
                    )
                };"/>
        </trigger>
    </triggers>
</collection>