Querying in Elasticsearch (ES)

1. Query optimization

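Use _source to control which fields are returned

By default a search returns the entire _source of every hit. When documents have many fields the response body gets large, which costs serialization and network time. Use _source filtering to return only the fields you need.

# Effect: only user_id and message are returned, so less data is transferred and the response is faster.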
GET logs-*/_search
{
  "_source": ["user_id", "message"],
  "query": {
    "match": { "message": "error" }
  }
}
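
When it is easier to say what to leave out, _source also accepts an object with includes and excludes patterns. A minimal sketch, assuming a hypothetical user_agent field that the original mapping does not mention:

# includes and excludes accept wildcard patterns.
# user_agent is a hypothetical field, used only to show an exclusion.
GET logs-*/_search
{
  "_source": {
    "includes": ["user_*", "message"],
    "excludes": ["user_agent"]
  },
  "query": {
    "match": { "message": "error" }
  }
}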

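Use stored_fields to return only specific fields

If a field is explicitly stored (mapped with "store": true), stored_fields can retrieve it without loading _source. Requested fields that are not stored are ignored, and for most indices _source filtering is the simpler choice.

# Applies when _source is disabled for the index but some fields are stored individually.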
GET logs-*/_search
{
  "stored_fields": ["user_id", "message"],
  "query": {
    "match": { "message": "error" }
  }
}
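
For the request above to return anything, the fields have to be stored at index-creation time. A sketch of such a mapping, assuming a hypothetical index named logs-example. Note that disabling _source also gives up features that depend on it, such as update and reindex, and the exact setting may differ between versions.

# logs-example is a hypothetical index name.
# "store": true keeps a separate copy of each field that stored_fields can read.
PUT logs-example
{
  "mappings": {
    "_source": { "enabled": false },
    "properties": {
      "user_id": { "type": "keyword", "store": true },
      "message": { "type": "text", "store": true }
    }
  }
}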

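Use doc_values to improve sort/aggregation performance

By default, analyzed string fields (text) cannot be used for sorting or aggregations because they have no doc_values. keyword, date, and numeric fields do, so sort and aggregate on those.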

GET logs-*/_search
{
  "sort": [
    { "timestamp": "desc" }
  ]
}

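How to optimize

# Map time fields as date
# Map exact-value strings as keyword rather than text
# Make sure doc_values is enabled on the field (it is by default for keyword and date)
# Note: a mapping update can only add new fields; changing the type of an existing field requires reindexing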
PUT logs-*/_mapping
{
  "properties": {
    "timestamp": { "type": "date", "format": "epoch_millis" },
    "user_id": { "type": "keyword", "doc_values": true }
  }
}
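
The same doc_values that back sorting also back aggregations. A minimal sketch of a terms aggregation on the keyword field mapped above; the aggregation name by_user is arbitrary.

# "size": 0 skips returning hits, so only the aggregation result comes back.
GET logs-*/_search
{
  "size": 0,
  "aggs": {
    "by_user": {
      "terms": { "field": "user_id", "size": 10 }
    }
  }
}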

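Avoid wildcard; use prefix queries on keyword fields

wildcard queries are expensive, especially with a leading * or ?, because many terms have to be scanned. For "starts with" matching, prefer prefix.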

GET logs-*/_search
{
  "query": {
    "prefix": {
      "user_id": "abc"
    }
  }
}

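Or use wildcard, but only on a keyword field and without a leading wildcard

# user_id is mapped as keyword above, so it is queried directly.
# With default dynamic mapping (text plus a .keyword sub-field), query user_id.keyword instead.
GET logs-*/_search
{
  "query": {
    "wildcard": {
      "user_id": "abc*"
    }
  }
}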

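Use filter instead of must (filter clauses skip score calculation and can be cached)
Use term queries on keyword fields for exact values
Use exists to check whether a field has a value, instead of comparing against null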
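
The three points above can be combined in one bool query. A sketch, in which the user_id value abc123 and the one-day time range are made up for illustration:

# must: full-text match that contributes to the score.
# filter: yes/no conditions that are not scored.
GET logs-*/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "message": "error" } }
      ],
      "filter": [
        { "term": { "user_id": "abc123" } },
        { "exists": { "field": "timestamp" } },
        { "range": { "timestamp": { "gte": "now-1d" } } }
      ]
    }
  }
}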

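When paginating, avoid deep pagination with from + size. For time-ordered data, use search_after with the sort values of the last hit from the previous page.

# search_after holds the sort values of the previous page's last hit.
# date fields sort as epoch milliseconds, so the value is given in milliseconds.
GET logs-*/_search
{
  "size": 10,
  "query": { "match_all": {} },
  "sort": [{ "timestamp": "desc" }],
  "search_after": [1689123456000]
}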
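
Sorting on timestamp alone can skip or repeat hits when several documents share the same millisecond. One way around this, sketched here on the assumption that the cluster is recent enough to support point in time (PIT) searches, is to page inside a PIT, which adds a tiebreaker to the sort values automatically. The PIT id and the search_after values are placeholders that come from earlier responses.

# 1. Open a point in time; the response contains an id.
POST logs-*/_pit?keep_alive=1m

# 2. Search inside the PIT. There is no index in the path because the PIT already fixes the target indices.
#    search_after is copied from the "sort" array of the previous page's last hit (timestamp plus tiebreaker).
GET _search
{
  "size": 10,
  "query": { "match_all": {} },
  "pit": { "id": "<pit id from step 1>", "keep_alive": "1m" },
  "sort": [{ "timestamp": "desc" }],
  "search_after": [1689123456000, 4294967298]
}

# 3. Close the PIT when done.
DELETE _pit
{
  "id": "<pit id from step 1>"
}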

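Use terms to match several values in one query

# status should be a keyword field, since terms matches exact values.
GET logs-*/_search
{
  "query": {
    "terms": { "status": ["active", "pending"] }
  }
}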

2. Resource management

1. Limit size and avoid deep pagination

2. Use scroll to process large result sets

3. Disable _source and return only doc_values
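
The three items above can be sketched as follows. These are illustrations under stated assumptions, not tuned settings: the batch sizes are arbitrary and the scroll id is a placeholder taken from a previous response.

For item 1, keep pages small. By default from + size may not exceed index.max_result_window (10000).

# track_total_hits: false skips counting all matches when only the first page is needed.
GET logs-*/_search
{
  "size": 20,
  "track_total_hits": false,
  "query": { "match": { "message": "error" } }
}

For item 2, a scroll keeps a search context open between batches. Recent Elasticsearch documentation recommends search_after with a point in time for deep pagination, so treat scroll as an option for one-off exports.

# Open a scroll context that stays alive for 1 minute between requests.
# Sorting by _doc is the cheapest order when ranking does not matter.
POST logs-*/_search?scroll=1m
{
  "size": 1000,
  "sort": ["_doc"],
  "query": { "match_all": {} }
}

# Fetch the next batch; repeat until no hits are returned.
POST _search/scroll
{
  "scroll": "1m",
  "scroll_id": "<scroll id from the previous response>"
}

# Release the search context when finished.
DELETE _search/scroll
{
  "scroll_id": "<scroll id from the previous response>"
}

For item 3, docvalue_fields reads values from doc_values, so _source does not have to be loaded at all.

GET logs-*/_search
{
  "_source": false,
  "docvalue_fields": [
    "user_id",
    { "field": "timestamp", "format": "epoch_millis" }
  ],
  "query": { "match": { "message": "error" } }
}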