Custom Benchmarks: Tracks

Definition

A track defines one or more benchmark scenarios. Its detailed structure is described in a separate document.

Creating a track from an existing cluster

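If you already have an Elasticsearch cluster that holds data, you can use Rally's create-track subcommand to create a Rally track from it. For example, to create a track named acme from the products and companies indices of an already deployed cluster, run: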

esrally create-track --track=acme --target-hosts=127.0.0.1:9200 --indices="products,companies" --output-path=~/tracks

To connect to a cluster that has TLS and basic authentication enabled, also specify --client-options:

esrally create-track --track=acme --target-hosts=abcdef123.us-central-1.gcp.cloud.es.io:9243 --client-options="timeout:60,use_ssl:true,verify_certs:true,basic_auth_user:'elastic',basic_auth_password:'secret-password'" --indices="products,companies" --output-path=~/tracks
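
The --client-options value above is a comma-separated list of key:value pairs. As a minimal sketch, the string could be assembled from a dictionary as shown here. The helper name build_client_options is hypothetical, and the quoting rules (bare numbers, lowercase booleans, single-quoted strings) are inferred only from the example command, so values containing commas or quotes are not handled.

```python
def build_client_options(options):
    """Render a dict as a --client-options string (hypothetical helper)."""
    parts = []
    for key, value in options.items():
        if isinstance(value, bool):
            # Booleans appear as lowercase true/false in the example command.
            rendered = "true" if value else "false"
        elif isinstance(value, str):
            # Strings appear single-quoted in the example command.
            rendered = "'" + value + "'"
        else:
            # Numbers appear bare.
            rendered = str(value)
        parts.append(f"{key}:{rendered}")
    return ",".join(parts)


options = build_client_options({
    "timeout": 60,
    "use_ssl": True,
    "verify_certs": True,
    "basic_auth_user": "elastic",
    "basic_auth_password": "secret-password",
})
print(f'--client-options="{options}"')
```

The printed value reproduces the option string used in the command above.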

Change basic_auth_user and basic_auth_password to match your cluster. The track generator creates a folder named after the track in the specified output directory:

> find tracks/acme
tracks/acme
tracks/acme/companies-documents.json
tracks/acme/companies-documents.json.bz2
tracks/acme/companies-documents-1k.json
tracks/acme/companies-documents-1k.json.bz2
tracks/acme/companies.json
tracks/acme/products-documents.json
tracks/acme/products-documents.json.bz2
tracks/acme/products-documents-1k.json
tracks/acme/products-documents-1k.json.bz2
tracks/acme/products.json
tracks/acme/track.json

The files are organized as follows:

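  • track.json contains the actual Rally track.
  • companies.json and products.json contain the mappings and settings of the extracted indices.
  • *-documents.json(.bz2) contain all documents of the extracted indices. The files with the -1k suffix contain a smaller version of the document corpus to support test mode.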
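
The naming pattern in the listing above can be sketched as a small function. This is only an illustration derived from that listing; the function name expected_track_files is hypothetical, and other Rally versions may produce a different set of files.

```python
def expected_track_files(track_name, indices, root="tracks"):
    """List the paths shown in the example listing (hypothetical helper)."""
    base = f"{root}/{track_name}"
    files = [base]
    for index in sorted(indices):
        files += [
            f"{base}/{index}-documents.json",         # full corpus
            f"{base}/{index}-documents.json.bz2",     # full corpus, compressed
            f"{base}/{index}-documents-1k.json",      # small corpus for test mode
            f"{base}/{index}-documents-1k.json.bz2",  # small corpus, compressed
            f"{base}/{index}.json",                   # mappings and settings
        ]
    files.append(f"{base}/track.json")                # the track itself
    return files


for path in expected_track_files("acme", ["products", "companies"]):
    print(path)
```

Comparing this list with the actual directory contents is a quick way to spot an index that was not extracted.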

Creating a track from a data set

We will walk you step by step through building a track named tutorial. We store everything in the ~/rally-tracks/tutorial directory, but you can choose any other location.

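First, get some data. Geonames provides geographic data under a Creative Commons license. Download allCountries.zip (around 300 MB), extract it and inspect allCountries.txt. The file is tab-delimited, but bulk-indexing data into Elasticsearch requires JSON, so convert the data with the following script: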

import json

cols = (("geonameid", "int", True),
        ("name", "string", True),
        ("asciiname", "string", False),
        ("alternatenames", "string", False),
        ("latitude", "double", True),
        ("longitude", "double", True),
        ("feature_class", "string", False),
        ("feature_code", "string", False),
        ("country_code", "string", True),
        ("cc2", "string", False),
        ("admin1_code", "string", False),
        ("admin2_code", "string", False),
        ("admin3_code", "string", False),
        ("admin4_code", "string", False),
        ("population", "long", True),
        ("elevation", "int", False),
        ("dem", "string", False),
        ("timezone", "string", False))


def main():
    with open("allCountries.txt", "rt", encoding="UTF-8") as f:
        for line in f:
            # Each line holds one tab-separated Geonames record.
            tup = line.strip().split("\t")
            record = {}
            for i, (name, col_type, include) in enumerate(cols):
                # Skip empty values and columns that are not part of the mapping.
                if tup[i] != "" and include:
                    if col_type in ("int", "long"):
                        record[name] = int(tup[i])
                    elif col_type == "double":
                        record[name] = float(tup[i])
                    elif col_type == "string":
                        record[name] = tup[i]
            # Emit one JSON document per line.
            print(json.dumps(record, ensure_ascii=False))


if __name__ == "__main__":
    main()

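Save the script as toJSON.py in the tutorial directory (~/rally-tracks/tutorial). It reads allCountries.txt from the current directory, so run it from the directory that contains that file: python3 toJSON.py > documents.json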
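
Before going further it can be worth checking that documents.json really holds one JSON object per line. The following is a minimal sketch of such a check; the function name check_json_lines and the default limit of 1000 lines are assumptions made for illustration and are not part of Rally.

```python
import json


def check_json_lines(path, limit=1000):
    """Parse up to `limit` lines; return how many were checked (hypothetical helper)."""
    checked = 0
    with open(path, "rt", encoding="UTF-8") as f:
        for line_number, line in enumerate(f, start=1):
            if checked >= limit:
                break
            try:
                doc = json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"line {line_number} is not valid JSON: {e}") from e
            # Every line must be a JSON object, not a list or a bare value.
            if not isinstance(doc, dict):
                raise ValueError(f"line {line_number} is not a JSON object")
            checked += 1
    return checked
```

Calling check_json_lines("documents.json") in the tutorial directory raises a ValueError that names the first offending line, or returns the number of lines it inspected.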

Then store the following mapping file as index.json in the tutorial directory:

{
  "settings": {
    "index.number_of_replicas": 0
  },
  "mappings": {
    "docs": {
      "dynamic": "strict",
      "properties": {
        "geonameid": {
          "type": "long"
        },
        "name": {
          "type": "text"
        },
        "latitude": {
          "type": "double"
        },
        "longitude": {
          "type": "double"
        },
        "country_code": {
          "type": "text"
        },
        "population": {
          "type": "long"
        }
      }
    }
  }
}
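
Note: This tutorial assumes that you are benchmarking an Elasticsearch version prior to 7.0.0. If you want to benchmark Elasticsearch 7.0.0 or later, you need to remove the mapping type ("docs") above.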
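
For Elasticsearch 7.0.0 or later, removing the mapping type means lifting the contents of "docs" directly under "mappings". A minimal sketch of that transformation follows; the function name remove_mapping_type is hypothetical and the index body is abbreviated to a single field.

```python
import copy
import json


def remove_mapping_type(index_body, type_name="docs"):
    """Return a copy whose mappings are no longer nested under type_name."""
    body = copy.deepcopy(index_body)
    mappings = body.get("mappings", {})
    if type_name in mappings:
        # Move the typed mapping up one level.
        body["mappings"] = mappings[type_name]
    return body


# Abbreviated version of index.json, for illustration only.
index_body = {
    "settings": {"index.number_of_replicas": 0},
    "mappings": {
        "docs": {
            "dynamic": "strict",
            "properties": {"geonameid": {"type": "long"}},
        }
    },
}
typeless = remove_mapping_type(index_body)
print(json.dumps(typeless, indent=2))
```

In the result, "dynamic" and "properties" sit directly under "mappings", and the input dictionary is left unchanged.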

For details on the syntax, see the Elasticsearch documentation on mappings and the create index API.

Finally, store the track as track.json in the tutorial directory:

{
  "version": 2,
  "description": "Tutorial benchmark for Rally",
  "indices": [
    {
      "name": "geonames",
      "body": "index.json",
      "types": [ "docs" ]
    }
  ],
  "corpora": [
    {
      "name": "rally-tutorial",
      "documents": [
        {
          "source-file": "documents.json",
          "document-count": 11658903,
          "uncompressed-bytes": 1544799789
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "delete-index"
      }
    },
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        }
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5000
      },
      "warmup-time-period": 120,
      "clients": 8
    },
    {
      "operation": {
        "operation-type": "force-merge"
      }
    },
    {
      "operation": {
        "name": "query-match-all",
        "operation-type": "search",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "clients": 8,
      "warmup-iterations": 1000,
      "iterations": 1000,
      "target-throughput": 100
    }
  ]
}

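The document count can be obtained with wc -l documents.json, and the uncompressed size in bytes with ls -l documents.json (ll is a common shell alias for ls -l).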
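
The same two numbers can also be computed in Python, which is handy when the track file is generated by a script. This is a minimal sketch; the function name corpus_stats is hypothetical.

```python
import os


def corpus_stats(path):
    """Return (document_count, uncompressed_bytes) for a one-document-per-line file."""
    document_count = 0
    with open(path, "rb") as f:
        # Unlike wc -l, this also counts a final line that lacks a trailing newline.
        for _ in f:
            document_count += 1
    return document_count, os.path.getsize(path)
```

The two values returned by corpus_stats("documents.json") correspond to document-count and uncompressed-bytes in track.json.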

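Note: This tutorial assumes that you are benchmarking an Elasticsearch version prior to 7.0.0. If you want to benchmark Elasticsearch 7.0.0 or later, you need to remove the types property above.
Note: You can store any supporting scripts along with your track. However, you need to place them in a directory whose name starts with "_", for example "_support". Rally loads track plugins (see below) from any directory but ignores directories starting with "_".
Note: We have defined a JSON schema for tracks which you can use to check how to define your track. You should also look at the tracks provided by Rally for inspiration.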
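
Regarding the first note above, dropping the types property for Elasticsearch 7.0.0 or later can be sketched as follows. The function name remove_index_types is hypothetical, and the track dictionary is abbreviated to the indices section.

```python
import copy


def remove_index_types(track):
    """Return a copy of a track dict without "types" on any index."""
    result = copy.deepcopy(track)
    for index in result.get("indices", []):
        # pop with a default does nothing when the property is already absent.
        index.pop("types", None)
    return result


# Abbreviated version of track.json, for illustration only.
track = {
    "version": 2,
    "indices": [{"name": "geonames", "body": "index.json", "types": ["docs"]}],
}
print(remove_index_types(track)["indices"])
```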

When you run esrally list tracks --track-path=~/rally-tracks/tutorial, the new track shows up:

dm@io:~ $ esrally list tracks --track-path=~/rally-tracks/tutorial

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/
Available tracks:

Name        Description                   Documents    Compressed Size  Uncompressed Size
----------  ----------------------------- -----------  ---------------  -----------------
tutorial    Tutorial benchmark for Rally      11658903  N/A              1.4 GB

You can also show details about the track with esrally info --track-path=~/rally-tracks/tutorial:

dm@io:~ $ esrally info --track-path=~/rally-tracks/tutorial

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

Showing details for track [tutorial]:

* Description: Tutorial benchmark for Rally
* Documents: 11,658,903
* Compressed Size: N/A
* Uncompressed Size: 1.4 GB


Schedule:
----------

1. delete-index
2. create-index
3. cluster-health
4. bulk (8 clients)
5. force-merge
6. query-match-all (8 clients)

Congratulations, you have created your first track! You can now benchmark an Elasticsearch cluster with: esrally --distribution-version=6.0.0 --track-path=~/rally-tracks/tutorial

Adding more test data

The data set provided here is fairly small. The following script quickly generates additional documents:

import json
import random

from tqdm import tqdm  # third-party progress bar (pip install tqdm), writes to stderr

# Number of synthetic documents to generate.
MAX_NUM = 10000000 * 3


def create_data():
    for _ in tqdm(range(MAX_NUM)):
        # Random values only; these are not real Geonames records.
        data = {
            "geonameid": random.randint(1, 100),
            "latitude": random.uniform(10, 20),
            # Five distinct random lowercase letters.
            "name": "".join(random.sample("zyxwvutsrqponmlkjihgfedcba", 5)),
            "longitude": random.uniform(20, 30),
            "population": random.randint(1, 10000),
        }
        print(json.dumps(data, ensure_ascii=False))


if __name__ == "__main__":
    create_data()

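Save it as createJson.py and run python3 createJson.py >> documents.json to append the generated documents to the existing file. Adjust MAX_NUM to control the volume: every 10,000,000 documents add roughly 1.2 GB of data. After appending, update document-count and uncompressed-bytes in track.json so that they match the new file.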
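
As a rough worked example of the sizing rule above, the added volume can be estimated by scaling linearly from the quoted figure of 1.2 GB per 10,000,000 documents. The function name estimated_size_gb is hypothetical, and the real size depends on the random values generated.

```python
GB_PER_10M_DOCS = 1.2  # figure quoted in the text; an approximation


def estimated_size_gb(num_docs):
    """Linear estimate of the generated data volume in GB."""
    return num_docs / 10_000_000 * GB_PER_10M_DOCS


# The script's MAX_NUM of 10000000 * 3 documents.
print(f"{estimated_size_gb(10_000_000 * 3):.1f} GB")
```

For the MAX_NUM used in the script (30,000,000 documents) this prints 3.6 GB.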