Node. Testing

Tests you can run to check that the setup is correct.

General information

It's really frustrating when you think you've done everything right but still can't enter the epoch.

Better check everything several times.


Attention! The text may contain errors, since I'm not very familiar with server configuration (more precisely, I'm very unfamiliar with it), but these tests helped me.

Check that the keys are installed correctly

Sometimes multiple people configure nodes or you accidentally used the wrong command to assign the Consensus Public Key, resulting in different keys on the Node and in the Network.

That will definitely keep you out of the epoch. Check!

Consensus Key check

🔎 STEP 1. Find the Consensus Public Key on the node


Attention! This check runs on the server where the ML Node is located (or the Network Node; I haven't fully figured this out yet because both nodes are on the same server for me).

docker exec node wget -qO- "http://127.0.0.1:26657/status" | jq -r '.result.validator_info.pub_key.value'

Since the jq filter extracts the value field with -r, you will get the bare key, something like:

AD+NQncKPBzqw0u8KcSmlIMqogg7i4nhDfLIgIkGYiY=

👉 Copy this value.

🔎 STEP 2. Find the Consensus Public Key in the network


Attention! This check runs on the server where you created the keys β€” not on the Network Node or the ML Node.

Now see what the network thinks your key is:

curl -s http://node2.gonka.ai:8000/chain-api/productscience/inference/inference/participant/gonka1yplcem8kfe6vm06t4sl8fskm0we2zslxxu90ta | jq


Attention! Replace the address at the end of the URL (gonka1yplcem8kfe6vm06t4sl8fskm0we2zslxxu90ta in this example) with the address of your Hot key.

As a result you will receive a response like this:

{
  "participant": {
    "index": "gonka1yplcem8kfe6vm06t4sl8fskm0we2zslxxu90ta",
    "address": "gonka1yplcem8kfe6vm06t4sl8fskm0we2zslxxu90ta",
    "weight": -1,
    "join_time": "1771876365572",
    "join_height": "2792955",
    "last_inference_time": "0",
    "inference_url": "http://203.168.252.195:8000",
    "status": "ACTIVE",
    "coin_balance": "0",
    "validator_key": "7GEr4jV5GjCv+C+jKOq3Eh4bwxMVs7kafm7tcWP0EOo=",
    "consecutive_invalid_inferences": "0",
    "worker_public_key": "",
    "epochs_completed": 0,
    "current_epoch_stats": {
      "inference_count": "0",
      "missed_requests": "0",
      "earned_coins": "0",
      "rewarded_coins": "0",
      "burned_coins": "0",
      "validated_inferences": "0",
      "invalidated_inferences": "0",
      "invalidLLR": { "value": "0", "exponent": 0 },
      "inactiveLLR": { "value": "0", "exponent": 0 },
      "confirmationPoCRatio": null
    }
  }
}

We're interested in the value "validator_key".

🔎 STEP 3. Compare them. They must be identical

They should match. Ours do not. No wonder we aren't entering the epoch ))

Reasons for this mismatch can vary. As for fixing it: I believe you'll manage, it's not hard.
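The three steps above can be combined into one small script (a sketch: run it where the chain container named node is available; the address is the example one from above, replace it with your Hot key address):

```shell
#!/bin/sh
# Compare the consensus key on the node with the one registered in the network.
# ADDR is the example address from this guide; replace it with your Hot key address.
ADDR="gonka1yplcem8kfe6vm06t4sl8fskm0we2zslxxu90ta"

NODE_KEY=$(docker exec node wget -qO- "http://127.0.0.1:26657/status" \
  | jq -r '.result.validator_info.pub_key.value')
NET_KEY=$(curl -s "http://node2.gonka.ai:8000/chain-api/productscience/inference/inference/participant/$ADDR" \
  | jq -r '.participant.validator_key')

if [ "$NODE_KEY" = "$NET_KEY" ]; then
  echo "OK: keys match ($NODE_KEY)"
else
  echo "MISMATCH: node=$NODE_KEY network=$NET_KEY"
fi
```

If it prints MISMATCH, you have found the reason you aren't entering the epoch.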

---------------------------------------------------------------------------

Find out which model is on your node
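The command itself was lost from this section along with the screenshot, but it can be recovered from the expected response below, where it is visible in the prompt:

```shell
# Ask vLLM's OpenAI-compatible API which model the ML Node is serving
curl http://localhost:5000/v1/models
```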

Attention! If nothing appears, your ML Node might be on a different port. Possible options:

  • 5000

  • 8000

  • 8080

  • 9200

i.e., just replace that number in the command.

Expected response:

root@mlnode-308:/app# curl http://localhost:5000/v1/models
{
  "object": "list",
  "data": [
    {
      "id": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
      "object": "model",
      "created": 1772106402,
      "owned_by": "vllm",
      "root": "/root/models/Qwen3-235B-A22B-Instruct-2507-FP8",
      "parent": null,
      "max_model_len": 240000,
      "permission": [
        {
          "id": "modelperm-f9056e19f4b1494c9854c8df9887394b",
          "object": "model_permission",
          "created": 1772106402,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}


Attention! After running that command you will enter the Docker container. To continue working on the server command line, exit the container with: exit

Find out the Node configuration
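The original command was lost here; one hedged way to get the output shown below (assuming vLLM runs in a container on this host, so its process is visible to the host's ps):

```shell
# Print the full vLLM launch command with all of its flags;
# the [v] keeps the grep process itself out of the results
ps aux | grep "[v]llm"
```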

Expected response:

/usr/bin/python3.12 -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
  --dtype float16 \
  --port 5001 \
  --host 0.0.0.0 \
  --max-model-len 240000 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --tensor-parallel-size 4 \
  --pipeline-parallel-size 2 \
  --enable-expert-parallel \
  --quantization fp8 \
  --gpu-memory-utilization 0.846 \
  --kv-cache-dtype fp8 \
  --swap-space 4 \
  --enforce-eager \
  --cpu-offload-gb 4 \
  --model /root/models/Qwen3-235B-A22B-Instruct-2507-FP8 \
  --served-model-name Qwen/Qwen3-235B-A22B-Instruct-2507-FP8

Attention! You need to replace mlnode-308 with the name of your node. If you forgot it, you can find it with the command:

Expected output (one of these):

You can list the names of all containers:
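The command was lost with the screenshot; the standard Docker way (assuming your setup uses plain Docker, as the rest of this guide does):

```shell
# List the names of all running containers
docker ps --format '{{.Names}}'
```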

Expected output

GPU status
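The command behind this check is presumably the standard NVIDIA tool:

```shell
# Show GPU utilization, memory usage, temperature, and per-GPU processes
nvidia-smi
```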

Expected response:

Check the declared config

Let's verify the node setup:
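The original command was lost; since later in this guide the parameters come from node-config.json in the deploy directory, a hedged equivalent (the path is taken from the prompts elsewhere in this guide, adjust it to yours):

```shell
# Show the declared ML Node configuration (path is an assumption)
cat ~/gonka/deploy/join/node-config.json | jq
```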

Shows the configuration with which your ML Node is running. Presumably run on the ML Node server (if they are separate).


Attention! These parameters are given as an example. They are certainly outdated. Yours will be different.

Node check with stop

Stop the node
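The exact stop command was lost; judging by the admin API on port 8080 used for the state check in this guide, it is something like this (the /stop path is my assumption, check your deploy docs):

```shell
# Ask the ML Node admin API to stop (endpoint path is an assumption)
curl -X POST http://localhost:8080/api/v1/stop
```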

Expected response:

{"status":"OK"}

Check the node's state (status)
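The state endpoint is the one visible later in this guide, in the status check output:

```shell
# Query the ML Node's current state
curl http://localhost:8080/api/v1/state
```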

Expected response:

{"state":"STOPPED"}root@submodel-sxA100-19-14:~/gonka/deploy/join#


If you see something else β€” repeat the stop step.

Run a forced node test

Expected output

Watch the test progress via logs
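The log command was lost with the screenshot; following the ML Node container logs looks like this (replace mlnode-308 with your container name):

```shell
# Follow the ML Node container logs in real time
docker logs -f mlnode-308
```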

After a few minutes (usually 5–15) a final result should appear. After PoC completes:

Expected response:

It's important that CUDA is loaded to 100%

To exit the test press the combination CTRL+C

Enable the node

Expected output

{"message":"node enabled successfully","node_id":"node1"} root@submodel-sxA100-19-14:~/gonka/deploy/join#

Check the status of your node:

Expected response:

root@ecs-99605001-024:# curl http://localhost:8080/api/v1/state
{"state":"INFERENCE"}root@ecs-99605001-024:#

Find out your node's PoC status:

Unexpected response:

{"detail":"Cannot run POW because MLNode is currently in ServiceState.INFERENCE mode. Please stop ServiceState.INFERENCE first."}root@ecs-99605001-024:~#

What the "expected response" should be I don't know yet ))

Check containers

After startup, first make sure the parameters you put into your node-config.json are applied in mlnode

Start container logs mlnode
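A hedged way to do this check (the container name and grep patterns are examples; the flags come from the vLLM launch line shown earlier in this guide):

```shell
# Check that parameters from node-config.json reached the vLLM launch line
docker logs mlnode-308 2>&1 | grep -E "max-model-len|gpu-memory-utilization"
```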

Once you see in the logs that the model has been loaded, stop following them with the combination CTRL+C.

Start container logs node


Runs on the Network Node.
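The command, lost with the screenshot, is presumably the same log-following pattern with the chain container (named node earlier in this guide):

```shell
# Follow the Network Node (chain) container logs
docker logs -f node
```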

If the node is not yet synchronized, we should see blockchain "chunks" being pulled in; in my output, 625 was the total chunk count and 160 the last one loaded.

Stop following the logs with the combination CTRL+C.

Check node synchronization with the network
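The command was lost; a sketch using the chain node's CometBFT RPC (the same status endpoint as in the key check above), which computes the seconds since the latest block (GNU date assumed):

```shell
# Seconds since the node's latest block; a few seconds means the node is in sync
LAST=$(docker exec node wget -qO- "http://127.0.0.1:26657/status" \
  | jq -r '.result.sync_info.latest_block_time')
echo $(( $(date +%s) - $(date -d "$LAST" +%s) ))
```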

Expected output

This number is the time in seconds since the last block was created, so it should be small.

Check the current network block

Check the block our node is on

I don't know how yet )

And compare. They should be close.
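For the node side, the same CometBFT status endpoint used in the key check answers the question; it also reports whether the node is still catching up (catching_up should be false when you are in sync):

```shell
# Print the node's current block height and whether it is still catching up
docker exec node wget -qO- "http://127.0.0.1:26657/status" \
  | jq -r '.result.sync_info | "\(.latest_block_height) catching_up=\(.catching_up)"'
```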

Checklist for entering the epoch

Helps to understand where to look for the problem.

Expected response:

The red arrow marks a check that shows FAIL for absolutely everyone; it is PASS only on Gonka master nodes (I think).

The blue arrow points to a check that may also show FAIL for you if you have not yet been in any epoch.

END
