Node. Testing

Tests you can run to check that the setup is correct.

General information

It's really frustrating when you think you've done everything right but still can't enter the epoch.

Better check everything several times.

Attention! The text may contain errors because I'm not very familiar with server configuration. More precisely, "very unfamiliar," but these tests helped me.

Check that the keys are installed correctly

Sometimes multiple people configure nodes or you accidentally used the wrong command to assign the Consensus Public Key, resulting in different keys on the Node and in the Network.

That will definitely keep you out of the epoch. Check!

Consensus Key check

🔎 STEP 1. Find the Consensus Public Key on the node

Attention! This check runs on the server where the ML Node is located (or the Network Node; I haven't fully figured this out yet because both nodes are on the same server for me).

docker exec node wget -qO- "http://127.0.0.1:26657/status" | jq -r '.result.validator_info.pub_key.value'

you will get something like:

{
  "value": "AD+NQncKPBzqw0u8KcSmlIMqogg7i4nhDfLIgIkGYiY="
}

👉 Copy the field "value".+

🔎 STEP 2. Find the Consensus Public Key in the network

Attention! This check runs on the server where you created the keys — not on the Network Node or the ML Node.

Now see what the network thinks your key is:

curl -s http://node2.gonka.ai:8000/chain-api/productscience/inference/inference/participant/gonka1yplcem8kfe6vm06t4sl8fskm0we2zslxxu90ta | jq

Attention! Replace the bolded part with the address of your Hot key.

You will get:

"AD+NQncKPBzqw0u8KcSmlIMqogg7i4nhDfLIgIkGYiY="

As a result you will receive a response like this:

{ "participant": { "index": "gonka1yplcem8kfe6vm06t4sl8fskm0we2zslxxu90ta", "address": "gonka1yplcem8kfe6vm06t4sl8fskm0we2zslxxu90ta", "weight": -1, "join_time": "1771876365572", "join_height": "2792955", "last_inference_time": "0", "inference_url": "http://203.168.252.195:8000", "status": "ACTIVE", "coin_balance": "0", "validator_key": "7GEr4jV5GjCv+C+jKOq3Eh4bwxMVs7kafm7tcWP0EOo=", "consecutive_invalid_inferences": "0", "worker_public_key": "", "epochs_completed": 0, "current_epoch_stats": { "inference_count": "0", "missed_requests": "0", "earned_coins": "0", "rewarded_coins": "0", "burned_coins": "0", "validated_inferences": "0", "invalidated_inferences": "0", "invalidLLR": { "value": "0", "exponent": 0 }, "inactiveLLR": { "value": "0", "exponent": 0 }, "confirmationPoCRatio": null }

We're interested in the value "validator_key".

🔎 STEP 3. Compare them. They must be identical

They should match. Ours do not. No wonder we aren't entering the epoch ))

Reasons for this mismatch can vary. I think you'll figure out how to fix it yourself.

How to fix it: I believe you'll manage. It's not hard.

---------------------------------------------------------------------------

Find out which model is on your node

curl http://localhost:5000/v1/models

Attention! If nothing appears, your ML Node might be on a different port. Possible options:

5000
8000
8080
9200

i.e., just replace that number in the command.

Expected response:

root@mlnode-308:/app# curl http://localhost:5000/v1/models {"object":"list","data":[{"id":"Qwen/Qwen3-235B-A22B-Instruct-2507-FP8","object":"model","created":1772106402,"owned_by":"vllm","root":"/root/models/Qwen3-235B-A22B-Instruct-2507-FP8","parent":null,"max_model_len":240000,"permission":[{"id":"modelperm-f9056e19f4b1494c9854c8df9887394b","object":"model_permission","created":1772106402,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}root@mlnode-308:/app#

Attention! After running that command you will enter the Docker container. To continue working on the server command line, exit the container with: exit

Find out the Node configuration

docker exec -it mlnode-308 cat /proc/238/cmdline | tr '\0' ' '

Expected response:

/usr/bin/python3.12 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 --dtype float16 --port 5001 --host 0.0.0.0 --max-model-len 240000 --enable-auto-tool-choice --tool-call-parser hermes --tensor-parallel-size 4 --pipeline-parallel-size 2 --enable-expert-parallel --quantization fp8 --gpu-memory-utilization 0.846 --kv-cache-dtype fp8 --swap-space 4 --enforce-eager --cpu-offload-gb 4 --model /root/models/Qwen3-235B-A22B-Instruct-2507-FP8 --served-model-name Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 root@ecs-99605001-024:~#

Attention! You need to replace mlnode-308 with the name of your node. If you forgot it, you can find it with the command:

docker compose ps

Expected output- one of these:

You can list the names of all containers:

docker ps --format '{{.Names}}'

Expected output

GPU status

curl http://localhost:8080/api/v1/gpu/devices | jq

Expected response:

Check the declared config

Let's verify the node setup:

Shows the configuration with which your ML Node is running. Presumably run on the ML Node server (if they are separate).

Attention! These parameters are given as an example. They are certainly outdated. Yours will be different.

Node check with stop

Stop the node

curl -sS -X POST "http://127.0.0.1:8080/api/v1/stop" -H "Content-Type: application/json"
sleep 3

Expected response:

{"status":"OK"}

Check the node's state (status)

curl -sS "http://127.0.0.1:8080/api/v1/state"

Expected response:

{"state":"STOPPED"}root@submodel-sxA100-19-14:~/gonka/deploy/join#

If you see something else — repeat the stop step.

Run a forced node test

BLOCK_HEIGHT=$(curl -s "http://node1.gonka.ai:8000/chain-rpc/status" | jq -r '.result.sync_info.latest_block_height')
BLOCK_HASH=$(curl -s "http://node1.gonka.ai:8000/chain-rpc/status" | jq -r '.result.sync_info.latest_block_hash')

curl -sS -X POST "http://127.0.0.1:8080/api/v1/pow/init/generate" \
  -H "Content-Type: application/json" \
  -d "{
  \"node_id\": 0,
  \"node_count\": 1,
  \"block_hash\": \"D3470A4DDA3D4173BE7C7A55AF52323C19CAC8307DB492CFD004D0C83561068B\",
  \"block_height\": 2753612,
  \"public_key\": \"7GEr4jV5GjCv+C+jK0q3Eh4bwxMVs7kafm7tcWP0E0O=\",
  \"batch_size\": 1,
  \"r_target\": 10.0,
  \"fraud_threshold\": 0.01,
  \"params\": {
    \"dim\": 1792,
    \"n_layers\": 64,
    \"n_heads\": 64,
    \"n_kv_heads\": 64,
    \"vocab_size\": 8196,
    \"ffn_dim_multiplier\": 10.0,
    \"multiple_of\": 8192,
    \"norm_eps\": 1e-5,
    \"rope_theta\": 10000.0,
    \"use_scaled_rope\": false,
    \"seq_len\": 256
  },
  \"url\": \"http://api:9100/v1/poc-batches\"
}"
echo

Expected output

Watch the test progress via logs

docker logs mlnode-308 --tail 50 -f

After a few minutes (usually 5–15) a final result should appear. After PoC completes:

Expected response:

It's important that CUDA is loaded to 100%

To exit the test press the combination CTRL+C

Enable the node

curl -sS -X POST "http://127.0.0.1:9200/admin/v1/nodes/node1/enable" -H "Content-Type: application/json"

Expected output

{"message":"node enabled successfully","node_id":"node1"} root@submodel-sxA100-19-14:~/gonka/deploy/join#

Check the status of your node:

curl http://localhost:8080/api/v1/state

Expected response:

root@ecs-99605001-024:# curl http://localhost:8080/api/v1/state {"state":"INFERENCE"}root@ecs-99605001-024:#

Find out your node's PoV status:

curl http://localhost:8080/api/v1/pow/status

Unexpected response:

"detail":"Cannot run POW because MLNode is currently in ServiceState.INFERENCE mode. Please stop ServiceState.INFERENCE first."}root@ecs-99605001-024:~#

What the "expected response" should be I don't know yet ))

Check containers

After startup, first make sure the parameters you put into your node-config.json are applied in mlnode

Start container logs mlnode

docker logs -f --tail=200 mlnode-308

If you see the model loaded like in the screenshot, you can exit the container with the combination CTRL+C

Start container logs node

docker logs -f --tail=200 node

Runs on the Network Node.

If the node wasn't synchronized, we should see blockchain "chunks" being pulled in

625 - total count, 160 - last loaded

exit the container with the combination CTRL+C

Check node synchronization with the network

curl -s "http://127.0.0.1:9200/admin/v1/setup/report" | jq '.checks[] | select(.id == "block_sync")'

Expected output

So this number should be small. It's the time in seconds since the last block was created.

Check the current network block

curl -sS http://node1.gonka.ai:8000/chain-rpc/status | jq -r '.result.sync_info.latest_block_height'

Check the block our node is on

I don't know how yet )

And compare. They should be close.

Checklist for entering the epoch

Helps to understand where to look for the problem.

curl -s http://localhost:9200/admin/v1/setup/report | jq '.checks[] | {id,status,message}'

Expected response:

The red arrow marks the check field that FAIL absolutely everyone has. This parameter PASS is present only on Gonka master nodes (I think).

The blue arrow points to the field that you may have FAIL - if you have not yet been in any epoch.

hashtagGeneral information

hashtagCheck that the keys are installed correctly

hashtagConsensus Key check

hashtag🔎 STEP 1. Find the Consensus Public Key on the node

hashtag🔎 STEP 2. Find the Consensus Public Key in the network

hashtag🔎 STEP 3. Compare them. They must be identical

hashtagFind out which model is on your node

hashtagFind out the Node configuration

hashtagGPU status

hashtagCheck the declared config

hashtagNode check with stop

hashtagStop the node

hashtagCheck the node's state (status)

hashtagRun a forced node test

hashtagWatch the test progress via logs

hashtagEnable the node

hashtagCheck the status of your node:

hashtagCheck containers

hashtagStart container logs mlnode

hashtagStart container logs node

hashtagCheck node synchronization with the network

hashtagCheck the current network block

hashtagCheck the block our node is on

hashtagChecklist for entering the epoch

hashtagLinks

General information

Check that the keys are installed correctly

Consensus Key check

🔎 STEP 1. Find the Consensus Public Key on the node

🔎 STEP 2. Find the Consensus Public Key in the network

🔎 STEP 3. Compare them. They must be identical

Find out which model is on your node

Find out the Node configuration

GPU status

Check the declared config

Node check with stop

Stop the node

Check the node's state (status)

Run a forced node test

Watch the test progress via logs

Enable the node

Check the status of your node:

Check containers

Start container logs mlnode

Start container logs node

Check node synchronization with the network

Check the current network block

Check the block our node is on

Checklist for entering the epoch

Links