Skip to content
This repository has been archived by the owner on Nov 7, 2018. It is now read-only.
This repository has been archived by the owner on Nov 7, 2018. It is now read-only.

Data nodes failing to restart #239

Open
@gcyre

Description

Hello

I've followed the instructions to setup a ES cluster in our test k8s cluster and have run into a couple of issues with the data nodes.

The first issue is with creating the 2 data nodes, I'm not able to create 2 data pods on a single k8s worker node. In my es-data.yaml I've updated it to use a PVC on an external SAN. When the second pod comes up there's an error in the logs

org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to obtain node locks, tried [[/data/data/myesdb]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:140) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:127) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.3.2.jar:6.3.2]
	at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:86) ~[elasticsearch-6.3.2.jar:6.3.2]
Caused by: java.lang.IllegalStateException: failed to obtain node locks, tried [[/data/data/myesdb]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
	at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:243) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.node.Node.<init>(Node.java:270) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.node.Node.<init>(Node.java:252) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.3.2.jar:6.3.2]

The second issue is when the pod dies and get restarted by kubernetes. When the pod comes up there are errors relating to lock files

[2018-11-05T21:01:32,426][WARN ][o.e.i.e.Engine           ] [es-data-6c487c6b77-nfqv9] [logstash-2018.10.28][2] could not lock IndexWriter
org.apache.lucene.store.LockObtainFailedException: Lock held by another program: /data/data/nodes/0/indices/3V6MF-EXTiOJ-yO93SVX8g/2/index/write.lock
	at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:130) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:104) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:948) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:1939) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:1930) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:191) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:157) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:2160) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:2142) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1349) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1304) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:420) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1575) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$5(IndexShard.java:2028) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:626) [elasticsearch-6.3.2.jar:6.3.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]

Should I be creating the pods as Statefulsets?
Am I missing something in the configuration?

any help would be appreciated

thanks
Garry

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions