Distribute computation by sharing and load-balancing instances, or by sharding model weights
LocalAI uses P2P technologies to distribute work between peers. You can share an instance with Federation and/or split a model's weights across peers (the latter is only available with llama.cpp models). This lets you share computational resources between your devices or with your friends!
The network token can be used either to share an instance or to join a federation or a worker network. Below are examples of how to start a new instance or a worker with this token.
Federated Nodes:
You can start LocalAI in federated mode to share your instance, or start the federated server to balance requests across the nodes of the federation.
```shell
# Start a new instance to share with --federated and a TOKEN
export TOKEN="b3RwOgogIGRodDoKICAgIGludGVydmFsOiAzNjAKICAgIGtleTogVXpoVjdLSDhiV1ZQRmlZZmJPdFpzNWtMSGJIVGY4VHp0cWVXSDJBc3hhSwogIGxlbmd0aDogNDMKICBjcnlwdG86CiAgICBpbnRlcnZhbDogOTAwMAogICAga2V5OiBFUkc4eUJsdndkY3JQRHNWVXlabUJYTmZKZm8wYTQ0UXg3aHo1UVBLZ2xsCiAgICBsZW5ndGg6IDQzCnJvb206IFRMdDdOTHNrOUdXOTBjQXdJbndGZWlNZndrZU85UUxRbFVYQU1wNnY2SDYKcmVuZGV6dm91czogNXNQek1COUVIUjhrWXJINVFvaVBJamN3Mmh0cUlOTXEwWnhWMjFzYU5NUAptZG5zOiBBMGtHd1plZXYxRTRiRnRXVFlxWGtSMzU4Z1hpc0hESFdRSzNNeXRKU3lHCm1heF9tZXNzYWdlX3NpemU6IDIwOTcxNTIwCg=="
local-ai run --federated --p2p
```
Note: if you don't have a token, do not specify one; a new token will be generated for you and shown on this page.
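To balance requests across the federation instead of serving models yourself, you can start the federated load-balancer server. A minimal sketch is below; the `federated` subcommand and the `TOKEN` environment variable follow the LocalAI CLI conventions, but check `local-ai --help` for your version:

```shell
# Join the network with the shared token, then start the federated
# load balancer, which proxies incoming API requests to federation nodes.
export TOKEN="<the network token>"
local-ai federated
```

The load balancer exposes the usual OpenAI-compatible API endpoint, so clients don't need to know which node actually serves each request.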
You can start llama.cpp workers to split model weights between them and offload part of the computation. To start a new worker, use the CLI or Docker.
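A sketch of both approaches is below. The `worker p2p-llama-cpp-rpc` subcommand and the `latest` image tag are assumptions based on the LocalAI CLI and image naming; verify them against `local-ai --help` and the published Docker tags:

```shell
# CLI: start a llama.cpp RPC worker that joins the p2p network.
# The worker reads the network token from the TOKEN environment variable.
export TOKEN="<the network token>"
local-ai worker p2p-llama-cpp-rpc

# Docker equivalent (image tag is illustrative):
docker run -d --name local-ai-worker \
  -e TOKEN="<the network token>" \
  localai/localai:latest worker p2p-llama-cpp-rpc
```

Once one or more workers have joined, an instance started with `--p2p` can shard a llama.cpp model's weights across them automatically.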