轻量级 Embedding 模型
分为了四种类型:
- actions, IDE 的 Action
- files, 项目代码文件的路径名
- classes, 项目代码的 Class 的类名
- symbols, 项目代码的其它 Symbol
轻量级 Code completion 模型
JetBrains 采用的轻量级模型:
<models version="1">
<model>
<version>0.1.62-native-onnx</version>
<size>161538626</size>
<languages>
<language>python</language>
</languages>
<binary>flcc.model</binary>
<bpe>flcc.bpe</bpe>
<config>flcc.json</config>
<native>
<archive>full-line-inference.zip</archive>
<version>0.2.45</version>
</native>
<changelog>- 100M model for Python trained on Starcoder dataset, HF optimal training duration</changelog>
</model>
<model>
<version>0.1.35-native-onnx</version>
<size>161543029</size>
<languages>
<language>kotlin</language>
</languages>
<binary>flcc.model</binary>
<bpe>flcc.bpe</bpe>
<config>flcc.json</config>
<native>
<archive>full-line-inference.zip</archive>
<version>0.2.45</version>
</native>
<changelog>- Giga Brand new Kotlin model from hack, but quantize :)</changelog>
</model>
</models>
三种模式:LLAMA_NATIVE、ONNX_NATIVE、K_INFERENCE
CLI Examples:
usage: full-line-inference [options]
options:
[-p, --port] [args] - the port the server is listening on, if not specified the server will choose the port
[-m, --model] [args] - [required] path to the model file
[-t, --type] [args] - [required] model type, available values: [onnx, llama]
[-bs, --beam-size] [args] - [required] beam size for beam search
[-ic, --iterations-count] [args] - [required] number of iterations for beam search
[-cr, --context-ratio] [args] - [required] an initializing context will be trimmed to (cr * max_contex) symbols
[-mc, --min-cached] [args] - [required] minimum number of reused symbols from the cache
[-mm, --max-missing] [args] - [required] maximum number of trimmed symbols when using the cache
[-pp, --parent] [args] - parent process PID
[-l, --logging-off] - this flag disables logging
[-h, --help] - show this help message and exit
usage:
./full-line-inference -p 8080 -m flcc.model -t onnx -bs 5 -ic 2 -cr 0.5 -mc 5 -mm 30
onnx wrapper uses 2 thread(s)
vocab_size = 16384
n_ctx = 384
n_embd = 1024
n_head = 16
head_size = 64
n_layer = 6
Input Node Name/Shape (9):
input_ids , type - 7 , dim - 2 : [-1, -1]
position_ids , type - 7 , dim - 2 : [-1, -1]
attention_mask , type - 7 , dim - 2 : [-1, -1]
past_0 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
past_1 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
past_2 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
past_3 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
past_4 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
past_5 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
Output Node Name/Shape (7):
logits , type - 1 , dim - 3 : [-1, -1, 16384]
present_0 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
present_1 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
present_2 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
present_3 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
present_4 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
present_5 , type - 1 , dim - 5 : [2, -1, 16, -1, 64]
2023-11-03 14:54:11.93039 [watchdog.cpp:27:watch] parent process PID to control: 64475
2023-11-03 14:54:11.95493 [grpc_server.cpp:26:run] server is listening on localhost:8080
[CONTROL MESSAGE] PORT 8080