Automatically generated documentation; use the command ./gradlew :docs-builder:run
and update comments in the source code to reflect changes.
Workflow
RAGScript 是一个使用 Kotlin DSL 的脚本语言,以用于快速使用、构建 RAG (检索增强,Retrieval Augmented Generation)应用的 PoC。
适用场景:安装有 Intellij IDEA、Kotlin Jupyter 环境或者 Kotlin 编译器的开发环境。
Sample: 代码语义化搜索
rag {
val apiKey = env?.get("OPENAI_API_KEY") ?: ""
val apiHost = env?.get("OPENAI_API_HOST") ?: ""
llm = LlmConnector(LlmType.OpenAI, apiKey, apiHost)
embedding = EmbeddingEngine(EngineType.SentenceTransformers)
store = Store(StoreType.Elasticsearch)
indexing {
val cliUrl = "https://github.com/archguard/archguard/releases/download/v2.0.7/scanner_cli-2.0.7-all.jar"
val file = Http.download(cliUrl)
var outputFile = File("0_codes.json");
if (!outputFile.exists()) {
outputFile = Exec.runJar(
file, args = listOf(
"--language", "Kotlin",
"--output", "json",
"--path", ".",
"--with-function-code"
)
).also {
File("0_codes.json")
}
}
// todo: use dataframe to parse json
val splitter = CodeSplitter()
val chunks: List<Document> = Json.decodeFromString<List<CodeDataStruct>>(outputFile.readText())
.map(splitter::split).flatten()
store.indexing(chunks)
}
querying {
val results = store.findRelevant("workflow dsl design ")
val sorted = results
.lowInMiddle()
llm.completion {
"""根据用户的问题,总结如下的代码
|${sorted.joinToString("\n") { "${it.score} ${it.embedded.text}" }}
|
|用户的问题是:如何设计一个 DSL 的 workflow
""".trimMargin()
}.also {
println(it)
}
}
}
Sample: 最简洁的 RAG 示例
rag("code") {
// 使用 OpenAI 作为 LLM 引擎
llm = LlmConnector(LlmType.OpenAI)
// 使用 SentenceTransformers 作为 Embedding 引擎
embedding = EmbeddingEngine(EngineType.SentenceTransformers)
// 使用 Memory 作为 Retriever
store = Store(StoreType.Memory)
indexing {
// 从文件中读取文档
val document = document("filename.txt")
// 将文档切割成 chunk
val chunks = document.split()
// 建立索引
store.indexing(chunks)
}
querying {
// 查询
store.findRelevant("workflow dsl design ").lowInMiddle().also {
println(it)
}
}
}
Sample: 最短 RAG 示例
rag {
indexing {
val chunks = text("fun main(args: Array<String>) {\n println(\"Hello, World!\")\n}").split()
store.indexing(chunks)
}
querying {
store.findRelevant("Hello World").also {
println(it)
}
}
}
document
document
function for provide document split for indexing, will auto-detect a file type. support: txt, pdf, doc, docx, xls, xlsx, ppt, pptx for example:
// 从文件中读取文档
val document = document("filename.txt")
// 将文档切割成 chunk
val chunks = document.split()
directory
Directory is a function for indexing data for the workflow. for example:
val docs = directory("docs")
val chunks = document.split()
code
TODO: code
function for provide code split for indexing.
text
text
function for provide text split for indexing. for example:
val chunks = text("fun main(args: Array<String>) {\n println(\"Hello, World!\")\n}").split()
prepare
Prepare is a function for preparing data for the workflow. You don’t need to call it as block.
indexing
Indexing is a function block for indexing data for the workflow. You don’t need to call it as block. for example:
indexing {
// 从文件中读取文档
val document = document("filename.txt")
// 将文档切割成 chunk
val chunks = document.split()
// 建立索引
store.indexing(chunks)
}
querying
querying is a function block for querying data for the workflow. you don’t need to call it as block. for example:
querying {
// 查询
store.findRelevant("workflow dsl design ").lowInMiddle().also {
println(it)
}
}
problem
Problem space is a function for defining the problem.
solution
Solution space is a function for defining the solution.
step
Step is for tagging function block only.