1224 Commits

Author SHA1 Message Date
Valdanito
1fafdb8471
fix(API): fixed retrieval api parameters matching (#2550)
### What problem does this PR solve?

Fixed two errors in the /datasets/retrieval API:
KeyError('size') and 'doc_ids': ['Field may not be null.']

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-24 12:05:15 +08:00
Valdanito
5110a3ba90
refactor(API): Split SDK class to optimize code structure (#2515)
### What problem does this PR solve?

1. Split SDK class to optimize code structure
`ragflow.get_all_datasets()`  ===>     `ragflow.dataset.list()`
2. Fixed the parameter validation to allow for empty values.
3. Changed how parameters are checked for emptiness: even when a
parameter is empty, its key still exists in the parsed data, which is a
feature of [APIFlask](https://apiflask.com/schema/).

`if "parser_config" in json_data` ===> `if json_data["parser_config"]`


![image](https://github.com/user-attachments/assets/dd2a26d6-b3e3-4468-84ee-dfcf536e59f7)

4. Some common parameter error messages, all from
[Marshmallow](https://marshmallow.readthedocs.io/en/stable/marshmallow.fields.html)

Parameter validation configuration
```
    kb_id = fields.String(required=True)
    parser_id = fields.String(validate=validators.OneOf([parser_type.value for parser_type in ParserType]),
                              allow_none=True)
```

When my parameters are
```
kb_id=None,
parser_id='A4'
```

Error messages
```
{
    "detail": {
        "json": {
            "kb_id": [
                "Field may not be null."
            ],
            "parser_id": [
                "Must be one of: presentation, laws, manual, paper, resume, book, qa, table, naive, picture, one, audio, email, knowledge_graph."
            ]
        }
    },
    "message": "Validation error"
}
```
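
Point 3 above can be illustrated with a plain dictionary. This is a minimal sketch, assuming (as the PR states) that the validated payload contains every schema key even when the client sent no value for it. `json_data` is a hand-built stand-in rather than real APIFlask output, and both helper functions are hypothetical names used only for illustration.

```python
# Hypothetical payload shaped as the PR describes: the key is present even
# though the client sent no value for it.
json_data = {"kb_id": "kb-1", "parser_config": None}


def should_update_by_membership(data: dict) -> bool:
    # Old check: true whenever the key exists, even if its value is None.
    return "parser_config" in data


def should_update_by_value(data: dict) -> bool:
    # New check: true only when the value is non-empty (not None, not {}).
    return bool(data["parser_config"])


print(should_update_by_membership(json_data))  # True
print(should_update_by_value(json_data))       # False
```

Indexing with `data["parser_config"]` raises `KeyError` if the key is truly absent, so the new check relies on the schema always populating it; `data.get("parser_config")` would tolerate both cases.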

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-20 17:28:57 +08:00
Valdanito
82b46d3760
fix(API): fixed swagger docs error in nginx external port (#2509)
### What problem does this PR solve?

1. Fixed swagger docs error in nginx external port
2. Add retrieval api
3. Add documentation for SDK API

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
- [x] Refactoring
2024-09-20 11:30:13 +08:00
Valdanito
93114e4af2
API: fixed documents API request data schema (#2480)
### What problem does this PR solve?

- fixed documents API request data schema
- add documents sdk api tests

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-18 18:57:30 +08:00
Valdanito
5c777920cb
refactor(API): Refactor datasets API (#2439)
### What problem does this PR solve?

Discussion: https://github.com/infiniflow/ragflow/issues/1102

#### Completed
1. Integrated APIFlask to generate Swagger API documentation, available at
http://ragflow_host:ragflow_port/v1/docs
2. Refactored http_token_auth
```
class AuthUser:
    def __init__(self, tenant_id, token):
        self.id = tenant_id
        self.token = token

    def get_token(self):
        return self.token


@http_token_auth.verify_token
def verify_token(token: str) -> Union[AuthUser, None]:
    try:
        objs = APIToken.query(token=token)
        if objs:
            api_token = objs[0]
            user = AuthUser(api_token.tenant_id, api_token.token)
            return user
    except Exception as e:
        server_error_response(e)
    return None

# resources api
@manager.auth_required(http_token_auth)
def get_all_datasets(query_data):
	....
```
3. Refactored the Datasets (Knowledgebase) API to extract the
implementation logic into the api/apps/services directory

![image](https://github.com/user-attachments/assets/ad1f16f1-b0ce-4301-855f-6e162163f99a)
4. Python SDK: I only added get_all_datasets as an experiment, just to
verify that the SDK API and the Server API can use the same method.
```
from ragflow.ragflow import RAGFLow
ragflow = RAGFLow('<ACCESS_KEY>', 'http://127.0.0.1:9380')
ragflow.get_all_datasets()
```
5. Request parameter validation, added as an experiment. It may not be
necessary, as this feature is already present at the data model layer.
Its main benefit is that it makes the API easier to test in the Swagger
Docs service.
```
class UpdateDatasetReq(Schema):
    kb_id = fields.String(required=True)
    name = fields.String(validate=validators.Length(min=1, max=128))
    description = fields.String(allow_none=True)
    permission = fields.String(validate=validators.OneOf(['me', 'team']))
    embd_id = fields.String(validate=validators.Length(min=1, max=128))
    language = fields.String(validate=validators.OneOf(['Chinese', 'English']))
    parser_id = fields.String(validate=validators.OneOf([parser_type.value for parser_type in ParserType]))
    parser_config = fields.Dict()
    avatar = fields.String()
```

#### TODO

1. Support multiple authentication methods simultaneously, so that
the Web API can use the same method as the Server API, though perhaps
this feature is not important.
I tried the approach below, but it was not successful: it only allows
token authentication when not logged in, and cannot skip token
authentication when logged in 😢
```
def http_basic_auth_required(func):
    @wraps(func)
    def decorated_view(*args, **kwargs):
        if 'Authorization' in flask_request.headers:
            # If the request header contains a token, skip username and password verification
            return func(*args, **kwargs)
        if flask_request.method in EXEMPT_METHODS or current_app.config.get("LOGIN_DISABLED"):
            pass
        elif not current_user.is_authenticated:
            return current_app.login_manager.unauthorized()

        if callable(getattr(current_app, "ensure_sync", None)):
            return current_app.ensure_sync(func)(*args, **kwargs)
        return func(*args, **kwargs)

    return decorated_view
```
2. Refactoring the SDK API to use the same method as the Server API is
feasible and worthwhile, but it still requires time.
I see some differences between the Web and SDK APIs, such as the
key_mapping handling of the returned results. Until I understand them, I
will not modify this code, to avoid causing more problems.

```
    for kb in kbs:
        key_mapping = {
            "chunk_num": "chunk_count",
            "doc_num": "document_count",
            "parser_id": "parse_method",
            "embd_id": "embedding_model"
        }
        renamed_data = {}
        for key, value in kb.items():
            new_key = key_mapping.get(key, key)
            renamed_data[new_key] = value
        renamed_list.append(renamed_data)
    return get_json_result(data=renamed_list)
```
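
The key renaming in the fragment above can be sketched as a self-contained helper. This is only an illustration: `rename_keys` and the sample record are hypothetical, the mapping is copied from the snippet, and the response wrapping done by `get_json_result` is left out.

```python
# Mapping copied from the snippet above: internal field name -> SDK-facing name.
KEY_MAPPING = {
    "chunk_num": "chunk_count",
    "doc_num": "document_count",
    "parser_id": "parse_method",
    "embd_id": "embedding_model",
}


def rename_keys(kbs: list[dict]) -> list[dict]:
    """Return copies of the records with mapped keys renamed; other keys are kept."""
    return [
        {KEY_MAPPING.get(key, key): value for key, value in kb.items()}
        for kb in kbs
    ]


# Hypothetical sample record, for illustration only.
kbs = [{"name": "demo", "chunk_num": 12, "doc_num": 3, "parser_id": "naive"}]
print(rename_keys(kbs))
# [{'name': 'demo', 'chunk_count': 12, 'document_count': 3, 'parse_method': 'naive'}]
```

Keeping the mapping in one module-level constant would let the Web and SDK layers share it, should the two APIs later be unified.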

### Type of change

- [x] Refactoring
2024-09-18 14:53:59 +08:00
JobSmithManipulation
7195742ca5
rename create_timestamp_flt to create_timestamp_float (#2473)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2024-09-18 12:50:05 +08:00
JobSmithManipulation
62cb5f1bac
update document sdk (#2445)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-09-18 11:08:19 +08:00
Kevin Hu
e7dd487779
fix ppt file from filemanager error (#2470)
### What problem does this PR solve?

#2467

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-18 09:22:14 +08:00
Michał Kiełtyka
e41268efc6
Add Multi-Language Descriptions for 'Switch' Component and Update Message Assistant Placeholder (#2450)
### What problem does this PR solve?

_This PR addresses the need to describe the "Switch" component across
different languages and corrects a misleading description for a
placeholder message not exclusively tied to a specific assistant type.
By providing clearer and more accurate descriptions, this PR aims to
improve user understanding and usability of the Switch component and the
"Message Resume Assistant..." placeholder in a multilingual context._

### Explanation of Changes

1. **Added Descriptions for "Switch" Component**:
   - Descriptions were added for the "Switch" component in three different
     locales:
     - **English (EN)**: Provides a concise description of what the "Switch"
       component does, focusing on its ability to evaluate conditions and
       direct the flow of execution.
     - **Simplified Chinese (ZH)**: Translated the English description into
       Simplified Chinese to cater to users who prefer this locale.
     - **Traditional Chinese (ZH-Traditional)**: Added a Traditional Chinese
       version of the description to support users in regions that use
       Traditional Chinese.

2. **Corrected "Message Resume Assistant..." to "Message the
   Assistant..."**:
   - Updated the description from "Message Resume Assistant..." to "Message
     the Assistant..." in the English locale. This correction makes the
     description more generic and accurate, reflecting the placeholder's
     broader functionality, which is not limited to Resume Assistants. It now
     clearly communicates that the placeholder can be used with various types
     of assistants, not just those related to resumes.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2024-09-15 16:16:10 +08:00
balibabu
2f33ec7ad0
feat: When voice is turned on, the page will not display an empty reply message when the answer is empty #1877 (#2447)
### What problem does this PR solve?

feat: When voice is turned on, the page will not display an empty reply
message when the answer is empty #1877

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
v0.11.0
2024-09-14 18:39:13 +08:00
balibabu
3b1375ef99
feat: If the tts model is not set, the Text to Speech switch is not allowed to be turned on #1877 (#2446)
### What problem does this PR solve?

feat: If the tts model is not set, the Text to Speech switch is not
allowed to be turned on #1877

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-14 17:45:29 +08:00
Toro
2c05e6e6bd
Update and rename agentic_rag_introduction.md to agent_introduction.md (#2443)
### What problem does this PR solve?

#2441 

### Type of change


- [x] Documentation Update
2024-09-14 17:36:57 +08:00
Toro
8ccc696723
Update _category_.json (#2442)
### What problem does this PR solve?

#2441 

### Type of change

- [x] Documentation Update
2024-09-14 17:36:35 +08:00
balibabu
1621313c0f
feat: After the voice in the new conversation window is played, jump to the tab of the conversation #1877 (#2440)
### What problem does this PR solve?

feat: After the voice in the new conversation window is played, jump to
the tab of the conversation #1877

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-14 17:19:04 +08:00
Kevin Hu
b94c15ef1e
prepare document for release (#2438)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-09-14 16:09:42 +08:00
Kevin Hu
8a16c8cc44
fix duplicate function name (#2437)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-14 16:04:02 +08:00
balibabu
b12a437a30
feat: Supports text output and sound output #1877 (#2436)
### What problem does this PR solve?

feat: Supports text output and sound output #1877

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-14 15:58:02 +08:00
balibabu
deeb950e1c
feat: Add html to the description text of the parsing method general #336 (#2432)
### What problem does this PR solve?

feat: Add html to the description text of the parsing method general
#336

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-14 15:18:34 +08:00
balibabu
6a0702f55f
feat: Display mindmap in drawer #2247 (#2430)
### What problem does this PR solve?

feat: Display mindmap in drawer #2247

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-14 14:42:36 +08:00
Kevin Hu
3044cb85fd
fix batch size error for qianwen embedding (#2431)
### What problem does this PR solve?

#2402

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-14 14:40:57 +08:00
Kevin Hu
d3262ca378
refine the warning message for rewrite component (#2429)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2024-09-14 14:17:03 +08:00
JobSmithManipulation
99a7c0fb97
update sdk document and chunk (#2421)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-09-14 13:24:21 +08:00
Vitaliy Groshev
7e75b9d778
fix parsing spaces in russian language PDFs (#1987) (#2427)
### What problem does this PR solve?

[#1987](https://github.com/infiniflow/ragflow/issues/1987)

When scanning PDF files character by character, the parser dropped
spaces if the string did not match a regex. Text from [Russian
documents](https://github.com/user-attachments/files/16659706/dogovor_oferta.pdf)
needs spaces, but it does not match the regex because it uses a different
alphabet. As a result, PDFs were parsed incorrectly and were almost
unusable as a source. This PR fixes that by adding the Russian alphabet
to the regex.

There may be similar problems with other languages that use different
alphabets. I additionally tested a [PDF in
Spanish](https://www.scusd.edu/sites/main/files/file-attachments/howtohelpyourchildsucceedinschoolspanish.pdf?1338307816)
and the old [a-zA-Z...] regex parses it correctly, with spaces.
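
The effect described above can be reproduced with a small sketch. Both patterns and the `join_chars` helper are hypothetical stand-ins: the parser's real regex is not shown in this PR description, so this only illustrates how extending a character class with Cyrillic letters changes whether a space survives.

```python
import re

# Hypothetical patterns; the parser's actual regex is not reproduced here.
LATIN_ONLY = re.compile(r"[a-zA-Z0-9,.?;:!%]")
WITH_CYRILLIC = re.compile(r"[a-zA-Z0-9,.?;:!%а-яА-ЯёЁ]")


def join_chars(chars, pattern):
    """Join characters, keeping a space only if the character before it matches pattern."""
    out = []
    for ch in chars:
        if ch == " " and not (out and pattern.match(out[-1])):
            continue  # drop the space: the preceding character is outside the alphabet
        out.append(ch)
    return "".join(out)


text = list("договор оферты")
print(join_chars(text, LATIN_ONLY))     # договороферты
print(join_chars(text, WITH_CYRILLIC))  # договор оферты
```

The letter `ё` falls outside the contiguous `а-я` range in Unicode, which is why the sketch lists it explicitly.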

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-14 13:14:39 +08:00
writinwaters
a467f31238
minor (#2422)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-09-14 09:34:35 +08:00
Kevin Hu
54342ae0a2
boost highlight performance (#2419)
### What problem does this PR solve?

#2415

### Type of change

- [x] Performance Improvement
2024-09-13 18:10:32 +08:00
writinwaters
bdcf195b20
Initial draft of Create a General-purpose chatbot (#2411)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-09-13 17:21:03 +08:00
Kevin Hu
3f571a13c2
fix empty children in mindmap (#2418)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-13 17:19:47 +08:00
Kevin Hu
9d4bb5767c
make highlight friendly to English (#2417)
### What problem does this PR solve?

#2415

### Type of change

- [x] Performance Improvement
2024-09-13 17:03:51 +08:00
Kevin Hu
5e7b93e802
add updates for README (#2413)
### What problem does this PR solve?



### Type of change

- [x] Documentation Update
2024-09-13 14:31:04 +08:00
balibabu
ec4def9a44
feat: When the mindmap data is empty, it will not be displayed on the search page #2247 (#2414)
### What problem does this PR solve?

feat: When the mindmap data is empty, it will not be displayed on the
search page #2247
### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2024-09-13 14:30:51 +08:00
balibabu
2bd71d722b
feat: Modify the style of the answer card on the search page #2247 (#2412)
### What problem does this PR solve?

feat: Modify the style of the answer card on the search page #2247

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-13 12:31:31 +08:00
balibabu
8f2c0176b4
feat: Use Spin to wrap the chunk list on the search page #2247 (#2409)
### What problem does this PR solve?

feat: Use Spin to wrap the chunk list on the search page #2247
### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2024-09-13 11:38:09 +08:00
Kevin Hu
b261b6aac0
fix pip install error (#2407)
### What problem does this PR solve?

#2356

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-13 10:06:54 +08:00
balibabu
cbdf54cf36
feat: Click on the chunk on the search page to locate the corresponding file location #2247 (#2399)
### What problem does this PR solve?

feat: Click on the chunk on the search page to locate the corresponding
file location #2247

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-13 08:54:26 +08:00
balibabu
db0606e064
feat: Wrap the searched chunk with a Popover #2247 (#2398)
### What problem does this PR solve?

feat: Wrap the searched chunk with a Popover #2247

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-12 19:15:44 +08:00
lidp
cfae63d107
Add RAGFlow benchmark (#2387)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-12 19:01:00 +08:00
lidp
88f8c8ed86
Fix volcengine yfinance conflict (#2386)
### What problem does this PR solve?

#2379 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-12 19:00:35 +08:00
balibabu
4158697fe6
feat: Add component AkShare #1739 (#2390)
### What problem does this PR solve?

 feat: Add component AkShare #1739

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-12 17:58:05 +08:00
balibabu
5f9cb16a3c
feat: Add component WenCai #1739 (#2388)
### What problem does this PR solve?

feat: Add component WenCai #1739

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-12 17:51:43 +08:00
Kevin Hu
4730145696
debug backend API for TAB 'search' (#2389)
### What problem does this PR solve?
#2247

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-12 17:51:20 +08:00
balibabu
68d0210e92
feat: Use Tree to display the knowledge base list on the search page #2247 (#2385)
### What problem does this PR solve?

feat: Use Tree to display the knowledge base list on the search page
#2247
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-12 17:23:32 +08:00
Fachuan Bai
f8e9a0590f
Common: Support PostgreSQL database as the metadata db. (#2357)
https://github.com/infiniflow/ragflow/issues/2356

### What problem does this PR solve?

As title

### Type of change

- [X] New Feature (non-breaking change which adds functionality)
2024-09-12 15:12:39 +08:00
liuhua
ba834aee26
Add a default value for do_refer in Dialog (#2383)
### What problem does this PR solve?

Add a default value for do_refer in Dialog

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
2024-09-12 15:11:57 +08:00
balibabu
983540614e
feat: Cover the entire search page with a background image #2247 (#2381)
### What problem does this PR solve?

feat: Cover the entire search page with a background image #2247

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-12 14:20:04 +08:00
JobSmithManipulation
6722b3d558
update sdk document (#2374)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-09-12 14:19:45 +08:00
balibabu
6000c3e304
feat: Catching errors with IndentedTree #2247 (#2380)
### What problem does this PR solve?

feat: Catching errors with IndentedTree #2247

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-12 13:34:33 +08:00
Kevin Hu
333608a1d4
add search TAB backend api (#2375)
### What problem does this PR solve?
 #2247

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-11 19:49:18 +08:00
balibabu
8052cbc70e
feat: Retrieval chunks by page #2247 (#2373)
### What problem does this PR solve?

feat: Retrieval chunks by page #2247

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-11 19:48:11 +08:00
Kevin Hu
b0e0e1fdd0
fix json error (#2372)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-11 19:10:49 +08:00
balibabu
8e3228d461
feat: Catch errors in getting mindmap #2247 (#2368)
### What problem does this PR solve?

feat: Catch errors in getting mindmap #2247

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-09-11 16:19:14 +08:00