ragflow

Author	SHA1	Message	Date
Kevin Hu	3a99c2b5f4	Refa: PARALLEL_DEVICES is a static parameter. (#6168 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-03-17 16:49:54 +08:00
Kevin Hu	bfa8d342b3	Fix: retrieval debug mode issue. (#6150 ) ### What problem does this PR solve? #6139 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-17 13:07:13 +08:00
Debug Doctor	3e19044dee	Feat: add multi-GPU and parallel processing support for OCR (#5972 ) ### What problem does this PR solve? Add multi-GPU and parallel processing support for OCR ### Type of change - [x] New Feature (non-breaking change which adds functionality) @yuzhichang I've tried to resolve the comments in #5697. OCR jobs can now be done on both CPU and GPU. (By the way, I've encountered a “Generate embedding error” issue #5954 that might be due to my outdated GPUs? idk.) Please review it and give me suggestions. GPU: ![gpu_ocr](https://github.com/user-attachments/assets/0ee2ecfb-a665-4e50-8bc7-15941b9cd80e) ![smi](https://github.com/user-attachments/assets/a2312f8c-cf24-443d-bf89-bec50503546d) CPU: ![cpu_ocr](https://github.com/user-attachments/assets/1ba6bb0b-94df-41ea-be79-790096da4bf1)	2025-03-17 11:58:40 +08:00
Yongteng Lei	4ff609b6a8	Fix: optimize OCR garbage identification to reduce unnecessary filtering (#6027 ) ### What problem does this PR solve? Optimize OCR garbage identification to reduce unnecessary filtering. #5713 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-13 18:48:32 +08:00
Yongteng Lei	7cd37c37cd	Feat: add CSV file parsing support (#5989 ) ### What problem does this PR solve? Add CSV file parsing support #4552, #5849, #5870 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-03-12 19:20:50 +08:00
hy89	b0c21b00d9	Refactor: Optimize error handling and support parsing of XLS (Excel 97-2003) files. (#5633 ) Optimize error handling and support parsing of XLS (Excel 97-2003) files.	2025-03-05 11:55:27 +08:00
Kevin Hu	b418ce5643	Fix table parser issue. (#5482 ) ### What problem does this PR solve? #1475 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-28 16:09:12 +08:00
Kevin Hu	4f40f685d9	Code refactor (#5371 ) ### What problem does this PR solve? #5173 ### Type of change - [x] Refactoring	2025-02-26 15:40:52 +08:00
Kevin Hu	c28bc41a96	Fix docx table issue. (#5117 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-19 12:40:06 +08:00
Kevin Hu	c24137bd11	Fix too long integer for `Table`. (#4651 ) ### What problem does this PR solve? #4594 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-26 12:54:58 +08:00
Kevin Hu	9d717f0b6e	Fix csv reader exception. (#4628 ) ### What problem does this PR solve? #4552 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-24 14:47:19 +08:00
Kevin Hu	13f04b7cca	Fix pdf applying Q&A issue. (#4599 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-23 12:30:46 +08:00
Kevin Hu	dd0ebbea35	Light GraphRAG (#4585 ) ### What problem does this PR solve? #4543 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-22 19:43:14 +08:00
Jin Hai	3894de895b	Update comments (#4569 ) ### What problem does this PR solve? Add license statement. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-01-21 20:52:28 +08:00
Kevin Hu	f556f0239c	Fix dify retrieval issue. (#4473 ) ### What problem does this PR solve? #4464 #4469 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-14 13:16:05 +08:00
Kevin Hu	e098fcf6ad	Fix csv for TAG. (#4454 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-13 12:03:18 +08:00
Kevin Hu	c5da3cdd97	Tagging (#4426 ) ### What problem does this PR solve? #4367 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-09 17:07:21 +08:00
Yingfeng	50f209204e	Synchronize with enterprise version (#4325 ) ### Type of change - [x] Refactoring	2025-01-02 13:44:44 +08:00
Kevin Hu	8fb18f37f6	Code refactor. (#4291 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2024-12-30 18:38:51 +08:00
TeslaZY	dd13a5d05c	Fix some bugs in text2sql.(#4279 )(#4281 ) (#4280 ) Fix some bugs in text2sql.(#4279)(#4281) ### What problem does this PR solve? - Parsing CSV files of the QA knowledge base produced incorrect results in the text2sql scenario. Process CSV files using the csv library and decouple CSV parsing from TXT parsing. - Most LLMs return results in Markdown format (```sql query ```). Fix the execution error caused by Markdown-formatted SQL in LLM output. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-30 10:32:19 +08:00
ly0303521	101b8ff813	fix chunk method "Table" losing content when the Excel file has multi… (#4123 ) …ple sheets ### What problem does this PR solve? discussed in https://github.com/infiniflow/ragflow/pull/4102 - In excel_parser.py, `total` is meant to be the total number of rows in the Excel file, but it is returned in the first iteration, which leads to the wrong `to_page`. - In table.py, when the Excel file has multiple sheets, it is divided into multiple parts of size 3000 each; `data` may be empty because it has already been recorded in the last iteration. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-19 17:30:26 +08:00
liuhua	1d65299791	Fix rerank_model bug in chat and markdown bug (#4061 ) ### What problem does this PR solve? Fix rerank_model bug in chat and markdown bug #4000 #3992 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>	2024-12-17 16:03:37 +08:00
Zhichang Yu	03f00c9e6f	Rename page_num_list, top_list, position_list (#3940 ) ### What problem does this PR solve? Rename page_num_list, top_list, position_list to page_num_int, top_int, position_int ### Type of change - [x] Refactoring	2024-12-10 16:32:58 +08:00
Kevin Hu	927873bfa6	Fix syn error. (#3953 ) ### What problem does this PR solve? Close #3696 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-10 10:54:54 +08:00
Zhichang Yu	0d68a6cd1b	Fix errors detected by Ruff (#3918 ) ### What problem does this PR solve? Fix errors detected by Ruff ### Type of change - [x] Refactoring	2024-12-08 14:21:12 +08:00
Jin Hai	821fdf02b4	Fix parsing JSON file error (#3829 ) ### What problem does this PR solve? Close issue: #3828 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: jinhai <haijin.chn@gmail.com>	2024-12-03 19:02:03 +08:00
Jin Hai	08c1a5e1e8	Refactor parse progress (#3781 ) ### What problem does this PR solve? Refactor parse file progress ### Type of change - [x] Refactoring Signed-off-by: jinhai <haijin.chn@gmail.com>	2024-12-01 22:28:00 +08:00
Jin Hai	e079656473	Update progress info and start welcome info (#3768 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring --------- Signed-off-by: jinhai <haijin.chn@gmail.com>	2024-11-30 18:48:06 +08:00
kuschzzp	e678819f70	Fix RGBA error (#3707 ) ### What problem does this PR solve? The image passed to cv_mdl.describe() is not converted to RGB ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-11-28 13:09:02 +08:00
Zhichang Yu	bc701d7b4c	Edit chunk shall update instead of insert it (#3709 ) ### What problem does this PR solve? Edit chunk shall update instead of insert it. Close #3679 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-11-28 13:00:38 +08:00
Kevin Hu	609236f5c1	Make 'One' applicable to tables in docx (#3619 ) ### What problem does this PR solve? #3598 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Performance Improvement	2024-11-25 09:57:54 +08:00
Zhichang Yu	482c1b59c8	Check tika.parser return result (#3564 ) ### What problem does this PR solve? Check tika.parser return result. Close #3229 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2024-11-22 11:05:06 +08:00
Michal Masrna	c4f2464935	fix: laws.py added missing import logging (#3501 ) ### What problem does this PR solve? _Choosing Laws Chunk Method results in an error when parsing a document. The error is caused by a missing import in the `laws.py` file._ ``` Traceback (most recent call last): File "/ragflow/rag/svr/task_executor.py", line 445, in handle_task do_handle_task(task) File "/ragflow/rag/svr/task_executor.py", line 384, in do_handle_task cks = build(r) File "/ragflow/rag/svr/task_executor.py", line 196, in build cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"], File "/ragflow/rag/app/laws.py", line 161, in chunk for txt, poss in pdf_parser(filename if not binary else binary, File "/ragflow/rag/app/laws.py", line 124, in __call__ logging.debug("layouts:".format( NameError: name 'logging' is not defined. Did you forget to import 'logging' ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Michal Masrna <m.marna1@gmail.com>	2024-11-20 20:52:05 +08:00
Zhichang Yu	30f6421760	Use consistent log file names, introduced initLogger (#3403 ) ### What problem does this PR solve? Use consistent log file names, introduced initLogger ### Type of change - [x] Refactoring	2024-11-14 17:13:48 +08:00
Kevin Hu	83c6b1f308	set DLA active for KG (#3386 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2024-11-13 16:59:19 +08:00
Zhichang Yu	a2a5631da4	Rework logging (#3358 ) Unified all log files into one. ### What problem does this PR solve? Unified all log files into one. ### Type of change - [x] Refactoring	2024-11-12 17:35:13 +08:00
Zhichang Yu	f4c52371ab	Integration with Infinity (#2894 ) ### What problem does this PR solve? Integration with Infinity - Replaced ELASTICSEARCH with dataStoreConn - Renamed deleteByQuery with delete - Renamed bulk to upsertBulk - getHighlight, getAggregation - Fix KGSearch.search - Moved Dealer.sql_retrieval to es_conn.py ### Type of change - [x] Refactoring	2024-11-12 14:59:41 +08:00
Kevin Hu	f86826b7a0	refactor error message of qwen (#3074 ) ### What problem does this PR solve? #3055 ### Type of change - [x] Refactoring	2024-10-29 10:08:08 +08:00
Kevin Hu	1fce6caf80	make titles in markdown not be split from the following content (#2971 ) ### What problem does this PR solve? #2970 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-10-22 15:25:23 +08:00
Kevin Hu	b540d41cdc	let presentation do raptor (#2838 ) ### What problem does this PR solve? #2837 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-10-15 10:11:09 +08:00
lidp	20e63f8ec4	Fix docx images (#2756 ) ### What problem does this PR solve? #2755 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-10-09 19:37:32 +08:00
yqkcn	570ad420a8	remove unused import (#2679 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2024-09-30 16:59:39 +08:00
Kevin Hu	fc867cb959	rename get_txt to get_text (#2649 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-09-29 12:47:09 +08:00
yqkcn	aea553c3a8	Add get_txt function (#2639 ) ### What problem does this PR solve? Add get_txt function to reduce duplicate code ### Type of change - [x] Refactoring --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2024-09-29 10:29:56 +08:00
yqkcn	34abcf7704	style: fix typo and format code (#2618 ) ### What problem does this PR solve? - Fix typo - Remove unused import - Format code ### Type of change - [x] Other (please describe): typo and format	2024-09-27 13:17:25 +08:00
Kevin Hu	78856703c4	make excel parsing configurable (#2517 ) ### What problem does this PR solve? #2516 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-09-20 15:33:38 +08:00
Kevin Hu	01acc3fd5a	fix duplicated llm name between different suppliers (#2477 ) ### What problem does this PR solve? #2465 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-09-18 16:09:22 +08:00
黄腾	e4765ebe0c	add support for markdown file in one parse way (#2052 ) ### What problem does this PR solve? #2021 add support for markdown file in one parse way ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Zhedong Cen <cenzhedong2@126.com>	2024-08-22 15:32:35 +08:00
Jin Hai	6b3a40be5c	Convert file format from Windows/DOS to Unix (#1949 ) ### What problem does this PR solve? The related source files were in Windows/DOS format; they are converted to Unix format. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2024-08-15 09:17:36 +08:00
Kevin Hu	d73a75506e	fix mind map bug (#1934 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-08-13 19:42:28 +08:00

144 Commits