Method

This project implements a verifiable information pipeline: collect a broad paper pool from primary sources, rank it with a multi-signal rules engine, and generate compact bilingual daily briefs.

Collection

The default collector covers AI-related arXiv categories including cs.AI, cs.CL, cs.LG, cs.CV, cs.MA, cs.IR, cs.RO, cs.SD, cs.MM, cs.HC, cs.SE, cs.DC, stat.ML, and eess.AS. Categories and limits are configuration-driven.
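As a minimal sketch of what configuration-driven collection could look like, the snippet below builds a request URL for the public arXiv API from a category list and a page size. The `CONFIG` keys and the `build_query_url` helper are hypothetical names used for illustration only; the project's actual configuration schema and collector may differ.

```python
from urllib.parse import urlencode

# Hypothetical config shape; the real project's keys and limits may differ.
CONFIG = {
    "categories": ["cs.AI", "cs.CL", "cs.LG", "stat.ML", "eess.AS"],
    "page_size": 100,
}

ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(categories, start=0, max_results=100):
    """Return an arXiv API URL listing the newest submissions in the given categories."""
    if not categories:
        raise ValueError("at least one category is required")
    # arXiv's search syntax joins category filters with OR.
    search_query = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search_query,
        "start": start,              # offset used for pagination
        "max_results": max_results,  # page size
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_query_url(CONFIG["categories"], max_results=CONFIG["page_size"])
print(url)
```

Keeping the category list and page size in one place means the collector's scope can be widened or narrowed without touching the request code; later pages would be requested by increasing `start`.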

Ranking Signals

Cadence

The production pipeline computes the publication date in Beijing/Taipei time (UTC+8) and fetches papers for that target date. When arXiv returns no usable rows for the target date, it falls back to the nearest date that has them. Production runs without external API keys continue using arXiv metadata; mock-run is reserved for fixed demo data and offline validation.
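The date logic can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the `rows_by_date` mapping, and the seven-day lookback window are all hypothetical, and "nearest" is assumed here to mean the closest earlier date.

```python
from datetime import date, datetime, timedelta, timezone

# Beijing and Taipei both use UTC+8 with no daylight saving time.
UTC8 = timezone(timedelta(hours=8))


def publication_date(now_utc):
    """Convert an aware UTC timestamp to the calendar date in UTC+8."""
    return now_utc.astimezone(UTC8).date()


def nearest_usable_date(target, rows_by_date, max_lookback_days=7):
    """Return target if it has rows, else the closest earlier date that does.

    rows_by_date maps date -> number of usable rows (hypothetical shape).
    Returns None when nothing usable is found inside the lookback window.
    """
    for offset in range(max_lookback_days + 1):
        candidate = target - timedelta(days=offset)
        if rows_by_date.get(candidate, 0) > 0:
            return candidate
    return None


run_time = datetime(2024, 1, 6, 17, 30, tzinfo=timezone.utc)
target = publication_date(run_time)  # 01:30 on 2024-01-07 in UTC+8
rows = {date(2024, 1, 5): 412, date(2024, 1, 6): 0, date(2024, 1, 7): 0}
chosen = nearest_usable_date(target, rows)
print(target, chosen)
```

A fixed UTC+8 offset is used instead of a named time zone so the sketch has no dependency on a system time zone database.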

Transparency

The frontend presents only the compact brief. Machine-readable details, run reports, and QA results remain in the data directory, where they can be used to inspect collection scale, pagination, deduplication, and fallback behavior.
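To make the idea of an inspectable run report concrete, here is one possible shape for such a record. Every field name below is an assumption chosen for illustration; the project's actual report schema is not specified in this document.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunReport:
    # All field names are illustrative assumptions, not the project's schema.
    target_date: str
    fetched_date: str
    pages_fetched: int
    rows_collected: int
    rows_after_dedupe: int

    @property
    def used_fallback(self):
        """True when the run fetched a date other than the target date."""
        return self.fetched_date != self.target_date

    def to_json(self):
        """Serialize the stored fields plus two derived values."""
        payload = asdict(self)
        payload["used_fallback"] = self.used_fallback
        payload["duplicates_removed"] = self.rows_collected - self.rows_after_dedupe
        return json.dumps(payload, indent=2, sort_keys=True)


report = RunReport("2024-01-07", "2024-01-05", 5, 412, 398)
summary = json.loads(report.to_json())
print(report.to_json())
```

Storing derived values such as `used_fallback` alongside the raw counts lets a reader check pagination, deduplication, and fallback behavior from the report file alone.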

Limits

Briefs are generated from titles, abstracts, and public metadata by default. They do not replace full-paper reading, and arXiv preprints must not be described as verified conclusions or conference acceptances.