Method

This site runs a lightweight, verifiable information pipeline: it collects AI-related papers from arXiv by source date, deduplicates them, ranks them with rule-based signals, and renders compact bilingual daily briefs.

Collection

The default collector covers AI-related arXiv categories including cs.AI, cs.CL, cs.LG, cs.CV, cs.MA, cs.IR, cs.RO, cs.SD, cs.MM, cs.HC, cs.SE, cs.DC, stat.ML, and eess.AS. In production, each issue is bound to a single source date, normally two days before its publication date, so that issues do not reuse data across dates.
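The date binding and category filter can be sketched as follows. This is a minimal illustration, not the site's actual collector: the names `source_date_for` and `category_query` are hypothetical, and the `cat:` prefix joined with `OR` follows the arXiv API's search-query syntax.

```python
from datetime import date, timedelta

# Categories as listed in the paragraph above.
CATEGORIES = [
    "cs.AI", "cs.CL", "cs.LG", "cs.CV", "cs.MA", "cs.IR", "cs.RO",
    "cs.SD", "cs.MM", "cs.HC", "cs.SE", "cs.DC", "stat.ML", "eess.AS",
]

# "Normally two days before the publication date."
SOURCE_LAG_DAYS = 2


def source_date_for(publication_date: date, lag_days: int = SOURCE_LAG_DAYS) -> date:
    """Return the paper source date bound to a given publication date."""
    return publication_date - timedelta(days=lag_days)


def category_query(categories: list[str]) -> str:
    """Build a category filter such as 'cat:cs.AI OR cat:cs.CL' (hypothetical helper)."""
    return " OR ".join(f"cat:{c}" for c in categories)
```

For example, `source_date_for(date(2025, 3, 1))` returns `date(2025, 2, 27)`, so the issue published on 1 March draws only on papers dated 27 February.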

Ranking Signals

Keyword matches in titles and abstracts, covering institutions, conferences, code, deployment, inference, agents, RAG, safety, and evaluation.
arXiv category weights and topic classification to surface core AI directions.
Deduplication and recent-topic penalties to reduce repeated papers and overly similar consecutive issues.
Additional visibility for safety, ethics, and governance keywords, to support model-risk tracking.
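The signals above could be combined into a single rule-based score roughly as shown here. This is a sketch under stated assumptions: every weight, keyword list, and name (`Paper`, `score`, `rank`, `dedupe`) is a hypothetical placeholder rather than the site's real configuration, and matching is done on whole lowercase words only.

```python
import re
from dataclasses import dataclass

# Illustrative placeholder weights; the real values are not documented here.
KEYWORD_WEIGHTS = {"agents": 1.0, "rag": 1.0, "inference": 0.5, "evaluation": 0.5}
CATEGORY_WEIGHTS = {"cs.AI": 1.0, "cs.CL": 1.0, "cs.LG": 0.5}
SAFETY_KEYWORDS = {"safety", "ethics", "governance"}
SAFETY_BOOST = 1.5
RECENT_TOPIC_PENALTY = 0.5


@dataclass
class Paper:
    arxiv_id: str
    title: str
    abstract: str
    category: str
    topic: str


def dedupe(papers):
    """Keep the first occurrence of each arXiv id, preserving order."""
    seen, unique = set(), []
    for paper in papers:
        if paper.arxiv_id not in seen:
            seen.add(paper.arxiv_id)
            unique.append(paper)
    return unique


def score(paper, recent_topics):
    """Sum category weight, keyword weights, and safety boost; subtract the recency penalty."""
    words = set(re.findall(r"[a-z0-9]+", f"{paper.title} {paper.abstract}".lower()))
    total = CATEGORY_WEIGHTS.get(paper.category, 0.0)
    total += sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in words)
    if words & SAFETY_KEYWORDS:
        total += SAFETY_BOOST
    if paper.topic in recent_topics:
        total -= RECENT_TOPIC_PENALTY
    return total


def rank(papers, recent_topics):
    """Deduplicate, then sort by descending score."""
    return sorted(dedupe(papers), key=lambda p: score(p, recent_topics), reverse=True)
```

Keeping each signal as a separate additive term makes a ranking easy to audit: any paper's position can be explained by listing which terms fired.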

Cadence

GitHub Actions runs on a schedule aligned to Beijing/Taipei time (UTC+8). The public issue date is the publication date, while the paper source date is two days earlier. Production runs do not silently fall back to older data; if the strict source date has no usable papers, the run should fail or skip publication explicitly.
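The no-silent-fallback rule can be illustrated with a small guard. This is a sketch only: `select_for_issue`, `NoUsablePapersError`, and the `papers_by_date` mapping are hypothetical names, and a real run might choose to skip publication instead of raising.

```python
from datetime import date, timedelta


class NoUsablePapersError(RuntimeError):
    """Raised when the strict source date has no usable papers (hypothetical)."""


def select_for_issue(papers_by_date, publication_date, lag_days=2):
    """Return (source_date, papers) for an issue, never falling back to older dates."""
    source_date = publication_date - timedelta(days=lag_days)
    papers = papers_by_date.get(source_date, [])
    if not papers:
        # Fail loudly rather than reuse papers from another date.
        raise NoUsablePapersError(
            f"no usable papers for source date {source_date.isoformat()}"
        )
    return source_date, papers
```

Raising an exception makes the failure visible in the workflow run, so an empty source date cannot be mistaken for a successful issue.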

Transparency

The frontend presents only the compact brief. Machine-readable details, run reports, and QA results remain in the data directory, where they can be used to inspect collection scale, pagination, deduplication, date binding, and generation status.
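As an illustration of what such a machine-readable run report might contain, the sketch below mirrors the inspection points named above. The field names, the `date_binding_ok` check, and all values are made up for this example and do not describe the site's actual files.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date, timedelta


@dataclass
class RunReport:
    # Hypothetical fields, one per inspection point listed in the text.
    issue_date: str           # publication date, ISO format
    source_date: str          # paper source date, ISO format
    pages_fetched: int        # pagination
    papers_collected: int     # collection scale
    papers_after_dedupe: int  # deduplication
    generation_status: str    # generation status


def date_binding_ok(report, lag_days=2):
    """Check that the source date is exactly lag_days before the issue date."""
    issue = date.fromisoformat(report.issue_date)
    source = date.fromisoformat(report.source_date)
    return issue - source == timedelta(days=lag_days)


# Made-up values for illustration only.
report = RunReport("2025-03-01", "2025-02-27", 3, 250, 240, "ok")
print(json.dumps(asdict(report), indent=2))
```

A QA step could then call `date_binding_ok(report)` and refuse to publish when it returns `False`.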

Limits

Briefs are generated from titles, abstracts, and public metadata by default. They do not replace full-paper reading, and arXiv preprints must not be described as verified conclusions or conference acceptances.