Building NewsChrono: A Short-Form News Platform Powered by LLMs


Published on April 16, 2025

newschrono.com is a French short-form news platform that I designed and built end to end, from ideation and product design to backend architecture and production deployment.

The core idea behind NewsChrono is simple:

help users stay informed through concise, readable news summaries, while still enabling discovery of related topics and context.

Behind this simplicity sits a production-grade architecture that combines LLMs, semantic embeddings, serverless infrastructure, and scalable backend services. This article walks through the product vision, technical choices, and system design.

1. Product Vision: Short-Form, Not Shallow

Traditional news platforms overwhelm users with:

long articles,
redundant coverage,
noisy feeds.

NewsChrono focuses on:

short, high-signal summaries,
fast consumption,
semantic navigation across related stories.

The goal is not to replace journalism, but to optimize access to information for users who want to understand what matters quickly.

2. LLM-Based News Summarization

At the heart of NewsChrono is automatic article summarization.

Each full-length news article is processed using:

OpenAI LLM APIs,
prompt-engineered summarization logic,
consistent output formatting.

The summarization pipeline:

Ingests the raw article
Applies an LLM-based summarization prompt
Produces a concise, readable summary
Stores both raw and summarized content

This approach ensures:

consistent tone and length,
language clarity,
scalability across sources and topics.

Summarization is treated as a backend service, not a frontend feature.
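As a rough illustration of the pipeline above, here is a minimal sketch. The prompt wording, the 80-word budget, and every name in it (`build_prompt`, `normalize_summary`, `summarize_article`, `fake_complete`) are assumptions made for the example, not the actual NewsChrono code. The LLM call is passed in as a plain `complete` callable so the sketch runs without network access; in production that callable would wrap the OpenAI API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt template; the real NewsChrono prompt is not published.
PROMPT_TEMPLATE = (
    "Summarize the following news article in French, in at most "
    "{max_words} words, in a neutral tone.\n\n"
    "Title: {title}\n\nArticle:\n{body}"
)


@dataclass
class SummarizedArticle:
    title: str
    raw_text: str  # the original article is kept alongside the summary
    summary: str


def build_prompt(title: str, body: str, max_words: int = 80) -> str:
    """Fill the template with one article."""
    return PROMPT_TEMPLATE.format(title=title, body=body.strip(), max_words=max_words)


def normalize_summary(text: str, max_words: int = 80) -> str:
    """Collapse whitespace and enforce the word budget for consistent output."""
    words = text.split()
    if len(words) > max_words:
        return " ".join(words[:max_words]).rstrip(".,;:") + "..."
    return " ".join(words)


def summarize_article(
    title: str,
    body: str,
    complete: Callable[[str], str],
    max_words: int = 80,
) -> SummarizedArticle:
    """Ingest the raw article, prompt the LLM, and return raw + summarized content."""
    prompt = build_prompt(title, body, max_words)
    summary = normalize_summary(complete(prompt), max_words)
    return SummarizedArticle(title=title, raw_text=body, summary=summary)


def fake_complete(prompt: str) -> str:
    """Stand-in for the real LLM call, used only to exercise the sketch."""
    return "  Résumé   de test.\n"
```

Keeping the formatting step (`normalize_summary`) outside the model call is one way to get consistent length even when the model ignores part of the prompt.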

3. Semantic Similarity: Finding Related Articles

Short-form content is more powerful when paired with contextual discovery.

To achieve this, each article is represented using:

LLM-generated embeddings,
a dense vector capturing semantic meaning.

Embedding Pipeline

For each article:

Generate an embedding using an LLM embedding model
Store the embedding vector in the database
Use it as a semantic descriptor of the article
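The three steps above can be sketched as follows. This is only an illustration under stated assumptions: `index_article`, `validate_vector`, and `toy_embed` are hypothetical names, the store is a plain dict standing in for the database, and `toy_embed` is a non-semantic placeholder for the real embedding model call.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def validate_vector(vec: Vector) -> Vector:
    """Reject empty or non-finite vectors before they reach the database."""
    if not vec:
        raise ValueError("embedding is empty")
    if not all(math.isfinite(x) for x in vec):
        raise ValueError("embedding contains NaN or infinity")
    return vec


def index_article(
    article_id: str,
    text: str,
    embed: Callable[[str], Vector],
    store: Dict[str, dict],
) -> dict:
    """Generate an embedding for one article and store it next to the text."""
    record = {"text": text, "embedding": validate_vector(embed(text))}
    store[article_id] = record  # stand-in for the database write
    return record


def toy_embed(text: str) -> Vector:
    """Placeholder embedder (NOT semantic): a few character counts."""
    return [float(len(text)), float(text.count(" ")), float(text.count("e"))]
```

Passing the embedder in as a function keeps the storage logic independent of whichever embedding model is used.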

Similarity Search

When displaying an article, NewsChrono retrieves:

the 4 most related articles,
based on cosine similarity between embeddings.

This is implemented using:

Firestore's find_nearest vector query (Cloud Firestore, part of Firebase)
cosine distance as the similarity metric

The result is a recommendation system based on meaning, not keywords or categories.
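To make the ranking concrete, here is a small brute-force sketch of "top 4 by cosine similarity". It is not how the production query runs (that is delegated to find_nearest, as stated above); it only illustrates the metric. The function names and the in-memory dict of embeddings are assumptions for the example. Cosine distance is simply 1 minus the cosine similarity computed here.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def most_related(
    article_id: str,
    embeddings: Dict[str, List[float]],
    k: int = 4,
) -> List[Tuple[str, float]]:
    """Return the k other articles most similar to `article_id`, best first."""
    query = embeddings[article_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != article_id  # never recommend the article itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is fine for a few thousand articles; a managed vector index avoids the scan as the corpus grows.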

4. Backend Services: Systemd-Managed Workers

Two jobs run outside the web application:

article summarization,
embedding generation and similarity indexing.

Both are handled by a dedicated backend service, running as a systemd-managed process.

This design choice provides:

isolation from the web frontend,
reliable background processing,
controlled retries and monitoring.

The backend service is responsible for:

calling OpenAI APIs,
managing embeddings,
updating Firebase records,
triggering similarity indexing.

This separation keeps the system robust and maintainable.
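One way the "controlled retries" part of such a worker could look is sketched below. The helper names (`run_with_retries`, `process_batch`), the attempt count, and the exponential backoff schedule are assumptions for illustration; the actual service may handle failures differently. The `sleep` function is injectable so the logic can be exercised without real waiting.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_with_retries(
    task: Callable[[], object],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Run `task`, retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and let the caller record the failure
            sleep(base_delay * 2 ** (attempt - 1))


def process_batch(
    article_ids: Iterable[str],
    handle: Callable[[str], object],
    **retry_kwargs,
) -> Tuple[List[str], List[str]]:
    """Process each article independently; one failure does not stop the batch."""
    done: List[str] = []
    failed: List[str] = []
    for article_id in article_ids:
        try:
            run_with_retries(lambda: handle(article_id), **retry_kwargs)
            done.append(article_id)
        except Exception:
            failed.append(article_id)  # surfaced for monitoring or a later run
    return done, failed
```

In-process retries like these cover transient API errors, while systemd itself can restart the whole process if it crashes (for example with a `Restart=on-failure` policy in the unit file).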

5. Web Application Architecture

The NewsChrono web application is:

a FastAPI application,
fully Dockerized,
deployed on Google Cloud Run.

Why FastAPI

clean API design,
async-friendly,
easy integration with backend services.

Why Cloud Run

serverless scaling,
zero infrastructure management,
cost efficiency,
fast iteration cycles.

Cloud Run allows the platform to scale automatically with traffic, without over-provisioning resources.

6. Firebase for Data Management

Firebase is used as the main data backend, handling:

article storage,
summaries,
embeddings,
metadata,
similarity queries.

Key benefits:

managed infrastructure,
tight integration with Cloud Run,
native support for vector similarity queries.

Firebase acts as both:

a traditional database,
and a lightweight semantic index.

7. End-to-End Data Flow

Putting it all together:

News article is ingested
Backend service summarizes it using an LLM
Embedding is generated and stored
Article is indexed for similarity search
Web app fetches summary + related articles
User consumes concise news with semantic context

Each component is loosely coupled but clearly defined.
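The loose coupling can be illustrated with a small orchestration sketch in which every stage is an injected function. The `Pipeline` class, its field names, and the in-memory `store` are hypothetical and exist only for this example; in the real system these roles are played by the backend service, the database, and the web app.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Pipeline:
    # Each stage is a plain callable, so any of them can be swapped out.
    summarize: Callable[[str], str]
    embed: Callable[[str], List[float]]
    find_related: Callable[[str, Dict[str, dict]], List[str]]
    store: Dict[str, dict] = field(default_factory=dict)

    def ingest(self, article_id: str, raw_text: str) -> None:
        """Backend side: summarize, embed, and store one article."""
        self.store[article_id] = {
            "raw": raw_text,
            "summary": self.summarize(raw_text),
            "embedding": self.embed(raw_text),
        }

    def page(self, article_id: str) -> dict:
        """Web side: fetch the summary plus the related article ids."""
        record = self.store[article_id]
        return {
            "id": article_id,
            "summary": record["summary"],
            "related": self.find_related(article_id, self.store),
        }
```

Because the stages only meet through the stored record, the summarizer or the similarity backend can change without touching the code that serves pages.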

8. From Ideation to Production

NewsChrono is not a prototype. It went through a complete product lifecycle:

idea and positioning,
system design,
LLM integration,
backend services,
cloud deployment,
production operation.

Building it end-to-end required combining:

product thinking,
AI engineering,
backend architecture,
and cloud infrastructure.

Closing Thoughts

NewsChrono demonstrates how LLMs can be productized responsibly, not just experimented with.

By combining:

summarization,
semantic embeddings,
scalable backend services,
and serverless deployment,

the platform turns raw news streams into structured, digestible information.

This project reflects a broader belief:

AI is most powerful when it quietly improves how people access and understand information.


Amine AYARI
