To be refined in a separate ontology session (2026-Q2). This document proposes the conceptual design and pilot scope.
Knowledge Feature Store Expansion
Problem Definition: Why Feature Store Alone Is Insufficient
Traditional Feature Stores (Feast, SageMaker Feature Store, Tecton) are optimized for efficiently providing scalar values and embedding vectors. However, in Agentic AI environments, the following limitations emerge:
Traditional Feature Store Limitations
Specific Problem Cases:
- Absence of Entity Relations → Hallucinations
  - Question: "What devices are connected to customer A's recent contracts?"
  - Traditional FS: Returns the customer embedding and the contract embedding separately
  - Result: The LLM links unrelated devices and produces a hallucinated answer
  - Required: `(Customer)-[:HAS_CONTRACT]->(Contract)-[:USES_DEVICE]->(Device)` relationship
- Absence of Ontology → Domain Term Misunderstanding
  - Question: "Usage patterns of Premium grade users"
  - Traditional FS: Treats 'Premium' as a plain string
  - Result: Cannot understand how 'Premium' relates to 'VIP', 'Gold', or 'Platinum'
  - Required: `Premium subClassOf HighValueCustomer`, `VIP equivalentTo Premium` definitions
- Absence of Provenance → Audit Failure
  - Requirement: "What is the data source for this answer?"
  - Traditional FS: Only provides vector similarity and cannot trace the source data
  - Result: Compliance (SOC2, GDPR) failure
  - Required: Feature → Raw Data → Source System → Timestamp chain
- Absence of Temporal Relationships → Context Errors
  - Question: "Prior usage patterns of customers who churned in 2025 Q4"
  - Traditional FS: Only supports point-in-time queries
  - Result: Cannot connect events before and after churn
  - Required: Temporal edges (`BEFORE`, `AFTER` relationships)
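To make the first problem case concrete, the following is a minimal sketch of a typed-relation traversal over a toy in-memory graph. The node identifiers and the `add_edge` and `traverse` helpers are illustrative assumptions, not part of any existing Feature Store or graph API.

```python
from collections import defaultdict

# Hypothetical toy graph: (source node, relation) -> list of target nodes.
edges = defaultdict(list)

def add_edge(src, relation, dst):
    edges[(src, relation)].append(dst)

add_edge("customer:A", "HAS_CONTRACT", "contract:1")
add_edge("customer:A", "HAS_CONTRACT", "contract:2")
add_edge("customer:B", "HAS_CONTRACT", "contract:3")
add_edge("contract:1", "USES_DEVICE", "device:router-7")
add_edge("contract:2", "USES_DEVICE", "device:modem-3")
add_edge("contract:3", "USES_DEVICE", "device:router-9")

def traverse(start, path):
    """Follow the relation path from start and return the nodes reached."""
    frontier = [start]
    for relation in path:
        frontier = [dst for node in frontier for dst in edges[(node, relation)]]
    return frontier

devices_of_a = traverse("customer:A", ["HAS_CONTRACT", "USES_DEVICE"])
print(devices_of_a)
```

Because the traversal only follows explicit edges, customer B's device can never appear in the context for customer A, no matter how similar the embeddings are.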
Knowledge Feature Store Conceptual Model
Knowledge Feature Store (KFS) extends the traditional Feature Store with a 3-plane architecture, adding relationships and semantics to scalar/vector data.
3-Plane Architecture
Role of Each Plane
| Plane | Responsibility | Data Format | Read Latency | Example Query |
|---|---|---|---|---|
| Feature Plane | Provide Scalar/Vector features | Parquet, Protobuf | <10ms | get_features(entity_id, feature_names) |
| Knowledge Plane | Entity Relations·Ontology | RDF, Property Graph | <50ms | traverse(Customer, depth=2, relation='HAS_CONTRACT') |
| Retrieval Plane | Vector search + graph expansion | HNSW Index, Cypher | <100ms | hybrid_search(query_embedding, kg_expand=True) |
Unified Read API
from kfs import KnowledgeFeatureStore

kfs = KnowledgeFeatureStore(
    feature_store="feast://cluster.local",
    knowledge_graph="neptune://cluster.amazonaws.com",
    vector_store="milvus://milvus.svc.cluster.local:19530"
)

# Unified query: vector search + graph expansion + feature loading
result = kfs.retrieve(
    query="Recent usage patterns of Premium grade users",
    retrieval_config={
        "vector_top_k": 10,
        "graph_expand": {
            "depth": 2,
            "relations": ["HAS_CONTRACT", "USES_DEVICE"]
        },
        "features": ["usage_last_30d", "churn_risk_score"]
    }
)

# Result:
# - contexts: 10 documents found by vector search
# - entities: Entity nodes connected by graph expansion
# - features: Scalar/vector features of each entity
# - provenance: Source and timestamp of each data point
Ontology Schema and Entity Interpretation
Domain Ontology Definition
The domain entities of the Agentic AI Platform (Customer, Contract, Device, Usage) are defined using a subset of SKOS and OWL Lite.
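@prefix kfs: <http://platform.ai/ontology/kfs#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .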
# Core Entities
kfs:Customer a owl:Class ;
skos:prefLabel "Customer"@en, "고객"@ko ;
skos:definition "Individual or organization using the service" .
kfs:Contract a owl:Class ;
skos:prefLabel "Contract"@en, "계약"@ko ;
skos:definition "Service contract with customer" .
kfs:Device a owl:Class ;
skos:prefLabel "Device"@en, "디바이스"@ko ;
skos:definition "Device for service delivery" .
kfs:Usage a owl:Class ;
skos:prefLabel "Usage"@en, "이용"@ko ;
skos:definition "Service usage event" .
# Relationship definition
kfs:hasContract a owl:ObjectProperty ;
rdfs:domain kfs:Customer ;
rdfs:range kfs:Contract ;
skos:prefLabel "Has contract"@en .
kfs:usesDevice a owl:ObjectProperty ;
rdfs:domain kfs:Contract ;
rdfs:range kfs:Device ;
skos:prefLabel "Uses device"@en .
kfs:recordedUsage a owl:ObjectProperty ;
rdfs:domain kfs:Device ;
rdfs:range kfs:Usage ;
skos:prefLabel "Usage record"@en .
# Attributes definition
kfs:customerGrade a owl:DatatypeProperty ;
rdfs:domain kfs:Customer ;
rdfs:range xsd:string ;
skos:prefLabel "Customer grade"@en .
kfs:churnRisk a owl:DatatypeProperty ;
rdfs:domain kfs:Customer ;
rdfs:range xsd:float ;
skos:prefLabel "Churn risk"@en .
# Grade Hierarchy (SKOS Concept Scheme)
kfs:CustomerGradeScheme a skos:ConceptScheme ;
skos:prefLabel "Customer grade system"@en .
kfs:Premium a skos:Concept ;
skos:inScheme kfs:CustomerGradeScheme ;
skos:prefLabel "Premium"@en, "Premium"@ko ;
skos:broader kfs:HighValue .
kfs:VIP a skos:Concept ;
skos:inScheme kfs:CustomerGradeScheme ;
skos:exactMatch kfs:Premium ;
skos:prefLabel "VIP"@en .
kfs:HighValue a skos:Concept ;
skos:inScheme kfs:CustomerGradeScheme ;
skos:prefLabel "High-value customer"@en .
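As an illustration of how the grade scheme could be used at query time, here is a minimal sketch that expands a grade term through `skos:exactMatch` and `skos:broader`. It mirrors the three concepts above in plain Python sets and dicts. The `equivalents` and `ancestors` helpers are hypothetical, and a real deployment would presumably query the ontology store instead.

```python
# Hypothetical in-memory mirror of the SKOS grade scheme.
EXACT_MATCH = {("VIP", "Premium")}   # kfs:VIP skos:exactMatch kfs:Premium
BROADER = {"Premium": "HighValue"}   # kfs:Premium skos:broader kfs:HighValue

def equivalents(concept):
    """Return the concept plus everything linked to it by exactMatch, transitively."""
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for a, b in EXACT_MATCH:
            # exactMatch is symmetric, so follow each pair in both directions.
            for src, dst in ((a, b), (b, a)):
                if src == current and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
    return seen

def ancestors(concept):
    """Walk skos:broader upward from the concept and any of its equivalents."""
    found = set()
    for c in equivalents(concept):
        parent = BROADER.get(c)
        while parent is not None and parent not in found:
            found.add(parent)
            parent = BROADER.get(parent)
    return found

print(sorted(equivalents("VIP")))   # ['Premium', 'VIP']
print(sorted(ancestors("VIP")))     # ['HighValue']
```

With this expansion, a query about "Premium" users can also match records labeled "VIP", which a plain string comparison would miss.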
Managed vs Open Source Options
| Implementation | Managed Option | Open Source Option | Selection Criteria |
|---|---|---|---|
| Knowledge Graph | Amazon Neptune Analytics | Neo4j, JanusGraph | Scale, operational capability, cost |
| Ontology Store | Amazon Neptune (RDF/SPARQL) | Oxigraph, Apache Jena | Ontology complexity, inference requirements |
| Vector DB | - | Milvus, Weaviate | Already built on EKS |
Neptune Analytics Advantages:
- Serverless graph analytics (no provisioning required)
- Millisecond query latency
- openCypher support (Gremlin and SPARQL are offered by Neptune Database rather than Neptune Analytics)
- Direct S3 data loading
- Cost: $1.08/vCPU/hr (on-demand), $0.10/Compute Unit per query
Neo4j Advantages:
- Mature ecosystem, rich plugins
- Full control over the deployment when self-hosted on EKS
- Cypher, a widely adopted graph query language
- APOC for advanced algorithms
KG-aware RAG Pattern
Vector Search + Graph Expansion
Traditional RAG selects context only by vector similarity, but KG-aware RAG leverages graph relationships to expand context.
Implementation Example
from datasets import Dataset
from kfs import KnowledgeFeatureStore
from ragas import evaluate
from ragas.metrics import faithfulness, context_recall

# embedding_model and llm are assumed to be initialized elsewhere.
kfs = KnowledgeFeatureStore(...)

def kg_aware_rag(query: str) -> dict:
    # 1. Embed the question and extract the entities it mentions
    query_embedding = embedding_model.encode(query)
    query_entities = kfs.extract_entities(query)

    # 2. Milvus top-k vector search
    vector_results = kfs.vector_search(
        embedding=query_embedding,
        collection="documents",
        top_k=20,
        metric="COSINE"
    )

    # 3. Extract the entities mentioned in each document
    entities_per_doc = [kfs.extract_entities(doc.text) for doc in vector_results]
    entities = [e for doc_entities in entities_per_doc for e in doc_entities]

    # 4. 1-hop expansion in the Knowledge Graph
    expanded_entities = kfs.graph_expand(
        entities=entities,
        depth=1,
        relations=["HAS_CONTRACT", "USES_DEVICE", "RECORDED_USAGE"]
    )

    # 5. Re-rank by graph distance between document entities and question entities
    scored_contexts = []
    for doc, doc_entities in zip(vector_results, entities_per_doc):
        # Document score = vector similarity + graph distance weight
        vector_score = doc.score
        entity_distance = kfs.min_distance(doc_entities, query_entities)
        graph_score = 1 / (1 + entity_distance)  # Inverse distance
        final_score = 0.7 * vector_score + 0.3 * graph_score
        scored_contexts.append((doc, final_score))

    # 6. Select top-5 contexts
    final_contexts = sorted(
        scored_contexts,
        key=lambda x: x[1],
        reverse=True
    )[:5]

    return {
        "contexts": [doc.text for doc, score in final_contexts],
        "entities": expanded_entities,
        "provenance": [doc.metadata for doc, score in final_contexts]
    }

# 7. Evaluation with Ragas
result = kg_aware_rag("Recent usage patterns of Premium grade users")

# Ragas evaluates a datasets.Dataset; column names follow the ragas 0.1.x convention.
eval_dataset = Dataset.from_dict({
    "question": ["Recent usage patterns of Premium grade users"],
    "contexts": [result["contexts"]],
    "answer": [llm.generate(result["contexts"])],
    "ground_truth": ["Premium customers average 150GB monthly..."]
})

ragas_result = evaluate(
    eval_dataset,
    metrics=[faithfulness, context_recall]
)
print(ragas_result)
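The re-ranking in step 5 can be tried in isolation. Below is a minimal, self-contained sketch of the score blend `0.7 * vector_score + 0.3 * 1 / (1 + distance)`, using breadth-first search for the hop distance. The toy graph, the document tuples, and the choice to give unreachable entities a graph score of 0 are assumptions made for illustration.

```python
from collections import deque

# Hypothetical undirected entity graph as adjacency lists.
GRAPH = {
    "customer:A": ["contract:1"],
    "contract:1": ["customer:A", "device:router-7"],
    "device:router-7": ["contract:1"],
    "device:router-9": [],
}

def min_distance(sources, targets):
    """Fewest hops from any source to any target, or None if unreachable."""
    targets = set(targets)
    queue = deque((s, 0) for s in sources)
    seen = set(sources)
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for nxt in GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def final_score(vector_score, distance, w_vec=0.7, w_graph=0.3):
    # Assumption: unreachable entities contribute no graph score.
    graph_score = 0.0 if distance is None else 1 / (1 + distance)
    return w_vec * vector_score + w_graph * graph_score

query_entities = ["customer:A"]
docs = [
    ("doc-1", 0.80, ["device:router-9"]),  # similar text, unrelated device
    ("doc-2", 0.74, ["device:router-7"]),  # less similar text, 2 hops from customer A
]
ranked = sorted(
    ((name, final_score(score, min_distance(ents, query_entities)))
     for name, score, ents in docs),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)
```

In this toy case the graph term lifts `doc-2` (about 0.618) above `doc-1` (0.56), even though `doc-1` has the higher vector similarity.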
Expected Improvements
| Metric | Vector-only RAG | KG-aware RAG | Improvement |
|---|---|---|---|
| Faithfulness | 0.72 | 0.89 | +24% |
| Context Recall | 0.68 | 0.85 | +25% |
| Answer Relevancy | 0.81 | 0.87 | +7% |
| Hallucination Rate | 18% | 7% | -61% |
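The Improvement column reads as a relative change against the vector-only baseline rather than an absolute difference. A short sketch of that arithmetic, with the table values copied in as inputs:

```python
def relative_change(before, after):
    """Relative change in percent, rounded to the nearest whole number."""
    return round((after - before) / before * 100)

rows = {
    "Faithfulness": (0.72, 0.89),
    "Context Recall": (0.68, 0.85),
    "Answer Relevancy": (0.81, 0.87),
    "Hallucination Rate": (0.18, 0.07),
}
changes = {name: relative_change(b, a) for name, (b, a) in rows.items()}
print(changes)
```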
Improvement Mechanism:
- Remove irrelevant contexts via graph relationships → Increase Precision
- Supplement missing entities with 1-hop expansion → Increase Recall
- Clarify provenance with tracking → Increase Faithfulness
Write Path and Consistency Model
CDC-based Event Flow
Knowledge Feature Store uses change data capture (CDC) to detect changes in the source database in near real time and propagates them to the Feature Plane, Knowledge Plane, and Retrieval Plane.
Offline Batch vs Online Stream
| Characteristic | Offline Batch | Online Stream | Hybrid |
|---|---|---|---|
| Latency | Hourly (Glue/EMR) | Seconds (Kinesis) | Batch → Online |
| Accuracy | 100% (Full recomputation) | 99%+ (Incremental update) | Periodic accuracy calibration |
| Cost | Low | High | Medium |
| Use Case | Historical data loading | Real-time recommendation | Production standard |
Eventual Consistency Model
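Knowledge Feature Store adopts Eventual Consistency. The 3 planes may not update simultaneously but eventually reach a consistent state. Reads that need a consistent snapshot can request point-in-time mode, which pins all 3 planes to the same timestamp.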
# Ensure point-in-time consistency
result = kfs.retrieve(
    query="...",
    consistency_mode="point_in_time",
    timestamp="2026-04-18T10:30:00Z"
)

# This query:
# 1. Feature Plane: Returns only features recorded at or before timestamp
# 2. Knowledge Plane: Traverses only relationships created at or before timestamp
# 3. Retrieval Plane: Searches only documents indexed at or before timestamp
# → All 3 planes aligned to same point in time
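One way to picture point-in-time mode is as an "as of" filter applied identically to each plane. The sketch below is an assumption about how such a filter could work, not the KFS implementation. The record layout (`key`, `value`, `ts`) and the `snapshot` helper are hypothetical.

```python
from datetime import datetime, timezone

def parse(ts):
    """Parse an ISO-8601 UTC timestamp such as 2026-04-18T10:30:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

# Hypothetical records per plane; "ts" is when each record became valid.
PLANES = {
    "features": [
        {"key": "A.churn_risk_score", "value": 0.31, "ts": parse("2026-04-18T09:00:00Z")},
        {"key": "A.churn_risk_score", "value": 0.64, "ts": parse("2026-04-18T11:00:00Z")},
    ],
    "edges": [
        {"key": "A-HAS_CONTRACT->1", "value": True, "ts": parse("2026-04-17T08:00:00Z")},
        {"key": "A-HAS_CONTRACT->2", "value": True, "ts": parse("2026-04-18T10:45:00Z")},
    ],
    "documents": [
        {"key": "doc-1", "value": "usage report", "ts": parse("2026-04-18T10:30:00Z")},
    ],
}

def snapshot(planes, timestamp):
    """Latest value per key in every plane, using only records at or before the cutoff."""
    cutoff = parse(timestamp)
    view = {}
    for plane, records in planes.items():
        latest = {}
        for record in sorted(records, key=lambda r: r["ts"]):
            if record["ts"] <= cutoff:
                latest[record["key"]] = record["value"]
        view[plane] = latest
    return view

view = snapshot(PLANES, "2026-04-18T10:30:00Z")
print(view)
```

The 11:00 feature update and the 10:45 contract edge are both excluded, so the answer is built from one consistent moment even if the planes were materialized at different speeds.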
Write Pipeline Example
import json

import pandas as pd
from kafka import KafkaConsumer

# feast_client (feast.FeatureStore), milvus_client (pymilvus.MilvusClient),
# embedding_model, provenance_store, and neptune_client (a hypothetical wrapper
# that accepts openCypher query parameters) are assumed to be initialized elsewhere.

def kfs_materializer():
    consumer = KafkaConsumer(
        'customer-events',
        bootstrap_servers=['kafka.svc.cluster.local:9092'],
        value_deserializer=lambda m: json.loads(m.decode('utf-8'))
    )
    for message in consumer:
        event = message.value

        # 1. Update Feature Plane (only when the event carries a feature value)
        if "churn_risk" in event:
            feast_client.push(
                "customer_features_push_source",  # hypothetical PushSource name
                pd.DataFrame([{
                    "customer_id": event["customer_id"],
                    "churn_risk_score": event["churn_risk"],
                    "event_timestamp": pd.Timestamp(event["timestamp"])
                }])
            )

        # 2. Update Knowledge Graph; values are passed as query parameters
        #    instead of being interpolated into the query text
        if event["type"] == "CONTRACT_CREATED":
            neptune_client.execute(
                """
                MATCH (c:Customer {id: $customer_id})
                CREATE (c)-[:HAS_CONTRACT]->
                    (contract:Contract {id: $contract_id, start_date: $start_date})
                """,
                parameters={
                    "customer_id": event["customer_id"],
                    "contract_id": event["contract_id"],
                    "start_date": event["start_date"]
                }
            )

        # 3. Update Vector DB (when documents change)
        if event["type"] == "DOCUMENT_UPDATED":
            embedding = embedding_model.encode(event["content"])
            milvus_client.insert(
                collection_name="documents",
                data={
                    "id": event["doc_id"],
                    "embedding": embedding.tolist(),
                    "metadata": event["metadata"],
                    "timestamp": event["timestamp"]
                }
            )

        # 4. Record provenance
        provenance_store.record(
            entity_id=event.get("customer_id"),
            source_system="app-db",
            source_table="customers",
            change_type=event["type"],
            timestamp=event["timestamp"]
        )
Governance, Security, and Roadmap
Row/Attribute-level Authorization
Knowledge Feature Store performs access control at both entity level and attribute level.
# Role-based Access Control
kfs_config = {
    "access_control": {
        "roles": {
            "data_scientist": {
                "entities": ["Customer", "Usage"],
                "attributes": {
                    "Customer": ["id", "grade", "churn_risk"],
                    "Usage": ["*"]  # All attributes
                },
                "relations": ["HAS_CONTRACT", "RECORDED_USAGE"]
            },
            "compliance_officer": {
                "entities": ["Customer", "Contract"],
                "attributes": {
                    "Customer": ["*"],
                    "Contract": ["*"]
                },
                "relations": ["*"],
                "provenance": True  # Provenance read permission
            },
            "external_analyst": {
                "entities": ["Usage"],
                "attributes": {
                    "Usage": ["device_type", "usage_gb"]  # Exclude PII
                },
                "pii_masking": True
            }
        }
    }
}

# Verify role on query execution
result = kfs.retrieve(
    query="...",
    role="external_analyst"
)
# → Customer entities (name, ssn, etc.) are not returned for this role;
#   any PII that remains in the permitted results is masked automatically
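The role configuration above implies a filter applied to every record before it leaves the store. A minimal sketch of such a filter follows. The `authorize` helper and the flattened `ROLE_ATTRIBUTES` mapping are assumptions for illustration and are not an existing KFS API.

```python
# Hypothetical flattened view of the "attributes" section of each role.
ROLE_ATTRIBUTES = {
    "data_scientist": {"Customer": ["id", "grade", "churn_risk"], "Usage": ["*"]},
    "external_analyst": {"Usage": ["device_type", "usage_gb"]},
}

def authorize(role, entity_type, record):
    """Return only the attributes the role may read, or None if the entity is off-limits."""
    allowed = ROLE_ATTRIBUTES.get(role, {}).get(entity_type)
    if allowed is None:
        return None
    if "*" in allowed:
        return dict(record)
    return {key: value for key, value in record.items() if key in allowed}

customer = {"id": "A", "grade": "Premium", "churn_risk": 0.31, "ssn": "123-45-6789"}
usage = {"device_type": "router", "usage_gb": 150, "customer_id": "A"}

print(authorize("data_scientist", "Customer", customer))
print(authorize("external_analyst", "Customer", customer))
print(authorize("external_analyst", "Usage", usage))
```

Denying by default (unknown role or entity returns `None`) keeps a misconfigured role from exposing attributes by accident.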
PII Masking On-Read
Sensitive information is masked at read time, minimizing data copies.
# Attribute-level Masking
masking_rules = {
    "Customer": {
        "ssn": lambda x: f"{x[:3]}-**-****",
        "phone": lambda x: f"{x[:3]}-****-{x[-4:]}",
        "email": lambda x: f"{x.split('@')[0][:2]}***@{x.split('@')[1]}"
    }
}

# Automatically applied to query results
masked_result = kfs.retrieve(
    query="...",
    masking_rules=masking_rules,
    audit_log=True  # Audit log for masking application
)
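To show what the rules above produce, here is a small sketch that applies them to a single record at read time. The `mask_record` helper and the sample values are hypothetical. The lambdas are the same as in the rule set above.

```python
MASKING_RULES = {
    "Customer": {
        "ssn": lambda x: f"{x[:3]}-**-****",
        "phone": lambda x: f"{x[:3]}-****-{x[-4:]}",
        "email": lambda x: f"{x.split('@')[0][:2]}***@{x.split('@')[1]}",
    }
}

def mask_record(entity_type, record, rules=MASKING_RULES):
    """Return a masked copy of the record; the stored record is left untouched."""
    entity_rules = rules.get(entity_type, {})
    return {
        key: entity_rules[key](value) if key in entity_rules and value is not None else value
        for key, value in record.items()
    }

customer = {
    "id": "A",
    "ssn": "123-45-6789",
    "phone": "010-1234-5678",
    "email": "alice@example.com",
}
masked = mask_record("Customer", customer)
print(masked)
```

Since masking returns a copy, the unmasked value exists only in the source store, which is the point of masking on read rather than maintaining a separate masked dataset.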
Lineage (OpenLineage)
Knowledge Feature Store follows the OpenLineage standard to track data lineage.
{
"eventType": "COMPLETE",
"eventTime": "2026-04-18T10:30:00.000Z",
"run": {
"runId": "abc-123-def"
},
"job": {
"namespace": "kfs",
"name": "materialize_customer_features"
},
"inputs": [
{
"namespace": "postgres",
"name": "app_db.customers",
"facets": {
"schema": {...},
"dataSource": {
"name": "postgres://prod-db:5432/app"
}
}
}
],
"outputs": [
{
"namespace": "feast",
"name": "customer_features",
"facets": {
"schema": {...}
}
},
{
"namespace": "neptune",
"name": "Customer",
"facets": {
"schema": {...}
}
}
]
}
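An event of this shape could be assembled by the materializer with a small helper. The sketch below uses only the standard library. The `lineage_event` function is hypothetical, and the `producer` and `schemaURL` values are placeholders that should be checked against the OpenLineage spec version in use (the spec defines both fields on run events and expects `runId` to be a UUID).

```python
import json
import uuid
from datetime import datetime, timezone

def lineage_event(job_name, inputs, outputs, event_type="COMPLETE", now=None):
    """Build a minimal OpenLineage-style run event as a plain dict."""
    now = now or datetime.now(timezone.utc)
    return {
        "eventType": event_type,
        "eventTime": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        # Placeholder values: adjust to the real producer and spec version.
        "producer": "https://example.com/kfs-materializer",
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "kfs", "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
    }

event = lineage_event(
    "materialize_customer_features",
    inputs=[("postgres", "app_db.customers")],
    outputs=[("feast", "customer_features"), ("neptune", "Customer")],
    now=datetime(2026, 4, 18, 10, 30, tzinfo=timezone.utc),
)
print(json.dumps(event, indent=2))
```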
Audit Log
All read and write operations are recorded in the audit log.
# Automatically record audit log
kfs.retrieve(
    query="...",
    audit_context={
        "user": "data-scientist@company.com",
        "purpose": "churn prediction model",
        "ticket": "JIRA-1234"
    }
)

# Recorded in CloudWatch Logs:
# {
#   "timestamp": "2026-04-18T10:30:00Z",
#   "user": "data-scientist@company.com",
#   "action": "retrieve",
#   "entities": ["Customer", "Contract"],
#   "features": ["churn_risk_score", "usage_last_30d"],
#   "purpose": "churn prediction model",
#   "ticket": "JIRA-1234",
#   "pii_accessed": false,
#   "masking_applied": false
# }
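The audit record shown above could be derived from the query and its `audit_context` roughly as follows. This is a sketch under assumptions: the `audit_record` helper and the `PII_ATTRIBUTES` set are hypothetical, and shipping the record to CloudWatch Logs is left out.

```python
import json
from datetime import datetime, timezone

# Hypothetical classification of attributes that count as PII.
PII_ATTRIBUTES = {"name", "ssn", "phone", "email"}

def audit_record(action, entities, features, audit_context,
                 masking_applied=False, now=None):
    """Build one structured audit entry for a read or write operation."""
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "user": audit_context["user"],
        "action": action,
        "entities": list(entities),
        "features": list(features),
        "purpose": audit_context.get("purpose"),
        "ticket": audit_context.get("ticket"),
        "pii_accessed": any(f in PII_ATTRIBUTES for f in features),
        "masking_applied": masking_applied,
    }

record = audit_record(
    "retrieve",
    entities=["Customer", "Contract"],
    features=["churn_risk_score", "usage_last_30d"],
    audit_context={
        "user": "data-scientist@company.com",
        "purpose": "churn prediction model",
        "ticket": "JIRA-1234",
    },
    now=datetime(2026, 4, 18, 10, 30, tzinfo=timezone.utc),
)
print(json.dumps(record))
```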
Pilot Roadmap
| Phase | Duration | Goal | Key Actions |
|---|---|---|---|
| Phase 0 | 2 weeks | Schema Design | Draft domain ontology, define entities & relationships |
| Phase 1 | 4 weeks | Read API | Integrate Milvus + Neptune, develop unified query API |
| Phase 2 | 6 weeks | Write Pipeline | Build Debezium CDC → Kafka → Materializer |
| Phase 3 | 4 weeks | Governance | RBAC, PII masking, OpenLineage integration |
| Phase 4 | 2 weeks | Evaluation | Evaluate KG-aware RAG with Ragas, establish metric baseline |
Phase 0 Schema Draft Scope:
- 4 Core Entities: Customer, Contract, Device, Usage
- 6 relationships: HAS_CONTRACT, USES_DEVICE, RECORDED_USAGE, BEFORE, AFTER, RELATED_TO
- 10 attributes: customer_grade, churn_risk, contract_type, device_model, usage_gb, ...
- 1 SKOS scheme: CustomerGradeScheme (Premium, VIP, Standard, ...)
Conclusion
Knowledge Feature Store integrates Ontology and Knowledge Graph with the traditional Feature Store's scalar/vector feature provisioning capability to achieve the following:
- Reduced Hallucinations: Explicitly models Entity Relations to prevent LLMs from connecting unrelated information
- Provenance Tracking: Enables tracing answer sources through provenance chain to meet compliance requirements
- Domain Entity Utilization: Defines domain terminology and hierarchy through Ontology to improve LLM domain understanding
- KG-aware RAG: Combines vector search with graph expansion, targeting expected relative gains of about 24% in Faithfulness and 25% in Context Recall (to be validated in Phase 4)
The Phase 0 schema draft will be reviewed in the 2026-Q2 Ontology session to finalize pilot scope.