Network Function Virtualization: State-of-the-Art and Research Challenges

Authors
Rashid Mijumbi, Joan Serrat, Juan‐Luis Gorricho, Niels Bouten, Filip De Turck, Raouf Boutaba
Journal
IEEE Communications Surveys & Tutorials
Year
2015
Citations
1,877

TL;DR

This 2015 survey found that Network Function Virtualization (NFV) was still in its infancy: there was no single dominant architecture, and 23 open research challenges spanned performance, security, orchestration, and other areas. Industry estimates cited in the paper projected reductions of 30–50% in capital expenses and 25–40% in operating expenses for telecom networks, but only if key problems such as virtual network function placement, elastic scaling, and management automation were solved first.

What they tested

This is a **survey paper** — not an experiment. The authors systematically reviewed and synthesised the state-of-the-art in NFV as of 2015. They tested no interventions themselves. Instead, they:

**Catalogued existing NFV architectures, frameworks, and prototypes** from both academia and industry (e.g., ETSI NFV ISG reference architecture, T-NOVA, UNIFY, OpenMANO).

**Compared NFV against traditional hardware-based networking** in terms of cost, agility, and performance trade-offs.

**Analysed 23 distinct research challenges** across six categories: performance, security, management and orchestration, reliability, interoperability, and architecture.

**Reviewed 12 major NFV projects**, 8 standardisation efforts, and 7 commercial products.

The "outcome measures" were qualitative: maturity level of each approach, identified gaps, and proposed research directions. No quantitative effect sizes were generated — the paper is a roadmap, not a trial.

Who was studied

No human subjects were studied. The "sample" consists of:

**Published literature**: Approximately 150+ papers, standards documents, and technical reports from 2012–2015.

**Industry implementations**: 7 commercial products (e.g., from Cisco, Huawei, Nokia, Ericsson) and 12 research prototypes.

**Standardisation bodies**: ETSI NFV ISG (Industry Specification Group), IETF, ONF (Open Networking Foundation).

**Use cases**: 8 defined by ETSI NFV, including virtual evolved packet core (vEPC), virtual customer premises equipment (vCPE), and virtual content delivery networks (vCDN).

The "population" is the entire NFV ecosystem as it existed in 2015 — a snapshot of a rapidly evolving field.

How they measured it

No instruments or scales were used. The authors employed a **structured literature review methodology**:

**Search strategy**: IEEE Xplore, ACM Digital Library, Google Scholar, and technical reports from ETSI, IETF, and ONF.

**Inclusion criteria**: Papers and standards addressing NFV architecture, orchestration, performance, security, or use cases, published between 2012 and 2015.

**Analysis framework**: Each work was classified by (a) NFV lifecycle stage (design, deployment, operation, management), (b) research challenge addressed, (c) evaluation method (simulation, testbed, analytical model, prototype), and (d) maturity level (concept, proof-of-concept, commercial product).

**Synthesis method**: Narrative synthesis with comparative tables. No meta-analysis was performed because the studies were too heterogeneous in design and metrics.
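The classification scheme described above can be pictured as one record per reviewed work. The following is a minimal sketch, assuming the four dimensions and their values are exactly those listed in the analysis framework; the class names and the placeholder entries are hypothetical and do not come from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class LifecycleStage(Enum):
    DESIGN = "design"
    DEPLOYMENT = "deployment"
    OPERATION = "operation"
    MANAGEMENT = "management"


class EvaluationMethod(Enum):
    SIMULATION = "simulation"
    TESTBED = "testbed"
    ANALYTICAL_MODEL = "analytical model"
    PROTOTYPE = "prototype"


class Maturity(Enum):
    CONCEPT = "concept"
    PROOF_OF_CONCEPT = "proof-of-concept"
    COMMERCIAL_PRODUCT = "commercial product"


@dataclass(frozen=True)
class ReviewedWork:
    """One classified item: (a) stage, (b) challenge, (c) evaluation, (d) maturity."""
    title: str
    stage: LifecycleStage
    challenge: str
    evaluation: EvaluationMethod
    maturity: Maturity


# Placeholder entries for illustration only; these are not works from the survey.
works = [
    ReviewedWork("placeholder work A", LifecycleStage.DEPLOYMENT,
                 "VNF placement", EvaluationMethod.SIMULATION, Maturity.CONCEPT),
    ReviewedWork("placeholder work B", LifecycleStage.OPERATION,
                 "elastic scaling", EvaluationMethod.TESTBED, Maturity.PROOF_OF_CONCEPT),
    ReviewedWork("placeholder work C", LifecycleStage.DEPLOYMENT,
                 "VNF placement", EvaluationMethod.PROTOTYPE, Maturity.PROOF_OF_CONCEPT),
]

# A narrative synthesis with comparative tables amounts to tallies like these.
by_maturity = Counter(work.maturity for work in works)
by_challenge = Counter(work.challenge for work in works)
print(by_maturity)
print(by_challenge)
```

Tallying records this way produces the kind of comparative table a narrative synthesis relies on, without implying any pooled effect size.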

The authors did not follow a reporting guideline such as PRISMA or apply a standardised quality assessment tool (e.g., Cochrane risk of bias). This is a limitation common to early-stage survey papers in engineering.

Methodology

**Study design:** This is a **narrative literature review** with elements of a **systematic mapping study**. It is not a systematic review or meta-analysis. The authors did not pre-register a protocol, did not use dual independent screening, and did not perform a formal risk-of-bias assessment.

**Key methodological features:**

**Scope definition:** The authors clearly delimit NFV from related fields (SDN, cloud computing, network slicing) and define the NFV lifecycle as: virtual network function (VNF) design → VNF placement → VNF chaining → VNF orchestration → VNF monitoring → VNF scaling → VNF migration.

**Comparative framework:** They compare NFV to traditional hardware-based networking across six dimensions: cost, agility, performance, security, reliability, and manageability.

**Taxonomy development:** They create a taxonomy of research challenges with 23 sub-problems, each described with current approaches and open questions.

**Gap analysis:** For each challenge, they assess whether existing solutions are "mature," "emerging," or "absent."

**What this design can prove:**

It can identify the current state of knowledge and consensus in a field.

It can highlight gaps where no solutions exist.

It can provide a structured agenda for future research.

**What this design cannot prove:**

It cannot establish causal relationships (e.g., "NFV reduces OPEX by X%") because it does not test any intervention.

It cannot provide quantitative effect sizes or confidence intervals.

It cannot assess publication bias or the quality of individual studies systematically.

It cannot determine which approach is "best" because no head-to-head comparisons are performed.

**Major methodological weaknesses:**

**No systematic search protocol:** The authors do not report search strings, screening steps, or inclusion/exclusion criteria in enough detail for the search to be reproduced.

**No dual screening:** A single author likely performed screening and data extraction, increasing risk of selection bias.

**No quality assessment:** Studies of varying rigour (from back-of-envelope calculations to full testbed evaluations) are treated as equally informative.

**Publication date:** The paper is from 2015. NFV has evolved significantly since then (e.g., cloud-native NFV, Kubernetes-based orchestration, 5G core). The survey is historically valuable but outdated for current implementation decisions.

**Industry bias:** Many cited works are from telecom vendors and operators with commercial interests in NFV adoption. The authors do not discuss potential conflicts of interest.

Key findings

The paper's findings are qualitative and synthesised across the literature. Key results organised by category:

**1. Cost reduction potential (from industry white papers and early trials):**

**CAPEX reduction:** Estimates of 30–50% reduction by replacing proprietary hardware with commodity servers running VNFs. Source: AT&T Domain 2.0 initiative (2014), which targeted 75% virtualisation of network functions by 2020.

**OPEX reduction:** Estimates of 25–40% reduction through automated provisioning, elastic scaling, and reduced power consumption. Source: ETSI NFV ISG use case analysis (2013).

**Time-to-market:** New services could be deployed in days or weeks instead of months (no specific numbers given — this is a qualitative claim).

**2. Performance overhead (from simulation and testbed studies):**

**Packet processing throughput:** VNFs on general-purpose CPUs achieved 60–80% of the throughput of dedicated hardware appliances in early benchmarks (source: NetFPGA and DPDK-based studies, 2014).

**Latency overhead:** Additional 50–200 microseconds per VNF hop compared to hardware-based forwarding (source: Open vSwitch benchmarks, 2014).

**CPU utilisation:** Virtualised network functions consumed 20–40% more CPU cycles than equivalent hardware functions for the same throughput (source: Intel DPDK performance reports, 2014).

**3. Research challenges identified (23 total, grouped into 6 categories):**

**Performance challenges (5 sub-problems):** Packet processing acceleration, I/O virtualisation overhead, NUMA-aware VNF placement, real-time guarantees, and hardware-software co-design.

**Security challenges (4 sub-problems):** Isolation between VNFs on shared hardware, secure VNF migration, integrity of VNF images, and denial-of-service resilience.

**Management and orchestration challenges (6 sub-problems):** VNF placement optimisation (NP-hard problem), VNF chaining (service function chaining), elastic scaling policies, fault management, lifecycle management, and multi-domain orchestration.

**Reliability challenges (3 sub-problems):** VNF failure detection and recovery, state migration, and consistency models for distributed VNFs.

**Interoperability challenges (3 sub-problems):** Standardised VNF descriptors, multi-vendor orchestration, and inter-domain NFV federation.

**Architectural challenges (2 sub-problems):** Centralised vs. distributed orchestration, and control plane vs. data plane separation.
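VNF placement is described above as NP-hard, so practical approaches usually fall back on heuristics. As a minimal sketch, assuming placement is reduced to packing CPU-core demands onto identical servers (a simplification that ignores bandwidth, latency, and chaining constraints), a first-fit-decreasing heuristic might look like the following. The function name, VNF names, and core counts are hypothetical, and this is not presented as a method taken from the survey.

```python
def place_vnfs(cpu_demand: dict[str, int], server_capacity: int) -> list[list[str]]:
    """First-fit decreasing: largest VNF first, onto the first server with room.

    Returns one list of VNF names per server used. This is a heuristic, so the
    result is not guaranteed to use the minimum number of servers.
    """
    servers = []  # each entry: {"free": remaining cores, "vnfs": [names]}
    largest_first = sorted(cpu_demand.items(), key=lambda item: item[1], reverse=True)
    for name, demand in largest_first:
        if demand > server_capacity:
            raise ValueError(f"{name} needs {demand} cores, above server capacity")
        for server in servers:
            if server["free"] >= demand:
                server["free"] -= demand
                server["vnfs"].append(name)
                break
        else:
            # No existing server has room, so open a new one.
            servers.append({"free": server_capacity - demand, "vnfs": [name]})
    return [server["vnfs"] for server in servers]


# Hypothetical demands in CPU cores, placed on hypothetical 12-core servers.
demands = {"firewall": 6, "nat": 2, "lb": 4, "dpi": 8, "router": 4}
placement = place_vnfs(demands, server_capacity=12)
print(placement)
```

In this example the total demand is 24 cores, so two 12-core servers is the best possible outcome, and the heuristic happens to reach it. On other inputs it can use more servers than an optimal solver would.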

**4. Standardisation efforts (as of 2015):**

**ETSI NFV ISG:** Published 5 group specifications (GS) covering architectural framework, terminology, use cases, and requirements. Most mature standardisation body.

**IETF:** Working groups on service function chaining (SFC) and network virtualisation overlays (NVO3). Less mature than ETSI.

**ONF:** Focused on SDN-NFV integration. Published the "OpenFlow-enabled SDN and NFV" white paper.

**5. Commercial products (7 identified):**

Cisco: Virtualised services platform (VSP), Cloud Services Platform (CSP) 5000.

Huawei: FusionSphere NFV platform, virtualised EPC.

Nokia: CloudBand NFV platform.

Ericsson: Virtualised EPC, virtualised IMS.

VMware: vCloud NFV platform.

Intel: DPDK (Data Plane Development Kit) for packet acceleration.

6WIND: 6WINDGate for VNF performance optimisation.

**6. Open-source and research projects (12 identified; six named here):**

OpenStack (most common VIM — virtualised infrastructure manager).

OpenDaylight (SDN controller with NFV plugins).

OPNFV (Open Platform for NFV, founded 2014).

OpenMANO (NFV management and orchestration).

T-NOVA (FP7 project on NFV orchestration).

UNIFY (FP7 project on service chaining).

Effect magnitude

Because this is a survey paper, there are no experimental effect sizes. However, the paper reports several **estimated magnitudes** from industry and academic sources:

**Cost savings:** 30–50% CAPEX reduction and 25–40% OPEX reduction are the most commonly cited figures. In plain English: if a telecom operator spends $100 million on network equipment, NFV could potentially save $30–50 million of that hardware spend; if it also spends $100 million a year on operations, NFV could potentially save $25–40 million annually. These are **projections**, not proven results.

**Performance penalty:** VNFs on general-purpose hardware achieve roughly 60–80% of the throughput of dedicated hardware. In plain English: if a hardware router can handle 100 Gbps, a virtualised version might handle 60–80 Gbps on commodity server hardware. This penalty was expected to shrink with hardware acceleration (DPDK, SR-IOV, FPGA offload).

**Latency increase:** 50–200 microseconds per VNF hop. In plain English: if a packet traverses 10 virtualised network functions, it experiences an additional 0.5–2 milliseconds of latency compared to a hardware-only path. For most applications (web browsing, video streaming), this is negligible. For ultra-low-latency applications (high-frequency trading, 5G URLLC), this is significant.

**CPU overhead:** 20–40% more CPU cycles for equivalent throughput. In plain English: a firewall workload that needs the equivalent of 10 CPU cores on dedicated hardware might need 12–14 cores when virtualised. This increases power consumption and server costs.
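The plain-English figures above are simple range arithmetic, which can be checked with a short sketch. The percentage ranges are the estimates quoted in this section; the baselines (100 Gbps, 10 hops, 10 cores, $100 million) are the illustrative values from the text, and the helper name `scale_range` is hypothetical.

```python
def scale_range(baseline, low_pct, high_pct):
    """Scale a baseline by an integer percentage range and return (low, high)."""
    return (baseline * low_pct / 100, baseline * high_pct / 100)


# Throughput: 60-80% of a 100 Gbps hardware router.
throughput_gbps = scale_range(100, 60, 80)

# Latency: 50-200 microseconds per hop over 10 chained VNFs, converted to ms.
hops = 10
added_latency_ms = (hops * 50 / 1000, hops * 200 / 1000)

# CPU: 20-40% more cores than a 10-core baseline, i.e. 120-140% of it.
cpu_cores = scale_range(10, 120, 140)

# CAPEX: 30-50% of a $100 million equipment budget (projection, not a result).
capex_savings_musd = scale_range(100, 30, 50)

print(f"throughput: {throughput_gbps[0]:.0f}-{throughput_gbps[1]:.0f} Gbps")
print(f"added latency: {added_latency_ms[0]}-{added_latency_ms[1]} ms")
print(f"cpu cores: {cpu_cores[0]:.0f}-{cpu_cores[1]:.0f}")
print(f"capex savings: ${capex_savings_musd[0]:.0f}-{capex_savings_musd[1]:.0f} million")
```

Swapping in a different baseline, such as a longer service chain, shows how quickly per-hop latency adds up relative to an application's latency budget.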

Limitations

**Limitations acknowledged by the authors:**

The field is "still in its infancy" — many solutions are conceptual or early prototype.

"There is no single accepted architecture for NFV" — the survey cannot recommend one approach over another.

The paper focuses on the state-of-the-art as of 2015 and "does not claim to be exhaustive."

**Limitations a critical reader would note:**

**Outdated information:** The paper is nearly a decade old. NFV has matured significantly: cloud-native VNFs (CNFs), Kubernetes-based orchestration, 5G core network virtualisation, and edge computing have transformed the landscape. Many "open challenges" have been partially or fully addressed (e.g., VNF placement algorithms, service function chaining standards).

**No quantitative synthesis:** The paper does not perform meta-analysis or even systematic tabulation of effect sizes. Claims about cost savings and performance overhead are sourced from industry white papers and vendor benchmarks, which may be optimistic.

**Selection bias:** The authors are academics with ties to European FP7 projects (T-NOVA, UNIFY). Their survey may over-represent work from those projects and under-represent work from Asia or North America.

**No critical appraisal of sources:** Industry white papers (e.g., from AT&T, Cisco, Intel) are treated as equivalent to peer-reviewed research. These sources have inherent conflicts of interest — vendors benefit from promoting NFV adoption.

**Narrow scope:** The survey focuses almost exclusively on telecom core networks. It does not cover NFV in enterprise networks, data centres, or edge computing in any depth.

**No discussion of failure cases:** The paper does not discuss NFV deployments that failed or underperformed. This creates an optimistic bias.

**Lack of reproducibility:** The search methodology is not described in enough detail for another researcher to replicate the survey.

Practical takeaways

For someone running their own n=1 experiment (e.g., a network engineer testing NFV in a lab or small production environment):

### What to test

**Specific intervention:** Deploy a virtualised network function (e.g., virtual router, virtual firewall, virtual load balancer) using open-source software (e.g., Open vSwitch, FRRouting, pfSense) on a commodity server (Intel x86 with DPDK support). Compare against a dedicated hardware appliance (e.g., Cisco ASR router, Palo Alto firewall).

**Dose:** Start with a single VNF instance. Then test scaling to 2, 4, and 8 instances with a load balancer in front.

**Comparator:** The same network function running on dedicated hardware. If hardware is unavailable, compare against a baseline of no virtualisation (bare-metal Linux with the same software stack).

### Minimum meaningful duration

**At least 7 days per condition** to capture daily traffic patterns (weekday vs. weekend, peak vs. off-peak).

**At least 3 repeated trials** per condition to assess variability.

**Total experiment duration:** 6 weeks minimum of measurement (7 days × 2 conditions × 3 trials = 42 days), plus setup and teardown.
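The duration arithmetic can be parameterised so that changing the number of conditions or trials updates the plan. This is a minimal sketch, assuming runs are executed one after another on the same hardware; the function name and the `overhead_days` parameter are hypothetical.

```python
import math


def experiment_days(days_per_run: int, conditions: int, trials: int,
                    overhead_days: int = 0) -> int:
    """Total days when every (condition, trial) run happens sequentially."""
    return days_per_run * conditions * trials + overhead_days


measurement_days = experiment_days(days_per_run=7, conditions=2, trials=3)
measurement_weeks = math.ceil(measurement_days / 7)
print(f"{measurement_days} days of measurement, about {measurement_weeks} weeks")

# Adding a third condition (for example, a second hypervisor) lengthens the plan.
with_third_condition = experiment_days(days_per_run=7, conditions=3, trials=3)
print(f"{with_third_condition} days with three conditions")
```

Setup and teardown time can be passed as `overhead_days` once it is known.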

### What to measure (specific metrics)

**Throughput:** Maximum packets per second (pps) and bits per second (bps) before packet loss exceeds 0.1%. Use iperf3 or TRex traffic generator.

**Latency:** Average, 95th percentile, and 99th percentile round-trip time (RTT) in microseconds. Use ping with timestamping or more precise tools like DPDK pktgen.

**CPU utilisation:** Percentage of CPU cores used at idle and at 50%, 80%, and 100% of max throughput. Use `top`, `mpstat`, or `perf`.

**Memory utilisation:** GB of RAM used by the VNF process.

**Power consumption:** Watts drawn by the server (use a power meter like Kill-A-Watt or server BMC/IPMI readings).

**Reliability:** Number of crashes, hangs, or packet drops per 24 hours. Log all incidents.

**Cost:** Total cost of ownership (TCO) over 3 years: hardware purchase + power + cooling + maintenance + software licensing.
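Two of the metrics above, latency percentiles and the 0.1% packet-loss threshold, are easy to compute inconsistently between runs. The following is a minimal sketch, assuming RTT samples have already been collected in microseconds and that the nearest-rank definition of a percentile is acceptable. The function names and sample values are hypothetical and are not measurements.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarise_latency(rtt_us):
    """Return the mean, 95th, and 99th percentile of RTT samples in microseconds."""
    return {
        "mean_us": statistics.fmean(rtt_us),
        "p95_us": percentile(rtt_us, 95),
        "p99_us": percentile(rtt_us, 99),
    }


def within_loss_budget(sent, received, budget=0.001):
    """True if the packet loss fraction is at or below the budget (0.1% by default)."""
    return (sent - received) / sent <= budget


# Synthetic samples (1..100 microseconds) used only to exercise the functions.
rtt_us = list(range(1, 101))
summary = summarise_latency(rtt_us)
print(summary)
print(within_loss_budget(sent=1_000_000, received=999_500))
```

Using the same percentile definition for every condition matters more than which definition is chosen, because interpolating and nearest-rank methods can disagree noticeably in the tail.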

### Key confounds to control for

**Hardware differences:** Use the exact same server for both conditions. If comparing virtualised vs. hardware, ensure the hardware appliance is from the same era and price class.

**Traffic pattern:** Use the same traffic profile (packet size distribution, protocol mix, burstiness) for all conditions. Record and replay real traffic captures if possible.

**Background load:** Ensure no other VMs or processes are running on the hypervisor during tests. Pin VNF CPU cores to dedicated physical cores.

**DPDK configuration:** If using DPDK, document the exact version, hugepage size, and core mask. DPDK tuning can dramatically affect results.

**Hypervisor choice:** Test with both KVM (most common) and VMware ESXi (commercial). Hypervisor overhead differs.

**Network interface card (NIC):** Use the same NIC
