Embedded Linux Development: Network Device Drivers

Keywords: network, Linux, socket, MAC

0. Preface

It has been a while since I left my job, and the epidemic still has not passed, so if I do not need to go out I stay in. Every day feels long, and I have not found a new job yet. Recently I have been reading LDD3 again, and I ported the AR8031 driver at my previous job, so I am using this time to write down and summarize that work. After all, the faintest ink is better than the best memory.

1. Linux network device driver structure

The architecture of a Linux network device driver is divided into four layers, from top to bottom:

  • Network protocol interface layer

It provides the network-layer protocols with a unified interface for sending and receiving packets: data is sent through the dev_queue_xmit() function and received through the netif_rx() function. This makes the upper-layer protocols independent of the specific device.

  • Network device interface layer

It provides the protocol interface layer with a unified structure, net_device, which describes the properties and operations of a specific network device. The structure is the container for the functions of the device driver function layer.

  • Device driver function layer (provides the actual functionality)

Each function in this layer corresponds to a specific member of the net_device structure from the network device interface layer; it is the code that makes the network device hardware perform the corresponding action. Transmission is started by the ndo_start_xmit() function, and reception is triggered by an interrupt from the network device.

  • Network device and media layer (in Linux, both the device and the media can be virtual)

The physical entities that send and receive packets, namely the network adapter and the specific transmission medium; they are driven by the functions of the device driver function layer.

  • Network protocol interface layer

The main job of the network protocol interface layer is to provide a transparent packet send/receive interface to the upper-layer protocols. When an upper layer such as ARP or IP needs to send a packet, it calls the dev_queue_xmit() function of this layer and passes it a pointer to a struct sk_buff. The upper layer likewise receives packets by being handed a struct sk_buff pointer through the netif_rx() function (simplified prototypes of both are shown at the end of this section).
When sending a packet, the kernel's network subsystem must build an sk_buff containing the packet to be transmitted and pass it down; each layer adds its own protocol header to the sk_buff until it reaches the network device. Similarly, when a network device receives a packet from the medium, it must place the received data in an sk_buff and pass it up; each layer strips off its protocol header until the data is handed to the user.

  • Network device interface layer

It defines a unified, abstract data structure for the widely varying network devices.
The net_device structure represents a network device in the kernel and is defined in include/linux/netdevice.h. A network device driver only needs to fill in the relevant members of net_device and register it to connect its hardware operation functions to the kernel.
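For reference, the dev_queue_xmit() and netif_rx() entry points mentioned above are declared in include/linux/netdevice.h; their simplified prototypes (ignoring version-specific attributes) are:

int dev_queue_xmit(struct sk_buff *skb);   /* called by the protocol layers to transmit an skb */
int netif_rx(struct sk_buff *skb);         /* called by a driver to hand a received skb to the stack */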

2. Network device driver

  • Device registration

A network device driver registers itself in its module initialization function differently from a character driver. Because there is no equivalent of major and minor device numbers for network interfaces, a network driver does not request such numbers. Instead, for each newly detected interface, the driver inserts a data structure into the global list of network devices.
Each interface is described by a net_device structure. Because it is linked with other structures, it must be allocated dynamically; the kernel function that performs this allocation is alloc_netdev_mqs().

struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
		unsigned char name_assign_type,
		void (*setup)(struct net_device *),
		unsigned int txqs, unsigned int rxqs)
  1. sizeof_priv is the size of the driver's "private data" area, which is allocated together with the net_device structure.
  2. name is the name of the interface, as seen in user space.
  3. name_assign_type indicates how the name was assigned (for example NET_NAME_UNKNOWN or NET_NAME_ENUM).
  4. setup is an initialization function that sets up the rest of the net_device structure.
  5. txqs is the number of TX subqueues to allocate.
  6. rxqs is the number of RX subqueues to allocate.

The function's return value must be checked to make sure the allocation succeeded.
Once the net_device structure has been initialized, the remaining work is to pass it to the register_netdev() function.

ret = register_netdev(ndev);
if (ret)
	goto failed_register;

Once register_netdev() has been called, the driver's methods can be invoked on the device, so it must be called only after everything has been initialized.
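As a minimal sketch of this allocate → initialize → register pattern, here is a hypothetical driver skeleton (my_priv, my_netdev_ops and my_probe are placeholder names, not part of the FEC code that follows):

struct my_priv {
	void __iomem *regs;	/* whatever the driver needs privately */
};

static const struct net_device_ops my_netdev_ops = {
	/* .ndo_open, .ndo_stop, .ndo_start_xmit, ... would be filled in here */
};

static int my_probe(struct platform_device *pdev)
{
	struct net_device *ndev;
	int ret;

	/* alloc_etherdev() wraps alloc_netdev_mqs() with one TX and one RX queue */
	ndev = alloc_etherdev(sizeof(struct my_priv));
	if (!ndev)
		return -ENOMEM;

	SET_NETDEV_DEV(ndev, &pdev->dev);
	ndev->netdev_ops = &my_netdev_ops;

	/* ... map registers, read the MAC address, set up DMA buffers ... */

	ret = register_netdev(ndev);	/* only after everything is initialized */
	if (ret)
		free_netdev(ndev);
	return ret;
}

The FEC driver does the same thing: fec_probe() allocates the device with alloc_etherdev_mqs(), fec_enet_init() fills it in, and only then is register_netdev() called.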

static const struct net_device_ops fec_netdev_ops = {
	.ndo_open		= fec_enet_open,
	.ndo_stop		= fec_enet_close,
	.ndo_start_xmit		= fec_enet_start_xmit,
	.ndo_set_rx_mode	= set_multicast_list,
	.ndo_validate_addr	= eth_validate_addr,
	.ndo_tx_timeout		= fec_timeout,
	.ndo_set_mac_address	= fec_set_mac_address,
	.ndo_do_ioctl		= fec_enet_ioctl,
#ifdef CONFIG_NET_POLL_CONTROLLER
	.ndo_poll_controller	= fec_poll_controller,
#endif
	.ndo_set_features	= fec_set_features,
};

static int fec_enet_init(struct net_device *ndev)
{
	struct fec_enet_private *fep = netdev_priv(ndev);
	struct bufdesc *cbd_base;
	dma_addr_t bd_dma;
	int bd_size;
	unsigned int i;
	unsigned dsize = fep->bufdesc_ex ? sizeof(struct bufdesc_ex) :
			sizeof(struct bufdesc);
	unsigned dsize_log2 = __fls(dsize);
	int ret;

	WARN_ON(dsize != (1 << dsize_log2));
#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
	fep->rx_align = 0xf;
	fep->tx_align = 0xf;
#else
	fep->rx_align = 0x3;
	fep->tx_align = 0x3;
#endif

	/* Check mask of the streaming and coherent API */
	ret = dma_set_mask_and_coherent(&fep->pdev->dev, DMA_BIT_MASK(32));
	if (ret < 0) {
		dev_warn(&fep->pdev->dev, "No suitable DMA available\n");
		return ret;
	}

	fec_enet_alloc_queue(ndev);

	bd_size = (fep->total_tx_ring_size + fep->total_rx_ring_size) * dsize;

	/* Allocate memory for buffer descriptors. */
	cbd_base = dmam_alloc_coherent(&fep->pdev->dev, bd_size, &bd_dma,
				       GFP_KERNEL);
	if (!cbd_base) {
		return -ENOMEM;
	}

	/* Get the Ethernet address */
	fec_get_mac(ndev);
	/* make sure MAC we just acquired is programmed into the hw */
	fec_set_mac_address(ndev, NULL);

	/* Set receive and transmit descriptor base. */
	for (i = 0; i < fep->num_rx_queues; i++) {
		struct fec_enet_priv_rx_q *rxq = fep->rx_queue[i];
		unsigned size = dsize * rxq->bd.ring_size;

		rxq->bd.qid = i;
		rxq->bd.base = cbd_base;
		rxq->bd.cur = cbd_base;
		rxq->bd.dma = bd_dma;
		rxq->bd.dsize = dsize;
		rxq->bd.dsize_log2 = dsize_log2;
		rxq->bd.reg_desc_active = fep->hwp + offset_des_active_rxq[i];
		bd_dma += size;
		cbd_base = (struct bufdesc *)(((void *)cbd_base) + size);
		rxq->bd.last = (struct bufdesc *)(((void *)cbd_base) - dsize);
	}

	for (i = 0; i < fep->num_tx_queues; i++) {
		struct fec_enet_priv_tx_q *txq = fep->tx_queue[i];
		unsigned size = dsize * txq->bd.ring_size;

		txq->bd.qid = i;
		txq->bd.base = cbd_base;
		txq->bd.cur = cbd_base;
		txq->bd.dma = bd_dma;
		txq->bd.dsize = dsize;
		txq->bd.dsize_log2 = dsize_log2;
		txq->bd.reg_desc_active = fep->hwp + offset_des_active_txq[i];
		bd_dma += size;
		cbd_base = (struct bufdesc *)(((void *)cbd_base) + size);
		txq->bd.last = (struct bufdesc *)(((void *)cbd_base) - dsize);
	}


	/* The FEC Ethernet specific entries in the device structure */
	ndev->watchdog_timeo = TX_TIMEOUT;
	ndev->netdev_ops = &fec_netdev_ops;
	ndev->ethtool_ops = &fec_enet_ethtool_ops;

	writel(FEC_RX_DISABLED_IMASK, fep->hwp + FEC_IMASK);
	netif_napi_add(ndev, &fep->napi, fec_enet_rx_napi, NAPI_POLL_WEIGHT);

	if (fep->quirks & FEC_QUIRK_HAS_VLAN)
		/* enable hw VLAN support */
		ndev->features |= NETIF_F_HW_VLAN_CTAG_RX;

	if (fep->quirks & FEC_QUIRK_HAS_CSUM) {
		ndev->gso_max_segs = FEC_MAX_TSO_SEGS;

		/* enable hw accelerator */
		ndev->features |= (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM
				| NETIF_F_RXCSUM | NETIF_F_SG | NETIF_F_TSO);
		fep->csum_flags |= FLAG_RX_CSUM_ENABLED;
	}

	if (fep->quirks & FEC_QUIRK_HAS_AVB) {
		fep->tx_align = 0;
		fep->rx_align = 0x3f;
	}

	ndev->hw_features = ndev->features;

	fec_restart(ndev);

	if (fep->quirks & FEC_QUIRK_MIB_CLEAR)
		fec_enet_clear_ethtool_stats(ndev);
	else
		fec_enet_update_ethtool_stats(ndev);

	return 0;
}
static int
fec_probe(struct platform_device *pdev)
{

	...
	
	/* Allocate the network device */
	ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
				  FEC_STATS_SIZE, num_tx_qs, num_rx_qs);
	if (!ndev)
		return -ENOMEM;

	SET_NETDEV_DEV(ndev, &pdev->dev);
	
	...
	
	ret = fec_enet_init(ndev);
	if (ret)
		goto failed_init;
		
	...
	
	/*Network device registration*/
	ret = register_netdev(ndev);
	if (ret)
		goto failed_register;

	device_init_wakeup(&ndev->dev, fep->wol_flag &
			   FEC_WOL_HAS_MAGIC_PACKET);
			   
	...
	
	return ret;
}
  • Module unloading

The unregister_netdev() function removes the interface from the system, and the free_netdev() function returns the net_device structure to the system.
Internal cleanup can only be performed after the device has been unregistered, and it must be finished before the net_device structure is returned to the system: once free_netdev() has been called, neither the device nor the private data area may be referenced again.
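Because the order matters, a minimal removal path looks roughly like this (hypothetical driver; the FEC version follows):

static int my_remove(struct platform_device *pdev)
{
	struct net_device *ndev = platform_get_drvdata(pdev);

	unregister_netdev(ndev);	/* after this, no new operations reach the driver */
	/* ... internal cleanup: free IRQs, DMA buffers, clocks, disconnect the PHY ... */
	free_netdev(ndev);		/* ndev and netdev_priv(ndev) must not be touched afterwards */
	return 0;
}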

static int
fec_drv_remove(struct platform_device *pdev)
{
	struct net_device *ndev = platform_get_drvdata(pdev);
	struct fec_enet_private *fep = netdev_priv(ndev);
	struct device_node *np = pdev->dev.of_node;
	int ret;

	ret = pm_runtime_get_sync(&pdev->dev);
	if (ret < 0)
		return ret;

	cancel_work_sync(&fep->tx_timeout_work);
	fec_ptp_stop(pdev);
	unregister_netdev(ndev);
	fec_enet_mii_remove(fep);
	if (fep->reg_phy)
		regulator_disable(fep->reg_phy);

	if (of_phy_is_fixed_link(np))
		of_phy_deregister_fixed_link(np);
	of_node_put(fep->phy_node);
	free_netdev(ndev);

	clk_disable_unprepare(fep->clk_ahb);
	clk_disable_unprepare(fep->clk_ipg);
	pm_runtime_put_noidle(&pdev->dev);
	pm_runtime_disable(&pdev->dev);

	return 0;
}
  • Opening and closing
  1. open function

Before an interface can transmit packets, the kernel must open it and assign it an address.
Before the interface can communicate with the outside world, the hardware (MAC) address must be copied from the hardware into dev->dev_addr; this can be done while the device is being opened.
Once it is ready to start sending data, the open method should also start the interface's transmit queue, allowing the kernel to hand it packets for transmission. The kernel provides the following functions to start the queues.
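As a minimal example, an open method in the style of the snull driver from LDD3 (on the kernel versions used in this article dev->dev_addr can still be written directly):

static int snull_open(struct net_device *dev)
{
	/* Normally the MAC address is read from the hardware or an EEPROM;
	 * snull simply fabricates one. The first octet is \0 so the address
	 * is not a multicast address. */
	memcpy(dev->dev_addr, "\0SNUL0", ETH_ALEN);

	netif_start_queue(dev);	/* let the kernel start handing us packets to transmit */
	return 0;
}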

static inline void netif_tx_start_all_queues(struct net_device *dev)
{
	unsigned int i;

	for (i = 0; i < dev->num_tx_queues; i++) {
		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
		netif_tx_start_queue(txq);
	}
}

static __always_inline void netif_tx_start_queue(struct netdev_queue *dev_queue)
{
	clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
}
static int
fec_enet_open(struct net_device *ndev)
{
	struct fec_enet_private *fep = netdev_priv(ndev);
	int ret;
	bool reset_again;

	...

	ret = fec_enet_alloc_buffers(ndev);
	if (ret)
		goto err_enet_alloc;

	/* Initialize MAC */
	fec_restart(ndev);

	/* Probe and connect to PHY when open the interface */
	ret = fec_enet_mii_probe(ndev);
	if (ret)
		goto err_enet_mii_probe;

	/*Start transmission queue*/
	netif_tx_start_all_queues(ndev);

	device_set_wakeup_enable(&ndev->dev, fep->wol_flag &
				 FEC_WOL_FLAG_ENABLE);

	return 0;

err_enet_mii_probe:
	fec_enet_free_buffers(ndev);
err_enet_alloc:
	fec_enet_clk_enable(ndev, false);
clk_enable:
	pm_runtime_mark_last_busy(&fep->pdev->dev);
	pm_runtime_put_autosuspend(&fep->pdev->dev);
	pinctrl_pm_select_sleep_state(&fep->pdev->dev);
	return ret;
}
  2. close function

The close function is the inverse of the open function.

static __always_inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)

It is the reverse of netif_tx_start_queue(): it marks the queue so that the device cannot transmit any more packets. It must be called when the interface is closed, but it can also be used to stop transmission temporarily.
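The matching minimal close method (again in the snull style from LDD3) simply stops the queue:

static int snull_release(struct net_device *dev)
{
	netif_stop_queue(dev);	/* single-queue counterpart of netif_tx_stop_queue() */
	return 0;
}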

static inline void netif_tx_disable(struct net_device *dev)
{
	unsigned int i;
	int cpu;

	local_bh_disable();
	cpu = smp_processor_id();
	for (i = 0; i < dev->num_tx_queues; i++) {
		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);

		__netif_tx_lock(txq, cpu);
		netif_tx_stop_queue(txq);
		__netif_tx_unlock(txq);
	}
	local_bh_enable();
}
static __always_inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
{
	set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
}
static int
fec_enet_close(struct net_device *ndev)
{
	struct fec_enet_private *fep = netdev_priv(ndev);

	phy_stop(ndev->phydev);

	if (netif_device_present(ndev)) {
		napi_disable(&fep->napi);
		netif_tx_disable(ndev);
		fec_stop(ndev);
	}

	phy_disconnect(ndev->phydev);

	if (fep->quirks & FEC_QUIRK_ERR006687)
		imx6q_cpuidle_fec_irqs_unused();

	fec_enet_update_ethtool_stats(ndev);

	fec_enet_clk_enable(ndev, false);
	pinctrl_pm_select_sleep_state(&fep->pdev->dev);
	pm_runtime_mark_last_busy(&fep->pdev->dev);
	pm_runtime_put_autosuspend(&fep->pdev->dev);

	fec_enet_free_buffers(ndev);

	return 0;
}
  • Packet transmission

To transmit a packet, the kernel calls the driver's ndo_start_xmit method, which places the data on the outgoing queue. Every packet handled by the kernel is held in a socket buffer structure (sk_buff); the input/output buffers of all sockets are linked lists of sk_buff structures.
The socket buffer passed to ndo_start_xmit contains the physical packet (in its media format) with a complete transport-layer header. skb->data points to the packet being transmitted, and skb->len is its length in octets.
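The general shape of an ndo_start_xmit method, sketched for a hypothetical device (hw_tx() stands in for whatever hardware-specific routine actually transmits the frame; the FEC implementation below queues the frame on a DMA buffer descriptor instead):

static netdev_tx_t my_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	/* skb->data is the complete frame including all headers, skb->len is its length */
	hw_tx(dev, skb->data, skb->len);	/* hypothetical: hand the frame to the hardware */

	dev->stats.tx_packets++;
	dev->stats.tx_bytes += skb->len;

	/* A simple device that copies the data out can free the skb immediately;
	 * DMA-based drivers such as the FEC keep it until the TX-done interrupt. */
	dev_kfree_skb_any(skb);
	return NETDEV_TX_OK;
}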

 static int
fec_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct fec_enet_private *fep = netdev_priv(dev);
	struct bufdesc *bdp;
	void *bufaddr;
	unsigned short	status;
	unsigned long	estatus;
	unsigned long flags;

	...
	/*disable interrupt*/
	spin_lock_irqsave(&fep->hw_lock, flags);
	/* Entry point of the TX ring */
	bdp = fep->cur_tx;

	status = bdp->cbd_sc;

	...

	/*Set buffer length and buffer pointer*/
	bufaddr = skb->data;
	bdp->cbd_datlen = skb->len;

	/*4 byte alignment*/
	if (((unsigned long) bufaddr) & FEC_ALIGNMENT) {
		unsigned int index;
		index = bdp - fep->tx_bd_base;
		memcpy(fep->tx_bounce[index], (void *)skb->data, skb->len);
		bufaddr = fep->tx_bounce[index];
	}

	...
	
	/* Save skb pointer */
	fep->tx_skbuff[fep->skb_cur] = skb;

	dev->stats.tx_bytes += skb->len;
	fep->skb_cur = (fep->skb_cur+1) & TX_RING_MOD_MASK;

	/* Establish streaming DMA mapping*/
	bdp->cbd_bufaddr = dma_map_single(&dev->dev, bufaddr,
			FEC_ENET_TX_FRSIZE, DMA_TO_DEVICE);

	/* Tell FEC that it is ready and interrupt after sending*/
	status |= (BD_ENET_TX_READY | BD_ENET_TX_INTR
			| BD_ENET_TX_LAST | BD_ENET_TX_TC);
	bdp->cbd_sc = status;

	/* Trigger transmission */
	writel(0, fep->hwp + FEC_X_DES_ACTIVE);

	/* If this was the last BD in the ring, start at the beginning again. */
	if (status & BD_ENET_TX_WRAP)
		bdp = fep->tx_bd_base;
	else
		bdp++;

	if (bdp == fep->dirty_tx) {
		fep->tx_full = 1;
		/*Inform the network system that no other packet transfer can be started until the hardware can accept the new data*/
		netif_stop_queue(dev);
	}

	fep->cur_tx = bdp;

	spin_unlock_irqrestore(&fep->hw_lock, flags);

	return NETDEV_TX_OK;
}
static void
fec_enet_tx(struct net_device *dev)
{
	struct	fec_enet_private *fep;
	struct  fec_ptp_private *fpp;
	struct bufdesc *bdp;
	unsigned short status;
	unsigned long estatus;
	struct	sk_buff	*skb;

	fep = netdev_priv(dev);
	fpp = fep->ptp_priv;
	spin_lock(&fep->hw_lock);
	bdp = fep->dirty_tx;

	while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
		
		...
		
		/* Remove the DMA mapping */
		dma_unmap_single(&dev->dev, bdp->cbd_bufaddr, FEC_ENET_TX_FRSIZE, DMA_TO_DEVICE);
		bdp->cbd_bufaddr = 0;

		skb = fep->tx_skbuff[fep->skb_dirty];
		/*Check error*/
		if (status & (BD_ENET_TX_HB | BD_ENET_TX_LC |
				   BD_ENET_TX_RL | BD_ENET_TX_UN |
				   BD_ENET_TX_CSL)) {
			dev->stats.tx_errors++;
			if (status & BD_ENET_TX_HB)  /* No heartbeat */
				dev->stats.tx_heartbeat_errors++;
			if (status & BD_ENET_TX_LC)  /* Late collision */
				dev->stats.tx_window_errors++;
			if (status & BD_ENET_TX_RL)  /* Retrans limit */
				dev->stats.tx_aborted_errors++;
			if (status & BD_ENET_TX_UN)  /* Underrun */
				dev->stats.tx_fifo_errors++;
			if (status & BD_ENET_TX_CSL) /* Carrier lost */
				dev->stats.tx_carrier_errors++;
		} else {
			dev->stats.tx_packets++;
		}

		if (status & BD_ENET_TX_READY)
			printk("HEY! Enet xmit interrupt and TX_READY.\n");

		/* Deferred means some collisions occurred during transmit,
		 * but we eventually sent the packet OK.
		 */
		if (status & BD_ENET_TX_DEF)
			dev->stats.collisions++;

#if defined(CONFIG_ENHANCED_BD)
		if (fep->ptimer_present) {
			estatus = bdp->cbd_esc;
			if (estatus & BD_ENET_TX_TS)
				fec_ptp_store_txstamp(fpp, skb, bdp);
		}
#elif defined(CONFIG_IN_BAND)
		if (fep->ptimer_present) {
			if (status & BD_ENET_TX_PTP)
				fec_ptp_store_txstamp(fpp, skb, bdp);
		}
#endif

		/*Release the skb buffer*/
		dev_kfree_skb_any(skb);
		fep->tx_skbuff[fep->skb_dirty] = NULL;
		fep->skb_dirty = (fep->skb_dirty + 1) & TX_RING_MOD_MASK;

		/* Update pointer to next buffer descriptor to transfer*/
		if (status & BD_ENET_TX_WRAP)
			bdp = fep->tx_bd_base;
		else
			bdp++;

		/* Since we have freed up a buffer, the ring is no longer full
		 */
		if (fep->tx_full) {
			fep->tx_full = 0;
			if (netif_queue_stopped(dev))
				/* Transmission is possible again; tell the network system to resume sending packets */
				netif_wake_queue(dev);
		}
	}
	fep->dirty_tx = bdp;
	spin_unlock(&fep->hw_lock);
}
  • Packet reception

It is more complex to receive data from the network than to transmit it, because an sk_buff must be allocated in an atomic context and handed to the upper layers. A network driver can receive packets in two modes: interrupt-driven and polled.
(1) Update the statistics counters for each received packet, mainly rx_packets, rx_bytes, etc.
(2) Allocate a buffer to hold the packet; the buffer allocation function needs to know the data length.
(3) Once a valid sk_buff pointer is obtained, copy the packet data into the buffer (e.g. with memcpy).
(4) Before the packet can be processed, the network layer must know some information about it, so the dev and protocol members must be set correctly before the buffer is passed up.
(5) Finally, call netif_rx() to pass the socket buffer to the upper-layer software for processing.
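Put together, those five steps look roughly like this (a hypothetical helper; data and pkt_len are assumed to have already been obtained from the hardware; the FEC receive path below does the same work using DMA buffer descriptors):

static void my_rx(struct net_device *dev, const void *data, unsigned int pkt_len)
{
	struct sk_buff *skb;

	skb = dev_alloc_skb(pkt_len + NET_IP_ALIGN);	/* (2) buffer to hold the packet */
	if (!skb) {
		dev->stats.rx_dropped++;
		return;
	}
	skb_reserve(skb, NET_IP_ALIGN);			/* align the IP header after the Ethernet header */
	memcpy(skb_put(skb, pkt_len), data, pkt_len);	/* (3) copy the packet data into the buffer */

	skb->protocol = eth_type_trans(skb, dev);	/* (4) sets skb->dev and skb->protocol */

	dev->stats.rx_packets++;			/* (1) update the statistics counters */
	dev->stats.rx_bytes += pkt_len;

	netif_rx(skb);					/* (5) hand the socket buffer to the upper layers */
}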

 static void
fec_enet_rx(struct net_device *dev)
{
	struct	fec_enet_private *fep = netdev_priv(dev);
	struct  fec_ptp_private *fpp = fep->ptp_priv;
	struct bufdesc *bdp;
	unsigned short status;
	struct	sk_buff	*skb;
	ushort	pkt_len;
	__u8 *data;

#ifdef CONFIG_M532x
	flush_cache_all();
#endif

	spin_lock(&fep->hw_lock);

	/*Get information about incoming packets*/
	bdp = fep->cur_rx;

	while (!((status = bdp->cbd_sc) & BD_ENET_RX_EMPTY)) {

		...
		
		/*Check error*/
		#if 0
		if (status & (BD_ENET_RX_LG | BD_ENET_RX_SH | BD_ENET_RX_NO |
			   BD_ENET_RX_CR | BD_ENET_RX_OV)) {
		#else
		if (status & (BD_ENET_RX_LG | BD_ENET_RX_SH | BD_ENET_RX_NO | BD_ENET_RX_OV)) { 
		#endif
			dev->stats.rx_errors++;
			if (status & (BD_ENET_RX_LG | BD_ENET_RX_SH)) {
				/* Frame too long or too short. */
				dev->stats.rx_length_errors++;
			}
			if (status & BD_ENET_RX_NO)	/* Frame alignment */
				dev->stats.rx_frame_errors++;
			if (status & BD_ENET_RX_CR)	/* CRC Error */
				dev->stats.rx_crc_errors++;
			if (status & BD_ENET_RX_OV)	/* FIFO overrun */
				dev->stats.rx_fifo_errors++;
		}

		/* Report late collisions as a frame error.
		 * On this error, the BD is closed, but we don't know what we
		 * have in the buffer.  So, just drop this frame on the floor.
		 */
		if (status & BD_ENET_RX_CL) {
			dev->stats.rx_errors++;
			dev->stats.rx_frame_errors++;
			goto rx_processing_done;
		}

		/*Update statistics counter, total octet transferred, address of data*/
		dev->stats.rx_packets++;
		pkt_len = bdp->cbd_datlen;
		dev->stats.rx_bytes += pkt_len;
		data = (__u8*)__va(bdp->cbd_bufaddr);

	        dma_unmap_single(NULL, bdp->cbd_bufaddr, bdp->cbd_datlen,
        			DMA_FROM_DEVICE);
#ifdef CONFIG_ARCH_MXS
		swap_buffer(data, pkt_len);
#endif
		/* Allocate a buffer to hold the packet (16-byte aligned). The packet length includes the FCS, which we do not want to pass to the upper layer. */
		skb = dev_alloc_skb(pkt_len - 4 + NET_IP_ALIGN);

		if (unlikely(!skb)) {
			printk("%s: Memory squeeze, dropping packet.\n",
					dev->name);
			dev->stats.rx_dropped++;
		} else {
			/* Reserve header space before the buffer is filled. Most Ethernet interfaces reserve two bytes in front of the packet so that the IP header is aligned on a 16-byte boundary after the 14-byte Ethernet header */
			skb_reserve(skb, NET_IP_ALIGN);
			/*Update the tail and len members in the sk_buff structure to add data at the end of the buffer.*/
			skb_put(skb, pkt_len - 4);	/* Make room */
			/*Copy packet data into buffer*/
			skb_copy_to_linear_data(skb, data, pkt_len - 4);
			/* 1588 messeage TS handle */
			if (fep->ptimer_present)
				fec_ptp_store_rxstamp(fpp, skb, bdp);
			/* Determine the protocol ID of the packet */
			skb->protocol = eth_type_trans(skb, dev);
			/*Pass socket buffer to upper software for processing*/
			netif_rx(skb);
		}

        	bdp->cbd_bufaddr = dma_map_single(NULL, data, bdp->cbd_datlen,
			DMA_FROM_DEVICE);
rx_processing_done:
		/* Clear the status flags for this buffer */
		status &= ~BD_ENET_RX_STATS;

		/* Mark the buffer empty */
		status |= BD_ENET_RX_EMPTY;
		bdp->cbd_sc = status;
#ifdef CONFIG_ENHANCED_BD
		bdp->cbd_esc = BD_ENET_RX_INT;
		bdp->cbd_prot = 0;
		bdp->cbd_bdu = 0;
#endif

		/* Update BD pointer to next entry */
		if (status & BD_ENET_RX_WRAP)
			bdp = fep->rx_bd_base;
		else
			bdp++;
		/* Doing this here will keep the FEC running while we process
		 * incoming frames.  On a heavily loaded network, we should be
		 * able to keep up at the expense of system resources.
		 */
		writel(0, fep->hwp + FEC_R_DES_ACTIVE);
	}
	fep->cur_rx = bdp;

	spin_unlock(&fep->hw_lock);
}
  • Interrupt handling

Most hardware interfaces are driven by an interrupt handling routine. The interface raises an interrupt on two possible events: a new packet has arrived, or the transmission of an outgoing packet has completed. The interface can also raise interrupts to report errors, link status changes, and so on.
The interrupt routine reads a status register on the physical device to distinguish a "new packet arrived" interrupt from a "transmission complete" interrupt.
For "transmission complete" handling, the statistics are updated and the socket buffer that is no longer needed is returned to the system with dev_kfree_skb_any(). Finally, if the driver had stopped the transmit queue, it is restarted with netif_wake_queue() (see packet transmission).
In contrast, packet reception needs no special handling in the interrupt routine beyond calling the receive function.
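Sketched generically (the register and bit names such as MY_IRQ_STATUS are hypothetical; the FEC handler below reads the FEC_IEVENT register instead):

static irqreturn_t my_interrupt(int irq, void *dev_id)
{
	struct net_device *dev = dev_id;
	struct my_priv *priv = netdev_priv(dev);	/* hypothetical private data */
	u32 status = readl(priv->regs + MY_IRQ_STATUS);	/* hypothetical status register */

	if (!status)
		return IRQ_NONE;	/* not our interrupt (shared interrupt line) */

	if (status & MY_IRQ_RX)		/* a new packet has arrived */
		my_rx_all(dev);		/* hypothetical: drain the receive ring */

	if (status & MY_IRQ_TX_DONE) {	/* transmission has completed */
		dev->stats.tx_packets++;
		dev_kfree_skb_any(priv->tx_skb);	/* return the skb kept since start_xmit (hypothetical field) */
		if (netif_queue_stopped(dev))
			netif_wake_queue(dev);	/* restart a queue we had stopped */
	}

	return IRQ_HANDLED;
}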

static irqreturn_t
fec_enet_interrupt(int irq, void * dev_id)
{
	struct	net_device *dev = dev_id;
	struct fec_enet_private *fep = netdev_priv(dev);
	struct fec_ptp_private *fpp = fep->ptp_priv;
	uint	int_events;
	irqreturn_t ret = IRQ_NONE;

	do {
		int_events = readl(fep->hwp + FEC_IEVENT);
		writel(int_events, fep->hwp + FEC_IEVENT);

		if (int_events & FEC_ENET_RXF) {
			ret = IRQ_HANDLED;
			fec_enet_rx(dev);
		}

		/*Transfer complete, or non fatal error. Update buffer descriptor*/
		if (int_events & FEC_ENET_TXF) {
			ret = IRQ_HANDLED;
			fec_enet_tx(dev);
		}

		if (int_events & FEC_ENET_TS_TIMER) {
			ret = IRQ_HANDLED;
			if (fep->ptimer_present && fpp)
				fpp->prtc++;
		}

		if (int_events & FEC_ENET_MII) {
			ret = IRQ_HANDLED;
			complete(&fep->mdio_done);
		}
	} while (int_events);

	return ret;
}
  • Receiving packets without interrupts (NAPI)

If the processor is interrupted for every packet the interface receives, system performance suffers. To improve Linux performance on high-bandwidth systems, an alternative, polling-based interface called NAPI is provided.
A NAPI-capable interface must be able to store several packets. It should be able to disable interrupts for received packets while still raising interrupts for completed transmissions and other events.
During initialization, the weight and poll members of napi_struct must be set; poll must point to the driver's polling function.

 struct napi_struct {
	...
	
	int			weight;
	...
	int			(*poll)(struct napi_struct *, int);
	
	...

};
static int fec_enet_init(struct net_device *ndev)
{
	struct fec_enet_private *fep = netdev_priv(ndev);
	struct bufdesc *cbd_base;
	dma_addr_t bd_dma;
	int bd_size;
	unsigned int i;
	unsigned dsize = fep->bufdesc_ex ? sizeof(struct bufdesc_ex) :
			sizeof(struct bufdesc);
	unsigned dsize_log2 = __fls(dsize);
	int ret;
	
	...

	writel(FEC_RX_DISABLED_IMASK, fep->hwp + FEC_IMASK);
	/* Set the weight and poll members of napi_struct */
	netif_napi_add(ndev, &fep->napi, fec_enet_rx_napi, NAPI_POLL_WEIGHT);
	
	...

	return 0;
}

The next step in creating a NAPI driver is to modify the interrupt handling routine. When the interface signals that a packet has arrived, the interrupt routine does not process the packet itself; instead it disables receive interrupts and tells the kernel to start polling the interface.

static irqreturn_t
fec_enet_interrupt(int irq, void *dev_id)
{
	struct net_device *ndev = dev_id;
	struct fec_enet_private *fep = netdev_priv(ndev);
	uint int_events;
	irqreturn_t ret = IRQ_NONE;

	int_events = readl(fep->hwp + FEC_IEVENT);
	writel(int_events, fep->hwp + FEC_IEVENT);
	fec_enet_collect_events(fep, int_events);

	if ((fep->work_tx || fep->work_rx) && fep->link) {
		ret = IRQ_HANDLED;

		/*Check if napi can schedule*/
		if (napi_schedule_prep(&fep->napi)) {
			/*disable interrupt*/
			writel(FEC_NAPI_IMASK, fep->hwp + FEC_IMASK);
			/*Dispatch receive function, call poll function*/
			__napi_schedule(&fep->napi);
		}
	}

	if (int_events & FEC_ENET_MII) {
		ret = IRQ_HANDLED;
		complete(&fep->mdio_done);
	}
	return ret;
}

When the interface signals that packets have arrived, the interrupt handler defers the work: it calls __napi_schedule(), which causes the driver's poll function to be invoked later.

static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
{
	struct net_device *ndev = napi->dev;
	struct fec_enet_private *fep = netdev_priv(ndev);
	int pkts;

	/*Receive packets*/
	pkts = fec_enet_rx(ndev, budget);

	fec_enet_tx(ndev);

	if (pkts < budget) {
		/* If all packets have been processed, call napi_complete_done() to stop polling and re-enable interrupts */
		napi_complete_done(napi, pkts);
		writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
	}
	return pkts;
}
  • Socket buffer

The sk_buff structure is very important. It is defined in include/linux/skbuff.h and stands for "socket buffer"; it is used to pass data between layers of the Linux network subsystem.
When sending a packet, the kernel's network subsystem must allocate an sk_buff containing the packet to be transmitted and pass it down; each layer adds its protocol header to the sk_buff until it reaches the network device. Similarly, when a network device receives a packet from the medium, it must copy the received data into an sk_buff and pass it up, with each layer stripping off its protocol header until the data reaches the user.

struct sk_buff {
	union {
		struct {
			/* These two members must be first. */
			struct sk_buff		*next;
			struct sk_buff		*prev;

			union {
				struct net_device	*dev;   /*Devices that receive and send this buffer*/
				/* Some protocols might use this space to store information,
				 * while device pointer would be NULL.
				 * UDP receive path is one user.
				 */
				unsigned long		dev_scratch;
			};
		};
		struct rb_node		rbnode; /* used in netem, ip4 defrag, and tcp stack */
		struct list_head	list;
	};

	union {
		struct sock		*sk;
		int			ip_defrag_offset;
	};

	union {
		ktime_t		tstamp;
		u64		skb_mstamp_ns; /* earliest departure time */
	};
	/*
	 * This is the control buffer. It is free to use for every
	 * layer. Please put your private variables there. If you
	 * want to keep them across layers you have to do a skb_clone()
	 * first. This is owned by whoever has the skb queued ATM.
	 */
	char			cb[48] __aligned(8);

	union {
		struct {
			unsigned long	_skb_refdst;
			void		(*destructor)(struct sk_buff *skb);
		};
		struct list_head	tcp_tsorted_anchor;
	};

#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
	unsigned long		 _nfct;
#endif
	/* len is the total length of the data in the packet; data_len is the length of the data stored in fragments. */
	unsigned int		len,
				data_len;
	__u16			mac_len,
				hdr_len;

	/* Following fields are _not_ copied in __copy_skb_header()
	 * Note that queue_mapping is here mostly to fill a hole.
	 */
	__u16			queue_mapping;

/* if you move cloned around you also must adapt those constants */
#ifdef __BIG_ENDIAN_BITFIELD
#define CLONED_MASK	(1 << 7)
#else
#define CLONED_MASK	1
#endif
#define CLONED_OFFSET()		offsetof(struct sk_buff, __cloned_offset)

	__u8			__cloned_offset[0];
	__u8			cloned:1,
				nohdr:1,
				fclone:2,
				peeked:1,
				head_frag:1,
				pfmemalloc:1;
#ifdef CONFIG_SKB_EXTENSIONS
	__u8			active_extensions;
#endif
	/* fields enclosed in headers_start/headers_end are copied
	 * using a single memcpy() in __copy_skb_header()
	 */
	/* private: */
	__u32			headers_start[0];
	/* public: */

/* if you move pkt_type around you also must adapt those constants */
#ifdef __BIG_ENDIAN_BITFIELD
#define PKT_TYPE_MAX	(7 << 5)
#else
#define PKT_TYPE_MAX	7
#endif
#define PKT_TYPE_OFFSET()	offsetof(struct sk_buff, __pkt_type_offset)

	__u8			__pkt_type_offset[0];
	__u8			pkt_type:3;    /* Packet classification used when delivering the packet. The driver is responsible for setting it to PACKET_HOST, PACKET_OTHERHOST, PACKET_BROADCAST or PACKET_MULTICAST. */
	__u8			ignore_df:1;
	__u8			nf_trace:1;
	__u8			ip_summed:2;   /* Checksum policy for the packet; the driver sets it for incoming packets. */
	__u8			ooo_okay:1;

	__u8			l4_hash:1;
	__u8			sw_hash:1;
	__u8			wifi_acked_valid:1;
	__u8			wifi_acked:1;
	__u8			no_fcs:1;
	/* Indicates the inner headers are valid in the skbuff. */
	__u8			encapsulation:1;
	__u8			encap_hdr_csum:1;
	__u8			csum_valid:1;

#ifdef __BIG_ENDIAN_BITFIELD
#define PKT_VLAN_PRESENT_BIT	7
#else
#define PKT_VLAN_PRESENT_BIT	0
#endif
#define PKT_VLAN_PRESENT_OFFSET()	offsetof(struct sk_buff, __pkt_vlan_present_offset)
	__u8			__pkt_vlan_present_offset[0];
	__u8			vlan_present:1;
	__u8			csum_complete_sw:1;
	__u8			csum_level:2;
	__u8			csum_not_inet:1;
	__u8			dst_pending_confirm:1;
#ifdef CONFIG_IPV6_NDISC_NODETYPE
	__u8			ndisc_nodetype:2;
#endif

	__u8			ipvs_property:1;
	__u8			inner_protocol_type:1;
	__u8			remcsum_offload:1;
#ifdef CONFIG_NET_SWITCHDEV
	__u8			offload_fwd_mark:1;
	__u8			offload_l3_fwd_mark:1;
#endif
#ifdef CONFIG_NET_CLS_ACT
	__u8			tc_skip_classify:1;
	__u8			tc_at_ingress:1;
	__u8			tc_redirected:1;
	__u8			tc_from_ingress:1;
#endif
#ifdef CONFIG_TLS_DEVICE
	__u8			decrypted:1;
#endif

#ifdef CONFIG_NET_SCHED
	__u16			tc_index;	/* traffic control index */
#endif

	union {
		__wsum		csum;
		struct {
			__u16	csum_start;
			__u16	csum_offset;
		};
	};
	__u32			priority;
	int			skb_iif;
	__u32			hash;
	__be16			vlan_proto;
	__u16			vlan_tci;
#if defined(CONFIG_NET_RX_BUSY_POLL) || defined(CONFIG_XPS)
	union {
		unsigned int	napi_id;
		unsigned int	sender_cpu;
	};
#endif
#ifdef CONFIG_NETWORK_SECMARK
	__u32		secmark;
#endif

	union {
		__u32		mark;
		__u32		reserved_tailroom;
	};

	union {
		__be16		inner_protocol;
		__u8		inner_ipproto;
	};

	__u16			inner_transport_header;
	__u16			inner_network_header;
	__u16			inner_mac_header;

	/* The protocol headers of each layer within the packet: transport_header is the transport-layer header, network_header the network-layer header, and mac_header the link-layer header */
	__be16			protocol;
	__u16			transport_header;
	__u16			network_header;
	__u16			mac_header;

	/* private: */
	__u32			headers_end[0];
	/* public: */

	/* These elements must be at the end, see alloc_skb() for details.  */
	/* Pointers into the packet data: head points to the start of the allocated space, data to the start
	 * of the valid octets (usually slightly past head), tail to the end of the valid octets, and end to
	 * the maximum address tail can reach.
	 * From these pointers, the total buffer space is skb->end - skb->head and the data space currently
	 * in use is skb->tail - skb->data.
	 */
	sk_buff_data_t		tail;
	sk_buff_data_t		end;
	unsigned char		*head,
				*data;
	unsigned int		truesize;
	refcount_t		users;

#ifdef CONFIG_SKB_EXTENSIONS
	/* only useable after checking ->active_extensions != 0 */
	struct skb_ext		*extensions;
#endif
};
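A short illustration of how the last few pointer members move during the usual receive-path allocation (len is just a placeholder for the packet length):

	struct sk_buff *skb = dev_alloc_skb(len + NET_IP_ALIGN);
	/* right after allocation: head == data == tail, and end marks the limit of the buffer */

	skb_reserve(skb, NET_IP_ALIGN);	/* data and tail both advance: headroom is reserved */
	skb_put(skb, len);		/* tail advances by len and skb->len grows by len */

	/* skb_headroom(skb) == data - head, skb_tailroom(skb) == end - tail;
	 * the data currently in use is tail - data (skb->len for a linear skb).
	 * On 64-bit kernels tail and end are stored as offsets, so the
	 * skb_tail_pointer()/skb_end_pointer() helpers should be used rather
	 * than the raw fields. */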



References:
Linux Device Drivers, 3rd Edition (LDD3)
linux-5.4.9 kernel source
linux-2.6.35.3 kernel source
