
Using regular expressions in C# to extract the links and innerHTML of <a> tags

  // Requires: using System; using System.IO; using System.Text; using System.Text.RegularExpressions;

  // Read the page HTML.
  // Note: on .NET Core / .NET 5+, "gb2312" is available only after registering CodePagesEncodingProvider.
  string text = File.ReadAllText(Environment.CurrentDirectory + "//test.txt", Encoding.GetEncoding("gb2312"));
  string pattern = "<a(\\s+(href=\"(?<url>([^\"])*)\"|'([^'])*'|\\w+=\"(([^\"])*)\"|'([^'])*'))+>(?<text>(.*?))</a>";
  var matches = Regex.Matches(text, pattern);

  // Write the extracted results to a file
  using (FileStream w = new FileStream(Environment.CurrentDirectory + "//wirter.txt", FileMode.Create))
  {
      for (int i = 0; i < matches.Count; i++)
      {
          byte[] bs = Encoding.UTF8.GetBytes(string.Format("Link URL: {0},   innerhtml: {1}", matches[i].Groups["url"].Value,
              matches[i].Groups["text"].Value) + "\r\n");
          w.Write(bs, 0, bs.Length);
          Console.WriteLine();
      }
  }
  Console.ReadKey();

Diagram of the regular expression
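The one-line pattern is dense, so it can help to read it piece by piece. The sketch below spreads the same alternatives over several lines using RegexOptions.IgnorePatternWhitespace, which lets the pattern carry whitespace and # comments. It assumes the href branch of the pattern reads href="..." with the value captured as url. The class name AnchorPattern and the sample HTML string are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorPattern
{
    // The same alternatives as the one-line pattern, spread out with comments.
    // In a verbatim string, "" stands for a single double-quote character.
    public const string Verbose = @"
        <a                               # start of the opening tag
        (\s+                             # whitespace before each attribute
          ( href=""(?<url>([^""])*)""    # double-quoted href, value captured as url
          | '([^'])*'                    # a bare single-quoted string
          | \w+=""(([^""])*)""           # any other double-quoted attribute
          | '([^'])*'                    # the single-quoted branch once more
          )
        )+                               # one or more attributes
        >                                # end of the opening tag
        (?<text>(.*?))                   # innerHTML, matched lazily, captured as text
        </a>                             # closing tag
    ";

    public static readonly Regex Anchor =
        new Regex(Verbose, RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        // Hypothetical sample input, not taken from the article.
        string html = "<p><a class=\"nav\" href=\"http://example.com/a\">First <b>link</b></a></p>";

        foreach (Match m in Anchor.Matches(html))
        {
            Console.WriteLine("url: {0}, innerhtml: {1}",
                m.Groups["url"].Value, m.Groups["text"].Value);
        }
    }
}
```

Two things stand out when the pattern is laid out this way. The single-quoted branches carry no attribute name, so an attribute such as href='...' does not match and the whole tag is skipped. And because . does not match a newline by default, an <a> element whose content spans several lines is not captured unless RegexOptions.Singleline is added.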
