A C# Function for Filtering HTML Code

2019-03-28 23:33:45 | Views: 431 | Source: 山村网

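I had some spare time, so I wrote a set of regular expressions in C# that strip the HTML tags from a page's source code. This is useful when scraping content and you need to remove the HTML from what was collected.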

The quoted snippet follows:
using System.Text.RegularExpressions;

public string checkStr(string html)
{
    const RegexOptions opts = RegexOptions.IgnoreCase;

    // Lazy quantifiers (*?) stop each match at the nearest closing tag,
    // so text between two separate <script> blocks is not swallowed.
    Regex regex1 = new Regex(@"<script[\s\S]*?</script\s*>", opts);
    Regex regex2 = new Regex(@"href\s*=\s*[""']?\s*\w*script\s*:", opts);
    Regex regex3 = new Regex(@"\son\w+\s*=", opts);
    Regex regex4 = new Regex(@"<iframe[\s\S]*?</iframe\s*>", opts);
    Regex regex5 = new Regex(@"<frameset[\s\S]*?</frameset\s*>", opts);
    Regex regex6 = new Regex(@"<img[^>]+>", opts);
    Regex regex7 = new Regex(@"</p>", opts);
    Regex regex8 = new Regex(@"<p>", opts);
    Regex regex9 = new Regex(@"<[^>]*>", opts);

    html = regex1.Replace(html, "");                 // remove <script>...</script> blocks
    html = regex2.Replace(html, "");                 // remove href=javascript: in <a> attributes
    html = regex3.Replace(html, " _disibledevent="); // neutralize on... event handlers of other elements
    html = regex4.Replace(html, "");                 // remove <iframe> blocks
    html = regex5.Replace(html, "");                 // remove <frameset> blocks
    html = regex6.Replace(html, "");                 // remove <img> tags
    html = regex7.Replace(html, "");                 // remove </p>
    html = regex8.Replace(html, "");                 // remove <p>
    html = regex9.Replace(html, "");                 // remove any remaining tags
    // The search string was lost in the source; "&nbsp;" is assumed here.
    // An empty search string would make String.Replace throw ArgumentException.
    html = html.Replace("&nbsp;", "");
    html = html.Replace("</strong>", "");
    html = html.Replace("<strong>", "");
    return html;
}
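
To show the filter in use, here is a minimal, self-contained sketch of the same idea. It is not the author's code: the class name `HtmlFilterDemo`, the method name `StripHtml`, and the choice to turn `&nbsp;` into a plain space are all assumptions made for illustration. It condenses the steps above into two patterns, one that drops script, iframe, and frameset blocks together with their content, and one that strips every remaining tag.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names here do not come from the original article.
public static class HtmlFilterDemo
{
    // Matches <script>, <iframe>, and <frameset> blocks including their content.
    // The lazy quantifier keeps each match inside a single element, and \1
    // requires the closing tag to name the same element as the opening tag.
    private static readonly Regex BlockPattern = new Regex(
        @"<(script|iframe|frameset)\b[\s\S]*?</\1\s*>", RegexOptions.IgnoreCase);

    // Matches any remaining tag such as <p>, </strong>, or <img ...>.
    private static readonly Regex TagPattern = new Regex(@"<[^>]*>");

    public static string StripHtml(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        html = BlockPattern.Replace(html, ""); // drop blocks with their content
        html = TagPattern.Replace(html, "");   // drop every other tag

        // Assumption: non-breaking-space entities become ordinary spaces.
        return html.Replace("&nbsp;", " ").Trim();
    }

    public static void Main()
    {
        string page = "<p>Hello <strong>world</strong></p><script>alert(1);</script>";
        Console.WriteLine(StripHtml(page)); // prints "Hello world"
    }
}
```

Regular expressions are adequate for cleaning scraped text like this, but they do not parse HTML. If the output is ever rendered back into a page, it is safer to treat this step as text extraction than as a security filter.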

(Editor: 豆豆)
